Animal Testing: pros, cons, and a bit of ethics thrown in for good measure

This is one of those subjects that often provokes strong, opinionated debates, but at the end of the day if we didn’t disagree with each other we wouldn’t be human. Personally I kind of sit on the fence on some aspects of this debate; of course animal testing for cosmetic products etc. is wrong, but what about animal testing in medicine? Is it ethical to test cures for diseases on animals? Are animals similar enough to humans to test on anyway? I’ll be addressing all of these questions within this blog, and will cover the pros and cons and take into account the ethical implications of testing on animals.

There are copious reasons why researchers choose to use animals to test upon. There are so many studies using animals it would be ridiculous to name them all but in psychology there are a few key things we use animal studies for. For instance, classical conditioning (Pavlov, 1927), operant conditioning (Skinner, 1947), aggressive tendencies in males and females (Wright and Wrangham, 1998; Wagner et al 1980; Carlson, 1998) … I could go on forever (but I won’t).

I will however discuss Skinner (1947) and his box. Skinner wanted to investigate the concept of operant conditioning through the use of his Skinner Box and either rats or pigeons. In the experiment the animals were required to complete behaviours such as pressing a lever, in the case of the rats, in order to release a food reward. The animals learnt through differential reinforcement (or punishment); i.e. if they pressed the lever by accident they got a food reward and from this they began to learn that pressing the lever delivered the reward. As the image below shows there were both rewards and punishments within the box. From this piece of research Skinner was able to conclude that consequences such as rewards and punishments help to shape and also to predict human behaviour. We can see this happening in everyday life. For example, when you were a child if you put your hand on the front of the oven and it hurt you probably learnt not to do it again because the pain acted as a punisher. By contrast, if you got a good school report when you were younger and your parents bought you sweets you would probably want to be rewarded again and so would continue to do well at school.

Experiments such as this have given us such an insight into the human mind, and also demonstrate our similarities with animals. However, as I usually like to do I will start with the cons associated with animal testing, and where better to start than with generalisability? Can we really generalise findings from animals to humans? It can be difficult to make generalisations from one species to another, although some people may disagree with that point, reminding us that humans are in fact animals and have developed and changed over time just as rats, pigeons, monkeys etc. have. Starkey (2008) suggested that the obvious differences between humans and animals make it difficult to make generalisations. For example, it is not always appropriate to generalise findings from the brains of rats due to biological differences across species. We know that rats have a significantly smaller neo-cortex to brainstem ratio. We also know that primates, such as chimpanzees, have a larger neo-cortex but the proportions are still different to those of human beings. This makes any generalisations we may wish to make from the brains of one species to another difficult to do reliably.

Further disadvantages of using animal studies have been discussed in the literature (Stubblefield, 2009). Firstly, one interesting fact I discovered whilst researching this blog is that researchers’ bias towards gender is not just confined to human studies. Zucker and Beery (2010) suggested that many researchers avoid using female animals, just like in the past they avoided female humans. This bias can have just as many implications for research using animals in that, like with humans, we may not get a true representation of behaviour if we only look at one gender. If we’re measuring levels of aggression in rats we would need to observe males and females in order to see whether there is a difference between genders, and also within genders.

Another disadvantage or “con” of using animals for research is that it can be very costly. For example, it’s very costly to house the animals for testing and to ensure that they are healthy and ready to be studied. Similarly it is difficult for researchers to determine whether it is necessary to use animals: is the product a necessity? Will the behaviour be reproducible in humans? These are both common questions asked by researchers when determining whether to use animals in their research. Murnaghan (2010a) has suggested that different methods of research are needed in order to reduce the number of animal studies. It was suggested that in vitro techniques may help to reduce the number of animal studies, but unfortunately the human body may be far more complex than we can study in this way. In later work Murnaghan (2010b) suggested that we can help to reduce ‘treating’ the animal by instead using computer technology to simulate answers to research questions (in other words we can use data from previous animal studies to predict findings using computer programs in the present).

I wanted to discuss the positives of animal research as well but have really struggled to find much support for animal studies, but this is what I did find: Animals are much easier to find than humans and researchers do not have to worry about the animals withdrawing from the study (which some may argue is also a disadvantage). Animals generally breed a lot quicker than humans which also aids studies into hereditary behaviours etc.

It’s also a lot easier for researchers to control and manipulate the situations and conditions that animals are in, such as how much food they’ve eaten, what they’ve drunk, how much they weigh, how much exercise/activity they do, how long they sleep for, etc. For example, one study I stumbled upon whilst looking for something else used flatworms to demonstrate memory transfer through cannibalism. Basically, flatworms were taught that when a light came on above where they were they should expect a small electrical shock. The worms were described as having a conditioned response if they contracted their body at the first sign of light, even without an electrical shock. The worms were then cut in half and left to regenerate for 4 or so weeks before being retested in the light/shock condition. Results showed that both the head and tail sections retained what they had previously learnt, an interesting finding if you consider that we would expect only the head half, with the brain, to retain previously learnt behaviours. It sounds cruel to cut the worms in half, and obviously this is not something that researchers can repeat with humans (definitely not ethical to cut people in half!) but it did provide an insight into different structures within the DNA of planarians, and if you’re interested in the weird and wonderful the link to the paper is below in the references (McConnell, 1962). This study required a lot of control, and it’s pretty difficult to get such a high level of control in a study with humans.

This leads neatly back to a few more issues with animal studies. Most animal studies are conducted in laboratory settings, which of course means high control and high internal validity. If you look at this from one point of view, high control and internal validity are good as we can be relatively confident that we are testing what we intend to test. However, laboratory experiments have very low ecological validity and low external validity as it’s very difficult to see how the behaviour may occur in the real world. For example, a chimpanzee’s behaviour is going to be very different when it’s in a cage compared to when it’s in the wild. Imagine if you were confined in a cage and couldn’t do anything you wanted to do. You’d probably be pretty grouchy and not your normal self, and thus a bad representation of real life behaviours.

Right, before I drone on even longer I will quickly mention the ethics bit. In recent years there have been much stricter rules put in place to ensure the protection of animals in experiments. Animals should not be subjected to harm, just like humans, which helps to protect animals from cruelty in present day experiments. In the past animals were treated as if they did not have feelings and so were subjected to environments that were extremely damaging to them just for the progression of psychology. For example Harlow (1950s) conducted a relatively well known experiment in which he separated baby monkeys from their mothers and replaced the mothers with either a wire monkey with food or a furry, warm monkey. It was found that the monkeys that had a choice between the two “surrogate” mothers generally chose the furry monkey as they felt safer with it. The monkeys formed attachments with the furry surrogate, but still ate from the wire monkey. This study has given us valuable evidence for attachment in human infants, but at what cost? Many of the monkeys that were deprived of feelings of safety and comfort were unable to develop at the typical rate, often becoming aggressive or depressed, suggesting that children too need to form emotional attachments with their primary caregivers.

So, should we use animals in experiments? It’s a difficult question really. At the start of this blog I thought I was undecided about animal testing. Personally, now, I think that animals should be left alone as much as possible. They don’t have the choices that humans have when participating in an experiment. They can’t ask the experimenter to stop or tell them that they don’t want to participate as they simply don’t have the language. Animal testing in recent years has been monitored better to try and ensure animals are not damaged in experiments. Their use in psychology is something I’m undecided on as it’s difficult to generalise and if something is too unpleasant to test on humans then should we really do it to animals? Testing cosmetics on animals is a big no-no for me; as for medicines I’m not too sure. I think if someone you knew well had a serious disease that animal testing may be able to provide a cure for, your opinion may be very different than if you wanted to know if the latest shampoo has been tried and tested on animals.


Carlson (1998). Physiology of Behaviour, 6th edition.

Harlow, H. (1950s). Retrieved:

McConnell, J. V. (1962). Memory transfer through cannibalism in planarians. Journal of Neuropsychiatry.

Murnaghan, I. (2010a). About Animal Testing. Retrieved from

Murnaghan, I. (2010b). New Technologies as Alternatives to Animal Testing. Retrieved from:

Pavlov, I. P. (1927). Conditioned Reflexes: An Investigation of the Physiological Activity of the Cerebral Cortex. Translated and Edited by G. V. Anrep. London: Oxford University Press

Skinner, B. F. (1947). ‘Superstition’ in the pigeon. Journal of Experimental Psychology, 38, 168-172.

Starkey, G. (2008). Animal Models of the Brain: Ethical Considerations and Alternatives

Stubblefield (2009). The Pros and Cons of Animal Testing. Medical Science

Wagner et al. (1980). Aggressive Behaviour, 6, 1-7.

Wright & Wrangham. (1998). Morals, Demonic Males and Evolutionary Psychology. In Information and Biological Revolutions: Global Governance Challenges.

Zucker, I., & Beery, A. K. (2010). Males still dominate animal studies. Nature, 465.

Image 1 from: scheme _01.png/300px-Skinner_box_scheme_01.png


Should we use correlations in research?

What is a correlation? Well, a correlation is defined as a relationship between two variables. And just like any kind of relationship there are both positive and negative aspects to correlational designs. I’m going to start with a bit of correlation basics before discussing the negative aspects to correlations (to get them out of the way first), and then I will talk about the more positive side to this type of design before finally reaching a conclusion to the question “should we use correlations in research?”

I’m pretty sure that by now everyone knows that we can have positive or negative relationships between variables or even no relationship at all. However, we need to be able to measure how strong the relationship between the two variables is. If you look at the image below it shows the different relationships that can occur and their strength. The strength of relationship is assigned a numerical value with -1.00 being a perfect negative correlation and +1.00 being a perfect positive correlation. The first image (top left) shows no correlation (a value of 0.00 shows no relationship between the two variables being measured). However the bottom right image shows a correlation equal to 0.99 which is suggestive of a very strong positive relationship between variables. Similarly a value of -0.99 shows a strong negative relationship.
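If you want to see where those numbers between -1.00 and +1.00 come from, here is a rough Python sketch of the Pearson correlation coefficient (the most common way of putting a number on a linear relationship). The variable names and the scores are made up by me purely for illustration; they are not from any study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How much the two variables vary together...
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # ...scaled by how much each varies on its own
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented scores, purely for illustration
hours_revised = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 64, 70]   # rises with revision, but not perfectly
errors_made = [10, 8, 6, 4, 2]      # falls exactly in step with revision

print(round(pearson_r(hours_revised, exam_score), 2))   # 0.99
print(round(pearson_r(hours_revised, errors_made), 2))  # -1.0
```

The first pair gives a very strong positive relationship and the second a perfect negative one, which is all the sign and size of the value are telling us.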

Now that little introduction is out of the way we can get into the more interesting stuff (can’t believe I’ve just said that!). As I just mentioned above, a correlation of 1.00 (+/-) shows a perfect correlational relationship between two variables. But even so we cannot infer causation from correlational research. For instance, we may see that there is a relationship between A and B but we do not know whether A causes B. This is the first of the negative aspects of correlational studies I will discuss. One of the main problems to do with causation is that we often do not have tight control over variables so we may not always know whether the two variables we aim to study are the only variables at play. The third variable problem suggests that there may be a third variable at play in a study that you are not aware of! So we might think that there is a relationship between A and B, when in fact a third variable (let’s call it C) is affecting A, or B, or both! I presented a piece of evidence in one of my comments the other week that shows the third variable problem in action and probably helps to get across what I mean. Li (1975)* wanted to find out which variables were the best predictors of the use of birth control in Taiwan. To cut a long story short it was found that the variable that correlated the most highly with the use of birth control was the number of electrical items that there were in the house! Clearly the researchers could tell something was amiss there; no way is owning a kettle going to increase the use of birth control, right? Exactly, and this is why researchers realised that there was a third variable contributing to the correlation they were seeing. After some more research they discovered that the third variable was actually how well educated the individuals were; those who attended school regularly learnt about birth control, and they probably got better jobs and so could afford more electrical appliances.
So therefore it wasn’t actually whether you owned a toaster (A) causing the use of contraceptives (B) but actually our third variable, education (C). Whilst in this piece of research it was quite easy to spot that something else was contributing to the correlation, it isn’t always that easy and often things such as this can go unnoticed.
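To make the third variable problem a bit more concrete, here is a small simulation sketch in Python. Everything in it is an assumption of mine: the numbers are randomly generated rather than taken from Li (1975), and I have simply built the data so that education (C) drives both appliances (A) and birth control use (B), with no direct link between A and B. The partial correlation formula used is the standard one for removing the influence of a single third variable.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def partial_r(r_ab, r_ac, r_bc):
    """Correlation between A and B after statistically controlling for C."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

random.seed(1)  # fixed seed so the simulation is repeatable
n = 5000

# Simulated (not real) standardised scores
education = [random.gauss(0, 1) for _ in range(n)]            # C
appliances = [c + random.gauss(0, 1) for c in education]      # A depends only on C
birth_control = [c + random.gauss(0, 1) for c in education]   # B depends only on C

r_ab = pearson_r(appliances, birth_control)
r_ac = pearson_r(appliances, education)
r_bc = pearson_r(birth_control, education)
r_ab_given_c = partial_r(r_ab, r_ac, r_bc)

print("A with B:", round(r_ab, 2))                      # clearly positive
print("A with B, controlling for C:", round(r_ab_given_c, 2))  # close to zero
```

A and B correlate quite nicely on their own, but once education is taken into account the relationship more or less vanishes. Of course, this only works if you have thought to measure C in the first place, which is exactly the problem.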

And unfortunately the third variable problem isn’t the only negative aspect to correlations. We can’t see which way the relationship goes; does A affect B or is it B that affects A? Often it is difficult to know for definite which direction the relationship goes. Gentile and Anderson (2003)** were interested in studying the relationship between aggression and the use of video games. Their study found that the amount of time that children spent playing violent video games (D) correlated positively with aggressive behaviour (E). However there is no way that we can say that the violent video games were causing children to act aggressively. Yes, possibly violent video games can increase aggressive tendencies, but it is also just as likely that children who are already more aggressive may choose to play violent video games. In other words, it is a “bi-directional model” as we don’t know which is the determining factor.

Now we’ve got two of the main negatives out of the way I’m going to show you that correlations aren’t all bad. Correlations are used throughout research as they are an easy way to determine if there is a relationship between variables.  Correlation studies are often used in medicine. For example, McNeal and Cimbolic (1986)*** noticed a correlation between depression and low serotonin levels. This has consequently led to the development of new drugs to treat depression, such as Selective Serotonin Reuptake Inhibitors (SSRIs) that increase the levels of serotonin in the brain. Without correlation studies, we might miss relationships like this!

Correlations are also good because they allow researchers to study naturally occurring relationships between variables that it would be unethical to manipulate in, for example, a laboratory experiment. One study found that there was a correlation between increasing unemployment levels and instances of alcohol abuse, suicides and homicides. You can read more about it in the link above, but the study collected information from various sources such as the World Health Organisation (WHO). It was found that unemployment increases of 3% were correlated with a 28% increase in alcohol related deaths. The reason I mention this piece of research is because we couldn’t possibly test it in a laboratory as it would be extremely unethical to make people unemployed to see how their health deteriorated as a result. Therefore researchers have to use the information that is available for them to observe. This is why correlational studies can be a great benefit to researchers as they show us things that we may otherwise miss. They’re relatively easy to run and can produce some extremely useful results without manipulating any variables, simply by observing natural interactions between different variables.

I suppose I should really conclude before this gets even longer: correlational research is a pain in the neck when it comes to inferring causation; we just can’t do it. But do we always need to know if one thing causes another? The issue of the third variable problem is, let’s face it, similar to problems that arise in laboratory experiments. We say we are better able to infer causation in lab experiments because they are controlled, however extraneous variables can still go unnoticed. Correlations are good at showing relationships, and can lead to further research once a relationship is established.


The End.


(Oh, actually I haven’t answered the question: “should we use correlations in research?”… My simple answer is yes. Why not? As my conclusion shows, there are strengths and weaknesses but nothing that’s bad enough to completely dismiss correlational research altogether.)



*Li (1975) in S. L. Jackson’s Research Methods and Statistics: A Critical Thinking Approach

**Gentile, D.A. and Anderson, C.A. (2003). Violent video games: the newest media violence hazard. In D. A. Gentile (Ed.) Media violence and children.

***McNeal, E.T. and Cimbolic, P. (1986). Antidepressants and biochemical theories of depression

What is the best type of sample to use?

As I mentioned last week I’m going to talk about samples in this blog and discuss the advantages and disadvantages of different kinds of samples.

So I’m sure we all know that the samples we use come from populations so it seems obvious to start by defining what they both are.

A population is the entire set of individuals that a researcher is interested in. These could be populations such as adolescents, the disabled, primary school children etc. However, these groups are often made up of thousands of individuals so it is nearly impossible to study everyone in a population. Therefore researchers need to use samples representative of the population they want to study so that the results can be generalised back to the wider population.

There are many important things to remember when selecting a representative sample. One of the most important things to remember when selecting participants is that the process should use a random procedure ensuring that everyone has an equal chance of being selected.

Probability sampling*- this type of sampling technique is used when the entire population is known to the researcher. So the chances of selecting a specific individual are known.

Simple random: A simple random sample is obtained using, as the name suggests, a random process. All participants are randomly selected from a list of the larger population and everyone has a fair chance of being selected. However as researchers have very little control over who is selected from that list the sample may not be representative of the larger population. For example, it might end up randomly containing people from the upper end of the IQ scale.

Systematic sampling: So a systematic sample is just that. Systematic. It uses a system to select participants from a list of the larger population. It starts in a random place and then from that every nth participant is selected. However this isn’t really a random sample as the researchers have used a set system to select participants, e.g. selecting every 5th person.

Random stratified sampling: Whilst this type of sampling ensures every part of the population is represented in the sample it is not always particularly representative of the population. Basically, the sample is selected by dividing the larger population into smaller subgroups. Researchers then select equal numbers of participants from each of the groups randomly.

Proportionate stratified sampling: Proportionate stratified sampling starts off the same way as random stratified sampling with the larger population divided into subgroups. But this time researchers randomly select a number of individuals from each of the groups that is in proportion to the larger population. This type of sampling is slightly more representative of the actual population than random stratified sampling but it’s still not perfect.

Cluster sampling: This type of sampling involves using pre-existing groups, or clusters, of people by selecting them from the larger population. This type of sampling method is good at selecting a random sample of people even though it’s not technically a random process.
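For anyone who finds it easier to see these in action, the probability sampling methods above can be sketched roughly in Python. This is only an illustration under my own assumptions: the population of 100 pupils, the `year` and `class` labels and all the function names are invented, and the cluster version shown is the common variant where whole clusters are picked at random.

```python
import random

random.seed(42)  # fixed seed so the example is repeatable

# Hypothetical population of 100 pupils: 60 in year 1, 40 in year 2,
# grouped into 10 classes of 10 (all invented for this sketch)
population = [
    {"id": i, "year": 1 if i < 60 else 2, "class": i // 10}
    for i in range(100)
]

def simple_random(pop, n):
    """Every individual on the list has the same chance of being picked."""
    return random.sample(pop, n)

def systematic(pop, n):
    """Random starting point, then every k-th individual on the list."""
    step = len(pop) // n
    start = random.randrange(step)
    return [pop[start + i * step] for i in range(n)]

def stratified(pop, n, key, proportionate):
    """Random picks within each subgroup, in equal or proportionate numbers."""
    strata = {}
    for person in pop:
        strata.setdefault(person[key], []).append(person)
    sample = []
    for members in strata.values():
        if proportionate:
            # Rounding can shift the total slightly for awkward proportions
            k = round(n * len(members) / len(pop))
        else:
            k = n // len(strata)
        sample.extend(random.sample(members, k))
    return sample

def cluster(pop, key, n_clusters):
    """Randomly pick whole pre-existing groups and take everyone in them."""
    cluster_ids = sorted({person[key] for person in pop})
    chosen = set(random.sample(cluster_ids, n_clusters))
    return [person for person in pop if person[key] in chosen]

def count_by_year(sample):
    """Tally how many sampled pupils come from each year group."""
    counts = {1: 0, 2: 0}
    for person in sample:
        counts[person["year"]] += 1
    return counts

print(count_by_year(simple_random(population, 10)))   # varies with the draw
print(count_by_year(stratified(population, 10, "year", proportionate=False)))  # {1: 5, 2: 5}
print(count_by_year(stratified(population, 10, "year", proportionate=True)))   # {1: 6, 2: 4}
print(len(cluster(population, "class", n_clusters=2)))  # 20
```

Notice that the equal-numbers stratified sample over-represents year 2 (half the sample from 40% of the population), whereas the proportionate version matches the 60/40 split.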

Non-probability sampling**- this type of sampling technique is used when we do not know the population. So we do not know the odds of picking a certain individual. 

Convenience sampling: This is an easy way to select a sample as it uses people who are available to participate at the time of the study. However this is a poor way of gaining a random sample as there is no procedure in place to ensure that the sample collected is representative of the population. For example, asking people to answer questions in the street or shopping centre etc is an example of a convenience sample. However people who are willing to participate in a study in this way are often people who like to help, which could in theory cause problems of demand characteristics in a study.

Bickman (1974)*** conducted an experiment researching obedience to authority. Researchers dressed up as either a civilian, milkman or a guard and asked people to “pick up that piece of rubbish” as they walked past. Bickman found that people obeyed the orders from the guard most often as they viewed the guard as a figure of authority. So to conduct the study Bickman sampled individuals from the streets as they walked past, which means he used a convenience sample of people who were available at the time of the study. This method was very cheap and easy for collecting a sample. However, participants were not aware that they were participating in a study and so they could not choose not to participate.

Quota sampling: And finally quota sampling is a lot like stratified random sampling but is used to try and control who is selected in a convenience sample. It identifies different subgroups and aims to select participants through convenience from each of the different subgroups. To demonstrate what I mean here I’ll give a short example. Say a researcher wanted to select a bunch of children to participate in a study using a convenience sample but wanted to ensure that they selected equal numbers of boys and girls they might choose a quota sampling technique.

If they needed 100 primary school children to participate in a study the researchers may sample the first 50 girls that came along, but once 50 girls have been sampled the quota is full and no more girls can participate. The same then applies for the boys. This type of sampling can help to control a convenience sample but usually results in a biased sample, which as a result does not represent the wider population well.
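The boys-and-girls example can be sketched in a few lines of Python. This is just my own illustration: the stream of arriving children is invented (two girls turn up for every boy) and `quota_sample` is a made-up helper name, not a standard function.

```python
def quota_sample(arrivals, quotas, key):
    """Take people in arrival order until each subgroup's quota is full."""
    remaining = dict(quotas)  # copy so the caller's quotas are untouched
    sample = []
    for person in arrivals:
        group = person[key]
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
        # Stop as soon as every quota has been filled
        if not any(remaining.values()):
            break
    return sample

# Hypothetical stream of children arriving: every third child is a boy
arrivals = [{"id": i, "sex": "boy" if i % 3 == 0 else "girl"} for i in range(300)]

sample = quota_sample(arrivals, {"girl": 50, "boy": 50}, key="sex")

girls = [p for p in sample if p["sex"] == "girl"]
boys = [p for p in sample if p["sex"] == "boy"]
print(len(girls), len(boys))   # 50 50
print(girls[-1]["id"])         # 74: the girls' quota fills early
print(boys[-1]["id"])          # 147: girls arriving after id 74 are turned away
```

The sample ends up neatly balanced, but it is still just whoever happened to come along first, which is why the bias of a convenience sample doesn’t go away.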

And now to conclude what was probably a very boring blog to read, samples come in all shapes and sizes and the most important thing to remember when collecting a sample is that it needs to be representative of the general population if you plan on generalizing the results (and also the sample should be as random as possible!)

* (probability sampling)

** (non-probability sampling)


Hypothesis Testing

For the first week back writing blogs I have decided to go with hypothesis testing, a nice topic to get back into the swing of things. So, for this blog I’ll define what hypothesis testing is, then talk about the steps giving examples as I go.

What is a hypothesis test? Well basically a hypothesis test is a method used in statistics whereby data is collected from a sample to evaluate a hypothesis about a population. Obviously we cannot sensibly test an entire population (well not usually) and so we have to use samples, which can bring issues with them.

Four Step Procedure

There are four main steps in the hypothesis testing procedure and I will briefly mention all here. One thing we must remember when hypothesis testing is that, statistically, we test the null hypothesis not the experimental hypothesis.

To demonstrate the hypothesis testing procedure I’m going to use Loftus and Palmer’s (1974) study of eyewitness testimony. For anyone who isn’t sure what this study did I’ll briefly describe it.

Participants were assigned to different conditions and all viewed a slideshow of a head-on collision between two cars. They were then asked questions such as “how fast were the two cars going when they hit?” In some conditions the verb “hit” was replaced by “smashed”, “collided”, “bumped” or “contacted”. (The findings are displayed below.) For more information on Loftus and Palmer’s (1974) study have a look at this, which goes into a lot more detail than I will here.

  1. Firstly, we need to state our hypothesis about the intended population. So using the Loftus and Palmer example; it was hypothesized that: the language used when questioning eyewitnesses can alter memory (with the null hypothesis being that: the language used when questioning eyewitnesses will have no effect on memory).
  2. We must then use our hypothesis to make predictions about a sample, such as its particular characteristics. So if our hypothesis is that the language used when questioning eyewitnesses can alter memory we are suggesting that the memory of people in the general population will be affected by the language used in eyewitness testimony and therefore we should see that in our sample. REMEMBER: our sample should be similar but may not be exactly the same as the greater population.
  3. Next we need to select our sample. To do this we should aim to sample individuals randomly from the population. We should use as random a sample as possible to try and avoid biases in participants (e.g. we don’t want to end up with a sample full of individuals who are very similar as this may not reflect the general population). Loftus and Palmer used a sample of n=45 American students, who were more of an opportunity sample than a random sample and this may therefore affect the generalisability of the results. It is not always possible to use a random sample and so we must be aware of that when testing a hypothesis as different samples have different limitations.
  4. And finally we compare the data we have collected from our sample with our hypothesis. If we find that our data are consistent with the predictions made by our hypothesis then we can assume that our hypothesis is good and we should reject the null hypothesis. However if we find that our data are inconsistent with the hypothesis then we must conclude that our hypothesis is not supported and we will fail to reject the null hypothesis. Loftus and Palmer found that the speed judged by participants increased significantly, from approximately 32mph when the verb “contacted” was used to approximately 41mph when it was replaced by “smashed”. As their results were significant we can confidently reject the null hypothesis. (This link shows a graph that represents the results from the study:
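The four steps above can be sketched in Python using a permutation test, which is one simple way of testing a null hypothesis (I am not claiming it is the analysis Loftus and Palmer ran). The speed estimates below are invented by me so that the group means land near 32mph and 41mph; they are NOT the original raw data, and the function names are my own.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """One-sided p-value for mean(group_b) > mean(group_a) under the null.

    If the verb makes no difference (the null hypothesis), the group labels
    are arbitrary, so we shuffle them and see how often chance alone gives
    a difference at least as big as the one we observed.
    """
    rng = random.Random(seed)
    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled_a = pooled[:len(group_a)]
        shuffled_b = pooled[len(group_a):]
        if mean(shuffled_b) - mean(shuffled_a) >= observed:
            at_least_as_extreme += 1
    # Add 1 to top and bottom so the estimated p-value is never exactly 0
    return observed, (at_least_as_extreme + 1) / (n_permutations + 1)

# Steps 1 and 2: null hypothesis = the verb has no effect on speed estimates;
# the experimental hypothesis predicts higher estimates for "smashed".
# Step 3: our "sample" (invented estimates in mph, for illustration only)
contacted = [30, 32, 31, 33, 34, 29, 32, 35, 31]
smashed = [40, 42, 41, 39, 43, 38, 41, 44, 40]

# Step 4: compare the data with what the null hypothesis predicts
observed, p_value = permutation_test(contacted, smashed)
alpha = 0.05  # conventional significance level
decision = "reject the null" if p_value < alpha else "fail to reject the null"

print("difference in means:", round(observed, 1))
print("p-value:", p_value)
print("decision:", decision)
```

With made-up data this clear-cut, hardly any shuffles produce a gap as large as the real one, so the p-value comes out tiny and we reject the null. Note that, just as in the steps above, it is the null hypothesis that actually gets tested.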

Next week I will carry on with the general theme of hypothesis testing and go into more detail about samples and the various strengths and weaknesses of different samples and methods used for testing a hypothesis.

Laboratory vs. Natural

 So this week I’ve decided to debate on laboratory and natural experiments, hopefully ending up with a blog a lot shorter than last week’s!

Right well I’m sure we all know the basics of a laboratory experiment. In general participants come into a laboratory (obviously) which is essentially a controlled environment. The researcher generally has something (a hypothesis) that they want to test and so they manipulate the independent variable to see if it has an effect on the dependent variable- the thing they’re measuring. Craik and Lockhart (1972)* conducted a laboratory experiment investigating their Levels of Processing Model of memory. They suggested that information could be encoded into memory at a shallow, deeper or deepest level and so to test this they conducted an experiment in which participants’ memory for different questions was tested.  For example they were asked questions such as:

    1. Is the word FISH in lower case or capital letters? (Shallow processing-appearance of the word)
    2. Does the word STYLE rhyme with ‘pin’? (Deeper processing- appearance and sound of the word)
    3. Is the word PANCAKE a form of transport? (Deepest processing- the actual meaning of the word)

So in their study Craik and Lockhart manipulated the independent variable (the types of questions) and measured the dependent variable (how well/deeply participants remembered the question). As you would expect, the question that required processing the actual meaning of a word was remembered significantly better. However, just because this study was a lab experiment doesn’t mean it was perfect. There are several positive and negative aspects of laboratory experiments that I will now discuss, with reference to Craik and Lockhart. So let’s evaluate.

Firstly, laboratory experiments have a high level of internal validity as they are conducted in a controlled environment in which the experimenter is responsible for manipulating the variables. Because of the controlled environment it is also easy to replicate the experiment as there are usually standardised procedures in place, and as we all know replication is an important aspect of a science. And what about cause and effect? Well, laboratory experiments are much better at showing us cause and effect relationships than natural experiments as the control over variables means that there are fewer extraneous variables present. But now for the bad bits…

The Craik and Lockhart study has been criticised as lacking validity and representativeness. This is mainly because the study has internal validity, which is good, but as a result it lacks external and ecological validity- thus making it difficult to generalise outside of the laboratory setting. Generalisation issues also occur because it may be difficult to operationalise certain variables. For example Craik and Lockhart were interested in the “depth of processing”* however something that may be seen as deep by one individual may actually only require shallow processing in another. Laboratory experiments can also be criticised on the grounds that.

Laboratory experiments are good if we want to find out if one thing causes another. However as they are conducted in an artificial environment participants may not react how they would in real life. For example, testing reaction time in a laboratory may produce a different result than testing driving reaction times in real life.

Before I rant on about lab experiments forever (you can probably tell I’m not a fan of them in psychology) let’s discuss natural experiments.

Instead of manipulating the independent variables as in a lab experiment, a natural experiment looks for naturally occurring variables in the environment. For example, comparing the school grades of boys with those of girls would be a natural experiment as no variables have been manipulated; researchers are simply comparing what they have available in front of them. However, many people suggest that natural experiments are not true experiments as there is no control over extraneous variables, and if participants are not aware that they are being observed is it really ethical? I suppose you could argue that as participants are none the wiser they are not going to come to any harm and therefore don’t really need to consent to take part.

Also, natural experiments are pretty cheap and easy to conduct. There are many naturally occurring events that psychologists are able to study through natural experiments. For example, in the case of Oxana** – a young girl who was left out with the dogs in the garden at her parents’ house when she was a child and began to behave like them, barking instead of talking and walking on all fours instead of upright – it would be incredibly unethical to put a child in this situation to study. However, psychologists were able to study her behaviour through the naturally occurring event and apply it to prior knowledge they had about nurture in a child’s environment and behaviour.

I guess you can say that lab and natural experiments have their positives and also their negatives. You have to think, lab experiments may be more scientific but in psychology the findings from them are often not as representative as they could be. And whilst natural experiments are more representative of everyday life they lack the control of lab experiments.