Reliability … The good and the bad

Reliability is an essential part of research; without it, how would we know which results to trust? For the purposes of this blog I’m only going to talk about reliability in research terms, as otherwise I’ll end up writing some massive essay and going off on a tangent.

There are so many different types of reliability within research that you’d have thought all published research would have high reliability, given all of the guidelines it has to fit in with. However, if you look into it, no piece of research is going to be perfect. I think there will always be some unreliable aspects of research, particularly when studying humans. For example, how can we possibly account for every type of variable? Is the person hungry? Are they nervous? Or are they tired?

So, to define it: reliability is when we are able to repeat a measure and gain the same (or a similar) result time and time again. But how do we know if an experiment is reliable? Well, there are several different methods that can be used to determine reliability.

First, the test-retest method can be useful in determining how reliable a measure is. For example, if a class of psychology students participate in a study testing reaction time, in which they have to respond to certain stimuli, and then perform the same task a week later, we would hope for similar results. However, one of the main flaws of this method is that it is likely to produce a testing effect. For instance, if the students do the same test twice there may be an issue of practice effects. By this I mean they will be more familiar with the test the second time, so their reaction times may improve, which may in turn make the measure appear less reliable than it really is. This is why it is best to use this method of testing reliability with things that remain stable over time, such as intelligence or personality.
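
As a rough illustration (the reaction-time scores below are invented), test-retest reliability is usually summarised as the correlation between the two administrations; a minimal sketch in Python:

```python
# Minimal sketch: test-retest reliability as the Pearson correlation
# between two administrations of the same reaction-time task.
# All scores are made-up illustration data (milliseconds).
from scipy.stats import pearsonr

week1 = [312, 405, 287, 350, 298, 410, 333, 376]  # first session
week2 = [305, 398, 292, 341, 303, 402, 329, 370]  # same people, one week later

r, _ = pearsonr(week1, week2)
print(f"test-retest r = {r:.2f}")  # values close to 1 suggest a stable measure
```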

Another measure of reliability is inter-rater reliability. This is used for simultaneous measurements by more than one researcher and is often used when observing behaviour. It makes an observation more reliable because, if two or more observers are watching, it is less likely that something will be missed. I can remember learning about one study, but I can’t remember who did it, in which two observers went out into the real world and studied children’s aggression by recording how many aggressive acts the children demonstrated. By using two observers the reliability of the study was improved, as it would have provided more accurate results. Cohen’s Kappa coefficient is a measure of inter-rater agreement for categorical data, such as that from observational studies, and is an effective measure as it also takes into account that agreement between observers may be due to chance*.
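
To make the chance-correction concrete, here is a minimal sketch of Cohen’s Kappa for two hypothetical observers coding the same ten behaviours (the ratings are invented):

```python
# Minimal sketch of Cohen's Kappa for two observers coding the same
# behaviours as "agg" (aggressive) or "not"; the ratings are invented.
from collections import Counter

obs1 = ["agg", "not", "agg", "agg", "not", "not", "agg", "not", "not", "agg"]
obs2 = ["agg", "not", "not", "agg", "not", "not", "agg", "not", "agg", "agg"]

n = len(obs1)
p_observed = sum(a == b for a, b in zip(obs1, obs2)) / n

# Agreement expected by chance, from each observer's marginal proportions
c1, c2 = Counter(obs1), Counter(obs2)
p_chance = sum((c1[cat] / n) * (c2[cat] / n) for cat in set(obs1) | set(obs2))

kappa = (p_observed - p_chance) / (1 - p_chance)
print(f"observed agreement = {p_observed:.2f}, kappa = {kappa:.2f}")
```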

 There are other factors that can affect the reliability of a research study, and as I don’t want to waffle on forever I will briefly mention two of them.

The first factor is the observer effect. It has been suggested by Eagly and Carli (1983)** that characteristics of the experimenter, such as age, sex, or behaviours such as body language, can affect the participant during a study, which can lead to a loss of reliability. For instance, Bickman (1974)*** conducted a study in which three confederates asked people on the street at random to, for example, “pick up that bag”. The confederates were dressed differently: one as a milkman, one as a civilian and one as a guard. The study found that people were more likely to obey the guard, as they saw him as an authority figure. We could therefore suggest that participants in research studies may react differently than they normally would because they view the experimenter as an authority figure, particularly if the experimenter is wearing a white lab coat. They may try extra hard to please the experimenter, or may do the complete opposite, and either way the reliability of the study is reduced.

The second factor I want to briefly mention is environmental change. Whilst researchers make every effort to keep the conditions the same for all participants, it would be extremely difficult to account for everything. Changes in the time of day or time of year can affect how a participant will respond in an experiment or study, and even a slight change in temperature could affect how likely one person is to complete a task compared to another. If it’s hot the participant may feel tired, and if it’s too cold they may not be able to concentrate#.

So, to conclude, reliability in research is always important as it helps us to ensure that our measures are consistent. Unfortunately, when working with people, it is difficult to account for every possible factor that could affect the reliability of a study. Most of the time researchers try to account for the most likely variables and accept that they will never have the perfect experiment.

* http://www.experiment-resources.com/cohens-kappa.html

** http://www.gerardkeegan.co.uk/glossary/gloss_e.htm

*** http://scienceaid.co.uk/psychology/social/obedience.html

# http://www.ccohs.ca/oshanswers/phys_agents/heat_health.html

8 thoughts on “Reliability … The good and the bad”

  1. psychjs1 says:

    Another interesting and well thought out post! In your argument you describe two forms of checking reliability: test-retest reliability (which checks successive measurements) and inter-rater reliability (which checks simultaneous measurements). However, there is another well-respected form of reliability which you didn’t mention, split-half reliability (SHR), which tests internal consistency.
    SHR is where the items on a questionnaire or test are split in half, giving a separate score for each half, and the consistency between the two half-scores is then calculated across participants. There are several ways of dividing the items in half, and the value you get depends on which split you use. To deal with this issue, Kuder and Richardson (1937) produced a formula that estimates the average of all the possible split-half correlations from all the possible ways of splitting a test in half; this is referred to as the Kuder-Richardson formula 20 (K-R 20).
    I feel explaining this form of reliability would have been useful in your argument to give an overall view of reliability and the issues associated with it.
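
    As a rough, hypothetical illustration of K-R 20 (not part of the original comment, and using an invented response matrix), for dichotomously scored items it can be computed like this:

    ```python
    # Rough sketch of the Kuder-Richardson formula 20 (K-R 20) for
    # dichotomously scored items (1 = correct, 0 = incorrect).
    # The response matrix is invented: 6 participants x 5 items.
    import numpy as np

    responses = np.array([
        [1, 1, 0, 1, 1],
        [1, 0, 0, 1, 0],
        [1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0],
        [1, 1, 0, 0, 1],
        [1, 1, 1, 1, 0],
    ])

    k = responses.shape[1]                         # number of items
    p = responses.mean(axis=0)                     # proportion passing each item
    q = 1 - p
    total_var = responses.sum(axis=1).var(ddof=1)  # variance of total scores

    kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)
    print(f"K-R 20 = {kr20:.2f}")                  # closer to 1 = more internally consistent
    ```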

  2. psychmja1 says:

    I did think about including that in my blog, but it was already at 850 words, so I thought I should probably stop while I was ahead 🙂

    You mention split-half reliability in your comment; I also neglected to mention parallel forms reliability. This is assessed by creating two different versions of a test from the same pool of content. The two versions are then given to the same participants to establish reliability.

    For example…

    When testing a hypothesis it would be appropriate to use a different pre-test and a different post-test to ensure that memory effects cannot occur. The two tests should be parallel (an equivalent measure) to ensure reliability. To then establish parallel forms reliability, a correlation coefficient should be calculated between the scores on the two different measures for the group of participants. If we then gain a high positive correlation we can assume that the two forms are parallel, and we then know that we have a reliable measure of memory.
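
    As a minimal sketch (with invented scores, not from the original comment), that correlation might be calculated like this:

    ```python
    # Minimal sketch: parallel-forms reliability as the correlation between
    # scores on two equivalent forms of a memory test (scores are invented).
    import numpy as np

    form_a = [14, 18, 11, 20, 16, 13, 17, 15]   # pre-test scores
    form_b = [15, 17, 12, 19, 15, 14, 18, 14]   # post-test scores, same participants

    r = np.corrcoef(form_a, form_b)[0, 1]
    print(f"parallel-forms r = {r:.2f}")        # a high positive r suggests the forms are parallel
    ```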

    Whilst there are many ways of testing reliability, I don’t think it’s a very easy thing to do when working with humans… Humans change constantly, behaviour-wise, so how do we know whether someone completed a test well because they were in a good mood that day, but on another occasion was in a bad mood and so did not participate to their full ability? This means that even if the measure is reliable we may not get a reliable result.

  3. uzumakiabby says:

    Hi there, nice blog!
    Just wanted to point out a little flaw I found in your argument. You say that two observers would make a study more reliable; however, I don’t think that would always be the case. For one thing, there is the little problem of consistency errors – i.e. all the researchers would have to know exactly what it is they are looking for. Using your example, they’d have to know exactly what counts as an aggressive act. Would it be a full-on punch in the face, or just brushing past someone? Two people could have two different opinions on this… reliability is at risk here!
    Also, when you talk about the issue of observer effects affecting reliability… in my opinion, they are actually affecting validity. Observer effects, according to Dewey* (2007), are when people are aware of a researcher and act differently because of it. This would mean that the study isn’t really testing what it sets out to test, because the people aren’t being true to themselves because of the white-coated man in the corner – this is a validity issue! I understand what you’re saying about not getting the same result again because of this, but I think that’s down to validity more so than reliability.
    Anyway, thanks for an interesting topic, well done!
    Abby

    * http://www.psywww.com/intropsych/ch01_psychology_and_science/measurement_and_observer_effects.html

  4. psuca7 says:

    Your point about having more researchers to increase reliability was an interesting one; you highlighted the fact that having another set of eyes is more reliable as behaviours are less likely to be missed. However, this doesn’t account for experimenter bias: if two researchers observe a behaviour, then – as individuals – they each make a judgement as to whether or not it affects their results. One experimenter’s view of a behaviour could differ from the other’s, or one may not be aware of the behaviour actually taking place, all of which affects the reliability of a study. I also agree with uzumakiabby and her point that researchers have to know what behaviour constitutes an aggressive act, to continue the example. Every type of behaviour that is typical of an aggressive act would have to be known to the observers, as individual differences affect the way a participant actually expresses such an emotion.
    I thought this was an extremely interesting topic this week and look forward to your next blog!
