Concept Of Reliability





Reliability is generally defined as the ability of a product to perform as expected over time.

It is formally defined as the probability that a product, piece of equipment, or system performs its intended function for a stated period of time under specified operating conditions.

The reliability of a research instrument concerns the extent to which the instrument yields the same results on repeated trials. Although unreliability is always present to a certain extent, there will generally be a good deal of consistency in the results of a quality instrument gathered at different times. The tendency toward consistency found in repeated measurements is referred to as reliability.

In scientific research, accuracy of measurement is of great importance. The natural sciences normally measure physical attributes, which can easily be assigned precise values. Yet numerical assessments of the mental attributes of human beings are often accepted as readily as numerical assessments of their physical attributes.

This makes it very important that researchers in the social sciences and humanities determine the reliability of the data-gathering instrument to be used:


In the social sciences, testing reliability is a matter of comparing two different versions of the instrument and ensuring that they are similar. When we talk about instruments, it does not necessarily mean a physical instrument, such as a mass-spectrometer or a pH-testing strip.

An educational test, questionnaire, or assigning quantitative scores to behavior are also instruments, of a non-physical sort. Measuring the reliability of instruments occurs in different ways.

Retest Method

One of the easiest ways to determine the reliability of empirical measurements is by the retest method in which the same test is given to the same people after a period of time. The reliability of the test (instrument) can be estimated by examining the consistency of the responses between the two tests.

If the researcher obtains the same results on the two administrations of the instrument, the reliability coefficient will be 1.00. In practice, the correlation of measurements across time will be less than perfect, because respondents' experiences and attitudes change between the first and second administrations.

The test-retest method is a simple, clear-cut way to determine reliability, but it can be costly and impractical. Researchers are often able to obtain measurements only at a single point in time, or lack the resources for multiple administrations.
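As an illustration, the test-retest reliability estimate is simply the Pearson correlation between scores from the two administrations. The following Python sketch uses made-up scores; the function and data names are illustrative:

```python
# Test-retest reliability: Pearson correlation between the scores obtained
# from two administrations of the same test to the same respondents.
# The score lists below are invented for illustration.

from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

test1 = [12, 15, 11, 18, 14, 16]   # first administration
test2 = [13, 15, 10, 19, 13, 17]   # same respondents, some weeks later

r = pearson(test1, test2)
print(round(r, 3))   # close to 1.0 indicates high test-retest reliability
```

A coefficient near 1.00 indicates the consistency across time described above; anything much lower suggests the instrument, or the respondents, changed between administrations.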

Alternative Form Method

Like the retest method, this method also requires two testings with the same people. However, the same test is not given each time. Each of the two tests must be designed to measure the same thing and should not differ in any systematic way. One way to help ensure this is to use random procedures to select items for the different tests.

The alternative form method is viewed as superior to the retest method because a respondent's memory of test items is not as likely to play a role in the data received. One drawback of this method is the practical difficulty in developing test items that are consistent in the measurement of a specific phenomenon.

Split-Halves Method

This method is more practical in that it does not require two administrations of the same or an alternative-form test. In the split-halves method, the total set of items is divided into halves, and a correlation is taken between the two halves. This correlation only estimates the reliability of each half of the test, so it is necessary to use a statistical correction to estimate the reliability of the whole test. This correction is known as the Spearman-Brown prophecy formula:

Pxx" = 2Pxx' / (1 + Pxx')

where Pxx" is the reliability coefficient for the whole test and Pxx' is the split-half correlation.
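As a sketch, the split-half correlation and the Spearman-Brown correction can be computed as follows in Python; the half-test scores are made up for illustration:

```python
# Split-halves reliability: correlate respondents' scores on the two halves
# of the test, then step the half-test correlation up to full-test length
# with the Spearman-Brown prophecy formula. Data below are invented.

from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) *
                      sum((b - my) ** 2 for b in y))

def spearman_brown(r_half):
    """Estimated reliability of the whole test from the split-half correlation."""
    return 2 * r_half / (1 + r_half)

# Each position: one respondent's total on the odd- and even-numbered items.
odd_half  = [7, 9, 5, 8, 6, 10]
even_half = [6, 9, 6, 7, 5, 9]

r_half = pearson(odd_half, even_half)
r_full = spearman_brown(r_half)
print(round(r_half, 3), round(r_full, 3))
```

Note that the corrected coefficient is always at least as large as the split-half correlation, reflecting the fact that a longer test is more reliable than either half alone.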

Internal Consistency Method

This method requires neither the splitting of items into halves nor the multiple administration of instruments. The internal consistency method provides a unique estimate of reliability for the given test administration. The most popular internal consistency reliability estimate is given by Cronbach's alpha. It is expressed as follows:

alpha = [N/(N-1)] × [1 - Σσ²_i / σ²_x]

where N equals the number of items, Σσ²_i equals the sum of the item variances, and σ²_x equals the variance of the total composite.

If one is using the correlation matrix rather than the variance-covariance matrix, then alpha reduces to the following:

alpha = Np / [1 + p(N - 1)]

where N equals the number of items and p equals the mean inter-item correlation.
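As a minimal sketch, both versions of alpha can be computed directly; the item scores and the mean inter-item correlation below are made-up illustrative values:

```python
# Cronbach's alpha from an items-by-respondents score matrix, plus the
# standardized alpha computed from the mean inter-item correlation.
# All data values are invented for illustration.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(items):
    """items: one inner list of scores per item (same respondents in order)."""
    n = len(items)                                    # number of items N
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total
    item_var_sum = sum(variance(it) for it in items)  # sum of item variances
    return (n / (n - 1)) * (1 - item_var_sum / variance(totals))

def standardized_alpha(n, mean_r):
    """alpha = N*p / (1 + p*(N - 1)), p = mean inter-item correlation."""
    return n * mean_r / (1 + mean_r * (n - 1))

items = [
    [3, 4, 2, 5, 4],   # item 1, five respondents
    [2, 4, 2, 4, 3],   # item 2
    [3, 5, 1, 4, 4],   # item 3
]
print(round(cronbach_alpha(items), 3))      # 0.925
print(round(standardized_alpha(3, 0.6), 3)) # 0.818
```

Both forms reward items that vary together: the more the items covary relative to their individual variances, the higher the alpha.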

Types of Reliability

1) Inherent reliability - the quality of a system or piece of equipment that guarantees a certain performance, through its production characteristics, if used correctly; a measure of the maintainability or reliability of an item based either on its current operating context or on its designed reliability under ideal operating conditions.

2) Achieved reliability - a measure of the maintainability or reliability of an item based on observed failure data.


Reproducibility is different from repeatability, in which the researchers themselves repeat their experiment to test and verify their results.

Reproducibility is tested by a replication study, which must be completely independent and generate identical findings, known as commensurate results. Ideally, the replication study should utilize slightly different instruments and approaches, to ensure that there was no equipment malfunction.

If a type of measuring device has a design flaw, then it is likely that this artefact will be apparent in all models.


Reliability is something that every scientist, especially in social sciences and biology, must be aware of.

In science, the everyday meaning of the word is retained, but it must be given a much narrower and more unequivocal definition.

Another way of looking at this is as maximizing the inherent repeatability or consistency in an experiment. For maintaining reliability internally, a researcher will use as many repeat sample groups as possible, to reduce the chance of an abnormal sample group skewing the results.

If you use three replicate samples for each manipulation, and one generates completely different results from the others, then there may be something wrong with the experiment.

1. For many experiments, results follow a ‘normal distribution' and there is always a chance that your sample group produces results lying at one of the extremes. Using multiple sample groups will smooth out these extremes and generate a more accurate spread of results.

2. If your results continue to be wildly different, then there is likely to be something very wrong with your design; it is unreliable.
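Point 1 above can be sketched with a small simulation: means taken over pooled replicate groups scatter less around the true value than single-group means do. Everything here (the "true" mean, spread, and group sizes) is an arbitrary illustrative choice:

```python
# Why replicate sample groups help: averaging several groups smooths out
# extreme samples from the tails of the normal distribution.
# Purely illustrative -- the "experiment" is draws from a known normal.

import random
from statistics import mean, stdev

random.seed(42)
TRUE_MEAN = 100.0   # the value the experiment is "measuring"

def group_mean(n=10):
    """Mean of one sample group of n noisy observations."""
    return mean(random.gauss(TRUE_MEAN, 15) for _ in range(n))

# 200 experiments with a single group vs. 200 with three pooled replicates.
single_means = [group_mean() for _ in range(200)]
pooled_means = [mean(group_mean() for _ in range(3)) for _ in range(200)]

print(round(stdev(single_means), 2), round(stdev(pooled_means), 2))
```

The pooled means show a visibly smaller spread, which is the "smoothing out of extremes" the text describes.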


Reliability is also extremely important externally, and another researcher should be able to perform exactly the same experiment, with similar equipment, under similar conditions, and achieve exactly the same results. If they cannot, then the design is unreliable.

A good example of a failure to apply the definition of reliability correctly is provided by the cold fusion case of 1989.

Fleischmann and Pons announced to the world that they had managed to generate fusion heat at ordinary temperatures, without the huge and expensive tori used in most research into nuclear fusion.

This announcement shook the world, but researchers in many other institutions across the world attempted to replicate the experiment, with no success. Whether the researchers lied, or genuinely made a mistake is unclear, but their results were clearly unreliable.


Physical scientists expect to obtain exactly the same results every single time, due to the relative predictability of the physical realms. If you are a nuclear physicist or an inorganic chemist, repeat experiments should give exactly the same results, time after time.

Ecologists and social scientists, on the other hand, understand fully that achieving exactly the same results is an exercise in futility. Research in these disciplines incorporates random factors and natural fluctuations and, whilst any experimental design must attempt to eliminate confounding variables and natural variations, there will always be some disparities.

The key to performing a good experiment is to make sure that your results are as reliable as is possible; if anybody repeats the experiment, powerful statistical tests will be able to compare the results and the scientist can make a solid estimate of statistical reliability.


Reliability and validity are often confused, but the terms actually describe two completely different concepts, although they are often closely inter-related. This distinct difference is best summed up with an example:

A researcher devises a new test that measures IQ more quickly than the standard IQ test:

* If the new test delivers scores for a candidate of 87, 65, 143 and 102, then the test is neither reliable nor valid; it is fatally flawed.

* If the test consistently delivers a score of 100 when checked, but the candidate's real IQ is 120, then the test is reliable, but not valid.

* If the researcher's test delivers a consistent score of 118, then that is pretty close, and the test can be considered both reliable and valid.

Reliability is an essential component of validity but, on its own, is not a sufficient measure of validity. A test can be reliable but not valid, whereas a test cannot be valid yet unreliable. Reliability, in simple terms, describes the repeatability and consistency of a test. Validity defines the strength of the final results and whether they can be regarded as accurately describing the real world.
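The IQ example can be put into toy code: the spread across repeated measurements stands in for (un)reliability, and the distance of the average from the true value stands in for (in)validity. The thresholds here are arbitrary choices for the illustration, not standard definitions:

```python
# Toy model of the reliability/validity distinction from the IQ example.
# max_spread and max_error are arbitrary illustrative thresholds.

from statistics import mean, stdev

def is_reliable(scores, max_spread=5.0):
    """Reliable: repeated scores cluster tightly together."""
    return stdev(scores) <= max_spread

def is_valid(scores, true_value, max_error=5.0, max_spread=5.0):
    """Valid: reliable AND the average lands close to the true value.
    A test cannot be valid unless it is also reliable."""
    return (is_reliable(scores, max_spread)
            and abs(mean(scores) - true_value) <= max_error)

erratic    = [87, 65, 143, 102]    # neither reliable nor valid
consistent = [100, 101, 99, 100]   # reliable, but not valid if true IQ is 120
close      = [118, 117, 119, 118]  # reliable and roughly valid for true IQ 120

print(is_reliable(erratic), is_valid(consistent, 120), is_valid(close, 120))
```

The structure of `is_valid` encodes the key sentence above: validity requires reliability, but reliability alone does not guarantee validity.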


For any research program that requires qualitative rating by different researchers, it is important to establish a good level of interrater reliability, also known as interobserver reliability.

This ensures that the generated results meet the accepted criteria defining reliability, by quantitatively defining the degree of agreement between two or more observers.


Interrater reliability is the most easily understood form of reliability, because everybody has encountered it.

For example, any judged sport or competition, such as Olympic ice skating or a dog show, relies upon human observers maintaining a high degree of consistency with one another. If even one of the judges scores erratically, this can jeopardize the entire system and deny a participant their rightful prize.

Outside the world of sport and hobbies, inter-rater reliability has some far more important connotations and can directly influence your life.

Examiners marking school and university exams are assessed on a regular basis, to ensure that they all adhere to the same standards. This is the most important example of interobserver reliability - it would be extremely unfair to fail an exam because the observer was having a bad day.

For most examination boards, appeals are usually rare, showing that the interrater reliability process is fairly robust.


Any qualitative assessment using two or more researchers must establish interrater reliability to ensure that the results generated will be useful.

One good example is Bandura's Bobo Doll experiment, which used a scale to rate the levels of displayed aggression in young children. Apart from extensive pre-testing, the observers constantly compared and calibrated their ratings, adjusting their scales to ensure that they were as similar as possible.
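As a sketch of quantifying agreement between two observers, the following Python computes raw percent agreement and Cohen's kappa, a standard chance-corrected agreement statistic. The categorical ratings are made up (loosely echoing an aggression-rating task):

```python
# Inter-rater reliability for two raters assigning categorical ratings:
# raw percent agreement and Cohen's kappa, which corrects for the
# agreement expected by chance alone. Ratings below are invented.

from collections import Counter

def percent_agreement(a, b):
    """Fraction of cases where the two raters gave the same category."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    n = len(a)
    p_obs = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: probability both raters pick the same category
    # if each rated at random with their own observed category frequencies.
    p_exp = sum((ca[c] / n) * (cb[c] / n) for c in set(a) | set(b))
    return (p_obs - p_exp) / (1 - p_exp)

rater1 = ["agg", "agg", "calm", "calm", "agg", "calm", "agg", "calm"]
rater2 = ["agg", "agg", "calm", "agg",  "agg", "calm", "agg", "calm"]

print(percent_agreement(rater1, rater2))       # 0.875
print(round(cohens_kappa(rater1, rater2), 3))  # 0.75
```

Kappa is lower than raw agreement because some matches would occur by chance even between two careless raters; it is the usual way interrater reliability is reported.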

How do you measure Reliability?

• Failure rate (λ) - number of failures per unit time

• Alternative measures

- Mean time to failure (MTTF)

- Mean time between failures (MTBF)

Reliability Function for Service Life

• Probability density function of failure time is exponential: f(t) = λe^(-λt) for t > 0

• Probability of failure over (0, T): F(T) = 1 - e^(-λT)

• Failure rate = λ

• Reliability function: R(T) = 1 - F(T) = e^(-λT)
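Under this constant-failure-rate (exponential) model, the quantities above are straightforward to compute. The failure rate used here is an arbitrary example value:

```python
# Exponential reliability model with constant failure rate lam:
#   F(T) = 1 - e^(-lam*T)  -- probability of failure by time T
#   R(T) = e^(-lam*T)      -- reliability (probability of surviving to T)
#   MTBF = 1/lam           -- mean time between failures
# lam below is an arbitrary example, not a value from the text.

from math import exp

def failure_prob(lam, T):
    return 1 - exp(-lam * T)

def reliability(lam, T):
    return exp(-lam * T)

lam = 0.02            # failures per hour (example)
mtbf = 1 / lam        # 50 hours
print(mtbf)
print(round(reliability(lam, 10), 3))   # chance of surviving 10 hours
print(round(failure_prob(lam, 10), 3))  # chance of failing within 10 hours
```

By construction R(T) + F(T) = 1 at every T, which is a quick sanity check on any implementation of this model.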

Types of Failures

• Functional failure - failure that occurs at the start of product life due to manufacturing or material defects

• Reliability failure - failure after some period of use

These relate to the “bathtub curve”.

[Figures: failure rate ("bathtub") curve and cumulative failure rate curve; the example shown has an average failure rate of 0.02]

Reliability Management

• Define customer performance requirements

• Determine important economic factors and their relationship with reliability requirements

• Define the environment and conditions of product use

• Select components, designs, and vendors that meet reliability and cost criteria

• Determine reliability requirements for machines and equipment

• Analyze field reliability for improvement