The purpose of hypothesis testing is to determine whether there is enough statistical evidence in favor of a certain belief about a parameter, and so to permit generalizations from a sample to the population from which it came. The word hypothesis is just a slightly technical or mathematical term for "sentence", "claim" or "statement". In statistics, a hypothesis is always a statement about the value of one or more population parameters.
Hypothesis: A statement that something is true concerning the population.
There are two hypotheses about one or more population parameters:
H0 - the null hypothesis
HA - the alternative hypothesis
Hypothesis testing is the operation of deciding whether data obtained from a random sample support or fail to support a particular hypothesis.
The statistical hypothesis test is a five-step procedure.
The first two steps of the hypothesis test procedure are to formulate two hypotheses.
Step 1: Formulating the null hypothesis
Null hypothesis, H0: The hypothesis upon which we wish to focus our attention. Generally this is a statement that a population parameter has a specified value. It always contains the "=" sign.
Step 2: Formulating the alternative hypothesis
Alternative hypothesis, HA: A statement about the same population parameter that is used in the null hypothesis. Generally this is a statement that the population parameter has a value different from the value given in the null hypothesis. It never contains the "=" sign.
Two-tailed test: H0: µ = µ0, HA: µ ≠ µ0
One-tailed test (right-tailed): H0: µ ≤ µ0, HA: µ > µ0
One-tailed test (left-tailed): H0: µ ≥ µ0, HA: µ < µ0
At the conclusion of the hypothesis test, we will reach one of two possible decisions. We will decide in agreement with the null hypothesis and say that we fail to reject H0, or we will decide in opposition to the null hypothesis and say that we reject H0.
There are four possible outcomes that could be reached as a result of the null hypothesis being either true or false and the decision being either "fail to reject" or "reject".
Decision              H0 is true                  H0 is false
Fail to reject H0     Correct decision (1 − α)    Type II error (β)
Reject H0             Type I error (α)            Correct decision (1 − β)
α = Probability of committing a type I error
β = Probability of committing a type II error
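The two error probabilities can be made concrete with a small calculation. The sketch below is only an illustration: it assumes a right-tailed test on a mean with a known standard deviation and a normally distributed test statistic, and the numbers (µ0 = 100, true mean 105, σ = 10, n = 25) as well as the function name type_ii_error are hypothetical and are not taken from the examples in this essay.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu0, mu_true, sigma, n, alpha):
    """Beta for a right-tailed test of H0: mu <= mu0 when the true mean is mu_true.

    Assumes a known standard deviation and a standard normal test statistic.
    """
    std_normal = NormalDist()
    # Boundary of the critical region for the chosen level of significance.
    critical = std_normal.inv_cdf(1 - alpha)
    # How far the true mean shifts the test statistic away from zero.
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    # H0 is (wrongly) not rejected when the statistic stays below the boundary.
    return std_normal.cdf(critical - shift)


# Hypothetical numbers chosen only for illustration.
alpha = 0.05
beta = type_ii_error(mu0=100, mu_true=105, sigma=10, n=25, alpha=alpha)
print(f"alpha = {alpha:.2f}, beta = {beta:.3f}, 1 - beta = {1 - beta:.3f}")
```

With these assumed numbers β comes out at roughly 0.20. Choosing a smaller α moves the boundary of the critical region outward and therefore makes β larger, which is why the two errors have to be balanced against each other.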
Step 3: Determining the test criteria
Test criteria: These consist of
determining a test statistic,
specifying a level of significance α,
and determining the critical region.
Test statistic: A random variable whose value will be used to make the decision "fail to reject H0" or "reject H0".
Critical Region: The set of values for the test statistic that will cause us to reject the null hypothesis.
Critical value: The boundary value of the critical region, that is, the value of the test statistic at which the critical region begins.
Level of significance: The probability of committing a type I error, α.
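As a small illustration of Step 3, the critical values 1.645 and ±1.96 used in the worked examples of this essay can be recovered from the standard normal distribution. The sketch below uses only Python's standard library; the function name critical_values and the tail labels "right", "left" and "two" are chosen for this illustration and assume that the test statistic follows the standard normal distribution.

```python
from statistics import NormalDist


def critical_values(alpha, tail):
    """Return the critical value(s) of a standard normal test statistic.

    tail is "right", "left" or "two" (labels chosen for this sketch).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)
    if tail == "two":
        # alpha is split equally between the two tails.
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    raise ValueError("tail must be 'right', 'left' or 'two'")


print(critical_values(0.05, "right"))
print(critical_values(0.05, "two"))
```

For a one-tailed test the whole of α lies in one tail, so the critical region begins closer to zero than in a two-tailed test with the same α.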
Step 4: Obtaining the value of the test statistic.
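The test statistic is some statistic that may be computed from the data of the sample. The test statistic serves as a decision maker, since the decision to reject or not to reject the null hypothesis depends on the magnitude of the test statistic. An example of a test statistic is the quantity t = (x̄ − µ0) / (σ/√n), where x̄ is the sample mean, µ0 is the value of the mean stated in the null hypothesis, σ is the standard deviation and n is the sample size.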
Step 5: Making a decision and interpreting it
Decision Rule: If the test statistic falls within the critical region, we will reject H0. If the test statistic does not fall in the critical region, we will fail to reject H0.
The set of values that are not in the critical region is called the acceptance region.
T-test
Testing hypothesis about a population mean: One-tailed test
Right tailed test:
*We assume that the road from Gollamari to Dak-Banglo is very busy. Different types of vehicles run on it throughout the day, and the Easy-Bike is the major one. A random sample of 9 hours in a day shows that, on average, 500 Easy-Bikes run per hour. We test the hypothesis that the number of Easy-Bikes running per hour is more than 480, with α = 0.05 and σ = 40.
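H0: µ ≤ 480
HA: µ > 480
α = 0.05
Critical value: 1.645
t = (x̄ − µ0) / (σ/√n) = (500 − 480) / (40/√9) = 1.50
The test statistic, 1.50, lies to the left of the critical value 1.645, that is, outside the critical region.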
Decision: As the test statistic does not fall in the critical region, we do not reject the null hypothesis.
Conclusion: The sample does not provide sufficient evidence that the average number of Easy-Bikes per hour is more than 480.
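*Again, for the previous example, we assume that a random sample of 9 hours shows that, on average, 480 Easy-Bikes run per hour. We test the hypothesis that the average number of Easy-Bikes per hour is 500, with σ = 40.
H0: µ = 500
HA: µ ≠ 500
As it is a two-tailed test, α/2 = 0.025 in each tail.
Critical value: ±1.96
t = (x̄ − µ0) / (σ/√n) = (480 − 500) / (40/√9) = −1.50
The critical region consists of the two tails beyond −1.96 and +1.96, each with area α/2 = 0.025; the test statistic, −1.50, lies between them.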
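Decision: As the test statistic does not fall in the critical region, we do not reject the null hypothesis.
Conclusion: The sample is consistent with an average of 500 Easy-Bikes per hour.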
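The two Easy-Bike examples can be reproduced with a few lines of code. This is a minimal sketch rather than a definitive implementation: the function name z_test_mean and its tail labels are chosen for illustration, and because σ is treated as known, the critical values are taken from the standard normal distribution, which gives the 1.645 and 1.96 used above.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(sample_mean, mu0, sigma, n, alpha, tail):
    """Test a hypothesis about a population mean when sigma is known.

    Returns (statistic, critical, reject). For a two-tailed test, critical
    is the positive boundary and the region is |statistic| > critical.
    """
    std_normal = NormalDist()
    statistic = (sample_mean - mu0) / (sigma / sqrt(n))
    if tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = statistic > critical
    elif tail == "left":
        critical = std_normal.inv_cdf(alpha)
        reject = statistic < critical
    elif tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(statistic) > critical
    else:
        raise ValueError("tail must be 'right', 'left' or 'two'")
    return statistic, critical, reject


# First example: H0: mu <= 480 against HA: mu > 480, sample mean 500.
right = z_test_mean(500, 480, sigma=40, n=9, alpha=0.05, tail="right")
# Second example: H0: mu = 500 against HA: mu != 500, sample mean 480.
two = z_test_mean(480, 500, sigma=40, n=9, alpha=0.05, tail="two")

for label, (statistic, critical, reject) in (("right-tailed", right), ("two-tailed", two)):
    decision = "reject H0" if reject else "fail to reject H0"
    print(f"{label}: statistic = {statistic:.2f}, critical = {critical:.3f}, {decision}")
```

In both cases the statistic stays outside the critical region, so the decision is "fail to reject H0", in agreement with the hand calculation.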
Testing hypothesis about a population proportion
*A traffic policeman at the Mailapota more claims that 60% of Easy-Bike drivers do not pay heed to the traffic signal. A random sample of 50 Easy-Bike drivers shows that 35 of them do not pay heed to the signal. Are these sample results consistent with the claim of the traffic police? α = 0.05.
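Population proportion, π = 60/100 = 0.60
Sample proportion, P = 35/50 = 0.70
H0: π = 0.60
HA: π ≠ 0.60
Critical value: ±1.96
Z = (P − π) / √(π(1 − π)/n) = (0.70 − 0.60) / √(0.60 × 0.40/50) ≈ 1.44
The critical region consists of the two tails beyond −1.96 and +1.96, each with area α/2 = 0.025; the test statistic lies between them.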
Decision: As the test statistic does not fall in the critical region, we do not reject the null hypothesis.
Conclusion: The sample results are consistent with the claim of the traffic police.
So the concerned authority should take the necessary steps to enforce the traffic rules.
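The proportion test can be checked in the same way. The sketch below assumes the usual large-sample normal approximation, with the standard error computed from the hypothesized proportion π; the function name proportion_z_test is illustrative only. Without intermediate rounding the statistic comes out at about 1.44, which agrees with the hand calculation up to rounding.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(count, n, pi0, alpha=0.05):
    """Two-tailed test of H0: pi = pi0 using the normal approximation.

    Returns (statistic, critical, p_value, reject).
    """
    std_normal = NormalDist()
    sample_proportion = count / n
    # Standard error under H0, so it uses pi0 rather than the sample proportion.
    standard_error = sqrt(pi0 * (1 - pi0) / n)
    statistic = (sample_proportion - pi0) / standard_error
    critical = std_normal.inv_cdf(1 - alpha / 2)
    # Two-tailed P-value: both tails beyond |statistic|.
    p_value = 2 * (1 - std_normal.cdf(abs(statistic)))
    return statistic, critical, p_value, abs(statistic) > critical


# 35 of 50 sampled drivers ignore the signal; the claimed proportion is 0.60.
statistic, critical, p_value, reject = proportion_z_test(35, 50, 0.60)
print(f"Z = {statistic:.2f}, critical = ±{critical:.2f}, P-value = {p_value:.3f}")
print("reject H0" if reject else "fail to reject H0")
```

The P-value is well above 0.05, which is another way of seeing that the sample does not contradict the claimed 60%.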
The Chi-square Test
The Chi-square test is a statistical method used to determine goodness of fit.
Goodness of fit refers to how close the observed data are to those predicted from a hypothesis.
The chi-square test does not prove that a hypothesis is correct. It evaluates to what extent the data and the hypothesis have a good fit.
The general formula is
χ² = ∑ (O − E)² / E
O = observed data in each category
E = expected data in each category based on the experimenter's hypothesis
∑ = sum of the calculations for each category
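The chi-square statistic is the sum, over all categories, of the squared difference between the observed and the expected value divided by the expected value. A minimal sketch of that calculation follows; the function name chi_square_statistic and the three-category counts are hypothetical and serve only to show the arithmetic.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 observations expected to split evenly over 3 categories.
observed = [18, 22, 20]
expected = [20, 20, 20]
chi_square = chi_square_statistic(observed, expected)
print(f"chi-square = {chi_square:.2f}")
```

Each category contributes more to the total the further its observed count lies from the expected count, so a perfect fit gives a chi-square value of zero.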
Applying the chi-square test
Step 1: Stating a null hypothesis
Step 2: Calculating the expected values for each cell
Step 3: Applying the chi-square formula
Step 4: Interpreting the chi-square value
The calculated chi-square value can be used to obtain probabilities, or P-values, from a chi-square table.
These probabilities allow us to determine the likelihood that the observed deviations are due to random chance alone.
Low chi-square values indicate a high probability that the observed deviations could be due to random chance alone.
High chi-square values indicate a low probability that the observed deviations are due to random chance alone.
If the chi-square value results in a probability that is less than 0.05 (i.e. less than 5%), it is considered statistically significant, and the null hypothesis is rejected.
*Before fines were introduced to enforce the traffic rules at the Mailapota more on the road from Gollamari to Dak-Banglo, 12 Easy-Bike drivers were observed: 5 followed the signals and the other 7 did not. After fines were introduced, 12 drivers were observed again: 8 followed the traffic signals and 4 did not. We test the hypothesis that the fine was effective.
Let us take the null hypothesis that the fine was not effective.
Traffic rules followers: table of counts, actual (expected), with two rows and two columns
                    Follow signals   Do not follow   Total
Before the fine     5 (6.5)          7 (5.5)         12
After the fine      8 (6.5)          4 (5.5)         12
Total               13               11              24
Expected value in a cell = (row total × column total) / grand total
Degrees of freedom = (2 − 1)(2 − 1) = 1
χ² = 0.346 + 0.409 + 0.346 + 0.409 = 1.51
So the P-value is between 0.20 and 0.25 (from the P-value table):
0.20 < P-value < 0.25
So the null hypothesis is not rejected.
So the sample does not show that the fine was effective.
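The expected counts, the chi-square value and the P-value for the fine example can be recomputed directly from the observed counts (5 and 7 before the fine, 8 and 4 after it). The sketch below is an illustration under stated assumptions: the function name chi_square_2x2 is made up for this essay, no continuity correction is applied, and the P-value formula used is the one valid for exactly 1 degree of freedom.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Chi-square test of independence for a 2 x 2 table of counts.

    Returns (expected, statistic, p_value); no continuity correction.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count in a cell = row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return expected, statistic, p_value


# Rows: before and after the fine. Columns: follow, do not follow.
observed = [[5, 7], [8, 4]]
expected, statistic, p_value = chi_square_2x2(observed)
print("expected counts:", expected)
print(f"chi-square = {statistic:.2f}, P-value = {p_value:.3f}")
```

The computed P-value falls between 0.20 and 0.25, as read from the table, so the null hypothesis is not rejected at the 5% level.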
Fisher's exact test
Fisher's exact test is a statistical significance test used in the analysis of contingency tables.
Fisher's exact test gives the probability of obtaining any particular set of cell values in the table. The hypergeometric distribution is used for Fisher's exact test.
Some notable criteria of Fisher's exact test:
Fisher's exact test can be used when one or more of the expected counts in a contingency table are small (< 2).
Fisher's exact test is based on exact probabilities from a specific distribution (the hypergeometric distribution).
There is really no lower bound on the amount of data that is needed for Fisher's exact test. We can use Fisher's exact test when one of the cells in the table has a zero in it.
Fisher's exact test is also very useful for highly imbalanced tables. If one or two of the cells in a two-by-two table have numbers in the thousands and one or two of the other cells have numbers less than 5, we can still use Fisher's exact test.
Fisher's exact test has no formal test statistic and no critical value; it only gives us a P-value.
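Traffic rules followers (Easy-Bike drivers)
                      Follow rules   Do not follow   Total
Literate drivers      9              3               12
Illiterate drivers    1              11              12
Total                 10             14              24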
*We hypothesize that the proportion of rule-following individuals is higher among the literate drivers, and we want to test whether the difference of proportions that we observe is significant. The question we ask about these data is: knowing that 10 of these 24 drivers are rule followers, what is the probability that these 10 would be so unevenly distributed between literate and illiterate drivers? If we were to choose 10 of the drivers at random, what is the probability that 9 of them would be among the 12 literate drivers and only 1 among the 12 illiterate drivers?
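Fisher's exact test uses the hypergeometric distribution to calculate the "exact" probability of obtaining such a set of values. Below, C(n, k) is the number of ways of choosing k items from n.
As extreme as we observed (9 literate rule followers): C(12, 9) × C(12, 1) / C(24, 10) = 0.00134
More extreme (10 literate rule followers): C(12, 10) × C(12, 0) / C(24, 10) = 0.00003
Here the P-value is the probability of observing data as extreme or more extreme if the null hypothesis is true. So the P-value in this problem is 0.00134 + 0.00003 ≈ 0.0014.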
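The two hypergeometric probabilities can be verified with exact binomial coefficients. The sketch below uses the counts of the example (12 literate and 12 illiterate drivers, 10 rule followers in total); the function name hypergeom_prob is illustrative, and the P-value computed is the one-sided one, matching the hypothesis that literate drivers follow the rules more often.

```python
from math import comb


def hypergeom_prob(k, group_size, other_size, drawn):
    """Probability that exactly k of `drawn` randomly chosen items come from
    a group of `group_size`, the rest coming from `other_size` others."""
    total = group_size + other_size
    return comb(group_size, k) * comb(other_size, drawn - k) / comb(total, drawn)


# 10 rule followers drawn from 12 literate and 12 illiterate drivers.
p_observed = hypergeom_prob(9, 12, 12, 10)       # 9 literate, 1 illiterate
p_more_extreme = hypergeom_prob(10, 12, 12, 10)  # 10 literate, 0 illiterate
p_value = p_observed + p_more_extreme            # one-sided P-value

print(f"as extreme as observed: {p_observed:.5f}")
print(f"more extreme:           {p_more_extreme:.5f}")
print(f"one-sided P-value:      {p_value:.5f}")
```

Because only two tables are at least as extreme as the observed one in the hypothesized direction, the P-value is simply the sum of these two probabilities.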
Implications of hypothesis testing in urban planning:
Of all the statistical tools, the hypothesis test is one that is commonly used in urban planning. In the examples, we have shown some of the uses of hypothesis testing in transportation.
T-test: The road from Gollamari to Dak-Banglo is not very spacious, and traffic jams are common. The city planners of KCC (Khulna City Corporation) are responsible for addressing this problem. By random sampling they found that Easy-Bikes are the main cause. Using the t-test, the city planners found that the data are consistent with an average of 500 Easy-Bikes running per hour throughout the day.
Z-test: To identify the cause of the traffic problem, the planners talked with a traffic policeman at the Mailapota more. He claims that 60% of Easy-Bike drivers do not follow the traffic rules. Using the z-test, the planners found that the sample is consistent with his claim.
Chi-square test: In order to enforce the traffic rules, the authority suggested that charging a fine would be a possible solution. The chi-square test did not show that this initiative (the fine) is effective.
Fisher's exact test: To identify why Easy-Bike drivers are not following the rules, the planners talked with 24 randomly chosen drivers and found that literate drivers are much more conscious of traffic rules than illiterate drivers. Fisher's exact test supports this finding.