Before we settle on a data collection strategy, we need to decide whether we can collect data from the entire population we want to study, for example every citizen who lives in a village or every student who studies in a school. Can we observe every one of them? If we can, we will obtain accurate data for our report and have little chance of making mistakes. Complete coverage of the population is called a census.
Often we are not able to collect data from every citizen or every student, because doing so is costly and takes a long time. This leads us to take a sample, which is a subset of the entire population. The sample lets us make inferences about the population, so we can estimate, for example, what the population likes and dislikes.
Samples are used almost everywhere. For example, when we have a blood test to check that we are healthy, the clinic takes a sample of our blood, not all of it. The laboratory then analyses the sample, and the result is taken as an accurate indication of our health. Sampling does not apply only to large quantitative studies; it also matters in small ones. Even when we conduct a highly qualitative, one-week field visit to a program that is spread over a large area, we still need to decide carefully where in that area to do the research.
According to the International Program for Development Evaluation Training (2007), another form of sampling is cluster sampling. Clusters are naturally occurring aggregates of the units that are to be sampled. For example, homes are clusters of people and towns are clusters of homes. A cluster sample is used when there is a complete list of everyone in the population of interest but it would take too much time and money to send data collectors out to a simple random sample. In a cluster sample, the clusters are randomly sampled and data are collected from every target unit within the selected clusters. If an evaluation needs to collect data on the weight and height of children aged about 5 to 7, the evaluators might randomly sample 15 of the 50 villages taking part in the program and then collect data from all such children in the 15 villages they chose.
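The village example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the village names, the child IDs, the number of children per village, and the helper `one_stage_cluster_sample` are all hypothetical and do not come from the cited source.

```python
import random


def one_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick n_clusters clusters and return every unit inside them."""
    rng = random.Random(seed)
    # Stage 1 (the only random stage): draw whole clusters without replacement.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Every unit in a chosen cluster is measured; no sampling inside clusters.
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units


# Hypothetical data: 50 villages, each with a made-up list of child IDs.
villages = {
    f"village_{v:02d}": [f"child_{v:02d}_{c}" for c in range(10 + v % 5)]
    for v in range(1, 51)
}

chosen_villages, children = one_stage_cluster_sample(villages, 15, seed=1)
print(len(chosen_villages), "villages selected;", len(children), "children measured")
```

The number of children measured is not fixed in advance, because it depends on how many children live in whichever villages happen to be drawn.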
As another example, suppose we want to interview about 200 diabetes patients but do not have a complete list of them. If the area has 30 clinics and each clinic serves 50 diabetes patients on average, we can randomly choose 4 of the 30 clinics and study all of the diabetes patients in those 4 clinics, which gives about 200 patients. Cluster samples are likely to give less accurate estimates of the population than simple random samples or stratified random samples. The selected clinics may serve diabetes patients whose backgrounds differ from those of patients elsewhere, and if that happens the results will not reflect the entire population of diabetes patients.
Next is multi-stage random sampling, which combines two or more forms of random sampling: a second form of random sampling is applied to the result of the first. Normally it begins with random cluster sampling and continues with simple random sampling or stratified random sampling. For example, in the diabetes study above, a multi-stage random sample might draw a cluster sample of 8 clinics instead of 4 and then draw a simple random sample of 25 diabetes patients from each clinic. This again gives a sample of 200 patients, but they come from a larger number of clinics. It costs more than collecting data from four clinics, but it provides less biased estimates of the population parameters.
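A minimal sketch of the two-stage version of the clinic example follows. The clinic and patient identifiers are invented, every clinic is assumed to serve exactly 50 patients (the text only gives 50 as an average), and `two_stage_sample` is a hypothetical helper written for this illustration.

```python
import random


def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: draw clusters at random. Stage 2: draw units within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        members = clusters[name]
        # Never ask for more units than the cluster actually contains.
        k = min(n_per_cluster, len(members))
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical frame: 30 clinics, each assumed to serve exactly 50 patients.
clinics = {
    f"clinic_{i:02d}": [f"patient_{i:02d}_{j:02d}" for j in range(50)]
    for i in range(1, 31)
}

sample = two_stage_sample(clinics, n_clusters=8, n_per_cluster=25, seed=7)
total = sum(len(patients) for patients in sample.values())
print(len(sample), "clinics,", total, "patients")  # 8 clinics, 200 patients
```

The total is the same 200 patients as in the one-stage design with 4 clinics, but the interviews are spread over twice as many clinics, which is where the extra cost comes from.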
William M.K. Trochim (2006) illustrates this with a figure showing a map of the counties in New York State. Suppose the researchers must visit each town personally in order to study the town governments. If they take a simple random sample across such a large state, they will have to cover the entire state geographically. Instead, they can decide to do a cluster sampling of five counties (marked in red in the figure). Once the five counties are selected, they visit every town government in those five areas. Cluster or area sampling is useful in this kind of situation. The researchers probably do not need this approach if they are conducting a mail or telephone survey, because the location of the respondents then matters much less.
A multi-stage sample can also combine non-random and random sampling. For example, the clinics could be sampled non-randomly and the diabetes patients within the selected clinics could then be sampled randomly. In the cluster version of the diabetes example, the unit that is randomly sampled is the clinic, and data are then collected from all the diabetes patients at the selected clinics. The weakness of cluster sampling and multi-stage random sampling is that the sample may not represent the population accurately. For example, if the researchers want to interview 200 diabetes patients and, because resources are limited, take all 200 from four randomly chosen clinics, those clinics may happen to serve patients from well-off family backgrounds, and the sample will then not be representative of all diabetes patients.
Time is also important in research. When researchers want to interview people who live on small, remote farms, sampling individuals would cost them a great deal of time, because the villagers live far apart and the researchers would have to travel from farm to farm to collect the data. With cluster sampling, they might instead sample 5 of the 20 farms and interview all of the villagers on each farm they selected.
Cluster sampling has both advantages and disadvantages. Its main disadvantage is that, if the sample size is held constant across sampling methods, it generally provides less accurate information than simple random sampling or stratified sampling. Its advantage is that the cost per sample point is lower than for other sampling methods, so for the same budget the researcher can use a larger sample with cluster sampling than with the other methods. Cluster sampling is the best choice when the increased sample size is enough to offset the loss of accuracy.
Cluster sampling should be used only when it is economically justified, that is, when the reduced costs can be used to make up for the less accurate information. One such situation is when listing the whole population is hard, costly or impossible. For example, it may be impossible to list all the customers of a chain of hardware stores, but it is possible to randomly select a subset of stores (this step is stage 1 of cluster sampling) and then interview a random sample of the customers who visit those stores (this step is stage 2 of cluster sampling).
Another situation is when the population is concentrated in natural clusters such as schools, hospitals, city blocks and so on. For example, to interview HIV patients, the researchers can randomly select a sample of hospitals and then interview all the HIV patients in the selected hospitals. With cluster sampling, the interviewer can conduct many interviews in a single day at each selected hospital. With simple random sampling, the interviewer might spend a whole day travelling to conduct a single interview at one hospital. Even in these situations, it is often unclear which sampling method should be used. The difference between strata and clusters is that all strata are represented in the sample, but only a subset of clusters is. With stratified sampling, the best survey results occur when elements within each stratum are internally homogeneous; with cluster sampling, the best results occur when elements within each cluster are internally heterogeneous.
The sample designer cannot control the intra-cluster correlations, and for most survey variables there has been little or no empirical research attempting to estimate their values. In theory the intra-cluster correlation can lie between -1 and +1, although it is hard to conceive of many household variables for which it would be negative. The only way to keep the design effect (deff) to a minimum is therefore to urge that cluster sizes be as small as the budget allows.
According to Table 1, cluster sizes above 20 give unacceptably large deffs unless the intra-cluster correlation is quite small. In the table, ñ refers to the number of target-population members in the cluster, not the number of households. The two are the same only when there is about one member of the target population per household. For example, if the target group is women aged 14-49, there is about one such woman per household, so a cluster of b households will contain roughly the same number of women aged 14-49. In this case ñ and b are roughly equal for that target group, and Table 1 applies as it stands.
The number of households and the size of the target population are not always the same. Suppose the target population is all persons, as in a health survey that estimates acute and chronic conditions. If the survey uses clusters of 10 households, ñ will be 10 times the average household size. For example, if the average household size is 5.0, ñ will be 50. This value of ñ is then looked up in Table 1 to assess its potential deff. Table 1 reveals that the deff is very large unless the intra-cluster correlation is about 0.02. A cluster sample that uses as few as 10 households per cluster could therefore give unreliable results for a health survey that estimates acute and chronic conditions.
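Table 1 is not reproduced in this essay, so the calculation can only be sketched. The snippet below assumes the common approximation deff = 1 + ρ(ñ - 1), where ρ is the intra-cluster correlation; this formula and the ρ values tried are assumptions made for illustration and may not match the table exactly.

```python
def design_effect(n_bar, rho):
    """Approximate design effect, assuming deff = 1 + rho * (n_bar - 1).

    n_bar: average number of target-population members per cluster.
    rho:   intra-cluster correlation (assumed values, not taken from Table 1).
    """
    return 1.0 + rho * (n_bar - 1)


households_per_cluster = 10
average_household_size = 5.0
# Target population is all persons, so n_bar counts persons, not households.
n_bar = households_per_cluster * average_household_size

for rho in (0.02, 0.05, 0.10, 0.20):
    print(f"rho={rho:.2f}  n_bar={n_bar:.0f}  deff={design_effect(n_bar, rho):.2f}")
```

Under this assumed formula the deff for ñ = 50 stays close to 2 when ρ is 0.02, and grows quickly as ρ increases, which is consistent with the reading of Table 1 given above.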
This example illustrates why it is important to take cluster size into account when designing a household survey. In the sample design, the stated cluster size generally refers to the number of households.
Beyond its effect on sampling precision, the size of the cluster also matters in relation to the overall sample size, because it determines the number of different locations that must be visited in the survey. For example, for a sample of 10000 households, clusters of 10 households require 1000 clusters, while clusters of 20 households require only 500.
Sampling in stages may involve the use of dummy stages or a two-stage design. The ideal household survey sample plan would select households at random within appropriately identified strata that together cover the entire population. Such a stratified random sample would provide maximum precision, but it is far too expensive to undertake in practice.
According to Paul D. Bryan and Thomas M. Conte (Center for Efficient, Secure and Reliable Computing, 2007), processor design today is driven by simulation. To draw inferences from a simulation, long instruction traces have to be simulated, and the simulation becomes slower as more cycle-accurate features are modelled. A program that executes completely on hardware in minutes can take weeks or months to simulate. Because simulation time is a limiting factor for new processor technology, many researchers have proposed ways to reduce it.
One simple approach is to execute an arbitrary instruction stream after the initialization code. This effectively reduces the simulation time, but it can lead to misleading or inaccurate results. Researchers therefore use sampling techniques to reduce the number of instructions that the simulation requires. Many techniques are used in simulation, such as cluster sampling, set sampling and stratified sampling. The methods differ in which elements of the overall population are sampled. In cluster sampling, a group of contiguous elements from the population is selected as a cluster, and the data collected from the cluster are measured as an individual sampling unit.
Luo Yong, Guo Xiuchun, Li Hui and Zhu Xiaohai (International Conference on Intelligent Computation Technology and Automation, 2008) carried out a traffic safety study based on cluster analysis. In the modern economy, transportation contributes a great deal to human society, but it has also brought more serious traffic accidents. In China, over the 15 years from 1978 to 1993, 3100000 traffic accidents occurred, 632000 people died and 1987000 people were injured. The authors use the cluster method to analyse such data. Cluster analysis is applied both at home and abroad. It starts with N prototype samples, each treated at first as an independent group. The distances between the samples are computed, and those separated by the shortest distance are merged into a new group. The distances between the new groups are then computed, and the process repeats until the grouping meets the requirements of the research. Cluster analysis includes the minimum distance method, the maximum distance method and the middle distance method. When using the clustering method, researchers should combine it with qualitative analysis and with traditional clustering, and when there are many constituents, programming techniques are the best way to carry out the analysis.
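The merging procedure described above can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the function `agglomerate`, the Euclidean distance, and the sample points are assumptions, and only the minimum and maximum distance methods are shown (the middle distance method is left out).

```python
import math
from itertools import combinations


def agglomerate(points, n_groups, linkage="min"):
    """Merge the two closest groups repeatedly until n_groups remain.

    linkage="min": distance between groups is their closest pair of points.
    linkage="max": distance between groups is their farthest pair of points.
    """
    pick = min if linkage == "min" else max
    # Start with every sample as its own independent group.
    groups = [[p] for p in points]

    def group_distance(a, b):
        return pick(math.dist(p, q) for p in a for q in b)

    while len(groups) > n_groups:
        # Find the pair of groups (i < j) separated by the shortest distance.
        i, j = min(
            combinations(range(len(groups)), 2),
            key=lambda ij: group_distance(groups[ij[0]], groups[ij[1]]),
        )
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return groups


# Made-up 2-D samples forming two well-separated clumps.
samples = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for method in ("min", "max"):
    result = agglomerate(samples, n_groups=2, linkage=method)
    print(method, sorted(sorted(g) for g in result))
```

On these samples both methods recover the same two groups; on less clearly separated data the choice of distance method can change which groups are merged.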
In conclusion, cluster sampling is used when the research studies a population that is spread across a wide area. With simple random sampling, it may be difficult to reach the selected sample. In cluster sampling, the population is divided into a set of different areas, and the areas to be visited are then selected at random. If the researchers cannot reach all the subjects in the selected areas, they should select a significant random sample in each cluster and use the same selection rules for every cluster.
For example, in a study of the opinions of homeless people across a country, it is better to select a number of towns and interview a significant number of homeless people in each one than to study a few homeless people in every town.
One of the biggest problems in sampling is reaching the targets, and a common experience is finding them spread out over a large geographic area. When we select a cluster, we may have trouble reaching everyone in it, so we should select a significant and similar sample in each cluster we choose. For example, if we want to interview people in coffee shops, we should do this at the same time on the same weekday in each selected cluster. Cluster sampling can also be combined with other kinds of sampling, such as proportionate quota sampling. The risk with cluster sampling is that some geographic areas can have different characteristics, such as a political bias.