Racial differences in intelligence
A question that requires examining differences between groups in a particular field is probably best served by quantitative analysis that looks at the majority first. The research community appears to agree, as qualitative papers addressing a question such as this were almost impossible to come by. Perhaps in due time, someone may choose to look at the outliers and explore in depth what makes them so different from the “norm”. For this instance, however, it is more useful to look at quantitative papers to see whether the findings presented are worth considering.
The main criteria selected to evaluate the papers were methodology, validity and reliability, and ethics and morality.
Given that the research topic itself is controversial, it is both important and relevant to consider whether the data were manipulated to give results that support the researchers' hypotheses. Hence the methodology of the papers was scrutinized to understand whether the results were truly significant.
Validity and reliability are examined together, as several of the issues raised challenge both the validity and hence the reliability of the research. Both of these criteria are important if one is to make sense of the research findings, draw conclusions or conduct further investigations into the topic. Findings must be reliable before inferences can be valid.
Last but not least, the ethics and morality of conducting research into this topic are examined. Given the controversial nature of the topic statement, one has to wonder whether exploring this particular issue would cause more problems than it would help to solve.
Methodology
In-depth exploration into the methods used to make sense of the raw data makes one wonder if researchers are trying too hard to have their data fit into their hypotheses.
Muller, Stage and Kinzie (2001) used hierarchical linear modelling and longitudinal data to look specifically at precollege science achievement scores and determine whether there was a difference between racial-ethnic groups and between genders. In this study, the researchers adjusted the readings (by the use of equations) such that individuals with fewer observations and less precise estimates were given less weight. While some may feel that this makes the data more equitable, it does raise the question of whether a better method would have been to discard those data completely and use only observations of equal weight for analysis.
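To make the weighting idea concrete, the following is a minimal sketch of precision weighting in general, not a reconstruction of the equations used by Muller et al. (2001). It assumes that the sampling variance of a student's estimate shrinks in proportion to the number of times the student was observed. The function name, the `residual_var` parameter and the two data points are hypothetical and chosen purely for illustration.

```python
def precision_weighted_mean(estimates, n_obs, residual_var=1.0):
    """Average per-student estimates, weighting each by its precision.

    Assumption for illustration: the sampling variance of a student's
    estimate is residual_var / n_obs, so its precision is n_obs / residual_var.
    """
    weights = [n / residual_var for n in n_obs]
    weighted_sum = sum(w * e for w, e in zip(weights, estimates))
    return weighted_sum / sum(weights)


# Hypothetical data: one student tested in all three waves, one tested once.
estimates = [10.0, 20.0]
n_obs = [3, 1]

unweighted = sum(estimates) / len(estimates)
weighted = precision_weighted_mean(estimates, n_obs)
print(unweighted, weighted)
```

With these made-up numbers the unweighted mean is 15.0 and the precision-weighted mean is 12.5: the student with a single observation still contributes, but pulls the average less than the student observed three times. Discarding that student instead would throw the information away entirely, which is the trade-off at issue.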
Another practice noted in the paper was that students with missing values on individual-level variables were given the mean value of the variable across their racial-ethnic subgroup. This does not seem to be an ideal practice, as it forces what could have been potential outliers into an acceptable range. One could also argue that this practice could push the data towards statistical significance, though this cannot be verified as the raw data were not provided for analysis. However, the authors did mention that variables with more than 20% missing were not included in the analysis, so the correction for missing values is technically statistically acceptable.
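The concern about subgroup-mean imputation can be illustrated with a small sketch. The scores below are invented and do not come from the study; `impute_group_mean` is a hypothetical helper, and missing values are represented as `None` by assumption.

```python
from statistics import mean, variance


def impute_group_mean(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    group_mean = mean(observed)
    return [group_mean if v is None else v for v in values]


# Hypothetical scores for one subgroup; two students have missing values.
scores = [2.0, 4.0, None, 6.0, None]

observed_only = [v for v in scores if v is not None]
filled = impute_group_mean(scores)

print(variance(observed_only))  # spread among students actually measured
print(variance(filled))         # spread after imputation
```

In this toy case the sample variance drops from 4.0 to 2.0 once the two missing scores are filled in with the subgroup mean. Every imputed student sits exactly at the centre of the subgroup, so the spread within the group is understated, and understated spread tends to make differences between groups look more precise than the observed data alone would support.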
It is harder to critique the specific methodologies of Rushton and Jensen (2005), as it is a review paper that compiles thirty years of research. As it is a compilation of different studies done over a period of time, the paper is a summary that lacks the detailed methodology which could have been examined in depth.
Validity and Reliability
Suzuki and Aronson (2005) and Sternberg, Grigorenko and Kidd (2005) both point out that race is a social construct created by people based predominantly on physical appearance. Suzuki and Aronson (2005) go on to cite research which attests that the gene controlling skin colour makes up such a small part of the human genome that any variation in this gene is statistically insignificant. This implies that race has no genetic backing, which in turn raises the question of how the grouping can be justified.
On detailed examination, the quantitative papers did not carry out any genetic tests to back up their division of the sample selected into the different races. In fact, Muller et al. (2001) did not even attempt to define the different races in their research. They simply chose to group the students into racial-ethnic groups without explanations or discussion as to how a student was placed into each of the different races.
Rushton and Jensen (2005) gave a far clearer breakdown of the races by defining the various groups based on their ancestral heritage and claimed that there were some genetic markers that supported these racial groupings. However, they too did not undertake any genetic tests on the subjects to justify that their grouping was accurate.
Muller et al. (2001) grouped the subjects into African Americans, Latinos, Asian Americans and Whites. Rushton and Jensen (2005) grouped the races as Whites, Blacks and East Asians. However, one has to question the validity of such grouping. Should all East Asians be grouped as one? For example, in Singapore, the races are classified as Chinese, Indians and Malays. This then raises the question of how far one should break down the individual racial groupings so that the test results would be valid upon analysis. How then should children of mixed parentage and those who are second or third generation (whose ancestors were of mixed parentage) be grouped? How much specificity is required when grouping for the findings to be deemed valid?
Another issue of contention is Rushton and Jensen's (2005) chosen samples. For reaction time tasks (a type of IQ test), they sampled students from Hong Kong and Japan as representatives of East Asians, children in Britain and Ireland for Whites and those in South Africa for Blacks, and concluded that East Asians had the highest IQ. Given the very different environmental factors present in each of these countries, one wonders if the results can be valid despite the authors' claims that the test is culture-free (the items on the test are not affected by cultural influences). This point has also been highlighted by Suzuki and Aronson (2005) in their critique of Rushton and Jensen (2005).
A key difference was noted in the sample sizes of the two quantitative studies. Muller et al. (2001) oversampled the minorities and designed the research such that the maximum number of minority students was retained. They then weighted the data to obtain unbiased population estimates. They claimed that this method ensured that the data obtained would give very small standard errors and a greater likelihood of finding significant differences.
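The logic of oversampling and then weighting can be sketched as follows. This is an illustrative simplification, not the weighting scheme actually used by Muller et al. (2001): the group labels "A" and "B", the population shares and the scores are all invented, and each respondent's weight is assumed to be the group's population share divided by its share of the sample.

```python
def design_weights(population_share, sample_counts):
    """Weight per respondent = population share / sample share of the group."""
    total = sum(sample_counts.values())
    return {
        group: population_share[group] / (count / total)
        for group, count in sample_counts.items()
    }


def weighted_mean(scores_by_group, weights):
    """Population estimate of the mean using one weight per group."""
    numerator = sum(
        weights[g] * s for g, scores in scores_by_group.items() for s in scores
    )
    denominator = sum(weights[g] * len(scores) for g, scores in scores_by_group.items())
    return numerator / denominator


# Hypothetical setup: group B is 20% of the population but half of the sample.
population_share = {"A": 0.8, "B": 0.2}
scores_by_group = {"A": [60.0, 60.0], "B": [40.0, 40.0]}

sample_counts = {g: len(scores) for g, scores in scores_by_group.items()}
weights = design_weights(population_share, sample_counts)

all_scores = [s for scores in scores_by_group.values() for s in scores]
raw_mean = sum(all_scores) / len(all_scores)
estimate = weighted_mean(scores_by_group, weights)
print(raw_mean, estimate)
```

Here the raw sample mean is 50.0 because the oversampled group counts for half of the respondents, while the weighted estimate is approximately 56.0 (0.8 × 60 + 0.2 × 40), which matches the assumed population mix. The weighting corrects the overall estimate, while the extra respondents in the smaller group are what reduce the standard error of that group's own estimate.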
Some of the research highlighted in Rushton and Jensen (2005), however, kept to sample sizes that were representative of the population at large. This led to some rather small sample sizes, especially for the studies on twins and adopted children. In fact, one of the studies looked at only 19 subjects. While the sample size is statistically acceptable and reflects the proportion in the population, a larger sample would have been more convincing evidence that the findings were truly significant.
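A rough calculation shows why a sample of 19 is hard to find convincing. The sketch below assumes a standard deviation of 15 points, the conventional scaling of IQ scores, and uses the normal-approximation multiplier of 1.96 for a 95% interval; neither figure is taken from the study in question, and the comparison sample of 1,900 is hypothetical. For a sample this small a t-based multiplier would be slightly larger, so the interval shown is, if anything, optimistic.

```python
import math


def standard_error(sd, n):
    """Standard error of a sample mean."""
    return sd / math.sqrt(n)


def ci_half_width(sd, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for the mean."""
    return z * standard_error(sd, n)


ASSUMED_SD = 15.0  # assumption: conventional IQ scaling, not a study value

small = ci_half_width(ASSUMED_SD, 19)
large = ci_half_width(ASSUMED_SD, 1900)  # hypothetical larger sample
print(round(small, 2), round(large, 2), round(small / large, 1))
```

Under these assumptions, the mean of a 19-person sample is uncertain by roughly 6.7 points in either direction, and that uncertainty is ten times the uncertainty from a sample one hundred times as large. A group difference of only a few points could therefore not be distinguished from sampling noise with so few subjects.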
Sternberg et al. (2005) point out that intelligence per se is very hard to define and even harder to test. The most controversial of all is the measurement of “g” (general intelligence and related sub-abilities such as verbal and spatial skills). While some researchers argue that IQ tests have been specifically designed to measure “g” only, detractors of IQ tests claim that “g” itself may be heavily influenced by culture and the results obtained may not be a true measure of pure IQ.
In addition, Sternberg et al. (2005) pointed out other flaws in traditional IQ tests. Firstly, different IQ tests measure different abilities to different degrees (differential loading), and thus the results obtained from one IQ test may not be comparable to those of another IQ test. Suzuki and Aronson (2005) have pointed out that any new IQ test introduced to address the problem of differential loading ends up being unreliable, as it has to be correlated with those already on the market.
This is especially problematic when one reads Rushton and Jensen (2005), which is a compilation of thirty years of research. Not only have the researchers compiled and compared a variety of measures ranging from university grades, reaction time studies and academic achievement scores to standardized IQ tests, they also did not furnish details about the tests or their items. As a result, one has to question whether the summarized findings, as well as the conclusions drawn and presented within the paper, are valid.
Secondly, as Sternberg et al. (2005) pointed out, some IQ test items measure achievements at various levels of competency. The items are open to interpretation based on level of education, environment, and socio-economic and even cultural factors, and thus do not measure IQ independently (ceteris paribus). Proponents of IQ tests seem to forget that just because a respectable village elder cannot understand and answer IQ test items does not imply he has low IQ! This problem is clearly evident in Rushton and Jensen (2005). The authors seem to have ignored interactions between genes, anatomy, culture and environment, all of which have an effect on IQ that cannot be separated.
Suzuki and Aronson (2005) have also criticized Rushton and Jensen (2005) by questioning whether tests of reaction times (which Rushton and Jensen deem culture-free) and anatomical studies of brain size can be related to IQ. According to them, the researchers are using physiological studies to understand psychology (which essentially would be affected by culture). In short, one group believes that the items are reliable while another contests this opinion.
Muller et al. (2001) chose to look at one particular form of intelligence by comparing achievements on science tests administered to public secondary school students at 8th, 10th and 12th grades. While they claim that the questions covered the fields of life science, earth science and physical science, they also admit that they have no information regarding the actual test items. This is a little worrying, as one cannot analyze whether there was any bias in the way the test items were worded or whether they were open to interpretation based on the students' cultures. Though such bias is unlikely because the test items are subject specific, it is still something that could have been explored further. This also makes one question the reliability of the test items.
To further confound matters, Muller et al. (2001) used a student and parent questionnaire to determine and calculate the subjects' socio-economic status. Once again, details of the items on the questionnaire were not revealed. As before, one cannot analyze whether there was any bias in the way the items were worded or whether they were open to interpretation based on culture. Given that the key argument is whether socio-economic status has an effect on intelligence, this causes one to question the reliability of the questionnaire and subsequently the validity of the results that have been reported.
Ethics and morality
One has to agree that research into racial differences in IQ is intriguing, but it is perhaps more important to question whether such research and analysis are really necessary. In what ways would this information be of help, either psychologically or pedagogically?
As pointed out by Suzuki and Aronson (2005), stereotype threat may have serious implications for data and results. According to the authors, negative stereotypes that exist in reference to one's group lead to anxiety regarding one's performance in a particular domain. Hence research that indicates that a particular group may have lower IQ scores could perpetuate similar data in other studies simply due to this stereotype threat. What is seen, then, is a vicious cycle where poor scores in tests reinforce a negative stereotype that is reconfirmed in subsequent research.
Socially, this information could be a double-edged sword. It could help promote understanding when students of a particular group fail to perform. However, it could also cause problems when students get stereotyped based on their groups, or when groups use this research as an excuse not to perform better or to demand special treatment because of what they may perceive as essentially a genetic flaw.
In summary, it is perhaps best to take research for this particular topic with a generous pinch of salt. A holistic view would suggest that IQ scores are merely numbers and not a prediction of one's ability to achieve in life if one puts one's mind to it.
References
Muller, P.A., Stage, F.K., & Kinzie, J. (2001). Science Achievement Growth Trajectories: Understanding Factors Related to Gender and Racial-Ethnic Differences in Precollege Science Achievement. American Educational Research Journal, 38(4), 981-1012.
Rushton, J.P. & Jensen, A.R. (2005). Thirty Years of Research on Race Differences in Cognitive Ability. Psychology, Public Policy and Law, 11(2), 235-294.
Sternberg, R.J., Grigorenko, E.L., & Kidd, K.K. (2005). Intelligence, Race and Genetics. American Psychologist, 60(1), 46-59.
Suzuki, L., & Aronson, J.M. (2005). The Cultural Malleability of Intelligence and its Impact on the Racial/Ethnic Hierarchy. Psychology, Public Policy and Law, 11(2), 320-327.