Children are exposed to thousands of words at a young age, yet they all seem to acquire similar early vocabularies. This curiosity inspires much of the paper 'Does frequency count?' by Goodman, Dale and Li. The authors note that children's achievements in language production are well established, having been the subject of innumerable studies. A less well-researched aspect of lexical acquisition is comprehension, although theorists in this field have reached a consensus that a child can comprehend more than they can produce. While previous researchers were content with uncovering facts, Goodman, Dale and Li are more interested in finding an explanation.

The authors explore other research on frequency and lexical acquisition. The majority of theorists have assumed a positive role for frequency, if not a direct one. Arguments have been made for other factors being equally influential in word-learning (Shatz & Naigles, 1997), for the degree to which external variables can contribute (Tomasello, 2003) and for the importance of the medium of exposure (Patterson, 2002). These studies have been conducted on the general assumption that the more often a word features in adult speech, the sooner the child will acquire it.

The authors, however, argue that the methods used in previous studies to reach these conclusions are not rigorous enough to provide accurate results. For example, some theorists have taken the finding that children whose parents interact with them more produce larger lexicons as proof of a positive role for frequency. Goodman, Dale and Li note a flaw in this method, namely that such studies focused on vocabulary size as a whole rather than on individual words in relation to the parents' usage of them. Other studies have shown that frequent exposure to a word helps children distinguish it from surrounding words, thus expediting its acquisition. These studies are limited by their restriction to novel words which, although relevant, are not sufficient to support conclusions about the impact of frequency across grammatical categories. Further research has suggested that the prevalence in caregiver output of words from certain syntactic categories dictates the corresponding usage by the child. However, the data used to support these findings is flawed: it comes from written language directed at adults or older children, which differs from Child Directed Speech (henceforth CDS) and is therefore unlike the speech children are actually exposed to.

These limitations highlight the absence of conclusive validation for the hypothesis that greater exposure to a lexical item results in earlier acquisition. Previous studies have partially substantiated the hypothesis, but its scope has yet to be established. It is clear that the hypothesis is not categorical, as previous research has shown closed class words to be common in adult speech but absent until late in a child's lexical acquisition.

The authors recognise the importance of other factors in determining lexical acquisition, but stress the importance of evaluating frequency first and foremost. They state that the principal goal of the paper is to examine the relationship between lexical frequency and age of acquisition (for both production and comprehension vocabularies), which they accomplish by comparing the two across lexical categories. The paper hypothesises a negative correlation: the more frequent a word is in the input, the earlier it is acquired.

Estimates of age of acquisition were taken from the MacArthur-Bates CDI, and the CHILDES database was used to determine input frequency. The authors explain a number of distinct advantages of drawing on two different databases. Data from a single source may yield only a limited number of words or, at the very least, a limited number of mother-child word pairings. The frequency data in this study is sourced from transcripts of adult CDS, rendering the results more credible than those of previous studies. The strategy provides a relatively conservative test of input frequency, meaning that any strong correlations that occur can be comfortably generalised. It also removes the possibility of anomalous results arising from genetic inheritance, such as a referential approach to language acquisition shared by parent and child.

The authors used the two databases for separate purposes. The CDI was used to ascertain the age of acquisition of lexical items, based on parents noting when a child articulated a word or both comprehended and articulated a word. Lexical data from two inventories was used: one for children aged 0;8 to 1;4 and the other for those aged 1;4 to 2;6. These inventories contained 396 and 680 words respectively. The age of acquisition for production was defined as the first month in which at least 50% of the children were reported as articulating a word. For comprehension, the same criterion was applied to the child's understanding of the word.
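
The 50% criterion described above can be illustrated with a minimal sketch. The function name, the data layout and the proportions below are assumptions made purely for illustration; they are not taken from the CDI norms or from the authors' own procedure.

```python
from typing import Optional


def age_of_acquisition(proportions_by_month: dict[int, float],
                       threshold: float = 0.5) -> Optional[int]:
    """Return the earliest month at which the proportion of children
    reported to produce a word reaches the threshold, or None if the
    threshold is never reached in the sampled age range."""
    for month in sorted(proportions_by_month):
        if proportions_by_month[month] >= threshold:
            return month
    return None


# Invented proportions (age in months -> share of children producing the word).
early_word = {16: 0.21, 17: 0.34, 18: 0.48, 19: 0.55, 20: 0.67}
late_word = {28: 0.30, 29: 0.38, 30: 0.44}

print(age_of_acquisition(early_word))  # 19
print(age_of_acquisition(late_word))   # None
```

A word such as the hypothetical `late_word`, which never reaches the threshold by 2;6, receives no age of acquisition at all under this criterion.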

Frequency of parental input was established by searching the 3.8 million word CHILDES corpus for caregivers' uses of the items from the CDI inventories. A number of words were deemed ineligible by the authors. These included words that fewer than 50% of children had acquired by 2;6, animal noises and homophones. These criteria produced 562 words, each of which was categorised as a common noun, people word, verb, adjective, closed class word or other.
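
As a rough illustration of what counting input frequency involves, the sketch below tallies how often each target word occurs in a handful of caregiver utterances. The utterances, the target list and the simple tokenisation are all assumptions; the authors worked with the CHILDES transcripts themselves, and this is not a reconstruction of their tooling.

```python
import re
from collections import Counter


def caregiver_frequencies(utterances, targets):
    """Count occurrences of each target word across caregiver utterances."""
    target_set = {t.lower() for t in targets}
    counts = Counter()
    for utterance in utterances:
        # Crude tokenisation: lower-case runs of letters and apostrophes.
        for token in re.findall(r"[a-z']+", utterance.lower()):
            if token in target_set:
                counts[token] += 1
    return counts


# Invented utterances, for illustration only.
utterances = [
    "Look at the dog!",
    "The dog wants the ball.",
    "Where is the ball?",
]
targets = ["dog", "ball", "the", "cat"]

freqs = caregiver_frequencies(utterances, targets)
for word in targets:
    print(word, freqs[word])
```

Even in this toy sample the closed class word "the" outnumbers either noun, which is the kind of imbalance the paper reports for parental speech.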

The results yielded some surprises. The first test found a positive correlation between parental input frequency and age of acquisition for individual words in production: the more frequent a word was in parental speech, the later it tended to be acquired. This may contradict the postulations of some, but the authors note that closed class words feature prominently in adults' speech, whereas they are very seldom found in a child's output until late in development. Paradoxically, the nouns that dominate much of a child's early speech were revealed to belong to the least common word category in parental output.

The authors then ran the same test within lexical categories rather than across all words and obtained the predicted negative correlation for the production data: within a category, more frequent words were acquired earlier. The importance of this finding is emphasized by the strength of the correlation, which far exceeds that found in studies based on written language. This suggests that, despite the lack of parent/child dyads, the results were sufficiently consistent to generalize comfortably, a conclusion reinforced by the disparity with adult-directed written discourse. The paper acknowledges a possible criticism of these findings, namely the exclusion of words not acquired by 2;6. The authors deal with this data separately and conclude that it supports their finding that the acquisition of closed class words is not influenced by high frequency in parental output.
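
The difference between analysing all words together and analysing them category by category can be shown with a small sketch. The figures below are invented solely to illustrate how a correlation can change sign once words are grouped; they are not the authors' data, and the Pearson coefficient is used here only as an assumed, convenient measure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented (category, log input frequency, age of acquisition in months).
words = [
    ("noun", 1.0, 18), ("noun", 1.5, 17), ("noun", 2.0, 16),
    ("verb", 2.5, 22), ("verb", 3.0, 21), ("verb", 3.5, 20),
    ("closed", 4.0, 27), ("closed", 4.5, 26), ("closed", 5.0, 25),
]

# Across all words: frequent categories are acquired late, so r is positive.
overall = pearson([f for _, f, _ in words], [a for _, _, a in words])

# Within each category: more frequent words are acquired earlier, so r is negative.
within = {}
for category in ("noun", "verb", "closed"):
    subset = [(f, a) for c, f, a in words if c == category]
    within[category] = pearson([f for f, _ in subset], [a for _, a in subset])

print(overall)
print(within)
```

In this toy data set the pooled correlation is positive while every within-category correlation is negative, mirroring the pattern the review describes for the production data.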

The correlations for the comprehension data were weaker than those for production. The authors attribute this to the smaller number of entries in the comprehension data, which covers only an early stage of development and is therefore limited in scope.

The paper also asks whether the effect of frequency changes over time. To estimate this, the words from the production data were divided into two sets: those found in the first 100 words of a typical child's speech and those acquired afterwards. The relationship between age of acquisition and parental input was tested on each set separately, with differences most noticeable for nouns and closed class words. The effect of frequency on nouns appears to increase significantly after the first 100 words, during the 'naming explosion'. The opposite is true of closed class words, where frequency assumes greater importance during earlier development. However, the authors acknowledge that these results are flawed, owing to the limited size of a child's vocabulary at this early stage of development and the high frequency of particular lexical items.

The paper selects three key aspects to discuss. First, the role of semantic-syntactic classes does not accord with the authors' hypothesis relating frequency to age of acquisition. For example, nouns are the preferred lexical category for children despite being used the least by adults. There is also a poverty of closed class words in the children's speech, despite this being the most common category in parental speech. The authors encourage further research in an effort to identify the factors behind the acquisition of nouns.

Secondly, the correlations between age of acquisition and parental frequency based on the CHILDES transcripts were significantly stronger than those based on written materials. The authors suggest that previous studies based on written materials may have produced inaccurate results due to the lack of age-appropriate norms.

The final point of interest is the stronger relation between parental frequency and age of acquisition for production vocabularies than for comprehension vocabularies. The authors account for this by noting the child's level of exposure to a word. They reason that since comprehension is an earlier developmental stage, the child would be sufficiently accustomed to the word by the time production is attempted.

The authors attempt to identify underlying mechanisms which may help explain the natural presumption that frequency facilitates lexical acquisition. One possible explanation is that a parent will make frequent use of a particular term if they believe the child understands it. In addition, the child's first utterance of a particular word may encourage the parent to use that word more frequently.

The authors conclude that frequency does count, but acknowledge that other variables need to be researched in order to provide a more complete picture of lexical acquisition.


The authors of this paper have identified a conspicuous gap in the research conducted on lexical acquisition. The combination of an innovative methodological approach (the use of separate databases) and unique research aims makes for an intriguing study. The presentation is neat and the structure orderly, with a copious amount of commentary on previous research framing the gap in the literature that the study occupies.

Throughout the paper, there is an assertion that previous research has assumed a positive role for frequency. Indeed, the examples cited appear to corroborate this. However, in Brown's (1973) famous Adam, Eve and Sarah study on morphology, it was claimed that the acquisition of grammatical morphemes was not determined by frequency. Of course, morphological acquisition is a separate matter from lexical acquisition, so although research on frequency in general had been conducted in the past, the authors were warranted in specifying the need for a more scrupulous look at the lexical aspect.

The authors make reference to many studies conducted in languages other than English, but fail to mention perhaps the most relevant: Kauschke and Klann-Delius (2007). This study concerned the relationship between maternal input and vocabulary development in German, a topic very similar to that of Goodman et al. It is possible that the authors were not aware of this piece, as it was published only a short time before their own work. The statistical calculations in the Kauschke and Klann-Delius paper make use of the same method employed by Goodman et al. However, the database from which they sourced their lexical items is significantly smaller, diminishing the ability to generalize the results. There is a key difference between the studies: Goodman et al. are more interested in the child's speech in relation to the input, whereas the German paper is more interested in how the mother's speech adapts to a developing child. The studies therefore do not encroach on one another's material, as they approach the question from opposite ends. However, they do complement each other well and together provide a more lucid, complete analysis of the field.

Given how evident the shortage of research on frequency and lexical acquisition appears to be, the authors might have offered an explanation for it. Demuth (2007) does so, alluding to the influence of prominent linguists such as Chomsky (1965), who advised that such matters should be left to sociology. Demuth claims that it is only in the last twenty years that boundaries between fields have been relaxed, allowing linguists and sociologists to cooperate and share methodological approaches. In addition, the rise of computers and the internet has made the storing and sharing of data a great deal easier and more accessible.

An issue which merits further discussion is the selection of two disparate databases as data sources. For the purposes of this review, the use of one shall be evaluated: the Communicative Development Inventories (henceforth CDI). The reasoning behind the selection of this database is cogent, namely the avoidance of mother-child dyads. However, one must consider the construction of the CDI in order to evaluate its reliability. The data was gathered in the form of parental reports on which words their child understood and/or produced (Berko Gleason, 2005, p. 46). Theorists in this field generally seem to endorse this method as reliable (Dale 1996; Fenson et al. 1994) and there is some empirical evidence to support this (Meadows et al. 1999). However, there is also research that suggests inaccuracy in parental reporting of their child's speech (Roberts et al. 1998). Whilst this method is not ideal and can lead to inconsistencies, it presents distinct advantages over other methods of data collection, such as laboratory-based work with its inauthentic environment. The possibility of distorting the data through the 'Observer's Paradox' (Labov, 1972) is also diminished.

The CHILDES database, although widely regarded as revolutionary in the study of first language acquisition, has not been without its detractors. Edwards (1992) criticized the system for lacking a standard transcription method, rendering the data incomparable. MacWhinney and Snow (1992) rebutted this criticism as outdated in light of their recent revisions of the system. However, with contributions from hundreds of researchers totalling thousands of hours of conversation, it cannot be asserted that every transcript conforms precisely to a single standard.

A major asset of this study, and one that adds credence to the accuracy of the results, is its use of actual CDS data extracted from transcripts of caregiver-child interaction. As basic a requirement as this may seem, the majority of studies on lexical acquisition have used written material, symptomatic of the bias toward the printed medium at the time. In the 1960s, DeVito noticed the problem with this and concluded in one study that written language had "more lexical diversity, more difficult words... more nouns and adjectives... fewer verbs," (Chafe and Tannen, 1987, p.385). The disparity between written and spoken language would, therefore, heavily influence the results. The authors' use of spoken language improves the accuracy and reliability of the results, and the impact is noticeable: the correlation between age of acquisition and parental frequency is considerably stronger. This paper has demonstrated that substituting written materials for spoken data is not reliable and should not be adopted in future studies.

An unfortunate stipulation of the methodology excluded lexical items "with two common uses". According to Bridges' (2006) survey, there are 1075 homophones in English which, placed out of context, can become ambiguous. A number of these are ordinary, everyday words that a child may well use.




The reasoning behind the exclusion of these words is understandable: the sheer mass of data involved makes it extremely time-consuming to examine every transcription and determine the meaning of every ambiguous word. However, the degree to which the findings are tainted is uncertain, since English homophones do not follow a regular pattern in their composition; some are nouns, some verbs, others adjectives, and so on.

As part of their discussion, the authors highlight that the overall correlation between input frequency and age of acquisition contradicts their hypothesis. They surmise that factors other than frequency must be involved in order to explain the ease of noun acquisition, particularly since the results show nouns to be the least used syntactic category in parents' CDS. They present no argument for what these factors could be and suggest that further study should be conducted. However, these results could be used as evidence for Chomsky's innatist perspective on language learning. The lack of nouns in adults' speech and their abundance in children's early speech could suggest that a child is pre-programmed to acquire nouns before other syntactic categories. However, others such as * would suggest that the linguistic complexity of concrete nouns compared to other syntactic categories is the most pertinent factor in lexical acquisition. Of course, the authors did not begin this study with the objective of resolving the debate over language innateness, but they are interested in why particular syntactic categories are acquired before others, and this surely ties in with theories of the origin of language.

The paper sensibly differentiates between comprehension and production vocabularies, an important aspect neglected by previous research. The importance of this split is emphasized by the finding that parental input is a more consistent predictor of production than of comprehension, which supports a distinction between the acquisition of production and comprehension vocabularies. This is an important contribution to the field, as future researchers will have empirical evidence justifying their treatment of comprehension and production lexicons as separate entities.

This unique and engaging paper answers some interesting questions and raises others. The paper is not perfect, with a number of aspects perhaps diminishing the accuracy of the results, such as the exclusion of certain words and the age range of the children sampled. However, the size of the corpus involved and the use of two databases, which eliminates the possibility of mother-child dyads, make the results of this paper the most amenable to generalization. The appearance of this and other papers on the topic indicates a bright and fertile future for the study of frequency effects and lexical acquisition in general.