Corpus Driven Linguistic Study Of Hong Kong English Language Essay

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

The inadequate supply of land in Hong Kong has become a concern for many Hong Kong people, as the shortage drives up property prices. The government has acknowledged the problem and promised to ease the burden on land development. The announcement of the 2010 Policy Address aroused heated discussion, and news on this topic has taken up a large share of local newspaper coverage. Since newspapers act as a 'broker' that processes and presents information (Williams, 1989:74), I wanted to see how this medium performs that role in reporting news about land supply. I therefore chose newspaper reports on land supply as the data for my corpus-driven study.

Although the media are a source of information, not everything concerning a topic is included. 'Newspapers are produced by professional workers who select some events for reporting as news, and exclude others' (Bignell, 1997:85). This selection process shapes what we, as readers, receive. The inclusion of certain information reflects readers' concerns; in other words, it shows which information is considered important enough to be presented. To carry out the analysis, I built a corpus of news about land supply and extracted data from it with the software WordSmith 5.0.

This paper applies Sinclair's (2004) descriptive model of lexical items. The model consists of five categories of co-selection: core, collocation, colligation, semantic preference and semantic prosody. Sinclair (1991) remarks that 'the nature of the world around us is reflected in the organization of language and contributes to the unrandomness' (p.110). Lexical choices are therefore not random: there are reasons for choosing one lexical item rather than another, and every combination of lexical items helps build the meaning of a text. By analyzing the data with Sinclair's model, we can gain insight into the meanings made in the texts.

My research question is therefore as follows:

What are the extended meanings (i.e. lexical, grammatical, semantic and functional) associated with the most frequent lexical word in my land supply corpus?

Data - the Land Supply Corpus

The corpus used in this project is self-built. It comprises 65 English news articles related to land supply in Hong Kong, with a total of 46,202 tokens. All articles were taken from Wise News, a database of news and magazine articles. The search period ran from 13 October 2010, the day the 2010 Policy Address was announced, to 13 November 2010, the day I built the corpus.


WordSmith 5.0 was used to generate a wordlist. Because this is a corpus-driven study, I selected the most frequent lexical word in the wordlist for analysis and then generated its concordance. Applying Sinclair's (2004) descriptive model of lexical items, I studied the collocation and colligation of the core. To make the analysis of semantic preference and semantic prosody manageable, I limited its scope to 100 concordance lines randomly selected from the 338 lines. I also examined whether the core showed any specific pattern.
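The procedure above relies on WordSmith 5.0, whose internals are not described here. As a rough illustration only, the Python sketch below builds a frequency wordlist, skips a small stoplist of function words to find the most frequent lexical word, and draws a repeatable random sample of concordance lines. The tokenizer, the stoplist `FUNCTION_WORDS` and the fixed `seed` are assumptions made for illustration, not WordSmith's behavior.

```python
import random
import re
from collections import Counter

# Hypothetical stoplist of function words (an assumption: the essay does not
# list which words were treated as grammatical rather than lexical).
FUNCTION_WORDS = {
    "the", "a", "an", "of", "to", "in", "on", "at", "for", "and", "or", "but",
    "is", "are", "was", "were", "be", "been", "has", "have", "had", "will",
    "would", "that", "this", "it", "he", "she", "they", "we", "by", "with",
    "as", "from",
}


def tokenize(text):
    """Split text into lower-case word tokens (a rough stand-in for WordSmith)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def wordlist(tokens):
    """Frequency wordlist, most frequent first (ties keep first-seen order)."""
    return Counter(tokens).most_common()


def most_frequent_lexical_word(tokens):
    """Return (word, frequency) for the top wordlist entry that is not a function word."""
    for word, freq in wordlist(tokens):
        if word not in FUNCTION_WORDS:
            return word, freq
    return None


def sample_lines(lines, k, seed=0):
    """Draw k concordance lines without replacement; the fixed seed makes the sample repeatable."""
    return random.Random(seed).sample(lines, k)
```

Applied to the 338 concordance lines of 'said', `sample_lines(lines, 100)` would draw a sample of the size used in this study; changing `seed` gives a different sample.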

Findings and discussion

The most frequent lexical word in the land supply corpus is 'said', which occurs 338 times. It was somewhat surprising that the most frequent lexical word is 'said' rather than a word relating directly to land supply or any aspect of it. This shows that, in this corpus, the most frequent lexical word is not directly related to the topic of the corpus.

Collocation

According to Stubbs (1996), collocation is 'the habitual co-occurrence of two or more words' (as cited in Zhang, 2010:191).

'Said' has a relatively strong tendency to collocate with 'the' at the N+1 position (108 times, 32.0%) and a weaker tendency to collocate with 'that' (36 times, 10.2%) and 'it' (9 times, 2.67%) (see Figure 1). 'The' is also the most frequent collocate of 'said' at positions N+2 to N+5, although the tendency there is weak (N+2: 19 times, 5.6%; N+3: 18 times, 5.3%; N+4: 8 times, 2.34%; N+5: 14 times, 4.14%) (see Figure 2).

Figure 1 Sample concordance lines of said

There are no strong collocates to the left. At the N-1 position, 'said' collocates with 'he' 64 times (18.9%), followed by 'Tsang' (24 times, 7.10%) and 'she' (19 times, 5.62%). At most of the other left positions, 'the' is the most frequent collocate (N-2: 15 times, 4.43%; N-3: 23 times, 6.80%; N-5: 17 times, 5.03%) (see Figure 2).

Figure 2 Extracted collocate pattern table
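The positional counts reported above can, in principle, be reproduced with a simple window count. The sketch below is a hypothetical Python illustration, not WordSmith's algorithm: `positional_collocates` counts what occurs at each position from N-5 to N+5 around a node word, treating each article as a separate token list, and `top_collocate` expresses a count as a percentage of the node's frequency. The percentages in this section appear to be calculated this way, against the 338 occurrences of 'said' (for example, 108/338 ≈ 32.0%).

```python
from collections import Counter, defaultdict


def positional_collocates(texts, node, span=5):
    """Count collocates of `node` at each position N-span..N+span.

    `texts` is a list of token lists (one per news article), so the window
    never runs across an article boundary. Returns {offset: Counter}.
    """
    counts = defaultdict(Counter)
    for tokens in texts:
        for i, tok in enumerate(tokens):
            if tok != node:
                continue
            for offset in range(-span, span + 1):
                j = i + offset
                # Skip the node itself and positions outside the article.
                if offset != 0 and 0 <= j < len(tokens):
                    counts[offset][tokens[j]] += 1
    return counts


def top_collocate(counts, offset, node_freq):
    """Most frequent collocate at one position, with its percentage of node occurrences."""
    if not counts[offset]:
        return None
    word, freq = counts[offset].most_common(1)[0]
    return word, freq, round(100 * freq / node_freq, 1)
```

If the counts matched those reported here, `top_collocate(counts, 1, 338)` would return `('the', 108, 32.0)`.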

At the left positions, the definite article 'the' mostly refers to a particular proper noun. At the right positions, however, most instances of 'the' refer not to proper nouns but to nouns that have already been mentioned (see Figure 3).

Figure 3 Sample concordance lines of said (panels: proper nouns; not proper nouns)

Colligation

Colligation is 'the relation between content and function words, and between words and grammatical categories' (Stubbs, 2002: 238). The collocates were grouped by word class. Table 1 summarizes the colligation of 'said'.

Right (R1-R5)

Word class






Proper Nouns




Auxiliary verbs

Left (L1-L5)

Word class

Frequency (%)


65 (19.2%)


16 (4.73%)


4 (1.18%)


5 (1.48%)


113 (33.4%)

Proper Nouns

80 (23.7%)


79 (23.4%)


104 (30.8%)


43 (12.7%)

Auxiliary verbs

3 (0.89%)

Table 1 Colligation of said in the land supply corpus

At the left positions, a comparatively significant colligational pattern is the use of pronouns (33.4%), such as 'he', 'she' and 'it'; 'he' (64 times) accounts for the largest proportion. Apart from pronouns, prepositions (30.8%) and proper nouns also co-occur frequently with 'said'. The prepositions are mainly 'of' (33 times), 'in' (26 times) and 'at' (19 times), and the proper nouns refer to people and organizations.

The right positions show a different pattern. The main colligational pattern is the use of articles (38.5%), mainly 'the' (108 times) and 'a' (22 times), followed by prepositions such as 'of' (24 times), 'to' (22 times) and 'in' (19 times), and by nouns.
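Colligation can be sketched by mapping each collocate to a word class and summing the counts per class on each side of the node. The Python sketch below uses a tiny hypothetical lexicon, `WORD_CLASS`, in place of the manual or tagger-based classification a real study would need; words it does not know are pooled as "other".

```python
from collections import Counter

# Hypothetical mini-lexicon mapping collocates to word classes; a real
# analysis would rely on a POS tagger or manual annotation.
WORD_CLASS = {
    "he": "pronoun", "she": "pronoun", "it": "pronoun",
    "the": "article", "a": "article", "an": "article",
    "of": "preposition", "in": "preposition", "at": "preposition", "to": "preposition",
    "tsang": "proper noun",
    "will": "auxiliary verb", "would": "auxiliary verb",
}


def colligation(side_counts, node_freq):
    """Group collocate counts from one side (e.g. R1-R5) by word class.

    Returns {word_class: (count, percentage of node occurrences)}; words
    missing from the lexicon are pooled under "other".
    """
    totals = Counter()
    for word, freq in side_counts.items():
        totals[WORD_CLASS.get(word, "other")] += freq
    return {cls: (n, round(100 * n / node_freq, 1)) for cls, n in totals.items()}
```

For example, `colligation(Counter({"the": 108, "a": 22}), 338)` gives `{"article": (130, 38.5)}`, which matches the 38.5% reported for articles at the right positions.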

The co-occurrence patterns 'pronoun + said' and 'proper noun + said' show that 'said' is used to project a voice. The writers quoted, directly or indirectly, what others had said, probably because they considered the public to be interested in how authorities view the situation in Hong Kong and treated these views as important information to include in the newspaper. The writers may also have used comments from authorities to develop their arguments. This is supported by Thorne (2008), who notes that using other voices, especially through direct quotation, can 'allow eminent people to make voice their views accurately' and 'add weight to the argument' (p.273).

Semantic preference

Stubbs (2002) defines semantic preference as 'the relation between nodes and sets of collocates from a well-defined semantic field' (p.225). McEnery and Xiao (2006) regard semantic preference as 'a feature of the collocates' (p.107). Table 2 summarizes the semantic preferences of 'said'.

Semantic preferences

Examples of collocates

Money

Property price, average value, billions, home prices, capital

Action

Implementation, practice, adoption, approach, strategies, measures, policies, initiatives

House supply

Sites, units, flats

Effect

Guarantee, help, support, benefit, return, result


Conflicts, illness, uncertainties, crisis, threat


in the long run

Table 2 Semantic preference of 'said'

The semantic preferences reveal people's concerns about land supply in Hong Kong. Since the amount of land supply affects the value of people's assets, it is unsurprising that one of the semantic fields relates to 'money'. A major concern is how the government responds to the current land supply situation, as many of the semantic preferences, namely 'action', 'house supply' and 'effect', relate to the government's reaction. This suggests that the government is expected to implement appropriate policies to solve the existing problems.
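Grouping collocates into semantic fields can likewise be sketched as a lookup. The Python example below is illustrative only: the field names follow those discussed above ('money', 'action', 'house supply', 'effect'), the member words are taken from the examples in Table 2, and counting a concordance line once per field it matches is an assumption about how the fields might be tallied.

```python
import re
from collections import Counter

# Field names follow the discussion above; member words are the example
# collocates listed in Table 2 (illustrative, not an exhaustive lexicon).
SEMANTIC_FIELDS = {
    "money": ["property price", "average value", "billions", "home prices", "capital"],
    "action": ["implementation", "practice", "adoption", "approach",
               "strategies", "measures", "policies", "initiatives"],
    "house supply": ["sites", "units", "flats"],
    "effect": ["guarantee", "help", "support", "benefit", "return", "result"],
}


def semantic_preference(lines):
    """Count how many concordance lines contain at least one member of each field."""
    counts = Counter()
    for line in lines:
        text = line.lower()
        for field, members in SEMANTIC_FIELDS.items():
            # Whole-word (or whole-phrase) match, so "sites" does not match "websites".
            if any(re.search(r"\b" + re.escape(m) + r"\b", text) for m in members):
                counts[field] += 1
    return counts
```

A line such as "Tsang said the measures would help lower home prices" is counted once each under 'action', 'effect' and 'money'.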

Semantic prosody

Sinclair (1996) proposed that 'the initial choice of semantic prosody is the functional choice which links meaning to purpose; all subsequent choices within the lexical item relate back to the prosody' (as cited in Zhang, 2010:190). Semantic prosody is 'the collocational meaning arising from the interaction between a given node and its typical collocates' (McEnery and Xiao, 2006:105-106). By examining semantic prosody, we can understand the 'attitude and evaluation' of a speaker or writer, since expressing these is the primary function of semantic prosody (Louw, 2000:58).

Three semantic prosodies were identified in the 100 randomly selected concordance lines of 'said' and their collocates (see Table 3).

Semantic prosodies

Frequency (%)

Examples of collocates

Set 1: Critical

60 (60%)

Have been effective, immediately ease property price, had done too little, just as difficult, be sensible to rent now, should not be seen as, did not offer immediate help

Set 2: Descriptive

27 (27%)

A Hong Kong University survey shows, in the project, the bureau is working to…, in his sixth policy address, announced in Tsang's policy address, the report was based on

Set 3: Predictive

13 (13%)

will likely use, is likely to worsen, is likely to increase, might become

Table 3 Semantic prosody of 'said'
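Assigning a prosody to each concordance line is a matter of manual interpretation, so no code can replace that step. Once each of the 100 sampled lines has been labelled by hand, however, the distribution in Table 3 is a simple tally, sketched here in Python (the label strings are hypothetical).

```python
from collections import Counter


def prosody_distribution(labels):
    """Tally manually assigned prosody labels, one per sampled concordance line.

    Returns {label: (count, percentage)} ordered from most to least frequent.
    """
    total = len(labels)
    return {label: (n, round(100 * n / total, 1))
            for label, n in Counter(labels).most_common()}
```

Because the sample contains exactly 100 lines, each count in Table 3 equals its percentage.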

The first set is 'critical', which covers more than half of the concordance lines. The comments mainly concern the government's actions and the effect of its measures. Co-occurring phrases such as 'have been effective' and 'had done too little' indicate that speakers hold different attitudes: some criticize the government, while others praise it. This shows that newspapers report a variety of opinions rather than a limited number of voices, which gives readers a more comprehensive picture.

Example from concordance line: criticize the strategies of the government

The second set is 'descriptive'. Here the context provides factual information, reflecting the newspaper's role to 'convey information' without adding personal feeling (Thorne, 2008:262). With the pattern 'somebody + said + factual description', writers 'add authority' to the news (Thorne, 2008:280). Factual information also helps readers grasp the background to the news.

Example from concordance line: provide details about a fact

The final set is 'predictive'. A feature of this set is the appearance of the auxiliary verb 'will' and the phrase 'is likely to' in the concordance lines; both point to the future. Evidence from the concordance lines, such as 'will receive less' and 'is likely to increase', indicates the attitude held toward the future situation. Through these predictions we can infer whether people are optimistic about the future.

Example from concordance line: prediction about the future
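The future markers noted above suggest a quick way to shortlist candidate lines for the predictive set before reading them closely. The Python sketch below is a hypothetical helper, not part of the original analysis: `flag_predictive` returns the lines containing 'will', 'is likely to' or 'might' (markers taken from Table 3), and every hit would still need manual checking, since 'will' can also appear in lines that are not predictions.

```python
import re

# Future-oriented markers drawn from the predictive examples in Table 3;
# the list is illustrative, not exhaustive.
FUTURE_MARKERS = [r"\bwill\b", r"\bis likely to\b", r"\bmight\b"]


def flag_predictive(lines):
    """Return the lines containing a future marker, as candidates for manual review."""
    pattern = re.compile("|".join(FUTURE_MARKERS), re.IGNORECASE)
    return [line for line in lines if pattern.search(line)]
```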

Core

The core is 'invariable, and constitutes the evidence of the occurrence of the item as a whole' (Sinclair, 2004, as cited in Cheung, 2006:327). In my corpus, 'said' has no strong collocates, so 'said' itself is the core of this lexical item.

Conclusion

This study shows that analyzing a lexical item and its co-occurring items with Sinclair's (2004) descriptive model of lexical items can give a 'picture' of how newspapers report news about land supply.

In terms of collocation, 'said' collocates most frequently with 'the' at the N+1 position and with 'he' at the N-1 position. In terms of colligation, a comparatively significant pattern is the use of pronouns at the left positions. The co-occurrence of these items with 'said' shows that, in reporting news about land supply in Hong Kong, writers tend to cite the viewpoints of others, especially authorities.

The semantic preferences of 'said' relate mainly to the government's reaction, for example 'action' and 'effect'. Three semantic prosodies of 'said' were identified: 'critical', 'descriptive' and 'predictive'. The semantic preferences and prosodies show that the reports about land supply cover a wide range of areas and present both factual information and a variety of opinions, positive and negative. This gives a deeper understanding of the approach writers adopt when reporting news about land supply in Hong Kong.

References

Bignell, J. (1997). Media Semiotics: An Introduction. Manchester: Manchester University Press.

Cheung, W. (2006). Describing the extended meanings of lexical cohesion in a corpus of SARS spoken discourse. International Journal of Corpus Linguistics, 11(3): 325-344.

Francis, G. & Hunston, S. (1999). Pattern Grammar: A Corpus-Driven Approach to the Lexical Grammar of English. Amsterdam/Philadelphia: John Benjamins.

Hunston, S. (2002). Corpora in Applied Linguistics. Cambridge: Cambridge University Press.

Louw, B. (2000). Contextual prosodic theory: Bringing semantic prosodies to life. In C. Heffer, H. Sauntson and G. Fox (eds), Words in Context: A Tribute to John Sinclair on his Retirement. Birmingham: University of Birmingham.

McEnery, T. & Xiao, R. (2006). Collocation, semantic prosody, and near synonymy: A cross-linguistic perspective. Applied Linguistics, 27(1): 103-129.

Sinclair, J. (1991). Corpus, Concordance, Collocation. Oxford: Oxford University Press.

Sinclair, J. (2004). Trust the Text: Language, Corpus and Discourse. London: Routledge.

Stubbs, M. (2002). Two quantitative methods of studying phraseology in English. International Journal of Corpus Linguistics, 7(2): 215-244.

Thorne, S. (2008). Mastering Advanced English Language. Great Britain: Cromwell Press Ltd.

Williams, F. (1989). The New Communications (2nd ed.). USA: Wadsworth Inc.

Zhang, C. H. (2010). An overview of corpus-based studies of semantic prosody. Asian Social Science, 6(6): 190-194.