ITMELT '99 Conference

Applying Information Technology to a corpus of student report writing to help students write better reports

Linda, H.F. Lin

English Language Centre

Hong Kong Polytechnic University

Hong Kong

 

Introduction

Information technology has put the power to collect and analyse extensive bodies of text into the toolbox of the average language teacher/researcher. One tool made easily accessible through current information technology, concordancing software, has become highly useful within the data-driven approach to ESL teaching (Johns, 1991). The corpora on which the data are based are usually well-known ones such as the British National Corpus (BNC), the Brown Corpus and the Lancaster-Oslo/Bergen Corpus (LOB). Researchers like Tribble (1997) have tried to work with relatively small, informally produced corpora distributed on CD-ROM. However, the development and use of a corpus of student writing as a teaching resource has not been much explored. Research conducted to date, and its application in the classroom, shows considerable promise.

This paper describes a project that focused on improving students' report writing skills and facilitating teachers' ESL research through the use of a corpus built up from reports about the workplace experiences of recent graduates of Hong Kong Polytechnic University. This corpus contains 429,048 words. The concordancer used is WordSmith, a concordancing program recommended by experts in the field of multimedia language learning and teaching (Tribble, 1991; Stevens, 1993; Johns, 1991).

This paper first briefly describes the possible ways of using the wordlisting function in the concordancing package Wordsmith to help students gain a better understanding of the report-writing genre. It then focuses on ways of employing the concordancing function in Wordsmith to help students improve vocabulary selection, command of syntactical structure, and use of idiomatic expressions in report writing.

 

The Wordlist and Concordancing functions

Wordlist

Wordsmith allows users to make various kinds of wordlists from a corpus, including lists by word length, by paragraph length and by frequency. Almost every type of list can be employed to help students with their writing. Word length and paragraph length lists, for example, can be used to improve students' understanding of the genre of report writing. Most students understand report writing to be a type of formal academic writing. Their most common way of handling the demands of this formality, unfortunately, is to employ big words and write long sentences; students seem to believe that the crucial factor in writing is the length of words and sentences, not clarity in expressing ideas. By comparing the word-length and paragraph-length lists from student reports with those from published reports, we can more easily demonstrate this key point to students.
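The word-length comparison described above can be approximated outside a dedicated package. The following is a minimal sketch in Python, not a reproduction of Wordsmith's own wordlist routine: the tokenisation rule, the function names and the two sample sentences are assumptions invented for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def word_length_profile(text):
    """Return (mean word length, Counter mapping length -> token count)."""
    tokens = tokenize(text)
    lengths = Counter(len(t) for t in tokens)
    mean = sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0
    return mean, lengths

# Hypothetical samples, invented for illustration; they are not corpus data.
student_sample = "The utilisation of questionnaires facilitated comprehensive investigation."
published_sample = "We used a short survey to find out what people think."

for label, sample in [("student", student_sample), ("published", published_sample)]:
    mean, lengths = word_length_profile(sample)
    print(f"{label}: mean word length {mean:.2f}, distribution {sorted(lengths.items())}")
```

Run over whole files rather than single sentences, the two profiles give students a concrete picture of how far their word choices drift from those of published reports.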

Wordlisting can also help improve students’ awareness of the problem of overuse of certain groups of words. It is well documented that Hong Kong learners tend to overuse connectives (Scollon, 1993; Tang & Ng, 1995). Learners, however, may not be aware of this problem or of its seriousness until "…they have tested it against the various examples" (Stevens, 1995). One useful application of Wordsmith, therefore, is to compare a frequency list produced from students’ own writing with that of published writing to show precisely by how much student use of connectives (such as and, but, thus, so, therefore and nevertheless) exceeds the norm in published work in that genre. For example, the frequency of so in the corpus of student writing is 0.21%, whereas the frequency of this word is only 0.15% in Micro-concord, a corpus of standard academic writings drawn from articles published by the Cambridge University Press.
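A frequency comparison of this kind can be sketched in a few lines of Python. The sketch below is an illustration under stated assumptions rather than the procedure Wordsmith follows: the function names and both sample texts are hypothetical, and every occurrence of a word form is counted, so non-connective uses of so (as in so many) are included.

```python
import re
from collections import Counter

CONNECTIVES = ["and", "but", "thus", "so", "therefore", "nevertheless"]

def relative_frequencies(text, words):
    """Return each word's share of all tokens in the text, as a percentage."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {w: (100.0 * counts[w] / total if total else 0.0) for w in words}

def compare(student_text, reference_text, words=CONNECTIVES):
    """Map each word to (student %, reference %, difference in points)."""
    student = relative_frequencies(student_text, words)
    reference = relative_frequencies(reference_text, words)
    return {w: (student[w], reference[w], student[w] - reference[w]) for w in words}

# Hypothetical sample texts, invented for illustration; not corpus data.
student_text = "The salary is low so graduates leave. So we recommend a review and more support."
reference_text = "The survey covered graduates of one university and focused on salary."

for word, (s, r, diff) in compare(student_text, reference_text).items():
    print(f"{word:<13}{s:6.2f}%{r:8.2f}%{diff:+8.2f}")
```

A positive value in the last column marks a word used more heavily in the student text than in the reference text.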

 

Concordancing

In the project described in this paper, the concordancing function of Wordsmith plays a much more important role than does the wordlist function. It allows students to raise their awareness about deviant syntactical structures, use of idioms, and vocabulary usage within the genre of report writing.

The paper first presents findings from concordancing, and then goes on to discuss the ways these findings can be applied in ESL research and teaching.

 

Findings from the corpus

Vocabulary

To further help students with their use of connectives, we can compare and contrast the concordances from students’ writing with concordances from easily available corpora such as Brown or LOB. This exercise allows students to see how and where to better use connectives. In the Brown corpus, for instance, not a single and or so can be found in sentence-initial position, but many such placements of these words can be found in the corpus of student writing, as shown in Figure 1 for so.

1. enough for those graduates. So, we recommend that PolyU shou

2. maintain the earning power. So, they reduce the salary expens

3. more support to help them. So, Poly U should make some usefu

4. s increase from 1997 to 1998. So, we can see that the graduate

Figure 1
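Concordance lines like those in Figure 1 can be produced with a simple keyword-in-context (KWIC) search. The sketch below is a hypothetical stand-in for a concordancer, not Wordsmith's own algorithm: the function name, the context width and the rule for spotting a sentence-initial hit (a capitalised keyword at the start of the text or after a full stop, question mark or exclamation mark plus a space) are all assumptions, and the sample text is invented.

```python
import re

def kwic(text, keyword, width=30, sentence_initial=False):
    """Return keyword-in-context lines with `width` characters either side."""
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    if sentence_initial:
        # Capitalised keyword at the start of the text or after ". ", "! ", "? "
        pattern = r"(?:(?<=[.!?] )|^)" + re.escape(keyword.capitalize()) + r"\b"
        flags = 0
    else:
        pattern = r"\b" + re.escape(keyword) + r"\b"
        flags = re.IGNORECASE
    lines = []
    for match in re.finditer(pattern, flat, flags):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        lines.append(f"{left:>{width}}{match.group(0)}{right:<{width}}")
    return lines

# Hypothetical sample text, invented for illustration; not corpus data.
sample = ("More support is needed. So, PolyU should help them. "
          "They work so hard. So we can see the trend.")

for line in kwic(sample, "so", sentence_initial=True):
    print(line)
```

Setting sentence_initial=False returns every occurrence of the keyword, which allows the two kinds of placement to be counted separately for a student corpus and a reference corpus.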

Problems with connectives are not the only problems found in this corpus; other misuses of vocabulary are at least as prevalent. For example, the nature of reports usually requires that the writer indicate the level of satisfaction with some situation or performance. This corpus reveals that many students are unable to use the word satisfy properly when making this kind of point. They do not distinguish the noun form from the verb form, nor the base form of the verb from its past participle. In this corpus of 429,048 words, there are 131 instances of the misuse of satisfy. On average (based on the writing of the 68 students included in this corpus), each student makes about two mistakes with this word. Most students were surprised when the search results were shown to them. The lines in Figure 2 illustrate the misuse of this word.

1. uite satisfy) of the students satisfy with the salary. On the other

2. very satisfy and 74% of quite satisfy), work conditions (8% of ve

3. , only half of the subjects satisfy their salary and the benefits

4. some of our graduates do not satisfy their current job. The imp

Figure 2
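The average quoted for satisfy can be checked directly from the three figures reported above. The rate per 10,000 words in the sketch below is an added normalisation, not a figure from the study, and the variable names are hypothetical; a rate of this kind is simply a convenient way to compare corpora of different sizes.

```python
# Figures reported for the student corpus
CORPUS_WORDS = 429_048
MISUSES = 131
STUDENTS = 68

per_student = MISUSES / STUDENTS                # average misuses per writer
per_10k = MISUSES / CORPUS_WORDS * 10_000       # added normalisation (assumption)

print(f"{per_student:.2f} misuses per student")
print(f"{per_10k:.2f} misuses per 10,000 words")
```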

 

Other words with similar levels of mistaken usage in this corpus are compare and recommend. Some sample results are seen in Figure 3.

1. students. Some of data will compare with 1997's one, which can sh

2. e appeared hard to find job compare to 1997 as the number of appl

3. had the salary below $9000 compare 1997. Also, there are 9% and

4. sful application in 1998 in compare with 1997. More than half sub

5. in PolyU. It would like to recommend that the employer was provide

6. conclusions. I would like to recommend that Job-seeking skill is

7. reliable that students can recommend by department. At the same ti

Figure 3

 

Syntactical Structure

Like all second language learners, the students in this study, native Chinese speakers learning English, are influenced syntactically by their mother tongue as they acquire their second language. Data relating to this kind of interlanguage phenomenon found in this corpus are shown in the following sections.

 

Section A

1. ay, June and July, and people are hard to find their job in the fo

2. l results show that graduates are hard to seek their "prefect job"

3. It was found that graduates are difficult to find a job, the tendency

4. A lot of graduate students are difficult to find job from 1997 to

Figure 4

 

This structure bears a superficial resemblance to the Chinese structure:

Xuesheng nan zhaodao gongzuo

Students difficult find job

It is difficult for students to find a job.

A comparison between the English and Chinese suggests that the students' English structure is influenced by the structure in Chinese, a view supported by Yip (1995:171).

 

Section B

A more significant example of a syntactically ill-formed structure found in this corpus involves the there be structure.

1. estionnaire,we will see that there are number of jobs do not fit th

2. in 1998. But it is show that there are still have 9% of the subjec.

3. ir first job offer. Moreover,there are only 91% of the subjects can

4. 1997. It would appear that there have 15% more subject in 1998 had

5. h the salary and benefits. There have more and more subjects felt t

6. eir job. And at June to July,there have half of the graduates can s

Figure 5

 

The nature of report writing often requires discussion of numbers. This need to deal with numbers leads to the following structure:

 

There are more and more subjects felt the difficulty in job hunting.

Here the obvious way to fix this sentence is to add a relative pronoun, who:

There are more and more subjects who felt the difficulty in job hunting.

or simply:

More and more subjects felt the difficulty in job hunting.

Why, then, do some students produce syntactically ill-formed sentences such as those above? Some linguists, such as Rutherford (1983) and Yip (1995), believe that the emergence of this structure is attributable to the influence of the students' native language, Chinese. This structure bears a striking resemblance to the Chinese you structure, cf.:

 

You henduo biyesheng zhaodao-le gongzuo

have/there are many graduates found job

Many graduates have found jobs.

You in Chinese covers the meanings of both have and there be in English, which is why some low-proficiency students confuse have with there is/are and produce there have, as seen above.
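Candidate instances of the there have pattern can be pulled out of a set of sentences with a pattern search before a teacher reviews them by hand. The following Python sketch is an assumption-laden illustration, not the search used in this project: the regular expression, the function name and the sample sentences (modelled on Figure 5 but invented) are hypothetical, and the well-formed there have been is deliberately excluded.

```python
import re

# "there have" / "there has", but not the well-formed "there have/has been"
THERE_HAVE = re.compile(r"\bthere\s+(?:have|has)\b(?!\s+been\b)", re.IGNORECASE)

def find_there_have(sentences):
    """Return the sentences that contain a candidate 'there have' structure."""
    return [s for s in sentences if THERE_HAVE.search(s)]

# Hypothetical sentences modelled on Figure 5; invented for illustration.
sentences = [
    "There have more and more subjects who feel the difficulty.",
    "At that time there have half of the graduates without a job.",
    "There have been many changes in the job market.",
    "There are many graduates who found jobs.",
]

for hit in find_there_have(sentences):
    print(hit)
```

A search like this only flags candidates. Variants such as there are still have need their own patterns, and each hit still has to be read in context.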

 

Idiomatic Expressions

Another finding from this corpus-based research is the identification of some idiomatic expressions used in the wrong pragmatic sense. By the way, for example, is an idiomatic expression used in casual discourse, but some students misemploy this expression in their reports. Some of the results (context enlarged) are presented in Figure 6 below.

Graph 3 shows the different sources of information about the subjects'

first jobs. It seems that most of the subjects tends to have newspaper

search for their first jobs. By the way, a small proportion of subjects

would have their information through Internet.

 

The number of jobs available in the market is getting lower. Even if you

could find a job, the salary may be shifted to a lower valve. By the way,

you could get the 'update' jobs information through Internet.

Figure 6

 

 

Methods of applying these functions and findings in teaching

The above findings are related to natural or normal phenomena in language acquisition. If we can heighten student awareness of the fact that they will inevitably be influenced by their first language on the path to native-like proficiency in their second language, their development can potentially be much faster. The problem is how to find a proper way to explain these kinds of language phenomena to the students. Conducting a structural comparison between their native tongue and second language is too theoretical and lacks the power to motivate students. Pure grammar teaching may be considered boring by some students. However, when data from the corpus of students’ own writing are employed to help them to gain a better understanding of these structures, the motivational effect is increased. As teachers, what we actually have to do is simple: "…we simply provide the evidence needed to answer the learner's questions, and rely on the learner's intelligence to find answers" (Johns, 1991).

 

A procedure for applying these findings in teaching is as follows.

1. Present an item from the concordance data and ask the class to make comments on the use of this word, structure or idiomatic expression. Students will usually have mixed opinions on whether the usage is correct or not.

2. Have the students check their answers by making their own concordance list from standard English corpora, such as Micro-concord or Brown, or even from newspaper articles, so as to compare their way of using this structure with the ways it is used in standard English.

3. After consensus is reached with the class on the answer, conduct a brief discussion on the reasons why so many students misuse this particular word, structure or idiomatic expression. This procedure will help students to gain a better understanding of the standard usage.

4. Have students complete review exercises periodically on the usage of the words, structures or idiomatic expressions that have been discussed in class. These exercises might consist of multiple choice or error correction tasks.

5. Put similar exercises on the Web. These can be in a more interesting format such as games that motivate students and allow them to enjoy learning. This format also helps provide self-access learning for students.

It should be emphasised that the above procedures are not meant to be prescriptive in nature but are merely indicative of options available to teachers. It is not suggested that they are all-inclusive or that they must be followed in lock-step fashion.
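The error correction tasks mentioned in step 4 can be assembled semi-automatically from concordance lines. The sketch below is one possible approach and not the materials used in this project: the function name and the wording of the prompt are hypothetical, while the two sample lines are taken from Figure 2.

```python
def error_correction_items(concordance_lines, target):
    """Turn concordance lines into numbered error correction prompts."""
    items = []
    for number, line in enumerate(concordance_lines, start=1):
        items.append(f"{number}. Correct the use of '{target}': ... {line.strip()} ...")
    return items

# Fragments taken from the concordance lines in Figure 2
lines = [
    "only half of the subjects satisfy their salary and the benefits",
    "some of our graduates do not satisfy their current job",
]

for item in error_correction_items(lines, "satisfy"):
    print(item)
```

The resulting list can be printed as a worksheet or used as the basis for a Web exercise.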

 

An example from the classroom

The following is a specific example of a classroom technique that made use of concordancing. When dealing with the items hard and difficult, I used a cartoon that mocks an uncooperative and trouble-making child at home. The cartoon was shown on the overhead projector (OHP) and the students were asked to comment on it and then find some adjectives to describe the child. The words given by the students included difficult, hard and troublesome. I emphasised the words difficult and hard by putting them on the whiteboard, saying "Yes, this child is being difficult. He is hard to deal with". I then presented the concordances in Figure 7 on the OHP and told the students where they came from.

ay, June and July, and people are hard to find their job in the fo

l results show that graduates are hard to seek their "prefect job"

It was found that graduates are difficult to find a job, the tendency

A lot of graduate students are difficult to find job from 1997 to

Figure 7

 

After explaining the context of each concordance, I asked the students if these sentences were correct. As usual, some said yes and others said no. I followed by asking, "Are these students/graduates/people difficult, or is the action, meaning the thing they are doing, difficult?" Students answered that the action is difficult, not the people/students/graduates. I followed by asking, "Then why, instead of the action, are the people/students/graduates in the position being described by the word difficult?" By then, most students understood what I meant and immediately gave me the correct version of these sentences:

For a lot of graduates, to find a job is difficult.

or It is difficult for a lot of graduates to find a job.

 

Conclusion

As shown above, the use of current, manageable information technology for concordancing and wordlist analysis can be effective. It "…makes the invisible visible" (Tribble, 1991). In addition, as with other applications of IT in language teaching and learning, using a corpus of students' own writing is motivating from the student perspective. It gives students opportunities to work with their own language production using the analytic tools available through computing, and in doing so they engage in discovery mode - an effective state of mind for learning. In terms of teaching, this application of IT not only makes the invisible visible, but also provides a boost for students trying to solve their own real-language problems by bringing into the target language picture effectively presented data drawn from their work. This method of using a computerised analysis of student language data for learning and teaching is applicable not only to teaching a specified genre, as in the present study, but also to teaching general writing.

The positive results from the research and its application in the classroom strongly suggest that there is a need for further student-corpus based research in second-language learning. There are many further dimensions to explore in bringing this kind of applied corpus analysis research into the learning situation to improve student writing.

 

References

Johns, T. (1991). From printout to handout: Grammar and vocabulary teaching in the context of data-driven learning. In T. Johns & P. King (Eds.). Classroom Concordancing, ELR Journal 4, 27-45. University of Birmingham.

Rutherford, W. (1983). Language Typology and Language Transfer. In S.M. Gass & L. Selinker (Eds.). Language Transfer in Language Learning, 358-370. Rowley, Mass.: Newbury House.

Scollon, R. (1993). Cumulative ambiguity: conjunctions in Chinese-English intercultural communication. Working Papers 5 (1), 55-73. Hong Kong: City University of Hong Kong.

Stevens, V. (1993). Concordances as enhancements to language competence. TESOL Matters 2(6), 11.

Stevens, V. (1995). Concordancing with language learners: Why? When? What? CAELL Journal 6(2), 2-10.

Tang, E. & Ng, C. (1995). A study on the use of connectives in ESL students’ writing. Perspectives Working Papers 5(1), 38-45. Hong Kong: City University of Hong Kong.

Tribble, C. (1991). Concordancing and an EAP writing program. CAELL Journal 1(2), 10-15.

Tribble, C. (1997). Improvising corpora for ELT: Quick and dirty ways of developing corpora for language teaching. In B. Lewandowska-Tomaszczyk & J. Melia (Eds.) Proceedings of 1st International Conference on Practical Applications in Language Corpora.

Yip, V. (1995). Interlanguage and Learnability: from Chinese to English. Amsterdam: John Benjamins Publishing Company.