Combining intuition with corpus linguistic analysis: A study of lexical chunks in four Chinese undergraduate students' writing

Maria Leedham
FLaRN 2010

BACKGROUND TO STUDY 2

Chunking through intuition: Study 1

RQ: To what extent can NSs and NNSs chunk NNS speech?

Data:
- transcripts of 2 intermediate-level Japanese students' speech
- students were recorded 3 times with a 2-month gap between each
- total of approx. 1500 words across the 6 transcripts

Method
Step 1: 3 NS linguists asked to underline chunks in the 6 transcripts (training, examples and practice given first)
Step 2: Japanese students asked to identify chunks in their own transcripts
Step 3: author chunks transcripts with assistance from WordSmith Tools (Leedham, 2006)
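As a rough illustration of how the three NS linguists' markings from Step 1 could be combined and compared, the sketch below applies a word-level majority rule (a word counts as chunked when at least 2 of the 3 raters marked it) and computes a simple pairwise agreement figure. The data layout, the function names and the example markings are all hypothetical; the slides do not say how agreement between raters was actually calculated.

```python
from itertools import combinations

def majority_chunked(markings, threshold=2):
    """Return, per word, whether at least `threshold` raters marked it.

    `markings` is a list of equal-length boolean lists, one per rater
    (hypothetical layout: True = the rater underlined that word).
    """
    return [sum(votes) >= threshold for votes in zip(*markings)]

def pairwise_agreement(markings):
    """Mean proportion of words on which each pair of raters agree."""
    scores = []
    for a, b in combinations(markings, 2):
        same = sum(x == y for x, y in zip(a, b))
        scores.append(same / len(a))
    return sum(scores) / len(scores)

# Invented markings for six words; not taken from the study data.
words = ["I", "have", "never", "been", "to", "London"]
rater_1 = [True, True, True, True, True, False]
rater_2 = [False, True, True, True, False, False]
rater_3 = [False, False, False, True, True, False]
markings = [rater_1, rater_2, rater_3]

consensus = majority_chunked(markings)
chunk_words = [w for w, keep in zip(words, consensus) if keep]
agreement = pairwise_agreement(markings)
```

This is plain percentage agreement, so it is not corrected for chance; a chance-corrected statistic would be a stricter test of inter-rater reliability.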

Example of chunked transcript from Study 1

Key: italics = words classified by the NNS as a chunk; underline = words that 2 or 3 out of the 3 NSs classified as a chunk

1 ahh first err I, I learned, learnt? (mmhmm I learnt) err (2.0) I should err.. I
2 should be more positive? (right) positive in UK because ahh when, when I
3 went to London err last Sunday (mhmm) ahh (2.0) some, some of the
4 underground line (mm) line was no service (oh dear) ((speaker laughs)) I was
5 really surprised and, because it can, cannot be (mm) in Japan (mm) you know,
6 sun- in, in Sunday, on? (mm) on Sunday many, many people (mm) come to
7 London (mm) and go around some place (mm)... so everyone need to, need a
8 train (mm) so, but maybe four or five lines was not, no service (mm) so
9 I I have to think err what I should do ((speaker laughs)) and no, I've never, I
10 have never been to London that, so, this was the first time I've been to London
11 (mm) so

Findings from Study 1
- little inter- or intra-rater reliability
- many missing chunks (e.g. of course, you know) both across and within raters
- frustrating and time-consuming task for NSs
- BUT the Japanese students could do this task AND also offered insights into when/why (e.g. student M: "I used to say that but now I know it's not usual.")
- the more time spent looking for chunks, the more will be found

Coda
- a further recording, transcribing & awareness-raising cycle suggests that this resulted in uptake
- both students found it highly motivating to record and analyse transcripts of their talk

STUDY 2

Outline
1. Research questions
2. The students and the texts
3. The two methods
4. Findings
4.1. Method 1
4.2. Method 2
5. Conclusions and Implications

Research Questions
1. What can a study of lexical chunks reveal about these Chinese students' writing?
2. What does each method contribute?

The Students
Criteria
- L1 Chinese (Mandarin or Cantonese)
- All secondary education in home country
- Contributions from years 1 & 2 and year 3 of undergraduate study

Student  Sex     Degree
Wei      Male    BSc Engineering
Ping     Female  BA Hospitality, Leisure & Tourism Management (HLTM)
Feng     Female  BSc Food Science with Business
Hong     Male    BA HLTM

The texts

Student        Discipline    No. words  No. texts
Wei            Engineering   12,779     10
Feng           Food Science  13,683     10
Ping           HLTM          13,368     5
Hong           HLTM          8,537      4
Corpus totals                48,367     29

Reference corpora    No. texts  No. words
English-Engineering  97         203,379
English-Food         28         73,402
English-HLTM         55         64,563
All-L1Chinese        146        279,695
All-L1English        611        1,335,676

Combining intuition and corpus searches

Method 1: Manual analysis
- Read all 4 Chinese students' texts
- Read twice, with 6 months between
- Read equivalent, randomly-selected English students' texts
- Noted salient features, then searched corpora of the individual's texts, the discipline, all Chinese students' writing, all English students' writing.

Method 2: Key n-gram searches
- Used WordSmith Tools, v.5 (Scott, 2008)
- Searched for key n-grams in the corpus of texts from each student, using relevant discipline corpus from L1 English as reference
- Setting p=0.00001, deleted short n-grams within longer n-grams
- Compiled key n-gram lists
- Looked at concordance lines and texts for more context

Formulaic sequences in sample of Wei's writing (Engineering)

Introduction
A design methodology for a gearbox is presented in this report. The input horse power, the input speed and net reductions in the gearbox are the parameters to be specified. A gearbox takes an input shaft rotating and converts it via a gear train into up to three outputs, the process of designing a gearbox is to figure out which ratios are needed and to implement those ratios in the form of positioning various sizes of connected gears. The specification of the gearbox depends on its area of application. In this report, a gearbox is designed for a commercial meat slicer which has its final shaft rotating at between 80 and 100 rev/min. The input of the meat slicer is a constant speed AC motor running at 1800 rev/min and delivering 1.2 kW. A few points have to be considered on this system, the size of the gearbox is severe restricted, since it has to go onto a work surface where there is severe competition for space. And the motor may be in-line or at right angles to the grinder. Furthermore, the duty is expected to be up to 6 hours per day.

Idiosyncratic language
In one word computer based tools contribute an
In one word the overall system can be described
(Wei, years 2 & 3)
In light of this, it is suggested that buying IHG
In light of this, it can be suggested that
In light of this, it is recommended that buying IHG
(Ping, year 3, in 1 text)

but simply writing a responsible tourism policy is no longer enough. It is a must to show practical action, (Hong, Year 1)
a winning city, the authorities of Liverpool have to rebuild its image to get rid of the negative picture. (Hong, Year 2)
and boost its marketing campaigns in order to catch the world's eyes on Scotland. (Hong, year 3)

Vague language
In catering services, restaurants in Oxford and Bath are more or less the same. (Hong, Year 1)
From those tables, the same thing as section 3.1 could be found (Wei, Year 1).
a measurement system for measuring low-lever force, a kind of cantilever rig which is called
A kind of variable inductance sensor has been chosen
Furthermore, with processing data, a kind of filter is always needed to separate certain
(Wei, year 2, same assignment)
At that time, I found that this hotel is a little bit out of my expectation. (Hong, Year 2)

Vague language
N Concordance
1 of albumin solution and perchlorate acid. Therefore a bit of RNA was digested, and that gives a relatively high
2 acid and the reaction. The absorbance of tube 1 is a bit higher than the control, there might be a bit of DNase
3 greater, so it seems that the process has a little bit more risk to produce products over the LSL than to
4 the IBT and the conferences; however, there is a little bit different in the rate structure of the ILT. Since there
5 for introduction of contributory negligence may be a bit tight. Although contributory negligence may be
6 and hence that person does not mind paying a little bit extra for this. There is also the public perception that
7 lead them to a food source. Trail pheromones pose a bit of a problem for ants though because they need to
8 is something I'm not used to doing, so it comes as a bit of a shock. I did encounter difficulties using Xemacs
9 everyday use, this type to identity recognition seems a bit extreme, and the use of passwords and usernames
10 Seeing so many sliders and buttons may seem a bit overwhelming for some people. After reading the

L1 English students use: a bit of a + N, e.g. a bit of a problem, a bit of a shock, a bit of a dog's breakfast
Often this is from reflective writing:
The conclusion was also a bit of a victim in my editings, bringing it down to one small sentence for each of the areas of discussion. (6101c Cybernetics Year 3 essay)

Chunks with and without I & we
From the experiment, it was known that the mechanical properties of carbon steel AN and carbon steel N. It was found out the mechanical properties of carbon steel AN was incorrect in this experiment,
Meanwhile, if we clipped the current probe round one of the motor supply leads, and connected it to Ch1 of the oscilloscope, we could get two copies of the transient starting current of the motor from the oscilloscope. From these two copies, we could calculated
(Wei, Year 1)

Chunks with and without I & we

L1 English students
Cluster            Freq.
I FEEL THAT        8
I WAS DOING        7
WHAT I WAS         7
I HAVE LEARNT      7
HAVE LEARNT THAT   6
I SHOULD HAVE      6
FEEL THAT I        5
I NEED TO          5

[Chart: frequency (pmw) of chunks with I and with we, by student(s)]

Linkers

This can create a positive image for Scotland, on the other hand, (Ping Year 3)
In other words, people are buying expectations... (Hong, year 3)
As a consequence, it can attract many travelers (Hong, Year 2)
On the contrary, the predominance of SMEs... (Ping, Year 2)
First of all, the dimension of the brake disc is decided. (Wei, Year 3)
What is more, Bath is served by a large number of local bus services (Hong, Year 1)

References to data
- as shown in table (Wei x 2, Ping x 2)
- according to (Wei x 4)
- as illustrated in table + NUMBER (Ping x 2)

Summary of method 1 findings
Salient chunks in the Chinese students' writing were:
- Idiosyncratic chunks (in light of the)
- Vague language (a bit of), though note English students' use of a little bit of
- High use of chunks with we and low use of chunks with I, partly due to English students' reflective writing
- Use of favoured linkers (on the other hand)
- Reference to data in tables and figures (according to the equation)
BUT very difficult to intuit chunks in unfamiliar disciplines

Method 2: Key n-gram searches
- Used WordSmith Tools, version 5 (Scott, 2008)
- Searched for key n-grams (= key clusters) in the corpus of texts from each of the 4 students
- Relevant discipline corpus from L1 English used as reference corpus
- p=0.00001, deleted short n-grams within longer n-grams
- Compiled a key n-gram list for each student
- Grouped these key n-grams into themes
- Looked at concordance lines for more context

Ping: HLTM

Rank  N-gram cluster                      Ping Freq.  Ping Texts  L1Eng HLTM Freq.  L1Eng HLTM Texts  Keyness
1     the hospitality industry            16          3           42                12                60
2     recruitment and selection           15          1           0                 0                 56
3     in the hospitality industry         10          2           20                9                 37
4     please see appendix                 10          1           0                 0                 37
5     with reference to appendix          8           1           0                 0                 30
6     higher than the original figure of  8           1           0                 0                 30
7     the new level of net profit         8           1           0                 0                 30
8     quality of service                  8           3           0                 0                 30
9     the cost of                         7           5           0                 0                 26
10    to the guests                       7           2           5                 3                 26
11    it is believed that                 6           2           2                 2                 22
12    of the employees                    6           1           0                 0                 22
13    there will be                       8           2           3                 3                 21
14    of the group                        8           1           1                 1                 21
15    to reach the break even point       5           1           0                 0                 19
16    on the other hand                   5           3           2                 1                 19
17    will be a                           5           2           3                 3                 19
18    high quality of service             5           2           0                 0                 19
19    cost of sales                       5           2           0                 0                 19
20    the nature of                       5           2           2                 2                 19
21    Watson and Head                     5           1           0                 0                 19
22    IHG annual report                   5           1           0                 0                 19
23    a higher contribution               5           1           0                 0                 19
24    Atrill and McLaney                  5           1           0                 0                 19
25    P E ratio                           5           1           0                 0                 19
25    served to the                       5           1           0                 0                 19

Idiosyncratic language
N Concordance
1 the new level of net profit,559.5, is 62.17% higher than the original figure of 345, which is a significant growth. g)
2 The new level of net profit,609, is 76.52% higher than the original figure of 345. Business decision 8 Promotion
3 The new level of net profit,545, is 57.97% higher than the original figure of 345. Business decision 7 The other
4 new level of net profit is477, which is 38.33% higher than the original figure of 345. Business decision 6 There is a la
5 The new level of net profit,513, is 48.70% higher than the original figure of 345. Business decision 5 It is clearly
6 The new level of net profit,541, is 56.81% higher than the original figure of 345. Business decision 4 By
7 The new level of net profit,527, is 52.75% higher than the original figure of 345. Business decision 3 Since the
8 The new level of net profit,625, is 81.16% higher than the original figure of 345. Business decision 2 Since the
(Ping's year 2 proposal)

Key n-grams (Wei): aim of the; of the assignment; is to design; to develop an understanding of
Combined: (the) aim / object of the assignment is to design / is to develop an understanding of
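The steps behind a key n-gram list (counting n-grams, normalising to pmw, and scoring each n-gram against a reference corpus) can be sketched roughly as follows. This is a minimal illustration, not WordSmith's implementation: it assumes a log-likelihood keyness statistic and uses running-word totals as corpus sizes, neither of which the slides confirm, so the scores are not expected to reproduce the keyness values on the slides. The function names are hypothetical, and the token string is adapted from Ping's "in light of this" examples.

```python
import math
from collections import Counter

def count_ngrams(tokens, n):
    """Count contiguous n-word sequences in a list of tokens."""
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def per_million(freq, corpus_size):
    """Normalise a raw frequency to occurrences per million words (pmw)."""
    return freq / corpus_size * 1_000_000

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood score for one item in a study corpus vs a reference corpus."""
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    ll = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Corpus sizes from the slides: Ping's texts and the English-HLTM reference corpus.
PING_WORDS = 13_368
ENG_HLTM_WORDS = 64_563

# Toy token list (assumption: lowercased, punctuation removed).
tokens = ("in light of this it is suggested that "
          "in light of this it is recommended that").split()
trigrams = count_ngrams(tokens, 3)

# Example counts: 15 occurrences in Ping's texts, none in the reference corpus.
pmw = per_million(15, PING_WORDS)
ll = log_likelihood(15, PING_WORDS, 0, ENG_HLTM_WORDS)
```

Normalising to pmw is what makes frequencies comparable across corpora as different in size as one student's 13,368 words and the 1,335,676-word All-L1English corpus.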

Discipline-specific n-grams
- Marriott Liverpool city centre, the Liverpool tourism industry, the tourism industry (Hong)
- the hospitality industry, recruitment and selection, in the hospitality industry (Ping)

Passive voice
- be worked out, can be calculated (Wei)
- there will be, it is believed that (Ping)

References to data
- with reference to appendix, please see appendix (Ping)
- in the appendix, briefing sheet in appendix, is shown as, tables of data, were recorded as below, was calculated with eq. (Wei)

Favoured linkers decrease over time
[Chart: frequency (pmw) of the linkers on the other hand, in the long run, at the same time, in other words and last but not least, for the subcorpora Chi12, Chi3, Eng12 and Eng3]

Summary of method 2 findings
Many of the same findings from method 1:

- idiosyncratic chunks
- some linkers, esp. on the other hand
- low use of chunks with I
- references to data
Also:
- discipline-specific chunks
- Easy to compare one student's texts with the discipline reference corpus & each L1 reference corpus
- Similar findings occur within the Chinese students overall
NB Keyness measures difference

Intuitive reading: finds semantically whole units (formulaic sequences)
Plus
- A person can recognise single instances that a computer would miss
- The text is read as a complete document
- Accurate - as intended by the writer
Minus
- Time-consuming and tiring
- Problem of inter-rater reliability
- Problem of intra-rater consistency
- Hard to replicate

Key n-grams analysis: finds frequent chunks (n-grams)
Plus
- Large quantities of data can be analysed quickly
- Easily replicable
Minus
- Single chunks are missed
- Arbitrary parameters
- Conflation of writing from lots of individuals
- Sense of text as complete document is lost

Combining methods
Combine the two methods through a recursive process of reading texts and checking the sequences in a corpus, also searching for key n-grams for less intuitive sequences.

"ultimately, the most revealing insights will be gained from a closer look at the texts, the speakers, and the situational variables; quantitative analysis alone can never provide a satisfactory picture" (Simpson, 2004:41).

References
Foster, P. (2001). Rules and routines: A consideration of their role in the task-based language production of native and non-native speakers. In M. Bygate, P. Skehan & M. Swain (eds.), Task-Based Learning: Language Teaching, Learning and Assessment. London: Longman.
Heuboeck, A., Holmes, J. & Nesi, H. (2007). The BAWE Corpus Manual. Retrieved from http://www.coventry.ac.uk/researchnet/d/505/a/5160.
Leedham, M. (2006). Do I speak better? A longitudinal study of lexical chunking in the spoken language of two Japanese students. In The East Asian Learner.
Scott, M. (2008). WordSmith Tools v.5. Oxford University Press.
Wray, A. (2002). Formulaic Language and the Lexicon. Cambridge University Press.
BAWE corpus - ESRC project number: RES-000-23-0800
