
3 - Comparing Approaches to (Sub-)Register Variation

The ‘Press Editorials’ Sections in the British, Canadian and Jamaican Components of ICE

from Part II - Selection, Calibration and Preparation of Corpus Data

Published online by Cambridge University Press:  06 May 2022

Ole Schützler, Universität Leipzig
Julia Schlüter, Universität Bamberg

Summary

Two methods are applied to detect differences between corpus (sub-)registers, exemplified by the press editorials sections in the British, Canadian and Jamaican components of the International Corpus of English. By design, these methods target differences between varieties that are represented by putatively comparable corpus material, but it turns out that many of the observed differences can in fact be attributed to the different sampling strategies applied by corpus compilers. In the example at hand, contrasts can be traced back to the division into institutional and personal editorials. This finding gives rise to a call for finer-grained sampling schemes, richer metadata (e.g. on the situational characteristics of the language samples included), and better documentation. As for the methods chosen, the author demonstrates that corpus-driven profiling based either on POS monograms or on higher-level multi-dimensional analysis performs reasonably well, with minor differences in robustness and computational expense.

Type: Chapter
Information: Data and Methods in Corpus Linguistics: Comparative Approaches, pp. 75–100
Publisher: Cambridge University Press
Print publication year: 2022


