    • Publisher:
      Cambridge University Press
      Publication date:
      May 2024
      June 2024
      ISBN:
      9781108904094
      9781009486781
      9781108822589
      Dimensions:
      (229 x 152 mm)
      Weight & Pages:
      0.306kg, 114 Pages
      Dimensions:
      (229 x 152 mm)
      Weight & Pages:
      0.18kg, 114 Pages
    • Subjects:
      Research Methods in Linguistics, Applied Linguistics, Language and Linguistics
      Series:
      Elements in Corpus Linguistics

    Book description

    This Element offers intermediate and experienced programmers algorithms for Corpus Linguistic (CL) programming in Python using dataframes, which provide a fast, efficient, and intuitive set of methods for working with large, complex datasets such as corpora. It demonstrates principles of dataframe programming applied to CL analyses and gives complete algorithms for creating concordances; producing lists of collocates, keywords, and lexical bundles; and performing key feature analysis. An additional algorithm for creating dataframe corpora is also presented, including methods for tokenizing, part-of-speech tagging, and lemmatizing with spaCy. Together these provide a set of core skills that can be applied to a range of CL research questions, as well as to original analyses not possible with existing corpus software.
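
    To make the idea of a dataframe corpus concrete, the following is a minimal sketch and not the Element's own algorithm. It assumes pandas as the dataframe library, whitespace tokenization in place of spaCy, and hypothetical column names (`text_id`, `token`) and a hypothetical helper `concordance`. It stores one token per row and builds a small keyword-in-context table.

```python
import pandas as pd

# Toy corpus: one row per token. Column names are illustrative assumptions.
texts = {
    "t1": "the cat sat on the mat",
    "t2": "a dog sat on the rug",
}
rows = [
    {"text_id": text_id, "token": tok}
    for text_id, text in texts.items()
    for tok in text.split()  # whitespace split stands in for a real tokenizer
]
corpus = pd.DataFrame(rows)


def concordance(df, node, window=2):
    """Return KWIC lines for `node` with `window` tokens of context per side."""
    lines = []
    # Process each text separately so context never crosses text boundaries.
    for text_id, group in df.groupby("text_id", sort=False):
        tokens = group["token"].tolist()
        for i, tok in enumerate(tokens):
            if tok == node:
                lines.append({
                    "text_id": text_id,
                    "left": " ".join(tokens[max(0, i - window):i]),
                    "node": tok,
                    "right": " ".join(tokens[i + 1:i + 1 + window]),
                })
    return pd.DataFrame(lines, columns=["text_id", "left", "node", "right"])


kwic = concordance(corpus, "sat")
print(kwic)
```

    Keeping one token per row means that annotations such as part-of-speech tags or lemmas could be added later as further columns without changing the lookup logic.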

