Natural Language Processing for Corpus Linguistics

Jonathan Dunn

doi:10.1017/9781009070447

References

Biber, D. (2012). Register as a Predictor of Linguistic Variation. Corpus Linguistics and Linguistic Theory, 8(1), 9–37.

Church, K., & Hanks, P. (1990). Word Association Norms, Mutual Information, and Lexicography. Computational Linguistics, 16(1), 22–29.

Diermeier, D., Godbout, J., Yu, B., & Kaufmann, S. (2011). Language and Ideology in Congress. British Journal of Political Science, 42(1), 31–55.

Dunn, J. (2013a). Evaluating the Premises and Results of Four Metaphor Identification Systems. In Gelbukh, A. (ed.), Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics, vol. 1 (pp. 471–486). Heidelberg: Springer.

Dunn, J. (2013b). How Linguistic Structure Influences and Helps to Predict Metaphoric Meaning. Cognitive Linguistics, 24(1), 33–66.

Dunn, J. (2014). Measuring Metaphoricity. In Toutanova, K. & Wu, H. (eds.), Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 745–751). Stroudsburg, PA: Association for Computational Linguistics.

Dunn, J. (2015). Modeling Abstractness and Metaphoricity. Metaphor & Symbol, 30, 259–289.

Dunn, J. (2017). Computational Learning of Construction Grammars. Language & Cognition, 9(2), 254–292.

Dunn, J. (2018a). Finding Variants for Construction-Based Dialectometry: A Corpus-Based Approach to Regional CxGs. Cognitive Linguistics, 29(2), 275–311.

Dunn, J. (2018b). Modeling the Complexity and Descriptive Adequacy of Construction Grammars. In Jarosz, G., O’Connor, B., & Pater, J. (eds.), Proceedings of the Society for Computation in Linguistics (pp. 81–90). Stroudsburg, PA: Association for Computational Linguistics.

Dunn, J. (2018c). Multi-Unit Directional Measures of Association: Moving Beyond Pairs of Words. International Journal of Corpus Linguistics, 23(2), 183–215.

Dunn, J. (2019a). Frequency vs. Association for Constraint Selection in Usage-Based Construction Grammar. In Chersoni, E., Jacobs, C., Lenci, A., Linzen, T., Prévot, L., & Santus, E. (eds.), Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (pp. 117–128). Stroudsburg, PA: Association for Computational Linguistics.

Dunn, J. (2019b). Global Syntactic Variation in Seven Languages: Towards a Computational Dialectology. Frontiers in Artificial Intelligence, Collection on Computational Sociolinguistics, 2. DOI: https://doi.org/10.3389/frai.2019.00015.

Dunn, J. (2019c). Modeling Global Syntactic Variation in English Using Dialect Classification. In Zampieri, M., Nakov, P., Malmasi, S., Ljubešić, N., Tiedemann, J., & Ali, A. (eds.), Proceedings of NAACL 2019 Sixth Workshop on NLP for Similar Languages, Varieties and Dialects (pp. 42–53). Stroudsburg, PA: Association for Computational Linguistics.

Dunn, J. (2020). Mapping Languages: The Corpus of Global Language Use. Language Resources and Evaluation, 54, 999–1018. DOI: https://doi.org/10.1007/s10579-020-09489-2.

Dunn, J. (2021). Representations of Language Varieties Are Reliable Given Corpus Similarity Measures. In Zampieri, M., Nakov, P., Ljubešić, N., Tiedemann, J., Scherrer, Y., & Jauhiainen, T. (eds.), Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties, and Dialects (pp. 28–38). Stroudsburg, PA: Association for Computational Linguistics.

Dunn, J., & Adams, B. (2019). Mapping Languages and Demographics with Georeferenced Corpora. In Adams, B., de Roiste, M., Gahegan, M., Hulbe, C., O’Sullivan, D., Sila-Nowicka, K., Whigham, P., & Wilson, M. (eds.), Proceedings of Geocomputation 2019 (16 pp.). Auckland: N.p.

Dunn, J., & Adams, B. (2020, May). Geographically-Balanced Gigaword Corpora for 50 Language Varieties. In Calzolari, N., Béchet, F., Blache, P., Choukri, K., Cieri, C., Declerck, T., Goggi, S., Isahara, H., Maegaard, B., Mariani, J., Mazo, H., Moreno, A., Odijk, J., & Piperidis, S. (eds.), Proceedings of the 12th Language Resources and Evaluation Conference (pp. 2528–2536). Marseilles: European Language Resources Association.

Dunn, J., Argamon, S., Rasooli, A., & Kumar, G. (2016). Profile-Based Authorship Analysis. Literary and Linguistic Computing, 31(4), 689–710.

Dunn, J., Coupe, T., & Adams, B. (2020, Nov.). Measuring Linguistic Diversity During COVID-19. In Bamman, D., Hovy, D., Jurgens, D., O’Connor, B., & Volkova, S. (eds.), Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science (pp. 1–10). Online: Association for Computational Linguistics.

Dunn, J., & Nini, A. (2021). Production vs Perception: The Role of Individuality in Usage-Based Grammar Induction. In Chersoni, E., Hollenstein, N., Jacobs, C., Oseki, Y., Prévot, L., & Santus, E. (eds.), Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (pp. 149–159). Stroudsburg, PA: Association for Computational Linguistics.

Dunn, J., & Tayyar Madabushi, H. (2021). Learned Construction Grammars Converge Across Registers Given Increased Exposure. In Bisazza, A. & Abend, O. (eds.), Proceedings of the Conference on Computational Natural Language Learning (pp. 471–486). Stroudsburg, PA: Association for Computational Linguistics.

Ellis, N. (2007). Language Acquisition as Rational Contingency Learning. Applied Linguistics, 27(1), 1–24.

Francis, W., & Kucera, H. (1967). Computational Analysis of Present-Day American English. Providence, RI: Brown University Press.

Gentzkow, M., Shapiro, J., & Taddy, M. (2018). Congressional Record for the 43rd–114th Congresses: Parsed Speeches and Phrase Counts (Tech. Rep.). Palo Alto, CA: Stanford Libraries. https://data.stanford.edu/congress_text

Gerlach, M., & Font-Clos, F. (2020). A Standardized Project Gutenberg Corpus for Statistical Analysis of Natural Language and Quantitative Linguistics. Entropy, 22(1), 126. DOI: https://doi.org/10.3390/e22010126

Goldberg, Y. (2017). Neural Network Methods in Natural Language Processing. Williston, VT: Morgan & Claypool Publishers.

Gries, S. T. (2013). 50-Something Years of Work on Collocations: What Is or Should Be Next. International Journal of Corpus Linguistics, 18(1), 137–165.

Hellrich, J., Kampe, B., & Hahn, U. (2019). The Influence of Down-Sampling Strategies on SVD Word Embedding Stability. In Rogers, A., Drozd, A., Rumshisky, A., & Goldberg, Y. (eds.), Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP (pp. 18–26). Stroudsburg, PA: Association for Computational Linguistics.

Kilgarriff, A. (2001). Comparing Corpora. International Journal of Corpus Linguistics, 6(1), 97–133.

Koppel, M., Schler, J., & Bonchek-Dokow, E. (2007). Measuring Differentiability: Unmasking Pseudonymous Authors. Journal of Machine Learning Research, 8, 1261–1276.

Landauer, T., Foltz, P., & Laham, D. (1998). Introduction to Latent Semantic Analysis. Discourse Processes, 25(2–3), 259–284.

Levy, O., Goldberg, Y., & Dagan, I. (2015, May). Improving Distributional Similarity with Lessons Learned from Word Embeddings. Transactions of the Association for Computational Linguistics, 3, 211–225.

Li, J. (2012). Hotel Reviews Dataset (Tech. Rep.). Carnegie Mellon University. www.cs.cmu.edu/~jiweil/html/hotel-review.html

McKenzie, G., & Adams, B. (2018). A Data-Driven Approach to Exploring Similarities of Tourist Attractions through Online Reviews. Journal of Location Based Services, 12(2), 94–118.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed Representations of Words and Phrases and Their Compositionality. In Burges, C. J. C., Bottou, L., Welling, M., Ghahramani, Z., & Weinberger, K. Q. (eds.), Proceedings of the 26th International Conference on Neural Information Processing Systems–Volume 2 (pp. 3111–3119). Red Hook, NY: Curran Associates Inc.

Mueller, A., Nicolai, G., Petrou-Zeniou, P., Talmina, N., & Linzen, T. (2020). Cross-Linguistic Syntactic Evaluation of Word Prediction Models. In Jurafsky, D., Chai, J., Schluter, N., & Tetreault, J. (eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 5523–5539). Stroudsburg, PA: Association for Computational Linguistics.

Parsons, A. (2019). NY Times Article Lead Paragraphs 1851–2017 (Tech. Rep.). Kaggle. https://www.kaggle.com/parsonsandrew1/nytimes-article-lead-paragraphs-18512017

Pennebaker, J. (2011). The Secret Life of Pronouns: What Our Words Say About Us. New York: Bloomsbury Publishing.

Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global Vectors for Word Representation. In Moschitti, A., Pang, B., & Daelemans, W. (eds.), Empirical Methods in Natural Language Processing (EMNLP) (pp. 1532–1543). Stroudsburg, PA: Association for Computational Linguistics.

Petrov, S., Das, D., & McDonald, R. (2012). A Universal Part-of-Speech Tagset. In Calzolari, N., Choukri, K., Declerck, T., Uğur Doğan, M., Maegaard, B., Mariani, J., Moreno, A., Odijk, J., & Piperidis, S. (eds.), Proceedings of the Eighth Conference on Language Resources and Evaluation (pp. 2089–2096). Paris: European Language Resources Association.

Taylor, J. (2004). Linguistic Categorization (3rd ed.). Oxford: Oxford University Press.

Wang, H., Lu, Y., & Zhai, C. (2011). Latent Aspect Rating Analysis Without Aspect Keyword Supervision. In Proceedings of the 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 618–626). New York: Association for Computing Machinery.

Zeman, D. et al. (2021). Universal Dependencies 2.8.1 (Tech. Rep.). LINDAT/CLARIAH-CZ Digital Library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University. http://hdl.handle.net/11234/1-3687

Zhao, J., Zhou, Y., Li, Z., Wang, W., & Chang, K.-W. (2018, October–November). Learning Gender-Neutral Word Embeddings. In Riloff, E., Chiang, D., Hockenmaier, J., & Tsujii, J. (eds.), Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (pp. 4847–4853). Brussels: Association for Computational Linguistics.

Zuboff, S. (2019). The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. New York: PublicAffairs.
