
Roland Schäfer & Felix Bildhauer, Web Corpus Construction (Synthesis Lectures on Human Language Technologies 22). Morgan & Claypool, 2013. Pp. xv + 129.

  • Mats Wirén
Alpert, Jesse & Hajaj, Nissan. 2008. We knew the web was big. . .
Baroni, Marco, Bernardini, Silvia, Ferraresi, Adriano & Zanchetta, Eros. 2009. The WaCky Wide Web: A collection of very large linguistically processed web-crawled corpora. Language Resources & Evaluation 43, 209–226.
Biemann, Chris, Bildhauer, Felix, Evert, Stefan, Goldhahn, Dirk, Quasthoff, Uwe, Schäfer, Roland, Simon, Johannes, Swiezinski, Leonard & Zesch, Torsten. 2013. Scalable construction of high-quality web corpora. Journal for Language Technology and Computational Linguistics 28 (2), 23–59.
Fletcher, William H. 2013. Corpus analysis of the World Wide Web. In Chapelle, Carol A. (ed.), The Encyclopedia of Applied Linguistics, vol. 3, 1339–1347. Oxford: Wiley-Blackwell.
Hundt, Marianne, Nesselhauf, Nadja & Biewer, Carolin (eds.). 2007. Corpus Linguistics and the Web. Amsterdam: Rodopi.
Kilgarriff, Adam. 2001. Comparing corpora. International Journal of Corpus Linguistics 6 (1), 97–133.
Kilgarriff, Adam. 2007. Googleology is bad science. Computational Linguistics 33 (1), 147–151.
Kilgarriff, Adam & Grefenstette, Gregory. 2003. Introduction to the special issue on the web as corpus. Computational Linguistics 29 (3), 333–347.
Loftsson, Hrafn & Östling, Robert. 2013. Tagging a morphologically complex language using an Averaged Perceptron Tagger: The case of Icelandic. 19th Nordic Conference of Computational Linguistics (NODALIDA), 105–119. Linköping: Linköping University Electronic Press.
Nivre, Joakim, Hall, Johan, Nilsson, Jens, Chanev, Atanas, Eryigit, Gülsen, Kübler, Sandra, Marinov, Svetoslav & Marsi, Erwin. 2007. MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering 13, 95–135.
Östling, Robert. 2013. Stagger: An open-source part of speech tagger for Swedish. Northern European Journal of Language Technology (NEJLT) 3, 1–18.
Renouf, Antoinette, Kehoe, Andrew & Banerjee, Jayeeta. 2007. WebCorp: An integrated system for web text search. In Hundt et al. (eds.), 2007, 47–67.
Schäfer, Roland & Bildhauer, Felix. 2012. Building large corpora from the web using a New Efficient Tool Chain. The Eighth International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey, 486–493.
Suchomel, Vít & Pomikálek, Jan. 2012. Efficient web crawling for large text corpora. The Seventh Web as Corpus Workshop (WAC), Lyon, France, 39–43.

Nordic Journal of Linguistics
  • ISSN: 0332-5865
  • EISSN: 1502-4717
  • URL: /core/journals/nordic-journal-of-linguistics