Professional language in Swedish clinical text: Linguistic characterization and comparative studies

Kelly Smith, Beata Megyesi, Sumithra Velupillai and Maria Kvist

This study investigates the linguistic characteristics of Swedish clinical text, namely radiology reports and doctors' daily notes from electronic health records (EHRs), in comparison with general Swedish and biomedical journal text. Through a comparative register analysis, we quantify linguistic features to determine how the free text of EHRs differs from general and biomedical Swedish text in terms of lexical complexity, word and sentence composition, and common sentence structures. The features are extracted using state-of-the-art computational tools: a tokenizer, a part-of-speech tagger, and scripts for statistical analysis. Results show that technical terms and abbreviations are more frequent in clinical text, and that lexical variance is low. Moreover, clinical text frequently omits subjects, verbs, and function words, resulting in shorter sentences. Clinical text differs not only from general Swedish but also internally, across its sub-domains; for example, sentences lacking verbs are significantly more frequent in radiology reports. These results provide a foundation for the future development of automatic methods for simplifying or clarifying EHR text.
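The kind of feature extraction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' scripts: it assumes sentences have already been tokenized and POS-tagged, that the tags follow the SUC tagset ("VB" for verbs, "AN" for abbreviations, "MAD"/"MID"/"PAD" for punctuation), that lexical variance is approximated by the type-token ratio, and that a pooled two-proportion z-test is one possible way to compare sub-domains. All function names are hypothetical, and the study's actual measures and significance tests may differ.

```python
import math

# Assumed input: one POS-tagged sentence = list of (token, tag) pairs.
# Tags are assumed to follow the SUC tagset: "VB" = verb, "AN" = abbreviation,
# and "MAD"/"MID"/"PAD" = punctuation, which is excluded from word counts here.
Sentence = list[tuple[str, str]]
PUNCT_TAGS = {"MAD", "MID", "PAD"}


def words(sent: Sentence) -> list[tuple[str, str]]:
    """Drop punctuation tokens, keeping (token, tag) pairs for words."""
    return [(tok, tag) for tok, tag in sent if tag not in PUNCT_TAGS]


def type_token_ratio(sents: list[Sentence]) -> float:
    """Lexical variance proxy: distinct lower-cased word forms / word tokens."""
    forms = [tok.lower() for s in sents for tok, _ in words(s)]
    return len(set(forms)) / len(forms) if forms else 0.0


def mean_sentence_length(sents: list[Sentence]) -> float:
    """Average number of word tokens (punctuation excluded) per sentence."""
    return sum(len(words(s)) for s in sents) / len(sents) if sents else 0.0


def tag_share(sents: list[Sentence], tag: str) -> float:
    """Fraction of word tokens carrying `tag` (e.g. "AN" for abbreviations)."""
    tags = [t for s in sents for _, t in words(s)]
    return tags.count(tag) / len(tags) if tags else 0.0


def verbless_share(sents: list[Sentence]) -> float:
    """Fraction of sentences containing no token tagged as a verb."""
    if not sents:
        return 0.0
    verbless = sum(1 for s in sents if all(t != "VB" for _, t in s))
    return verbless / len(sents)


def two_proportion_z(k1: int, n1: int, k2: int, n2: int) -> float:
    """Pooled two-proportion z statistic for H0: p1 == p2.

    k1/n1 and k2/n2 could be, e.g., verbless sentences / all sentences
    in two sub-domains such as radiology reports and daily notes.
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se if se else 0.0
```

For a toy corpus of the made-up sentences "Ingen fraktur." (tagged without a verb), "Patienten mår bra." and "Ingen fraktur." again, `verbless_share` gives 2/3, `mean_sentence_length` gives 7/3 words, and `type_token_ratio` gives 5/7, reflecting the repetition typical of short, verbless clinical phrasing. A |z| above roughly 1.96 corresponds to a two-sided p below 0.05.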

Nordic Journal of Linguistics
  • ISSN: 0332-5865
  • EISSN: 1502-4717
  • URL: /core/journals/nordic-journal-of-linguistics