This study investigates the linguistic characteristics of Swedish clinical text in radiology reports and doctor's daily notes from electronic health records (EHRs) in comparison to general Swedish and biomedical journal text. We quantify linguistic features through a comparative register analysis to determine how the free text of EHRs differ from general and biomedical Swedish text in terms of lexical complexity, word and sentence composition, and common sentence structures. The linguistic features are extracted using state-of-the-art computational tools: a tokenizer, a part-of-speech tagger, and scripts for statistical analysis. Results show that technical terms and abbreviations are more frequent in clinical text, and lexical variance is low. Moreover, clinical text frequently omit subjects, verbs, and function words resulting in shorter sentences. Clinical text not only differs from general Swedish, but also internally, across its sub-domains, e.g. sentences lacking verbs are significantly more frequent in radiology reports. These results provide a foundation for future development of automatic methods for EHR simplification or clarification.
Email your librarian or administrator to recommend adding this journal to your organisation's collection.
* Views captured on Cambridge Core between September 2016 - 26th July 2017. This data will be updated every 24 hours.