
Syntactic methods for topic-independent authorship attribution

Published online by Cambridge University Press:  09 August 2017

JOHANNA BJÖRKLUND
Affiliation:
Department of Computer Science, Umeå University, 901 87 Umeå, Sweden; e-mails: johanna@cs.umu.se, zechner@cs.umu.se
NIKLAS ZECHNER
Affiliation:
Department of Computer Science, Umeå University, 901 87 Umeå, Sweden; e-mails: johanna@cs.umu.se, zechner@cs.umu.se

Abstract

The efficacy of syntactic features for topic-independent authorship attribution is evaluated, taking a feature set of frequencies of words and punctuation marks as baseline. The features are ‘deep’ in the sense that they are derived by parsing the subject texts, in contrast to ‘shallow’ syntactic features for which a part-of-speech analysis is enough. The experiments are conducted on two corpora of online texts and one corpus of novels written around the year 1900. The classification tasks include classical closed-world authorship attribution, identification of separate texts among the works of one author, and cross-topic authorship attribution. In the first two tasks, the feature sets were fairly evenly matched, but in the last task, the syntax-based feature set outperformed the baseline feature set. These results suggest that, compared to lexical features, syntactic features are more robust to changes in topic.

Information

Type
Articles
Copyright
Copyright © Cambridge University Press 2017 
