from PART II - LEGAL TEXT ANALYTICS
Published online by Cambridge University Press: 13 July 2017
INTRODUCTION
In the examples of ML so far, a program has learned from data about judges, trends, or cases as in the Supreme Court Database, but not from the texts of cases or other legal documents. This chapter introduces applying ML algorithms to corpora of legal texts, discusses how ML models implicitly represent users’ hypotheses about relevance, illustrates how ML can improve full-text legal information retrieval, and explains its role in conceptual information retrieval and in cognitive computing. The chapter also distinguishes between supervised and unsupervised ML from text and discusses techniques for automating learning of structure and semantics from legal documents.
Along the way, the chapter answers the following questions: How can ML be applied to textual data? What is the difference between supervised and unsupervised ML from texts? What is predictive coding? How well does predictive coding work? What is “information extraction” from text? How are texts represented for purposes of applying ML? What is a “support vector machine (SVM)” and why use one with textual data?
APPLYING MACHINE LEARNING TO TEXTUAL DATA
ML algorithms identify patterns in data, summarize those patterns in a model, and use the model to make predictions by identifying the same patterns in new data (see Kohavi and Provost, 1998).
A model is a structure that summarizes the patterns in data in some statistical or logical form in which it can be applied to new data (see Kohavi and Provost, 1998). This book has already introduced some examples of ML models, such as the decision tree for bail decisions in Figure 4.2 or the random forests of decision trees referred to in Section 4.4.
The models capture the strength of the association between observed features and an outcome feature. For example, the decision on bail is an outcome feature, and the observed features included whether the offense involved drugs or the offender had a prior record. Similarly, the Supreme Court's decision to affirm or not is an outcome feature, and the observed features included a justice's gender or the appointing president's party. A model can capture the strength of these associations statistically, logically, or in some combination of the two.
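The idea of a model that summarizes associations between observed features and an outcome feature can be illustrated with a minimal sketch. It is not the decision tree of Figure 4.2 or any method from this book. The toy bail records, the feature names, and the helper functions `fit`, `association`, and `predict` are all hypothetical, invented for illustration. The sketch simply records how often bail was granted for each feature value and reuses those rates on a new case.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (drug_offense, prior_record, bail_granted).
# These values are invented for illustration; they are not real data.
CASES = [
    (True,  True,  False),
    (True,  False, False),
    (False, True,  False),
    (False, False, True),
    (False, False, True),
    (True,  True,  False),
    (False, True,  True),
    (False, False, True),
]

FEATURES = ("drug_offense", "prior_record")


def fit(cases):
    """Summarize the data as the rate of bail granted for each feature value."""
    counts = defaultdict(Counter)
    for *observed, outcome in cases:
        for name, value in zip(FEATURES, observed):
            counts[(name, value)][outcome] += 1
    return {key: c[True] / (c[True] + c[False]) for key, c in counts.items()}


def association(model, name):
    """Gap in grant rate between cases without and with the feature."""
    return model[(name, False)] - model[(name, True)]


def predict(model, observed):
    """Average the per-feature grant rates; predict 'granted' if above 0.5."""
    rates = [model[(name, value)] for name, value in zip(FEATURES, observed)]
    return sum(rates) / len(rates) > 0.5


model = fit(CASES)
for name in FEATURES:
    print(name, round(association(model, name), 2))
print(predict(model, (False, False)), predict(model, (True, True)))
```

In this toy data, the gap for the drug-offense feature is larger than the gap for the prior-record feature, so the sketch treats the former as more strongly associated with the outcome. A real ML model would estimate such associations more carefully and from far more data.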