Network analysis of narrative content in large corpora

SAATVIGA SUDHAHAR; GIANLUCA DE FAZIO; ROBERTO FRANZOSI; NELLO CRISTIANINI

doi:10.1017/S1351324913000247

Published online by Cambridge University Press: 11 September 2013

SAATVIGA SUDHAHAR and NELLO CRISTIANINI
Intelligent Systems Laboratory, University of Bristol, Bristol BS8 1TH, UK
e-mail: saatviga.sudhahar@bristol.ac.uk, nello.cristianini@bristol.ac.uk

GIANLUCA DE FAZIO and ROBERTO FRANZOSI
Department of Sociology, Emory University, Atlanta, GA 30322, USA
e-mail: rfranzo@emory.edu, gdefazi@emory.edu

Abstract

We present a methodology for extracting narrative information from a large corpus. The key idea is to transform the corpus into a network by linking the key actors and objects of the narration, and then to analyse this network to extract information about their relations. Representing the information in a single network makes it possible to infer relations between these entities even when they are never mentioned together. We discuss the types of information our method can extract, ways to validate that information, and two application scenarios. The methodology is scalable and addresses specific research needs in the social sciences.
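The core idea of the abstract can be illustrated with a minimal sketch. It assumes the corpus has already been reduced to subject-verb-object triplets; the triplets, the function names `build_network` and `shortest_path`, and the use of breadth-first search are all illustrative assumptions, not the authors' pipeline or data. The sketch shows how two entities that never appear in the same triplet can still be related through a path in the single combined network.

```python
from collections import defaultdict, deque

# Hypothetical subject-verb-object triplets, standing in for the output of
# a parser run over a corpus; these are not data from the article.
triplets = [
    ("police", "arrest", "protesters"),
    ("protesters", "occupy", "square"),
    ("mayor", "criticise", "police"),
]


def build_network(triplets):
    """Link each subject to its object; each edge stores the verbs seen."""
    graph = defaultdict(dict)
    for subj, verb, obj in triplets:
        # Undirected edge: record the verb in both directions.
        graph[subj].setdefault(obj, set()).add(verb)
        graph[obj].setdefault(subj, set()).add(verb)
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; returns the list of nodes on a path, or None."""
    if source not in graph or target not in graph:
        return None
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the parent links to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


network = build_network(triplets)
# "mayor" and "square" never share a triplet, yet the network connects them.
path = shortest_path(network, "mayor", "square")
```

Here `path` links "mayor" to "square" through "police" and "protesters", although no single triplet mentions both. A real system would also need entity resolution and some weighting of edges, which this sketch leaves out.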

Type: Articles
Information: Natural Language Engineering, Volume 21, Issue 1, January 2015, pp. 81-112

DOI: https://doi.org/10.1017/S1351324913000247
Copyright: © Cambridge University Press 2013
