Network analysis of narrative content in large corpora


We present a methodology for extracting narrative information from a large corpus. The key idea is to transform the corpus into a network by linking the key actors and objects of the narration, and then to analyse this network to extract information about their relations. Representing the information in a single network makes it possible to infer relations between these entities, even when they are never mentioned together. We discuss the types of information that our method can extract, ways to validate the extracted information, and two application scenarios. The methodology is highly scalable and addresses specific research needs in the social sciences.
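
The idea of inferring a relation between entities that never co-occur can be illustrated with a minimal sketch. It assumes the corpus has already been reduced to subject-verb-object triplets. The triplets, the function names `build_network` and `shortest_path`, and the use of a shortest path as the inferred link are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict, deque

# Hypothetical triplets (subject, verb, object); invented for illustration,
# not taken from any corpus discussed in the article.
TRIPLETS = [
    ("police", "arrest", "protesters"),
    ("protesters", "support", "union"),
    ("union", "criticise", "government"),
]


def build_network(triplets):
    """Link each subject to its object in one undirected network."""
    graph = defaultdict(set)
    for subj, _verb, obj in triplets:
        graph[subj].add(obj)
        graph[obj].add(subj)
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; returns a list of nodes, or None if unreachable."""
    if source not in graph or target not in graph:
        return None
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbour in sorted(graph[node]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


network = build_network(TRIPLETS)
# "police" and "government" never share a triplet, yet the merged
# network connects them through intermediate actors.
indirect = shortest_path(network, "police", "government")
```

Here `indirect` is a chain of actors joining two entities that no single triplet mentions together. A real pipeline would also need parsing, co-reference resolution and validation of the extracted links, none of which this sketch attempts.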

Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering