Possession identification in text

  • CARMEN BANEA and RADA MIHALCEA
Abstract

Just as industrialization matured from mass production to customization and personalization, so has the Web migrated from generic content to public disclosures of one’s most intimately held thoughts, opinions, and beliefs. This relatively new type of data can represent finer and more narrowly defined demographic slices. Whereas researchers have until now primarily focused on leveraging personalized content to identify latent information such as gender, nationality, location, or age, this article seeks to establish a structured way of extracting possessions, or items that people own or are entitled to, as a way to ultimately provide insights into people’s behaviors and characteristics. We introduce the new task of ‘possession identification in text’ and release a novel dataset in which possessions are marked at different confidence levels. We present experiments and results on automatically identifying and extracting possessions from text.

Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering