AlAgha, I., and Nafee, R., 2015. Investigating the efficiency of WordNet as background knowledge for document clustering. Journal of Engineering Research and Technology
2
(2): 152–8.
Amiri, H., and III, H. D.
2016. Short text representation for detecting churn in microblogs. In Proceedings of the 30th AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA. Menlo Park, CA: AAAI Press, pp. 2566–72.
Andrews, N. O., and Fox, E. A.
2007. Recent developments in document clustering. Technical Report, Department of Computer Science, Virginia Tech.
Billhardt, H., Borrajo, D., and Maojo, V., 2002. A context vector model for information retrieval. Journal of the American Society for Information Science and Technology
53
(3): 236–49.
Blei, D. M., Ng, A. Y., and Jordan, M. I., 2003. Latent Dirichlet allocation. Journal of Machine Learning Research
3
(2003): 993–1022.
Bullinaria, J. A., and Levy, J. P., 2007. Extracting semantic representations from word co-occurrence statistics: a computational study. Behavior Research Methods
39
(3): 510–26.
Cai, D., He, X., and Han, J., 2011. Locally consistent concept factorization for document clustering. IEEE Transactions on Knowledge and Data Engineering
23
(6): 902–13.
Cheng, X., Miao, D., Wang, C., and Cao, L., 2013. Coupled term-term relation analysis for document clustering. In Proceedings of the International Joint Conference on Neural Networks, Dallas, TX, USA. Washington, DC, USA: IEEE, pp. 1–8.
Das, R., Zaheer, M., and Dyer, C., 2015. Gaussian LDA for topic models with word embeddings. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics, Beijing, China. aclweb.org, pp. 795–804.
Deerwester, S. C., Dumais, S. T., Landauer, T. K., Furnas, G. W., and Harshman, R. A., 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science
41
(6): 391–407.
Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G., and Ruppin, E., 2002. Placing search in context: the concept revisited. ACM Transactions on Information Systems
20
(1): 116–31.
Gabrilovich, E., and Markovitch, S., 2006. Overcoming the brittleness bottleneck using wikipedia: enhancing text categorization with encyclopedic knowledge. In Proceedings of the 21st National Conference on Artificial Intelligence, Boston, MA, USA. Menlo Park, CA: AAAI Press, pp. 1301–6.
Gabrilovich, E., and Markovitch, S., 2007. Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In International Joint Conference on Artifical Intelligence, Hyderabad, India. San Francisco: Margan Kaufmann, pp. 1606–11.
Grefenstette, E., Hermann, K. M., Dinu, G., and Blunsom, P., 2014. New directions in vector space models of meaning. Tutorials. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, MD, USA. aclweb.org, pp. 8–8.
Harris, Z. S., 1954. Distributional structure. Word
10
(2–3): 146–62.
Hassan, S., and Mihalcea, R.
2011. Semantic relatedness using salient semantic analysis. In Proceedings of the 25th AAAI Conference on Artificial Intelligence, San Francisco, CA, USA. Menlo Park, CA: AAAI Press, pp. 884–9.
Hu, X., Zhang, X., Lu, C., Park, E. K., and Zhou, X., 2009. Exploiting wikipedia as external knowledge for document clustering. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France. New York, NY, USA: ACM, pp. 389–96.
Iosif, E., and Potamianos, A., 2010. Unsupervised semantic similarity computation between terms using web documents. IEEE Transactions on Knowledge and Data Engineering
22
(11): 1637–47.
Kalogeratos, A., and Likas, A., 2012. Text document clustering using global term context vectors. Knowledge and Information Systems
31
(3): 455–74.
Kim, Y., 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar. aclweb.org, pp. 1746–51.
Kusner, M. J., Sun, Y., Kolkin, N. I., and Weinberger, K. Q., 2015. From word embeddings to document distances. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France: JMLR.org, pp. 957–66.
Landauer, T. K., and Dumais, S. T., 1997. A solution to Plato’s problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review
104
(2): 211–40.
Landauer, T. K., Laham, D., Rehder, B., and Schreiner, M. E., 1997. How well can passage meaning be derived without using word order? A comparison of Latent Semantic Analysis and humans. In Proceedings of the 19th Annual Meeting of the Cognitive Science Society, Stanford University, CA, USA, Mawhwah, NJ: Erlbaum, pp. 412–7.
Le, Q. V., and Mikolov, T., 2014. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning, Beijing, China, San Francisco, CA, USA: Morgan Kaufmann, pp. 1188–96.
Lebret, R., and Collobert, R., 2015. Rehabilitation of count-based models for word vector representations. In Proceedings of the International Conference on Intelligent Text Processing and Computational Linguistics, Cairo, Egypt, Lecture Notes in Computer Science, Cham: Springer, pp. 417–29.
Lin, D., 1998. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning, Madison, WI, USA, San Francisco, CA, USA: Morgan Kaufmann, pp. 296–304.
Lovász, L., and Plummer, MD., 1986. Matching theory. Annals of Discrete Mathematics
29
(5): 42–6.
Mihalcea, R., Corley, C., and Strapparava, C., 2006. Corpus-based and knowledge-based measures of text semantic similarity. In Proceedings of the 21st National Conference on Artificial Intelligence, Boston, MA, USA, Menlo Park, CA: AAAI Press, pp. 775–80.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J., 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA. USA: Curran Associates, pp. 3111–9.
Miller, G. A., and Charles, W. G., 1991. Contextual correlates of semantic similarity. Language Cognition and Neuroscience
6
(1): 1–28.
Mitchell, J., and Steedman, M., 2015. Orthogonality of syntax and semantics within distributional spaces. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics, Beijing, China. aclweb.org, pp. 1301–10.
Nasir, J. A., Varlamis, I., Karim, A., and Tsatsaronis, G., 2013. Semantic smoothing for text clustering. Knowledge-Based Systems
54: 216–29.
Österlund, A., and Ödling, D., 2015. Factorization of latent variables in distributional semantic models. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal. aclweb.org, pp. 227–31.
Pangos, A., Iosif, E., Potamianos, A., and Fosler-Lussier, E., 2005. Combining statistical similarity measures for automatic induction of semantic classes. In Automatic Speech Recognition and Understanding, 2005 IEEE Workshop on, San Juan, Puerto Rico. Washington, DC, USA: IEEE, pp. 278–83.
Rubenstein, H., and Goodenough, J. B., 1965. Contextual correlates of synonymy. Communications of the ACM
8
(10): 627–33.
Rubner, Y., Tomasi, C., and Guibas, L. J., 1998. A metric for distributions with applications to image databases. In Procedings of the 16th International Conference on Computer Vision, Bombay, India. Washington, DC, USA: IEEE, pp. 59–66.
Rui, L., Liu, S., Yang, M., Li, M., Zhou, M., and Li, S., 2015. Hierarchical recurrent neural network for document modeling. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal. aclweb.org, pp. 899–907.
Rungsawang, A., 1998. Dsir: the first trec-7 attempt. In Proceedings of The 7th Text REtrieval Conference, Gaithersburg, MD, USA, pp. 366–72.
Turney, P. D., and Pantel, P., 2010. From frequency to meaning: vector space models of semantics. Journal of Artificial Intelligence Research
37
(1): 141–88.
Wang, T., Mohamed, A., and Hirst, G., 2015. Learning lexical embeddings with syntactic and lexicographic knowledge. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics, Beijing, China. aclweb.org, pp. 458–63.
Wei, T., Lu, Y., Chang, H., Zhou, Q., and Bao, X., 2015. A semantic approach for text clustering using WordNet and lexical chains. Expert Systems with Applications
42
(4): 2264–75.
Wei, Y., and Wei, J., 2013. A semantic set theory for word semantic similarity assessment. In Proceedings of the International Conference on Mechatronic Sciences, Electric Engineering and Computer, Shenyang, China. Washington, DC, USA: IEEE, pp. 2466–71.
Wei, Y., Wei, J., and Xu, H., 2015. Context vector model for document representation: a computational study. In Natural Language Processing and Chinese Computing, Nanchang, China. Lecture Notes in Computer Science, Cham: Springer, pp. 194–206.
Wei, Y., Wei, J., and Yang, Z., 2015. Enriching document representation with the deviations of word co-occurrence frequencies. In Proceedings of the International Conference on Algorithms and Architectures for Parallel Processing, Zhangjiajie, China, Lecture Notes in Computer Science, Cham: Springer, pp. 241–54.
Wei, Y., Wei, J., Yang, Z., and Liu, Y., 2016. Joint probability consistent relation analysis for document representation. In Proceedings of the International Conference on Database Systems for Advanced Applications, Dallas, TX, USA, Lecture Notes in Computer Science, Cham: Springer, pp. 517–32.
Wu, Z., and Giles, C. L., 2015. Sense-aware semantic analysis: a multi-prototype word representation model using Wikipedia. In Proceedings of the 29th AAAI Conference on Artificial Intelligence, Austin, TX, USA. Menlo Park, CA: AAAI Press, pp. 2188–94.
Xie, P., Deng, Y., and Xing, E., 2015. Diversifying restricted boltzmann machine for document modeling. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Hilton, Sydney, Australia, New York, NY, USA: ACM, pp. 1315–24.
Xu, W., Liu, X., and Gong, Y., 2003. Document clustering based on non-negative matrix factorization. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Informaion Retrieval, Toronto, Canada, New York, NY, USA: ACM, pp. 267–73.
Yang, Y., Downey, D., and Boyd-Graber, J., 2015. Efficient methods for incorporating knowledge into topic models. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal. aclweb.org, pp. 308–17.
Zimmerman, D. W., 1997. Teacher’s corner: a note on interpretation of the paired-samples t test. Journal of Educational and Behavioral Statistics
22
(3): 349–60.