The Text Mining Handbook
  • Cited by 78
  • Cited by
    This book has been cited by the following publications. This list is generated based on data provided by CrossRef.

    Ando, Ryosuke and Li, Ang 2017. An Evaluation Analysis on Three-Wheeled Personal Mobility Vehicles. International Journal of Intelligent Transportation Systems Research, Vol. 14, Issue. 3, p. 164.


    Dinov, Ivo D. 2017. Methodological challenges and analytic opportunities for modeling and interpreting Big Healthcare Data. GigaScience, Vol. 5, Issue. 1,


    Schroeder, Stefan Tummel, Christian Isenhardt, Ingrid Jeschke, Sabina and Richert, Anja 2017. 2016 International Conference on Information Systems Engineering (ICISE). p. 22.

    Trovati, Marcello and Bessis, Nik 2017. An influence assessment method based on co-occurrence for topologically reduced big data sets. Soft Computing, Vol. 20, Issue. 5, p. 2021.


    Wang, Jing-Doo 2017. Extracting significant pattern histories from timestamped texts using MapReduce. The Journal of Supercomputing,


    Wu, Bo and Knoblock, Craig A. 2017. Proceedings of the 21st International Conference on Intelligent User Interfaces - IUI '16. p. 375.

    Mateos, Cristian Rodriguez, Juan Manuel and Zunino, Alejandro 2017. A tool to improve code-first Web services discoverability through text mining techniques. Software: Practice and Experience, Vol. 45, Issue. 7, p. 925.


    Mironczuk, Marcin 2017. 2015 Second International Conference on Computer Science, Computer Engineering, and Social Media (CSCESM). p. 83.

    Miyata, Kumiko Yoshimura, Sadako and Hayashi, Yuko 2017. Facilitating patients with disorders of consciousness to sit without trunk support: a qualitative study. Journal of Clinical Nursing, Vol. 24, Issue. 17-18, p. 2498.


    Peng, Tao and Liu, Lu 2017. Clustering-Based Topical Web Crawling for Topic-Specific Information Retrieval Guided by Incremental Classifier. International Journal of Software Engineering and Knowledge Engineering, Vol. 25, Issue. 01, p. 147.


    Simovici, Dan A. Wang, Tong Chen, Ping and Pletea, Dan 2017. 2015 International Conference on Computing, Networking and Communications (ICNC). p. 551.

    Sluban, Borut Smailović, Jasmina Battiston, Stefano and Mozetič, Igor 2017. Sentiment leaning of influential communities in social networks. Computational Social Networks, Vol. 2, Issue. 1,


    Suominen, Arho and Toivanen, Hannes 2017. Map of science with topic modeling: Comparison of unsupervised learning and human-assigned subject classification. Journal of the Association for Information Science and Technology, p. n/a.


    Tomašev, Nenad 2017. Extracting the patterns of truthfulness from political information systems in Serbia. Information Systems Frontiers,


    Ammann, Manuel Frey, Roman and Verhofen, Michael 2017. Do Newspaper Articles Predict Aggregate Stock Returns?. Journal of Behavioral Finance, Vol. 15, Issue. 3, p. 195.


    Jiang, Feng and McComas, William F. 2017. Analysis of Nature of Science Included in Recent Popular Writing Using Text Mining Techniques. Science & Education, Vol. 23, Issue. 9, p. 1785.


    Jovanovic, Jelena Bagheri, Ebrahim Cuzzola, John Gasevic, Dragan Jeremic, Zoran and Bashash, Reza 2017. Automated Semantic Tagging of Textual Content. IT Professional, Vol. 16, Issue. 6, p. 38.


    Kamal, M. Vijaya and Vasumathi, D. 2017. 2014 2nd International Symposium on Computational and Business Intelligence. p. 53.

    KAYSER, VICTORIA GOLUCHOWICZ, KERSTIN and BIERWISCH, ANTJE 2017. TEXT MINING FOR TECHNOLOGY ROADMAPPING — THE STRATEGIC VALUE OF INFORMATION. International Journal of Innovation Management, Vol. 18, Issue. 03, p. 1440004.


    Kim, Uihyun 2017. Nuclear Exports Control System Using Semi-Automatic Keyword Extraction. International Journal of Information and Electronics Engineering, Vol. 4, Issue. 4,


    The Text Mining Handbook
    • Online ISBN: 9780511546914
    • Book DOI: https://doi.org/10.1017/CBO9780511546914

Book description

Text mining is a new and exciting area of computer science research that tries to solve the crisis of information overload by combining techniques from data mining, machine learning, natural language processing, information retrieval, and knowledge management. Similarly, link detection – a rapidly evolving approach to the analysis of text that shares and builds upon many of the key elements of text mining – also provides new tools for people to better leverage their burgeoning textual data resources. The Text Mining Handbook presents a comprehensive discussion of the state of the art in text mining and link detection. In addition to providing an in-depth examination of core text mining and link detection algorithms and operations, the book examines advanced pre-processing techniques, knowledge representation considerations, and visualization approaches. Finally, the book explores current real-world, mission-critical applications of text mining and link detection in such varied fields as M&A business intelligence, genomics research, and counter-terrorism activities.

Reviews

' … buy the book. This book is definitely worth having in your book shelf as a handy reference.'

Source: IAPR Newsletter


This list contains references from the content that can be linked to their source. For a full set of references and notes please see the PDF or HTML where available.


C. C. Aggarwal , S. C. Gates , and P. S. Yu (1999). On the Merits of Building Categorization Systems by Supervised Clustering. In Proceedings of EDBT-00, 7th International Conference on Extending Database Technology. Konstanz, Germany, ACM Press, New York: 352–356.

R. Agrawal , and R. Srikant (2001). On Integrating Catalogs. In Proceedings of WWW-01, 10th International Conference on the World Wide Web. Hong Kong, ACM Press, New York: 603–612.

H. Ahonen , O. Heinonen , M. Klemettinen , and A. Verkamo (1997b). Mining in the Phrasal Frontier. In Proceedings of Principles of Knowledge Discovery in Databases Conference. Trondheim, Norway, Springer-Verlag, London.

A. Aizawa (2000). The Feature Quantity: An Information-Theoretic Perspective of TFIDF-like Measures. In Proceedings of SIGIR-00, 23rd ACM International Conference on Research and Development in Information Retrieval. Athens, ACM Press, New York: 104–111.

K. Al-Kofahi , A. Tyrrell , A. Vachher , T. Travers , and P. Jackson (2001). Combining Multiple Classifiers for Text Categorization. In Proceedings of CIKM-01, 10th ACM International Conference on Information and Knowledge Management. Atlanta, ACM Press, New York: 97–104.

G. Amati , and F. Crestani (1999). “Probabilistic Learning for Selective Dissemination of Information.” Information Processing and Management 35(5): 633–654.

A. Amir , Y. Aumann , R. Feldman , and M. Fresko (2003). “Maximal Association Rules: A Tool for Mining Associations in Text.” Journal of Intelligent Information Systems 25(3): 333–345.

I. Androutsopoulos , J. Koutsias , K. V. Chandrinos , and C. D. Spyropoulos (2000). An Experimental Comparison of Naive Bayesian and Keyword-Based Anti-Spam Filtering with Personal E-mail Messages. In Proceedings of SIGIR-00, 23rd ACM International Conference on Research and Development in Information Retrieval. Athens, ACM Press, New York: 160–167.

E. Appiani , F. Cesarini , A. Colla , M. Diligenti , M. Gori , S. Marinai , and G. Soda (2001). “Automatic Document Classification and Indexing in High-Volume Applications.” International Journal on Document Analysis and Recognition 4(2): 69–83.

C. Apte , F. J. Damerau , and S. M. Weiss (1994b). “Automated Learning of Decision Rules for Text Categorization.” ACM Transactions on Information Systems 12(3): 233–251.

H. Avancini , A. Lavelli , B. Magnini , F. Sebastiani , and R. Zanoli (2003). Expanding Domain-Specific Lexicons by Term Categorization. In Proceedings of SAC-03, 18th ACM Symposium on Applied Computing. Melbourne, FL, ACM Press, New York: 793–797.

F. B. Backer , and L. G. Hubert (1976). “A Graph-Theoretic Approach to Goodness-of-Fit in Complete-Link Hierarchical Clustering.” Journal of the American Statistical Association 71: 870–878.

A. Bairoch , and R. Apweiler (2000). “The Swiss-Prot Protein Sequence Database and Its Supplement TrEMBL in 2000.” Nucleic Acids Research 28: 45–48.

S. Baluja , V. O. Mittal , and R. Sukthankar (2000). “Applying Machine Learning for High-Performance Named-Entity Extraction.” Computational Intelligence 16(4): 586–596.

F. Bapst , and R. Ingold (1998). “Using Typography in Document Image Analysis.” Lecture Notes in Computer Science 1375: 240–260.

R. Basili , and A. Moschitti (2001). A Robust Model for Intelligent Text Classification. In Proceedings of ICTAI-01, 13th IEEE International Conference on Tools with Artificial Intelligence. Dallas, IEEE Computer Society Press, Los Alamitos, CA: 265–272.

R. Basili , A. Moschitti , and M. T. Pazienza (2001a). An Hybrid Approach to Optimize Feature Selection Process in Text Classification. In Proceedings of AI∗IA-01, 7th Congress of the Italian Association for Artificial Intelligence. F. Esposito, ed. Bari, Italy, Springer-Verlag, Heidelberg: 320–325.

V. Batagelj (1997). “Notes on Blockmodeling.” Social Networks 19: 143–155.

V. Batagelj , P. Doreian , and A. Ferligoj (1992). “An Optimization Approach to Regular Equivalence.” Social Networks 14: 121–135.

T. Bayer , U. Kressel , H. Mogg-Schneider , and I. Renz (1998). “Categorizing Paper Documents: A Generic System for Domain and Language-Independent Text Categorization.” Computer Vision and Image Understanding 70(3): 299–306.

D. Beeferman , A. Berger , and J. D. Lafferty (1999). “Statistical Models for Text Segmentation.” Machine Learning 34(1–3): 177–210.

R. Bekkerman , R. El-Yaniv , N. Tishby , and Y. Winter (2001). On Feature Distributional Clustering for Text Categorization. In Proceedings of SIGIR-01, 24th ACM International Conference on Research and Development in Information Retrieval. New Orleans, ACM Press, New York: 146–153.

N. Bel , C. H. Koster , and M. Villegas (2003). Cross-Lingual Text Categorization. In Proceedings of ECDL-03, 7th European Conference on Research and Advanced Technology for Digital Libraries. Trondheim, Norway, Springer-Verlag, Heidelberg: 126–139.

M. Benkhalifa , A. Mouradi , and H. Bouyakhf (2001a). “Integrating External Knowledge to Supplement Training Data in Semi-Supervised Learning for Text Categorization.” Information Retrieval 4(2): 91–113.

M. Benkhalifa , A. Mouradi , and H. Bouyakhf (2001b). “Integrating WordNet Knowledge to Supplement Training Data in Semi-Supervised Agglomerative Hierarchical Clustering for Text Categorization.” International Journal of Intelligent Systems 16(8): 929–947.

P. N. Bennett (2003). Using Asymmetric Distributions to Improve Text Classifier Probability Estimates. In Proceedings of SIGIR-03, 26th ACM International Conference on Research and Development in Information Retrieval. Toronto, ACM Press, New York: 111–118.

P. N. Bennett , S. T. Dumais , and E. Horvitz (2002). Probabilistic Combination of Text Classifiers Using Reliability Indicators: Models and Results. In Proceedings of SIGIR-02, 25th ACM International Conference on Research and Development in Information Retrieval. Tampere, Finland, ACM Press, New York: 207–214.

B. Bigi (2003). Using Kullback–Leibler Distance for Text Categorization. In Proceedings of ECIR-03, 25th European Conference on Information Retrieval. F. Sebastiani, ed. Pisa, Italy, Springer-Verlag, Berlin/Heidelberg: 305–319.

D. M. Bikel , S. Miller , R. Schwartz , and R. Weischedel (1997). Nymble: A High-Performance Learning Name-Finder. In Proceedings of ANLP-97. Washington, DC, Morgan Kaufmann Publishers, San Francisco: 194–201.

E. Bloedorn , and R. S. Michalski (1998). “Data-Driven Constructive Induction.” IEEE Intelligent Systems 13(2): 30–37.

M. J. Blosseville , G. Hebrail , M. G. Montell , and N. Penot (1992). Automatic Document Classification: Natural Language Processing and Expert System Techniques Used Together. In Proceedings of SIGIR-92, 15th ACM International Conference on Research and Development in Information Retrieval. Copenhagen, ACM Press, New York: 51–57.

A. Blum , and T. M. Mitchell (1998). Combining Labeled and Unlabeled Data with Co-Training. COLT. Madison, WI, ACM Press, New York: 92–100.

R. Bod , and R. Kaplan (1998). A Probabilistic Corpus-Driven Model for Lexical-Functional Analysis. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics. Montreal, Morgan Kaufmann Publishers, San Francisco: 145–151.

P. Bonacich (1972). “Factoring and Weighting Approaches to Status Scores and Clique Identification.” Journal of Mathematical Sociology 2: 113–120.

P. Bonacich (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology 92: 1170–1182.

R. Bonnema , R. Bod , and R. Scha (1997). A DOP Model for Semantic Interpretation. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics. Somerset, NJ, Morgan Kaufmann Publishers, San Francisco: 159–167.

S. P. Borgatti , and M. G. Everett (1993). “Two Algorithms for Computing Regular Equivalence.” Social Networks 15: 361–376.

H. Borko , and M. Bernick (1963). “Automatic Document Classification.” Journal of the Association for Computing Machinery 10(2): 151–161.

H. Borko , and M. Bernick (1964). “Automatic Document Classification. Part II: Additional Experiments.” Journal of the Association for Computing Machinery 11(2): 138–151.

K. Borner , C. Chen , and K. Boyack (2003). “Visualizing Knowledge Domains.” Annual Review of Information Science and Technology 37: 179–255.

R. Brachman , P. Selfridge , L. Terveen , B. Altman , A. Borgida , F. Halper , T. Kirk , A. Lazar , D. McGuinness , and L. Resnick (1993). “Integrated Support for Data Archeology.” International Journal of Intelligent and Cooperative Information Systems 2(2): 159–185.

L. Cai , and T. Hofmann (2003). Text Categorization by Boosting Automatically Extracted Concepts. In Proceedings of SIGIR-03, 26th ACM International Conference on Research and Development in Information Retrieval. Toronto, ACM Press, New York: 182–189.

J. Carbonell , W. W. Cohen , and Y. Yang (2000). “Guest Editors' Introduction to the Special Issue on Machine Learning and Information Retrieval.” Machine Learning 39(2/3): 99–101.

A. Cardoso-Cachopo , and A. L. Oliveira (2003). An Empirical Comparison of Text Categorization Methods. In Proceedings of SPIRE-03, 10th International Symposium on String Processing and Information Retrieval. Manaus, Brazil, Springer-Verlag, Heidelberg: 183–196.

C. Carpineto , and G. Romano (1996). “Information Retrieval through Hybrid Navigation of Lattice Representations.” International Journal of Human-Computer Studies 45(5): 553–578.

K. M. Chai , H. T. Ng , and H. L. Chieu (2002). Bayesian Online Classifiers for Text Classification and Filtering. In Proceedings of SIGIR-02, 25th ACM International Conference on Research and Development in Information Retrieval. Tampere, Finland, ACM Press, New York: 97–104.

S. Chakrabarti , B. E. Dom , R. Agrawal , and P. Raghavan (1998). “Scalable Feature Selection, Classification and Signature Generation for Organizing Large Text Databases into Hierarchical Topic Taxonomies.” Journal of Very Large Data Bases 7(3): 163–178.

S. Chakrabarti , B. E. Dom , and P. Indyk (1998). Enhanced Hypertext Categorization Using Hyperlinks. In Proceedings of SIGMOD-98, ACM International Conference on Management of Data. Seattle, ACM Press, New York: 307–318.

S. Chakrabarti , B. E. Dom , S. R. Kumar , P. Raghavan , S. Rajagopalan , A. Tomkins , D. Gibson , and J. Kleinberg (1999). “Mining the Web's Link Structure.” IEEE Computer 32(8): 60–67.

C. Chen , and R. Paul (2001). “Visualizing a Knowledge Domain's Intellectual Structure.” Computer 34(3): 65–71.

C. C. Chen , M. C. Chen , and Y. Sun (2001). PVA: A Self-Adaptive Personal View Agent. In Proceedings of KDD-01, 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco, ACM Press, New York: 257–262.

C. C. Chen , M. C. Chen , and Y. Sun (2002). “PVA: A Self-Adaptive Personal View Agent.” Journal of Intelligent Information Systems 18(2/3): 173–194.

H. Chen , and S. T. Dumais (2000). Bringing Order to the Web: Automatically Categorizing Search Results. In Proceedings of CHI-00, ACM International Conference on Human Factors in Computing Systems. The Hague, ACM Press, New York: 145–152.

C.-H. Cheng , J. Tang , A. Wai-Chee , and I. King (2001). Hierarchical Classification of Documents with Error Control. In Proceedings of PAKDD-01, 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Hong Kong, Springer-Verlag, Heidelberg: 433–443.

A. Chouchoulas , and Q. Shen (2001). “Rough Set-Aided Keyword Reduction for Text Categorization.” Applied Artificial Intelligence 15(9): 843–873.

W. T. Chuang , A. Tiyyagura , J. Yang , and G. Giuffrida (2000). A Fast Algorithm for Hierarchical Text Classification. In Proceedings of DaWaK-00, 2nd International Conference on Data Warehousing and Knowledge Discovery. London, Springer-Verlag, Heidelberg: 409–418.

C. Clack , J. Farringdon , P. Lidwell , and T. Yu (1997). Autonomous Document Classification for Business. In Proceedings of the 1st International Conference on Autonomous Agents. W. L. Johnson, ed. Marina Del Rey, CA, ACM Press, New York: 201–208.

W. W. Cohen (1995b). Text Categorization and Relational Learning. In Proceedings of ICML-95, 12th International Conference on Machine Learning. Lake Tahoe, NV, Morgan Kaufmann Publishers, San Francisco: 124–132.

W. W. Cohen , and Y. Singer (1996). Context-Sensitive Learning Methods for Text Categorization. In Proceedings of SIGIR-96, 19th ACM International Conference on Research and Development in Information Retrieval. Zurich, ACM Press, New York: 307–315.

W. W. Cohen , and Y. Singer (1999). “Context-Sensitive Learning Methods for Text Categorization.” ACM Transactions on Information Systems 17(2): 141–173.

T. M. Cover , and J. A. Thomas (1991). Elements of Information Theory. New York, John Wiley and Sons.

J. Cowie , and W. Lehnert (1996). “Information Extraction.” Communications of the Association of Computing Machinery 39(1): 80–91.

K. Crammer , and Y. Singer (2002). A New Family of Online Algorithms for Category Ranking. In Proceedings of SIGIR-02, 25th ACM International Conference on Research and Development in Information Retrieval. Tampere, Finland, ACM Press, New York: 151–158.

M. Craven , D. DiPasquo , D. Freitag , A. K. McCallum , T. M. Mitchell , K. Nigam , and S. Slattery (2000). “Learning to Construct Knowledge Bases from the World Wide Web.” Artificial Intelligence 118(1/2): 69–113.

M. Craven , and S. Slattery (2001). “Relational Learning with Statistical Predicate Invention: Better Models for Hypertext.” Machine Learning 43(1/2): 97–119.

R. M. Creecy , B. M. Masand , S. J. Smith , and D. L. Waltz (1992). “Trading MIPS and Memory for Knowledge Engineering: Classifying Census Returns on the Connection Machine.” Communications of the ACM 35(8): 48–63.

N. Cristianini , J. Shawe-Taylor , and H. Lodhi (2002). “Latent Semantic Kernels.” Journal of Intelligent Information Systems 18(2/3): 127–152.

M. Damashek (1995). “Gauging Similarity with N-Grams: Language-Independent Categorization of Text.” Science 267(5199): 843–848.

V. Dasigi , R. C. Mann , and V. A. Protopopescu (2001). “Information Fusion for Text Classification: An Experimental Comparison.” Pattern Recognition 34(12): 2413–2425.

G. S. Davidson , B. Hendrickson , D. K. Johnson , C. E. Meyers , and B. N. Wylie (1999). “Knowledge Mining with VxInsight: Discovery through Interaction.” Journal of Intelligent Information Systems 11(3): 259–285.

R. Davidson , and D. Harel (1996). “Drawing Graphs Nicely Using Simulated Annealing.” ACM Transactions on Graphics 15(4): 301–331.

F. Debole , and F. Sebastiani (2003). Supervised Term Weighting for Automated Text Categorization. In Proceedings of SAC-03, 18th ACM Symposium on Applied Computing. Melbourne, FL, ACM Press, New York: 784–788.

S. Decker , S. Melnik , F. V. Harmelen , D. Fensel , M. C. A. Klein , J. Broekstra , M. Erdmann , and I. Horrocks (2000). “The Semantic Web: The Roles of XML and RDF.” IEEE Internet Computing 4(5): 63–74.

S. Deerwester , S. T. Dumais , G. W. Furnas , T. K. Landauer , and R. Harshman (1990). “Indexing by Latent Semantic Analysis.” Journal of the American Society of Information Science 41(6): 391–407.

I. Dhillon , S. Mallela , and R. Kumar (2002). Enhanced Word Clustering for Hierarchical Text Classification. In Proceedings of KDD-02, 8th ACM International Conference on Knowledge Discovery and Data Mining. Edmonton, Canada, ACM Press, New York: 191–200.

Y. Diao , H. Lu , and D. Wu (2000). A Comparative Study of Classification-Based Personal E-mail Filtering. In Proceedings of PAKDD-00, 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Kyoto, Japan, Springer-Verlag, Heidelberg: 408–419.

J. Diederich , J. Kindermann , E. Leopold , and G. Paass (2003). “Authorship Attribution with Support Vector Machines.” Applied Intelligence 19(1/2): 109–123.

Y. Ding , D. Fensel , M. C. A. Klein , and B. Omelayenko (2002). “The Semantic Web: Yet Another Hip?” DKE 41(2–3): 205–227.

P. Domingos (1999). “The Role of Occam's Razor in Knowledge Discovery.” Data Mining and Knowledge Discovery 3(1999): 409–425.

P. Domingos , and M. Pazzani (1997). “On the Optimality of the Simple Bayesian Classifier under Zero-One Loss.” Machine Learning 29: 103–130.

J. Dorre , P. Gerstl , and R. Seiffert (1999). Text Mining: Finding Nuggets in Mountains of Textual Data. In Proceedings of KDD-99, 5th ACM International Conference on Knowledge Discovery and Data Mining. San Diego, ACM Press, New York: 398–401.

L. B. Doyle (1965). “Is Automatic Classification a Reasonable Application of Statistical Analysis of Text?” Journal of the ACM 12(4): 473–489.

H. Drucker , V. Vapnik , and D. Wu (1999). “Support Vector Machines for Spam Categorization.” IEEE Transactions on Neural Networks 10(5): 1048–1054.

S. T. Dumais , and H. Chen (2000). Hierarchical Classification of Web Content. In Proceedings of SIGIR-00, 23rd ACM International Conference on Research and Development in Information Retrieval. Athens, ACM Press, New York: 256–263.

G. Escudero , L. Màrquez , and G. Rigau (2000). Boosting Applied to Word Sense Disambiguation. In Proceedings of ECML-00, 11th European Conference on Machine Learning. Barcelona, Springer-Verlag, Heidelberg: 129–141.

K. Etemad , D. S. Doermann , and R. Chellappa (1997). “Multiscale Segmentation of Unstructured Document Pages Using Soft Decision Integration.” IEEE Transactions on Pattern Analysis and Machine Intelligence 19(1): 92–96.

C. J. Fall , A. Torcsvari , K. Benzineb , and G. Karetka (2003). “Automated Categorization in the International Patent Classification.” SIGIR Forum 37(1): 10–25.

R. Feldman , Y. Aumann , M. Finkelstein-Landau , E. Hurvitz , Y. Regev , and A. Yaroshevich (2002). A Comparative Study of Information Extraction Strategies. In Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics. Mexico City, Springer, New York: 349–359.

R. Feldman , I. Dagan , and H. Hirsh (1998). “Mining Text Using Keyword Distributions.” Journal of Intelligent Information Systems 10(3): 281–300.

R. Feldman , and H. Hirsh (1996a). “Exploiting Background Information in Knowledge Discovery from Text.” Journal of Intelligent Information Systems 9(1): 83–97.

R. Feldman , Y. Regev , E. Hurvitz , and M. Landau-Finkelstein (2003). “Mining the Biomedical Literature Using Semantic Analysis and Natural Language Processing Techniques.” Biosilico 1(2): 69–72.

S. Ferilli , N. Fanizzi , and G. Semeraro (2001). Learning Logic Models for Automated Text Categorization. In Proceedings of AI∗IA-01, 7th Congress of the Italian Association for Artificial Intelligence. F. Esposito, ed. Bari, Italy, Springer-Verlag, Heidelberg: 81–86.

B. J. Field (1975). “Towards Automatic Indexing: Automatic Assignment of Controlled-Language Indexing and Classification from Free Indexing.” Journal of Documentation 31(4): 246–265.

A. Finn , N. Kushmerick , and B. Smyth (2002). Genre Classification and Domain Transfer for Information Filtering. In Proceedings of ECIR-02, 24th European Colloquium on Information Retrieval Research. Glasgow, Springer-Verlag, Heidelberg: 353–362.

M. Fisher , and R. Everson (2003). When Are Links Useful? Experiments in Text Classification. In Proceedings of ECIR-03, 25th European Conference on Information Retrieval. F. Sebastiani, ed. Pisa, Italy, Springer-Verlag, Berlin: 41–56.

R. S. Forsyth (1999). “New Directions in Text Categorization.” In Causal Models and Intelligent Data Management. A. Gammerman, ed. Heidelberg, Springer-Verlag: 151–185.

E. Frank , C. Chui , and I. H. Witten (2000). Text Categorization Using Compression Models. In Proceedings of DCC-00, IEEE Data Compression Conference. Snowbird, UT, IEEE Computer Society Press, Los Alamitos, CA: 200–209.

P. Frasconi , G. Soda , and A. Vullo (2001). Text Categorization for Multi-page Documents: A Hybrid Naive Bayes HMM Approach. In Proceedings of JCDL, 1st ACM-IEEE Joint Conference on Digital Libraries. Roanoke, VA, IEEE Computer Society Press, Los Alamitos, CA: 11–20.

P. Frasconi , G. Soda , and A. Vullo (2002). “Text Categorization for Multi-page Documents: A Hybrid Naive Bayes HMM Approach.” Journal of Intelligent Information Systems 18(2/3): 195–217.

L. C. Freeman (1977). “A Set of Measures of Centrality Based on Betweenness.” Sociometry 40: 35–41.

L. C. Freeman (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks 1: 215–239.

T. Fruchterman , and E. Reingold (1991). “Graph Drawing by Force-Directed Placement.” Software – Practice and Experience 21(11): 1129–1164.

N. Fuhr , and U. Pfeifer (1991). Combining Model-Oriented and Description-Oriented Approaches for Probabilistic Indexing. In Proceedings of SIGIR-91, 14th ACM International Conference on Research and Development in Information Retrieval. Chicago, ACM Press, New York: 46–56.

N. Fuhr , and U. Pfeifer (1994). “Probabilistic Information Retrieval as Combination of Abstraction Inductive Learning and Probabilistic Assumptions.” ACM Transactions on Information Systems 12(1): 92–115.

J. Furnkranz (1999). Exploiting Structural Information for Text Classification on the WWW. In Proceedings of IDA-99, 3rd Symposium on Intelligent Data Analysis. Amsterdam, Springer-Verlag, Heidelberg: 487–497.

J. Furnkranz (2002). “Hyperlink Ensembles: A Case Study in Hypertext Classification.” Information Fusion 3(4): 299–312.

R. Gaizauskas , and K. Humphreys (1997). “Using a Semantic Network for Information Extraction.” Natural Language Engineering 3(2): 147–196.

L. Galavotti , F. Sebastiani , and M. Simi (2000). Experiments on the Use of Feature Selection and Negative Evidence in Automated Text Categorization. In Proceedings of ECDL-00, 4th European Conference on Research and Advanced Technology for Digital Libraries. Lisbon, Springer-Verlag, Heidelberg: 59–68.

W. A. Gale , K. W. Church , and D. Yarowsky (1993). “A Method for Disambiguating Word Senses in a Large Corpus.” Computers and the Humanities 26(5): 415–439.

E. Gansner , E. Koutsofios , S. North , and K. Vo (1993). “A Technique for Drawing Directed Graphs.” IEEE Transactions on Software Engineering 19(3): 214–230.

E. Gansner , S. North , and K. Vo (1988). “DAG – A Program that Draws Directed Graphs.” Software – Practice and Experience 18(11): 1047–1062.

B. Ganter , and R. Wille (1999). Formal Concept Analysis: Mathematical Foundations. Berlin, Springer-Verlag.

E. Gaussier , C. Goutte , K. Popat , and F. Chen (2002). A Hierarchical Model for Clustering and Categorising Documents. In Proceedings of ECIR-02, 24th European Colloquium on Information Retrieval Research. Glasgow, Springer-Verlag, Heidelberg: 229–247.

A. Gelbukh , ed. (2002). Computational Linguistics and Intelligent Text Processing. In Proceedings of 3rd International Conference, CICLing 2001. Mexico City, Springer-Verlag, Berlin and New York.

The Gene Ontology (GO) Consortium. (2000). “Gene Ontology: Tool for the Unification of Biology.” Nature Genetics 25: 25–29.

The Gene Ontology (GO) Consortium. (2001). “Creating the Gene Ontology Resource: Design and Implementation.” Genome Research 11: 1425–1433.

G. L. Gentili , M. Marinilli , A. Micarelli , and F. Sciarrone (2001). “Text Categorization in an Intelligent Agent for Filtering Information on the Web.” International Journal of Pattern Recognition and Artificial Intelligence 15(3): 527–549.

R. Ghani (2001). Combining Labeled and Unlabeled Data for Text Classification with a Large Number of Categories. In Proceedings of the IEEE International Conference on Data Mining. San Jose, CA, IEEE Computer Society Press, Los Alamitos, CA: 597–598.

D. Giorgetti , and F. Sebastiani (2003a). “Automating Survey Coding by Multiclass Text Categorization Techniques.” Journal of the American Society for Information Science and Technology 54(12): 1269–1277.

D. Giorgetti , and F. Sebastiani (2003b). Multiclass Text Categorization for Automated Survey Coding. In Proceedings of SAC-03, 18th ACM Symposium on Applied Computing. Melbourne, FL, ACM Press, New York: 798–802.

E. J. Glover , K. Tsioutsiouliklis , S. Lawrence , D. M. Pennock , and G. W. Flake (2002). Using Web Structure for Classifying and Describing Web Pages. In Proceedings of WWW-02, International Conference on the World Wide Web. Honolulu, ACM Press, New York: 562–569.

J. L. Goldberg (1996). “CDM: An Approach to Learning in Text Categorization.” International Journal on Artificial Intelligence Tools 5(1/2): 229–253.

W. A. Gray , and A. J. Harley (1971). “Computer-Assisted Indexing.” Information Storage and Retrieval 7(4): 167–174.

G. Grieser , K. P. Jantke , S. Lange , and B. Thomas (2000). A Unifying Approach to HTML Wrapper Representation and Learning. Discovery Science. In Proceedings of 3rd International Conference, DS 2000. Kyoto, Japan, Springer-Verlag, Berlin: 50–64.

T. R. Gruber (1993). “A Translation Approach to Portable Ontologies.” Knowledge Acquisition 5: 199–220.

L. Guthrie , J. A. Guthrie , and J. Leistensnider (1999). “Document Classification and Routing.” In Natural Language Information Retrieval. T. Strzalkowski, ed. Dordrecht, Kluwer Academic Publishers: 289–310.

R. Hadany , and D. Harel (2001). “A Multi-Scale Method for Drawing Graphs Nicely.” Discrete Applied Mathematics 113: 3–21.

A. Hadjarian , J. Bala , and P. Pachowicz (2001). Text Categorization through Multistrategy Learning and Visualization. In Proceedings of CICLING-01, 2nd International Conference on Computational Linguistics and Intelligent Text Processing. A. Gelbukh, ed. Mexico City, Springer-Verlag, Heidelberg: 423–436.

U. Hahn , and K. Schnattinger (1997). Knowledge Mining from Textual Sources. In Proceedings of the 6th International Conference on Information and Knowledge Management. Las Vegas, ACM, New York: 83–90.

E.-H. Han , G. Karypis , and V. Kumar (2001). Text Categorization Using Weight-Adjusted k-Nearest Neighbor Classification. In Proceedings of PAKDD-01, 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Hong Kong, Springer-Verlag, Heidelberg: 53–65.

D. Hanauer (1996). “Integration of Phonetic and Graphic Features in Poetic Text Categorization Judgements.” Poetics 23(5): 363–380.

V. Hatzivassiloglou , P. A. Duboue , and A. Rzhetsky (2001). “Disambiguating Proteins, Genes, and RNA in Text: A Machine Learning Approach.” Bioinformatics 17(Suppl 1): S97–106.

P. J. Hayes , L. E. Knecht , and M. J. Cellio (1988). A News Story Categorization System. In Proceedings of ANLP-88, 2nd Conference on Applied Natural Language Processing. Austin, TX, Association for Computational Linguistics, Morristown, NJ: 9–17.

J. He , A.-H. Tan , and C.-L. Tan (2003). “On Machine Learning Methods for Chinese Document Categorization.” Applied Intelligence 18(3): 311–322.

H. S. Heaps (1973). “A Theory of Relevance for Automatic Document Classification.” Information and Control 22(3): 268–278.

B. Hetzler , P. Whitney , L. Martucci , and J. Thomas (1998). Multi-Faceted Insight through Interoperable Visual Information Analysis Paradigms. In Proceedings of Information Visualization '98. Research Triangle Park, NC, IEEE Computer Society Press, Los Alamitos, CA: 137–144.

D. P. Hill , J. A. Blake , J. E. Richardson , and M. Ringwald (2002). “Extension and Integration of the Gene Ontology (GO): Combining GO Vocabularies with External Vocabularies.” Genome Research 12: 1982–1991.

L. Hirschman , J. C. Park , J. Tsujii , L. Wong , and C. H. Wu (2002). “Accomplishments and Challenges in Literature Data Mining for Biology.” Bioinformatics Review 18(12): 1553–1561.

K. Hoashi , K. Matsumoto , N. Inoue , and K. Hashimoto (2000). Document Filtering Methods Using Non-Relevant Information Profile. In Proceedings of SIGIR-00, 23rd ACM International Conference on Research and Development in Information Retrieval. Athens, ACM Press, New York: 176–183.

R. Hoch (1994). Using IR Techniques for Text Classification in Document Analysis. In Proceedings of SIGIR-94, 17th ACM International Conference on Research and Development in Information Retrieval. Dublin, Springer-Verlag, Heidelberg: 31–40.

T. Honkela , K. Lagus , and S. Kaski (1998). “Self-Organizing Maps of Large Document Collections.” In Visual Explorations in Finance with Self-Organizing Maps. G. Deboeck and T. Kohonen, eds. London, Springer: 168–178.

K. Hornbaek , B. Bederson , and C. Plaisant (2002). “Navigation Patterns and Usability of Zoomable User Interfaces With and Without an Overview.” ACM Transactions on Computer–Human Interaction 9(4): 362–389.

W. G. Hoyle (1973). “Automatic Indexing and Generation of Classification by Algorithm.” Information Storage and Retrieval 9(4): 233–242.

W.-L. Hsu , and S.-D. Lang (1999). Classification Algorithms for NETNEWS Articles. In Proceedings of CIKM-99, 8th ACM International Conference on Information and Knowledge Management. Kansas City, MO, ACM Press, New York: 114–121.

G. S. Hubona , G. Shirah , and D. Fout (1997). “The Effects of Motion and Stereopsis on Three-Dimensional Visualization.” International Journal of Human Computer Studies 47(5): 609–627.

D. Hull (1996). “Stemming Algorithms – A Case Study for Detailed Evaluation.” Journal of the American Society for Information Science 47(1): 70–84.

D. A. Hull (1994). Improving Text Retrieval for the Routing Problem Using Latent Semantic Indexing. In Proceedings of SIGIR-94, 17th ACM International Conference on Research and Development in Information Retrieval. Dublin, Springer-Verlag, Heidelberg: 282–289.

D. A. Hull , J. O. Pedersen , and H. Schutze (1996). Method Combination for Document Filtering. In Proceedings of SIGIR-96, 19th ACM International Conference on Research and Development in Information Retrieval. H.-P. Frei, D. Harman, P. Schauble, and R. Wilkinson, eds. Zurich, ACM Press, New York: 279–288.

M. P. Hummon , and K. Carley (1993). “Social Networks as Normal Science.” Social Networks 14: 71–106.

K. Humphreys , R. Gaizauskas , and S. Azzam (1997). Event Coreference for Information Extraction. In Proceedings of the Workshop on Operational Factors in Practical, Robust, Anaphora Resolution for Unrestricted Texts. Madrid, Spain, Association for Computational Linguistics, Morristown, NJ: 75–81.

P. G. Ipeirotis , L. Gravano , and M. Sahami (2001). Probe, Count, and Classify: Categorizing Hidden Web Databases. In Proceedings of SIGMOD-01, ACM International Conference on Management of Data. W. G. Aref, ed. Santa Barbara, CA, ACM Press, New York: 67–78.

M. Iwayama , and T. Tokunaga (1994). A Probabilistic Model for Text Categorization: Based on a Single Random Variable with Multiple Values. In Proceedings of ANLP-94, 4th Conference on Applied Natural Language Processing. Stuttgart, Germany, Association for Computational Linguistics, Morristown, NJ: 162–167.

M. Iwayama , and T. Tokunaga (1995a). Cluster-Based Text Categorization: A Comparison of Category Search Strategies. In Proceedings of SIGIR-95, 18th ACM International Conference on Research and Development in Information Retrieval. E. A. Fox, P. Ingwersen, and R. Fidel, eds. Seattle, ACM Press, New York: 273–281.

R. D. Iyer , D. D. Lewis , R. E. Schapire , Y. Singer , and A. Singhal (2000). Boosting for Document Routing. In Proceedings of CIKM-00, 9th ACM International Conference on Information and Knowledge Management. A. Agah, J. Callan, and E. Rundensteiner, eds. McLean, VA, ACM Press, New York: 70–77.

P. S. Jacobs (1992). Joining Statistics with NLP for Text Categorization. In Proceedings of ANLP-92, 3rd Conference on Applied Natural Language Processing. M. Bates and O. Stock, eds. Trento, Italy, Association for Computational Linguistics, Morristown, NJ: 178–185.

P. S. Jacobs (1993). “Using Statistical Methods to Improve Knowledge-Based News Categorization.” IEEE Expert 8(2): 13–23.

A. K. Jain , M. N. Murty , and P. J. Flynn (1999). “Data Clustering: A Review.” ACM Computing Surveys 31(3): 264–323.

T. C. Jo (1999c). Text Categorization with the Concept of Fuzzy Set of Informative Keywords. In Proceedings of FUZZ-IEEE '99, IEEE International Conference on Fuzzy Systems. Seoul, KR, IEEE Computer Society Press, Los Alamitos, CA: 609–614.

T. Joachims (2001). A Statistical Learning Model of Text Classification with Support Vector Machines. In Proceedings of SIGIR-01, 24th ACM International Conference on Research and Development in Information Retrieval. W. B. Croft, D. J. Harper, D. H. Kraft, and J. Zobel, eds. New Orleans, ACM Press, New York: 128–136.

T. Joachims (2002). Learning to Classify Text Using Support Vector Machines. Dordrecht, Kluwer Academic Publishers.

T. Joachims , and F. Sebastiani (2002). “Guest Editors' Introduction to the Special Issue on Automated Text Categorization.” Journal of Intelligent Information Systems 18(2/3): 103–105.

A. Juan , and E. Vidal (2002). “On the Use of Bernoulli Mixture Models for Text Classification.” Pattern Recognition 35(12): 2705–2710.

M. Junker , and A. Dengel (2001). Preventing Overfitting in Learning Text Patterns for Document Categorization. In Proceedings of ICAPR-01, 2nd International Conference on Advances in Pattern Recognition. S. Singh, N. A. Murshed, and W. G. Kropatsch, eds. Rio de Janeiro, Springer-Verlag, Heidelberg: 137–146.

M. Junker , and R. Hoch (1998). “An Experimental Evaluation of OCR Text Representations for Learning Document Classifiers.” International Journal on Document Analysis and Recognition 1(2): 116–122.

M. Junker , M. Sintek , and M. Rinck (2000). Learning for Text Categorization and Information Extraction with ILP. In Proceedings of the 1st Workshop on Learning Language in Logic. Bled, Slovenia, Springer-Verlag, Heidelberg: 247–258.

A. Kaban , and M. Girolami (2002). “A Dynamic Probabilistic Model to Visualise Topic Evolution in Text Streams.” Journal of Intelligent Information Systems 18(2/3): 107–125.

T. Kamada , and S. Kawai (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters 31: 7–15.

G. Kar , and L. J. White (1978). “A Distance Measure for Automated Document Classification by Sequential Analysis.” Information Processing and Management 14(2): 57–69.

G. Karypis , and E.-H. Han (2000). Fast Supervised Dimensionality Reduction Algorithm with Applications to Document Categorization and Retrieval. In Proceedings of CIKM-00, 9th ACM International Conference on Information and Knowledge Management. A. Agah, J. Callan, and E. Rundensteiner, eds. McLean, VA, ACM Press, New York: 12–19.

S. Kaski , T. Honkela , K. Lagus , and T. Kohonen (1998). “WEBSOM – Self-Organizing Maps of Document Collections.” Neurocomputing 21: 101–117.

T. Kawatani (2002). Topic Difference Factor Extraction between Two Document Sets and Its Application to Text Categorization. In Proceedings of SIGIR-02, 25th ACM International Conference on Research and Development in Information Retrieval. K. Jarvelin, M. Beaulieu, R. Baeza-Yates, and S. H. Myaeng, eds. Tampere, Finland, ACM Press, New York: 137–144.

A. Kehagias , V. Petridis , V. G. Kaburlasos , and P. Fragkou (2003). “A Comparison of Word- and Sense-Based Text Categorization Using Several Classification Algorithms.” Journal of Intelligent Information Systems 21(3): 227–247.

D. Keim (2002). “Information Visualization and Visual Data Mining.” IEEE Transactions on Visualization and Computer Graphics 8(1): 1–8.

B. Kessler , G. Nunberg , and H. Schutze (1997). Automatic Detection of Text Genre. In Proceedings of ACL-97, 35th Annual Meeting of the Association for Computational Linguistics. P. R. Cohen and W. Wahlster, eds. Madrid, Morgan Kaufmann Publishers, San Francisco: 32–38.

D. V. Khmelev , and W. J. Teahan (2003). A Repetition Based Measure for Verification of Text Collections and for Text Categorization. In Proceedings of SIGIR-03, 26th ACM International Conference on Research and Development in Information Retrieval. C. Clarke, G. Cormack, J. Callan, D. Hawking, and A. Smeaton, eds. Toronto, ACM Press, New York: 104–110.

H. Kim (2002). “Predicting How Ontologies for the Semantic Web Will Evolve.” CACM 45(2): 48–54.

Y.-H. Kim , S.-Y. Hahn , and B.-T. Zhang (2000). Text Filtering by Boosting Naive Bayes Classifiers. In Proceedings of SIGIR-00, 23rd ACM International Conference on Research and Development in Information Retrieval. N. J. Belkin, P. Ingwersen, and M. K. Leong, eds. Athens, ACM Press, New York: 168–175.

J. Kindermann , G. Paass , and E. Leopold (2001). Error Correcting Codes with Optimized Kullback–Leibler Distances for Text Categorization. In Proceedings of ECML-01, 12th European Conference on Machine Learning. L. de Raedt and A. Siebes, eds. Freiburg, Germany, Springer-Verlag, Heidelberg: 266–275.

R. Kindermann , and J. L. Snell (1980). Markov Random Fields and Their Applications. Providence, RI, American Mathematical Society.

P. H. Klingbiel (1973a). “Machine-Aided Indexing of Technical Literature.” Information Storage and Retrieval 9(2): 79–84.

W. Kloesgen (1992). “Problems for Knowledge Discovery in Databases and Their Treatment in the Statistics Interpreter EXPLORA.” International Journal for Intelligent Systems 7(7): 649–673.

W. Kloesgen (1995a). “Efficient Discovery of Interesting Statements in Databases.” Journal of Intelligent Information Systems 4: 53–69.

A. Kloptchenko , T. Eklund , B. Back , J. Karlson , H. Vanharanta , and A. Visa (2002). “Combining Data and Text Mining Techniques for Analyzing Financial Reports.” International Journal of Intelligent Systems in Accounting, Finance, and Management 12(1): 29–41.

E. Knorr , R. Ng , and V. Tucakov (2000). “Distance Based Outliers: Algorithms and Applications.” The VLDB Journal 8(3): 237–253.

T. Kohonen (1982). “Analysis of Simple Self-Organizing Process.” Biological Cybernetics 44(2): 135–140.

T. Kohonen (1995). Self-Organizing Maps. Berlin, Springer-Verlag.

T. Kohonen (1997). Exploration of Very Large Databases by Self-Organizing Maps. In Proceedings of ICNN '97, International Conference on Neural Networks. Houston, TX, IEEE Service Center Press, Piscataway, NJ: 1–6.

T. Kohonen , S. Kaski , K. Lagus , and T. Honkela (1996). Very Large Two-Level SOM for the Browsing of Newsgroups. In Proceedings of ICANN96, International Conference on Artificial Neural Networks. Bochum, Germany, Springer-Verlag, Berlin: 269–274.

T. Kohonen , S. Kaski , K. Lagus , J. Salojarvi , T. Honkela , V. Paatero , and A. Saarela (1999). “Self-Organization of a Massive Text Document Collection.” In Kohonen Maps. E. Oja and S. Kaski, eds. Amsterdam, Elsevier: 171–182.

H. Koike (1993). “The Role of Another Spatial Dimension in Software Visualization.” ACM Transactions on Information Systems 11(3): 266–286.

H. Koike (1995). “Fractal Views: A Fractal-Based Method for Controlling Information Display.” ACM Transactions on Information Systems 13(3): 305–323.

T. Koike , and A. Rzhetsky (2000). “A Graphic Editor for Analyzing Signal-Transduction Pathways.” Gene 259: 235–244.

M. Koppel , S. Argamon , and A. R. Shimoni (2002). “Automatically Categorizing Written Texts by Author Gender.” Literary and Linguistic Computing 17(4): 401–412.

C. H. Koster , and M. Seutter (2003). Taming Wild Phrases. In Proceedings of ECIR-03, 25th European Conference on Information Retrieval. F. Sebastiani, ed. Pisa, Italy, Springer-Verlag, Heidelberg: 161–176.

M. Krauthammer , A. Rzhetsky , P. Morozov , and C. Friedman (2000). “Using BLAST for Identifying Gene and Protein Names in Journal Articles.” Gene 259: 245–252.

M. Krier , and F. Zaccà (2002). “Automatic Categorization Applications at the European Patent Office.” World Patent Information 24: 187–196.

J. Kupiec (1992). “Robust Part-of-Speech Tagging Using a Hidden Markov Model.” Computer Speech and Language 6: 225–243.

N. Kushmerick (2000). “Wrapper Induction: Efficiency and Expressiveness.” Artificial Intelligence 118(1–2): 15–68.

O.-W. Kwon , and J.-H. Lee (2003). “Text Categorization Based on k-Nearest Neighbor Approach for Web Site Classification.” Information Processing and Management 39(1): 25–44.

Y. Labrou , and T. Finin (1999). Yahoo! as an Ontology: Using Yahoo! Categories to Describe Documents. In Proceedings of CIKM-99, 8th ACM International Conference on Information and Knowledge Management. Kansas City, MO, ACM Press, New York: 180–187.

K. Lagus , T. Honkela , S. Kaski , and T. Kohonen (1999). “WEBSOM for Textual Data Mining.” Artificial Intelligence Review 13(5/6): 345–364.

K.-Y. Lai , and W. Lam (2001). Meta-Learning Models for Automatic Textual Document Categorization. In Proceedings of PAKDD-01, 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining. D. Cheung, Q. Li, and G. Williams, eds. Hong Kong, Springer Verlag, Heidelberg: 78–89.

Y.-S. Lai , and C.-H. Wu (2002). “COLUMN: Meaningful Term Extraction and Discriminative Term Selection in Text Categorization via Unknown-Word Methodology.” ACM Transactions on Asian Language Information Processing 1(1): 34–64.

S. L. Lam , and D. L. Lee (1999). Feature Reduction for Neural Network Based Text Categorization. In Proceedings of DASFAA-99, 6th IEEE International Conference on Database Advanced Systems for Advanced Application. A. L. Chen and F. H. Lochovsky, eds. Hsinchu, Taiwan, IEEE Computer Society Press, Los Alamitos, CA: 195–202.

W. Lam , and C. Y. Ho (1998). Using a Generalized Instance Set for Automatic Text Categorization. In Proceedings of SIGIR-98, 21st ACM International Conference on Research and Development in Information Retrieval. W. B. Croft, A. Moffat, C. J. van Rijsbergen, R. Wilkinson, and J. Zobel, eds. Melbourne, Australia, ACM Press, New York: 81–89.

W. Lam , and K.-Y. Lai (2001). A Meta-Learning Approach for Text Categorization. In Proceedings of SIGIR-01, 24th ACM International Conference on Research and Development in Information Retrieval. W. B. Croft, D. J. Harper, D. H. Kraft, and J. Zobel, eds. New Orleans, ACM Press, New York: 303–309.

W. Lam , M. E. Ruiz , and P. Srinivasan (1999). “Automatic Text Categorization and Its Applications to Text Retrieval.” IEEE Transactions on Knowledge and Data Engineering 11(6): 865–879.

T. K. Landauer , P. W. Foltz , and D. Laham (1998). “Introduction to Latent Semantic Analysis.” Discourse Processes 25: 259–284.

L. S. Larkey (1998). Automatic Essay Grading Using Text Categorization Techniques. In Proceedings of SIGIR-98, 21st ACM International Conference on Research and Development in Information Retrieval. W. B. Croft, A. Moffat, C. J. v. Rijsbergen, R. Wilkinson, and J. Zobel, eds. Melbourne, Australia, ACM Press, New York: 90–95.

L. S. Larkey (1999). A Patent Search and Classification System. In Proceedings of DL-99, 4th ACM Conference on Digital Libraries. E. A. Fox and N. Rowe, eds. Berkeley, CA, ACM Press, New York: 179–187.

L. S. Larkey , and W. B. Croft (1996). Combining Classifiers in Text Categorization. In Proceedings of SIGIR-96, 19th ACM International Conference on Research and Development in Information Retrieval. H.-P. Frei, D. Harman, P. Schauble, and R. Wilkinson, eds. Zurich, ACM Press, New York: 289–297.

K. H. Lee , J. Kay , B. H. Kang , and U. Rosebrock (2002). A Comparative Study on Statistical Machine Learning Algorithms and Thresholding Strategies for Automatic Text Categorization. In Proceedings of PRICAI-02, 7th Pacific Rim International Conference on Artificial Intelligence. M. Ishizuka and A. Sattar, eds. Tokyo, Springer-Verlag, Heidelberg: 444–453.

Y.-B. Lee , and S. H. Myaeng (2002). Text Genre Classification with Genre-Revealing and Subject-Revealing Features. In Proceedings of SIGIR-02, 25th ACM International Conference on Research and Development in Information Retrieval. K. Jarvelin, M. Beaulieu, R. Baeza-Yates, and S. H. Myaeng, eds. Tampere, Finland, ACM Press, New York: 145–150.

W. Lehnert , S. Soderland , D. Aronow , F. Feng , and A. Shmueli (1994). “Inductive Text Classification for Medical Applications.” Journal of Experimental and Theoretical Artificial Intelligence 7(1): 49–80.

E. Leopold , and J. Kindermann (2002). “Text Categorization with Support Vector Machines: How to Represent Texts in Input Space?” Machine Learning 46(1/3): 423–444.

C.-H. Leung , and W.-K. Kan (1997). “A Statistical Learning Approach to Automatic Indexing of Controlled Index Terms.” Journal of the American Society for Information Science 48(1): 55–67.

Y. K. Leung , and M. D. Apperley (1994). “A Review and Taxonomy of Distortion-Oriented Presentation Techniques.” ACM Transactions on Computer–Human Interaction 1(2): 126–160.

D. D. Lewis (1991). Data Extraction as Text Categorization: An Experiment with the MUC-3 Corpus. In Proceedings of MUC-3, 3rd Message Understanding Conference. San Diego, CA, Morgan Kaufmann Publishers, San Francisco: 245–255.

D. D. Lewis (1992a). An Evaluation of Phrasal and Clustered Representations on a Text Categorization Task. In Proceedings of SIGIR-92, 15th ACM International Conference on Research and Development in Information Retrieval. N. Belkin, P. Ingwersen, and A. M. Pejtersen, eds. Copenhagen, ACM Press, New York: 37–50.

D. D. Lewis (1995b). “A Sequential Algorithm for Training Text Classifiers: Corrigendum and Additional Data.” SIGIR Forum 29(2): 13–19.

D. D. Lewis , and W. A. Gale (1994). A Sequential Algorithm for Training Text Classifiers. In Proceedings of SIGIR-94, 17th ACM International Conference on Research and Development in Information Retrieval. W. B. Croft and C. J. v. Rijsbergen, eds. Dublin, Springer-Verlag, Heidelberg: 3–12.

D. D. Lewis , R. E. Schapire , J. P. Callan , and R. Papka (1996). Training Algorithms for Linear Text Classifiers. In Proceedings of SIGIR-96, 19th ACM International Conference on Research and Development in Information Retrieval. Zurich, ACM Press, New York: 298–306.

H. Li , and K. Yamanishi (1997). Document Classification Using a Finite Mixture Model. In Proceedings of ACL-97, 35th Annual Meeting of the Association for Computational Linguistics. P. Cohen and W. Wahlster, eds. Madrid, Morgan Kaufmann Publishers, San Francisco: 39–47.

H. Li , and K. Yamanishi (1999). Text Classification Using ESC-Based Stochastic Decision Lists. In Proceedings of CIKM-99, 8th ACM International Conference on Information and Knowledge Management. Kansas City, MO, ACM Press, New York: 122–130.

H. Li , and K. Yamanishi (2002). “Text Classification Using ESC-Based Stochastic Decision Lists.” Information Processing and Management 38(3): 343–361.

Y. H. Li , and A. K. Jain (1998). “Classification of Text Documents.” The Computer Journal 41(8): 537–546.

E. D. Liddy , W. Paik , and E. S. Yu (1994). “Text Categorization for Multiple Users Based on Semantic Features from a Machine-Readable Dictionary.” ACM Transactions on Information Systems 12(3): 278–295.

J. H. Lim (1999). Learnable Visual Keywords for Image Classification. In Proceedings of DL-99, 4th ACM Conference on Digital Libraries. E. A. Fox and N. Rowe, eds. Berkeley, CA, ACM Press, New York: 139–145.

X. Lin (1992). Visualization for the Document Space. In Proceedings of Visualization '92. Los Alamitos, CA, Center for Computer Legal Research, Pace University/IEEE Computer Society Press, Piscataway, NJ: 274–281.

X. Lin (1997). “Map Displays for Information Retrieval.” Journal of the American Society for Information Science 48: 40–54.

Y. Liu , Y. Yang , and J. Carbonell (2002). Boosting to Correct the Inductive Bias for Text Classification. In Proceedings of CIKM-02, 11th ACM International Conference on Information and Knowledge Management. McLean, VA, ACM Press, New York: 348–355.

F. Lorrain , and H. C. White (1971). “Structural Equivalence of Individuals in Social Networks.” Journal of Mathematical Sociology 1: 49–80.

S. Y. Lu , and K. S. Fu (1978). “A Sentence-to-Sentence Clustering Procedure for Pattern Analysis.” IEEE Transactions on Systems, Man and Cybernetics 8: 381–389.

S. A. Macskassy , H. Hirsh , A. Banerjee , and A. A. Dayanik (2003). “Converting Numerical Classification into Text Classification.” Artificial Intelligence 143(1): 51–77.

G. Maderlechner , P. Suda , and T. Bruckner (1997). “Classification of Documents by Form and Content.” Pattern Recognition Letters 18(11/13): 1225–1231.

G. Marchionini (1995). Information Seeking in Electronic Environments. Cambridge, UK, Cambridge University Press.

M. E. Maron (1961). “Automatic Indexing: An Experimental Inquiry.” Journal of the Association for Computing Machinery 8(3): 404–417.

B. Masand , G. Linoff , and D. Waltz (1992). Classifying News Stories Using Memory-Based Reasoning. In Proceedings of SIGIR-92, 15th ACM International Conference on Research and Development in Information Retrieval. N. Belkin, P. Ingwersen, and A. M. Pejtersen, eds. Copenhagen, Denmark, ACM Press, New York: 59–65.

K. Matsuda , and T. Fukushima (1999). Task-Oriented World Wide Web Retrieval by Document-Type Classification. In Proceedings of CIKM-99, 8th ACM International Conference on Information and Knowledge Management. S. Gauch, ed. Kansas City, MO, ACM Press, New York: 109–113.

G. Melancon , and I. Herman (2000). DAG Drawing from an Information Visualization Perspective. In Proceedings of Data Visualization '00, Amsterdam, Springer-Verlag, Heidelberg: 3–12.

D. Meretakis , D. Fragoudis , H. Lu , and S. Likothanassis (2000). Scalable Association-Based Text Classification. In Proceedings of CIKM-00, 9th ACM International Conference on Information and Knowledge Management. A. Agah, J. Callan, S. Gauch, and E. Rundensteiner, eds. McLean, VA, ACM Press, New York: 373–374.

D. Merkl (1998). “Text Classification with Self-Organizing Maps: Some Lessons Learned.” Neurocomputing 21(1/3): 61–77.

D. Mladenic (1998a). Feature Subset Selection in Text Learning. In Proceedings of ECML-98, 10th European Conference on Machine Learning. C. Nedellec and C. Rouveirol, eds. Chemnitz, Germany, Springer-Verlag, London: 95–100.

D. Mladenic (1999). “Text Learning and Related Intelligent Agents: A Survey.” IEEE Intelligent Systems 14(4): 44–54.

D. Mladenic , and M. Grobelnik (2003). “Feature Selection on Hierarchy of Web Documents.” Decision Support Systems 35(1): 45–87.

M.-F. Moens , and J. Dumortier (2000). “Text Categorization: The Assignment of Subject Descriptors to Magazine Articles.” Information Processing and Management 36(6): 841–861.

R. J. Mooney , and L. Roy (2000). Content-Based Book Recommending Using Learning for Text Categorization. In Proceedings of DL-00, 5th ACM Conference on Digital Libraries. San Antonio, TX, ACM Press, New York: 195–204.

A. Moschitti (2003). A Study on Optimal Parameter Tuning for Rocchio Text Classifier. In Proceedings of ECIR-03, 25th European Conference on Information Retrieval. F. Sebastiani, ed. Pisa, Italy, Springer-Verlag, Heidelberg: 420–435.

J. Mostafa , and W. Lam (2000). “Automatic Classification Using Supervised Learning in a Medical Document Filtering Application.” Information Processing and Management 36(3): 415–444.

I. Moulinier , and J.-G. Ganascia (1996). “Applying an Existing Machine Learning Algorithm to Text Categorization.” In Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing. S. Wermter, E. Riloff, and G. Scheler, eds. Heidelberg, Springer-Verlag: 343–354.

P. Nardiello , F. Sebastiani , and A. Sperduti (2003). Discretizing Continuous Attributes in AdaBoost for Text Categorization. In Proceedings of ECIR-03, 25th European Conference on Information Retrieval. F. Sebastiani, ed. Pisa, Italy, Springer-Verlag, Heidelberg: 320–334.

T. Nasukawa , and T. Nagano (2001). “Text Analysis and Knowledge Mining System.” IBM Systems Journal 40(4): 967–984.

P. Neuhaus , and N. Broker (1997). The Complexity of Recognition of Linguistically Adequate Dependency Grammars. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics. P. R. Cohen and W. Wahlster, eds. Somerset, NJ, Association for Computational Linguistics: 337–343.

V. Ng , and C. Cardie (2003). Bootstrapping Coreference Classifiers with Multiple Machine Learning Algorithms. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing (EMNLP-2003), Sapporo, Japan, Association for Computational Linguistics, Morristown, NJ: 113–120.

K. Nigam , and R. Ghani (2000). Analyzing the Applicability and Effectiveness of Co-training. In Proceedings of CIKM-00, 9th ACM International Conference on Information and Knowledge Management. A. Agah, J. Callan, S. Gauch, and E. Rundensteiner, eds. McLean, VA, ACM Press, New York: 86–93.

K. Nigam , A. K. McCallum , S. Thrun , and T. M. Mitchell (2000). “Text Classification from Labeled and Unlabeled Documents Using EM.” Machine Learning 39(2/3): 103–134.

J. Ontrup , and H. Ritter (2001b). Text Categorization and Semantic Browsing with Self-Organizing Maps on Non-Euclidean Spaces. In Proceedings of PKDD-01, 5th European Conference on Principles and Practice of Knowledge Discovery in Databases. Freiburg, Germany, Springer-Verlag, Heidelberg: 338–349.

F. Peng , and D. Schuurmans (2003). Combining Naive Bayes n-gram and Language Models for Text Classification. In Proceedings of ECIR-03, 25th European Conference on Information Retrieval. F. Sebastiani, ed. Pisa, Italy, Springer-Verlag, Heidelberg: 335–350.

G. Petasis , A. Cucchiarelli , P. Velardi , G. Paliouras , V. Karkaletsis , and C. D. Spyropoulos (2000). Automatic Adaptation of Proper Noun Dictionaries through Cooperation of Machine Learning and Probabilistic Methods. In Proceedings of SIGIR-00, 23rd ACM International Conference on Research and Development in Information Retrieval. N. Belkin, P. Ingwersen, and M.-K. Leong, eds. Athens, ACM Press, New York: 128–135.

C. Peters , and C. H. Koster (2002). Uncertainty-Based Noise Reduction and Term Selection in Text Categorization. In Proceedings of ECIR-02, 24th European Colloquium on Information Retrieval Research. F. Crestani, M. Girolami, and C. J. v. Rijsbergen, eds. Glasgow, Springer-Verlag, London: 248–267.

V. Punyakanok , and D. Roth (2000). Shallow Parsing by Inferencing with Classifiers. In Proceedings of the 4th Conference on Computational Natural Language Learning and of the 2nd Learning Language in Logic Workshop. Lisbon, Association for Computational Linguistics, Somerset, NJ: 107–110.

L. R. Rabiner (1986). “An Introduction to Hidden Markov Models.” IEEE ASSP Magazine 3(1): 4–16.

L. R. Rabiner (1990). “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.” In Readings in Speech Recognition. A. Waibel and K.-F. Lee, eds. Los Altos, CA, Morgan Kaufmann Publishers: 267–296.

H. Ragas , and C. H. Koster (1998). Four Text Classification Algorithms Compared on a Dutch Corpus. In Proceedings of SIGIR-98, 21st ACM International Conference on Research and Development in Information Retrieval. W. B. Croft, A. Moffat, C. J. v. Rijsbergen, R. Wilkinson, and J. Zobel, eds. Melbourne, Australia, ACM Press, New York: 369–370.

L. F. Rau , and P. S. Jacobs (1991). Creating Segmented Databases from Free Text for Text Retrieval. In Proceedings of SIGIR-91, 14th ACM International Conference on Research and Development in Information Retrieval. Chicago, ACM Press, New York: 337–346.

B. Ribeiro-Neto , A. H. F. Laender , and L. R. D. Lima (2001). “An Experimental Study in Automatically Categorizing Medical Documents.” Journal of the American Society for Information Science and Technology 52(5): 391–401.

E. Riloff (1993b). Using Cases to Represent Context for Text Classification. In Proceedings of CIKM-93, 2nd International Conference on Information and Knowledge Management. Washington, DC, ACM Press, New York: 105–113.

E. Riloff (1995). Little Words Can Make a Big Difference for Text Classification. In Proceedings of SIGIR-95, 18th ACM International Conference on Research and Development in Information Retrieval. E. A. Fox, P. Ingwersen, and R. Fidel, eds. Seattle, ACM Press, New York: 130–136.

E. Riloff (1996b). “Using Learned Extraction Patterns for Text Classification.” In Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing. S. Wermter, E. Riloff, and G. Scheler, eds. Springer-Verlag, London: 275–289.

E. Riloff , and W. Lehnert (1994). “Information Extraction as a Basis for High-Precision Text Classification.” ACM Transactions on Information Systems, 12(3): 296–333.

S. E. Robertson , and P. Harding (1984). “Probabilistic Automatic Indexing by Learning from Human Indexers.” Journal of Documentation 40(4): 264–270.

P. Rokita (1996). “Generating Depth-of-Field Effects in Virtual Reality Applications.” IEEE Computer Graphics and Applications 16(2): 18–21.

M. Ruiz , and P. Srinivasan (2002). “Hierarchical Text Classification Using Neural Networks.” Information Retrieval 5(1): 87–118.

A. Rzhetsky , I. Iossifov , T. Koike , M. Krauthammer , P. Kra , M. Morris , H. Yu , P. A. Duboue , W. Weng , J. W. Wilbur , V. Hatzivassiloglou , and C. Friedman (2004). “GeneWays: A System for Extracting, Analyzing, Visualizing, and Integrating Molecular Pathway Data.” Journal of Biomedical Informatics 37: 43–53.

A. Rzhetsky, T. Koike, S. Kalachikov, S. M. Gomez, M. Krauthammer, S. H. Kaplan, P. Kra, J. J. Russo, and C. Friedman (2000). “A Knowledge Model for Analysis and Simulation of Regulatory Networks.” Bioinformatics 16: 1120–1128.

G. Sabidussi (1966). “The Centrality Index of a Graph.” Psychometrika 31: 581–603.

C. L. Sable, and V. Hatzivassiloglou (1999). Text-Based Approaches for the Categorization of Images. In Proceedings of ECDL-99, 3rd European Conference on Research and Advanced Technology for Digital Libraries. S. Abiteboul and A.-M. Vercoustre, eds. Paris, Springer-Verlag, Heidelberg: 19–38.

C. L. Sable , and V. Hatzivassiloglou (2000). “Text-Based Approaches for Non-topical Image Categorization.” International Journal of Digital Libraries 3(3): 261–275.

M. Sahami, S. Yusufali, and M. Q. Baldonado (1998). SONIA: A Service for Organizing Networked Information Autonomously. In Proceedings of DL-98, 3rd ACM Conference on Digital Libraries. I. Witten, R. Akscyn, and F. M. Shipman, eds. Pittsburgh, ACM Press, New York: 200–209.

Y. Sakakibara , K. Misue , and T. Koshiba (1996). “A Machine Learning Approach to Knowledge Acquisitions from Text Databases.” International Journal of Human Computer Interaction 8(3): 309–324.

G. Sakkis , I. Androutsopoulos , G. Paliouras , V. Karkaletsis , C. D. Spyropoulos , and P. Stamatopoulos (2003). “A Memory-Based Approach to Anti-Spam Filtering for Mailing Lists.” Information Retrieval 6(1): 49–73.

S. N. Sanchez , E. Triantaphyllou , and D. Kraft (2002). “A Feature Mining Based Approach for the Classification of Text Documents into Disjoint Classes.” Information Processing and Management 38(4): 583–604.

M. Sasaki , and K. Kita (1998). Rule-Based Text Categorization Using Hierarchical Categories. In Proceedings of SMC-98, IEEE International Conference on Systems, Man, and Cybernetics. La Jolla, CA, IEEE Computer Society Press, Los Alamitos, CA: 2827–2830.

R. E. Schapire , and Y. Singer (2000). “BoosTexter: A Boosting-Based System for Text Categorization.” Machine Learning 39(2/3): 135–168.

R. E. Schapire, Y. Singer, and A. Singhal (1998). Boosting and Rocchio Applied to Text Filtering. In Proceedings of SIGIR-98, 21st ACM International Conference on Research and Development in Information Retrieval. W. B. Croft, A. Moffat, C. J. v. Rijsbergen, R. Wilkinson, and J. Zobel, eds. Melbourne, Australia, ACM Press, New York: 215–223.

H. Schutze, D. A. Hull, and J. O. Pedersen (1995). A Comparison of Classifiers and Document Representations for the Routing Problem. In Proceedings of SIGIR-95, 18th ACM International Conference on Research and Development in Information Retrieval. E. A. Fox, P. Ingwersen, and R. Fidel, eds. Seattle, ACM Press, New York: 229–237.

F. Sebastiani (2002). “Machine Learning in Automated Text Categorization.” ACM Computing Surveys 34(1): 1–47.

S. B. Seidman (1983). “Network Structure and Minimum Degree.” Social Networks 5: 269–287.

C. Shin , D. Doermann , and A. Rosenfeld (2001). “Classification of Document Pages Using Structure-Based Features.” International Journal on Document Analysis and Recognition 3(4): 232–247.

B. Shneiderman , D. Byrd , and W. B. Croft (1998). “Sorting Out Searching: A User Interface Framework for Text Searches.” Communications of the ACM 41(4): 95–98.

A. Silberschatz , and A. Tuzhilin (1996). “What Makes Patterns Interesting in Knowledge Discovery Systems.” IEEE Transactions on Knowledge and Data Engineering 8(6): 970–974.

C. Silverstein , S. Brin , and R. Motwani (1999). “Beyond Market Baskets: Generalizing Association Rules to Dependence Rules.” Data Mining and Knowledge Discovery 2(1): 39–68.

G. Siolas , and F. d'Alche-Buc (2000). Support Vector Machines Based on a Semantic Kernel for Text Categorization. In Proceedings of IJCNN-00, 11th International Joint Conference on Neural Networks. Como, Italy, IEEE Computer Society Press, Los Alamitos, CA: 205–209.

A. G. Skarmeta , A. Bensaid , and N. Tazi (2000). “Data Mining for Text Categorization with Semi-supervised Agglomerative Hierarchical Clustering.” International Journal of Intelligent Systems 15(7): 633–646.

S. Slattery , and M. Craven (1998). Combining Statistical and Relational Methods for Learning in Hypertext Domains. In Proceedings of ILP-98, 8th International Conference on Inductive Logic Programming. D. Page, ed. Madison, WI, Springer-Verlag, Heidelberg: 38–52.

S. Soderland (1999). “Learning Information Extraction Rules for Semi-Structured and Free Text.” Machine Learning 34(1–3): 233–272.

W. M. Soon , H. T. Ng , and D. C. Y. Lim (2001). “A Machine Learning Approach to Coreference Resolution in Noun Phrases.” Computational Linguistics 27(4): 521–544.

P. Soucy, and G. W. Mineau (2001b). A Simple KNN Algorithm for Text Categorization. In Proceedings of ICDM-01, IEEE International Conference on Data Mining. N. Cercone, T. Y. Lin, and X. Wu, eds. San Jose, CA, IEEE Computer Society Press, Los Alamitos, CA: 647–648.

E. Stamatatos , N. Fakotakis , and G. Kokkinakis (2000). “Automatic Text Categorization in Terms of Genre and Author.” Computational Linguistics 26(4): 471–495.

A. Sun, E.-P. Lim, and W.-K. Ng (2003a). “Hierarchical Text Classification Methods and Their Specification.” In Cooperative Internet Computing. A. T. Chan, S. Chan, H. Y. Leong, and V. T. Y. Ng, eds. Dordrecht, Kluwer Academic Publishers: 236–256.

A. Sun , E.-P. Lim , and W.-K. Ng (2003b). “Performance Measurement Framework for Hierarchical Text Classification.” Journal of the American Society for Information Science and Technology 54(11): 1014–1028.

H. Taira , and M. Haruno (2001). Text Categorization Using Transductive Boosting. In Proceedings of ECML-01, 12th European Conference on Machine Learning. L. D. Raedt and P. A. Flach, eds. Freiburg, Germany, Springer-Verlag, Heidelberg: 454–465.

A.-H. Tan (2001). Predictive Self-Organizing Networks for Text Categorization. In Proceedings of PAKDD-01, 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Hong Kong, Springer-Verlag, Heidelberg: 66–77.

C.-M. Tan , Y.-F. Wang , and C.-D. Lee (2002). “The Use of Bigrams to Enhance Text Categorization.” Information Processing and Management 38(4): 529–546.

D. R. Tauritz , J. N. Kok , and I. G. Sprinkhuizen-Kuyper (2000). “Adaptive Information Filtering Using Evolutionary Computation.” Information Sciences 122(2/4): 121–140.

D. R. Tauritz, and I. G. Sprinkhuizen-Kuyper (1999). Adaptive Information Filtering Algorithms. In Proceedings of IDA-99, 3rd Symposium on Intelligent Data Analysis. D. J. Hand, J. N. Kok, and M. R. Berthold, eds. Amsterdam, Springer-Verlag, Heidelberg: 513–524.

P. Thompson (2001). Automatic Categorization of Case Law. In Proceedings of ICAIL-01, 8th International Conference on Artificial Intelligence and Law. St. Louis, MO, ACM Press, New York: 70–77.

A. Tombros , R. Villa , and C. J. Rijsbergen (2002). “The Effectiveness of Query-Specific Hierarchic Clustering in Information Retrieval.” Information Processing & Management 38(4): 559–582.

K. Toutanova , F. Chen , K. Popat , and T. Hofmann (2001). Text Classification in a Hierarchical Mixture Model for Small Training Sets. In Proceedings of CIKM-01, 10th ACM International Conference on Information and Knowledge Management. H. Paques, L. Liu, and D. Grossman, eds. Atlanta, ACM Press, New York: 105–113.

D. Trastour , C. Bartolini , and C. Preist (2003). “Semantic Web Support for the Business-to-Business E-Commerce Pre-Contractual Lifecycle.” Computer Networks 42(5): 661–673.

P. D. Turney (2000). “Learning Algorithms for Keyphrase Extraction.” Information Retrieval 2(4): 303–336.

K. Tzeras , and S. Hartmann (1993). Automatic Indexing Based on Bayesian Inference Networks. In Proceedings of SIGIR-93, 16th ACM International Conference on Research and Development in Information Retrieval. R. Korfhage, E. M. Rasmussen, and P. Willett, eds. Pittsburgh, ACM Press, New York: 22–34.

L. A. Ureña-Lopez, M. Buenaga, and J. M. Gomez (2001). “Integrating Linguistic Resources in TC through WSD.” Computers and the Humanities 35(2): 215–230.

V. S. Uren , and T. R. Addis (2002). “How Weak Categorizers Based upon Different Principles Strengthen Performance.” The Computer Journal 45(5): 511–524.

V. Vapnik (1995). The Nature of Statistical Learning Theory. Berlin, Springer-Verlag.

O. Y. D. Vel , A. Anderson , M. Corney , and G. M. Mohay (2001). “Mining Email Content for Author Identification Forensics.” SIGMOD Record 30(4): 55–64.

J.-P. Vert (2001). Text Categorization Using Adaptive Context Trees. In Proceedings of CICLING-01, 2nd International Conference on Computational Linguistics and Intelligent Text Processing. A. Gelbukh, ed. Mexico City, Springer-Verlag, Heidelberg: 423–436.

P. Viechnicki (1998). A Performance Evaluation of Automatic Survey Classifiers. In Proceedings of ICGI-98, 4th International Colloquium on Grammatical Inference. V. Honavar and G. Slutzki, eds. Ames, IA, Springer-Verlag, Heidelberg: 244–256.

A. Vinokourov , and M. Girolami (2002). “A Probabilistic Framework for the Hierarchic Organisation and Classification of Document Collections.” Journal of Intelligent Information Systems 18(2/3): 153–172.

H. Wang , and N. H. Son (1999). Text Classification Using Lattice Machine. In Proceedings of ISMIS-99, 11th International Symposium on Methodologies for Intelligent Systems. A. Skowron and Z. W. Ras, eds. Warsaw, Springer-Verlag, Heidelberg: 235–243.

J. T. L. Wang , K. Zhang , G. Chang , and D. Shasha (2002). “Finding Approximate Patterns in Undirected Acyclic Graphs.” Pattern Recognition 35(2): 473–483.

W. Wang , W. Meng , and C. Yu (2000). Concept Hierarchy Based Text Database Categorization in a Metasearch Engine Environment. In Proceedings of WISE-00, 1st International Conference on Web Information Systems Engineering. Hong Kong, IEEE Computer Society Press, Los Alamitos, CA: 283–290.

S. Wasserman , and K. Faust (1994). Social Network Analysis: Methods and Applications. Cambridge, UK, Cambridge University Press.

A. S. Weigend, E. D. Wiener, and J. O. Pedersen (1999). “Exploiting Hierarchy in Text Categorization.” Information Retrieval 1(3): 193–216.

S. M. Weiss , C. Apte , F. J. Damerau , D. E. Johnson , F. J. Oles , T. Goetz , and T. Hampp (1999). “Maximizing Text-Mining Performance.” IEEE Intelligent Systems 14(4): 63–69.

S. Wermter (2000). “Neural Network Agents for Learning Semantic Text Classification.” Information Retrieval 3(2): 87–103.

S. Wermter , G. Arevian , and C. Panchev (1999). Recurrent Neural Network Learning for Text Routing. In Proceedings of ICANN-99, 9th International Conference on Artificial Neural Networks. Edinburgh, Institution of Electrical Engineers, London, UK: 898–903.

D. R. White , and K. P. Reitz (1983). “Graph and Semigroup Homomorphisms on Networks of Relations.” Social Networks 5: 193–234.

W. Wibowo , and H. E. Williams (2002). Simple and Accurate Feature Selection for Hierarchical Categorisation. In Proceedings of the 2002 ACM Symposium on Document Engineering. McLean, VA, ACM Press, New York: 111–118.

G. Wills (1999). “NicheWorks' Interactive Visualization of Very Large Graphs.” Journal of Computational and Graphical Statistics 8(2): 190–212.

J. W. Wong , W.-K. Kan , and G. H. Young (1996). “Action: Automatic Classification for Full-Text Documents.” SIGIR Forum 30(1): 26–41.

Z. Xu , K. Yu , V. Tresp , X. Xu , and J. Wang (2003). Representative Sampling for Text Classification Using Support Vector Machines. In Proceedings of ECIR-03, 25th European Conference on Information Retrieval. F. Sebastiani, ed. Pisa, Italy, Springer-Verlag, Berlin: 393–407.

C. C. Yang , H. Chen , and K. Hong (2003). “Visualization of Large Category Map for Internet Browsing.” Decision Support Systems 35: 89–102.

H.-C. Yang , and C.-H. Lee (2000b). Automatic Category Structure Generation and Categorization of Chinese Text Documents. In Proceedings of PKDD-00, 4th European Conference on Principles of Data Mining and Knowledge Discovery. D. Zighed, A. Komorowski, and D. Zytkow, eds. Lyon, France, Springer-Verlag, Heidelberg, Germany: 673–678.

Y. Yang (1995). Noise Reduction in a Statistical Approach to Text Categorization. In Proceedings of SIGIR-95, 18th ACM International Conference on Research and Development in Information Retrieval. E. A. Fox, P. Ingwersen, and R. Fidel, eds. Seattle, ACM Press, New York: 256–263.

Y. Yang (1999). “An Evaluation of Statistical Approaches to Text Categorization.” Information Retrieval 1(1/2): 69–90.

Y. Yang (2001). A Study on Thresholding Strategies for Text Categorization. In Proceedings of SIGIR-01, 24th ACM International Conference on Research and Development in Information Retrieval. W. B. Croft, D. J. Harper, D. H. Kraft, and J. Zobel, eds. New Orleans, ACM Press, New York: 137–145.

Y. Yang , T. Ault , T. Pierce , and C. W. Lattimer (2000). Improving Text Categorization Methods for Event Tracking. In Proceedings of SIGIR-00, 23rd ACM International Conference on Research and Development in Information Retrieval. N. J. Belkin, P. Ingwersen, and M.-K. Leong, eds. Athens, Greece, ACM Press, New York: 65–72.

Y. Yang, and C. G. Chute (1993). An Application of Least Squares Fit Mapping to Text Information Retrieval. In Proceedings of SIGIR-93, 16th ACM International Conference on Research and Development in Information Retrieval. R. Korfhage, E. Rasmussen, and P. Willett, eds. Pittsburgh, ACM Press, New York: 281–290.

Y. Yang , and C. G. Chute (1994). “An Example-Based Mapping Method for Text Categorization and Retrieval.” ACM Transactions on Information Systems 12(3): 252–277.

Y. Yang , and X. Liu (1999). A Re-examination of Text Categorization Methods. In Proceedings of SIGIR-99, 22nd ACM International Conference on Research and Development in Information Retrieval. M. Hearst, F. Gey, and R. Tong, eds. Berkeley, CA, ACM Press, New York: 42–49.

Y. Yang , S. Slattery , and R. Ghani (2002). “A Study of Approaches to Hypertext Categorization.” Journal of Intelligent Information Systems 18(2/3): 219–241.

Y. Yang , and J. W. Wilbur (1996b). “Using Corpus Statistics to Remove Redundant Words in Text Categorization.” Journal of the American Society for Information Science 47(5): 357–369.

Y. Yang , J. Zhang , and B. Kisiel (2003). A Scalability Analysis of Classifiers in Text Categorization. In Proceedings of SIGIR-03, 26th ACM International Conference on Research and Development in Information Retrieval. J. Callan, G. Cormack, C. Clarke, D. Hawking, and A. Smeaton, eds. Toronto, ACM Press, New York: 96–103.

J. Yi , and N. Sundaresan (2000). A Classifier for Semi-Structured Documents. In Proceedings of KDD-00, 6th ACM International Conference on Knowledge Discovery and Data Mining. Boston, ACM Press, New York: 340–344.

K. L. Yu , and W. Lam (1998). A New On-Line Learning Algorithm for Adaptive Text Filtering. In Proceedings of CIKM-98, 7th ACM International Conference on Information and Knowledge Management. G. Gardarin, J. French, N. Pissinou, K. Makki, and L. Bouganim, eds. Bethesda, MD, ACM Press, New York: 156–160.

O. Zamir, and O. Etzioni (1999). “Grouper: A Dynamic Clustering Interface to Web Search Results.” Computer Networks 31(11–16): 1361–1374.

S. Zelikovitz , and H. Hirsh (2001). Using LSI for Text Classification in the Presence of Background Text. In Proceedings of CIKM-01, 10th ACM International Conference on Information and Knowledge Management. H. Paques, L. Liu, and D. Grossman, eds. Atlanta, ACM Press, New York: 113–118.

D. Zhang , and W. S. Lee (2003). Question Classification Using Support Vector Machines. In Proceedings of SIGIR-03, 26th ACM International Conference on Research and Development in Information Retrieval. J. Callan, G. Cormack, C. Clarke, D. Hawking, and A. Smeaton, eds. Toronto, ACM Press, New York: 26–32.

J. Zhang, and Y. Yang (2003). Robustness of Regularized Linear Classification Methods in Text Categorization. In Proceedings of SIGIR-03, 26th ACM International Conference on Research and Development in Information Retrieval. J. Callan, G. Cormack, C. Clarke, D. Hawking, and A. Smeaton, eds. Toronto, ACM Press, New York: 190–197.

T. Zhang , and F. J. Oles (2001). “Text Categorization Based on Regularized Linear Classification Methods.” Information Retrieval 4(1): 5–31.

S. Zhong, and J. Ghosh (2003). “A Comparative Study of Generative Models for Document Clustering.” Knowledge and Information Systems: An International Journal 8: 374–384.

S. Zhou , and J. Guan (2002a). An Approach to Improve Text Classification Efficiency. In Proceedings of ADBIS-02, 6th East-European Conference on Advances in Databases and Information Systems. Y. M., and P. Navrat, eds. Bratislava, Slovakia, Springer-Verlag, Heidelberg: 65–79.

S. Zhou , and J. Guan (2002b). Chinese Documents Classification Based on N-Grams. In Proceedings of CICLING-02, 3rd International Conference on Computational Linguistics and Intelligent Text Processing. A. F. Gelbukh, ed. Mexico City, Springer-Verlag, Heidelberg: 405–414.
