Text data management and analysis : a practical introduction to information retrieval and text mining [electronic resources] / ChengXiang Zhai, Sean Massung.
Material type: Text
Series: ACM books ; #12.
Publication details: [New York, NY] : ACM Books ; [San Rafael, California] : Morgan & Claypool, c2016
Description: 1 online resource (xx, 510 pages) : illustrations
ISBN: 9781970001174 (pdf); 9781970001181 (epub)
Subject(s): Data mining | Natural language processing (Computer science) | Computational linguistics -- Statistical methods
DDC classification: 006.312
LOC classification: QA76.9.D343 | Z42 2016eb
Online resources: Available in ACM Digital Library. Requires Log In to view full text.

Item type | Current library | Collection | Call number | Copy number | Status | Date due | Barcode |
---|---|---|---|---|---|---|---|
General Circulation | APU Library Online Database | E-Book | QA76.9.D343 Z42 2016eb | 1 | Available | | |
Includes bibliographical references and index.
Part I. Overview and background -- 1. Introduction -- 1.1 Functions of text information systems -- 1.2 Conceptual framework for text information systems -- 1.3 Organization of the book -- 1.4 How to use this book -- Bibliographic notes and further reading -- 2. Background -- 2.1 Basics of probability and statistics -- 2.2 Information theory -- 2.3 Machine learning -- Bibliographic notes and further reading -- Exercises -- 3. Text data understanding -- 3.1 History and state of the art in NLP -- 3.2 NLP and text information systems -- 3.3 Text representation -- 3.4 Statistical language models -- Bibliographic notes and further reading -- Exercises -- 4. MeTA: a unified toolkit for text data management and analysis -- 4.1 Design philosophy -- 4.2 Setting up MeTA -- 4.3 Architecture -- 4.4 Tokenization with MeTA -- 4.5 Related toolkits -- Exercises --
Part II. Text data access -- 5. Overview of text data access -- 5.1 Access mode: pull vs. push -- 5.2 Multimode interactive access -- 5.3 Text retrieval -- 5.4 Text retrieval vs. database retrieval -- 5.5 Document selection vs. document ranking -- Bibliographic notes and further reading -- Exercises -- 6. Retrieval models -- 6.1 Overview -- 6.2 Common form of a retrieval function -- 6.3 Vector space retrieval models -- 6.4 Probabilistic retrieval models -- Bibliographic notes and further reading -- Exercises -- 7. Feedback -- 7.1 Feedback in the vector space model -- 7.2 Feedback in language models -- Bibliographic notes and further reading -- Exercises -- 8. Search engine implementation -- 8.1 Tokenizer -- 8.2 Indexer -- 8.3 Scorer -- 8.4 Feedback implementation -- 8.5 Compression -- 8.6 Caching -- Bibliographic notes and further reading -- Exercises -- 9. Search engine evaluation -- 9.1 Introduction -- 9.2 Evaluation of set retrieval -- 9.3 Evaluation of a ranked list -- 9.4 Evaluation with multi-level judgements -- 9.5 Practical issues in evaluation -- Bibliographic notes and further reading -- Exercises -- 10. Web search -- 10.1 Web crawling -- 10.2 Web indexing -- 10.3 Link analysis -- 10.4 Learning to rank -- 10.5 The future of web search -- Bibliographic notes and further reading -- Exercises -- 11. Recommender systems -- 11.1 Content-based recommendation -- 11.2 Collaborative filtering -- 11.3 Evaluation of recommender systems -- Bibliographic notes and further reading -- Exercises --
Part III. Text data analysis -- 12. Overview of text data analysis -- 12.1 Motivation: applications of text data analysis -- 12.2 Text vs. non-text data: humans as subjective sensors -- 12.3 Landscape of text mining tasks -- 13. Word association mining -- 13.1 General idea of word association mining -- 13.2 Discovery of paradigmatic relations -- 13.3 Discovery of syntagmatic relations -- 13.4 Evaluation of word association mining -- Bibliographic notes and further reading -- Exercises -- 14. Text clustering -- 14.1 Overview of clustering techniques -- 14.2 Document clustering -- 14.3 Term clustering -- 14.4 Evaluation of text clustering -- Bibliographic notes and further reading -- Exercises -- 15. Text categorization -- 15.1 Introduction -- 15.2 Overview of text categorization methods -- 15.3 Text categorization problem -- 15.4 Features for text categorization -- 15.5 Classification algorithms -- 15.6 Evaluation of text categorization -- Bibliographic notes and further reading -- Exercises -- 16. Text summarization -- 16.1 Overview of text summarization techniques -- 16.2 Extractive text summarization -- 16.3 Abstractive text summarization -- 16.4 Evaluation of text summarization -- 16.5 Applications of text summarization -- Bibliographic notes and further reading -- Exercises -- 17. Topic analysis -- 17.1 Topics as terms -- 17.2 Topics as word distributions -- 17.3 Mining one topic from text -- 17.4 Probabilistic latent semantic analysis -- 17.5 Extension of PLSA and latent Dirichlet allocation -- 17.6 Evaluating topic analysis -- 17.7 Summary of topic models -- Bibliographic notes and further reading -- Exercises -- 18. Opinion mining and sentiment analysis -- 18.1 Sentiment classification -- 18.2 Ordinal regression -- 18.3 Latent aspect rating analysis -- 18.4 Evaluation of opinion mining and sentiment analysis -- Bibliographic notes and further reading -- Exercises -- 19. Joint analysis of text and structured data -- 19.1 Introduction -- 19.2 Contextual text mining -- 19.3 Contextual probabilistic latent semantic analysis -- 19.4 Topic analysis with social networks as context -- 19.5 Topic analysis with time series context -- 19.6 Summary -- Bibliographic notes and further reading -- Exercises --
Part IV. Unified text data management analysis system -- 20. Toward a unified system for text management and analysis -- 20.1 Text analysis operators -- 20.2 System architecture -- 20.3 MeTA as a unified system --
Appendix A. Bayesian statistics -- Binomial estimation and the beta distribution -- Pseudo counts, smoothing, and setting hyperparameters -- Generalizing to a multinomial distribution -- The Dirichlet distribution -- Bayesian estimate of multinomial parameters -- Conclusion -- Appendix B. Expectation-maximization -- A simple mixture unigram language model -- Maximum likelihood estimation -- Incomplete vs. complete data -- A lower bound of likelihood -- The general procedure of EM -- Appendix C. KL-divergence and Dirichlet prior smoothing -- Using KL-divergence for retrieval -- Using Dirichlet prior smoothing -- Computing the query model p(w | [theta]q) -- References -- Index -- Authors' biographies.
Abstract freely available; full-text restricted to subscribers or individual document purchasers.
The growth of "big data" created unprecedented opportunities to leverage computational and statistical approaches to turn raw data into actionable knowledge that can support various application tasks. This is especially true for the optimization of decision making in virtually all application domains such as health and medicine, security and safety, learning and education, scientific discovery, and business intelligence. Just as a microscope enables us to see things in the "micro world" and a telescope allows us to see things far away, one can imagine a "big data scope" would enable us to extend our perception ability to "see" useful hidden information and knowledge buried in the data, which can help make predictions and improve the optimality of a chosen decision. This book covers general computational techniques for managing and analyzing large amounts of text data that can help users manage and make use of text data in all kinds of applications.
Mode of access: World Wide Web.
System requirements: Adobe Acrobat Reader.