000 08224nam a22004577a 4500
003 APU
005 20221031122702.0
008 220307t20162016nyua b 001 0 eng d
020 _a9781970001174 (pdf)
020 _a9781970001181 (epub)
020 _z9781970001167
020 _z9781970001198
035 _a(OCoLC)954593347
035 _a(CaBNVSL)swl00406771
040 _aCaBNVSL
_beng
_cAPU
_dSF
050 4 _aQA76.9.D343
_bZ42 2016eb
082 0 4 _a006.312
_223
100 1 _aZhai, ChengXiang.,
_947408
245 1 0 _aText data management and analysis :
_ba practical introduction to information retrieval and text mining
_h[electronic resource] /
_cChengXiang Zhai, Sean Massung.
260 _a[New York, NY] :
_bACM Books ;
260 _a[San Rafael, California] :
_bMorgan & Claypool,
_cc2016.
300 _a1 online resource (xx, 510 pages) :
_billustrations.
490 1 _aACM books,
_x2374-6777 ;
_v#12
504 _aIncludes bibliographical references and index.
505 0 _aPart I. Overview and background -- 1. Introduction -- 1.1 Functions of text information systems -- 1.2 Conceptual framework for text information systems -- 1.3 Organization of the book -- 1.4 How to use this book -- Bibliographic notes and further reading -- 2. Background -- 2.1 Basics of probability and statistics -- 2.2 Information theory -- 2.3 Machine learning -- Bibliographic notes and further reading -- Exercises -- 3. Text data understanding -- 3.1 History and state of the art in NLP -- 3.2 NLP and text information systems -- 3.3 Text representation -- 3.4 Statistical language models -- Bibliographic notes and further reading -- Exercises -- 4. MeTA: a unified toolkit for text data management and analysis -- 4.1 Design philosophy -- 4.2 Setting up MeTA -- 4.3 Architecture -- 4.4 Tokenization with MeTA -- 4.5 Related toolkits -- Exercises --
505 8 _aPart II. Text data access -- 5. Overview of text data access -- 5.1 Access mode: pull vs. push -- 5.2 Multimode interactive access -- 5.3 Text retrieval -- 5.4 Text retrieval vs. database retrieval -- 5.5 Document selection vs. document ranking -- Bibliographic notes and further reading -- Exercises -- 6. Retrieval models -- 6.1 Overview -- 6.2 Common form of a retrieval function -- 6.3 Vector space retrieval models -- 6.4 Probabilistic retrieval models -- Bibliographic notes and further reading -- Exercises -- 7. Feedback -- 7.1 Feedback in the vector space model -- 7.2 Feedback in language models -- Bibliographic notes and further reading -- Exercises -- 8. Search engine implementation -- 8.1 Tokenizer -- 8.2 Indexer -- 8.3 Scorer -- 8.4 Feedback implementation -- 8.5 Compression -- 8.6 Caching -- Bibliographic notes and further reading -- Exercises -- 9. Search engine evaluation -- 9.1 Introduction -- 9.2 Evaluation of set retrieval -- 9.3 Evaluation of a ranked list -- 9.4 Evaluation with multi-level judgements -- 9.5 Practical issues in evaluation -- Bibliographic notes and further reading -- Exercises -- 10. Web search -- 10.1 Web crawling -- 10.2 Web indexing -- 10.3 Link analysis -- 10.4 Learning to rank -- 10.5 The future of web search -- Bibliographic notes and further reading -- Exercises -- 11. Recommender systems -- 11.1 Content-based recommendation -- 11.2 Collaborative filtering -- 11.3 Evaluation of recommender systems -- Bibliographic notes and further reading -- Exercises --
505 8 _aPart III. Text data analysis -- 12. Overview of text data analysis -- 12.1 Motivation: applications of text data analysis -- 12.2 Text vs. non-text data: humans as subjective sensors -- 12.3 Landscape of text mining tasks -- 13. Word association mining -- 13.1 General idea of word association mining -- 13.2 Discovery of paradigmatic relations -- 13.3 Discovery of syntagmatic relations -- 13.4 Evaluation of word association mining -- Bibliographic notes and further reading -- Exercises -- 14. Text clustering -- 14.1 Overview of clustering techniques -- 14.2 Document clustering -- 14.3 Term clustering -- 14.4 Evaluation of text clustering -- Bibliographic notes and further reading -- Exercises -- 15. Text categorization -- 15.1 Introduction -- 15.2 Overview of text categorization methods -- 15.3 Text categorization problem -- 15.4 Features for text categorization -- 15.5 Classification algorithms -- 15.6 Evaluation of text categorization -- Bibliographic notes and further reading -- Exercises -- 16. Text summarization -- 16.1 Overview of text summarization techniques -- 16.2 Extractive text summarization -- 16.3 Abstractive text summarization -- 16.4 Evaluation of text summarization -- 16.5 Applications of text summarization -- Bibliographic notes and further reading -- Exercises -- 17. Topic analysis -- 17.1 Topics as terms -- 17.2 Topics as word distributions -- 17.3 Mining one topic from text -- 17.4 Probabilistic latent semantic analysis -- 17.5 Extension of PLSA and latent Dirichlet allocation -- 17.6 Evaluating topic analysis -- 17.7 Summary of topic models -- Bibliographic notes and further reading -- Exercises -- 18. Opinion mining and sentiment analysis -- 18.1 Sentiment classification -- 18.2 Ordinal regression -- 18.3 Latent aspect rating analysis -- 18.4 Evaluation of opinion mining and sentiment analysis -- Bibliographic notes and further reading -- Exercises -- 19. Joint analysis of text and structured data -- 19.1 Introduction -- 19.2 Contextual text mining -- 19.3 Contextual probabilistic latent semantic analysis -- 19.4 Topic analysis with social networks as context -- 19.5 Topic analysis with time series context -- 19.6 Summary -- Bibliographic notes and further reading -- Exercises --
505 8 _aPart IV. Unified text data management analysis system -- 20. Toward a unified system for text management and analysis -- 20.1 Text analysis operators -- 20.2 System architecture -- 20.3 MeTA as a unified system --
505 8 _aAppendix A. Bayesian statistics -- Binomial estimation and the beta distribution -- Pseudo counts, smoothing, and setting hyperparameters -- Generalizing to a multinomial distribution -- The Dirichlet distribution -- Bayesian estimate of multinomial parameters -- Conclusion -- Appendix B. Expectation-maximization -- A simple mixture unigram language model -- Maximum likelihood estimation -- Incomplete vs. complete data -- A lower bound of likelihood -- The general procedure of EM -- Appendix C. KL-divergence and Dirichlet prior smoothing -- Using KL-divergence for retrieval -- Using Dirichlet prior smoothing -- Computing the query model p(w | [theta]q) -- References -- Index -- Authors' biographies.
506 _aAbstract freely available; full-text restricted to subscribers or individual document purchasers.
520 3 _aThe growth of "big data" created unprecedented opportunities to leverage computational and statistical approaches to turn raw data into actionable knowledge that can support various application tasks. This is especially true for the optimization of decision making in virtually all application domains such as health and medicine, security and safety, learning and education, scientific discovery, and business intelligence. Just as a microscope enables us to see things in the "micro world" and a telescope allows us to see things far away, one can imagine a "big data scope" would enable us to extend our perception ability to "see" useful hidden information and knowledge buried in the data, which can help make predictions and improve the optimality of a chosen decision. This book covers general computational techniques for managing and analyzing large amounts of text data that can help users manage and make use of text data in all kinds of applications.
538 _aMode of access: World Wide Web.
538 _aSystem requirements: Adobe Acrobat Reader.
650 0 _aData mining.
650 0 _aNatural language processing (Computer science)
650 0 _aComputational linguistics
_xStatistical methods.
_947409
700 1 _aMassung, Sean.,
_947410
830 0 _aACM books ;
_v#12.
_947379
856 4 8 _uhttps://dl-acm-org.ezproxy.apu.edu.my/doi/book/10.1145/2915031
_zAvailable in ACM Digital Library. Requires Log In to view full text.
942 _2lcc
_cE-Book
999 _c383691
_d383691