Asia Pacific University Library catalogue


Text data management and analysis : (Record no. 383691)

000 -LEADER
fixed length control field 08224nam a22004577a 4500
003 - CONTROL NUMBER IDENTIFIER
control field APU
005 - DATE AND TIME OF LATEST TRANSACTION
control field 20221031122702.0
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 220307t20162016nyua b 001 0 eng d
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number 9781970001174 (pdf)
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number 9781970001181 (epub)
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
Cancelled/invalid ISBN 9781970001167
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
Cancelled/invalid ISBN 9781970001198
035 ## - SYSTEM CONTROL NUMBER
System control number (OCoLC)954593347
035 ## - SYSTEM CONTROL NUMBER
System control number (CaBNVSL)swl00406771
040 ## - CATALOGING SOURCE
Original cataloging agency CaBNVSL
Language of cataloging eng
Transcribing agency APU
Modifying agency SF
050 #4 - LIBRARY OF CONGRESS CALL NUMBER
Classification number QA76.9.D343
Item number Z42 2016eb
082 04 - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number 006.312
Edition number 23
100 1# - MAIN ENTRY--PERSONAL NAME
Personal name Zhai, ChengXiang,
9 (RLIN) 47408
245 10 - TITLE STATEMENT
Title Text data management and analysis :
Remainder of title a practical introduction to information retrieval and text mining
Medium [electronic resource] /
Statement of responsibility, etc ChengXiang Zhai, Sean Massung.
260 ## - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT)
Place of publication, distribution, etc [New York, NY] :
Name of publisher, distributor, etc ACM Books ;
260 ## - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT)
Place of publication, distribution, etc [San Rafael, California] :
Name of publisher, distributor, etc Morgan & Claypool,
Date of publication, distribution, etc c2016.
300 ## - PHYSICAL DESCRIPTION
Extent 1 online resource (xx, 510 pages) :
Other physical details illustrations.
490 1# - SERIES STATEMENT
Series statement ACM books,
International Standard Serial Number 2374-6777 ;
Volume number/sequential designation #12
504 ## - BIBLIOGRAPHY, ETC. NOTE
Bibliography, etc Includes bibliographical references and index.
505 0# - FORMATTED CONTENTS NOTE
Formatted contents note Part I. Overview and background -- 1. Introduction -- 1.1 Functions of text information systems -- 1.2 Conceptual framework for text information systems -- 1.3 Organization of the book -- 1.4 How to use this book -- Bibliographic notes and further reading -- 2. Background -- 2.1 Basics of probability and statistics -- 2.2 Information theory -- 2.3 Machine learning -- Bibliographic notes and further reading -- Exercises -- 3. Text data understanding -- 3.1 History and state of the art in NLP -- 3.2 NLP and text information systems -- 3.3 Text representation -- 3.4 Statistical language models -- Bibliographic notes and further reading -- Exercises -- 4. MeTA: a unified toolkit for text data management and analysis -- 4.1 Design philosophy -- 4.2 Setting up MeTA -- 4.3 Architecture -- 4.4 Tokenization with MeTA -- 4.5 Related toolkits -- Exercises --
505 8# - FORMATTED CONTENTS NOTE
Formatted contents note Part II. Text data access -- 5. Overview of text data access -- 5.1 Access mode: pull vs. push -- 5.2 Multimode interactive access -- 5.3 Text retrieval -- 5.4 Text retrieval vs. database retrieval -- 5.5 Document selection vs. document ranking -- Bibliographic notes and further reading -- Exercises -- 6. Retrieval models -- 6.1 Overview -- 6.2 Common form of a retrieval function -- 6.3 Vector space retrieval models -- 6.4 Probabilistic retrieval models -- Bibliographic notes and further reading -- Exercises -- 7. Feedback -- 7.1 Feedback in the vector space model -- 7.2 Feedback in language models -- Bibliographic notes and further reading -- Exercises -- 8. Search engine implementation -- 8.1 Tokenizer -- 8.2 Indexer -- 8.3 Scorer -- 8.4 Feedback implementation -- 8.5 Compression -- 8.6 Caching -- Bibliographic notes and further reading -- Exercises -- 9. Search engine evaluation -- 9.1 Introduction -- 9.2 Evaluation of set retrieval -- 9.3 Evaluation of a ranked list -- 9.4 Evaluation with multi-level judgements -- 9.5 Practical issues in evaluation -- Bibliographic notes and further reading -- Exercises -- 10. Web search -- 10.1 Web crawling -- 10.2 Web indexing -- 10.3 Link analysis -- 10.4 Learning to rank -- 10.5 The future of web search -- Bibliographic notes and further reading -- Exercises -- 11. Recommender systems -- 11.1 Content-based recommendation -- 11.2 Collaborative filtering -- 11.3 Evaluation of recommender systems -- Bibliographic notes and further reading -- Exercises --
505 8# - FORMATTED CONTENTS NOTE
Formatted contents note Part III. Text data analysis -- 12. Overview of text data analysis -- 12.1 Motivation: applications of text data analysis -- 12.2 Text vs. non-text data: humans as subjective sensors -- 12.3 Landscape of text mining tasks -- 13. Word association mining -- 13.1 General idea of word association mining -- 13.2 Discovery of paradigmatic relations -- 13.3 Discovery of syntagmatic relations -- 13.4 Evaluation of word association mining -- Bibliographic notes and further reading -- Exercises -- 14. Text clustering -- 14.1 Overview of clustering techniques -- 14.2 Document clustering -- 14.3 Term clustering -- 14.4 Evaluation of text clustering -- Bibliographic notes and further reading -- Exercises -- 15. Text categorization -- 15.1 Introduction -- 15.2 Overview of text categorization methods -- 15.3 Text categorization problem -- 15.4 Features for text categorization -- 15.5 Classification algorithms -- 15.6 Evaluation of text categorization -- Bibliographic notes and further reading -- Exercises -- 16. Text summarization -- 16.1 Overview of text summarization techniques -- 16.2 Extractive text summarization -- 16.3 Abstractive text summarization -- 16.4 Evaluation of text summarization -- 16.5 Applications of text summarization -- Bibliographic notes and further reading -- Exercises -- 17. Topic analysis -- 17.1 Topics as terms -- 17.2 Topics as word distributions -- 17.3 Mining one topic from text -- 17.4 Probabilistic latent semantic analysis -- 17.5 Extension of PLSA and latent Dirichlet allocation -- 17.6 Evaluating topic analysis -- 17.7 Summary of topic models -- Bibliographic notes and further reading -- Exercises -- 18. Opinion mining and sentiment analysis -- 18.1 Sentiment classification -- 18.2 Ordinal regression -- 18.3 Latent aspect rating analysis -- 18.4 Evaluation of opinion mining and sentiment analysis -- Bibliographic notes and further reading -- Exercises -- 19. Joint analysis of text and structured data -- 19.1 Introduction -- 19.2 Contextual text mining -- 19.3 Contextual probabilistic latent semantic analysis -- 19.4 Topic analysis with social networks as context -- 19.5 Topic analysis with time series context -- 19.6 Summary -- Bibliographic notes and further reading -- Exercises --
505 8# - FORMATTED CONTENTS NOTE
Formatted contents note Part IV. Unified text data management analysis system -- 20. Toward a unified system for text management and analysis -- 20.1 Text analysis operators -- 20.2 System architecture -- 20.3 MeTA as a unified system --
505 8# - FORMATTED CONTENTS NOTE
Formatted contents note Appendix A. Bayesian statistics -- Binomial estimation and the beta distribution -- Pseudo counts, smoothing, and setting hyperparameters -- Generalizing to a multinomial distribution -- The Dirichlet distribution -- Bayesian estimate of multinomial parameters -- Conclusion -- Appendix B. Expectation-maximization -- A simple mixture unigram language model -- Maximum likelihood estimation -- Incomplete vs. complete data -- A lower bound of likelihood -- The general procedure of EM -- Appendix C. KL-divergence and Dirichlet prior smoothing -- Using KL-divergence for retrieval -- Using Dirichlet prior smoothing -- Computing the query model p(w|[theta]q) -- References -- Index -- Authors' biographies.
506 ## - RESTRICTIONS ON ACCESS NOTE
Terms governing access Abstract freely available; full-text restricted to subscribers or individual document purchasers.
520 3# - SUMMARY, ETC.
Summary, etc The growth of "big data" created unprecedented opportunities to leverage computational and statistical approaches to turn raw data into actionable knowledge that can support various application tasks. This is especially true for the optimization of decision making in virtually all application domains such as health and medicine, security and safety, learning and education, scientific discovery, and business intelligence. Just as a microscope enables us to see things in the "micro world" and a telescope allows us to see things far away, one can imagine a "big data scope" would enable us to extend our perception ability to "see" useful hidden information and knowledge buried in the data, which can help make predictions and improve the optimality of a chosen decision. This book covers general computational techniques for managing and analyzing large amounts of text data that can help users manage and make use of text data in all kinds of applications.
538 ## - SYSTEM DETAILS NOTE
System details note Mode of access: World Wide Web.
538 ## - SYSTEM DETAILS NOTE
System details note System requirements: Adobe Acrobat Reader.
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element Data mining.
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element Natural language processing (Computer science)
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element Computational linguistics
General subdivision Statistical methods.
9 (RLIN) 47409
700 1# - ADDED ENTRY--PERSONAL NAME
Personal name Massung, Sean,
9 (RLIN) 47410
830 #0 - SERIES ADDED ENTRY--UNIFORM TITLE
Uniform title ACM books ;
Volume number/sequential designation #12.
9 (RLIN) 47379
856 48 - ELECTRONIC LOCATION AND ACCESS
Uniform Resource Identifier https://dl-acm-org.ezproxy.apu.edu.my/doi/book/10.1145/2915031
Public note Available in the ACM Digital Library. Login required to view full text.
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme
Koha item type E-Book
Holdings
Withdrawn status Not Withdrawn
Lost status Available
Source of classification or shelving scheme
Damaged status Not Damaged
Not for loan Available for loan
Collection code E-Book
Home library APU Library
Current library APU Library
Shelving location Online Database
Date acquired 07/03/2022
Source of acquisition OTHERS
Total Checkouts
Full call number QA76.9.D343 Z42 2016eb
Date last seen 07/03/2022
Copy number 1
Price effective from 07/03/2022
Koha item type General Circulation