Asia Pacific University Library catalogue


DERIVING SENTIMENTS FROM 10-K REPORTS USING TERM FREQUENCY (TF) ANALYSIS / FELICIA TAY SUE CHING.

By: FELICIA TAY SUE CHING (TP044602)
Contributor(s): Dr. Tan Chye Cheah [Supervisor.]
Material type: Text
Publication details: Kuala Lumpur : Asia Pacific University, 2019
Description: xii, 64 pages : illustrations ; 30 cm
Subject(s): Term-frequency analysis | Data structures (Computer science) | Computer systems -- Data -- Structure | Data mining
LOC classification: PM-31-48
Online resources: Available in APres - Requires login to view full text.
Dissertation note: A thesis submitted in fulfillment of the requirement for the award of the degree of M.Sc in Data Science and Business Analytics (UCMP1701DSBA)
Summary: The purpose of this study is to develop a step-by-step approach for analyzing 10-K reports by comparing term frequency (TF) using the bag-of-words method. A change in TF can be significant to the reader of a 10-K report because it reflects the corporation's action in amending the report, which might be due to changes in operations or fluctuations in corporate expectations. Nevertheless, the reason behind each change in TF is subject to the reader's interpretation. The main contribution of this study is in detailing the methodology of TF comparison: analysis of null terms, identification of the terms with the highest TF counts, and validation of these terms against the 10-K report. Several analytical tools were used to aid the analysis and to produce visualizations for a clearer understanding of the data. There are two stages of implementation. The first stage tests the current methodology of using a sentiment word-list to derive sentiment scores from the 10-K reports, while the second stage consists of the steps in TF comparison. The results of Stage 1 show that there might be confusion if sentiment scores were evaluated at face value. This is because the scores did not reflect any clear relationship with the financial performance of the corporations analysed, and the sentiment scores tended to be congruent with the length of the corpus, which might mislead interpretation of the scores. The Stage 2 implementation drills down into a microscopic view of the 10-K report through term frequencies, which are filtered through analytical tools to enable comparison of null terms and +TF/-TF terms. These filtered terms are then validated by identifying the sentences in which they occur, in order to derive the insight behind the TF differences. From the results of Stage 2, it was found that the comparative approach enables the analyser of the report to understand year-on-year changes better, as the TF differences are often indicative of changes in corporate direction or operational decisions during the year.
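The TF comparison described in the summary can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the tokenization rule, and the sample sentences are all hypothetical, and it assumes that a "null term" is a term that appears in one year's report but has a TF of zero in the other, and that "+TF/-TF" refers to terms whose count rose or fell between the two years.

```python
import re
from collections import Counter


def term_frequencies(text):
    """Bag-of-words TF: raw counts of lowercase alphanumeric tokens.

    The tokenization rule is an assumption; hyphenated forms such as
    "10-k" are kept as single tokens.
    """
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return Counter(tokens)


def compare_tf(tf_prev, tf_curr):
    """Split terms into null terms and +TF/-TF terms (assumed definitions)."""
    terms = set(tf_prev) | set(tf_curr)
    # Null terms: present in one year, absent (TF of zero) in the other.
    added = sorted(t for t in terms if tf_prev[t] == 0)
    dropped = sorted(t for t in terms if tf_curr[t] == 0)
    # +TF/-TF terms: present in both years with a changed count.
    changed = {
        t: tf_curr[t] - tf_prev[t]
        for t in terms
        if tf_prev[t] and tf_curr[t] and tf_curr[t] != tf_prev[t]
    }
    return added, dropped, changed


def sentences_containing(text, term):
    """Validation step: return the sentences in which a term occurs."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]


# Hypothetical excerpts standing in for two consecutive 10-K reports.
prev_text = ("The company faces litigation risk. Litigation costs increased. "
             "Revenue growth was strong.")
curr_text = ("The company faces cyber risk. Revenue growth was strong. "
             "Revenue growth continued.")

added, dropped, changed = compare_tf(term_frequencies(prev_text),
                                     term_frequencies(curr_text))
print("New terms:", added)
print("Dropped terms:", dropped)
print("TF changes:", changed)
print("Context for 'cyber':", sentences_containing(curr_text, "cyber"))
```

Raw counts are used here because the summary speaks of TF counts. Dividing each count by the total number of tokens would be one way to reduce the sensitivity to corpus length that the summary reports for the Stage 1 sentiment scores, but that normalization is an assumption and is not stated in the record.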

