An interim financial report is a financial report containing either a complete set of financial statements or a set of condensed financial statements for an interim period. An interim period is a financial reporting period shorter than a full financial year (IAS 34, 1998). The frequency of reporting differs from company to company; in this study, however, only quarterly interim financial reports are considered.
A structured representation of the financial position and financial performance of an entity is called a financial statement (IAS 1, 1997). This paper considers general purpose financial statements, and the entity is assumed to be a company. General purpose financial statements are "intended to serve users who are not in a position to require financial reports tailored to their particular information needs" (IAS 1, 1997). The objective of general purpose financial statements is to provide information about the financial position, financial performance, and cash flows of a company that is useful to a wide range of users in making economic decisions (IAS 1, 1997).
According to IAS 1 (1997) the following information is provided in the financial statements:
Income and expenses, including gains and losses
Contributions by and distributions to owners in their capacity as owners
In general, external users of financial statements can be classified into three groups (White, Sondhi and Fried, 2002):
Credit and equity investors
Government (executive and legislative branches), regulatory bodies, and tax authorities
The general public and special interest groups, labor unions, and consumer groups.
According to the Financial Accounting Standards Board (2007), the primary users of financial statements are creditors and equity investors.
The publication of a new financial report is followed by a market reaction, a behavior that has been studied since Beaver (1968). For users who trade on the financial markets in real time, it is therefore extremely important to be the first to obtain information from the interim financial reports.
The application of Natural Language Processing (NLP) techniques to text documents in order to isolate specific types of information is called Information Extraction (IE) (Lehnert, 1997).
The study by Ziebart (1987) shows that unexpected changes in profitability are the major source of abnormal trading volume. Thus, the most important information to be extracted from the interim financial reports is the income and expenses of a company. Because interim financial reports are published quarterly, four times per year, they also contain cumulative data: for the half year with the second-quarter results, for nine months with the third-quarter results, and for twelve months with the fourth-quarter results. However, only the results for the latest quarter carry new information, because the data for previous quarters are already known. Hence it is essential to extract the correct time period together with the numerical data (income and expenses).
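The relation between cumulative and quarterly figures can be illustrated with a minimal sketch. It assumes that year-to-date totals are available for each report in the year; the function name and the sample figures are hypothetical and serve only to show the arithmetic.

```python
def quarterly_from_cumulative(cumulative):
    """Turn year-to-date totals [3M, 6M, 9M, 12M] into standalone quarters.

    Each quarter is the difference between the current year-to-date total
    and the total reported with the previous interim report.
    """
    quarters = []
    previous = 0
    for total in cumulative:
        quarters.append(total - previous)
        previous = total
    return quarters


# Hypothetical year-to-date net sales (in millions), for illustration only.
year_to_date_sales = [100, 230, 390, 520]
print(quarterly_from_cumulative(year_to_date_sales))  # [100, 130, 160, 130]
```

The sketch also shows why the period label matters: a figure of 230 taken from the second-quarter report is a six-month total, not a quarterly result, and only the period label distinguishes the two.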
eXtensible Business Reporting Language (XBRL) has been developed to apply XML technologies to business reporting; it allows information in the reports to be marked up so that it is understood in a given context. For example, XBRL provides a guideline for creating a Balance Sheet that includes not only titles and numbers but also all the details needed to join those numbers together to form a Balance Sheet (Lymer and Debreceny, 2003). Nevertheless, XBRL is not yet widely used, and companies have only begun to adopt this technology. Currently, companies typically publish their financial information on the Web in HyperText Markup Language (HTML) or Adobe Acrobat's Portable Document Format (PDF) (Fisher, Oyelere and Laswad, 2002).
Companies distribute their interim financial reports through distribution agencies, and these agencies deliver the reports to their subscribers: professional and private investors, press agencies, and the media (Hugin Group, 2009). A new interim financial report is distributed to subscribers as a press release that contains a short summary of the report and a link to the full PDF version. Different technologies are used to distribute press releases, but in all of them the distribution agencies add tags specifying properties of the press release, such as the company name, the type of press release, and the release date. The summary of the interim financial report is published in the body of the press release as plain text.
In the paper "Conceptual Graph Interchange Format for Mining Financial Statements" (Kamaruddin, Hamdan, Bakar and Nor, 2009), the authors discuss a method for transforming financial statements into the conceptual graph interchange format (CGIF). The main goal of this method is to induce structure in the documents and make it easier to perform mining tasks on them (Kamaruddin, Hamdan, Bakar and Nor, 2009). The method consists of three main steps: information extraction, parsing, and conceptual graph generation. In the first step, the authors used the integrated development environment VisualText to identify and extract relevant performance indicators and phrases from the documents. The method discussed in that paper can serve as an initial stage for knowledge discovery in financial statements.
In the paper by Samejima, et al. (2010), the authors propose a method for extracting time-series numerical data from press releases using business keywords supplied by the user. The method identifies sentences in which keywords occur together with numerical data and time stamps. The authors assume that numerical data and time stamps occur close to each other in press releases. The output of the method is time-series numerical data. The targets of the extraction are plain sentences and itemizations; the authors deliberately ignore tables, assuming that tables are easy to process. A user of the proposed system can specify keywords to be extracted as well as keywords to be ignored. The method uses several techniques to improve the output: deleting redundant data, adding time stamps when numerical data occur in itemizations, and compensating for missing numerical data.
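The core idea of keyword-driven extraction can be sketched as follows. This is not the method of Samejima, et al. (2010) but a simplified illustration, assuming that the keyword, the time stamp, and the number all appear in the same sentence; the regular expressions, the function name, and the sample text are hypothetical.

```python
import re

# Hypothetical patterns, chosen for illustration only.
PERIOD_RE = re.compile(
    r"\b(?:Q[1-4]|(?:first|second|third|fourth) quarter)(?: of)? \d{4}\b",
    re.IGNORECASE,
)
AMOUNT_RE = re.compile(r"\b\d[\d,]*(?:\.\d+)?\b")


def extract_figures(text, keywords):
    """Return (keyword, period, amount) for sentences containing all three."""
    results = []
    # Split after sentence-ending punctuation followed by whitespace,
    # so that decimal points inside numbers do not end a sentence.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        hits = [k for k in keywords if k.lower() in lowered]
        if not hits:
            continue
        period = PERIOD_RE.search(sentence)
        if period is None:
            continue
        # Remove the period text so its year is not mistaken for an amount.
        rest = sentence[:period.start()] + sentence[period.end():]
        amount = AMOUNT_RE.search(rest)
        if amount is None:
            continue
        results.append((hits[0], period.group(0), amount.group(0)))
    return results


sample = (
    "Net sales were 120.5 million in the second quarter of 2009. "
    "The board met in Oslo. "
    "Operating expenses rose to 80 million in Q2 2009."
)
for row in extract_figures(sample, ["net sales", "operating expenses"]):
    print(row)
```

A sentence-level sketch like this returns nothing when the time stamp sits outside the sentence, for example above an itemization, which is the case the original method handles by adding time stamps to itemized data.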
Although studies have addressed the extraction of numerical data from press releases, and their methods may be applicable to interim financial reports, a gap remains in this area.
Firstly, the format of the interim financial reports should be taken into consideration. As noted above, the interim financial reports can include data for different time periods, but the data for the last quarter are the most important and affect trading on the financial markets. Thus, it is essential to extract the correct time period together with the numerical values.
Secondly, the data in the interim financial reports can be presented in different formats, and time stamps can appear outside sentences; for example, the time period can be mentioned once before a set of items with numbers. This problem cannot be solved by the method proposed by Kamaruddin, Hamdan, Bakar and Nor (2009), but it can be solved quite accurately with the method described in the paper by Samejima, et al. (2010).
Thirdly, large companies may report numerical data for the whole group as well as for individual departments. In this case it is extremely important to correctly identify the unit to which each figure applies.
Lastly, the numerical data in the interim financial reports may be presented only in a table, or the most informative data may be presented in a table. Neither of the proposed methods is able to process tables. Extracting data from tables in the interim financial reports is not as trivial as it seems at first sight. As mentioned above, companies distribute the interim financial reports as plain text, and in the case of tables there is no descriptive information about their format that can be easily interpreted by a machine.
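The difficulty can be illustrated with a minimal sketch of a plain-text table parser. It rests on strong assumptions that real press releases may not satisfy: the first non-empty line is a header of period labels, cells are separated by at least two spaces, and values are whole numbers with commas as thousands separators. The function name and the sample table are hypothetical.

```python
import re

CELL_SEPARATOR = re.compile(r"\s{2,}")  # assumed: two or more spaces


def parse_plain_text_table(lines):
    """Map each row label to a {period: value} dictionary."""
    non_empty = [line for line in lines if line.strip()]
    header, rows = non_empty[0], non_empty[1:]
    periods = CELL_SEPARATOR.split(header.strip())
    table = {}
    for row in rows:
        cells = CELL_SEPARATOR.split(row.strip())
        label, values = cells[0], cells[1:]
        if len(values) != len(periods):
            continue  # skip rows that do not line up with the header
        table[label] = {
            period: int(value.replace(",", ""))
            for period, value in zip(periods, values)
        }
    return table


sample_table = [
    "                     Q2 2009   Q2 2008",
    "Net sales              1,250     1,100",
    "Operating expenses       980       905",
]
print(parse_plain_text_table(sample_table))
```

As soon as a label contains a double space, a column is empty, or the period labels are spread over several header lines, these assumptions break, which is why table extraction from plain text needs more than a fixed splitting rule.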
VisualText is a product of the American company Text Analysis International. VisualText is positioned as an integrated development environment for building information extraction systems, natural language processing systems, and text analyzers. It is free for academic and personal use and has an open architecture. VisualText provides an easily understandable interface that can be used to build text analyzers not only by the original developers or by programmers specializing in natural language processing (NLP) but also by non-programming staff (Text Analysis International, 2003).
VisualText provides the NLP++ language (a general purpose programming language with extensions for NLP) for building text processing rules. NLP++ embodies syntax for pass files, contexts, rules, and actions (Text Analysis International, 2003). In addition to standard data types, NLP++ includes parse tree and knowledge base data types, which make it possible to address the rules, parse trees, and the knowledge base using C++-like code. VisualText includes a library that allows text analyzers to be called from the .NET environment.
VisualText processes the input text in multiple passes, and each pass operates on the parse tree as modified by the previous pass. A parse tree is a data structure that records and groups patterns discovered in the input text. The parse tree consists of elements called nodes. Nodes may carry information in the form of node variables and values (Text Analysis International, 2003).
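The multi-pass idea can be sketched in Python as follows. This is not NLP++ code and does not use the VisualText API; the Node class, the three passes, and the node names are hypothetical and only illustrate how each pass refines the tree left by the previous one and stores values in node variables.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    text: str = ""
    children: list[Node] = field(default_factory=list)
    variables: dict[str, object] = field(default_factory=dict)


def tokenize_pass(root):
    """Pass 1: split the input text into token nodes."""
    root.children = [Node("token", text=word) for word in root.text.split()]
    return root


def number_pass(root):
    """Pass 2: relabel numeric tokens and store their value in a variable."""
    for node in root.children:
        cleaned = node.text.replace(",", "")
        if cleaned.isdigit():
            node.name = "number"
            node.variables["value"] = int(cleaned)
    return root


def amount_pass(root):
    """Pass 3: group a number followed by 'million' under one amount node."""
    grouped = []
    i = 0
    while i < len(root.children):
        node = root.children[i]
        nxt = root.children[i + 1] if i + 1 < len(root.children) else None
        if node.name == "number" and nxt is not None and nxt.text.lower() == "million":
            amount = Node("amount", f"{node.text} {nxt.text}", [node, nxt])
            amount.variables["value"] = node.variables["value"] * 1_000_000
            grouped.append(amount)
            i += 2
        else:
            grouped.append(node)
            i += 1
    root.children = grouped
    return root


def analyze(text, passes):
    """Run the passes in order; each one sees the tree left by the previous."""
    tree = Node("root", text=text)
    for run_pass in passes:
        tree = run_pass(tree)
    return tree


tree = analyze("Net sales were 120 million", [tokenize_pass, number_pass, amount_pass])
print([child.name for child in tree.children])
print(tree.children[-1].variables["value"])
```

The order of the passes matters: the amount pass relies on the number labels and values written by the number pass, just as a later pass in a multi-pass analyzer relies on the tree produced by the earlier ones.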