Business Organisation Information Systems
2.0 Introduction
For the majority of business activities, information systems (IS) are the key enablers of operations and decision making. Often regarded as the ultimate information system, Enterprise Resource Planning (ERP) systems provide organisations with broad application functionality by supporting a major part of their business activities. Their aim is to solve the problem of fragmented information systems in large organisations by bringing together various types of data from different business activities in a consistent model of the business (Knolmayer and Rothlin, 2006). Unfortunately, these systems have been plagued with data quality issues, which undermine their main aim of providing the accurate information required for making essential business decisions.
This chapter brings into focus ERP systems and their functionalities, as well as the kind of data that exists within such systems. It then explains data quality and the data quality issues that affect the functionality of ERP systems. Finally, it presents the use of ontologies as a solution to such issues.
This research therefore aims
2.1 ERP Systems
In a large percentage of existing organisations, information systems are regarded as the key enablers of business operations and decision making. This is especially true of ERP systems, whose main function is to support and integrate all business functions, processes and units of an organisation, and to provide up-to-date and relevant information to decision makers, employees and business partners (Fotini et al, 2008). They not only encompass the “enterprise” and focus on “resources” but also facilitate tasks beyond planning, which include financial control, operational management, analysis and reporting, and routine decision support (Botta-Genoulaz and Millet, 2006).
Generally, ERP systems in business enterprises include the following functional modules: planning, production, inventory, marketing and sales, financial management, material management, etc. (Zhang and Liang, 2006). Accordingly, Scott (2002) defined ERP systems as “a suite of integrated corporate wide software applications that drive manufacturing, financial, distribution, Human Resources, and other business functions in a real-time environment”. These modules play vital roles in the various departments of a business enterprise and are used for everyday business transactions. This is because they provide reference models that, according to the manufacturers, embody the current best business practices by supporting organisational business processes (Botta-Genoulaz and Millet, 2006).
Their design promises to eliminate the problem and cost of operating disparate legacy systems by providing a single software system made up of a number of separate but integrated modules. These modules represent the functionalities of the various departments within the organisation. The implementation of an ERP system is expensive, requiring a multi-million dollar budget and large project teams (Xu et al, 2002). Despite the costs, many organisations the world over have deployed them. This research will focus on SAP’s ERP package because SAP is one of the best-known ERP packages on the market, with strengths in finance and accounting (Xu et al, 2002).
SAP’s ERP package is an integrated business system, which evolved from a concept first developed by five former IBM systems engineers in 1972. It is a software package designed to enable businesses to run a variety of business processes effectively and efficiently within a single integrated system. SAP stands for systems, applications and products in data processing. It is produced by SAP AG, based in Walldorf, Germany, which employs more than 22,000 people in more than 50 countries. SAP AG is the third-largest software company in the world and the largest ERP provider. SAP software is deployed at more than 22,000 business installations in more than 100 countries and is used by companies of all sizes, including more than half of the world’s 500 top companies (SAP AG Corporate Overview, 2000). The mySAP ERP system is therefore a suitable system to study in an effort to evaluate ERP environments.
The data in an ERP system can be seen from two different views: a business view and a technical view (Wieczorek et al, 2008). From the business view, data is divided into master data and transactional data. Master data refers to the core business entities a company uses repeatedly across many business processes and systems, e.g. hierarchies of customers, suppliers, accounts, products or organisational units (Brunner et al, 2007). It is “the data that has been cleansed, rationalized, and integrated into an enterprise-wide ‘system of record’ for core business activities” (Berson and Dubov, 2007, p8). It is therefore data that is created once and re-used many times. Transactional data, on the other hand, has a short life-span and is used for a specific transaction. It is always related to master data (Wieczorek et al, 2008). For example, an order for a product requires specific data such as the quantity of the product or the delivery deadline; such data is known as transactional data.
From a technical perspective, the difference between master data and transactional data is almost insignificant as transactional data must be stored in a database and therefore references the master data tables.
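The relationship between the two kinds of data can be illustrated with a minimal Python sketch. The record types, field names and sample values below are hypothetical and are not taken from SAP or from the cited sources; the sketch only shows that a transactional record carries its own short-lived values (quantity, delivery deadline) and refers to master data by key.

```python
from dataclasses import dataclass
from datetime import date

# Master data: created once and reused by many transactions (hypothetical fields).
@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str

@dataclass(frozen=True)
class Product:
    product_id: str
    description: str
    unit_price: float

# Transactional data: short-lived and always related to master data by key.
@dataclass(frozen=True)
class SalesOrder:
    order_id: str
    customer_id: str          # reference to a Customer master record
    product_id: str           # reference to a Product master record
    quantity: int
    delivery_deadline: date

# Illustrative master data tables, keyed by identifier.
customers = {"C001": Customer("C001", "Acme Ltd")}
products = {"P100": Product("P100", "Steel bolt M8", 0.25)}

def order_value(order: SalesOrder) -> float:
    """Price an order by resolving its reference into the product master data."""
    return order.quantity * products[order.product_id].unit_price

order = SalesOrder("SO-1", "C001", "P100", 400, date(2009, 8, 1))
print(order_value(order))
```

Because the order stores only the product key, any error in the master record (for instance a wrong unit price) would be reflected in every order that refers to it.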
Over the years, ERP systems have generally been advertised as a panacea, with the ability to eliminate the data quality issues that most legacy systems tend to have. They make use of relational database technology, which integrates data from the various functional modules in the system. One of the strongest arguments for ERP systems, as stated earlier, is that master data is entered just once and can be used multiple times in different contexts in an enterprise-wide system (Knolmayer and Rothlin, 2006), thereby eliminating redundancies and inaccuracies.
However, in reality, errors are most likely to occur during the capture of the master data, and such errors propagate through the system, ultimately affecting the transactional data in the ERP database. If such data is used to make business decisions, it may have an adverse effect on the organisation.
To overcome such problems, many organisations are currently developing and implementing data warehouses in order to reduce the costs associated with the provision of data to support business processes, and to achieve high estimated returns on investment (McFadden 1996). Developers of ERP systems have also adopted this approach by integrating data warehousing technology into their systems. Fig 2 below shows a diagram of the data warehouse based on an ERP system.
Data warehouse architecture of an ERP system in coal mining industry
Source: Zhang and Liang (2006)
The diagram above shows the flow of data from the functional modules of an ERP system to the data warehouse where it is cleaned and stored. The data can then be queried and used as a source of knowledge for making informed business decisions.
2.1.0 Data Warehouses
“A data warehouse is not a product but a concept to support an integrated and systematic data architecture to deliver high quality, decision relevant (data) structures” (Lehman and Jaszewski, 1999, pp.1). Devlin (1997) offers a simpler definition, describing a data warehouse as “a single, complete and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use in business context.” Chaudhuri and Dayal (1997) similarly defined data warehousing as a collection of decision support technologies, aimed at enabling the knowledge worker (executive, manager, and analyst) to make better and faster decisions.
A data warehouse is a “subject-oriented, integrated, time varying, non-volatile collection of data that is used primarily in organizational decision making” (Inmon, 1992). It supports on-line analytical processing (OLAP), which differs from the on-line transaction processing (OLTP) applications supported by traditional operational databases (Chaudhuri and Dayal, 1997). Data warehouses extract, cleanse, integrate and store vast amounts of data from ERP systems, thereby providing the support needed for timely and accurate responses to users’ queries (Zhang and Liang, 2006). They provide the information that is required for supporting executive decision making.
As stated earlier, data warehouses were integrated with ERP systems to solve data quality issues such as data duplication and redundancy. This is because they are designed to enable businesses to run a variety of business processes effectively and efficiently, based on the master data from a single repository, i.e. the data warehouse (Xu et al., 2002). Although considered a good solution, data warehouses have over the years proved to be only a temporary one, as the data they contain is still inaccurate and incomplete, which adversely affects the competitive success of the organisation (Redman, 1992). The next section provides an overview of data quality and the data quality issues that affect ERP systems.
2.2 Data Quality
Although many definitions of data quality have emerged in the literature, the most widely accepted and used is “fitness for use” (Wand and Wang, 1996; Strong et al, 1997; Tayi and Ballou, 1998; Wang, 1998; Orr, 1998). This means that any concept of quality can only be applied at the moment the data is used for some purpose (Dalcin, 2005). Simply put, the quality of data cannot be assessed without taking into consideration the people who use the data, in other words, the data consumers (Strong et al, 1997; Chrisman, 1991). In support of this concept, English (1999) stated that data within a database has no actual value or quality, but only possesses potential value. Its value is only realised when someone uses it for something.
Since data quality does not have a fully encompassing definition, it has been proposed as a multi-dimensional concept (Scannapieco and Missier, 2005; Strong et al, 1997; Lee and Strong, 2003). This is because data quality is defined across various dimensions in the literature. Some of the most typical dimensions are accuracy, reliability, importance, consistency, precision, timeliness, understandability, conciseness and usefulness (see Wang and Strong (1996) for an extensive list of data quality dimensions in the literature). Wang and Strong (1996) define a data quality dimension as “a set of data quality attributes that represent a single aspect or construct of data quality”.
Data Model Quality Factors
Source: Moody et al (1998).
For simplicity, Wang and Strong (1996) propose four categories of data quality dimensions: the intrinsic, accessibility, contextual, and representational categories, into which the dimensions have been grouped. Table 1 below shows the four categories with their associated dimensions.
Category: Dimensions
Intrinsic: Accuracy, objectivity, believability, reputation
Contextual: Relevancy, value-added, timeliness, completeness, amount of information
Representational: Interpretability, ease of understanding, concise representation, consistent representation
Accessibility: Access, security
Example of data quality dimension categories
Source:
Redman (2001) suggested that for data to be fit for use, it must satisfy the various dimensions, provide a proper level of detail, and be easy to read and interpret. Data quality, however, involves not only the achievement of the various dimensions but also data management, modelling and analysis, quality control and assurance, storage and presentation.
Wand and Wang (1996), on the other hand, take an alternative approach by defining the data quality dimensions using Bunge’s ontology. They identify five intrinsic data quality problems, which occur when data is incomplete, meaningless, ambiguous, redundant or incorrect. Following this train of thought, Shanks and Drake (1998) developed a framework that defines data quality goals for the intrinsic and contextual data quality categories, based on semiotic theory.
The semiotic theory is concerned with the use of symbols to pass on knowledge (Shanks and Corbitt, 1999). Out of the six levels proposed by Stamper (1992), only four are significant within the context of data quality: syntactic, semantic, pragmatic and social levels.
Semiotic Levels in Understanding Data Quality in a Data Warehouse
Source: Shanks and Corbitt (1999)
Syntactic data quality is concerned with how data is structured (Shanks and Corbitt, 1999). The main goal of syntactic data quality is consistency, where data attributes have a consistent symbolic representation (Ballou et al. 1996). Pragmatic data quality refers to the usage of data. Its main goals are usability and usefulness (Kahn et al. 1997). Social data quality is concerned with the shared understanding of the meaning of symbols. Its goals are an understanding of the different stakeholders’ points of view and an awareness of any biases (Shanks and Corbitt, 1999). Semantic data quality refers to the meaning of data and will be the main focus of this research. This level is discussed in more depth in the next section.
2.2.0 Semantic Data Quality
The derivation of semantic quality criteria is based on the work of Wand and Wang (1996) because, as opposed to other literature, it provides a unique, theoretically grounded approach to the definition of data quality criteria (Price and Shanks, 2004). As stated earlier, semantic data quality is concerned with the meaning of data (Shanks and Corbitt, 1999), with the main goal of achieving the highest level of data completeness and accuracy (Tayi and Ballou, 1998; Wang et al, 1995).
Accuracy refers to how well symbols represent states of the real world (Shanks and Corbitt, 1999). Pipino et al (2002) define it as “the extent to which data is correct and reliable” while Motro and Rakov (1998), present it as whether data available are the true values.
Batini and Scannapieco (2006) characterise semantic accuracy problems as “cases in which v is a syntactically correct value but different from v’”. Simply put, accuracy is the closeness between a value v and a value v’ that is considered the correct representation of the real-life phenomenon.
Completeness is defined as “the extent to which data are of sufficient breadth, depth, and scope for the task at hand”. The completeness dimension can be viewed from many perspectives. At the most abstract level, one can define the concept of schema completeness, which is the degree to which entities and attributes are not missing from the schema. At the data level, one can define column completeness as a function of the missing values in a column of a table (Pipino et al., 2002). For Wand and Wang (1996), completeness is the ability of an information system to represent every meaningful state of the represented real world system.
Veregin (1998) defines completeness as “a lack of errors of omission in a database” and describes two kinds of completeness: data completeness, a measurable error of omission observed between the database and the specification; and model completeness, the agreement between the database specification and the “abstract universe”, that is, the part of the real world for which data are required for a particular database application. Motro and Rakov (1998) define completeness as “whether all the data are available”. In database terminology, data completeness refers both to the completeness of files (no records are missing) and to the completeness of records (all fields are known for each record).
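Column completeness and record completeness can be made concrete with a short Python sketch. The ratio used here (one minus the share of missing values) is one simple way of expressing completeness as a function of missing values; it is an assumption for illustration, as are the sample records and field names, and it is not presented as the exact formula of any cited author.

```python
# Hypothetical customer master records; None marks a missing value.
records = [
    {"customer_id": "C001", "name": "Acme Ltd", "city": "Leeds"},
    {"customer_id": "C002", "name": "Bolt & Co", "city": None},
    {"customer_id": "C003", "name": None, "city": None},
    {"customer_id": "C004", "name": "Delta plc", "city": "York"},
]

def column_completeness(rows, column):
    """Share of rows in which `column` holds a value (1.0 means nothing is missing)."""
    if not rows:
        return 1.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return 1 - missing / len(rows)

def complete_records(rows):
    """Records for which every field is known."""
    return [row for row in rows if all(value is not None for value in row.values())]

for column in ("customer_id", "name", "city"):
    print(column, column_completeness(records, column))
print(len(complete_records(records)), "of", len(records), "records are complete")
```

A measure of this kind only counts omissions; it says nothing about whether the values that are present are accurate.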
As earlier stated, ERP systems depend largely on master data representing their customers, suppliers and products to carry out the various business activities. Master data is typically created once and re-used many times and does not change frequently. It is distributed through a controlled process which is supposed to ensure that all data is entered and approved with respect to business rules, and that every user and every system should receive new or updated master data on-demand (Knolmayer and Rothlin, 2006).
The major problem affecting master data is that its capture and processing are error-prone activities. Errors can be due either to human error when capturing the data or to the integration into the data warehouse of data from multiple sources that follow different semantic rules. For instance, duplicated or missing data will produce incorrect or misleading statistics; simply put, “garbage in, garbage out” (Rahm and Do, 2000). The data quality problems plaguing these systems can be classified into two categories: single-source problems and multi-source problems.
Source: Rahm and Do (2000)
As the above diagram shows, both the single-source and multi-source problems are categorised into schema level problems and instance level problems. Schema level problems are those that are reflected in the design of the data store. Instance level problems are the errors and inconsistencies in the actual data; they are not visible at the schema level, although problems at the schema level also affect the instance level.
Single-source problems can be grouped into those that occur in a single relation (a database or a file) and those that arise from the relationships among the various relations (Oliviera et al,). The data quality of a source depends on the schema and integrity constraints that control acceptable data (Rahm and Do, 2000). A schema is an internal representation of the world; an organization of concepts and actions that can be revised by new information about the world (Wordnet, ). For data sources that do not have a schema, such as files, there are no restrictions on what is permissible, so there is a very high probability of errors occurring. Databases, on the other hand, enforce restrictions based on the pre-defined data model and application-specific integrity constraints (Rahm and Do, 2000). Schema-related problems arise if the data model or the application-specific integrity constraints are inappropriate. In this case semantic integrity refers to the “preservation and consistency of database semantics across different applications”.
At the instance level, the errors that occur are due to misspelling, duplicated data, meaningless data, and contradictory data. These errors cannot be prevented at the schema level but can be influenced by errors at that level (Rahm and Do, 2000). Most of them are the result of human error when data is being entered into the system.
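The difference between what a schema can and cannot prevent may be sketched with Python’s built-in sqlite3 module. The tables, columns and constraints below are hypothetical and chosen only for illustration; they are not a model of any real ERP database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema with integrity constraints controlling acceptable data.
conn.execute("""
    CREATE TABLE customer (
        customer_id TEXT PRIMARY KEY,
        name        TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE sales_order (
        order_id    TEXT PRIMARY KEY,
        customer_id TEXT NOT NULL REFERENCES customer(customer_id),
        quantity    INTEGER NOT NULL CHECK (quantity > 0)
    )
""")

def try_insert(sql, params):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert("INSERT INTO customer VALUES (?, ?)", ("C001", "Acme Ltd"))
missing_name = try_insert("INSERT INTO customer VALUES (?, ?)", ("C002", None))
orphan_order = try_insert("INSERT INTO sales_order VALUES (?, ?, ?)", ("SO-1", "C999", 5))
bad_quantity = try_insert("INSERT INTO sales_order VALUES (?, ?, ?)", ("SO-2", "C001", 0))
# A misspelt name satisfies every constraint, so the schema cannot catch it.
misspelt = try_insert("INSERT INTO customer VALUES (?, ?)", ("C003", "Acme Ldt"))

print(accepted, missing_name, orphan_order, bad_quantity, misspelt)
```

The constraints reject a missing name, an order for an unknown customer and a non-positive quantity, but the misspelt customer name is accepted, which is the kind of instance level error that has to be handled by other means.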
Multi-source problems are aggravated single-source problems. Each source may contain dirty data, and the data in the sources may be represented differently, overlap or contradict one another (Rahm and Do, 2000). They occur at the schema level because the different data sources are governed by different schemas and integrity rules. The different sources have different data models, different naming conventions, etc. At the instance level, data from the various systems might mean the same thing but be represented differently. For example, gender may be represented as Female and Male in one source and as 0 and 1 in another. This leads to data inconsistencies and replication.
During data integration, there are two levels at which integration may occur: the extensional and the intensional levels.
The implications of such data quality problems for an enterprise cannot be overstated. The next section gives a brief description of the importance of data quality in a typical enterprise.
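The gender example can be sketched in Python as follows. The source names, the record layout and, in particular, the assumption that 0 stands for female and 1 for male are hypothetical; in practice the coding of each source has to be confirmed before the data is merged.

```python
# Per-source mappings onto one canonical coding ("F"/"M").
# ASSUMPTION: source B uses 0 for female and 1 for male.
SOURCE_A_GENDER = {"Female": "F", "Male": "M"}
SOURCE_B_GENDER = {0: "F", 1: "M"}

def normalise(record, gender_map):
    """Return a copy of `record` with its gender value in the canonical coding."""
    raw = record["gender"]
    if raw not in gender_map:
        # Fail loudly instead of guessing: an unmapped value is a data quality issue.
        raise ValueError(f"unmapped gender value: {raw!r}")
    return {**record, "gender": gender_map[raw]}

source_a = [{"id": "A-1", "gender": "Female"}, {"id": "A-2", "gender": "Male"}]
source_b = [{"id": "B-1", "gender": 1}, {"id": "B-2", "gender": 0}]

merged = (
    [normalise(r, SOURCE_A_GENDER) for r in source_a]
    + [normalise(r, SOURCE_B_GENDER) for r in source_b]
)
print([r["gender"] for r in merged])
```

Keeping one explicit mapping per source makes the semantic rule of each source visible, so a value that fits neither coding is reported instead of being silently loaded into the warehouse.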
2.2.1 Importance of data quality
Redman (1996) stated that poor data quality impacts a typical enterprise at various levels. At the operational level, poor data quality leads to customer dissatisfaction, increased cost and reduced employee job satisfaction. Operational cost increases because time and other financial and non-financial resources are dedicated to detecting and fixing errors. At the tactical level, poor data quality makes reengineering processes more difficult. At the strategic level, poor data quality makes it increasingly difficult to set and execute business strategy. It also contributes to issues of data ownership and diverts management’s attention.
Therefore, this research aims at improving the data models of ERP systems, because if the quality of the data model that defines the entities and attributes relevant to the user is improved, then data quality will also be improved (Reigner and Gregory, 1994; Fox et al, 1994). Although the data modelling phase represents only a small portion of the total development effort, its impact on the final result is probably greater than that of any other phase. The data model forms the foundation for all later design work, and is a major determinant of the quality of the overall system design. This is because data models focus on structuring the entity and attribute portions of user requirements (Yoon et al, 2000). The data model is one of the most critical components in the entire system’s development. To this end, this research will adopt the use of ontologies to improve the data model of the ERP system. The next section gives a brief introduction to ontologies and their uses. It also highlights how they will be used to improve the data model.
2.3 Ontology
Ontologies have gained popularity within the Information Technology community because they serve as a means of establishing an explicit formal vocabulary to share between applications (Noy, 2004). Aristotle defined ontology as the “science of being” (cited in Guarino and Giaretta, 1995). Traditionally studied in philosophy, ontology is “the metaphysical study of the nature of being and existence”. Smith and Welty (2001) defined it in a different manner when they presented ontology as “the science of what is, of the kinds and structures of objects, properties, events, processes, and relations in every area of reality”.
Ontology is a well-established theoretical domain within philosophy dealing with identifying and understanding elements of the real world and their meaning. It is one of the many terms borrowed by computer science (Antoniou and Harmelen, 2004), where the most commonly used definition is given by Gruber (1993), who defines an ontology as “an explicit specification of a conceptualisation”. A conceptualisation is simply the way the world or a particular domain is viewed; an ontology therefore describes a domain in terms of its concepts and relationships (Horridge et al, 2004). While Guarino and Giaretta (1995) have been critical in their definition, it is mostly philosophical in nature. Gruber, on the other hand, communicates the idea of an ontology in a simple and precise manner, which makes his the most generally accepted definition.
Wang et al (), on the other hand, define an ontology as “a formal explicit specification of a shared conceptualization of a domain. It represents the concepts and their relations that are relevant for a given domain of discourse. It consists of a representational vocabulary with precise definitions of the meanings of the terms of this vocabulary plus a set of axioms.” Offering a simpler definition, Fensel (2001) said that an ontology “provides a shared and common understanding of a domain that can be communicated between people and heterogeneous, widely spread application systems.”
Ontologies as a Solution
The essence of information systems is that they are designed to be a faithful representation of the world as humans perceive it. Theories of ontology therefore provide the basis for understanding and documenting the real-world semantics of the data (Daga et al, 2005). Data models have been used in the context of information systems for many decades with the main aim of creating representations of reality. They are used in organisations to represent reality at three different levels: to establish the highest level of description of an organisation’s reality, to construct a description of the reality surrounding a proposed information system and, finally, to model the parts of an organisation’s reality that lead to implementation in an operational database (Kazimerczak and Milton, 2005). Ontologies are therefore proposed as a method of improving data models, as they present a shared understanding of a domain. They enable shared communication and understanding between people with different needs and viewpoints arising from their different contexts. This allows for the creation of normative models which are extensible for future purposes and which define the semantics of the system (Uschold and Gruniger, 1996).
Ontologies also promote consistency and reduce the ambiguity of data that exists in different systems. They provide an easy-to-reuse library of objects, attributes, relationships, etc., so that integrating data from different systems becomes easier.
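As a minimal illustration of how a shared vocabulary can reduce ambiguity across systems, the following Python sketch maps the local terms of three hypothetical systems onto a small set of shared concepts linked by is-a relationships. The concepts, system names and terms are invented for illustration, and a plain dictionary stands in for what would normally be expressed in a dedicated ontology language.

```python
# Shared concepts and their is-a parent (hypothetical mini-ontology).
IS_A = {
    "Customer": "BusinessPartner",
    "Supplier": "BusinessPartner",
    "BusinessPartner": "Party",
}

# Local vocabulary of each system mapped onto the shared concepts.
TERM_TO_CONCEPT = {
    ("sales", "client"): "Customer",
    ("finance", "debtor"): "Customer",
    ("purchasing", "vendor"): "Supplier",
}

def ancestors(concept):
    """All concepts reachable from `concept` by following is-a links upwards."""
    found = []
    while concept in IS_A:
        concept = IS_A[concept]
        found.append(concept)
    return found

def same_concept(term_a, term_b):
    """True when two (system, term) pairs denote the same shared concept."""
    return TERM_TO_CONCEPT[term_a] == TERM_TO_CONCEPT[term_b]

def is_a(term, concept):
    """True when a local term denotes `concept` or one of its specialisations."""
    own = TERM_TO_CONCEPT[term]
    return own == concept or concept in ancestors(own)

print(same_concept(("sales", "client"), ("finance", "debtor")))
print(is_a(("purchasing", "vendor"), "BusinessPartner"))
```

In this sketch the sales system’s “client” and the finance system’s “debtor” resolve to the same concept, so their records can be integrated without relying on matching column names.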
Conventional Solutions
A wide range of research has been carried out on the issue of data quality, with numerous methods proposed for overcoming such issues. This research will focus on work concerned with data integration and data warehouses. One such method, proposed by Goh et al (1999) and Mandick and Zhu (2006), is a flexible query answering system named COntext INterchange (COIN). The system allows users to query data in multiple sources without worrying about most of the syntactic and semantic differences between those sources (Mandick et al, 2009). COIN understands the context of the data sources and of the data consumers, and attempts to overcome data misinterpretation problems by converting data into forms that users prefer and can understand (Mandick et al, 2009). Although the COIN method has proved useful in solving some aspects of data quality, it ignores the presence of data quality issues within the source systems.
There has also been significant research in entity resolution and schema matching. Schema matching (Rahm and Bernstein, 2001; Doan and Halevy, 2005) entails the development of techniques to match the various data schemas automatically or semi-automatically; the result can be used to construct a global schema for the data warehouse. Entity resolution (Wang and Mandick, 1989; Talburt et al. 2005), also known as record linkage (Winkler, 2006) and object identification (Tejada et al. 2001), provides techniques that are used to improve completeness, resolve inconsistencies and eliminate redundancies during the data integration process (Mandick et al, 2009). Other conventional solutions include data cleansing, data monitoring, etc.
All of this research is significant in its own right, as there is no single perfect approach to solving a problem. The next section provides a comparison between ontologies and the conventional approaches as solutions to data quality issues, and argues why ontologies are considered the best solution in this research.
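A very simple form of entity resolution can be sketched in Python using string similarity from the standard library’s difflib module. This is only an illustration of the general idea and not the technique of any of the cited authors; the sample names, the normalisation rules and the 0.85 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher

def normalise(name):
    """Lower-case, drop simple punctuation and collapse whitespace."""
    cleaned = name.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())

def similarity(a, b):
    """Similarity between two names in the range 0..1 after normalisation."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

def find_duplicates(names, threshold=0.85):
    """Pairs of names similar enough to be candidate duplicates (threshold is illustrative)."""
    pairs = []
    for i, left in enumerate(names):
        for right in names[i + 1:]:
            if similarity(left, right) >= threshold:
                pairs.append((left, right))
    return pairs

# Hypothetical customer names drawn from different source systems.
names = ["Acme Ltd.", "ACME Ltd", "Acme Ldt", "Delta plc"]
for left, right in find_duplicates(names):
    print(left, "<->", right)
```

Comparing every pair of records is only feasible for small data sets, and the candidate pairs would normally be reviewed or scored on further attributes before records are merged.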
Ontologies vs. Conventional Approaches
References
Botta-Genoulaz and Millet (2006) An investigation into the use of ERP in the service sector. International Journal of Production Economics, Vol. 99 (1-2) pp. 202-221
Fotini, M., Anthi-Maria, S. and Euripidis, L. (2008) ERP Systems Business Value: A Critical Review of Empirical Literature. Panhellenic Conference on Informatics, pp. 186-190
Knolmayer, G. F. and Röthlin, M. (2006) Quality of Material Master Data and Its Effect on the Usefulness of Distributed ERP Systems. Lecture Notes in Computer Science, Vol. 4231 pp. 362-371
Scott, T. (2002), "Aligning your data collection and ERP implementation decisions", IT Papers, available at: http://www.autoscan.biz/images/PDF/resource/aligning%20your%20data%20collection%20and%20ERP%20implementation%20decisions.pdf. Accessed 22nd July, 2009
Uschold, M. and Gruniger, M. (1996) Ontologies: Principles, methods and applications. Knowledge Engineering Review, Vol. 11(2) pp. 93-155
Wieczorek, S., Stefanescu, A., and Schieferdecker, I. (2008) Test Data Provision for ERP Systems. International Conference on Software Testing, Verification, and Validation. pp. 396-403
Wordnet. Available at: http://wordnetweb.princeton.edu/perl/webwn?s=schema
Xu, H., Nord H. J., Brow, N., Nord, D. G. (2002). Data quality issues in implementing an ERP. Journal of Industrial Management & Data Systems, Vol 102(1) pp. 47-58
Zhang, H. and Liang Y. (2006) A Knowledge warehouse system for enterprise resource planning systems. Systems and Behavioural Science, Vol. 23(2) pp. 169-176