Chapter 2. Background and Related Work
During the last decade the amount of literature published in the field of eLearning has grown noticeably, as has the diversity in attitudes and viewpoints of people who work on this subject. The general background presented here with regard to eLearning includes the definition, details of different types and the concept of quality. Information quality within information systems (IS), web mining and information extraction techniques are the main areas on which the supporting literature is primarily focused. However, an in-depth explanation of each branch of these research fields is outside the scope of this literature review. The literature presented here is particularly focused on the subtopics of these large research areas which are directly applicable to this research.
The structure of this chapter is divided into three main parts: a general view of eLearning including definitions of eLearning, an overview of eLearning types and the concept of quality in eLearning; information quality (IQ) within ISs; and information extraction methods. Each section includes a number of subsections which address the factors that are relevant to this research.
In this part of the literature review, we focus on eLearning by providing a discussion about the definitions of eLearning, eLearning types and the concept of quality in eLearning. Moreover, in this section we lay the foundation for the general concept of quality in eLearning upon which the research will be based. This section also presents a discussion about the relationships between technology, users and content in an eLearning context.
The term eLearning is used in the literature and in business to describe many fields, such as online learning, web-based training, distance learning, distributed learning, virtual learning, or technology-based training. During recent decades, eLearning has been defined in many different ways. In any publication in the field of eLearning, it is important to ensure that the author's understanding matches that of the majority of the readers; therefore, the specific definition used should be stated first. Moreover, to reach a clearer understanding of what eLearning is, in this part of the thesis we present numerous definitions of eLearning as mentioned in the literature.
In general, most of the definitions of the term eLearning are used to express the exploitation of technologies which can be used to deliver learning (or learning materials) in an electronic format, most likely via the World Wide Web (WWW). Psaromiligkos and Retalis consider eLearning to be the systems which utilise the WWW as a delivery medium for static learning resources, such as instructional files, or as an interface onto interactive
The previous definitions look at eLearning in general; in more detail, eLearning can be in the form of courses or in the form of modules and smaller learning materials - it also could take various forms. Romiszowski takes these details into account and summarises the definitions encountered in the literature in a way that emphasises that eLearning can be a solitary, individual activity, or a collaborative group activity. It also suggests that both synchronous and asynchronous interactive forms can be engaged. Naidu also takes into consideration the differences in the forms of interaction when trying to formulate a general definition of eLearning:
"... educational processes that utilize information and communications technology to mediate asynchronous as well as synchronous learning and teaching activities."
The position adopted in this research is that eLearning entails the technology used to distribute the learning materials, the quality of these materials, and the interaction with learners. The definition of eLearning used in this research addresses these dimensions in terms of:
"... the use of new multimedia technologies and the Internet to improve the quality of learning by facilitating access to resources and services as well as remote exchange and collaborations"
As mentioned earlier, eLearning takes many different forms and includes numerous types of systems. In the extant literature eLearning types are defined along two main axes: the user context (individuals, groups or a community of users) and users' engagement and interactivity.
Romiszowski takes these details into account and summarises the definitions encountered in the following table, which emphasises that eLearning can be a solitary, individual activity, or a collaborative group activity. It also suggests that both synchronous and asynchronous interactive forms can be engaged.
Looking more deeply at the division of the forms of interactivity used in eLearning systems, there are two main types of eLearning: asynchronous and synchronous, depending on learning and teaching activities. Synchronous eLearning environments require tutors and learners, or online classmates, to be online at the same time, so that live interactions can take place between them. By contrast, Doherty describes an Asynchronous Learning Network (ALN) as a variety of eLearning systems which distribute learning materials and concepts in one direction at a time. Moreover, Spencer and Hiltz describe an ALN as a place where learners can interact with learning materials, tutors and other learners through the WWW at different times and from different places.
The focus of this research will be on a case where students log in to and use the system independently of other students and staff members, and where asynchronous methods are used for learning content, quality management and delivery. This case fits firmly into the general definition of the asynchronous eLearning environment.
Quality Concept in ELearning
The definition of eLearning adopted in this thesis represents three fundamental dimensions: technology, access and quality. The focus in this research will be on quality, which is considered a crucial issue for education in general, and for eLearning in particular. This section of the literature review will discuss concepts of quality in eLearning generally, and highlight the importance of content as the most critical factor for the overall quality.
Currently, there are two recognised challenges in eLearning: the demand for overall interoperability and the request for (high) quality. However, quality cannot be expressed and set by a simple definition, since in itself quality is a very abstract notion. In fact, it is much easier to notice the absence of quality than its presence.
Despite efforts to reach a comprehensive, universal definition of quality in eLearning, there is still a fundamental ambiguity surrounding the issue. One position is to consider quality as an evaluation of excellence, a stance which is primarily adopted by universities and education institutions. For example, in universities quality teaching and learning are promoted as the top priority, giving less attention to criteria or measurements regarding teaching input into courses, the learning outcomes, and the interactivity with the system.
Another trend is to consider the improvement in quality, where quality is improved by moving beyond the set conceptions applied, and generally moving in the direction of a flexible process of negotiation, which needs a very high level of quality capability from those involved.
Furthermore, quality can be viewed and considered from different aspects. Here, the SunTrust Equitable report illustrates what they perceive to be the value chain in eLearning in the form of a pyramid.
The content is the most critical factor of eLearning. Indeed, to be able to use the internet as a tool to improve learning, the content should not distract learners, but increase their interest for learning. Learning tools and enablers are also important in the learning procedure. In reality, providers of learning platforms and knowledge management systems are key in the successful delivery of content. These companies provide the necessary infrastructure to deliver learning content. Moreover, learning service providers (LSP) are the distribution channels for content providers. One of the challenges facing these knowledge hubs and LSPs is to ensure that the learners are receiving fresh content. Companies focused on educational e-tailing then complete the value pyramid of eLearning.
Looking at the pyramid, it can be clearly observed that content is the most critical component of learning through the internet. In a similar manner, Henry stated that eLearning is composed of three main aspects: content, technology and services. He also emphasised that content is the most significant factor. Although this thesis will focus on the quality of content delivered by eLearning as the most important criterion and the most influential in the overall level of learning quality, the specified context and the perspectives of users also need to be taken into account when defining quality in eLearning. It is also essential to classify suitable criteria to address this quality.
ELearning Technology, Users and Content
Although most eLearning explanations focus on the technology and not on the learning, it is important to keep the people eLearning is designed for in mind. Moreover, individual learning styles and required learning materials should be addressed first. Then a suitable electronic delivery method can be adopted. On their website (agelesslearner.com), Karl and Marcia Conner commented, in this regard, that "Maybe the 'e' should actually follow the word 'learning'".
Henry describes the content in a way that includes all delivered materials, including the materials which are usually offered in classroom-based learning and that are tailored for eLearning, in addition to any other knowledge the developer might offer.
In fact, eLearning systems are considered to be user-adaptive systems, where systems are designed to react to user performance and choices. Webber, Pesty and Balacheff describe user modelling as a central issue in the development of user-adaptive systems, whose behaviour is usually based on the users' preferences, goals, interests and knowledge. Moreover, they declare that a system can be considered user-adaptive when changes in its functionality, structure or interface can be monitored, in order to consider the different needs of users and, ultimately, their changing needs. In the area of eLearning, Heift and Nicholson believe that eLearning systems, as adaptive systems, are designed to meet the diverse requirements of students who have different levels of knowledge and backgrounds.
There is a significant base of literature and research in the area of adaptive systems, which usually base their behaviour on user models. In more detail, Kobsa explained that the user model often depends on one user or a group of users sharing the same profile, and that it characterises users' preferences, goals, interests and knowledge. Webber, Pesty and Balacheff note that there are two main problems relating to user modelling: identifying the relevant information to be modelled, and deciding which method is most suitable for determining the relevant information about the user. In fact, personalisation plays an important role in all areas of the e-era, especially in eLearning, as stated by Esposito, Licchelli and Semeraro, where the main issue is student modelling. This is the analysis of student behaviour and the prediction of future activities and learning performance. Furthermore, Ong and Ramachandran perceive that the literature on adaptive systems shows that by modelling the learner, the human tutor and the knowledge domain of instructional content, powerful pedagogical outcomes can be obtained.
Although eLearning systems are considered types of adaptive systems, the difference between the concept of the user and the concept of the student creates a fundamental problem in the eLearning area. In this context, Esposito, Licchelli and Semeraro believe that in a general web system the user is free to surf and the system attempts to predict future user steps using the user model in order to improve the interaction between the user and the system, while in an eLearning system the modelling has to improve the educational route, adapting it to the model of the student. As a result, it is essential to control and to assess student browsing. The system should not give students absolute freedom to decide their way through the content and learning materials. Rather, it should provide a specific educational path and offer continuous evaluation of student performance towards a defined pedagogical goal.
Although delivering web-based educational materials can be very useful, as the same content is distributed to a number of students and can be accessed regardless of time and place, this delivery would not be beneficial from a pedagogical point of view if the students, their level of knowledge and their learning style were not known. In fact, Sanatally and Senteni observe that the widely held principle of using the web simply as a form of distributed medium for learning materials does not add significant value to the learning process. This argument leads to the conviction of the importance of developing adaptive eLearning systems. Even if adaptive systems are focused on the interaction with users and on changing the course and the content dynamically according to their needs, and not on controlling the set sequence of a course, eLearning can exploit adaptive technologies to build learning environments that form user-specific sequencing. Tang and McCalla present the Paper Recommender System as a good example of this exploitation: the system was designed to give recommendations to students about what conference or journal papers to read, based on their level of understanding and knowledge.
We can see more clearly, as suggested by Conati and VanLehn, that the aim of adaptive systems is to build precise, interactively changing models of individual student learning, in order to use them as representations of how learners are progressing within the content of the course. Moreover, Papanikolaou et al. describe adaptivity as being system-controlled and, in most cases, as assisting in planning the content, planning the delivery and presentation of the learning materials, and supporting student navigation throughout the field of knowledge and problem solving. From this, it can be deduced that learner models generally characterise learner knowledge levels on the concepts of domain knowledge, pedagogical goals and learning preferences towards diverse styles of learning materials. In this context, they suggest that the domain model should be used in parallel with the learner model to provide a structure for the representation of learner knowledge of the defined domain. Using this procedure, tailored learning materials can be distributed to specific learners to be consistent with their requirements. This corresponds with the vision of Mittal et al., who realised that by creating several broad groups into which it is possible to segment target learners, it can be ensured that the content of learning materials for an absolute beginner is not the same as that for a student getting ready for an exam.
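The pairing of a learner model with a domain model can be illustrated with a minimal sketch. It is not taken from any of the systems cited above: the class names, the group labels ("beginner", "exam_revision"), the mastery scale from 0 to 1 and the 0.8 threshold are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Material:
    """One item in the (hypothetical) domain model."""
    title: str
    concept: str   # the domain concept this material teaches
    level: str     # learner group the material is written for


@dataclass
class LearnerModel:
    """A (hypothetical) learner model: a broad group plus per-concept mastery."""
    name: str
    group: str                                      # e.g. "beginner" or "exam_revision"
    knowledge: dict = field(default_factory=dict)   # concept -> mastery in [0, 1]


def select_materials(learner, domain, mastery_threshold=0.8):
    """Return materials written for the learner's group that cover
    concepts the learner has not yet mastered."""
    return [
        m for m in domain
        if m.level == learner.group
        and learner.knowledge.get(m.concept, 0.0) < mastery_threshold
    ]


# Illustrative data only.
domain = [
    Material("Intro to variables", "variables", "beginner"),
    Material("Intro to loops", "loops", "beginner"),
    Material("Loops: past exam questions", "loops", "exam_revision"),
]
beginner = LearnerModel("A", "beginner", {"loops": 0.9})
reviser = LearnerModel("B", "exam_revision", {"loops": 0.4})

beginner_titles = [m.title for m in select_materials(beginner, domain)]
reviser_titles = [m.title for m in select_materials(reviser, domain)]
```

Under these assumptions the beginner, who has already mastered loops, is offered only the introductory material on variables, while the student preparing for an exam is offered the revision material on loops.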
Nowadays, most student modelling systems follow the same method, in which the system's starting point is to create a reference template for a student, so that the expertise or intelligence encoded into the system can adapt the course organisation and content to the individual student. The use of this method to decide the style and level of content that a student should be offered, according to how students interact with the system, will lead to a more personalised learning experience. In the case of this research, the student and domain model did not entail the complexity of those built in adaptive systems; however, several of the underlying principles of available student and domain modelling techniques proved to be useful. The key issue in most adaptive systems that feature student and domain modelling is a sequence of complex data repositories that give highly precise values about student performance and completion against learning materials.
The focus in this research will be on measuring the quality of the content of learning materials distributed via eLearning systems, and establishing how the student will interact with the materials, how they will be able to extract the relevant information from the content and how the context of the online materials will help students to recognise the underlying structure of the content and easily access the parts in which they are interested. This research will gather empirical evidence using online questionnaires, which can be used to directly ask students about their preferences and perspectives.
This part of the literature review provided a general overview of eLearning, including definitions of eLearning, a note of eLearning types and consideration of the concept of quality in eLearning. It also identified the definition adopted for eLearning in this study and considered the type upon which this research will focus. Moreover, in this section we laid the foundation for the general concept of quality in eLearning upon which the research will be based. Finally, it presented a brief discussion about the relationships between technology, users and content in an eLearning context.
The next part of this chapter will discuss the concept of IQ within ISs; this will be used later on to set standards for IQ in the context of eLearning systems.
Information Quality in Information Systems
In this part of the literature review we will start with a brief discussion of the terms "data quality" and "information quality", and will shed some light on the concept of IQ within ISs and how it could be defined. We will also provide a comprehensive review of the major historical developments of IQ frameworks.
Data Quality (DQ) vs. Information Quality
During recent years, much work has been done to build quality frameworks for IQ dimensions. In the past, research focused on DQ, but due to the recent development of internet technologies, ISs today are providing users with information, not only data. Therefore, research attention has shifted to focus on IQ frameworks.
While some researchers explicitly distinguish between the terms "data" and "information", and explain information as data which has been processed in some way, it may sometimes be difficult to discriminate between them in practice.
Still, in some studies the term "information" is interchangeable with "data". Likewise, the term "data quality" is often used synonymously with "information quality". Consequently, in this study, the concept of information will be used in a broad sense, which covers the concept of data.
Before reviewing the research conducted to formulate (data/information) quality frameworks within ISs, we will first discuss the meaning of IQ and how it could be defined.
How Information Quality Could be Defined
Although it is important to set standards for IQ, it is a difficult and complex issue, particularly in the area of ISs, because there is no formal definition of IQ, as quality is dependent on the criteria applied to it. Furthermore, it is dependent on the targets, the environment and from which viewpoint we look at the IQ, that is, from the provider or the consumer perspective. Moreover, IQ is both a task-dependent and a subjective concept. Juran summarises these aspects of quality in his quality definition as "fitness for use". Similarly, Wang described DQ (which could apply to IQ) as data that is fit-for-use.
This description has been adopted by researchers because it brings to light the fact that IQ cannot be defined and evaluated without knowing its context. Defining IQ in a contextual approach seems to be logical because quality criteria, which could be used to assess IQ, can differ according to the context. In fact, IQ is expressed in the literature to be a multi-dimensional concept with varying attributed characteristics depending on the context of the information. However, taking into account the complexity of the IQ concept and that its measurement is expected to be multi-dimensional in nature, the prime issue in defining the quality of any IS is identifying the criteria by which the quality is determined. The criteria result from the multi-dimensional and interdependent nature of quality in ISs, and are dependent on the objectives and the context of the system. Thus, it is common to define IQ on the internet by identifying the main dimensions of quality. For that purpose, IQ frameworks are widely used to identify the important quality dimensions in a specific context. These dimensions can be used as a benchmark to improve the effectiveness of information systems, as described by Porter.
Information Quality Frameworks
Today, for any IS to be judged successful it first has to satisfy additional predefined quality criteria. An eLearning system is a special type of IS, so it is important to examine the literature relating to the traditional IS success models and the proposed quality frameworks, in order to test the possibility of extending these success models to identify content quality criteria in an eLearning context.
Much of the work done in IS success has its origins in the well-known DeLone and McLean (D&M) IS Success Model. This model provided a comprehensive taxonomy on IS success based on the analysis of more than 180 studies on IS success and it identified over 100 IS success measures during the analysis. It established that system quality, IQ, use, user satisfaction, individual and organisational impact were the most distinct elements of the IS success equation. In a later work, the authors confirmed the original taxonomy and their conclusion, namely that IS success was "a multidimensional and interdependent construct". Their model makes two important contributions to the understanding of IS success. First, it provides a scheme for categorising the multitude of IS success measures that have been used in the literature. Second, it suggests a model of temporal and causal interdependencies between the categories. The updated model, which was proposed in 2003, consists of six dimensions:
- Information quality, which concerns the system content issue. Web content should be personalised, complete, relevant, easy to understand and secure.
- System quality, which measures the desired characteristics of a web based system such as usability, availability, reliability and adaptability.
- Service quality
- Usage, which measures visits to a website, navigation within the site and information retrieval.
- User satisfaction, which measures users' opinions of the system and should cover the entire user experience cycle.
- Net benefits, which capture the balance of positive and negative impacts of the system on the users. Although this success measure is very important, it cannot be analysed and understood without system quality and IQ measurements.
In their model, DeLone and McLean defined three main dimensions for the quality: IQ, systems quality and service quality. Each one has to be measured separately, because singularly or jointly, they will affect subsequent system usage and user satisfaction.
In 1996, Wang and Strong proposed their DQ framework, which will be discussed in more detail in the following section. In their framework they categorised characteristics/attributes into four main types/factors: intrinsic, accessibility, contextual and representational. This method of categorising IQ factors and attributes proved to be a valuable methodology for defining IQ. Lately, several quality management projects in business and government have successfully used this framework.
Following Wang & Strong's DQ framework, diverse research efforts were made to identify IQ dimensions in different contexts. Although these frameworks varied in their approach and application, they shared some of the same characteristics concerning their classifications of the dimensions of quality.
In 1996, Gertz focused on finding possible solutions for the problems of modeling and managing the data quality and integrity of integrated data. He proposed a taxonomy of data quality characteristics that includes important attributes such as timeliness and completeness of local information sources. Redman's work, meanwhile, aimed to set up practical guidelines to analyze and improve information quality within business processes. He proposed a number of quality attributes grouped into six categories: Privacy, Content, Quality of Values, Presentation, Improvement and Commitment. In the same year, Zeist & Hendricks identified 32 IQ sub-characteristics grouped into 6 main IQ characteristics, which covered functionality, reliability, efficiency, usability, maintainability and portability.
Unlike the general-purpose IQ frameworks, in 1997 Jarke proposed a special-purpose framework in which he used the same hierarchical design established by Wang & Strong. He defined IQ criteria depending on the context and requirements of a specific application: Data Warehouse Quality (DWQ). In his framework, Jarke linked each operational quality goal for data warehouses to the criteria which describe this goal. The main defined criteria are accessibility, interpretability, usefulness, believability, and validation.
In 1998, Chen gave a list of IQ criteria with no special taxonomy. He did, however, propose a goal-oriented framework focusing mainly on time-oriented criteria such as response time and network delay. One year later, Alexander & Tate proposed their framework for IQ in the Web environment. This framework consisted of 6 main criteria: authority, accuracy, objectivity, currency, orientation and navigation. In the same year, Katerattanakul & Siau adapted Wang & Strong's DQ framework to propose their four-category IQ framework for individual websites. Furthermore, Shanks & Corbitt recommended a semiotic-based quality framework for information on the Web. This framework includes four semiotic levels. The syntactic level ensures that information is consistent, while the semantic level focuses on the completeness and accuracy of information. The pragmatic level is the third level, which covers the usability and usefulness of the information. The fourth level, the social level, ensures that information is understandable. Within their framework there are 11 quality dimensions distributed across the identified levels.
Dedeke in 2000 developed a conceptual IS quality framework that includes 5 categories: ergonomic, accessible, transactional, contextual and representational quality. Each category consists of a number of quality dimensions, such as availability, relevancy and conciseness. Zhu & Gauch, meanwhile, described 6 quality metrics for information retrieval on the web, including availability, authority, currency, information-to-noise ratio and cohesiveness.
Leung adapted Zeist & Hendricks's quality framework in 2001 and applied it to Intranet applications. He defined 6 main IQ characteristics: functionality, reliability, usability, efficiency, maintainability and portability. Each quality characteristic in the proposed framework includes a number of sub-characteristics.
Several studies of IS quality were undertaken in 2002. Eppler & Muenzenmayer suggested two main manifestations for their proposed framework: content quality and media quality. Content quality is focused on the quality of the presented information and consists of two categories: relevant information and sound information. Media quality is focused on the quality of the medium used to deliver the information and includes an optimized process category and a reliable infrastructure category. Each category in the framework contains a number of quality dimensions. Khan categorised IQ depending on the context of the system. The framework divided IQ into two main quality types: product and service quality. Moreover, it divided these two types into 4 quality classifications and each classification into a number of quality dimensions. The quality classifications are sound information, useful information, dependable information and usable information.
In addition, Klein conducted research in the same year to identify five IQ dimensions, chosen from Wang & Strong's DQ framework, to measure IQ in the Web context: accuracy, completeness, relevance, timeliness and amount of data. Mecella also proposed an initial framework for quality management in Cooperative Information Systems (CIS). This framework includes a model for quality data exported by cooperating organizations and the design of an infrastructure service for improving quality.
More recently, in 2005, Liu & Huang mentioned 6 key dimensions for IQ: source (focused on information availability), content (focused on information completeness), format and presentation (focused on information consistency), currency (focused on information currency and timeliness), accuracy (focused on information accuracy and reliability) and speed (focused on how easily information is downloadable).
Besiki et al. introduced in 2007 a general framework for IQ assessment. This framework consists of a comprehensive taxonomy of IQ dimensions, and provides a straightforward and powerful predictive method to study IQ problems and reason through them in a systematic and meaningful way.
Lately, Kimberly et al. presented in 2009 a model for how to think about IQ depending on the application context; they identified a number of common IQ metrics. Kargar & Azimzadeh also presented an original experimental framework for ranking IQ in Web logs. The results of their research revealed 7 IQ dimensions for Web logs. For each quality dimension, the coefficients associated with the quality variables were calculated and used so that the proposed framework is able to assess the IQ of Web logs automatically. In the same year, Thi & Helfert conducted research aimed at proposing a quality framework based on IS architecture. In their research they identified quality factors for different construct levels of IS architecture. Moreover, they presented the impacts amongst different quality factors, which help to analyze the causes of IS defects.
In this part we gave a brief review of the research conducted to formulate (data/information) quality frameworks within ISs. In the next section we will focus on Wang and Strong's DQ framework, as we will use it as a base for this research to measure IQ in eLearning systems along the dimensions of the framework.
Wang and Strong's Data Quality Framework
Wang & Strong's DQ framework, one of the most comprehensive, popular and widely cited DQ frameworks, was established by Richard Wang and Diane Strong in 1996. Their framework was designed empirically by asking users to give their viewpoints about the relevance of the IQ dimensions, in order to capture the aspects of DQ that are most important to the data consumer.
In their framework, Wang and Strong classified quality dimensions into four groups:
- Intrinsic DQ: refers to the quality dimensions originating from the data on its own. This aspect of quality is independent of the user's perspective and context.
- Contextual DQ: focuses on the aspect of IQ within the context of the task at hand. In this group, the quality dimensions are subjective preferences of the user. Contrary to the first group, DQ dimensions cannot be assessed without considering the user's viewpoint about their use of provided information.
- Representational DQ: is related to the representation of information within the systems.
- Accessibility DQ: refers to the quality aspects concerned with accessing distributed information.
The defining feature of this particular study is that quality attributes of data were collected from the data consumer instead of being defined theoretically or being based on the researchers' own experiences. Their research can provide a basis for measuring DQ/IQ along the dimensions of this framework.
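As an illustration of what measuring along the dimensions of the framework could look like in practice, the following minimal sketch averages questionnaire ratings per category. It is not Wang and Strong's procedure: the dimension-to-category mapping shows only a subset of dimensions and should be treated as illustrative, and the function name, the rating scale and the sample responses are all hypothetical.

```python
from statistics import mean

# Illustrative subset of dimensions, mapped to the four groups of the framework.
CATEGORY_OF = {
    "accuracy": "intrinsic",
    "believability": "intrinsic",
    "relevancy": "contextual",
    "timeliness": "contextual",
    "interpretability": "representational",
    "concise_representation": "representational",
    "accessibility": "accessibility",
    "access_security": "accessibility",
}


def category_scores(responses):
    """Average the ratings per category across all respondents.

    `responses` is a list of dicts, one per respondent, mapping a
    dimension name to a numeric rating (for example on a 1-5 scale).
    Dimensions a respondent did not rate are simply absent.
    """
    buckets = {}
    for response in responses:
        for dimension, rating in response.items():
            category = CATEGORY_OF[dimension]
            buckets.setdefault(category, []).append(rating)
    return {category: mean(ratings) for category, ratings in buckets.items()}


# Hypothetical questionnaire data from two respondents.
responses = [
    {"accuracy": 4, "relevancy": 2, "accessibility": 5},
    {"accuracy": 2, "timeliness": 4},
]
scores = category_scores(responses)
```

A plain average is only one possible aggregation; weighting the dimensions, or reporting each dimension separately, would be equally defensible choices depending on the evaluation context.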
In this part of the literature review we clarified the use of the terms "data quality" and "information quality", discussed the concept of IQ within ISs and considered how it could be defined. We also gave a historical review of the research conducted to formulate data/information quality frameworks within ISs, focusing on Wang and Strong's DQ framework, which provides a good basis for this research to measure IQ in eLearning systems along the dimensions of this framework.
However, this research will also investigate the possibility of integrating a web-mining approach, a data gathering technique, in order to automate the evaluation process. It seems logical, therefore, that the available methods for web-mining and information extraction are now reviewed. These will be discussed in the next section.
Information Extraction and Web-Mining
This study focuses not only on the evaluation of IQ in the context of eLearning systems but also on the possibility of integrating a web-mining approach, a data extraction technique, in order to automate the evaluation process. This part of the literature review provides a brief overview of how information is represented on the web. It then covers web-mining definitions and categories, and the idea of information extraction.
Information on the Web
Today, the web is an increasingly popular and interactive publishing medium, and the volume of information on it is growing rapidly. It holds a huge amount of distributed information for news, education, government, e-commerce and various other information services, together with a rich and dynamic collection of hyperlink information and webpage access and usage information. Web users can therefore access vast amounts of information, but it becomes ever more difficult to weed out the irrelevant and discover the relevant. This has drawn attention to a fundamental issue: information overload.
Web information is largely unstructured, so much of it can be understood only by humans, yet the massive amount of available information means that it can be processed efficiently only by machines. A lack of metadata (data about data) represents another challenge when dealing with published information.
To cope with these challenges, researchers started to apply techniques from data-mining and machine learning to web data and documents. Web-mining applications help users to find, sort and filter the available information, while the Semantic Web aims to make the data machine-understandable as well.
Extracting useful or valuable information from the web is usually referred to as "web-mining", that is, the application of data-mining methods to the discovery of useful information on the web.
Several definitions of web-mining exist in the literature. It can be generally defined as the automated discovery and analysis of useful information published in web documents and services using data-mining methods. It is a large and relatively new area that draws on several research fields, such as databases, information extraction and artificial intelligence. Web-mining techniques can be used to address the information overload problem.
There are three categories for web-mining according to the different sources of the target data:
- Web-content mining: addresses the discovery of knowledge from the content of web pages; the target data are therefore the text, images, multimedia, etc. contained in a web page.
- Web-usage mining: addresses the discovery of knowledge from user navigation data generated while surfing the web; the target data are therefore contained in users' log files.
- Web-structure mining: addresses the discovery of knowledge from hyperlinks on the web.
The focus in this research will be on web-content mining as a technique to automate the extraction process of the information needed in the quality measurement.
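The difference between the target data of web-content mining (page text) and of web-structure mining (hyperlinks) can be made concrete with a small sketch. The example below uses Python's standard html.parser module to separate the two from one page. It is a minimal illustration under stated assumptions, not the extraction method of this research: the class name PageMiner, the function mine_page and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class PageMiner(HTMLParser):
    """Collects visible text (content) and link targets (structure)."""

    def __init__(self):
        super().__init__()
        self.text_chunks = []   # target data for web-content mining
        self.links = []         # target data for web-structure mining

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Note: text inside <script> or <style> would also arrive here;
        # this sketch does not filter it out.
        stripped = data.strip()
        if stripped:
            self.text_chunks.append(stripped)


def mine_page(html):
    """Return (text content, list of hyperlinks) for one HTML document."""
    miner = PageMiner()
    miner.feed(html)
    miner.close()
    return " ".join(miner.text_chunks), miner.links


# Hypothetical eLearning page used only for illustration.
SAMPLE = ('<html><body><h1>Course Overview</h1>'
          '<p>Updated weekly. See the <a href="/syllabus">syllabus</a>.</p>'
          '</body></html>')
content, links = mine_page(SAMPLE)
```

Web-usage mining is not covered by this sketch, because its target data are server or client log files rather than the page itself.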
Web-Mining and Information Extraction (IE)
Natural language (NL) text is the most common medium for storing digital information. The main goal of information extraction (IE) is to find the required information in NL texts and store it in a form that is suitable for automatic querying and processing. IE involves defining output representations, or templates, and searching only for information that fits the defined representations.
This section of the literature review gave a brief account of how information is represented on the web. It also reviewed the definition of web-mining and its categories and, finally, introduced the idea of information extraction.
Conclusion
The literature review provided a general background to the subject of eLearning, including its definitions, types and the concept of quality, together with IQ within ISs and web-mining as an information extraction technique. The literature presented here focused mainly on the sub-topics of these larger research areas that are directly applicable to this research.
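The template idea can be sketched as follows: a template fixes in advance which fields are wanted, and extraction fills only those fields while ignoring everything else in the text. The Python example below is a minimal sketch that uses regular expressions as the extraction rules. The template fields (author, last_updated), the "Author:" and "Last updated:" line formats and the sample text are all assumptions made for illustration; practical IE systems typically use richer linguistic or learned rules.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class CourseTemplate:
    """Output representation: only these fields are searched for."""
    author: Optional[str] = None
    last_updated: Optional[str] = None


# One extraction rule per template field (hypothetical line formats).
PATTERNS = {
    "author": re.compile(r"^Author:[ \t]*(.+)$", re.MULTILINE),
    "last_updated": re.compile(
        r"^Last updated:[ \t]*(\d{4}-\d{2}-\d{2})[ \t]*$", re.MULTILINE),
}


def fill_template(text):
    """Fill the template from free text; unmatched fields stay None."""
    template = CourseTemplate()
    for field_name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            setattr(template, field_name, match.group(1).strip())
    return template


# Hypothetical page text used only for illustration.
SAMPLE_TEXT = (
    "Introduction to Databases\n"
    "Author: Jane Doe\n"
    "Last updated: 2010-04-15\n"
    "This course covers relational design.\n"
)
filled = fill_template(SAMPLE_TEXT)
empty = fill_template("No metadata here.")
```

Because the result is a structured record rather than free text, it can be queried automatically, for example to check how recently a resource was updated.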
References
- Paulsen, M.F., (2002) Online education systems: Discussion and definition of terms. NKI Distance Education,
- Romiszowski, A., (2004) How's the e-learning baby? Factors leading to success or failure of an educational technology innovation. Educational Technology. 44(1): p. 5-27
- Gerhard, J. and Mayr, P.(2002) Competing in the e-learning environment--strategies for universities. Proceedings of the 35th Annual Hawaii International Conference on System Sciences (HICSS'02).
- Psaromiligkos, Y. and Retalis, S., (2003) Re-evaluating the effectiveness of a web-based learning system: A comparative case study. Journal of educational multimedia and hypermedia. 12: p. 5-20
- Naidu, S. (2006) "E-learning: A guidebook of principles, procedures and practices". 2nd edition, Commonwealth Educational Media Center for Asia (CEMCA).
- European Commission. (2001) "The elearning action plan designing tomorrow's education". [cited 2007 04 October]; Available from: http://www.europa.eu/eur-lex/en/com/cnc/2001/com2001_0172en01.pdf.
- Holmes, B. and Gardner, J.R., (2006) E-learning: Concepts and Practice. 1st edition, Sage Publications Ltd.
- Rosen, A.,(2009) E-learning 2.0: Proven practices and emerging technologies to achieve real results, AMACOM (American Management Association), New York.
- Doherty, P. (1998). Learner control in asynchronous learning environments. ALN Magazine. Vol. 2.
- Spencer, D. and Hiltz, S.R. (2001) Studies of ALN: An empirical assessment. Proceedings of the 34th Hawaii International Conference on System Sciences (HICSS-34).
- Alex, D., Michael, C., and Peter, A., (1994) Risk management for software projects. 2nd edition McGraw-Hill Book Company, Berkshire, England.
- Crisp, G. (2002) A model for the implementation and sustainability of a course management system in a research university. Proceedings of the 19th Annual Conference of the Australasian Society for Computers in Learning in Tertiary Education.Auckland, New Zealand.
- Ehlers, U.-D., Goertz, L., Hildebrandt, B., and Pawlowski, J.M. (2005) Quality in e-learning: Use and dissemination of quality approaches in European e-learning. Report by the European Quality Observatory
- Close, R.C., Humphreys, R., and Ruttenbur, B.W. (2000) E-learning & technology: Technology & the internet are changing the way we learn Report By Sun Trust Equitable Securities
- Henry, P., (2001) E-learning technology, content and services. Education & Training. 43: p. 249-255
- Stracke, C.M., (2006) Quality standards for quality development in e-learning: Adoption, implementation and adaptation of ISO/IEC 19796-1. Q.E.D. - The Quality Initiative E-Learning in Germany. The National Project for Quality in e-Learning,
- Conner, K. and Conner, M. (2006) "Ageless learner". [cited 2010 8 April]; Available from: http://agelesslearner.com/intros/elearning.html.
- Webber, C., Pesty, S., and Balacheff, N. (2002) A multi-agent and emergent approach to learner modelling. Proceedings of the 15th European Conference on Artificial Intelligence (ECAI). pp. 98-102.
- Heift, T. and Nicholson, D., (2001) Web delivery of adaptive and interactive language tutoring International Journal of Artificial Intelligence in Education. 12: p. 310-324
- Kobsa, A., (2001) Generic user modeling systems. User Modeling and User-Adapted Interaction. 11: p. 49-63
- Esposito, F., Licchelli, O., and Semeraro, G., (2004) Discovering student models in e-learning systems. Journal of Universal Computer Science. 10(1): p. 47-57
- Ong, J. and Ramachandran, S. (2000) "Intelligent tutoring systems: The what and the how". [cited 2007 30 October]; Available from: http://www.learningcircuits.org/2000/feb2000/ong.htm.
- Santally, M.I. and Senteni, A., (2005) Adaptation models for personalisation in web-based learning environments. Malaysian Online Journal of Instructional Technology. 2(1)
- Tang, T.Y. and McCalla, G. (2003) Towards pedagogy-oriented paper recommendations and adaptive annotations for a web-based learning system. 18th International Joint Conference On Artificial Intelligence.Acapulco, Mexico.
- Conati, C. and VanLehn, K. (2001) Providing adaptive support to the understanding of instructional material. Proceedings of the 6th International Conference on Intelligent User Interfaces. Santa Fe, New Mexico, United States.
- Papanikolaou, K.A., Grigoriadou, M., Kornilakis, H., and Magoulas, G.D., (2003) Personalizing the interaction in a web-based educational hypermedia system: The case of INSPIRE. User Modeling and User-Adapted Interaction. 13(3): p. 213-267
- Mittal, A., Krishnan, P.V., and Altman, E., (2006) Content classification and context-based retrieval system for e-learning. Educational Technology & Society. 9(1): p. 349-358
- Richter, G. (1976) "On the relationship between information and data", "Data base systems", Springer Berlin / Heidelberg. p. 21-43.
- Karr, A.F., Sanil, A.P., and Banks, D.L., (2006) Data quality: A statistical perspective. Statistical Methodology. 3: p. 137-173
- Wang, R.Y. (1998). A product perspective on total data quality management. Communications of the ACM. Vol. 41. no. 2. pp. 58-65
- Knight, S.-a., (2008) User perception of information quality in World Wide Web information retrieval behaviour, PhD Thesis, Edith Cowan University
- Juran, J., (1974) The quality control handbook. McGraw-Hill, New York, 3rd edition.
- Wang, R.Y. and Strong, D.M., (1996) Beyond accuracy: What data quality means to data consumers. Journal of Management Information Systems. 12(4): p. 5
- Strong, D.M., Lee, Y.W., and Wang, R.Y.(1997). Data quality in context. Communications of the ACM. Vol. 40. no. 5. pp. 103-110
- Shankar, G. and Watts, S. (2003) A relevant, believable approach for data quality assessment. Proceedings of 8th international conference on information quality.pp.178-189 MIT.
- Klein, B.D., (2001) User perceptions of data quality: Internet and traditional text. Journal of Computer Information Systems. 41(4): p. 9-18
- Aladwani, A.M. and Palvia, P.C., (2002) Developing and validating an instrument for measuring user-perceived web quality. Information & Management. 39(6): p. 467-476
- Buyukozkan, G., Ruan, D., and Feyzioglu, O., (2007) Evaluating e-learning web site quality in a fuzzy environment. International Journal of Intelligent Systems. 22(5): p. 567-586
- Porter, M., (1991) Towards a dynamic theory of strategy. Strategic Management Journal. 12(1): p. 954-1117
- Elpez, I. and Fink, D., (2006) Information systems success in the public sector: Stakeholders' perspectives and emerging alignment model. Issues in Informing Science and Information Technology. 3
- DeLone, W.H. and McLean, E.R., (1992) Information systems success: The quest for the dependent variable. Information Systems Research. 3(1): p. 60-95
- DeLone, W.H. and McLean, E.R., (2003) The DeLone and McLean model for information systems success: A ten year update. Journal of Management Information Systems. 19(2): p. 9-30
- Lin, S., Gao, J., Koronios, A., and Chanana, V., (2007) Developing a data quality framework for asset management in engineering organisations. International Journal of Information Quality. 1(1): p. 100 - 126
- Katerattanakul, P. and Siau, K. (1999) Measuring information quality of web sites: Development of an instrument. Proceedings of 20th international conference on Information Systems.pp.279-285 Charlotte, North Carolina, USA.
- Klein, B.D. (2002) When do users detect information quality problems on the world wide web? Proceedings of 8th American Conference in Information Systems.pp.1101-1103
- Knight, S.a. and Burn, J., (2005) Developing a framework for assessing information quality on the world wide web. Informing Science Journal. 8
- Gertz, M. (1996) Managing data quality and integrity in federated databases. Proceedings of 2nd Working Conference on Integrity and Internal Control in Information Systems.pp.211-230
- Redman, T. (1996) Data quality for the information age. 2nd edition Artech House, London.
- Zeist, R.H.J. and Hendriks, P.R.H., (1996) Specifying software quality with the extended iso model. Software Quality Journal. 5(4): p. 273-284
- Jarke, M. and Vassiliou, Y. (1997) Data warehouse quality: A review of the dwq project. Proceedings of 2nd Intl. Conf. on Information Quality pp.98--112 Cambridge, Mass.
- Chen, Y., Zhu, Q., and Wang, N., (1998) Query processing with quality control in the world wide web. World Wide Web Journal. 1(4): p. 241-255
- Alexander, J.E. and Tate, M.A. (1999) Web wisdom: How to evaluate and create information quality on the web 4th edition, Lawrence Erlbaum Associates, New Jersey
- Shanks, G. and Corbitt, B. (1999) Understanding data quality: Social and cultural aspects. Proceedings of the 10th Australasian Conference on Information Systems.
- Dedeke, A. (2000) A conceptual framework for developing quality measures for information systems. Proceeding of 5th International Conference on information Quality.pp.126-128
- Zhu, X. and Gauch, S. (2000) Incorporating quality metrics in centralized/distributed information retrieval on the World Wide Web. Proceedings of the 23rd International ACM SIGIR Conference on Research and Development in Information Retrieval. Athens, Greece.
- Leung, H.K.N., (2001) Quality metrics for intranet applications. Information & Management. 38(3): p. 137 - 152.
- Eppler, M. and Muenzenmayer, P. (2002) Measuring information quality in the web context: A survey of state-of-the-art instruments and an application methodology. Proceedings of International Conference on Information Quality.pp.187-196.
- Kahn, K., Strong, D., and Wang, R., (2002) Information quality benchmarks: Product and service performance. Communications of the ACM. 45(4): p. 184 - 193
- Mecella, M., Scannapieco, M., Virgillito, A., Baldoni, R., Catarci, T., and Batini, C. (2002) Managing data quality in cooperative information systems. Proceedings of the Confederated International Conferences DOA, CoopIS and ODBASE.pp.486-502
- Liu, X.W. and Han, S.L., (2005) Ranking fuzzy numbers with preference weighting function expectations. Computers & Mathematics with Applications. 49(11-12): p. 1731-1753
- Besiki, S., Gasser, L., Twidale, M.B., and Smith, L.C., (2007) A framework for information quality assessment. Journal of the American Society for Information Science and Technology 58(12): p. 1720-1733
- Kimberly, K., Pankaj, M., and John, W., (2009) Do you know your IQ?: A research agenda for information quality in systems. SIGMETRICS Perform. Eval. Rev. 37(3): p. 26-31
- Kargar, M.J. and Azimzadeh, F., (2009) A framework for ranking quality of information on weblog. World Academy of Science, Engineering and Technology. 56: p. 690-695
- Thi, T.T.P. and Helfert, M. (2009) "An information system quality framework based on information system architectures", "Information systems development: Challenges in practice, theory, and education". 1st edition, Springer US. p. 337-350.
- Han, J. and Kamber, M. (2006) Data mining concepts and techniques. 2nd edition. Morgan Kaufmann, San Francisco, USA.
- Maes, P. (1994) Agents that reduce work and information overload. Communications of the ACM. Vol. 37. no. 7. pp. 30-40
- Zhou, X., Li, Y., Bruza, P., Wu, S.T., and Xu, Y. (2007) Using information filtering in web data mining process. Proceedings of ACM International Conference on Web Intelligence. pp. 163-169 Silicon Valley, California, USA.
- Stumme, G., Hotho, A., and Berendt, B., (2006) Semantic web mining: State of the art and future directions. Web Semantics: Science, Services and Agents on the World Wide Web. 4(2): p. 124-143
- Finding and evaluating information on the web [online guide] Leeds Metropolitan University [cited at 15 April 2010]; libraryonline.leedsmet.ac.uk/lco/publications/pdf/web/qg-46.pdf
- Sivaramakrishnan, J. and Balakrishnan, V., (2009) Web mining functions in an academic search application. Informatica Economica Journal. 13(3): p. 132-139
- Kosala, R. and Blockeel, H. (2000). Web mining research: A survey. ACM SIGKDD Explorations. Vol. 2. no. 1. pp. 1-15
- Lappas, G., (2008) An overview of web mining in societal benefit areas. Online Information Review. 32(2): p. 179-195
- Cooley, R., Srivastava, J., and Mobasher, B. (1997) Web mining: Information and pattern discovery on the world wide web volume(1) pp.12-23
- Raymond, K. and Hendrik, B. (2000) Web mining research: A survey ACM volume(2) issue (1) pp.1-15
- Siefkes, C. (2003) Learning to extract information for the semantic web. Proceedings of Berliner XML Tage 2003.pp.452-459 Berlin.