1. Literature Review
Over the last 20 years, personal computers have evolved rapidly. Advances in software and hardware have been correspondingly large and, inevitably, the e-learning sector has been influenced as well: several tools have been developed with ever-growing capabilities, from plain text delivery to audio, photo and video management.
But what is e-learning really, and why is it given such importance? Electronic learning (or e-learning) can be defined as the process of educating or offering knowledge via electronic means. Many researchers go a little further; Nichols (2008), for example, perceives e-learning as "pedagogy empowered by digital technology". This view assigns even greater importance to e-learning, which is probably valid considering the different ways in which e-learning is applied. In most cases there is no face-to-face interaction between the trainer and the learner, and communication is based solely on electronic means such as computers, videos, web sites, virtual reality environments, etc. In other cases e-learning is used in addition to conventional learning in order to enrich it (blended learning); here the aforementioned means are combined with, or added to, traditional techniques (classroom interaction) to achieve the desired results. Finally, there is an intermediate application, where there is no face-to-face interaction on a daily basis but meetings between the learners and the trainer are organised from time to time to improve the learning process.
Despite the explosion of computer-based multimedia applications in education in recent years (Gerlic and Jaušovec 1999), the success of e-learning applications has always been debatable. Many researchers have studied the subject, and a number of them (e.g. Kazmerski and Blasko 1999; Kulik and Kulik 1991; Steyn, du Toit and Lachmann 1999) stress the advantages of e-learning over conventional learning methods. Others, though, believe that e-learning systems can prove deficient, or simply not superior to conventional ones (Merchant, Kreie and Cronan 2001).
Despite this debate, there is a growing trend towards e-learning processes and implementations, mainly because of the continuously growing technological possibilities available. Moreover, e-learning offers certain benefits which have raised its popularity in recent years:
- Education is made possible for people who may not have the time (such as parents or professionals) or the money to attend a classroom. Time and distance are no longer an issue, and learning can be offered globally, even to people with disabilities.
- Higher-quality education can be offered to anyone, since highly skilled professors can offer their services electronically to any individual anywhere in the world at much lower cost. Cost is a general advantage of e-learning, since it applies to all of its forms; for example, enterprises are now able to offer training to their employees and partners at minimal cost, without the need to organise trips for seminars or similar activities.
- The learner can "attend" learning sessions anytime, anywhere, at his/her own pace. In other words, each learner can adjust the course to his/her own needs, experience and free time.
- The learner participates actively in the learning process, which is not always the case in a conventional classroom, where learners often do not pay attention to the trainer. In e-learning environments such behaviour does not arise, since the course cannot proceed without the active participation of the learner.
- The resulting reduction in paper usage, due to the electronic character of the courses, can be highly beneficial for the environment.
- Even the disadvantage of lacking personal interaction can nowadays be overcome to a great degree via services such as forums, e-mail, chat, teleconferences, etc.
On the other hand, many believe that the human interaction of a classroom is irreplaceable by any other type of learning, e-learning included. Jean Barbazette, president of The Training Clinic in Seal Beach, California, believes that "Some things still can't be taught online" and that "For interpersonal skills, classroom learning usually works better". The classroom offers immediate feedback from instructors and co-learners, which is crucial to the learning process, as Rebecca Aronauer argues. Nevertheless, many researchers maintain that, if e-learning is implemented correctly, it can be as efficient as conventional learning (Zhao et al 2005). Moreover, Rovai (2002) argues that there are no significant differences between the experiences of students learning on-line and in a classroom, and that the sense of being part of a team can be effectively simulated when the electronic course is designed appropriately. Indeed, computers are nowadays considered to be of great importance by most university students (Gunn et al 2003).
1.1.2.1. In General
To implement e-learning, certain tools are needed. They vary significantly from one another, and each undertakes a certain part of the whole process. These tools can be divided into the following categories, based on the part of the process that they serve:
Hardware and networks
E-learning cannot be implemented unless the relevant infrastructure exists; this can include computers, visual and audio devices such as web cams, microphones and speakers, web servers, media servers, etc. This hardware has to communicate remotely with the corresponding equipment on the trainer's side, which is achieved via networks such as intranets, extranets, VPNs or the Internet.
Tools to access knowledge
These are the tools through which the user gains access to e-learning material. The main ones are web browsers, such as Internet Explorer, Mozilla Firefox, Opera and Safari, and media players, such as Windows Media Player, QuickTime, Winamp and VLC Player. Since the Internet is the main means of connecting the trainer and the learner, web browsers are basic tools of the e-learning process; without them, web-based material is inaccessible. Media players are important as well, since they give the user access to the visual and audio material on which the success of an electronic course often depends.
Tools to offer knowledge
For e-learning material to become accessible to the learner, certain tools are needed, which are called Learning Management Systems (LMSs). Their main duty is to provide the platform which offers the learning content over a network. An LMS is a piece of software that enables us to plan, deliver and manage the learning process, and it comes in different versions with different functionalities: it can be used just to keep records of the courses that the learners are interested in, or to offer complete online learning sessions along with online interaction tools through which the learners can communicate and enrich the whole process. It can be web-based or not, but in most cases it is web-based.
Tools to create the e-learning content
The basic tools for creating e-learning material are called Learning Content Management Systems (LCMSs) and are responsible for authoring and managing e-learning content. In other words, these systems are used for creating and managing the learning content which will later be delivered via an LMS. The main advantage of LCMSs, in contrast to LMSs, is that they allow a developer to create, export, import, manage or search for content that can be reused by other developers in different projects, while keeping history and version data; this content may include text, graphics, media files, etc. In LMSs, courses cannot be developed and managed, and learning objects (small pieces of learning content) cannot be reused in other courses. It should be noted that many people confuse the two terms and refer to both as LMS; this is wrong, since, as is evident from the above, an LCMS can be considered a development of an LMS and offers different possibilities. It is true, however, that the functionalities of an LCMS often overlap those of an LMS.
Tools for human interaction
To simulate the classroom "feeling" effectively in an e-learning environment, various tools can be used. Although these tools were not initially developed for this specific purpose, when combined they can greatly enrich the learning process. They are of two types, depending on whether the communicating individuals or parties need to be present at the same time: synchronous and asynchronous. Synchronous tools enable individuals or parties to communicate in "real time", whereas asynchronous tools do not. Asynchronous tools include e-mail services such as Gmail, Yahoo or Hotmail, blogs, forums, etc. Synchronous tools include chat clients such as GoogleTalk or MSN, VoIP/teleconference tools such as Skype or WowPow, media players such as VLC, Winamp or Windows Media Player, etc. Media players can, of course, be used as asynchronous tools as well.
1.1.2.2. Popular Tools
All the above functionalities, apart from those concerning hardware, are successfully incorporated in most modern LMSs and LCMSs. There is a great variety of such tools from various vendors, but the most popular in the educational community these days seem to be Moodle and JoomlaLMS.
Moodle is one of the most popular tools of its kind because it is a free and open-source LCMS for creating dynamic environments for educational purposes. Despite being free, it is considered highly efficient, since its modular design allows developers to add desired functionalities and, in essence, tailor it to their needs. Moreover, many additional third-party plug-ins are available for free, which further enhance its modular character. The main programming language used for developing new modules is PHP, which brings an important advantage: Moodle can run on different platforms (Windows, Linux, Unix, Mac OS, etc.) without any modifications, as long as PHP is supported.
JoomlaLMS emerged from the extremely popular web content management platform Joomla and, like its parent application, is based on the PHP programming language and the MySQL database system. The basic Joomla characteristics, such as modularity, extensions and templates, are still there, as they are in Moodle. The difference, however, is that JoomlaLMS is not independent (it needs Joomla to function) and, additionally, it is not a free software package.
As we saw in the previous paragraphs, the main tools to create and offer knowledge are LMSs and LCMSs; the former host content which is created on the latter. Additionally, the main advantage of LCMSs is their ability to create reusable learning objects. Reusing an object offers many advantages, the main one being the time saved: developers can use already developed pieces of content in their projects, independently of each project's nature and special demands. Nevertheless, creating a learning object on a certain platform does not mean that it can be used efficiently on any other platform that another developer may use. This is where standards come in, with their main goals being:
- Interoperability. Learning objects must be able to be incorporated efficiently in any course designed and delivered on any platform. Standards ensure that each object is designed and developed following certain guidelines, which in turn lead to its effective use in any project. For example, activities such as moving a course from one LMS server to another, reusing content on different LMSs or searching for learning content across different LMS environments cannot be performed without following certain standards.
- Content exchange. Learning content is exchanged not just locally, but globally as well. For the global exchange of content to be efficient, setting corresponding guidelines is crucial.
- Performance. Common specifications can ensure that objects are designed so that the best possible system performance is reached, given the current hardware and software possibilities. Changes in technology are balanced by the dynamic character of standards, which constantly evolve according to current circumstances.
- Rights protection. E-learning content is developed by people or organisations that put much effort, resources and time into this process, and this effort has to be protected; one means to this end is the adoption of standards which set guidelines that help protect the developed content from unauthorised use.
Until 1999 no e-learning standards had been used, and the first development attempts produced results in 2000. Since then, several organisations have been developing e-learning standards for different purposes. Some of them are:
- Aviation Industry CBT Committee (AICC): airline training
- EDUCAUSE Instructional Management Systems (IMS) project: vendor group working to build standards for e-learning based on the work of AICC
- Advanced Distributed Learning (ADL): US federal government initiative; development of the Sharable Content Object Reference Model (SCORM)
- Alliance of Remote Instructional Authoring and Distribution Networks for Europe (ARIADNE): an industry association focusing on e-learning standards issues (ariadne.unil.ch)
- IEEE Learning Technology Standards Committee (IEEE LTSC): accredits the standards for the US that emerge from the other groups (ltsc.ieee.org)
- ISO/IEC JTC1 SC36 (ITLET): IT for Learning, Education and Training
- Advanced Learning Infrastructure Consortium (ALIC): Japanese consortium for the promotion of e-learning technology and infrastructure
- e-Learning Consortium Japan (eLC): vendor/user company group working to promote e-learning business and technology
In the picture below the interconnection between the various standards is presented.
1.1.4. Role of Navigation
Navigation is one of the most important elements of an e-learning course: courses with problematic navigation are, most of the time, abandoned by their users, and even when they are not, the efficiency of the course is significantly reduced.
At the dawn of the e-learning era, navigation schemes were very simple and mostly linear, meaning that the trainee could only move from page to page, forwards and backwards. Nowadays, recent hardware and software developments, along with the subsequent developments in e-learning systems, make it possible to create complex navigation schemes, with simultaneous and parallel access to different parts of the course, which can comprise text, images, audio, video or combinations of these (multimedia). Nevertheless, besides its obvious advantages, this modern navigation approach presents some serious drawbacks as well. The main ones are the following:
- The "serendipity effect". Free access to any part of the course may "guide" the learner to focus on irrelevant or insignificant elements.
- The "lost in hyperspace" phenomenon. For the same reason, the learner, overwhelmed by information in different formats, fails to concentrate on the important parts of the course and, furthermore, fails to identify where exactly he/she is "located" in the course "map" and, consequently, what exactly he/she was originally searching for.
- Cognitive overload. In e-learning courses the student is occupied not only with the pure learning content but also with other things, such as how to navigate through the content and how to configure the course adequately from a software/hardware point of view.
So that the learner remains focused on the learning content and is distracted as little as possible by irrelevant elements, Holzinger (2000) proposes several mechanisms, such as indexes, site maps, guided tours, bookmarks, "fish-eye" views, etc. Nevertheless, such mechanisms are not sufficient in all cases and additional actions are often needed; these actions are defined by several standards, the most important of which is SCORM.
1.2. The SCORM standard
As we saw in section 1.1.3, standards are a fundamental element of the organisation of e-learning globally. A serious effort on the subject has been made by the ADL (Advanced Distributed Learning) initiative, established by the White House Office of Science and Technology Policy (OSTP) and the US Department of Defence (DoD); the result is called SCORM.
SCORM was based on previous efforts by several organisations, such as those mentioned in section 1.1.3, the main ones being:
- IEEE Data Model For Content Object Communication
- IEEE ECMAScript Application Programming Interface for Content to Runtime Services Communication
- IEEE Learning Object Metadata (LOM)
- IEEE Extensible Markup Language (XML) Schema Binding for Learning Object Metadata Data Model
- IMS Content Packaging
- IMS Simple Sequencing
SCORM stands for Sharable Content Object Reference Model and has been developed in order to "foster the creation of reusable learning content as 'instructional objects' within a common technical framework for computer-based and Web-based learning. SCORM describes that technical framework by providing a harmonized set of guidelines, specifications and standards based on the work of several distinct e-learning specifications and standards bodies".
With the implementation of SCORM, the ADL Initiative aims to "accelerate large-scale development of dynamic and cost-effective learning software and systems and to stimulate the market for these products". 
SCORM's basic idea is that learning content (courses, modules, etc.) can be obtained by aggregating reusable content objects. These objects can be used repeatedly on any platform, without restrictions. This uniformity is achieved through rules and guidelines defined in SCORM. Provided that a course is SCORM compliant, a SCORM-compliant LMS can identify the organisation of its content without needing additional information about sequencing and navigation, since these matters are taken care of by SCORM. The content objects can therefore be reused in other environments.
For an e-learning environment to be SCORM compliant, it has to fulfil certain general requirements set by the ADL Initiative, which are incorporated in SCORM. These requirements are called "ilities" and are the following:
- Accessibility: Instructional components must be able to be accessed and transferred between remote locations.
- Adaptability: Instruction must be tailored to individual and organisational needs.
- Affordability: Instruction delivery must be associated with increased efficiency and productivity and with reduced time and costs.
- Durability: Technology evolution must not impose design, configuration or coding changes.
- Interoperability: Instructional components must be compatible with any tools or platforms.
- Reusability: Instructional components must be able to be used in multiple applications and contexts.
Additionally, due to
- the rapid expansion of web-based technologies and infrastructures,
- the lack of wide-spread web-based learning technology standards,
- and the convenience of delivering web-based content using nearly any medium,
SCORM assumes that the implemented e-learning environments are web-based.
This blending of the "ilities" with the web-based character of the learning applications offers the following abilities:
- The ability of a Web-based LMS to launch content that is authored using tools from different vendors and to exchange data with that content.
- The ability of Web-based LMS products from different vendors to launch the same content and exchange data with that content during execution.
- The ability of multiple Web-based LMS products/environments to access a common repository of executable content and to launch such content.
Naturally, the above requirements are general in character and are not the only ones incorporated in SCORM; there is a large number of guidelines and specifications. So that these can be used efficiently, SCORM is divided into three technical books, each of which covers a certain subject. These subjects are: the Content Aggregation Model (CAM), the Run-Time Environment (RTE) and Sequencing and Navigation (SN).
1.2.2. The Content Aggregation Model (CAM)
The first book of SCORM (CAM) describes the content objects which, when aggregated, make up a course, module, etc., as well as ways to package these objects so that interoperability between platforms is achieved. Additionally, it proposes ways to describe these objects via metadata, so that they can be easily searched for and discovered, and ways to define sequencing rules. The objects are organised together to produce content packages, meaning courses, lessons, modules, etc.
A Content Package connects and organises content objects or aggregations of content objects. A SCORM Content Package may represent a course, a lesson, a module or may simply be a collection of related content objects.
This process of creating, discovering, aggregating and organising small content pieces into more complex learning entities, and of defining sequencing rules on how these will be accessed by the learner, consists of the following:
- Content Model
The Content Model refers to the components of a content package and how these are organised to create it. It consists of the following elements: Assets, SCOs (Sharable Content Objects), Learning Activities, Content Organization and Content Aggregation.
Assets are the main building blocks of any learning resource and can be described as electronic representations of any kind of data that can be delivered to the user via a web browser (text, images, video, sound, etc.).
A SCO can be described as a single learning resource that can be launched by LMSs via the SCORM RTE. It is produced by aggregating either single assets or sets of assets, which in turn consist of multiple single assets. SCOs are the lowest level of data that can communicate with LMSs, and this characteristic is their main difference from assets or sets of assets.
Content Organizations are collections of SCOs and represent the ways in which the learning content should be used by the learner; this is accomplished through meaningful units of instruction, the Activities.
Finally, Content Aggregation describes the process of creating sets of functionally related content objects, so that these sets can be delivered to the learner during the learning experience.
- Content Packaging
Content Packaging is a process whose main objective is to ensure that the aggregated content will be able to operate on different platforms. A Content Package represents a unit of learning, meaning that it contains all the data needed for the learning content to be processed by the LMS and delivered to the learner. It consists of two basic components: the so-called Manifest and the physical files that make up the content. The Manifest is an XML file which holds data regarding the package's organisation and the corresponding resources it includes. It consists of four main components, two mandatory (Organisations and Resources) and two optional (Metadata and Sub-Manifests). The "Metadata" provide general information about the package (title, description, etc.), the "Organisations" hold the organisation (structure) of the package's resources, the "Resources" contain the resources' data, and the "Sub-Manifests" describe any stand-alone instruction units.
Metadata hold descriptive information about a content object, i.e. its properties.
- Sequencing and Navigation
This information defines the rule models which set the sequence and ordering of the content delivered to the learner.
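The Manifest structure described under Content Packaging can be illustrated with a small Python sketch. It is a simplified illustration only: the element names follow the components named above, but XML namespaces and many attributes that a real SCORM manifest carries are omitted, and all identifiers, titles and file paths are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free manifest. A real SCORM manifest uses XML
# namespaces and more attributes; every identifier and path here is made up.
MANIFEST_XML = """
<manifest identifier="course-001">
  <metadata>
    <title>Introduction to Data Mining</title>
  </metadata>
  <organizations default="org-1">
    <organization identifier="org-1">
      <title>Default organisation</title>
      <item identifier="item-1" identifierref="res-1">
        <title>Lesson 1</title>
      </item>
      <item identifier="item-2" identifierref="res-2">
        <title>Lesson 2</title>
      </item>
    </organization>
  </organizations>
  <resources>
    <resource identifier="res-1" href="lesson1/index.html"/>
    <resource identifier="res-2" href="lesson2/index.html"/>
  </resources>
</manifest>
"""


def list_launchable_items(manifest_xml):
    """Return (item title, resource href) pairs in organisation order."""
    root = ET.fromstring(manifest_xml.strip())
    # Map each resource identifier to the file it points to.
    hrefs = {r.get("identifier"): r.get("href") for r in root.iter("resource")}
    pairs = []
    for item in root.iter("item"):
        title = item.findtext("title")
        pairs.append((title, hrefs.get(item.get("identifierref"))))
    return pairs


items = list_launchable_items(MANIFEST_XML)
for title, href in items:
    print(title, "->", href)
```

The sketch shows the division of labour between the two mandatory components: the organisation lists the items in the order intended for the learner, while the resources say which physical file each item refers to.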
1.2.3. The Run-Time Environment (RTE)
The RTE describes the requirements to which LMSs should conform so that interoperability between different platforms is achieved, independently of the tools used in developing the content. In other words, it defines how an LMS launches content objects, how it communicates with them at run-time and what data are exchanged during execution.
These three activities are served by three respective components:
- Launch: describes how SCORM-compliant content is delivered to the learner via the LMS.
- API: each object may communicate with the LMS via a defined set of methods; this set is called the API. Consequently, a SCORM-compliant LMS must support the SCORM API for the objects to be compatible with it.
- Data Model: defines the standardised data elements through which learning information is exchanged between the content and the LMS.
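The interplay of the API and the Data Model can be sketched with a toy, in-memory stand-in for the object that an LMS exposes to content. This is an illustration under stated assumptions, not the SCORM API itself: the real API is an ECMAScript object provided by the LMS, the class `ToyRuntimeAPI` and its `stored` helper are hypothetical, and the method names and the two data model element names are only modelled on those used in SCORM 2004.

```python
class ToyRuntimeAPI:
    """Hypothetical in-memory stand-in for the API an LMS exposes to content."""

    def __init__(self):
        self._data = {}       # values the LMS has persisted
        self._pending = {}    # values set by the content but not yet committed
        self._running = False

    def initialize(self):
        """Begin a communication session; fails if one is already open."""
        if self._running:
            return False
        self._running = True
        return True

    def set_value(self, element, value):
        """Store a data model element as a string (only during a session)."""
        if not self._running:
            return False
        self._pending[element] = str(value)
        return True

    def get_value(self, element):
        """Read a data model element; pending values shadow persisted ones."""
        if not self._running:
            return ""
        return self._pending.get(element, self._data.get(element, ""))

    def commit(self):
        """Persist everything set since the last commit."""
        if not self._running:
            return False
        self._data.update(self._pending)
        self._pending.clear()
        return True

    def terminate(self):
        """Commit outstanding values and close the session."""
        if not self._running:
            return False
        self.commit()
        self._running = False
        return True

    def stored(self):
        """Helper for this sketch only: a copy of what the LMS persisted."""
        return dict(self._data)


api = ToyRuntimeAPI()
api.initialize()
api.set_value("cmi.completion_status", "completed")
api.set_value("cmi.score.scaled", 0.8)
api.terminate()
record = api.stored()
print(record)
```

The point of the sketch is that the content never touches the LMS storage directly: it only calls the agreed set of methods with agreed element names, which is what allows the same content object to run on LMSs from different vendors.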
1.2.4. Sequencing and Navigation (SN)
The Sequencing and Navigation (SN) book of SCORM focuses on defining ways in which the learning content can be offered to the learner efficiently and in an adequate order. For this to be accomplished, and for the sequencing information to be processed at run-time, a SCORM-compliant LMS must incorporate certain elements and functionalities, which are also defined in this book. The sequencing information essentially determines which learning activity is to be delivered to the learner next; each learning activity is associated with a content object. How these objects are launched by the LMS is described in the RTE book.
As mentioned in a previous paragraph, the content package holds information regarding the organisation of its resources. This organisation by itself, however, does not say how the learning content is going to be delivered to the user, meaning its sequencing and order, or which parts of the content will be accessible to the user and when; this information is held in the aforementioned manifest file.
To this end, SCORM has adopted sets of specifications originally developed by IMS, which provide ways for the sequencing information to be incorporated in the learning process.
Some fundamental concepts in these specifications are the Learning Activity, the Activity Tree, the Activity Cluster, the Attempt, the Learning Objectives, the Sequencing Rules and the Rollup Rules.
A loose definition of the Learning Activity is that it is "a meaningful unit of instruction"; in other words, it is an action of the learner as he/she goes through the course. It can be an autonomous learning unit or may comprise several sub-activities; sub-activities may in turn consist of 2nd-level sub-activities, these of 3rd-level sub-activities, and so on. The activities and the users experiencing them can be associated with a tracking status. Each user may be allowed a predefined number of attempts at a certain activity, or may be free to execute it as many times as desired. Activities may be suspended, abandoned, exited normally, etc.; nevertheless, all of them must remain within the context of the parent activity.
An Activity Tree is a tree in which each node is associated with an activity and stores its sequencing information. The LMS goes through the activity tree and identifies the next learning activity to be delivered to the learner. Generally, the sequencing information determines the order of the activities; if no such information exists, the order contained in the manifest file is followed.
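The traversal of an Activity Tree can be sketched as follows. This is a minimal illustration, assuming the simplest possible policy (deliver the first leaf activity, in depth-first left-to-right order, that has not been completed); an actual SCORM sequencing implementation also evaluates the rules stored on each node. The `Activity` class, its `completed` flag and the lesson titles are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Activity:
    """One node of a (hypothetical, simplified) activity tree."""
    title: str
    children: List["Activity"] = field(default_factory=list)
    completed: bool = False

    def is_leaf(self):
        return not self.children


def leaves_in_order(activity):
    """Yield leaf activities in depth-first, left-to-right order."""
    if activity.is_leaf():
        yield activity
    else:
        for child in activity.children:
            yield from leaves_in_order(child)


def next_activity(root) -> Optional[Activity]:
    """Return the first leaf that is not completed, or None if all are done."""
    for leaf in leaves_in_order(root):
        if not leaf.completed:
            return leaf
    return None


course = Activity("Course", [
    Activity("Module 1", [
        Activity("Lesson 1.1", completed=True),
        Activity("Lesson 1.2"),
    ]),
    Activity("Module 2", [Activity("Lesson 2.1")]),
])

upcoming = next_activity(course)
print(upcoming.title)
```

Only leaf nodes are delivered here because, in this simplified picture, only leaves are associated with content objects, while inner nodes group their children.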
An Activity Cluster can be defined as a group of activities containing a parent activity and its 1st-level children (sub-activities); its main role is to help developers organise sequencing more efficiently. Whatever rules apply to the child activity apply to the parent activity as well.
Each time the user tries to execute an activity, he/she is making an Attempt. If this activity is a child of a parent activity, which in turn is the child of another parent and so on, then the attempt is reflected in all of these activities throughout the tree.
A Learning Activity can be associated with one or more Learning Objectives, and SCORM provides full freedom in associating activities with objectives. Nevertheless, SCORM makes no assumptions about the meaning of multiple objectives, and the status information of an activity's objective is held locally to that activity. Status information cannot be shared unless the objectives have a global character; the status information of global objectives is available for sharing among several activities, either within a single Activity Tree or across multiple trees. There are two restrictions, however:
- A local objective can obtain ("read") objective status information only from one shared global objective.
- When, for a certain activity, a set of local objectives is defined, no two local objectives can set ("write") status information to the same shared global objective.
The Sequencing Rules are applied to an activity and evaluated, using tracking information associated with the activity, at specified times during different sequencing (in other words, learning) cases. Each rule consists of a set of conditions and a corresponding action. The action is applied only when the set of conditions evaluates to "True".
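The "set of conditions plus action" structure of a sequencing rule can be sketched as below. This is an illustration under assumptions: the sketch requires all conditions of a rule to hold, the first matching rule wins, and the tracking keys (`satisfied`, `attempt_count`) and action labels (`skip`, `disable`) are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class SequencingRule:
    """A set of conditions over tracking data, plus the action to take."""
    conditions: List[Callable[[Dict], bool]]
    action: str

    def applies(self, tracking: Dict) -> bool:
        # Assumption of this sketch: every condition must be True,
        # and a rule without conditions never applies.
        return bool(self.conditions) and all(c(tracking) for c in self.conditions)


def first_action(rules: List[SequencingRule], tracking: Dict) -> Optional[str]:
    """Return the action of the first rule whose conditions are all True."""
    for rule in rules:
        if rule.applies(tracking):
            return rule.action
    return None


rules = [
    # Skip an activity whose objective is already satisfied.
    SequencingRule([lambda t: t["satisfied"]], "skip"),
    # Disable an activity after three unsuccessful attempts.
    SequencingRule(
        [lambda t: not t["satisfied"], lambda t: t["attempt_count"] >= 3],
        "disable",
    ),
]

print(first_action(rules, {"satisfied": True, "attempt_count": 1}))
print(first_action(rules, {"satisfied": False, "attempt_count": 3}))
print(first_action(rules, {"satisfied": False, "attempt_count": 1}))
```

When no rule applies, the function returns `None`, which corresponds to the activity being handled by the default ordering.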
The Rollup Rules are used for evaluating the progress of the learner in cluster activities. Because cluster activities have no association with content objects, information regarding the user's progress cannot be applied directly to a cluster activity. A set of zero or more rules may be applied, and the evaluation takes place during a process called Rollup; this process uses the status data of the child activities to evaluate the status information of the corresponding cluster. Each rule of this type consists of a set of child activities, a set of conditions which are evaluated against the status data of these child activities, and a corresponding action, which is executed when the conditions evaluate to "True".
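The Rollup process can be sketched in the same spirit. This is a simplified illustration, assuming that each rule tests one condition against either all or any of the children and that the first rule that fires determines the cluster status; the class `RollupRule`, the `child_set` values and the status labels are hypothetical simplifications, not the SCORM definitions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class RollupRule:
    """Derive a cluster status from the status data of its child activities."""
    condition: Callable[[Dict], bool]  # evaluated once per child
    child_set: str                     # "all" or "any" (simplifying assumption)
    action: str                        # status assigned to the cluster

    def fires(self, children: List[Dict]) -> bool:
        results = [self.condition(child) for child in children]
        if not results:
            return False
        return all(results) if self.child_set == "all" else any(results)


def rollup(children: List[Dict], rules: List[RollupRule]) -> Optional[str]:
    """Return the action of the first rule that fires, or None."""
    for rule in rules:
        if rule.fires(children):
            return rule.action
    return None


cluster_rules = [
    RollupRule(lambda c: not c["satisfied"], "any", "not satisfied"),
    RollupRule(lambda c: c["satisfied"], "all", "satisfied"),
]

children = [{"satisfied": True}, {"satisfied": False}]
cluster_status = rollup(children, cluster_rules)
print(cluster_status)
```

The cluster itself stores no progress of its own in this sketch; its status is always recomputed from its children, which mirrors the reason rollup is needed in the first place.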
1.3. Data Mining
Data mining has attracted great attention in recent decades, mainly because it makes it possible to extract useful information from huge amounts of data, which in turn can be used for decision making in fields such as research, engineering, marketing, business management, etc.
Over the last 30 years (1980 onwards), information technology has taken gigantic steps forward, and the evolution of both hardware and software has been so rapid that the available data processing capabilities have reached astronomical levels. The quantities of data collected are correspondingly huge. Indicatively, according to research conducted by P. Lyman and H. R. Varian, "the new stored information grew about 30% a year between 1999 and 2002".
Analysing and effectively exploiting these quantities of data is not a simple task, yet it is an essential one, since data from which no valuable information is extracted are practically useless.
A solution to this problem is given by Data Mining, which according to G. Karypis can be defined as "Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns & rules.", while according to F. Castro et al and P. Chapman et al, Data Mining "is not just a collection of data analysis methods, but a data analysis process that encompasses anything from data understanding, preprocessing and modeling to process evaluation and implementation."
Data mining vs. Statistics
Data mining and Statistics share many common characteristics, a fact which may seem confusing at first but is perfectly natural, since data mining emerged from the combination of various disciplines such as informatics, machine learning, database systems, visualization and, finally, statistics (figure 1).
Besides the inevitable similarities between them, there is also a major and fundamental difference: data mining allows the development of models which, when applied to data, can offer different views and visualisations of the results depending on the number of dimensions used for building these models. With statistics, on the other hand, such a practice is not possible; to obtain different views and visualisations of the results, the effort has to be repeated as many times as the required number of dimensions before the desired conclusions can be reached.
Data Mining Steps
When data mining is applied, the data of interest often come from different sources; this is the rule for web data. Because of the inevitable differences between these sources, the data cannot be mined effectively as they are. To bring the data into an adequate form for data mining, several steps have to be followed:
- Initially, the data of interest have to be identified, since they may be in the form of text, images, videos, hyperlinks, etc.
- Then the data are located and collected from the different sources (database servers, web servers, etc). Most of the time the useful data comprise only a part of the total data, so a selection has to be made.
- Next, the collected data have to be "cleaned" by applying appropriate techniques, since they may be inconsistent, incomplete or may contain errors. All inconsistencies need to be removed.
- Data cleaning, however, is not enough. The data also have to be normalised/modified so that they take a form suitable for data mining.
- When the data are in the required form, their mining can begin. Several techniques exist for this purpose, the main ones being association rules, classification and clustering; all of them are described in more detail in the following paragraphs.
- When the data mining process is concluded and the patterns have been produced, the next step is to clean the patterns in turn (many of them may be products of coincidence or of limited value) and to produce the relevant visualisations.
- Finally, when the whole analysis is completed, the relevant reports are generated, the knowledge is available and decisions have to be taken. The analysis helps the researcher to reach useful conclusions and take the right decisions.
Usually the first four steps are very time consuming; in fact they may require over 60-70% of the overall process time. To support these steps, the data are loaded into adequate databases or, when the data exist in large amounts, even data warehouses.
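The cleaning and normalisation steps described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the record fields (`student`, `time_spent`, `grade`) and their values are hypothetical, incomplete or erroneous records are simply dropped, and min-max scaling is used as one possible normalisation among many.

```python
# Hypothetical raw records collected from a course database (illustrative values).
raw_records = [
    {"student": "s1", "time_spent": "120", "grade": "8.5"},
    {"student": "s2", "time_spent": None, "grade": "6.0"},    # incomplete
    {"student": "s3", "time_spent": "abc", "grade": "7.0"},   # erroneous
    {"student": "s4", "time_spent": "40", "grade": "4.5"},
    {"student": "s1", "time_spent": "120", "grade": "8.5"},   # duplicate
]


def clean(records):
    """Drop incomplete, erroneous and duplicate records."""
    seen = set()
    cleaned = []
    for rec in records:
        try:
            row = (rec["student"], float(rec["time_spent"]), float(rec["grade"]))
        except (TypeError, ValueError):
            continue  # missing or non-numeric value
        if row in seen:
            continue  # exact duplicate of an earlier record
        seen.add(row)
        cleaned.append(row)
    return cleaned


def min_max(values):
    """Scale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


rows = clean(raw_records)
normalised_time = min_max([row[1] for row in rows])
print(rows)
print(normalised_time)
```

In practice, whether a faulty record is dropped or repaired depends on the data and on the goals of the analysis; dropping is used here only to keep the sketch short.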
1.3.2. Mining the data
The data mining process is supported by several techniques or combinations of them. The fundamental ones are the following:
Association Rules
The search for associations in data sets is a basic data mining task. In an association process, we take a certain set of data and analyse it so as to extract patterns of association between the included objects. The outcome of such a process is a number of rules describing associations between database objects, which help us to reach useful conclusions. These rules are accompanied by two factors, Support and Confidence, which are measures of a rule's strength. More specifically:
- Support shows how frequently the rule applies in a set of transactions. If it is too low, the rule may just be the outcome of chance, or it may apply so rarely that in the end it is of no use.
- Confidence measures how predictable a rule is. Low confidence means that the conclusions drawn from such a rule are not trustworthy, and consequently the rule cannot be used effectively.
The most common example of such an association process, which also explains the role of the two aforementioned factors, is that of the "market basket". In this example we try to find which products are purchased in a supermarket and how these are associated.
Consider, for instance, the rule Milk → Chocolate with Support = 15% and Confidence = 70%. The rule shows that 15% of customers purchase Milk along with Chocolate and, additionally, that whoever purchases Milk also buys Chocolate in 70% of the cases.
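To make the two measures concrete, the following Python sketch computes Support and Confidence for a rule of the form Milk → Chocolate. The five transactions are hypothetical and chosen only for illustration, so the resulting values differ from the percentages quoted in the text.

```python
# Hypothetical market-basket transactions (illustrative data only).
transactions = [
    {"milk", "chocolate", "bread"},
    {"milk", "chocolate"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"chocolate"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


rule_support = support({"milk", "chocolate"}, transactions)
rule_confidence = confidence({"milk"}, {"chocolate"}, transactions)
print(f"support = {rule_support:.2f}, confidence = {rule_confidence:.2f}")
```

Here two of the five baskets contain both products (Support = 0.40), and two of the three baskets containing milk also contain chocolate (Confidence of about 0.67).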
In classification, or supervised learning, we define certain classes and rules, with each class holding certain attributes. We develop a classification model, which is then applied to unclassified data so that they are grouped accordingly. In other words, we extract a set of rules from existing data, and these rules are applied in turn to different but similar data sets in order to predict certain behaviors.
For example, let us assume that we have a very simple data set like the one below, which shows the political preferences of certain population groups according to the Age and Income attributes. The fourth column is the Class attribute:
ID  Age     Income  Political Party
1   young   low     Liberal
2   middle  low     Liberal
3   old     low     Conservative
4   young   middle  Liberal
5   middle  middle  Centre
6   old     middle  Conservative
7   young   high    Centre
8   middle  high    Conservative
9   old     high    Conservative
What we want to achieve is to initiate a "learning process" and extract a classification model from this data set which, when applied to different data sets of the same nature, will provide predictions regarding the political views of the registered individuals. A rule emerging from the data set above could be that young, low-income persons tend to vote for liberal parties. If it is correct, this rule should "predict", when applied to different data, that people holding these attributes will indeed vote for liberals. The initial data set is called the training set, while the data set on which the model is evaluated is called the test set. The accuracy of the classification model can be assessed by comparing the predicted results with the actual values of the class. As a rule, the longer the training period, the better the model's accuracy tends to be.
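The training set / test set idea can be sketched in Python using the data set above. The learner below is a simple one-attribute rule learner in the spirit of OneR, chosen only because it fits in a few lines; it is not the algorithm used later in this work, and the four test rows are hypothetical.

```python
from collections import Counter, defaultdict

# Training set: (Age, Income, Political Party), taken from the table above.
training_set = [
    ("young", "low", "Liberal"), ("middle", "low", "Liberal"), ("old", "low", "Conservative"),
    ("young", "middle", "Liberal"), ("middle", "middle", "Centre"), ("old", "middle", "Conservative"),
    ("young", "high", "Centre"), ("middle", "high", "Conservative"), ("old", "high", "Conservative"),
]
ATTRIBUTES = ("Age", "Income")


def train_one_rule(rows):
    """Pick the single attribute whose value -> majority-class rule makes the fewest training errors."""
    best = None
    for index, _name in enumerate(ATTRIBUTES):
        by_value = defaultdict(Counter)
        for row in rows:
            by_value[row[index]][row[-1]] += 1
        rule = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
        errors = sum(1 for row in rows if rule[row[index]] != row[-1])
        if best is None or errors < best[0]:
            best = (errors, index, rule)
    return best[1], best[2]


def predict(model, row):
    """Predict the class of a row from the selected attribute only."""
    index, rule = model
    return rule.get(row[index])


def accuracy(model, rows):
    """Share of rows whose predicted class equals the actual class."""
    correct = sum(1 for row in rows if predict(model, row) == row[-1])
    return correct / len(rows)


# Hypothetical test set with known classes, used only to assess the model.
test_set = [
    ("young", "low", "Liberal"),
    ("old", "high", "Conservative"),
    ("young", "high", "Centre"),
    ("old", "low", "Conservative"),
]

model = train_one_rule(training_set)
print(ATTRIBUTES[model[0]], model[1])
print(accuracy(model, test_set))
```

On this training set the learner selects Age, since rules based on Age misclassify three training rows while rules based on Income misclassify four.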
There are several classification methods:
- Statistical Classification is the method in which objects are grouped based on certain inherent quantitative information, with the help of information acquired from a training set of previously processed objects. 
- A Decision Tree is a predictive model with a hierarchical structure composed of conditions. Beginning from the root of the tree, we reach the leaves, each of which corresponds to a class label; the route followed, and consequently the destination leaf, depends on whether the instance satisfies certain conditions.
- Rule Induction is a method in which IF-THEN rules emerge from adequate processes. Each rule is connected to a state and, via certain operators that perform generalisation and specialisation operations, one rule can be transformed into another.
- In Fuzzy Rule Induction the data are interpreted in a linguistic manner by applying fuzzy logic. This means that, in contrast to conventional rules that use Boolean logic (right/wrong, warm/cold, etc), fuzzy rules can be multi-valued and intermediate values can be processed, like partly wrong, a little warm, quite cold, etc.
- Neural Networks can be connected to Rule Induction as well; the idea emerged from the structure of the human brain. Several processing objects called nodes/neurons co-operate to produce a result function.
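As an illustration of the decision tree method listed above, the following Python sketch represents a tree as nested conditions and follows a route from the root to a leaf. The tree is written by hand so that it reproduces the political-preference data set shown earlier; it is not the output of any particular tree-learning algorithm, and the dictionary layout is an assumption made for illustration.

```python
# Internal nodes test one attribute; leaves are class labels (plain strings).
tree = {
    "attribute": "Age",
    "branches": {
        "young": {
            "attribute": "Income",
            "branches": {"low": "Liberal", "middle": "Liberal", "high": "Centre"},
        },
        "middle": {
            "attribute": "Income",
            "branches": {"low": "Liberal", "middle": "Centre", "high": "Conservative"},
        },
        "old": "Conservative",
    },
}


def classify(node, instance):
    """Walk from the root to a leaf, choosing the branch that matches the instance."""
    while isinstance(node, dict):
        node = node["branches"][instance[node["attribute"]]]
    return node


print(classify(tree, {"Age": "young", "Income": "high"}))
print(classify(tree, {"Age": "old", "Income": "low"}))
```

Each internal node corresponds to a condition on one attribute, and the leaf reached at the end of the route gives the class label.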
Clustering is the process of forming clusters in such a way that all objects that are members of a cluster conform to some criteria, meaning that these objects are similar to a certain degree. It is often called unsupervised learning because, in contrast to supervised learning, there are no class attributes which define the grouping of the data. The discovered groups of data are called clusters, and several approaches may be followed to form them. The two most important and widely accepted approaches are partitional clustering and hierarchical clustering.
In Partitional Clustering, random points within the data set are selected as the centers of the clusters, called "centroids"; their number depends on the number of clusters that the user wants to discover. Next, the distances between the centroids and the data points are computed, each centroid is matched to the points that are closest to it, and the emerging groups (centroid plus matched points) form the clusters. This process is iterated many times so that the shape of the clusters improves as much as possible, and it stops only when certain pre-defined conditions are met.
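The partitional procedure described above can be sketched in plain Python as follows. This is a minimal k-means style illustration under stated assumptions: the two-dimensional points are hypothetical, the distance is Euclidean, the centroids are recomputed as the mean of their matched points, and the stopping condition is that the centroids no longer change (or a maximum number of iterations is reached).

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, max_iterations=100, seed=0):
    """Group points into k clusters around iteratively refined centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random data points as initial centroids
    clusters = []
    for _ in range(max_iterations):
        # Match every point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its matched points.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # stopping condition: centroids no longer move
        centroids = new_centroids
    return centroids, clusters


# Hypothetical two-dimensional data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

With data as clearly separated as in this example, the same two groups are found regardless of which points are drawn as initial centroids; on real data the result may depend on the initial selection.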
In Hierarchical Clustering a nested, tree-like sequence of clusters is produced. This is called a "dendrogram". At the top of the tree there is one cluster (the root), each internal cluster node contains child cluster nodes, and the lowest part of the tree represents single data points. The following schema depicts such a dendrogram.
Some confuse clustering with classification, because in both techniques sets of data are created. The difference, however, is that in classification the criteria are pre-determined by the user, while in clustering these criteria emerge from the analysis of the data.
The first field where Data Mining found immediate application was the corporate sector. In recent years, however, due to the increasing need for data analysis and the wide variety of tools that can execute data mining processes of great complexity, its applications have expanded practically everywhere.
Nowadays, data mining techniques are applied in businesses, military or security offices, medical institutions, banks, educational organisations, etc.
Businesses use data mining to find potential customers and improve their marketing strategies by finding patterns in customers' buying preferences and habits.
Military or security offices analyse opponents' data or even private data (illegally in many cases) in order to extract information regarding hostile movements or terrorist attacks.
Medical institutions conducting research on genomic data, for example, deal with gigantic sets of data which cannot be analysed without data mining techniques and tools.
Banks apply data mining to detect credit card fraud, e.g. by identifying the patterns of transactions related to fraudulent actions, or to reduce the risk when granting loans by identifying or predicting potentially untrustworthy customers. Finally, in educational systems, data mining is applicable in many fields, one of the most important being e-learning or, more accurately, web-learning, since most e-learning courses are carried out via the internet.
Application in e-Learning
We referred above to the various advantages of e-learning in providing knowledge and education to literally any individual, independently of time, distance or personal ability. Nevertheless, e-learning environments are still far from perfect, and continuous improvements are needed to reach the desired level. Trainers need ways to assess their courses in terms of efficiency, structure, activities selected by the learners, learners' satisfaction, results, etc, and to get adequate feedback in order to alter their courses for the better.
In the e-learning field, two types of users are of main interest: the trainers and the learners. The first category includes any organization that offers training courses of any kind, such as universities, enterprises and public organizations, while the second refers to anyone who is interested in acquiring knowledge.
Some of the data that are kept for each user may be: name, age, qualifications, experience (e.g. previous courses taken), course visiting frequency, time spent, grades achieved, etc. By applying data mining techniques to these data, we are able to extract information that may help us to evaluate the content of the courses, add/remove courses, establish new programs, guide the users better, identify the most popular courses, improve the navigation schemes, identify groups of learners with similar behaviours, find cases where the learners do not take the process seriously and just play around, etc.
Additionally, because most e-learning courses are nowadays offered via the internet and address a global audience, the amounts of collected data are huge, and so the processing and management of these data is a complicated issue. A solution to this issue can be offered by data mining, through which the data can be assessed, managed, processed and exploited in such a way that the e-learning environment itself can be adequately assessed and improved.
The online character that e-learning has adopted in recent years leads to the conclusion that e-learning data mining is essentially applied to Web data; hence, it is called web mining. These web data may be:
- Web pages' content
- HTML or XML scripts
- Visitors numbers and data
- Links between pages
- Navigation data
Web mining adopts the same techniques as its "parent" discipline in order to mine these data. It can be divided into three main categories: Web Content Mining, Web Usage Mining and Web Structure Mining.
Web Content Mining
The World Wide Web has been expanding rapidly over the last two decades, and it is becoming harder and harder for users to identify the information that interests them within such a vast pool. The main goal of web content mining is to offer the user the information of interest by searching the content of the available online resources; to achieve this goal, the classical data mining techniques are not always enough. In other words, web content mining takes the functionality of a search engine one step further by implementing more advanced techniques. Web content is not organized in relational databases like offline data; it can be text, images, audio, video, metadata or hyperlinks, and it may be unstructured (text data), semi-structured (HTML data) or structured (table data). Consequently, a relational database cannot be used in this case, and different types of databases, such as multimedia databases, have to be used.
Moreover, web data are almost never accumulated in one place but are dispersed across heterogeneous sources; consequently, the data have to be pooled in one place (e.g. a data warehouse) in order to be organized and homogenised.
Web Usage Mining
When Web Usage Mining is applied, data mining techniques are in essence used to discover patterns in the web surfing activities of the users. In practice, the data that are mined are the metadata (data about data) of these activities, which are kept in the respective logs (web logs).
The extracted patterns provide valuable information concerning the users' trends and preferences when surfing the web, product marketing strategies, outcomes of promotional campaigns, etc; this information helps web designers to develop improved web applications and marketing researchers to adjust their strategies accordingly.
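A minimal Python sketch of this idea is given below. It assumes a hypothetical, already simplified web log in which every entry is a (user, timestamp, page) triple; real web logs contain more fields and need parsing first. The sketch reconstructs each user's navigation path and counts how often one page is followed directly by another.

```python
from collections import Counter, defaultdict

# Hypothetical, simplified web log: (user, timestamp, page).
web_log = [
    ("u1", 1, "/intro"), ("u1", 2, "/lesson1"), ("u1", 3, "/quiz1"),
    ("u2", 1, "/intro"), ("u2", 2, "/quiz1"),
    ("u3", 1, "/intro"), ("u3", 2, "/lesson1"), ("u3", 3, "/quiz1"),
]


def navigation_paths(log):
    """Rebuild the ordered list of pages visited by each user."""
    paths = defaultdict(list)
    for user, _timestamp, page in sorted(log, key=lambda entry: (entry[0], entry[1])):
        paths[user].append(page)
    return dict(paths)


def transition_counts(paths):
    """Count how often each page is followed directly by another page."""
    counts = Counter()
    for pages in paths.values():
        counts.update(zip(pages, pages[1:]))
    return counts


paths = navigation_paths(web_log)
transitions = transition_counts(paths)
for (source, target), count in transitions.most_common():
    print(source, "->", target, count)
```

In this example one user skips the lesson and goes straight from the introduction to the quiz, which is the kind of navigational pattern that usage mining is meant to reveal.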
Web Structure Mining
Web Structure Mining attempts to find patterns in the structure of the hyperlinks that reside within web pages and link them to one another. The main purposes that it serves are the following:
- Categorizing web pages (search engines)
- Discovering structures of web documents
- Discovering the nature of the hierarchy or network of hyperlinks in the Website of a particular domain.
All the above share a main goal, which is to provide information for improving the structure of web pages and applications. In this respect, we could argue that it is strongly related to web usage mining, since a main goal of both is to improve web structure in general.
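One very simple form of structure analysis is to count how many hyperlinks point to each page of a site. The Python sketch below does this for a hypothetical link graph; the page names and links are illustrative assumptions, and counting incoming links is only one elementary measure among the many used in web structure mining.

```python
# Hypothetical link graph: page -> pages it links to.
links = {
    "/index": ["/course", "/about"],
    "/course": ["/lesson1", "/lesson2", "/index"],
    "/lesson1": ["/lesson2", "/course"],
    "/lesson2": ["/course"],
    "/about": [],
}


def in_degrees(graph):
    """Count the hyperlinks pointing to each page."""
    counts = {page: 0 for page in graph}
    for _source, targets in graph.items():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


incoming = in_degrees(links)
most_linked = max(incoming, key=incoming.get)
print(incoming)
print(most_linked)
```

Pages with many incoming links tend to act as hubs of the site's hierarchy, while pages with few or none may be hard for users to reach.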
The importance of data mining in extracting knowledge and assisting the decision-making process has led to the development of various tools. These tools aim to assist the researcher in conducting the data mining task in an easy, quick and efficient way, without the researcher needing to be fully versed in the discipline. These tools can be either commercial or available for free. Despite the great variety of commercial tools (BayesiaLab, Clementine, Data Miner Software Kit, DBMiner 2.0, IBM Intelligent Miner Data Mining Suite, KXEN, Oracle Data Mining (ODM), SPSS, SAS Enterprise Miner, etc.), these tools and their functionalities are outside the scope of this thesis. Consequently, we are going to focus on the free tools that are available. A few of the most widely used are the following:
RapidMiner (Community Edition)
RapidMiner is a Java-based application whose main characteristic is that it hosts a large number of operators (over 500); this makes it possible to use a large number of different methods and to compare them, and an additional advantage is the extensive support it offers for model building and validation. It seems to be the most powerful of all, but its main disadvantage is its somewhat complex GUI, which, although aesthetically pleasing, lacks user friendliness and seems harder to learn, despite the complete documentation offered on the website. Another point is that RapidMiner has adopted several WEKA algorithms.
KNIME is simpler to use than RapidMiner, but it lacks power when it comes to model building and validation. Nevertheless, for relatively simple tasks, it includes all the required operators and visual components. Additionally, it can connect to and read from a database, and moreover it can incorporate modules of the WEKA tool, a fact that significantly enhances the possibilities offered.
WEKA is somewhere in the middle between the two aforementioned applications, KNIME and RapidMiner. It hosts many algorithms and visualization tools (not as many as RapidMiner) and it is relatively simple to use. Its user interface is not aesthetically at the level of the other two tools; nevertheless, it seems to be easier to use than both. The learning time required is significantly less than for the other two, even for novices, and the accompanying documentation seems to be more than enough. It offers direct access to databases and it is able to process the results of database queries.
2. Research Methodology
The primary scope of this project is to investigate how to enhance the navigation scheme of a SCORM compliant course at TEI Piraeus using data mining techniques.
The course will be analysed by applying mining techniques to the data collected by the learning management system. The underlying analysis will focus on identifying patterns/clusters of data which will provide information regarding the navigational behaviour of the students. The main goal is to find navigational patterns and clusters of students who perform well or poorly. Next, a new navigation scheme will be proposed, based on the SCORM standard.
In order to evaluate the navigation scheme of the course, data mining techniques will be implemented. The data extracted from the course's database will be mined using three different techniques so as to gain the best possible view of the students' actions; these techniques will be Association Rules, Classification and Clustering, with the selected algorithms being Apriori, J48 and K-Means respectively.
- Association Rules (Apriori) will help us to:
  - Analyse the order in which the learning activities are accessed by the students, that is, which is accessed first, which follow, which are usually omitted, etc.
  - Correlate the difficulty levels with the grades earned, that is, whether good or bad grades are due to low-difficulty activities or to inappropriate navigation patterns.
- Classification (J48) will help us to:
  - Connect the grades earned to the way the students navigate, that is, whether good or bad grades are connected to adequate or inappropriate navigation patterns respectively.
  - Define how the difficulty levels are connected to the grades earned, that is, under what difficulty circumstances good or bad grades emerge.
- Clustering (K-Means) will help us to:
  - Group the students based on their course visiting frequency and relate it to the grades and difficulty levels.
  - Group the students based on their grades and relate it to the difficulty levels.
It is obvious that the goals described above are similar to a certain extent. Nevertheless, by using all three techniques, we can obtain a more consistent view of the situation; the outcomes of this assessment will be more accurate and the possibility of errors will be significantly diminished.
The tool that will be used for the data mining process is WEKA. Although RapidMiner and KNIME would also be adequate for the tasks that this thesis requires, WEKA was chosen over them due to the superior simplicity of its interface. Of the four available WEKA GUIs, the one that will be used is the "Explorer".
After the problematic navigation schemes have been identified, solutions will be proposed. The "tool" that will be deployed is SCORM. If the data mining outcome is adequately evaluated and compliance with the corresponding SCORM guidelines is achieved, the students' navigation patterns and, consequently, the efficiency of the course will be significantly improved.
3. Plan for Completion
At this point the data of interest (attributes) have been identified in the course's database, and the respective queries have been built and executed so that the data could be extracted. The data have been exported and preprocessed (cleansing, filtering, etc), so that they are ready to be imported and analysed in WEKA. Some data mining tasks have already been performed on test data in order to select the best algorithm settings.
What remains to be done is to execute the main data analysis in WEKA and to evaluate the results accordingly. The outcome of this process will be the identification of the main problems that the current navigational scheme presents. Next, the current scheme will be compared to the SCORM guidelines in order to point out any divergences, and subsequently propositions will be made so that the course's navigation scheme converges to SCORM as much as possible.
- Rogerson-Revell, P., "Directions in e-learning tools and technologies and their relevance to online distance language education", Open Learning, v22 n1 p57-74 Feb 2007
- Nichols, M. (2008). "E-Learning in context", http://akoaotearoa.ac.nz/sites/default/files/ng/group-661/n877-1---e-learning-in-context.pdf
- Gerlic, I and N Jausovec (1999), "Multimedia: Differences in cognitive processes observed with EEG", ETR&D-Educational Technology Research and Development vol 47, no 3, pps 5-14
- Kazmerski, V A and D G Blasko (1999), "Teaching observational research in introductory psychology: Computerized and lecture-based methods", Teaching of Psychology vol 26, no 4, pps 295-298
- Kulik, C-L and J Kulik (1991) "Effectiveness of computer-based instruction: An updated analysis", Computers in Human Behavior vol 7, pps 75-94
- Steyn, M, du Toit C.J and Lachmann, G. (1999) "The implementation of a multimedia program for first year university chemistry practicals", South African Journal of Chemistry-Suid-Afrikaanse Tydskrif Vir Chemie vol 52, no 4, pps 120-126
- Merchant, S, J Kreie and T P Cronan (2001), "Training end users: Assessing the effectiveness of multimedia CBT", Journal of Computer Information Systems vol 41, no 3, pps 20-25
- Huk T., Lipper T., Steinke M., Floto C., "The role of navigation and motivation in e-learning -the crimp- approach within a swedish german research cooperation"
- Zhang, D, Nunamaker, J.F., "Powering E-Learning In the New Millennium: An Overview of E-Learning and Enabling Technology"
- Gunn, C., McSporran, M., Macleod, H., & French, S. (2003). "Dominant or different? Gender issues in computer supported learning.", Journal of Asynchronous Learning Networks, 7(1).
- Rovai, F. (2002)., "A preliminary look at the structural differences of higher education classroom communities in traditional and ALN courses." Journal of Asynchronous Learning Networks, 6(1), 41-56.
- Zhao, Y., Lei, J., Lai, B. Y. C., & Tan, H. S. (2005). "What makes the difference? A practical analysis of research on the effectiveness of distance education." Teachers College Record 107(8), 1836-1884.
- Horton W. & K. "E-learning Tools and Technologies", 2003, 592 pages, ISBN: 0-471-444588, John Wiley & Sons
- Vannakrairojn S. Standards.ppt, NOLP, National Science and Development Agency, 5 March 2003
- Desmarais, C.M., Villarreal, A. and Gagnon M., (2008), "Adaptive Test Design with a Naive Bayes Framework", EDM 2008: 48-56, http://www.educationaldatamining.org/EDM2008/uploads/proc/5_Desmarais_17.pdf
- Holzinger, A (2000) "Basiswissen Multimedia Band 2: Lernen. Kognitive Grundlagen multimedialer Informationssysteme", Würzburg, Vogel
- Kuhlen, R (1991) "Hypertext: Ein nicht-lineares Medium zwischen Buch und Wissensbank", Berlin, Springer
- SCORM-1-12 SCORM® 2004 3rd Edition Overview Version 1.0 © 2006 Advanced Distributed Learning. All Rights Reserved.
- SCORM® 2004 3rd Edition Content Aggregation Model (CAM) Version 1.0
- SCORM® 2004 3rd Edition Run-Time Environment (RTE) Version 1.0
- SCORM® 2004 3rd Edition Sequencing and Navigation (SN) Version 1.0
- Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C., Wirth, R.: CRISP-DM 1.0 Step by Step Data Mining Guide. CRISP-DM Consortium (2000)
- Presentation of G. Karypis, Department of Computer Science, Digital Technology Center, University of Minnesota, Minneapolis, USA.
- Hanna M. (2004), "Data mining in the e-learning domain", Campus-Wide Information Systems, Vol. 21 , No 1, 2004, pp. 29-34
- Psaromiligkos papers: details must be added
- Orfanidou M., "A Tool for Investigating Students' Access Patterns of a Web-based Learning Management System"
- Romero C., Ventura S., Espejo P.G. and Hervas C, (2008), "Data mining algorithms to classify students", EDM 2008, http://www.educationaldatamining.org/EDM2008/uploads/proc/1_Romero_3.pdf
- Mavrikis M., (2008), "Data-driven modeling of students' interactions in an ILE", EDM 2008, http://www.educationaldatamining.org/EDM2008/uploads/proc/9_Mavrikis_27.pdf
- Pechenizkiy, M., Calders, T., Vasilyeva, E. and De Bra, P. "Mining the Student Assessment Data: Lessons Drawn from a Small Scale Case Study", EDM 2008, http://www.educationaldatamining.org/EDM2008/uploads/proc/20_Pechenizkiy_26.pdf
- Talavera L., and Gaudioso E., (2004), "Mining student data to characterize similar behavior groups in unstructured collaboration spaces", Workshop on Artificial Intelligence in CSCL, 16th European Conference on Artificial Intelligence, (ECAI 2004), pp. 17-23. http://www.lsi.upc.edu/~talavera/papers/TalaveraGaudiosoECAI04ws.pdf
- Bresfelean, V.P. (2007), "Analysis and Predictions on Students' Behavior Using Decision Trees in Weka Environment", Information Technology Interfaces, 2007. ITI 2007. 29th International Conference on, June 2007, pp. 51-56.