Table of Contents
- Date – Primary Key – DateId as auto Number
- Territory – Primary Key- TerritoryId as auto Number
- ProductSubcaterory – Primary Key – ProductSubcategoryId as auto Number
The surrogate key is better to establish a unique connexion with a single record from the table. It is represented by an attribute with random values, unique, and does not have any linking to the company. Compared to the natural key that has a connection with all the others values from the table, the surrogate key has a single connection with a single value. The surrogate key cannot create and store a double value. The surrogate key is used to change the composite key from multiple columns. In Access is used auto numbering field (autoNumber). From the surrogate key values cannot be obtained information about the recognised records
The attribute is a property that describes a certain aspect of an entity. (Angela, 2002)The attributes are chosen according to some significant criteria related to the interesting domain, for which is defined the certain model, thus to ensure the difference between the entity and the rest of the company. The ‘Date’ is an entity. This type of entity can be described through many attributes as day, month, year. These attributes are added to make it easier to select dates and to be made a statistic analysis, for instance, to visualise the sales from a certain month for a specific product category comparing to another month. The attributes are simple for example town, and composed as an address – the address can include street, number, town, region.
The query syntax has a better understanding and is easier to be read. Is easier because uses the natural language to develop a command as select, from, where.
The data can be stores on fields or activities specific to the departments of an organisation in so-called data marts. Star and snowflake are two types of dimensional models. In these models can be retrieved quantitively data such as amounts or values or grouped by various criteria, per customer, per products. The data identified by dimension will be stored in a relational table that is called the fact table. The codes used are associated with the fact tables in this way the star model is created. The fact table consists of most of the data. The advantage of the star model is that this assures the optimum performance for the interrogations from a data mart. For instance, the Product Subcategory table includes Id, name, category name.
The snowflake model is a different type of the star model where the tables are normalised, such that to appear additional tables. The difference between these two is that the dimension table from the snowflake model can be stored in normalise form which causes a reduced redundancy. The snowflake tables are easy to maintain and save storage space. (Mohammad, 2018 )
Slowly Changing Dimension (SCD) signify a dimension of existing and historical data which are stored and managed in a data mart. There are 3 types of Slowly Changing Dimension: overwriting, creating another dimension record and creating a current value field. (Kashinathan, 2012)
- Overwriting – the old data are replaced with new data. The dimension table will be increased with only one record. The weakness of this method is that the data historical value will be lost being replaced with a new value for each attribute. For instance, in the Customer table, there are several data. If a customer changed his phone number, then the old phone number will be deleted and replaced with the new number.
- Creating/ adding another record – this method holds the value’s history. When the old record is changed its value will pass as it is inactive, then a new row with new data will be created and it will be declared as active. In the Customer table, if a customer moved to another address, the previous address will be inactive and the new address will be added and declared active.
- Creating/adding a current value. Each field will have for example an old address and a new address. If the new address will change again then the new address will become the old address and the new date that will be introduced will be the new address. (“What are Slowly Changing Dimensions?” 2009)
The fact table, in this case, is characterised by Sale table and the other four are the dimension tables. (Customer, Territory, Date and Product SubCategory). The fact table – Sale has an indicator to each of the dimension tables. For instance, CustomerKey from the fact table is attributed to the CustomerKey from the Customer dimension table and references the customer number. The same happens with the TerritoryId, ProductSubcategoryId and DateId, territory reference the country, Productsubcategory reference the category of product and Date reference the date as time. The primary keys from the fact table provide a unique identifier for each record and a foreign key in the other tables.
If you need assistance with writing your essay, our professional essay writing service is here to help!Essay Writing Service
The granularity is a very important aspect for building the data marts., because of controls the volume occupied by the data. It refers to the detail level in a data unit warehouse. The higher detail level is with that much the granularity is smoother. (Bill, 2006) Taking as an example the fact table- if the summery is stronger then the granularity will be higher. In the fact table, the saleAmount is stocked for long periods of time and is used in the organisational decision. Also, in the fact table exist the level per day which means that each customer for each subcategory from a territory per day. In the data mart, the data is updated by the ELT (Extract Load Transformation) process. In the ELT process are kept all the sale records of each customer/territory/ product subcategory per day. The ELT implies the extraction of external data, its transformation according to operational needs and the loading in the database. At the end of the process does not have to exist more records for a customer on the same day. Briefly, the granularity refers to the level of summarising or complete the data.
The fact table
1 row has 5 attributes. Each attribute has 4 bits.
1 row 5 attributes x 4bit = 20 bits
500 000 records x 20 bit = 1.25 GB
The dimension table
- Date – 1158 x 1Kb/ row = 1.15Mb
- Territory – 11 x 1Kb/row = 0.01Mb
- Subcategory – 37 x 1Kb/row = 0.037 Mb
- Client - 18484 x 1Kb/row = 18.5 Mb
Total = 20Mb
Data mart total = 1.25(500 000 records) + 0.20(20Mb) = 1.45 Gb
ETL process is used in database and particularly data warehouse that involves three stages.
The extract is the first stage- extracting the data from external source.
Transformation stage- transform data to suit the operational needs.
Loading stage – loading the data into the data mart.
The Access File – Australia
To import data from Access file it was used source assistance which is helpful to choose the source from countries data. That was the first step. Then the data is converted to a unique code string DT_WSTR, currency DT_CY, date DT_DATE. To acquire granularity of product per customer, per region per day, the sales amount needs to be summed using an aggregation tool. The next step is conditional split that will check the conditions and then will avoid any null values. In order to proceed the development, there are three lookups. The lookup will create a field that helps to find a value from another table using a list box. For instance, Territory Region from Access file must be related with TerritoryId from dimension table. The TerritoryId has a numerical value.
The same process will be developed for each subcategory.
Once the data is converted and achieve all the integrity rules it is loaded to SQL database.
The Access File has null values in the sales amount and it will be removed.
The Flat File does not encounter any problem having control over the completeness, accuracy and authorisation of the ETL process.
The XML File – in Product Subcategory dimension table it was used fuzzy lookup because in Tyre category there were two alternatives of category, tyre and tire, but it was used the same key for both.
The Excel File face with an issue concerning the date. There are values with 2019 and all must be removed in a document text. There cannot be stored in sales values for 2019 in 2018. The derived column is used to demonstrate that in the Territory the China value does not have a correspondent from Territory. Then the derived column is used to relate China with the Id 11 that signify the NA in the dimension Territory.
What is Business Intelligence Application?
Business Intelligence is a process that helps and supports the process management of a business in order to make it more efficient and accurate. To its functions, this process includes a set of specialised software tools and applications for collecting, storing, analysing and exploiting data and information resulted and accumulated at the level of processing of an organisation. The purpose of these is to offer a real support in taking decisions. The concept of Business Intelligence Application is also known as Data Warehouse or Analytical Application.
The Business Intelligence solution integrates current data and historical data from different programs and applications and consolidates them in a single optimised database to retrieve and analyse the information.
Business Intelligence process inputs are the data gained by the organisation while the operational stage runs as sales, stock, logistic, administration, financial, personnel as well as those from outside through streams, communication with business partners, suppliers, customers or their records/database.
The main value of this concept is the ability to transform the data into information.
The key elements in Business Intelligence are referring to:
- Data collection
- Data analysis – understanding the data and transform it into information
- Data distribution – timely delivery of information to those who require the information
The Business Intelligence use data warehouse which contains useful information for taking decisions through systems as Decision Support System (DSS) and On-line Analytical Processing (OLAP). These systems help to improve the Executives Information Systems used for activities analysis in the company. Also, the Business Intelligence is based on data mart that has to accept multiple technology process interrogation, reports, multidimensional analysis, data development, multiple granularity levels.
Our academic experts are ready and waiting to assist with any writing project you may have. From simple essay plans, through to full dissertations, you can guarantee we have a service perfectly matched to your needs.View our services
The implementation of data marts in Business Intelligence brings a number of advantages. Unlike the operational system that contains only current details, the data warehouse can provide both historical and aggregate information. The information in data mart is clear, consistent and is stored in a form that can be understood. Among the technologies developed by Business Intelligence Data Mining is useful for the company to discover the hidden or hard to find items in the database. With data, mining support can be determined relationship or correlation between different data or data groups.
The use of Business Intelligence Applications are valuable for the company because simplifies the process of making decisions, offers the opportunity to get the reports in the shortest possible time, reduce the operational costs and improve the relationship with clients.
The main value of Business Intelligence is the ability to convert data into information.
- Alechina, N. (2018). Entity/Relationship Modelling [PDF]. Retrieved from www.cs.nott.ac.uk/~psznza/G51DBS/dbs4.pdf
- Angela, B. S. (2002). Entity-Relationship Model. Retrieved from https://wofford-ecs.org/DataAndVisualization/ermodel/material.htm
- Adnan, A. A. Ilham and S. Usman, “Performance analysis of extract, transform, load (ETL) in apache Hadoop atop NAS storage using ISCSI,” 2017 4th International Conference on Computer Applications and Information Processing Technology (CAIPT), Kuta Bali, 2017, pp. 1-5 .doi:10.1109/CAIPT.2017.8320716
- Bill, I. (2006, December 14). Granularity of data. Retrieved from https://searchdatamanagement.techtarget.com/news/2240034162/Granularity-of-data
- H. Lv, L. Zhou and Y. Zhao, “Classification of Data Granularity in Data Warehouse,” 2017 9th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), Hangzhou, 2017, pp. 118-122. doi: 10.1109/IHMSC.2017.142
- Kashinathan. (2012, June 27). SCD Type 1, SCD Type 2, SCD Type 3,Slowly Changing Dimension Types,Advantages & Disadvantages [Web log post]. Retrieved from http://dwhlaureate.blogspot.com/2012/06/slowly-changing-dimensions.html
- Limeback, R. (2005, November). Difference between a primary key and a surrogate key. Retrieved from https://searchoracle.techtarget.com/answer/Difference-between-a-primary-key-and-a-surrogate-key
- Mohammad, A. R. (2018). Advanced Topics of Dimensional Modeling [PDF]. Retrieved from https://mis.uhcl.edu/rob/course/dw/Lectures/Advanced%20Dimensional%20Modeling.pdf
- Surrogate Key – Java, SQL and jOOQ. (2017, March 16). Retrieved from https://blog.jooq.org/tag/surrogate-key/
- Tong Gang, Cui Kai and Song Bei, “The research & application of Business Intelligence system in retail industry,” 2008 IEEE International Conference on Automation and Logistics, Qingdao, 2008, pp. 87-91. doi: 10.1109/ICAL.2008.4636125
- What are Slowly Changing Dimensions? (2009). Retrieved from http://datawarehouse4u.info/SCD-Slowly-Changing-Dimensions.html
Cite This Work
To export a reference to this article please select a referencing stye below:
Related ServicesView all
DMCA / Removal Request
If you are the original writer of this essay and no longer wish to have your work published on UKEssays.com then please: