This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.
Data Modeling is the process of finding out how to store digitized information in a logically structured computer database. It is a technical process of understanding and mapping business information and stored in a database in a logical form. i.e. Data modeling is a process which applies structure and organization of large chunk of data for ease of data management.
Even though data modeling is the time consuming process, for a good database design, we need to have a sound data model. Otherwise we need to modify the database design again and again.
TYPES OF DATA MODELS
There are different types of data models.
File system (text or binary file such as windows explorer or Unix)
Hierarchical Databases- where every child entity has parent entity.
Network databases - child object has multiple parents. -a many to may relations. an employee can have more than one task. and each task can be assigned to more than one employee.
Dimensional Databases - more than two dimensional databases.
Object Relational Databases - new introduced some object aspects in relational databases.
ESSENTIAL ELEMETS OF DATA MODELING
I would like to discuss essential elements of Data Modeling.
Logical Elements or Logical Model
With the completion of conceptual modeling, we move on to Logical Modeling.
Logical model is a representation of the data in a way that can be represented to the business. Logical model serves as a road map for the physical implementation.
The 3 main elements of a logical model are :
i)Entities: Entities are logical grouping of data. (E.g: information describes about a
ii)Attributes: Pieces of information, that make up entities.(E.g: for student entity, attributes
might be studentname, idnumber, coursename, etc)
iii)Relationships: Relationships describe how one entity is related to another.
(E.g students enroll into a course)
Physical Elements or Physical model
Once the logical model is constructed, we create the physical model. Like the logical model, physical model is also made up of various elements.
i)Tables : Tables are useful for storing data. Tables have columns containing
information about the data in the table rows.
ii)Keys ââ‚¬" Keys define relationship between two tables. Following are the types of keys.
At glance, tables, columns and keys might seem to be same as the logical elements, but there are important differences. Logical Elements simply describe the grouping of data as they might exist in the real world. Physical Elements actually store the data in a database. A single entity might be stored in one table or in multiple tables. More than one entity might be stored in single table.
A well-designed data model has some level of normalization.
Normalization is the process of separating data into logical groupings.
Normalization is divided into levels, and each successive level build on the preceding level.
First Normal Form:
The data is stored in a table and each column contains one type of data. i.e. any
given column in the table stores the same piece of information, such as a phone
2) Data have a primary key which uniquely identify the each row in a table.
3) First normal form allow to create one-to-many relationships typically master-detail
relationships(ex: invoice and invoice lines). First normal form eliminating repeated
groups i.e. rows contain repeated column values.
Example: Before Normal form
After First Normal Form
Second Normal Form:
It Create many-to-one relationships in order to separate static to Transactional data. (For E.g such as removing customer details from invoices. As Invoices created every month and customers create every year or over 5 or 10 years.) i.e. It eliminates redundant data. It removed columns that are not dependent on primary key. Second normal must first fulfill the requirements of the First Normal Form.
Example: First Normal Form
Total in $
Second Normal form
The creation of two separate tables , eliminates the dependency problem.
Third Normal Form:
Third normal form resolves many-to- many relationships. e.g. an employee can be assigned with many tasks. And a task can be assigned to many employees.
Third Normal Form
Normalization can go up to 6 levels. But most well built models go up to 3rd normal form.
Boyce-Codd Normal Form(BCNF):
Fourth Normal Form(4NF) and Fifth Normal Form(5NF):
De-normalization is the process of adding redundant data to speed up complex queries involving multiple table. We can achieve De-normalization and better query execution performance by going to the lower form of normalization.
THE PROCESS OF DATA MODELING
Steps to Build a Data Modeling
There are three levels of data modeling. They are conceptual, logical, and physical. This section will explain the difference among the three, the order with which each one is created, and how to go from one level to the other.
Requirement and interpretation of the requirements to design a company's database.
2) Building Conceptual Data Model
Includes the important entities and the relationships among them.
No attribute is specified.
No primary key is specified.
At this level, the data modeler attempts to identify the highest-level relationships among the different entities.
Sample Conceptual Model (E.g : Student_Course):
3) Building the logical model.
Includes all entities and relationships among them.
All attributes for each entity are specified.
The primary key for each entity specified.
Foreign keys (keys identifying the relationship between different entities) are specified.
Normalization occurs at this level.
At this level, the data modeler attempts to describe the data in as much detail as possible, without regard to how they will be physically implemented in the database.
The steps for designing the logical data model are as follows:
Identify all entities.
Specify primary keys for all entities.
Find the relationships between different entities.
Find all attributes for each entity.
Resolve many-to-many relationships.
4) Building the physical model.
Physical model represents the physical implementation of the model in a database. Creates Tables, Columns, Primary Key Constraint, Unique Constraint, check constraint, foreign keys.
Features of physical data model include:
Specification all tables and columns.
Foreign keys are used to identify relationships between tables.
De-normalization may occur based on user requirements.
Physical considerations may cause the physical data model to be quite different from the logical data model.
At this level, the data modeler will specify how the logical data model will be realized in the database schema.
The steps for physical data model design are as follows:
Convert entities into tables.
Convert relationships into foreign keys.
Convert attributes into columns.
Modify the physical data model based on physical constraints / requirements.
4) Implementing the physical model on Database(SQL Server /Oracle).
Practical Project work
Designing a Sample Data Model(e.g Student_Course_Model) Practically using Erwin 7.3 Data Modeler Tool
Screen 2: Selecting new template for designing data model. Here we can see the three option
Logical : Selecting this only we can do logical model
Physical: with this option only we can do Physical Model
Logical/Physical: selecting this option we do logical and physical
Screen 3: Here first we are doing logical model of Student and Course data . For that we designed Tables and there attributes ( columns available in that table) using Erwin Data modeler
Screen 4 : Identifying relations between attributes in Student_Course model
Screen 5: Creating Data types like Integer, String ,date etc. to attributes in Student_ course model.
Screen 6 : Creating Varchar Data Type to CourseDescription in Student_Course Model
Screen 7 : Creating Data Types and length to attributes
Screen 8 : Creating Data Types and length to attributes
Screen 9 : Creating Data Types and length for attributes.
Screen 10 .Converting Logical Model into Physical Model
Screen 11 .Generating Script From the designed model.
Screen 12: Selecting the Forward Engine option with which we can forward the data to Databases here SQL Server 2008
Screen 13 : Selection Schema option and Schema generation.
Screen 14 : Here We can see the SQL Server Schema Generation Preview
Screen 15 : Executing the Script to Create Student_CourseDB in SQL Server 2008.
Screen 16: Finally , we can Student_Course Model i.e. Student_CourseDB in SQL Server 2008
Dimensional Data Model
Dimensional data model is most often used in data warehousing systems. This is different from the 3rd normal form, commonly used for transactional (OLTP) type systems. As you can imagine, the same data would then be stored differently in a dimensional model than in a 3rd normal form model.
Terms used in dimensional data modeling are
i)Dimension: A category of information. For example, the time dimensions.
ii) Attribute: A unique level within a dimension. For example, Month is an attribute in the Time Dimension.
Iii) Hierarchy: The specification of levels that represents relationship between different attributes within a hierarchy. For example, one possible hierarchy in the Time dimension is Year --> Quarter --> Month --> Day.
A dimensional model includes dimension tables, fact tables and lookup tables.
Fact Table: A fact table is a table that contains the measures. . For example, sales amount would be such a measure. This measure is stored in the fact table with the appropriate granularity. For example, it can be sales amount by store by day. In this case, the fact table would contain three columns: A date column, a store column, and a sales amount column. Fact tables connect to one or more lookup tables, but fact tables do not have direct relationships to one another.
Lookup Tables: The lookup table provides the detailed information about the attributes. These are like reference tables.
In designing data models for data warehouses / data marts, the most commonly used schema types are Star Schema and Snowflake Schema.
Star Schema: In the star schema design, a single object (the fact table) sits in the middle and is radically connected to other surrounding objects (dimension lookup tables) like a star. A star schema can be simple or complex. A simple star consists of one fact table; a complex star can have more than one fact table.
Snowflake Schema: The snowflake schema is an extension of the star schema, where each point of the star explodes into more points. The main advantage of the snowflake schema is the improvement in query performance due to minimized disk storage requirements and joining smaller lookup tables. The main disadvantage of the snowflake schema is the additional maintenance efforts needed due to the increase number of lookup tables.
Whether one uses a star or a snowflake largely depends on personal preference and business needs. Personally, I am partial to snowflakes, when there is a business case to analyze the information at that particular level.