Java and web programming languages

Published: Last Edited:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

About Java

Initially the language was called as "oak" but it was renamed as "Java" in 1995. The primary motivation of this language was the need for a platform-independent (i.e., architecture neutral) language that could be used to create software to be embedded in various consumer electronic devices.

* Java is a programmer's language.

* Java is cohesive and consistent.

* Except for those constraints imposed by the Internet environment, Java gives the programmer, full control.

* Finally, Java is to Internet programming where C was to system programming.

Importance of Java to the Internet

Java has had a profound effect on the Internet. This is because; Java expands the Universe of objects that can move about freely in Cyberspace. In a network, two categories of objects are transmitted between the Server and the Personal computer. They are: Passive information and Dynamic active programs. The Dynamic, Self-executing programs cause serious problems in the areas of Security and probability. But, Java addresses those concerns and by doing so, has opened the door to an exciting new form of program called the Applet.

Java can be used to create two types of programs

Applications and Applets: An application is a program that runs on our Computer under the operating system of that computer. It is more or less like one creating using C or C++. Java's ability to create Applets makes it important. An Applet is an application designed to be transmitted over the Internet and executed by a Java -compatible web browser. An applet is actually a tiny Java program, dynamically downloaded across the network, just like an image. But the difference is, it is an intelligent program, not just a media file. It can react to the user input and dynamically change.

Features Of Java Security

Every time you that you download a "normal" program, you are risking a viral infection. Prior to Java, most users did not download executable programs frequently, and those who did scanned them for viruses prior to execution. Most users still worried about the possibility of infecting their systems with a virus. In addition, another type of malicious program exists that must be guarded against. This type of program can gather private information, such as credit card numbers, bank account balances, and passwords. Java answers both these concerns by providing a "firewall" between a network application and your computer.

When you use a Java-compatible Web browser, you can safely download Java applets without fear of virus infection or malicious intent.


For programs to be dynamically downloaded to all the various types of platforms connected to the Internet, some means of generating portable executable code is needed .As you will see, the same mechanism that helps ensure security also helps create portability. Indeed, Java's solution to these two problems is both elegant and efficient.

The Byte code

The key that allows the Java to solve the security and portability problems is that the output of Java compiler is Byte code. Byte code is a highly optimized set of instructions designed to be executed by the Java run-time system, which is called the Java Virtual Machine (JVM). That is, in its standard form, the JVM is an interpreter for byte code.

Translating a Java program into byte code helps makes it much easier to run a program in a wide variety of environments. The reason is, once the run-time package exists for a given system, any Java program can run on it.

Although Java was designed for interpretation, there is technically nothing about Java that prevents on-the-fly compilation of byte code into native code. Sun has just completed its Just In Time (JIT) compiler for byte code. When the JIT compiler is a part of JVM, it compiles byte code into executable code in real time, on a piece-by-piece, demand basis. It is not possible to compile an entire Java program into executable code all at once, because Java performs various run-time checks that can be done only at run time. The JIT compiles code, as it is needed, during execution.

Java Virtual Machine (JVM)

Beyond the language, there is the Java virtual machine. The Java virtual machine is an important element of the Java technology. The virtual machine can be embedded within a web browser or an operating system. Once a piece of Java code is loaded onto a machine, it is verified. As part of the loading process, a class loader is invoked and does byte code verification makes sure that the code that's has been generated by the compiler will not corrupt the machine that it's loaded on. Byte code verification takes place at the end of the compilation process to make sure that is all accurate and correct. So byte code verification is integral to the compiling and executing of Java code.

Overall Description

Java programming uses to produce byte codes and executes them. The first box indicates that the Java source code is located in a. Java file that is processed with a Java compiler called javac. The Java compiler produces a file called a. class file, which contains the byte code. The. Class file is then loaded across the network or loaded locally on your machine into the execution environment is the Java virtual machine, which interprets and executes the byte code.

Java Architecture

Java architecture provides a portable, robust, high performing environment for development. Java provides portability by compiling the byte codes for the Java Virtual Machine, which is then interpreted on each platform by the run-time environment. Java is a dynamic system, able to load code when needed from a machine in the same room or across the planet.

Compilation of code

When you compile the code, the Java compiler creates machine code (called byte code) for a hypothetical machine called Java Virtual Machine (JVM). The JVM is supposed to execute the byte code. The JVM is created for overcoming the issue of portability. The code is written and compiled for one machine and interpreted on all machines. This machine is called Java Virtual Machine.

Compiling and interpreting Java Source Code

During run-time the Java interpreter tricks the byte code file into thinking that it is running on a Java Virtual Machine. In reality this could be a Intel Pentium Windows 95 or SunSARC station running Solaris or Apple Macintosh running system and all could receive code from any computer through Internet and run the Applets.


Java was designed to be easy for the Professional programmer to learn and to use effectively. If you are an experienced C++ programmer, learning Java will be even easier. Because Java inherits the C/C++ syntax and many of the object oriented features of C++. Most of the confusing concepts from C++ are either left out of Java or implemented in a cleaner, more approachable manner. In Java there are a small number of clearly defined ways to accomplish a given task.


Java was not designed to be source-code compatible with any other language. This allowed the Java team the freedom to design with a blank slate. One outcome of this was a clean usable, pragmatic approach to objects. The object model in Java is simple and easy to extend, while simple types, such as integers, are kept as high-performance non-objects.


The multi-platform environment of the Web places extraordinary demands on a program, because the program must execute reliably in a variety of systems. The ability to create robust programs was given a high priority in the design of Java. Java is strictly typed language; it checks your code at compile time and run time.

Java virtually eliminates the problems of memory management and deallocation, which is completely automatic. In a well-written Java program, all run time errors can -and should -be managed by your program.



The Java web server is JavaSoft's own web Server. The Java web server is just a part of a larger framework, intended to provide you not just with a web server, but also with tools. To build customized network servers for any Internet or Intranet client/server system. Servlets are to a web server, how applets are to the browser.

About Servlets

Servlets provide a Java-based solution used to address the problems currently associated with doing server-side programming, including inextensible scripting solutions, platform-specific APIs, and incomplete interfaces.

Servlets are objects that conform to a specific interface that can be plugged into a Java-based server. Servlets are to the server-side what applets are to the client-side - object byte codes that can be dynamically loaded off the net. They differ from applets in that they are faceless objects (without graphics or a GUI component). They serve as platform independent, dynamically loadable, plugable helper byte code objects on the server side that can be used to dynamically extend server-side functionality.

For example, an HTTP Servlets can be used to generate dynamic HTML content. When you use Servlets to do dynamic content you get the following advantages:

Ø They're faster and cleaner than CGI scripts

Ø They use a standard API (the Servlets API)

Ø They provide all the advantages of Java (run on a variety of servers without needing to be rewritten).

Attractiveness of Servlets

There are many features of Servlets that make them easy and attractive to use. These include:

Ø Easily configured using the GUI-based Admin tool

Ø Can be loaded and invoked from a local disk or remotely across the network.

Ø Can be linked together, or chained, so that one Servlets can call another Servlets, or several Servlets in sequence.

Ø Can be called dynamically from within HTML pages, using server-side include tags.

Ø Are secure - even when downloading across the network, the Servlets security model and Servlets sandbox protect your system from unfriendly behavior.

Advantages of the Servlet API

One of the great advantages of the Servlet API is protocol independence. It assumes nothing about:

* The protocol being used to transmit on the net

* How it is loaded

* The server environment it will be running in

These qualities are important, because it allows the Servlet API to be embedded in many different kinds of servers. There are other advantages to the Servlet API as well. These include:

* It's extensible - you can inherit all your functionality from the base classes made available to you.

* It's simple, small, and easy to use.

Features of Servlets

* Servlets are persistent. Servlet are loaded only by the web server and can maintain services between requests.

* Servlets are fast. Since Servlets only need to be loaded once, they offer much better performance over their CGI counterparts.

* Servlets are platform independent.

* Servlets are extensible. Java is a robust, object-oriented programming language, which easily can be extended to suit your needs

* Servlets are secure.

* Servlets can be used with a variety of clients.

Loading Servlets

Servlets can be loaded from three places

From a directory that is on the CLASSPATH. The CLASSPATH of the JavaWebServer includes service root/classes/ which is where the system classes reside.

From the <SERVICE_ROOT /Servlets/ directory. This is *not* in the server's class path. A class loader is used to create Servlets from this directory. New Servlets can be added - existing Servlets can be recompiled and the server will notice these changes.

From a remote location. For this a code base like http: // nine.eng / classes / foo / is required in addition to the Servlets class name. Refer to the admin GUI docs on Servlet section to see how to set this up.

Loading Remote Servlets

Remote Servlets can be loaded by:

1. Configuring the Admin Tool to setup automatic loading of remote Servlets

2. Setting up server side include tags in. shtml files

3. Defining a filter chain configuration

Invoking Servlets

A Servlet invoker is a Servlet that invokes the "service" method on a named Servlet. If the Servlet is not loaded in the server, then the invoker first loads the Servlet (either from local disk or from the network) and the then invokes the "service" method. Also like applets, local Servlets in the server can be identified by just the class name. In other words, if a Servlet name is not absolute, it is treated as local.

A client can invoke Servlets in the following ways:

* The client can ask for a document that is served by the Servlet.

* The client (browser) can invoke the Servlet directly using a URL, once it has been mapped using the Servlet Aliases section of the admin GUI.

* The Servlet can be invoked through server side include tags.

* The Servlet can be invoked by placing it in the Servlets/ directory.

* The Servlet can be invoked by using it in a filter chain.


JavaScript is a script-based programming language that was developed by Netscape Communication Corporation. JavaScript was originally called Live Script and renamed as JavaScript to indicate its relationship with Java. JavaScript supports the development of both client and server components of Web-based applications. On the client side, it can be used to write programs that are executed by a Web browser within the context of a Web page. On the server side, it can be used to write Web server programs that can process information submitted by a Web browser and then updates the browser's display accordingly

Even though JavaScript supports both client and server Web programming, we prefer JavaScript at Client side programming since most of the browsers supports it. JavaScript is almost as easy to learn as HTML, and JavaScript statements can be included in HTML documents by enclosing the statements between a pair of scripting tags


<SCRIPT LANGUAGE = "JavaScript">

JavaScript statements


Here are a few things we can do with JavaScript:

Ø Validate the contents of a form and make calculations.

Ø Add scrolling or changing messages to the Browser's status line.

Ø Animate images or rotate images that change when we move the mouse over them.

Ø Detect the browser in use and display different content for different browsers.

Ø Detect installed plug-ins and notify the user if a plug-in is required.

We can do much more with JavaScript, including creating entire application.

JavaScript Vs Java

JavaScript and Java are entirely different languages. A few of the most glaring differences are:

* Java applets are generally displayed in a box within the web document; JavaScript can affect any part of the Web document itself.

* While JavaScript is best suited to simple applications and adding interactive features to Web pages; Java can be used for incredibly complex applications.

There are many other differences but the important thing to remember is that JavaScript and Java are separate languages. They are both useful for different things; in fact they can be used together to combine their advantages.


Ø JavaScript can be used for Sever-side and Client-side scripting.

Ø It is more flexible than VBScript.

Ø JavaScript is the default scripting languages at Client-side since all the browsers supports it.

Hyper Text Markup Language

Hypertext Markup Language (HTML), the languages of the World Wide Web (WWW), allows users to produces Web pages that include text, graphics and pointer to other Web pages (Hyperlinks).

HTML is not a programming language but it is an application of ISO Standard 8879, SGML (Standard Generalized Markup Language), but specialized to hypertext and adapted to the Web. The idea behind Hypertext is that instead of reading text in rigid linear structure, we can easily jump from one point to another point. We can navigate through the information based on our interest and preference. A markup language is simply a series of elements, each delimited with special characters that define how text or other items enclosed within the elements should be displayed. Hyperlinks are underlined or emphasized works that load to other documents or some portions of the same document.

HTML can be used to display any type of document on the host computer, which can be geographically at a different location. It is a versatile language and can be used on any platform or desktop.

HTML provides tags (special codes) to make the document look attractive. HTML tags are not case-sensitive. Using graphics, fonts, different sizes, color, etc., can enhance the presentation of the document. Anything that is not a tag is part of the document itself.

Basic HTML Tags :

<! -- --> Specifies comments

<A>..........</A> Creates hypertext links

<B>..........</B> Formats text as bold

<BIG>..........</BIG> Formats text in large font.

<BODY>...</BODY> Contains all tags and text in the HTML document

<CENTER>...</CENTER> Creates text

<DD>...</DD> Definition of a term

<DL>...</DL> Creates definition list

<FONT>...</FONT> Formats text with a particular font

<FORM>...</FORM> Encloses a fill-out form

<FRAME>...</FRAME> Defines a particular frame in a set of frames

<H#>...</H#> Creates headings of different levels

<HEAD>...</HEAD> Contains tags that specify information about a document

<HR>...</HR> Creates a horizontal rule

<HTML>...</HTML> Contains all other HTML tags

<META>...</META> Provides meta-information about a document

<SCRIPT>...</SCRIPT> Contains client-side or server-side script

<TABLE>...</TABLE> Creates a table

<TD>...</TD> Indicates table data in a table

<TR>...</TR> Designates a table row

<TH>...</TH> Creates a heading in a table


Ø A HTML document is small and hence easy to send over the net. It is small because it does not include formatted information.

Ø HTML is platform independent.

Ø HTML tags are not case-sensitive.

Java Database Connectivity

What Is JDBC?

JDBC is a Java API for executing SQL statements. (As a point of interest, JDBC is a trademarked name and is not an acronym; nevertheless, JDBC is often thought of as standing for Java Database Connectivity. It consists of a set of classes and interfaces written in the Java programming language. JDBC provides a standard API for tool/database developers and makes it possible to write database applications using a pure Java API.

Using JDBC, it is easy to send SQL statements to virtually any relational database. One can write a single program using the JDBC API, and the program will be able to send SQL statements to the appropriate database. The combinations of Java and JDBC lets a programmer write it once and run it anywhere.

What Does JDBC Do?

Simply put, JDBC makes it possible to do three things:

Ø Establish a connection with a database

Ø Send SQL statements

Ø Process the results.

JDBC versus ODBC and other APIs

At this point, Microsoft's ODBC (Open Database Connectivity) API is that probably the most widely used programming interface for accessing relational databases. It offers the ability to connect to almost all databases on almost all platforms.

So why not just use ODBC from Java? The answer is that you can use ODBC from Java, but this is best done with the help of JDBC in the form of the JDBC-ODBC Bridge, which we will cover shortly. The question now becomes "Why do you need JDBC?" There are several answers to this question:

1. ODBC is not appropriate for direct use from Java because it uses a C interface. Calls from Java to native C code have a number of drawbacks in the security, implementation, robustness, and automatic portability of applications.

2. A literal translation of the ODBC C API into a Java API would not be desirable. For example, Java has no pointers, and ODBC makes copious use of them, including the notoriously error-prone generic pointer "void *". You can think of JDBC as ODBC translated into an object-oriented interface that is natural for Java programmers.

3. ODBC is hard to learn. It mixes simple and advanced features together, and it has complex options even for simple queries. JDBC, on the other hand, was designed to keep simple things simple while allowing more advanced capabilities where required.

4. A Java API like JDBC is needed in order to enable a "pure Java" solution. When ODBC is used, the ODBC driver manager and drivers must be manually installed on every client machine. When the JDBC driver is written completely in Java, however, JDBC code is automatically installable, portable, and secure on all Java platforms from network computers to mainframes.

Two-tier and Three-tier Models

The JDBC API supports both two-tier and three-tier models for database access.

In the two-tier model, a Java applet or application talks directly to the database. This requires a JDBC driver that can communicate with the particular database management system being accessed. A user's SQL statements are delivered to the database, and the results of those statements are sent back to the user. The database may be located on another machine to which the user is connected via a network. This is referred to as a client/server configuration, with the user's machine as the client, and the machine housing the database as the server. The network can be an Intranet, which, for example, connects employees within a corporation, or it can be the Internet.

In the three-tier model, commands are sent to a "middle tier" of services, which then send SQL statements to the database. The database processes the SQL statements and sends the results back to the middle tier, which then sends them to the user. MIS directors find the three-tier model very attractive because the middle tier makes it possible to maintain control over access and the kinds of updates that can be made to corporate data. Another advantage is that when there is a middle tier, the user can employ an easy-to-use higher-level API which is translated by the middle tier into the appropriate low-level calls. Finally, in many cases the three-tier architecture can provide performance advantages.

Until now the middle tier has typically been written in languages such as C or C++, which offer fast performance. However, with the introduction of optimizing compilers that translate Java byte code into efficient machine-specific code, it is becoming practical to implement the middle tier in Java. This is a big plus, making it possible to take advantage of Java's robustness, multithreading, and security features. JDBC is important to allow database access from a Java middle tier.

JDBC Driver Types

The JDBC drivers that we are aware of at this time fit into one of four categories:

Ø JDBC-ODBC bridge plus ODBC driver

Ø Native-API partly-Java driver

Ø JDBC-Net pure Java driver

Ø Native-protocol pure Java driver


If possible, use a Pure Java JDBC driver instead of the Bridge and an ODBC driver. This completely eliminates the client configuration required by ODBC. It also eliminates the potential that the Java VM could be corrupted by an error in the native code brought in by the Bridge (that is, the Bridge native library, the ODBC driver manager library, the ODBC driver library, and the database client library).

What Is the JDBC- ODBC Bridge?

The JDBC-ODBC Bridge is a JDBC driver, which implements JDBC operations by translating them into ODBC operations. To ODBC it appears as a normal application program. The Bridge implements JDBC for any database for which an ODBC driver is available. The Bridge is implemented as the

Sun.jdbc.odbc Java package and contains a native library used to access ODBC. The Bridge is a joint development of Innersole and Java Soft.

Java Server Pages (JSP)

Java server Pages is a simple, yet powerful technology for creating and maintaining dynamic-content web pages. Based on the Java programming language, Java Server Pages offers proven portability, open standards, and a mature re-usable component model .The Java Server Pages architecture enables the separation of content generation from content presentation. This separation not eases maintenance headaches; it also allows web team members to focus on their areas of expertise. Now, web page designer can concentrate on layout, and web application designers on programming, with minimal concern about impacting each other's work.

Features of JSP


Java Server Pages files can be run on any web server or web-enabled application server that provides support for them. Dubbed the JSP engine, this support involves recognition, translation, and management of the Java Server Page lifecycle and its interaction components.


It was mentioned earlier that the Java Server Pages architecture can include reusable Java components. The architecture also allows for the embedding of a scripting language directly into the Java Server Pages file. The components current supported include Java Beans, and Servlets.


A Java Server Pages file is essentially an HTML document with JSP scripting or tags. The Java Server Pages file has a JSP extension to the server as a Java Server Pages file. Before the page is served, the Java Server Pages syntax is parsed and processed into a Servlet on the server side. The Servlet that is generated outputs real content in straight HTML for responding to the client.

Access Models:

A Java Server Pages file may be accessed in at least two different ways. A client's request comes directly into a Java Server Page. In this scenario, suppose the page accesses reusable Java Bean components that perform particular well-defined computations like accessing a database. The result of the Beans computations, called result sets is stored within the Bean as properties. The page uses such Beans to generate dynamic content and present it back to the client.

In both of the above cases, the page could also contain any valid Java code. Java Server Pages architecture encourages separation of content from presentation.

Steps in the execution of a JSP Application:

1. The client sends a request to the web server for a JSP file by giving the name of the JSP file within the form tag of a HTML page.

2. This request is transferred to the JavaWebServer. At the server side JavaWebServer receives the request and if it is a request for a jsp file server gives this request to the JSP engine.

3. JSP engine is program which can understands the tags of the jsp and then it converts those tags into a Servlet program and it is stored at the server side. This Servlet is loaded in the memory and then it is executed and the result is given back to the JavaWebServer and then it is transferred back to the result is given back to the JavaWebServer and then it is transferred back to the client.

JDBC connectivity

The JDBC provides database-independent connectivity between the J2EE platform and a wide range of tabular data sources. JDBC technology allows an Application Component Provider to:

Ø Perform connection and authentication to a database server

Ø Manager transactions

Ø Move SQL statements to a database engine for preprocessing and execution

Ø Execute stored procedures

Ø Inspect and modify the results from Select statements


The generated application is the first version upon the system. The overall system is planned to be in the formal of distributed architecture with homogeneous database platform. The major objective of the overall system is to keep the following components intact.

² System consistency ² System integrity ² Overall security of data ² Data reliability and Accuracy ² User friendly name both at administration and user levels ² Considering the fact of generality and clarity ²To cross check that the system overcomes the hurdles of the version specific standards entity-relationship model.

Databases are used to store structured data. The structure of this data, together with other constraints, can be designed using a variety of techniques, one of which is called entity-relationship modeling or ERM. The end-product of the ERM process is an entity-relationship diagram or ERD. Data modeling requires a graphical notation for representing such data models. An ERD is a type of conceptual data model or semantic data model.

The first stage of information system design uses these models to describe information needs or the type of information that is to be stored in a database during the requirements analysis. The data modeling technique can be used to describe any ontology (i.e. an overview and classifications of used terms and their relationships) for a certain universe of discourse (i.e. area of interest). In the case of the design of an information system that is based on a database, the conceptual data model is, at a later stage (usually called logical design), mapped to a logical data model, such as the relational model; this in turn is mapped to a physical model during physical design. Note that sometimes, both of these phases are referred to as "physical design".

There are a number of conventions for entity-relationship diagrams (ERDs). The classical notation is described in the remainder of this article, and mainly relates to conceptual modelling. There are a range of notations more typically employed in logical and physical database design, including information engineering, IDEF1x (ICAM DEFinition Language) and dimensional modelling.

A sample ER diagram

An entity represents a discrete object. Entities can be thought of as nouns. Examples: a computer, an employee, a song, a mathematical theorem. A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, a proved relationship between a mathematician and a theorem. Entities are drawn as rectangles, relationships as diamonds.

Entities and relationships can both have attributes. Examples: an employee entity might have a social security number attribute (in the US); the proved relationship may have a date attribute. Attributes are drawn as ovals connected to their owning entity sets by a line.

Every entity (unless it is a weak entity) must have a minimal set of uniquely identifying attributes, which is called the entity's primary key.

Entity-relationship diagrams don't show single entities or single instances of relations. Rather, they show entity sets and relationship sets (displayed as rectangles and diamonds respectively). Example: a particular song is an entity. The collection of all songs in a database is an entity set. The proved relationship between Andrew Wiles and Fermat's last theorem is a single relationship. The set of all such mathematician-theorem relationships in a database is a relationship set.

Lines are drawn between entity sets and the relationship sets they are involved in. If all entities in an entity set must participate in the relationship set, a thick or double line is drawn. This is called a participation constraint. If each entity of the entity set can participate in at most one relationship in the relationship set, an arrow is drawn from the entity set to the relationship set. This is called a key constraint. To indicate that each entity in the entity set is involved in exactly one relationship, a thick arrow is drawn.

Associative entity is used to solve the problem of two entities with a many-to-many relationship

Unary Relationships - a unary relationship is a relationship between the rows of a single table.

Less common symbols

A weak entity is an entity that can't be uniquely identified by its own attributes alone, and therefore must use as its primary key both its own attributes and the primary key of an entity it is related to. A weak entity set is indicated by a bold rectangle (the entity) connected by a bold arrow to a bold diamond (the relationship). Double lines can be used instead of bold ones.

Attributes in an ER model may be further described as multi-valued, composite, or derived. A multi-valued attribute, illustrated with a double-line ellipse, may have more than one value for at least one instance of its entity. For example, a piece of software (entity=application) may have the multivalued attribute "platform" because at least one instance of that entity runs on more than one operating system. A composite attribute may itself contain two or more attributes and is indicated as having at least contributing attributes of its own. For example, addresses usually are composite attributes, composed of attributes such as street address, city, and so forth. Derived attributes are attributes whose value is entirely dependent on another attribute and are indicated by dashed ellipses. For example, if we have an employee database with an employee entity along with an age attribute, the age attribute would be derived from a birth date attribute.

Sometimes two entities are more specific subtypes of a more general type of entity. For example, programmers and marketers might both be types of employees at a software company. To indicate this, a triangle with "ISA" on the inside is drawn. The superclass is connected to the point on top and the two (or more) subclasses are connected to the base.

A relation and all its participating entity sets can be treated as a single entity set for the purpose of taking part in another relation through aggregation, indicated by drawing a dotted rectangle around all aggregated entities and relationships.

Alternative diagramming conventions

Crow's Feet

Two related entities shown using Crow's Feet notation

The "Crow's Feet" notation is named for the symbol used to denote the many sides of a relationship, which resembles the forward digits of a bird's claw. You can see this claw shape in the diagram to the right, representing the same relationship depicted in the Common symbols section above.

In the diagram, the following facts are detailed:

* An Artist can perform many Songs, identified by the crow's foot.

* An Artist must perform at least one Song, shown by the perpendicular line.

* A Song may or may not be performed by any Artist, as indicated by the open circle.

This notation is gaining acceptance through common usage in Oracle texts, and in tools such as Visio and PowerDesigner, with the following benefits:

* Clarity in identifying the many, or child, side of the relationship, using the crow's foot.

* Concise notation for identifying mandatory relationship, using a perpendicular bar, or an optional relationship, using an open circle.

E-R Diagrams for E-Transaction Interface

Database normalization is a design technique by which relational database tables are structured in such a way as to make them invulnerable to certain types of logical inconsistencies and anomalies. Tables can be normalized to varying degrees: relational database theory defines "normal forms" of successively higher degrees of stringency, so, for example, a table in third normal form is less open to logical inconsistencies and anomalies than a table that is only in second normal form. Although the normal forms are often defined (informally) in terms of the characteristics of tables, rigorous definitions of the normal forms are concerned with the characteristics of mathematical constructs known as relations. Whenever information is represented relationally-that is, roughly speaking, as values within rows beneath fixed column headings-it makes sense to ask to what extent the representation is normalized.

Problems addressed by normalization

A table that is not sufficiently normalized can suffer from logical inconsistencies of various types, and from anomalies involving data operations. In such a table:

* The same fact can be expressed on multiple records; therefore updates to the table may result in logical inconsistencies. For example, each record in an unnormalized "DVD Rentals" table might contain a DVD ID, Member ID, and Member Address; thus a change of address for a particular member will potentially need to be applied to multiple records. If the update is not carried through successfully-if, that is, the member's address is updated on some records but not others-then the table is left in an inconsistent state. Specifically, the table provides conflicting answers to the question of what this particular member's address is. This phenomenon is known as an update anomaly.

* There are circumstances in which certain facts cannot be recorded at all. In the above example, if it is the case that Member Address is held only in the "DVD Rentals" table, then we cannot record the address of a member who has not yet rented any DVDs. This phenomenon is known as an insertion anomaly.

* There are circumstances in which the deletion of data representing certain facts necessitates the deletion of data representing completely different facts. For example, suppose a table has the attributes Student ID, Course ID, and Lecturer ID (a given student is enrolled in a given course, which is taught by a given lecturer). If the number of students enrolled in the course temporarily drops to zero, the last of the records referencing that course must be deleted-meaning, as a side-effect, that the table no longer tells us which lecturer has been assigned to teach the course. This phenomenon is known as a deletion anomaly.

Ideally, a relational database should be designed in such a way as to exclude the possibility of update, insertion, and deletion anomalies. The normal forms of relational database theory provide guidelines for deciding whether a particular design will be vulnerable to such anomalies. It is possible to correct an unnormalized design so as to make it adhere to the demands of the normal forms: this is normalization. Normalization typically involves decomposing an unnormalized table into two or more tables which, were they to be combined (joined), would convey exactly the same information as the original table.

Background to normalization: definitions

* Functional dependency :Attribute B has a functional dependency on attribute A if, for each value of attribute A, there is exactly one value of attribute B. For example, Member Address has a functional dependency on Member ID, because a particular Member Address value corresponds to every Member ID value. An attribute may be functionally dependent either on a single attribute or on a combination of attributes. It is not possible to determine the extent to which a design is normalized without understanding what functional dependencies apply to the attributes within its tables; understanding this, in turn, requires knowledge of the problem domain.

* Trivial functional dependency: A trivial functional dependency is a functional dependency of an attribute on a superset of itself. {Member ID, Member Address} → {Member Address} is trivial, as is {Member Address} → {Member Address}.

* Full functional dependency: An attribute is fully functionally dependent on a set of attributes X if it is a) functionally dependent on X, and b) not functionally dependent on any proper subset of X. {Member Address} has a functional dependency on {DVD ID, Member ID}, but not a full functional dependency, for it is also dependent on {Member ID}.

* Multivalued dependency: A multivalued dependency is a constraint according to which the presence of certain rows in a table implies the presence of certain other rows: see the Multivalued Dependency article for a rigorous definition.

* Superkey: A superkey is an attribute or set of attributes that uniquely identifies rows within a table; in other words, two distinct rows are always guaranteed to have distinct superkeys. {DVD ID, Member ID, Member Address} would be a superkey for the "DVD Rentals" table; {DVD ID, Member ID} would also be a superkey.

* Candidate key: A candidate key is a minimal superkey, that is, a superkey for which we can say that no proper subset of it is also a superkey. {DVD ID, Member ID} would be a candidate key for the "DVD Rentals" table.

* Non-prime attribute: A non-prime attribute is an attribute that does not occur in any candidate key. Member Address would be a non-prime attribute in the "DVD Rentals" table.

* Primary key: Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible unique keys. A primary key is a candidate key which the database designer has designated for this purpose.


Edgar F. Codd first proposed the process of normalization and what came to be known as the 1st normal form:

Edgar F. Codd used the term "non-simple" domains to describe a heterogeneous data structure, but later researchers would refer to such a structure as an abstract data type.

Normal forms

The normal forms (abbrev. NF) of relational database theory provide criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies. The higher the normal form applicable to a table, the less vulnerable it is to such inconsistencies and anomalies. Each table has a "highest normal form" (HNF): by definition, a table always meets the requirements of its HNF and of all normal forms lower than its HNF; also by definition, a table fails to meet the requirements of any normal form higher than its HNF.

The normal forms are applicable to individual tables; to say that an entire database is in normal form n is to say that all of its tables are in normal form n.

Newcomers to database design sometimes suppose that normalization proceeds in an iterative fashion, i.e. a 1NF design is first normalized to 2NF, then to 3NF, and so on. This is not an accurate description of how normalization typically works. A sensibly designed table is likely to be in 3NF on the first attempt; furthermore, if it is 3NF, it is overwhelmingly likely to have an HNF of 5NF. Achieving the "higher" normal forms (above 3NF) does not usually require an extra expenditure of effort on the part of the designer, because 3NF tables usually need no modification to meet the requirements of these higher normal forms.

Edgar F. Codd originally defined the first three normal forms (1NF, 2NF, and 3NF). These normal forms have been summarized as requiring that all non-key attributes be dependent on "the key, the whole key and nothing but the key". The fourth and fifth normal forms (4NF and 5NF) deal specifically with the representation of many-to-many and one-to-many relationships among attributes. Sixth normal form (6NF) incorporates considerations relevant to temporal databases.

First normal form

The criteria for first normal form (1NF) are:

* A table must be guaranteed not to have any duplicate records; therefore it must have at least one candidate key.

* There must be no repeating groups, i.e. no attributes which occur a different number of times on different records. For example, suppose that an employee can have multiple skills: a possible representation of employees' skills is {Employee ID, Skill1, Skill2, Skill3 ...}, where {Employee ID} is the unique identifier for a record. This representation would not be in 1NF.

* Note that all relations are in 1NF. The question of whether a given representation is in 1NF is equivalent to the question of whether it is a relation.

Second normal form

The criteria for second normal form (2NF) are:

* The table must be in 1NF.

* None of the non-prime attributes of the table are functionally dependent on a part (proper subset) of a candidate key; in other words, all functional dependencies of non-prime attributes on candidate keys are full functional dependencies. For example, consider a "Department Members" table whose attributes are Department ID, Employee ID, and Employee Date of Birth; and suppose that an employee works in one or more departments. The combination of Department ID and Employee ID uniquely identifies records within the table. Given that Employee Date of Birth depends on only one of those attributes - namely, Employee ID - the table is not in 2NF.

* Note that if none of a 1NF table's candidate keys are composite - i.e. every candidate key consists of just one attribute - then we can say immediately that the table is in 2NF.

Third normal form

The criteria for third normal form (3NF) are:

* The table must be in 2NF.

* There are no non-trivial functional dependencies between non-prime attributes. A violation of 3NF would mean that at least one non-prime attribute is only indirectly dependent (transitively dependent) on a candidate key, by virtue of being functionally dependent on another non-prime attribute. For example, consider a "Departments" table whose attributes are Department ID, Department Name, Manager ID, and Manager Hire Date; and suppose that each manager can manage one or more departments. {Department ID} is a candidate key. Although Manager Hire Date is functionally dependent on {Department ID}, it is also functionally dependent on the non-prime attribute Manager ID. This means the table is not in 3NF.

Boyce-Codd normal form

The criteria for Boyce-Codd normal form (BCNF) are:

* The table must be in 3NF.

* Every non-trivial functional dependency must be a dependency on a superkey.

Fourth normal form

The criteria for fourth normal form (4NF) are:

* The table must be in BCNF.

* There must be no non-trivial multivalued dependencies on something other than a superkey. A BCNF table is said to be in 4NF if and only if all of its multivalued dependencies are functional dependencies.

Fifth normal form

The criteria for fifth normal form (5NF and also PJ/NF) are:

* The table must be in 4NF.

* There must be no non-trivial join dependencies that do not follow from the key constraints. A 4NF table is said to be in the 5NF if and only if every join dependency in it is implied by the candidate keys.

Domain/key normal form

Domain/key normal form (or DKNF) requires that a table not be subject to any constraints other than domain constraints and key constraints.

Sixth normal form

This normal form was, as of 2005, only recently proposed: the sixth normal form (6NF) was only defined when extending the relational model to take into account the temporal dimension. Unfortunately, most current SQL technologies as of 2005 do not take into account this work, and most temporal extensions to SQL are not relational. See work by Date, Darwen and Lorentzos for a relational temporal extension, or see TSQL2 for a different approach.