This chapter presents a literature review on data conversion and existing data conversion tools. Research on the potential technologies for the implementation of the project is also introduced.
Data conversion is the conversion of computer data from one format to another. Throughout a computer environment, data is encoded in a variety of ways. For example, computer hardware is built on the basis of certain standards, which may require that data contain, for example, parity bit checks. Similarly, the operating system is predicated on certain standards for data and file handling, and each computer program handles data in its own manner. Whenever any one of these variables is changed, data must be converted in some way before it can be used by a different computer, operating system or program. Even different versions of these elements usually involve different data structures. For example, the changing of bits from one format to another, usually for the purpose of application interoperability or of gaining the capability to use new features, is merely a data conversion. Data conversion may be as simple as the conversion of a text file from one character encoding system to another, or more complex, such as the conversion of office file formats or the conversion of image and audio file formats.
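The simplest case mentioned above, converting a text file between character encodings, can be sketched as follows (the function and file names here are illustrative, not part of any particular tool):

```python
# Convert a text file from Latin-1 to UTF-8 (names are illustrative).
def convert_encoding(src_path, dst_path, src_enc="latin-1", dst_enc="utf-8"):
    with open(src_path, "r", encoding=src_enc) as src:
        text = src.read()          # decode bytes using the source encoding
    with open(dst_path, "w", encoding=dst_enc) as dst:
        dst.write(text)            # re-encode the same characters

# The characters are unchanged; only their byte representation differs.
raw = "café".encode("latin-1")     # one byte for the accented character
utf8 = "café".encode("utf-8")      # two bytes for the same character
```

This illustrates that the conversion changes the encoding of the information, not the information itself.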
There are many ways in which data is converted within the computer environment. The conversion may be seamless, as in the case of upgrading to a newer version of a computer program. Alternatively, it may require processing by a special conversion program, or it may involve a complex process of going through intermediary stages, or complex "exporting" and "importing" procedures, which may involve converting to and from a tab-delimited or comma-separated text file. In some cases, a program may recognise several data file formats at the data input stage and also be capable of storing the output data in a number of different formats; such a program may be used to convert a file format. If the source or target format is not recognised, a third program may at times be available which permits conversion to an intermediate format, which can then be reformatted using the first program. There are many possible scenarios.
Before any data conversion is carried out, the user or application programmer should keep a few basics of computing and information theory in mind. These include:
Information can easily be discarded by the computer, but adding information takes effort.
The computer can add information only in a rule-based fashion.
Upsampling the data or converting to a more feature-rich format does not add information; it merely makes room for that addition, which a human must usually perform.
2.1.2 Lost and inexact data conversion
The objective is to maintain all of the data, and as much of the embedded information as possible. This can only be done if the target format supports the same features and data structures present in the source file. Sometimes there is loss of formatting information, for example, the conversion of a word processing document to a plain text file. Loss of information can be mitigated by approximation in the target format. Data conversion can also suffer from inexactitude, the result of converting between formats that are conceptually different.
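The loss and approximation described above can be illustrated with a small sketch: converting accented text to a plain ASCII target, where the accents cannot be represented and are approximated away (the function name is illustrative):

```python
import unicodedata

def to_ascii(text):
    """Approximate rich text in a plain ASCII target format.
    Accented characters are decomposed and the accents dropped --
    information is lost, but a readable approximation is kept."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii("café"))   # prints "cafe" -- the accent is irretrievably lost
```

Once converted, the original accented form cannot be recovered, which is exactly why such loss must be anticipated before conversion.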
2.1.3 Open vs. secret specifications
Successful data conversion requires thorough knowledge of the workings of both source and target formats. In the case where the specification of a format is unknown, reverse engineering will be needed to carry out conversion. Reverse engineering can achieve close approximation of the original specifications, but errors and missing features can still result.
2.1.4 Pivotal conversion
Data conversion can occur directly from one format to another, but many applications that convert between multiple formats use a pivotal encoding by way of which any source format is converted to its target. Office applications, when employed to convert between office file formats, use their internal, default file format as a pivot. For example, a word processor may convert an RTF file to a WordPerfect file by converting the RTF to OpenDocument and then that to WordPerfect format.
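The pivotal pattern can be sketched as follows: each format needs only a reader into the pivot and a writer out of it, so N formats require 2N converters rather than one converter per pair (the formats below are simplified stand-ins, not real office formats):

```python
# Pivot-based conversion: every format is converted to and from a common
# pivot representation (here, a list of rows of fields).
readers = {
    "csv": lambda text: [row.split(",") for row in text.splitlines()],
    "tsv": lambda text: [row.split("\t") for row in text.splitlines()],
}
writers = {
    "csv": lambda rows: "\n".join(",".join(r) for r in rows),
    "tsv": lambda rows: "\n".join("\t".join(r) for r in rows),
}

def convert(text, src, dst):
    pivot = readers[src](text)      # source format -> pivot
    return writers[dst](pivot)      # pivot -> target format

converted = convert("a,b\nc,d", "csv", "tsv")
```

Adding a new format to this scheme requires only one new reader and one new writer, which is the practical appeal of pivotal conversion.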
Figure 1: An example of data conversion with an Access to MySQL converter (ref: http://www.google.mu/imgres [Accessed 17 February 2011])
Data migration is the process of transferring data between storage types, formats, or computer systems. Data migration is usually performed programmatically to achieve an automated migration, freeing up human resources from tedious tasks. It is required when organizations or individuals change computer systems or upgrade to new systems, or when systems merge (such as when the organizations that use them undergo a merger or takeover).
In order to achieve an effective data migration procedure, data on the old system is mapped to the new system providing a design for data extraction and data loading. The design relates old data formats to the new system's formats and requirements. Programmatic data migration may involve many phases but it minimally includes data extraction where data is read from the old system and data loading where data is written to the new system.
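The two minimal phases, extraction from the old system and loading into the new one, can be sketched with two in-memory SQLite databases standing in for the old and new systems (the table and column names are illustrative assumptions):

```python
import sqlite3

# Old system: an existing table with legacy column names.
old = sqlite3.connect(":memory:")
old.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
old.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Alice"), (2, "Bob")])

# New system: the same data is expected under different column names.
new = sqlite3.connect(":memory:")
new.execute("CREATE TABLE clients (client_id INTEGER, full_name TEXT)")

# Extraction: read the data from the old system.
rows = old.execute("SELECT id, name FROM customers").fetchall()

# Loading: write the data into the new system, mapping old columns to new.
new.executemany("INSERT INTO clients (client_id, full_name) VALUES (?, ?)",
                rows)
```

Real migrations add transformation, cleansing and verification phases between these two steps, but extraction and loading are always present.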
If a decision has been made to provide a set input file specification for loading data onto the target system, this allows a pre-load 'data validation' step to be put in place, interrupting the standard E(T)L process. Such a data validation process can be designed to interrogate the data to be transferred, to ensure that it meets the predefined criteria of the target environment, and the input file specification. An alternative strategy is to have on-the-fly data validation occurring at the point of loading, which can be designed to report on load rejection errors as the load progresses. However, in the event that the extracted and transformed data elements are highly 'integrated' with one another, and the presence of all extracted data in the target system is essential to system functionality, this strategy can have detrimental, and not easily quantifiable effects.
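A pre-load validation step of the kind described above can be sketched as a function that checks each record against (illustrative) criteria of the target environment and separates accepted records from rejected ones before any loading occurs:

```python
def validate(records, required_fields, max_name_len=50):
    """Pre-load validation sketch: the criteria here (required fields,
    a maximum name length) are illustrative, not from any real system."""
    accepted, rejected = [], []
    for rec in records:
        errors = [f for f in required_fields if not rec.get(f)]
        if len(rec.get("name", "")) > max_name_len:
            errors.append("name too long")
        (rejected if errors else accepted).append((rec, errors))
    return accepted, rejected

records = [{"id": 1, "name": "Alice"}, {"id": 2, "name": ""}]
ok, bad = validate(records, required_fields=["id", "name"])
```

Because validation runs before loading, rejections can be reported and corrected without partially populating the target system.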
After loading into the new system, results are subjected to data verification to determine whether data was accurately translated, is complete, and supports processes in the new system. During verification, there may be a need for a parallel run of both systems to identify areas of disparity and forestall erroneous data loss.
Automated and manual data cleaning is commonly performed in migration to improve data quality, eliminate redundant or obsolete information, and match the requirements of the new system.
Data migration phases (design, extraction, cleansing, load, verification) for applications of moderate to high complexity are commonly repeated several times before the new system is deployed.
Data is stored on various media in files or databases, and is generated and consumed by software applications which in turn support business processes. The need to transfer and convert data can be driven by multiple business requirements, and the approach taken to the migration depends on those requirements. Four major migration categories are proposed on this basis: storage migration, database migration, application migration and business process migration.
Database migration will be considered in this report. It may be necessary to move from one database vendor to another, or to upgrade the version of database software being used. The latter case is less likely to require a physical data migration, but this can happen with major upgrades. In these cases a physical transformation process may be required, since the underlying data format can change significantly. This may or may not affect behaviour in the applications layer, depending largely on whether the data manipulation language or protocol has changed; however, modern applications are written to be agnostic to the database technology, so that a change from Sybase, MySQL, DB2 or SQL Server to Oracle should only require a testing cycle to be confident that both functional and non-functional performance has not been adversely affected.
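The database-agnostic application style mentioned above can be sketched with Python's generic DB-API: the application code talks only to a connection factory, so a vendor change ideally touches a single line (SQLite stands in here as an assumed example driver; a MySQL or Oracle driver would expose the same connect/cursor/execute calls):

```python
import sqlite3

def get_connection():
    # The only vendor-specific line: swap the driver here when migrating.
    return sqlite3.connect(":memory:")

def count_rows(conn, table):
    # Application logic written against the generic DB-API, not a vendor API.
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    return cur.fetchone()[0]

conn = get_connection()
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
```

In practice SQL dialect differences still require the testing cycle the text mentions, but the structural isolation is what keeps that cycle small.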
Data mapping also forms part of data conversion. It is the process of creating data element mappings between two distinct data models. Data mapping is used as a first step for a wide variety of data integration tasks including:
Data transformation or data mediation between a data source and a destination
Identification of data relationships as part of data lineage analysis
Discovery of hidden sensitive data, such as the last four digits of a social security number hidden in another user id, as part of a data masking or de-identification project
Consolidation of multiple databases into a single database, and identification of redundant columns of data for consolidation or elimination
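A data element mapping of the kind listed above can be sketched as a small mapping table that drives the transformation from source fields to destination fields (the field names are illustrative, not from any real data model):

```python
# Mapping between two illustrative data models: source field -> target field.
mapping = {
    "cust_name": "full_name",
    "cust_tel":  "phone",
}

def apply_mapping(source_record, mapping):
    # Fields absent from the mapping (e.g. legacy flags) are dropped.
    return {dst: source_record[src] for src, dst in mapping.items()}

src = {"cust_name": "Alice", "cust_tel": "555-0100", "legacy_flag": "Y"}
mapped = apply_mapping(src, mapping)
```

Keeping the mapping as data rather than code is what lets tools generate, document and audit it as a first step of integration.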
Existing Data Conversion Tools
There exist several data conversion tools. They are as follows:
Data Junction
Data Junction, the company's flagship product, is a visual design tool for rapidly building and testing transformation objects that work with hundreds of applications and structured data formats. Two major design components automate every aspect of complex application integration and data transformation with drag-and-drop ease. The Project Designer is a graphical process flow component that provides a canvas onto which process steps can be dragged, dropped and interconnected. It employs simple flowchart symbols to link a limitless number of transformation objects together in a single, automated project. In addition to transformations designed with Data Junction, the Project Designer allows users to integrate external procedures, test global variables, leverage full conditional flow control and implement transactions between steps. The Conversion Designer combines the ease of an intuitive GUI with a robust transformation engine to visually and directly map source data to target structures, while allowing the user to manipulate the data in virtually limitless ways. Both design components feature integrated metadata functionality for publishing to a repository or querying internally.
Developers can extend Data Junction's range of supported formats with the Custom Data Interface SDK. The CDI SDK is an API that enables developers to write their own connections between Data Junction and unique data sources residing on any platform. This SDK functionality is well suited to connectivity with proprietary file formats, or to adding a pre- or post-process layer of security and/or record handling.
MS Access to MySQL Database Converter
MS Access to MySQL Database Converter offers rapid database conversion while maintaining complete database integrity. The database conversion software easily and accurately converts database records created in MS Access into MySQL database records. MySQL is the most popular open-source database and is accepted worldwide for the development of web applications. Web developers require a MySQL database to make their web applications supportive of platforms such as LAMP or WAMP.
A database created in MS Access cannot be used for this purpose and needs to be converted into MySQL. Converting the database manually is not an easy job; it consumes much time and effort, and the result may lack accuracy. DRPU provides a database migration utility that effectively and efficiently converts Microsoft Access databases into MySQL databases in very little time. The database conversion tool also provides the facility of converting password-protected MS Access MDB file records into MySQL database records without affecting the accuracy of the database records.
MySQL to MSSQL Database Converter
MySQL to MS SQL Database Converter is a reliable database migration utility that easily and effectively converts MySQL database records into MSSQL database records in a single click. The database conversion tool accurately converts the records of one database format into another; that is, database conversion takes place from the source database format to the destination database format in real time.
The database migration software converts or migrates the database records in an accurate manner: the original MySQL database remains the same, and no changes occur in the structure and functionality of the converted MSSQL database. The shareware database conversion utility supports all key constraints, null value constraints, data types, tables (including rows and columns), schemas, attributes, etc., even after database conversion.
The WisdomForce FastReader provides database administrators, data warehouse architects, developers and QA with the ability to make a quick snapshot of data from a database. FastReader can unload and extract Oracle tables of any size (terabytes of data) into portable flat text files in a fraction of the time and with no system overhead. The user can easily configure the output format, which can be CSV or XML, among others. The user can also perform a fast and selective unload of data from Oracle which effectively utilises the multi-processor environment. At the same time, FastReader automatically prepares input for high-speed loaders such as Oracle SQL*Loader and bcp. WisdomForce FastReader works on major databases, including Oracle, DB2, MSSQL, MySQL and Sybase.
Platform: Sun Solaris, HP-UX, AIX, Tru64, Windows, Linux/386, IA64
License: free trial
DBConvert for Oracle & MS Access
DBConvert for Oracle & MS Access is a reliable bi-directional database migration tool which allows you to convert from:
Oracle to MS Access
Oracle to Oracle
MS Access to Oracle
MS Access to MS Access
Databases are converted from Oracle to MS Access, or from MS Access to Oracle, rapidly and reliably. The tool can operate on a whole database, or select only the needed tables, fields, indexes and foreign keys. The desired result is reached by simply configuring several options through the wizard interface or in command-line mode.
DBConvert for Oracle & MS Access is also applicable for copying and synchronising an MS Access database with another MS Access database. Moreover, it is well suited to synchronising an Oracle database with, or migrating it to, another Oracle database or another Oracle server.
MS Excel to MySQL Database Converter
It is an easy-to-use product developed for software programmers, database professionals, beginners and other similar business users to adapt to changing technologies. With this database conversion utility, database-related tasks can be handled in a fast and efficient way. The software converts database records of a Microsoft Excel worksheet to MySQL format quickly, saving the time and money required to convert records manually.
MS Excel to MySQL Database Converter works with all major versions of MySQL server and supports all database data types and attributes. A free trial is available to understand and analyse the software's features and operation; the demo can be downloaded to answer any questions about the software.
Research was carried out on three programming languages, namely Java, C++ and VB.NET. They are as follows:
Java
Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities. Java applications are typically compiled to bytecode (class files) that can run on any Java Virtual Machine (JVM) regardless of computer architecture. Java is a general-purpose, concurrent, class-based, object-oriented language that is specifically designed to have as few implementation dependencies as possible. It is intended to let application developers "write once, run anywhere". Java is currently one of the most popular programming languages in use, and is widely used in everything from application software to web applications.
Microsoft Visual C++
Microsoft Visual C++ is a commercial, non-free integrated development environment (IDE) product from Microsoft for the C, C++, and C++/CLI programming languages. It has tools for developing and debugging C++ code, especially code written for the Microsoft Windows API, the DirectX API, and the Microsoft .NET Framework. The latest version, Visual C++ 10.0, was released on April 12, 2010, and is currently the latest stable release. It uses a SQL Server Compact database to store information about the source code, including IntelliSense information, for better IntelliSense and code-completion support. This version adds a modern C++ parallel computing library called the Parallel Patterns Library, partial support for C++0x, significantly improved IntelliSense, and performance improvements to both the compiler and generated code. It is built around .NET 4.0 but supports compiling to machine code. The partial C++0x support mainly consists of six compiler features (lambdas, rvalue references, auto, decltype, static_assert and nullptr) and some library features, for example moving the TR1 components from the std::tr1 namespace directly to the std namespace. Variadic templates were also considered, but delayed until a future version due to their lower priority, which stemmed from the fact that, unlike other costly-to-implement features (lambdas, rvalue references), variadic templates would benefit only a minority of library writers rather than the majority of compiler end users. By default, all applications compiled against the Visual C++ 2010 runtimes will only work under Windows XP SP2 and later.
Visual Basic .NET
Visual Basic .NET (VB.NET) is an object-oriented computer programming language that can be viewed as an evolution of classic Visual Basic (VB), implemented on the .NET Framework. Microsoft currently supplies two major implementations of Visual Basic: Microsoft Visual Studio, which is commercial software, and Microsoft Visual Studio Express, which is free of charge. The latest version, Visual Basic 2010 (VB 10.0), was released in April 2010 and uses the Dynamic Language Runtime (DLR). The Visual Basic compiler was improved to infer line continuation in a set of common contexts, in many cases removing the need for the "_" line continuation character. Also, the existing support for inline functions was complemented with support for inline subs, as well as multi-line versions of both Sub and Function lambdas.