Information Has Been A Learning Source Computer Science Essay



Information has long been a source of learning. But it can become an encumbrance rather than an asset if it is not properly organized, processed, and made available to the concerned user in a form suitable for decision making.

Over the years, IT landscapes have grown into complex systems, and these disconnected environments have created major problems: they disrupt business processes such as Customer Relationship Management (CRM), Enterprise Resource Planning (ERP), and Supply Chain Management (SCM), distort analytics, and cost enterprises billions a year.

Master Data is the critical business information supporting the transactional and analytical operations of the enterprise. Master Data Management (MDM) is a collection of applications and technologies that consolidates master data and synchronizes it with analytical tools, yielding significant benefits in decision making.

MDM addresses the enterprise data quality problem at its point of origin, on the operational side of the business. The data warehousing and analytical sides of the business are thereby aligned with operations, which has proved very productive in leading enterprises across the globe.

Master Data Management (MDM) is a comprehensive platform for organizations to reduce time to value, improve information quality, and lower integration costs. Information management also emphasizes planning and implementing solutions throughout the enterprise and helps establish ongoing governance. It delivers trusted information across the entire information supply chain, which strengthens business processes by improving the organization's insight.


IBM InfoSphere is a platform for trusted information, data integration, data warehousing, master data management, big data, and information governance. It serves as a foundation for enterprises running large numbers of projects, providing facilities such as performance, scalability, and reliability, and it speeds up the process of delivering quality data and meeting these challenges. The products used are IBM InfoSphere Information Server and IBM InfoSphere Master Data Management.

1.11 IBM InfoSphere Information Server:

IBM InfoSphere Information Server is a platform that provides context-rich and trustworthy information by understanding, extracting, cleansing, and transforming input data. It can scale to meet virtually any volume requirement and delivers fast results with high data quality.

The Functions of Information Server:

Understand The Data:

InfoSphere Information Server provides facilities to import data from various sources and helps analyze that information through data profiling and relationship analysis, thereby reducing risk. It also increases productivity through data quality auditing.

Cleansing The Information

InfoSphere Information Server cleanses data by removing unwanted and redundant information and applying standardization rules to produce consistent, high-quality data. It also helps create a single, error-free view of information across the organisation.
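The kind of standardization and de-duplication described above can be sketched in plain Java. The rules here (trim, collapse whitespace, upper-case, then drop exact duplicates) are illustrative assumptions, not InfoSphere's actual cleansing logic, which is far richer:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Minimal sketch of two cleansing steps: standardize each record, then
// drop exact duplicates while preserving order. Purely illustrative.
public class Cleanser {

    // Trim, collapse internal whitespace, upper-case: a toy standardization rule.
    public static String standardize(String raw) {
        return raw.trim().replaceAll("\\s+", " ").toUpperCase();
    }

    // Standardize every record, then remove duplicates (first occurrence wins).
    public static List<String> cleanse(List<String> records) {
        Set<String> seen = new LinkedHashSet<>();
        for (String r : records) {
            seen.add(standardize(r));
        }
        return new ArrayList<>(seen);
    }
}
```

After cleansing, " Acme  Corp " and "ACME CORP" collapse into a single consistent record, which is the "single, error-free view" the text refers to.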

Transform The Data Into Information

InfoSphere Information Server transforms the fetched data into a form useful for the next process. Several types of transformation functions are available to produce the processed data.

Deliver The Information:

Information Server can deliver information in different forms and volumes, on a schedule or in response to events, from one place to another or within a single location.

It also synchronizes information within itself.

IBM InfoSphere Master Data Management

Master data is the most critical business data about people, organisations, households, and so on, that is stored and communicated across the enterprise. It has very high value and plays a major role in all business transactions and important decisions.

MDM is a platform and a paradigm that provides the enterprise with trusted information and a 360-degree view of data drawn from various sources, whether a database, a flat file, or even unstructured data. It applies a set of operations and disciplines to the information that yields consistent data and well-defined relationships between entities. MDM helps the organisation govern the flow of business information accurately. The MDM server centralises processing and enables data acquisition through the various business services built on it, which together represent the entire MDM platform.

The objective of the project is to develop an MDM Input/Output stage so that MDM can be seamlessly integrated into the IBM InfoSphere Information Server ETL job flow. This integration is achieved by automating a couple of steps currently performed on the MDM side. These steps are instead carried out at InfoSphere Information Server, which automatically converts the data into the form required by MDM and passes it to MDM for processing. In this way, connectivity is achieved between InfoSphere Information Server and InfoSphere MDM.

1.2 Literature survey

A literature survey is carried out in order to analyze the background of the current project, which helps to find flaws in the existing system and indicates which unsolved problems can be worked on. The following topics therefore not only illustrate the background of the project but also uncover the problems and flaws that motivated the proposed solutions and this project. The purpose of this study is to provide background information on the issues to be considered in this thesis and to emphasize the relevance of the present study.

To ensure data quality, the master data architecture controls distributed access, replication, and data flow, which is one part of MDM processing. Research on the MDM architecture is not very comprehensive. Earlier contributions in this field can be found in data mining and data warehousing, where the MDM architecture was considered central to standardization and exact reporting dimensions.

In the mid-1990s, researchers intensively discussed the concept of "Strategic Data Planning", which results in a data architecture. Albeit not using the term "master data", Strategic Data Planning refers to information that is used on a company-wide level. More recent studies addressing the topic of master data architecture include a case study on multidimensional databases and an investigation of the product information supply chain in the consumer goods industry. Furthermore, Allemang made the distinction between a linked data enterprise architecture and a master data architecture, without specifying the latter any further.

The most comprehensive scientific contribution to the topic has been made by Otto and Schmidt (2010). They refer to master data architectures as information architectures whose scope is restricted to master data. These authors use the notion of "information architecture" not only to include shared data sources and data flows between databases in the concept, but also to emphasize the demand for "authoritative" data standards and definitions. In their understanding, the master data architecture comprises both a conceptual master data model and the application architecture for master data. The application architecture for master data can be further divided into source and target systems of master data on the one hand and data flows on the other. Moreover, the authors identify a set of design decisions related to the master data architecture.

Many contributions addressing the general data architecture can be found in the research community. A number of architecture models, such as the Zachman Framework, The Open Group Architecture Framework (TOGAF), Enterprise Architecture Planning (EAP), and the Enterprise Architecture Cube, consider the data architecture to be part of the overall enterprise architecture. Otto and Schmidt (2010), however, show that these frameworks by definition do not provide both the conceptual breadth and depth required to investigate, analyze, and design a master data architecture in detail. Master data architecture is also a topic of discussion in the practitioners' community.

A prominent aspect in this regard is the question of the right architectural style. The analyst company Gartner, for example, distinguishes between three styles: design/construct MDM is needed if a new business is created, operational MDM supports the regular operations of existing businesses, and analytical MDM is mainly used for reporting purposes. Oberhofer and Dreibelbis (2008) propose a similar categorization, also describing operational and analytical approaches but additionally identifying a collaborative style. These architectural styles differ in how data flows between an organisation's "golden record" and the source systems. An analytical architecture, for example, uses a unidirectional flow of data from the source systems to the golden record, applying extract, transform and load (ETL) processes before importing the data.


MDM (Master Data Management) is a set of disciplines that provide a consistent understanding of master data entities and their relationships, together with a set of technologies that provide a mechanism for consistent use of master data across the organization, as prescribed by governance policies. MDM is a key IBM product.

Infosphere Information Server provides a single unified platform that enables companies to understand, cleanse, transform, and deliver trustworthy and context-rich information.

A connector from InfoSphere Information Server to the MDM database is required, as no such connector exists in Information Server to date. As part of the Information Server team, I was given the responsibility of providing this connector from Information Server to MDM. The connector is designed to achieve connectivity between IBM InfoSphere Information Server and IBM InfoSphere Master Data Management.

This is a Java-based connector, developed from scratch, that provides the basic functionality required of any connector used in DataStage. The real motivation behind developing this connector, or stage, is to integrate MDM with Information Server, which will increase the efficiency and productivity of MDM and enhance the functionality of Information Server, since a new connector or stage is added to it. There are two types of MDM customer: those that extract data and perform little transformation on it, and those that extract data and perform complex transformations on it. This connector is designed for the second type of customer.

Any enterprise that wants to accomplish its goals should follow data governance rules. Every organization takes a different approach according to its needs and industry role. To make this possible, companies need to recognize and make the most of the resources they already have. In practice, there is still a considerable distance between this goal and current business processes.

Hence MDM plays a vital role amid today's rapidly evolving technologies: a company can achieve its long-term goals if it has the mature data governance policies that MDM provides across its customers and products.

The level of MDM maturity determines the impact of poor information management and data breaches. It also requires people, process, and technology to work together.

Some of the key points are :

Master data is usually spread throughout the organisation and its sub-organisations, takeovers, alliances, acquisitions, and so on, resulting in disordered and redundant data that affects business processes.

An exact interpretation of this data is key to the business and yields its most valuable resources, such as product information, customer information, and billing information. Because data changes and is updated frequently, it is very difficult to maintain a single correct version of the information drawn from all sources.

MDM focuses on risk management, manages multiple domains of information, and brings improved cost benefits and standards.


The project is defined to develop an MDM platform and an MDM Input/Output stage so that MDM can be seamlessly integrated into the Information Server ETL job flow.

Java Management Extensions (JMX) technology is used to remotely manage the MDM jobs. This management is driven from InfoSphere Information Server, which automatically converts the data into the form required by MDM and passes it to MDM for processing. In this way, connectivity is achieved between InfoSphere Information Server and InfoSphere MDM.
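Remote job control over JMX, as described above, might be sketched as follows. The host, port, MBean object name ("com.example.mdm:type=JobManager"), and operation name ("runInitialLoad") are hypothetical placeholders for illustration, not the real MDM MBean interface:

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Sketch of driving an MDM job remotely over JMX. Only the JMX plumbing
// (javax.management.remote) is standard; the MBean and operation names
// are invented for this example.
public class MdmJmxClient {

    // Build the standard JMX-over-RMI service URL for a host and port.
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    // Connect, invoke the (hypothetical) job-control operation with the
    // path of the generated configuration file, then disconnect.
    public static void runInitialLoad(String host, int port, String configPath)
            throws Exception {
        JMXServiceURL url = new JMXServiceURL(serviceUrl(host, port));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName jobManager = new ObjectName("com.example.mdm:type=JobManager");
            mbs.invoke(jobManager, "runInitialLoad",
                       new Object[] { configPath },
                       new String[] { String.class.getName() });
        }
    }
}
```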

Various data input sources are defined for the information. Using an extract, transform, and deliver tool such as Information Server, the information is pre-processed. The data is then fed into the various steps defined by the MDM process.

From a practitioner's point of view, this helps the organization assess its data model, the flow of information, and the way information is stored and used. From an academic point of view, it applies knowledge of probability, hashing, and bucketing. Organizations should be made flexible enough to undergo the technical, managerial, and operational changes that occur during the process.


From a corporate point of view, the objective is to give organizations the possibility to assess their own MDM maturity and benchmark it against other organizations. This helps a firm identify areas of improvement and become more efficient. From an academic point of view, the objective is to contribute to the body of knowledge on master data management maturity assessment, because little research has been conducted on this particular topic so far, even though it has remarkable practical relevance.

Some companies simply pile up masses of data in the expectation of gaining benefits from it. This expectation will not come true, because the mere existence of data does not lead to any value. Even worse, it becomes more and more complicated to find the particular relevant piece of data if it is buried among huge amounts of unmaintained data. The mere possession of data leads nowhere if there is no logical structure that makes it possible to mine the data according to criteria. Data only describes facts; a judging, interpreting, or action-triggering dimension is still lacking.

Problems with MDM can also arise from its complexity. A responsible person is confronted with a highly complex topic on the one hand, and on the other hand is overwhelmed by the saturated market of software solutions for it. The organization has to be prepared for the technical, operational, and managerial changes that will occur throughout the MDM process. Master data management is often not a sovereign management task but just a sub-area of individual business units: one person is responsible for master data within his scope, but is not informed about data in other processes, even though the data might be the same and may even be changed in those processes.



MDM has three editions:

InfoSphere Master Data Management Advanced Edition: Strategically transforms your organization through improved business processes and applications.

Standard Edition: Delivers business value for MDM projects with the quickest time-to-value.

Collaborative Edition: Streamlines workflow activities across users who are involved in authoring and defining master information.

The different areas of master data may include customer data, product information, and transactional data. Accounting and financial information is also important and is used across the organisation. There can also be non-transactional data holding reference information.

On the MDM platform, resolutions and standards are enforced throughout the enterprise. Various services are present to manage the master data. Successful MDM requires proper administration and collaboration between technology and business.

Data from the various servers is processed by InfoSphere Information Server and converted into the format required by MDM. The export and import of data is processed separately for each edition of MDM, as each edition requires. Some manual steps are currently performed on the MDM side by the user through the user interface provided by MDM Workbench.

Interfaces exist on both InfoSphere Information Server and MDM, but to date there is no mechanism that makes them work together. In this project we develop an input/output stage so that MDM is integrated into the Information Server ETL job flow. Some of the jobs previously performed on the MDM side are moved to Information Server so that the efficiency of the overall system can be increased. The MDM Connector is capable of performing several functions against the MDM Server, such as initial load, MEMGET, and MEMPUT.


Complete master data management solutions require the entire lifecycle of the master entities to be managed from within the master data management solution. Controlling the entry of master data allows the enterprise application to proactively manage data quality. Although an enterprise implementation will be both the system of entry and the system of record for all master data entities, it may still require mapping data to other applications.

It is not realistic for an organization to get all of its systems to use exactly the same set of data; some transformations will still be required to run the process systems. This does not mean that every defining characteristic of an entity is managed within the solution. Defining attributes unique to one system's operation should be managed within the source system where they have relevance. The enterprise solution should provide a broad range of entry points to be a viable option as the system of entry.

The key processes for any MDM system are:

Profile the master data. Understand all possible sources and the current state of data quality in each resource.

Consolidate the master data into a central repository and link it to all participating applications.

Govern the master data. Clean it up, de-duplicate it, and enrich it with information from third-party systems. Manage it according to business rules.

Share it. Synchronize the central master data with enterprise business processes and the connected applications, and ensure that data stays in sync across the IT landscape.

Leverage the fact that a single version of the truth exists for all master data objects by supporting business intelligence systems and reporting.

Existing System:

MDM has three editions:

InfoSphere Master Data Management Advanced Edition: Strategically transforms your organization through improved business processes and applications.

Standard Edition: Delivers business value for MDM projects with the quickest time-to-value.

Collaborative Edition: Streamlines workflow activities across users who are involved in authoring and defining master information.

Data from the various servers is processed by InfoSphere Information Server and converted into the format required by MDM. The export and import of data is processed separately for each edition of MDM, as each edition requires. Some manual steps are currently performed on the MDM side through a graphical user interface.

Proposed System:

The proposed system has a single export/import utility for all three editions and uses the MPINET server in the data engine, which provides TCP/IP socket connections for the various API calls from clients.
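A client talking to such a socket server might look like the sketch below. The host, port, and line-oriented request/response framing are illustrative assumptions only; MPINET's actual wire protocol is proprietary and not reproduced here:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Sketch of a client opening a TCP/IP socket connection to the data
// engine, sending one request line, and reading one response line.
public class MpinetClient {
    private final String host;
    private final int port;

    public MpinetClient(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // Send one request and read one line of response, then close.
    public String call(String request) throws IOException {
        try (Socket socket = new Socket(host, port);
             BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                     socket.getOutputStream(), StandardCharsets.UTF_8));
             BufferedReader in = new BufferedReader(new InputStreamReader(
                     socket.getInputStream(), StandardCharsets.UTF_8))) {
            out.write(request);
            out.newLine();
            out.flush();
            return in.readLine();
        }
    }
}
```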

The main purpose of this system is to achieve connectivity between IBM Infosphere information server and IBM Infosphere MDM server.



This connectivity is achieved by developing an Input/Output stage between MDM and Information Server.

The proposed system is required to provide a native connector from InfoSphere Information Server (DataStage) to InfoSphere MDM. This will be known as the MDM input/output stage.

The MDM Input/Output stage is developed such that it has the following functionalities:
Initial Load (loading data into the MDM Server for the first time)

MEMGET (extracting member data from the MDM Server)

MEMPUT (inserting or updating member data in the MDM Server)
In an initial load, data is loaded into the MDM Server for the first time. For example, when an organization starts using MDM there is initially no data in the MDM server engine; the initial load is the process of loading data into MDM. This data is the output of the MDM Connector and the input to the MDM engine for that organization. The MDM Connector is provided by InfoSphere DataStage.

There are various jobs inside the initial load; one of them is MPXDATA.


In MPXDATA, the input to the MDM Connector is an input file containing the input data. In MDM, data is stored according to a data model, and the data model differs depending on the member type. This data model is the MDM metadata for an organization. The metadata is retrieved from MDM using various APIs and used in the MDM Connector to define the mappings between the source data and the actual data to be stored in MDM.
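The source-to-model mapping step could be sketched as follows. The column and attribute names ("LAST_NAME", "memName.onmLast", "memHead.srcCode") are hypothetical placeholders, not the actual MDM data model:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of mapping a source record's columns onto the member data
// model's attribute names before loading. Names are illustrative.
public class SourceToModelMapper {
    private final Map<String, String> columnToAttribute = new HashMap<>();

    // Register one source-column -> model-attribute mapping.
    public void map(String sourceColumn, String modelAttribute) {
        columnToAttribute.put(sourceColumn, modelAttribute);
    }

    // Translate one source record (column -> value) into a model record
    // (attribute -> value), dropping unmapped columns.
    public Map<String, String> apply(Map<String, String> sourceRecord) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : sourceRecord.entrySet()) {
            String attr = columnToAttribute.get(e.getKey());
            if (attr != null) {
                out.put(attr, e.getValue());
            }
        }
        return out;
    }
}
```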

The MDM Connector has a GUI with the same functionality as MDM Workbench, thereby avoiding manual steps from the end user's perspective. This GUI is used to create the configuration file as required by the user.

The output of the MDM Connector is the configuration file and the mpxdata.xml file, which is used as input by the JMX client. Using the mpxdata.xml file, the JMX client creates the different segments in the MDM Server and loads the data into those segments accordingly.


MEMGET is a type of interaction with the MDM engine that extracts data based on key values. It fetches member records from the MDM database using a key such as the enterprise ID, the source code and member ID number, or the member record number. The output from the Master Data Engine is a MemRowList, a collection of rows that represent one or more member objects. You can retrieve members by executing the MemGet interaction, which also takes a MemRowList as a parameter.
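The retrieval-by-key flow described above can be sketched with simplified stand-in types. These classes only mimic the shape of the SDK objects named in the text (MemRow, MemRowList); they are not the real MDM API:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for a MemGet-style lookup: a MemRow holds one
// member row's attributes, and memGet filters the store by the
// (source code, member ID) key, as a real MemGet interaction keys on
// enterprise/source/member identifiers. Illustrative only.
public class MemGetSketch {

    public static class MemRow {
        final String srcCode;   // source system code
        final String memIdnum;  // member ID within that source
        final String name;      // one example attribute

        public MemRow(String srcCode, String memIdnum, String name) {
            this.srcCode = srcCode;
            this.memIdnum = memIdnum;
            this.name = name;
        }
    }

    // Return all rows matching the given key values.
    public static List<MemRow> memGet(List<MemRow> store, String srcCode, String memIdnum) {
        List<MemRow> result = new ArrayList<>();
        for (MemRow row : store) {
            if (row.srcCode.equals(srcCode) && row.memIdnum.equals(memIdnum)) {
                result.add(row);
            }
        }
        return result;
    }
}
```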

The MEMGET functionality needs to be performed by the MDM Connector. In MEMGET, the user extracts data from the MDM Server engine based on key values, and can narrow the extraction by giving filtering conditions. The MDM Connector provides a GUI to select the filtering properties for MEMGET. The output of MEMGET is the data desired by the user; in the MDM Connector, mappings are applied to extract the data according to the fields specified by the user.

The MEMGET functionality is performed by the MDM Connector in the output context: the result of MEMGET is input to the MDM Connector and output from the MDM engine. For MEMGET functionality, the MDM Connector acts as the MDM Input Stage. The MDM Connector has a graphical user interface for MEMGET, designed in Java Swing and invoked from the MDM Connector using DMDI (Dynamic Metadata Import Interface). This GUI is designed to take the values of the various filtering properties required for the MEMGET operation from the user.


MEMPUT is a type of interaction with the MDM engine in which data is inserted into the MDM database. The input parameters are constructed through segments such as MemHead and MemName. Each of these segments represents a row, and each row contains the attributes that make up a member. Each row is added to a MemRowList and passed in as a parameter to the MemPut interaction. The MemPut interaction sends these attributes to the Master Data Engine, where the rows are processed and the relevant tables in the Master Data Engine database are either inserted into or updated.
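The insert-or-update semantics described above can be illustrated with a simplified stand-in: a member store keyed by (source code, member ID), where putting a row with an existing key updates it and a new key inserts it. This is not the real SDK, just a sketch of the behavior:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of MemPut semantics: the engine's member store is keyed by
// (source code, member ID); MemPut either inserts a new member row or
// updates the existing one. Illustrative only.
public class MemPutSketch {
    private final Map<String, String> store = new LinkedHashMap<>();

    // Insert or update the member identified by (srcCode, memIdnum).
    public void memPut(String srcCode, String memIdnum, String attributes) {
        store.put(srcCode + "/" + memIdnum, attributes);
    }

    // Look up a member's attributes by key (null if absent).
    public String get(String srcCode, String memIdnum) {
        return store.get(srcCode + "/" + memIdnum);
    }

    // Number of distinct members currently stored.
    public int size() {
        return store.size();
    }
}
```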

This MEMPUT functionality is performed by the MDM Connector. MEMPUT is essentially input to the MDM database and output from the MDM Connector; for example, if the user wants to enter data into the existing MDM database, MEMPUT can be used. The MEMPUT functionality is performed by the MDM Connector in the output context.

For MEMPUT functionality, the MDM Connector acts as the MDM Output Stage.


This thesis gives an overall view of the design and development of the MDM input/output stage. The main body of the thesis is preceded by a detailed table of contents, lists of figures and tables, and a glossary of the units used in the report, and is followed by appendices containing the screenshots. The body of the thesis is divided into 7 chapters.

Chapter 1 gives a brief introduction to master data management, including the literature survey, which presents a review of existing literature on master data management concepts and practices, the motivation for developing the project, the problem statement, the objective and scope of the project, and the methodology used.

Chapter 2 is Software Requirement Specification which explains the user characteristics, assumptions and dependencies, constraints and functional requirements of the system.

Chapter 3 is High Level Design which explains the architectural strategies, system architecture, component interfaces and flow of data in the system with the help of Data Flow Diagrams (DFD).

Chapter 4 is Detailed Design, which focuses on the major modules and their respective class diagrams and state diagrams. It explains the key components of the IC Framework by providing a functional description of the modules.

Chapter 5 is Implementation, which explains the programming language, development environment, and code conventions followed during implementation of the project. This chapter also sheds light on the difficulties encountered in the course of the implementation and the strategies used to tackle them.

Chapter 6 is Software Testing, which explains the test environment and briefly describes the test cases executed during the various phases of testing.

Chapter 7 is Conclusion which gives the outcome of the work carried out and also brings out the limitations of the project and future enhancements.


A Software Requirements Specification (SRS) is a complete description of the behavior of the system to be developed. It includes the functional and nonfunctional requirements for the software: the functional requirements state what the software should do, and the nonfunctional requirements state the constraints on the design or implementation of the system. Requirements should be measurable, testable, and detailed enough to be sufficient for system design.

2.1 Overall Description

This section provides a description of the general factors that affect the product and its requirements, and gives background for the specific requirements defined in detail in section 2.2, making them easier to understand. It also deals with user characteristics, constraints on using the product, and the product's dependencies on other applications.

2.1.1 Product Perspective

The MDM platform is aimed at enhancing an organization's Master Data Management (MDM) capability through the dynamic and intelligent linking of the organization's data. The data to be linked can reside in disparate applications, databases, and locations inside and outside the organization. By using MDM software to intelligently link this customer data, an organization can provide a more complete and intelligent view of this information. Furthermore, this linked data can be used to provide a 360-degree view of the records, to support personalized customer interactions, and to uncover trends and patterns that provide insight into increased profitability. It helps both organizations and service providers increase profitability by leveraging the services provided by the MDM application, which can be invoked either through a user-friendly interface or through a simple command-line interface.

2.1.2 Product Functions

The Hub, and ultimately the Master Data Engine, receives data in real time or in batch from individual systems by way of relational database connections, API calls, web services, messaging solutions, or flat files. These source systems provide the data that is used for scoring, matching, and linking members within the IBM Initiate Master Data Service software. The Master Data Engine runs the source data through a derivation routine.

Derivation is the process of extracting demographic data elements to be used in scoring and matching. The derived data is then stored in a highly optimized format in a relational database. The cornerstone of the IBM Initiate Master Data Service platform is the set of intelligent algorithms provided by the Hub you have defined. These algorithms use a series of statistical formulas and customizable parameters to rapidly evaluate the data and provide member matches, which can be used to form linkages that make up member entities. IBM Initiate Master Data Service software marks exceptions and data entities that fall outside the defined matching parameters as possible matches and creates tasks for later review and resolution. This process, known as data remediation, is fully supported in the solution. The product can be developed as a stand-alone application or as a web service so that the services provided by the NEC product can be utilized by other applications and tools.
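The scoring-and-matching idea can be illustrated with a toy comparison function. The fixed per-attribute weights and the threshold below are invented for illustration; the real Initiate algorithms derive calibrated statistical weights from the data itself:

```java
// Toy member-matching score: compare two records attribute by attribute
// and sum a weight for each agreement. Records at or above the threshold
// are candidate matches (linked); below it they fall to manual review or
// no-match. Weights and threshold here are illustrative only.
public class MatchScorer {

    // Sum weights[i] for every attribute i where the two records agree
    // (case-insensitive comparison as a stand-in for real comparators).
    public static double score(String[] a, String[] b, double[] weights) {
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            if (a[i].equalsIgnoreCase(b[i])) {
                total += weights[i];
            }
        }
        return total;
    }

    // Decide whether a score clears the match threshold.
    public static boolean isMatch(double score, double threshold) {
        return score >= threshold;
    }
}
```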

2.1.3 User Characteristics

MDM provides capabilities for user management and allows users to configure the software, specifically the data dictionary (data model) and the algorithms that define a Hub. The member and entity types, the different sources and applications, the record attributes, and the user task threshold levels are defined, and the algorithms configured, through the Workbench.

There is also a web-based client application that enables searching and retrieving member information. Users are able to attach notes to member records, but otherwise no information contained in the Master Data Engine Hub can be modified or configured through applications such as the Enterprise Viewer.

Initiate Inspector is a web-based client that allows users to work on the tasks created by users or by the MDM processes. Thresholds are defined that determine whether records are considered duplicates and are linked, or are treated as two different records from different sources and left unlinked; borderline cases are reviewed manually. Graphical relationships and hierarchies between members and organizations also enable users to better understand their data.

2.1.4 Constraints

The Workbench must be compatible with the Master Data Engine version. Users should have access to the Workbench and Initiate Inspector for data stewardship. The data should be pre-processed and in a format suitable as input to the MDM system.

Companies should prepare a business case for MDM and create metrics for the MDM project. Initiate provides a web-service-based interface for feeding patient demographic information into the master data index (MDI).

Information Server 9.1 must be installed on the system.

2.1.5 Assumptions and Dependencies

It is assumed that there is a clear understanding of the data sources involved, the data fields needed, and how to gather that data in a way that IBM Initiate Master Data Service software can consume it. This understanding helps you build your data dictionary.

You need a supported database platform, the proper software installation files for your operating system (for example, Windows 64-bit or Linux 64-bit), and an empty database in which to create your instance.

The instance that is created can be used by the MDM Connector only if the connector knows the properties for making the connection to the MDM instance. The connection is established through API calls, and data is retrieved from the MDM instance into the MDM Connector using these APIs only. To use the MDM Connector, it must be installed in Information Server DataStage; it is assumed that the user understands how to design and run jobs in DataStage, which is an essential ETL tool. The connector calls DMDI (Dynamic Metadata Import interface) to invoke the user interface through which the user configures the connector properties; this user interface takes the connection properties from the stage properties and uses them to establish a connection to MDM and retrieve the metadata required for the job. The DMDI technology requires various JARs, and the JAR that contains the user interface must be placed on the Information Server DataStage client system.

2.2 Specific Requirements

This section of the SRS contains all the software requirements to a level of detail sufficient to enable designers to design a system that satisfies those requirements. It also helps testers design test cases and verify whether the system satisfies the requirements.

2.2.1 Functional Requirements

This section describes the functional requirements of the system. The requirements are expressed in the natural language style.

The data required for processing is collected from the defined sources; the standardized data is then processed in the MDM engine and stored in the Initiate hub. A connection must be established to the MDM server, either locally or remotely, in order to run jobs on the server.

This data is used by an Information Server ETL job, through the MDM Connector or the MDM Input/Output stage, to perform the various functions required of the connector that Information Server provides to MDM.

The functions required of the MDM Connector are:

It should extract data from an MDM Server instance based on key values and filtering properties.

It should load data into an empty MDM Server instance.

It should incrementally load data into, or update, an existing MDM Server instance.

It should provide a user interface with the functionality currently provided by the MDM Workbench.

It should fetch metadata from MDM and make it available to jobs designed with the MDM Connector.
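The required functions above can be summarized as a small Java interface. The method names and the in-memory stand-in below are hypothetical, introduced only to illustrate the contract; they are not the product API.

```java
import java.util.*;

public class ConnectorSketch {
    // Hypothetical summary of the functions expected of the MDM Connector.
    public interface MdmConnector {
        List<Map<String, String>> extract(String keyValue);  // keyed/filtered extract
        void initialLoad(List<Map<String, String>> records); // load into an empty instance
        void upsert(Map<String, String> record);             // incremental load or update
        Map<String, String> fetchMetadata();                 // metadata for job design
    }

    // Minimal in-memory stand-in used only to demonstrate the contract.
    public static class InMemoryConnector implements MdmConnector {
        private final Map<String, Map<String, String>> store = new HashMap<>();

        public List<Map<String, String>> extract(String keyValue) {
            Map<String, String> rec = store.get(keyValue);
            return rec == null ? List.of() : List.of(rec);
        }

        public void initialLoad(List<Map<String, String>> records) {
            for (Map<String, String> r : records) store.put(r.get("memId"), new HashMap<>(r));
        }

        public void upsert(Map<String, String> record) {
            store.merge(record.get("memId"), new HashMap<>(record),
                        (oldRec, newRec) -> { oldRec.putAll(newRec); return oldRec; });
        }

        public Map<String, String> fetchMetadata() {
            return Map.of("memberType", "PERSON", "source", "DEMO");
        }
    }

    public static void main(String[] args) {
        InMemoryConnector c = new InMemoryConnector();
        c.initialLoad(List.of(Map.of("memId", "100", "name", "JOHN DOE")));
        c.upsert(Map.of("memId", "100", "dob", "1980-01-01"));
        System.out.println(c.extract("100"));
    }
}
```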

2.2.2 Performance Requirements

Requirements at MDM Side:

The application requires sufficient RAM to run the tests properly. A minimum of 8 GB RAM is recommended.

The hard disk capacity should be 320 GB or more, and available storage must scale with the volume of user data.

Processing time scales with the input data. Weight generation and bulk cross-match can take several hours depending on the size of your database, the complexity of your algorithm, and the number of attributes; the estimate depends on multiple variables.

Requirements at Information Server side:

IBM InfoSphere Information Server console

Microsoft .NET Framework 1.1 with Service Pack 1 installed.

IBM InfoSphere Information Server Web console

The following Web browsers are supported:

Microsoft Internet Explorer 6 Service Pack 2

Microsoft Internet Explorer 7

Mozilla Firefox 2

2.2.3 Supportability

This section describes requirements that enhance the supportability or maintainability of the MDM system being built, including coding standards, naming conventions, and efficient maintenance utilities.

Software Design Architecture

The application uses standard modular software design and development architectures and methodologies, so that defect locating, feature enhancement, and future expansion can be achieved easily.

Service-oriented architecture (SOA) is used.

Agile development methodology is used.

Coding Standard

Industry quality standards are maintained throughout the software development life cycle. Supporting APIs are used, and the features of the Java programming language are applied to develop standardized, optimized code.

2.2.4 Software Requirement

Operating System : Microsoft Windows XP (or higher), Linux

IDE : Eclipse

Language : Core-Java

Software Packages : JDK 1.6

Software Technologies : JMX, SOAP, LDAP

IBM Technologies: JavaStage, DMDI

Database : IBM DB2 Version 10.1

Browser : Microsoft Internet Explorer or Mozilla Firefox Browsers

Products : MDM Server engine 10.0, Workbench 10.0, Information Server DataStage version 9.1

Architecture - SOA

Methodology - Agile Development methodology

2.2.5 Hardware Requirement

Processor : 2.6 GHz 32-bit processor (Windows 2003 Server); 1 GHz 32-bit processor (Windows XP client)

Storage Capacity : 8 GB RAM, 80 GB hard disk (Windows 2003 Server); 4 GB RAM, 40 GB hard disk (Windows XP client)

2.2.6 Design Constraints

This section describes the design constraints on the system being built. Design constraints represent decisions that have been mandated and must be adhered to.

Languages : Core Java

Technologies : Java Swing, DMDI, Java Integration stage

Development environment : Eclipse

2.2.7 User Interfaces

This section describes the logical characteristics of each interface between the software product and the users. This may include sample screen images, any GUI standards or product family style guides to be followed, screen layout constraints, standard buttons and functions (e.g., help) that will appear on every screen, keyboard shortcuts, error message display standards, and so on. It defines the software components for which a user interface is needed, and it specifies the details of the user interface design to be documented in a separate user interface specification.

GUI Components:

JButton, JLabel, JFrame, JScrollPane, Container, JPanel, JTable, JComboBox, JList, LayoutManager.


JButton is used to send property values from the DMDI GUI to the connector's property window; buttons are also used to close the GUI window.


JLabel provides a display area for a short text string. A label does not react to input events, so it cannot get the keyboard focus. In this application, labels display the names of the properties or of the functions to be performed.

JScrollPane provides a scrollable view of a lightweight component. A JScrollPane manages a viewport, optional vertical and horizontal scroll bars, and optional row and column heading viewports.




Container

A generic Abstract Window Toolkit (AWT) container object is a component that can contain other AWT components. Components added to a container are tracked in a list. The order of the list defines the components' front-to-back stacking order within the container. If no index is specified when adding a component to a container, it is added to the end of the list (and hence to the bottom of the stacking order).
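As a minimal sketch, the components listed above could be assembled into a connector property panel as follows. The labels, table contents, and the port value shown are illustrative assumptions, not the connector's actual layout.

```java
import javax.swing.*;
import java.awt.*;

public class PropertyPanelSketch {
    static JPanel buildPanel() {
        JPanel panel = new JPanel(new BorderLayout()); // LayoutManager via BorderLayout
        panel.add(new JLabel("MDM connection properties"), BorderLayout.NORTH);
        // Property grid shown inside a scrollable view.
        JTable table = new JTable(new Object[][] {{"host", "localhost"}, {"port", "16000"}},
                                  new Object[] {"Property", "Value"});
        panel.add(new JScrollPane(table), BorderLayout.CENTER);
        JButton ok = new JButton("OK"); // would send values back to the stage properties
        ok.addActionListener(e -> SwingUtilities.getWindowAncestor(ok).dispose());
        panel.add(ok, BorderLayout.SOUTH);
        return panel;
    }

    public static void main(String[] args) {
        // In the real connector this panel is shown from the DMDI wizard;
        // here it is placed in a JFrame for standalone viewing.
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("Connector properties");
            frame.setContentPane(buildPanel());
            frame.pack();
            frame.setVisible(true);
        });
    }
}
```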

The MDM Input/Output stage has a graphical user interface provided in the Information Server DataStage client jobs. The Workbench is a graphical user interface that provides user management and configuration management tools for IBM Initiate Master Data Service. The JMX client can also be invoked from a command-line interface. The Initiate Inbound Message-Based Transaction Service is a generic interface designed to manage client-specific messages between the source systems and the Identity Hub engine and database.

Inspector enables data stewards to understand and resolve data quality issues using a simple, drag-and-drop interface.

The Java SDK is used to develop interfaces that allow external applications to add, update, and retrieve data from the hub.

Two user interfaces are designed for MDM Input/output stage. One is designed for input functionality and one is for output functionality. The interface will provide the same functionality as provided by MDM workbench.


The core element of an MDM system implementation is the data itself. Thus, when implementing a master data management solution, it is important to adopt a development approach suited to describing such data and its management or manipulation. The approach must also describe all the important associated aspects and details of the data, including mapping and transforming the unique data between systems, as well as creating business services that interact with the master data management system.

In addition, this data manipulation must use a consistent method that is easy to understand and maintain. Such consistency is especially important on large projects.

3.1 Design Considerations

This section discusses the issues that need to be addressed or resolved before attempting to devise a complete design solution.

3.1.1 General Constraints

A typical InfoSphere MDM Server implementation is composed of three main parts: MDM client applications accessed by an end user through a web browser, InfoSphere MDM Server, and a database. When considering the security of such an architecture, it is important to protect the individual parts as well as the communication between them. There is thus a need to introduce models that make it easier to describe system behavior in a consistent and reusable way. These models need to be sustainable for continued development of a new system that sometimes replaces, or more often works in unison with, many other systems. In many cases, you will have to modify the surrounding systems to be able to integrate with the new MDM engine.

InfoSphere Information Server uses a powerful architecture that helps developers maximize speed, flexibility and effectiveness in building, deploying, updating and managing their data integration infrastructure. InfoSphere DataStage leverages the productivity-enhancing features of InfoSphere Information Server to help reduce the learning curve, simplify administration and optimize the use of development resources. The result is an accelerated development and maintenance cycle for data integration applications. With InfoSphere DataStage, organizations can achieve strong return on investment (ROI) by quickly constructing their data integration solutions to gain access to trustworthy information and share it across applications and databases.

InfoSphere Information Server has a single design interface that is shared by both InfoSphere DataStage and IBM InfoSphere QualityStage® modules, enabling designers to use any combination of data quality and data transformation capabilities to help ensure that the right data is brought together at the right time. InfoSphere Information Server also provides a unified metadata repository for InfoSphere DataStage and all other modules. Users can immediately access technical and process metadata developed during data profiling, cleansing and integration processes to speed development and reduce the chance for errors.

When implementing a master data management solution and Information Server DataStage jobs, also consider the organizational measures that need to be addressed with the introduction of this system. Introducing data governance is a preferred practice.

3.1.2 Development Methods

The development methodology used in this project is the agile development methodology. Its key characteristics are:

Focus is on the code rather than the design.

Software is developed through an iterative process.

The aim is to deliver working software quickly and to meet changing requirements rapidly.

The goal of agile methods is to reduce overhead in the software process so that changing requirements can be accommodated without excessive extra work.

Benefits to the Customer

The customer is more actively involved and is regularly aware of the application's status. Requirements can be raised at each iteration. Because delivery is rapid, the key functions are available very soon. Testing is done at every iteration, so quality is higher. The customer is also guaranteed at least some of the functionality within a fixed duration.

Benefits to the Project Teams

Project teams work together more actively in all stages, make decisions collaboratively, and are therefore more efficient.

Since the methodology is incremental, specific requirements receive more focus, and the emphasis stays on developing the application.

Frequent feedback is received because testing is integrated. Requirements are gathered incrementally, as and when they arise, so less time is spent on up-front planning.

Teams build the application in a cooperative environment.

3.2 Architectural Strategies

This section describes the design decisions and strategies that affect the overall organization of the system and its higher-level structures. These strategies will provide insight into the key abstractions and mechanisms used in the system architecture.

3.2.1 Programming Language

Java provides the libraries for designing the front end and for communication between different modules. It also supports multi-threaded programming, which is essential for networking applications. Hence, Java is used as the programming language for developing the application. Java Swing is used to develop the user interface for the MDM Connector.

· The application is developed using the Eclipse 3.4 application wizard. Java provides extensive packages, ranging from simple to complex network operations, that were used to develop this application.

One characteristic of Java is portability: programs written in Java run similarly on any supported hardware platform.

3.2.2 Technologies used

Java Integration Stage

By using the Java Integration stage, Java code can be invoked from DataStage parallel jobs. The Java Integration stage integrates Java code into a job design; you write the code using the Java Integration stage API, which defines the interfaces and classes for writing Java code that can be invoked from within InfoSphere DataStage and QualityStage parallel jobs. The Java Integration stage is backward compatible with the current Java Pack, so existing Java classes work unmodified with the Java Integration stage, and it supports the API exposed in the existing Java Pack.

Java code must implement a subclass of the Processor class. The Processor class consists of methods invoked by the Java Integration stage. When a Java Integration stage starts, the stage instantiates the Processor class and calls the logic within the Processor implementation. The MDM Connector logic is written inside the Processor implementation.

The Processor class provides a list of methods that the Java Integration stage can call to interact with your Java code at job execution time or at design time.
At a minimum, the Java code must implement the following two abstract methods:

public abstract boolean validateConfiguration() - reports whether the current configuration (the number and types of links) and the values of the user properties are valid. The Java code must validate the given configuration and user properties and return false to the Java Integration stage if there are problems with them.

public abstract void process() - the entry point for processing records from the input link or to the output link. As long as a row is available on any of the stage's input links (whatever the number of output links), the Java Integration stage calls this method, provided the job does not abort. Your Java code must consume all rows from the stage's input links.

On an output link, where DataStage columns are set from the Java data types produced by the Java Integration stage, the stage converts the Java data types to InfoSphere DataStage data types. Conversely, on an input link, where Java Bean properties or columns are set from the DataStage columns, the InfoSphere DataStage data types are converted to Java data types.

Java code can define custom properties and use the property values. At job design time, the Java Integration stage editor calls the getUserPropertyDefinitions() method in the Processor class to get a list of user-defined property definitions, and then shows the editor panel to allow users to specify the string value for each property.
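A self-contained sketch of the Processor contract described above follows. Because the real base class ships with the Java Integration stage API inside Information Server, a simplified stand-in base class is defined here so the example compiles on its own; the property names and connector logic are illustrative assumptions.

```java
import java.util.*;

public class ProcessorSketch {
    // Simplified stand-in for the Java Integration stage Processor base class;
    // it mirrors only the methods discussed in the text.
    public abstract static class Processor {
        public abstract boolean validateConfiguration();
        public abstract void process();
        public Map<String, String> getUserPropertyDefinitions() {
            return Collections.emptyMap();
        }
    }

    // Hypothetical MDM Connector logic placed inside a Processor subclass.
    public static class MdmConnectorProcessor extends Processor {
        private final Map<String, String> userProperties;
        private int rowsProcessed = 0;

        public MdmConnectorProcessor(Map<String, String> userProperties) {
            this.userProperties = userProperties;
        }

        @Override
        public boolean validateConfiguration() {
            // Return false if required connection properties are absent.
            return userProperties.containsKey("mdmHost")
                && userProperties.containsKey("mdmPort");
        }

        @Override
        public void process() {
            // Entry point for consuming rows from the input link; a real
            // implementation would push each row to the MDM server here.
            rowsProcessed++;
        }

        @Override
        public Map<String, String> getUserPropertyDefinitions() {
            // Property names shown in the stage editor panel (illustrative).
            return Map.of("mdmHost", "MDM server host", "mdmPort", "MDM server port");
        }

        public int getRowsProcessed() { return rowsProcessed; }
    }
}
```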

The MDM Input/Output stage is developed on top of the Java Integration stage and then converted into a standalone stage using a stage-generation tool. All the methods defined for the Java Integration stage are used in the development code for the MDM stage. After using the Java Integration stage as its base, the MDM stage can be converted into a standalone connector, and this connector can be installed in InfoSphere Information Server DataStage.


The Dynamic Metadata Import (DMDI) interface provides a flexible infrastructure for importing metadata from any external resource, from any client, and with a variety of deployment options. It was originally created to provide a means of importing metadata from Software Group Adapters (SWG Adapters) that implement the Enterprise Metadata Discovery (EMD) specification; DMDI subsequently evolved to meet the needs of Information Server.

The implementation of a connector may include implementation of the DMDI ResourceWrapper interface. This interface is then invoked at design time from a connector's stage editor. The design is based around a wizard approach to acquiring metadata. This may typically involve setting up some properties in one or more steps of the wizard, before eventually displaying some metadata objects from which the user can select. These metadata objects may be a list of tables and their fields, or business objects or interfaces. The design is completely flexible in this respect, although the general idea is that a tree-view of these metadata objects be displayed. Once metadata objects to be imported have been selected, the final step involves generating the metadata for these objects, returning it to the client, and populating the links with the modified metadata. The process may also set or modify properties of the stage.

In the MDM Connector, DMDI is used to invoke the GUI from an InfoSphere Information Server DataStage job. The GUI is developed, integrated with DMDI, and then invoked from the connector.


The Java Management Extensions (JMX) technology is a standard part of the Java Platform, Standard Edition (Java SE). JMX provides a simple, standard way of managing resources such as applications, devices, and services. Because JMX is dynamic, you can use it to monitor and manage resources as they are created, installed and implemented. You can also use JMX to monitor and manage the Java Virtual Machine (JVM). The JMX specification defines the architecture, design patterns, APIs, and services in the Java programming language for managing and monitoring applications and networks.

Using JMX, a given resource is instrumented by one or more Java objects known as Managed Beans, or MBeans. These MBeans are registered in a core managed-object server, known as an MBean server. The MBean server acts as a management agent and can run on most devices that have been enabled for the Java programming language. The specification defines JMX agents that you use to manage any resources that have been correctly configured for management. A JMX agent consists of an MBean server, in which MBeans are registered, and a set of services for handling the MBeans. In this way, JMX agents directly control resources and make them available to remote management applications. The way in which resources are instrumented is completely independent of the management infrastructure, so resources can be made manageable regardless of how their management applications are implemented.

In this project, JMX technology is used to remotely manage the MDM jobs that run at InfoSphere Information Server; the jobs automatically convert the data into the form required by MDM and pass that data to MDM for processing. In this way, connectivity is achieved between InfoSphere Information Server and InfoSphere MDM.
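The MBean pattern described above can be sketched with the standard javax.management API. The object name and the job-status attribute below are illustrative; a real deployment would expose the MBean to remote management applications through a JMX connector.

```java
import javax.management.*;
import java.lang.management.ManagementFactory;

public class JmxSketch {
    // Standard MBean: a management interface named <ClassName>MBean plus an
    // implementation class with that name.
    public interface JobStatusMBean {
        String getStatus();
        void setStatus(String status);
    }

    public static class JobStatus implements JobStatusMBean {
        private volatile String status = "IDLE";
        public String getStatus() { return status; }
        public void setStatus(String status) { this.status = status; }
    }

    public static String registerAndRead() throws Exception {
        // Register the resource with the platform MBean server (the agent).
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("demo.mdm:type=JobStatus");
        server.registerMBean(new JobStatus(), name);
        // A remote management application would reach this MBean through a
        // JMX connector; locally we can update and query the attribute directly.
        server.setAttribute(name, new Attribute("Status", "RUNNING"));
        return (String) server.getAttribute(name, "Status");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(registerAndRead()); // RUNNING
    }
}
```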

3.2.3 Future Plans

Multi-data-domain MDM. Currently, many organizations apply MDM only to the customer domain; other domains, such as products, financials, and locations, should also be considered. A single data domain may inhibit information correlation across multiple domains.

Multi-department, multi-application MDM. This involves distributing the data to different applications and the classes that depend on them.

Real-time MDM. Real-time operation is crucial, as is the real-time distribution of new and updated reference data.

Coordination with other disciplines. MDM should be coordinated with related data management disciplines. A program for data governance or stewardship can provide an effective collaborative process for such coordination.

3.2.4 Error Detection and Recovery

Eclipse 3.4 helps to fix the build errors more quickly. The output window displays a list of errors generated during the build. Eclipse 3.4 has an integrated debugger to correct logic errors.

InfoSphere Information Server DataStage provides a job log window that shows the status of the job. If there is an error in the code, the log clearly identifies the Java class and line number where the error occurred. This window displays the list of errors that occurred during job invocation, providing a clear view of where the code fails and making errors easy to correct.

3.2.5 Data Storage Management

DB2 is used as the data storage hub. DB2 is a family of relational database management system (RDBMS) products from IBM that serve a number of operating system platforms. You create a new database for the IBM Initiate Master Data Service software to reference. You then install the IBM Initiate Master Data Service engine (a typical InstallShield executable) and use the madconfig utility to configure the ODBC connection, create the instance directory, and establish the Windows service.

Bootstrapping your database involves creating the core database tables, defining the field properties, and indexing the tables. During the bootstrap process, several of the data dictionary tables are populated with default settings. Your database is bootstrapped as part of instance creation, but you can also bootstrap it separately.

3.2.6 Communication Mechanism

Communication is required between the MDM Connector and the MDM Server instance. This communication is achieved using MPINET.

MPINET is the server that provides TCP/IP communication and socket connections for the various API calls from clients. It establishes a TCP/IP socket connection from the clients to the server for API calls and jobs, using a communication protocol that is tightly controlled and optimized for the MDM server. More than one connection can be established to the server; MPINET is multithreaded and implements database connection pooling for optimum performance and fast response times.

You can use the SDK to interact with the Master Data Engine. From basic tasks like establishing a connection to configuring its behavior and defining the comparison strategy, you can perform any interaction or get/modify data model elements.

The SDK is also useful for encrypting communication with the MDM engine, which should be configured for Secure Sockets Layer (SSL) communication on a particular port and host.
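As a rough illustration of the socket-based client/server exchange that MPINET provides, the sketch below runs a throwaway echo server on an ephemeral local port and sends it a single request. The request and response strings are invented for the demonstration; the real MPINET protocol is proprietary and, as noted above, can additionally be secured with SSL.

```java
import java.io.*;
import java.net.*;

public class SocketSketch {
    public static String roundTrip(String request) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) { // stand-in "MPINET" endpoint
            Thread echo = new Thread(() -> {
                try (Socket s = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(s.getInputStream()));
                     PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                    out.println("ACK:" + in.readLine()); // server acknowledges the call
                } catch (IOException ignored) { }
            });
            echo.start();
            try (Socket client = new Socket("localhost", server.getLocalPort());
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream()))) {
                out.println(request); // client issues an API call over the socket
                return in.readLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("memSearch")); // ACK:memSearch
    }
}
```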

System Architecture

InfoSphere Information Server (IIS) Architecture

Client Tier

IBM Information Server provides a number of client interfaces, optimized for different user roles within an organization. The client tier includes the IBM InfoSphere DataStage and QualityStage clients (Administrator, Designer, and Director), the IBM Information Server console, and the IBM Information Server Web console. There are two broad categories of clients, administrative clients and user clients, and both have desktop and web-based interfaces.

Administrative clients. These clients allow you to manage security, licensing, logging, and scheduling. Administration tasks are performed in the IBM Information Server Web console, a browser-based interface for administrative activities such as managing security and creating views of scheduled tasks. For IBM InfoSphere DataStage and IBM InfoSphere QualityStage project administration, you use the IBM InfoSphere DataStage Administrator client. It administers InfoSphere DataStage projects and conducts housekeeping on the server. It is used to specify general server defaults, add and delete projects, and set project properties. User and group privileges are also set using the Administrator client.

User clients

These clients help perform tasks such as creating, managing, and designing jobs, as well as validating, running, scheduling, and monitoring jobs. The IBM Information Server console is a rich client-based interface for activities such as profiling data and developing service-oriented applications. The IBM InfoSphere DataStage and QualityStage Designer helps you create, manage, and design jobs.

The IBM InfoSphere DataStage and QualityStage Director client is the component that validates, runs, schedules, and monitors jobs on the IBM InfoSphere DataStage server.

Server tiers

The server tiers of the Information Server platform include the Services, Engine, Repository, Working Areas, and Information Services Director Resource Providers, as follows:

Services tier

IBM Information Server is built entirely on a set of shared services that centralize core tasks across the platform. Shared services allow these tasks to be managed and controlled in one place, regardless of which suite component is being used.

The Services Tier includes both common and product-specific services:

- Common services are used across the Information Server suite for tasks such as security, user administration, logging, reporting, and metadata.

- Product-specific services provide tasks for specific products within the Information Server suite. For example, IBM InfoSphere Information Analyzer calls a column analyzer service (a product-specific service) that was created for enterprise data analysis. IBM Information Server products can access three general categories of service:

- Design

Design services help developers create function-specific services that can also be shared.

- Execution

Execution services include logging, scheduling, monitoring, reporting, security, and Web framework.

- Metadata

Using metadata services, metadata is shared "live" across tools so that changes made in one IBM Information Server component are instantly visible across all of the suite components. Metadata services are tightly integrated with the common repository. You can also exchange metadata with external tools by using metadata services.

Repository tier

The shared repository is used to store all IBM Information Server product module objects (including IBM InfoSphere DataStage objects) and is shared with other applications in the suite. Clients can access metadata and data-analysis results from the respective service layers.

Engine tier

This is the parallel runtime engine that executes IBM Information Server tasks. It comprises the Information Server engine, Service Agents, and Connectors and Packaged Application Connectivity Kits (PACKs).

- The IBM Information Server engine consists of the products that you install, such as IBM InfoSphere DataStage and IBM InfoSphere QualityStage. It runs jobs to extract, transform, load, and standardize data. The engine runs DataStage and QualityStage jobs. It also executes the parallel jobs for Information Analyzer tasks.

- Service Agents are Java processes that run in the background on each computer that hosts IBM InfoSphere DataStage. They provide the communication between the Services and Engine tiers of Information Server.

- Connectors and PACKS

IBM Information Server connects to a variety of information sources whether they are structured, unstructured, on the mainframe, or applications. Metadata-driven connectivity is shared across the suite components, and connection objects are reusable across functions. Connectors provide design-time importing of metadata, data browsing and sampling, run-time dynamic metadata access, error handling, and high functionality and high performance run-time data access.

Master Data Engine

The MDM engine is the heart of the Master Data Service and contains the main logic. The data-processing rules and the matching algorithm are configured and implemented in the hub. Data members are compared and linked based on comparison scores, which indicate the relationships among the members.