The core element of an MDM system implementation is the data itself. Therefore, when implementing a master data management solution, it is important to adopt a development approach that is suited to describing how such data is managed and manipulated. The approach must also describe all of the important aspects of the data, including how the data is mapped and transformed between systems and how a business services layer is created to interact with the master data management system.
In addition, this data manipulation must follow a consistent method that is easy to
understand and maintain. Such consistency is especially important in large
projects.
3.1 Design Considerations
This section discusses the issues that must be addressed or resolved before attempting to devise a complete design solution.
3.1.1 General Constraints
A typical InfoSphere MDM Server implementation is composed of three main parts: MDM client applications accessed by an end user through a web browser, InfoSphere MDM Server, and a database. When considering the security of such an architecture, it is important to protect the individual parts as well as the communication between them. In addition, there is a need for models that make it easier to describe system behavior in a consistent and reusable way. These models must support the continued development of a new system that sometimes replaces, but more often works in unison with, many other systems. In many cases, you will have to modify the surrounding systems so that they can integrate with the new MDM engine.
InfoSphere Information Server uses a powerful architecture that helps developers maximize speed, flexibility and effectiveness in building, deploying, updating and managing their data integration infrastructure. InfoSphere DataStage leverages the productivity-enhancing features of InfoSphere Information Server to help reduce the learning curve, simplify administration and optimize the use of development
resources. The result is an accelerated development and maintenance cycle for data integration applications. With InfoSphere DataStage, organizations can achieve strong return on investment (ROI) by quickly constructing their data integration solutions to gain access to trustworthy information and share it across applications and databases.
InfoSphere Information Server has a single design interface that is shared by both InfoSphere DataStage and IBM InfoSphere QualityStage® modules, enabling designers to use any combination of data quality and data transformation capabilities to help ensure that the right data is brought together at the right time. InfoSphere Information Server also provides a unified metadata repository for InfoSphere DataStage and all other modules. Users can immediately access technical and process metadata developed during data profiling, cleansing and integration processes to speed development and reduce the chance for errors.
When implementing a master data management solution and InfoSphere Information Server DataStage jobs, also consider the organizational measures that must accompany the introduction of this system. Introducing data governance is a preferred practice.
3.1.2 Development Methods
The development methodology used in this project is the agile development methodology. Its key characteristics are:
The focus is on design rather than on code.
Software development is based on an iterative process.
It aims to deliver working software quickly and to respond rapidly to changing requirements.
The goal of the agile method is to reduce overhead in the software process so that changing requirements can be accommodated without excessive extra work.
Benefits to the Customer
The customer is more actively involved and is regularly aware of the status of the application. Requirements can be raised at each iteration. Because delivery is rapid, key functions become available early. Testing is performed in every iteration, which leads to better quality. The customer is also guaranteed to receive at least some of the functionality within a fixed period.
Benefits to the Project Teams
Project teams work together more actively in all stages, so decisions are made collaboratively and the team works more efficiently.
Because the methodology is incremental, the team can focus on specific requirements, and effort is concentrated on developing the application.
Because testing is integrated into each iteration, feedback is frequent and efficient. Requirements are gathered incrementally as they arise, so less time is spent on gathering requirements and on up-front planning.
Teams build the application in a cooperative environment.
This section describes the design decisions and strategies that affect the overall organization of the system and its higher-level structures. These strategies will provide insight into the key abstractions and mechanisms used in the system architecture.
3.2.1 Programming Language
Java provides libraries for building the front end and for communication between
different modules. It also supports multithreaded programming, which is essential
for networking applications. Java is therefore used as the programming language
for the application, and Java Swing is used to develop the user interface for the MDM Connector.
The application is developed using the Eclipse 3.4 application wizard. Java
provides extensive packages, ranging from simple utilities to complex network
operations, that were used to develop this application.
One characteristic of Java is portability: programs written in Java run the same way on any supported hardware platform.
3.2.2 Technologies used
Java Integration Stage
By using the Java Integration stage, Java code can be invoked from DataStage parallel jobs. The Java Integration stage integrates Java code into a job design: you write the code against the Java Integration stage API, which defines the interfaces and classes for Java code that is invoked from InfoSphere DataStage and QualityStage parallel jobs. The Java Integration stage is backward compatible with the current Java Pack, so existing Java classes work unmodified with it, and it also supports the API exposed by the existing Java Pack.
Java code must implement a subclass of the Processor class. The Processor
class consists of methods that are invoked by the Java Integration stage. When a Java
Integration stage starts, it instantiates the Processor subclass and calls the
logic within that implementation. The MDM Connector logic is written inside the Processor implementation.
The Processor class provides a set of methods that the Java Integration stage
can call to interact with your Java code at job execution time or at job design time.
At minimum, the Java code must implement the following two abstract methods:
public abstract boolean validateConfiguration() -
This method receives the current configuration (the number and types of links) and the values of the user properties. The Java code must validate the configuration and the user properties and return false to the Java Integration stage if there is a problem with them.
public abstract void process() - The process() method is the entry point for reading records from the input links and writing records to the output links. As long as a row is available on any of the stage input links (if there are any, and regardless of the number of output links), the Java Integration stage calls this method, provided that the job does not abort. Your Java code must consume all rows from the stage input links.
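The two-method contract above can be illustrated with a small, self-contained sketch. The `Processor` base class and `StageConfig` record here are hypothetical stand-ins written for illustration; they are not the classes of the Java Integration stage API, whose real signatures should be taken from the IBM API documentation. Rows are modelled as `Map<String, Object>` and the input link as a simple queue, which are also assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical stand-in for the stage's Processor base class (not the IBM API).
abstract class Processor {
    // Return false if the link layout or user properties cannot be handled.
    public abstract boolean validateConfiguration(StageConfig config);

    // Called while input rows are available; must consume every available row.
    public abstract void process();
}

// Hypothetical configuration: number of links plus user property values.
record StageConfig(int inputLinks, int outputLinks, Map<String, String> userProperties) {}

// Hypothetical MDM Connector processor with one input link and one output link.
class MdmProcessor extends Processor {
    private final Queue<Map<String, Object>> inputLink;
    private final List<Map<String, Object>> outputLink = new ArrayList<>();

    MdmProcessor(Queue<Map<String, Object>> inputLink) {
        this.inputLink = inputLink;
    }

    @Override
    public boolean validateConfiguration(StageConfig config) {
        // Accept exactly one input and one output link, and require a host property.
        return config.inputLinks() == 1
                && config.outputLinks() == 1
                && config.userProperties().containsKey("mdm.host");
    }

    @Override
    public void process() {
        // Drain all available rows from the input link.
        Map<String, Object> row;
        while ((row = inputLink.poll()) != null) {
            outputLink.add(standardize(row));
        }
    }

    // Illustrative transformation: trim string values before passing them on.
    private static Map<String, Object> standardize(Map<String, Object> row) {
        Map<String, Object> out = new LinkedHashMap<>();
        row.forEach((key, value) -> out.put(key, value instanceof String s ? s.trim() : value));
        return out;
    }

    List<Map<String, Object>> outputRows() {
        return outputLink;
    }
}
```

In the actual stage, rows are read from and written to the stage's own link objects rather than Java collections; the sketch only shows the validate-then-drain pattern.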
In an output link, where DataStage columns are set from the Java data types that
are produced by the Java Integration stage, the Java Integration stage converts the
Java data types to InfoSphere DataStage data types. Conversely, in an input link,
where Java Bean properties or columns are set from the DataStage columns, the
InfoSphere DataStage data types are converted to Java data types.
Java code can be used to define custom properties and use these property
values in Java code. At job design time, Java Integration stage editor calls
getUserPropertyDefinitions() method in the Processor class to get a list of
user-defined property definitions and then shows the editor panel to allow the
users to specify the string value for each property.
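The idea of user-defined properties can be sketched as a list of definitions that an editor could render, plus run-time resolution of the string values the user entered. `PropertyDef`, `MdmConnectorProperties`, and the property names `mdm.host`, `mdm.port`, and `mdm.batchSize` are hypothetical; the real stage defines its own property-definition class and the exact signature of getUserPropertyDefinitions().

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical description of one user-defined property shown in the stage editor.
record PropertyDef(String name, String defaultValue, String description, boolean required) {}

class MdmConnectorProperties {
    // Hypothetical counterpart of getUserPropertyDefinitions(): the editor would
    // show one field per definition and store the user's value as a string.
    static List<PropertyDef> getUserPropertyDefinitions() {
        return List.of(
                new PropertyDef("mdm.host", null, "Host name of the MDM engine", true),
                new PropertyDef("mdm.port", null, "Port of the MDM engine", true),
                new PropertyDef("mdm.batchSize", "100", "Records sent per request", false));
    }

    // At run time the values arrive as strings: apply defaults and enforce required ones.
    static Map<String, String> resolve(Map<String, String> userValues) {
        Map<String, String> resolved = new HashMap<>();
        for (PropertyDef def : getUserPropertyDefinitions()) {
            String value = userValues.getOrDefault(def.name(), def.defaultValue());
            if (value == null && def.required()) {
                throw new IllegalArgumentException("Missing required property: " + def.name());
            }
            if (value != null) {
                resolved.put(def.name(), value);
            }
        }
        return resolved;
    }
}
```

Because every value is a string, the Java code is responsible for parsing numeric properties such as a port and reporting invalid input, typically from validateConfiguration().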
The MDM Input/Output stage is developed on top of the Java Integration stage and then converted into a standalone stage by using a stage generation tool. All of the methods defined for the Java Integration stage are used in the development code for the MDM stage. Once built on the Java Integration stage, the MDM stage can be converted into a standalone connector, which can then be installed in InfoSphere Information Server DataStage.
The Dynamic Metadata Import (DMDI) interface provides a flexible infrastructure for importing metadata from any external resource, from any client, and with a variety of deployment options. It was originally created to provide a means of importing metadata from Software Group Adapters (SWG Adapters) that implement the Enterprise Metadata (EMD) Specification. DMDI subsequently evolved to meet the needs of Information Server.
The implementation of a connector may include implementation of the DMDI ResourceWrapper interface. This interface is then invoked at design time from a connector's stage editor. The design is based around a wizard approach to acquiring metadata. This may typically involve setting up some properties in one or more steps of the wizard, before eventually displaying some metadata objects from which the user can select. These metadata objects may be a list of tables and their fields, or business objects or interfaces. The design is completely flexible in this respect, although the general idea is that a tree-view of these metadata objects be displayed. Once metadata objects to be imported have been selected, the final step involves generating the metadata for these objects, returning it to the client, and populating the links with the modified metadata. The process may also set or modify properties of the stage.
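The final wizard step described above, turning the user's selection in a metadata tree into link metadata, can be sketched as follows. This is not the DMDI `ResourceWrapper` interface; `MetadataNode`, `LinkColumn`, the column-naming scheme, and the type strings are hypothetical and serve only to show the selection-to-metadata flow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical node in the metadata tree shown by the wizard
// (for example, a business object and its fields).
record MetadataNode(String name, String type, List<MetadataNode> children) {
    static MetadataNode leaf(String name, String type) {
        return new MetadataNode(name, type, List.of());
    }
}

// Hypothetical column definition written back to the stage link.
record LinkColumn(String name, String sqlType) {}

class MetadataImportSketch {
    // Final step: generate link columns for the fields of each selected object.
    static List<LinkColumn> generateLinkColumns(MetadataNode root, Set<String> selected) {
        List<LinkColumn> columns = new ArrayList<>();
        for (MetadataNode object : root.children()) {
            if (!selected.contains(object.name())) {
                continue; // the user did not select this object
            }
            for (MetadataNode field : object.children()) {
                // Prefix with the object name so fields from different objects stay distinct.
                columns.add(new LinkColumn(object.name() + "_" + field.name(), field.type()));
            }
        }
        return columns;
    }
}
```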
In the MDM Connector, DMDI is used to invoke the GUI from an InfoSphere Information Server DataStage job. The GUI is developed, integrated with DMDI, and then invoked from the connector.
The Java Management Extensions (JMX) technology is a standard part of the Java Platform, Standard Edition (Java SE platform). It provides a simple, standard way of managing resources such as applications, devices, and services. Because JMX is dynamic, you can use it to monitor and manage resources as they are created, installed, and implemented. You can also use it to monitor and manage the Java Virtual Machine (Java VM). The JMX specification defines the architecture, design patterns, APIs, and services in the Java programming language for the management and monitoring of applications and networks.
Using JMX, a given resource is instrumented by one or more Java objects known as Managed Beans, or MBeans. These MBeans are registered in a core managed object server known as an MBean server. The MBean server acts as a management agent and can run on most devices that have been enabled for the Java programming language. The specification defines JMX agents that you use to manage any resources that have been correctly configured for management. A JMX agent consists of an MBean server, in which MBeans are registered, and a set of services for handling the MBeans. In this way, JMX agents directly control resources and make them available to remote management applications. The way in which resources are instrumented is completely independent of the management infrastructure, so resources can be made manageable regardless of how their management applications are implemented.
In this project, Java Management Extensions (JMX) technology is used to manage the MDM jobs remotely. These jobs run in InfoSphere Information Server, automatically convert the data into the form required by MDM, and pass that data to MDM for processing. In this way, connectivity is achieved between InfoSphere Information Server and InfoSphere MDM.
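The MBean concepts described above map directly onto the standard `javax.management` API. The sketch below exposes a job as a standard MBean (the interface name must be the class name plus `MBean`, and the interface must be public), registers it with the platform MBean server, and then reads and invokes it through the `MBeanServer` interface. A remote management application performs the same calls through the `MBeanServerConnection` returned by a JMX connector. The `MdmJob` resource, its attributes, and the `com.example.mdm` object name are hypothetical; how the project's jobs are actually instrumented is not described here.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class MdmJobManagement {
    // Standard MBean interface for MdmJob; JMX requires it to be public.
    public interface MdmJobMBean {
        String getStatus();          // read-only attribute "Status"
        int getRecordsProcessed();   // read-only attribute "RecordsProcessed"
        void start();                // management operation
        void stop();                 // management operation
    }

    // Hypothetical managed resource representing one MDM load job.
    public static class MdmJob implements MdmJobMBean {
        private volatile String status = "STOPPED";
        private final AtomicInteger records = new AtomicInteger();

        public String getStatus() { return status; }
        public int getRecordsProcessed() { return records.get(); }
        public void start() { status = "RUNNING"; }
        public void stop() { status = "STOPPED"; }

        // Called by the job itself, not exposed through JMX.
        void recordProcessed() { records.incrementAndGet(); }
    }

    // Registers the job in the platform MBean server under a hypothetical name.
    static ObjectName register(MdmJob job, String jobName) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("com.example.mdm:type=MdmJob,name=" + jobName);
        server.registerMBean(job, name);
        return name;
    }
}
```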
3.2.3 Future Plans
Multi-data-domain MDM. Currently, many organizations apply MDM only to the customer domain; other domains, such as products, financials, and locations, should also be considered. Restricting MDM to a single data domain may inhibit the correlation of information across multiple domains.
Multi-department, multi-application MDM. This involves distributing the data to different applications and to the classes that depend on them.
Real-time MDM. Real-time operation is crucial for clarifying data and for distributing new and updated reference data as soon as it changes.
Coordination with other disciplines. MDM should be coordinated with related data management disciplines. A data governance or stewardship program can provide an effective collaborative process for such coordination.
3.2.4 Error Detection and Recovery
Eclipse 3.4 helps to fix the build errors more quickly. The output window displays a list of errors generated during the build. Eclipse 3.4 has an integrated debugger to correct logic errors.
InfoSphere Information Server DataStage provides a job log window that shows the status of the job. If there is an error in the code, the log identifies the Java class and line number where the error occurred. The window lists the errors that occurred during job invocation, which makes it clear where the code fails and makes the errors easy to correct.
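A Java exception records the class, method, and line number of each frame in its stack trace, which is the kind of location information a job log can report. As a minimal sketch, connector code could format that location itself when writing its own diagnostic messages. The `ErrorLocation` helper is hypothetical and not part of DataStage.

```java
// Hypothetical helper: formats where an exception was created.
class ErrorLocation {
    static String describe(Throwable t) {
        StackTraceElement[] trace = t.getStackTrace();
        if (trace.length == 0) {
            // The JVM may omit stack traces; fall back to the exception text.
            return t.toString();
        }
        StackTraceElement top = trace[0]; // innermost frame
        // getLineNumber() is negative when line information is unavailable
        // (for example, classes compiled without debug information).
        return t.getClass().getSimpleName() + " at " + top.getClassName() + "."
                + top.getMethodName() + " (line " + top.getLineNumber() + ")";
    }
}
```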
3.2.5 Data Storage Management
DB2 is used as the data storage hub. DB2 is a family of relational database management system (RDBMS) products from IBM that serve a number of different operating system platforms. You will create a new database for the IBM® Initiate® Master Data Service® software to reference. You will then install the IBM Initiate Master Data Service engine (a typical InstallShield executable) and use the madconfig utility to configure the ODBC connection, create the instance directory, and establish the Windows service.
Bootstrapping your database involves creating the core database tables, defining the field properties, and indexing the tables. During the bootstrap process, several of the data dictionary tables are populated with default settings. Your database is bootstrapped as part of instance creation, but you can also bootstrap it separately.
3.2.6 Communication Mechanism
Communication is required between the MDM Connector and the MDM Server instance. This communication is achieved by using MPINET.
MPINET is the server that provides TCP/IP communication and socket connections for the various API calls from clients. It establishes a TCP/IP socket connection from each client to the server for API calls and jobs. The server uses a controlled, optimized communication protocol that MDM Server relies on. More than one connection can be established to the server; it is multithreaded and implements database connection pooling for optimum performance and fast response times.
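MPINET's internals are not documented here, but the connection-pooling idea mentioned above can be sketched generically: a fixed set of connections is created once and shared by request-handling threads, which borrow a connection, use it, and return it. `SimplePool` is a hypothetical class built only on `java.util.concurrent`; a real pool would hold database connections and would also detect and replace broken ones, which this sketch omits.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical fixed-size pool shared by request-handling threads.
class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        // Create every connection up front so requests never pay the setup cost.
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Borrow a connection, waiting up to the timeout if all are in use; null on timeout.
    T borrow(long timeout, TimeUnit unit) throws InterruptedException {
        return idle.poll(timeout, unit);
    }

    // Return a connection so another thread can reuse it.
    void release(T connection) {
        if (!idle.offer(connection)) {
            throw new IllegalStateException("released more connections than the pool holds");
        }
    }

    int available() {
        return idle.size();
    }
}
```

With this design, the number of open connections is bounded by the pool size no matter how many threads serve client requests; extra threads simply wait in borrow().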
You can use the SDK to interact with the Master Data Engine. From basic tasks like establishing a connection to configuring its behavior and defining the comparison strategy, you can perform any interaction or get/modify data model elements.
The SDK can also encrypt the data exchanged with the MDM engine; for this, the engine must be configured for Secure Sockets Layer (SSL) communication on a particular host and port.