Chapter 3. The Technological Environment
The enterprise network is probably the most fundamental IT service. To successfully
operate the network and meet changing business requirements, it's imperative to understand trends and technology maturity to separate hype from reality and to deploy appropriate technology, correctly timed. Communication is required for organizations to function, yet costs need to be appropriately managed, and the portfolio of network services delivered by IT organizations should be formalized. The responsibility for telephony, fixed as well as mobile, is rapidly moving into the networking organization. At the same time, security is becoming more embedded into the network infrastructure, and virtualization, cloud computing and green IT aspects will increasingly affect architectural and product decisions. Communication, collaboration and e-learning products continue to proliferate, making access to the network a necessity of daily life. Hence, new skill sets are required and new technologies need to be mastered.
The technological environments found in organizations today vary greatly. Each environment has unique characteristics that must be understood in order to effectively integrate the enterprise. Most large organizations have a heterogeneous mix of legacy and current technologies from a variety of vendors. As organizations become increasingly dependent on information as a strategic resource, the need to understand and effectively integrate this environment becomes more and more critical. The components of the enterprise technological environment are vast. This chapter focuses on the major technological components that comprise the typical enterprise and provides a framework for understanding and integrating these components.
The efficient and effective flow of information is crucial to the survival of all organizations today. Organizations need to make certain that the backbone of the enterprise technological environment, the hardware and network infrastructure, is safe from harm, immune to threats, and capable of meeting growing and diverse corporate demands. Meeting performance expectations while maintaining low cost and low risk in changing architectures can only be achieved with the right infrastructure strategy, one that optimizes IT efficiency and simplifies management.
Network infrastructure transports application and network-management traffic. Building a network infrastructure requires planning, designing, modeling, and careful implementation. An organization's network infrastructure must offer high performance, scalability, and availability. The effective and efficient access to information provided by a well designed and integrated network infrastructure can create a distinct competitive advantage. This section looks at the major network components found in most organizations today and discusses their role in the integrated enterprise.
Major Network Components
Routers & Switches
A router's primary purpose is to connect two or more networks and to filter network signals so that only desired information travels between them. Routers are aware of many possible paths across a network and can choose the best route for each data packet to travel. Routers operate by examining incoming data for its network routing and transport information. The information examined includes the source and destination routing address.
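The "best route" decision described above can be illustrated with a longest-prefix-match lookup, the rule IP routers use to pick the most specific path for each destination. The routing table entries and the function name below are hypothetical examples, not a real router API.

```python
import ipaddress

# Hypothetical routing table: (destination prefix, outgoing interface)
ROUTING_TABLE = [
    ("10.0.0.0/8", "eth0"),
    ("10.1.0.0/16", "eth1"),
    ("0.0.0.0/0", "eth2"),   # default route: matches everything
]

def best_route(dest_ip: str) -> str:
    """Return the interface of the most specific (longest) matching prefix."""
    matches = [
        (ipaddress.ip_network(prefix).prefixlen, iface)
        for prefix, iface in ROUTING_TABLE
        if ipaddress.ip_address(dest_ip) in ipaddress.ip_network(prefix)
    ]
    # The longest prefix is the most specific match, so it wins
    return max(matches)[1]
```

For a destination of 10.1.2.3, both 10.0.0.0/8 and 10.1.0.0/16 match, but the /16 is more specific, so the packet is forwarded out eth1.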
Switches are network devices that provide connectivity between multiple devices, such as servers and storage devices. Switches allow multiple devices to communicate simultaneously with no reduction in transmission speed, providing scalable bandwidth. The benefits of switches are that they provide each pair of communicating devices with a fast connection and that they segregate the communication so that it does not enter other portions of the network.
Firewalls

A firewall is a device that prevents unauthorized electronic access to the network. Firewalls operate by examining incoming or outgoing packets for information at the network addressing level. Firewalls can be divided into three general categories:

1. Packet-Screening Firewalls
2. Proxy Servers
3. Stateful Inspection Proxies
Packet-screening firewalls examine incoming and outgoing packets for their network address information. This type of firewall can restrict access to specific Web sites or permit access to the network only from specific Internet sites. Proxy servers operate by examining incoming or outgoing packets for their source/destination addresses as well as the information carried within the data area of each network packet. Because the proxy server examines the data area, individual programs can be permitted or restricted. Stateful inspection proxies monitor network signals to ensure that they are part of a legitimate ongoing conversation rather than malicious insertions.
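A packet-screening check can be sketched in a few lines: the firewall inspects only the addressing information of each packet against a rule set. The addresses, ports, and function name below are hypothetical examples.

```python
# Hypothetical rule set for a packet-screening firewall
BLOCKED_SOURCES = {"203.0.113.50"}   # known-bad source address
ALLOWED_PORTS = {80, 443}            # only Web traffic may pass

def screen_packet(src_ip: str, dst_port: int) -> bool:
    """Return True if the packet may pass, False if it is dropped.
    Only addressing information is examined, never the data area."""
    if src_ip in BLOCKED_SOURCES:
        return False
    return dst_port in ALLOWED_PORTS
```

Note that, unlike a proxy server, this check never looks inside the packet's data area, which is exactly the limitation that stateful inspection and proxying address.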
Storage Area Networks (SANs)
SANs are made up of servers and stand-alone storage devices connected by a dedicated network. SAN storage devices do not contain any server functionality and they do not implement a file system. The hosts themselves implement and manage the file system. Any server can access any storage device. One server can access many storage devices, and multiple servers can access the same storage device, allowing for independent server and storage scalability.
Servers

A server is a computer system that has been designated for running a specific server application or applications. Servers that are designated for only one server application are often named for the application they are running. Server applications can be divided among server computers over a large range, depending on the workload. Under light load, every server application can run concurrently on a single computer. Under heavy load, multiple server computers may be required for each application. Under medium load, it is common to use one server computer per server application, in order to limit the amount of damage caused by failure of any single server computer or security breach of any single server application. Some types of servers are as follows:
- Application Servers
- Database Servers
- FTP Servers
- List Servers
- Mail Servers
- Web Servers
Mainframes

Mainframes are high-performance computers used for large-scale computing purposes that require greater availability and security than a smaller-scale machine can offer. Traditionally, mainframes have been associated with centralized computing; however, today mainframes have become more multi-purposed. A mainframe can handle such tasks as multiple workload processing, utilization tracking, network analysis, control centralization, and resource allocation.
Clients

Clients are applications or systems that access a remote service on a server via a network. The term client was first applied to devices that were not capable of running their own stand-alone programs, but could interact with remote computers on a network. Types of clients include:
Fat clients perform the bulk of any data processing operations and do not necessarily rely on the server. The most common form of fat client is a personal computer. Fat clients generally have high performance.

Thin clients use the resources of the host computer. Thin clients are generally used to graphically display images provided by an application server, which performs the bulk of any required data processing. Thin clients are generally highly manageable.

Hybrid clients are a mixture of both fat and thin clients. They process data locally but rely on a server for data storage. Hybrid clients offer features from both fat clients and thin clients, making them highly manageable while possessing higher performance than thin clients.
Network Topologies

The term “topology” refers to the physical layout of a network. It also refers to how different nodes in a network are connected to each other and how they communicate. There are three types of topologies: signal, logical and physical.
Signal topology is the mapping of the actual connections between the nodes of a network, as evidenced by the path that the signals take when propagating between the nodes. Logical topology is the mapping of the apparent connections between the nodes of a network, as evidenced by the path that data appears to take when traveling between the nodes. A physical topology is the mapping of the nodes of a network and the physical connections between them. This involves the layout of wiring, cables, the locations of nodes, and the interconnections between the nodes and the cabling or wiring system. Some examples of physical topologies are as follows:
Point-to-Point

A point-to-point topology is a dedicated connection between two endpoints, such as one server and one storage device. The value of a permanent point-to-point network lies in its guaranteed, or nearly guaranteed, communication between the two endpoints.
Star

A star topology connects all cables to a central point of concentration. This point is usually a hub or switch. Nodes communicate across the network by passing data through the hub. The main disadvantage of this topology is that if the central hub stops working, no node can transmit.
Ring

Each node on the network is connected to two other nodes, with the first and last nodes connected to each other to form a ring. All transmitted data travels from one node to the next in a circular manner, and the data generally flows in a single direction only.
Bus

All devices are connected to a central cable, called the bus or backbone. There are two kinds of bus topologies: linear and distributed.
In a linear bus topology, all of the nodes of the network are connected to a common transmission medium, which has exactly two endpoints (the bus or backbone). All data transmitted between nodes in the network travels over this common transmission medium and can be received by all nodes virtually simultaneously. In a distributed bus topology, all of the nodes of the network are connected to a common transmission medium that has more than two endpoints, created by adding branches to the main section of the transmission medium. The distributed bus functions in exactly the same fashion as the linear bus.
Mesh

Devices are connected with many redundant interconnections between network nodes. In a true mesh topology, every node has a connection to every other node in the network. There are two types of mesh topologies: full and partial.
Figure X - Full Mesh Topology
In a full mesh topology, each of the nodes of the network is connected to each of the other nodes in the network with a point-to-point link allowing for data to be simultaneously transmitted from any single node to all of the other nodes. In a partial mesh topology, some of the nodes of the network are connected to more than one other node in the network with a point-to-point link, making it possible to take advantage of some of the redundancy that is provided by a physical fully connected mesh topology without the expense and complexity required for a connection between every node in the network.
Figure X - Partial Mesh Topology
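The trade-off between full mesh and simpler topologies is largely a matter of link count: a full mesh of n nodes needs n(n-1)/2 point-to-point links, while a ring needs only n. The helper names below are illustrative, not a standard library API.

```python
import itertools

def full_mesh_links(nodes):
    """Every pair of nodes gets a dedicated link: n*(n-1)/2 links in all."""
    return list(itertools.combinations(nodes, 2))

def ring_links(nodes):
    """Each node links to the next, and the last back to the first: n links."""
    return [(nodes[i], nodes[(i + 1) % len(nodes)]) for i in range(len(nodes))]
```

For ten nodes a full mesh already requires 45 links against the ring's 10, which is why partial mesh is often the practical compromise between redundancy and cost.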
Tree and Hypertree
The Tree topology is a hybrid of the bus and star layouts. The basic topology is similar to that of a bus, with nodes connected in sequence to a linear central cable. Tree networks may have "branches" that contain multiple workstations that are connected point-to-point in a star-like pattern. Signals from a transmitting node travel the length of the medium and are received by all other nodes. The Hypertree topology is a combination of two or more tree topologies to make one new hypertree topology.
Figure X - Hypertree Topology
Hybrid

A hybrid topology is composed of one or more interconnections of two or more networks that are based upon different physical topologies. Hybrid topologies include the star-bus, hierarchical star, star-wired ring, and hybrid mesh.
Enterprises must understand what they have in their environment and decide whether these products and technologies will meet their future needs. Understanding how various networking components can be interdependent is an essential prerequisite to performing this analysis. As the line between IT and the business blurs even more, and IT practitioners are targeted on business performance, the integrated network becomes an inextricable part of the chain that ensures business performance requirements are met.
The technology that will have the greatest impact on enterprise communications will be delivered through the convergence of voice, video and data technologies. Although the principal business impact will come from new and innovative applications (and the integration of these applications) an infrastructure that is unprepared for the demands of these new applications will quickly negate any possible advantage. Understanding the emerging future trends in this rapidly moving area will enable an enterprise to build an infrastructure that is ready to support emerging applications.
Increasingly, mobile devices are considered part of the corporate network strategy rather than a silo, as has historically been the case. Wireless local area networks (WLANs) are starting to focus more on running voice and video over the wireless medium, and some cases of the all-wireless office are starting to emerge. The WLAN has long been thought of as a separate and distinct network architecture; to achieve the promise of wireless in the enterprise, WLANs will need to become a more integral part of the entire wired infrastructure. To maneuver this minefield, enterprises will need to understand the standards development process, implementation hurdles, and the growing number of potential technologies and vendors.
Storage is of significant importance to industry because the value of information (and the virtual space it consumes) continues to climb. We have moved on from book-keeping and asset-management tasks to business-to-business multimedia, video-on-demand, and voice/data integration. The number of e-mail messages alone has grown from 9.7 billion per day in 2000 to more than 35 billion per day in 2007. Within those e-mails we embed a variety of media and file types, forcing a focus on information sharing rather than server-centric data storage. Material has to be shared via storage networking environments to meet current information needs. In addition, the increased storage and information management demands related to the Health Insurance Portability and Accountability Act (HIPAA), Sarbanes-Oxley, and other government-mandated regulations have created an enormous demand for enterprise storage and information management solutions.
This overall 50% annual increase in data creation is coupled with increased interest in retaining digital, rather than physical, copies of material, so storage requirements are also extending across time. Businesses' information technology budgets are responding to storage demands, with large organizations spending an estimated 40% on storage-related needs. The information being saved provides a variety of values to a business: it can reveal buying and spending patterns and support reduced costs, new products and services, or targeted marketing campaigns. With the potential competitive advantage gained from analysis of and access to all of a company's data, information availability becomes critically important. Retail industries can lose over $1 million per hour of downtime, while brokerages stand to lose more than $6.5 million per hour.
To meet growing storage needs, the industry has introduced a selection of storage solution alternatives, each addressing specific data storage and management needs: direct attached storage (DAS) systems attach storage drives directly to servers; network attached storage (NAS) environments are made up of specialized servers dedicated to storage; storage area networks (SANs) are highly scalable and allow hosts to implement their own storage file systems; and content addressable storage (CAS) systems store and retrieve information based on content rather than location. Because the storage needs of all organizations are growing exponentially, huge investments are made each year in storage-related hardware, software, and skilled employees to design and navigate these complex enterprise solutions.
Virtualization

Virtualization is a general term referring to the abstraction of computer resources. Virtualization hides the physical characteristics of computing resources from the applications and end users that utilize them. Examples of virtualization include making a single physical resource (such as a server or storage device) appear to function as multiple virtual resources. Virtualization can also include making multiple physical resources (such as storage devices or servers) appear and function as a single virtual resource.
There are three areas of IT where virtualization is most prevalent: network virtualization, storage virtualization, and server virtualization.
- Network virtualization is a method of combining the available resources in a network by splitting up the available bandwidth into channels, each of which is independent from the others, and each of which can be assigned (or reassigned) to a particular server or device in real time. The idea is that virtualization disguises the true complexity of the network by separating it into manageable parts, much like a partitioned hard drive simplifies file management.
- Storage virtualization is the pooling of physical storage from multiple network storage devices into what appears to be a single storage device that is managed from a central point. Storage virtualization is commonly used in storage area networks (SANs).
- Server virtualization is the masking of server resources (including the number and identity of individual physical servers, processors, and operating systems) from server users. The intention is to spare the user from having to understand and manage complicated details of server resources while increasing resource sharing and utilization and maintaining the capacity to expand later.
Grid Computing

Grid computing is a form of virtualization in which several computers run simultaneously to act as a supercomputer of sorts. These pooled systems can handle computationally intensive operations; it is virtualization that allows computing capabilities to be pooled in this way.
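The pooling idea behind grid computing can be sketched by splitting one large job into chunks, computing partial results in a worker pool, and combining them. This toy example uses threads on a single machine, whereas a real grid farms the chunks out to separate computers; the function name is illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum_of_squares(n: int, workers: int = 4) -> int:
    """Split the range 0..n-1 into chunks, compute partial sums in a pool
    of workers, then combine the partial results into the final answer."""
    chunks = [range(i, n, workers) for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda chunk: sum(x * x for x in chunk), chunks)
    return sum(partials)
```

The divide-compute-combine shape is the same regardless of whether the workers are threads, processes, or machines on a grid; only the transport between them changes.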
Cloud Computing

Cloud computing is a term used to describe massively scalable computing that is made available to end users over Internet Protocol (IP) or another access method (i.e., as a service). The term derives from the fact that most technology architecture diagrams depict the Internet or IP availability by using a drawing of a cloud. The computing resources being accessed are typically owned and operated by a third party in centralized data center locations. Consumers of the cloud are concerned with the services it can perform rather than the underlying technologies used to achieve the requested function. The label suggests that function comes from "the cloud," which is often understood to mean a public network, typically the Internet.
Green Computing

Green computing (or green IT) describes the study and practice of using computing resources efficiently. Green computing starts with manufacturers producing environmentally friendly products and encouraging IT departments to consider greener options such as virtualization, power management, and proper recycling habits. The government has also recently proposed new compliance regulations that would work toward certifying data centers as green. Some criteria include using low-emission building materials, recycling, using alternative energy technologies, and other green technologies.
Factors Affecting Network Performance
Since the enterprise network is probably the most fundamental IT service, high network availability is a core concern. A high-availability solution is a network that is available for requests whenever it is called upon. Network designers need to achieve as close to 100% uptime as possible. While 100% uptime is virtually impossible, most networks strive for 99.999% uptime. To calculate the expected percentage of uptime per year, we can use the following formula:
% of uptime per year = (8760 - expected number of hours down per year) / 8760 × 100
So if four hours of downtime per month is acceptable to your organization, then 48 hours of downtime per year is acceptable. Fitting that into the formula, 48 hours of downtime per year equates to 99.452% uptime per year. To achieve 99.999% uptime, a network can tolerate only about 26 seconds of downtime per month, or roughly five minutes of total downtime per year. In order to design for high network availability, the fault tolerance of the different components of the network infrastructure must be understood.
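The uptime arithmetic above is easy to check in code. The sketch below simply encodes the formula; the function name is illustrative.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours

def uptime_percent(hours_down_per_year: float) -> float:
    """% of uptime per year = (8760 - hours down per year) / 8760 * 100."""
    return (HOURS_PER_YEAR - hours_down_per_year) / HOURS_PER_YEAR * 100
```

With 48 hours of annual downtime, uptime_percent(48) gives about 99.452; five nines (99.999%) corresponds to only 0.0876 hours, roughly five minutes, of downtime per year.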
Fault tolerance is a setup or configuration that helps keep a computer or network service running in the event of an unexpected problem or error. The following are some examples of failures and solutions to those failures:
- Power Failure - Have computers and network devices running on a UPS.
- Power Surge - Utilize surge protectors.
- Data loss - Run scheduled backups and mirror data on an alternate location.
- Device/Computer Failure - Have a second device or computer. Also have replacement components available.
- Overload - Set up alternate computers or network devices that can serve as an alternative access point or can share the load through load balancing.
- Viruses - Make sure you maintain up-to-date virus definitions.
Data Storage and Retrieval
Data storage and retrieval is the process of gathering and cataloging data so that it can be found and utilized when needed. Organizations today are faced with many challenges as a result of the recent explosive growth of data. The problems of data consolidation, backup and recovery, storage growth and administration have mandated the need for storage networking. There are two main technologies utilized for storage networking: network-attached storage and storage area networks.
Network-attached storage (NAS) devices are specialized servers that are dedicated to providing storage resources. These devices plug into the local area network (LAN), where any server or client on the network can directly access the storage resources. Storage area networks (SANs) are made up of servers and stand-alone storage devices connected by a dedicated network. SAN storage devices do not have any server functionality and do not implement a file system; the hosts implement and manage the file system. SANs are typically used by larger organizations with complex data storage needs.
Bandwidth and Latency Considerations
Bandwidth refers to the data transfer rate supported by a network and is a major factor in network performance. The greater the bandwidth capacity, the greater the network performance. However, network bandwidth is not the only factor that determines the performance of a network. Latency is another key element of network performance. The term latency refers to any of several kinds of delays commonly incurred in the processing of network data. A low-latency network connection is one that generally experiences small delay times, while a high-latency connection generally suffers from long delays.
Although the theoretical peak bandwidth of a network connection is fixed according to the technology used, the actual bandwidth obtained varies over time and is affected by latencies. Excessive latency creates bottlenecks that prevent data from filling the network pipe, thus decreasing effective bandwidth. The impact of latency on network bandwidth can be temporary (lasting a few seconds) or persistent (constant) depending on the source of the delays. Two common types of latency are router latency and architecture and peer induced latency.
Latency is the measurement of the time it takes a packet to travel from place to place; the underlying fiber and copper networks that packets traverse carry signals at nearly the speed of light, so most delay is introduced at processing points along the way. With router latency, the routing points in the network are the sources of latency. In an IP network, packets are routed from their origin to their destination through a series of routers. Each packet has a source and a destination address stored in a header. A router receives a packet of data, holds it long enough to open the header and read the destination address, may make changes to the header depending on network congestion, and sends it on to the next router. The processing speed of the router governs how quickly this takes place. Therefore, the efficiency of the routers deployed on a carrier network will impact latency.
As routers on a network receive packets, open headers, and read destination addresses, they calculate the routing options they have to get the packet to its destination. Different networks may have different sets of routing rules. Larger networks and those of tier-one carriers typically provide the shortest and most efficient routes from point to point. Smaller networks that do not have the expensive architecture of the tier-one carriers are often required to carry packets a greater distance through more routers, thus causing more latency. This is known as architecture and peer-induced latency.
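One concrete way latency caps effective bandwidth: a protocol that delivers one window of data per round trip can never exceed window size divided by round-trip time, no matter how fast the link. The sketch below assumes a TCP-style fixed window; the function name is illustrative.

```python
def effective_throughput_bps(window_bytes: float, rtt_seconds: float) -> float:
    """Upper bound on throughput for a windowed protocol: at most one
    window of data is delivered per round trip, so throughput is
    window size (in bits) divided by the round-trip time."""
    return window_bytes * 8 / rtt_seconds
```

A 64 KB window over a 50 ms round trip tops out near 10.5 Mbit/s even on a gigabit link, which is why reducing router latency can matter more than adding raw bandwidth.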
Interfaces & Conversions
Interfaces define the means of communicating, transporting, and exchanging information through a common dialog or method. Interfaces allow systems to interact based on a common framework. Some common types of interfaces include:
ODBC - Open Database Connectivity is an interface for accessing data in a heterogeneous environment of relational and non-relational database management systems. ODBC provides an open way of accessing data stored in a variety of databases.
JDBC - Java Database Connectivity is an application programming interface (API) for the Java programming language that defines how a client may access a database. It provides methods for querying and updating data in a database.
API - An Application Programming Interface is a source code interface that supports requests made by computer programs.
Distributed Object - Software modules that are designed to work together, but reside either in multiple computers connected via a network or in different processes inside the same computer. One object sends a message to another object in a remote machine or process to perform some task. The results are sent back to the calling object.
Message-Oriented Middleware (MOM) - A client/server infrastructure that increases the interoperability, portability, and flexibility of an application by allowing the application to be distributed over multiple platforms. MOM buffers the application developer from the details of various operating systems and network interfaces.
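The connect/execute/fetch pattern that ODBC and JDBC standardize across database vendors can be seen in Python's built-in DB-API, here with the in-memory SQLite driver standing in for an enterprise database. The table schema and function name are hypothetical examples.

```python
import sqlite3

def lookup_customer(name: str):
    """Connect, execute, fetch: the same interface pattern ODBC and JDBC
    define, here against an in-memory SQLite database as a stand-in."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO customers (name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT id, name FROM customers WHERE name = ?", (name,)
    ).fetchone()
    conn.close()
    return row
```

Swapping SQLite for another database changes the driver and connection string, but the connect/execute/fetch dialog stays the same, which is the point of a standardized data access interface.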
Data and Storage Infrastructure
Databases in Organizations
For many organizations, transaction processing and warehousing are crucial to the survival of the business. Important records detailing a company's customer history, product inventory, supplier information, and any other information needed to run the business are typically stored in and retrieved from databases. Databases provide a convenient means of storing vast amounts of information, allowing the information to be sorted, searched, viewed, and manipulated according to the needs of the business. Many organizations rely so heavily on the functions of databases that the business comes to a halt if the corporate databases are unavailable. The critical nature of databases today makes database management and maintenance a vital function for most organizations.
Types of Databases
Analytic databases, also known as On Line Analytical Processing (OLAP) databases, are primarily historical archives used for analysis. The system should be able to make these analyses easy to perform by all types of users. Security and performance are major considerations with analytic databases. Data in analytic databases is typically static and often read-only. Most analytic databases are multidimensional and allow users to view the data in several different ways and dimensions. Data warehouses are the most common form of analytic database.
Operational databases, also known as On Line Transaction Processing (OLTP) databases, are used to manage the dynamic data typically associated with the operation of the business. These databases typically allow users to add, change, or delete data. Operational databases are at the core of most organizations today and are responsible for all the data and transactions that are processed daily. Operational databases feed the data warehouses, where the data can later be queried and analyzed.
Hierarchical and Network Databases
Besides differentiating databases by function, databases can also be differentiated by how they are structured. Essentially, a data model is a description of both a container for data and a methodology for storing and retrieving data from that container. Before the 1980s, the two most commonly used database models were the hierarchical and network models.
With hierarchical databases, data is stored in tree-like structures of parent and child tables. The tables are formed top-down, where each child table is dependent on a single parent, or upper, table.
The graphic above shows a hierarchical system, where the root table is a parent with two children and those children have two children each.
The network database model is similar to the hierarchical model but without the strict top-down approach. There are still child tables that depend on parent tables; however, a child table can have several parent tables. This forms a matrix design instead of a vertical layout. The network database model was designed to solve some of the more serious problems with the hierarchical database model. Specifically, the network model reduces data redundancy by representing relationships in terms of sets rather than a hierarchy.
An example of a network database model.
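The structural difference between the two models reduces to how many parents a child record may have. A minimal sketch with hypothetical table names:

```python
# Hierarchical model: every child table depends on exactly one parent,
# so the structure is a tree.
hierarchical = {
    "orders": ["customers"],
    "order_lines": ["orders"],
}

# Network model: a child table may depend on several parents,
# forming a matrix rather than a tree.
network = {
    "orders": ["customers"],
    "order_lines": ["orders", "products"],  # two parents
}

def is_hierarchical(parent_map: dict) -> bool:
    """True when no child table has more than one parent."""
    return all(len(parents) <= 1 for parents in parent_map.values())
```

Allowing "order_lines" to hang under both "orders" and "products" is exactly the set-based relationship that lets the network model avoid duplicating the same data under two separate branches of a tree.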
Relational Databases

The relational database model was developed by Dr. E. F. Codd at IBM in the late 1960s, in response to problems associated with the existing database models. At the core of the relational model is the concept of a table (also called a relation) in which all data is stored. Each table is made up of records (horizontal rows, also known as tuples) and fields (vertical columns, also known as attributes). Relational databases are currently the most commonly used type of database. The relational model connects tables through relationships, implemented with keys (primary and foreign). A primary key is a unique identifier for each tuple, made up of one or more attributes. A foreign key is an attribute in one table that references the primary key of another table; it is used to link tables and reduce data redundancy. A good relational database has almost zero redundancy, and all tables are connected by relationships to other tables.
The figure above shows all the tables connected through a series of relationships to other tables by primary and foreign keys.
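Primary and foreign keys can be demonstrated with two small tables joined through their key relationship. This sketch uses the in-memory SQLite driver; the schema and data are hypothetical examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce FK constraints in SQLite

# customer_id is the primary key of customers...
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)"
)
# ...and reappears in orders as a foreign key referencing it
conn.execute(
    """CREATE TABLE orders (
           order_id INTEGER PRIMARY KEY,
           customer_id INTEGER REFERENCES customers(customer_id)
       )"""
)
conn.execute("INSERT INTO customers VALUES (1, 'Acme')")
conn.execute("INSERT INTO orders VALUES (100, 1)")

# The join walks the primary/foreign key relationship between the tables
row = conn.execute(
    """SELECT c.name, o.order_id
       FROM orders o
       JOIN customers c ON c.customer_id = o.customer_id"""
).fetchone()
```

Because the order row stores only the customer's key, the customer's name lives in exactly one place, which is the redundancy reduction the relational model is after.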
Object Oriented Databases
Object-oriented (OO) databases store information in the form of objects. Essentially, an OO database is both an object-oriented programming language and a database management system (DBMS). Object-oriented database management systems (ODBMS) are no longer thought of as replacements for relational database management systems (RDBMS), but as complements to them. An ODBMS makes it easier to integrate with object-oriented programming languages such as Java and .NET.
XML Databases

An XML database is a data persistence software system that allows data to be imported, accessed, and exported in the XML format.
Two major classes of XML database exist:
- XML-enabled - These map all XML to a traditional database (such as a relational database), accepting XML as input and rendering XML as output.
- Native XML (NXD) - The internal model of such databases depends on XML and uses XML documents as the fundamental unit of storage.
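The native XML idea, keeping the XML document itself as the unit of storage and querying it by path, can be sketched with the standard library's ElementTree. The catalog document and function name are hypothetical examples.

```python
import xml.etree.ElementTree as ET

# A stored XML document: in a native XML database this whole document,
# not rows and columns, is the fundamental unit of storage.
DOC = """<catalog>
  <product sku="A1"><name>Router</name></product>
  <product sku="B2"><name>Switch</name></product>
</catalog>"""

def product_name(xml_text: str, sku: str) -> str:
    """Query the stored document by path, selecting on an attribute value."""
    root = ET.fromstring(xml_text)
    node = root.find(f"./product[@sku='{sku}']/name")
    return node.text
```

An XML-enabled relational database would instead shred this document into tables on input and reassemble it on output; the native store queries the document structure directly.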
Current Trends in Enterprise Database Applications
Data Warehouses & Knowledge Management
A data warehouse is a repository for stored information, and data mining is used to extract that information for analysis and reporting. An advantage of data warehousing is that complex queries can be run easily and efficiently on large sets of data. Knowledge management (KM) is an emerging business practice for handling knowledge creation, codification, sharing, and innovation that often involves the use of a data warehouse. KM encompasses both the technical and organizational aspects of business.
Key components of KM include:
- Generating new knowledge
- Accessing valuable knowledge from outside sources
- Using accessible knowledge in decision making
- Embedding knowledge in processes, products, and/or services
- Representing knowledge in documents, databases, and software
- Facilitating knowledge growth through culture and incentives
- Transferring existing knowledge into other parts of the organization
- Measuring the value of knowledge assets and/or impact of knowledge management
Master Data Management
Often, knowledge management, data mining, and other applications need a consolidated view of data in order to perform their functions. Master data management (MDM) provides and maintains a consistent view of an organization's core business entities, which may involve data that is scattered across a range of systems. The type of data involved varies by industry and organization, but examples include customers, suppliers, products, employees and finances. Presently, many MDM applications concentrate on handling customer data, because this aids the sales and marketing process and can help improve sales and thus revenues. A popular term for customer MDM solutions is customer data integration, or CDI.
Online Analytical Processing
Online analytical processing (OLAP) enables a user to easily and selectively extract and view data from different points of view. For example, a user can request that data be analyzed to display a spreadsheet showing all of a company's beach ball products sold in Florida in the month of July, compare revenue figures with those for the same products in September, and then see a comparison of other product sales in Florida in the same time period. To facilitate this kind of analysis, OLAP data is stored in a multidimensional database. Whereas a relational database can be thought of as two-dimensional, a multidimensional database considers each data attribute (such as product, geographic sales region, and time period) as a separate "dimension." OLAP software can locate the intersection of dimensions (all products sold in the Eastern region above a certain price during a certain time period) and display them.
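This kind of dimensional slicing can be sketched in plain Python. The facts table and sales figures are invented for illustration:

```python
# Facts keyed by the (product, region, month) dimensions, as in a cube.
sales = [
    ("beach ball", "Florida", "July",      1200),
    ("beach ball", "Florida", "September",  800),
    ("umbrella",   "Florida", "July",       300),
    ("beach ball", "Ohio",    "July",       150),
]

def slice_total(facts, product=None, region=None, month=None):
    """Sum revenue at the intersection of the requested dimension values;
    a dimension left as None is aggregated over (rolled up)."""
    return sum(amount for p, r, m, amount in facts
               if (product is None or p == product)
               and (region is None or r == region)
               and (month is None or m == month))

july_fl = slice_total(sales, product="beach ball", region="Florida", month="July")
sept_fl = slice_total(sales, product="beach ball", region="Florida", month="September")
```

Real OLAP engines precompute and index these intersections so that such queries stay fast over millions of facts, but the dimensional logic is the same.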
Integration Opportunities and Problems
To avoid every adapter having to convert data to and from every other application's format, enterprise application integration (EAI) systems usually stipulate an application-independent (or common) data format. The EAI system usually provides a data transformation service as well to help convert between application-specific and common formats. This is done in two steps: first, the adapter converts information from the application's format to the common format; then, semantic transformations may be applied to the data (converting zip codes to city names, splitting/merging objects from one application into objects needed in another application, etc.).
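The two-step pattern can be sketched as follows; the field names and the zip-code lookup table are invented for illustration:

```python
# A tiny stand-in for a semantic reference table.
ZIP_TO_CITY = {"33101": "Miami", "10001": "New York"}

def adapt_crm_record(crm):
    """Step 1: adapter maps the application-specific format to the common format."""
    return {"name": crm["cust_nm"], "zip": crm["cust_zip"]}

def enrich(common):
    """Step 2: semantic transformation (here, zip code -> city name)."""
    return {**common, "city": ZIP_TO_CITY.get(common["zip"], "unknown")}

record = enrich(adapt_crm_record({"cust_nm": "Acme Corp", "cust_zip": "33101"}))
```

With N applications, each needs only one adapter to and from the common format, instead of N - 1 pairwise converters.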
Semantics gives meaning to something. Data with meaning is information; data without meaning is nothing but a collection of bytes or characters. The semantics, or meaning, of data needs to be resolved across systems; this is known as data consistency. Different systems may represent the data with different labels and formats relevant to their respective uses, but the data often requires some sort of correlation in order to be useful to other systems and users. Beginning with the metadata, a model of an organization's information can be constructed. This model may contain information on the relationships and rules that represent the semantics of the data and its interactions with other data, processes, and systems.
Metadata is essentially data about data: information about documents, music files, photos and other forms of data, used to find information faster. Metadata is a key factor in the development of the Semantic Web, as it will allow data to be related and easily integrated. Metadata can be generated automatically, though human intervention allows for more precision. Characteristics of an effective metadata repository include:
Integrated - The biggest challenge in building a data warehouse is integrating all of the disparate sources of data and transforming the data into meaningful information. The same is true for a metadata repository. A metadata repository typically needs to be able to integrate a variety of types and sources of metadata and turn the results into meaningful, accessible business and technical metadata.
Scalable - A metadata repository that is not built to expand substantially over time will soon become obsolete. The growth of decision support systems and the increased use of knowledge management systems are driving the current proliferation of metadata repositories.
Robust - A metadata repository must have sufficient functionality and performance to meet the needs of the organization that it serves. The repository's architecture must be able to support both business and technical user reports and views of the metadata.
Customizable - Metadata should be able to be customized into any type of relationship that the user wishes to create. This allows for users to establish their own relationships in order to easily integrate different types of media.
Open - The technology used for the metadata integration and access processes must be open and flexible. For example, the database used to store the metadata is generally relational and the metadata architecture should be sufficiently flexible to allow a company to switch from one relational database to another without massive architectural changes.
Metadata and XML
XML plays an important role in the description and processing of metadata and several XML based standards have been developed that relate to metadata.
Normalized Metadata Format (NMF) is an open specification that describes an XML Schema based representation for metadata. NMF provides a simple, flexible way to define and interchange metadata using mainstream XML tools and technologies and provides a straight-forward mechanical mapping to relational databases. Meta Object Facility (MOF) is an Object Management Group (OMG) standard for distributed repositories and metadata management. The XML Metadata Interchange (XMI) specification is an OMG standard for exchanging metadata information via XML.
There are other emerging XML standards for representing metadata. One of the most recent is the Web Ontology Language (OWL). OWL was designed to provide a common way to process the content of web information (instead of displaying it). OWL was designed to be read by computer applications (instead of humans). OWL is built upon the Resource Description Framework (RDF), which is a framework for describing resources on the web. RDF provides a model for data, and a syntax so that independent parties can exchange and use it and is written in XML.
The Common Warehouse Metamodel (CWM) specifies interfaces that can be used to enable the interchange of warehouse and business intelligence metadata between warehouse tools, warehouse platforms and warehouse metadata repositories in distributed heterogeneous environments. CWM is based on three standards:
- UML - Unified Modeling Language, an OMG modeling standard
- MOF - Meta Object Facility, an OMG metamodeling and metadata repository standard
- XMI - XML Metadata Interchange, an OMG metadata interchange standard
The Applications Portfolio
An Applications Portfolio (AP) is an inventory of software, consisting of bundled or standalone pieces. These pieces can be as simple as a web browser or as complicated as a multiple-system database used to process billions of financial transactions. Both are important in their own right, but each has its own needs, uses, and issues that must be addressed and tracked. An applications portfolio management strategy can be developed to determine what continued investments, if any, should be made in the applications in the portfolio. Consequently, the main goal of having an AP is to be able to assess the applications in use in any given organization using a variety of criteria, including:
Compliance with government regulations - With new regulations enacted each year (most notably, the Sarbanes-Oxley Act and the Health Insurance Portability and Accountability Act), it is increasingly important for software systems to comply with these federal standards. Maintaining a complete and in-depth applications portfolio helps organizations with their compliance issues.
Costs to maintain and operate - An applications portfolio can be used to analyze and compare the operation and maintenance cost of the applications in the portfolio. The goal of an AP is to analyze these costs and if the costs to maintain and operate an application exceed its inherent value to the organization, a decision about the future of the application needs to be made. An AP can also help to identify areas of over and under investment and assist in the reallocation of funds to provide the most benefits or greatest value.
Ability to meet current and future business requirements - The ability of an application to meet current and future business requirements is crucial. The application must fully support the needs of the business and be aligned with the business processes it supports. A piece of software may need to be retired if it supports only a subset of needed current or future business processes.
Operational performance and technical status - An AP helps to ensure that applications maintain a high operational performance. A system that has frequent downtime due to system incompatibility, file corruption, fatal errors and general application faults may not be worth keeping in production. This is especially true if the effort keeping the system working exceeds the time of practical system use. The AP can help administer this process and aids in decision-making regarding system modification or replacement.
Risks - An AP can aid in weighing the risks associated with each application and system in an organization. There are inherent risks anytime a piece of software is implemented, including system failure, user rejection, incompatibility, and many others. In a well-developed applications portfolio, each application has a pre-determined set of risks that may be weighted by severity and/or likelihood. This information allows for easier assessment of potential application issues. It is always important to know how critical each application is to an organization. This is especially important in disaster recovery, where decisions need to be made regarding which systems must be brought online first. Once the business-criticality of each application has been assessed, the risk-urgency (the severity of the consequences of a risk to the system) can be determined.
The AP ultimately is a major tool for the development of approaches, priorities, and timeframes for enhancement, renovation, consolidation, elimination, or replacement of the applications in the portfolio. According to Gartner, 40% of large enterprises will execute application portfolio management in the next two years. The rapid growth in the use of APs is driven by the successes other organizations have achieved in reducing costs, managing the complexities of hundreds of established applications, and improving budgeting effectiveness. Applications portfolio management is critical to understanding and managing the 40 percent to 80 percent of IT budgets devoted to maintaining and enhancing software. Most organizations don't track established applications over time to ascertain return on investment (or to determine which should be disposed of), and few manage application portfolios with tools. In other words, these organizations haven't truly associated the substantial amount of money they're spending with what they are spending it on.
Transition from Legacy Systems
Legacy systems are alive and well in many organizations today. In many of these cases, the legacy system may be in need of updating or replacement. There are many challenges and issues to consider when considering the replacement of a legacy system, some of which include:
Management rarely approves a major expenditure if the only result is lower maintenance costs, instead of additional business functionality
This is where a business case must be made to show the need to purchase or build a new system. Simply saying that the legacy system does not work well will not suffice; the case needs some additional business benefit to push it along. Upgrading something that essentially still does its job can be a sticking point with the decision makers of any organization.
Development of complex systems takes years, so unanticipated business processes will have to be added to keep pace with the changing business climate, which increases the risk of failure
Many legacy systems remain in use simply because they are quite large and ingrained in the daily business routine, and upgrading can, and most likely will, be a major headache: not only because the system must be redone, but because it can take a long time from development to production (or, in the case of an off-the-shelf solution, from the request for proposal to production). The longer it takes to implement the system, the larger the window for failure. Keeping a multi-year project relevant at the time of release requires close supervision and excellent management skills.
Documentation for the old system is frequently inadequate
This is especially frustrating if an organization is upgrading a system rather than replacing it. In some cases, legacy systems were written decades ago by people who can no longer be contacted for information, so dissecting the system and figuring out how it works can be tedious. This is an especially large problem if the application interfaces with other applications; discovering how they communicate can be very difficult and time consuming.
Like most large projects, the development process will take longer and cost more than planned, testing management's patience
This happens quite frequently, as scope creep is hard to avoid, particularly on large projects. It needs to be accounted for at the beginning of the project to lessen the surprise when it does occur. With any homegrown or off-the-shelf system, unexpected problems occur and are difficult, if not impossible, to completely prevent.
Transition from Monolithic Systems to Service-Oriented Systems
Many organizations rely on older, monolithic systems that run their major enterprise processes. With these traditional systems, the processing, the data and the user interface all reside within the same system. This traditional systems architecture has several shortcomings, including:
- Functionality is often not well modularized. This leads to complexity that makes system comprehension difficult.
- The resource requirements can be very high, requiring very expensive machines.
- Initial development and upgrading can be time consuming and difficult.
- Because of the lack of modularity, it is very difficult to make use of previously existing functionality. In other words it is difficult to reuse code so the same problem may be solved many times over.
- Reliability may be a problem. A crash of one part may bring down the whole system.
For many organizations, the solution to these problems is to use a Service Oriented Architecture (SOA). SOA is defined by the OASIS consortium as:
"A paradigm for organizing and utilizing distributed capabilities that may be under the control of different ownership domains. It provides a uniform means to offer, discover, interact with and use capabilities to produce desired effects consistent with measurable preconditions and expectations."
SOA is a way to expose a monolithic application as a set of reusable services, allowing new business processes to be created by assembling and combining those separate services. Additionally, because SOA is based on open standards, it is a valuable approach for enabling interoperability with business partners, something old monolithic systems more often than not did not easily allow. Since SOA is not tied to any development language or operating system, it is ideal for updating older systems that need to communicate easily with other systems. Most organizations that transition to SOA from an older legacy environment do so for three main reasons:
- Flexibility: SOA can improve business responsiveness with a low IT impact and cost.
- Reuse: SOA is based on clearly identified reusable components. It must leverage the existing legacy application, creating new value from it.
- Interoperability: SOA enables communication with other systems within the organization or from business partners.
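As a rough sketch of the reuse idea, a thin service facade can wrap existing legacy logic behind a stable interface that hides the monolith's internals. All names here are invented for illustration:

```python
def legacy_price_lookup(sku):
    """Stand-in for logic buried deep inside a monolithic application."""
    return {"A-1": 19.99}.get(sku)

class PricingService:
    """Service contract: callers see get_price, never the legacy internals.
    The legacy implementation can later be replaced without affecting callers."""
    def get_price(self, sku):
        price = legacy_price_lookup(sku)
        if price is None:
            raise KeyError(f"unknown sku {sku}")
        return price

service = PricingService()
quote = service.get_price("A-1")
```

In a full SOA deployment this contract would be published (for example as a WSDL or REST interface) so that partners on other platforms could invoke it, but the essential move, separating the interface from the legacy implementation, is the same.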
SOA can enable certain capabilities unrealized by monolithic systems. These include:
- Design services to mirror the real-world business activities that comprise the enterprise business processes. SOA enables IT to be aligned with strategic business objectives.
- More easily evolve the initial architecture of a system to include new business function or future technologies. SOA enables technical flexibility and, thus, business responsiveness.
- After existing applications are exposed as services, you can orchestrate them with other services to create complete business processes. SOA stimulates the reuse of existing assets.
- Disparate systems, implemented with a variety of technologies, can more easily interact. SOA enables interoperability between applications.
Integration Opportunities, Problems, and Future Trends
In the past, different enterprise systems had distinct, proprietary standards for data exchange. For example, in the 1960s, doing business with GM, Sears and K-Mart required three different system interfaces, since no standard for information exchange existed at that time. This problem gave rise to modern electronic data interchange (EDI) standards.
EDI is the computer-to-computer exchange of business data in standard formats. In EDI, information is organized according to a specified format set by both parties, allowing a “hands off” computer transaction that requires no human intervention or rekeying on either end. The information contained in an EDI transaction set is, for the most part, the same as on a conventionally printed document. Generally speaking, EDI is considered to be a technical representation of a business conversation between two entities, either internal or external. EDI is considered to describe the rigorously standardized format of electronic documents. The EDI standards were designed to be independent of communication and software technologies.
There are four major sets of EDI standards:
- The UN-recommended UN/EDIFACT is the only international standard and is predominant outside of North America.
- The US standard ANSI ASC X12 (X12) is predominant in North America.
- The TRADACOMS standard developed by the ANA (Article Numbering Association) is predominant in the UK retail industry.
- The ODETTE standard is used within the European automotive industry.
All of these standards date back to the early 1980s. Today, many of them are yielding to the rapid adoption of ebXML.
Initiated in 1999, ebXML (electronic business XML) began as an effort by the United Nations Centre for Trade Facilitation and Electronic Business (CEFACT) and the Organization for the Advancement of Structured Information Standards (OASIS). It sought to enable the exchange of electronic business data on a global scale, using XML to provide an interoperable, secure, and consistent mechanism for global data exchange. Use of this technical framework would enable businesses of all sizes, anywhere in the world, to participate in electronic business.
Application messaging architectures provide a platform that supports interoperability among loosely coupled applications over a message bus. When messages must travel between multiple applications, or even multiple enterprises, a full application messaging architecture may be required, because applications spread across an organization (or beyond it) are unlikely to conform to one another. The participating applications may also operate asynchronously and in a disconnected fashion, working without a direct communication session while still requiring guaranteed delivery of certain events. These differences among applications imply the need for an integration mechanism. Application messaging satisfies this need, providing open, close, send, and receive operations on application-defined message structures. The disconnected execution and notification modes of heterogeneous applications introduce requirements for service qualities that include routing, assured once-only delivery, and retained sequencing. The architecture that has emerged to meet these requirements is called store-and-forward queuing. In a store-and-forward model, messages are sent to queues, which are hosted at specific destination network addresses. Messages travel from their origin through a network that ensures their delivery to the destination queue and their presentation to the addressee.
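The store-and-forward model can be sketched with Python's standard queue module; the queue name and messages are invented for illustration:

```python
import queue

# Named destination queues: producers and consumers share only these names,
# never a direct communication session with each other.
queues = {"inventory.updates": queue.Queue()}

def send(destination, message):
    queues[destination].put(message)       # message is stored...

def receive(destination):
    return queues[destination].get()       # ...and forwarded when asked for

send("inventory.updates", {"sku": "A-1", "delta": -1})
send("inventory.updates", {"sku": "B-7", "delta": -2})
msg = receive("inventory.updates")         # retained sequencing: FIFO order
```

A production message broker adds persistence, routing, and assured once-only delivery on top of this basic pattern, but the decoupling in time between sender and receiver is already visible here: the consumer can drain the queue long after the producer has gone away.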
Remote Procedure Calls
A Remote Procedure Call (RPC) is a protocol that one program can use to request a service from a program located on another computer in a network, without having to understand network details. RPCs traditionally utilize the client/server model: the requesting program is a client and the service-providing program is the server. The main issue with RPC is that it is a synchronous operation, requiring the requesting program to be suspended until the results of the remote procedure are returned. XML-RPC utilizes XML to exchange data. Given the widespread support for XML, XML-RPC offers a viable invocation model over the Internet using HTTP.
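Python's standard xmlrpc.client module illustrates the marshalling involved: a method call is encoded as an XML document that can travel over HTTP and be decoded on the other side. The method name below is invented for illustration:

```python
import xmlrpc.client

# Marshal a call to a hypothetical "inventory.decrement" method into XML.
request_xml = xmlrpc.client.dumps(("ESP-100", 1), methodname="inventory.decrement")

# The receiving side unmarshals the same XML back into parameters and a name.
params, method = xmlrpc.client.loads(request_xml)
```

In a real deployment, `request_xml` would be POSTed to an HTTP endpoint and the caller would block until the response document came back, which is exactly the synchronous behavior noted above.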
Programming Language Mismatches
A large issue with integrating systems is programming language mismatches. There are hundreds of languages in use, and each application in an organization might well be written in a different one, so it can be tremendously difficult to get two applications written in completely different languages to communicate. Though there are several methods for communication between applications written in different languages, Web services are emerging as the standard of choice for most organizations. Web services offer open, XML-based standards for inter-application data exchange.
Coupling refers to the interrelatedness and interdependencies between two or more components. A form of coupling is a dependency on a specific commonly understood interface between two applications. Loose coupling describes an approach where integration interfaces are developed with minimal assumptions between the sending/receiving parties, thus reducing the risk that a change in one application/module will force a change in another application/module.
Loose coupling has multiple dimensions. Integration between two applications may be loosely coupled in time using message-oriented middleware, meaning the availability of one system does not affect the other. Alternatively, integration may be loosely coupled in format using middleware to perform data transformation, meaning differences in data models do not prevent integration. In Web services and Service Oriented Architecture, loose coupling implies that the implementation is hidden from the caller. Loosely coupled services, even if they use incompatible system technologies, may be joined to create composite services, or disassembled just as easily into their functional components.
Event-driven programming is a programming paradigm in which the flow of the program is determined by events: sensor outputs, user actions (mouse clicks, key presses) or messages from other programs or threads. Event-driven programming can also be defined as an application architecture technique in which the application has a main loop clearly divided into two sections: the first is event selection (or event detection), and the second is event handling. In embedded systems, the same effect may be achieved using interrupts instead of a constantly running main loop; in that case the former portion of the architecture resides completely in hardware.
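The two-section structure (event selection, then event handling) can be sketched as a main loop over a queue; the event types and handlers are invented for illustration:

```python
import queue

events = queue.Queue()
log = []

# Event handling: dispatch table mapping event types to handler functions.
handlers = {
    "click":     lambda e: log.append(f"clicked at {e['pos']}"),
    "key_press": lambda e: log.append(f"key {e['key']}"),
}

def run(event_queue):
    while True:
        event = event_queue.get()          # event selection
        if event["type"] == "quit":
            break
        handlers[event["type"]](event)     # event handling

events.put({"type": "click", "pos": (10, 20)})
events.put({"type": "key_press", "key": "q"})
events.put({"type": "quit"})
run(events)
```

A GUI toolkit or embedded interrupt controller replaces the explicit `get()` with its own selection mechanism, but the division of labor is the same.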
Event-Driven Architecture (EDA) is centered on a broader concept of events. An event is anything that happens or is thought to happen. Examples of events vary widely and include bank transactions, stock trades, customer orders, address changes, shipment deliveries, birthdays, moon landings, the Great Depression and anything that occurs in a simulation or dream. Events may be instantaneous, or they may happen over time.
Data about an event can be recorded in an event object (an event object may be called just an "event," which overloads the term and sometimes leads to confusion). Event objects exist in many forms, such as an XML document containing a customer order, an e-mail confirmation of an airline reservation, a database row containing a new customer address, data from a radio frequency identification (RFID) sensor reading or a financial data feed message that reports an equity trade.
An event object generally includes some attributes of the event, one or more time stamps and sometimes an identifier of the event source and related context data. If the event is instantaneous, then there may be one time stamp noting when it happened; otherwise, there may be separate time stamps for the beginning and end of the event. There may be yet another time stamp for when the event object was created or sent. When an event object is transmitted in the form of a message, the combination is called an "event notification" or "notification." EDA is defined as an architectural style in which one or more components of a system execute in response to receiving one or more event notifications. The flow of work through the system is determined, in part, by transmitting notifications. According to Gartner, EDA applications must implement these three principles:
- Notifications are pushed by the event source, not pulled by the event consumer. In other words, the event source determines when the message that contains the event object is sent.
- The arrival of a notification causes the event consumer to act immediately. However, in some cases, the action is merely to save the event object for subsequent processing. The consumer is waiting for a notification and is driven to do something by its arrival.
- A notification does not specify what action the event consumer will perform; it is a report, not a request. The consumer contains the logic that determines how it will respond. In this respect, notifications are fundamentally different from procedure calls or method invocations, which explicitly specify the particular functions that the message recipient will perform.
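A minimal sketch of these three principles, with invented names: the source pushes notifications, the consumer reacts on arrival, and the notification is a report rather than a request (the consumer alone decides what to do with it).

```python
import datetime

subscribers = []
audit = []

def publish(event_type, data):
    """The event source pushes; it neither knows nor specifies what consumers do."""
    notification = {"type": event_type, "data": data,
                    "timestamp": datetime.datetime.now().isoformat()}
    for consumer in subscribers:
        consumer(notification)             # push, not pull

def audit_consumer(notification):
    """Consumer-side logic decides the action: here, just record the event type."""
    audit.append(notification["type"])

subscribers.append(audit_consumer)
publish("address_change", {"customer": 1, "zip": "33101"})
```

Contrast this with a procedure call, where the caller names the exact function to run: here `publish` has no idea that an audit trail exists.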
Using SOA With EDA
As a general principle, architectural styles often are composable. In other words, multiple architectural styles can apply simultaneously to different aspects of one system. For example, an application may use relational data architecture, object-oriented architecture and model-driven architecture all at once, because these styles are not mutually exclusive. This principle applies because SOA and EDA are complementary and composable. Most new applications that use EDA with business events should be implemented as SOA, simply because SOA is helpful in most distributed software systems. The key qualifier is the term "business." A business event is an event that is relevant to the business: a meaningful change in something in the company or related to its activities. Examples of business events include bank transactions, stock trades, customer orders, address changes, shipment deliveries and hiring employees. Business events typically are of the right level of granularity to be implemented as SOA service interfaces.
Objects, Components, and Services
An object is an instance (or instantiation) of a class. A class is a cohesive package consisting of a particular kind of metadata: it describes the rules by which objects behave, and those objects are referred to as instances of that class. This blueprint includes the attributes and methods that the created objects all share. An object contains a combination of data and the instructions that operate on that data, making it capable of receiving messages, processing data, and sending messages to other objects.
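A brief Python sketch of the class/object distinction; the Account class is invented for illustration:

```python
class Account:
    """The class is the blueprint: attributes and methods shared by all instances."""
    def __init__(self, owner, balance=0):
        self.owner = owner          # attribute: per-instance data
        self.balance = balance

    def deposit(self, amount):      # method: behavior shared by all instances
        self.balance += amount
        return self.balance

# Each object is a distinct instance of the class, with its own state.
a = Account("Alice")
b = Account("Bob", 100)
a.deposit(50)
```

Calling `a.deposit(50)` is a message sent to the object `a`, which processes its own data without affecting `b`.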
A component is the result of decomposing an engineered system into functional or logical units with well-defined interfaces used for communication across components. A software component is a system element offering a predefined service or event, able to communicate with other components. Components are considered a higher level of abstraction than objects; as such, they do not share state and communicate by exchanging messages carrying data.
The term service refers to a discretely defined set of contiguous and autonomous business or technical functionality. A Web service provides one way of implementing the automated aspects of a given business or technical service. A Web service is a software system designed to support interoperable machine-to-machine interaction over a network.
The differences among these three entities are primarily driven by two factors: location and environment. Location refers to the relative locations of the entity (e.g., object) and client, or, more specifically, the relative location of the processes in which the entity and the client live. Environment refers to the hosting runtime environment for the entity and the client (for example, IBM's WebSphere or Microsoft's .NET). Objects, components, and Web services have many other differences (some quite unexpected), but they are mainly derivatives of location and environment. When both the entity and the client are located in the same process, the relationship is characterized as an object relationship. The environments for the entity and the client must be the same, because a single process can't live in more than one environment.
When the entity and the clients are located in different processes, then the environment becomes the defining characteristic. When the environment is the same for the client and the entity, the relationship is characterized as a component relationship. You can implement a component relationship in, for example, WebSphere's EJB (Enterprise Java Beans) environment or Microsoft's .NET managed components environment, two popular component-supporting technologies.
When the environment is different for the client and the entity, the relationship is characterized as a Web service relationship. Web service technologies include SOAP (Simple Object Access Protocol), WSDL (Web Services Description Language), UDDI (Universal Description, Discovery, and Integration), and others. Figure X contrasts objects, components, and Web services based on differences in location and environment.
When to Use an Object, Component, or Service
Clearly you want the fastest communications that will meet your constraints. Let's consider building a point-of-sale (POS) system. Your POS system may want to interact with an inventory system, say, to let it know that someone has just purchased an espresso machine and that now would therefore be a good time to decrement the espresso machine inventory.
There are many inventory systems in the world. When building the POS system, you may not know with which of these inventory systems you will eventually be interacting. Similarly, the inventory system, when it was built, probably had no idea which POS system would be the one letting it know about espresso machine sales.
Since the POS and inventory systems were developed independently, they may well have been written for different environments. The POS might be, for example, a WebSphere system and the inventory system a .NET system. Although it is possible the POS and the inventory systems were developed for the same environment (say, both for WebSphere), you don't want to take any chances. You should use environment-agnostic Web service protocols to connect the two.
Now let's look more closely at the POS itself rather than how it connects with other systems. The POS is built by a single group of coordinated developers. They are not likely to build part of the POS in WebSphere and part of it in .NET. They are going to choose one environment and stick with it.
This does not mean, however, that they are not using distributed programming. They probably do have different parts of the POS running on different machines and processes, perhaps because they are using a three-tier architecture. Each sales station is probably running its own machine/process; there may be another part of the POS that runs on a machine that consolidates sales across the store, another part that consolidates sales across the region, and yet another part that stores data in a central data repository. All of these require distribution; they just don't require different environments. Therefore, the most efficient technology available is the component technology.
Within a given process of the POS, perhaps hundreds of thousands of lines of code are bouncing back and forth. Objects are a great way to organize this code. Everything is happening within a single process, so you don't need the overhead of components and certainly don't need the overhead of Web services.
For the POS architect, it is not a choice among objects, components, and Web services. It is a matter of choosing which to use for what kinds of communications. Good system architectures therefore don't look at objects, components, and Web services as mutually exclusive choices. Instead, they look at them as building blocks—all useful, but for different purposes. Web services are useful for tying together autonomous systems; components for coordinating the process distribution within a system; objects for organizing the code within a process.
Integrated Security Architecture
All security involves trade-offs, and most security trade-offs are subjective, meaning they are open to debate. Often these trade-offs pit the need to ensure that users can get to everything they require against the need to restrict access. To understand the implications of these trade-off decisions, the questions that must be answered typically include:
- What assets are we trying to protect?
- What are the risks to those assets?
- How well does the security solution mitigate those risks?
- What other risks does the security solution cause?
- What trade-offs does the security solution require?
Essentially, the trade-off is between security and usability. The most secure system is one that is disconnected and locked in a safe. This has implications for all technologies: we can make any technology more secure, but by doing so we will probably make it less usable. So how do we make it both more secure and more usable? This is where the third axis of the trade-off comes into play. Any good engineer is familiar with the principle of "good, fast, and cheap": you get to pick any two.
This fundamental trade-off between security, usability, and cost is extremely important to recognize. Yes, it is possible to have both security and usability, but there is a cost in money, in time, and in personnel. It is possible to make something both cost-efficient and usable, and making something secure and cost-efficient is not very hard. However, making something both secure and usable takes a lot of effort and thinking. Security takes planning, and it takes resources. Security administrators face some interesting trade-offs: fundamentally, the choice is between a system that is secure and usable, one that is secure and cheap, or one that is cheap and usable. We cannot have everything. A best practice is not to make the same person responsible for both security and system administration; the goals of those two tasks are too often in conflict to combine both functions into one position.
TCP/IP and Internetworking
Internetworking and Extranets
Internetworking involves connecting two or more distinct computer networks or network segments to form an internetwork (often shortened to internet), using devices such as routers and switches to pass data across the networks. These devices function at layer 3 of the OSI Basic Reference Model, the network layer (see following figure). Within the network layer, data is passed in the form of packets and processed/routed by layer 3 protocols, depending on the suite utilized. IPv4, IPv6, ARP, ICMP, RIP, OSPF, BGP, IGMP, and IS-IS are some examples of protocols that could be used for this processing.
An intranet is a private computer network that uses Internet protocols and network connectivity to securely share part of an organization's information or operations with its employees. Sometimes the term refers only to the most visible service, the internal website. Intranets are similar to internetworks in almost all respects, except that they are kept private. An extranet can be viewed as the part of a company's intranet that is extended to users outside the company, normally over the Internet. Extranets are used to securely share part of an organization's information or operations with suppliers, vendors, partners, customers, or other businesses.
In relation to computing infrastructure, a protocol is the set of standards that governs transmissions between two endpoints. Protocols are developed to define communications, and each protocol has its own unique set of rules. Interpretation of a protocol can occur at the hardware level, the software level, or a combination of the two. When designing software or hardware, engineers must follow the defined protocol if they intend to successfully connect to other networked devices and programs. Properties that protocols define include, but are not limited to:
- Message format
- Error handling
- Termination procedures
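To make these three properties concrete, here is a sketch of a toy length-prefixed protocol, entirely invented for illustration: the message format is a 4-byte big-endian length header followed by the payload, error handling rejects malformed frames, and a zero-length frame serves as the termination signal.

```python
# Toy protocol illustrating message format, error handling, and termination.
import struct

def encode(payload: bytes) -> bytes:
    """Message format: 4-byte big-endian length header, then the payload."""
    return struct.pack(">I", len(payload)) + payload

TERMINATOR = encode(b"")  # termination procedure: a zero-length frame ends the session

def decode(frame: bytes) -> bytes:
    """Error handling: reject frames that are truncated or the wrong size."""
    if len(frame) < 4:
        raise ValueError("frame too short to contain a length header")
    (length,) = struct.unpack(">I", frame[:4])
    if len(frame) - 4 != length:
        raise ValueError("truncated or oversized frame")
    return frame[4:]
```

Real protocols such as HTTP or SMTP define the same categories of rules, just with far richer formats and recovery procedures.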
Several protocols are defined for each layer of the 7 Layer OSI Model. At the lower layers, the protocols govern hardware devices; at the higher layers, protocols are defined for applications. In enterprise application integration, we are typically more concerned with the higher-level protocols defined in the application layer. Common protocols in the application layer include DHCP, DNS, FTP, HTTP, IMAP, LDAP, SIP, SMTP, and SOAP.
TCP/IP is the set of communications protocols used for the Internet and other similar networks. It is named after two of its most important protocols, the Transmission Control Protocol (TCP) and the Internet Protocol (IP), which were the first two networking protocols defined in this standard. TCP/IP spans several layers of the 7 Layer OSI Model to create a link between two or more networked devices. TCP/IP was created in the 1970s by DARPA to lay the foundation for a wide area network that later became known as the Internet.
The original TCP/IP network model was developed for the DoD and consisted of three layers, but the model now generally used is based on four layers: the link layer, the network layer, the transport layer, and the application layer.
The three top layers in the OSI model (the application, presentation, and session layers) are not distinguished separately in the TCP/IP model, where they are collapsed into a single application layer. Within each of its layers the TCP/IP model executes various procedures to accomplish the overall communication link. The following table compares the OSI and TCP/IP network models.
Model Architecture Comparison

TCP/IP layer                   Corresponding OSI layer(s)                 Example protocols
4. Application                 5-7 (Session, Presentation, Application)   Telnet, FTP, SMTP, DNS, RIP, SNMP
3. Transport                   4 (Transport)                              TCP, UDP
2. Network (Internet)          3 (Network)                                IP, IGMP, ICMP, ARP
1. Link (Network Interface)    1-2 (Physical, Data Link)                  Ethernet, Token Ring, Frame Relay, ATM
Routing is the process of selecting paths in a network along which to send network traffic. In packet switching networks such as the Internet, routing directs the transit of logically addressed packets from their source toward their ultimate destination through hardware devices such as routers, bridges, gateways, firewalls, and switches. The routing process usually directs forwarding on the basis of routing tables. A routing table is a set of rules, often viewed in table format that is used to determine where data packets will be directed. A routing table contains the information necessary to forward a packet along the best path toward its destination. Each packet contains information about its origin and destination. When a packet is received, a network device examines the packet and matches it to the routing table entry providing the best match for its destination. The table then provides the device with instructions for sending the packet to the next hop on its route across the network.
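The table lookup described above can be sketched in a few lines with Python's ipaddress module. The routes and interface names are illustrative; real routers use highly optimized data structures, but the selection rule, longest-prefix match, is the same "best match" described here.

```python
# Toy routing table using longest-prefix matching.
# Prefixes are illustrative RFC 1918 private ranges; interface names are invented.
import ipaddress

routes = {
    ipaddress.ip_network("10.0.0.0/8"): "eth0",
    ipaddress.ip_network("10.1.0.0/16"): "eth1",      # more specific than 10.0.0.0/8
    ipaddress.ip_network("0.0.0.0/0"): "gateway",     # default route
}

def next_hop(destination: str) -> str:
    """Return the interface for the most specific route covering the destination."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in routes if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
    return routes[best]
```

For example, a packet to 10.1.2.3 matches both 10.0.0.0/8 and 10.1.0.0/16, and the /16 wins because it is the more specific rule.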
Integration Problems and Opportunities
Implementing a functional internetwork is no simple task. Many challenges must be faced, especially in the areas of connectivity, reliability, network management, and flexibility. Each area is key in establishing an efficient and effective internetwork. The challenge when connecting various systems is to support communication among disparate technologies. Different sites, for example, may use different types of media operating at varying speeds or may have different types of systems that need to communicate.
Because companies rely heavily on data communication, internetworks must provide a certain level of reliability. This is an unpredictable world, so many large internetworks include redundancy to allow for communication even when problems occur. Furthermore, network management must provide centralized support and troubleshooting capabilities in an internetwork. Configuration, security, performance, and other issues must be adequately addressed for the internetwork to function smoothly. Because nothing in this world is stagnant, internetworks must be flexible enough to change with new demands.
Security is the condition of being protected against danger or loss. In the general sense, security is a concept similar to safety. The nuance between the two is an added emphasis on being protected from dangers that originate from outside. Individuals or actions that encroach upon the condition of protection are responsible for the breach of security. In terms of application integration, there are five main areas of security for consideration:
Computing security - is a branch of computer science that addresses enforcement of 'secure' behavior on the operation of computers. The definition of 'secure' varies by application, and is typically defined implicitly or explicitly by a security policy that addresses confidentiality, integrity and availability of electronic information that is processed by or stored on computer systems. Some good computing practices include keeping unauthorized users from using your computer, changing passwords regularly, and keeping business email separate from personal accounts.
Data security - is the means of ensuring that data is kept safe from corruption and that access to it is suitably controlled. Data security is enforced with profiles that restrict users from changing data. The subject of data security is very broad, but it generally covers the actions taken to ensure that the quality of data remains high, with little or no error.
Application security - encompasses measures taken to prevent exceptions to the security policy of an application or the underlying system (vulnerabilities) through flaws in the design, development, or deployment of the application. Application security is generally addressed in the design of the application and is used to prevent loopholes from appearing within the system. For example, if an error in a program dumps someone's bank account number into a stack trace, that is undesirable and could be a serious security concern for the organization responsible.
Information security - is the means of protecting information and information systems from unauthorized access, use, disclosure, disruption, modification, or destruction. Information security is closely related to computing security but focuses on the information itself rather than on the choices of the end user. Good practice involves enforcing strict business rules to prevent the alteration of data when the action is not intended.
Network security - consists of the provisions made in the underlying computer network infrastructure, the policies adopted by the network administrator to protect the network and its network-accessible resources from unauthorized access, and the effectiveness (or lack thereof) of these measures taken together.
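The stack-trace scenario mentioned under application security can be sketched as follows. The charge_account function and its failure mode are hypothetical; the point is that failure details stay in the server-side log while the client sees only a generic message, so sensitive values never leak.

```python
# Sketch: never expose internal error details (e.g., account numbers) to clients.
# charge_account and its failure are hypothetical stand-ins.
import logging

logger = logging.getLogger("payments")

def charge_account(account_number: str, amount: float) -> None:
    # Simulated internal failure whose message contains sensitive data.
    raise RuntimeError(f"ledger write failed for account {account_number}")

def handle_charge(account_number: str, amount: float) -> str:
    """Client-facing handler: log full detail internally, return a sanitized reply."""
    try:
        charge_account(account_number, amount)
        return "OK"
    except Exception:
        logger.exception("charge failed")  # full stack trace stays server-side
        return "Transaction failed. Please contact support."  # no sensitive data
```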
Security within an internetwork is essential. Many people think of network security from the perspective of protecting the private network from outside attacks. However, it is just as important to protect the network from internal attacks, especially because most security breaches come from inside. Networks must also be secured so that the internal network cannot be used as a tool to attack other external sites. Organizations employ a variety of strategies to secure data across the internetwork/extranet; some of the more popular include:
Isolation - the goal of this strategy is to isolate systems with differing levels of public access from each other. Firewalls offer isolation of network devices from outside or unauthorized users and can be configured to limit access to certain networks or ports. Another form of isolation depends on architectural design: by creating sub-networks or independent entities, network engineers can isolate all or any part of a network that should not be accessible to the general user. By designing high-risk systems to communicate only with a small independent network, engineers can limit the chance of unauthorized access or cascading failures.
Strong authentication - where possible, systems should implement some form of two-factor authentication. Two-factor authentication is the practice of utilizing two distinct methods for determining whether or not the user is allowed to access the system. By enforcing a strong form of authentication organizations can reduce the risk of an unauthorized user gaining access to a specific system. Authentication methods are grouped into three categories with respect to human factors. These three categories are as follows:
- Something the user has (ID card, security token, or cell phone)
- Something the user knows (Password, pass phrase, or personal identification number (PIN))
- Something the user is or does (e.g., fingerprint or retinal pattern, DNA sequence (there are assorted definitions of what is sufficient), signature or voice recognition, unique bio-electric signals, or another biometric identifier)
Recently, two-factor authentication has itself been called into question, mostly because attackers who want your information can deploy a variety of false interfaces to collect it, such as man-in-the-middle and Trojan attacks. To avoid these occurrences, organizations must take extra measures to make clear to users that they are on the genuine page and not a spoofed one.
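A sketch combining two of the categories above: a password (something the user knows) checked against a stored hash, plus a one-time code (something the user has) computed with the RFC 4226 HOTP algorithm that hardware tokens and phone apps commonly implement. The secrets shown are illustrative only.

```python
# Two-factor sketch: password hash check + RFC 4226 HOTP one-time code.
import hashlib
import hmac
import struct

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over the counter, then dynamic truncation."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = int.from_bytes(digest[offset:offset + 4], "big") & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def two_factor_ok(password: str, token_code: str, *,
                  expected_hash: str, secret: bytes, counter: int) -> bool:
    """Both factors must pass: the known password and the possessed token."""
    knows = hashlib.sha256(password.encode()).hexdigest() == expected_hash
    has = hmac.compare_digest(token_code, hotp(secret, counter))
    return knows and has
```

A real deployment would add salted password hashing and counter/time-window handling, but the two independent checks are the essence of two-factor authentication.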
Granular access controls - Granular access controls are essential to the secure operation of complex systems. If your organization must interact with a number of different suppliers, customers, vendors, and business partners, you need to take steps to enforce the principle of least privilege. Digital certificates are one of the most common ways for organizations to manage varying levels of security and permissions. Digital certificates are usually deployed in a public key infrastructure, in which each user holds a personal private key whose matching public key is certified and used in the authentication routine. This method allows numerous privilege levels to function securely within the same system. At the organizational level, the security authority certifies the user in a specific domain, after which the user is restricted to the actions associated with the approved profile. VeriSign, a well-known certificate authority, defined classes of certificates that can be applied to the users of a particular system.
The classes are as follows:
- Class 1 for individuals, intended for email
- Class 2 for organizations, for which proof of identity is required
- Class 3 for servers and software signing, for which independent verification and checking of identity and authority is done by the issuing certificate authority (CA)
- Class 4 for online business transactions between companies
- Class 5 for private organizations or governmental security
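Returning to the principle of least privilege, the following is a minimal sketch of granular, role-based permission checks. The role and permission names are invented for illustration; the design point is that each external party receives only the rights it needs, nothing more.

```python
# Least-privilege sketch: each role maps to the minimal set of permissions.
# Role and permission names are illustrative only.
ROLE_PERMISSIONS = {
    "supplier": {"read_purchase_orders"},
    "partner": {"read_purchase_orders", "read_inventory"},
    "internal_admin": {"read_purchase_orders", "read_inventory", "write_inventory"},
}

def is_allowed(role: str, permission: str) -> bool:
    """Deny by default: unknown roles and unlisted permissions get nothing."""
    return permission in ROLE_PERMISSIONS.get(role, set())
```

In a certificate-based system, the role would come from the profile bound to the user's certificate rather than from a hard-coded table.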
Encryption - By nature, extranets involve sharing sensitive organizational data over the Internet. Ensure that extranet clients use virtual private network (VPN) technology that provides strong encryption for data in transit over these unsecured networks, and that both the VPN solution (client and server hardware and software) and the encryption algorithm it uses meet security requirements. The basic task of encryption is to transform information, using a mathematical algorithm, into a form that only the intended parties can reverse. Encryption schemes vary greatly by application, but the common practice is to apply strong encryption to any data passed out of a private network.
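As a purely pedagogical sketch of encryption as a keyed, invertible transform, the following XOR scheme shows the basic idea. It is emphatically NOT secure and stands in for the vetted ciphers (for example, AES) that real VPN solutions use.

```python
# Toy illustration of encryption as an invertible, keyed transform.
# XOR with a repeating key is NOT secure; it only makes the concept concrete.
import secrets

def xor_transform(data: bytes, key: bytes) -> bytes:
    """XOR each byte with the repeating key; applying it twice restores the data."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = secrets.token_bytes(16)                          # shared secret (illustrative)
ciphertext = xor_transform(b"quarterly sales figures", key)
plaintext = xor_transform(ciphertext, key)             # the same transform inverts itself
```

The property worth noticing is symmetry: the same key and transform recover the original, which is exactly what a VPN endpoint pair relies on, albeit with far stronger mathematics.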
Internet Protocol version 6 (IPv6) is a network layer protocol for packet-switched internetworks. It is designated as the successor of IPv4, the current version of the Internet Protocol for general use on the Internet. The main change brought by IPv6 is a much larger address space, which allows greater flexibility in assigning addresses. Routing is more streamlined than with traditional IPv4 transport but requires somewhat more bandwidth, which can be a problem in areas where bandwidth is extremely limited. Some of the major advantages of IPv6 include:
Larger Address Space - IPv6 relieves the threat of address exhaustion, which is fast approaching under the current IPv4 protocol. More address space also means that network administration can become much easier through the elimination of complex subnetting schemes.
More Efficient Routing Infrastructure - IPv6 also offers an auto-configuration feature that allows a device, once connected, to send a request and receive an address.
Better Security - With fewer steps between addressing and browsing, the overall security of the protocol improves. Because all addresses are unique, it also becomes easier to trace attackers, and cyber-crime can be better policed once devices are converted to the IPv6 protocol.
Better Quality of Service (QoS) - The overall quality of service should increase due to simplified routing procedures and more direct routing schemes. The protocol may also reduce bandwidth usage in some scenarios, though this remains to be verified once large-scale adoption is in place.
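The "larger address space" advantage can be quantified with Python's standard library: IPv4 offers 2^32 addresses while IPv6 offers 2^128, a factor of 2^96 (roughly 7.9 * 10^28) more.

```python
# Quantifying the IPv4 vs. IPv6 address space difference.
import ipaddress

ipv4_space = ipaddress.ip_network("0.0.0.0/0").num_addresses  # entire IPv4 space
ipv6_space = ipaddress.ip_network("::/0").num_addresses       # entire IPv6 space
growth = ipv6_space // ipv4_space                              # 2**96 times larger
```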
Event-Driven Architecture (EDA) is a software architecture pattern promoting the production, detection, consumption of, and reaction to events. Event-driven architecture complements service-oriented architecture (SOA) because services can be started by triggers such as events. EDA comprises different event layers and processing styles depending on the event being processed. By implementing this loosely coupled but well-distributed structure, organizations can better analyze and understand the actions that drive the business.
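A minimal sketch of the produce/detect/consume cycle: an in-memory event bus where handlers subscribe to event types and react when events are published. Event names and payloads are illustrative; production EDA typically uses a message broker rather than in-process dispatch, but the loose coupling is the same.

```python
# Minimal in-memory event bus sketching EDA's publish/subscribe cycle.
from collections import defaultdict

class EventBus:
    def __init__(self):
        self._handlers = defaultdict(list)  # event type -> list of handler callables

    def subscribe(self, event_type, handler):
        """Register a consumer for a given event type."""
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        """Producers emit events without knowing who, if anyone, consumes them."""
        for handler in self._handlers[event_type]:
            handler(payload)

bus = EventBus()
audit_log = []
bus.subscribe("order.placed", audit_log.append)   # consumer reacts to the event
bus.publish("order.placed", {"order_id": 42})     # producer emits the event
```

Note that the publisher never references the audit log directly; adding or removing consumers requires no change to the producer, which is the loose coupling the text describes.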
SaaS / Web 2.0
Software as a service (SaaS) is a software application delivery model in which a software vendor develops a web-native software application and hosts and operates it (either independently or through a third party) for use by its customers over the Internet. Web 2.0 technologies can be more generally defined as networks where users determine what types of content they want to view or publish. There is currently a strong drive among companies to implement solutions that fall into the Web 2.0 category. The main advantages of these systems include flexibility and an intuitive experience on the user's end. Popular instances of Web 2.0 technologies include blogs, wikis, and social networking sites.
Web 3.0 is a term used to describe various evolutions of Web usage and interaction along several paths. These include transforming the Web into a database, a move toward making content accessible by multiple non-browser applications, the leveraging of artificial intelligence technologies, the Semantic Web, the Geospatial Web, and the 3D Web. Although the future of Web 3.0 is still uncertain, the general concept is to transform the Internet into an all-encompassing database infused with artificial intelligence and services.
An ontology is a formal representation of a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain and may be used to define the domain. The term is borrowed from philosophy, where an ontology is a systematic account of existence. An enterprise ontology is a collection of terms and definitions relevant to business enterprises. Semantic heterogeneity is a major issue in enterprise integration. This problem is not adequately addressed by today's EI solutions, which focus mainly on technical integration. Addressing the semantic aspect will promote EI by giving it more consistency and robustness. Efforts to solve the semantic problem at different levels are still immature.
One such effort is the development of the Web Ontology Language (OWL) to express ontologies in a standard, XML based language. OWL is a set of markup languages that are designed for use by applications that need to process the content of information instead of just presenting information to humans. OWL ontologies describe the hierarchical organization of ideas in a domain, in a way that can be parsed and understood by software.
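To give a flavor of the machine-parseable hierarchies OWL describes, the following sketch generates a two-concept ontology with Python's standard library: a hypothetical Customer concept specializing a Party concept. The class names are invented for illustration; the RDF, RDFS, and OWL namespace URIs are the standard ones.

```python
# Generating a tiny OWL class hierarchy (Customer subClassOf Party) as XML.
# Class names are illustrative; namespaces are the W3C standard ones.
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

root = ET.Element(f"{{{RDF}}}RDF")
for name, parent in [("Party", None), ("Customer", "Party")]:
    cls = ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": f"#{name}"})
    if parent:
        # rdfs:subClassOf encodes the "is-a" relationship software can reason over.
        ET.SubElement(cls, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": f"#{parent}"})

ontology_xml = ET.tostring(root, encoding="unicode")
```

Real ontologies are authored with dedicated tools and richer constructs (properties, restrictions, equivalences), but the essential point stands: the hierarchy is explicit data that software can parse, not prose for humans.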
The Semantic Web is an evolving extension of the World Wide Web in which the semantics of information and services on the web are defined, making it possible for the web to understand and satisfy the requests of people and machines to use web content. At its core, the Semantic Web comprises a set of design principles, collaborative working groups, and a variety of enabling technologies, such as OWL. Some elements of the Semantic Web are described as prospective future possibilities that are yet to be implemented or realized.
Semantic integration is the process of using business semantics to automate the communication between computer systems. Semantic integration relies on metadata publishing to allow ontologies to be linked or mapped. An information model of an organization's information set, containing the relationships and rules that represent the semantics of the data and its interaction with other data and processes, may be represented with the Web Ontology Language (OWL). While the Semantic Web may be years away from attainment, a model-driven enterprise is achievable today. By creating an active model of data entities and mapping those entities to their respective sources exposed as Web services, true enterprise information integration may be realized.
Cases
3.1 Hardware and network infrastructure at AlphaCo
3.2 Data infrastructure and applications portfolio at AlphaCo
Modeling Techniques (covered in a separate chapter)
Boxes and Arrows - ANSI standard flowcharting, dataflow diagrams
UML, Overview, basic constructs
Issues with traditional modeling approaches
Emerging Modeling Techniques
XML modeling techniques/standards
BPML, overview, basic constructs (covered in the Process chapter)