This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.
High performance data integration and workflow solutions can be built on SSIS (SQL Server Integration Services) platform. Using ETL we can create packages that can be SSIS packages which are made up of tasks that can move data from source to destination and alter it if required.
One of ETL(Extraction, Transformation, and Load) tool is SSIS (SQL Server Integration Services) whose main job is to extract, transform and loading of data apart from ETL job it can also be used for several other purposes for example, to automate maintenance of SQL Server databases, update multidimensional cube data etc as well.
Data Transformation Services (DTS) was part of Microsoft SQL SERVER release 7.0 . DTS acts as a was the backbone of the Import/Export Wizard, and the main role of DTS's is to transform data from almost any OLE DB-compliant data source to another destination. Apart from the transform functionality it also has the ability to execute programs and run scripts, making workflow a minor feature.
Microsoft has released DTS with new features in SQL Server 2000.
After Five years, with the release of Microsoft SQL Server 2005 Microsoft released SSIS for the first time, SSIS replaced Data Transformation Services(DTS), which had been a feature of SQL Server till 2005. DTS is no longer an understated feature, but one of the main business intelligence (BI) foundations. Transformation Services was included in all versions, but SSIS is only available in the "Standard" and "Enterprise" editions.
DTS and SSIS may look similar from user perspective but in fact both are quite different SSIS is modifications to the DTS but it has been written right from the scratch and SSIS address many features that DTS doesn't, known as DTS limitations
SQL Server 2005 released in October, 2005. As mentioned earlier SSIS was included in that release. Updates to the initial version came along with SQL Server 2008 and SQL Server 2008 R/2.
Tools in SSIS
This section introduces briefly about the tools in SSIS.
Import and Export Wizard
SSIS contains Import and Export wizard which can be used to move data quickly from any OLE-DB data source to any destination. SSIS import and export wizard can be started in different ways, one way is to start through SQL server business intelligence development studio alternatively you can also start it using SQL server management studio. The wizard is a quick way to move the data and perform very light transformations of data.
SQL Server Import and Export wizard Screen Shot: -
The Business Intelligence Development Studio
Business Intelligence Development Studio(BIDS) is the Integrated Development Environment(IDE) from Microsoft used for developing data analysis and Business Intelligence solutions utilizing the Microsoft SQL Server Analysis(SSAS), SQL Server Reporting Services(SSRS) and SQL Server Integration Services(SSIS). BIDS is the central tool that we will spend most of our time in as a SQL Server SSIS developer. Designer in SQL Server 2000.
BIDS Screen Shot :-
Architecture of SQL Server Integration Services
SSIS which was part of the SQL server 2005 has become a major in the field of Extraction, Transformation and Loading (ETL). SSIS was a complete code rewrites from SQL Server 2000 DTS. What's especially nice about SSIS is its price tag, which is free with the purchase of SQL Server. Other ETL tools can cost hundreds of thousands of dollars based on how you scale the software.
As shown in the following diagram, MicrosoftÂ SQL ServerÂ Integration Services consists of diverse components. The SSIS architecture consists of four main components:
The SSIS Service
The SSIS runtime engine and the runtime executables
The SSIS data flow engine and the data flow components
The SSIS clients
The SSIS runtime engine and its complementary programs actually run SSIS packages. The engine saves the layout of packages and manages the logging, debugging, configuration, connections, and transactions. Additionally, it manages handling events when one is raised in package. The runtime executables provide the following functionality to a package.
Containers: Provide structure and scope to your package
Tasks: Provide the functionality to your package
Event Handlers: Respond to raised events in your package
Precedence Constraints: Provide ordinal relationship between various items in your package
SSIS Components :-
SSIS creates packages which are composed of tasks that can move data from source to destination, and if necessary transform it. Within SSIS package the workflow can be defined, the SSIS runtime engine ensures the tasks inside the package are executed according to the workflow.
Package consists of a collection of tasks; those are executed in a orderly fashion by SSIS runtime engine.Â Package is an XML file that can be saved on SQL Server or it can also be saved on a file system. A package can be executed by SQL Server Agent Job, DTEXEC command (a command line utility bundled with SSIS to execute a package; another similar utility DTEXECUI, has a GUI), from BIDS environment or by calling one package by another package.ute
A task is nothing but a unit work that can be executed by SSIS. They provide basic functionality to package (which is nothing but a collection of tasks), it is similar to that a method does in a programming language. The following are some of the tasks available to you:
ActiveX Script Task: This task executes an ActiveX script in SSIS package. Legacy DTS packages contain this kind of tasks.
Analysis Services Execute DDL Task: This task can be used for executing a DDL task in Analysis Services. For example, to create, to drop, or to alter a cube.
Analysis Services Processing Task: This task can be used for processing a SQL Server Analysis Services cube, dimension, or mining model.
Bulk Insert Task: This task can be used for loading data into a table by using the BULK INSERT SQL command.
Data Flow Task: Using Data flow task we can load ,transform data into an OLE DB destination.
Data Mining Query Task: This task allows you to run predictive queries against Analysis Services data-mining models.
Execute DTS 2000 Package Task: this task exposes SQL Server 2000 DTS packages to SSIS 2005 package.
Execute Package Task: Allows us to execute a package from within a package, making SSIS packages modular.
Execute Process Task: This task executes a program external to package, for example split a file into many files before processing them as individual files.
Execute SQL Task: This task executes a SQL statement or stored procedure.
File System Task: This task can handle operations relating to files and directories such as creating, renaming, or deleting a directory similarly file operations such as moving, copying, or deleting files.
FTP Task: This task can be used to sends/ receives files from an FTP site.
Message Queue Task: This task can be used to send/receives messages from a Microsoft Message Queue (MSMQ).
Script Task: Slightly more advanced than the ActiveX Script task. This task allows us to perform more intense scripting in the Visual Studio programming environment.
Note: this task is most important task in real time. We will use C#,VC++ or vb.net code for writing script task
Send Mail Task: Using with Send Mail Task we can send email messages , for this we need SMTP server.
Web Service Task: Executes a method on a Web service.
WMI Data Reader Task: This task can be used to run WQL queries against the Windows Management Instrumentation. Like reading the event log, get a list of applications that are installed, or determine hardware that is installed, to name a few examples.
WMI Event Watcher Task: This task allows SSIS to wait for certain WMI events that occur in the operating system and respond.
XML Task: This task can be used to parses or processes an XML file. It can merge, split or reformat an XML file.
There is also an array of tasks that can be used to maintain SQL Server environment.
Note:-- In real time we will work a lot on Control Flow tasks , Data flow tasks.
Data Source Elements
Data sources are the connections that hold connection to any OLE-DB compliant data sources such as Oracle, SQL Server, db2 or nontraditional data sources like outlook etc. The data sources can be localized to a single SSIS package or shared across multiple packages in BIDS.A connection is defined in the Connection Manager. The Connection Manager dialog box may vary vastly based on the type of connection.
Typical Connection manager:-
Items in a package can be linked using precedence constraints link which defines logical flow and specify the conditions upon which the items are executed. Precedence constraints defines an ordinal relationship between various items in the package; which helps manage the order the tasks will execute, directs the order of task execution and defines links among containers and tasks; condition evaluation that determine the sequence in which they are processed. More specifically, they provide transition from one task or container to another.
Constraints will allow different paths of execution depending on the true or false success(Green) or failure(Red) or completion(blue) of other tasks (below image is example) . constraints together with the tasks comprise the workflow of the package.
Containers group a variety of package components (including other containers), affect their scope, sequence of execution and mutual interaction. They are used to create logical groups of tasks. There are four types of containers in SSIS listed below:
Task Host Containers - Default container, every task falls into it.
Sequence Containers - Defines a subset of the overall package control flow.
For Loop Containers - Defines a repeating control flow in a package.
For Each Loop Containers - Loops for collection, enumerates through a collection for example it will be used when each record of a record-set needs to be processed.
Variable in SSIS is same as the variables in any other programming language. Variables holds the values and variables are temporary storage for parameters whose values can change during the package execution and from one package to another package. It is used to dynamically configure a package at runtime. For example, to execute the same T-SQL statement or a script against a different set of connections. Depending on the place where a variable has been defined, its scope varies. Variables can be declared at package, container, task or handlers level.
Data Flow Elements:-
Data flow task creates a new data flow once we create a Data Flow Task, it generates a new data flow. As the Controller Flow handles the main workflow of the package similarly the data flow handles the transformation of data. Anything that manipulates data can be categorized as data flow category. Data changes as it moves through the each step of data flow, the data changes based on what the transform does. For example a new column is derived using the Derived Column transform, and that new column is then available to subsequent transformations or to the destination.
Source is a location from where data gets pulled into the data pump. Generally sources will point to the Connection Manager in SSIS. By pointing to the Connection Manager, you can reuse connections throughout the package, because we need to create connection in one place. There Six sources with SSIS:-
OLE DB Source.
Flat File Source .
Raw File Source.
Raw File Source.
Data Reader Source.
Inside the data flow, destinations accept the data from the data sources and from the transformations. The flexible architecture can send the data to nearly any OLE DB-compliant data source or to a flat file. Like sources, destinations are managed through the Connection Manager. The following destinations are available to you in SSIS:
Data Mining Model Training .
Flat File destination.
OLE DB Destination.
Raw file destination.
SQL Server Destination.
SQL Server Mobile destination.
Transformations are key components to the data flow that change the data to a desired format. For example, if we want the data to be sorted and aggregated. Two transformations can accomplish this task . The nicest thing about transformations in SSIS is that it's all done in-memory and it no longer requires elaborate scripting as in SQL Server 2000 DTS. List of transforms:
Aggregate: Aggregates data from transform or source.
Audit: The transformation that exposes auditing information to the package, such as when the package was run and by whom.
Character Map: This transformation makes string data changes , such as changing data from lowercase to uppercase.
Conditional Split: Splits the data based on certain conditions being met. For example, this transformation could be instructed to send data down a different path if the State column is equal to Florida.
Copy Column: Adds a copy of a column to the transformation output. We can later transform the copy, keeping the original for auditing purposes.
Data Conversion: Converts a column's data type to another data type.
Data Mining Query: Performs a data-mining query against Analysis Services.
Derived Column: Creates a new derived column calculated from an expression.
Export Column: This transformation allows you to export a column from the data flow to a file. For example, we can use this transformation to write a column that contains an image to a file.
Fuzzy Grouping: Performs data cleansing by finding rows that are likely duplicates.
Fuzzy Lookup: Matches and standardizes data based on fuzzy logic. For example, this can transform the name Jon to John.
Import Column: Reads data from a file and adds it into a data flow.
Lookup: Performs a lookup on data to be used later in a transformation. For example, we can use this transformation to look up a city based on the zip code.
Merge: Merges two sorted data sets into a single data set in a data flow.
Merge Join: Merges two data sets into a single data set using a join function.
Multicast: Sends a copy of the data to an additional path in the workflow.
OLE DB Command: Executes an OLE DB command for each row in the data flow.
Percentage Sampling: Captures a sampling of the data from the data flow by using a percentage of the total rows in the data flow.
Pivot: Pivots the data on a column into a more non-relational form. Pivoting a table means that we can slice the data in multiple ways, much like in OLAP and Excel.
Row Count: Stores the row count from the data flow into a variable.
Row Sampling: Captures a sampling of the data from the data flow by using a row count of the total rows in the data flow.
Script Component: Uses a script to transform the data. For example, we can use this to apply specialized business logic to data flow.
Slowly Changing Dimension: Coordinates the conditional insert or update of data in a slowly changing dimension.
Sort: Sorts the data in the data flow by a given column.
Term Extraction: Looks up a noun or adjective in text data.
Term Lookup: Looks up terms extracted from text and references the value from a reference table.
Union All: Merges multiple data sets into a single data set.
Unpivot: Unpivots the data from a non-normalized format to a relational format.
Note :- In real time we work on transformations a lot.
Error Handling and Logging:-
Workflows can be represented by Event handlers, much like any other workflow in SSIS. For example event handlers can be used to notify an operator if any component fails inside the package.
In the data flow, we can specify in a transformation or connection what we wish to happen if an error exists in the data. We can select that the entire transformation fails and exits upon an error, or the bad rows can be redirected to a failed data flow branch. We can also choose to ignore any errors. An example, where if an error occurs during the Derived Column transformation, it will be outputted to the data flow. We can then use that outputted information to write to an output log.
There are more than a dozen events that can be logged for each task or package. We can enable partial logging for one task and enable much more detailed logging for billing tasks. Some of the events that can be monitored are OnError, OnPostValidate, OnProgress, and OnWarning. The logs can be written to nearly any connection: SQL Profiler, text files, SQL Server, the Windows Event log, or an XML file.
The following tools will be included for free of cost if you purchase Microsoft SQL Server database:-
SSIS(SQL Server Integration Services) ,
SSRS(SQL Server Reporting Services) and
SSAS( SQL Server Analysis Services).
Almost all the companies/organizations/firms in this world needs database to maintain/store their data and use it for perform analytical operations, data manipulation, report generations at later point of time. Advantage with choosing SQL Server as their database would be, they will get lot many tools to perform these analytical report generation operations for a free of cost so they don't need additional cost write software to perform these operations, these tools include SSIS( for migration of data, integration: from one data source to another data source or old version to another new version ), SSRS ( for developing reports for example financial report like profit and loss statement , balance sheet etc, sales forecasting reports etc..) and SSAS ( for design, build, test and deploy your multi-dimensional databases). Getting these extra features for free means cost benefits to the company.
SQL server database can be integrated with almost all programming languages like C#, JAVA etc and what not and it is also available in different flavors so that it can be installed in various operating systems more over can use SSIS , SSRS and SSAS in conjunction with various databases (data received from any software and any database.
I , Personally feel that SSIS is very good ETL tool for data extraction , transform and loading compare to other ETL tools available in the market.