SSIS (SQL Server Integration Services) is one of the robust features introduced by Microsoft with SQL Server 2005. The tool is used to load data from various sources into various destinations. SSIS is also called an ETL tool, as it handles the extraction, transformation, and loading of data. In the ETL process, data is extracted from a source using the various source components Microsoft provides in the tool, transformed, and finally loaded into the destination.
SSIS packages allow us to create a control flow and a data flow to load data from source to destination. The control flow organizes the flow of tasks, while Data Flow tasks carry out the ETL (extraction, transformation, and load) process.
SSIS is very efficient for the ETL process and can be automated within the SQL Server database. It is an efficient way of migrating data from source to destination, and data can be migrated to relational or multidimensional databases as well.
History and Versions of SSIS
Initially, Data Transformation Services (DTS) was introduced in Microsoft SQL Server 7.0. Its purpose was to transform data from any OLE DB-compliant data source to another destination, with minor workflow features, and it could execute programs and run scripts.
In the SQL Server 2000 release, DTS shipped with some new features such as the Dynamic Properties task, with which packages can be altered dynamically at runtime. The multiphase data pump and extended logging features were added, and the ActiveX Script task was incorporated to do conditional loading of data with the help of VBScript.
In SQL Server 2005, Microsoft introduced SSIS with many new features to load data in a faster and more robust way. It is not an extension of DTS; in fact, SSIS was developed anew with its own feature set.
In the SQL Server 2008 and SQL Server 2008 R2 releases, some more updates were added.
We can move the data in two ways using SSIS.
Using Import and Export Wizard.
Creating SSIS Package.
The SSIS Import and Export Wizard can move data quickly from any OLE DB data source to any destination.
Business Intelligence Development Studio (BIDS)
Business Intelligence Development Studio (BIDS) is the Integrated Development Environment for SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS), and SQL Server Reporting Services (SSRS), used for data migration, data analysis, and reporting respectively. BIDS is a common tool for all three.
BIDS Screen Shot :-
Architecture of SQL Server Integration Services
SSIS is the part of SQL Server 2005 responsible for extraction, transformation, and load. SSIS was developed by rewriting the code from DTS 2000. One of the nicest things about SSIS is its price: it is free with the SQL Server purchase, whereas other ETL tools must be bought separately and cost hundreds of dollars.
Microsoft SQL Server Integration Services consists of varied components. The four main components of the SSIS architecture are:
The SSIS Service.
The SSIS runtime engine and the runtime executables.
The SSIS data flow engine and the data flow components.
The SSIS clients.
Integration Services architecture
The SSIS Service tracks the execution of packages and helps with the storage of packages. It handles the operational portion of SSIS. By default the SSIS Service is turned off; it turns on when a package is executed for the first time.
The SSIS runtime engine and its corresponding programs run SSIS packages. The SSIS runtime engine manages the configuration, debugging, logging, connections, events and transactions and saves the layout of packages.
The runtime executables provide the functionality to a package.
Containers - Provide structure and scope to the package.
Tasks - Provide the functionality to the package.
Event Handlers - Respond to events raised within the package.
Precedence Constraints - Provide an ordinal relationship between the various items in the package.
SSIS Components :-
SSIS creates packages composed of tasks that can move data from source to destination and, if necessary, transform it. Within an SSIS package the workflow can be defined, and the SSIS runtime engine ensures the tasks inside the package are executed according to that workflow.
A package is a collection of tasks executed in an organized fashion by the SSIS runtime engine. Packages are actually saved in the msdb database of SQL Server. A package file has the .DTSX extension and is an XML-structured file. A package can be executed by a SQL Server Agent job, by the DTEXEC command (a command-line utility bundled with SSIS to execute a package; a similar utility, DTEXECUI, has a GUI), from the BIDS environment, or by one package calling another.
A task is an individual unit of work. Tasks provide basic functionality to a package, similar to what a method does in a programming language. The following are some of the tasks available to you:
SSIS Tasks are classified as
Data Flow Tasks
Data Flow Task - This task is mainly useful for Extraction, Transformation and Loading.
Data Preparation Tasks
File System Task - this task is useful for performing operations on files and directories. Using this task, we can create, move, copy or delete files and directories.
FTP Task - using the FTP task, data files can be uploaded to and downloaded from remote servers.
Web Service Task - this task can execute Web Service methods.
XML Task - using this task, a package can retrieve XML documents and modify them at run time.
Data Profiling Task - using this task, we can identify problems in the data and check data quality.
Execute Package Task - Allows us to execute a package from within a package, making SSIS packages modular.
Execute Process Task
Message Queue Task - This task can be used to send and receive messages from a Microsoft Message Queue (MSMQ).
Send Mail Task - using the Send Mail Task we can send email messages; this requires an SMTP server.
WMI Data Reader Task - This task can be used to run WQL queries against Windows Management Instrumentation, for example to read the event log, get a list of installed applications, or determine the installed hardware.
WMI Event Watcher Task- This task allows SSIS to wait for certain WMI events that occur in the operating system and respond.
SQL Server Tasks
Bulk Insert Task - using this task we can copy large amounts of data from a flat file into SQL Server. This task supports only OLE DB connections for the destination database and a File connection manager for the data source.
Execute SQL Task - this task can be used to run SQL statements or stored procedures from package. We can run single or multiple SQL Statements sequentially.
Transfer Database Task - using this task, we can transfer SQL Server database between two instances of SQL Server. This task can either copy or move a database.
Transfer Error Message Task
Transfer Jobs Task
Transfer Logins Task
Transfer Master Stored Procedures Task
Transfer SQL Server Objects Task
Script Task - using this task we can extend the functionality of the package beyond what the built-in SSIS tasks provide. We can write custom code in VB.NET (in SQL Server 2008, we can write code in VB.NET or C#).
Analysis Services Tasks
Analysis Services Execute DDL Task
Analysis Services Processing Task- This task can be used for processing a SQL Server Analysis Services cube, dimension, or mining model.
Data Mining Query Task - This task allows you to run predictive queries against Analysis Services data-mining models.
Back Up Database Task
Check Database Integrity Task
Execute SQL Server Agent Job Task
Execute T-SQL Statement Task
History Cleanup Task
Maintenance Cleanup Task
Notify Operator Task
Rebuild Index Task
Reorganize Index Task
Shrink Database Task
Update Statistics Task
Backward Compatibility Tasks
ActiveX Script Task - using this task we can continue using code developed in DTS (SQL Server 2000). This task is provided only for backward compatibility; for advanced functionality, the Script Task is available in SSIS 2005.
Execute DTS 2000 Package Task - this task is used to run packages developed in the SQL Server 2000 tools. It is provided only for backward compatibility.
Data Source Elements
Data sources are the connections to any OLE DB-compliant data source such as Oracle, SQL Server, or DB2, or to nontraditional data sources like Outlook. Data sources can be local to a single SSIS package or shared across multiple packages in BIDS. A connection is defined in the Connection Manager; the Connection Manager dialog box varies based on the type of connection.
Typical Connection Manager:-
Items in a package can be linked using precedence constraints, which define the logical flow and specify the conditions under which items are executed. Precedence constraints define an ordinal relationship between the various items in the package: they manage the order in which tasks execute, define links among containers and tasks, and use condition evaluation to determine the sequence in which items are processed. More specifically, they provide the transition from one task or container to another.
Constraints allow different paths of execution depending on the success (green), failure (red), or completion (blue) of other tasks (the image below is an example). The constraints together with the tasks comprise the workflow of the package.
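The idea behind precedence constraints can be sketched in plain Python. This is only a conceptual model, not a real SSIS API: the task names and the `run_package` helper are made up for illustration.

```python
# A minimal sketch of how precedence constraints order task execution.
# Each constraint links a successor task to its predecessor and to the
# outcome required for the successor to run:
#   "success" (green), "failure" (red), or "completion" (blue).

def extract():
    return "success"

def transform():
    return "success"

def notify_operator():
    return "success"

constraints = [
    (extract, transform, "success"),        # run transform only if extract succeeds
    (extract, notify_operator, "failure"),  # run notify only if extract fails
]

def run_package(start_task, constraints):
    """Run the start task, then follow any constraint whose condition matches."""
    executed = [start_task.__name__]
    outcome = start_task()
    for predecessor, successor, condition in constraints:
        if predecessor is start_task and (condition == "completion" or condition == outcome):
            successor()
            executed.append(successor.__name__)
    return executed

print(run_package(extract, constraints))
```

Here `extract` succeeds, so only the green (success) path to `transform` is taken, while the red (failure) path to `notify_operator` is skipped.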
Containers group a variety of package components (including other containers), affect their scope, sequence of execution and mutual interaction. They are used to create logical groups of tasks. There are four types of containers in SSIS listed below:
Task Host Containers - Default container, every task falls into it.
Sequence Containers - Defines a subset of the overall package control flow.
For Loop Containers - Defines a repeating control flow in a package.
For Each Loop Containers - Enumerate through a collection; used, for example, when each record of a record set needs to be processed.
A variable in SSIS is the same as a variable in any other programming language. Variables are temporary storage for values that can change during package execution or from one package to another, and they are used to configure a package dynamically at runtime, for example to execute the same T-SQL statement or script against a different set of connections. A variable's scope depends on where it is defined: variables can be declared at the package, container, task, or event-handler level.
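The combination of a variable and a For Each Loop container (running the same statement against different connections) can be sketched conceptually like this. All names here (the server list, `run_statement`) are hypothetical, not real SSIS objects:

```python
# A conceptual sketch: a package-scoped variable drives which connection a
# statement runs against, while a For Each Loop enumerates the collection.

# Package-scoped "variables": temporary storage whose values change at runtime.
variables = {"ServerName": None, "RowsProcessed": 0}

connections = ["DevServer", "TestServer", "ProdServer"]  # collection to enumerate

def run_statement(statement, server):
    # Placeholder for executing a T-SQL statement against a connection.
    return f"ran '{statement}' on {server}"

results = []
for server in connections:            # the For Each Loop container
    variables["ServerName"] = server  # reconfigure the package at runtime
    results.append(run_statement("UPDATE dbo.Stats SET Refreshed = 1",
                                 variables["ServerName"]))
    variables["RowsProcessed"] += 1

print(variables["RowsProcessed"])  # 3
```

The same statement runs three times, once per connection, with the variable's value changing on each iteration.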
Data Flow Elements:-
Creating a Data Flow Task generates a new data flow. Just as the control flow handles the main workflow of the package, the data flow handles the transformation of data. Anything that manipulates data falls into the data flow category. Data changes as it moves through each step of the data flow, based on what each transform does. For example, a new column derived using the Derived Column transform becomes available to subsequent transformations or to the destination.
A source is a location from which data gets pulled into the data pump. Generally, sources point to a Connection Manager in SSIS. By pointing to the Connection Manager, you can reuse connections throughout the package, because the connection is created in one place. There are six sources in SSIS:-
OLE DB Source.
Flat File Source .
Raw File Source.
XML Source.
Excel Source.
DataReader Source.
Inside the data flow, destinations accept the data from the data sources and from the transformations. The flexible architecture can send the data to nearly any OLE DB-compliant data source or to a flat file. Like sources, destinations are managed through the Connection Manager. The following destinations are available to you in SSIS:
Data Mining Model Training Destination.
Flat File Destination.
OLE DB Destination.
Raw File Destination.
SQL Server Destination.
SQL Server Mobile Destination.
List of Transformations :-
Transformations are key components of the data flow that change the data into a desired format. For example, suppose we want the data sorted and aggregated; two transformations can accomplish this. The nicest thing about transformations in SSIS is that they are done entirely in memory and no longer require the elaborate scripting of SQL Server 2000 DTS. List of transforms:
Aggregate: This transformation applies aggregate functions, like Average, it also provides Group By clause to specify groups to aggregate.
Audit: The transformation that exposes auditing information to the package, such as when the package was run and by whom.
Character Map: This transformation makes string data changes, such as converting data from lowercase to uppercase.
Conditional Split: Splits the data based on certain conditions being met. For example, this transformation could be instructed to send data down a different path if the State column is equal to Florida.
Copy Column: Adds a copy of a column to the transformation output. We can later transform the copy, keeping the original for auditing purposes.
Data Conversion: Converts a column's data type to another data type.
Data Mining Query: Performs a data-mining query against Analysis Services.
Derived Column: Creates a new derived column calculated from an expression.
Export Column: This transformation allows you to export a column from the data flow to a file. For example, we can use this transformation to write a column that contains an image to a file.
Fuzzy Grouping: Performs data cleansing by finding rows that are likely duplicates.
Fuzzy Lookup: Matches and standardizes data based on fuzzy logic. For example, this can transform the name Jon to John.
Import Column: Reads data from a file and adds it into a data flow.
Lookup: Performs a lookup on data to be used later in a transformation. For example, we can use this transformation to look up a city based on the zip code.
Merge: Merges two sorted data sets into a single data set in a data flow.
Merge Join: Merges two data sets into a single data set using a join function.
Multicast: Sends a copy of the data to an additional path in the workflow.
OLE DB Command: Executes an OLE DB command for each row in the data flow.
Percentage Sampling: Captures a sampling of the data from the data flow by using a percentage of the total rows in the data flow.
Pivot: Pivots the data on a column into a more non-relational form. Pivoting a table means that we can slice the data in multiple ways, much like in OLAP and Excel.
Row Count: Stores the row count from the data flow into a variable.
Row Sampling: Captures a sampling of the data from the data flow by using a row count of the total rows in the data flow.
Script Component: Uses a script to transform the data. For example, we can use this to apply specialized business logic to data flow.
Slowly Changing Dimension: Coordinates the conditional insert or update of data in a slowly changing dimension.
Sort: Sorts the data in the data flow by a given column.
Term Extraction: Extracts nouns or adjectives from text data.
Term Lookup: Looks up terms extracted from text and references the value from a reference table.
Union All: Merges multiple data sets into a single data set.
Unpivot: Unpivots the data from a non-normalized format to a relational format.
Note :- In practice, we work with transformations a lot.
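To make the list above concrete, here is a plain-Python sketch of what three of these transformations do to rows as they pass through a data flow: Derived Column, Conditional Split, and Aggregate. The column names and sample rows are made up for illustration; real SSIS does this in its in-memory pipeline, not in Python.

```python
# Sample rows entering the data flow.
rows = [
    {"State": "Florida", "Qty": 2, "Price": 10.0},
    {"State": "Texas",   "Qty": 1, "Price": 20.0},
    {"State": "Florida", "Qty": 3, "Price": 5.0},
]

# Derived Column: compute a new column from an expression; the new column
# is then available to every subsequent step in the data flow.
for row in rows:
    row["Total"] = row["Qty"] * row["Price"]

# Conditional Split: route rows down different paths based on a condition,
# e.g. send rows where State equals Florida down a different path.
florida_path = [r for r in rows if r["State"] == "Florida"]
default_path = [r for r in rows if r["State"] != "Florida"]

# Aggregate: apply an aggregate function (here Sum) with a Group By on State.
totals_by_state = {}
for row in rows:
    totals_by_state[row["State"]] = totals_by_state.get(row["State"], 0) + row["Total"]

print(totals_by_state)  # {'Florida': 35.0, 'Texas': 20.0}
```

Each step consumes the output of the previous one, which is exactly how transformations chain together inside a Data Flow Task.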
Error Handling and Logging:-
Event handlers can represent workflows, much like any other workflow in SSIS. For example, event handlers can be used to notify an operator if any component fails inside the package.
In the data flow, we can specify on a transformation or connection what we wish to happen if an error exists in the data. We can choose to have the entire transformation fail and exit upon an error, to redirect the bad rows to a failed data flow branch, or to ignore errors. For example, if an error occurs during the Derived Column transformation, the failing rows can be sent to an error output in the data flow; that output can then be written to a log.
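The "redirect row" behavior can be sketched as follows. This is a conceptual model only; the column name and row data are hypothetical:

```python
# Rows that fail a conversion are sent to an error output instead of
# failing the whole data flow, mirroring SSIS's "Redirect row" option.

rows = [{"Amount": "10.5"}, {"Amount": "oops"}, {"Amount": "3"}]

good_output, error_output = [], []

for row in rows:
    try:
        row["Amount"] = float(row["Amount"])   # the conversion step
        good_output.append(row)                # continues down the normal path
    except ValueError:
        error_output.append(row)               # redirected to the error branch

# The error branch could then be written to a log table or file.
print(len(good_output), len(error_output))  # 2 1
```

The alternative behaviors described above correspond to letting the exception propagate (fail component) or silently discarding it (ignore failure).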
There are more than a dozen events that can be logged for each task or package. We can enable partial logging for one task and enable much more detailed logging for billing tasks. Some of the events that can be monitored are OnError, OnPostValidate, OnProgress, and OnWarning. The logs can be written to nearly any connection: SQL Profiler, text files, SQL Server, the Windows Event log, or an XML file.
The following screenshots show the usage of the Aggregate, Audit, Copy Column, Data Conversion, Conditional Split, Character Map, Sort, and Lookup transformations.
Conditional Split Transformation
Copy Column Transformation
Data Conversion Transformation
Aggregate, Sort, Multicast, Merge Join Transformations.
Union All Transformation.
The following tools are included free of cost when you purchase the Microsoft SQL Server database:-
SSIS (SQL Server Integration Services),
SSRS (SQL Server Reporting Services), and
SSAS (SQL Server Analysis Services).
Almost every company, organization, and firm in the world needs a database to store its data and later perform analytical operations, data manipulation, and report generation. The advantage of choosing SQL Server as the database is that many tools for these operations come at no extra cost, so there is no need to buy or write additional software. These tools include SSIS (for data migration and integration, from one data source to another or from an old version to a new one), SSRS (for developing reports, for example financial reports such as profit and loss statements and balance sheets, or sales forecasting reports), and SSAS (for designing, building, testing, and deploying multidimensional databases). Getting these extra features for free means cost benefits to the company.
The SQL Server database can be integrated with almost all programming languages, such as C# and Java, and it is available in different flavors so that it can be installed on various operating systems. Moreover, SSIS, SSRS, and SSAS can be used in conjunction with various databases (data received from any software and any database).
Personally, I feel that SSIS is a very good ETL tool for data extraction, transformation, and loading compared to the other ETL tools available in the market.