Wednesday 29 April 2015

Load data from Flat File to Database in SSIS by using FastParse

FastParse is a property that skips locale-sensitive validation for specific columns whose data types are integer or date/time. When FastParse is set to True on such a column, SSIS reads the value directly instead of running the full validation routines, which makes the load noticeably faster than the default Flat File source behavior, where every value is validated.

Set the fast parse property in the Flat File source

1. Right-click the Flat File source or Data Conversion transformation, and then click Show Advanced Editor.
2. In the Advanced Editor dialog box, click the Input and Output Properties tab.
3. In the Inputs and Outputs pane, click the column for which you want to enable fast parse.
4. In the Properties window, expand the Custom Properties node, and then set the FastParse property to True.
5. Click OK.

Note: Fast parse supports only the most commonly used date format representations, such as YYYYMMDD and YYYY-MM-DD; it does not recognize special characters in currency data, and it cannot convert hexadecimal or scientific-notation representations of integers.

Example:
           Transfer 2 lakh (200,000) records from a flat file to a database by using the Flat File connection manager.

Without using the FastParse property

• After that, enable the FastParse property (set it to True) on the Id, Prev Id, and Join Date columns, since their data types are integer and datetime.


So the speed gain is quite large when using FastParse; in this test, the run with FastParse on was roughly 50% faster than the run without it.

Let me know what you think about this article.

Friday 24 April 2015

How to import Excel data when a column's length is more than 255 characters in SSIS

Scenario:
        We have 190 rows of data in an MS Excel spreadsheet. The sheet contains only two columns, Id and Employee Report. The Report value is shorter than 50 characters for every row except employee id 100, whose Report is 322 characters long.

ID   Report
1    (rpt)
2    YES OR NO
3    SYSREMOTELOGINS TYPES
4    SYSREMOTELOGINS TYPES (UPDATE)
5    AF: aggregate function
6    AP: application
7    C : check cns
8    D : default (maybe cns)
9    EN: event notification
...  ...
100  select into/bulkcopy select into/bulkcopy select into/bulkcopy select into/bulkcopy select into/bulkcopy select into/bulkcopy select into/bulkcopy implicit_transactions implicit_transactions implicit_transactions implicit_transactions implicit_transactions implicit_transactions implicit_transactions implicit_transactions
101  NULL
102  sub
103  select into/bulkcopy
104  merge publish
105  ignore duplicate rows
106  Datetime12314

Solution:

 First, create a SQL table to store the data taken from the MS Excel spreadsheet.

Create table Emp_Report(
Id int not null,
Report varchar(500)
)

  I built an SSIS package with two connection managers and one Derived Column component:

    • Excel Connection Manager – points to our MS Excel spreadsheet.
    • OLE DB Connection Manager – points to our SQL Server database.
    • Derived Column – converts the Unicode values into varchar string values, for example:
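The conversion can be expressed as a cast in the Derived Column expression (the 500 length matches the Report column above; code page 1252 is an assumption, so use your target column's code page):

(DT_STR, 500, 1252)[Report]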


 After setting up the SSIS package, I tried to run it, but it failed for the reason below.



Error Message: [Excel Source [65]] Error: There was an error with output column "Report" (140) on output "Excel Source Output" (73). The column status returned was: "Text was truncated or one or more characters had no match in the target code page."

If you handle (ignore) the truncation error in your SSIS package, it will insert only 255 characters into the column (Report in our case) and truncate the rest.

The reason for this is that the MS Excel connection manager actually works on a guess. Yes, that is right: it works on assumptions. It reads a registry key called "TypeGuessRows", whose default value is 8. This means Excel scans only the first 8 rows to determine the data type and data length of each column.

The valid values for the key "TypeGuessRows" range from 0 to 16.
Run → regedit

HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel

       So you can change the value to 0. When TypeGuessRows is 0, Excel scans the first 16,384 rows before assuming the data type and data length of the column.
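Assuming the 32-bit Jet provider on a 64-bit machine, the change can also be scripted from an elevated command prompt (verify the key path on your machine first):

reg add "HKLM\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel" /v TypeGuessRows /t REG_DWORD /d 0 /f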


Finally, the data load from Excel to the database completes.

Let me know what you think about this article.

Dynamically change table structure in SSIS with Example



Solution: The above problem is solved by using a Sequence Container, Execute SQL Tasks, and a Data Flow Task.


• Create two SQL tables, which will store the source and destination data.

CREATE TABLE [dbo].[EmpData](
[Eid] [int] NOT NULL,
[Ename] [nvarchar](50) NULL,
[Sal] [decimal](18, 0) NULL
)

CREATE TABLE [dbo].[Dest_EmpData](
[Eid] [int] NOT NULL,
[Ename] [nvarchar](50) NULL
)
Then add data to the source table as your requirement dictates.

• A Sequence Container is used to group a subset of the package. The subset contains three Execute SQL Tasks, which must all complete successfully before the Data Flow Task is executed.

• The first Execute SQL Task reads the [dbo].[EmpData] source table from the database, calculates the column count, and stores the value in the package variable [User]::[Source_Col_Count] by using the query below.
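A query along these lines returns the count (a minimal sketch using INFORMATION_SCHEMA; the post's original query was in a screenshot, so the exact text is an assumption). Set the task's ResultSet to Single row and map the result to [User]::[Source_Col_Count]:

-- Count the columns of the source table
SELECT COUNT(*) AS Source_Col_Count
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo'
  AND TABLE_NAME = 'EmpData';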


• The second Execute SQL Task reads the [dbo].[Dest_EmpData] destination table from the database, calculates the column count, and stores the value in the package variable [User]::[Source_Dest_Count] by using the query below.
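The same pattern applies (again a sketch; map the single-row result to [User]::[Source_Dest_Count]):

-- Count the columns of the destination table
SELECT COUNT(*) AS Dest_Col_Count
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo'
  AND TABLE_NAME = 'Dest_EmpData';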


• After the two tasks execute, we check a condition by using a precedence constraint. When the precedence constraint evaluates to true, Execute SQL Task 3 runs, for example as shown below.
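For instance, the constraint's evaluation operation can be set to Expression, with an expression such as the following (assuming the two variables above), so that the table is altered only when the column counts differ:

@[User::Source_Col_Count] != @[User::Source_Dest_Count]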


• The third Execute SQL Task changes the structure of the destination table by using an ALTER command, as follows.
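A statement along these lines adds the missing column so the destination matches the source (a sketch; the post's actual command was in a screenshot):

ALTER TABLE [dbo].[Dest_EmpData] ADD [Sal] [decimal](18, 0) NULL;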


• After the Sequence Container completes, the attached Data Flow Task passes data from the source [dbo].[EmpData] to [dbo].[Dest_EmpData] by using OLE DB Source and OLE DB Destination components.


Note: When you run the package, validation of the OLE DB Destination fails because the table structures differ. The error message is shown below.


To avoid the above error, change the package properties: Data Flow Task → Properties → set the [DelayValidation] property to True.



 Let me know what you think about this article.

Thursday 23 April 2015

Column Count in SSIS

Introduction:
         Here I will explain how to calculate the column count in SSIS by using a Script Component.

Description:

         SSIS provides a Row Count transformation, but there is no Column Count transformation. In some situations, however, we want to know how many columns participate at a particular point in the package, along with their names and data types. Here I explain how to calculate the column count by using a Script Component.


• An OLE DB Source gets the [Person].[Address] table data from the AdventureWorks database by using a SQL Command, for example:
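(The column list below is an assumption based on the AdventureWorks [Person].[Address] schema.)

SELECT AddressID, AddressLine1, AddressLine2, City, PostalCode
FROM [Person].[Address];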


• A Derived Column component creates a new column [full address] from the existing columns, for example:
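(A hypothetical expression; adjust it to the columns you actually select.)

AddressLine1 + " " + City + " " + PostalCode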


• A Data Conversion component converts [full address] from Unicode string [DT_WSTR] to string [DT_STR].

• In the Script Component:
     Script tab: ReadWriteVariables → User::Column_Count
     Input Columns: select the input columns as your requirement dictates; the column count value depends on this selection.
     Inputs and Outputs: create an output column Count [DT_I4].

• Code of Script Component


1. Initialize the local variables i (the column count) and Rowno.
2. Assign the local value of i to the package variable Column_Count.
3. Display the column index.
4. Display the column type and column name.
5. Increment the column count variable i.
6. Display the column count value.
7. Increment Rowno by one.
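A minimal C# sketch matching these steps might look like the following (the post's original code was in a screenshot; names such as Input0Buffer, Row.Count, and Variables.ColumnCount come from the component's generated code and the configuration above, so treat them as assumptions; note that a ReadWriteVariable may only be written in PostExecute):

using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

public class ScriptMain : UserComponent
{
    private int i = 0;      // step 1: column count
    private int Rowno = 0;  // step 1: row counter

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (Rowno == 0)  // inspect the column metadata once, on the first row
        {
            bool fireAgain = true;
            foreach (IDTSInputColumn100 col in
                     ComponentMetaData.InputCollection[0].InputColumnCollection)
            {
                // steps 3-4: display the column index, type, and name
                ComponentMetaData.FireInformation(0, "ColumnCount",
                    "Index: " + i + "  Type: " + col.DataType + "  Name: " + col.Name,
                    string.Empty, 0, ref fireAgain);
                i++;  // step 5: increment the column count
            }
            // step 6: display the column count value
            ComponentMetaData.FireInformation(0, "ColumnCount",
                "Column count: " + i, string.Empty, 0, ref fireAgain);
        }
        Row.Count = i;  // write the count to the Count [DT_I4] output column
        Rowno++;        // step 7: increment the row number
    }

    public override void PostExecute()
    {
        base.PostExecute();
        Variables.ColumnCount = i;  // step 2: store the count in User::Column_Count
    }
}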

• Finally, the column count value is stored in the package variable.


Note: The column count value depends on how many columns are selected as input columns in the Script Component.

Let me know what you think about this article.

Wednesday 22 April 2015

How to deploy SSIS package without using manifest file

Deploy SSIS Package by using Command line

Introduction:
               SQL Server provides the command-line tool DTUTIL.EXE, which is used to manage SSIS packages. The tool can copy, move, delete, or verify the existence of a package, and it is a good choice when you want to script out the deployment of SSIS packages.

Description:

• First, create the package with the name Sample1.dtsx at the d:\ssis\ location.
• Open a command prompt and navigate to the SSIS project folder as shown below.


• In the example that follows, I explain how to deploy the Sample1.dtsx package to the file system and to SQL Server.

• To deploy the package to the file system, use the DTUTIL command as follows.

         → Execute the DTUTIL command at the command line as shown below.
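For example (a sketch; the destination path is illustrative, since the post's actual command was in a screenshot):

dtutil /FILE "d:\ssis\Sample1.dtsx" /COPY FILE;"d:\deploy\Sample1.dtsx"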


• To deploy the package to SQL Server, use the DTUTIL command as shown below. (The screenshots showed the before and after lists of packages under Integration Services in SQL Server.)
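For example (a sketch; this stores the package as Sample1 in msdb):

dtutil /FILE "d:\ssis\Sample1.dtsx" /COPY SQL;"Sample1"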


Note: The above command deploys to the default SQL Server instance on the local machine.

Let me know what you think about this Article.

Tuesday 21 April 2015

How will you calculate the Buffersize and How many Buffers are used to insert records in SSIS

              In SSIS, the dataflow engine uses a buffer-oriented architecture to efficiently handle datasets in memory. It manipulates data as it is transferred from source to destination.

The default buffer size (DefaultBufferSize) in SSIS is 10,485,760 bytes (10 MB), and
the default DefaultBufferMaxRows is 10,000 (ten thousand).
DataFlow Task (right click) --> Properties


                The diagram above clearly shows where DefaultBufferSize and DefaultBufferMaxRows are set in the SSIS package.

              The number of buffers created depends on how many rows fit into a buffer, and how many rows fit into a buffer depends on a few other factors (a worked example follows the list):

• Row size and all columns information from the input records
• Default Buffer Size property in the dataflow task [MaxBufferSize (100MB) and MinBufferSize (64 KB)]
• DefaultBufferMaxRows Property in the dataflow task [Default value is 10,000].
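As a worked example (the 1,200-byte row size is an assumption for illustration): with the default DefaultBufferSize of 10,485,760 bytes and an estimated row size of 1,200 bytes, about 10,485,760 / 1,200 ≈ 8,738 rows fit in one buffer, which is below the DefaultBufferMaxRows cap of 10,000, so each buffer carries roughly 8,738 rows. With a 100-byte row, the 10 MB buffer could hold about 104,857 rows, but the 10,000-row cap would apply instead.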

To get better buffer performance, you can do a few things:
• Remove unwanted columns from the source and set the data type of each column properly.
• If your system has sufficient memory, increase the buffer size in the Data Flow Task.

For example:
We transfer 1 crore (10 million) records from one database to another by using an SSIS package.
First, run with the default DefaultBufferMaxRows of 10,000.


It completed in 1:03.757 (1 minute 3 seconds).

Change the value:
DefaultBufferMaxRows = 500,000 (5 lakh)


It completed in 25.178 seconds.

Let me know what you think about this Article.

Saturday 18 April 2015

How to split different columns of a single table into different destinations by using SSIS transformations



Solution: The above problem is solved by using Multicast and Sort transformations, as follows.



• An OLE DB Source gets data from the database.
• A Multicast sends one input to many outputs.
• In Sort1, select the three input columns Product_ID, Product_Name, and Product_Qty, and sort the data by Product_ID.


• In Sort2, select the two input columns Tax_Rate and Tax_Amount, and sort the data by Product_ID (which must also be passed through).



• In Sort3, select the two input columns Revenue_Rate and Total_Revenue, and sort the data by Product_ID (again passed through).


• Destination1 stores the values of Product_ID, Product_Name, and Product_Qty; the mappings are

• Destination2 stores the values of Tax_Rate and Tax_Amount; the mappings are


• Destination3 stores the values of Revenue_Rate and Total_Revenue; the mappings are



• Final Output with Data viewers

Friday 17 April 2015

Parallel Processing in SSIS with Example

Parallel Processing in SSIS

                 In this article, I will explain the types of parallel processing available in SSIS, with a brief example of each.
                 SQL Server Integration Services (SSIS) allows parallel execution. The idea of parallel design is to break a large amount of data into smaller, independent pieces that can be worked on simultaneously. In SSIS, you break the large amount of data into smaller pieces by using tasks to partition the data to be processed. Finally, the small, independent pieces of work complete faster in parallel.

It comes in two different forms:
• Control Flow Parallelism
• Data Flow Parallelism

Control Flow Parallelism
                The first is MaxConcurrentExecutables, a property of the package. It defines how many tasks (executables) can run simultaneously. The default value for this property is -1, which equals the total number of processors plus two. You can increase this number to allow a greater degree of parallelism.

                This can be a good strategy if you have multiple packages that pull from multiple sources, have relatively basic transformations, and/or push the data to multiple destinations.

Example: transfer 8 lakh (800,000) records from one database to another by using more than one Data Flow Task.

Suppose we have a package with 8 Data Flow Tasks.


Each Data Flow Task (DFT) transfers 1 lakh records from source to destination as follows (a sample source query appears after the list below):
OLE DB source -->OLE DB Destination


DFT 1 contains 1 to 1 lakh records.
DFT 2 contains 1 lakh to 2 lakh records.
DFT 3 contains 2 lakh to 3 lakh records.
DFT 4 contains 3 lakh to 4 lakh records.
DFT 5 contains 4 lakh to 5 lakh records.
DFT 6 contains 5 lakh to 6 lakh records.
DFT 7 contains 6 lakh to 7 lakh records.
DFT 8 contains 7 lakh to 8 lakh records.
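For example, DFT 1's OLE DB Source can restrict its slice with a range query such as (table and key names are assumptions):

SELECT * FROM dbo.SourceTable WHERE Id BETWEEN 1 AND 100000;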

First, run the package with the default property MaxConcurrentExecutables = -1.



It takes 7.847 seconds to complete.


Now set MaxConcurrentExecutables = 10 and run the package.


It completes in 3.1657 seconds.


Data Flow Parallelism
                     The same principles of Control Flow Parallelism apply to the Data Flow. The equivalent of the MaxConcurrentExecutables property in the Data Flow is called EngineThreads. This property is set to 10 by default, but that does not mean a data flow will use exactly 10 processor threads; rather, the data flow will use up to 10 threads to efficiently process the items in the data flow. Depending on the performance profile of your package, you may find that partitioning your sources and destinations increases the performance of inserting records, but make sure that there are no dependencies in your data flow.
                      In some cases partitioning can cause a deadlock, and you will quickly lose whatever performance gain you achieved.

Example: transfer 8 lakh (800,000) records from one database to another by using one Data Flow Task.

Suppose we have a package that contains only one Data Flow Task, and that task contains 8 flows of the form OLE DB Source --> OLE DB Destination.


First, run the package with the default property EngineThreads = 10.





It takes 1.466 seconds to complete.


Now set EngineThreads = 100 and run the package.


It completes in 1.107 seconds.


Note: The time variance depends on the data and the input/output components.

Let me know what you think about this article.