Wednesday, May 14, 2014

Make sure that you write/create optimized SSIS Packages

Recently I worked on the package that loops many text files and insert the data into a staging table and then from the staging into original table which has  five columns only defined as VARCHAR(50) in the database. Pretty simple , not? Text files have many data and the destination table has many columns (90+) and very wide data tape client defines (VARCHAR(3000)). Actually he needs to get only few columns but  who cares, right? We have observed that the memory during the package execution grow  rapidly and very quickly MSSQL Server service was become unavailabe. You can imagine that during this spike nobody can work and a local DBA started getting complains from the end users. I would like to say that the server is very strong (4CPU+ 28RAM)...So what's problem?

At the source component , the estimated sized of a row is determined by the maximum columns sizes of all columns returned by the query (remember we have 90+ columns defined as VARCHAR(3000)). This is where  the performance problem resides.

After sometime investigations we dropped the wide stage table and created a stage table with exactly same structure as original table has.
That means columns have VARCHAR(50) and we also changed the job schedule to run it frequently to deal with small data sets.  So performance was drastically improved. We have not seen any more memory growing and not reducing, there is no more compliance from the end user.

Please read this article if you deal with packages especially if it works with large data sets.
http://technet.microsoft.com/en-us/library/cc966529.aspx