I have been so used to use int identity fields as primary key for any table and to believe it’s the fastest way to define a clustered primary key that I never thought this can cause serious performance problems to SSIS. Until I tested it and discovered that int identity primary keys are among the slowest way to insert huge amount of data with SSIS. In the post I’ll describe the technique that – from my tests – is the fastest way to insert data into tables with clustered index using SSIS.
As a test case I used a very simple table with only two fields:
CREATE TABLE [dbo].[FastLoadTests]( [Id] [int] Identity NOT NULL, [TestString] [varchar](2048) NOT NULL ) ON [PRIMARY]
I filled it with a very simple task that generates one million of monotonically increasing Id and TestStrings of exactly 512 bytes each, throwing them into an OleDb destination adapter that uses FastLoad to fill the table.
If the table has no index at all the package runs in 18.5 seconds, pretty nice. Now, I created an index on the table like this:
CREATE UNIQUE CLUSTERED INDEX [FastLoadTests_ClusteredIndex] ON [dbo].[FastLoadTests] ([Id] ASC) ON [PRIMARY]
And run the package once more. The execution time is now 1.08.5 (one minute and eight seconds). It is more or less 4 times slower. Clearly, as SQL has to sort one million rows, I was expecting poor performances. The first trial has been that of reducing the number of rows to sort.
Looking at the various options in the OleDb destination adapter it is easy to find that it has a parameter (Maximun Insert Commit Size) that defaults to 0.
Setting it to 10.000 reduced execution time to 44.5 seconds, a better result but still more than two times slower than the table with no clustered index. Is this the best result? Not at all! Let’s try something different.
Reading (carefully) the MS documentation about the OleDB destination adapter (http://msdn2.microsoft.com/en-us/library/ms141237.aspx) you’ll discover that you can gain performance if the input data is sorted accordingly to the clustered index on the table, specifying the ORDER option with the advanced editor.
After some trials I managed to set it with the advanced editor:
It has been surprising that, running the package, I got this error:
[Test Table ] Error: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005. An OLE DB record is available. Source: “Microsoft SQL Native Client” Hresult: 0x80004005 Description: “Could not bulk load. The sorted column ‘Id’ is not valid. The ORDER hint is ignored.”.
So, it seems that setting this option on an INT IDENTITTY field causes FastLoad to fail. Really surprising because it seems that SQL Server does not know that the keys generated by itself will be sorted! Anyway, removing the Identity setting from the table and using the package generated ID resulted in execution time of 46.23 seconds. No performance gain at all. Anyway as it seemed an interesting way to search into, I went more on trials.
The last and resolving trial test has been that of REMOVING the Maximum Insert Commit Size parameter from the destination adapter. Running the package again I got a result of 19.07 seconds execution time, more or less the same time as inserting the data with no clustered index at all. Got it! Now I have a really fast way to insert data.
The results are summarized in this table
0 MICS Index
10.000 MICS Index
0 MICSInt Identity key 18.85 41.15 1.08.57SSIS generated key 18.59 44.54 1.10.03SSIS key and ORDER option 18.85 46.23 19.07
*MICS = Maximum Insert Commit Size
You can easily see that using the ORDER option gives a dramatic speed improvement when you have a clustered index but ONLY if you are able to generate the sorting key in the SSIS package and DO NOT USE the MICS parameter.
I think that the documentation for this behavior should have been made far more accessible and that ORDER parameter is so important that should have been shown in the first page of the standard editor for OleDb destination, hope it will be done in the next release of Sql Server.
Moreover, as using MICS parameter leads to a dramatic performance degradation, I think that the OleDb destination adapter should issue a warning if both ORDER and MICS parameter are set so to make the programmer think twice before using them together.