If you have ever worked on a Microsoft Dynamics CRM data integration project, you have most likely spent time and resources tuning your integration components for maximum performance, so that the integration tasks finish as quickly as possible.
This blog post shows you how to load one million records into a Microsoft Dynamics CRM 2011 on-premise installation in about two hours using our product, SSIS Integration Toolkit for Microsoft Dynamics CRM. The speedup comes from the Balanced Data Distributor (BDD), a SQL Server Integration Services (SSIS) component that Microsoft has made publicly available.
In case you are not familiar with the BDD component, here is some background. BDD is a data flow transformation that takes a single input and distributes the incoming rows uniformly across one or more outputs using multiple threads. Its purpose is to maximize the throughput of ETL data flow tasks, and it is useful when a downstream pipeline component (typically the destination component) is the bottleneck of the entire data flow task.
Microsoft Dynamics CRM data integration gives us a perfect reason to use BDD, because writing data into CRM through its web service interface is slow. In most cases, you will find that the CRM destination component that writes data into CRM is the bottleneck of your data flow task. With BDD, we can split the incoming rows from upstream pipeline components across multiple CRM destination components, so that they write data into CRM concurrently by taking advantage of the multithreading capability of the SSIS engine.
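To make the distribution idea concrete, here is a minimal Python sketch of a distributor that spreads incoming rows across N outputs. It is a conceptual model only, not BDD's actual implementation: the `distribute_buffers` function, the buffer size, and the round-robin assignment are assumptions made for illustration. It groups rows into buffers and hands each buffer to the next output in turn, which approximates a uniform distribution when every output consumes at the same speed.

```python
from itertools import islice
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def distribute_buffers(rows: Iterable[T], outputs: int, buffer_size: int) -> List[List[List[T]]]:
    """Hypothetical round-robin distributor: returns one list of buffers per output."""
    if outputs < 1 or buffer_size < 1:
        raise ValueError("outputs and buffer_size must be positive")
    per_output: List[List[List[T]]] = [[] for _ in range(outputs)]
    it = iter(rows)
    index = 0
    while True:
        # Pull the next buffer of up to buffer_size rows from the input stream.
        buffer = list(islice(it, buffer_size))
        if not buffer:
            break
        # Hand the buffer to the next output in turn.
        per_output[index % outputs].append(buffer)
        index += 1
    return per_output


if __name__ == "__main__":
    for n, buffers in enumerate(distribute_buffers(range(10), outputs=3, buffer_size=2)):
        print(f"output {n}: {buffers}")
```

With 10 rows, 3 outputs, and a buffer size of 2, outputs 0 and 1 each receive two buffers and output 2 receives one, so no output is ever more than one buffer ahead of another.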
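The following Python sketch illustrates why splitting the load helps when each write is a slow, mostly I/O-bound service call. `FakeCrmService` is a hypothetical stand-in whose `create` method just sleeps to simulate web service latency; it is not the CRM SDK. Each worker thread plays the role of one CRM destination component and writes its share of the records one call at a time.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


class FakeCrmService:
    """Hypothetical stand-in for a CRM web service: each call costs a fixed latency."""

    def __init__(self, latency_s: float) -> None:
        self.latency_s = latency_s
        self._lock = threading.Lock()
        self.created: List[dict] = []

    def create(self, record: dict) -> None:
        # Sleeping simulates the network round trip; it releases the GIL, so threads overlap.
        time.sleep(self.latency_s)
        with self._lock:
            self.created.append(record)


def load(service: FakeCrmService, records: Sequence[dict], destinations: int) -> float:
    """Split records across `destinations` writer threads and return elapsed seconds."""
    shares = [records[i::destinations] for i in range(destinations)]

    def write_share(share: Sequence[dict]) -> None:
        for record in share:  # each destination writes its own share sequentially
            service.create(record)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=destinations) as pool:
        list(pool.map(write_share, shares))  # list() surfaces any exception from a worker
    return time.perf_counter() - start


if __name__ == "__main__":
    records = [{"firstname": f"First{i}", "lastname": f"Last{i}"} for i in range(20)]
    for destinations in (1, 4):
        service = FakeCrmService(latency_s=0.01)
        elapsed = load(service, records, destinations)
        print(f"{destinations} destination(s): {len(service.created)} records in {elapsed:.2f}s")
```

Because this fake service never slows down under concurrent load, the sketch scales almost linearly with the number of threads. A real CRM server shares its CPU, IO, and network among all writers, which is one reason the improvement measured in this post falls short of the thread count.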
To demonstrate the benefit of BDD, I first used a single CRM destination component in my data flow task without BDD, so the data flow wrote data into CRM on a single thread. It took 5 hours and 48 minutes to load 1,000,000 records into the CRM contact entity. Here is what the data flow task looks like.
Next, I used BDD to split the input into 10 outputs, so that the data flow wrote to the CRM contact entity on 10 concurrent threads. This time the data flow finished loading the 1,000,000 records in 2 hours and 3 minutes. Here is what the data flow task looks like.
The following screenshots show the data flow running from the dtexec command line.
That is an improvement of about 2.84 times. It is not surprising that it is not a full 10 times faster, since the concurrent writers still compete for the same client machine, network, and CRM server.
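The arithmetic behind these numbers is easy to check. This small Python snippet converts the two elapsed times, rounded to whole minutes as reported above, into records per second and a speedup factor. Because of that rounding it gives about 2.83 rather than 2.84, and about 47.89 rather than 47.84 records/s for the single-destination baseline.

```python
RECORDS = 1_000_000


def seconds(hours: int, minutes: int) -> int:
    return (hours * 60 + minutes) * 60


single_s = seconds(5, 48)   # single CRM destination, no BDD
bdd_s = seconds(2, 3)       # BDD with 10 outputs

single_rate = RECORDS / single_s   # records per second
bdd_rate = RECORDS / bdd_s
speedup = single_s / bdd_s
efficiency = speedup / 10          # fraction of an ideal 10x speedup

print(f"single: {single_rate:.2f} rec/s, BDD: {bdd_rate:.2f} rec/s")  # → single: 47.89 rec/s, BDD: 135.50 rec/s
print(f"speedup: {speedup:.2f}x, {efficiency:.0%} of ideal")          # → speedup: 2.83x, 28% of ideal
```

In other words, 10 concurrent writers delivered a little under 30% of a perfectly linear speedup in this first test.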
[UPDATE - Apr 24, 2012] To make full use of the BDD component, you need to raise the connection limit imposed by the Microsoft .NET Framework, which by default allows a maximum of 2 concurrent connections per host (e.g. the CRM server) for service calls. To override this limit, modify the DTExec.exe.config and DtsDebugHost.exe.config files under the DTS\binn folder by adding the following connectionManagement section.
<configuration>
  ...
  <system.net>
    <connectionManagement>
      <add address="*" maxconnection="100"/>
    </connectionManagement>
  </system.net>
</configuration>
The above configuration allows up to 100 connections per host at the same time. You may change the number based on your needs.
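If you maintain several machines, you may prefer to script this change rather than edit each file by hand. The following Python sketch uses the standard-library `xml.etree.ElementTree` module to add (or update) the `connectionManagement` entry shown above. The function names `set_max_connection` and `patch_config_file` are hypothetical helpers, not part of SSIS or .NET.

```python
import xml.etree.ElementTree as ET


def _child(parent: ET.Element, tag: str) -> ET.Element:
    """Return the first direct child with the given tag, creating it if missing."""
    for child in parent:
        if child.tag == tag:
            return child
    return ET.SubElement(parent, tag)


def set_max_connection(root: ET.Element, max_connections: int = 100, address: str = "*") -> None:
    """Add or update <system.net>/<connectionManagement>/<add address=... maxconnection=...>."""
    conn_mgmt = _child(_child(root, "system.net"), "connectionManagement")
    for entry in conn_mgmt.findall("add"):
        if entry.get("address") == address:
            entry.set("maxconnection", str(max_connections))  # update the existing entry
            return
    ET.SubElement(conn_mgmt, "add", address=address, maxconnection=str(max_connections))


def patch_config_file(path: str, max_connections: int = 100) -> None:
    """Rewrite a .config file in place. Note: ElementTree drops XML comments when parsing."""
    tree = ET.parse(path)
    set_max_connection(tree.getroot(), max_connections)
    tree.write(path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    root = ET.fromstring("<configuration><startup/></configuration>")
    set_max_connection(root, 100)
    print(ET.tostring(root, encoding="unicode"))
```

To apply it, back up DTExec.exe.config and DtsDebugHost.exe.config and call `patch_config_file` on each. The exact DTS\binn folder depends on your SQL Server version and installation, and a machine with both the 32-bit and 64-bit runtimes typically has a separate copy of each file for each runtime. Because comments are lost on rewrite, you may prefer to edit by hand if your config files contain comments you want to keep.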
After making the above changes, I observed a further performance improvement: I was able to load 0.9 million records within one hour (5 outputs and 10 outputs performed almost identically). Note that this was done on a desktop computer; on a better server with faster IO and more computing power, I am fairly confident that you can load one million records within one hour.
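As a rough check on that claim, the snippet below treats "within one hour" as exactly 3,600 seconds (an upper bound, since the exact elapsed time is not given) and compares the resulting rate with the single-destination baseline from the first test.

```python
RECORDS_LOADED = 900_000
ELAPSED_S = 60 * 60                                  # "within one hour", taken as exactly one hour
BASELINE_RATE = 1_000_000 / ((5 * 60 + 48) * 60)     # single destination, no BDD

rate = RECORDS_LOADED / ELAPSED_S                    # records per second
minutes_for_million = 1_000_000 / rate / 60          # projected time for 1M records at this rate
rate_needed = 1_000_000 / ELAPSED_S                  # rate required to load 1M in one hour

print(f"{rate:.0f} rec/s, {rate / BASELINE_RATE:.2f}x baseline")
print(f"1M records: ~{minutes_for_million:.1f} min; need {rate_needed:.1f} rec/s for 60 min")
```

At roughly 250 records/s, the tuned run is about 5.2 times the single-destination rate, and a further throughput gain of about 11% (to roughly 278 records/s) would be enough to bring one million records under an hour.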
A few facts
- This is not a scientific benchmark.
- My testing was conducted on a four-year-old desktop computer with everything installed on a single box. Its specifications and installed software are as follows:
  - Processor: Intel Core 2 Quad Q9550 @2.83GHz
  - Memory: 8GB PC2-6400 DDR2-SDRAM
  - Hard Disk: Seagate 7200RPM SATA 1.5Gb/s
  - Operating System: Windows Server 2008 R2
  - Database Server: SQL Server 2008 R2
  - Microsoft Dynamics CRM Server 2011 with Rollup 6
  - SSIS Adapter: KingswaySoft SSIS Integration Toolkit for Microsoft Dynamics CRM
- The testing was done in an on-premise environment; your data load performance will differ if you are using CRM Online or a partner-hosted environment.
- I intentionally used the 64-bit dtexec.exe, hoping to take advantage of the SSIS 64-bit runtime. Contrary to what I expected, running the package with the 32-bit dtexec.exe was not slower but about 10% faster than the 64-bit runtime. The reason is probably related to the overhead of memory addressing in the 64-bit runtime.
- My input data is very simple: it has only two fields, firstname and lastname. With more fields, you should expect data load performance to degrade to some extent.
- The single-destination data flow task wrote about 47.84 records per second to the CRM server (54.27 records/s with the 32-bit runtime); you may use this as a baseline if you want to compare your results with mine.
- BDD improves the data load performance by taking advantage of the multi-threading capability of SSIS engine.
- Choose the number of BDD outputs carefully; more is not always better. Depending on your servers' capacity (processor, memory, IO system) and the network latency between your client system and the CRM server, the optimal number could be 3, 5, 10, or something else, which you can find out by testing different values.
- There are many ways to improve data load performance. BDD is just one of the easy ones, and it is the main topic of this blog post.
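Because the best number of outputs has to be found empirically, it can help to script the test runs. The Python sketch below assumes you have saved one copy of the package per BDD output count and that `dtexec` is on your PATH; it runs each package with `dtexec /F` and times the whole run from the outside. The package file names in the usage example are hypothetical, and the sketch assumes you reset the target CRM data between runs so each test loads the same records. The `runner` parameter lets you substitute another launcher, such as the 32-bit dtexec.exe.

```python
import subprocess
import time
from typing import Callable, Dict

Runner = Callable[[str], None]


def run_dtexec(package_path: str) -> None:
    """Run a file-system package with dtexec; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(["dtexec", "/F", package_path], check=True)


def benchmark(packages: Dict[int, str], record_count: int, runner: Runner = run_dtexec) -> Dict[int, float]:
    """Return records/second keyed by BDD output count; values of `packages` are package paths."""
    rates: Dict[int, float] = {}
    for outputs, path in sorted(packages.items()):
        start = time.perf_counter()
        runner(path)  # assumes the target CRM data was reset before each run
        rates[outputs] = record_count / (time.perf_counter() - start)
    return rates


def best_output_count(rates: Dict[int, float]) -> int:
    """Pick the output count with the highest measured throughput."""
    return max(rates, key=rates.__getitem__)
```

For example, `benchmark({3: "LoadContacts_BDD3.dtsx", 5: "LoadContacts_BDD5.dtsx", 10: "LoadContacts_BDD10.dtsx"}, record_count=1_000_000)` returns a rate per output count, and `best_output_count` picks the fastest one. Timing from the outside includes package startup, which is negligible for multi-hour loads but would distort very short test runs.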
Thanks for reading.