Entity Framework Core 5 vs SQLBulkCopy

Entity Framework Core 5 is a great ORM to use and connect to the database with. It is easy to use and easy to understand. It offers just enough for the most common scenarios. So what about inserting big amounts of data in a one go? Would it be fast enough?

Let’s have a look at the code

As my example, I’ll take a very simple entity – a Profile and PrimeHotel repository available here at my GitHub

My DbContext is very simple and it looks like this:

And Profile entity looks like this:

Because I’ll be using WebApi for the ease of demonstration, I’ll create a ProfileController.

For now, it’s pretty empty, but now you get the base that we will start with.

Let’s get profiles… lots of them! 

To test an insert of many entities at once we need to generate a lot of testing data. I like to have my test data as close to real values as possible, so to get those, I’ll use a Bogus nugget package.

Bogus is robust and very easy to use fake data generator. It will generate random values, that will fit a given context, like a surname, age, address, e-mail, company name, and so on. There are dozens of options. Go see for yourself in its documentation

Generating any number of profiles will look like this:

Inserting profiles with Entity Framework Core 5

I don’t want to send all those profiles in a request, because that would be a huge amount of data. Transferring that to a controller and deserialization on the ASP.NET Core 5 side can take a while, and it’s not really the part I want to test. This is why I choose to generate my profiles in the controller method and insert it right after that.

The code for the whole thing is really straightforward:

Additionally, I added a Stopwatch to measure how long does it take to generate profiles as well as insert them. In the end, I’m returning an anonymous type to easily return more than one result at a time.

Finally, let’s test it out. For 1000 profiles I got:

But wait, let’s try something bigger, like 100000 entities:

 

25 seconds? Really? Not that impressive.

What it does underneath? Let’s check with SQL Server Profiler:

SqlBulkCopy to the rescue

SqlBulkCopy is a class that was introduced a while ago, specifically in .Net Framework 2.0 – 18 years ago! SqlBulkCopy will only work to save data in a SQL Server database, but its source can be anything, as long as it’s results can be loaded to DataTable or read by IDataReader.

Let’s have a look at how we can use it in our example.

First, we need to define a DataTable. It needs to represent the Profiles table because this is our destination table that we are going to use.

Then with WriteToServerAsync we are loading profiles to the database. Let’s have a look at how does it looks like in SQL.

How does it work internally? StackOverflow comes with an answer:

SqlBulkCopy does not create a data file. It streams the data table directly from the .Net DataTable object to the server using the available communication protocol (Named Pipes, TCP/IP, etc…) and insert the data to the destination table in bulk using the same technique used by BCP.

Two totally different approaches. What about the performance? Let’s compare those two.

The performance

The outcome of a performance check is not a huge surprise. SqlBulkCopy is meant to insert data fast and it is extremely performant. 

Big differences start to show when you insert more than 10 thousand entities at a time. Over that number, it might be wort to reimplement your code to use SqlBulkCopy instead of Entity Framework Core 5.

What about other operations?

When it comes to handling big amounts of data things are starting to be a little more tricky. You might want to take a look at the database improvements and what data you actually need to operate on. Have in mind that operations on big chunks of data are much faster when done on the database side. 

I had a case once where I needed to perform and update on around a million entities, once a day. I combined a few things and it worked out pretty good.

  • create a temporary table, i.e. T1
  • insert data with SqlBulkCopy
  • perform an update on the database side
  • select any data you might need, i.e. for logging
  • delete a temporary table

I know this is moving business logic inside the database, but some sacrifices are required if this update operation needs to be fast.

All code posted here is available on my GitHub.

Cheers!

 

6 thoughts on “Entity Framework Core 5 vs SQLBulkCopy

  1. with BulkInsert you can also configure the batch size (number of rows) before commit (so has relevance for tranlog size etc), so would be interesting to see what/any impact tweaking that param in your test timings.

  2. I’ve found that SaveChangesASync() is dramatically slower than SaveChanges() non-async, especially for inserting or saving large amounts of data. Its to the point where my implementation is ~ var saveResults = await Task.Run(() => Context.SaveChanges()).ConfigureAwait(false);

    1. Gregory. I just tested and in my case there is no visible difference between SaveChanges() and SaveChangesAsync().
      In your case it might be a matter of handling concurrency. Task.Run creates new thread and ConfigureAwait(false) enforce synchronization with that thread ant the end. This might be the cause of such difference.
      In general we should avoid using Task.Run wherever we can, but have a read about it from Stephen Cleary (https://blog.stephencleary.com/)

      1. No, in my testing it was done before Task.Run was introduced. There must be something else at play, as I was able to recreate it in a sample app. Whenever I have time, I’ll go back and have a deeper look.

  3. – You should change options.UseSqlServer to options.UseSqlServer(options => options.MaxBatchSize(100000)) if you want fewer round trips to the database.
    – Also you should consider that SaveChanges() does change tracking and validation.

Leave a Reply

Your email address will not be published. Required fields are marked *