
Bulk copy with Entity Framework Core 5

MS SQL Server provides functionality to quickly insert large amounts of data. It is called Bulk Copy and is performed by the SqlBulkCopy class. I already compared how fast it is compared to EF Core 5 in this post: https://www.michalbialecki.com/2020/05/03/entity-framework-core-5-vs-sqlbulkcopy-2/, but this time I want to check something different – the linq2db library.

What is Linq2db

Let’s check how Linq2db is described on its website:

LINQ to DB is the fastest LINQ database access library offering a simple, light, fast, and type-safe layer between your POCO objects and your database.

Sounds impressive, and recently I discovered that there is an EF Core extensions package, linq2db.EntityFrameworkCore, which integrates with EF Core and enriches a DbContext with some cool features.

The most interesting are: 

  • Bulk copy (bulk insert)
  • Fast Eager Loading (incomparably faster on massive Include queries)
  • MERGE statement support
  • Temporary Tables support
  • Full-Text Search extensions
  • and a few more

Let’s write some code

In the PrimeHotel project, I have already implemented a SqlBulkCopy method to insert profiles into the database. First I generate profiles with the Bogus library and then insert them. In this example, I'm going to use three methods from the ProfileController:

  • GenerateAndInsert – implemented with pure EF Core
  • GenerateAndInsertWithSqlCopy – implemented with SqlBulkCopy class
  • GenerateAndInsertWithLinq2db – implemented with Linq2db

Let me quickly show you what those three methods look like. The first one is GenerateAndInsert, implemented with pure Entity Framework Core 5.

[HttpPost("GenerateAndInsert")]
public async Task<IActionResult> GenerateAndInsert([FromBody] int count = 1000)
{
    Stopwatch s = new Stopwatch();
    s.Start();

    var profiles = GenerateProfiles(count);
    var generationTime = s.Elapsed.ToString();
    s.Restart();

    primeDbContext.Profiles.AddRange(profiles);
    var insertedCount = await primeDbContext.SaveChangesAsync();

    return Ok(new {
            inserted = insertedCount,
            generationTime = generationTime,
            insertTime = s.Elapsed.ToString()
        });
}

I use a Stopwatch to measure how long it takes to generate profiles with the GenerateProfiles method and how long it takes to insert them.

GenerateAndInsertWithSqlCopy is implemented with the SqlBulkCopy class:

[HttpPost("GenerateAndInsertWithSqlCopy")]
public async Task<IActionResult> GenerateAndInsertWithSqlCopy([FromBody] int count = 1000)
{
    Stopwatch s = new Stopwatch();
    s.Start();

    var profiles = GenerateProfiles(count);
    var generationTime = s.Elapsed.ToString();
    s.Restart();

    var dt = new DataTable();
    dt.Columns.Add("Id");
    dt.Columns.Add("Ref");
    dt.Columns.Add("Forename");
    dt.Columns.Add("Surname");
    dt.Columns.Add("Email");
    dt.Columns.Add("TelNo");
    dt.Columns.Add("DateOfBirth");

    foreach (var profile in profiles)
    {
        dt.Rows.Add(string.Empty, profile.Ref, profile.Forename, profile.Surname, profile.Email, profile.TelNo, profile.DateOfBirth);
    }

    using var sqlBulk = new SqlBulkCopy(connectionString);
    sqlBulk.DestinationTableName = "Profiles";
    await sqlBulk.WriteToServerAsync(dt);

    return Ok(new
    {
        inserted = dt.Rows.Count,
        generationTime = generationTime,
        insertTime = s.Elapsed.ToString()
    });
}

Notice that this implementation is much longer, and I needed to create a DataTable object to pass my data as a table.
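One thing worth noting: by default SqlBulkCopy maps source columns to destination columns by their ordinal position, not by name. Here is a minimal sketch of how explicit column mappings could make the code independent of the column order (the destination column names are assumed to match the Profiles table):

using var sqlBulk = new SqlBulkCopy(connectionString);
sqlBulk.DestinationTableName = "Profiles";

// Map every DataTable column to the destination column with the same name,
// so the order in which the DataTable columns were declared no longer matters.
// Id is left unmapped – the identity value is generated by the database.
foreach (var column in new[] { "Ref", "Forename", "Surname", "Email", "TelNo", "DateOfBirth" })
{
    sqlBulk.ColumnMappings.Add(column, column);
}

await sqlBulk.WriteToServerAsync(dt);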

And finally, the GenerateAndInsertWithLinq2db implementation, which uses the linq2db library.

[HttpPost("GenerateAndInsertWithLinq2db")]
public async Task<IActionResult> GenerateAndInsertWithLinq2db([FromBody] int count = 1000)
{
    Stopwatch s = new Stopwatch();
    s.Start();

    var profiles = GenerateProfiles(count);
    var generationTime = s.Elapsed.ToString();
    s.Restart();

    using (var db = primeDbContext.CreateLinqToDbConnection())
    {
        await db.BulkCopyAsync(new BulkCopyOptions { TableName = "Profiles" }, profiles);
    }

    return Ok(new
    {
        inserted = profiles.Count(),
        generationTime = generationTime,
        insertTime = s.Elapsed.ToString()
    });
}

This method is almost as short as the EF Core one, but it creates a DataConnection with the CreateLinqToDbConnection method.
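BulkCopyOptions can also be used to tune how the copy is performed. A hedged sketch, where the option values are illustrative rather than recommendations:

using (var db = primeDbContext.CreateLinqToDbConnection())
{
    var options = new BulkCopyOptions
    {
        TableName = "Profiles",
        // ProviderSpecific tells linq2db to use the native bulk copy mechanism
        // (SqlBulkCopy for SQL Server) instead of generated INSERT statements.
        BulkCopyType = BulkCopyType.ProviderSpecific,
        // Illustrative batch size, not a measured recommendation.
        MaxBatchSize = 50000
    };

    await db.BulkCopyAsync(options, profiles);
}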

The results

I compared those three methods by inserting 1k, 10k, 50k, 100k, and 500k records. How fast would each be? Let's check.

Records   EF Core   SqlBulkCopy   linq2db bulk insert
1000      0.22      0.035         0.048
10000     1.96      0.2           0.318
50000     9.63      0.985         1.54
100000    19.35     1.79          3.18
500000    104       9.47          16.56

The table above shows the results in seconds. EF Core on its own isn't impressive at all, but combined with the linq2db library it is almost as fast as a bulk copy.

And here is the chart – the lower the value, the better.

Funny thing – while testing I noticed that generating the test data is actually slower than inserting it into the DB, wow 😀

The summary

Linq2db is an impressive library and already offers a lot. From GitHub, it seems to be a well-established project with a bunch of contributors. Knowing that, I'm surprised I haven't come across it earlier.

A bulk insert with linq2db is almost as fast as using the SqlBulkCopy class, but the code is much cleaner and shorter. It is also less error-prone, and I would definitely use it in my projects.

All code posted here is available on my GitHub.

Hope it will be useful for you as well, cheers 😄

Entity Framework Core 5 vs SQLBulkCopy

Entity Framework Core 5 is a great ORM to use and connect to the database with. It is easy to use and easy to understand. It offers just enough for the most common scenarios. So what about inserting a big amount of data in one go? Would it be fast enough?

Let’s have a look at the code

As my example, I'll take a very simple entity – a Profile – and the PrimeHotel repository, available here on my GitHub.

My DbContext is very simple and it looks like this:

    public class PrimeDbContext : DbContext
    {
        public PrimeDbContext(DbContextOptions<PrimeDbContext> options)
            : base(options)
        {
        }

        public virtual DbSet<Room> Rooms { get; set; }

        public virtual DbSet<Profile> Profiles { get; set; }

        public virtual DbSet<Reservation> Reservations { get; set; }
    }

And the Profile entity looks like this:

    public class Profile
    {
        public int Id { get; set; }

        public string Ref { get; set; }

        public string Forename { get; set; }

        public string Surname { get; set; }

        public string TelNo { get; set; }

        public string Email { get; set; }

        public DateTime? DateOfBirth { get; set; }
    }

Because I'll be using Web API for ease of demonstration, I'll create a ProfileController.

    [ApiController]
    [Route("[controller]")]
    public class ProfileController : ControllerBase
    {
        private readonly PrimeDbContext primeDbContext;
        private readonly string connectionString;

        public ProfileController(PrimeDbContext _primeDbContext, IConfiguration _configuration)
        {
            connectionString = _configuration.GetConnectionString("HotelDB");
            primeDbContext = _primeDbContext;
        }
    }

For now it's pretty empty, but this gives us the base that we will start with.
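For completeness, this assumes that PrimeDbContext is registered in the DI container. A minimal sketch of what the registration in Startup.ConfigureServices could look like, assuming the standard Startup class with an injected IConfiguration and the same "HotelDB" connection string name the controller reads:

    public void ConfigureServices(IServiceCollection services)
    {
        services.AddControllers();

        // Register PrimeDbContext so it can be injected into ProfileController.
        // "HotelDB" is the same connection string name used in the controller.
        services.AddDbContext<PrimeDbContext>(options =>
            options.UseSqlServer(Configuration.GetConnectionString("HotelDB")));
    }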

Let’s get profiles… lots of them! 

To test inserting many entities at once we need to generate a lot of test data. I like to have my test data as close to real values as possible, so to get those, I'll use the Bogus NuGet package.

Bogus is a robust and very easy to use fake data generator. It will generate random values that fit a given context, like a surname, age, address, e-mail, company name, and so on. There are dozens of options – go see for yourself in its documentation.

Generating any number of profiles will look like this:

    private IEnumerable<Profile> GenerateProfiles(int count)
    {
        var profileGenerator = new Faker<Profile>()
            .RuleFor(p => p.Ref, v => v.Person.UserName)
            .RuleFor(p => p.Forename, v => v.Person.FirstName)
            .RuleFor(p => p.Surname, v => v.Person.LastName)
            .RuleFor(p => p.Email, v => v.Person.Email)
            .RuleFor(p => p.TelNo, v => v.Person.Phone)
            .RuleFor(p => p.DateOfBirth, v => v.Person.DateOfBirth);

        return profileGenerator.Generate(count);
    }

Inserting profiles with Entity Framework Core 5

I don't want to send all those profiles in a request, because that would be a huge amount of data. Transferring that to a controller and deserializing it on the ASP.NET Core 5 side can take a while, and it's not really the part I want to test. This is why I chose to generate my profiles in the controller method and insert them right after that.

The code for the whole thing is really straightforward:

    [HttpPost("GenerateAndInsert")]
    public async Task<IActionResult> GenerateAndInsert([FromBody] int count = 1000)
    {
        Stopwatch s = new Stopwatch();
        s.Start();

        var profiles = GenerateProfiles(count);
        var generationTime = s.Elapsed.ToString();
        s.Restart();

        primeDbContext.Profiles.AddRange(profiles);
        var insertedCount = await primeDbContext.SaveChangesAsync();

        return Ok(new {
                inserted = insertedCount,
                generationTime = generationTime,
                insertTime = s.Elapsed.ToString()
            });
    }

Additionally, I added a Stopwatch to measure how long it takes to generate the profiles as well as to insert them. At the end, I'm returning an anonymous type to easily return more than one result at a time.

Finally, let’s test it out. For 1000 profiles I got:

But wait, let’s try something bigger, like 100000 entities:

 

25 seconds? Really? Not that impressive.

What does it do underneath? Let's check with SQL Server Profiler:

exec sp_executesql N'SET NOCOUNT ON;
DECLARE @inserted0 TABLE ([Id] int, [_Position] [int]);
MERGE [Profiles] USING (
VALUES (@p0, @p1, @p2, @p3, @p4, @p5, @p6, 0),
(@p7, @p8, @p9, @p10, @p11, @p12, @p13, 1),
(@p14, @p15, @p16, @p17, @p18, @p19, @p20, 2),
(@p21, @p22, @p23, @p24, @p25, @p26, @p27, 3),
...
) AS i ([DateOfBirth], [Email], [Forename], [Ref], [ReservationId], [Surname], [TelNo], _Position) ON 1=0
WHEN NOT MATCHED THEN
INSERT ([DateOfBirth], [Email], [Forename], [Ref], [ReservationId], [Surname], [TelNo])
VALUES (i.[DateOfBirth], i.[Email], i.[Forename], i.[Ref], i.[ReservationId], i.[Surname], i.[TelNo])
OUTPUT INSERTED.[Id], i._Position
INTO @inserted0;

SELECT [t].[Id] FROM [Profiles] t
INNER JOIN @inserted0 i ON ([t].[Id] = [i].[Id])
ORDER BY [i].[_Position];

',N'@p0 datetime2(7),
@p1 nvarchar(4000),
@p2 nvarchar(4000),
@p3 nvarchar(4000),
@p4 int,
@p5 nvarchar(4000),
...
@p0='1995-02-22 09:40:44.0952799',
@p1=N'Sherri_Orn@gmail.com',
@p2=N'Sherri',
...

SqlBulkCopy to the rescue

SqlBulkCopy is a class that was introduced a while ago, specifically in .NET Framework 2.0 – around 15 years ago! SqlBulkCopy will only work to save data in a SQL Server database, but its source can be anything, as long as it can be loaded into a DataTable or read by an IDataReader.
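To illustrate that the source does not have to be a DataTable, here is a hedged sketch of streaming rows straight from an IDataReader into SqlBulkCopy. The connection strings, the query, and the method name are illustrative assumptions, not code from the post:

    public static async Task CopyProfilesAsync(string sourceConnectionString, string targetConnectionString)
    {
        using var source = new SqlConnection(sourceConnectionString);
        await source.OpenAsync();

        using var command = new SqlCommand(
            "SELECT Ref, Forename, Surname, TelNo, Email, DateOfBirth FROM Profiles", source);
        using var reader = await command.ExecuteReaderAsync();

        using var sqlBulk = new SqlBulkCopy(targetConnectionString)
        {
            DestinationTableName = "Profiles",
            // Stream rows as they are read instead of buffering them all in memory.
            EnableStreaming = true
        };

        await sqlBulk.WriteToServerAsync(reader);
    }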

Let’s have a look at how we can use it in our example.

    [HttpPost("GenerateAndInsertWithSqlCopy")]
    public async Task<IActionResult> GenerateAndInsertWithSqlCopy([FromBody] int count = 1000)
    {
        Stopwatch s = new Stopwatch();
        s.Start();

        var profiles = GenerateProfiles(count);
        var generationTime = s.Elapsed.ToString();
        s.Restart();

        var dt = new DataTable();
        dt.Columns.Add("Id");
        dt.Columns.Add("Ref");
        dt.Columns.Add("Forename");
        dt.Columns.Add("Surname");
        dt.Columns.Add("Email");
        dt.Columns.Add("TelNo");
        dt.Columns.Add("DateOfBirth");

        foreach (var profile in profiles)
        {
            dt.Rows.Add(string.Empty, profile.Ref, profile.Forename, profile.Surname, profile.Email, profile.TelNo, profile.DateOfBirth);
        }

        using var sqlBulk = new SqlBulkCopy(connectionString);
        sqlBulk.DestinationTableName = "Profiles";
        await sqlBulk.WriteToServerAsync(dt);

        return Ok(new
        {
            inserted = dt.Rows.Count,
            generationTime = generationTime,
            insertTime = s.Elapsed.ToString()
        });
    }

First, we need to define a DataTable that represents the Profiles table, because this is the destination table that we are going to use.

Then with WriteToServerAsync we load the profiles into the database. Let's have a look at what this looks like in SQL.

select @@trancount; 
SET FMTONLY ON select * from [Profiles] 
SET FMTONLY OFF exec ..sp_tablecollations_100 N'.[Profiles]'

insert bulk [Profiles] (
   [Ref] NVarChar(max) COLLATE SQL_Latin1_General_CP1_CI_AS, 
   [Forename] NVarChar(max) COLLATE SQL_Latin1_General_CP1_CI_AS, 
   [Surname] NVarChar(max) COLLATE SQL_Latin1_General_CP1_CI_AS, 
   [TelNo] NVarChar(max) COLLATE SQL_Latin1_General_CP1_CI_AS, 
   [Email] NVarChar(max) COLLATE SQL_Latin1_General_CP1_CI_AS, 
   [DateOfBirth] DateTime2(7))

How does it work internally? Stack Overflow has the answer:

SqlBulkCopy does not create a data file. It streams the data table directly from the .Net DataTable object to the server using the available communication protocol (Named Pipes, TCP/IP, etc…) and insert the data to the destination table in bulk using the same technique used by BCP.

Two totally different approaches. What about the performance? Let’s compare those two.

The performance

The outcome of a performance check is not a huge surprise. SqlBulkCopy is meant to insert data fast and it is extremely performant. 

Big differences start to show when you insert more than 10 thousand entities at a time. Over that number, it might be worth reimplementing your code to use SqlBulkCopy instead of Entity Framework Core 5.

What about other operations?

When it comes to handling big amounts of data, things start to get a little more tricky. You might want to take a look at possible database improvements and at what data you actually need to operate on. Keep in mind that operations on big chunks of data are much faster when done on the database side.

I once had a case where I needed to perform an update on around a million entities, once a day. I combined a few things and it worked out pretty well:

  • create a temporary table, e.g. T1
  • insert the data into it with SqlBulkCopy
  • perform the update on the database side
  • select any data you might need, e.g. for logging
  • drop the temporary table

I know this is moving business logic inside the database, but some sacrifices are required if this update operation needs to be fast.
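Here is a hedged sketch of how those steps could be wired together on a single connection. The temp table name, the columns, the dataTable variable, and the UPDATE statement are illustrative assumptions, not code from the original project:

    using var connection = new SqlConnection(connectionString);
    await connection.OpenAsync();

    // 1. Create a temporary table shaped like the data we want to update.
    using (var create = new SqlCommand(
        "CREATE TABLE #ProfilesToUpdate (Id INT PRIMARY KEY, Email NVARCHAR(MAX))", connection))
    {
        await create.ExecuteNonQueryAsync();
    }

    // 2. Insert the data with SqlBulkCopy; the same connection must be used,
    //    because the temp table is only visible on this connection.
    using (var sqlBulk = new SqlBulkCopy(connection) { DestinationTableName = "#ProfilesToUpdate" })
    {
        await sqlBulk.WriteToServerAsync(dataTable); // a DataTable with Id and Email columns
    }

    // 3. Perform the update on the database side with a single set-based statement.
    using (var update = new SqlCommand(
        @"UPDATE p SET p.Email = t.Email
          FROM Profiles p
          INNER JOIN #ProfilesToUpdate t ON t.Id = p.Id", connection))
    {
        await update.ExecuteNonQueryAsync();
    }

    // 4. Select anything you need for logging here, then drop the temp table
    //    (it is also removed automatically when the connection closes).
    using (var drop = new SqlCommand("DROP TABLE #ProfilesToUpdate", connection))
    {
        await drop.ExecuteNonQueryAsync();
    }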

All code posted here is available on my GitHub.

Cheers!