
Entity Framework Core – is it fast?

Entity Framework Core is a great ORM that recently reached version 5. Is it fast? Is it faster than its predecessor, Entity Framework 6, which still offers slightly more functionality? Let’s check that out.

The comparison was made by Chad Golden, who benchmarked adding, updating, and deleting 1000 entities. The exact data and code are available on his blog: https://chadgolden.com/blog/comparing-performance-of-ef6-to-ef-core-3

The conclusions are obvious: in almost every test conducted by Chad, Entity Framework Core 3 is faster than Entity Framework 6 – from 2.25 to 4.15 times faster! So if performance is important to your application and it operates on large amounts of data, EF Core should be a natural choice.

Is it faster than Dapper?

Dapper is a very popular object-relational mapper and, like EF Core, it facilitates working with the database. It’s called the king of micro ORMs because it’s very fast and still does some of the work for us. If we compare EF Core and Dapper, we immediately notice that the capabilities of EF Core are much greater. Microsoft’s technology can track objects, migrate the database schema, and interact with the database without hand-written SQL queries. Dapper, on the other hand, maps the objects returned by the database, but you have to write all SQL commands yourself. This certainly allows more freedom in working with the database, but there is a greater risk of making a mistake when writing a SQL query. The same applies to updating the database schema: EF Core can detect changes and generate a migration by itself, while in Dapper you have to write the migration SQL manually.

There is no doubt, however, that Dapper has its supporters, mainly due to its performance. On the blog exceptionnotfound.net we can find a comparison between Entity Framework Core 3 and Dapper version 2.

The benchmark compares three ways of reading from the database: Entity Framework Core with object tracking enabled, EF Core with tracking disabled, and Dapper. Tracking of entities in EF Core can be turned off with the AsNoTracking() method, which makes read operations significantly faster. More information on this test can be found here: https://exceptionnotfound.net/dapper-vs-entity-framework-core-query-performance-benchmarking-2019/
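To illustrate, turning tracking off is a single method call on the query. Here is a minimal sketch, assuming an EF Core DbContext named context with a Users set (the names are mine, not from the benchmark code):

// Read-only query: AsNoTracking() tells EF Core not to set up change tracking
// for the returned entities, which makes large reads noticeably faster.
var users = await context.Users
    .AsNoTracking()
    .ToListAsync();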

Summary

All in all, Dapper is much faster at reading from the database and will certainly be comparably fast when writing. However, it requires writing SQL queries by hand, which can expose the developer to errors. I have personally used Dapper in several projects, and in only one of them was the choice dictated by performance. For simple logic of saving and retrieving data, I would use Entity Framework Core because of its simplicity and the convenience of introducing changes.


Bulk insert in Dapper

Dapper is a simple object mapper, a NuGet package that extends the IDbConnection interface. This powerful package comes in handy when writing simple CRUD operations. The thing I struggle with from time to time is handling big data with Dapper. Handling hundreds of thousands of objects at once brings a whole variety of performance problems. Today I’ll show you how to handle many inserts with Dapper.

The problem

Let’s have a simple repository that inserts users into the database.
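The Users table can be sketched like this (reconstructed from the insert statements below; the identity column and the Name length are my assumptions):

CREATE TABLE [dbo].[Users] (
    [Id] INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- assumed key column
    [Name] NVARCHAR(50) NOT NULL,                -- length assumed
    [LastUpdatedAt] DATETIME NOT NULL
);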

Now let’s have a look at the code:

public async Task InsertMany(IEnumerable<string> userNames)
{
    using (var connection = new SqlConnection(ConnectionString))
    {
        await connection.ExecuteAsync(
            "INSERT INTO [Users] (Name, LastUpdatedAt) VALUES (@Name, getdate())",
            userNames.Select(u => new { Name = u })).ConfigureAwait(false);
    }
}

Very simple code that takes user names and passes a collection of objects to Dapper’s ExecuteAsync extension method. This is a wonderful shortcut: instead of one object, you can pass a collection, and the SQL is run for every element. No need to write a loop for that! But how is this done in Dapper? Luckily for us, Dapper’s code is open and available on GitHub. In SqlMapper.Async.cs, around line 590, you can see how a collection parameter is handled.
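The exact source changes from version to version, but the essence of it is a plain loop; here is a paraphrase (not Dapper’s literal code):

// Paraphrase of Dapper's handling of a collection parameter (not the literal source):
// the command is prepared once, then executed once per element.
foreach (var obj in multiExec)
{
    paramReader(cmd, obj);                     // bind the current object's values
    total += await cmd.ExecuteNonQueryAsync(); // one database round trip per object
}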

It is a loop. Fine, nothing wrong with that… as long as you don’t need to work with big data. With this approach, you end up making a round trip to the database for every object in the list. We can do better.

What if we could…

What if we could merge multiple insert statements into one big SQL statement? This brilliant idea came from my colleague, Miron. Thanks, bro! :) So instead of sending one statement per user, roughly like this:
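-- illustration reconstructed from the code below; values are placeholders
INSERT INTO [Users] (Name, LastUpdatedAt) VALUES ('User1', getdate())
INSERT INTO [Users] (Name, LastUpdatedAt) VALUES ('User2', getdate())
INSERT INTO [Users] (Name, LastUpdatedAt) VALUES ('User3', getdate())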

We can have:
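-- a single statement inserting many rows at once; values are placeholders
INSERT INTO [Users] (Name, LastUpdatedAt)
VALUES
    ('User1', getdate()),
    ('User2', getdate()),
    ('User3', getdate())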

The limit here is 1000, because SQL Server does not allow more than 1000 value rows in a single INSERT statement. The code gets a bit more complicated, because we need to create a separate statement for every 1000 users.

public async Task InsertInBulk(IList<string> userNames)
{
    var sqls = GetSqlsInBatches(userNames);
    using (var connection = new SqlConnection(ConnectionString))
    {
        foreach (var sql in sqls)
        {
            await connection.ExecuteAsync(sql);
        }
    }
}

private IList<string> GetSqlsInBatches(IList<string> userNames)
{
    var insertSql = "INSERT INTO [Users] (Name, LastUpdatedAt) VALUES ";
    var valuesSql = "('{0}', getdate())";
    var batchSize = 1000;

    var sqlsToExecute = new List<string>();
    var numberOfBatches = (int)Math.Ceiling((double)userNames.Count / batchSize);

    for (int i = 0; i < numberOfBatches; i++)
    {
        var userToInsert = userNames.Skip(i * batchSize).Take(batchSize);
        var valuesToInsert = userToInsert.Select(u => string.Format(valuesSql, u));
        sqlsToExecute.Add(insertSql + string.Join(',', valuesToInsert));
    }

    return sqlsToExecute;
}

Let’s compare!

The code is nice and tidy, but is it faster? To check it, I used a local database and a simple user name generator that just produces random 10-character strings.

public async Task<JsonResult> InsertInBulk(int? number = 100)
{
    var userNames = new List<string>();
    for (int i = 0; i < number; i++)
    {
        userNames.Add(RandomString(10));
    }

    var stopwatch = new Stopwatch();
    stopwatch.Start();

    await _usersRepository.InsertInBulk(userNames);

    stopwatch.Stop();
    return Json(new
    {
        users = number,
        time = stopwatch.Elapsed
    });
}
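The RandomString helper is not shown above, so here is a minimal sketch that matches how it is used; the exact implementation is my assumption:

private static readonly Random Random = new Random();

// Assumed implementation: a random string of the given length,
// built from capital letters and digits.
private static string RandomString(int length)
{
    const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    return new string(Enumerable.Range(0, length)
        .Select(_ => chars[Random.Next(chars.Length)])
        .ToArray());
}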

I tested this code for 100, 1000, 10k, and 100k users. The results surprised me.

The more users I added, the bigger the performance gain. For 10k users it’s a 42x improvement, and for 100k users it’s 48x. This is awesome!

It’s not safe

Immediately after posting this article, I got comments saying that this code is not safe. Joining raw strings like that into a SQL statement is a major security flaw, because it is exposed to SQL injection. And that is something we need to take care of. So I went with the approach that Nicholas Paldino suggested in his comment and used DynamicParameters to pass values to my SQL statement.

public async Task SafeInsertMany(IEnumerable<string> userNames)
{
    using (var connection = new SqlConnection(ConnectionString))
    {
        var parameters = userNames.Select(u =>
            {
                var tempParams = new DynamicParameters();
                tempParams.Add("@Name", u, DbType.String, ParameterDirection.Input);
                return tempParams;
            });

        await connection.ExecuteAsync(
            "INSERT INTO [Users] (Name, LastUpdatedAt) VALUES (@Name, getdate())",
            parameters).ConfigureAwait(false);
    }
}


This code works fine; however, its performance is comparable to the regular approach, so it is not really a way to insert big amounts of data. An ideal way to go here is to use SqlBulkCopy and forget about Dapper.
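For completeness, here is a minimal SqlBulkCopy sketch for the same Users table. Note the assumptions: the schema from earlier in this post, and the timestamp set client-side instead of with getdate():

public async Task BulkCopyUsers(IEnumerable<string> userNames)
{
    // Build an in-memory table matching the destination columns.
    var table = new DataTable();
    table.Columns.Add("Name", typeof(string));
    table.Columns.Add("LastUpdatedAt", typeof(DateTime));
    foreach (var name in userNames)
    {
        table.Rows.Add(name, DateTime.UtcNow); // getdate() equivalent set client-side
    }

    using (var connection = new SqlConnection(ConnectionString))
    {
        await connection.OpenAsync();
        using (var bulkCopy = new SqlBulkCopy(connection))
        {
            bulkCopy.DestinationTableName = "Users";
            bulkCopy.ColumnMappings.Add("Name", "Name");
            bulkCopy.ColumnMappings.Add("LastUpdatedAt", "LastUpdatedAt");
            await bulkCopy.WriteToServerAsync(table); // one streamed bulk operation
        }
    }
}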

All the code posted here can be found on my GitHub: https://github.com/mikuam/Blog

I know that there is a commercial Dapper extension that helps with bulk operations. But wouldn’t it be nice to have a free NuGet package for it? What do you think?

Code review #1 – dapper and varchar parameters

This is the first post about great code review feedback that I either gave or received. Each post will consist of 3 parts: context, review feedback, and explanation. You can read more about the series here: https://www.michalbialecki.com/2019/06/21/code-reviews/. So let’s not wait any longer and get to it.

The context

This is a simple ASP.NET application that queries the database for a count of elements filtered by one parameter. In this case, we need the number of users for a given country code, which is always a two-character string.

This is what the relevant part of the DB schema looks like:
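A sketch of the Users table; only the CountryCode type matters for this review, the other columns are assumed:

CREATE TABLE [dbo].[Users] (
    [Id] INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- assumed
    [Name] NVARCHAR(50) NULL,                    -- assumed
    [CountryCode] VARCHAR(2) NULL                -- two 1-byte characters
);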

The code in the .NET app is written with the Dapper NuGet package, which extends the IDbConnection interface and offers entity mapping with considerably good performance. It looks like this:

public async Task<int> GetCountByCountryCode(string countryCode)
{
    using (var connection = new SqlConnection(ConnectionString))
    {
        return await connection.QuerySingleAsync<int>(
            "SELECT count(*) FROM [Users] WHERE CountryCode = @CountryCode",
            new { CountryCode = countryCode }).ConfigureAwait(false);
    }
}

Looks pretty standard, right? What is wrong here then?

Review feedback

Please convert the countryCode parameter to an ANSI string in the GetCountByCountryCode method, because if you use it like that, it’s not optimal.

Explanation

Notice that CountryCode in the database schema is a varchar(2), which stores two 1-byte characters. The nvarchar type, by contrast, uses 2 bytes per character and can store multilingual data. .NET strings are Unicode by default, so if we pass countryCode as a plain string, it will be sent to SQL Server as nvarchar, and an implicit conversion will be needed before it can be compared with the varchar column.

The correct code should look like this:

public async Task<int> GetCountByCountryCodeAsAnsi(string countryCode)
{
    using (var connection = new SqlConnection(ConnectionString))
    {
        return await connection.QuerySingleAsync<int>(
            "SELECT count(*) FROM [Users] WHERE CountryCode = @CountryCode",
            new { CountryCode = new DbString { Value = countryCode, IsAnsi = true, Length = 2 } })
            .ConfigureAwait(false);
    }
}

If we run SQL Server Profiler and check what requests we are actually sending, this is what we will get:
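The two captured statements look roughly like this (the value 'PL' is a placeholder; this is the usual shape of parameterized queries in Profiler, not the literal capture):

exec sp_executesql N'SELECT count(*) FROM [Users] WHERE CountryCode = @CountryCode',
    N'@CountryCode nvarchar(4000)', @CountryCode = N'PL'

exec sp_executesql N'SELECT count(*) FROM [Users] WHERE CountryCode = @CountryCode',
    N'@CountryCode varchar(2)', @CountryCode = 'PL'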

As you can see, the first query has to convert the CountryCode parameter from nvarchar(4000) to varchar(2) in order to compare it, while the second one passes it as varchar(2) right away.

In order to check how that impacts performance, I created a SQL table with 1,000,000 (one million) records and compared the results.

Before the review, the query took 242 milliseconds; after the review, it took only 55 milliseconds. So as you can see, that is more than a 4x performance improvement in this specific case.

All the code posted here can be found on my GitHub: https://github.com/mikuam/console-app-net-core