Monthly Archives: March 2018

Azure Cosmos DB – key-value database in the cloud

The Azure Cosmos DB Table API is a key-value store hosted in the cloud. It’s part of Azure Cosmos DB, Microsoft’s multi-model database. It’s a globally distributed, low-latency, high-throughput solution with client SDKs available for .NET, Java, Python, and Node.js.

Interestingly, Microsoft guarantees that for a typical 1 KB item a read will take under 10 ms and an indexed write under 15 ms, with a median under 5 ms, backed by a 99.99% availability SLA.

Image from https://docs.microsoft.com/en-us/azure/cosmos-db/media/introduction/

Why NoSQL?

First of all, if you’re not that familiar with the differences between relational and non-relational databases, go have a look at my short article about it: https://www.michalbialecki.com/2018/03/16/relational-vs-non-relational-databases/

NoSQL databases store data without enforcing relations, strict consistency, or transactions. What matters most here is scalability and performance. They gained their popularity thanks to Web 2.0 companies like Facebook, Google, and Amazon.

Different data organization – data can be kept in a few different forms, like key-value pairs, columns, documents, or graphs.

No data consistency – there are no triggers, foreign keys, or relations to guard data consistency, so the application needs to be prepared for that.

Horizontal scaling – scaling easily by adding more machines rather than by adding more power to an existing machine.

What is a key-value database

It is data storage designed for simple key-value pairs, where a key is a unique identifier that has a value assigned to it. It is similar in concept to a dictionary or hash map. Unlike relational databases, key-value databases don’t have a predefined structure, and every row can have a different collection of fields.
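To make the dictionary analogy concrete, here is a minimal in-memory sketch. It is not Cosmos DB code; the class name KeyValueSketch and the sample keys and fields are made up for illustration. It only shows that each key maps to a value and that two rows do not have to share the same fields.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only: a key-value store modeled as a dictionary
// whose values are free-form bags of fields.
public static class KeyValueSketch
{
    public static Dictionary<string, Dictionary<string, object>> BuildStore()
    {
        var store = new Dictionary<string, Dictionary<string, object>>();

        // The first row has a single field.
        store["account-1"] = new Dictionary<string, object> { ["Balance"] = 100.0 };

        // The second row has an extra field; no schema had to be changed.
        store["account-2"] = new Dictionary<string, object>
        {
            ["Balance"] = 25.5,
            ["Owner"] = "Ben Smith"
        };

        return store;
    }

    public static void Main()
    {
        var store = BuildStore();

        // A lookup by key either finds the whole row or finds nothing.
        if (store.TryGetValue("account-2", out var row))
        {
            Console.WriteLine(row["Owner"]);
        }
    }
}
```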

Using CosmosDB Table API

To start using the CosmosDB Table API, go to the Azure Portal and create a CosmosDB account for table storage. Create a table; in my example its name is accounts. Then copy the primary connection string. This is all you need.

Now let’s have a look at a simple retrieve operation.

// Create a retrieve operation that takes a customer entity.
TableOperation retrieveOperation = TableOperation.Retrieve<CustomerEntity>("Smith", "Ben");

// Execute the retrieve operation.
TableResult retrievedResult = table.Execute(retrieveOperation);

Notice that in order to get an entity you need to provide two keys: Smith and Ben. This is because every table entity has a PartitionKey and a RowKey property. A RowKey is unique within one PartitionKey, and the combination of both is unique per table. This gives you a great opportunity to partition data inside one table without having to build your own mechanism.
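The uniqueness rule for PartitionKey and RowKey can be sketched with a plain dictionary keyed by a tuple. This is only an in-memory illustration, not the Table API; PartitionedTable, TryInsert and RowsInPartition are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only: rows are identified by the pair
// (PartitionKey, RowKey), so the same RowKey may appear in different partitions.
public class PartitionedTable
{
    private readonly Dictionary<(string PartitionKey, string RowKey), string> _rows =
        new Dictionary<(string PartitionKey, string RowKey), string>();

    // Returns false when the (partitionKey, rowKey) pair is already taken.
    public bool TryInsert(string partitionKey, string rowKey, string value)
    {
        var key = (partitionKey, rowKey);
        if (_rows.ContainsKey(key))
        {
            return false;
        }

        _rows.Add(key, value);
        return true;
    }

    // All values stored under one partition key.
    public IEnumerable<string> RowsInPartition(string partitionKey) =>
        _rows.Where(r => r.Key.PartitionKey == partitionKey).Select(r => r.Value);

    public static void Main()
    {
        var table = new PartitionedTable();
        table.TryInsert("Smith", "Ben", "first customer");
        table.TryInsert("Smith", "Anna", "second customer");
        table.TryInsert("Jones", "Ben", "same RowKey, other partition");

        // The pair ("Smith", "Ben") already exists, so this insert is rejected.
        Console.WriteLine(table.TryInsert("Smith", "Ben", "duplicate"));
        Console.WriteLine(table.RowsInPartition("Smith").Count());
    }
}
```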

Before you start coding, install Microsoft.Azure.CosmosDB.Table, Microsoft.Azure.DocumentDB and Microsoft.Azure.KeyVault.Core. The last package you need is Microsoft.Azure.Storage.Common, which I installed in version v8.6.0-preview (you need to check “include prerelease” in the NuGet package manager to see it). It might work with a newer version, but none is available as I write this text.

You can create a table client in such a way:

    var storageAccount = CloudStorageAccount.Parse(CosmosBDConnectionString);
    var tableClient = storageAccount.CreateCloudTableClient();
    var accountsTable = tableClient.GetTableReference("accounts");

An entity that I use for my accounts table looks like this:

using Microsoft.Azure.CosmosDB.Table;

namespace MichalBialecki.com.ServiceBus.Examples
{
    public class AccountEntity : TableEntity
    {
        public AccountEntity() { }

        public AccountEntity(string partition, string accountNumber)
        {
            PartitionKey = partition;
            RowKey = accountNumber;
        }

        public double Balance { get; set; }
    }
}

Notice that next to the constructor taking PartitionKey and RowKey as parameters there is also a parameterless one. It has to be there, because the SDK needs it to create the entity when reading it from the table.

In the example that I wrote, I need to update an account balance by the amount I’m given. In order to do that, I need to retrieve the entity and update it if it exists, or add it if it doesn’t. The code might look like this:

    private static readonly object _lock = new object();

    public double UpdateAccount(int accountNumber, double amount)
    {
        lock(_lock)
        {
            return UpdateAccountThreadSafe(accountNumber, amount);
        }
    }

    private double UpdateAccountThreadSafe(int accountNumber, double amount)
    {
        var getOperation = TableOperation.Retrieve<AccountEntity>(PartitionKey, accountNumber.ToString());
        var result = accountsTable.Execute(getOperation);
        if (result.Result != null)
        {
            var account = result.Result as AccountEntity;
            account.Balance += amount;
            var replaceOperation = TableOperation.Replace(account);
            accountsTable.Execute(replaceOperation);

            return account.Balance;
        }
        else
        {
            var account = new AccountEntity
            {
                PartitionKey = PartitionKey,
                RowKey = accountNumber.ToString(),
                Balance = amount
            };
            accountsTable.Execute(TableOperation.Insert(account));

            return amount;
        }
    }

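I used a locking mechanism to make sure that this operation is atomic. I made this class a singleton that I want to use in parallel while processing service bus messages, so without the lock two threads could read the same balance and one update would be lost. Keep in mind that the lock only protects calls made within a single process.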
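The effect of the lock can be tried out without any Azure account. The sketch below swaps the table for an in-memory dictionary (InMemoryAccounts is a made-up class, not part of my project) and runs the same read-modify-write from many threads. With the lock in place, a thousand updates of 1.0 should leave the balance at 1000.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical illustration only: the same retrieve-then-update pattern,
// with a dictionary standing in for the accounts table.
public class InMemoryAccounts
{
    private readonly object _lock = new object();
    private readonly Dictionary<int, double> _balances = new Dictionary<int, double>();

    public double UpdateAccount(int accountNumber, double amount)
    {
        // Read, modify and write happen under one lock, so no update is lost.
        lock (_lock)
        {
            // current stays 0 when the account does not exist yet.
            _balances.TryGetValue(accountNumber, out var current);
            var updated = current + amount;
            _balances[accountNumber] = updated;
            return updated;
        }
    }

    public static void Main()
    {
        var accounts = new InMemoryAccounts();

        // Simulate many messages being processed in parallel.
        Parallel.For(0, 1000, i => { accounts.UpdateAccount(42, 1.0); });

        // Adding 0 just reads the current balance.
        Console.WriteLine(accounts.UpdateAccount(42, 0.0));
    }
}
```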

After reading a bunch of messages, my table looks like this. Note that Balance is saved there without a need to define it in any schema.

If you’re interested in more simple examples, you can find them on this Microsoft page: https://docs.microsoft.com/en-us/azure/cosmos-db/tutorial-develop-table-dotnet.

If you’re interested in CosmosDB document storage, go to my article about it: https://www.michalbialecki.com/2017/12/30/getting-started-with-cosmosdb-in-azure-with-net-core/

Relational vs non-relational databases

Both relational and non-relational databases cover a rather wide variety of possibilities and implementations, but I’ll focus on the main differences between the two. First of all, it is about how data is managed. In relational databases you can use SQL, which is a simple and lightweight language for writing database scripts. Non-relational databases typically do not support it, so you might refer to them as NoSQL databases.

The second big difference is the structure in which data is stored. In relational databases data is divided into tables that may have relations between them. With the support of primary keys, triggers, and functions you are capable of creating complex dependencies between tables. This will most likely represent the business model and logic inside the database. NoSQL databases are based on very simple structures, like key-value storage or a graph, that do not support such relations.

Image from: https://codewave.com/insights/nagesh-on-when-to-use-mongodb-and-why/

Short summary

Knowing that upfront, let’s have a short summary of the two:

[table id=1 /]

What should I use?

If you’re starting a new project and you’re wondering what to use, it’s a good opportunity to consider a NoSQL database. I would especially encourage you to try it for small and pet projects, because the best-known ones, like MongoDB and Couchbase, are available as SaaS. NoSQL is also relevant for big projects, because it offers great scaling and is very easy to set up. On the other hand, non-relational databases may seem too simple or even limited because of the lack of triggers, stored procedures, and joins. Using another data management system can also be a risk, because it brings overhead for the team, who need to get accustomed to it. However, I strongly recommend trying it.

If you’re interested in NoSQL databases, check out my post about document storage – Azure Cosmos DB: Getting started with CosmosDB in Azure with .NET Core

Getting started with Microsoft Orleans

Microsoft Orleans is a developer-friendly framework for building distributed, high-scale computing applications. It does not require the developer to implement a concurrency and data storage model. Instead, it requires the developer to use predefined building blocks and enforces that the application is built in a certain way. As a result, Microsoft Orleans gives the developer a framework with exceptional performance.

Orleans has proved its strengths in many scenarios, the most recognizable being the cloud services for the Halo 4 and Halo 5 games.

The framework

Microsoft Orleans is a framework built on the actor model. It is not a new idea in computer science, as it originated in 1973. It is a model of concurrent computation that treats actors as universal primitives. Just as everything is an object in object-oriented programming, here everything is an actor. An actor is an entity that, when it receives a message, can:

  • send a finite number of messages to other actors
  • create a finite number of new actors
  • designate the behavior to be used for the next message it receives

Every operation is asynchronous: it returns a Task, so operations on different actors can be handled simultaneously. In Orleans actors are called grains, and they behave almost like singletons, so it is almost impossible to execute work on the same actor in parallel. Grains can hold and persist their state, so every actor can have its own data that it manages. I mentioned that operations can be executed in parallel, which means we cannot be sure that certain operations will be executed before others. It also means we cannot be sure that the application state is consistent at any given moment, so Microsoft assures eventual consistency: the state may not be correct right now, but we know it eventually will be. Orleans also handles errors gracefully: if a grain fails, it will be created anew and its state will be recovered.
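The idea that every call returns a Task while a single actor still handles one message at a time can be sketched in plain C#. This is not the Orleans API; CounterActor is a made-up class, and the SemaphoreSlim is just my assumption of the simplest way to imitate the one-turn-at-a-time behaviour.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration only: an "actor" that owns its state and
// lets only one asynchronous operation touch it at a time.
public class CounterActor
{
    private readonly SemaphoreSlim _turn = new SemaphoreSlim(1, 1);
    private int _count;

    public int Count => _count;

    public async Task<int> IncrementAsync()
    {
        // Wait for our turn; other callers queue up here.
        await _turn.WaitAsync();
        try
        {
            var current = _count;
            // Simulate asynchronous work in the middle of the operation.
            await Task.Yield();
            _count = current + 1;
            return _count;
        }
        finally
        {
            _turn.Release();
        }
    }

    // Sends the given number of messages to one actor from many threads.
    public static async Task<int> RunAsync(int messages)
    {
        var actor = new CounterActor();
        var calls = new Task<int>[messages];
        for (var i = 0; i < messages; i++)
        {
            calls[i] = Task.Run(() => actor.IncrementAsync());
        }

        await Task.WhenAll(calls);
        return actor.Count;
    }

    public static void Main()
    {
        Console.WriteLine(RunAsync(100).GetAwaiter().GetResult());
    }
}
```

Because the operations never interleave on the same actor, no increment is lost even though the callers run in parallel.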

An example

Let’s assume that e-mail accounts are grains and that the operations on an actor are just sending and removing e-mails. A model of an actor can look like this:

Now sending an e-mail means that we need to have at least two e-mail accounts involved.

Every grain manages its own state and no one else can access it. When a grain receives a message to send an e-mail, it sends messages to all recipient actors that should be notified, and they update their state. It is a very simple scenario with clear responsibilities. Now, if we follow the rule that everything is an actor, then we can say that an e-mail message is also an actor that handles its own state, and that every property of an account can be an actor. It can go as deep as we need it to; however, simpler solutions are just easier to maintain.
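The e-mail scenario can be sketched in plain C# as well. Again, this is not Orleans code; EmailAccount, SendAsync and ReceiveAsync are hypothetical names. The point is only that each account keeps its state private and changes another account solely by sending it a message.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical illustration only: each account owns its inbox and sent items,
// and other accounts can reach that state only through messages.
public class EmailAccount
{
    private readonly object _gate = new object();
    private readonly List<string> _inbox = new List<string>();
    private readonly List<string> _sent = new List<string>();

    public EmailAccount(string address)
    {
        Address = address;
    }

    public string Address { get; }

    public int InboxCount { get { lock (_gate) { return _inbox.Count; } } }

    public int SentCount { get { lock (_gate) { return _sent.Count; } } }

    // Message handled by the recipient: it updates only its own state.
    public Task ReceiveAsync(string message)
    {
        lock (_gate) { _inbox.Add(message); }
        return Task.CompletedTask;
    }

    // Message handled by the sender: record the e-mail, then notify recipients.
    public async Task SendAsync(string message, IEnumerable<EmailAccount> recipients)
    {
        lock (_gate) { _sent.Add(message); }
        foreach (var recipient in recipients)
        {
            await recipient.ReceiveAsync(message);
        }
    }

    public static void Main()
    {
        var alice = new EmailAccount("alice@example.com");
        var bob = new EmailAccount("bob@example.com");
        var carol = new EmailAccount("carol@example.com");

        alice.SendAsync("Hello!", new[] { bob, carol }).GetAwaiter().GetResult();

        Console.WriteLine($"{alice.Address} sent {alice.SentCount}, {bob.Address} received {bob.InboxCount}");
    }
}
```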

Where can I use it?

The actor model is best suited for data that is well grained, so that actors can be easily identified and their state can be easily decoupled. Accessing data is instant for an actor, because it holds it in memory, and the same goes for notifying other actors. Taking that into account, Microsoft Orleans will be most beneficial where an application needs to handle many small operations that change application state. With traditional storage, for example an SQL database, the application needs to handle concurrency when accessing the data, whereas in Orleans data is well divided. You may think that there have to be data updates that change shared storage, but that’s a matter of changing the way the architecture is planned.

You can think of an actor as a micro-service with its own database and a message bus to other micro-services. There can be millions of such micro-services, but each of them will be unique and will hold its own state.

If you’re interested in an introduction by one of the creators of Orleans, have a look at this: https://youtu.be/7CWEc8dBH38?t=412

There’s also a very good example of usage by NCR here: https://www.youtube.com/watch?v=hI9hjwwaWBw