CQRS and Event Sourcing in backend systems
Software development is very common nowadays. New apps are born every single day. You can start a programming career by loading your brain with one or two related technologies and building your first todo application for practice. But real professional depth and the ability to architect outstanding systems require some fundamental knowledge, including an understanding of architectural approaches in software. Such thoughts led me to a workshop by Dino Esposito. He gave me an understanding of the CQRS concept. Now I would like to share my vision of it and how it applies to real life.
This post is about CQRS as an architectural approach and Event Sourcing as another one. They are separate approaches, but they can work together, and they usually do.
What is CQRS?
If you are hearing the term for the first time, it is a short abbreviation for Seek (if) You Are Ass
I'm kidding, of course. In fact, it stands for:
Command and Query Responsibility Segregation.
Try repeating this phrase in your mind once more, maybe in reverse order: Segregation of Responsibility between Query and Command. In fact, that should explain everything, and I could finish the post here. But to really understand it, we need to go a bit deeper.
First of all, a Command is any workflow that alters the current state of the system. A Query is any workflow that reports a view of the current state of the system without changing it.
When you change something, it is a command; when you get something to view, it is a query.
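To make the distinction concrete, here is a minimal sketch in Python. The names `DepositMoney`, `AccountService`, `handle` and `get_balance` are invented for illustration and do not come from any particular framework; the only point is that one method changes state and returns nothing to display, while the other reads state and changes nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepositMoney:
    """A command: a request to change the state of the system (hypothetical)."""
    account_id: str
    amount: int


class AccountService:
    """Illustrative service with one command handler and one query."""

    def __init__(self):
        self._balances = {}

    def handle(self, command):
        """Command side: alters state and returns nothing to display."""
        if command.amount <= 0:
            raise ValueError("amount must be positive")
        current = self._balances.get(command.account_id, 0)
        self._balances[command.account_id] = current + command.amount

    def get_balance(self, account_id):
        """Query side: reports state and never changes it."""
        return self._balances.get(account_id, 0)


service = AccountService()
service.handle(DepositMoney("acc-1", 100))
service.handle(DepositMoney("acc-1", 50))
balance = service.get_balance("acc-1")
```

In a real CQRS system the two sides would usually live in separate stacks; they share one class here only to keep the sketch short.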
Let's take a look at the diagram to see the separation between command and query.
Developers started to use this approach around 2010, when building financial systems. Those systems had to be fast and handle many transactions very quickly, and the canonical approach could no longer cope under such conditions.
How do you canonically perform an operation? You have some code that performs checks, maybe applies a business logic transformation, and then puts data into the database. Once that completes, you return the result of the data update operation, for example a new object with its ID, to the UI. All is clear. The next picture shows the difference.
When you need to perform thousands of such operations in fractions of a second, it becomes challenging to keep the response to the user fast. Users have to wait for a routine operation that completes successfully in 99% of cases.
This is where the CQRS approach comes on stage. You divide your stack into two parallel streams: one for commands (entering data), which can be asynchronous in some sense, and one for queries (reading data). Both can then be made very fast and convenient.
Let's think of a simple example. You buy something online or, even simpler, you transfer money from your bank account. If you want to do the operation, you're 99% sure that you have the needed funds. So the application can simply start performing the transaction and notify the user that it's successful. In most cases it will be. Even if it's not, the system can roll back the transaction and report the error to you later, keeping your initial request fast and convenient.
Users usually want the transaction to be processed immediately and to see a positive green label saying it's completed. The main thing is that processing the transaction feels very fast.
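The money transfer flow can be sketched roughly as follows. This is an assumption-laden toy: `TransferMoney`, `submit` and `process_pending` are hypothetical names, an in-memory queue stands in for a real message broker, and a failed transfer is simply recorded for later reporting instead of being rolled back.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferMoney:
    """Hypothetical command describing a transfer between two accounts."""
    source: str
    target: str
    amount: int


balances = {"alice": 100, "bob": 20}
pending = deque()   # commands accepted but not yet applied
failed = []         # commands that could not be applied


def submit(command):
    """Accept the command and answer right away, before any real work."""
    pending.append(command)
    return "accepted"


def process_pending():
    """Background step: apply queued transfers and record the ones that fail."""
    while pending:
        command = pending.popleft()
        if balances.get(command.source, 0) < command.amount:
            failed.append(command)  # the user is told about this later
            continue
        balances[command.source] -= command.amount
        balances[command.target] = balances.get(command.target, 0) + command.amount


first = submit(TransferMoney("alice", "bob", 70))
second = submit(TransferMoney("alice", "bob", 70))  # not enough funds by then
process_pending()
```

Both calls to `submit` return immediately; only the background step discovers that the second transfer cannot go through.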
Let's take a look at another picture, which demonstrates the second example: any modern social network.
They deal with a huge number of data operations: you post something, like something, comment on something... It would be impossible to use the canonical model and wait for a response while the system writes your like everywhere it is needed, and then perform very complicated data gathering every time the data is read.
That's why they use CQRS and different data representations. When you open your Facebook page, you see a kind of snapshot of what you should see at the moment. It could be pre-calculated for your user account in some temporary storage, and it might not be the very latest data. But it performs well for the user.
When you perform an action, the code produces a range of commands, separately from the read model (your view is not even updated immediately on the backend): a command to add a like to a post, a command to notify all post observers about your like, a command to produce push notifications for mobile users, and so on.
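As a rough illustration of a single "like" fanning out into several commands, consider the sketch below. The command names (`AddLike`, `NotifyObserver`, `SendPushNotifications`) and the shape of `feed_snapshot` are made up for this example and do not describe how any real social network works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    """Hypothetical generic command: a name plus a free-form payload."""
    name: str
    payload: dict


# Pre-calculated read model: what the user currently sees in the feed.
feed_snapshot = {"post-1": {"likes": 10}}


def commands_for_like(user_id, post_id, observers):
    """Translate one user action into the commands the write side will run."""
    commands = [Command("AddLike", {"user": user_id, "post": post_id})]
    for observer in observers:
        commands.append(
            Command("NotifyObserver", {"observer": observer, "post": post_id})
        )
    commands.append(Command("SendPushNotifications", {"post": post_id}))
    return commands


commands = commands_for_like("user-42", "post-1", ["user-1", "user-2"])
names = [command.name for command in commands]
```

Producing the commands does not touch `feed_snapshot`; the read model only catches up once the write side has processed them.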
Benefits of CQRS approach
- Distinct Optimization
- Scalability Potential
- Simplified Design
- Hassle-Free Stack Enhancement
- Teams Scalability
You can think of scalability for code, too. For example, when you need to change something in a canonical workflow, you usually need to change both read and write operations and then test everything.
But with CQRS you can work on just one part of the system without touching the others. Profit.
You can even scale team-wise. A CQRS-based project can have one team working on the command stack and a different team on the query stack, which brings segregation to team management as well on bigger projects.
Event Sourcing is about ensuring that all changes made to the application state during the entire lifetime of the application are stored as a sequence of events.
In fact, we live in a world of events: you did something, stepped off the path, killed a butterfly, and the world changed and will never roll that change back. Remember Ray Bradbury's short story A Sound of Thunder?
Everything we get in life, or in any system, is events. But we tend to think in terms of models, because it's easier: a model is something we can readily imagine and explain. It's a level of abstraction that helps us use our brains effectively, scaling down a big amount of data and signals from the outer world to a simple statement or description.
Think of a growing child. A lot of things have happened to him, a lot of people have said things to him, he has done different things and reacted to things; different events have filled his life for years. All of these form his personality. And now, after all these years of events, we can say: he is smart, but stubborn. Decades of events turned into a couple of words of explanation.
Now let's think about the IT world and a shared Google Drive document. In fact, it was produced through thousands of edits by dozens of people, and Google can show us the whole history if needed. Yet it's easier to think of it as a single text about something that conveys some important information.
Probably the best example in the IT world is a version control system. It stores all the changes and can replay them, so the changes (events of changing code) are the real data. But as a developer you probably work with the latest state of the code in some branch.
Probably not every source control system stores changes, but the example serves well to explain the idea.
An important thing to understand: events are immutable. We cannot delete or edit them. If needed, we can undo one by recording a new event that reverses its effect.
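A small sketch of what immutability and "undo" can look like in code, assuming a toy in-memory log. The `Event` class and the `undo` helper are illustrative names only; the idea is that an undo never edits or removes the original event, it appends a compensating one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Event:
    """Hypothetical event with a kind and a signed amount."""
    kind: str
    amount: int


log = []  # append-only list standing in for an event store


def append(event):
    log.append(event)


def undo(event):
    """Undo by appending an opposite event; the original stays in the log."""
    reversal = Event("Reversed" + event.kind, -event.amount)
    log.append(reversal)
    return reversal


deposit = Event("Deposited", 100)
append(deposit)
undo(deposit)
total = sum(event.amount for event in log)
```

After the undo the log holds two events and their amounts cancel out, so the history still shows both what happened and that it was reversed.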
I have mentioned all this to highlight the idea of events as the real data of an application.
The Event Sourcing approach means that we save and keep all events that happened to the system and can replay them to get the current state of a model. When an event occurs, it is stored in the event database (MongoDB or anything else). This is our data, but we need something meaningful to show to the user.
There are two principal approaches here. The first one is to replay all events (possibly filtered) and produce the needed aggregation of a model, which can be presented to the user.
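The first approach, replaying filtered events into a model, might be sketched like this. The event shapes and the `apply` and `replay` functions are assumptions made for the example, with a plain list standing in for the event database.

```python
from functools import reduce

# Toy event log; in a real system this would be read from the event store.
events = [
    {"type": "Deposited", "account": "acc-1", "amount": 100},
    {"type": "Withdrawn", "account": "acc-1", "amount": 30},
    {"type": "Deposited", "account": "acc-2", "amount": 500},
    {"type": "Deposited", "account": "acc-1", "amount": 5},
]


def apply(balance, event):
    """Fold one event into the current state of the model."""
    if event["type"] == "Deposited":
        return balance + event["amount"]
    if event["type"] == "Withdrawn":
        return balance - event["amount"]
    return balance  # unknown event types leave the state unchanged


def replay(all_events, account):
    """Rebuild one account's balance from scratch by replaying its events."""
    relevant = (event for event in all_events if event["account"] == account)
    return reduce(apply, relevant, 0)


balance = replay(events, "acc-1")
```

Every read walks the whole log, which is exactly the cost that grows with the size of the system.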
Some could say that this is not the best performance we can get, and this is true. Such behavior can be complicated and costly for big systems. So there is another approach.
When we change data (save an event), we also produce a new command that alters the current model representation. So we store data in two ways: as events with all the historical data, and as a relational snapshot of the data that serves well as a ready-to-use read model.
Does it look somewhat similar to CQRS, discussed previously?
Such an architecture gives us the ability to have different projections (views) over the same event store.
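Here is a minimal sketch of the second approach with two projections over one event store. `EventStore`, `subscribe` and the two projection functions are hypothetical names, and plain dictionaries stand in for the relational read models.

```python
class EventStore:
    """Append-only store that pushes each saved event to its projections."""

    def __init__(self):
        self.events = []
        self._projections = []

    def subscribe(self, projection):
        self._projections.append(projection)

    def append(self, event):
        self.events.append(event)        # the full history is always kept
        for projection in self._projections:
            projection(event)            # read models are updated right away


balances = {}       # projection 1: current balance per account
event_counts = {}   # projection 2: number of events per account


def update_balances(event):
    # Simplification: anything that is not a deposit counts as a withdrawal.
    sign = 1 if event["type"] == "Deposited" else -1
    account = event["account"]
    balances[account] = balances.get(account, 0) + sign * event["amount"]


def update_event_counts(event):
    account = event["account"]
    event_counts[account] = event_counts.get(account, 0) + 1


store = EventStore()
store.subscribe(update_balances)
store.subscribe(update_event_counts)
store.append({"type": "Deposited", "account": "acc-1", "amount": 100})
store.append({"type": "Withdrawn", "account": "acc-1", "amount": 30})
store.append({"type": "Deposited", "account": "acc-2", "amount": 10})
```

Reading a balance is now a dictionary lookup, and a new projection could be added later by replaying `store.events` through it.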
Event Sourcing Benefits
- Can serve as data storage for other systems
- Consistency: a minimal set of rules is enough to keep data consistent
Event Sourcing benefits follow from its main idea: storing the system's real data.
Event Sourcing Technical Stack
Technically, you can use any JSON-friendly storage, such as NoSQL databases (MongoDB, CosmosDB), or even relational ones that support JSON indexing (SQL Server 2016+, PostgreSQL). There are also dedicated event stores, like NEventStore, an open source event store created for .NET applications.
JSON is mentioned here because of the flexible structure of events. Usually we don't have a strict structure as in relational databases. Even a single source can produce different kinds of events with different structures.
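To illustrate the flexible structure, the sketch below stores two differently shaped events from the same source as JSON strings and reads them back. The event types and field names (`SpeedReported`, `ErrorRaised`, `speed_kmh`, `code`) are invented for the example, and a Python list stands in for the actual storage.

```python
import json

# Two events from one source with different sets of fields.
raw_events = [
    {"type": "SpeedReported", "vehicle": "bus-7", "speed_kmh": 42},
    {"type": "ErrorRaised", "vehicle": "bus-7", "code": "E102",
     "details": {"sensor": "door"}},
]

# Each event is serialized as its own JSON document.
stored = [json.dumps(event) for event in raw_events]

# Reading them back needs no shared schema; consumers dispatch on "type".
loaded = [json.loads(document) for document in stored]
by_type = {}
for event in loaded:
    by_type.setdefault(event["type"], []).append(event)
```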
Real-life examples of where it is all applicable
Let's look at some real-life examples, which should naturally complete our understanding of these approaches.
We have already seen these examples:
- Example of money transfer for CQRS
- Example of a social network
Now I would like to review two others:
- Example of a telematics project for event sourcing
- Example of high-load reporting for CQRS
Telematics for public transport
Imagine a dashboard project showing speed, error codes and plenty of other information from tracking devices installed in public transport vehicles. The system is required to store all events and signals from vehicles for month-long history tracking, but the dashboard should show aggregated information according to user-selected filters, for example the average speed of buses in some regions during this week.
The system is organized as an event-sourced storage that saves all signals from vehicles. The dashboard gets its data model by replaying events for the needed period and filters.
System issue: dashboard performance. The team is considering applying CQRS to have pre-calculated data models for the dashboard.
System benefit: it serves as a full ecosystem for other systems. The customer has a wide variety of microservices that use the system as a source of data.
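The dashboard query described above, average speed for a region and period computed by replaying stored signals, could look roughly like this. The signal fields and the `average_speed` function are assumptions for illustration; the real project's data model is not described in this post.

```python
from datetime import datetime

# Toy signal log; each entry is one event reported by a vehicle.
signals = [
    {"vehicle": "bus-1", "region": "north", "speed_kmh": 40,
     "at": datetime(2024, 1, 1, 8)},
    {"vehicle": "bus-2", "region": "north", "speed_kmh": 60,
     "at": datetime(2024, 1, 3, 9)},
    {"vehicle": "bus-3", "region": "south", "speed_kmh": 30,
     "at": datetime(2024, 1, 2, 10)},
    {"vehicle": "bus-1", "region": "north", "speed_kmh": 80,
     "at": datetime(2024, 1, 20, 8)},
]


def average_speed(all_signals, region, start, end):
    """Replay signals for one region within [start, end) and average them."""
    speeds = [
        signal["speed_kmh"]
        for signal in all_signals
        if signal["region"] == region and start <= signal["at"] < end
    ]
    return sum(speeds) / len(speeds) if speeds else None


week_start = datetime(2024, 1, 1)
week_end = datetime(2024, 1, 8)
north_average = average_speed(signals, "north", week_start, week_end)
```

Each dashboard request scans the stored signals again, which is the performance issue that pre-calculated models would address.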
High Load Reporting
This is a practical case of an accounting and reporting project. At the very beginning, it was designed as a regular CRUD application to let accountants enter data and get PDF reports.
But testing showed that there are peak high-load periods, when all users load reports simultaneously.
The solution was to apply CQRS. On every data change, a new command is created to update the pre-calculated model for the report. Report storage was implemented using MongoDB, because it is very fast for reads by ID and similar cases.
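A simplified sketch of that solution is shown below, with a dictionary standing in for the MongoDB collection. The function names, the report fields and the `department:month` ID format are all hypothetical; the actual project's schema is not given here.

```python
reports = {}  # stands in for a document collection keyed by report ID


def report_id(department, month):
    return f"{department}:{month}"


def handle_entry_added(department, month, amount):
    """Command handler: runs on every data change and refreshes the report."""
    key = report_id(department, month)
    report = reports.setdefault(key, {"_id": key, "total": 0, "entries": 0})
    report["total"] += amount
    report["entries"] += 1


def get_report(department, month):
    """Query: a single lookup by ID, with no aggregation at read time."""
    return reports.get(report_id(department, month))


handle_entry_added("sales", "2024-01", 1200)
handle_entry_added("sales", "2024-01", 300)
handle_entry_added("hr", "2024-01", 50)
sales_report = get_report("sales", "2024-01")
```

The aggregation cost is paid once per data change, so a peak of simultaneous report requests only triggers cheap lookups.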
I have shown you the main ideas of the CQRS and Event Sourcing approaches, based on real-life projects in the IT industry. Theory makes real sense only when it is applied to real projects. I hope you will think about it and even try to implement it in your own complicated project scenarios. If you can give me more examples, I would be absolutely happy!
Originally published in Akveo Blog