Many articles about CQRS and Event Sourcing show only the most heavyweight implementation of the query side (aka read models), without even mentioning a more lightweight alternative that can come without the dreaded "problem" of eventual consistency.
Disclaimer: CQRS and Event Sourcing are orthogonal concepts. While typically an ES architecture needs CQRS for the query side, CQRS can be implemented without ES, even without Events, and even without separate data storage (e.g. DB tables or EventStores)!
Now, this implementation undoubtedly has some selling points. Queries can be very fast and the write and read models are almost independent (only coupled by event contracts). But those advantages come at a price:
In reality, this is rarely an issue at all, given that the projection of published events into the read model is fairly fast. Still, the whole process from storing events to updating the read models might take a while, possibly in the range of many seconds or longer. It is important that the UI/UX is aware of this and is designed with eventual consistency in mind.
If subscription-based event storage like EventStoreDB is used instead of a message queue or streaming platform, this might be simple.
I don't want to dive too deep into all the broken implementations of CQRS/ES systems out there, but I want to mention that using a message broker (or streaming platform) imposes a little dilemma: there is no single source of truth for events, because the message broker could hold different data than the Event Store (lost messages, messed-up order, and whatnot). Such problems can be avoided by using a grown-up Event Store implementation like EventStoreDB with its subscription model. If no such system is used, events must be stored in the Event Store first, before they are published! Using the "outbox pattern" is one way to get that right (you'll have to research that for yourself if needed).
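To make that a bit more concrete, here is a minimal, hypothetical sketch of the outbox pattern in Go with database/sql. The table names, columns, and the function itself are assumptions for illustration only, not code from my repository: the event and its outbox copy are written in the same transaction, and a separate relay process publishes from the outbox to the broker afterwards.

package outbox

import (
    "context"
    "database/sql"
)

// appendEventWithOutbox is a sketch of the outbox pattern: the event and a copy for
// publishing are written in the same database transaction, so the Event Store stays
// the single source of truth. Table names and columns are made up for illustration.
func appendEventWithOutbox(ctx context.Context, db *sql.DB, streamID string, version uint, eventName string, payload []byte) error {
    tx, err := db.BeginTx(ctx, nil)
    if err != nil {
        return err
    }
    defer tx.Rollback() // no-op once the transaction has been committed

    // 1) store the event in the Event Store table first
    if _, err := tx.ExecContext(ctx,
        `INSERT INTO events (stream_id, stream_version, event_name, payload) VALUES ($1, $2, $3, $4)`,
        streamID, version, eventName, payload,
    ); err != nil {
        return err
    }

    // 2) store a publishing job in the outbox table within the same transaction;
    //    a separate relay process reads the outbox and publishes to the message broker
    if _, err := tx.ExecContext(ctx,
        `INSERT INTO outbox (stream_id, stream_version, event_name, payload) VALUES ($1, $2, $3, $4)`,
        streamID, version, eventName, payload,
    ); err != nil {
        return err
    }

    return tx.Commit()
}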
In an event-sourced write model, the current state (the "decision state") gets rehydrated each time a command comes in, by projecting the full event stream of past events into a state that is suitable to decide if the command can be handled and which new event(s) should be recorded as a result.
package customer

import (
    "github.com/AntonStoeckl/go-iddd/src/customeraccounts/hexagon/application/domain"
    "github.com/AntonStoeckl/go-iddd/src/customeraccounts/hexagon/application/domain/customer/value"
    "github.com/AntonStoeckl/go-iddd/src/shared/es"
)

// currentState is the decision state of the Customer write model.
type currentState struct {
    id                   value.CustomerID
    personName           value.PersonName
    emailAddress         value.EmailAddress
    isDeleted            bool
    currentStreamVersion uint
}

// buildCurrentStateFrom projects the full event stream into the decision state.
func buildCurrentStateFrom(eventStream es.EventStream) currentState {
    customer := currentState{}

    for _, event := range eventStream {
        switch actualEvent := event.(type) {
        case domain.CustomerRegistered:
            customer.id = actualEvent.CustomerID()
            customer.personName = actualEvent.PersonName()
            customer.emailAddress = actualEvent.EmailAddress()
        case domain.CustomerEmailAddressConfirmed:
            customer.emailAddress = actualEvent.EmailAddress()
        case domain.CustomerEmailAddressChanged:
            customer.emailAddress = actualEvent.EmailAddress()
        case domain.CustomerNameChanged:
            customer.personName = actualEvent.PersonName()
        case domain.CustomerDeleted:
            customer.isDeleted = true
        default:
            panic("unknown event " + event.Meta().EventName())
        }

        // always remember the version of the last applied event
        customer.currentStreamVersion = event.Meta().StreamVersion()
    }

    return customer
}
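For context, a command handler could use this projection roughly like the following sketch. The EventStore port, es.DomainEvent, and the domain.BuildCustomerNameChanged constructor are assumptions for illustration; the names and signatures in the actual go-iddd code differ.

package customer

import (
    "errors"

    "github.com/AntonStoeckl/go-iddd/src/customeraccounts/hexagon/application/domain"
    "github.com/AntonStoeckl/go-iddd/src/customeraccounts/hexagon/application/domain/customer/value"
    "github.com/AntonStoeckl/go-iddd/src/shared/es"
)

// EventStore is an assumed port for this sketch; the real interfaces in go-iddd differ.
type EventStore interface {
    RetrieveEventStream(id value.CustomerID) (es.EventStream, error)
    AppendToEventStream(events []es.DomainEvent, id value.CustomerID, expectedVersion uint) error
}

// handleChangeCustomerName shows how the decision state could be used by a command handler.
func handleChangeCustomerName(store EventStore, id value.CustomerID, newName value.PersonName) error {
    eventStream, err := store.RetrieveEventStream(id)
    if err != nil {
        return err
    }

    // rehydrate the decision state from the full event stream
    customer := buildCurrentStateFrom(eventStream)

    if customer.isDeleted {
        return errors.New("customer was deleted")
    }

    // record the new event; the constructor name is an assumption for this sketch,
    // and the stream version is passed along for optimistic concurrency
    event := domain.BuildCustomerNameChanged(id, newName)

    return store.AppendToEventStream([]es.DomainEvent{event}, id, customer.currentStreamVersion)
}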
So why not do the same for the read model?
We just need to define a structure for the read model (a view) and rebuild it live from the event stream each time a query arrives.
package customer

import (
    "github.com/AntonStoeckl/go-iddd/src/customeraccounts/hexagon/application/domain"
    "github.com/AntonStoeckl/go-iddd/src/shared/es"
)

// View is the read model for a single Customer.
type View struct {
    ID                      string
    EmailAddress            string
    IsEmailAddressConfirmed bool
    GivenName               string
    FamilyName              string
    IsDeleted               bool
    Version                 uint
}

// BuildViewFrom live-projects the event stream into the View each time a query arrives.
func BuildViewFrom(eventStream es.EventStream) View {
    customerView := View{}

    for _, event := range eventStream {
        switch actualEvent := event.(type) {
        case domain.CustomerRegistered:
            customerView.ID = actualEvent.CustomerID().String()
            customerView.GivenName = actualEvent.PersonName().GivenName()
            customerView.FamilyName = actualEvent.PersonName().FamilyName()
            customerView.EmailAddress = actualEvent.EmailAddress().String()
        case domain.CustomerEmailAddressConfirmed:
            customerView.EmailAddress = actualEvent.EmailAddress().String()
            customerView.IsEmailAddressConfirmed = true
        case domain.CustomerEmailAddressChanged:
            customerView.EmailAddress = actualEvent.EmailAddress().String()
            customerView.IsEmailAddressConfirmed = false
        case domain.CustomerNameChanged:
            customerView.GivenName = actualEvent.PersonName().GivenName()
            customerView.FamilyName = actualEvent.PersonName().FamilyName()
        case domain.CustomerDeleted:
            customerView.IsDeleted = true
        default:
            panic("unknown event " + event.Meta().EventName())
        }

        // always remember the version of the last applied event
        customerView.Version = event.Meta().StreamVersion()
    }

    return customerView
}
If you're wondering why there is an IsEmailAddressConfirmed flag in the read model but not in the write model: I'm using the power of the type system here. EmailAddress is just an interface with two implementations, ConfirmedEmailAddress and UnconfirmedEmailAddress (Go has no algebraic data types yet).
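Simplified, the idea looks roughly like the following; the real value objects in go-iddd carry more behavior and validation, so take this purely as a sketch of the concept.

package value

// Simplified sketch: the write model does not need a boolean flag, because the static
// type of the value already encodes whether the address is confirmed.
type EmailAddress interface {
    String() string
}

type UnconfirmedEmailAddress struct {
    value string
}

func (e UnconfirmedEmailAddress) String() string {
    return e.value
}

// Confirm turns an unconfirmed address into a confirmed one.
func (e UnconfirmedEmailAddress) Confirm() ConfirmedEmailAddress {
    return ConfirmedEmailAddress{value: e.value}
}

type ConfirmedEmailAddress struct {
    value string
}

func (e ConfirmedEmailAddress) String() string {
    return e.value
}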
In case that's not already clear: the very moment a new event has been persisted, it will also be contained in the event stream that is loaded to answer an incoming query.
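A query handler for this is then trivial. The EventStreamRetriever port below is an assumption for this sketch; the real port in go-iddd may be named and shaped differently.

package customer

import (
    "github.com/AntonStoeckl/go-iddd/src/customeraccounts/hexagon/application/domain/customer/value"
    "github.com/AntonStoeckl/go-iddd/src/shared/es"
)

// EventStreamRetriever is an assumed port for this sketch.
type EventStreamRetriever interface {
    RetrieveEventStream(id value.CustomerID) (es.EventStream, error)
}

// CustomerViewByID answers a query by loading the event stream and projecting it live.
// Whatever was appended to the stream before this call is reflected in the returned View.
func CustomerViewByID(retriever EventStreamRetriever, id value.CustomerID) (View, error) {
    eventStream, err := retriever.RetrieveEventStream(id)
    if err != nil {
        return View{}, err
    }

    return BuildViewFrom(eventStream), nil
}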
Some words about eventual consistency are needed here. As it turns out, all data is stale the moment it's queried. Even the fastest queries take some microseconds until the data reaches the consumer who issued them, and if the consumers are humans, they need something in the range of seconds to read and process the information before they can make a decision based on it. The true difference with live projections is that no stale data will be read, measured from the moment the latest event was stored in the Event Store, because the data is immediately consistent in the data storage, not more and not less.
This strategy has the same issue as when it is used for write models: if the structure of the model changes, the previous snapshots are worthless and have to be ignored or removed, and the service needs a "cache warming" phase with degraded performance. Often this cache invalidation is done by versioning the snapshots, so that old versions become invalid. In general, cache invalidation depends a lot on the implementation; for local in-memory caches, for example, a restart of the server node after deployment would do the job.
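For a local in-memory cache, such versioned invalidation could look like the following sketch (all names are made up for illustration). A caller that gets a cache hit would only need to load and project the events recorded after view.Version; a miss, whether caused by a node restart or a bumped schema version, triggers a full live projection.

package customer

import "sync"

// viewSchemaVersion must be bumped whenever the structure of View changes, so that
// cached snapshots built for the old structure are treated as invalid.
const viewSchemaVersion = 1

type cachedView struct {
    schemaVersion uint // structure version of View when the entry was written
    view          View
}

// viewCache is a local in-memory cache; a restart of the server node empties it anyway.
type viewCache struct {
    mu      sync.RWMutex
    entries map[string]cachedView
}

func newViewCache() *viewCache {
    return &viewCache{entries: make(map[string]cachedView)}
}

func (c *viewCache) get(id string) (View, bool) {
    c.mu.RLock()
    defer c.mu.RUnlock()

    entry, found := c.entries[id]
    if !found || entry.schemaVersion != viewSchemaVersion {
        // missing, or written for an older structure of View: treat it as a cache miss
        return View{}, false
    }

    return entry.view, true
}

func (c *viewCache) put(id string, view View) {
    c.mu.Lock()
    defer c.mu.Unlock()

    c.entries[id] = cachedView{schemaVersion: viewSchemaVersion, view: view}
}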
Well, actually ... why would that matter? If more nodes are needed to scale up queries, it should not matter that those nodes also serve the write side. And even if that were an issue, some routing that separates command and query requests could do the job.
I will use the following example for the three different scenarios that I will show below:
An online shop like Amazon might have a couple of subdomains involved when we order something there: Order, Payment, Shipping, Invoicing.
With our example, we might want to offer a read model where customers can see all the information about a past or ongoing order.
Often in such a situation, the materialized view implementation is used to aggregate events from multiple write models into one read model, and this is indeed a powerful strategy!
One idea to solve this with live projections is to aggregate the read models for Order, Payment, Shipping, and Invoicing with a special query proxy service that queries those fine-grained read models and combines them into a CustomerOrder view. We did that successfully at a previous job. But for sure this comes with the extra complexity of another service, which might even have shared ownership across multiple teams. Another fact to consider is the synchronous dependency on multiple read model services to serve one query.
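Such a query proxy could look roughly like this sketch. The reader interfaces and view types are assumptions for illustration (Invoicing is omitted for brevity); in a real system they would be HTTP or gRPC clients for the respective read model services.

package customerorder

import "context"

// The fine-grained read model clients below are assumed for this sketch.
type OrderReader interface {
    OrderViewByID(ctx context.Context, orderID string) (OrderView, error)
}

type PaymentReader interface {
    PaymentViewByOrderID(ctx context.Context, orderID string) (PaymentView, error)
}

type ShippingReader interface {
    ShippingViewByOrderID(ctx context.Context, orderID string) (ShippingView, error)
}

type OrderView struct{ /* ... */ }
type PaymentView struct{ /* ... */ }
type ShippingView struct{ /* ... */ }

// CustomerOrderView is the coarse-grained view composed by the query proxy service.
type CustomerOrderView struct {
    Order    OrderView
    Payment  PaymentView
    Shipping ShippingView
}

type QueryProxy struct {
    orders    OrderReader
    payments  PaymentReader
    shippings ShippingReader
}

// CustomerOrderByID synchronously queries the fine-grained read models and aggregates
// them into one CustomerOrder view.
func (p QueryProxy) CustomerOrderByID(ctx context.Context, orderID string) (CustomerOrderView, error) {
    order, err := p.orders.OrderViewByID(ctx, orderID)
    if err != nil {
        return CustomerOrderView{}, err
    }

    payment, err := p.payments.PaymentViewByOrderID(ctx, orderID)
    if err != nil {
        return CustomerOrderView{}, err
    }

    shipping, err := p.shippings.ShippingViewByOrderID(ctx, orderID)
    if err != nil {
        return CustomerOrderView{}, err
    }

    return CustomerOrderView{Order: order, Payment: payment, Shipping: shipping}, nil
}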
Another possibility is to do the aggregation in the client(s), e.g. website, mobile app, and so on, but the disadvantage is that each client has to implement the same aggregation.
For sure, customers also want to see all their past and ongoing orders! So we need to project all their Order streams into one view, which means we want to aggregate multiple streams of the same type that belong to one customer.
One possibility to keep most advantages of live projections could be the following:
There is an extra service to serve those reporting models, which consumes events from all involved write models. But instead of materializing the views, it just stores those events and then does live projections when queries arrive.
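Such a live projection over multiple streams of the same type could look like the following sketch. The event and view types are made up for illustration; the real event contracts would come from the Order write model.

package reporting

// OrderListItem is one row of the hypothetical "all orders of a customer" list view.
type OrderListItem struct {
    OrderID    string
    Status     string
    TotalCents int
}

// OrderEvent and the two concrete events below are stand-ins for the event contracts
// of the Order write model.
type OrderEvent interface {
    OrderID() string
}

type OrderPlaced struct {
    ID         string
    TotalCents int
}

func (e OrderPlaced) OrderID() string { return e.ID }

type OrderShipped struct {
    ID string
}

func (e OrderShipped) OrderID() string { return e.ID }

// BuildOrderListViewFrom live-projects the stored events of all Order streams that belong
// to one customer into a single list view, just like the single-stream projections above.
func BuildOrderListViewFrom(streamsOfOneCustomer [][]OrderEvent) []OrderListItem {
    items := make([]OrderListItem, 0, len(streamsOfOneCustomer))

    for _, stream := range streamsOfOneCustomer {
        item := OrderListItem{}

        for _, event := range stream {
            switch actualEvent := event.(type) {
            case OrderPlaced:
                item.OrderID = actualEvent.ID
                item.TotalCents = actualEvent.TotalCents
                item.Status = "placed"
            case OrderShipped:
                item.Status = "shipped"
            }
        }

        items = append(items, item)
    }

    return items
}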
But: this poses a similar challenge to the one mentioned above for the heavyweight read model implementation. When the reporting model service is deployed for the first time, or when additional views need to consume events from write models that were not there from the beginning, then events from one or multiple write models need to be replayed ...
We are talking about many events here, so very probably some sort of caching (as described above) will be necessary to achieve decent performance.
"Reporting model" is not an "official" term, I just have no better name.
A read model for reporting could be one that the shop uses to build reports that show, for example:
For this purpose, we want to aggregate certain event types, from one or multiple stream types, related to any customer.
This can be solved with the same strategy described for list views, with the same challenges.
I think it's a good idea to start with the live projection strategy that I described and then, if necessary, switch to the full-blown implementation with a separate service for the read model.
The overhead of having the lightweight implementation first should be quite small, and the challenge of properly replaying events, as well as the work needed to implement the heavyweight strategy, does not change if you move it to a later point.
But by postponing it, you might save yourself some refinements of the big read model, as well as some replay runs that would otherwise have been necessary.
Win-Win situation 😀