As Event Sourcing gains traction as an application persistence pattern, moving from niche applications into mainstream line-of-business applications, there are some questions that need to be answered:
These questions are harder to answer than they initially seem. The main assumption I often hear is that it is simple to build a store that allows appending events and reading them back: it takes "just one or two tables in a relational database".
Unfortunately, those two operations are not enough in real-life scenarios; they are far too limiting for the types of systems we want to build.
We want to be able to build systems that integrate nicely with message- and event-driven architectures, that embrace the reactive nature of such systems, and that can be used in the serverless cloud world as well as on-premises.
The other aspect to consider is education. How do we help large numbers of newcomers easily ingest all the information needed to be successful at building event-sourced systems? How do we simplify learning? By establishing boundaries within which event sourcing and event stream databases are enclosed. The boundary around the database provides access to its complexity through a well-defined, external-facing API.
The criteria set for this attempt are:
We need to define what is stored, how it is organized, and all the operations and capabilities required to fully support a truly event-sourced system.
The utopian answer would be to create an ANSI SQL-like standard for event store databases. This article is a draft of such an attempt; no definitive or formal definitions are provided.
The goals of this article are to:
Take this list lightly; it is certainly not The Definitive List. It exists to help broaden the adoption of event stream databases, to compare them, and, first and foremost, to trigger a discussion inside the community.
The primary data structures are Events and Streams.
Events are the fine-grained data we append to the database. In event-sourced systems, those are traditionally business events. Nevertheless, from the point of view of the database, they are just data units and might be interpreted as events, commands, or documents by the application.
They have the following attributes:
The Id of the event can be used for deduplication when appending data to the store.
The Revision is typically used for optimistic locking.
The Timestamp in the system metadata should never be used for application-level purposes.
Revision & Positions are strictly increasing in their respective streams. They do not necessarily need to be gapless, but that would be an added bonus. These numbers are managed by the event store server only.
While this requirement might seem trivial at first glance, it needs to be enforced at various levels of load.
This should hold true when appending events at a sustained rate of 10,000 events per second, across multiple clients, on a database containing 10 million streams and half a billion events (50 events per stream across 10 million streams).
Combine this with optimistic locking and idempotent appends, as well as immediate read-your-own-writes semantics, and this becomes a hard problem to solve.
The CorrelationId and CausationId are considered system metadata in order to allow the database engine to provide different ordering and query capabilities than just reading a stream in sequence order. For instance: get all Events with a given CorrelationId, or get all Events caused by a given EventId.
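To make the attributes above concrete, here is a minimal sketch of what a stored event could look like; the field names and types are illustrative assumptions, not a prescribed wire format.

```typescript
// Illustrative shape of a stored event; field names are assumptions, not a standard.
interface StoredEvent {
  id: string;              // client-generated, usable for deduplication on append
  type: string;            // application-defined event type
  data: unknown;           // the event payload
  metadata: unknown;       // application-level metadata
  // System metadata, managed by the event store server only:
  revision: bigint;        // strictly increasing within the instance stream
  position: bigint;        // strictly increasing across the whole store
  timestamp: Date;         // never to be used for application-level purposes
  correlationId?: string;  // enables "get all events with a given CorrelationId"
  causationId?: string;    // enables "get all events caused by a given EventId"
}
```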
In most usages and event stores out there, a stream represents a given entity and is just a named string. The interpretation of the value is then considered an application-level concern. I suggest adding some structure, similar to what exists in the relational database world, where a table is part of a schema inside a database.
Streams could have the following attributes:
The fully qualified name of an instance stream is then [Schema].[Category].[Id]. This is similar to the fully qualified name of a table in a relational database.
We have other levels of streams as well: [Schema].[Category], [Schema], and All.
Note that using [Schema].[Category].[Id] is just a sample to illustrate the need for a tree-like structure of streams.
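As a small illustration of that tree-like naming, here is a hypothetical helper that builds and parses fully qualified stream names, assuming a dot separator:

```typescript
// Hypothetical helpers for the [Schema].[Category].[Id] naming scheme.
interface StreamName { schema: string; category: string; id: string; }

const fullyQualifiedName = (s: StreamName): string =>
  `${s.schema}.${s.category}.${s.id}`;

const parseStreamName = (name: string): StreamName => {
  const [schema, category, id] = name.split(".");
  return { schema, category, id };
};

// fullyQualifiedName({ schema: "Customer", category: "User", id: "1" })
// -> "Customer.User.1"
```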
Time To Live and Maximum Count are used for transient data that can be automatically deleted(*) by the database.
When TimeStamp + Time To Live < Current Timestamp, or when the number of events in the stream exceeds Maximum Count, the events are eligible for deletion.
When and how these deletions occur is left to the specific database engine implementation.
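As an illustration only, the eligibility rule could be expressed along these lines, assuming the Time To Live is expressed in milliseconds and evaluated per event:

```typescript
// Illustrative eligibility check; a real engine decides when and how to delete.
interface RetentionPolicy {
  timeToLiveMs?: number; // Time To Live, assumed here to be in milliseconds
  maxCount?: number;     // Maximum Count of events to keep in the stream
}

function isEligibleForDeletion(
  eventTimestampMs: number,
  eventsInStream: number,
  policy: RetentionPolicy,
  nowMs: number = Date.now(),
): boolean {
  const ttlExpired =
    policy.timeToLiveMs !== undefined &&
    eventTimestampMs + policy.timeToLiveMs < nowMs;
  const overMaxCount =
    policy.maxCount !== undefined && eventsInStream > policy.maxCount;
  return ttlExpired || overMaxCount;
}
```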
Having a tree-like structure of streams allows reusing Category names under different Schemas when it makes sense from a modeling perspective; for example, having a Customer.User.1 stream and a PersonalInformation.User.1 stream.

(*) Yes, you can and should delete data in an event store. Deleting data from the active store should be part of the archiving strategy.
There are two broad groups of operations: non-streamed and streamed. The streamed operations exist to enable reactive (sub)systems.
Append(Stream, ExpectedRevision, Event[]) -> Result
Appends are transactional and have ACID behavior. This means that appending events to the database is not eventually consistent: you can read your own writes. Events are appended to one and only one instance stream, and one or more events can be appended in a single operation. The ExpectedRevision is used for optimistic locking: if the revision supplied does not match the target stream revision, a concurrency error must be raised.
Event structure
ExpectedRevision
Result
Returning the Revision is necessary for the next append operation when using optimistic concurrency.
Returning the Positions and Revision also allows the caller to wait until any reactive components (projections, for instance) have processed that Revision or those Positions.
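Put together as an interface, the append operation could look like the sketch below. The names and the exact shape of the result are assumptions based on the description above, not an actual client API:

```typescript
// Hypothetical client-side view of the Append operation described above.
// Only "Any" is discussed explicitly in this article; other check levels
// such as "NoStream" are assumed for illustration.
type ExpectedRevision = bigint | "Any" | "NoStream";

interface EventData {
  id: string;        // client-generated EventId, used for idempotent appends
  type: string;
  data: unknown;
  metadata?: unknown;
}

interface AppendResult {
  revision: bigint;  // feed into the next append when using optimistic concurrency
  position: bigint;  // lets the caller wait for reactive components to reach this point
}

interface EventStoreAppendApi {
  // Appends one or more events to a single instance stream (Schema.Category.Id).
  // Must reject with a concurrency error when expectedRevision does not match.
  append(
    stream: string,
    expectedRevision: ExpectedRevision,
    events: EventData[],
  ): Promise<AppendResult>;
}
```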
Idempotency of append operations is one of those less talked about requirements. We wouldn't want to add the same events multiple times because some client application has gone rogue, would we? Unfortunately, that behavior is rarely documented.
Idempotency checks should take the ExpectedRevision and EventId into account. If the ExpectedRevision is Any, no idempotency check is performed; this will result in events being duplicated.
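One way to picture the check, sketched below: a retried append that carries the same ExpectedRevision and the same EventIds as an already stored batch is acknowledged without writing anything. This is only an illustration of the idea, not how any particular store implements it.

```typescript
// Illustrative idempotency check for a retried append.
// storedEventIds: the EventIds already written starting at expectedRevision + 1.
function isIdempotentRetry(
  expectedRevision: bigint | "Any",
  incomingEventIds: string[],
  storedEventIds: string[],
): boolean {
  if (expectedRevision === "Any") return false; // no idempotency check performed
  return (
    incomingEventIds.length === storedEventIds.length &&
    incomingEventIds.every((id, index) => id === storedEventIds[index])
  );
}
```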
All other optimistic concurrency check levels need well-defined idempotency behavior, and this gets complicated rapidly. Here are a few scenarios:
Streams have metadata as well, which can be set at any level of the stream structure; it could be implemented as system streams. Appending metadata to a Stream is allowed even if the target stream does not exist yet in the database. A strict separation of system and application-level metadata should be enforced.
AppendMetadata(Stream, ExpectedRevision, Metadata) -> Result
Stream is any level in the structure.
ExpectedRevision.
Result
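A possible sketch of that operation, assuming the metadata carries the retention attributes mentioned earlier plus free-form application metadata:

```typescript
// Hypothetical stream metadata; system and application parts are kept separate.
interface StreamMetadata {
  system?: {
    timeToLiveMs?: number;  // Time To Live for events in the stream
    maxCount?: number;      // Maximum Count of events to retain
  };
  application?: Record<string, unknown>;
}

interface StreamMetadataApi {
  // Allowed even if the target stream does not exist yet in the database.
  appendMetadata(
    stream: string,  // any level: Schema, Schema.Category, Schema.Category.Id
    expectedRevision: bigint | "Any",
    metadata: StreamMetadata,
  ): Promise<{ revision: bigint }>;
}
```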
It should be possible to read streams either forward or backward, to read a stream from a given inclusive Revision, or to read all Events from a Position.
Read(Stream, Direction, Revision) -> Results
Read(Stream, Direction, Position) -> Results
Read(Direction, Position) -> Results
Stream is any level in the structure.
Direction:
Revision can be:
Result is a list of Events that can be iterated upon.
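Expressed as an interface, the read operations could look like this; returning an async iterable is an assumption to keep large streams readable without loading everything at once:

```typescript
// Hypothetical read API; the stored-event shape is trimmed for brevity.
type Direction = "Forwards" | "Backwards";
type Revision = bigint | "Start" | "End";
type Position = bigint | "Start" | "End";

interface RecordedEvent {
  type: string;
  data: unknown;
  revision: bigint;
  position: bigint;
}

interface ReadApi {
  // Read a stream (any level of the hierarchy) from an inclusive Revision.
  read(stream: string, direction: Direction, from: Revision): AsyncIterable<RecordedEvent>;
  // Read all events in the store from a global Position.
  readAll(direction: Direction, from: Position): AsyncIterable<RecordedEvent>;
}
```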
Behavior:
Reading events from:
Schema yields, in order, all events from all categories and streams in the Schema.
Category yields, in order, all events from all streams in the Category.
Schema.Category.Id yields, in order, all events of that particular stream.

What happens when there are two streams with the same category and id but in different schemas, say Customer.User.1 & PersonalInformation.User.1, and the operation is Read("User.1", Start)? We should probably always use fully qualified stream names to avoid any confusion.
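As a usage illustration of the behavior above, using the hypothetical read API sketched earlier:

```typescript
// Hypothetical usage of the read API sketched above.
declare const store: {
  read(
    stream: string,
    direction: "Forwards" | "Backwards",
    from: bigint | "Start" | "End",
  ): AsyncIterable<{ type: string; revision: bigint }>;
};

async function readExamples(): Promise<void> {
  // All events from every category and stream under the Customer schema, in order.
  for await (const event of store.read("Customer", "Forwards", "Start")) {
    console.log("schema level:", event.type);
  }
  // All events from every stream in the Customer.User category, in order.
  for await (const event of store.read("Customer.User", "Forwards", "Start")) {
    console.log("category level:", event.type);
  }
  // Only the events of the Customer.User.1 instance stream.
  for await (const event of store.read("Customer.User.1", "Forwards", "Start")) {
    console.log("instance level:", event.revision, event.type);
  }
}
```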
It should be possible to truncate an instance Stream before a certain Revision. This is especially useful for implementing a "Closing the Books" pattern.
Deleting a stream will allow it to be recreated at a later point. Some stores have the concept of a Tombstone, where a Stream can never be recreated again.
Truncate(Stream, Revision) -> Results
Delete(Stream) -> Results
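Sketched as an interface, with the tombstone variant included for stores that support it; the names are assumptions:

```typescript
// Hypothetical maintenance operations on an instance stream.
interface StreamLifecycleApi {
  // Removes all events with a revision lower than `beforeRevision`;
  // useful for the "Closing the Books" pattern.
  truncate(stream: string, beforeRevision: bigint): Promise<void>;
  // Deletes the stream; it may be recreated later by appending to it again.
  delete(stream: string): Promise<void>;
  // Tombstones the stream so that it can never be recreated (where supported).
  tombstone(stream: string): Promise<void>;
}
```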
There are two broad types of streaming operations needed.
The first type is typically used by long-lived processes. The client starts a subscription, eventually catches up, and then receives new events as they are appended to the store. Support for single and competing consumers falls into this type of operation.
The second type is more like a push notification and is useful in serverless scenarios. It is initiated by the server: when a new event is appended, a notification is sent to a predefined consumer, for example to SQS, EventGrid, or other integration components.
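Both kinds of streaming operations could be surfaced roughly as follows; the callback-based subscription shape and the push-sink registration are assumptions for illustration:

```typescript
// Hypothetical streaming operations.
interface SubscribedEvent {
  type: string;
  data: unknown;
  position: bigint;
}

interface StreamingApi {
  // Catch-up subscription for long-lived processes: replays from `fromPosition`,
  // then keeps delivering new events as they are appended.
  subscribe(
    stream: string,
    fromPosition: bigint | "Start" | "End",
    onEvent: (event: SubscribedEvent) => Promise<void>,
  ): { unsubscribe(): void };

  // Server-initiated push: register a sink (an SQS queue or EventGrid topic, say)
  // that is notified whenever a matching event is appended.
  registerPushSink(streamFilter: string, sinkUri: string): Promise<void>;
}
```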
Check if a stream exists:
StreamExist(Stream) -> Result
Get the last Revision & Position of a stream:
StreamHead(Stream) -> Result
Get the last known Position in the store:
HeadPosition() -> Position
Get all Schemas, Categories, Ids, and Event types, similar to what we can do with information_schema.tables in a relational database:
Streams(Filter) -> string[]
Get the count of Events between two positions or revisions. The count is not simply the difference between the two numbers, as some events might have been deleted:
Count(Stream, Revision, Revision) -> Number
Count(Position, Position) -> Number
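Gathered into a single hypothetical interface, the utility operations above could look like this:

```typescript
// Hypothetical utility operations, mirroring the signatures above.
interface UtilityApi {
  streamExists(stream: string): Promise<boolean>;
  streamHead(stream: string): Promise<{ revision: bigint; position: bigint }>;
  headPosition(): Promise<bigint>;
  // Browse schemas, categories, ids and event types, information_schema style.
  streams(filter: string): Promise<string[]>;
  // Not a simple subtraction: some events may have been deleted.
  countByRevision(stream: string, from: bigint, to: bigint): Promise<number>;
  countByPosition(from: bigint, to: bigint): Promise<number>;
}
```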
Revision seems like an unnecessary concept since it is tied to a specific stream instance; Position could be used everywhere instead. Or should it stay separate, since Revision denotes that the stream owns the event, while Position denotes the event's location in the hierarchy of streams?
The initial idea for this list has been around for some time already here, well before I joined EventStore. It has been heavily inspired by EventStoreDB since this is my preferred purpose-built store, and by other stores as well.
If you have comments, additions, or any other thoughts on this article, don't hesitate to use our Discuss forum: https://discuss.eventstore.com.