“How does EventStoreDB work with Artificial Intelligence?” is a question that’s increasingly asked by industry analysts, investors and customers alike.
After all, though EventStoreDB isn’t a vector database specifically designed for AI or Machine Learning, it was designed to capture data at its most fundamental level - a sequence of state changes - with the ability to transform, stream and replay data.
These sequences, or streams, can easily be transformed into vectors and other AI/ML-friendly data structures. The short answer, then, is yes: EventStoreDB is strongly applicable to AI/ML scenarios, just as it is to practically any advanced analytics application.
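As a rough sketch of what that transformation can look like (the event types, field names, and feature choices below are illustrative, not taken from any particular system), an ordered stream of state changes folds naturally into a numeric feature vector:

```python
from dataclasses import dataclass
from typing import List

# A simplified, hypothetical event shape: in EventStoreDB each recorded event
# carries a type, a payload, and its position in the stream.
@dataclass
class Event:
    type: str
    data: dict

def stream_to_feature_vector(events: List[Event]) -> List[float]:
    """Fold an ordered stream of order events into one numeric feature vector."""
    total_value = 0.0
    item_count = 0
    cancelled = 0.0
    for event in events:  # events are processed in the order they were written
        if event.type == "ItemAdded":
            total_value += event.data["price"]
            item_count += 1
        elif event.type == "OrderCancelled":
            cancelled = 1.0
    return [total_value, float(item_count), cancelled]

# Example: replaying three events yields a vector a model can consume directly.
events = [
    Event("ItemAdded", {"price": 12.5}),
    Event("ItemAdded", {"price": 7.0}),
    Event("OrderCancelled", {}),
]
print(stream_to_feature_vector(events))  # [19.5, 2.0, 1.0]
```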
To quote one CTO in a recent discussion, “EventStoreDB is fundamental to our AI strategy, or should I say EventStoreDB is fundamental, and therefore is fundamental to our AI strategy”.
When I was last raising money for Event Store, prior to the current AI boom, we had several AI-themed investment funds interested in EventStoreDB purely for its potential in AI/ML scenarios. Data was cited as a key challenge for AI success.
EventStoreDB is an event-based state transition database, and one of its key benefits is that no business-critical data is lost. Where other databases may keep only current state, EventStoreDB also keeps every state change as an immutable log. As such, you can access the what, why, and when of your real-time data, and because no data is overwritten, your data integrity remains intact.
This ordered sequence of events can then be replayed into various models that, once trained, can react in real time.
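Here is a minimal sketch of that append-and-replay flow, assuming the Python esdbclient gRPC client and a local node running without TLS (the connection string, stream name, event types, and payloads are illustrative):

```python
import json
from esdbclient import EventStoreDBClient, NewEvent, StreamState  # pip install esdbclient

# Connect to a local, insecure EventStoreDB node (connection string is an assumption).
client = EventStoreDBClient(uri="esdb://localhost:2113?Tls=false")

# Append state changes to a stream; nothing is overwritten, history only grows.
stream = "order-42"
client.append_to_stream(
    stream,
    current_version=StreamState.ANY,
    events=[
        NewEvent(type="WorkStarted", data=json.dumps({"article": "A-100"}).encode()),
        NewEvent(type="WorkStopped", data=json.dumps({"grinding_time_s": 312}).encode()),
    ],
)

# Replay the full, ordered history of the stream, e.g. to build a training example.
for event in client.get_stream(stream):
    print(event.type, json.loads(event.data))
```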
In this article from October 2021, Wolfgang Werner described how he and his team leveraged EventStoreDB to train an ML model for on-demand pricing of special milling cutters. Read the article here.
Wolfgang and his team realised they had the data at hand to train the model:
“That’s one of the benefits of event-sourced systems: you end up with a treasure trove of data, the event log, that can be interpreted after the fact to answer questions that come up without even having considered the question at design time.”
They leveraged ESDB’s projections capability to replay the data and capture it in a Jupyter notebook:
“For known grinding times, the training data set, we created a projection in EventStoreDB that extracted the grinding times for each order along with the article number of the produced product. Luckily, the grinding times were recorded as part of the grinding.stopped event explicitly. But even if it hadn’t been, it would have been trivial to compute them based on ‘work started’ and ‘work stopped’ time per order.”
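The team did this extraction with a server-side projection; as an illustrative alternative, the same per-order training table can be assembled client-side after replaying the events, for example in the Jupyter notebook itself (the payload field names below are hypothetical stand-ins):

```python
import pandas as pd  # typical in a Jupyter notebook

# Hypothetical replayed events: (order_id, event_type, payload). In the article the
# grinding time was recorded directly on the grinding.stopped event; the equivalent
# per-order extraction can also be done client-side after replaying the streams.
replayed = [
    ("order-1", "grinding.stopped", {"article": "A-100", "grinding_time_s": 312}),
    ("order-2", "grinding.stopped", {"article": "B-220", "grinding_time_s": 185}),
    ("order-3", "grinding.stopped", {"article": "A-100", "grinding_time_s": 298}),
]

# Build the training set: one row per order with the article number (feature)
# and the recorded grinding time (label).
rows = [
    {"order": order, "article": data["article"], "grinding_time_s": data["grinding_time_s"]}
    for order, event_type, data in replayed
    if event_type == "grinding.stopped"
]
training_df = pd.DataFrame(rows)
print(training_df)
```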
The developers who implemented this were not highly experienced data scientists; in fact, none of the developers involved with the project had prior experience with machine learning.
"Keeping an ordered sequence of events that can be replayed into various models and then, once trained, to react in real time represents an amazing opportunity to naturally leverage an enterprise's fundamental line of business data in powerful AI/ML scenarios."
Event sourcing (the underlying event log) enabled this opportunity:
“If we hadn’t modelled our work logging solution as an event-sourced system, the data required would not have been available to us. Gathering that data by experiment or simulations alone would have been either prohibitively expensive and/or wouldn’t have given us a data set having the required size and quality.”
As new events come into the system, the model can be continuously improved:
“As the event log grows, so does the size of the training data set. By repeating the training process outlined above with an updated training set will allow to further increase the model’s accuracy. Additionally, reviewing predictions, comparing them with actual grinding times, and feeding the result back into the training process can be used as an additional enhancement, if need be. This could even be done in an automated fashion using continuous integration (CI) pipelines, making the model more accurate without human interaction.”
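Sketched with scikit-learn, that continuous improvement loop might look like the following (the model choice, feature encoding, and numbers are illustrative; the article does not prescribe a specific algorithm):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

def retrain(features, grinding_times):
    """Re-fit the model on the full, ever-growing training set."""
    model = RandomForestRegressor(random_state=0)
    model.fit(features, grinding_times)
    return model

# Initial training set extracted from the event log (toy numbers, not from the article).
X = [[0], [0], [1]]        # e.g. an encoded article number
y = [312.0, 298.0, 185.0]  # recorded grinding times in seconds

model = retrain(X, y)

# Later, new grinding.stopped events extend the log; append them and re-fit.
X += [[1], [0]]
y += [190.0, 305.0]
model = retrain(X, y)

# Comparing predictions with actual grinding times gives a quality signal that a
# CI pipeline could use to decide whether to promote the re-trained model.
print(mean_absolute_error(y, model.predict(X)))
```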
Ultimately, keeping an event log with the business context (like the grinding.stopped event mentioned above) is powerful for machine learning scenarios, especially when combined with projections, replay, and streaming.
Event sourcing as a pattern helps you build the type of applications that meet the requirements of modern systems (distributed, modular, real-time, and so on). As a by-product, persisting events in an event log (the sequence of state changes) provides an ideal underlying data model for AI/ML, from both a training and an agent perspective.