Event Store Blog

EventStoreDB 21.2.0 Released

Written by Oskar Dudycz | Feb 26, 2021 1:06:51 PM

We're happy to announce that EventStoreDB 21.2 has been released. In the last quarter, we focused on delivering stable versions of the gRPC clients. We also worked hard to make upgrades from older versions smoother. As always, we included a set of enhancements and bug fixes to improve your development and operations experience. The Release Notes are available here, but we'll go over the changes more generally in this post.

gRPC Clients

The NodeJS, Java and Rust clients got their official v1 versions (.NET was already non-preview). We're continuing our effort to have the Go and Haskell clients join the v1 release party soon.

We'll introduce each client in a separate post to give you more details about its specifics. We work hard to make sure that all of our clients offer the same developer experience. Our goal is a consistent naming and feature set that still follows each environment's conventions. For example, Go and Rust do not throw exceptions but return errors. We don't want to break those conventions just to keep all clients the same at the lowest common denominator.

NodeJS supports both JavaScript and TypeScript with full-fledged type support.

The Java client supports Java 8 and later. You can also use it from Scala, Kotlin and Groovy.

The latest version of the .NET client supports .NET 5, so you can benefit from the latest .NET platform improvements.

We provided unified documentation with snippets for the C#, Java, NodeJS and Rust clients. Check here for more information.
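As a rough illustration of that shared developer experience, here's a minimal C# sketch of appending an event with the gRPC client. The endpoint, stream name and event type below are placeholders, not values from this release:

    using System.Text.Json;
    using EventStore.Client;

    // Placeholder endpoint for a local, insecure single node.
    var settings = EventStoreClientSettings.Create("esdb://localhost:2113?tls=false");
    var client = new EventStoreClient(settings);

    // A placeholder domain event, serialised as JSON.
    var eventData = new EventData(
        Uuid.NewUuid(),
        "order-placed",
        JsonSerializer.SerializeToUtf8Bytes(new { OrderId = "o-123", Amount = 42 }));

    // Append to a placeholder stream; StreamState.Any skips optimistic concurrency checks.
    await client.AppendToStreamAsync("orders-o-123", StreamState.Any, new[] { eventData });

The other clients expose the same operations following the conventions of their own ecosystems.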

Keepalive pings

The reliability of the connection between the client application and the database is crucial for the stability of the solution. If the network is unstable or has periodic issues, the client may drop the connection. Stability is especially important for stream subscriptions, where a client is listening for database notifications. Keeping an existing connection open when an app resumes activity allows the initial gRPC calls to be made quickly, without the delay of re-establishing the connection.

We’ve implemented support for the built-in gRPC mechanism for keeping the connection alive. gRPC allows sending HTTP/2 pings on the transport to detect whether the connection is down. If the other side does not acknowledge the ping within a certain period, the connection is closed. Note that pings are only necessary when there's no activity on the connection.

We enabled Keepalive pings by default, with the interval set to 10 seconds. This value is based on the gRPC proposal, which suggests it as the minimum. It's a compromise between making sure that the connection is open and not making too many redundant network calls.

You can customise the Keepalive settings via the connection string or connection settings:

  • keepAliveInterval controls the period (in milliseconds) after which a keepalive ping is sent on the transport.
  • keepAliveTimeout controls the amount of time (in milliseconds) the sender of the keepalive ping waits for an acknowledgement. If it does not receive an acknowledgement within this time, it will close the connection.

To disable the Keepalive ping, you need to set the keepAliveInterval value to -1.
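As a sketch in C#, assuming a single insecure node on localhost, both settings can be passed straight in the gRPC connection string (values in milliseconds):

    using EventStore.Client;

    // Ping every 10 seconds and wait up to 10 seconds for an acknowledgement
    // (these happen to be the defaults).
    var tuned = EventStoreClientSettings.Create(
        "esdb://localhost:2113?tls=false&keepAliveInterval=10000&keepAliveTimeout=10000");

    // Setting keepAliveInterval to -1 disables Keepalive pings entirely.
    var disabled = EventStoreClientSettings.Create(
        "esdb://localhost:2113?tls=false&keepAliveInterval=-1");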

As a general rule, we do not recommend putting EventStoreDB behind a load balancer. However, if you do and want to benefit from the Keepalive feature, you should make sure that compatible settings are configured on the load balancer. Some load balancers may also override the Keepalive settings, and most of them require the idle timeout to be longer than the keepAliveTimeout. We suggest checking the load balancer documentation before using Keepalive pings.

Compatibility mode

To ease the transition from older versions of the database server, we introduced a compatibility mode for TCP clients. This feature allows v20 (and later) clients to connect to nodes running an older database version using gossip discovery mode.

You can specify the compatibility mode using the CompatibilityMode parameter in the connection string or connection settings.

There are two options to set the compatibility mode:

  1. Manual, specifying the server version you are using, e.g.
"GossipSeeds=192.168.0.2:1111,192.168.0.3:1111; HeartBeatTimeout=500; CompatibilityMode=5"
  2. Automatic, using auto as the CompatibilityMode value. The TCP client will then do its best to detect the node version and apply the needed settings, e.g.
"GossipSeeds=192.168.0.2:1111,192.168.0.3:1111; HeartBeatTimeout=500; CompatibilityMode=auto"

Once this is set up, the migration path from v5 to v20 is easier. First, upgrade the TCP client to the latest version and set the compatibility mode. Then you can proceed with the cluster upgrade steps:

  1. Take down the cluster,
  2. Perform an in-place upgrade of the nodes, ensuring that the relevant configuration and certificates are set up,
  3. Bring the nodes back online and wait for them to stabilise.

We delivered this feature thanks to James Connor’s contribution. We appreciate all the community work and encourage you to send pull requests, suggestions, and other contributions.

Gossip seeds available for Single Node

With the latest release, all nodes have gossip enabled by default. You can connect using gossip seeds regardless of whether you have a cluster or not.
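For example, a gossip-seed style TCP connection string pointing at a single node is now enough; the address below is a placeholder:

    using EventStore.ClientAPI;

    // A single gossip seed works now that every node exposes gossip by default.
    var connection = EventStoreConnection.Create(
        "GossipSeeds=192.168.0.2:1111; HeartBeatTimeout=500");

    await connection.ConnectAsync();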

Heartbeat improvements

EventStoreDB sends heartbeat messages over TCP connections to determine if a client or node is still alive, particularly when the connection is idle. This process is extremely important to maintain the stability and availability of the cluster.

When no data has been received from the other node within the “heartbeat interval”, we schedule a heartbeat request. If we don’t receive a response from the remote party within the “heartbeat timeout”, the node is considered dead and the connection is closed.

Correctly configuring the heartbeat settings is non-trivial and depends on the network traffic, topology, application characteristics, and so on. If you set them too short, you may get false positives; if you set them too long, dead clients and nodes are discovered more slowly. Read more in the Heartbeat timeouts documentation.
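As an illustration, here's a sketch of tuning the TCP client's heartbeat settings with its settings builder; the values and endpoint are placeholders, not recommendations:

    using System;
    using EventStore.ClientAPI;

    // Shorter values detect dead connections faster, but values that are too tight
    // produce false positives on busy or slow networks.
    var settings = ConnectionSettings.Create()
        .SetHeartbeatInterval(TimeSpan.FromMilliseconds(750))
        .SetHeartbeatTimeout(TimeSpan.FromMilliseconds(1500));

    var connection = EventStoreConnection.Create(settings, new Uri("tcp://admin:changeit@localhost:1113"));
    await connection.ConnectAsync();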

We found that in a scenario where one side of the connection is sending a lot of data (the sender) while the other side is idle (the receiver), a false-positive heartbeat timeout can occur for the following reasons:

  • The heartbeat request may need to wait behind a lot of other data in the send queue on the sender’s side before being sent; similarly, it may wait behind a lot of other data in the receive queue on the receiver’s side before being processed.

  • Since the receiver is already receiving a lot of data, it assumes that the connection is alive and does not find it necessary to schedule any heartbeat request to the sender.

  • The sender’s heartbeat request can therefore take longer than the heartbeat timeout to reach the receiver and be processed, causing a false-positive heartbeat timeout.

In the latest database release, we have extended the heartbeat logic to proactively schedule a heartbeat request from the receiver to the sender, generating some data on the connection and thereby preventing the heartbeat timeout. Thanks to that, even if responses arrive at a longer cadence, as long as they come regularly, the connection will be marked as active and remain open.