Introducing Hazeltest

What if the release candidate whose production fitness you’re supposed to test is a Helm chart describing a Hazelcast cluster? Well, ideally, there’s a little testing application that puts realistic load on a Hazelcast cluster. Such an application makes it easier to discover misconfigurations or other errors that might have crept into the chart, and it helps you assess the release candidate’s fitness more effectively and more comfortably.

If you’re reading this, you likely work in IT like me, so you may have faced a situation like the following: Something (let’s call it the release candidate) needs to be properly tested before it’s released. Sounds familiar? If so, you may also have asked yourself a question like this one: How can I make sure the release candidate is actually fit for release?

Interestingly, this basic question seems to apply to pretty much everything that is supposed to eventually see the light of a production environment. Wherever there is a production environment, potentially with some staging areas leading up to it, everyone contributing to the overall state and functionality of that environment needs to make sure that whatever they contribute fulfills a certain set of requirements. In other words, they need to make sure their release candidate, represented by some kind of artifact, is fit for release into that production environment.

In the following sections, you’ll learn more about the challenge and then about a possible solution for verifying that a Hazelcast cluster running on Kubernetes is fit for release, and hence that the Helm chart describing it, which is the release candidate, is fit for release. Let’s kick this off with some context on the requirements that define fitness in this case so the sections on the challenge and, finally, the solution are easier to follow.


Irrespective of the kind of artifact representing the release candidate, there must be some kind of process for asserting the candidate’s production fitness, and this process quite naturally involves some degree of testing. This, in turn, means the quality of the testing done on the candidate largely determines the quality and weight of the fitness statement. Without good tests, one may claim the candidate is fit for release, but that claim won’t mean much, and a poorly tested candidate could make its way to production, potentially causing all kinds of trouble for the stability of the production environment and its ability to satisfy requirements.

In recent months, I’ve had my fair share of exposure to Hazelcast. The release candidates I needed (and still need) to test are not source code artifacts (not directly, at least), but the aforementioned Helm charts containing the setup and configuration of the Hazelcast clusters that will eventually be rolled out to the production Kubernetes clusters (or won’t, if the release candidate is not fit for production). Helm is a fantastic way to formulate the deployment contract for any kind of application supposed to run on Kubernetes, and using it to describe a Hazelcast cluster works very well, too. However, a large and complex chart that keeps growing larger and more complex over time can eventually become difficult to comprehend. In the case of said Hazelcast Helm charts, the requirements contributing to their growth in size and complexity originate from three main sources:

Architectural decisions. Those decisions profoundly impact the setup of the Hazelcast clusters and parts of their configuration. Most of them were made before I joined the project, but because they laid the groundwork for the cluster setup, they cast long shadows to this very day. One of the more interesting examples is the decision to use a mixed setup of compute-only members (Hazelcast members not holding any data) and full members rather than a setup consisting of full members only.

Requirements of internal stakeholders. Probably the most obvious source: Internal stakeholders (largely people responsible for an application or a set of applications that will eventually use the Hazelcast cluster in some way) put forward requirements corresponding to the unique needs and characteristics of their applications. Such requirements concern, for example, the naming scheme for the data structures used in Hazelcast, the expiration and eviction settings, and the kind and configuration of the memory used for data stored in those structures.

Kubernetes resource constraints. Naturally, the Kubernetes clusters running the Hazelcast clusters have a finite amount of CPU, memory, storage, and network bandwidth (the production Kubernetes clusters are absolutely massive, but even their resources can be exhausted). On top of that, Hazelcast is a stateful application with heavyweight members (in terms of resource usage, memory in particular). Even the compute-only members, which don’t hold any data, carry the parts of the cluster state they need. The Kubernetes scheduler, however, seems to work best with applications that are neither stateful nor made up of heavyweight members. This manifested during the earlier phases of testing different Hazelcast resource configurations: We had configured the Hazelcast cluster to run on fewer but very heavyweight members, and the scheduler wasn’t able to distribute the load evenly across the available worker nodes. So when load was generated on multiple Hazelcast clusters at the same time, a small set of the workers went into the NotReady state due to memory pressure, which had undesirable effects on other applications running on the same workers. The lesson learned here was not only the obvious conclusion that more, lighter-weight members work better on our Kubernetes clusters, but also that we need to extend our test scope to make sure a configuration works not only for the Hazelcast cluster, but also for the Kubernetes cluster running it.

With all of those different sources of requirements making the Hazelcast Helm charts ever more complex, how can the fitness of the Helm charts representing the release candidates be asserted?

The Challenge

If I were asked to summarize the above sections in only two words, I’d choose “high complexity”. This complexity leaves a lot of room for misconfigurations or other errors in the Helm charts that lead to undesired behavior of the resulting Hazelcast cluster. Therefore, thoroughly testing the Helm charts, especially before one of them moves on to production, is not only desirable, but mandatory, and ideally, this testing is automated.

What we used until recently for most of the testing is an application called PadoGrid, an open source application hosted on GitHub that provides a fantastic playground for testing all kinds of data grid and computing technologies. Hazelcast is one of them, but since PadoGrid is based on what its developer calls distributed workspaces and pluggable bundles, it also works with other technologies like Spark, Kafka, and Hadoop. PadoGrid offers different sub-programs, one of which is the perf_test application for Hazelcast. This handy tool runs tests configured by means of text-based properties files that describe the groups and operations to run in the scope of a test.

In the early stages of using Hazelcast, defining tests with PadoGrid’s perf_test app was perfectly sufficient. Requirements were neither particularly special nor very plentiful, so all we basically needed to make sure was that the Hazelcast cluster survived being loaded with data to its maximum capacity and that the expiration settings and eviction policies defined on the individual data structures worked correctly (while keeping an eye on the memory usage of the underlying Kubernetes workers, specifically when load was generated on multiple Hazelcast clusters simultaneously).

Today, though, many additional requirements demand much more fine-grained testing, which the perf_test application may not have been built for. For example, when a couple of internal clients reported that their applications’ getMap() calls against a Hazelcast cluster were taking very long to return, detailed logging of each getMap() call in the testing application would have helped greatly. So would the ability to quickly scale up the number of clients to a couple of hundred, since the behavior only occurred when the Hazelcast cluster in question was under a bit of load (simply scaling up the number of PadoGrid instances didn’t do the trick here due to the way PadoGrid starts its internal applications, like perf_test). In addition, the text-based properties format offered by the perf_test application wasn’t powerful enough to formulate complex, long-running tests that simulate realistic load on a Hazelcast cluster.

All in all, then, it was time to amp up our testing game!

The Idea

So why not implement a more specialized application for testing Hazelcast? PadoGrid is a jack of all trades in the realm of data grid and computing technologies. It’s not specialized in running sophisticated load tests on Hazelcast clusters, so it’s no surprise there are some limitations, even with the perf_test app at one’s disposal. The testing application, then, should fulfill the following requirements:

Ease of use. The application must provide intelligent testing logic along with default configurations that make sense for most use cases, so getting started is as simple as firing up the application. Part of this is also a lesson learned from working with PadoGrid: Manually copying a test configuration into a running Pod and then starting the perf_test app in it is a huge blocker if you want to scale out the testing application to hundreds of instances in a very short time. We tried building Docker images with the configuration pre-loaded and perf_test as the entrypoint, but building a new image for every test configuration felt clunky.

Flexibility in configuring the number of maps used. Whether one instance of the test application uses one map in Hazelcast or ten (or fifty, or…) affects the resources the Hazelcast cluster under test has to allocate. The testing possibilities therefore become a lot more numerous and versatile when the test application makes it quick and easy to alter the number of maps it interacts with.

Elaborate logging. Understanding how complex workflows play out on the client side in various load or error scenarios helps isolate misconfigurations on the Hazelcast cluster side, so the testing application, acting as the client, should report in detail on what it’s doing at all times. This information should be provided in a way that lets logging platforms like Splunk easily index, search, and visualize the data.

Optimized for Kubernetes. At least in the project I currently work on, the testing application will run exclusively on Kubernetes, so it makes sense to design it with this in mind. More specifically, this means the following:

  • Liveness and readiness probes, so the Deployment Controller doesn’t have to fly blind when rolling out a new version, for example
  • A small resource footprint, so our Kubernetes clusters can run a four-digit number of instances quite comfortably
  • Fast scale-outs (e.g., going from one replica to 500), which requires a very short bootstrap time

Intelligent test logic. The freedom PadoGrid’s perf_test app gives its users by letting them define tests in plain-text properties files is amazing, but it comes at a cost: Defining long-running, complex tests is tedious. So what if you don’t need that much freedom, but simply a couple of pre-built “test loops” able to generate realistic load on a Hazelcast cluster? This is the basic idea behind this last and most important requirement: Upon starting up, the application should immediately engage test components that create realistic and heterogeneous load on the Hazelcast cluster under test. Ideally, those components can easily run in an endless loop and come with reasonable defaults.

Sounds doable, doesn’t it? So, what’s the result?

The Solution: Meet Hazeltest!

The ideas described above, along with a whole lot of coffee, flowed into an application called Hazeltest, which you can find over on my GitHub.


The application is implemented in Golang for the following reasons:

  • Container images for Golang applications tend to be incredibly small
  • Golang applications are very fast since they’re compiled directly into a native binary rather than requiring a virtual machine of some sort to run
  • The language was built for concurrency
  • Golang is awesome, and I’ve been on the lookout for a useful, real-world project to learn it with, so learning Golang by implementing Hazeltest is really killing two birds with one stone

So far, the decision to use Golang to implement Hazeltest has played out very nicely: The application starts in less than a second on our Kubernetes clusters, and its image weighs in at only 7.9 MB of compressed size, so even hundreds of instances pulling the image from the internal registry isn’t a big deal. Plus, goroutines seem to be a really good fit for the individual runners the application offers (more on those in the next paragraph). Most importantly, though, implementing Hazeltest in Golang has been tremendous fun (as you’ll see, there’s still a lot to do, but my experiences with Golang so far leave me wanting more).

At the very heart of a testing application sits, no surprise, some kind of testing logic, and in Hazeltest, this logic is built around the concept of self-sufficient runners: For each data structure available in Hazelcast (maps, queues, topics, …), a set of runners puts load on the Hazelcast cluster. Each runner has its own distinct “feature” in terms of the data it stores in Hazelcast, so all runners combined create heterogeneous load. Since the test loop for one category of runners (for example, those dedicated to testing Hazelcast maps) is more or less the same within the boundaries of some configuration, all runners in that category use the same test loop implementation, each by creating its own instance of it. Each runner also gets its own goroutine, so it’s isolated from the others. The test loop implementation contains the actual interactions with the Hazelcast cluster, like ingesting, reading, and deleting data.

Encapsulating all configuration and a unique data structure to write to Hazelcast in each runner also has a nice side effect on the source code: Adding more runners makes the program larger, but not more complex. Because each runner is completely self-sufficient, it only adds another component to the source code and introduces no coupling to existing components other than the test loop implementation.

To satisfy the ease-of-use requirement, Hazeltest immediately starts all available runners once bootstrap has completed, so creating more load on the Hazelcast cluster under test is as simple as spawning more Hazeltest instances. In addition, the runners are configurable by means of a simple YAML file, and each runner comes with sensible defaults, so providing all properties is not mandatory.
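One way to implement “sensible defaults, with every property optional” in Go is to decode the user’s file into a struct of pointer fields, where nil means “not provided”, and to overlay only the non-nil values on top of the defaults. The sketch below assumes this approach; the type names, field names, and default values are illustrative rather than Hazeltest’s actual configuration, and the YAML decoding step is omitted so the example needs only the standard library.

```go
package main

import "fmt"

// mapRunnerConfig holds the effective settings of a map runner. The field
// names are hypothetical and only illustrate the defaults-plus-overrides idea.
type mapRunnerConfig struct {
	Enabled   bool
	NumMaps   int
	NumRuns   int
	MapPrefix string
}

// userOverrides mirrors what a user might put into the YAML file. A nil
// pointer means "property not provided", so the default value applies.
type userOverrides struct {
	Enabled   *bool
	NumMaps   *int
	NumRuns   *int
	MapPrefix *string
}

// defaultPokedexConfig holds illustrative defaults, not Hazeltest's real ones.
var defaultPokedexConfig = mapRunnerConfig{Enabled: true, NumMaps: 10, NumRuns: 1000, MapPrefix: "ht_"}

// resolve starts from the defaults and applies only the properties the user set.
func resolve(defaults mapRunnerConfig, u userOverrides) mapRunnerConfig {
	c := defaults // struct copy: the defaults themselves are never modified
	if u.Enabled != nil {
		c.Enabled = *u.Enabled
	}
	if u.NumMaps != nil {
		c.NumMaps = *u.NumMaps
	}
	if u.NumRuns != nil {
		c.NumRuns = *u.NumRuns
	}
	if u.MapPrefix != nil {
		c.MapPrefix = *u.MapPrefix
	}
	return c
}

func main() {
	// The user only sets the number of runs; everything else keeps its default.
	numRuns := 50
	cfg := resolve(defaultPokedexConfig, userOverrides{NumRuns: &numRuns})
	fmt.Printf("%+v\n", cfg)
}
```

Using pointers rather than plain values is what lets the code distinguish an explicitly configured false or 0 from a property that was simply left out.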

Current Implementation State

Hazeltest is not yet finished at the time of writing, but two runners are available, and I have already used Hazeltest to test some of our Hazelcast clusters. Both available runners have been implemented for Hazelcast’s map data structure, so a basic test loop encapsulating the functionality both runners rely on has been implemented as well. More specifically, today’s version of Hazeltest offers the features described in the following sections.


The first runner available today is the PokedexRunner, which runs the test loop with the 151 Pokémon of the first-generation Pokédex. It serializes them into string-based JSON, which is then saved to Hazelcast. The PokedexRunner is not intended to put a lot of data into Hazelcast (i.e., it is not intended to load-test a Hazelcast cluster in terms of its memory), but instead stresses the CPU. The second available runner, on the other hand, is the LoadRunner, and as its name indicates, it is designed to “load up” the Hazelcast cluster under test with lots of data so as to test the cluster’s behavior once its maximum storage capacity has been reached. Unlike the PokedexRunner, which is, by the nature of the data it works with, restricted to 151 elements per map, the LoadRunner can be configured freely regarding the number of elements it puts into each map, and the elements’ size is configurable, too.
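The kind of values both runners write can be sketched in a few lines of Go. The pokemon struct below is a hypothetical, cut-down record (the actual PokedexRunner’s data model may differ) serialized with encoding/json, and loadValue is an illustrative helper that produces a random payload of configurable size, similar in spirit to what a load-oriented runner stores.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math/rand"
)

// pokemon is a cut-down, hypothetical record; the real PokedexRunner's
// data model may carry more fields.
type pokemon struct {
	ID   int      `json:"id"`
	Name string   `json:"name"`
	Type []string `json:"type"`
}

// pokemonValue serializes one entry into the JSON string stored in a map.
func pokemonValue(p pokemon) (string, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

const letters = "abcdefghijklmnopqrstuvwxyz"

// loadValue builds a random payload of exactly size bytes, the kind of
// size-configurable element a load-oriented runner might store.
func loadValue(size int) string {
	b := make([]byte, size)
	for i := range b {
		b[i] = letters[rand.Intn(len(letters))]
	}
	return string(b)
}

func main() {
	v, err := pokemonValue(pokemon{ID: 25, Name: "Pikachu", Type: []string{"Electric"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(v)                    // {"id":25,"name":"Pikachu","type":["Electric"]}
	fmt.Println(len(loadValue(1024))) // 1024
}
```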

Test Loop

Today’s only test loop is designed for maps. Each iteration consists of three simple operations: Ingest all elements, read all elements, and delete some elements (with the number of elements to delete chosen randomly). This is a pretty simple test loop, but for now, it gets the job done. The number of iterations each test loop performs is configurable per runner, so it’s very easy to configure a runner to execute its test loop for, say, the entire night.


As far as endpoints are concerned, there are liveness and readiness probes as well as a status endpoint that allows interested clients to query the current state of all runners along with their configuration.

Configuration Of Number Of Maps Used

As mentioned previously, the number of maps a client queries from a Hazelcast cluster was an important variable in analyzing and reproducing a recent issue, so I thought it would be a good idea to make the number of maps each Hazeltest instance retrieves from a Hazelcast cluster via getMap() invocations as easy to configure as possible. The result is three different “settings” on a scale ranging from a low number of large maps to a very high number of smaller maps. I will outline those three settings in the next blog post, which will give you a short introduction to working with Hazeltest.

Potential Next Features

Right now, Hazeltest is still in a very early phase of development, and there are a couple of things I’m not quite satisfied with yet. Plus, although Hazeltest has already played its intended role in the analysis of a number of recent problems, there is so much more room for features, and so much more room for me to learn Golang while implementing them. Thus, in the weeks to come, I would like to introduce the following:

  • More runners. Right now, Hazeltest offers only two runners, both of which are aimed at testing Hazelcast’s map data structure. More map runners would benefit the quality of the testing, since more runners mean more versatile testing and traffic. However, other Hazelcast data structures like queues and topics need to be tested in our clusters, too, so it makes sense to add runners for those data structures first.

  • Improve map test loop. I talked a lot about “intelligent test logic” earlier on, and while the test loop available for maps today gets the job done, it’s not particularly intelligent: Although the number of elements to delete is chosen randomly, the test loop still runs the same three basic operations over and over again, namely, ingest all, read all, and delete some. There is much more room for additional randomness and for varying the operation order to make the test loop generate more realistic load.

  • Reporting. As the basis for test automation, the application needs to offer detailed reports once all of its runners have completed – after all, other tools starting the application (think of a Jenkins CI/CD pipeline) need to know the test results. Ideally, I want the test process for a particular Hazelcast cluster (and hence the Helm chart that produced it) to be completely automated, and more elaborate reporting is the foundation for building higher levels of automation.

Plus, there are some more features I don’t want to talk about just yet, so stay tuned for more.


Hazeltest provides easy-to-use runners for executing test loops against a Hazelcast cluster under test, and together with the possibility to fine-tune the number of maps Hazeltest instances use on Hazelcast, it offers useful testing versatility. Because of this versatility, the Helm charts describing each Hazelcast cluster under test can be tested more efficiently. The overall process of making sure a Helm chart, the release candidate, is actually fit for production thus becomes smoother, faster, and a lot more convenient.

Golang is of great help in implementing Hazeltest: Thanks to it, application startup is lightning-fast, launching runners concurrently is a breeze, and the container images the application binary ends up in are super small. All of this makes it easy to scale out the application to hundreds of instances. The Hazelcast cluster is thus tested not only for the correctness of all data structure configurations and its ability to function well even at full capacity, but also for the number of clients it has to serve.

Most importantly, though, writing Hazeltest in Golang and thus learning the language has been tremendous fun so far, and I’m very excited to take it further, so expect to see more features added to Hazeltest in the upcoming weeks!