Data Architecture and CAP Theorem: Where Does It Clash?

ODSC - Open Data Science
Sep 7, 2023

Editor’s note: Joep Kokkeler is a speaker for ODSC West this Fall. Be sure to check out his talk, “Capturing CAP in a Kappa Data Architecture,” there!

Before diving into the different types of data architecture, let’s look at the CAP theorem. The CAP theorem states that a distributed data system (leaving ACID transactions aside for now) can guarantee at most two of the following three properties, never all three: consistency, availability, and partition tolerance. Choosing the right architecture type is one way to deal with this trade-off.
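To make the trade-off concrete, here is a toy sketch of what happens when the network between two replicas is partitioned. The names `ToyStore`, `Replica`, and `Unavailable` are hypothetical, and this is not how any real database implements replication: a store that favors consistency (CP) refuses the write, while one that favors availability (AP) accepts it and lets the replicas disagree.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Hypothetical error: a CP store refuses a request it cannot keep consistent."""


@dataclass
class Replica:
    data: dict = field(default_factory=dict)


@dataclass
class ToyStore:
    """Hypothetical two-replica key-value store, for illustration only."""
    mode: str  # "CP" or "AP"
    a: Replica = field(default_factory=Replica)
    b: Replica = field(default_factory=Replica)
    partitioned: bool = False

    def write(self, key, value):
        # Writes arrive at replica `a`, which must also reach `b` to stay consistent.
        if not self.partitioned:
            self.a.data[key] = value
            self.b.data[key] = value
        elif self.mode == "CP":
            # Consistency first: refuse the write rather than let the replicas diverge.
            raise Unavailable("cannot reach replica b during partition")
        else:
            # Availability first: accept the write locally; `b` keeps its old value.
            self.a.data[key] = value

    def read_from_b(self, key):
        return self.b.data.get(key)


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        store = ToyStore(mode)
        store.write("x", 1)
        store.partitioned = True  # the network between a and b goes down
        try:
            store.write("x", 2)
            print(mode, "accepted write; replica b still sees", store.read_from_b("x"))
        except Unavailable as err:
            print(mode, "rejected write:", err)
    # Prints:
    # CP rejected write: cannot reach replica b during partition
    # AP accepted write; replica b still sees 1
```

Without a partition both modes behave identically; the choice between consistency and availability only becomes visible once the network splits.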

When talking about architecture, people often assume that everyone knows what each architecture type means. But what if you don’t? Say you know the technical implementation of the lambda architecture, but not how it compares to other architectures.

Let me explain using an easy analogy.

Most of us have been to a zoo at some point in our lives. There are lots of different animals, and the caretakers need many different skill sets to look after them. To run a zoo, you either need that knowledge yourself or you hire people who have it, so that you can enjoy your zoo. Choosing an architecture is much the same: you don’t need to know everything about every implementation, but you do need to know what you are dealing with.

Monolith

Let’s start with the monolith architecture. As the name suggests, it’s large, it’s old, it takes up a lot of room, and it’s probably an elephant: specifically, an African bush elephant, the largest elephant around. There’s one old guy on your crew who knows how to take proper care of it, which buttons he can push and which ones to stay far away from. One of the interns tried to work with the great beast, but after putting out many fires in its wake, it’s best to leave the monolith to the guy who was born around the same time the term architecture was coined.

Microservice

The next one is the microservice. In a microservice architecture, it is important to create services that each have their own responsibilities. Sometimes those responsibilities spill over, and before you know it, you are running a mini monolith. Guiding a young grey elephant to where you want it to go takes a whole different skill set than running microservices. My best comparison for microservices is a pack of wolves. In the wild, a pack moves forward in a straight line (to save energy), with everybody in the right order, and every spot in the pack has a certain responsibility.
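As a minimal sketch of “services that each have their own responsibilities,” assuming two hypothetical services, `InventoryService` and `OrderService`: each owns its own data, and the order service only asks the inventory service for stock through its public methods. In a real deployment each would run as its own process and talk over the network; this in-process version leaves that out.

```python
class InventoryService:
    """Hypothetical service that owns stock levels; nobody else touches _stock."""

    def __init__(self, stock: dict[str, int]):
        self._stock = dict(stock)

    def reserve(self, sku: str, qty: int) -> bool:
        available = self._stock.get(sku, 0)
        if qty <= 0 or available < qty:
            return False
        self._stock[sku] = available - qty
        return True

    def available(self, sku: str) -> int:
        return self._stock.get(sku, 0)


class OrderService:
    """Hypothetical service that owns orders and asks inventory for stock."""

    def __init__(self, inventory: InventoryService):
        self._inventory = inventory
        self._orders: list[tuple[str, int]] = []

    def place_order(self, sku: str, qty: int) -> bool:
        # Reaching into inventory._stock directly here is how responsibilities
        # "spill over" and a mini monolith starts to grow.
        if not self._inventory.reserve(sku, qty):
            return False
        self._orders.append((sku, qty))
        return True

    def order_count(self) -> int:
        return len(self._orders)


if __name__ == "__main__":
    inventory = InventoryService({"apple": 3})
    orders = OrderService(inventory)
    print(orders.place_order("apple", 2))  # True
    print(orders.place_order("apple", 2))  # False: only 1 left
    print(inventory.available("apple"))    # 1
```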

There’s an image circulating on the internet of a wolf pack moving through the snow, with a caption claiming that the wolves in front are sick or old, placed there so they don’t fall behind or so they act as a “buffer” when the pack is attacked. Keeping in mind that moving through snow costs a lot of energy, it doesn’t make sense to put an already weakened wolf in front. The same goes for your microservices: the first service may be a bit bloated, but it does the heavy lifting for the others.

Lambda

The lambda architecture is known for its small components, and for the fact that you always want more of them.

So it’s more comparable to a group of meerkats. A group of meerkats is called a mob, which I like a lot more: instead of saying that you run multiple groups of lambdas, you can say you have a number of mobs under your wing. The thing with meerkats, and with lambdas, is that you have to keep each responsibility as small as you can, or else the meerkat will just leave it as is. In the past, it would take a meerkat a couple of minutes to actually do what you wanted it to do, but we got around this by feeding it a steady supply of work so it never stops. This makes it easy for each meerkat to do its job and pass the results on to the next meerkat, and you’ll have a lot of mobs under your control doing what you want them to do.
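The “mob” described above can be sketched as a chain of tiny functions, each with one small responsibility, where every function passes its result on to the next. All names here (`parse`, `validate`, `to_fahrenheit`, `run_mob`) and the temperature payload are hypothetical. In a deployed setup each step would typically run as its own function connected by a queue or an orchestrator; this plain-Python sketch leaves that wiring out.

```python
import json
from typing import Callable, Iterable

Step = Callable[[dict], dict]


def parse(event: dict) -> dict:
    # One job only: turn the raw JSON body into a dict.
    return json.loads(event["body"])


def validate(record: dict) -> dict:
    # One job only: reject records missing the field we need.
    if "temperature_c" not in record:
        raise ValueError("missing temperature_c")
    return record


def to_fahrenheit(record: dict) -> dict:
    # One job only: add a derived field, leaving the input untouched.
    return {**record, "temperature_f": record["temperature_c"] * 9 / 5 + 32}


def run_mob(steps: Iterable[Step], event: dict) -> dict:
    """Hypothetical runner: hand each step's result to the next step."""
    result = event
    for step in steps:
        result = step(result)
    return result


if __name__ == "__main__":
    event = {"body": '{"sensor": "s1", "temperature_c": 20}'}
    print(run_mob([parse, validate, to_fahrenheit], event))
    # {'sensor': 's1', 'temperature_c': 20, 'temperature_f': 68.0}
```

Keeping each step this small makes it easy to add, replace, or scale one step without touching the others, which is the property the meerkat analogy is pointing at.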

Kappa

Last but not least, the kappa architecture. Imagine a fat penguin, one of those guys from “Madagascar.” Penguins always need to move in groups, so inside your kappa architecture there are a lot of groups of penguins just smiling and waving. But we actually want our architecture to be really fast and really clever. So imagine the same penguin, but now with a rocket attached to its back. That’s the foundation of our kappa architecture: a lot of penguins with a lot of rockets attached to them. Please note that you don’t have to be a rocket scientist to work on a kappa architecture.

Conclusion

Now that we have a view of the kinds of beasts we are looking at, how can we implement one of these architectures to get all three components of the CAP theorem? To find out which beast (or rocket-powered penguin) is best suited to the CAP theorem, come see me at ODSC West in San Francisco for a more detailed explanation of this problem and a live demo of the different implementations.

About the Author:

Joep Kokkeler has more than 12 years of experience in developing, engineering, architecting, and visualizing data products in various markets ranging from energy to clothing manufacturing. He’s focused on enabling teams to be better at handling data and providing the teams with the tools and knowledge needed to go live and stay in production.

He was a member of the Teqnation program committee and has given talks on using Kafka and Hue during football, developing and deploying on HoloLens, total DevOps using GitLab, the evolution of a data science product, taking the Elastic Stack from PoC to production, and using an Xbox Kinect on a bike at Devoxx London.

Originally posted on OpenDataScience.com

