Emerging Need of Cloud Native Stream Processing
Enterprise applications are evolved over the last few decades and sharpen itself with the evolution of IT infrastructure and user needs. Now it is the time for its next evolution, there are a lot of companies evaluating and revisiting their enterprise architecture and applications to move into the cloud. As the initial step, they have started converting their services and apps into micro-services. Since stream processing to be one of the key components in the enterprise architecture; we foresee stream processors will also follow the same trend. Emerging cloud-native stream processing software requires characteristics such as being lightweight, loosely coupled and adhering to the agile dev-ops process. However, most of the traditional stream processors are heavy and depends on bulky monolithic technologies which makes them harder moving to the cloud.
[Related Article: The Benefits of Cloud Native ML And AI]
In this blog, I will discuss on Siddhi Stream Processor (https://siddhi.io) which is a 100% open-source cloud-native stream processing agent that allows to implement stream processing applications and run them natively on cloud (i.e, Kubernetes). Siddhi also comes with a set of connectors which can be used to integrate with external systems such as NATS, Kafka, RDBMS, MongoDB, and Prometheus. Let’s understand more about Siddhi and its capabilities below.
What is Siddhi?
Siddhi core is a Complex Event Processing and Stream Processing engine which took birth in 2011. In the Sinhala language [1], ‘Siddhi’ means ‘Events’. Overtime Siddhi engine is integrated with various other open-source and proprietary projects such as Apache Eagle [2], Apache Flink [3], UBER [4] and Punch Platform [5], and many others. Siddhi engine also plays the primary role in various WSO2 products such as WSO2 Stream Processor, WSO2 Data Analytics Server, WSO2 API Management (Traffic Manager component), and WSO2 Identity and Access Management (Adaptive Authentication component). With all the experiences and knowledge gained over the years on building Stream Processing applications, Siddhi team is now working in building distributed Cloud Native Stream Processor powered by the Siddhi engine.
Writing Streaming Applications
Streaming applications help to express the streaming requirement/need. As per Forrester [6], Streaming applications can be used to achieve below things.
- Identify perishable insights
- Continuous integration
- Orchestration of business processes
- Embedded execution of code
- Sense, think, and act in real-time
We could identify various patterns in stream processing based on its usages in numerous domains; these streaming processing patterns help to solve the commonly known stream processing problems and to provide some hints for the users on where to apply what stream processing pattern.
As shown in the above figure, Streaming patterns can be applied from the event collection phase itself. then it can spread across into event cleansing, event transformation, event enrichment, summarization, rule processing, event correlation, trend analysis, machine learning, data pipelining, event publishing, on-demand processing, etc. Each of these patterns requires specific streaming constructs such as windows, patterns, IO connectors and ML algorithms to achieve the requirement. Siddhi natively supports all of these streaming constructs which required for the user to achieve their requirements [7].
There are various ways followed over the past to write streaming applications such as code it by yourself, use a stream processing tool and streaming SQL based stream processor. Streaming SQL is the most used and handy way to write Streaming applications; it gives you the natural feeling of dealing with the events. Please check below example,
Consider the above example streaming application which is related to a healthcare use case where Glucose reading events are received through HTTP source and mapped into a stream called ‘GlucoseReadingStream’. There is a query is written to identify abnormal event occurrence over time; such identified abnormal events are pushed to a stream ‘AbnormalGlucoseReadingStream’. Those events are sent as email alerts to the respective authorities through email sink. If you check the query, you can understand that there is a window defined which accumulates events in-memory for some time (15 minutes) to identify the abnormal Glucose readings.
Cloud Nativeness
Cloud nativeness became one of the unavoidable requirements in the enterprise application world. As explained in the overview, organizations are looking for cloud-native technologies and applications for resilient, responsiveness and scalability. Kubernetes plays an important role in achieving cloud nativeness and various technologies are gathering around Kubernetes to build cloud-native applications; CNCF [8] is born to moderate it. Micro-services are the starting point of the Cloud-Native journey and a lot of applications; especially stateless applications have gone through promising miles in that. Stream Processing applications are so new to Cloud-Native journey and there is no much promising things happened around it; this is maintained because existing Stream Processors are not designed in cloud-native or friendly manner. Below are some of the areas that blocking existing Stream Processors to move in to cloud.
- Majority of the existing Stream Processors are heavy & built on top of bulky monolithic technologies
- Node to node traffic is high
- Based on master-slave architecture
- Streaming applications are stateful
- Cannot scale the nodes independently
As given above, some design/architectural factors of the existing Stream Processors is not cloud friendly and it hinders to move into the cloud. Then, Stream Processors should have an architecture which supports cloud deployment. But stateful nature of the streaming applications is unavoidable in stream processing context then it is important to invest time/energy to achieve those requirements in cloud frameworks such as Kubernetes.
Why Siddhi?
Siddhi Stream Processor is designed to build stream processing applications and run them in the cloud. It is built on top Siddhi stream processing library which is lightweight, it can boot up within a few seconds and able to process more than 100K events per second. Due to the micro nature of the Siddhi architecture, it allows you to deploy as containers. Siddhi provides native support to deploy a streaming application in Kubernetes with Siddhi K8s operator. Siddhi Kubernetes deployment patterns are built on top of features provided by Kubernetes and other cloud-native technologies such as NATS and Prometheus.
Siddhi deployment patterns are not based on master-slave architecture and there is no traffic involved between the nodes thus container-based deployment become easy and it leads to deploying Siddhi in Kubernetes natively. In Kubernetes, Siddhi supports to run stateful stream processing applications by involving a messaging system such as NATS and state snapshot persistence with a volume mount.
Now, let’s consider the sample Siddhi application that is discussed earlier. The specific Siddhi application is used to identify abnormal situations in patients and also it performs some stateful event processing to identify such abnormal event occurrences. If we are going to move such stream processing applications to the cloud then we have to consider the high availability of the deployment since it process sensitive data.
Siddhi provides native support to deploy such stateful streaming applications in a highly available manner. Below Kubernetes custom resource definition (CRD) which can be used to deploy a stateful stream processing application in Kubernetes.
Above CRD creates deployment shown in the below image in Kubernetes environment. As you can see, Siddhi streaming application is divided into two child applications; first child application exposes HTTP endpoint to consume Glucose readings related events and those events are pushed to the NATS messaging system. Second child application consumes events from NATS messaging system, continue the processing as per defined query and send email alerts. By following this approach, the deployment avoids event losses and achieves high availability. Please refer the Siddhi Kubernetes distributed deployment guide [9] for the in-depth details on this.
[Related Article: Why Value-Stream-As-A-Service Could Be Your Business’s Next Big Thing]
The Conclusion
Siddhi-io is an open-source project powered by WSO2 Inc. Please check out the project source code in [10], tryout [11] and provide your feedback. You can also reach through the community channel [12] and contribute with your ideas or clarify your queries and doubts. You could try out the Siddhi cloud-native deployment patterns in Katakoda [13].
[1] https://en.wikipedia.org/wiki/Sinhala_language
[2] http://eagle.apache.org/docs/
[3] https://github.com/apache/bahir-flink
[4] https://www.youtube.com/watch?v=nncxYGD6m7E
[5] https://doc.punchplatform.com/4.0.2/src/doc/03_USER_GUIDE/Cep.html?highlight=siddhi
[6] https://www.forrester.com/report/The+Forrester+Wave+Streaming+Analytics+Q3+2017/-/E-RES136545
[7] https://siddhi.io/en/v5.1/docs/query-guide/
[10] https://github.com/siddhi-io/siddhi
[11] https://siddhi.io/en/v5.1/download/
[12] https://siddhi.io/community/
[13] https://www.katacoda.com/siddhi/courses/siddhi-deployment
Read more data science articles on OpenDataScience.com, including tutorials and guides from beginner to advanced levels! Subscribe to our weekly newsletter here and receive the latest news every Thursday.