Introduction to real-time data processing and how it’s different from batch data processing

Introduction

This document introduces real-time data processing and explains how it differs from batch data processing.

Imagine that you’re sitting on a bridge overlooking a river. From this perspective, you are fixed and water is flowing by you, carrying leaves and branches. You can see that objects upstream are coming towards you, but you can only interact with things within your reach from the bridge. Likewise, once a branch is past the bridge, you will lose the opportunity to catch or manipulate it.

Thinking about real-time data is much the same. In fact, that’s why we call real-time data “streams”. It can be very different from the way most people are used to working with data in desktop applications, which is called “batch processing”.

Batch processing

Batch Processing - A way of processing large amounts of data collected over a period of time. In this type of processing, data is collected, grouped, and then processed, and the output is sent as a batch, or collective, response. Processing is not tied to the moment each record arrives; the job runs on a schedule or when it is triggered.

Figure: Batch processing

Advantages of batch processing

  • Ideal for processing large amounts of data or transactions
  • Increased efficiency over processing each individually
  • Provides a good audit trail
  • Processing can be scheduled, for example during off-peak hours
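
As a minimal sketch of the collect, group, then process pattern described above, the following Python groups a complete set of records and returns one collective result. The record fields (`source`, `value`) and the function name `process_batch` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def process_batch(records):
    """Group every collected record by source, then average each group."""
    grouped = defaultdict(list)
    for record in records:  # the whole data set is available up front
        grouped[record["source"]].append(record["value"])
    # One collective response for the entire batch
    return {source: sum(values) / len(values) for source, values in grouped.items()}

# Hypothetical records collected over a period of time
collected = [
    {"source": "A", "value": 100.0},
    {"source": "A", "value": 110.0},
    {"source": "B", "value": 50.0},
]
result = process_batch(collected)
```

Nothing is returned until every record has been collected and grouped, which is the defining trait of a batch job.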

Real-time processing

Real-Time Processing - Real-time processing systems are high-speed, quick-response systems. They are best used where a large number of events must be processed in a short time. Each event is processed as it arrives, so the system responds immediately, which suits applications that depend on current data.

Figure: Real-time processing

Advantages of real-time processing

  • No significant delay in responses
  • Information is always up to date
  • Allows the user to make decisions on “live” or “real-time” data
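
By contrast, a real-time handler responds to each event as it arrives. The sketch below simulates a stream with a Python generator. The names `event_stream` and `handle_event`, the `value` field, and the threshold are assumptions made for illustration and do not describe any particular platform's API.

```python
def event_stream():
    """Simulate a stream by yielding one record at a time, in arrival order."""
    for value in [3.0, 7.5, 4.2]:
        yield {"value": value}

def handle_event(record, threshold=5.0):
    """Respond immediately to a single record; no other records are needed."""
    return "ALERT" if record["value"] > threshold else "ok"

responses = []
for record in event_stream():
    # Each record is handled the moment it arrives, then left behind
    responses.append(handle_event(record))
```

The handler never sees the whole data set. Like the observer on the bridge, it only works with what is within reach at that moment.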

Corva’s system is designed around real-time processing. This is important to note when building apps in Dev Center, because many existing algorithms are written around batch processing. In a batch processing structure, we would have a massive data set available to do what we want with, but in a real-time processing structure we only have the latest record to process. Does this mean these algorithms can’t be handled by Corva’s platform?

No, they can be, but batch processing is not really what Corva was designed for.

As developers, we need to understand the differences between these two types of data processing in order to create apps that run efficiently. Yes, we could make an API call for 10,000 records and then process those records, but is that sustainable and reliable? No. Repeating a call like that every time new data arrives adds load and latency that grow with the size of the data set.
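
Many batch-style calculations can be restated so that they keep a small running state and update it with each new record, instead of re-fetching thousands of records. The following is a minimal sketch of that idea for an average. The `RunningMean` class is hypothetical and is not part of any Corva SDK.

```python
class RunningMean:
    """Keep a running average using only a count and the current mean."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        """Fold one new value into the mean without storing earlier values."""
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean

tracker = RunningMean()
for value in [10.0, 20.0, 30.0]:  # stands in for records arriving one at a time
    latest_mean = tracker.update(value)
```

Each update does a fixed amount of work no matter how many records came before. Where the state is kept between records depends on the platform and is not shown here.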

In conclusion, both methods have their advantages and disadvantages. The main thing to remember is that Corva is set up for stream data processing, and apps built in Dev Center should be designed around this.