
Batch Data vs Streaming Data

Batch Data

Batch data is processed in large volumes at specific intervals. For example, imagine an e-commerce company that collects all sales transactions for the day. At 6 PM, after business hours, an ETL (Extract, Transform, Load) job runs to transform this data and load it into a data warehouse, enabling in-depth analysis of daily sales trends, inventory levels, and customer behavior (a minimal sketch of such a job follows the list below). The key characteristics of batch data are:

  • Volume: Large chunks of data are processed together.
  • Frequency: Processed at scheduled intervals (e.g., daily, weekly).
  • Latency: Higher latency as data waits to be processed.
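To make this concrete, here is a minimal sketch of such a daily batch job in Python. It assumes a hypothetical CSV extract of the day's sales and uses SQLite as a stand-in for the warehouse; the file name, column names, and table schema are illustrative, not a prescribed design.

```python
import csv
import sqlite3
from datetime import date

# Hypothetical daily extract: one CSV holding the day's sales transactions.
SALES_FILE = f"sales_{date.today().isoformat()}.csv"


def run_daily_batch(csv_path: str, db_path: str = "warehouse.db") -> None:
    """Extract the day's sales, transform the rows, and load them into the warehouse."""
    # Extract: read the whole day's file at once -- the entire batch is available up front.
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))

    # Transform: convert amounts to integer cents and drop obviously invalid rows.
    cleaned = [
        (r["order_id"], r["sku"], int(float(r["amount"]) * 100))
        for r in rows
        if float(r["amount"]) > 0
    ]

    # Load: append the batch to a fact table in the warehouse.
    conn = sqlite3.connect(db_path)
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS fact_sales "
            "(order_id TEXT, sku TEXT, amount_cents INTEGER)"
        )
        conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", cleaned)
    conn.close()


# A scheduler (cron, Airflow, etc.) would invoke this once per day, e.g. at 6 PM:
# run_daily_batch(SALES_FILE)
```

The defining trait shows up in the structure: the job only makes sense once the full day's file exists, so results arrive hours after the individual transactions happened.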

Streaming Data

Streaming data, on the other hand, is processed in real time as it is generated. Consider a financial institution that monitors transactions for fraud detection: every transaction is analyzed the moment it occurs to identify potential fraud and take immediate action. This continuous processing ensures timely insights and responses (see the sketch after the list below). The key characteristics of streaming data are:

  • Latency: Very low latency, as data is processed immediately.
  • Velocity: Data flows in continuously.
  • Frequency: Processed instantly or in near real time.
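Below is a minimal Python sketch of this idea. A generator simulates the unbounded transaction stream and a simple amount threshold stands in for the fraud rule; in a real system the events would arrive from a message broker and the scoring logic would be far more sophisticated.

```python
import random
import time
from typing import Iterator

# Hypothetical rule: flag any single transaction above this amount.
FRAUD_THRESHOLD = 10_000.00


def transaction_stream() -> Iterator[dict]:
    """Simulate an unbounded stream of card transactions."""
    while True:
        yield {
            "card_id": f"card-{random.randint(1, 5)}",
            "amount": round(random.uniform(1, 15_000), 2),
            "ts": time.time(),
        }
        time.sleep(0.1)  # a new event roughly every 100 ms


def process_stream(events: Iterator[dict]) -> None:
    """Score each transaction the moment it arrives instead of waiting for a batch."""
    for event in events:
        if event["amount"] > FRAUD_THRESHOLD:
            # In production this might block the card or page an analyst.
            print(f"ALERT: possible fraud on {event['card_id']}: {event['amount']}")


# process_stream(transaction_stream())  # runs until interrupted
```

Contrast this with the batch job above: there is no "6 PM run", only a loop that reacts to each event as it flows in.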

Other Types of Data

Besides batch and streaming data, there are a few related categories worth knowing about:

  1. Real-time Data: Similar to streaming data, but often implies a more immediate need for processing and action. Examples include real-time sensor data in IoT applications.
  2. Near Real-time Data: Data that is processed with minimal delay, but not instantly. This is often used when immediate processing isn’t critical, but quick insights are still needed. An example could be social media analytics updated every few seconds.
  3. Event-driven Data: Data whose processing is triggered by specific events or changes in the data. For example, a stock trading system might execute trades based on market conditions (a rough sketch of this pattern appears after the list).
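As a rough illustration of the event-driven pattern, the Python sketch below registers a trading rule as a handler that only runs when a price event is published. The symbol, prices, and threshold are made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PriceEvent:
    symbol: str
    price: float


# Registry of handlers to invoke whenever a price event is published.
_handlers: List[Callable[[PriceEvent], None]] = []


def on_price_event(handler: Callable[[PriceEvent], None]) -> None:
    """Register a handler to be called for every published price event."""
    _handlers.append(handler)


def publish(event: PriceEvent) -> None:
    """Fire all registered handlers the moment an event arrives."""
    for handler in _handlers:
        handler(event)


# A trading rule that runs only when a price event triggers it.
def maybe_buy(event: PriceEvent) -> None:
    if event.symbol == "ACME" and event.price < 100.0:
        print(f"Placing buy order for {event.symbol} at {event.price}")


on_price_event(maybe_buy)
publish(PriceEvent("ACME", 97.5))   # condition met: handler places an order
publish(PriceEvent("ACME", 120.0))  # condition not met: nothing happens
```

The key point is that nothing runs on a schedule or in a continuous loop; work happens only when an event of interest occurs.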