
Should I use batch ingestion or streaming?

Anton Knight
Answered

One of the most essential ideas in big data is the separation between batch processing and stream processing. There is no universally accepted definition for these two terms, but commonly:

  • In the batch-oriented approach, data is gathered over time and supplied to the analytics system in batches: it is collected in bulk (for example, into a data warehouse) and then sent on for analysis.
  • In the streaming approach, data is fed into the analysis tools piece by piece as it is generated, and processing typically happens in near real time (a minimal sketch of both styles follows below).
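
To make the timing difference explicit, here is a minimal, library-free Python sketch; the transform function is only a placeholder for whatever processing you actually run:

```python
def transform(record):
    return record  # placeholder for the real processing step

# Batch: records are accumulated first, then processed together in one run.
def process_batch(records):
    return [transform(r) for r in records]

# Streaming: each record is processed the moment it arrives.
def process_stream(record_source):
    for record in record_source:
        yield transform(record)
```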

With those bare-bones definitions in place, let’s look at when you should use each approach.

Cases and Applications of Batch Ingestion

A batch ingestion strategy is employed when the volume of data to be processed is too large to handle continuously, or when the data sources are legacy systems that cannot deliver data as streams.

Information produced by mainframes is often handled in batches. It is usually impractical to transform data stored on a mainframe into streaming data due to the time required to access and integrate the data into current analytics platforms.

Although data streams can also carry “huge” data, batch ingestion is not strictly required for working with vast amounts of it. It is most useful when real-time analytics results are not needed and processing massive volumes of information matters more than getting rapid answers.
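
As a rough illustration, a batch ingestion job often boils down to a scheduled script that moves an accumulated export into a warehouse table in one pass. The sketch below is only indicative: the connection string, file name, and table name are hypothetical, and it uses pandas with SQLAlchemy rather than any specific ingestion tool.

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical warehouse connection and nightly export file.
engine = create_engine("postgresql://user:password@warehouse-host/analytics")

# Move the accumulated export into a warehouse table in one scheduled run,
# reading it in chunks to keep memory usage bounded.
for chunk in pd.read_csv("orders_export.csv", chunksize=50_000):
    chunk.to_sql("orders", engine, if_exists="append", index=False)
```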

Use cases:

  • Customer Payroll and Orders
  • Billing Orders

Cases and Applications of Stream Processing

If you need analytics findings in real time, stream processing is essential. Data streams allow you to send newly created data into analytics tools in near real time.

There are several applications for stream processing, notably in fraud detection. Anomalies in transaction data indicative of fraud may be spotted in real time using stream processing, allowing you to halt potentially fraudulent transactions in their tracks.
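
As a concrete (and deliberately simplified) sketch, the consumer below reads from a hypothetical Kafka topic named transactions and flags unusually large amounts; in practice, a trained fraud model would replace the threshold rule. It assumes the kafka-python client.

```python
import json
from kafka import KafkaConsumer  # kafka-python client

# Hypothetical broker address and topic name.
consumer = KafkaConsumer(
    "transactions",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

AMOUNT_THRESHOLD = 10_000  # toy rule standing in for a real fraud model

# Each transaction is scored as soon as it arrives, so a suspicious one
# can be flagged within moments of happening.
for message in consumer:
    txn = message.value
    if txn.get("amount", 0) > AMOUNT_THRESHOLD:
        print(f"Possible fraud: transaction {txn.get('id')} for {txn['amount']}")
```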

Stream processing examples:

  • Fraud detection
  • Sentiment analysis of public opinion on social media
  • Log monitoring
  • Analysis of buying patterns