What is the difference between stream processing and batch processing?
Stream processing involves the real-time analysis of new data in motion, while batch processing involves a periodic analysis of static information.
In batch processing, data that was produced in the past and is held in a data store (such as a SQL database) is scanned, manipulated and reported on when an analyst initiates the action or a scheduled job runs. The stored information is typically known as data at rest, as it’s not actively changing or moving during analysis.
In contrast, stream processing involves real-time analytics of data in motion. Instead of being stored first and analyzed later, streams of data are processed on the fly as they are generated. Because results are produced as the data arrives, stream processing gives a faster, up-to-the-moment picture of a device or a service.
For example, imagine a database that is regularly updated with log files produced by a matrix of sensors that monitor the production line and other systems on a factory floor. In a batch processing system, the database would be queried on a schedule, producing regular reports about the aggregate condition of the sensors and any anomalies that might indicate a problem. For operations analysts and administrators, this information is only as current as the timestamp on the report that’s being run, which is only as current as the database’s most recent update. If the report is run hourly, the data could be up to an hour old by the time it’s received, or older if the database itself lags behind the sensors.
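As a rough illustration of the batch side of this example, the sketch below summarizes one past hour of stored sensor readings. The row layout, sensor names, readings and the 90-degree threshold are all hypothetical, and a plain Python list stands in for the database.

```python
from statistics import mean

# Hypothetical rows as they might sit in the database:
# (timestamp_s, sensor_id, temperature_c)
ROWS = [
    (0, "press-1", 71.0),
    (600, "press-1", 72.5),
    (1200, "press-1", 98.0),
    (1800, "press-2", 65.0),
    (4000, "press-1", 99.0),  # falls outside the first hourly window
]

TEMP_LIMIT_C = 90.0  # assumed anomaly threshold


def hourly_report(rows, window_start, window_len=3600):
    """Summarize every reading stored for one past window."""
    window = [r for r in rows if window_start <= r[0] < window_start + window_len]
    by_sensor = {}
    for _, sensor, temp in window:
        by_sensor.setdefault(sensor, []).append(temp)
    return {
        "window_start": window_start,
        "mean_temp": {s: mean(t) for s, t in by_sensor.items()},
        "anomalies": [r for r in window if r[2] > TEMP_LIMIT_C],
    }


report = hourly_report(ROWS, window_start=0)
print(report)
```

In this sketch the overheated reading at t=1200 only surfaces when the report for the whole window is run. If that happens at the end of the hour, the anomaly is already 40 minutes old.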
In a stream processing system, sensor data is captured and processed in flight, giving operations professionals a real-time look at the conditions on the factory floor. Instead of waiting up to an hour to discover that a machine has already failed, they receive this information within moments of the event, enabling them to respond quickly.
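The streaming side of the factory example can be sketched in the same spirit. Here a Python generator stands in for a live sensor feed, and each reading is checked the moment it arrives. The reading layout, sensor names and threshold are assumptions made for illustration; a real deployment would more likely read from a message queue or a dedicated stream processing engine.

```python
from typing import Iterable, Iterator, Tuple

# (timestamp_s, sensor_id, temperature_c), a hypothetical layout
Reading = Tuple[int, str, float]

TEMP_LIMIT_C = 90.0  # assumed anomaly threshold


def sensor_stream() -> Iterator[Reading]:
    """Stand-in for a live feed of sensor readings."""
    yield (0, "press-1", 71.0)
    yield (600, "press-1", 72.5)
    yield (1200, "press-1", 98.0)
    yield (1800, "press-2", 65.0)


def detect_anomalies(stream: Iterable[Reading]) -> Iterator[Reading]:
    """Check each reading as it arrives and emit an alert right away."""
    for reading in stream:
        if reading[2] > TEMP_LIMIT_C:
            yield reading


alerts = []
for alert in detect_anomalies(sensor_stream()):
    alerts.append(alert)
    print(f"ALERT t={alert[0]}s sensor={alert[1]} temp={alert[2]}")
```

Because the check runs once per reading rather than once per report, the alert is raised as soon as the hot reading is seen, without waiting for a window of data to accumulate.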
Also, because stream processing does not depend on the data first being stored in a database, it can be implemented with lower storage requirements. Only the most relevant information in the data stream needs to be picked out and kept in readily accessible permanent storage, while data that isn’t immediately relevant can be archived for future searches, security events or audits.
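One way to picture this selective storage is a routing step applied to each reading as it arrives. The sketch below is only an assumption about how such a step might look: two in-memory lists stand in for the primary store and the archive, and a temperature threshold stands in for whatever rule decides what is relevant.

```python
TEMP_LIMIT_C = 90.0  # assumed relevance threshold

primary_store = []  # stand-in for readily accessible permanent storage
archive = []        # stand-in for cheaper long-term storage


def route(reading):
    """Send one reading to the primary store or the archive as it arrives."""
    _, _, temp = reading
    target = primary_store if temp > TEMP_LIMIT_C else archive
    target.append(reading)


incoming = [
    (0, "press-1", 71.0),
    (600, "press-1", 98.0),
    (1200, "press-2", 65.0),
]

for reading in incoming:
    route(reading)

print(len(primary_store), len(archive))
```

The decision is made per reading, in flight, so nothing has to be loaded back out of a database before it can be classified.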