As a product manager, you sit at the intersection of UX, business, and technology.
Your existing technology stack may limit your ability to implement that vision, especially if you are a data-driven product manager confronting the reality of exponential, almost unimaginable data growth. Customers expect a modern experience, one that feels effortless, intuitive, and personalized. No customer has ever asked for a slower, static, generic experience, and your product vision includes a real-time requirement operating on massive amounts of data to deliver a meaningful new experience to your users.
You run an internal feasibility study and discover limitations in your current batch-first data warehouse stack. You hear the words: "We have too much data to process. You want real-time, but our batch jobs run overnight and take 8 to 10 hours." Your vision becomes a stale, wishful dream, one you knew would work if only you had the right tools.
Without the right tools, ideas are just that: ideas
The key to augmenting your existing batch-first environment, unlocking the real-time potential of your data, and realizing your dream product consists of three equally important components, all operating in real-time:
- Real-time messaging. Data needs to move from sources to processing and applications reliably, with consistent performance and without risk of data loss (i.e., strong durability guarantees).
- Real-time compute. Simply moving data is insufficient; these data must be transformed and joined with other data (both real-time and historical), which requires real-time computation, often in multiple stages.
- Real-time storage. In-memory systems are often prohibitively expensive and fragile: relying on replication alone risks data loss during power failures or network connectivity issues, and can produce inconsistent experiences for end users. To be reliable for core use cases, data must be written to disk and acknowledged only after it has been replicated to multiple disks. "Okay" solutions are inadequate; you require enterprise grade.
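To make the three components concrete, here is a minimal, purely illustrative sketch of how they fit together in a single process. In Streamlio these roles are played by real systems (Apache Pulsar for messaging, Heron for compute, Apache BookKeeper for stream storage); the class and function names below are hypothetical stand-ins, not Streamlio APIs.

```python
from collections import deque

class DurableLog:
    """Stream storage: an event is acknowledged only after it has been
    written to a quorum of replicas (plain lists here; disks in reality)."""
    def __init__(self, replicas=3, ack_quorum=2):
        self.replicas = [[] for _ in range(replicas)]
        self.ack_quorum = ack_quorum

    def append(self, event):
        acks = 0
        for replica in self.replicas:
            replica.append(event)  # a real system would fsync to disk here
            acks += 1
        return acks >= self.ack_quorum  # ack only after the quorum persists

class MessageBus:
    """Messaging: reliable, ordered hand-off from producers to consumers."""
    def __init__(self):
        self.queue = deque()

    def publish(self, event):
        self.queue.append(event)

    def poll(self):
        return self.queue.popleft() if self.queue else None

def enrich(event, reference_data):
    """Compute: join a real-time event with historical/reference data."""
    enriched = dict(event)
    enriched.update(reference_data.get(event["user_id"], {}))
    return enriched

bus, log = MessageBus(), DurableLog()
profiles = {"u1": {"segment": "frequent-shopper"}}  # historical data

bus.publish({"user_id": "u1", "item": "pasta sauce"})
event = bus.poll()                # 1. messaging moves the data
result = enrich(event, profiles)  # 2. compute transforms and joins it
stored = log.append(result)       # 3. storage persists before acking
```

The key design point mirrored here is the storage contract: the append is acknowledged only after the write lands on multiple replicas, which is what separates enterprise-grade stream storage from replication-only in-memory systems.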
This transition from a batch-first paradigm to a real-time-first (or streaming-first) paradigm can be accomplished seamlessly with Streamlio, an enterprise-grade, unified, real-time solution integrating messaging, compute, and stream storage. Streamlio augments existing batch-first infrastructure – typically monolithic, legacy infrastructure – and makes turnkey real-time infrastructure possible. We seamlessly ingest and process incoming “hot” data (real-time data on the order of milliseconds) and “warm” data (stored data on the order of seconds, days, months, or even years). Streamlio connectors tap into data warehouse sources, transport data via real-time messaging, and allow real-time infrastructure to augment existing data warehouses and data lakes.
Real-time personalization is one example of a real-time use case. Imagine the data-driven physical market of the future, where a shopper places tomatoes, pasta sauce, and garlic bread into his smart-basket and companies are then able to bid, in real-time, on advertisements offering pricing discounts to the shopper’s smartphone. A shopper can choose to make his shopping profile accessible to advertisers, and this historical purchasing profile, combined with demographic information, food allergies, time of day, and real-time basket of physical goods, can drive real-time mobile advertising bids for, say, discounted Italian pasta campaigns. The value of showing him pasta sauce advertisements could have been near zero based on his previous day’s purchase history, but with the real-time basket of tomatoes, pasta sauce, and garlic bread, there is an increased propensity to purchase pasta, hence the increased value of a real-time ad bid.
In the future, local pasta companies can compete on a level playing field with the mega brands
This type of real-time personalization requires messaging, compute, and stream storage as follows:
- The smart physical basket and RFID-tagged goods together generate a real-time stream of item IDs that is transmitted to an edge computer.
- The real-time basket of item IDs and the shopper’s ID, transmitted (via the messaging system) to a central compute data center where...
- Demographic data, food allergies, and historical purchasing patterns are queried based on user ID and combined with the real-time data (via the compute system). And finally...
- Advertisers bid in real-time on the user + feature set of items, with winning bids stored for reliability and payment (via stream storage).
The winning advertisement is then routed directly to the edge node (via messaging) and transmitted to the shopper, with a display notification of the discounted pasta brand influencing his purchasing decision, all as he walks through the store and before he makes his next purchase.
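The bidding step above can be sketched in a few lines. This is a hypothetical illustration of how a real-time basket raises an advertiser's bid value; the scoring function, weights, and campaign data are assumptions for the example, not anything Streamlio prescribes.

```python
def propensity(basket, campaign_items):
    """Share of a campaign's related items already in the real-time basket."""
    return len(set(basket) & set(campaign_items)) / len(campaign_items)

def run_auction(basket, profile, bids):
    """Each bid scales its base price by the shopper's purchase propensity;
    the highest effective bid wins the impression."""
    scored = []
    for advertiser, base_price, campaign_items in bids:
        score = base_price * propensity(basket, campaign_items)
        if profile.get("allergies", set()) & set(campaign_items):
            score = 0.0  # never advertise items the shopper is allergic to
        scored.append((score, advertiser))
    return max(scored)

# The real-time basket from the scenario above:
basket = ["tomatoes", "pasta sauce", "garlic bread"]
profile = {"allergies": set()}  # shopper opted in, no allergy restrictions
bids = [
    ("LocalPasta", 0.50, ["pasta", "pasta sauce", "tomatoes"]),
    ("MegaSnacks", 0.80, ["chips", "soda"]),
]
winning_score, winner = run_auction(basket, profile, bids)
print(winner)  # LocalPasta wins once the real-time basket is known
```

Note the inversion this enables: without the real-time basket, the higher base bid (MegaSnacks) would win by default; with it, the local pasta campaign's relevance outweighs the bigger budget, which is exactly the level playing field described above.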
Another landscape of real-time needs is developing in the Internet of Things and the Industrial Internet of Things (IoT/IIoT). The pattern is again messaging, compute, and stream storage. Sensor data from millions or hundreds of millions of sources flow into edge nodes. These edge nodes require a messaging system to buffer and transport data to a compute system, often within the same node, to aggregate and perform calculations (such as outlier detection) before discarding raw data and transmitting transformed data to a central datacenter. The volume of raw data in IoT/IIoT use cases would overwhelm current networks, and it is only expected to grow.
One such use case is anomaly detection to prevent a coordinated network attack or bot attack. To discover an attack, the aggregate sensor traffic must be calculated in real-time at the edge node and compared to historical ranges of normal traffic. If raw traffic were sent to the central datacenter, a bot attack could overwhelm the network and cause cascading outages. Preventing such a coordinated attack requires all data to be aggregated and compared against historical norms; if a surge is detected at the edge node, security measures can be deployed to prevent cascading failures.
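The edge-node comparison against historical norms can be sketched with a simple statistical check. This is an illustrative example only, assuming the "historical range of normal traffic" is a window of recent per-interval request counts; the threshold is a made-up value, not a recommended setting.

```python
import statistics

def is_surge(history, current, z_threshold=3.0):
    """Flag traffic as anomalous when the current aggregate count sits more
    than z_threshold standard deviations above the historical mean."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return (current - mean) / stdev > z_threshold

# Historical per-interval request counts aggregated at the edge node:
normal_traffic = [100, 98, 103, 101, 99, 102, 100, 97]

print(is_surge(normal_traffic, 104))     # within normal range -> False
print(is_surge(normal_traffic, 10_000))  # bot-like surge      -> True
```

Because only the aggregate count and a verdict cross the network, the edge node can trigger security measures locally while the raw flood of bot traffic never reaches (or overwhelms) the central datacenter.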
Solving bot attacks, like the Mirai botnet attack, as soon as they happen requires a new kind of solution
Product managers leveraging real-time data and developing use cases around Smart Cities can combine geolocation sources generated from mobile phones, traffic data from autonomous vehicles and smart surfaces, and weather forecasts to create new products. Vast data are no longer simply stored to be queried for business intelligence and policy decisions, but instead used in real-time to drive automated functionality with no human in the loop. Automated actions can include optimizing traffic lights to reduce fuel consumption, or linking emergency services directly to sensor data generated by vehicles in an accident so responders can deploy instantly and save lives.
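The no-human-in-the-loop actions above reduce to rules evaluated against live sensor streams. The sketch below is purely hypothetical: the function names, thresholds, and event fields are invented for illustration and are not part of any real Smart City system.

```python
def green_seconds(vehicle_count, base=30, per_vehicle=1.5, max_green=90):
    """Scale a traffic light's green duration with the real-time vehicle
    count on an approach, capped to keep cross-traffic moving."""
    return min(base + per_vehicle * vehicle_count, max_green)

def dispatch_if_crash(sensor_event):
    """Route a vehicle crash event straight to emergency services,
    with no human in the loop."""
    if sensor_event.get("airbag_deployed"):
        return {"action": "dispatch_ems",
                "location": sensor_event["location"]}
    return {"action": "none"}

print(green_seconds(10))  # 45.0 seconds of green for 10 queued vehicles
print(dispatch_if_crash({"airbag_deployed": True,
                         "location": (40.7, -74.0)}))
```

The point is not the rules themselves but the latency budget: both decisions are only useful if the sensor event reaches compute in milliseconds, which is what the messaging and compute layers described above provide.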
Streamlio is designed to seamlessly integrate messaging, compute, and stream storage, the three requirements of real-time. We deliver an end-to-end real-time solution that augments existing data stores and data warehouses, unlocking real-time products that product managers have envisioned but could never bring to market because of the difficulty of moving an existing batch-first technology stack to a streaming-first paradigm. Data will continue to increase in volume and velocity, and a batch-first paradigm built on legacy storage cannot deliver the user experience modern customers expect. We are the ex-Twitter and ex-Yahoo co-creators of the underlying technologies: Apache Pulsar (durable messaging), Heron (compute), and Apache BookKeeper (stream storage), with proven reliability at scale.