This is part 2 of a series of blog posts that highlights key features of Apache Pulsar (incubating). Apache Pulsar is a next-generation pub-sub messaging system developed at Yahoo. In part 1 of the series, we discussed how Pulsar supports a flexible messaging model, multi-tenancy, geo-replication, and durability. In this post, we'll continue the discussion by showing how Pulsar achieves I/O isolation and scalability and by highlighting its security model, multi-language API, and easy operability.
In most messaging systems, consumer lag can produce general performance degradation. If a consumer on a topic starts lagging, this can affect other consumers that might be going faster and staying at the top of queue. This happens because the slow consumer forces the messaging system to retrieve the data from the storage media, which leads to I/O thrashing and results in very low throughput. This slows down the consumers whose data needs to be brought into memory before it can be served. This happens because reads and writes share a single path of execution.
Pulsar resolves this issue by using Apache BookKeeper as its message storage system. By using BookKeeper, Pulsar is able to provide I/O isolation between reads and writes by using different paths of execution for reads and writes. Regular reads are served directly by Pulsar brokers, while durable writes are made to BookKeeper Write-ahead logs (WALs) and catch-up reads are made from BookKeeper stable storage.
It's important to provide predictable latency for publishing applications under all circumstances. With I/O isolation, you can achieve lower and more predictable publish latency even when disks are saturated due to heavy read activity.
In a messaging system, it's important to be able to scale as more and more data pours into the system. Scalability for a pub-sub system can be across different dimensions:
- High throughput - Pulsar achieves high throughput by driving disks close to their maximum I/O bandwidth. Throughput can vary based on the message size. For example, with a 1 KB message, Pulsar saturates the disk at 120 MB/sec. But for very small messages, such as 10 bytes, we have measured throughput of 1.8M messages/sec. In both the cases, a single publisher is writing to a single topic with one partition and with publish latency below 5ms at the 99pct.
- Number of topics - Topic scalability is the ability of a pub-sub system to support a large number of topics, ranging from hundreds to millions of topics, while continuing to provide consistent performance. The key to scaling the number of topics lies in how the underlying data is organized in stable storage. If the data for a topic is stored in dedicated files or directories, it will have trouble scaling because I/O will be scattered across the disk, as these files will be flushed from the page cache to disk periodically. But since Pulsar data is stored in bookies (BookKeeper servers), messages from different topics are aggregated, sorted, and stored in large files and then indexes. This allows Pulsar to scale to a large number of topics.
Pulsar supports a pluggable authentication mechanism that clients can use to authenticate themselves. Pulsar can also be configured to support multiple authentication providers. The purpose of the authentication provider is to establish the identity of the client and to assign the client a role token. This token is used to determine which operations the client is authorized to perform. Pulsar supports two authentication providers out of the box: TLS Client Authentication and Athenz (a role-based authentication system created by Yahoo).
Applications can interact with Pulsar using multiple languages. Pulsar has official client libraries for three popular languages: C++, Java, and Python. This enables users to write their applications in their language of choice and run them in production. These client APIs are intuitive and consistent across languages. Furthermore, the official Pulsar clients provide support for both synchronous and asynchronous read and write operations to accommodate different application styles. For both synchronous and asynchronous operations, the semantics are exactly the same: either the API method blocks until the operation completes or it returns a future object that can be used to track completion.
Pulsar is very easy to administer. You can easily expand capacity by adding more broker nodes and storage nodes while the system is in operation. If a storage node becomes unavailable, automatically all the entries it contains are marked as under replicated and a background process is automatically triggered to copy the data from available replicas into new storage node. Pulsar provides multiple ways for managing the cluster, using admin CLI tool, using Java admin library and the other using admin REST API. The latter provides flexibility to write your own external tool or incorporate certain operations into an existing tool.
Pulsar was designed at Yahoo! for addressing several gaps in existing open source systems. Several use cases require the ability to handle hundreds of billions of message with high throughput, strong durability guarantees, and low latency. Pulsar has currently been in production at Yahoo! for three years serving 1.4 million topics, providing a consistent publish latency of less than 5ms, deployed globally in 10+ datacenters with full mesh replication, and has processed more than 100 trillion messages to date.
Pulsar is a next-generation pub-sub messaging system designed from the ground up to address numerous gaps in open source messaging systems. In this two-part series, we highlighted a variety of Pulsar features and explained in detail how Pulsar achieves clean I/O separation using brokers (serving nodes) and bookies (storage nodes) and how it supports enterprise-grade features like durability, multi-tenancy, geo-replication, high message throughput, and topic scalability.
This post features contributions from Matteo Merli and Karthik Ramasamy.