When a messaging system is distributed across many machines, racks, or even datacenters, things can and will fail. When clients read messages from a topic in such a system, both the client reading the messages and the broker serving the topic can fail. After such a failure, it is useful to be able to resume reading from where you left off once everything recovers, so that you neither miss messages nor reprocess messages that have already been processed. This "restart" point is often called an offset. In Apache Pulsar, the recently open-sourced distributed pub-sub messaging system originally created by Yahoo, we call it a cursor. (For a visual explanation of cursors in Pulsar, check out the instructional video below.)
A Pulsar cursor is more than just an offset in a message stream from which you can restart reading after failure. A differentiating feature of Pulsar is selective acking. In an offset-based approach, when you ack a message, you are indicating that you are done with that message as well as all preceding messages in the stream. With selective acking, you can acknowledge individual messages and leave messages earlier in the stream unacknowledged. Pulsar cursors support both approaches.
Figure 1. A basic illustration of cursors in Pulsar
In Pulsar, each subscription to a topic has a cursor. This cursor is updated any time the subscriber acks a message on the topic. Updating the cursor ensures that the subscriber will not receive that message again, even if the subscriber crashes, recovers, and reattaches to the subscription.
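To make the two acking styles concrete, here is a minimal sketch of a cursor that supports both. It is a toy in-memory model, not Pulsar's implementation or client API: the class `ToyCursor`, its field names (`mark_delete`, `individual`), and the use of plain integers as message positions are all assumptions made for illustration.

```python
class ToyCursor:
    """Toy model of one subscription's cursor (hypothetical, not Pulsar code)."""

    def __init__(self):
        self.mark_delete = -1    # every position <= this counts as acknowledged
        self.individual = set()  # selectively acked positions above mark_delete

    def ack_cumulative(self, position):
        # Offset-style ack: this message and all preceding ones are done.
        self.mark_delete = max(self.mark_delete, position)
        self.individual = {p for p in self.individual if p > self.mark_delete}
        self._advance()

    def ack_individual(self, position):
        # Selective ack: only this one message is done.
        if position > self.mark_delete:
            self.individual.add(position)
            self._advance()

    def _advance(self):
        # Fold a contiguous run of selective acks into the cumulative mark.
        while self.mark_delete + 1 in self.individual:
            self.mark_delete += 1
            self.individual.remove(self.mark_delete)

    def is_acked(self, position):
        return position <= self.mark_delete or position in self.individual

    def pending(self, last_position):
        # Positions 0..last_position that would still be delivered on reattach.
        return [p for p in range(last_position + 1) if not self.is_acked(p)]


cursor = ToyCursor()
cursor.ack_cumulative(2)   # acknowledges positions 0, 1 and 2
cursor.ack_individual(5)   # acknowledges only position 5
print(cursor.pending(6))   # [3, 4, 6]
```

In this model, a subscriber that reattaches after a crash would be handed only the positions returned by `pending`, which is the behavior the cursor exists to provide.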
With cursors you have a method to start reading from where you left off, but cursors introduce their own problem: where do you store them?
You could store them on the client, but that requires that you make your client stateful, and if that client crashes unrecoverably, that subscriber is lost forever. You could store them in ZooKeeper, which you already have access to if you're running Pulsar. But if you have thousands of topics with thousands of subscribers, you'll quickly overwhelm the write capacity of ZooKeeper and your instance will fall over, taking Pulsar down with it. You could store them on something like HBase, MongoDB, or another strongly consistent data store, but this introduces another system to maintain and provision for, as well as another point of failure for your messaging system.
Fortunately, Pulsar takes care of cursor management for you [1], and it does it in a scalable manner using a system already at your disposal: Apache BookKeeper.
How Pulsar uses BookKeeper to store cursors is quite simple. The broker maintains a BookKeeper ledger for each subscription, called the cursor ledger. A ledger can be thought of as an append-only replicated log.
When a subscriber has read a message and processed it, it sends an acknowledgement message to the broker. The broker then writes an update into the cursor ledger for that subscription. If the broker then fails, it recovers the cursor for the subscription by reading the last entry of the cursor ledger from BookKeeper.
Figure 2. Multiple subscribers on a topic
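The acknowledgement path described above can be sketched with a small simulation. Everything here is a stand-in: `ToyLedger` is an in-memory list rather than a replicated BookKeeper ledger, `ToyBroker` and `handle_ack` are hypothetical names, and the cursor is simplified to a single integer position.

```python
class ToyLedger:
    """In-memory append-only log standing in for a BookKeeper ledger."""

    def __init__(self):
        self._entries = []

    def append(self, entry):
        self._entries.append(entry)

    def read_last(self):
        return self._entries[-1] if self._entries else None

    def __len__(self):
        return len(self._entries)


class ToyBroker:
    """Hypothetical broker that keeps one cursor ledger per subscription."""

    def __init__(self, cursor_ledgers):
        # The ledgers are passed in because they outlive any one broker
        # process, just as data stored in BookKeeper survives a broker crash.
        self.cursor_ledgers = cursor_ledgers
        # Recovery: only the last entry of each cursor ledger matters.
        self.cursors = {
            sub: ledger.read_last() for sub, ledger in cursor_ledgers.items()
        }

    def handle_ack(self, subscription, position):
        # Persist the update first, then update the in-memory cursor.
        ledger = self.cursor_ledgers.setdefault(subscription, ToyLedger())
        ledger.append(position)
        self.cursors[subscription] = position


ledgers = {}                      # plays the role of durable storage
broker = ToyBroker(ledgers)
broker.handle_ack("sub-a", 3)
broker.handle_ack("sub-a", 7)
broker.handle_ack("sub-b", 1)

recovered = ToyBroker(ledgers)    # simulate a broker restart
print(recovered.cursors)          # {'sub-a': 7, 'sub-b': 1}
```

Each subscription recovers independently, and the restart reads one entry per ledger regardless of how many acks were written before the failure.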
BookKeeper ledgers are append-only, and a ledger cannot be partially deleted. To avoid having the cursor ledger grow forever, Pulsar periodically creates a new cursor ledger, writes the last entry of the current ledger as the first entry of the new ledger, and swaps it in. As only the last entry of the cursor ledger is of any interest, the rest of the data can be discarded without issue.
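The rollover step can be illustrated with a toy model as well. The trigger used here, a maximum entry count, is an assumption chosen to keep the example short; the text above says only that the rollover happens periodically. The class `RollingCursorLog` and its fields are hypothetical, and a Python list stands in for each ledger.

```python
class RollingCursorLog:
    """Toy cursor ledger with rollover (hypothetical, in-memory only)."""

    def __init__(self, max_entries=3):
        self.max_entries = max_entries  # assumed rollover trigger
        self.current = []               # stands in for the current cursor ledger
        self.discarded_ledgers = 0      # old ledgers dropped as a whole

    def append(self, cursor_state):
        self.current.append(cursor_state)
        if len(self.current) > self.max_entries:
            self._roll_over()

    def _roll_over(self):
        # A ledger cannot be trimmed, so start a new one seeded with the
        # last entry of the old ledger, swap it in, and drop the old one.
        new_ledger = [self.current[-1]]
        self.current = new_ledger
        self.discarded_ledgers += 1

    def read_last(self):
        return self.current[-1] if self.current else None


log = RollingCursorLog(max_entries=3)
for position in range(1, 8):
    log.append(position)

print(log.read_last())          # 7
print(log.current)              # [7]
print(log.discarded_ledgers)    # 2
```

Because recovery reads only the last entry, discarding the older ledgers never changes what `read_last` returns.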
I've also put together a YouTube video to walk you through how cursors function in the Apache Pulsar messaging system.
1. Pulsar also allows you to manage your cursor yourself if you're into that sort of thing, but we won't cover that in this post.