Queue Story 1 — Introduction to Beanstalkg

Vimukthi Wickramasinghe
Vimukthi in Cyberspace
Dec 4, 2016

Queues are to the cloud what veins and arteries are to the human body. They carry data back and forth between the systems in the cloud that need it. There are a number of open source (Apache Kafka, RabbitMQ, etc.) and proprietary (Amazon SQS, Google Cloud Pub/Sub, etc.) queue systems available right now that can readily be used to implement a backend processing system for your cloud application. At my present company we use Beanstalkd. It’s an open source queue system with an added benefit: it supports job scheduling. For example, you can have a tube (Beanstalkd terminology for a single queue) which serves jobs based on priority (PRI) as well as a time delay (DELAY). It also guarantees that a job handed to one worker is not handed to another worker at the same time, and that the job is either finished within a predefined interval (TTR) or taken back into the queue to be rescheduled. Therefore Beanstalkd is not just a message queue; it’s a job queue.

A job queue vs a message queue

There is an important distinction between being a message queue and being a job queue. A job queue is more useful because it adds an extra layer of guarantees on top of what a basic queue offers. Those guarantees are:

  1. FIFO: A job queue has to support strict first in, first out (FIFO) ordering of the messages it delivers. This is important because, unlike a stream of data that needs to get processed, a stream of jobs is generally transactional in nature. For example, when a payment processing system needs to process a payment, it needs a guarantee that the actual payment message will arrive in its queue before the refund message for the same payment.
  2. Priority: Unlike ordinary messages that get processed in a backend system, jobs can have a priority. Take as an example a billing system that both records usage of an asset and creates reports. A “record current usage” job should take precedence over a “create reports” job, since recording usage cannot be postponed. This is why a job queue is most likely a priority queue that delivers higher-priority messages first.
  3. Scheduling: With an ordinary message it’s rare that you need to specify a time or a delay after which it becomes visible to receivers once it has been put on the queue. But a job has certain requirements on when it should be executed. For example, if a job imports files from a report location periodically, the job queue needs to surface the same job periodically for the import worker which does the work. Therefore a job queue needs to support scheduling a job at a given time. Beanstalkd supports this by allowing senders to specify a delay parameter for a job.
  4. Time to run: This is also a distinctive feature of a job queue. It can guarantee that a job gets executed to completion by requiring the receiver of the job to confirm completion before a certain period, called the time to run, has elapsed. If the confirmation has not been received within the time to run, the queue can safely assume that the job execution failed for some reason and reschedule the job to be executed again.

Beanstalkd supports all these guarantees, and it’s probably the best at doing this in the market right now (I’m very interested to hear about it if you know of a better alternative).

Beanstalkg

Recently we have been finding that some issues with the current implementation of Beanstalkd make life difficult at our scale. That prompted me to rewrite Beanstalkd, with the idea of matching our scaling requirements without compromising on the job queue guarantees mentioned above. It’s called Beanstalkg (‘g’ at the end because it’s written in the Go language :). This blog series is about my thoughts on Beanstalkg’s design and architecture. Stay tuned for more.
