[update: the script and templates discussed below are available on github: http://github.com/earzur/cacti-beanstalkd-templates]
At Silentale, we use work queues to orchestrate the different services in our platform.
Passing messages between processes is a pretty common pattern for distributing work across many computers, and our platform makes extensive use of Beanstalkd, a fast, lightweight and robust server that does just that: it lets us implement work queues.
Monitoring the number of jobs available (ready, in the beanstalkd idiom), the number of workers consuming jobs, and the number of jobs that failed (buried) gives operators a pretty good idea of how the platform is performing, so they can add or remove capacity when needed. This is particularly useful in our AWS-based environment, where you can add and remove capacity in no time.
For instance, when sending a new batch of invitations to beta users to test our product, we get significant spikes in our work queues, and may need to temporarily add new servers, then shut them down once we have finished processing the queues.
The protocol used by beanstalkd is pretty simple and provides ways to easily collect usage statistics about the server itself and about every individual queue that you might have defined.
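To give an idea of how simple the protocol is, here is a minimal sketch that asks a server for the statistics of one queue over a plain TCP socket, without any gem. It relies on the documented `stats-tube` command, whose reply is an `OK <bytes>` line followed by a YAML dictionary. The function names `parse_stats_reply` and `fetch_tube_stats` are made up for this illustration, and the script presented later in this post uses the beanstalk-client gem instead.

```ruby
require 'socket'
require 'yaml'

# Parse a raw "stats-tube" reply ("OK <bytes>\r\n<yaml>\r\n") into a Hash.
# Raises ArgumentError when the server answered anything but OK
# (for instance NOT_FOUND when the tube does not exist).
def parse_stats_reply(raw)
  status, body = raw.split("\r\n", 2)
  match = /\AOK (\d+)\z/.match(status.to_s)
  raise ArgumentError, "unexpected reply: #{status.inspect}" unless match
  YAML.safe_load(body.to_s.byteslice(0, match[1].to_i))
end

# Send "stats-tube <tube>" to a beanstalkd server and return the parsed reply.
def fetch_tube_stats(host, port, tube)
  TCPSocket.open(host, port) do |sock|
    sock.write("stats-tube #{tube}\r\n")
    status = sock.gets("\r\n").to_s
    size = status[/\AOK (\d+)/, 1]
    # Only read a body when the status line announced one.
    body = size ? sock.read(size.to_i + 2).to_s : ''
    parse_stats_reply(status + body)
  end
end
```

Against a running server, `fetch_tube_stats('localhost', 11300, 'default')` should return a hash keyed by statistic name, such as `current-jobs-ready`.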
Another tool we use a lot at Silentale is Cacti. By default, Cacti lets you graph data collected from SNMP-enabled servers pretty easily, so you can graph metrics about your servers such as network bandwidth, load average, and memory usage. But you are not limited to SNMP: you can also graph data collected through scripts. This is a bit more involved, as the MySQL Cacti templates demonstrate.
After spending a while looking for an equivalent project that would support beanstalkd and work in our environment, I ended up writing my own cacti templates for monitoring our beanstalk queues …
available statistics
The beanstalk protocol provides a lot of different figures about what’s going on in the server, both globally and per individual queue. For my little project, I chose to concentrate on the individual queues, as monitoring the number of new jobs in each queue is critical to our operations.
The beanstalk-client gem’s ‘stats_tube’ method returns a hash containing a lot of different values, of which these are the ones we want to graph:
- current-jobs-buried
- current-jobs-delayed
- current-jobs-ready
- current-jobs-reserved
- current-jobs-urgent
- current-using
- current-waiting
- current-watching
- total-jobs
    #!/usr/bin/env ruby
    require 'rubygems'
    require 'beanstalk-client'
    require 'trollop'

    script_name = File.basename(__FILE__)

    # Command-line options: server and queue are mandatory, port is optional.
    opts = Trollop::options do
      version "#{script_name} © 2009 Silentale S.A."
      banner <<-EOS
    display statistics about a queue on a beanstalkd server
      EOS
      opt :server, 'beanstalk server address', :type => :string
      opt :port, 'beanstalk server port (default: 11300)', :type => :integer
      opt :queue, 'name of the beanstalk queue', :type => :string
    end
    Trollop::die :server, 'is mandatory' unless opts[:server]
    Trollop::die :queue, 'is mandatory' unless opts[:queue]
    opts[:port] = 11300 unless opts[:port]

    # Fetch the statistics of the queue and drop its name, which is not a metric.
    connection = Beanstalk::Connection.new "#{opts[:server]}:#{opts[:port]}"
    stats = connection.stats_tube opts[:queue]
    stats.delete 'name'

    # Print every statistic as "key:value", sorted by key, on a single line.
    result = ''
    stats.keys.sort.each do |k|
      result << "#{k}:#{stats[k]} "
    end
    puts result
We just connect to the server, call the stats_tube method, remove the queue name from the resulting hash, and dump what is left after sorting the keys by name.
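To make the output format concrete without a running server, here is a small sketch of the formatting step alone, applied to a hand-written hash. The helper name `format_for_cacti` and the sample values are invented for the illustration. Unlike the script above, this version does not leave a trailing space at the end of the line.

```ruby
# Turn a tube-statistics hash into a single line of "key:value" pairs,
# sorted by key and separated by spaces. The 'name' entry is dropped
# because it is a label, not a metric.
def format_for_cacti(stats)
  stats.reject { |key, _| key == 'name' }
       .sort
       .map { |key, value| "#{key}:#{value}" }
       .join(' ')
end

# Invented sample values, only there to show the shape of the output.
sample = {
  'name' => 'crawler',
  'current-jobs-ready' => 12,
  'current-jobs-buried' => 0,
  'total-jobs' => 3400
}
puts format_for_cacti(sample)
# prints: current-jobs-buried:0 current-jobs-ready:12 total-jobs:3400
```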
cacti configuration
The output of the script is pretty straightforward: a single line of name:value pairs, separated by spaces. Cacti can directly parse the output using what is called a data input method. Data input methods link external scripts with data fields that can be used in data templates, then those data templates can be used in graph templates.
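Since every output field has to be declared by hand in Cacti, it can help to check that the script really emits all the fields you intend to declare. The sketch below is one possible way to do that: it parses a line of name:value pairs and lists the expected fields that are absent. The names `EXPECTED_FIELDS`, `parse_script_output` and `missing_fields` are hypothetical, and the parsing only mimics the format described above, not Cacti’s own code.

```ruby
# The statistics we want to graph, as listed earlier in this post.
EXPECTED_FIELDS = %w[
  current-jobs-buried current-jobs-delayed current-jobs-ready
  current-jobs-reserved current-jobs-urgent current-using
  current-waiting current-watching total-jobs
].freeze

# Parse "name:value name:value ..." into a Hash of name => Integer.
# Integer() raises if a value is missing or not numeric.
def parse_script_output(line)
  line.split.to_h do |pair|
    name, value = pair.split(':', 2)
    [name, Integer(value)]
  end
end

# Return the expected field names that the line does not contain.
def missing_fields(line)
  EXPECTED_FIELDS - parse_script_output(line).keys
end
```

An empty array from `missing_fields` means that every field you plan to declare in the data input method is present in the script output.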
One unfortunate problem with cacti is that everything has to be done using the GUI, involving a lot of clicks and typing. I didn’t find any way to circumvent this problem, using scripts or any automated tool.
If you managed to read this far and know of any such tool (or have a pointer to cacti’s data model) so we can script such tasks, I will be more than happy to receive comments.
data input method
Using cacti’s console, defining a new data input method is straightforward. You just have to create the output fields one by one and map them to the output of the script.
As you can see in this screenshot, you need to specify the full path to the data-collection script, specify the input parameters between angle brackets (< and >) (here we define <server>, <port> and <queue>), and then specify the resulting data fields one by one.
Now we need to define a data template that cacti will use to update the RRD archives and generate the graphs.
data template
The data template maps the output fields of the data input method to “data source items”, which are the fields in the RRD archive that will store the collected data.
Those fields have to be carefully named. I have used a convention which I found in the MySQL cacti templates: a few capital letters, the same for every field in the template, followed by the name of the parameter. This is extremely useful, because cacti orders the field names alphabetically in the field selection dropdown list, and there can be many of them! So remember to carefully name your input fields in this screen!
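As a sketch of such a naming convention, the helper below builds a data source name from a shared prefix and a statistic name, and rejects names that RRDtool would not accept (data source names are limited to 19 characters taken from letters, digits and underscores). The `BSTK` prefix, the `ds_name` helper and the choice to drop the leading `current-` are assumptions made for this example. They are not necessarily the names used in the published templates.

```ruby
# RRDtool accepts data source names of 1 to 19 characters in [a-zA-Z0-9_].
DS_NAME = /\A[a-zA-Z0-9_]{1,19}\z/

# Build "<PREFIX>_<statistic>", shortening the statistic by dropping a
# leading "current-" and replacing dashes with underscores.
def ds_name(prefix, stat)
  name = "#{prefix}_#{stat.sub(/\Acurrent-/, '').tr('-', '_')}"
  raise ArgumentError, "invalid RRD data source name: #{name}" unless name =~ DS_NAME
  name
end

%w[current-jobs-ready current-jobs-reserved current-waiting total-jobs].each do |stat|
  puts "#{stat} -> #{ds_name('BSTK', stat)}"
end
```

Because every name starts with the same prefix, the fields of one template end up grouped together in cacti’s alphabetical dropdown list.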
In this screen you also tell cacti how to collect the input parameters for your script. This way, cacti will allow you to specify default values when you create graphs by associating hosts with the graph templates. Checking the box at the left of an input field will make cacti ask for the associated input value in the forms where you create graphs for hosts. Not checking the box will let cacti decide for you. Here, I’ve left the host field blank with the box unchecked, to make sure cacti automatically fills it with the name of the host to which you are attaching a graph using the template.
graph template
Graph templates are pretty simple to define. You specify the list of fields that you want to display in the graphs and the graph style (colors, line size, etc …).
In our platform, we want new users, those who have just registered, to see messages in their timeline as soon as possible, so we are using a nifty beanstalkd feature which allows us to mark some jobs “urgent” and make sure that those jobs will be handled before the others. Another very nice feature is the ability to delay jobs for later handling. For instance, if our backend fails to parse some kind of message because of encoding issues or whatever, instead of giving up we just delay those jobs for a while, get an alert, try to fix the problem in our backend and let the messages flow back in again. That’s the kind of thing that beanstalkd and work queues can help you do. I can hardly imagine how complex it would be to do such things in a monolithic, old-school process!
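The two features above could look roughly like this in worker code. This is a sketch, not our production code: it assumes a queue object answering `put(body, priority)` and a job object answering `body`, `delete` and `release(priority, delay)`, which is my understanding of the beanstalk-client gem’s interface. The method names `enqueue_crawl` and `process` and the ten-minute delay are hypothetical.

```ruby
URGENT_PRI  = 0       # beanstalkd counts ready jobs with a priority below 1024 as urgent
DEFAULT_PRI = 65_536  # lower numbers are reserved first, so this runs after urgent jobs
RETRY_DELAY = 600     # seconds to wait before a failed job becomes ready again (assumed)

# Queue a job; jobs for freshly registered users jump ahead of the others.
def enqueue_crawl(queue, body, new_user)
  queue.put(body, new_user ? URGENT_PRI : DEFAULT_PRI)
end

# Hand the job body to the block. On success the job is deleted; on failure
# it is released with a delay instead of being dropped, so it shows up in
# the "delayed" statistic until the delay expires.
def process(job)
  yield job.body
  job.delete
rescue StandardError
  job.release(DEFAULT_PRI, RETRY_DELAY)
end
```

With this in place, the urgent and delayed curves of a queue directly reflect how many new-user jobs are pending and how many jobs are waiting for a retry.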
Let’s define a graph that will display:
- the number of jobs ready to be processed
- the number of jobs with the “urgent” flag
- the number of jobs which have been delayed
- the number of processes currently actively picking up jobs from the queue
Then you’ll be presented with a screen where you will have to painfully enter the names of the fields you wish to graph, and you will quickly realise why I insisted on the naming convention for the data fields above …
Almost there !!!!
And now, you just have to associate this graph template with the host running your beanstalk server. After a while, you’ll see that cacti has started collecting data and displays nice graphs showing the state of your beanstalk queues, thanks to rrdtool!
Not wanting to disclose too much about our operations, I have edited this screenshot to remove figures and hostnames, but you get the idea. The most interesting graph is in the upper right corner, with the crawler queue, where we have a constant pool (the green stripe in the lower part) of workers pulling crawling jobs (IMAP, POP, Twitter, etc …) and posting every new message they find to another queue whose job is to index them …
One of my goals as an operator is to keep all these queues as flat as possible. The whitespace indicates that I need to add some crawling capacity, because some jobs are not performed on time. They ultimately are, but not within the time window that we allowed ourselves to operate in … We are also currently working on some changes that should help reduce the spikes in that particular graph.
What’s next ?
As you can see, defining custom templates for cacti is a really involved task. You spend a lot of time fiddling with settings and going through trial/error/revert cycles. That’s why I plan to package my templates, make sure they are generic enough to work outside of our own environment, and publish the result or contribute it to the MySQL cacti templates project. Stay tuned …
Having such monitoring in place is invaluable and has been absolutely necessary for us to keep our service online!