Archive for the ‘Ruby’ Category

Monitoring beanstalkd with cacti & creating custom cacti templates

[update: the script and templates discussed below are available on github: http://github.com/earzur/cacti-beanstalkd-templates]

At Silentale, we use work queues to orchestrate the different services in our platform.

Passing messages between processes is a common pattern for distributing work across many computers, and our platform makes extensive use of Beanstalkd, a fast, lightweight and robust server that does just that: it lets us implement work queues.

Monitoring the number of jobs available (ready, in beanstalkd parlance), the number of workers consuming jobs, and the number of jobs that failed (buried) gives operators a pretty good idea of how the platform is performing, so they can add or remove capacity when needed. This is particularly useful in our AWS-based environment, where you can add and remove capacity in no time.

For instance, when we send a new batch of invitations to beta users to test our product, we get significant spikes in our work queues, and may need to temporarily add servers, then shut them down once the queues have been processed.

The beanstalkd protocol is pretty simple and provides easy ways to collect usage statistics about the server itself and about every individual queue (tube) you have defined.
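To see what those statistics look like on the wire, here is a minimal sketch that talks to beanstalkd directly over TCP. It assumes the stats-tube exchange described in beanstalkd's protocol document: the client sends `stats-tube <tube>`, and the server answers `OK <bytes>` followed by a YAML dictionary, or `NOT_FOUND` if the tube doesn't exist. `read_stats_reply` and `stats_tube` are hypothetical helper names; the script further down uses the beanstalk-client gem instead.

```ruby
require 'socket'
require 'yaml'

# Read one "OK <bytes>\r\n<YAML data>\r\n" reply from an IO and return a Hash.
def read_stats_reply(io)
  header = io.gets("\r\n") or raise 'connection closed by server'
  status, bytes = header.chomp("\r\n").split(' ')
  # beanstalkd answers "NOT_FOUND\r\n" when the tube does not exist
  raise "unexpected reply: #{header.strip}" unless status == 'OK'
  body = io.read(Integer(bytes))
  io.read(2) # discard the trailing "\r\n"
  YAML.load(body)
end

# Ask a beanstalkd server for the statistics of one tube (queue).
def stats_tube(host, port, tube)
  sock = TCPSocket.new(host, port)
  begin
    sock.write("stats-tube #{tube}\r\n")
    read_stats_reply(sock)
  ensure
    sock.close
  end
end

# Usage (needs a running beanstalkd):
#   stats_tube('localhost', 11300, 'default')['current-jobs-ready']
```

Keeping the parsing in `read_stats_reply` separate from the socket makes it easy to check against a canned reply.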

Another tool we use a lot at Silentale is Cacti. Out of the box, Cacti makes it easy to graph data collected from SNMP-enabled servers, so you can graph server metrics such as network bandwidth, load average and memory usage. But you are not limited to SNMP: you can also graph data collected by scripts. This is a bit more involved, as the MySQL Cacti templates project demonstrates.

After spending a while looking for an equivalent project that would support beanstalkd and work in our environment, I went on to write my own Cacti templates for monitoring our beanstalk queues…

available statistics

The beanstalk protocol provides a lot of different figures about what’s going on in the server both globally and by individual queue. For my little project, I chose to concentrate on the individual queues, as monitoring the number of new jobs in each queue is critical to our operations.

The beanstalk-client gem’s stats_tube method returns a hash containing many different values, including the ones we want to graph:

  • current-jobs-buried
  • current-jobs-delayed
  • current-jobs-ready
  • current-jobs-reserved
  • current-jobs-urgent
  • current-using
  • current-waiting
  • current-watching
  • total-jobs
Collecting them takes a very short and simple Ruby script:

#!/usr/bin/env ruby
# encoding: utf-8

require 'rubygems'
require 'beanstalk-client'
require 'trollop'

script_name = File.basename(__FILE__)
opts = Trollop::options do
  version "#{script_name} © 2009 Silentale S.A."
  banner <<-EOS
display statistics about a queue on a beanstalkd server
EOS
  opt :server, 'beanstalk server address', :type => :string
  opt :port, 'beanstalk server port', :type => :integer, :default => 11300
  opt :queue, 'name of the beanstalk queue', :type => :string
end
Trollop::die :server, 'is mandatory' unless opts[:server]
Trollop::die :queue, 'is mandatory' unless opts[:queue]

# Connect to the beanstalkd server
beanstalk = Beanstalk::Connection.new "#{opts[:server]}:#{opts[:port]}"

# Fetch the statistics hash of the requested tube (queue)
ts = beanstalk.stats_tube opts[:queue]

# The tube name is not a numeric value, so Cacti cannot graph it
ts.delete 'name'

# Print "key:value" pairs, sorted by key and separated by spaces, on one line
puts ts.keys.sort.map { |k| "#{k}:#{ts[k]}" }.join(' ')

We just connect to the server, call stats_tube, drop the non-numeric name entry, and print the resulting hash with its keys sorted by name.

cacti configuration

The script’s output is straightforward: a single line of name:value pairs separated by spaces. Cacti can parse this output directly through what it calls a data input method. Data input methods link external scripts to data fields that can be used in data templates, and those data templates can in turn be used in graph templates.
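Cacti does this parsing itself; the helper below is only a hypothetical way to check, before wiring anything in the GUI, that the script prints well-formed name:value pairs with numeric values. `parse_cacti_output` is an illustrative name, not part of Cacti.

```ruby
# Hypothetical helper: turn one "name:value name:value ..." line into a Hash,
# raising if a value is not an integer (Cacti would not be able to graph it).
def parse_cacti_output(line)
  line.split(' ').each_with_object({}) do |pair, fields|
    name, value = pair.split(':', 2)
    fields[name] = Integer(value)
  end
end

# Usage from a shell:
#   ruby -e 'load "check.rb"; p parse_cacti_output(STDIN.read)' < output.txt
```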

One unfortunate problem with Cacti is that everything has to be done through the GUI, which involves a lot of clicking and typing. I didn’t find any way around this with scripts or other automation tools.

If you managed to read this far and know of any such tool (or a pointer to cacti’s data model) so we can script such tasks, I will be more than happy to receive comments ;-)

data input method

Using cacti’s console, defining a new data input method is straightforward. You just have to create the output fields one by one and map them to the output of the script.

Data input method screen


As you can see in this screenshot, you need to specify the full path to the data-collection script, put the input parameters between angle brackets (here we define <server>, <port> and <queue>), and then define the resulting data fields one by one.

Now we need to define a data template, which Cacti will use to update the RRD archives and generate the graphs.

data template

The data template maps the output fields of the data input method to “data source items”, which are fields in the RRD archive that will store the collected data.

Those fields have to be named carefully. I used a convention I found in the MySQL Cacti templates: a few capital letters, the same for every field in the template, followed by the name of the parameter. This is extremely useful because Cacti sorts the field names alphabetically in the field selection drop-down list, and there can be many of them! So remember to name your input fields carefully on this screen.
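The data source item names end up as rrdtool data source names, and rrdtool only accepts 1 to 19 characters from [a-zA-Z0-9_], so the hyphenated beanstalkd stat names can't be used as-is. Here is a sketch of one possible scheme; the `BS` prefix and the rule of dropping the `current-` part are hypothetical choices, not the names used in our templates.

```ruby
# rrdtool data source names: 1 to 19 characters from [a-zA-Z0-9_]
MAX_DS_NAME = 19

# Hypothetical naming scheme: a short capitalized prefix shared by every field
# of the template, then the stat name without its "current-" part.
def ds_name(prefix, stat)
  short = stat.sub(/\Acurrent-/, '')                     # "current-jobs-ready" -> "jobs-ready"
  name = "#{prefix}_#{short}".gsub(/[^A-Za-z0-9_]/, '_') # hyphens are not allowed
  raise ArgumentError, "#{name} is too long for rrdtool" if name.length > MAX_DS_NAME
  name
end

STATS = %w[current-jobs-buried current-jobs-delayed current-jobs-ready
           current-jobs-reserved current-jobs-urgent current-using
           current-waiting current-watching total-jobs]

# e.g. "BS_jobs_reserved", "BS_total_jobs": all sort together in Cacti's list
DS_NAMES = STATS.map { |stat| ds_name('BS', stat) }
```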

On this screen you also tell Cacti how to collect the input parameters for your script. This way, Cacti lets you specify default values when you create graphs by associating hosts with the graph templates. Checking the box to the left of an input field makes Cacti ask for that value in the forms where you create graphs for hosts; leaving it unchecked lets Cacti decide for you. Here, I left the host field blank with the box unchecked, so that Cacti automatically fills it with the name of the host to which you attach a graph based on this template.

Bottom of the "data template" screen


graph template

Graph templates are pretty simple to define: you specify the list of fields you want to display in the graphs and the graph style (colors, line width, etc.).

In our platform, we want new users, those who just registered, to see messages in their timeline as soon as possible, so we use a nifty beanstalkd feature that lets us mark some jobs “urgent” and make sure they are handled before the others. Another very nice feature is the ability to delay jobs for later handling. For instance, when our backend fails to parse some kinds of messages because of encoding issues or whatever, instead of giving up we simply delay those jobs for a while, get an alert, try to fix the problem in our backend and let the messages flow back in. That’s the kind of thing beanstalkd and work queues can help you do. I can’t imagine how complex it would be to do this in a monolithic, old-school process!
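Both features map to arguments of beanstalkd’s commands: `put` takes a priority (lower is more urgent, 0 being the most urgent) and a delay in seconds, and `release` can put a reserved job back with a new priority and delay. The protocol document also says current-jobs-urgent counts ready jobs whose priority is below 1024. The sketch below only builds the raw command strings, so it runs without a server; the helper names and default values are illustrative assumptions.

```ruby
# Ready jobs with a priority below this value are counted in current-jobs-urgent
URGENT_PRIORITY_LIMIT = 1024

# Raw "put" command: put <pri> <delay> <ttr> <bytes>\r\n<data>\r\n
# (the defaults here are illustrative, not mandated by beanstalkd)
def put_command(body, pri = 65_536, delay = 0, ttr = 120)
  "put #{pri} #{delay} #{ttr} #{body.bytesize}\r\n#{body}\r\n"
end

# Raw "release" command: puts a reserved job back, delayed by <delay> seconds
def release_command(job_id, pri, delay)
  "release #{job_id} #{pri} #{delay}\r\n"
end

def urgent?(pri)
  pri < URGENT_PRIORITY_LIMIT
end

# A new user's first messages: most urgent priority, no delay
welcome = put_command('timeline-for-new-user', 0)
# A job that failed on an encoding issue: retry in one hour at normal priority
retry_later = release_command(42, 65_536, 3600)
```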

Let’s define a graph that will display

  • the number of jobs ready to be processed
  • the number of jobs with the “urgent” flag
  • the number of jobs which have been delayed
  • the number of processes currently actively picking up jobs from the queue
Create a new graph template… Most of the default values will work fine. Again, take care of the naming convention here: your graphs will be much easier to pick when linking them with hosts…

Add graph template screenshot ...


Then you’ll be presented with a screen where you will have to painfully enter the names of the fields you wish to graph, and you will quickly realise why I insisted on the naming convention for the data fields above … ;-)

Graph template with data fields defined


Almost there!

Now you just have to associate this graph template with the host running your beanstalk server. After a while, Cacti will start collecting data and display nice graphs of the state of your beanstalk queues, thanks to rrdtool!

Not wanting to disclose too much about our operations, I have edited this screenshot to remove figures and hostnames, but you get the idea. The most interesting graph is in the upper right corner: the crawler queue, where a constant pool of workers (the green stripe at the bottom) pulls crawling jobs (IMAP, POP, Twitter, etc.) and posts every new message it finds to another queue, whose job is to index them… One of my goals as an operator is to keep all these queues as flat as possible. The white space indicates that I need to add some crawling capacity, because some jobs are not performed on time. They are eventually performed, but not within the time window we allow ourselves… We are also currently working on some changes that should help reduce the spikes in that particular graph.

resulting graphs in cacti's thumbnail view mode


What’s next ?

As you can see, defining custom templates for Cacti is a really involved task: you spend a lot of time fiddling with settings in trial/error/revert cycles. That’s why I plan to package my templates, make sure they are generic enough to work outside our own environment, and publish the result or contribute it to the MySQL Cacti templates project. Stay tuned…

Having such monitoring in place is invaluable and has been absolutely necessary for us to keep our service online !

Class level inheritable attributes (@@ vs @)

First, here is some Ruby code:

class Message
  @content = "Welcome Alexandre and Catalin!"
  @@date = Time.at(0)
 
  def self.content
    @content
  end
 
  def self.date
    @@date
  end
end
 
class Email < Message
end
 
puts Message.content # => "Welcome Alexandre and Catalin!"
puts Email.content # => nil
 
puts Message.date # => "Thu Jan 01 01:00:00 +0100 1970"
puts Email.date # => "Thu Jan 01 01:00:00 +0100 1970"

At first, when you see “@my_variable” in a class method (prefixed by “self.”), you tell yourself that, since you are at the class level, “@my_variable” and “@@my_variable” behave the same way.

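But the previous code shows that this is not true. In fact, it has to be false: “@content” belongs to the Message class object itself, so Email, a different object, doesn’t see it, while “@@date” is a class variable shared by the whole hierarchy, so updating “Email.date” would also update “Message.date”.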
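The side effect is easy to demonstrate by adding writers to the example above. This sketch keeps the same two classes; the `content=` and `date=` writers are additions for the demonstration.

```ruby
class Message
  @content = "Welcome Alexandre and Catalin!" # instance variable of the Message class object
  @@date = Time.at(0)                          # class variable, shared with every subclass

  class << self
    attr_accessor :content # reads/writes @content on whichever class receives the call
  end

  def self.date
    @@date
  end

  def self.date=(value)
    @@date = value
  end
end

class Email < Message
end

Email.content = "Only for emails"
Email.date = Time.at(86_400)

p Message.content                 # => "Welcome Alexandre and Catalin!"
p Email.content                   # => "Only for emails"
p Message.date == Time.at(86_400) # => true: Email.date= changed Message's value too
```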

The excellent post Class and Instance Variables In Ruby by John Nunemaker details all of this perfectly. He suggests a module ClassLevelInheritableAttributes to avoid inheritance side effects when using class variables.

With this module you can inherit class-level values, and setting your own value in a subclass won’t update the value set in the superclass. Cool.
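John Nunemaker’s module isn’t reproduced here; the following is a minimal sketch of the same idea, with a hypothetical `inheritable_attributes` macro. When Ruby calls the `inherited` hook, it copies each declared class-level instance variable into the new subclass, which then owns its own copy.

```ruby
module ClassLevelInheritableAttributes
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Declare class-level attributes whose values are copied into subclasses.
    # Declare them before defining subclasses: the copy happens at definition time.
    def inheritable_attributes(*names)
      @inheritable_attribute_names ||= []
      @inheritable_attribute_names.concat(names)
      singleton_class.send(:attr_accessor, *names)
    end

    # Ruby calls this hook whenever a subclass is defined.
    def inherited(subclass)
      super
      names = @inheritable_attribute_names || []
      subclass.instance_variable_set(:@inheritable_attribute_names, names.dup)
      names.each do |name|
        ivar = "@#{name}"
        # Shallow copy: the subclass gets the same object, but its own variable
        subclass.instance_variable_set(ivar, instance_variable_get(ivar))
      end
    end
  end
end

class Message
  include ClassLevelInheritableAttributes
  inheritable_attributes :content
  self.content = "Welcome Alexandre and Catalin!"
end

class Email < Message
end

Email.content = "Only for emails"
Message.content # => "Welcome Alexandre and Catalin!"
Email.content   # => "Only for emails"
```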

require 'profile'

The Profile concept is pretty common in web applications. To define a Profile class, a Ruby developer usually creates a profile.rb file in the project’s lib directory.

# profile.rb
class Profile
end

To use this class in an executable script, we can add the lib path to the Ruby load path and then require files without always having to specify their absolute or relative path.

#!/usr/bin/env ruby
# my_app
 
# Add the 'lib' directory in the Ruby load path
$: << File.expand_path File.join(File.dirname(__FILE__), '..', 'lib')
 
# Load the Profile class
require 'profile'
 
# Check that the Profile class has been loaded
puts 'The Profile class is not loaded' unless defined? Profile

We end up with the following directory structure:

my_app/
  bin/
    my_app
  lib/
    profile.rb

Execute the my_app script and you’ll see this:

The Profile class is not loaded
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
  0.00     0.00      0.00        2     0.00     0.00  IO#write
  0.00     0.00      0.00        1     0.00     0.00  Kernel.puts
  0.00     0.01      0.00        1     0.00    10.00  #toplevel

Not only is our shiny new Profile class not loaded, but some profiling output is also written to STDERR.

The problem is that a profile.rb file is already defined in the Ruby distribution. If you use the Ruby distribution provided by Leopard, check this file: /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/profile.rb.

require 'profiler'
 
END {
  Profiler__::print_profile(STDERR)
}
Profiler__::start_profile

It starts the Ruby profiler and prints the result at the end of the program (pretty cool, by the way).

To solve this problem, you have to insert your library directories at the beginning of the Ruby load path, which is what RubyGems does. Or you can make sure that you always require files by their complete path.
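Here is a minimal sketch of the first option for bin/my_app: `unshift` instead of `<<`, so that lib/profile.rb is found before the standard library’s profile.rb. `prepend_load_path` is a hypothetical helper name.

```ruby
# Put a project directory at the *front* of the load path (only once), so its
# files shadow standard library files with the same name.
def prepend_load_path(dir)
  dir = File.expand_path(dir)
  $LOAD_PATH.unshift(dir) unless $LOAD_PATH.include?(dir)
  dir
end

# In bin/my_app this would be:
#   prepend_load_path(File.join(File.dirname(__FILE__), '..', 'lib'))
#   require 'profile' # now loads lib/profile.rb
```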

Make one choice for your whole project because, at least with Ruby 1.8 (which remembers loaded files by the path you gave it), require will load a file twice if you don’t always give it the same path. Example:

# Load the Profile class
require 'profile'
# The 'profile.rb' file is read a second time.
require '/home/marcel/code/my_project/lib/profile'

Loading the same class twice can be dangerous if you use alias_method_chain: the second load aliases the already-chained method onto itself, which creates an infinite loop.
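alias_method_chain comes from Rails’ ActiveSupport. To show why a double load loops, here is a sketch that re-creates the same aliasing with plain alias_method; `Greeter` and `add_logging_to_hello` are hypothetical names.

```ruby
class Greeter
  def hello
    'hello'
  end
end

# Roughly what alias_method_chain(:hello, :logging) sets up, with plain aliases.
# Calling it twice simulates loading the same file twice.
def add_logging_to_hello(klass)
  klass.class_eval do
    define_method(:hello_with_logging) { '[log] ' + hello_without_logging }
    alias_method :hello_without_logging, :hello # keep the current hello
    alias_method :hello, :hello_with_logging    # hello now goes through the logger
  end
end

add_logging_to_hello(Greeter)
Greeter.new.hello # => "[log] hello"

# Second load: hello_without_logging now points at the previous
# hello_with_logging, which calls hello_without_logging, i.e. itself...
add_logging_to_hello(Greeter)
begin
  Greeter.new.hello
rescue SystemStackError
  puts 'stack level too deep: the infinite loop'
end
```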

Be cool with DRb, it's far from "scalable"

When you begin to learn DRb, you quickly land on the famous Chad Fowler page, entitled “Intro to DRb”.

The “Concurrency” chapter is particularly interesting when you want to make a local resource available in the wild, allowing one request at a time on your resource.

So, with DRb, a dash of method_missing and a pinch of mutex, you have a perfect recipe to remotely access and protect your resource… but DRb is not what we’d call a high-availability entry point.

Let’s code the proof. Here is the code of the DRb server, it simulates a long and expensive task:

require 'drb'
require 'thread'

class Server

  def initialize
    @i = 0
    @mutex = Mutex.new
  end

  # Every call, whatever its name, runs this serialized, expensive task
  def method_missing(name, *args)
    @mutex.synchronize do
      @i += 1
      p @i
      sleep 1 # Your CPU works very hard here...
    end
    @i
  end

end

server = DRb.start_service("druby://:34100", Server.new)
p "listening"
server.thread.join

You can now start the server via the ruby server.rb command and begin to code the client. It creates 100 processes, each one calling the DRb server and writing the response on the standard output:

require 'drb'

client = DRbObject.new(nil, "druby://:34100")

pids = []

100.times do
  pids << fork {
    p "#{Time.now}: #{client.call}"
  }
end

p "#{Time.now}: created the 100 processes"

pids.each { |pid| Process.waitpid(pid) }

p "#{Time.now}: done"

Launch the client and see what happens:

"Sat Oct 05 11:54:17 +0200 2008: 2"
"Sat Oct 05 11:54:17 +0200 2008: 3"
"Sat Oct 05 11:54:20 +0200 2008: created the 100 processes"
"Sat Oct 05 11:54:17 +0200 2008: 4"
"Sat Oct 05 11:54:17 +0200 2008: 5"
"Sat Oct 05 11:54:17 +0200 2008: 6"
"Sat Oct 05 11:54:17 +0200 2008: 7"
"Sat Oct 05 11:54:17 +0200 2008: 8"
"Sat Oct 05 11:54:17 +0200 2008: 9"
"Sat Oct 05 11:54:18 +0200 2008: 10"
.......
"Sat Oct 05 11:54:18 +0200 2008: 59"
"Sat Oct 05 11:54:18 +0200 2008: 60"
"Sat Oct 05 11:54:19 +0200 2008: 61"
"Sat Oct 05 11:54:19 +0200 2008: 61"
DRb::DRbConnError: druby://:34100 - #
 
method open	in drb.rb at line 736
method each	in drb.rb at line 729
method open	in drb.rb at line 729
method initialize	in drb.rb at line 1189
method new	in drb.rb at line 1169
method open	in drb.rb at line 1169
method method_missing	in drb.rb at line 1085
method with_friend	in drb.rb at line 1103
method method_missing	in drb.rb at line 1084
at top level	in client.rb at line 9
method fork	in client.rb at line 8
at top level	in client.rb at line 8
method times	in client.rb at line 7
at top level	in client.rb at line 7

Almost all 100 processes are created before the DRb server returns its first result. After several tries on a PowerBook and on an EC2 instance, the DRb server rejected any new client after about 65 simultaneous requests.

But it’s important to mention that the DRb server did not crash at all. You simply have to wait for it to handle the remaining requests.
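Since the server eventually serves everyone, a client can simply retry refused connections. Here is a hedged sketch: `with_drb_retry` and its back-off values are hypothetical, not part of DRb; `DRb::DRbConnError` is the exception shown in the trace above.

```ruby
require 'drb'

# Hypothetical helper: retry a DRb call when the connection is refused,
# waiting a little longer after each failure (linear back-off).
def with_drb_retry(attempts = 5, base_delay = 0.5)
  tries = 0
  begin
    tries += 1
    yield
  rescue DRb::DRbConnError
    raise if tries >= attempts # give up and let the caller see the error
    sleep(base_delay * tries)
    retry
  end
end

# In the client above:
#   p "#{Time.now}: #{with_drb_retry { client.call }}"
```

This only smooths over bursts; it does not make a single DRb server handle more load.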

DRb is pretty good: it lets Ruby developers code remote services in no time.

If you need high availability services, your next step could be REST servers, distributed/dispatched thanks to HAProxy or nginx. And of course, you should also take a look at Erlang.

Beware of the Proxy design pattern (read: method_missing)

You have probably read about how easy it is to implement the Proxy design pattern in Ruby.

Thanks to the Ruby method_missing method, you can pass messages to underlying objects. See the previous article Local resource available in the wild, thanks to DRb for a fully described example.

But there’s one caveat, you have to be very careful when implementing your method_missing method.

Take this code for example:

def method_missing(name, *args, &block)
  # Get the first arg, which contain information about which underlying
  # object to call.
  id = arg[0]
 
  # Call the corresponding underlying object with the first argument removed
  my_underlying_objects[id].__send__(name, *args[1..-1], &block)
end

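If you execute this code, you’ll be stuck in an infinite loop. Why? Because of one typo: I wrote arg[0] instead of args[0]. Since arg is not a local variable, Ruby treats it as a call to a method named arg, which doesn’t exist, so method_missing is called again, and again, until the stack blows up (here, with a segmentation fault).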

To detect this problem before it happens, we can take advantage of the Kernel#caller method, which returns the current execution stack. Here is how we can use it to detect that the current object is calling itself:

def method_missing(name, *args, &block)
  # Check that we're not calling 'method_missing' recursively
  if caller.first.include?(__FILE__)
    raise "#{self.class} is calling itself -method #{name}-. Verify that you do not call a non existing method !!"
  end
 
  # Get the first arg, which contain information about which underlying
  # object to call.
  id = args[0]
 
  # Call the corresponding underlying object with the first argument removed
  my_underlying_objects[id].__send__(name, *args[1..-1], &block)
end

That’s all: we just check that the caller is not in the current file. If your method_missing code becomes more and more complex, especially if it includes some metaprogramming tricks, you’ll feel A LOT safer!

One last thing: Kernel#caller is a fairly expensive method, so you should only use this check in development.
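If you want a guard that is cheap enough to leave on, one option (a different technique from the caller check above) is a thread-local flag that is set only while the proxy resolves its target. `Proxy`, `target_for` and the flag name are hypothetical.

```ruby
# Hypothetical proxy: the first argument selects the underlying object,
# as in the method_missing above.
class Proxy
  GUARD = :proxy_resolving_target

  def initialize(targets)
    @targets = targets
  end

  def method_missing(name, *args, &block)
    # Re-entering method_missing while still resolving the target almost
    # certainly means a typo (an undefined method or variable) in our own code.
    if Thread.current[GUARD]
      raise "#{self.class} is calling itself (method #{name}). Check for a typo!"
    end
    Thread.current[GUARD] = true
    begin
      target = target_for(args)
    ensure
      Thread.current[GUARD] = false
    end
    # Forward outside the guarded section, so legitimate nested calls still work
    target.__send__(name, *args[1..-1], &block)
  end

  private

  def target_for(args)
    @targets.fetch(args[0])
  end
end

proxy = Proxy.new(:a => [1, 2, 3])
proxy.size(:a) # => 3
```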

Leopard, where are those Ruby gems?

It’s always useful to read the code of the Ruby gems you download, for at least two reasons: learning, and submitting bugs. It’s always a good idea to give technical details about the bugs you encounter. The community is small and responsive, so get involved :)

Lucky you, Leopard users: Apple did a great job embedding Ruby in Mac OS X 10.5. You get the latest version of Ruby, 1.8.6, with RubyGems installed.

Apple set up a specific directory for everything Ruby-related:

/Library/Ruby/Gems/1.8/

This directory contains the unpacked gems and their documentation. For example, you can examine the content of the ActiveRecord gem and its documentation with these two commands (I hope you use TextMate…).

mate /Library/Ruby/Gems/1.8/gems/activerecord-2.1.0/
open /Library/Ruby/Gems/1.8/doc/activerecord-2.1.0/rdoc/index.html

No need to google “activerecord”, everything’s on your hard drive.
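If your gems live somewhere else (a newer RubyGems, a different Ruby install), you can ask RubyGems instead of hard-coding the path. This is a sketch for a more recent RubyGems than the one shipped with Leopard; `gem_source_dir` is a hypothetical helper name.

```ruby
require 'rubygems'

# Where RubyGems installs gems on this machine. On a stock Leopard install this
# was /Library/Ruby/Gems/1.8; other setups differ.
GEMS_BASE_DIR = Gem.dir

# Hypothetical helper: the unpacked source directory of an installed gem.
# Gem::Specification.find_by_name raises a Gem::LoadError (or a subclass)
# when no such gem is installed.
def gem_source_dir(name)
  Gem::Specification.find_by_name(name).gem_dir
end

# Usage, e.g. to browse ActiveRecord's source in TextMate:
#   system('mate', gem_source_dir('activerecord'))
```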

Note: RubyGems has evolved since Leopard was released. Fortunately, it’s simple to update: just download the latest version, and one simple command will do the work.

Local resource available in the wild, thanks to DRb

You have a resource that you want to share between multiple processes. It could be a resource persisted on the local hard drive, like an index, a persistent hash (Berkeley DB, InfinityDB), or simply a file.

With DRb, aka Distributed Ruby, you can share a resource over TCP. DRb does the annoying job for you: marshalling. And that is COOL, and RMI is NOT COOL.

As usual in Ruby, using a library is as simple as calling the require method. To use DRb in your application, write this:

require 'drb'

In this post, we’ll implement a Remote Hash. It will be accessible to an unlimited number of processes on an unlimited number of computers. Let’s code a simple DRb server for your resource.

require 'drb'

class Server
  def start
    print "starting DRb server..."
    DRb.start_service("druby://localhost:7000", HashProxy.new)
    puts " done"
  end

  def join
    DRb.thread.join
  end

  def shutdown
    print "stopping DRb server..."
    DRb.stop_service
    puts " done"
  end
end

s = Server.new
s.start
trap("INT") { s.shutdown } # Catch CTRL-C to do a clean shutdown of the DRb server
s.join

The HashProxy instance is the object distributed between the DRb server and the DRb clients. We call it a “proxy” because it behaves exactly like the real resource hidden behind it. This is where the method_missing magic happens.

class HashProxy
  def initialize *args
    @local_resource = Hash.new *args
  end
 
  def method_missing(name, *args, &block)
    @local_resource.__send__(name, *args, &block)
  end
end

The Object#__send__ method is an alias of Object#send, used to avoid conflicts with a possibly existing method named send in the current object, its superclasses or its included modules.

There is one problem with this implementation: like a web server, DRb handles client requests concurrently, each in its own thread. We have to protect our hash with a mutex, so that clients wait in line to access the remote resource.

require 'thread'
 
class HashProxy
  def initialize *args
    @mutex = Mutex.new
    @local_resource = Hash.new *args
  end
 
  def method_missing(name, *args, &block)
    @mutex.synchronize do
      @local_resource.__send__(name, *args, &block)
    end
  end
end

As we said earlier, every method of HashProxy, and therefore every method of Hash, is now available to any remote Ruby code. On the client side, we wrap the remote HashProxy in a RemoteHash class, which we use instead of Hash:

require 'drb'

class RemoteHash
  def initialize
    @hash_proxy = DRbObject.new(nil, "druby://localhost:7000")
  end

  def method_missing(name, *args, &block)
    @hash_proxy.__send__(name, *args, &block)
  end
end

In your application, you’ll use your remote resource like a local resource, without knowing about those network and marshalling things.

h = RemoteHash.new
h[:roger] = 1
h[:moore] = -1
p h[:roger] # => 1

Unfortunately, I couldn’t call methods with blocks.

h.select {|k,v| v > 0}
# =>
# ArgumentError: wrong number of arguments (0 for 1)
#
# method select at line 9
# method __send__ at line 9
# method method_missing at line 9
# at top level  at line 17
# Program exited.
 
p h.sort {|a,b| a[1]<=>b[1]}
# =>
# DRb::DRbConnError: DRb::DRbServerNotFound
#
# method current_server in drb.rb at line 1650
# method to_id  in drb.rb at line 1712
# method initialize in drb.rb at line 1048
# method new  in drb.rb at line 642
# method make_proxy in drb.rb at line 642
# method dump in drb.rb at line 559
# method send_request in drb.rb at line 605
# method send_request in drb.rb at line 906
# method send_message in drb.rb at line 1194
# method method_missing in drb.rb at line 1086
# method open in drb.rb at line 1170
# method method_missing in drb.rb at line 1085
# method with_friend  in drb.rb at line 1103
# method method_missing in drb.rb at line 1084
# method __send__ at line 9
# method method_missing at line 9
# at top level  at line 18
# Program exited.

One last word about method_missing: Jay Fields wrote an excellent article about dynamically defining the methods of an external class instead of using method_missing. It will surely help you debug your piece of art.
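Jay Fields’ article isn’t reproduced here; the following is a minimal sketch of the idea applied to our HashProxy: instead of method_missing, define one real, mutex-protected forwarding method for each public method that Hash itself defines.

```ruby
require 'thread'

class HashProxy
  def initialize(*args)
    @mutex = Mutex.new
    @local_resource = Hash.new(*args)
  end

  # One real method per public method defined by Hash itself (not inherited
  # ones such as Kernel#select), so typos raise NoMethodError instead of
  # silently ending up in method_missing.
  Hash.public_instance_methods(false).each do |name|
    define_method(name) do |*args, &block|
      @mutex.synchronize { @local_resource.__send__(name, *args, &block) }
    end
  end
end

proxy = HashProxy.new(0)
proxy[:roger] = 1
proxy[:moore] = -1
proxy[:roger]                    # => 1
proxy.select { |k, v| v > 0 }    # => {:roger=>1}
```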

Dynamic Graph Visualization in Flex, Ruby and Amazon SQS – Part 2 – Ruby and SQS

Update: someone from Amazon emailed me because I used the old SQS API. Please use the gem from RightScale instead of the one presented below: it supports the new SQS WSDL, which is cheaper.

Thanks to the first part of this article, you now master Flex and ActionScript 3 like nobody else. Don’t you?

It’s now time to retrieve some data. You might need a real-time graph to monitor a bunch of rackmount servers, your database, your mailbox, or the activity of your Facebook friends, who knows. After looking at the Technorati top blogs list, I wondered how active the readers of those blogs were, i.e. how many comments are posted on their latest articles.

This article will focus on the data provider, coded in Ruby. And remember, the retrieved data will be sent to an Amazon SQS queue.

I know: why would we use Ruby and Amazon SQS when we could do all of this directly in ActionScript 3? As I wrote before, you could use this approach to monitor something that Flash can’t reach directly, since the ActionScript code runs on the client’s computer. Data providers could also be CPU- or network-expensive and/or spread across a computing cloud. And finally, don’t you want to read some Ruby code?

Let’s parse those RSS feeds. You’ll need one gem, the “Flexible RSS and Atom reader for Ruby”: simple-rss.

require 'open-uri' # To retrieve data from the World Wide Web (URI.open needs Ruby 2.5+)

require 'rubygems' # Needed to load installed gems
require 'simple-rss'

feeds_url = [
  'http://feeds.feedburner.com/Techcrunch',
  'http://feeds.feedburner.com/Mashable',
  # ... a lot more here ...
]

feeds_url.each do |feed_url|

  # Parse each feed
  rss = SimpleRSS.parse URI.open(feed_url)
  rss.entries.each do |entry|

    # WordPress publishes each entry's comments at "<entry URL>/feed"
    base_url = entry.guid.to_s
    base_url += "/" unless base_url.end_with?("/")
    comments_feed_url = "#{base_url}feed"

    begin
      rss_comments = SimpleRSS.parse URI.open(comments_feed_url)
      rss_comments.entries.each do |comment|
        puts "Found a new comment #{comment.title} on '#{entry.title}'"
      end
    rescue
      puts "There was a problem retrieving comments from #{comments_feed_url}"
    end

  end
end

To keep the code simple, I only chose WordPress blogs, because the comments feed of each entry is easy to find: just add “/feed” to the end of the entry URL.

With this code, you’ll get something like:

Found a new comment By: Tim-TechFruit on 'LoveFilm merges with Amazon’s DVD rentals'
Found a new comment By: Esi on 'Cuill has Irish roots'
Found a new comment By: Sam B on 'Women more likely to give passwords for chocolate? Yeah, but did they offer the men guilt-free sex?'
Found a new comment By: john b on 'Women more likely to give passwords for chocolate? Yeah, but did they offer the men guilt-free sex?'
Found a new comment By: Carl on 'Women more likely to give passwords for chocolate? Yeah, but did they offer the men guilt-free sex?'

We said earlier that we wanted to monitor the comments, right? Let’s put this code in an infinite loop and keep the comments we’ve found in a hash, so that we only announce the new ones.

comments = {} # hash of found comments

loop do
  feeds_url.each do |feed_url|

    # Parse each feed
    rss = SimpleRSS.parse URI.open(feed_url)
    rss.entries.each do |entry|

      base_url = entry.guid.to_s
      base_url += "/" unless base_url.end_with?("/")
      comments_feed_url = "#{base_url}feed"

      begin
        rss_comments = SimpleRSS.parse URI.open(comments_feed_url)
        rss_comments.entries.each do |comment|
          # Only announce comments we have not seen before
          unless comments[comment.guid]
            comments[comment.guid] = comment.guid
            puts "Found a new comment #{comment.title} on '#{entry.title}'"
          end
        end
      rescue
        puts "There was a problem retrieving comments from #{comments_feed_url}"
      end

    end
  end

  puts "... waiting 1 minute ..."
  sleep 60
end

With this code, you’ll get something like:

Found a new comment by: Buster Cherry on 'Crazy Gets Evicted!'
Found a new comment by: DONT MAKE FUN OF HOMELESS on 'Crazy Gets Evicted!'
Found a new comment by: joe on 'Crazy Gets Evicted!'
Found a new comment by: Oh, Give Me a Home Where the... on 'Crazy Gets Evicted!'
... waiting 1 minute ...
... waiting 1 minute ...
... waiting 1 minute ...
Found a new comment by: beth on 'Ben Savage Is Still Alive'
Found a new comment by: Dain Starr on 'Ben Savage Is Still Alive'
Found a new comment by: I LOVE BEN SAVAGE!! on 'Ben Savage Is Still Alive'
Found a new comment by: Chuy on 'Ben Savage Is Still Alive'

It would be a good idea to check each new feed in a new thread, as 99% of the processing time is devoted to network requests. We’ll use a new gem for this: fastthread.

require 'fastthread'

threads = []
comments = {} # hash of found comments

feeds_url.each do |feed_url|
  # One thread per feed: each one spends most of its time waiting on the network
  t = Thread.new do
    loop do
      rss = SimpleRSS.parse URI.open(feed_url)
      rss.entries.each do |entry|
        # ...
      end

      sleep 60
    end
  end
  threads << t

end

# Wait for all threads to exit (never, in this case)
threads.each { |t| t.join }

With this code, you’ll see the same output as before but, as you can guess, with all the messages interleaved.
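In the threaded version, every thread shares the same comments hash, and checking comments[comment.guid] then setting it are two separate steps, so nothing makes them atomic. Here is a sketch of one way to make the check-and-insert a single locked operation; `SeenComments` and `first_time?` are hypothetical names.

```ruby
require 'thread'

# Hypothetical helper: remembers which comment GUIDs were already announced.
class SeenComments
  def initialize
    @guids = {}
    @mutex = Mutex.new
  end

  # Returns true the first time a GUID is seen, false afterwards.
  # The lookup and the insert happen under the same lock.
  def first_time?(guid)
    @mutex.synchronize do
      return false if @guids.key?(guid)
      @guids[guid] = true
    end
  end
end

# Inside each feed thread:
#   seen = SeenComments.new # created once, before starting the threads
#   if seen.first_time?(comment.guid)
#     puts "Found a new comment #{comment.title} on '#{entry.title}'"
#   end
```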

Finally, we can put our data in an Amazon SQS queue. In the graph, we’ll show one node per feed and add a child node whenever a new comment comes in. This way, we need only one message format, containing just the feed URL.

It’s time to sign up for Amazon SQS. You’ll get two keys: an AWS access key and an AWS secret access key. You’ll also need to install a new gem: SQS.

Just require 'sqs' like the other gems, set your credentials, and create a queue named “feeds”.

require 'rubygems'
require 'sqs'
 
SQS.access_key_id = "my key here"
SQS.secret_access_key = "my secret key here"
queue = SQS.create_queue "feeds"

Each time a comment is detected, we can put a message in the queue.

rss_comments.entries.each do |comment|
  if not comments[comment.guid]
    comments[comment.guid] = comment.guid
    puts "Found a new comment #{comment.title} on '#{entry.title}'"
    queue.send_message "#{feed_url}"
  end
end

You can download the full code here.

When you’ve finished playing, you can delete the queue with the following piece of code.

SQS.get_queue("feeds").delete!

You’re now ready to animate all this with our Flex graph, which will be the subject of the next and last part of this article (actually, there will be a fourth post to illustrate all of this; think of a racing game…).

Dynamic Graph Visualization in Flex, Ruby and Amazon SQS – Part 1 – Flex

In this first post, we’ll create an animated graph in Flex. Data is retrieved periodically from an Amazon SQS queue and the graph is updated in real time. Why Amazon SQS? You work in a distributed environment, don’t you?

As Ruby is my favourite language these days, the data provider will be implemented in Ruby. Is there a better language to show concise but featureful snippets of code ?

Part 1 is dedicated to Flex. I’m not a real fan of Flash for web apps, but sometimes you can’t avoid it when you have to impress colleagues… I’m sure you could do the same thing with Silverlight, but as far as I know, my Mac Mini does not really like Microsoft IDEs. In the second part, we’ll retrieve data from a famous API in Ruby and put it in an Amazon SQS queue. The third and last part will show you how to make Flex and Amazon SQS interact; the graph will come alive, to make you glad like hell.

You’ll need the Flex 3.0 SDK. For Flex and Air fans out there, you can obviously use Flex Builder 3 Professional.

If you are new to the Flex 3.0 SDK, take a look at the Adobe Flex resources website, a gold mine. If you feel lazy, for now, simply follow these instructions to install the Flex SDK.

It’s time to begin coding, simply create a .mxml file. It will contain all our Flex code.

<?xml version="1.0"?>
<mx:Application
    xmlns:mx="http://www.adobe.com/2006/mxml"
    xmlns:adobe="http://www.adobe.com/2006/fc">
</mx:Application>

You can compile and run it with the commands:

mxmlc dynamicgraph.mxml
open dynamicgraph.swf

See this code in action.

To generate graphs, I found two Flex components, flexvizgraphlib and SpringGraph. flexvizgraphlib seems promising but lacks one feature that is mandatory for this project: real-time node updates. SpringGraph is a bit old, but it’s well designed and meets our needs, so let’s use it. Copy the SpringGraph.swc file to the “/frameworks/libs” folder of your Flex SDK installation. Simple.

Let’s create a simple graph:

<?xml version="1.0"?>
<mx:Application
    xmlns:mx="http://www.adobe.com/2006/mxml"
    xmlns:adobe="http://www.adobe.com/2006/fc"
    initialize="onLoad()">

  <!-- The SpringGraph component; each node is rendered as a white label showing its id -->
  <adobe:SpringGraph id="springgraph" width="100%" height="100%"
      backgroundColor="#869ca7" repulsionFactor="1">
    <adobe:itemRenderer>
      <mx:Component>
        <mx:Label fontSize="14" text="{data.id}" color="#ffffff"/>
      </mx:Component>
    </adobe:itemRenderer>
  </adobe:SpringGraph>

  <mx:Script>
    <![CDATA[
      import com.adobe.flex.extras.controls.springgraph.Graph;
      import com.adobe.flex.extras.controls.springgraph.Item;

      // The data model displayed by the SpringGraph component
      private var graph: Graph = new Graph();
      private var rootItem: Item;

      // Builds a small two-level tree rooted at "R"
      private function onLoad(): void {
        addNode("R");
        addNode("1", "R");
        addNode("1.1", "1");
        addNode("1.2", "1");
        addNode("1.3", "1");
        addNode("2", "R");
        addNode("2.1", "2");
        addNode("2.2", "2");
      }

      // Creates a node, links it to an existing node (if one is given),
      // then hands the updated graph to the component
      private function addNode(id:String, linkedTo:String = null): void {
        var item: Item = new Item(id);
        graph.add(item);
        if(linkedTo)
          graph.link(graph.find(linkedTo), item);
        springgraph.dataProvider = graph;
      }
    ]]>
  </mx:Script>

</mx:Application>

See this code in action.

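It may look like a lot of code, but it’s dead simple: addNode() creates an Item, adds it to the Graph, links it to its parent (looked up with graph.find()) and assigns the graph to the SpringGraph component as its dataProvider. You’re smart, just read it and you’ll understand it.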

So, we’ve just created an animated 8-node graph, and we can even drag nodes around and pan the whole graph with the mouse. Remember, the objective of this project is to retrieve data from an Amazon SQS queue. We want the animation to stay fluid, so we won’t update the graph more than a few times per second. Let’s use a Timer that periodically adds nodes to our graph; here is the new content of the <mx:Script> block:

import com.adobe.flex.extras.controls.springgraph.Graph;
import com.adobe.flex.extras.controls.springgraph.Item;
import flash.utils.Timer;
import flash.events.TimerEvent;
 
private var graph: Graph = new Graph();
private var rootItem: Item;
// Ticks every 300 ms; a repeat count of 0 means "until stopped"
private var timer: Timer = new Timer(300, 0);
private var itemCount: int = 0;
 
private function onLoad(): void {
  addNode("R");
  timer.addEventListener(TimerEvent.TIMER, timerHandler);
  timer.start();
}
 
// On each tick, adds one node linked to the root, and stops after 30 nodes
public function timerHandler(event:TimerEvent):void {
  addNode(String(++itemCount), "R");
 
  if(itemCount == 30)
    timer.stop();
}
 
private function addNode(id:String, linkedTo:String = null): void {
   (...)
}

See this code in action.

This code adds one node every 300 milliseconds, until 30 nodes have been created. And to reward your hard work, you can still play with the nodes while the graph is being updated. We can now add the finishing touches: special effects. Let’s blur new nodes and play a funny sound as they appear.

Add this MXML code after the closing </adobe:SpringGraph> tag (it embeds a bloop.mp3 file from an assets folder next to your .mxml file):

  <mx:Parallel id="newItemEffect">
    <mx:SoundEffect source="@Embed(source='assets/bloop.mp3')"/>
    <mx:Blur duration="200"
      blurXFrom="10.0" blurXTo="0.0"
      blurYFrom="10.0" blurYTo="0.0"/>
  </mx:Parallel>

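You can then simply associate this effect with new nodes by assigning it to SpringGraph’s addItemEffect property in onLoad():

private function onLoad(): void {
  addNode("R");
  timer.addEventListener(TimerEvent.TIMER, timerHandler);
  timer.start();
  // Play newItemEffect on every node added from now on
  springgraph.addItemEffect = newItemEffect;
}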

See this code in action.
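Looking ahead to Part 3: once nodes come from an SQS queue rather than a counter, messages may arrive in bursts, and the Timer can act as the throttle mentioned earlier. The fragment below is a minimal sketch under that assumption, not the code of the next parts: it buffers incoming ids and lets a Timer drain at most one per tick into the existing addNode() function. The names pendingIds, drainTimer, startDraining() and onMessagesReceived() are hypothetical; they belong neither to SpringGraph nor to any SQS library, and the queue poller that would call onMessagesReceived() is not shown.

```actionscript
// Hypothetical additions to the <mx:Script> block; they reuse addNode()
// from above. None of these names come from SpringGraph or from any SQS library.
import flash.events.TimerEvent;
import flash.utils.Timer;

// Node ids received from the queue but not displayed yet
private var pendingIds: Array = [];

// Ticks every 300 ms until stopped (a repeat count of 0 means no limit)
private var drainTimer: Timer = new Timer(300, 0);

// Hypothetical: call this from onLoad() to start draining the buffer
private function startDraining(): void {
  drainTimer.addEventListener(TimerEvent.TIMER, drainHandler);
  drainTimer.start();
}

// Hypothetical callback for a queue poller: it only buffers the ids,
// so a burst of messages never touches the graph directly
private function onMessagesReceived(ids: Array): void {
  for each (var id: String in ids)
    pendingIds.push(id);
}

// Adds at most one node per tick, linked to the root node "R"
private function drainHandler(event: TimerEvent): void {
  if (pendingIds.length > 0)
    addNode(String(pendingIds.shift()), "R");
}
```

Because the Timer, not the poller, decides when addNode() runs, the graph grows by at most one node every 300 ms (a little over three per second) however fast messages arrive; the trade-off is that a large burst takes a while to show up.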

Of course, all credit goes to Mark Shepherd, for providing this wonderful library and its source code.

If you want to know more about Flex, especially building a Rails app powered by a Flex UI, go buy Flexible Rails: Flex 3 on Rails 2! French readers, you can get it here. I bought this book 2 years ago, when it was still a beta PDF. Peter Armstrong then found a print publisher, and you’ll easily understand why if you read it.

Check back soon for the next part.