
Be cool with DRb, it's far from "scalable"

When you begin to learn DRb, you quickly land on the famous Chad Fowler page, entitled “Intro to DRb”.

The “Concurrency” chapter is particularly interesting when you want to make a local resource available in the wild, allowing one request at a time on your resource.

So, with DRb, a dash of method_missing and a pinch of mutex, you have a perfect recipe to remotely access and protect your resource… but DRb is not what we would call a high-availability entry point.

Let’s code the proof. Here is the code of the DRb server, which simulates a long and expensive task:

require 'drb'
 
class Server
 
  def initialize()
    @i = 0
    @mutex = Mutex.new
  end
 
  def method_missing(name, *args)
    @mutex.synchronize do
      @i += 1
      p @i
      sleep 1 # Your CPU works very hard here...
    end
    @i
  end
 
end
 
server = DRb.start_service("druby://:34100", Server.new)
p "listening"
server.thread.join

You can now start the server via the ruby server.rb command and begin to code the client. It creates 100 processes, each one calling the DRb server and writing the response on the standard output:

require 'drb'
 
client = DRbObject.new(nil, "druby://:34100")
 
pids = []
 
100.times do
  pids << fork {
    p "#{Time.now}: #{client.call}"
  }
end
 
p "#{Time.now}: created the 100 processes"
 
pids.each { |pid| Process.waitpid(pid) }
 
p "#{Time.now}: done"

Launch the client and see what happens:

"Sat Oct 05 11:54:17 +0200 2008: 2"
"Sat Oct 05 11:54:17 +0200 2008: 3"
"Sat Oct 05 11:54:20 +0200 2008: created the 100 processes"
"Sat Oct 05 11:54:17 +0200 2008: 4"
"Sat Oct 05 11:54:17 +0200 2008: 5"
"Sat Oct 05 11:54:17 +0200 2008: 6"
"Sat Oct 05 11:54:17 +0200 2008: 7"
"Sat Oct 05 11:54:17 +0200 2008: 8"
"Sat Oct 05 11:54:17 +0200 2008: 9"
"Sat Oct 05 11:54:18 +0200 2008: 10"
.......
"Sat Oct 05 11:54:18 +0200 2008: 59"
"Sat Oct 05 11:54:18 +0200 2008: 60"
"Sat Oct 05 11:54:19 +0200 2008: 61"
"Sat Oct 05 11:54:19 +0200 2008: 61"
DRb::DRbConnError: druby://:34100 - #
 
method open	in drb.rb at line 736
method each	in drb.rb at line 729
method open	in drb.rb at line 729
method initialize	in drb.rb at line 1189
method new	in drb.rb at line 1169
method open	in drb.rb at line 1169
method method_missing	in drb.rb at line 1085
method with_friend	in drb.rb at line 1103
method method_missing	in drb.rb at line 1084
at top level	in client.rb at line 9
method fork	in client.rb at line 8
at top level	in client.rb at line 8
method times	in client.rb at line 7
at top level	in client.rb at line 7

Almost all of the 100 processes are created before the DRb server returns its first result. After several tries on a PowerBook and an EC2 instance, the DRb server starts rejecting new clients after about 65 simultaneous requests.

But it’s important to mention that the DRb server did not crash at all. You simply have to wait for it to handle the remaining requests.
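Since the server stays up, a refused client can simply try again a little later. Here is a minimal sketch of that idea. Note that retry_on is a hypothetical helper written for this post, not something DRb provides, and the number of attempts and the delays are arbitrary values.

```ruby
# Hypothetical helper (not part of DRb): run the block again when it raises
# error_class, waiting a little longer before each new attempt.
def retry_on(error_class, attempts: 5, base_delay: 0.1)
  tries = 0
  begin
    tries += 1
    yield
  rescue error_class
    raise if tries >= attempts   # out of attempts: re-raise the last error
    sleep(base_delay * tries)    # linear backoff: 0.1s, 0.2s, 0.3s, ...
    retry                        # run the begin block again
  end
end

# In the client above, the call made inside fork could become:
#   p "#{Time.now}: #{retry_on(DRb::DRbConnError) { client.call }}"
```

This only hides the refusals from the caller. The server still handles one request per second, so the total run time does not improve.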

DRb is pretty good: it lets Ruby developers code remote services in no time.

If you need high-availability services, your next step could be REST servers, with requests dispatched by HAProxy or nginx. And of course, you should also take a look at Erlang.

Local resource available in the wild, thanks to DRb

You have a resource that you want to share between multiple processes. It could be a resource persisted on the local hard drive, like an index, a persistent hash (Berkeley DB, InfinityDB), or simply a file.

With DRb, aka Distributed Ruby, you can share a resource via TCP. DRb will do the annoying job for you: marshalling. And that is COOL, and RMI is NOT COOL.
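To get a feel for what DRb spares you, here is roughly what marshalling by hand looks like with Ruby's built-in Marshal module, which DRb relies on to move arguments and return values over the wire. The variable names are only illustrative.

```ruby
# Turn a Ruby object into a byte string and rebuild it: this has to happen
# on each side of the connection for every argument and return value.
original = { :roger => 1, :moore => -1 }

bytes = Marshal.dump(original)   # a binary String that could travel over a socket
copy  = Marshal.load(bytes)      # an equal but distinct Hash on the receiving side

p copy == original        # => true  (same content)
p copy.equal?(original)   # => false (not the same object)
```

The second line of output is worth remembering: what comes back from a remote call is a copy, so changing it locally does not change the object held by the server.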

As usual in Ruby, using a library is as simple as calling the require method. To use DRb in your application, write this:

require 'drb'

In this post, we’ll implement a Remote Hash. It will be accessible to an unlimited number of processes on an unlimited number of computers. Let’s code a simple DRb server for your resource.

class Server
  def start
    print "starting DRb server..."
    DRb.start_service("druby://localhost:7000", HashProxy.new)
    puts " done"
  end
 
  def join
    DRb.thread.join
  end
 
  def shutdown
    print "stopping DRb server..."
    DRb.stop_service
    puts " done"
  end
end
 
s = Server.new
s.start
trap("INT") {s.shutdown} # Catch CTRL-C to do a clean shutdown of the DRb server
s.join

The instance of HashProxy will be the object shared between the DRb server and the DRb clients. We call it a “proxy” because it will have exactly the same behaviour as the real resource hidden behind it. This is where the method_missing magic happens.

class HashProxy
  def initialize(*args)
    @local_resource = Hash.new(*args)
  end
 
  def method_missing(name, *args, &block)
    @local_resource.__send__(name, *args, &block)
  end
end

The Object#__send__ method is an alias of Object#send. Using it avoids conflicts with a possibly existing method named send in the current object, its superclasses or its included modules.

There is one problem with this implementation: DRb will, like a web server, handle client requests simultaneously. We have to protect our hash with a mutex. Every client will have to wait in line to access the remote resource.

require 'thread'
 
class HashProxy
  def initialize(*args)
    @mutex = Mutex.new
    @local_resource = Hash.new(*args)
  end
 
  def method_missing(name, *args, &block)
    @mutex.synchronize do
      @local_resource.__send__(name, *args, &block)
    end
  end
end
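One caveat: the mutex protects each call, not a sequence of calls. Something like h[:hits] += 1 on the client side is two remote calls (a read, then a write), and another client can slip in between them. Here is a sketch of one way around this, assuming you are willing to add a dedicated method to the proxy. The increment method is a hypothetical addition, not a Hash method, and the threads only stand in for concurrent DRb clients.

```ruby
class HashProxy
  def initialize(*args)
    @mutex = Mutex.new
    @local_resource = Hash.new(*args)
  end

  # Hypothetical addition: the read and the write happen under one lock.
  def increment(key, by = 1)
    @mutex.synchronize do
      @local_resource[key] = @local_resource.fetch(key, 0) + by
    end
  end

  def method_missing(name, *args, &block)
    @mutex.synchronize do
      @local_resource.__send__(name, *args, &block)
    end
  end
end

# Local check, no network involved: 8 threads x 1000 increments.
proxy = HashProxy.new
threads = 8.times.map { Thread.new { 1000.times { proxy.increment(:hits) } } }
threads.each(&:join)
p proxy[:hits]   # => 8000
```

The same reasoning applies to any "check then act" sequence: if it must be atomic, it has to run on the server side, inside a single synchronize block.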

As we said earlier, each method of HashProxy, and so each method of Hash, is now available to any remote Ruby code. On the client side, we wrap the DRb connection in a RemoteHash class, used instead of Hash:

class RemoteHash
  def initialize
    @hash_proxy = DRbObject.new(nil,"druby://localhost:7000")
  end
 
  def method_missing(name, *args, &block)
    @hash_proxy.__send__(name, *args, &block)
  end
end

In your application, you’ll use your remote resource like a local resource, without knowing about those network and marshalling things.

h = RemoteHash.new
h[:roger] = 1
h[:moore] = -1
p h[:roger] # => 1

Unfortunately, I couldn’t call methods with blocks.

h.select {|k,v| v > 0}
# =>
# ArgumentError: wrong number of arguments (0 for 1)
#
# method select at line 9
# method __send__ at line 9
# method method_missing at line 9
# at top level  at line 17
# Program exited.
 
p h.sort {|a,b| a[1]<=>b[1]}
# =>
# DRb::DRbConnError: DRb::DRbServerNotFound
#
# method current_server in drb.rb at line 1650
# method to_id  in drb.rb at line 1712
# method initialize in drb.rb at line 1048
# method new  in drb.rb at line 642
# method make_proxy in drb.rb at line 642
# method dump in drb.rb at line 559
# method send_request in drb.rb at line 605
# method send_request in drb.rb at line 906
# method send_message in drb.rb at line 1194
# method method_missing in drb.rb at line 1086
# method open in drb.rb at line 1170
# method method_missing in drb.rb at line 1085
# method with_friend  in drb.rb at line 1103
# method method_missing in drb.rb at line 1084
# method __send__ at line 9
# method method_missing at line 9
# at top level  at line 18
# Program exited.
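
A likely explanation for the first trace is that __send__ also finds private methods. Every Ruby object carries a private Kernel#select (the IO one, which requires at least one argument), so __send__(:select) on the DRbObject calls that local method instead of reaching method_missing and the remote hash. The sketch below reproduces the effect without any network. Forwarder is a hypothetical stand-in for DRbObject, not DRb code.

```ruby
# Forwarder is a purely local stand-in for DRbObject: like DRbObject, it
# answers unknown methods through method_missing.
class Forwarder
  def initialize(target)
    @target = target
  end

  def method_missing(name, *args, &block)
    @target.public_send(name, *args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end
end

f = Forwarder.new({ :roger => 1, :moore => -1 })

# __send__ ignores visibility, so it lands on the private Kernel#select
# and method_missing is never consulted.
begin
  f.__send__(:select) { |k, v| v > 0 }
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end

# A plain call cannot use a private method, so it falls through to
# method_missing and ends up in Hash#select.
p f.select { |k, v| v > 0 }
```

The second trace looks like a different problem. A block is not marshalled; it is passed by reference, and the server has to call it back on the client. For that, the client needs its own DRb server, started with a plain DRb.start_service before the first call. Calling the method directly on the DRbObject rather than through __send__, with DRb.start_service done on the client, should let both select and sort go through, although I have not re-run the original example to confirm it.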

One last word about method_missing: Jay Fields wrote an excellent article about dynamically defining the methods of an external class instead of using method_missing. It will surely help you debug your piece of art.
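
Here is a minimal sketch of that idea applied to our proxy. It is not taken from the article, and the class name DefinedHashProxy is made up for the occasion. Each Hash method is defined once, at class definition time, with define_method, so a typo in a method name raises a regular NoMethodError on the proxy instead of travelling down to the hash.

```ruby
class DefinedHashProxy
  def initialize(*args)
    @mutex = Mutex.new
    @local_resource = Hash.new(*args)
  end

  # Only the methods that Hash adds on top of a plain Object are wrapped;
  # things like inspect or object_id keep describing the proxy itself.
  (Hash.public_instance_methods - Object.public_instance_methods).each do |name|
    define_method(name) do |*args, &block|
      @mutex.synchronize do
        @local_resource.public_send(name, *args, &block)
      end
    end
  end
end

h = DefinedHashProxy.new
h[:roger] = 1
h[:moore] = -1
p h[:roger]                  # => 1
p h.respond_to?(:fetch)      # => true, it is a real method now
p h.respond_to?(:fetsh)      # => false
```

One thing to keep in mind with either version: Mutex is not reentrant, so a block that calls back into the same proxy while the lock is held will raise a ThreadError.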