Archive for the ‘ruby’ Category

ThruDB for Rails? ActiveDocument

Thursday, January 10th, 2008

Since Matt Knox talked about ThruDB at last Tuesday’s meeting of NYC.rb, my brain has been thinking about document-oriented databases, about how tired I am of SQL, about how tired I am of trying to scale database servers, about how tempting it is to have more flexible models and data structures, and about how tempting it is to have a clear and simple scalability path.

The samples included in the ThruDB tutorial are, to be honest, ugly. But they are designed to show how Thrift provides language-agnostic data types and how ThruDB can be accessed from different languages.

However, I have several ideas in my head about how to implement something I’m calling, for the time being, ActiveDocument. It won’t be a direct replacement for ActiveRecord, but it will have similar features (e.g. validations and callback hooks) and it will allow for very simple usage of ThruDB. I might later add support for CouchDB, SimpleDB and other similar technologies, but just as Rails doesn’t try to be a full database server abstraction, your ActiveDocument code will not work across different servers unless it’s limited to very simple operations. The world of document-oriented databases is even less standardized than that of relational database servers.

Here’s a glimpse of how it might look:

class User < ActiveDocument::Model
  attribute :login, :string, :indexed, :sortable
  attribute :email, :string, :indexed
  attribute :created_on, :datetime
  attribute :password, :string
  has_many :bookmarks
end

class Bookmark < ActiveDocument::Model
  attribute :title, :string, :indexed
  attribute :url, :string, :indexed
  belongs_to :user
end

User.find_by_login("sd")
User.find(:all, :conditions => "login:s* AND created_on:[20071201 TO 20080115]")

As you can see, the two biggest differences from plain old ActiveRecord are that the model will have to define its own schema, and that queries will use the Lucene query syntax.
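To make the "model defines its own schema" idea a bit more concrete, here is a minimal sketch of how the class-level attribute declarations could record a schema and generate accessors. Everything in it (Model.schema, indexed_attributes, how the option flags are stored) is a hypothetical illustration, not ActiveDocument’s actual code; relationships, validations and persistence are left out.

```ruby
# Hypothetical sketch only: not ActiveDocument's real implementation
module ActiveDocument
  class Model
    # Schema declared by each subclass: name => { :type => ..., :options => [...] }
    def self.schema
      @schema ||= {}
    end

    # attribute :login, :string, :indexed, :sortable
    def self.attribute(name, type, *options)
      schema[name] = { :type => type, :options => options }
      define_method(name) { @attributes[name] }
      define_method("#{name}=") { |value| @attributes[name] = value }
    end

    # Attributes flagged :indexed, i.e. the ones a Lucene-backed store would index
    def self.indexed_attributes
      schema.select { |_name, spec| spec[:options].include?(:indexed) }.keys
    end

    def initialize(values = {})
      @attributes = {}
      values.each { |name, value| send("#{name}=", value) }
    end

    def attributes
      @attributes.dup
    end
  end
end

class User < ActiveDocument::Model
  attribute :login, :string, :indexed, :sortable
  attribute :email, :string, :indexed
  attribute :created_on, :datetime
  attribute :password, :string
end

user = User.new(:login => "sd", :email => "sd@example.com")
```

Keeping the schema in a class-level instance variable means every subclass gets its own, which is what lets a store decide per model which fields to index.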

Relationships would be defined using fields with lists of IDs, and queried using Lucene’s fast indexes. This might make models too big when they have a large number of related objects, but that’s a problem to be solved later.

Since document-oriented databases have no concept of joins, some queries will definitely be slower than their SQL counterparts, because they have to make multiple calls to the server to retrieve individual objects. However, each one of those calls would be simpler and easier to cache, which I hope will reduce the performance impact. And as long as it’s not 100 times slower, I’m willing to trade off some performance for the promise of infinite scalability.
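A sketch of that trade-off, using a hypothetical in-memory DocumentStore (not ThruDB’s API) where a user document holds a list of bookmark IDs. Resolving the relationship takes one call for the user plus one call per bookmark, and the store counts those calls so the cost of having no joins is visible.

```ruby
# Hypothetical in-memory stand-in for a document database (not ThruDB's API)
class DocumentStore
  attr_reader :calls

  def initialize
    @docs = {}
    @calls = 0
  end

  def put(id, doc)
    @docs[id] = doc
  end

  # Every get counts as one round trip to the server
  def get(id)
    @calls += 1
    @docs[id]
  end
end

# Resolve a has_many stored as a list of IDs: one call for the owner,
# then one call per related document, since there are no joins
def related(store, owner_id, field)
  owner = store.get(owner_id)
  owner.fetch(field, []).map { |id| store.get(id) }
end

store = DocumentStore.new
store.put("bookmark/1", { "title" => "ThruDB", "url" => "http://example.com/thrudb" })
store.put("bookmark/2", { "title" => "Lucene", "url" => "http://example.com/lucene" })
store.put("user/sd", { "login" => "sd", "bookmark_ids" => ["bookmark/1", "bookmark/2"] })

bookmarks = related(store, "user/sd", "bookmark_ids")
```

Each individual get is a simple lookup by ID, which is the kind of call that is easy to put behind a cache.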

And since the models will be more flexible, you can probably skip a lot of traditional SQL tables and store the data directly into the model itself. For example, users can have preference arrays or hashes, which would have been separate tables in SQL but that are just additional attributes in ThruDB.

Speaking of attributes: ThruDB uses Thrift for its own API, and the tutorials suggest using it to encode the documents themselves, but the API doesn’t require that. I’ve been trying to figure out how to encode a Thrift object along with its own class name, to make it easier to decode afterwards, especially when performing polymorphic queries. Perhaps I’ll have to use double encoding, with an envelope Thrift object containing the class name and the encoded string. Or perhaps I’ll use YAML to encode an attribute hash. YAML is tempting because it allows for more complex objects and for dynamic schemas (e.g. an attribute that’s a hash of hashes containing values of different types).
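Here is a sketch of the YAML option, assuming a hypothetical Bookmark class that exposes an attributes hash and accepts one in its constructor. The class name travels in a small envelope next to the attributes, and decoding looks the class up by name. String keys and plain values keep it compatible with Psych’s safe loading, which newer Rubies use for YAML.load. Because const_get will resolve whatever class name the data carries, this only makes sense for documents you wrote yourself.

```ruby
require 'yaml'

# Hypothetical document class: an attributes hash in, an attributes hash out
class Bookmark
  attr_reader :attributes

  def initialize(attributes = {})
    @attributes = attributes
  end
end

# Wrap the attributes in an envelope that records the class name
def encode_document(doc)
  { "class" => doc.class.name, "attributes" => doc.attributes }.to_yaml
end

# Read the envelope back and rebuild an instance of the recorded class
def decode_document(yaml)
  envelope = YAML.load(yaml)
  Object.const_get(envelope["class"]).new(envelope["attributes"])
end

bookmark = Bookmark.new("title" => "ThruDB",
                        "tags" => ["databases", "thrift"],
                        "prefs" => { "public" => true, "stars" => 4 })
copy = decode_document(encode_document(bookmark))
```

The nested "prefs" hash shows the dynamic-schema case: nothing had to be declared up front for it to survive the round trip.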

Anyway, I’m starting to write the code, and it looks like I might have a working prototype a lot sooner than I thought possible at first.

If you’re interested, just drop me a note, leave a comment, send me an email or look for me as ‘sd’ on Freenode’s #nyc.rb.

Ruby, Leopard and gems

Thursday, October 25th, 2007

In case you have been sleeping in the same cave as Osama Bin Laden, Apple’s new OS X Leopard includes Ruby as a first-class language.

But Apple’s effort to make the language and all its extensions universal binaries can cause you some trouble when installing gems that require compilation.

If you’re installing on an Intel machine and see an error like “ld: symbol(s) not found for architecture ppc”, you are probably installing a gem that requires an external library for which you only have the i386 version. This is a typical situation when installing mysql (as noted in the troubleshooting page of the MacOSForge wiki for Ruby).

After trying several variations, I came up with this solution:

If your installation command was

sudo gem install mysql

you need to run it as

sudo bash -c "ARCHFLAGS='-arch i386' gem install mysql"

or, more cleanly:

sudo env ARCHFLAGS="-arch i386" gem install mysql

There you go… that should be all you need to install the mysql gem on Leopard against MySQL’s prepackaged binaries.

UPDATE: The troubleshooting page has been updated to include an alternative: using “sudo -s” to start a root shell. I still like my one-liner better :-)

UPDATE 2: Using env instead of bash is slightly cleaner.

UPDATE 3: MySQL still has some problems, because the compiled gem looks for the client library in the wrong directory. The quick solution is to create a link to the right place:

sudo ln -s /usr/local/mysql/lib /usr/local/mysql/lib/mysql

Ruby and Lisp, sitting in a tree…

Saturday, January 13th, 2007

(I just submitted this story to Slashdot, but I didn’t want to see this masterpiece get lost in the bowels of their submission queue, so I’m also posting it here. Update: it got accepted)

The developers of Rubinius, an experimental Ruby interpreter inspired by Smalltalk, have been discussing the possibility of adding a Lisp dialect to their VM. Pat Eyler collected some ideas and opinions from the people involved and it makes for some interesting reading.

For many, Ruby already is an acceptable Lisp, and the language itself started as a perlification of Lisp (even Matz says so), so adding a Lisp dialect is perhaps fitting, which might help explain why the whole idea feels right.

Now, if someone added support for VB and gave it the respect it deserves, the world would be a better place.

pm: Print Methods

Thursday, November 9th, 2006

Ruby has a very convenient method to inspect objects: “p”. It just prints the result of “inspect”. And it’s exactly what irb uses to show the result of each expression.

Anyway, the cool guys at projectionist just posted a little method of theirs called “m”, which provides easy access to an object’s methods.

That reminded me of my old (well, not that old) “pm” method for irb, which, even though I haven’t talked about it here, I’ve made public at dotfiles as part of my .irbrc (it’s the last method).

Anyway, looking at their implementation, I decided to polish mine and release it here:

ANSI_RESET        = "\033[0m"
ANSI_BOLD         = "\033[1m"
ANSI_GRAY         = "\033[1;30m"
ANSI_LGRAY        = "\033[0;37m"

def pm(obj, *options) # Print methods
  # Hide the methods every object has, unless :more is passed
  methods = obj.methods - (options.include?(:more) ? [] : Object.methods)
  # An optional Regexp argument filters method names
  filter = options.find { |opt| opt.kind_of? Regexp }
  methods = methods.select { |name| name.to_s =~ filter } if filter

  data = methods.sort.collect do |name|
    method = obj.method(name)
    # Placeholder argument names (a, b, c...): one per required argument,
    # plus one standing in for optional/splat arguments when arity is negative
    args = "(" + ("a".."z").first(method.arity.abs).join(", ") + ")"
    # The class or module where the method is actually defined
    klass = method.owner.to_s
    [name.to_s, args, klass]
  end
  max_name_length = data.collect { |item| item[0].size }.max
  max_args_length = data.collect { |item| item[1].size }.max
  data.each do |item|
    print " #{ANSI_BOLD}#{item[0].rjust(max_name_length)}#{ANSI_RESET}"
    print "#{ANSI_GRAY}#{item[1].ljust(max_args_length)}#{ANSI_RESET}"
    print "   #{ANSI_LGRAY}#{item[2]}#{ANSI_RESET}\n"
  end
  data.size
end

Don’t try to understand it unless you can understand it :-)… just copy it to your .irbrc (you do have an irbrc file, don’t you?). And use it like this:

pm "a"

pm "a", :more

pm "a", /regexp/

Rails script/server and terminal windows

Wednesday, November 1st, 2006

Here’s a little hack for those of you that run Rails’ script/server on its own window or tab.

By inserting a couple of lines into the server script, you can have it change the title of the window or tab it’s running on, making it a lot easier to look for it when you have lots of windows open.


All you need to do is change script/server so it looks like this:

#!/usr/bin/ruby
print "\033]2;Rails Server\007" # xterm window title
print "\033]1;Rails Server\007" # screen/iterm tab title

require File.dirname(__FILE__) + '/../config/boot'
require 'commands/server'

print "\033]1; \007" # screen/iterm tab title
print "\033]2; \007" # xterm window title

The second set of prints will clear the title after the server terminates. You might want to adjust it to suit your needs.
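If you’d rather not repeat the escape sequences, one variation is a hypothetical with_terminal_title helper (not part of Rails) that sets the titles, runs a block, and clears them in an ensure clause, so they are reset even if the block raises. The output stream is a parameter only so the sketch can be checked without a terminal.

```ruby
# Hypothetical helper: set the window/tab title while the block runs,
# then clear it, even if the block raises
def with_terminal_title(title, io = $stdout)
  io.print "\033]2;#{title}\007" # xterm window title
  io.print "\033]1;#{title}\007" # screen/iTerm tab title
  yield
ensure
  io.print "\033]1; \007"
  io.print "\033]2; \007"
end

# In script/server this would wrap the original requires:
#   with_terminal_title("Rails Server") do
#     require File.dirname(__FILE__) + '/../config/boot'
#     require 'commands/server'
#   end
```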

Joel is wrong

Saturday, September 2nd, 2006

Joel is wrong when he says that you should pick a safe language (Java, C#, PHP or maybe Python) if “someone is going to get fired”.

He almost got it right: You should go with a safe language if you’re afraid of being fired for picking the wrong language.

And in fact, in that case, I can’t understand why you would pick PHP or Python. Visual Basic, sure, but using any other scripting language is probably a career-killer.

The price of a rescue

Thursday, August 31st, 2006

I’ve been abusing Ruby exception handling lately. Instead of doing something like:

if user and user.group
  user.group.do_something
end

or a cleaner

user and user.group and user.group.do_something

I just do

user.group.do_something rescue nil

And let Ruby deal with the consequences of a missing link.
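A small sketch comparing the two styles, using hypothetical DemoUser and DemoGroup stand-ins (not from the post). For a missing link they return the same thing. One behavioural detail worth knowing: the rescue modifier catches any StandardError, so it also swallows an error raised inside do_something itself, where the explicit checks would let it through.

```ruby
# Hypothetical stand-ins for the user and group objects in the text
DemoGroup = Struct.new(:name) do
  def do_something
    raise ArgumentError, "broken group" if name.nil?
    "did something for #{name}"
  end
end
DemoUser = Struct.new(:group)

# Explicit checks: stop at the first missing link
def with_checks(user)
  user and user.group and user.group.do_something
end

# Rescue modifier: a NoMethodError on nil (or any other StandardError) becomes nil
def with_rescue(user)
  user.group.do_something rescue nil
end
```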

It was all nice and pretty, until I saw this little guy standing on my shoulder, dressed in white and with a halo over his head, telling me that laziness is wrong and that there would be a price to pay. After all, exception handling requires setting up flags and pointers and blocks and contexts and who knows what else. They must be expensive. And of course, there was another guy (red, with horns and a tail) calling the white guy an asshole and a coward.

In the end I decided to stop taking drugs, and to do my own research into the subject, instead of paying attention to my own hallucinations.

I wrote the simplest test cases, and used the ‘benchmark’ library to measure their performance:

require 'benchmark'

n = 500000
Benchmark.bm(7) do |x|
  x.report("plain") do
    for i in 1..n
      1.0/5.0
    end
  end

  x.report("safe") do
    for i in 1..n
      begin
        1.0/5.0
      rescue
        0.2
      end
    end
  end

  x.report("rescue") do
    for i in 1..n
      begin
        1.0/0.0
      rescue
        0.2
      end
    end
  end
end

It does half-a-million floating point divisions. The first scenario (“plain”) has no exception handling. The second one (“safe”) has exception handling, but never raises an exception. And the final scenario (“rescue”) uses exception handling all the time, triggered by a division by zero.

The results are something like this:

             user     system      total        real
plain    0.780000   0.010000   0.790000 (  1.769231)
safe     0.910000   0.020000   0.930000 (  1.781405)
rescue   0.900000   0.010000   0.910000 (  1.945978)

I know Zed will probably kill me for drawing any conclusions from such a small sample (the half-a-million divisions are not the sample size; the sample size is one single run of the test). But the numbers are pretty much the same when you run the tests several times.

Using rescue is not much more expensive than running naked. In my tests in particular, it was never more than 5% slower. It might even be cheaper than chaining multiple conditional checks.

It’s fast enough for me not to worry about it, especially considering that my own operations are probably going to be a lot slower than a simple division, thus diluting any slowdown even further.

If you really care about this, then by all means run your own tests and draw your own conclusions. Otherwise, just trust the guy dressed in red, with the horns and the pointy tail: use and abuse rescue until you’re tired of it; it’s not worth worrying about its performance impact.
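One thing to check if you do run your own tests: in Ruby, 1.0/0.0 evaluates to Infinity rather than raising, so the "rescue" scenario in the benchmark above never actually reaches its rescue clause. A sketch of a variant that does raise on every iteration uses integer division, where dividing by zero raises ZeroDivisionError. It makes no claim about what the numbers will look like on your machine.

```ruby
require 'benchmark'

n = 500_000
zero = 0
Benchmark.bm(7) do |x|
  # Exception handling is set up, but nothing is ever raised
  x.report("safe") do
    n.times do
      begin
        1 / 5
      rescue ZeroDivisionError
        0
      end
    end
  end

  # Integer division by zero raises ZeroDivisionError on every iteration
  x.report("raise") do
    n.times do
      begin
        1 / zero
      rescue ZeroDivisionError
        0
      end
    end
  end
end
```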

URLs on Rails

Friday, July 7th, 2006

One of the parts of Rails that some people consider “ugly” and go to great lengths to “clean up” is the use of numeric ids in URLs: /accounts/edit/12.

URLs are considered extremely valuable real estate. Not only because users have to see them all the time, but also because search engines give them a lot of weight: since it’s a “limited resource” where you can only include a few keywords, you had better use the keywords that matter most.

Rails makes an excellent effort to help you use nice and clean URLs, but it stops at the :id. And it does this for one good reason. If it were to use, say, a user’s login instead of its numeric ID, then when a user changed his login, old URLs would no longer be valid. Yeah, I can hear you say that users don’t change logins, but what about a blog entry title? Or a person’s full name? Or a project name? As soon as you use some user-editable piece of information in your URLs you create the problem of stale URLs, and that’s even worse than ugly URLs.

But there is a very simple solution: use both a permanent id and a nicer textual description. Instead of an ugly /accounts/edit/12 or a perishable /accounts/edit/john-doe, why not use /accounts/edit/12-john-doe? Your code has the “12” it needs to look up the user, even if he later changes his name to “john-d-doe”, and your users and search spiders have the “john-doe” to feast on.

Implementing this is extremely simple, because Rails treats :id as a special parameter in routes. Its specialness comes from the fact that Rails calls the to_param method on any object passed when generating URLs. That’s why url_for :id => @account is equivalent to url_for :id => @account.id: ActiveRecord models have a default to_param that returns the id of the object.

All you need to do is define your own to_param for your models, and make sure you don’t explicitly include the .id in your url_fors and link_tos, because then you would be skipping your own to_param call.

class Account < ActiveRecord::Base
  # Permanent numeric id first, then a readable slug made from the name
  def to_param
    "#{id}-#{full_name.gsub(/[^a-z0-9]+/i, '-')}"
  end
end
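To try the slug idea without Rails, here is a sketch that uses a hypothetical AccountStub struct in place of the ActiveRecord model. Ruby’s own String#to_i, which parses the leading digits of a string and ignores the rest, gives a feel for how “12-john-doe” can map back to 12.

```ruby
# Hypothetical stand-in for the Account model (plain Ruby, no Rails)
AccountStub = Struct.new(:id, :full_name) do
  # Numeric id first, then the name with every run of non-alphanumerics
  # collapsed into a single hyphen
  def to_param
    "#{id}-#{full_name.gsub(/[^a-z0-9]+/i, '-')}"
  end
end

account = AccountStub.new(12, "John D. Doe")
param = account.to_param   # => "12-John-D-Doe"
# String#to_i parses the leading digits and ignores the rest
found_id = param.to_i      # => 12
```

Note that the character class includes 0: with a 1-9 range, a name like “Agent 007” would lose its zeros.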

The second part of this solution is, of course, making sure your actions can handle these extended :ids. The smart ones among you would immediately think about monkey patching ActiveRecord’s find to clean up the parameters in calls like Account.find(params[:id]). But the not-so-stupid way would be to forget about dealing with this, give it a try, and look surprised when it miraculously works as expected.

See, this is just a coincidence; it was not designed as part of Rails (or DHH wouldn’t have looked surprised when I brought this up at RailsConf). It’s just that Rails will pass your long :id string to the database server, which, on seeing that the id column is actually an integer, will try to convert the parameter to a number before using it. It happens that this conversion uses the leading numeric characters and drops the rest, thus converting “12-john-doe” into plain 12. See, accidental behaviour over configuration; what could be better than that?

Of course, you might want to add a couple of unit tests just to make sure whatever database server you’re using behaves in this particular way. I’m not sure if that’s part of the SQL-92 standard, but I would be surprised if any major database server works differently.

So there you have it, now go add useful information to your URLs, like “asbestos-mesothelioma-canada-drugs-viagra-ambien”.

Oh, and in case you were wondering, hyphens/dashes (-) work better as word separators than any other character. Google will match “canada drugs” against a URL like canada_drugs, but it won’t match “canada” alone. If you use hyphens, as in “canada-drugs”, then it treats them as separate, independent words.

Update: Some good points have been raised in the comments.

First, Aristotle Pagaltzis brought up pretty much this same argument more than six months ago.

Second, you might want to redirect any partially valid URL (e.g. the ID is correct, but not the rest of the slug) to the “official” URL. This is simple to implement, but requires some extra code in each controller.

Third, some database servers do not perform type coercion, and might get very angry if you don’t pass an integer in your id queries. Postgres was cited as an example. The solution for this is extremely simple: just make sure this code is executed when your application starts (e.g. put it in lib and require it from environment.rb, or make it a plugin):

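class ActiveRecord::Base
  # find_from_ids is a class method (an ActiveRecord internal of this era),
  # so the alias chain has to be built on the singleton class
  class << self
    # Turn "12-john-doe" style ids into 12 before they reach the database;
    # ids arrives as an array, possibly wrapping a nested array
    def find_from_ids_with_coercion(ids, options)
      coerce = lambda { |id| id.kind_of?(String) ? id.to_i : id }
      ids = ids.collect { |id| id.kind_of?(Array) ? id.collect(&coerce) : coerce.call(id) }
      find_from_ids_without_coercion(ids, options)
    end
    alias_method :find_from_ids_without_coercion, :find_from_ids
    alias_method :find_from_ids, :find_from_ids_with_coercion
  end
end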
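Because a patch like this depends on Rails internals, here is a standalone sketch of the same alias chain on a hypothetical FinderDemo class, runnable in plain Ruby. The point it illustrates: a class method has to be aliased inside class << self, since a bare alias_method in the class body only looks at instance methods.

```ruby
# Hypothetical stand-in for ActiveRecord::Base with a class-level finder
class FinderDemo
  def self.find_from_ids(ids, options)
    ids # a real finder would query the database; this just echoes its input
  end
end

class FinderDemo
  class << self
    # Convert slug-style string ids ("12-john-doe") to integers first
    def find_from_ids_with_coercion(ids, options)
      coerced = ids.map { |id| id.kind_of?(String) ? id.to_i : id }
      find_from_ids_without_coercion(coerced, options)
    end
    alias_method :find_from_ids_without_coercion, :find_from_ids
    alias_method :find_from_ids, :find_from_ids_with_coercion
  end
end
```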

No need to try

Thursday, July 6th, 2006

I previously talked about our try method as an easy way of dealing with exceptions inside expressions:

puts try {patient.name.first_name} || "-- no name --"

Otherwise, you would have to set up your own begin / rescue / end blocks:

puts begin
  patient.name.first_name
rescue
  "-- no name --"
end

which would have been too ugly for our modern sensibilities.

I was wrong. I should have known it.

It turns out that Ruby lets you use rescue without a begin. This is most commonly seen in exception handling for methods:

def my_method
  # do something
rescue
  # handle exception
end

It also turns out that Ruby lets you use rescue as a postfix modifier for an expression, just like if and unless. So you can write something like:

puts patient.name.first_name rescue puts "-- no name --"

And it also happens that “inline rescues” work inside parentheses. So you can do something like:

puts (patient.name.first_name rescue "-- no name --")

Which is exactly the use-case for our original try method.
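The original try isn’t shown here; assuming it simply evaluated its block and returned nil on any StandardError, a minimal version compares with the inline rescue like this (Patient and Name are hypothetical stand-ins). One difference worth noticing: try {...} || default also falls back when the expression returns nil without raising, while the inline rescue only kicks in on an exception.

```ruby
# Hypothetical reconstruction of try: evaluate the block, nil on any StandardError
def try
  yield
rescue StandardError
  nil
end

# Hypothetical stand-ins for the patient objects
Name = Struct.new(:first_name)
Patient = Struct.new(:name)

anonymous = Patient.new(nil)        # name is nil, so .first_name raises NoMethodError
blank = Patient.new(Name.new(nil))  # first_name is nil, but nothing raises

with_try = try { anonymous.name.first_name } || "-- no name --"
with_rescue = (anonymous.name.first_name rescue "-- no name --")
blank_try = try { blank.name.first_name } || "-- no name --"
blank_rescue = (blank.name.first_name rescue "-- no name --")
```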

So now that I have a piece of code blessed by _why, I’ll have to get rid of it or risk not being Rubyish enough. You live and you learn.

One of, a pocket case

Friday, June 30th, 2006

Another one of our snippets of code (all collected in a ‘private’ plugin called, surprisingly, ‘snippets’) is one_of.

Suppose you have an HTML form with a select that lets your users specify the sort order for your index page. Said order can take a value of “name”, “name desc”, “date” or “date desc”. The more observant among you are probably yelling “SQL injection!!!, SQL injection!!!”, because even if your select only has those four options, nothing can prevent a malicious script kiddie from modifying the request and asking for sort_order=your momma or perhaps sort_order=name; delete from users.

So smart web developers (or dumb web developers smart enough to remember when their sites got hacked because of a SQL injection) validate anything that comes from the outside world.

Our snippet, one_of, provides a simple way to do a very common validation: making sure a value is one of a given set of possible values.

class Object
  # Makes sure the value is "one of" the list given,
  # otherwise returns the first value from the list
  def one_of(*args)
    args = args[0] if args.size == 1 and args[0].kind_of? Array

    (args.include?(self) && self) || args[0]
  end
end

Here are a few examples of its use:

query[:sort] = params[:sort_order].one_of("name", "name desc",
                                          "date", "date desc")
answer = answer.one_of("yes", "no")
answer = nil.one_of("yes", "no")  # => "yes"

This method serves its purpose well, so we haven’t needed to enhance it. But I can easily think of a couple of improvements that would make it more case-like.

class Object
  # Makes sure the value is "one of" the list given,
  # otherwise returns the first value from the list
  def one_of(*args)
    args = args[0] if args.size == 1 and args[0].kind_of? Array

    args.each do |arg|
      return (arg.kind_of?(Regexp) ? $1 : self) if arg === self
    end
    args.first
  end
end

See? Now you can do things like:

"abc".one_of(String, Numeric) # => "abc"
"abc".one_of("none", /([bc]+)/)  # => "bc"

It works almost like a pocket-sized version of case, with the default being first instead of last.

But be careful. Since the default value is the first one in the array, you might end up with a Class object, or a Regexp.
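To see that pitfall in action, here is a sketch that repeats the case-like one_of so it runs on its own, then shows what comes back when nothing matches and the first argument is a Class or a Regexp. Putting a plain value first (a usage convention, not something the method enforces) keeps the default usable.

```ruby
class Object
  # The case-like one_of: self (or the Regexp's first capture) on a match,
  # otherwise the first argument
  def one_of(*args)
    args = args[0] if args.size == 1 and args[0].kind_of? Array

    args.each do |arg|
      return (arg.kind_of?(Regexp) ? $1 : self) if arg === self
    end
    args.first
  end
end

kind = "abc".one_of(Numeric, Symbol)  # no match: returns the Numeric class itself
pattern = 42.one_of(/(\d+)/, "n/a")   # no match (Regexp === 42 is false): returns the Regexp
safe = 42.one_of("n/a", /(\d+)/)      # a plain value first gives a usable default
```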