Archive for the ‘software development’ Category

ThruDB for Rails? ActiveDocument

Thursday, January 10th, 2008

Ever since Matt Knox talked about ThruDB at last Tuesday’s NYC.rb meeting, I have been thinking about document-oriented databases: how tired I am of SQL, how tired I am of trying to scale database servers, how tempting it is to have more flexible models and data structures, and how tempting it is to have a clear and simple scalability path.

The samples included in the ThruDB tutorial are, to be honest, ugly. But they are designed to show how Thrift provides language-agnostic data types and how ThruDB can be accessed from different languages.

However, I have several ideas in my head about how to implement something I’m calling, for the time being, ActiveDocument. It won’t be a direct replacement for ActiveRecord, but it will have similar features (e.g. validations and callback hooks) and it will allow for very simple usage of ThruDB. I might later add support for CouchDB, SimpleDB and other similar technologies, but just as Rails doesn’t try to be a full database server abstraction, your ActiveDocument code will not work on different servers unless it’s limited to very simple operations. The world of document-oriented databases is even less standardized than that of relational database servers.

Here’s a quick look at how it might work:

class User < ActiveDocument::Model
  attribute :login, :string, :indexed, :sortable
  attribute :email, :string, :indexed
  attribute :created_on, :datetime
  attribute :password, :string
  has_many :bookmarks
end

class Bookmark < ActiveDocument::Model
  attribute :title, :string, :indexed
  attribute :url, :string, :indexed
  belongs_to :user
end

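# Dynamic finder on an indexed attribute
User.find_by_login("sd")
# Lucene query: logins starting with "s", created between 2007-12-01 and 2008-01-15
User.find(:all, :conditions => "login:s* AND created_on:[20071201 TO 20080115]")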

As you can see, the two biggest differences from plain old ActiveRecord are that the model will have to define its own schema, and that queries will use the Lucene query syntax.
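To make the "model defines its own schema" idea a bit more concrete, here is a minimal sketch of how an `attribute` class macro could record types and flags. ActiveDocument doesn't exist yet, so every name here (`ActiveDocumentSketch`, `schema`, `indexed_attributes`, `SketchUser`) is an assumption made up for illustration, not a real API.

```ruby
# Hypothetical sketch: none of these names come from a real library.
module ActiveDocumentSketch
  class Model
    # Each subclass keeps its own schema: attribute name => type and flags.
    def self.schema
      @schema ||= {}
    end

    # Records the attribute's type and flags, and defines its accessors.
    def self.attribute(name, type, *flags)
      schema[name] = { :type => type, :flags => flags }
      attr_accessor name
    end

    # Names of the attributes that would be sent to the search index.
    def self.indexed_attributes
      schema.select { |_, spec| spec[:flags].include?(:indexed) }.keys
    end

    def initialize(values = {})
      values.each { |name, value| send("#{name}=", value) }
    end
  end
end

class SketchUser < ActiveDocumentSketch::Model
  attribute :login, :string, :indexed, :sortable
  attribute :email, :string, :indexed
  attribute :password, :string
end

user = SketchUser.new({ :login => "sd", :email => "sd@example.com" })
```

Keeping the schema on the class means the library can decide which fields to index without asking the server for a table definition.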

Relationships would be defined using fields with lists of IDs, and queried using Lucene’s fast indexes. This might make models too big when they have a large number of related objects, but that’s a problem to be solved later.
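As a rough illustration of relationships stored as lists of IDs, the sketch below uses an in-memory `SketchStore` class (a hypothetical stand-in, not ThruDB's API) and plain hashes as documents. It resolves the IDs by direct lookup; the real thing would presumably go through the Lucene index instead.

```ruby
# Hypothetical sketch: SketchStore is an in-memory stand-in for the document
# server, and plain Hashes stand in for documents. This is not ThruDB's API.
class SketchStore
  def initialize
    @docs = {}
  end

  def put(doc)
    @docs[doc[:id]] = doc
  end

  def get(id)
    @docs[id]
  end

  # Resolves a list of IDs, skipping any that no longer exist.
  def get_all(ids)
    ids.map { |id| get(id) }.compact
  end
end

store = SketchStore.new
store.put({ :id => "u1", :login => "sd", :bookmark_ids => ["b1", "b2"] })
store.put({ :id => "b1", :title => "ThruDB", :user_id => "u1" })
store.put({ :id => "b2", :title => "Lucene", :user_id => "u1" })

user = store.get("u1")
bookmarks = store.get_all(user[:bookmark_ids])  # has_many :bookmarks
owner = store.get(bookmarks.first[:user_id])    # belongs_to :user
```

The `bookmark_ids` list is exactly the part that could grow too large for users with many related objects.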

Since document-oriented databases have no concept of joins, some queries will definitely be slower than their SQL counterparts, because they have to make multiple calls to the server to retrieve individual objects. However, each one of those calls would be simpler and easier to cache, which I hope will reduce the performance impact. And as long as it’s not 100 times slower, I’m willing to trade off some performance for the promise of infinite scalability.
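Here is a small sketch of why those individual calls should be easy to cache. `CountingStore` pretends to be the remote server and counts round trips, and `CachedFetcher` is a simple read-through cache keyed by document ID. Both classes are hypothetical and only meant to show the shape of the idea.

```ruby
# Hypothetical sketch: CountingStore simulates a remote document server and
# counts round trips; CachedFetcher is an illustrative read-through cache.
class CountingStore
  attr_reader :calls

  def initialize(docs)
    @docs = docs
    @calls = 0
  end

  def get(id)
    @calls += 1
    @docs[id]
  end
end

class CachedFetcher
  def initialize(store)
    @store = store
    @cache = {}
  end

  # Fetches each ID individually, but only hits the store on a cache miss.
  def fetch_all(ids)
    ids.map do |id|
      @cache[id] = @store.get(id) unless @cache.key?(id)
      @cache[id]
    end
  end
end

store = CountingStore.new({ "b1" => { :title => "ThruDB" }, "b2" => { :title => "Lucene" } })
fetcher = CachedFetcher.new(store)
fetcher.fetch_all(["b1", "b2"])  # two round trips to the store
fetcher.fetch_all(["b1", "b2"])  # served entirely from the cache
```

Because every call is a plain lookup by ID, the cache key is trivial, which is much harder to arrange for an arbitrary SQL join.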

And since the models will be more flexible, you can probably skip a lot of traditional SQL tables and store the data directly in the model itself. For example, users can have preference arrays or hashes, which would have been separate tables in SQL but are just additional attributes in ThruDB.

Speaking of attributes: ThruDB uses Thrift for its own API, and the tutorials suggest using it to encode the documents themselves, but the API doesn’t require that. I’ve been trying to figure out how to encode a Thrift object along with its own class name, to make it easier to decode afterwards, especially when performing polymorphic queries. Perhaps I’ll have to use double encoding, with an envelope Thrift object containing the class name and the encoded string. Or perhaps I’ll use YAML to encode an attribute hash. YAML is tempting because it will allow for more complex objects and for dynamic schemas (e.g. an attribute that’s a hash of hashes containing values of different types).
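A minimal sketch of the envelope idea, using the YAML option rather than Thrift: the class name travels next to the attribute hash, so a polymorphic query can rebuild the right class. `SketchDocument`, `SketchBookmark`, `encode` and `decode` are hypothetical names for illustration, and a real implementation would want a stricter whitelist of decodable classes.

```ruby
require "yaml"

# Hypothetical sketch of an envelope that stores the class name with the data.
class SketchDocument
  attr_reader :attributes

  def initialize(attributes = {})
    @attributes = attributes
  end

  # Wraps the attribute hash in an envelope that records the class name.
  def encode
    envelope = { "class" => self.class.name, "attributes" => attributes }
    YAML.dump(envelope)
  end

  # Reads the envelope and instantiates the class it names.
  def self.decode(payload)
    envelope = YAML.safe_load(payload)
    klass = Object.const_get(envelope["class"])
    # Only rebuild classes that descend from SketchDocument.
    raise ArgumentError, "unexpected class" unless klass <= SketchDocument
    klass.new(envelope["attributes"])
  end
end

class SketchBookmark < SketchDocument; end

bookmark = SketchBookmark.new({ "title" => "ThruDB", "tags" => { "db" => ["thrift", 1] } })
encoded = bookmark.encode
decoded = SketchDocument.decode(encoded)
```

The nested `tags` hash with mixed value types is the kind of dynamic schema that makes YAML attractive here.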

Anyway, I’m starting to write the code, and it looks like it might be possible to have a working prototype a lot sooner than I thought possible at first.

If you’re interested, just drop me a note, leave a comment, send me an email or look for me as ‘sd’ on Freenode’s #nyc.rb.

Erlang, The Ringtone

Tuesday, October 24th, 2006

If you did not attend RubyConf 2006, then please see this movie first.

If you did attend the conference, or you have seen “Erlang, The Movie” before, then this needs no other explanation:

Erlang, The Ringtone.mp3

Joel is wrong

Saturday, September 2nd, 2006

Joel is wrong when he says that you should pick a safe language (Java, C#, PHP or maybe Python) if “someone is going to get fired”.

He almost got it right: You should go with a safe language if you’re afraid of being fired for picking the wrong language.

And in fact, in that case, I can’t understand why you would pick PHP or Python. Visual Basic, sure, but using any other scripting language is probably a career killer.

Two thoughts on enterprise software

Friday, September 1st, 2006

Thought #1: “enterprise software development” is that particular kind of software development where the process matters more than the results.

Thought #2: “enterprise software” is the same category that includes those billions of lines of COBOL code that used just two digits to store year values.