A collection of computer systems and programming tips that you may find useful.
 
Brought to you by Craic Computing LLC, a bioinformatics consulting company.

Friday, April 24, 2009

Ruby 1.9 and String Encoding

Ruby 1.9 implements a load of Internationalization features, which is great, but I just ran into one unfortunate side effect of that.

I work with large text files representing DNA sequences, patents, etc. These are typically plain ASCII text and that is how I treat them. Under Ruby 1.8 everything seemed fine. But running the same code under Ruby 1.9 on a 2GB text file I got this error:

$ ./test.rb myfile
./test.rb:9:in `block in <main>': invalid byte sequence in US-ASCII (ArgumentError)
from ./test.rb:3:in `each_line'
from ./test.rb:3:in `<main>'

Here is code that gave rise to that:
#!/usr/bin/env ruby
open(ARGV[0], 'r').each_line do |line|
  if line =~ />(\S+)/
    puts line
  end
end

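Somewhere in the middle of the input file is a non-ASCII character and Ruby 1.9 won't take it. It turns out that 1.9 takes a much stricter line on interpreting text. Unless you tell it otherwise, it tags the text it reads with a default encoding taken from your environment (US-ASCII in my case), and any byte that is not valid in that encoding raises an error as soon as you try to match against the string. 1.8 just took what you gave it.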

If you know you will be reading UTF-8 or ISO-8859-1 text then you can explicitly tell your script to handle it. There are several ways to do this but in this simple example you can change the 'r' in the open statement like this:
#!/usr/bin/env ruby
open(ARGV[0], 'r:utf-8').each_line do |line|
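
The mode string can also name a second, internal encoding, so that Ruby transcodes each line as it reads it. Here is a minimal sketch that assumes the file is ISO-8859-1; the helper name and the sample file are made up for illustration.

```ruby
require 'tempfile'

# Hypothetical helper: read a file assumed to be ISO-8859-1 and return its
# lines transcoded to UTF-8. The mode 'r:iso-8859-1:utf-8' means
# "external encoding ISO-8859-1, internal encoding UTF-8".
def read_latin1_lines(path)
  File.open(path, 'r:iso-8859-1:utf-8') { |f| f.each_line.to_a }
end

# Build a small sample file containing the single byte 0xFC,
# which is a u-umlaut in ISO-8859-1.
sample = Tempfile.new('latin1')
sample.binmode
sample.write(">seq1 M\xFCller\n".b)
sample.close

lines = read_latin1_lines(sample.path)
puts lines.first.encoding                        # the internal encoding
puts lines.first =~ />(\S+)/ ? $1 : 'no match'   # the regex now matches safely
```

This only helps when the guess about the source encoding is right, which is exactly the catch with public data files.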

That's OK if you know the encoding, but in my work I see occasional non-ASCII characters, such as German umlauts, that have crept into public data files that I work with. I don't know what to expect and I don't want to clutter my code with rescue clauses to handle all possibilities.

The solution for my problem is to treat the text as binary by using the 'rb' mode in the open statement. I can still process text data line by line, but Ruby no longer validates the bytes, so non-ASCII characters pass through without complaint. So this version of the code takes the input data with no problems:
#!/usr/bin/env ruby
open(ARGV[0], 'rb').each_line do |line|
  if line =~ />(\S+)/
    puts line
  end
end
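
To see what binary mode actually does to the strings, here is a small sketch using an invented two-line sample file. The helper name is hypothetical; the point is that every line comes back tagged as ASCII-8BIT, so the high byte is kept and the match never raises.

```ruby
require 'tempfile'

# Hypothetical helper: return the lines of a file that match the
# sequence-header pattern, reading the file in binary mode.
def header_lines(path)
  File.open(path, 'rb') do |f|
    f.each_line.select { |line| line =~ />(\S+)/ }
  end
end

# Sample data: one header line containing the byte 0xFC, one sequence line.
sample = Tempfile.new('mixed')
sample.binmode
sample.write(">seq1 M\xFCller\nACGT\n".b)
sample.close

header_lines(sample.path).each do |line|
  # Lines read with 'rb' are binary strings, so no byte counts as invalid
  puts "#{line.encoding}: #{line.inspect}"
end
```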

My problem stemmed from two umlaut characters buried deep in the file. To figure out which lines were causing the problem I used this variant of the code to output bad lines.
#!/usr/bin/env ruby
open(ARGV[0], 'r').each_line do |line|
  begin
    # The match itself raises the error on a bad line
    line =~ />(\S+)/
  rescue ArgumentError
    puts line
  end
end
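
An alternative that avoids the rescue clause is to ask each string whether its bytes are valid, using String#valid_encoding?. This is a sketch rather than the code I ran at the time; the helper name is made up, and it assumes you want to test against US-ASCII.

```ruby
# Hypothetical helper: yield the line number and raw line for every line
# whose bytes are not valid in the given encoding.
def each_bad_line(io, encoding = 'US-ASCII')
  io.each_line.with_index(1) do |line, number|
    # Work on a copy so the caller's string keeps its original encoding tag
    candidate = line.dup.force_encoding(encoding)
    yield number, line unless candidate.valid_encoding?
  end
end

# When given a filename, report the offending lines with their line numbers.
if ARGV[0]
  File.open(ARGV[0], 'rb') do |f|
    each_bad_line(f) { |number, line| puts "#{number}: #{line.inspect}" }
  end
end
```

Printing with inspect shows the offending bytes as escapes such as \xFC, which makes them easy to spot in a long line.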

Look up the issue and you'll find plenty of debate on the merits or otherwise of this new feature in 1.9. It took me by surprise.


 

Syntax Highlighting in Blogger

I've figured out how to display blocks of code with nice syntax highlighting in these posts. There are all sorts of JavaScript/CSS combos out there for doing this. I've chosen to go with google-code-prettify by Mike Samuel. Here are the steps needed to get it working in Blogger.

1: Add the script and css to your page layout
In your Blogger account go to 'Create Post' and then to the 'Layout' tab. You will see the page elements laid out. Click 'Edit Html' in the second row of tabs.

You're going to add two lines at the end of the HEAD section just before the </head> tag. The first one pulls in the CSS file from the repository and the second pulls in the Javascript code.
<link href='http://google-code-prettify.googlecode.com/svn/trunk/src/prettify.css'
rel='stylesheet' type='text/css'/>
<script src='http://google-code-prettify.googlecode.com/svn/trunk/src/prettify.js'
type='text/javascript'></script>
</head>

If you wanted to avoid the calls out to that site you could just inline the content of those files.

Immediately below the </head> tag, change <body> to:
<body onload='prettyPrint()'>

This calls the prettyPrint() function once the page has loaded. Save the template and go to 'Posting' -> 'Create' to create a test post.

2: Create a test post
Enter some text into the new post and then enter a block of code. The prettifier can figure out most languages. For example:
['apple', 'orange', 'banana'].each do |fruit|
  puts fruit
end

Go into the 'Edit Html' tab and put <pre> tags around your code. Add the special class as shown:
<pre class='prettyprint'>
['apple', 'orange', 'banana'].each do |fruit|
  puts fruit
end
</pre>

Unfortunately syntax highlighting does not work in Preview mode or the Compose window. You just have to bite the bullet and publish your post to see what it looks like. And it should look something like the code blocks in this post.

To represent blocks of HTML code or anything with angle brackets you need to change the brackets to &lt; and &gt;, otherwise Blogger will try to treat them as HTML tags.
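
Doing that replacement by hand gets old quickly, so here is a small Ruby sketch that does it for you. The helper name is made up, and escaping the ampersand as well is my own addition, so that any entities already in the code display literally.

```ruby
# Characters that Blogger would otherwise interpret as markup.
HTML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

# Hypothetical helper: make a block of code safe to paste into the
# 'Edit Html' tab by replacing each special character with its entity.
def escape_for_blogger(code)
  code.gsub(/[&<>]/, HTML_ESCAPES)
end

puts escape_for_blogger("<pre class='prettyprint'>")
# prints: &lt;pre class='prettyprint'&gt;
```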

You can help the highlighter out by naming the language explicitly. Add a second class to the pre tag, such as:
<pre class="prettyprint lang-html">

You can check out the prettify JavaScript file to see which languages have specific classes like this. But it does a pretty good job even without that. You can use <pre> or <code> tags this way. The only difference is that the <code> version omits the surrounding box.

This is pretty slick. The approach ought to work with other syntax highlighters. If you try them let me know what you think in the comments. Now I need to go back in time and add this feature to all my old posts.



 

Installing Ruby 1.9 and gems on a machine with Ruby 1.8

I've run into problems in the past with multiple versions of Ruby on one machine, such as not finding gems, etc. Here are the steps I took to install Ruby 1.9.1 from source on a machine that already had Ruby 1.8.6 installed. Specifically this was on a Fedora 8 machine that had Ruby 1.8 installed from the ruby-devel rpm, which places it in /usr/lib.

To stir things up a bit more, some gems are not yet compatible with Ruby 1.9 (This post is written in April 2009). I'll address those later on. Hopefully those problems will go away over the next few months.

1: Capture a list of the Ruby gems that you have installed on your machine.
We'll use this later to reinstall the gems under 1.9.
# gem list --local --no-versions > gem_list

2: Fetch the Ruby 1.9 source
Download from ruby-lang.org into a staging directory. Pick their recommended version.
# wget ftp://ftp.ruby-lang.org/pub/ruby/1.9/ruby-1.9.1-p0.tar.gz

3: Compile and install
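The defaults will install it in /usr/local/lib/ruby. If you already have an existing ruby installed there then you might want to move it out of the way. Unpack the source, cd into the directory and build it.
# tar xzf ruby-1.9.1-p0.tar.gz
# cd ruby-1.9.1-p0
# ./configure
# make
# make install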

4: Check the new version
Make sure that you have /usr/local/bin at the start of your PATH. This line in your .bashrc file will do that.
export PATH=/usr/local/bin:$PATH

Check the version of Ruby:
# which ruby
/usr/local/bin/ruby
# ruby -v
ruby 1.9.1p0 (2009-01-30 revision 21907) [i686-linux]

5: Download rubygems
Use wget rather than curl if you are doing this on the command line as wget handles the redirects that the rubyforge download links use. Choose the latest stable version.
# wget http://rubyforge.org/frs/download.php/55066/rubygems-1.3.2.tgz

Unpack it, cd to the directory and install it.
# ruby setup.rb
# gem -v
1.3.2
# which gem
/usr/local/bin/gem

This creates /usr/local/lib/ruby/gems/.
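
You can also ask Ruby itself where it thinks it lives, which is a useful cross-check on the shell commands. This is a small sketch with a made-up helper name; run it with the ruby you expect to be using and compare the paths with the ones shown in the steps.

```ruby
require 'rubygems'
require 'rbconfig'

# Collect the details worth checking after installing a second Ruby.
def ruby_environment
  {
    'ruby version'  => RUBY_VERSION,
    'ruby binaries' => RbConfig::CONFIG['bindir'],
    'gem version'   => Gem::VERSION,
    'gem directory' => Gem.dir
  }
end

ruby_environment.each { |label, value| puts "#{label}: #{value}" }
```

If the binaries or gem directory point back into /usr/lib then the old installation is still the one being picked up.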

6: Download and install a bunch of gems
Use the list of previously installed gems as your guide. I've written a script that will take care of most of the work for you: install_gems.rb. Comment out any gems in your list that you no longer need. More complex gems, such as Rails, will install other gems that they need, so don't worry if you comment out any gems that you don't recognize.

We will exclude the mysql gem from this automated process as it needs a special option. So comment that one out in your list for now.

Run the script (as root or sudo) and watch it do its thing:
# ./install_gems.rb gem_list
Installing actionmailer
Skipping actionpack
Installing actionwebservice
[...]

Don't worry when it skips certain gems. That just means that another gem has already installed them. You may see errors with some gems - don't freak out yet. These may well be related to Ruby 1.9. Capture the output in a file for easier troubleshooting. You can safely run install_gems.rb multiple times. Eventually it should skip over all the gems.
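
In case the script is not to hand, here is a rough sketch of how a script like this could work. It is a stand-in written for illustration, not the original install_gems.rb, and the method names are made up. It assumes a RubyGems recent enough to support 'gem list -i' and skips blank lines, the '*** LOCAL GEMS ***' banner and anything commented out with '#'.

```ruby
# Return gem names from `gem list --no-versions` output, skipping blank
# lines, the banner line and lines commented out with '#'.
def gem_names(text)
  text.each_line.map(&:strip).reject do |line|
    line.empty? || line.start_with?('#', '*')
  end
end

# True if RubyGems reports an installed gem with exactly this name.
def installed?(name)
  system('gem', 'list', '-i', "^#{Regexp.escape(name)}$", out: File::NULL)
end

# Only run the install loop when a gem list file is given on the command line.
if ARGV[0]
  gem_names(File.read(ARGV[0])).each do |name|
    if installed?(name)
      puts "Skipping #{name}"
    else
      puts "Installing #{name}"
      system('gem', 'install', name) || warn("Failed to install #{name}")
    end
  end
end
```

Because each gem is checked before it is installed, running the sketch a second time simply skips whatever succeeded the first time.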

7: Manually install the mysql gem
If you want to use MySQL with Rails then install with an explicit path to mysql_config.
# which mysql_config
/usr/bin/mysql_config
# gem install mysql -- --with-mysql-config=/usr/bin/mysql_config

Note that under Ruby 1.9 you may get installation errors with this gem, as described in the next step.

8: Ruby 1.9 and Gem Install Errors
By the time you try your installation all your gems may install just fine. But as of now (April 2009) several gems are not compatible with 1.9. The ones that gave me errors were:
mysql
mongrel
ruby-prof
sqlite3-ruby
taps

Patching gem source code is a pain, so before you start down that path, decide if you really need these gems anymore. I've moved from mongrel to thin, so I can live without that one. I don't need to profile my apps right now so I can skip ruby-prof. In fact the only one that I really need is mysql.

Time to poke around on the web and see how other people are dealing with the same problem. For MySQL you can edit the gem code, pull down a patched version, or just give it a couple of weeks and see if the gem has been updated. It can be frustrating but there you go.

9: Clean up after yourself
Once you have verified that your new version of Ruby is working the way you expect then you might want to rename the old installation directories. I wouldn't delete anything but renaming them should ensure that you don't accidentally use the old version due to an incorrect path, etc.

I hope this helps guide you through the Ruby install process and avoids the problems that can arise with two or more versions of Ruby on your machine.

 

Wednesday, April 22, 2009

Getting started with Sinatra and RestClient

Sinatra is a Ruby framework for developing web applications with RESTful interfaces. It looks like a great way to build specific applications where a full blown Rails app would be overkill.

I'm interested in it as a way to provide very focused, lightweight web services to support larger Rails applications. One example is fetching DNA sequences from a remote database, hosted on AWS EC2. Here is the entire Sinatra app (which calls a separate data lookup class).

#!/usr/bin/env ruby
require 'rubygems'
require 'sinatra'
require 'lib/seqlookup.rb'
get '/:db/:id' do
  content_type('text/plain')
  SeqLookup.fetch(params[:db], params[:id])
end
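
If you want to try the app without a real sequence database behind it, a stand-in lookup class is enough. The sketch below is purely hypothetical: the real SeqLookup is not shown here, the database name and sequences are invented test data, and returning FASTA-style text is an assumption.

```ruby
# Hypothetical stand-in for lib/seqlookup.rb, backed by an in-memory hash.
class SeqLookup
  # Invented test data: database name => { identifier => sequence }
  SEQUENCES = {
    'testdb' => {
      'seq1' => 'ACGTACGTAC',
      'seq2' => 'GGCCTTAAGG'
    }
  }.freeze

  # Return the record as FASTA-style text, or a short message if the
  # database or identifier is unknown.
  def self.fetch(db, id)
    sequence = SEQUENCES.fetch(db, {})[id]
    return "No entry for #{id} in #{db}\n" unless sequence
    ">#{id}\n#{sequence}\n"
  end
end

puts SeqLookup.fetch('testdb', 'seq1')
```

With this in place a request for /testdb/seq1 would return the first record as plain text.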

I ran into one wrinkle testing this out on my Mac. Following the 'hello world' example on their site did not work for me - never a good sign. It wants to fire up the Thin web server, which I don't have. It ought to then move on to try Mongrel and Webrick, but in my case it found some components of Thin and then barfed when it failed to find the rest. Including this line after the 'require' statements got it going.
set :server, %w[mongrel webrick]

But the better fix is to install Thin.
# sudo gem install thin

So Sinatra is a great way to create lightweight web services, but how do you consume them? Well, you can use the URLs directly in curl or wget, or you can use ActiveResource from within Rails. Or you can use Sinatra's friend RestClient.

Install the gem and then you can call your new web service with a script like this:
#!/usr/bin/env ruby
require 'rubygems'
require 'rest_client'
data = RestClient.get 'http://localhost:4567/genbank/NM_007294'
puts data
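
If you would rather not add another gem, the standard library's Net::HTTP can make the same request. This is a sketch of that alternative, not something RestClient needs. The helper names and timeout values are arbitrary choices, it assumes simple identifiers that need no URL escaping, and it returns nil on any connection problem so that a calling app can carry on when the service is down.

```ruby
require 'net/http'
require 'uri'

# Build the URL for a sequence, following the /:db/:id route of the service.
def sequence_url(db, id, base = 'http://localhost:4567')
  "#{base}/#{db}/#{id}"
end

# Fetch a sequence, returning the response body, or nil if the service
# cannot be reached, times out, or answers with a non-success status.
def fetch_sequence(db, id, base = 'http://localhost:4567')
  uri = URI.parse(sequence_url(db, id, base))
  Net::HTTP.start(uri.host, uri.port, open_timeout: 2, read_timeout: 5) do |http|
    response = http.get(uri.path)
    response.is_a?(Net::HTTPSuccess) ? response.body : nil
  end
rescue SystemCallError, SocketError, Timeout::Error
  nil
end

# Pass an identifier on the command line to try it against a running service.
if ARGV[0]
  data = fetch_sequence('genbank', ARGV[0])
  puts(data || 'Sequence service unavailable')
end
```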

So far I'm impressed with these tools. I can reduce the complexity of my Rails apps by splitting off common services to small Sinatra apps. Sure, I'm adding complexity by having multiple web apps running and I risk failure in a supporting app causing failure in the larger app. But using a reliable hosting service like EC2 this is a risk I'm willing to take.

Wednesday, April 8, 2009

Radiant CMS

I just redesigned my web site and coded it up using Radiant CMS, a Rails-based Content Management System. I'm pleased with the result but the process was not totally pain-free. So here are some thoughts on the system that might be helpful to others.

Although Radiant is built with Rails, you are not coding up pages in Views like a regular application. Instead you use the Admin interface to your site and create pages using a web form. Those pages are stored in a database. You can code in regular HTML, Markdown, Textile, etc. Using a web browser is convenient but I found it tedious to edit compared to a real editor like Emacs or TextMate. In particular I missed the ability to quickly jump between pages and to search for text across all pages.

I made the mistake of starting with their example web site and morphing it to the one I wanted. Next time I would start with a blank site and build out my pages from scratch.

Radiant's documentation is bad - sorry, but it is. They really need getting started guides that explain how you really go about building a modest site - something more than the equivalent of 'hello world'. The system includes a range of Radiant tags which allow you to loop through, for example, news items, blog comments, etc. I used a few of these but not many. There is also a series of Radiant extensions for blog comments, slide shows, etc. The documentation on how to build these appears to be better than the core docs.

Pros:
- Easy to install the code, whether or not you know Rails
- Web interface is simple once you get the hang of it
- You can code in Markdown, etc., not just HTML
- Extensions and Tags can save a lot of work
- Using a web interface makes it easier to collaborate with others

Cons:
- Inability to edit pages directly is a pain if you are used to doing that
- The system expects you to know HTML and CSS, so it's not for complete novices
- Documentation is not good and needs more examples

Deploying the system to a hosted server (Slicehost) was fairly straightforward using Capistrano and Rake. But your server has to have MySQL and Rails installed. It could be useful to generate a version of the live site that consists of purely static pages.

Because it is Rails-based you can deploy Radiant sites to Heroku, which could be very useful for some users. I tried this and was almost successful. The deployment part was working after a few issues but it was screwing up pages due to a stupid CRLF (line-ending) translation problem. Heroku has the potential to make deployment very easy *but* it acts as a black box such that when something goes wrong you are out of luck. In my case Slicehost just seemed to be a better bet.
