A Whistle-Stop Tour of Syntax Highlighting and Markdown Solutions for Rails

by Mike Zazaian at 2009-08-14 05:02:03 UTC in rails plugins syntax

a comprehensive guide of the best syntax highlighting solutions for Ruby and Rails worth knowing about


the fruitless search for syntactical goodness

There comes a time in every web developer's life when they want to show off their code and demonstrate to everybody in the open source community that they are a brilliant programmer and are clearly one of the two or three most sought-after technical minds of their generation. And while that time immediately precedes another, slightly more harrowing time marked by a torrent of angry comments, taunts, jeers, hatemail and genuinely better solutions than the one originally presented, it's still nice and reassuring to know that the code looked pretty while it was being annihilated by your peers.

I'm not referencing any particular point in my existence here, but I remember realizing that I wanted pretty code to generate the visual appeal on this site rather than photos or graphics (initially, at least). So I set out on my journey of finding a delightfully-written, whimsical, yet comprehensive article about all of the solutions that exist for implementing syntax highlighting in a Rails app.

"This has to exist," I thought to myself, "there are millions of Rails users and I bet a quarter of them at least have wanted to implement syntax highlighting at some time or another." My internal monologue is very eloquent, you see. It continued, "It's impossible that nobody in the entire rails community would have taken even five minutes to slap together a rudimentary article on the available solutions for this issue."

But, as always, my internal monologue was unjustifiably optimistic, and, though I found a handful of articles on one or two technologies, I found no content that fully elucidated the virtues of these technologies, or, more importantly, their implementation.

So that's what this is (hopefully): everything that I was unable to find, I'm putting here in this artorial (article/tutorial, eh?), so nobody has to endure the two-hour search for answers that I did for so simple an issue. That is, at least, until every technology in this article becomes antiquated and the whole thing has to be rewritten from the ground up.

Such is the nature of technology, I suppose.

the bitter yet informative response to the fruitless search for syntactical goodness

So there are a couple of parts to any good syntax highlighter that are worth explaining before I delve into the specifics of each implementation.

First, you'll need a helper method to determine where the code tags are in your text and deliver their contents to the syntax highlighter itself to be parsed and spit back out as enormously marked-up HTML.

Second, you'll need the actual syntax highlighter gem or plugin or application, depending on the implementation. These will present you with a couple of methods that you can use throughout your application to do things like parse the actual code, change styles and options on the fly, and link to stylesheets for your particular highlighter. Humorously, these don't do the actual syntax highlighting (for the most part), but rather act as a wrapper for the seedy underbelly of the application, which is most likely a couple of regular expression libraries that:

  1. Figure out what is and is not code, and:
  2. Take that code and wrap it in the aforementioned gobs of HTML so that it can be styled and colorized to a degree that was never before thought humanly possible or necessary.
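
To make those two steps a little more concrete, here's a toy sketch in plain Ruby. It is not how CodeRay or Ultraviolet actually work; the token rules and CSS class names (comment, string, keyword) are made up for illustration, and a real highlighter knows vastly more about the language it's parsing.

```ruby
require 'cgi'

# Hypothetical token rules. Alternatives are tried left to right at each
# position, so a "#" inside a string literal is not mistaken for a comment.
TOKEN = /(?<comment>#[^\n]*)|(?<string>"(?:\\.|[^"\\])*")|(?<keyword>\b(?:def|end|class|module|if|else|return)\b)/

# Step 1: find the tokens. Step 2: wrap each one in a <span> that a
# stylesheet can colorize, escaping everything else as plain HTML text.
def highlight(source)
  pieces = []
  pos = 0
  source.scan(TOKEN) do
    m = Regexp.last_match
    pieces << CGI.escapeHTML(source[pos...m.begin(0)])
    css_class = %w[comment string keyword].find { |name| m[name] }
    pieces << %(<span class="#{css_class}">#{CGI.escapeHTML(m[0])}</span>)
    pos = m.end(0)
  end
  pieces << CGI.escapeHTML(source[pos..-1])
  %(<pre class="highlight">#{pieces.join}</pre>)
end

puts highlight(%(def hi\n  "a < b" # say\nend))
```

Everything that isn't a token passes through untouched (apart from HTML escaping), which is why the real libraries spend most of their effort on the "figure out what is what" half.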

Lastly, there's the markdown engine. You really don't NEED a markdown engine, I suppose, but it pairs really well with a syntax highlighter: since you're going to be searching through your text for tags anyway, you might as well go ahead and parse your plain text into standards-compliant XHTML while you're at it. Seems pretty smart. Plus, a couple of the syntax highlighters out there come paired with a markdown engine already, so it's going to be mentioned one way or another.

With those ideas out in the ether, we're ready to begin our whistle-stop tour of the beautiful Provincial countryside, or, for those who have been paying attention, the most prominent syntax highlighting solutions available for Rails applications at the moment.

redcloth-with-coderay

I remember the first time I went out into the brave, undiscovered wilderness that is the search for Rails syntax highlighting tutorials on the internet, and really didn't even understand what I was looking for much less what could be considered an acceptable solution. I knew that I wanted to type bits of code into my articles, put a couple of tags around them, and then have everything in those tags spit out in a colorful array of wonder and syntactical goodness. It was with this in mind that I stumbled across a gem called redcloth-with-coderay.

redcloth-with-coderay, as the name suggests, pairs the redcloth markup library with the coderay syntax highlighting library. It provisions easy, out-of-the-box functionality by installing itself and its dependencies as Rubygems, and results in what I can only describe as the best syntax highlighting solution for those fixated on instant gratification.

It's literally this easy to install and use in your Rails application:

# that says augustl with an "L", as in llama, not august1, as in I made this mistake "1" too many times before

$ sudo gem install augustl-redclothcoderay --source=http://gems.github.com/

Then you can just require the necessary gems in your articles_helper.rb file or wherever it is that you're implementing them, injecting the text and code that you want parsed into the RedCloth.new method, like this:

require 'rubygems'
require 'redcloth'
require 'coderay'
require 'redclothcoderay'

RedCloth.new('I am *bold* and <source>@hi_tech</source>').to_html

And that's it. You're well on your way to a quick, easy, slow, painful, difficult to style and vastly underdocumented syntax highlighting solution for your otherwise functional and technically progressive Rails application.

Yeah, I turned a bit of a corner there. Allow me to explain:

RedCloth, as many of you may know, was one of the first markup libraries available for Ruby, and has been around really since the very first tutorials on anything Ruby were written back in 2003ish. It follows the formatting guidelines for textile, which _why the lucky stiff once touted, and even wrote a comprehensive reference guide about. It used to support markdown as well, but no longer does as of version 4.

Also it's slow as balls. And not just any balls -- really, REALLY, painfully slow balls that seem to, even when rolling down a steep hill, defy the pull of gravity to the point at which they appear not to be moving at all. Granted, with version 4 came a promise from RedCloth's maintainers that it was now 40x faster thanks to being compiled with ragel. So, not to get entirely ahead of myself, I whipped up a quick test to check whether I was totally off base.

I created a quick benchmark using the contents of this reasonably long article, to test the difference in speed between RedCloth and its Markdown counterparts BlueCloth and RDiscount. This is what the test looked like:

require 'rubygems'
require 'bluecloth'
require 'redcloth'
require 'rdiscount'

# File.read keeps the article's line breaks as they are;
# readlines.join("\n") would double every newline
article = File.read("test_article.txt")

require 'benchmark'
Benchmark.bm(10) do |b|
  b.report("bluecloth") do
    1000.times { BlueCloth.new(article).to_html }
  end

  b.report("redcloth") do
    1000.times { RedCloth.new(article).to_html }
  end

  b.report("rdiscount") do
    1000.times { RDiscount.new(article).to_html }
  end
end

And here's what the results looked like:

                 user     system      total        real
bluecloth    2.150000   0.020000   2.170000 (  2.176069)
redcloth    97.560000  11.460000 109.020000 (109.598371)
rdiscount    2.020000   0.000000   2.020000 (  2.037113)

What was surprising about this was that BlueCloth kept pace with RDiscount for the most part. What was also surprising was that REDCLOTH IS ROUGHLY FIFTY TIMES SLOWER THAN BLUECLOTH AND FIFTY-FOUR TIMES SLOWER THAN RDISCOUNT.

Yeah. It was so slow, in fact, that when I initially attempted to run the benchmark over 10,000 iterations I was forced to interrupt the process because it appeared that the RedCloth test was hanging. But no, in fact it would have just taken about 1096 seconds, or about 18 minutes, to complete what BlueCloth would likely have done in 21.7 seconds. I'm sorry, RedCloth, but that's a bitchslap on all counts.
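
If you'd rather check that arithmetic than trust it, here's a small sketch that works the ratios out from the "total" and "real" columns above. The helper names are made up, and the projection assumes that run time scales linearly with the number of iterations, which is a simplification.

```ruby
# "total" and "real" columns from the 1,000-iteration run above
RESULTS = {
  'bluecloth' => { :total => 2.17,   :real => 2.176069 },
  'redcloth'  => { :total => 109.02, :real => 109.598371 },
  'rdiscount' => { :total => 2.02,   :real => 2.037113 }
}

# How many times slower `slow` is than `fast`, by total CPU time.
def slowdown(results, slow, fast)
  results[slow][:total] / results[fast][:total]
end

# Wall-clock seconds for a run `factor` times longer, assuming linear scaling.
def projected_seconds(results, name, factor)
  results[name][:real] * factor
end

puts format('redcloth vs bluecloth: %.1fx', slowdown(RESULTS, 'redcloth', 'bluecloth'))
puts format('redcloth vs rdiscount: %.1fx', slowdown(RESULTS, 'redcloth', 'rdiscount'))

seconds = projected_seconds(RESULTS, 'redcloth', 10)
puts format('redcloth at 10,000 iterations: %.0f s (%.1f min)', seconds, seconds / 60)
```

That comes out to roughly 50x and 54x, and a little over 18 minutes for the 10,000-iteration run.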

It occurred to me then, that maybe this vast discrepancy occurred because the text I was attempting to parse with RedCloth followed markdown guidelines rather than those of textile. So I wrote a ridiculously short text, unrepresentative of any likely real-world applications, with an unordered list and some paragraphs, which was syntactically consistent between both textile and markdown conventions:

* this
* is
* a
* list
* in
* an
* article
* and
* does
* many
* great
* things

here is some text

and some more

and some more

wow what a great article

And, sure enough, RedCloth got EVEN WORSE:

                 user     system      total        real
bluecloth    8.210000   0.160000   8.370000 (  8.374246)
redcloth   352.960000  48.130000 401.090000 (402.421774)
rdiscount    5.610000   0.030000   5.640000 (  5.689682)

RDiscount's dominance was a bit more apparent across 100_000 iterations in the shorter test, but RedCloth still came out at 48x slower than BlueCloth and almost, wait for it...71x slower than RDiscount. I know that this isn't an entirely scientific test, but depending on your application your server will likely encounter situations similar to these at some point, and you really don't want to get caught with your money on RedCloth. Especially on a high-traffic day.
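
Another way to read the two tables is cost per render, which is closer to what a server feels on that high-traffic day. This is just a sketch over the "total" columns above; the helper name is made up, and the figures say nothing about your own hardware or content.

```ruby
# iterations and "total" seconds from the two benchmark runs above
RUNS = {
  'long article' => {
    :iterations => 1_000,
    :totals => { 'bluecloth' => 2.17, 'redcloth' => 109.02, 'rdiscount' => 2.02 }
  },
  'short text' => {
    :iterations => 100_000,
    :totals => { 'bluecloth' => 8.37, 'redcloth' => 401.09, 'rdiscount' => 5.64 }
  }
}

# Average milliseconds spent on a single to_html call.
def milliseconds_per_render(total_seconds, iterations)
  total_seconds * 1000.0 / iterations
end

RUNS.each do |label, run|
  run[:totals].each do |library, total|
    ms = milliseconds_per_render(total, run[:iterations])
    puts format('%-12s %-10s %8.3f ms per render', label, library, ms)
  end
end
```

On the long article that works out to about 109 ms per render for RedCloth against roughly 2 ms for the other two, which is the sort of difference a visitor can actually notice.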

Okay, so that's out of the way. RedCloth is slow and whatever benefits in terms of flexibility that one is given with textile are surely negated in terms of server resources.

So what about the second half of the equation, CodeRay? Truth be told CodeRay is actually, as promised, an enormously fast and useful syntax highlighting library. It's ridiculously user-friendly, requiring just one easy-to-install gem as opposed to ultraviolet's three painfully difficult and buggy ones.

CodeRay is also written entirely in Ruby, which we like, as it supports the community while showing off the language's capabilities, and has a reasonably wide range of functions for formatting code in 12 different languages including HTML, CSS, Ruby, Javascript and YAML. It even allows you to determine your language selection on the fly -- cool.

But there are, of course, caveats. CodeRay, out of the box, doesn't provide any reasonable manner to style your code. You'll either have to cut and paste the one really hideous stylesheet that I was able to find into your own stylesheet, or you'll have to style the entire thing manually. You could, I suppose, port a TextMate highlighting file to be used with CodeRay, but this, too, seems like an enormous pain in the ass, and is a reinvention of the wheel in many respects.

Alright, so redcloth-with-coderay is out. Too much work for too little payoff, including a kick-in-the-throat bottleneck courtesy of RedCloth. What's next?

everything's coming up ultraviolet

The rest of the solutions that I've come across all use Ultraviolet for syntax highlighting rather than CodeRay, and seemingly for good reason.

What's ultraviolet, you ask? Ultraviolet is a syntax highlighting engine, which is based on the Textpow textmate bundle parser, which is built on the Oniguruma regular expressions library. As it advertises on its homepage, Ultraviolet supports over 50 languages, and even sexier, allows you to use any pre-existing TextMate theme bundles to highlight the syntax of your rails app. Also, installing Ultraviolet as a gem installs both Textpow and Oniguruma as dependencies, which makes that whole process pretty durn easy.

Problem is, the oniguruma 1.1.0 gem (most current at the time of this article), seems to have some segfaulting issues on many systems, and won't necessarily install without errors. But then again I've seen reports that it does work in some environments, and seems to be a situationally-specific error.

That said, since you're hopefully going to be using Ultraviolet in some incarnation or another (at least, I'd strongly recommend that you do), you might as well install it now:

$ sudo gem install -r ultraviolet

So that'll install, and if Oniguruma works correctly it should look like any other gem install, what with creating the necessary ri and rdoc files. Otherwise you'll get this big nasty mess of build errors, and will have to build version 5.8.0 of Oniguruma from scratch.

Luckily it's not too difficult a process. If you're on a unix-based system it'll look something like this:

$ wget http://www.geocities.jp/kosako3/oniguruma/archive/onig-5.8.0.tar.gz
$ tar -xzvf onig-5.8.0.tar.gz
$ cd onig-5.8.0
$ ./configure
$ make
$ sudo make install

That should work for you. If you're still having issues, however, there's a small troubleshooting guide here. Otherwise, you're now ready to install (and use) any of the following syntax highlighting solutions, all of which are pretty nifty.
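
Before moving on, it can be worth confirming that all three libraries actually load. Here's a minimal sketch; the require names (oniguruma, textpow, uv) are my assumption about how these gems are loaded, so adjust them if your versions differ. Note that it only catches a load failure. A genuine segfault will still take the whole Ruby process down with it.

```ruby
require 'rubygems'

# Returns true when `library` can be required, false when it is missing
# or its files fail to load (both cases raise LoadError).
def loadable?(library)
  require library
  true
rescue LoadError
  false
end

# Assumed require names for the three gems.
%w[oniguruma textpow uv].each do |library|
  puts "#{library}: #{loadable?(library) ? 'ok' : 'NOT loadable'}"
end
```

If oniguruma reports as not loadable after building it from source, the gem most likely can't find the freshly built library.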

radiograph

Radiograph is a really, really, really, aggressively simple gem that allows you to add syntax highlighting to a string by passing that string into the method code(). It also has a helper method that allows you to link the css files into your rails app by calling require_syntax_css and optionally passing the name of the desired TextMate theme in as an argument.

Thing is, the gem doesn't seem to install correctly as a plugin. You can try it by executing this command at the root of your Rails application:

./script/plugin install http://code.jeremymcanally.com/radiograph/

But the install script can't find the CSS files that it's supposed to install, and without ANY documentation of any kind, neither can I. Knowing that there are better, vastly more robust and well-documented alternatives available, I move on from Radiograph without even a second thought. I just thought I'd mention it because it's something that you might come across in your search for syntax.

harsh

harsh, or Harsh: Another Rails Syntax Highlighter, is anything but. It allows you to, like Radiograph, style a block of code by simply calling:

harsh(some_code_variable)

But is really a whole hell of a lot more flexible. It allows you, for instance, to set all of its options on the fly. So if, for example, you wanted to use the twilight theme for this particular code block, and also use line numbers, and highlight the code specifically for CSS syntax, then you could call the harsh method like this:

harsh(some_code_variable, :theme => :twilight, :lines => true, :syntax => :css)

Or in block form, like this:

harsh :theme => :twilight, :lines => true, :syntax => :css do
  class MySuperAwesomeClass
    attr_writer :this_is_awesome
    def initialize(awesomeness)
      @this_is_awesome = awesomeness
    end
  end

  m = MySuperAwesomeClass.new(true)
end

For really super-good human beings, or, in layman's terms, HAML users, you can implement harsh as a filter by first configuring it in your environment.rb file, like this:

#!/config/environment.rb

Harsh.enable_haml

With that in place, you can now interpret the desired string (or string literal, or variable containing the string) under the :harsh filter as a block of code, like this:

#!/app/views/article/some_view.haml

:harsh
  ~ some_code_block

If for whatever ungodly reason you want to not only use the HAML filter but also style the code under that filter on the fly, there's a bootleg hack available for that, too:

#!/app/views/article/some_view.haml

:harsh
  #!harsh theme = twilight lines=true syntax=css
  ~ some_code_block

You'll have to observe the exact syntax there, though -- spaces and all. theme=twilight, in this particular instance, won't cut it.

Harsh also makes it enormously easy to install and uninstall TextMate themes thanks to some really nifty rake tasks.

Of course you'll first have to install Harsh as a rails plugin (I know, I kind of got ahead of myself there), which, assuming that you've already got Ultraviolet and Oniguruma installed, goes like this:

$ cd my_app
$ ./script/plugin install git://github.com/michaeledgar/harsh.git

Or, if you're using Git, you can install Harsh as a submodule like this:

$ cd my_app
$ git submodule add git://github.com/michaeledgar/harsh.git vendor/plugins/harsh
$ git submodule init
$ git submodule update

Submodules are useful, of course, for keeping your app current by updating a submoduled plugin on your server whenever you deploy using Capistrano. But that's another article...

For now, let's just bask in the themes that Harsh now allows us to install and manage using the following rake tasks:

# lists available themes
rake harsh:theme:list 

# installs the twilight theme into /public/stylesheets/harsh/
rake harsh:theme:install[twilight]

# also installs the twilight theme (for *csh shells)
rake harsh:theme:install THEME=twilight 

# removes the twilight theme
rake harsh:theme:uninstall[twilight] 

# also uninstalls the twilight theme (for *csh shells)
rake harsh:theme:uninstall THEME=twilight

Wow, pretty cool, huh? You can even (and this is the BIG, ENORMOUSLY USEFUL ONE) call a rake task to show you all of the syntaxes that are available to call as options, like this:

rake harsh:syntax:list

Gee. That's just a thing of beauty. Of course, once you've got your respective themes installed you'll want to actually include them into your Rails app, which you can do like this:

# include the stylesheet for the twilight theme
stylesheet_link_tag :twilight

# include stylesheets for all installed harsh themes
stylesheet_link_tag :harsh

So that's Harsh, just a wonderful, beautiful syntax highlighting solution for rails, as flexible and well-provisioned a plugin as one could ask for such a task. Still, fool that I am, I entirely missed the documentation of the Rake tasks first time I checked it out thanks to the unassuming gray box of code in which they were featured, so I actually ended up implementing another, equally useful solution...

tm_syntax_highlighting

The maintainer(s) of the tm_syntax_highlighting gem didn't pull any punches when they named the thing. They knew exactly what it was, what its strengths were, and didn't try to mask that in any capacity.

tm_syntax_highlighting, like harsh, allows you to set the theme, language and line numbers on the fly, like this:

code(some_code_variable, :theme => "twilight", :lang => "ruby", :line_numbers => true)

And provides some simple methods for including the necessary stylesheets, like this:

syntax_css("twilight")

So they seem outwardly to be very similar, with the exception, maybe, that tm_syntax_highlighting seems to use strings for its options where Harsh uses symbols.

A big departure, though, occurs, when we get to generating the stylesheets for the individual themes. Rather than listing, installing and uninstalling themes via rake tasks, as Harsh does, tm_syntax_highlighting uses generators, like this:

# to see a list of themes
$ script/generate syntax_css list
  
# to copy all the themes to public/stylesheets/syntax
$ script/generate syntax_css all
  
# to copy a single theme to public/stylesheets/syntax
$ script/generate syntax_css theme_name

Okay, so the generators that tm_syntax_highlighting uses are not quite as flexible as Harsh's rake tasks. One point for Harsh on this one. tm also doesn't have a HAML filter, which, while pretty marginal in its usefulness, still exists in Harsh, and is worth at least half a point for it.

But tm DOES offer some flexibilities that Harsh doesn't. You can, for example, include stylesheets on the fly by calling syntax_css without any arguments, like this:

code(some_ruby_code, :theme => "twilight")
code(some_more_ruby_code, :theme => "sunburst")

# yields stylesheet tags for both twilight and sunburst
syntax_css

So that's pretty useful, especially in situations where you want to differentiate multiple pieces of code from one another. Point for tm on that one.
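
I haven't dug into how tm does this internally, but the idea is easy enough to sketch: remember each theme as it gets used, then emit a link tag for each. What follows is a hypothetical stand-in in plain Ruby, NOT the plugin's real code. The module name, the stubbed code() method and the default theme are all made up, and the stylesheet path simply assumes the public/stylesheets/syntax directory that the generators copy themes into.

```ruby
require 'cgi'

# Hypothetical illustration only; not tm_syntax_highlighting's implementation.
module ThemeTracking
  def used_themes
    @used_themes ||= []
  end

  # Stub for code(): records the theme and returns a placeholder <pre>.
  def code(source, options = {})
    theme = options.fetch(:theme, 'twilight')
    used_themes << theme unless used_themes.include?(theme)
    %(<pre class="#{theme}">#{CGI.escapeHTML(source)}</pre>)
  end

  # With arguments: link exactly those themes.
  # Without arguments: link every theme used so far.
  def syntax_css(*themes)
    themes = used_themes if themes.empty?
    themes.map { |theme|
      %(<link rel="stylesheet" type="text/css" href="/stylesheets/syntax/#{theme}.css" />)
    }.join("\n")
  end
end

class Page
  include ThemeTracking
end

page = Page.new
page.code('puts 1', :theme => 'twilight')
page.code('puts 2', :theme => 'sunburst')
puts page.syntax_css
```

One consequence of this kind of design is that the bare syntax_css call has to run after the code() calls it's meant to cover, otherwise there's nothing recorded to link yet.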

Another REALLY useful feature in tm is the ability to create a config file to set defaults across your entire application, like this:

#!/config/initializers/tm_syntax_config.rb
TmSyntaxHighlighting.defaults = {:theme => "sunburst", :line_numbers => true, :lang => "ruby"}

Which, in retrospect, is a dead simple and dead useful feature to implement. To achieve this same effect in Harsh, you'd have to actually go into the file /vendor/plugins/harsh/lib/harsh.rb and edit the values in the default_options hash, which is a really bad way to have to do it. Say, for example, you did install Harsh as a submodule -- a new release of the plugin would overwrite your changes, and you'd have to make them all over again. No, tm definitely has the upper hand on this one. Point and a half for tm.
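
The pattern behind that kind of config is worth seeing in isolation, because it's what keeps your settings out of the plugin's own files. This is a minimal sketch with a made-up module (Highlighting) and a made-up helper (options_for); it is not either plugin's actual code, just the general defaults-plus-overrides idea.

```ruby
# Hypothetical module for illustration only.
module Highlighting
  class << self
    # Application-wide defaults, settable from an initializer.
    attr_writer :defaults

    def defaults
      @defaults ||= { :theme => 'twilight', :line_numbers => false, :lang => 'ruby' }
    end

    # Per-call options win over the defaults; anything not passed falls back.
    def options_for(overrides = {})
      defaults.merge(overrides)
    end
  end
end

# What an initializer would do, once, at boot:
Highlighting.defaults = { :theme => 'sunburst', :line_numbers => true, :lang => 'ruby' }

# What a single call would resolve to:
p Highlighting.options_for(:lang => 'css')
```

Because the defaults live in your own app rather than under vendor/plugins, updating the plugin never touches them.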

A last point, which I noticed after going into that harsh.rb file, is that tm_syntax_highlighting is actually more elegantly coded than Harsh. Harsh is comprised of really only two functions, and doesn't really organize itself into classes, or document itself well in its comments. tm, on the other hand, is VERY well coded, and its maintainers clearly have an artistic eye for code layout. And while this doesn't necessarily have an impact on the functionality of the plugins themselves, it tells you something about the technical and creative savvy of each plugin's maintainers, and the potential for each project.

and the winner is...

YOU, the Rubyist. Truth is, as in many situations when implementing Rails plugins, there's really no RIGHT answer when searching for your syntax highlighting solution. If you think you might be defining a lot of syntax on the fly or with HAML templating, then Harsh might be a better solution for you. If you think that you'll want to retain the same options across your entire application, tm_syntax_highlighting is probably a better choice. If you're a masochist and have a textile fixation then redcloth-with-coderay might be your bag.

As always, it comes down to a situationally-specific evaluation of your needs, and, obviously, you're the only one capable of that. It's okay to be decisive here, though -- you can't really make a wrong choice. Harsh AND tm are both very strong options. The others...well...they help us appreciate how strong harsh and tm are.

what happened to those code tags you mentioned, and the helper methods, and the markdown libraries?

Oh, yeah, those. I guess this part is pretty important for the actual implementation of these syntax-highlighting plugins inside a Rails app. You'll probably be implementing these on text fields that are already filtered through markdown, and if they aren't they probably should be. So here's a helper function that'll run your code through some regular expressions, add syntax to the bits surrounded by code tags, and filter the rest of your text through the markdown library that I mentioned earlier, RDiscount:

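#!/app/helpers/application_helper.rb

require 'rubygems'
require 'rdiscount'

module ApplicationHelper

  # Splits text on <c0de> tags, highlights what is inside them,
  # and runs everything else through markdown.
  def prettify(text, options = {})
    # the capture group keeps the tags themselves in the resulting array;
    # the pattern stays on one line so no stray newline ends up inside it
    text_pieces = text.split(
      /(<c0de>|<c0de lang="[A-Za-z0-9_-]+">|<c0de lang='[A-Za-z0-9_-]+'>|<\/c0de>)/
    )
    in_pre = false
    language = nil
    text_pieces.collect do |piece|
      if piece =~ /^<c0de( lang=(["'])?(.*)\2)?>$/
        language = $3
        in_pre = true
        nil
      elsif piece == "</c0de>"
        in_pre = false
        language = nil
        nil
      elsif in_pre
        # fall back to ruby when the tag names no language
        lang = language ? language : "ruby"
        code(piece.strip, :lang => lang)
      else
        markdown(piece, options)
      end
    end.compact.join # drop the nils left by the tags, return a single string
  end
  
  # Renders text as markdown, stripping any HTML first when :strip is set.
  def markdown(text, options = {})
    if options[:strip]
      RDiscount.new(strip_tags(text.strip)).to_html
    else
      RDiscount.new(text.strip).to_html
    end
  end

  ...
end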

Of course, for the sake of demonstration here, and for the sake of not breaking my own code tags, I've replaced all of the lowercase letter o's with the number 0 in this example. You'll have to go through and replace those for your own code.
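
If you want to poke at the splitting logic without Rails, RDiscount or a highlighter installed, here's a stripped-down sketch of the same idea with both dependencies stubbed out. The code() and markdown() methods below are stand-ins that I made up for the demonstration, and the tag pattern is a condensed variant of the one in the helper rather than a copy of it.

```ruby
require 'cgi'

# Condensed pattern; the outer capture group keeps the tags in the split result.
CODE_TAG = /(<code(?: lang=["']?[A-Za-z0-9_-]+["']?)?>|<\/code>)/

# Stub standing in for code() or harsh().
def code(source, options = {})
  %(<pre class="#{options[:lang]}">#{CGI.escapeHTML(source)}</pre>)
end

# Stub standing in for the RDiscount-backed markdown() helper.
def markdown(text, _options = {})
  stripped = text.strip
  stripped.empty? ? '' : "<p>#{stripped}</p>"
end

def prettify(text, options = {})
  language = nil
  in_code = false
  text.split(CODE_TAG).map do |piece|
    if piece =~ /\A<code(?: lang=["']?([A-Za-z0-9_-]+)["']?)?>\z/
      language = $1      # nil when the tag has no lang attribute
      in_code = true
      nil
    elsif piece == '</code>'
      in_code = false
      nil
    elsif in_code
      code(piece.strip, :lang => language || 'ruby')
    else
      markdown(piece, options)
    end
  end.compact.join
end

puts prettify(%(Some *text*. <code lang="css">a { color: red }</code> More text.))
```

The trick the whole thing hangs on is that String#split keeps the delimiters when the pattern contains a capture group, so the tags show up as their own pieces and can flip the in-code flag on and off.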

Also, you'll notice the code() method under the second elsif statement -- you'll want to change that to harsh() if you're using harsh, or you could always just create an alias method like this:

alias_method :code, :harsh

Either way, these two little functions should suit anybody's needs (they could be a bit more robust -- I might post a better version as an update later) for on-the-fly syntax highlighting. Of course, you'll have to have RDiscount installed, which you can do quite easily:

$ sudo gem install -r rdiscount

And then you're good to go. Of course, if your own curiosities demand it, you could certainly use BlueCloth in lieu of RDiscount, or any other markdown solution that tickles your proverbial fancy, but hopefully you'll find these powerful, dynamic syntax highlighting solutions to provide sufficient tickle.

fin

That's all she wrote. Or, more accurately, that's all that I wrote. This should really be everything that you need to know to implement a powerful syntax highlighting solution for your Rails, Merb, Sinatra, whatever app. Pretty easy, too, in the scheme of things.

If you're confused about anything, or have questions about any situations that I haven't covered, by all means ask your question as a comment to this article, or send me an e-mail. Otherwise, godspeed through syntax.

3 comments

Gavin at 2009-10-17 23:34:56 UTC

Lovely work, Mike. Articles like this are fantastic. It's very Rails-centered, though. I'm looking to highlight code in a static HTML file generated by a Ruby program. Hopefully it's easy enough to work out.

Thanks for your great effort.

Gabe da Silveira at 2010-01-21 10:23:00 UTC

I just updated tm_syntax_highlighting for use in Rails 3 http://github.com/dasil003/tm_syntax_highlighting

Sai Perchard at 2010-04-14 10:07:18 UTC

Thanks for the great write up.

For some reason, when using your helper to facilitate syntax highlighting within tags, the element into which the highlighted code was being inserted was being displayed in the wrong place on the page. For example, I would apply prettify() to my blog post; the post would be displayed in the correct place, sans all code blocks. These would be displayed at the top of the blog post, completely out of context.

I fixed this through trial and error, eventually modifying line 42 of harsh.rb. I simply changed:

concat(Uv.parse( text, "xhtml", opts[:format].to_s, opts[:lines], opts[:theme].to_s)) ""

to:

Uv.parse( text, "xhtml", opts[:format].to_s, opts[:lines], opts[:theme].to_s)

Not entirely sure what the problem was here, but I thought I would post in case anybody else was experiencing the same problem.

Thanks again for the article.


about

doblock focuses on ruby, rails, and all things that can help ruby and/or rails programmers hone their skills.

Techniques, tutorials, news, and even free open-source applications, doblock seeks to fill in the cracks of the ruby/rails blogosphere.
