The Fastest Way to Concatenate Strings and Arrays in Ruby
remove hidden bottlenecks by assimilating these tips into your best practices toolbox
convention: the silent killer
There are a couple of conventions in programming that you never really think to question because they're just so damn commonplace. Method arguments, for the most part, act in the same way across different languages, as do equals signs, and concatenators, and all of those little bits that every language HAS to implement to be useful and high-level.
And while we accept these conventions without ever really thinking about them in any capacity, the fact that something is commonplace and conventional does not necessarily imply that it is the best available solution. And, as I'm coming to realize, it usually isn't. I know this is kind of a big fuck you to Occam and his Razor, but hear me out.
here be dragons
There comes a time in every programmer's life when she or he will want to concatenate one string to an existing one.
For PHP programmers, it looks something like:
$var . " add to var" . "some other stuff to add"
And here in Ruby, as in JavaScript and many other languages that use the plus symbol as their concatenator, something like this:
var + "a seemingly more expressive syntax" + "i agree entirely"
That seems to make a bit more sense, no? Looking back on my PHP existence, especially now being so firmly entrenched in Ruby, it seems that the language tended to invent its own conventions for seemingly arbitrary reasons or, at the very least, not for the sake of expressibility. Oh, well. We can't all be Ruby.
Anyway, concatenators. We use them constantly for Strings and Arrays (but not Hashes, sadly). I bet you'll find a + sign in ninety percent of your methods, more maybe. Ninety-five. Whatever the case, I've just always assumed that it's the superior way to concatenate and never thought twice about it.
Until today.
enter "string".concat()
The << operator is really the de facto way to append Strings to Arrays. You see it everywhere, it's VERY commonly used, and nobody ever really questions it.
But it's available to String objects, too, where it does the same job as the concat() method in the String class. And it's one of those things -- sometimes you'll use it because you're feeling whimsical, and just want to prove to yourself that you know how to do something simple like that in eight different ways. So you do, and you move on, and you smirk with a sense of self-importance, and you don't think about it again until the whim returns.
And it was amidst one such instance that it occurred to me, "one of these has to be faster".
testing string theory
So there are approximately three sane ways in which to concatenate strings. You can use the plus (+) or plus-equals (+=) concatenators, you can use <<, or you can group everything together inside of a string and interpolate your variables into it using pound/brackets (#{}). We're going to test all of those right now. Let's write a benchmark:
require 'benchmark'

Benchmark.bm do |b|
  # + returns a new String for every intermediate result
  b.report('+') do
    hyper  = "hyper"
    crypto = "crypto"
    monkey = "monkey"
    10_000_000.times { hyper + crypto + monkey }
  end

  # << appends in place, so hyper itself keeps growing on every iteration
  b.report('<<') do
    hyper  = "hyper"
    crypto = "crypto"
    monkey = "monkey"
    10_000_000.times { hyper << crypto << monkey }
  end

  # interpolation builds one new String per iteration
  b.report('#{}') do
    hyper  = "hyper"
    crypto = "crypto"
    monkey = "monkey"
    10_000_000.times { "#{hyper}#{crypto}#{monkey}" }
  end
end
That's right, we're going to produce thirty million hypercryptomonkey objects as quickly as we can just for the sake of doing so. And while no, we haven't yet distilled an actual application for the resulting hypercryptomonkeys, we here at do{block} labs (a contract employee of Insani-T Chemical Enterprises) are confident that we can establish a market for one with an aggressive and whimsical advertising campaign. How very American of us.
the string results
But that's another article. For right now all we care about is how quickly we can crank out said hypercryptomonkeys, and by which method. And here are the results:
          user     system      total        real
+    11.550000   1.430000  12.980000 ( 13.130550)
<<    8.760000   1.430000  10.190000 ( 10.299680)
#{}  13.770000   1.380000  15.150000 ( 15.316712)
Yeah, that's right. The + sign is about 27.5% slower than the << or concat() methods. And the #{} method, which never really seemed to be enormously efficient in the first place, still clocks in at about 48.7% slower than << and concat.
And, luckily for Insani-T Chemical Enterprises, Ruby is open source, and implementing the superior << method costs exactly zero percent more than its portlier relatives.
but, why?
Fair question. The plus symbol, it seems, allocates a brand new String for every intermediate result (first hyper + crypto, then that result + monkey), whereas << and concat append directly onto the receiving string without first producing an intermediary copy. #{} isn't even in the same league because it's not even concatenating so much as just creating a new string object altogether with the contents of the other strings.
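If you'd rather see that than take my word for it, here's a minimal sketch. The helper names (join_with_plus and join_with_shovel) are made up for illustration and aren't part of the benchmark; the only thing it leans on is equal?, which tells you whether two variables point at the very same object.

```ruby
# Hypothetical helper names, for illustration only.

# Joins with +: returns a new String and leaves the receiver untouched.
def join_with_plus(a, b)
  a + b
end

# Joins with <<: appends b onto a in place and returns a itself.
def join_with_shovel(a, b)
  a << b
end

hyper  = "hyper".dup # dup guarantees an unfrozen String we are allowed to mutate
crypto = "crypto"

plus_result = join_with_plus(hyper, crypto)
puts plus_result.equal?(hyper)   # false: a different object came back
puts hyper                       # hyper

shovel_result = join_with_shovel(hyper, crypto)
puts shovel_result.equal?(hyper) # true: the same object, now longer
puts hyper                       # hypercrypto
```

The flip side of skipping the copy is that << changes the string you hand it, so anything else holding a reference to that string sees the change too.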
what about Arrays?
I'm glad you asked.
In their unrelenting quest to mass market marginally useful and unquestionably hazardous products, Insani-T Chemical Enterprises, a subsidiary of Mom's Old Fashioned Bavarian Pretzels, Inc. (did I forget to mention that?), is aggressively developing a new product called "animal basket", which, exactly as it sounds, is simply a basket full of animals.
After their success with the enormously popular hypercryptomonkey line (especially in the Czech Republic), and with limited overhead thanks to <<, they're once again researching the fastest way to assemble the baskets. So, like any studious contractor, I got wind of the project, put in a bid, and was hired to write a benchmark. Here it is:
require 'benchmark'

Benchmark.bm do |b|
  # += builds a new Array and reassigns basket each time
  b.report('+') do
    animals = ["insane walrus", "maniacal otter", "eloquent raccoon"]
    1_000_000.times do
      basket = []
      animals.each do |a|
        basket += [a]
      end
    end
  end

  # << appends to the existing Array in place
  b.report('<<') do
    animals = ["insane walrus", "maniacal otter", "eloquent raccoon"]
    1_000_000.times do
      basket = []
      animals.each do |a|
        basket << a
      end
    end
  end

  # push also appends in place
  b.report('push') do
    animals = ["insane walrus", "maniacal otter", "eloquent raccoon"]
    1_000_000.times do
      basket = []
      animals.each do |a|
        basket.push a
      end
    end
  end

  # | returns a new Array; the result is discarded here, so basket stays empty
  b.report('pipe') do
    animals = ["insane walrus", "maniacal otter", "eloquent raccoon"]
    1_000_000.times do
      basket = []
      animals.each do |a|
        basket | [a]
      end
    end
  end
end
As you'll see, we're testing a couple of new methods: push, which adds the string in question as the last item of the Array, and |, or the pipe method, which returns the union of two Arrays with any duplicate values removed.
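Since the four contenders don't all behave the same way, here's a small sketch of what each one hands back. basket_report is a hypothetical helper I made up for illustration, not something from the benchmark, and I'm deliberately feeding it a second walrus so the difference between + and | shows up.

```ruby
# basket_report is a hypothetical helper, for illustration only.
def basket_report(animal)
  basket = ["insane walrus"]

  plus_result = basket + [animal] # new Array, basket is left alone
  pipe_result = basket | [animal] # new Array holding the union, basket is left alone
  untouched   = basket.dup        # snapshot: still just the one walrus

  basket << animal                # appends in place
  basket.push(animal)             # appends in place

  { plus: plus_result, pipe: pipe_result, untouched: untouched, mutated: basket }
end

report = basket_report("insane walrus")
p report[:plus]      # ["insane walrus", "insane walrus"]
p report[:pipe]      # ["insane walrus"] (the duplicate walrus is dropped)
p report[:untouched] # ["insane walrus"]
p report[:mutated]   # ["insane walrus", "insane walrus", "insane walrus"]
```

So + and | only change basket if you assign the result back to it, while << and push change it on the spot.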
I tried to convince them to spend the extra money to insert breaks in the test in case the insane walrus bit somebody, or if the eloquent raccoon began to recite Kantian philosophies, but they just ignored me and said "that's the cost of doing business". Fucking Mom's Old Fashioned Bavarian Pretzels. This is why I don't take corporate contracts.
Sorry, I get bitter sometimes.
Where were we? Oh yeah, the results:
           user     system      total        real
+      5.160000   1.020000   6.180000 (  6.234889)
<<     3.660000   1.020000   4.680000 (  4.727642)
push   3.720000   0.990000   4.710000 (  4.739347)
pipe   7.140000   1.060000   8.200000 (  8.274396)
So really no surprises here. The only way to add a String to an Array with + is to first wrap the String in a whole new Array object, and + then builds yet another Array to hold the combined result. Unsurprisingly, that eats up some time. The same goes for the pipe method. So + and | are about 32% and 75% slower than <<, respectively.
Push, though, as you can see from the results, was about twelve thousandths of a second slower than << in real time. I'm thinking that this is because push is actually an alias for << (or Append as it's called) in the Array class, but I'm not sure about this. Either way, the difference was so marginal that I tested just those two at ten million iterations rather than one million to see if anything changed:
           user     system      total        real
<<    36.010000  10.880000  46.890000 ( 47.008566)
push  36.370000  10.920000  47.290000 ( 47.706194)
And nothing did. If anything push proved to be a tad slower than it initially seemed.
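For what it's worth, the two aren't perfectly interchangeable: << takes exactly one item per call, while push will happily take several at once. Here's a minimal sketch of filling a basket both ways. The helper names (fill_with_shovel and fill_with_push) are hypothetical, and I haven't benchmarked this variation, so treat it as a look at the API rather than a speed claim.

```ruby
# Hypothetical helper names, for illustration only.

# << accepts a single item, so we append one animal per call.
def fill_with_shovel(animals)
  basket = []
  animals.each { |a| basket << a }
  basket
end

# push accepts any number of arguments, so the splat adds them all in one call.
def fill_with_push(animals)
  basket = []
  basket.push(*animals)
  basket
end

animals = ["insane walrus", "maniacal otter", "eloquent raccoon"]
p fill_with_shovel(animals) == fill_with_push(animals) # true
```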
bon voyage
Twenty-four million animal basket prototypes later, I parted ways with Insani-T Chemical with some cash in pocket, and several 15% off coupons for the entire line of Animal Basket products. I was a bit worldlier, knowing now that unless I had any special reason to do otherwise, << would always prove the quickest method by which to concatenate Strings or to append strings to Arrays.
Farewell, Insani-T Chemical. Until we meet again.