[ACCEPTED]-Performance of Arrays and Hashes in Ruby-hash

Accepted answer
Score: 108

Hashes are much faster for lookups:

require 'benchmark'
Document = Struct.new(:id,:a,:b,:c)
documents_a = []
documents_h = {}
1.upto(10_000) do |n|
  d = Document.new(n)
  documents_a << d
  documents_h[d.id] = d
end
searchlist = Array.new(1000){ rand(10_000)+1 }

Benchmark.bm(10) do |x|
  x.report('array'){searchlist.each{|el| documents_a.any?{|d| d.id == el}} }
  x.report('hash'){searchlist.each{|el| documents_h.has_key?(el)} }
end

#                user     system      total        real
#array       2.240000   0.020000   2.260000 (  2.370452)
#hash        0.000000   0.000000   0.000000 (  0.000695)

0

Score: 7

When using unique values, you can use the 5 Ruby Set which has been previously mentioned. Here 4 are benchmark results. It's slightly slower 3 than the hash though.

                 user     system      total        real
array        0.460000   0.000000   0.460000 (  0.460666)
hash         0.000000   0.000000   0.000000 (  0.000219)
set          0.000000   0.000000   0.000000 (  0.000273)

I simply added to @steenslag's 2 code which can be found here https://gist.github.com/rsiddle/a87df54191b6b9dfe7c9.

I used ruby 1 2.1.1p76 for this test.

Score: 5

Ruby has a set class in its standard library, have 4 you considering keeping an (additional) set 3 of IDs only?

http://stdlib.rubyonrails.org/libdoc/set/rdoc/index.html

To quote the docs: "This 2 is a hybrid of Array’s intuitive inter-operation 1 facilities and Hash’s fast lookup".

Score: 3
  1. Use a Set of Documents. It has most of the 14 properties you want (constant-time lookup 13 and does not allow duplicates),. Smalltalkers 12 would tell you that using a collection that 11 already has the properties you want is most 10 of the battle.

  2. Use a Hash of Documents by 9 document id, with ||= for conditional insertion 8 (rather than has_key?).

Hashes are designed 7 for constant-time insertion and lookup. Ruby's 6 Set uses a Hash internally.

Be aware that 5 your Document objects will need to implement 4 #hash and #eql? properly in order for them 3 to behave as you would expect as Hash keys 2 or members of a set, as these are used to 1 define hash equality.

More Related questions