Johannes Thönes

Software-Developer, ThoughtWorker, Permanent Journeyman, Ruby Enthusiast, Java and Devils Advocate.

Speeding Up Iterative Calculation in Ruby

The domain background of my diploma thesis is sample size calculation for clinical trials - in fact simulation of those. This means I need to calculate how many probands or trial subjects are needed, to get the result with a certain error propability (I don’t mind going into detail right here).

The procedures for sample size calculation are well published, but some of those work in a manner like this:

1
2
3
4
5
6
7
8
9
def calculate_sample_size
  n = 4 # 4 is a good starting point
          # a we have 2 arms minimum 
          # and a trial with propability involved need 2 probands at least
  loop do
    return n if fullfills_a_some_mathematical_condition(n)
    n += 1
  end
end

So we need to return the smallest integer fulfilling a certain condition and we need to iterate up to this one, testing against a condition each time.

The above works, but has a big problem: I’m doing it for at least 10 million times, usually around 100 million times. Caching is not an option here, as we are dealing with stocastic we don’t have deterministic parameters (those parameters are hidden in the code above).

But, going back to my computer science books, I found ‘binary search’ which I applied for this - of course with a Ruby closure:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
module IterationHelper
  # This is the entry point for the iteration
  def iterate_up start = 0, step = 1000, &block
    max = start
    # First we need to find the a maximum, because in theory we have infinite integers
    # I define the step parameters, as for some sample size calculation, there is a way to guess those to
    #   some extend.
    # I could of course start with Fixnum::MAX but I in my case this is faster
    until(yield(max))
      max += step
    end

    # Now that we know the range where to look for the smallest match
    find_smallest_satisfier(max - step, max, &block)
  end

  private
  # This works nearly exactly as textbooks binary sort
  def find_smallest_satisfier min, max, &block

    # If min and max differ less than 2, we have found the smallest n
    if max - min < 2
      return min if yield(min)
      return max
    end

    # Get the value in the middle
    n = min + (max-min)/2
    # If yield(n) => value might be smaller
    if yield(n)
      return find_smallest_satisfier(min, n, &amp;block)
      # Unless yield(n) => value might be bigger
    else
      return find_smallest_satisfier(n, max, &amp;block)
    end
  end
end

This leeds to a much cleaner call like this:

1
2
3
4
include IterationHelper
def sample_size
  iterate_up {|n| fullfills_a_some_mathematical_condition(n) }
 end

But - i did say - I didn’t do this for cleaner calls. So I did a litte benchmark:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
require 'lib/helpers/iteration_helper'
include IterationHelper

require 'benchmark'
include Benchmark

N = 10**3

def complex_calculation n
  # some dummy calculations
  10.times {Math.sqrt Kernel.rand}
  n >= @n_0
end
@targets = [25, 105, 234, 500, 765]

bm(7) do |x|
  x.report("normal") do
  @targets.each do |t|
    @n_0 = t
      N.times do
        m = 4
        loop do
           break if complex_calculation(m)
           m += 1
        end
      end
    end
  end

  x.report("binary") do
    @targets.each do |t|
        @n_0 = t
        N.times do
        iterate_up(4) {|n| complex_calculation(n)}
      end
    end
  end
end[code]
The result was very impressing:

[code] user     system      total        real
normal  12.210000   0.030000  12.240000 ( 15.631090)
binary   0.520000   0.010000   0.530000 (  0.778788)

You should - of course - expect the iteration do do better, when the actual satisfaction test needs less mathematics, but perform worse if the mathematics are more time consuming (which mine are).