The domain background of my diploma thesis is sample size calculation for clinical trials – in fact simulation of those. This means I need to calculate how many probands or trial subjects are needed, to get the result with a certain error propability (I don’t mind going into detail right here).
The procedures for sample size calculation are well published, but some of those work in a manner like this:
def calculate_sample_size
n = 4 # 4 is a good starting point
# a we have 2 arms minimum and a trial with propability involved need 2 probands at least
loop do
return n if fullfills_a_some_mathematical_condition(n)
n += 1
end
end
So we need to return the smallest integer fulfilling a certain condition and we need to iterate up to this one, testing against a condition each time.
The above works, but has a big problem: I’m doing it for at least 10 million times, usually around 100 million times. Caching is not an option here, as we are dealing with stocastic we don’t have deterministic parameters (those parameters are hidden in the code above).
But, going back to my computer science books, I found ‘binary search’ which I applied for this – of course with a Ruby closure:
module IterationHelper
# This is the entry point for the iteration
def iterate_up start = 0, step = 1000, &block
max = start
# First we need to find the a maximum, because in theory we have infinite integers
# I define the step parameters, as for some sample size calculation, there is a way to guess those to
# some extend.
# I could of course start with Fixnum::MAX but I in my case this is faster
until(yield(max))
max += step
end
# Now that we know the range where to look for the smallest match
find_smallest_satisfier(max - step, max, &block)
end
private
# This works nearly exactly as textbooks binary sort
def find_smallest_satisfier min, max, &block
# If min and max differ less than 2, we have found the smallest n
if max - min < 2
return min if yield(min)
return max
end
# Get the value in the middle
n = min + (max-min)/2
# If yield(n) ==> value might be smaller
if yield(n)
return find_smallest_satisfier(min, n, &block)
# Unless yield(n) ==> value might be bigger
else
return find_smallest_satisfier(n, max, &block)
end
end
end
This leeds to a much cleaner call like this:
include IterationHelper
def sample_size
iterate_up {|n| fullfills_a_some_mathematical_condition(n) }
end
But – i did say – I didn’t do this for cleaner calls. So I did a litte benchmark:
require 'lib/helpers/iteration_helper'
include IterationHelper
require 'benchmark'
include Benchmark
N = 10**3
def complex_calculation n
# some dummy calculations
10.times {Math.sqrt Kernel.rand}
n >= @n_0
end
@targets = [25, 105, 234, 500, 765]
bm(7) do |x|
x.report("normal") do
@targets.each do |t|
@n_0 = t
N.times do
m = 4
loop do
break if complex_calculation(m)
m += 1
end
end
end
end
x.report("binary") do
@targets.each do |t|
@n_0 = t
N.times do
iterate_up(4) {|n| complex_calculation(n)}
end
end
end
end1
The result was very impressing:
1 user system total real
normal 12.210000 0.030000 12.240000 ( 15.631090)
binary 0.520000 0.010000 0.530000 ( 0.778788)
You should - of course - expect the iteration do do better, when the actual satisfaction test needs less mathematics, but perform worse if the mathematics are more time consuming (which mine are).