Software-Developer, ThoughtWorker, Permanent Journeyman, Ruby Enthusiast, Java and Devils Advocate.
Speeding Up Iterative Calculation in Ruby
The domain background of my diploma thesis is sample size calculation for clinical trials - in fact simulation of those. This means I need to calculate how many probands or trial subjects are needed, to get the result with a certain error propability (I don’t mind going into detail right here).
The procedures for sample size calculation are well published, but some of those work in a manner like this:
123456789
defcalculate_sample_sizen=4# 4 is a good starting point# a we have 2 arms minimum # and a trial with propability involved need 2 probands at leastloopdoreturnniffullfills_a_some_mathematical_condition(n)n+=1endend
So we need to return the smallest integer fulfilling a certain condition and we need to iterate up to this one, testing against a condition each time.
The above works, but has a big problem: I’m doing it for at least 10 million times, usually around 100 million times. Caching is not an option here, as we are dealing with stocastic we don’t have deterministic parameters (those parameters are hidden in the code above).
But, going back to my computer science books, I found ‘binary search’ which I applied for this - of course with a Ruby closure:
moduleIterationHelper# This is the entry point for the iterationdefiterate_upstart=0,step=1000,&blockmax=start# First we need to find the a maximum, because in theory we have infinite integers# I define the step parameters, as for some sample size calculation, there is a way to guess those to# some extend.# I could of course start with Fixnum::MAX but I in my case this is fasteruntil(yield(max))max+=stepend# Now that we know the range where to look for the smallest matchfind_smallest_satisfier(max-step,max,&block)endprivate# This works nearly exactly as textbooks binary sortdeffind_smallest_satisfiermin,max,&block# If min and max differ less than 2, we have found the smallest nifmax-min<2returnminifyield(min)returnmaxend# Get the value in the middlen=min+(max-min)/2# If yield(n) => value might be smallerifyield(n)returnfind_smallest_satisfier(min,n,&block)# Unless yield(n) => value might be biggerelsereturnfind_smallest_satisfier(n,max,&block)endendend
require'lib/helpers/iteration_helper'includeIterationHelperrequire'benchmark'includeBenchmarkN=10**3defcomplex_calculationn# some dummy calculations10.times{Math.sqrtKernel.rand}n>=@n_0end@targets=[25,105,234,500,765]bm(7)do|x|x.report("normal")do@targets.eachdo|t|@n_0=tN.timesdom=4loopdobreakifcomplex_calculation(m)m+=1endendendendx.report("binary")do@targets.eachdo|t|@n_0=tN.timesdoiterate_up(4){|n|complex_calculation(n)}endendendend[code]Theresultwasveryimpressing:[code]usersystemtotalrealnormal12.2100000.03000012.240000(15.631090)binary0.5200000.0100000.530000(0.778788)
You should - of course - expect the iteration do do better, when the actual satisfaction test needs less mathematics, but perform worse if the mathematics are more time consuming (which mine are).