Johannes Thönes

Software Developer, ThoughtWorker, Permanent Journeyman, Ruby Enthusiast, Java and Devil's Advocate.

Devoxx 2010

I was very lucky to attend Devoxx 2010 (the conference part) this year. That is reason enough to blog about the thing I found most valuable to report:

The discussion panel

This was a discussion panel led by Joe Nuxoll (with Dick Wall keeping the minutes) with Josh Bloch, Mark Reinhold, Antonio Goncalves, Stephen Colebourne, Jürgen Hoeller and Bill Venners. They did a phenomenal job of filling in for James Gosling's cancellation (get well soon, James!).

Insertion Sort in Scala

Finally getting a bit more serious about learning Scala, I wanted to do some exercises to become a little more fluent in the language. Because I'm only four chapters into "Programming in Scala", I chose a simple one: implementing sorting algorithms, today insertion sort. Here is the "easy" solution, coming from a mostly object-oriented background and translated from a textbook:
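For reference, the textbook algorithm behind such an "easy" imperative solution looks roughly like this in Ruby, the language used in most other posts here. This is a sketch, not the Scala code from this post; the method name insertion_sort is illustrative. It sorts a copy of the input by shifting larger elements to the right until the current element's slot is found:

```ruby
# Textbook insertion sort (illustrative sketch), working on a copy of the input.
def insertion_sort(values)
  sorted = values.dup
  (1...sorted.length).each do |i|
    current = sorted[i]
    j = i - 1
    # Shift every larger element one position to the right
    while j >= 0 && sorted[j] > current
      sorted[j + 1] = sorted[j]
      j -= 1
    end
    # Drop the current element into the gap that is left
    sorted[j + 1] = current
  end
  sorted
end
```

For example, `insertion_sort([5, 2, 4, 6, 1, 3])` returns `[1, 2, 3, 4, 5, 6]` and leaves the original array untouched.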

Get All SVN Committers From a Repository or Working Copy

Short, but unfortunately too long for Twitter. With this little snippet you can get all committers from an SVN working copy, a repository, or a plain log:

svn log | ruby -e "puts STDIN.read.split(/\n/).select{|l| l =~ /^r\d+/}.map{|l| l.split('|')[1].strip}.uniq.sort"
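The same idea as a small, testable Ruby method (svn_committers is a hypothetical name). It uses a slightly stricter pattern than the one-liner, requiring the ` | ` separator right after the revision number, so that commit message lines that happen to start with something like "r2" are not mistaken for revision headers:

```ruby
# Extracts the sorted, unique author names from `svn log` output.
# Revision header lines look like:
#   r42 | alice | 2010-01-02 10:00:00 +0100 (Sat, 02 Jan 2010) | 1 line
def svn_committers(log_text)
  log_text.each_line
          .select { |line| line =~ /\Ar\d+ \| / }   # keep only revision headers
          .map { |line| line.split('|')[1].strip }  # second column is the author
          .uniq
          .sort
end
```

Saved as svn_committers.rb, it could be used as `svn log | ruby -r ./svn_committers.rb -e 'puts svn_committers(STDIN.read)'`.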

Anyone want to guess why I needed this?

A Ruby Script for Upgrading Multiple DokuWiki Installations

After DokuWiki was released multiple times in the last few days because of security problems, I thought it was a good time to write a little script that automatically updates multiple instances. You can find the Ruby script at http://gist.github.com/285219.

The ruby script basically automates the upgrade instructions from the DokuWiki main page. So the following actions are performed when executing the script:

  1. Backing up every installation into /tmp/dokuwiki_backup_#{timestamp}.

  2. Downloading the DokuWiki release (its URL is passed in as a parameter).

  3. Extracting the files and copying everything into the installations (except for the content of the /data directory).

  4. Creating missing folders in the /data directory, setting their owner to www-data:www-data and chmodding them to 664.

  5. Deleting obsolete files from older releases, based on a list of those files.
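Steps 1 and 3 map fairly directly onto Ruby's FileUtils. The following is a hedged sketch, not the code from the gist: the method names backup_installations and copy_release, the timestamp format, and the assumption that the release tarball has already been extracted into a directory are all illustrative:

```ruby
require 'fileutils'

# Step 1 (sketch): copy every installation into one timestamped backup folder.
# Installations that share a directory name are not told apart in this sketch.
def backup_installations(installations, backup_root = '/tmp', time = Time.now)
  target = File.join(backup_root, "dokuwiki_backup_#{time.strftime('%Y%m%d%H%M%S')}")
  FileUtils.mkdir_p(target)
  installations.each do |path|
    FileUtils.cp_r(path, File.join(target, File.basename(path)))
  end
  target
end

# Step 3 (sketch): copy an extracted release over an installation,
# skipping the release's data directory so the wiki content stays untouched.
def copy_release(release_dir, installation)
  Dir.children(release_dir).each do |entry|
    next if entry == 'data'
    # cp_r into an existing directory merges subdirectories and overwrites files
    FileUtils.cp_r(File.join(release_dir, entry), installation)
  end
end
```

Copying entry by entry, rather than the whole release directory at once, is what makes it easy to leave /data out.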

Within the script you need to specify your DokuWiki installations in this snippet:

# Definition of existing installation
INSTALLATIONS = [
  '/path/to/docu/wiki/installation1',
  '/path/to/docu/wiki/installation2'
].freeze

Then you can upgrade to a new release like this:

/path/to/script/upgrade_dokuwiki.rb http://www.splitbrain.org/_media/projects/dokuwiki/dokuwiki-2009-12-25c.tgz

If you want to improve the script, feel free to fork it on Gist.

Speeding Up Iterative Calculation in Ruby

The domain background of my diploma thesis is sample size calculation for clinical trials, or more precisely the simulation of such trials. This means I need to calculate how many trial subjects are needed to obtain a result with a certain error probability (I won't go into detail here).

The procedures for sample size calculation are well published, but some of those work in a manner like this:

def calculate_sample_size
  n = 4 # 4 is a good starting point:
        # we have at least 2 arms,
        # and a trial involving probabilities needs at least 2 subjects per arm
  loop do
    return n if fullfills_a_some_mathematical_condition(n)
    n += 1
  end
end

So we need the smallest integer that fulfills a certain condition, and to find it we iterate upwards, testing the condition for every integer along the way.

The above works, but has a big problem: I'm doing it at least 10 million times, usually around 100 million times. Caching is not an option here: since we are dealing with stochastic simulations, the parameters are not deterministic (they are hidden in the code above).

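Going back to my computer science books, I found binary search, which I applied here, using a Ruby closure (a block) of course. Binary search only works if the condition is monotone, i.e. once some n satisfies it, every larger n does as well. The helper first steps upwards in large increments to find an upper bound and then bisects the last interval: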

module IterationHelper
  # Entry point: returns the smallest integer n >= start for which the block is true.
  # The block must be monotone: once it is true for some n, it is true for every larger n.
  def iterate_up start = 0, step = 1000, &block
    max = start
    # First we need an upper bound, because in theory there are infinitely many integers.
    # The step is a parameter because for some sample size calculations the result
    #   can be estimated to some extent.
    # I could of course start with a huge upper bound, but in my case this is faster.
    until yield(max)
      max += step
    end

    # The start value itself already satisfies the condition
    return start if max == start

    # Now we know the range [max - step, max] containing the smallest match
    find_smallest_satisfier(max - step, max, &block)
  end

  private

  # This works nearly exactly like the textbook binary search
  def find_smallest_satisfier min, max, &block
    # If min and max differ by less than 2, we have found the smallest n
    if max - min < 2
      return min if yield(min)
      return max
    end

    # Get the value in the middle
    n = min + (max - min) / 2
    if yield(n)
      # n satisfies the condition => the smallest match is n or smaller
      find_smallest_satisfier(min, n, &block)
    else
      # n does not satisfy the condition => the smallest match is bigger than n
      find_smallest_satisfier(n, max, &block)
    end
  end
end

This leads to a much cleaner call:

include IterationHelper
def sample_size
  iterate_up { |n| fullfills_a_some_mathematical_condition(n) }
end
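In Ruby 2.0 and later, the core class Range offers `Range#bsearch`, whose find-minimum mode returns the smallest value for which a monotone block is true. A hedged sketch of the same step-then-bisect idea on top of it (smallest_satisfying is a hypothetical name; the stepping mirrors iterate_up):

```ruby
# Illustrative alternative (requires Ruby 2.0+ for Range#bsearch).
# Returns the smallest integer n >= start for which the block is true,
# assuming the block is monotone (false ... false, true ... true).
def smallest_satisfying(start = 0, step = 1000)
  low = start
  high = start
  # Grow the upper bound in steps of `step` until the condition holds;
  # everything below the new low is known not to satisfy it
  until yield(high)
    low = high + 1
    high += step
  end
  # Find-minimum mode: bsearch returns the first n for which the block is true
  (low..high).bsearch { |n| yield(n) }
end
```

The call then looks just like the one above: `smallest_satisfying(4) { |n| fullfills_a_some_mathematical_condition(n) }`.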

But, as I said, I didn't do this for cleaner calls. So I ran a little benchmark:

require './lib/helpers/iteration_helper'
include IterationHelper

require 'benchmark'
include Benchmark

N = 10**3

def complex_calculation n
  # some dummy calculations
  10.times { Math.sqrt Kernel.rand }
  n >= @n_0
end
@targets = [25, 105, 234, 500, 765]

bm(7) do |x|
  x.report("normal") do
    @targets.each do |t|
      @n_0 = t
      N.times do
        m = 4
        loop do
          break if complex_calculation(m)
          m += 1
        end
      end
    end
  end

  x.report("binary") do
    @targets.each do |t|
      @n_0 = t
      N.times do
        iterate_up(4) { |n| complex_calculation(n) }
      end
    end
  end
end
The result was very impressive:

              user     system      total        real
normal   12.210000   0.030000  12.240000 ( 15.631090)
binary    0.520000   0.010000   0.530000 (  0.778788)

You should, of course, expect the plain iteration to do better when the actual satisfaction test involves less mathematics, and worse when the mathematics are more time-consuming (as mine are).
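To see where the speed-up comes from, it helps to count how often the condition is evaluated. The following sketch uses a cheap threshold test `n >= threshold` as a stand-in for the real condition (the method names are hypothetical) and counts evaluations for a linear scan and for the same step-then-bisect idea used above, written iteratively:

```ruby
# Linear scan from `start`: returns [smallest match, number of evaluations].
def linear_evaluations(threshold, start = 4)
  count = 0
  n = start
  loop do
    count += 1
    return [n, count] if n >= threshold
    n += 1
  end
end

# Step up by `step`, then bisect: returns [smallest match, number of evaluations].
def binary_evaluations(threshold, start = 4, step = 1000)
  count = 0
  satisfied = lambda do |n|
    count += 1
    n >= threshold
  end
  high = start
  high += step until satisfied.call(high)
  return [start, count] if high == start
  low = high - step # known not to satisfy the condition
  # Invariant: low fails the test, high passes it
  while high - low > 1
    mid = low + (high - low) / 2
    if satisfied.call(mid)
      high = mid
    else
      low = mid
    end
  end
  [high, count]
end
```

For the largest benchmark target, `linear_evaluations(765)` returns `[765, 762]`, while `binary_evaluations(765)` returns `[765, 12]`. The more expensive each evaluation is, the more that difference dominates the running time.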

Argument Fun With Ruby

As some of you might know, my diploma thesis is about simulation with the help of an internal Ruby DSL. This DSL will, at least I hope so, be full of small features that make life easier for those who use it (i.e. biometricians). This requires some interesting Ruby metaprogramming, which I will post on this blog as I go. One of these features is something I call "multiple parameters". So what is it all about?

First of all we have a method in the DSL:

simulate do
  # Some other definitions ...
  arms do
    treatment N([0, 0.1, 0.3], 1)
    placebo N(0, 1)
  end
  # Some more definitions ...
end

The interesting thing happens when you call the N function. If you call it with plain numeric arguments, you get back a normal distribution sampler. If you pass an array as one or more of the arguments, you get back multiple normal distribution samplers, one for each element of the cartesian product of all parameters. This means the treatment receives an array, as if you had called:

[N(0,1), N(0.1,1), N(0.3,1)]

The idea behind this is that if you specify such multiple parameters, the simulation is run multiple times, each time with a slightly different set of definitions.

A direct implementation wouldn't be too hard; after all, this is Ruby. But since this concerns not only the N method but many other things as well, I wanted to extract it into a module, so that I could write the N method like this:

module DistributionHelper
  extend MultipleParameters
  def N mean, variance
    Distribution::Gauss.new(mean,variance)
  end
  mutate_params :N
end

The MultipleParameters module follows the approach Dave Thomas demonstrates for memoization in episode 5 of "The Ruby Object Model and Metaprogramming". I'll explain the module, but first here is the implementation:

module MultipleParameters
  def mutate_params(name)
    original_method = instance_method(name)

    define_method(name) do |*args|
      mutated_arguments = [[]]
      args.each do |arg|
        if arg.is_a?(Array)
          new_results = []
          mutated_arguments.each do |r|
            arg.each do |a|
              rm = r.clone
              rm << a
              new_results << rm
            end
          end
          mutated_arguments = new_results
        else
          mutated_arguments.each { |r| r << arg }
        end
      end

      bound_method = original_method.bind(self)
      ret = []
      mutated_arguments.each do |mutated_args|
        ret << bound_method.call(*mutated_args)
      end

      (ret.size == 1) ? ret.first : ret
    end
  end
end

This is a class-level module (i.e. it provides class methods rather than instance methods), so you extend your class or module with it rather than including it. After that, you can call mutate_params for any previously defined method. When called, your original method is captured as an UnboundMethod object (instance_method(name)) and the method is then redefined. The redefined method builds the cartesian product of the arguments (e.g. ([0,1], [1,2]) yields (0,1), (0,2), (1,1) and (1,2)) and then collects the results of calling the original method with each combination into an array. Before the original method can be called, it has to be bound to the current self (original_method.bind(self)), because it is called as an instance method; then every argument set is passed to it.

Finally, the array of all return values is returned. The last line is a convenience: if you did not pass any multiple parameters, it returns the single result directly instead of a one-element array.

Note: This approach does not work if your original method expects an array as an argument. I will probably extend the approach in the future to work with ranges as well.
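Ruby 1.8.7 and later provide `Array#product`, which computes exactly this cartesian product, so the nested loops could be replaced. A hedged sketch of such a variant (the module name ProductParameters is hypothetical), with the same limitation regarding array arguments:

```ruby
# Hypothetical variant of MultipleParameters that delegates the
# cartesian product to Array#product.
module ProductParameters
  def mutate_params(name)
    original_method = instance_method(name)

    define_method(name) do |*args|
      # Treat every non-Array argument as a one-element list of choices
      choices = args.map { |arg| arg.is_a?(Array) ? arg : [arg] }
      # e.g. [[0, 1], [1, 2]] -> [[0, 1], [0, 2], [1, 1], [1, 2]]
      combinations = choices.empty? ? [[]] : choices.first.product(*choices.drop(1))
      bound_method = original_method.bind(self)
      results = combinations.map { |combination| bound_method.call(*combination) }
      # Return a single result directly, as the original module does
      results.size == 1 ? results.first : results
    end
  end
end
```

It is used exactly like the original: `extend ProductParameters` in the class or module, then `mutate_params :N` after defining N.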

Showing the Print Version With Greasemonkey and jQuery

Lately I have been working a lot with jQuery, and it is the first JavaScript library with which I feel you can write really nice code.

Now it so happened (;-)) that heise.de changed its design. Their design had always annoyed me whenever I went from my RSS reader to an individual article. But now it was definitely past the pain threshold. To avoid restyling every element (which is very fragile anyway, since the ids and classes can change at any time), I decided to simply use the styles of the print version for now. That turned out to be surprisingly easy:

// First, inject jQuery into the document.
// The code for loading jQuery is taken from
// http://www.joanpiedra.com/jquery/greasemonkey/ and is licensed under the MIT license.
// The comments have been adapted.
var GM_JQ = document.createElement('script');
GM_JQ.src = 'http://jquery.com/src/jquery-latest.js';
GM_JQ.type = 'text/javascript';
document.getElementsByTagName('head')[0].appendChild(GM_JQ);

// Wait until jQuery has been loaded
function GM_wait() {
  if (typeof unsafeWindow.jQuery == 'undefined') {
    window.setTimeout(GM_wait, 100);
  } else {
    $ = unsafeWindow.jQuery;
    letsJQuery();
  }
}
GM_wait();

// The actual Greasemonkey script, now with jQuery
function letsJQuery() {
  // Remove all stylesheets that are not media=print
  $('link[rel=stylesheet]').not('[media=print]').remove();
  // Switch the remaining stylesheets from media=print to media=screen
  $('link[rel=stylesheet]').filter('[media=print]').attr('media', 'screen');
}

I extended the script a bit further specifically for heise.de … if you're interested, just get in touch.

EURUKO 2008

As some of you may have noticed, I spent this weekend at the European Ruby Conference in Prague. So that the philistines who stayed at home get a rough idea of what went on there, here is a short summary:

Ruby 1.9

Matz and Koichi Sasada presented the new Ruby 1.9 and gave an overview of what was, what is, and what is yet to come. Besides the technical details (which, by the way, are also covered very well in Matz's Google TechTalk), Matz above all emphasized that innovation in the Ruby language is very important to him. He doesn't want to settle for what exists, but wants to go further and keep developing Ruby. And, of course, it has to be mentioned: Ruby programming is fun …

JRuby

For me, JRuby was the big surprise. Until now, people always wondered what JRuby was actually for. Well, essentially JRuby lets you write Ruby code while using Java classes at the same time. Want to see some? (After installing it via apt-get install jruby1.0, run it interactively, i.e. in jirb1.0.)

require 'java'
include_class 'javax.swing.JFrame'
include_class 'javax.swing.JLabel'
include_class 'java.awt.event.ActionListener'
include_class 'javax.swing.JOptionPane'

frame = JFrame.new "The JRuby Window"
frame.setSize 400, 200
# Ruby-style setter, equivalent to
# frame.set_default_close_operation(JFrame::EXIT_ON_CLOSE)
frame.default_close_operation = JFrame::EXIT_ON_CLOSE
label = JLabel.new "Hello World"
frame.add label

frame.visible = true

Things like object.method etc. work as well, of course, so JRuby is a great way to try out new Java libraries, even if you "only" want to use them from Java itself. By the way, the 1.1 release was announced during the conference.

Other Talks

There were other talks as well. All of them were very interesting, and above all they showed one thing: you can do a lot more with Ruby than just Rails. And that really says something … In any case, I thought it was very cool (if only because of the T-shirts ;-))

The slides should be available at http://www.euruko2008.org shortly.