Clusterpunch: a distributed mini-benchmark system for clusters

Punches

Punches are defined in a configuration file, as described in the documentation for clusterpunch.conf. This section documents the punches that come with the distribution. Some of the diagnostic punches are designed to work with a RedHat or RedHat-like Linux distribution, but you can always modify them or create your own to suit your environment. The punches shown here are not necessarily the best ones - let me know if you come up with something better or more interesting. If you create punches for a specific OS/architecture, also let me know so that I can post them for others using the same platform.

Example Punches

punch = punch1

This punch makes and times 1,000,000 calls to rand() and populates the bench1 statistic with the time taken to execute the code.

<punch>
name = punch1
statistic = bench1
valuetype = timer
format = %6.3f
function <<CODE
for (my $i=0;$i<1e6;$i++) { 
  rand ();
}
CODE
</punch>

The output below, and the output shown for each of the following punches, was obtained by running clustersnapshot with the arguments -c PUNCHNAME -s STATISTIC. The command was run on our cluster, and I show the nodes at the top, middle and bottom of the listing, sorted by punch value.

         host bench1 live
        2of8  0.406    1
        7of7  0.406    1
        ...
        8of4  0.450    1
        5of3  0.450    1
        ...
        0of0  0.774    1
        0of1  1.162    1
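Before adding a timer punch to clusterpunch.conf, it can help to time the same code by hand on a single machine. The sketch below does this with Perl's core Time::HiRes module. It assumes that a timer punch reports elapsed wall-clock time, which is an assumption about clusterpunch's internals rather than a description of them, and the time_code helper is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: run a code reference and return the elapsed
# wall-clock time in seconds (assumed to approximate valuetype = timer).
sub time_code {
    my ($code, @args) = @_;
    my $t0 = [gettimeofday];
    $code->(@args);
    return tv_interval($t0);
}

# The same body as the punch1 function.
my $punch1 = sub {
    for (my $i = 0; $i < 1e6; $i++) {
        rand();
    }
};

printf "%6.3f\n", time_code($punch1);
```

The printed value depends on the machine and Perl version, so it is only meaningful when compared with other runs of the same code.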

punch = punch2

This is an example punch that calls a system command. In this case the cat statistic is populated, and its value is the time taken to produce a recursive directory listing of all entries under /etc.

<punch>
name = punch2
statistic  = cat
valuetype  = timer
system = "/bin/ls -alR /etc > /dev/null 2>&1"
</punch>

        host      cat live
        3of8 0.086979    1
        8of8 0.094786    1
        ...
        9of1 0.143933    1
        8of1 0.144877    1
        ...
        0of0 0.657006    1
        0of1  2.62294    1

Benchmark Punches

Benchmark punches contain code that is timed on each node. The execution times are then used to rank the nodes relative to one another.

punch = benchmem

This punch is meant to measure the speed of the memory subsystem by repeatedly allocating and deallocating a large array. It accepts two arguments, M and N, and allocates/deallocates an array of N elements M times; by default N=1,000,000 and M=20. This punch contributes to the cumulative statistic b_all. The value filter defined by valuemap simply returns its input unchanged, and is included only as an example of how this feature can be used.

<punch>
name = benchmem
statistic  = b_mem
cumulative = b_all
valuemap   = return $_[0]
valuetype  = timer
format = %6.3f
sort = ascending
function <<CODE
  my ($M,$N) = @_;
  $M ||= 20; 
  $N ||= 1e6;	
  my @array;
  foreach my $idx (1..$M) {
    $array[$idx] = [];
    $array[$idx]->[$N] = 0;
    $array[$idx] = undef;
  }
CODE
</punch>

        host  b_all  b_mem live
        7of8  0.791  0.791    1
        1of7  0.791  0.791    1
        ...
        5of1  0.933  0.933    1
        5of0  0.933  0.933    1
        ...
        2of1  1.242  1.242    1
        0of1  2.774  2.774    1

punch = benchio

This is a test of the I/O system using dd. A file of kbytes kilobytes is written M times to /tmp. The punch relies on dd's default block size of 512 bytes, so the number of blocks written per file is 2*kbytes; it also assumes that /bin/dd exists and that there is enough space in /tmp for the operation. Each file is given a random name and is deleted immediately after it has been written. By default a single file of about 60MB is created during this punch.

<punch>
name = benchio
statistic = b_io
cumulative = b_all
valuetype = timer
format = %6.3f
function <<CODE
  my ($kbytes,$M) = @_;
  $kbytes ||= 60e3;
  $M ||= 1;
  foreach (1..$M) {
    my $randfilename = join("",map {chr(97+int(rand(26)))} (0..20));
    my $count = 2*$kbytes;
    system("/bin/dd if=/dev/zero of=/tmp/$randfilename count=$count > /dev/null 2>&1");
    unlink("/tmp/$randfilename");
  }
CODE
</punch>

        host  b_all   b_io live
        0of0  0.073  0.073    1
        2of2  0.637  0.637    1
        ...
        9of3  1.460  1.460    1
        4of4  1.461  1.461    1
        ...
        1of1  4.077  4.077    1
        9of1  4.123  4.123    1
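The amount of data written by benchio follows from its two arguments and dd's default 512-byte block size. The hypothetical benchio_volume function below does only that arithmetic, using the punch's defaults, so you can check how much space in /tmp a given call such as benchio(kbytes,M) will need.

```perl
use strict;
use warnings;

use constant DD_BLOCK_SIZE => 512;   # dd's default block size, in bytes

# Hypothetical helper: return (blocks per file, total bytes written)
# for benchio's arguments, applying the same defaults as the punch.
sub benchio_volume {
    my ($kbytes, $M) = @_;
    $kbytes ||= 60e3;
    $M      ||= 1;
    my $count = 2 * $kbytes;                  # dd count= value per file
    my $bytes = $M * $count * DD_BLOCK_SIZE;  # total over all M files
    return ($count, $bytes);
}

my ($count, $bytes) = benchio_volume();
printf "%d blocks per file, %d bytes in total (%.1f MiB)\n",
    $count, $bytes, $bytes / 2**20;
# prints: 120000 blocks per file, 61440000 bytes in total (58.6 MiB)
```

With the defaults, each file is 60,000 KiB, a little over 61 million bytes, which is the "about 60MB" mentioned above.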

punch = benchcpu

To benchmark the CPU, the punch makes repeated calls to transcendental (sin, cos) and power functions. The loop body contains several such expressions to reduce the relative overhead of looping. By default the loop is repeated 100,000 times (N=1e5). I'm aware that this is not a rigorous CPU benchmark, any more than the memory and I/O benchmarks above are. I've found it sufficient to rank P3 CPUs.

<punch>
name = benchcpu
statistic = b_cpu
cumulative = b_all
valuetype = timer
format = %6.3f
function <<CODE
  my ($N) = @_;
  $N ||= 1e5;
  foreach (0..$N) {
    sin($_/$N)**2*cos($_/($N+1))**2;
    sin($_/($N+1))**2*cos($_/($N+2))**2;
    sin($_/($N+2))**2*cos($_/($N+3))**2;
    sin($_/($N+3))**2*cos($_/($N+4))**2;
  }
CODE
</punch>

In the example below, I've used -c "benchcpu;mhz;load" -s "b_cpu" as the parameters to clustersnapshot to show how the CPU benchmark results relate to the current load and MHz rating.

        host  b_all  b_cpu live load   mhz
        5of7  0.514  0.514    1  1.0  2792
        5of8  0.514  0.514    1  0.0  2792
        7of7  0.514  0.514    1  0.0  2792
        ...
        9of3  0.570  0.570    1  1.1  2522
        5of3  0.571  0.571    1  1.0  2522
        6of3  0.571  0.571    1  1.1  2522
	...
        7of2  0.777  0.777    1  1.0  1992
        3of3  0.791  0.791    1  3.1  2522
        0of1  1.321  1.321    1  2.2  1992

Increasing the number of loops in benchcpu from 100,000 to 1,000,000 by calling clustersnapshot with -c "benchcpu(1e6);mhz;load" -s "b_cpu" yields similar relative rankings, but now the benchmark takes about 5 seconds on the fastest nodes.

        host  b_all  b_cpu live load   mhz
        7of7  5.132  5.132    1  0.2  2792 (up from #3)
        0of8  5.132  5.132    1  0.2  2792 (up from #4)
        5of8  5.132  5.132    1  0.1  2792 (down from #2)
        ...
        6of3  5.678  5.678    1  1.2  2522
        1of4  5.679  5.679    1  2.2  2522
        3of4  5.682  5.682    1  2.2  2522
        9of3  5.684  5.684    1  1.2  2522
        8of3  5.688  5.688    1  1.2  2522
        5of4  5.689  5.689    1  0.1  2522
        5of3  5.689  5.689    1  1.2  2522
        ...
        2of1  7.819  7.819    1  2.5  1992
        0of1  8.181  8.181    1  3.1  1992
        0of0 11.606 11.606    1  3.0  1992

Diagnostic Punches

Diagnostic punches return values pertaining to the hardware profile, status and configuration of the nodes. Their return values can be used to rank the nodes on an absolute scale.

punch = mhz

This punch uses information from the /proc filesystem to report the sum of the clock speeds, in MHz, of all CPUs in the node. It expects each CPU to have an entry in /proc/cpuinfo containing a line like

cpu MHz		: 1261.416

showing its clock speed. Only the integer part of each value is used. The punch uses valuetype = return because we want the punch value to be the return value of the code rather than its execution time. The punch is set to sort = descending because larger values correspond to higher ranks - we want more MHz!

<punch>
name = mhz
statistic = mhz
valuemap = return $_[0]
valuetype = return
format = %5d
sort = descending
function <<CODE
  return 0 if ! -e "/proc/cpuinfo";
  my $cpuinfo = `cat /proc/cpuinfo`;
  my $MHz = 0;
  while($cpuinfo =~ /MHz\s*:\s*(\d+)/g) {
    $MHz += $1;
  }
  return $MHz;
CODE
</punch>

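The parsing logic of the mhz punch can be tried out away from the cluster by feeding it a string instead of reading /proc/cpuinfo. In this sketch, sum_mhz is a hypothetical name and the sample text is a made-up two-CPU excerpt; the regular expression is the one used by the punch, so only the integer part of each value is counted.

```perl
use strict;
use warnings;

# Hypothetical helper: sum the integer MHz values found in
# /proc/cpuinfo-style text, using the mhz punch's regular expression.
sub sum_mhz {
    my ($cpuinfo) = @_;
    my $mhz = 0;
    while ($cpuinfo =~ /MHz\s*:\s*(\d+)/g) {
        $mhz += $1;
    }
    return $mhz;
}

# Made-up excerpt; a real /proc/cpuinfo has many more lines per CPU.
my $sample = join "",
    "processor\t: 0\n", "cpu MHz\t\t: 1261.416\n",
    "processor\t: 1\n", "cpu MHz\t\t: 1261.416\n";

printf "%d\n", sum_mhz($sample);   # prints 2522
```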
punch = load

This punch returns the system load average, read from /proc/loadavg. By default the 1-minute load average is returned, but you can call the punch as load(N) with N=1, 5 or 15 to get the 1-, 5- or 15-minute average.

<punch>
name = load
statistic = load
valuetype = return
appendargs = true
format = %4.1f
function <<CODE
  my ($time) = @_;
  my $loadavg;
  if(-e "/proc/loadavg") {
    $loadavg = `cat /proc/loadavg`;
    chomp $loadavg;
  } else {
    $loadavg = "- - -";
  }
  my %load;
  @load{1,5,15} = split /\s+/,$loadavg;
  if(defined $time && defined $load{$time}) {
    return $load{$time};
  } else {
    return $load{1};
  }
CODE
</punch>

Using clustersnapshot -c "load" -s "load"

        host live load
        5of4    1  0.0
        8of0    1  0.0
        ...
        4of7    1  1.1
        9of2    1  1.1
        ...
        1of5    1  3.1
        4of3    1  3.1

You'll notice that the punch populates the statistic named load. What if you ask for more than one load average? Using clustersnapshot -c "load;load(5);load(15)" -s "load":

        host live load load15 load5
        5of2    1  0.0   0.00  0.02
        6of0    1  0.0   0.00  0.00
        ...
        6of8    1  1.0   0.90  1.01
        8of8    1  1.0   0.91  1.00
        ...
        2of4    1  3.0   2.91  3.01
        1of5    1  3.0   2.91  3.03

Notice that the arguments of the 5- and 15-minute punches were appended to the statistic name (load5 and load15), so that you can distinguish between the loads. Because the 1-minute punch was called as load and not load(1), its statistic keeps the plain name load (the two calls are equivalent, because the punch returns the 1-minute average by default). Arguments are appended to the statistic name when appendargs = true.
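The argument handling of the load punch can be checked in isolation. The sketch below applies the punch's hash-slice logic to a /proc/loadavg line passed in as a string; load_from_line is a hypothetical name and the sample line is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 1-, 5- or 15-minute load average from a
# /proc/loadavg-style line, falling back to the 1-minute value as the punch does.
sub load_from_line {
    my ($line, $minutes) = @_;
    my %load;
    @load{1, 5, 15} = split /\s+/, $line;
    if (defined $minutes && defined $load{$minutes}) {
        return $load{$minutes};
    }
    return $load{1};
}

my $line = "0.20 0.91 1.00 1/260 22710";   # made-up sample line
print join(" ", map { load_from_line($line, $_) } 1, 5, 15), "\n";
# prints 0.20 0.91 1.00
```

Any other argument, such as load(7), falls back to the 1-minute average, exactly as the punch code does.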

punch = uptime

To get the uptime of the node, use this punch. Your /proc/stat needs to have the line

btime 1043184875

for this punch to work; the number is the boot time, in seconds since the Unix epoch. The value returned by the punch is the uptime in days.

<punch>
name = uptime
statistic = uptime
valuetype = return
format = %7d
sort = descending
function <<CODE
  my $stat = `cat /proc/stat`;
  if($stat =~ /btime (\d+)/s) {
    my $boottime = $1;
    my $uptime = (time-$boottime)/3600/24;
    return $uptime;
  } else {
    return -1;
  }
CODE
</punch>
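The conversion from btime to days is simple arithmetic: subtract the boot time from the current time and divide by 86,400 seconds per day. In the sketch below, uptime_days is hypothetical and takes the current time as an optional second argument (an addition not present in the punch) so that the result can be checked against a fixed value.

```perl
use strict;
use warnings;

# Hypothetical helper: uptime in days from /proc/stat-style text.
# $now defaults to the current time; the punch itself always uses time().
sub uptime_days {
    my ($stat, $now) = @_;
    $now = time unless defined $now;
    return -1 unless $stat =~ /btime (\d+)/;
    return ($now - $1) / (3600 * 24);
}

my $stat = "cpu  100 0 50 1000\nbtime 1043184875\n";   # made-up excerpt
printf "%.2f\n", uptime_days($stat, 1043184875 + 3 * 86400);   # prints 3.00
```

Because the punch's format is %7d, the displayed value is the whole number of days; the fractional part is dropped when it is formatted.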

punch = nusers

Counts the number of users logged into the node, as reported by /usr/bin/who. Users are not necessarily unique: a user with several login sessions is counted once per session.

<punch>
name = nusers
statistic = nusers
valuetype = return
format = %6d
function <<CODE
  return -1 if ! -e "/usr/bin/who";
  my @sessions = `/usr/bin/who`;
  return scalar @sessions;
CODE
</punch>

punch = jobusers

Reports the total CPU time of the jobs in the process table on a per-user basis. Jobs owned by the system accounts listed in the punch code (root, bin, daemon and so on) are skipped; all other jobs count towards the totals. It's expected that the command

ps auxww | tr -s " " | cut -d " " -f 1,10

will return something like

root 8:06
bob 0:30
ntp 0:00

For each user, the CPU times of their jobs are summed, and the output format is user:time, where time is in minutes. Thus, below, phuang's jobs on 7of8 have accumulated 1.98 minutes of CPU time.

<punch>
name = jobusers
statistic = jobusers
valuetype = return
format = %25s
function <<CODE
    my $ps = `ps auxww | tr -s " " | cut -d " " -f 1,10`;
    my @ps = split(/\n/,$ps);
    my %seconds;
    my @stopusers = qw(USER bin daemon nobody root xfs rpcuser rpc ntp lp);
    foreach my $psline (@ps) {
	my ($user,$time) = split(/ /,$psline);
	next if ! defined $time || grep($user eq $_, @stopusers);
	my ($min,$sec) = split(/:/,$time);
	$seconds{$user} += $min*60+$sec;
    }
    my @report;
    foreach my $user (sort keys %seconds) {
	push(@report, sprintf("%s:%.2f", $user, $seconds{$user}/60));
    }
    return join(",",@report);
CODE
</punch>
        host                  jobusers live
        7of7 acherk:0.03,jliu:0.00,martink:0.00,srusaw:0.18    1
        7of8 jliu:0.02,martink:0.00,phuang:1.98,srusaw:0.20    1
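The per-user aggregation in jobusers can be exercised without ps by passing it lines in the "user MM:SS" form shown earlier. The sketch below restructures the punch's loop as a hypothetical jobusers_report function; the stop list is copied from the punch, and the sample lines are the example output above plus one extra made-up line.

```perl
use strict;
use warnings;

# Accounts skipped by the jobusers punch.
my @stopusers = qw(USER bin daemon nobody root xfs rpcuser rpc ntp lp);

# Hypothetical helper: turn "user MM:SS" lines into the punch's
# "user:minutes" report, summing CPU time per user.
sub jobusers_report {
    my %seconds;
    for my $line (@_) {
        my ($user, $time) = split / /, $line;
        next if !defined $time || grep { $user eq $_ } @stopusers;
        my ($min, $sec) = split /:/, $time;
        $seconds{$user} += $min * 60 + $sec;
    }
    return join ",",
        map { sprintf("%s:%.2f", $_, $seconds{$_} / 60) } sort keys %seconds;
}

my $report = jobusers_report("root 8:06", "bob 0:30", "ntp 0:00", "bob 1:00");
print "$report\n";   # prints bob:1.50
```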

punch = date

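Returns the current time on the node in HH:MM:SS format. The punch also accepts an optional strftime format string as its argument, which replaces the default %H:%M:%S.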

<punch>
name = date
statistic = date
valuetype = return
format = %8s
function <<CODE
    use POSIX qw(strftime);
    my $format = $_[0] || "%H:%M:%S";
    my $timestamp = strftime $format, localtime;
    return $timestamp;
CODE
</punch>

punch = nrunning

Returns the number of currently running processes, as reported by /proc/loadavg. The fourth field of that file has the form running/total, as in

0.00 0.00 0.00 1/260 22710

The punch returns the first number of this field (1 in the line above).

<punch>
name = nrunning
statistic = nrunning
valuetype = return
format = %8d
function <<CODE
  my $loadavg = `cat /proc/loadavg`;
  if($loadavg =~ /(\d+)\/(\d+)/) {
    return $1;
  } else {
    return -1;
  }
CODE
</punch>
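The fourth field of /proc/loadavg carries two counts, and the punch keeps only the first. The hypothetical running_and_total function below extracts both from a line passed in as a string, returning -1 for each when the field is missing, as the punch does for its single value.

```perl
use strict;
use warnings;

# Hypothetical helper: split the "running/total" field of a
# /proc/loadavg-style line into its two numbers.
sub running_and_total {
    my ($loadavg) = @_;
    return $loadavg =~ m{(\d+)/(\d+)} ? ($1, $2) : (-1, -1);
}

my ($running, $total) = running_and_total("0.00 0.00 0.00 1/260 22710");
print "running=$running total=$total\n";   # prints running=1 total=260
```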

punch = kernel

Returns the kernel release running on the node, as reported by uname -r.

<punch>
name = kernel
statistic = kernel
valuetype = return
format = %25s
function <<CODE
chomp(my $kernel = `uname -r`);
return $kernel
CODE
</punch>
        host                    kernel live
        0of0          2.2.14-VA.2.1smp    1
        0of1         2.2.18pre11-va2.1    1
        0of2          2.2.14-VA.2.1smp    1

punch = mem

Reports the amount of free or used memory and swap on the node. Call this punch as mem or mem(free) to get the amount of free memory, or as mem(total), mem(used), mem(swapused) or mem(totalwswap) to get the total memory, used memory, used swap, or total memory plus swap, respectively. The values reported by free are in kilobytes; the valuemap divides them by 1024 so that all values are reported in MB.

<punch>
name = mem
statistic = mem
valuetype = return
format = %6d
appendargs = true
valuemap = $_[0]/1024;
function <<CODE
my ($arg) = @_;
my @lines = map {[split(/\s+/,$_)]} grep ($_ =~ /\d/,split(/\n/,`free -o`));
if(! $arg || $arg eq "free") {
	return $lines[0]->[3];
} elsif ($arg eq "total") {
	return $lines[0]->[1];
} elsif ($arg eq "used") {
	return $lines[0]->[2];
} elsif ($arg eq "swapused") {
	return $lines[1]->[2];
} elsif ($arg eq "totalwswap") {
	return $lines[1]->[1]+$lines[0]->[1];
}
CODE
</punch>
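The column positions that the mem punch relies on can be checked against sample output. The sketch below parses text in the free -o layout (one Mem: line and one Swap: line, values in kilobytes) and converts every value to MB, as the valuemap does; parse_free is a hypothetical name and the sample figures are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: parse `free -o`-style text (kB) into MB values,
# using the same line and column positions as the mem punch.
sub parse_free {
    my ($text) = @_;
    # Keep only lines containing digits (drops the header), split on whitespace.
    my @lines = map { [ split /\s+/ ] } grep { /\d/ } split /\n/, $text;
    my ($mem, $swap) = @lines;
    return {
        total      => $mem->[1] / 1024,
        used       => $mem->[2] / 1024,
        free       => $mem->[3] / 1024,
        swapused   => $swap->[2] / 1024,
        totalwswap => ($mem->[1] + $swap->[1]) / 1024,
    };
}

# Made-up output in the free -o layout.
my $sample = <<'EOT';
             total       used       free     shared    buffers     cached
Mem:       1048576     524288     524288          0      65536     131072
Swap:      2097152      10240    2086912
EOT

my $mb = parse_free($sample);
printf "%d MB free of %d MB\n", $mb->{free}, $mb->{total};   # prints 512 MB free of 1024 MB
```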