Clusterpunch: a distributed mini-benchmark system for clusters

Documentation - clusterpunch.conf


NAME

clusterpunch.conf - configuration file for clusterpunch


SYNOPSIS

  # general parameters
  param1 = value
  param2 = value

  # sort methods - multiple blocks
  <sort>
   statistic = value
   sort = ascending | descending
   format = value
  </sort>
  ...
  <sort>
   statistic = value
   sort = ascending | descending
   format = value
  </sort>

  # punches - multiple blocks
  <punch>
  name = value
  statistic = value
  cumulative = value
  valuemap = function
  valuetype = timer | return
  appendargs = true
  format = value
  sort = ascending | descending
  function << CODE
  ...perl code here...
  CODE
  </punch>

 ... more punches here


DESCRIPTION


Reading Configuration Files

By default, all clusterpunch daemon and utility programs look for a configuration file in the following locations

  ~/.clusterpunch
  ../etc/clusterpunch.conf (relative to location of binary)
  /usr/local/etc/clusterpunch.conf
  /etc/clusterpunch.conf

An attempt is made to read all of these files. If a parameter is defined in /etc/clusterpunch.conf, it overrides the same parameter defined in the other files. Parameters defined in blocks (sorts and punches) are not overwritten; instead, each block is appended to the global list of sorts and punches. Thus, it is possible to define a core, network-wide list of punches in /usr/local/etc/clusterpunch.conf and then define host-specific punches in /etc/clusterpunch.conf.
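
As an illustration of this layering, the sketch below shows two hypothetical files. The punch names (cpu, scratch), their statistics, the /scratch path and the port and timeout values are made up for the example; only the parameter names come from this document.

```
# /usr/local/etc/clusterpunch.conf - shared by every node
port = 8095
timeout = 2

<punch>
 name = cpu
 statistic = b_cpu
 valuetype = timer
 format = %6.2f
 function <<CODE
   for (my $i=0;$i<1e6;$i++) { rand() }
 CODE
</punch>

# /etc/clusterpunch.conf - this host only
timeout = 5

<punch>
 name = scratch
 statistic = b_scratch
 valuetype = timer
 format = %6.2f
 system = "/bin/ls /scratch > /dev/null"
</punch>
```

With both files in place, this host would be expected to use timeout = 5 (the /etc value wins) and to offer both the cpu and the scratch punch, since punch blocks are added to the global list instead of replacing one another.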

To read a specific configuration file, use the -f configfile flag, which is accepted by all daemon and utility scripts. For example,

  clusterbench -f /path/to/file.conf

If the file given with -f is found, none of the default files are read.


General Parameters

General parameters define how clusterpunch works across the network. Each parameter is defined using

  parameter = value

The following parameters are possible

logdir = /path/to/logdir

Directory to which each clusterpunchserver writes its log files. This directory must exist and must be writable by the server.

logging = true | false

Toggles logging.

verbose = true | false

Controls presence of STDOUT messages produced by clusterpunchserver. If the server is running in the background, this value has no effect.

daemon = true | false

Controls whether the server is started in the background by default

debug = true | false

Controls debug output

port = PORT

Specifies the UDP port on which servers listen and over which punches are sent.

broadcast = BROADCAST_ADDRESS

Specifies the UDP broadcast address (e.g. 10.1.2.255) to use to announce punches

timeout = TIMEOUT

Specifies the number of seconds to wait for responses from servers before client utilities stop listening.
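
Put together, the general-parameter part of a configuration file might look like the following sketch. All values are illustrative assumptions (the log directory, port number and timeout in particular); only the broadcast address is taken from the example above.

```
# general parameters (example values only)
logdir    = /var/log/clusterpunch
logging   = true
verbose   = false
daemon    = true
debug     = false
port      = 8095
broadcast = 10.1.2.255
timeout   = 3
```

Because the server runs in the background here (daemon = true), the verbose setting would have no effect, as noted above.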


Sort Types

In order to sort values by the results of punches, it's necessary to store how each punch should be treated. For example, sometimes high values are desirable (MHz) and sometimes low values are better (benchmark times). Generally, the ways a value should be treated are controlled using

  sort = ascending | descending
  alphasort = true | false

There are two punches which are sent to the server by default. These are the 'host' and the 'live' punch. The 'host' punch simply retrieves the host name and the 'live' punch simply asks the host for a 1. Since these punches are not defined in the <punch> blocks, their sorts are defined in the <sort> blocks.

In addition, any cumulative statistics that you define in the <punch> blocks should have their sorts defined in the <sort> block. A <sort> block looks like

  <sort>
   statistic = live
   sort = ascending
   format = %4d
  </sort>

The parameters understood by this block are

statistic = VALUE

The VALUE is the name of the punch or statistic for which the sort applies.

sort = ascending | descending

By default, sorts are ascending. If you want a descending sort, define it with 'sort = descending'.

format = FORMAT

The way the results are displayed is controlled by the printf-like FORMAT.

alphasort = true | false

If you want the sort to be asciibetical, set alphasort to be true. The default is a numerical sort.
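
For example, the default 'host' punch and a cumulative statistic could be given sorts as sketched below. The statistic name b_all is a hypothetical cumulative statistic, and the format widths are arbitrary choices for illustration.

```
# sort host names as strings, not as numbers
<sort>
 statistic = host
 alphasort = true
 format = %12s
</sort>

# lower benchmark totals are better, so list them first
<sort>
 statistic = b_all
 sort = ascending
 format = %6.2f
</sort>
```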


Introduction to Punches

The flexibility in clusterpunch lies in its ability to accept user-defined punches. You can define punches by specifying the Perl code, or specifying an external binary to run. The latter case is useful if you have your own benchmark tools.

A punch is defined in a <punch> block like this

  <punch>
   parameter = value
   parameter = value
   ...
  </punch>

A punch can be thought of like ... well, a punch. Think of a sweaty boxer. The idea is that you 'punch' your cluster nodes and see how 'quickly they get up'. Ok, enough with the metaphors. The punch is a mini-benchmark. It's supposed to be mini- so that you don't use up your CPU cycles only for benchmarks and so that you minimally affect other running jobs.

Some punches, such as diagnostic punches, return values. These are nothing more than requests for information. For example, a punch might ask the node to return its total MHz rating. The benchmark punches, on the other hand, are meant to be timed. They don't return any values; the sole purpose of running them is to see how responsive your nodes are at a given time.
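
A diagnostic punch of the kind described above might be sketched as follows. The punch name and statistic (mhz) are hypothetical, and the code assumes a Linux x86 node whose /proc/cpuinfo contains one "cpu MHz" line per CPU. It also assumes that the punch code runs as the body of a subroutine, so that 'return' supplies the punch value.

```
<punch>
 name = mhz
 statistic = mhz
 valuetype = return
 format = %6d
 sort = descending
 function <<CODE
   # Assumption: Linux x86, /proc/cpuinfo has lines like "cpu MHz : 2400.000"
   my $total = 0;
   open(my $fh, "<", "/proc/cpuinfo") or return 0;
   while (defined(my $line = readline($fh))) {
     # add up the clock rate of every CPU listed
     $total += $1 if $line =~ /^cpu MHz\s*:\s*([\d.]+)/;
   }
   close($fh);
   return $total;
 CODE
</punch>
```

The sort is set to descending because, for this statistic, higher values are the desirable ones.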


Sample Punches

Here is a simple punch

  <punch>
   name = punch1
   statistic = bench1
   valuetype = timer
   format = %6.2f
   function <<CODE
     for (my $i=0;$i<1e6;$i++) { rand () }
   CODE
  </punch>

This punch is called 'punch1'. The name specifies the function which is used to call the punch from utility scripts. If you wanted to have the nodes execute this punch you would use

  -c "punch1"

The 'statistic', on the other hand, is the label of the data value returned. Since 'valuetype' is set to timer, the return value will be the time taken to run this punch. The code to run is defined by way of a here-document with 'function' as the name of the parameter. The 'format' parameter defines the way the punch value will be formatted on output to the screen. In this punch, rand() is called 1,000,000 times. For example, running

  > clustersnapshot -c "punch1"

asks all the nodes to run this punch and returns

        host bench1 live
        0of8  0.406    1
        5of8  0.407    1
        1of7  0.407    1
        3of8  0.407    1
        5of7  0.407    1
        9of7  0.407    1
        ...

Notice that the return values are formatted with %6.2f and they are sorted with an ascending numerical sort (the default sort type). The 'host' and 'live' punch results are also displayed, since these are default punches that are always carried out.

Within the Perl code, you can accept input parameters using something like this

  function <<CODE
  my ($x,$y) = @_;
  $x ||= 5;
  $y ||= 10;
  ....
  CODE

and then call the punch with

  -c "punchname(10,15)"

so that x=10 and y=15. In the code definition above, x and y get the default values 5 and 10 if they are undefined, zero, or otherwise false.
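
If zero must be accepted as a legitimate argument, the defaults can be applied with a 'defined' test instead. The following is a minimal sketch with a hypothetical punch called spin; it assumes that omitted arguments reach the code as undef, which this document does not state.

```
<punch>
 name = spin
 statistic = spin
 valuetype = timer
 format = %6.2f
 function <<CODE
   my ($outer,$inner) = @_;
   # apply defaults only when an argument is missing, so an explicit 0 is kept
   $outer = 5  unless defined $outer;
   $inner = 10 unless defined $inner;
   for (my $i=0;$i<$outer;$i++) {
     for (my $j=0;$j<$inner;$j++) { rand() }
   }
 CODE
</punch>
```

Under that assumption, -c "spin(0,10)" would run the outer loop zero times, while -c "spin" would fall back to 5 and 10.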

You can define punches which make a system call using a punch like this

  <punch>
  name = punch2
  statistic  = cat
  valuetype  = timer
  system = "/bin/cat /home/martink/work/glimpse/mail/.glimpse_index | wc"
  </punch>

The command defined by 'system' will be called and timed.


Punches in Detail

The following parameters are supported within punch blocks.

name = NAME

This defines the name of the punch. The name must be unique and is used to call the punch from commands passed to utilities. If you have a punch named bob, you can call it with

  -c "bob(arg,arg);jerry"

with arguments as shown. If you also have a jerry punch defined, separate the two calls with a semicolon, as above. If bob does not take arguments, use -c "bob".

namegroup = NAME1,NAME2,NAME3,...

If you would like to split the punch value into multiple statistics, use namegroup and valuedelim. The namegroup parameter defines the names of the statistics which are populated with the punch value, split along valuedelim. For example, if the punch return value is "1:2:3", then with

  namegroup = val1,val2,val3
  valuedelim = :
 
the following statistics will be populated with values

  val1 = 1
  val2 = 2
  val3 = 3

If you have more values than statistics, any unassigned values will be discarded. The namegroup/valuedelim feature is useful when calculating a list of values is just as fast as calculating a single value. For example, when using a load monitor tool like 'atop' or 'atsar' to calculate CPU utilization, you might need to spend 2 seconds measuring CPU sys/idle/nice/user utilization. However, at the end of those 2 seconds you get all four values, which really belong in separate statistics. If you were to measure each one independently, you'd need 8 seconds.
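
A punch that fills three statistics from one return value could be sketched like this. The punch name (loads) and the statistic names are hypothetical, and the code assumes a Linux node where /proc/loadavg begins with the 1, 5 and 15 minute load averages. Whether a 'statistic' line is also required alongside namegroup is not stated here, so it is left out.

```
<punch>
 name = loads
 namegroup = load1,load5,load15
 valuedelim = :
 valuetype = return
 function <<CODE
   # Assumption: Linux, /proc/loadavg looks like "0.10 0.20 0.18 1/123 4567"
   open(my $fh, "<", "/proc/loadavg") or return "0:0:0";
   my $line = readline($fh);
   close($fh);
   my @fields = split(" ", $line);
   # join the first three fields with the delimiter named in valuedelim
   return join(":", @fields[0..2]);
 CODE
</punch>
```

A return value such as "0.10:0.20:0.18" would then be split into load1 = 0.10, load5 = 0.20 and load15 = 0.18.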

statistic = STATISTIC

The statistic is the key in the response hash populated by the value of the punch. The statistic will show up in the table produced by clustersnapshot. The statistic is also used to sort the table.

cumulative = CUMULATIVE

Sometimes you want to create virtual statistics, which are sums of other punches. For example, the benchmarks for memory, I/O and CPU subsystems have their own statistics (b_mem, b_io, b_cpu) but also add their values to b_all. Since the cumulative statistic is not defined as a punch statistic, you have to define its sort methods in a <sort> block (see above).

You can have multiple cumulative statistics, but only one per punch. Cumulative statistics are not implemented for punches which use namegroup.
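
The arrangement described above might be written as in the following sketch. The statistics b_mem, b_cpu and b_all are the ones named in the text; the punch names and the benchmark code are made-up placeholders.

```
<punch>
 name = mem
 statistic = b_mem
 cumulative = b_all
 valuetype = timer
 format = %6.2f
 function <<CODE
   my @a = (0) x 1e5;
   for (my $i=0;$i<@a;$i++) { $a[$i] = $i }
 CODE
</punch>

<punch>
 name = cpu
 statistic = b_cpu
 cumulative = b_all
 valuetype = timer
 format = %6.2f
 function <<CODE
   for (my $i=0;$i<1e6;$i++) { rand() }
 CODE
</punch>

# b_all is not a punch statistic, so its sort is defined here
<sort>
 statistic = b_all
 sort = ascending
 format = %6.2f
</sort>
```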

valuemap = FUNCTION

This is a fun parameter. You can map the punch value to another value using a function. For example, suppose the punch value (whether returned or timed) is 10. If you want to report the log of this value instead, use

  valuemap = return log($_[0])

Within the FUNCTION, the punch value is available as $_[0].
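
One detail worth guarding against is that Perl's log() dies when given zero or a negative number. The sketch below, using a hypothetical punch called logbench, maps such values to 0 instead. It assumes the configuration parser passes the '?' and ':' characters in the valuemap value through unchanged.

```
<punch>
 name = logbench
 statistic = logbench
 valuetype = timer
 # take the log only of positive values; anything else is reported as 0
 valuemap = return $_[0] > 0 ? log($_[0]) : 0
 format = %6.2f
 function <<CODE
   for (my $i=0;$i<1e6;$i++) { rand() }
 CODE
</punch>
```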

valuetype = timer | return

If you want the punch timed, use 'timer'. If you want the return value of the punch code to be used as the punch value, use 'return'. Typically, benchmark punches are timed and diagnostic punches return values. Whatever the value, timed or returned, you can modify it with 'valuemap' described above.

valuedelim = STRING

If your punch returns a list of values concatenated with STRING, you can use namegroup to store these values in different statistics. Specify the delimiter to use to split the punch return string with 'valuedelim'.

format = FORMAT

The value of the punch will be formatted using FORMAT, which is expected to have a printf-type syntax.

sort = ascending | descending

Associate a sort order with the punch statistic. By default, all sorts are ascending. If you want an ascending sort, you don't need to specify the sort type.

alphasort = true | false

By default all sorts are numerical. If you want to sort by asciibetical order, use 'alphasort = true'.

appendargs = true | false

You can append the arguments passed to a punch to its statistic by setting appendargs to true (default is false). This is useful when you are calling the same punch with different parameters in the same call. For example, load() is a punch which can take parameters to sample 1 min, 5 min and 15 min loads. The load punch statistic is 'load'. If you do not append the arguments to the statistic name, you'll overwrite the values and get the load value from the last load punch.

   bin/clustersnapshot -c "load;load(5);load(15)"
          host live load load15 load5
          0of0    1  0.1   0.18  0.20
          0of1    1  0.0   0.00  0.00

function << CODE ... CODE

Using a here-document syntax, you can define the Perl code to be executed in a punch. The code must be free of syntax errors! Please use the benchdriver utility to test your punches. Within the code you have the option of calling a logging function.

  ...
  Log("message to log");
  ...
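
In the context of a whole punch, a call to Log might look like the sketch below. The punch name (logged) and the loop are hypothetical. It is assumed that messages only reach the log files when logging is enabled and logdir is set, and that the time spent in Log counts towards the value of a timed punch.

```
<punch>
 name = logged
 statistic = logged
 valuetype = timer
 format = %6.2f
 function <<CODE
   my ($n) = @_;
   $n ||= 1e5;
   Log("logged punch starting with n=$n");
   for (my $i=0;$i<$n;$i++) { sqrt($i) }
   Log("logged punch done");
 CODE
</punch>
```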

If you want the function to accept arguments, please use @_.

  name = argpunch
  function <<CODE
  my ($x,$y) = @_;
  ...
  CODE
  
then call your punch with

  -c "argpunch(10,20)"

system = COMMAND

Instead of executing Perl code, you can make a call to the system COMMAND. This is useful if you have your own binary which you want to time.

  valuetype = timer
  system = "/bin/specialbench -param 10 &> /dev/null"

If the punch 'valuetype' is specified as 'timer' then the call to the binary will be automatically timed. If you specify 'return', however, you'll get back whatever the binary output to STDOUT. Don't pipe to /dev/null in that case!

  valuetype = return
  system = "who | wc -c"

Any leading spaces in all output lines will be stripped. Newlines will be preserved, except for the final newline.


SEE ALSO


Daemons

clusterpunchserver, clusterpunch.start, clusterpunch.shutdown


API

clusterpunch.pm


Utilities

benchdriver, clusterbench, clusterlogin, clusternodecount, clustersnapshot


CHANGES


AUTHOR

Martin Krzywinski (martink@bcgsc.ca) January 2003

$Id: clusterpunch.conf,v 1.6 2003/02/04 00:18:34 martink Exp $