Carpalx optimizes keyboard layouts to create ones that require less effort and significantly reduce carpal strain!

# the best layout

Partially optimized QWKRFY and fully optimized QGMLWY layouts are the last word in easier typing.

# the worst layout

A fully anti-optimized TNWMLC layout is a joke and a nightmare to type. It's also the only keyboard layout that has its own fashion line.

# layouts

25 Oct 21 - Added vertical and horizontal alphabetic layouts to the layouts analysis.

19 Mar 21 - Added BEAKL 15, Hieamtsr, Colemak Mod-DH and Mtgap 2.0 layouts to the layouts analysis.

15 Mar 21 - Added the Engram layout by Arno Klein to the layouts analysis.

6 Aug 20 - The search for the world’s best keyboard layout by Paul Guerin.

4 May 20 - An interview with Bloomberg's Arianne Cohen: Splurge on a Better Keyboard, It's Worth It.

25 May 18 - The BBC article Why we can't give up this odd way of typing by Tim McDonald discusses the history and persistence of QWERTY and my Carpalx work.

16 Aug 16 - Ergonomic Keyboard Layout Designed for the Filipino Language at AHFE2016 derives a layout for the Filipino language using Carpalx.

18 Apr 16 - Carpalx layouts soon to appear in freedesktop (package xkeyboard-config) and kbd. Thanks to Perry Thompson.

# Generating Keyboard Statistics

configuration file : etc/tutorial-01.conf
output : out/tutorials/01

## Configuration

In order to generate a better keyboard layout, it is important to be able to measure the desirability of a given layout. In this section, I'll show how to use carpalx to generate statistics for a keyboard layout (finger/hand/row frequencies, and effort values).

A variety of keyboard layouts are defined in etc/keyboards/*conf. For this example, I'll use the QWERTY and Dvorak layouts. Statistics will be generated using the English training text, which comprises a variety of books from Project Gutenberg (see corpus/).

The configuration file for this tutorial is etc/tutorial-01.conf. In it, the parameters of note are

    ...
    action = loadkeyboard,loadtriads,reporteffort,quit
    ...
    corpus = ../corpus/books.txt
    mode = english
    triads_overlap = yes
    triads_min_freq = 10
    ...
    keyboard_input = keyboards/qwerty.conf
    ...

The action parameter controls what carpalx will do during the run. It is a comma-separated list of steps, which in this tutorial are

• loadkeyboard load the keyboard layout specified by keyboard_input
• loadtriads load the triads from the corpus
• reporteffort report keyboard statistics and effort values (replace this with reporteffortverybrief to limit the output to typing effort only)
• quit

Other parameters in the tutorial-01.conf configuration file are discussed in other tutorials. Depending on the value of the action parameter, not all configuration sections are used (e.g. if you don't optimize the layout, the annealing parameters are not used).
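
To make the structure of the action value concrete, here is a minimal Python sketch that treats it as an ordered, comma-separated list of steps. The helper names parse_actions and use_brief_report are hypothetical and not part of carpalx, which reads its configuration in Perl; this only illustrates the substitution mentioned above.

```python
def parse_actions(value):
    """Split a comma-separated action string into an ordered list of steps."""
    return [step.strip() for step in value.split(",") if step.strip()]


def use_brief_report(actions):
    """Swap the detailed effort report for the brief one named in the text."""
    return ["reporteffortverybrief" if a == "reporteffort" else a for a in actions]


# The action value from etc/tutorial-01.conf, as quoted in this tutorial.
actions = parse_actions("loadkeyboard,loadtriads,reporteffort,quit")
print(actions)
print(",".join(use_brief_report(actions)))
```

The second printed line is the value you would put in the action parameter to get the shorter report.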

## QWERTY Statistics

Run carpalx to carry out the actions specified in the action parameter. By default, carpalx will look for a configuration file carpalx.conf, so you'll need to override this with -conf to use the tutorial configuration.

Detailed stats take a while (1-2 minutes) to compute. You can use reporteffortverybrief in the action parameter to reduce the detail in the layout report.

    > bin/carpalx -conf etc/tutorial-01.conf > out/tutorials/01/out.txt

The output will contain both human-readable data (for you) and machine-readable data (for your Perl scripts). The latter appears in lines prefixed with "#" and can be parsed into a data structure using eval(). Let's take a look at the human-readable sections first.
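
If you want to separate the two kinds of lines before reading the report, a minimal Python sketch could look like the following. It assumes only what is stated above (machine-readable lines start with "#"); the function name split_report and the sample lines are made up for illustration, and evaluating the machine-readable part still requires Perl.

```python
def split_report(text):
    """Return (human_lines, machine_lines) from carpalx report text."""
    human, machine = [], []
    for line in text.splitlines():
        if line.startswith("#"):
            machine.append(line[1:].strip())  # drop the "#" prefix
        else:
            human.append(line)
    return human, machine


# Made-up sample lines for illustration, not real carpalx output.
sample = "Keyboard effort\n# {'placeholder' => 1}\nall 3.000 100.0 100.0\n"
human, machine = split_report(sample)
print(len(human), "human-readable lines;", len(machine), "machine-readable lines")
```

To apply this to the tutorial output, read out/tutorials/01/out.txt and pass its contents to split_report.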

### keyboard effort

The keyboard effort is the total typing effort required to type the corpus. This value is normalized to the size of the corpus. It is formed by a sum of components that measure different aspects of typing. These components are base (b), penalty (p) and stroke (s). In turn, the penalty component is a combination of hand penalties (ph), row penalties (pr) and finger penalties (pf). Each effort is reported as three values: absolute, relative and cumulative.

The first lines (k1, k2 and k3) report the total base+penalty effort contributions from the k1, k2, k3 triad weight components (stroke path is not included because it does not depend on k1,k2,k3). The line k1 is the total effort when k2=k3=0. Similarly the line k1,k2 reports the effort for k3=0.

The total effort (3.000) is a useful value when comparing different layouts (you want this value to be as small as possible). Effort components (e.g. b, p, s) should be used to compare layouts of similar effort.

The model parameters have been adjusted so that the QWERTY total effort is 3, with a ratio of b:p:s of 1:1:1.

    Keyboard effort
    ------------------------------------------------------------
    k1          1.236   61.8   61.8
    k1,k2       1.817   29.1   90.9
    k1,k2,k3    2.000    9.1  100.0
    b           1.000   33.3   33.3
    p           1.000   33.3  200.0
    ph          0.000    0.0    0.0
    pr          0.408   40.8   40.8
    pf          0.408   40.8   81.7
    s           1.000   33.3  100.0
    all         3.000  100.0  100.0
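
The arithmetic behind the total and the relative column can be checked with a few lines of Python. This is only a sketch that uses the b, p and s values reported for QWERTY; the function name effort_shares is hypothetical and not part of carpalx.

```python
def effort_shares(components):
    """Return the total effort and each component's share of it in percent."""
    total = sum(components.values())
    shares = {name: round(100 * value / total, 1) for name, value in components.items()}
    return total, shares


# Absolute b, p and s values from the QWERTY report.
total, shares = effort_shares({"b": 1.000, "p": 1.000, "s": 1.000})
print(total, shares)

# The k1 line is relative to the base+penalty effort (the k1,k2,k3 line).
k1_share = round(100 * 1.236 / 2.000, 1)
print(k1_share)
```

The three equal shares correspond to the 1:1:1 ratio of b:p:s, and the k1 share matches the relative value on the k1 line.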

### utilization frequencies

Frequency tables indicate the number of times a given row, hand or finger was used in typing the corpus.

Frequency tables are useful to determine whether the keyboard is globally balanced across the entire length of the corpus. For example, you would probably prefer a layout that limits the use of the pinky fingers (indexed 0,9).

    keyboard row frequency
    ------------------------------------------------------------
     1  4666364   51.0   51.0
     2  3142011   34.3   85.3
     3  1342790   14.7  100.0

    keyboard hand frequency
    ------------------------------------------------------------
     0  5274593   57.6   57.6
     1  3876572   42.4  100.0

    keyboard finger frequency
    ------------------------------------------------------------
     0   764404    8.4    8.4
     1   805304    8.8   17.2
     2  1737866   19.0   36.1
     3  1967019   21.5   57.6
     6  1915521   20.9   78.6
     7   717450    7.8   86.4
     8  1087727   11.9   98.3
     9   155874    1.7  100.0
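
Each table row lists a count, its percentage of the total, and the cumulative percentage. As a minimal sketch of how the last two columns follow from the counts, the Python below recomputes the row frequency table from its first column. The function name frequency_table is hypothetical.

```python
def frequency_table(counts):
    """Return rows of (key, count, percent, cumulative percent)."""
    total = sum(counts.values())
    rows, running = [], 0
    for key, count in counts.items():
        running += count
        rows.append((key, count,
                     round(100 * count / total, 1),
                     round(100 * running / total, 1)))
    return rows


# Row counts copied from the QWERTY row frequency table.
for row in frequency_table({1: 4666364, 2: 3142011, 3: 1342790}):
    print(row)
```

The row and hand counts both sum to 9151165 strokes, which is a quick consistency check on the report.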

### utilization run lengths

The next set of tables are the run length tables. These characterize the keyboard locally by reporting the number of consecutive uses of the same hand, finger and row.

First, let's look at the left hand, right hand and hand (either left or right) run lengths. The values here show the number of times the same hand was used consecutively. For a balanced layout, hands alternate, with the majority of strokes using a hand for at most two consecutive strokes.

In QWERTY, the left hand is used for <=2 consecutive strokes 68.5% of the time. The left and right sides of QWERTY are not balanced well, since the right-hand run length is better, with <=2 consecutive strokes 82.9% of the time. When run length is not categorized by hand identity, QWERTY has a <=2 hand run length of 75.7% (this means that 100% - 75.7% = 24.3% of the time the same hand is used for three or more strokes in a row).

    keyboard left hand run length
    ------------------------------------------------------------
     1  1027393   42.1   42.1
     2   642888   26.4   68.5
     3   346685   14.2   82.7
     4   194523    8.0   90.6
     5   104606    4.3   94.9
     6    59547    2.4   97.4
     7    30827    1.3   98.6
     8    15744    0.6   99.3
     9     8116    0.3   99.6
    10     4217    0.2   99.8
    11     2414    0.1   99.9
    12     1268    0.1   99.9
    13      711    0.0  100.0
    14      349    0.0  100.0
    15      223    0.0  100.0
    16      124    0.0  100.0
    17       43    0.0  100.0
    18       42    0.0  100.0
    19       19    0.0  100.0
    20       10    0.0  100.0
    21        5    0.0  100.0
    22        5    0.0  100.0
    23        1    0.0  100.0
    24        1    0.0  100.0
    25        2    0.0  100.0

    keyboard right hand run length
    ------------------------------------------------------------
     1  1480183   60.7   60.7
     2   543223   22.3   82.9
     3   268231   11.0   93.9
     4    93004    3.8   97.7
     5    34058    1.4   99.1
     6    12889    0.5   99.7
     7     5053    0.2   99.9
     8     1820    0.1   99.9
     9      756    0.0  100.0
    10      308    0.0  100.0
    11      123    0.0  100.0
    12       59    0.0  100.0
    13       22    0.0  100.0
    14       16    0.0  100.0
    15        9    0.0  100.0
    16        4    0.0  100.0
    22        2    0.0  100.0
    31        1    0.0  100.0

    keyboard hand run length
    ------------------------------------------------------------
     1  2507576   51.4   51.4
     2  1186111   24.3   75.7
     3   614916   12.6   88.3
     4   287527    5.9   94.2
     5   138664    2.8   97.0
     6    72436    1.5   98.5
     7    35880    0.7   99.3
     8    17564    0.4   99.6
     9     8872    0.2   99.8
    10     4525    0.1   99.9
    11     2537    0.1   99.9
    12     1327    0.0  100.0
    13      733    0.0  100.0
    14      365    0.0  100.0
    15      232    0.0  100.0
    16      128    0.0  100.0
    17       43    0.0  100.0
    18       42    0.0  100.0
    19       19    0.0  100.0
    20       10    0.0  100.0
    21        5    0.0  100.0
    22        7    0.0  100.0
    23        1    0.0  100.0
    24        1    0.0  100.0
    25        2    0.0  100.0
    31        1    0.0  100.0
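
To illustrate what a hand run is, here is a minimal Python sketch that counts hand run lengths for a short string. It is not carpalx's implementation: the LEFT and RIGHT letter sets are my assumption of the standard QWERTY hand assignment, the function name hand_run_lengths is hypothetical, and characters outside these sets are simply skipped.

```python
from collections import Counter
from itertools import groupby

LEFT = set("qwertasdfgzxcvb")   # assumed left-hand letters on QWERTY
RIGHT = set("yuiophjklnm")      # assumed right-hand letters on QWERTY


def hand_run_lengths(text):
    """Count runs of consecutive strokes made with the same hand."""
    hands = []
    for ch in text.lower():
        if ch in LEFT:
            hands.append("L")
        elif ch in RIGHT:
            hands.append("R")
        # any other character is ignored in this simplified sketch
    # groupby yields one group per run of identical hand labels
    return Counter(len(list(group)) for _, group in groupby(hands))


runs = hand_run_lengths("statement")
print(sorted(runs.items()))
```

For "statement", the first five letters are all typed with the left hand, giving one run of length 5 followed by four runs of length 1.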

Similar run length tables are generated for rows and fingers.

While hand-alternation is desirable (i.e. short hand runs should be frequent), row run lengths should be as long as possible for the home row (many consecutive uses of this row), long for the top row and short for the bottom row (I consider the bottom row to be less accessible than the top row). Here QWERTY shows this desired pattern. The top and home rows are nicely balanced (<=2 run length 78% for top and 91% for home). More importantly, the bottom row has few long runs (<=2 run length 100%).

    keyboard top row run length
    ------------------------------------------------------------
     1  1462742   55.0   55.0
     2   618309   23.2   78.2
     3   326941   12.3   90.5
     4   145044    5.5   95.9
     5    60202    2.3   98.2
     6    26909    1.0   99.2
     7    11241    0.4   99.6
     8     5189    0.2   99.8
     9     2308    0.1   99.9
    10     1119    0.0  100.0
    11      496    0.0  100.0
    12      230    0.0  100.0
    13       90    0.0  100.0
    14       41    0.0  100.0
    15       23    0.0  100.0
    16        6    0.0  100.0
    17        5    0.0  100.0
    18        3    0.0  100.0
    19        2    0.0  100.0
    20        1    0.0  100.0
    26        1    0.0  100.0
    42        1    0.0  100.0

    keyboard home row run length
    ------------------------------------------------------------
     1  1540936   67.6   67.6
     2   541616   23.8   91.4
     3   140152    6.2   97.5
     4    38961    1.7   99.3
     5    12008    0.5   99.8
     6     3684    0.2   99.9
     7     1030    0.0  100.0
     8      222    0.0  100.0
     9       77    0.0  100.0
    10       22    0.0  100.0
    11       10    0.0  100.0
    12        3    0.0  100.0
    13        2    0.0  100.0
    15        1    0.0  100.0

    keyboard bottom row run length
    ------------------------------------------------------------
     1  1244566   93.9   93.9
     2    80811    6.1  100.0
     3      369    0.0  100.0
     4       25    0.0  100.0
     5        2    0.0  100.0
    10        4    0.0  100.0

    keyboard row run length
    ------------------------------------------------------------
     1  4248244   67.8   67.8
     2  1240736   19.8   87.6
     3   467462    7.5   95.1
     4   184030    2.9   98.0
     5    72212    1.2   99.2
     6    30593    0.5   99.6
     7    12271    0.2   99.8
     8     5411    0.1   99.9
     9     2385    0.0  100.0
    10     1145    0.0  100.0
    11      506    0.0  100.0
    12      233    0.0  100.0
    13       92    0.0  100.0
    14       41    0.0  100.0
    15       24    0.0  100.0
    16        6    0.0  100.0
    17        5    0.0  100.0
    18        3    0.0  100.0
    19        2    0.0  100.0
    20        1    0.0  100.0
    26        1    0.0  100.0
    42        1    0.0  100.0
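
The "<=2 run length" figures quoted above are just the cumulative percentage at run length 2. A minimal Python sketch, using the bottom row counts from the report and a hypothetical helper name share_up_to, shows the calculation.

```python
def share_up_to(histogram, limit):
    """Percentage of runs whose length is at most `limit`."""
    total = sum(histogram.values())
    short = sum(count for length, count in histogram.items() if length <= limit)
    return 100 * short / total


# Bottom row run lengths copied from the report (run length -> count).
bottom_row = {1: 1244566, 2: 80811, 3: 369, 4: 25, 5: 2, 10: 4}
print(round(share_up_to(bottom_row, 1), 1))
print(round(share_up_to(bottom_row, 2), 1))
```

Note that the 100% reported for the bottom row is a rounded value: 400 of the 1325777 bottom row runs are longer than two strokes.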

The finger run lengths are not classified by finger to keep the statistics output reasonably brief. QWERTY shows a <=2 run length of 99.0% for fingers. In fact, 88.6% of the time adjacent strokes use different fingers.

    keyboard finger run length
    ------------------------------------------------------------
     1  7542512   88.6   88.6
     2   885810   10.4   99.0
     3    76634    0.9   99.9
     4     8667    0.1  100.0
     5      938    0.0  100.0
     6      225    0.0  100.0
     7      123    0.0  100.0
     8        6    0.0  100.0
     9        6    0.0  100.0
    10        1    0.0  100.0
    11        3    0.0  100.0
    12        1    0.0  100.0
    13        1    0.0  100.0
    20        1    0.0  100.0

The final keyboard characterization is the same-hand row jump length. This is a more complex statistic which sums the vertical distance traveled by all fingers during a hand run. For example, consider the string qzwsjumk. The first 4 characters (qzws) use the left hand. In this hand run, the row jump lengths are +1 (home->top for q), +2 (top->bottom for z from q), +2 (bottom->top for w from z) and +1 (top->home for s from w). Thus the jump length for this run is 1+2+2+1=6. Similarly, the jump length for jumk is 0+1+2+1=4. Upward (e.g. bottom->top) and downward (e.g. top->bottom) distances are weighted identically.

    keyboard same-hand row jump length
    ------------------------------------------------------------
     1  4479002   68.1   68.1
     2  1013617   15.4   83.5
     3   678257   10.3   93.8
     4   178933    2.7   96.5
     5   155368    2.4   98.8
     6    36709    0.6   99.4
     7    25366    0.4   99.8
     8     7985    0.1   99.9
     9     4072    0.1  100.0
    10      899    0.0  100.0
    11      645    0.0  100.0
    12      124    0.0  100.0
    13       77    0.0  100.0
    14       14    0.0  100.0
    15       17    0.0  100.0
    16        3    0.0  100.0
    17        3    0.0  100.0
    18        1    0.0  100.0
    19        1    0.0  100.0
    26        1    0.0  100.0

The reason for the same-hand row jump run length is to quantify the number of highly undesirable typing strokes, such as "eve" or "imo" in which fingers of the same hand are required to traverse large distances. The row jump length could be replaced by the total finger travel distance for a hand run.
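
The qzwsjumk example can be reproduced with a short Python sketch. It follows the worked example rather than carpalx's actual code: each hand run is assumed to start from the home row, rows are numbered 1 (top), 2 (home) and 3 (bottom), and the LEFT letter set is my assumption of the standard QWERTY hand assignment. The function name row_jump_lengths is hypothetical.

```python
from itertools import groupby

ROWS = {1: "qwertyuiop", 2: "asdfghjkl", 3: "zxcvbnm"}  # top, home, bottom
ROW_OF = {ch: row for row, letters in ROWS.items() for ch in letters}
LEFT = set("qwertasdfgzxcvb")  # assumed left-hand letters on QWERTY
HOME_ROW = 2


def row_jump_lengths(text):
    """Return the row jump length of each same-hand run in `text`."""
    letters = [ch for ch in text.lower() if ch in ROW_OF]
    jumps = []
    # group consecutive letters typed by the same hand
    for _, run in groupby(letters, key=lambda ch: ch in LEFT):
        previous, total = HOME_ROW, 0  # each run starts from the home row
        for ch in run:
            total += abs(ROW_OF[ch] - previous)  # up and down weigh the same
            previous = ROW_OF[ch]
        jumps.append(total)
    return jumps


print(row_jump_lengths("qzwsjumk"))  # [6, 4]
```

With the same assumptions, "eve" gives a jump length of 5 (1+2+2), which is why such sequences count as undesirable.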

### corpus character frequency

This table reports the character frequency in the corpus. If it is an adequate English corpus, it should recapitulate the frequencies below quite closely for the first 10 characters or so. Indeed, any English corpus whose top characters are not e,t,a,o,n should be treated with caution.

    corpus character frequency
    e   1189270   12.4   12.4
    t    857621    8.9   21.4
    a    772047    8.1   29.4
    o    743158    7.8   37.2
    i    664716    6.9   44.1
    n    659446    6.9   51.0
    h    601534    6.3   57.2
    s    600522    6.3   63.5
    r    551354    5.8   69.3
    d    428936    4.5   73.7
    l    395151    4.1   77.9
    u    278222    2.9   80.8
    m    257890    2.7   83.5
    w    230357    2.4   85.9
    c    228885    2.4   88.2
    f    208786    2.2   90.4
    y    202083    2.1   92.5
    g    192882    2.0   94.5
    p    158583    1.7   96.2
    b    149480    1.6   97.8
    v     91308    1.0   98.7
    k     78419    0.8   99.5
    j     14421    0.2   99.7
    x     14308    0.1   99.8
    q     10285    0.1   99.9
    z      6129    0.1  100.0
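
If you want to check your own corpus against this table before using it, a minimal Python sketch such as the one below counts letter frequencies. It is independent of carpalx; the function name letter_frequencies is hypothetical and the sample string is only a stand-in for a real corpus.

```python
from collections import Counter


def letter_frequencies(text):
    """Return (letter, count, percent) tuples, most frequent first."""
    counts = Counter(ch for ch in text.lower() if "a" <= ch <= "z")
    total = sum(counts.values())
    return [(ch, n, round(100 * n / total, 1)) for ch, n in counts.most_common()]


# A stand-in sentence, far too short to be representative of English.
sample = "the quick brown fox jumps over the lazy dog"
for letter, count, percent in letter_frequencies(sample)[:3]:
    print(letter, count, percent)
```

On a real corpus, read the file contents and pass them in place of sample, then compare the leading letters with the table.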