latest news

Distractions and amusements, with a sandwich and coffee.


On March 14th celebrate `\pi` Day. Hug `\pi`—find a way to do it.

Those who favour `\tau=2\pi` will have to postpone celebrations until June 28th. That's what you get for thinking that `\pi` is wrong. I sympathize with this position and have `\tau` day art too!

If you're not into details, you may opt to party on July 22nd, which is `\pi` approximation day (`\pi` ≈ 22/7). It's 20% more accurate than the official `\pi` day!
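The 20% figure can be checked with a few lines of arithmetic. This is a minimal sketch that assumes "accuracy" means the absolute error of each date's value (3.14 for March 14th, 22/7 for July 22nd) relative to `\pi`.

```python
import math

# Absolute error of each "date value" relative to pi.
# Assumption: 3.14 stands for March 14th and 22/7 for July 22nd.
error_official = abs(3.14 - math.pi)
error_approx = abs(22 / 7 - math.pi)

# Fractional reduction in error when using 22/7 instead of 3.14.
error_reduction = 1 - error_approx / error_official

print(f"error of 3.14: {error_official:.6f}")
print(f"error of 22/7: {error_approx:.6f}")
print(f"22/7 reduces the error by about {error_reduction:.0%}")
```

The error drops by roughly a fifth, which matches the claim.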

Finally, if you believe that `\pi = 3`, you should read why `\pi` is not equal to 3.

For the 2014 `\pi` day, two styles of posters are available: folded paths and frequency circles.

The folded paths show `\pi` on a path that maximizes adjacent prime digits and were created using a protein-folding algorithm.

The frequency circles colourfully depict the ratio of digits in groupings of 3 or 6. Oh, look, there's the Feynman Point!

Download the HP lattice simulation binary. You'll need one of the three 2D methods; I used `rem2dm`, which does local and pull moves. If you'd like to learn more about the algorithm, read the publication.

A replica exchange Monte Carlo algorithm for protein folding in the HP model. Chris Thachuk, Alena Shmygelska and Holger H Hoos, BMC Bioinformatics 2007, 8:342 (17 Sep 2007).

Download the batch file for 64- or 768-digit folding.

When you run the 64-digit simulation, you're likely to find a path with `E=-23`, which is the lowest energy I've been able to sample. On my Intel Xeon E5540 (2.53 GHz) it takes anywhere from 1 to 30 seconds to find an `E=-23` path (there are many possible paths at this energy), depending on the random seed. Here's the output of a typical run of the 64-digit folding simulation:

    > rem2dm -seq=hppphphphhhpphphhhppphpphhphhhphphppppphppphpphhhpphphpphpppphph -maxT=220 -numLocalSteps=500 -eng=100 -maxRunTime=60 -traceFile=pi.64 -minT=160 -expID=pi.64 -numReps=10
    REMC-HP2D-M
    Begin Simulation
    0.01: Current Best Solution: -8
    0.01: Current Best Solution: -10
    0.01: Current Best Solution: -13
    0.02: Current Best Solution: -15
    0.03: Current Best Solution: -16
    0.03: Current Best Solution: -17
    0.04: Current Best Solution: -18
    0.04: Current Best Solution: -19
    0.16: Current Best Solution: -20
    0.27: Current Best Solution: -21
    0.69: Current Best Solution: -22
    36.23: Current Best Solution: -23
    Real time: 120
    ggslrrsrllssrrlrrllsrrlrrlslslrrsrlssrrsllrslrrlrsllsrsrrlsrssrs
             p--h--p
             |     |
             h--h  h--p--p--p
                |           |
       p--p     h  H  h--p--p
       |  |     |  |  |
    p--h  h--h--p  p  p--p
    |              |     |
    p--p--h  h--p  p--p  p
          |  |  |     |  |
       h--h  h  h--p--h  h--p
       |     |              |
       p--h  h  h--p--H  h--p
          |  |  |        |
          p--p  p  p--h--h
                |  |
                p  p--h--p
                |        |
                p--p--h  h
                      |  |
                      p--p
    End Simulation
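The string of `g`, `s`, `l` and `r` characters near the end of the log appears to describe the fold as a series of relative moves. The sketch below decodes it and recomputes the energy. It rests on assumptions inferred from the output rather than from the program's documentation: `g` places a residue without turning, `s` continues straight, `l` and `r` turn left and right, and the energy is the standard HP-model count of `h`-`h` lattice neighbours that are not adjacent in the sequence, negated. The function names are made up for illustration.

```python
def decode_moves(moves):
    """Convert a relative move string into (x, y) lattice coordinates.

    Assumed encoding: 'g' = place residue without turning,
    's' = straight, 'l' = turn left, 'r' = turn right.
    """
    position = (0, 0)
    heading = (1, 0)  # arbitrary start direction; energy does not depend on it
    coords = [position]
    for move in moves[1:]:
        dx, dy = heading
        if move == "l":
            heading = (-dy, dx)   # rotate 90 degrees counter-clockwise
        elif move == "r":
            heading = (dy, -dx)   # rotate 90 degrees clockwise
        elif move not in ("g", "s"):
            raise ValueError(f"unknown move {move!r}")
        position = (position[0] + heading[0], position[1] + heading[1])
        coords.append(position)
    return coords


def hp_energy(seq, coords):
    """Return minus the number of h-h contacts between non-consecutive residues."""
    if len(seq) != len(coords):
        raise ValueError("sequence and path lengths differ")
    if len(set(coords)) != len(coords):
        raise ValueError("path is not self-avoiding")
    index_at = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for (x, y), i in index_at.items():
        if seq[i] != "h":
            continue
        # Look only right and up so that each pair is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and seq[j] == "h" and abs(i - j) > 1:
                contacts += 1
    return -contacts


seq_64 = "hppphphphhhpphphhhppphpphhphhhphphppppphppphpphhhpphphpphpppphph"
moves_64 = "ggslrrsrllssrrlrrllsrrlrrlslslrrsrlssrrsllrslrrlrsllsrsrrlsrssrs"
energy_64 = hp_energy(seq_64, decode_moves(moves_64))
print(energy_64)
```

Under these assumptions the decoded path is self-avoiding and its energy agrees with the `-23` reported in the log, which suggests the reading of the move letters is right.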

If you want to apply this to a different number (e.g. φ or e), you'll need to replace the digits with either `p` or `h`. Remember, the simulation will try to group the `h`'s together. You can download 1,000,000 digits of π, φ and e.
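Since the folded paths maximize adjacent prime digits and the simulation groups the `h`'s, the mapping appears to be prime digits (2, 3, 5, 7) to `h` and everything else to `p`. The sketch below assumes that mapping; the function name is made up for illustration. It reproduces the start of the `-seq` string used in the 64-digit run.

```python
PRIME_DIGITS = set("2357")


def digits_to_hp(digits):
    """Map a digit string to an HP sequence: prime digit -> 'h', otherwise 'p'.

    Non-digit characters such as the decimal point are skipped.
    """
    return "".join(
        "h" if d in PRIME_DIGITS else "p" for d in digits if d.isdigit()
    )


pi_start = "3.14159265358979323846"
hp_start = digits_to_hp(pi_start)
print(hp_start)  # hppphphphhhpphphhhppp
```

For φ or e, pass the digits of that number instead and hand the result to `-seq`.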

The best path I could find for 768 digits is one with `E=-223`. In 1000s of simulations this solution came up only once. I also saw one path at `E=-222`. After that, there were many solutions at each of the less optimal energy levels.

If you manage to find a better one, let me know right away!

If you obtain a segmentation fault,

    > ./rem2dlm
    REMC-HP2D-LM
    Begin Simulation
    Real time: 0
    Segmentation fault

don't panic just yet. The folding binaries don't do a lot of error checking, so you have to get the input parameters correct.

For example, if you do not include the `-eng` parameter, the code will segfault.

Try one of the batch files above (64-digit batch file, 768-digit batch file) or the following simple job:

    > bin/rem2dm -seq=hhpppphhhhpppphh -maxRunTime=5 -eng 10
    REMC-HP2D-M
    Begin Simulation
    3.13877e-17: Current Best Solution: -2
    5.49284e-17: Current Best Solution: -3
    1.0201e-16: Current Best Solution: -4
    1.33398e-16: Current Best Solution: -5
    Real time: 5
    ggrllslsssrllsls
       p--p--p
       |     |
       h  h--p
       |  |
       H  h
          |
       H  h
       |  |
    p--h  h
    |     |
    p--p--p

If this segfaults, then you'll need to recompile the code (see below).
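Because the binaries do little input checking, it can help to validate the parameters before launching a run. The wrapper below is a hypothetical sketch, not part of the folding code. It assumes `-seq` and `-eng` must always be present (only the `-eng` requirement is stated above) and that flags can be written as `-name=value`, as in the 64-digit command. On POSIX systems Python reports a process killed by a signal as a negative return code, which is how the segfault is detected here.

```python
import signal
import subprocess

# Assumption: these two parameters must always be supplied.
REQUIRED_PARAMS = ("seq", "eng")


def build_command(binary, **params):
    """Return the argument list for a folding run, refusing incomplete input."""
    missing = [name for name in REQUIRED_PARAMS if name not in params]
    if missing:
        raise ValueError(f"missing required parameter(s): {', '.join(missing)}")
    return [binary] + [f"-{name}={value}" for name, value in params.items()]


def run_folding(binary, **params):
    """Run the binary and turn a segmentation fault into a readable error."""
    result = subprocess.run(
        build_command(binary, **params), capture_output=True, text=True
    )
    if result.returncode == -signal.SIGSEGV:
        raise RuntimeError("binary segfaulted: check parameters or recompile")
    return result.stdout


command = build_command(
    "bin/rem2dm", seq="hhpppphhhhpppphh", maxRunTime=5, eng=10
)
print(" ".join(command))
```

Calling `run_folding("bin/rem2dm", seq="hhpppphhhhpppphh", maxRunTime=5, eng=10)` would then run the simple job and return its log.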

Precompiled binaries are available for download directly: rem2dm, rem2dlm, rem2dpm, rem3dm, rem3dlm, rem3dpm.

If these don't work on your system, you need to recompile them. Download the protein folding code and see INSTALL.txt for compilation instructions.

The Sanctuary Project is a Lunar vault of science and art. It includes two fully sequenced human genomes, sequenced and assembled by us at Canada's Michael Smith Genome Sciences Centre.

The first disc includes a song composed by Flunk for the (eventual) trip to the Moon.

But how do you send sound to space? I describe the inspiration, process and art behind the work.

A forest of digits

Celebrate `\pi` Day (March 14th) and finally see the digits through the forest.

This year is full of botanical whimsy. A Lindenmayer system forest – deterministic but always changing. Feel free to stop and pick the flowers from the ground.

And things can get crazy in the forest.

Check out art from previous years: 2013 `\pi` Day, 2014 `\pi` Day, 2015 `\pi` Day, 2016 `\pi` Day, 2017 `\pi` Day, 2018 `\pi` Day and 2019 `\pi` Day.

*All that glitters is not gold. —W. Shakespeare*

The sensitivity and specificity of a test do not necessarily correspond to its error rate. This becomes critically important when testing for a rare condition: a positive result from a test with 99% sensitivity and specificity has an even chance of being wrong when the condition's prevalence is 1%.

We discuss the positive predictive value (PPV) and how practices such as screening can increase it.
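The even-chance figure follows from Bayes' rule. As a minimal sketch (the function name is made up for illustration), the PPV is the fraction of positive results that are true positives:

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive test result is a true positive."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# 99% sensitivity and specificity, 1% prevalence.
ppv_rare = positive_predictive_value(0.99, 0.99, 0.01)
print(f"PPV at 1% prevalence: {ppv_rare:.2f}")

# The same test applied where the condition is more common.
ppv_common = positive_predictive_value(0.99, 0.99, 0.10)
print(f"PPV at 10% prevalence: {ppv_common:.2f}")
```

At 1% prevalence the true positives (0.99 × 0.01) and false positives (0.01 × 0.99) are equally numerous, so half of all positive results are wrong. Raising the prevalence in the tested group raises the PPV.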

Altman, N. & Krzywinski, M. (2021) Points of significance: Testing for rare conditions. *Nature Methods* **18**:224–225.

*We demand rigidly defined areas of doubt and uncertainty! —D. Adams*

A popular notion about experiments is that it's good to keep variability in subjects low to limit the influence of confounding factors. This is called standardization.

Unfortunately, although standardization increases power, it can induce unrealistically low variability and lead to results that do not generalize to the population of interest and may, in fact, be irreproducible.

Not paying attention to these details and thinking (or hoping) that standardization is always good is the "standardization fallacy". In this column, we look at how standardization can be balanced with heterogenization to avoid this thorny issue.

Voelkl, B., Würbel, H., Krzywinski, M. & Altman, N. (2021) Points of significance: Standardization fallacy. *Nature Methods* **18**:5–6.

*Clear, concise, legible and compelling.*

Making a scientific graphical abstract? Refer to my practical design guidelines and redesign examples to improve organization, design and clarity of your graphical abstracts.

An in-depth look at my process of reacting to a bad figure — how I design a poster and tell data stories.