music + art
Exquisitely detailed gigapixel 1-bit maps of the Moon (6,733 locations), Solar System (772,063 things) and the Northern and Southern skies (113,743,599 stars, 162,252 deepsky objects, 4,009 exoplanets).

# There is no sound in space, but there is music

The first 12 seconds of a 1-bit encoding of a 128 mel 3-bit spectrogram of Flunk's Down Here / Moon Above

## 1 · Music as an image

The Sanctuary discs are 10 cm sapphire wafers. Each disc has about 3 billion 1.4 micron pixels that each store 1 bit of information: the pixel is either on or off. Reading the information off the disc is easy: just look at the pixels. Very closely. In other words, each disc is a very high resolution image.

To send music (or any kind of data), we need to convert it to an image. Enter the spectrogram.

## 2 · The spectrogram

A spectrogram is a 2-dimensional representation of sound. The x-axis is time and the y-axis is frequency. At each $(x,y)$ position, the strength of frequency $f=y$ at time $t=x$ is encoded by the brightness of a pixel. This is a way to "draw the sound" so that it can be decoded back.
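To make the idea concrete, here is a minimal sketch of a spectrogram computed in pure Python with a naive discrete Fourier transform. It is not the code used for the discs (that analysis uses librosa); the test tone, sample rate, window size and the function name `spectrogram` are all assumptions chosen for illustration.

```python
import cmath
import math

def spectrogram(samples, win, hop):
    """Return a list of frames; each frame holds magnitudes for bins 0..win//2."""
    # A Hann window tapers each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / win) for n in range(win)]
    frames = []
    for start in range(0, len(samples) - win + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(win)]
        mags = []
        for k in range(win // 2 + 1):
            # Naive DFT: correlate the chunk with a complex sinusoid at bin k.
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / win)
                      for n in range(win))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

# Hypothetical test signal: 0.1 s of a 1 kHz tone sampled at 8 kHz.
SR = 8000
WIN = 64
tone = [math.sin(2 * math.pi * 1000 * n / SR) for n in range(800)]
spec = spectrogram(tone, win=WIN, hop=32)

# Bin k corresponds to frequency k * SR / WIN, so the tone should peak in one bin.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), "frames x", len(spec[0]), "bins; peak at", peak_bin * SR / WIN, "Hz")
```

Each frame is one column of the image and each bin is one row; the magnitude becomes the pixel brightness.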

The National Music Centre has an excellent short tutorial on how to interpret spectrograms. And if you're a birder, then spectrograms of bird calls won't be new to you.

And while I realize that aliens (almost certainly) and future humans (quite possibly) might not perceive sound in the same way, I see this as a minor point. I'm sure they'll work it out. You know... science.

I am indebted to Tim Sainburg for providing assistance and code. The analysis uses the librosa music and audio analysis Python library.

## 3 · Down Here / Moon Above

This song was written for the Moon. It sounds best there.

Flunk's History of Everything Ever album contains two versions of the song: a final vocal mix and an instrumental version, which you get with the purchase of the album.

But there is another. In the intermediate version, when the lyrics weren't quite finalized, Anja sang incomplete phrases and loose vocalizations.

We called this the "gibberish" version and even though the final song was ready before the discs were created, we thought gibberish made for the perfect space language.

Pretend you're an alien or human from the future and give it a listen:

## 4 · Encoding the song as an image

If we had all the pixels on the Moon, we would encode the spectrogram with a large number of frequencies (e.g. $n = 1,024$) with very fine sampling of time (e.g. 5 ms). The song is about 4 minutes, so this would require an image of 48,000 × 1,024 × 8. The last factor of 8 is for the 8-bit encoding of each pixel.

Although this represents only about 13% of the capacity of a disc (about 200 Mb of genome sequence using our encoding) it's more than we had to spare. There weren't enough pixels on 4 discs to write 4 genomes and the proteome and instructions and the song.
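The arithmetic behind this estimate can be checked with a few lines of Python. All figures come from the text (a 4 minute song, 5 ms steps, 1,024 frequencies, 8 bits per pixel, about 3 billion pixels per disc, two pixels per base); the variable names are mine.

```python
song_seconds = 4 * 60          # "about 4 minutes"
time_step_ms = 5               # very fine sampling of time
n_freqs = 1024                 # large number of frequencies
bits_per_pixel = 8             # 8-bit encoding of each spectrogram pixel
disc_pixels = 3_000_000_000    # about 3 billion 1-bit pixels per disc
pixels_per_base = 2            # two 1-bit pixels encode one base

n_steps = song_seconds * 1000 // time_step_ms
total_bits = n_steps * n_freqs * bits_per_pixel
fraction_of_disc = total_bits / disc_pixels
bases_displaced = total_bits // pixels_per_base

print(f"{n_steps:,} time steps")
print(f"{total_bits:,} disc pixels")
print(f"{fraction_of_disc:.1%} of one disc")
print(f"{bases_displaced / 1e6:.1f} Mb of sequence")
```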

It was clear that I needed a reasonably small spectrogram. There are two ways to achieve this: larger time bins and fewer frequencies. It turns out that a 50 ms time window that stepped along every 20 ms was sufficient — the music didn't have a lot of fast notes. To make the most out of the frequencies I used a psychoacoustic scale.
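As a rough sketch, the window and step translate into sample counts and an image width as follows. The 44,100 Hz sample rate is an assumption (the source audio rate is not stated here); the 11,688 frame count is the spectrogram width quoted in the figure captions.

```python
sample_rate = 44_100   # assumption: CD-quality audio
win_ms = 50            # length of each analysis window
hop_ms = 20            # step between consecutive windows

win_length = sample_rate * win_ms // 1000   # samples per window
hop_length = sample_rate * hop_ms // 1000   # samples per step
overlap = 1 - hop_ms / win_ms               # fraction shared by adjacent windows

n_frames = 11_688                           # spectrogram width from the captions
duration_s = n_frames * hop_ms / 1000       # one image column per step

print(win_length, "samples per window,", hop_length, "samples per step")
print(f"{overlap:.0%} overlap between neighbouring windows")
print(f"{n_frames:,} columns cover {duration_s:.2f} s of music")
```

Because the step is shorter than the window, neighbouring columns share audio, which smooths the image along the time axis without adding columns.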

The mel scale is based on psychoacoustics. It is a logarithmic frequency scale and reflects the fact that we can discriminate low frequencies better than high ones. In other words, to faithfully reproduce sound you need to include more of the low frequencies than high frequencies.

The conversion from $f$ in hertz to $m$ in mels is $m = k_0 \log(1+f/k_1)$, where $k_0$ and $k_1$ are constants. Because mels are very efficient at spacing frequencies based on perception, I can get away with using very few mels! A-mel-zing!
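Here is a small sketch of the conversion and of how mel spacing distributes frequencies. The text does not give the constants, so I assume one widely used convention: $k_0 = 2595$ and $k_1 = 700$ with a base-10 logarithm. Other conventions exist (librosa defaults to a different, piecewise formula), and the 0 to 8,000 Hz range below is hypothetical.

```python
import math

K0 = 2595.0   # assumed constant (common convention, base-10 log)
K1 = 700.0    # assumed constant, in Hz

def hz_to_mel(f):
    """Convert a frequency in Hz to mels."""
    return K0 * math.log10(1 + f / K1)

def mel_to_hz(m):
    """Invert hz_to_mel."""
    return K1 * (10 ** (m / K0) - 1)

def mel_centers(n_mels, f_min, f_max):
    """Frequencies (Hz) spaced evenly on the mel scale between f_min and f_max."""
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_mels - 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n_mels)]

centers = mel_centers(128, 0.0, 8000.0)   # hypothetical frequency range
low_gap = centers[1] - centers[0]         # spacing at the bottom of the range
high_gap = centers[-1] - centers[-2]      # spacing at the top of the range
print(f"spacing near 0 Hz: {low_gap:.1f} Hz, near 8 kHz: {high_gap:.1f} Hz")
```

Equal steps in mels give closely packed bands at low frequencies and widely spaced bands at high frequencies, which is where the savings come from.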

### 4.1 · 512 mel spectrogram

I started with 512 mels and 1, 2, 3, 4 and 8 bits per mel. In this encoding, each pixel, which encodes how much of each mel is present in the sound, can have one of $2^b$ values (e.g. in the 3-bit encoding we can have up to 8 values).
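A minimal sketch of $b$-bit quantization, assuming each spectrogram value has already been normalized to the range 0 to 1 (how the original values were scaled is not described here). The function names are hypothetical.

```python
def quantize(value, bits):
    """Map a value in [0, 1] to an integer level in 0 .. 2**bits - 1."""
    levels = 2 ** bits
    clipped = min(max(value, 0.0), 1.0)
    # The top edge (value == 1.0) would land on `levels`, so cap it.
    return min(int(clipped * levels), levels - 1)

def dequantize(level, bits):
    """Return the midpoint of the bin that `level` represents."""
    return (level + 0.5) / 2 ** bits

for bits in (1, 2, 3, 4, 8):
    values = [i / 1000 for i in range(1001)]
    worst = max(abs(dequantize(quantize(v, bits), bits) - v) for v in values)
    print(f"{bits}-bit: {2 ** bits:3d} levels, worst-case error {worst:.4f}")
```

Each extra bit halves the worst-case error, and at 1 bit the encoding reduces to a simple threshold: a frequency is either present or absent.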

512 mel 3-bit encoding of Down Here / Moon Above by Flunk. The original is 11,688 × 512 pixels, shown here sliced into 10 rows and resized to 600 × 1000.

Optimizing the number of bits is really important because I didn't have that much spare space on the discs. Every pixel of music took away from a pixel of genome information. Each bit of each mel requires 11,688 pixels. Thus, going from 3 bits to 4 bits in a 512 mel encoding required an additional 5,984,256 pixels. Two pixels encoded a base, so this corresponded to about 3 Mb of sequence.
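The cost of one extra bit can be worked out directly from the numbers in the text. This is just the arithmetic, written out; the function name is mine.

```python
N_FRAMES = 11_688      # spectrogram width in pixels
PIXELS_PER_BASE = 2    # two 1-bit pixels encode one base

def pixels_per_added_bit(n_mels):
    """Disc pixels needed for each additional bit of depth."""
    return N_FRAMES * n_mels

extra_512 = pixels_per_added_bit(512)
extra_128 = pixels_per_added_bit(128)

print(f"512 mels: {extra_512:,} pixels per bit = {extra_512 // PIXELS_PER_BASE:,} bases")
print(f"128 mels: {extra_128:,} pixels per bit = {extra_128 // PIXELS_PER_BASE:,} bases")
```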

Here is what the decoding of each spectrogram sounds like. This verifies whether the music is reasonably preserved during the encoding-decoding process.

You can hear that the 8-bit and 4-bit encodings are very good. Remember, we're talking about music on the Moon here, so manage your expectations.

The 3-bit encoding is great. This is the bit sweet spot.

The 2-bit encoding isn't awesome but it's not horrible. You can definitely make out the music and lyrics but there's a warble to the sound.

The 1-bit encoding amazingly still sounds like something. It's very ghostly. The 1-bit encoding is binary — it stores whether a frequency exists at a given point in time or not. All frequencies have the same strength. I imagine this is what music in space sounds like.

### 4.2 · 128 mel spectrogram

The 512 mel 3-bit encoding took 17,952,768 pixels (11,688 × 512 × 3), which was about 9 Mb of sequence. Could we do better?

128 mel 3-bit encoding of Down Here / Moon Above by Flunk. The original is 11,688 × 128 pixels, shown here sliced into 10 rows and resized to 600 × 250.

It turns out that 128 mels is all we need. Well, maybe not all we need but all we can get! And while the 128 mel 1-bit and 2-bit encodings are sketchy, the 3-bit is amazingly good.

Just think about how little information is being stored here. For each 20 ms of music, we have 128 frequencies, each of which is specified by one of 8 discrete volume levels (because we have only 3 bits).
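To put a number on "how little", here is the data rate implied by those figures. Everything follows from values in the text (128 mels, 3 bits, 20 ms steps, 11,688 columns, two pixels per base).

```python
n_mels = 128
bits = 3
hop_ms = 20
n_frames = 11_688
pixels_per_base = 2

bits_per_frame = n_mels * bits                  # one spectrogram column
bits_per_second = bits_per_frame * 1000 // hop_ms
total_bits = bits_per_frame * n_frames          # equals 1-bit pixels on the disc
total_bytes = total_bits // 8
bases_displaced = total_bits // pixels_per_base

print(f"{bits_per_frame} bits per 20 ms column")
print(f"{bits_per_second:,} bits per second of music")
print(f"{total_bits:,} pixels in total ({total_bytes:,} bytes)")
print(f"{bases_displaced / 1e6:.2f} Mb of sequence displaced")
```

The whole song fits in roughly half a megabyte, with no compression beyond the choice of mels, bits and time step.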

## 5 · Decoding instructions

Instructions on how to decode the spectrogram.

## 6 · Spectrogram on the disc

Because the discs are a 1-bit medium, to store each pixel of the 128 mel 3-bit spectrogram, I needed 3 pixels. This was done by taking the 3-bit pixel and representing it as a column of 3 1-bit pixels. Don't worry, everything is explained in the very clear instructions on the discs.
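The column-of-three layout can be sketched as follows. The bit order within each column is an assumption (most significant bit on top); the actual order is defined by the instructions on the discs. Function names are hypothetical.

```python
def to_bit_rows(image, bits=3):
    """Expand each row of multi-bit pixels into `bits` rows of 1-bit pixels.

    `image` is a list of rows; each row is a list of integer levels.
    Assumption: the most significant bit goes in the top row of each group.
    """
    out = []
    for row in image:
        for shift in range(bits - 1, -1, -1):
            out.append([(value >> shift) & 1 for value in row])
    return out

def from_bit_rows(bit_image, bits=3):
    """Reassemble multi-bit pixels from groups of `bits` 1-bit rows."""
    out = []
    for top in range(0, len(bit_image), bits):
        group = bit_image[top:top + bits]
        row = []
        for column in zip(*group):
            value = 0
            for bit in column:
                value = (value << 1) | bit
            row.append(value)
        out.append(row)
    return out

# Tiny example: 2 mels x 3 time steps of 3-bit levels.
small = [[0, 5, 7],
         [3, 1, 6]]
packed = to_bit_rows(small)
for bit_row in packed:
    print(bit_row)
restored = from_bit_rows(packed)
```

Applied to the full spectrogram, the 128 rows of 3-bit pixels become 128 × 3 = 384 rows of 1-bit pixels, still 11,688 columns wide.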

Below is the final spectrogram as it appears on the disc, shown here wrapped into 19 rows of 600 pixels, each of which corresponds to 12 seconds of music.

The final 128 mel 3-bit spectrogram encoded in a 1-bit image.
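Wrapping a long, thin image into stacked strips can be sketched like this. At 20 ms per column, 600 columns span 12 seconds. Note that 11,688 columns fill 19 complete strips with 288 columns left over; how the remainder is laid out on the disc is not stated here, so this sketch simply pads a final partial strip, which is my assumption.

```python
def wrap(image, width, pad=0):
    """Cut a wide image into strips of `width` columns and stack them top to bottom.

    `image` is a list of equal-length rows. The last strip is padded with `pad`
    if the image width is not a multiple of `width` (an assumption of this sketch).
    """
    total = len(image[0])
    wrapped = []
    for left in range(0, total, width):
        for row in image:
            piece = row[left:left + width]
            wrapped.append(piece + [pad] * (width - len(piece)))
    return wrapped

# Tiny example: a 2 x 7 image wrapped at 3 columns per strip.
demo = [[1, 2, 3, 4, 5, 6, 7],
        [8, 9, 10, 11, 12, 13, 14]]
wrapped_demo = wrap(demo, 3)

hop_ms = 20
seconds_per_strip = 600 * hop_ms / 1000
full_strips, leftover = divmod(11_688, 600)
print(seconds_per_strip, "seconds per strip;", full_strips, "full strips,", leftover, "columns left over")
```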
news + thoughts

# Annals of Oncology cover

Wed 14-09-2022

My cover design on the 1 September 2022 Annals of Oncology issue shows 570 individual cases of difficult-to-treat cancers. Each case shows the number and type of actionable genomic alterations that were detected and the length of therapies that resulted from the analysis.

An organic arrangement of 570 individual cases of difficult-to-treat cancers showing genomic changes and therapies. Appears on Annals of Oncology cover (volume 33, issue 9, 1 September 2022).

Pleasance E et al. Whole-genome and transcriptome analysis enhances precision cancer treatment options (2022) Annals of Oncology 33:939–949.

My Annals of Oncology 570 cancer cohort cover (volume 33, issue 9, 1 September 2022). (more)

Browse my gallery of cover designs.

A catalogue of my journal and magazine cover designs. (more)

# Survival analysis—time-to-event data and censoring

Fri 05-08-2022

Love's the only engine of survival. —L. Cohen

We begin a series on survival analysis in the context of its two key complications: skew (which calls for the use of probability distributions, such as the Weibull, that can accommodate skew) and censoring (required because we almost always fail to observe the event in question for all subjects).

We discuss right, left and interval censoring and how mishandling censoring can lead to bias and loss of sensitivity in tests that probe for differences in survival times.

Nature Methods Points of Significance column: Survival analysis—time-to-event data and censoring. (read)

Dey, T., Lipsitz, S.R., Cooper, Z., Trinh, Q., Krzywinski, M. & Altman, N. (2022) Points of significance: Survival analysis—time-to-event data and censoring. Nature Methods 19:906–908.

# 3,117,275,501 Bases, 0 Gaps

Sun 21-08-2022

See How Scientists Put Together the Complete Human Genome.

My graphic in Scientific American's Graphic Science section in the August 2022 issue shows the full history of the human genome assembly — from its humble shotgun beginnings to the gapless telomere-to-telomere assembly.

Read about the process and methods behind the creation of the graphic.

3,117,275,501 Bases, 0 Gaps. Text by Clara Moskowitz (Senior Editor), art direction by Jen Christiansen (Senior Graphics Editor), source: UCSC Genome Browser.

# Anatomy of SARS-CoV-2

Tue 31-05-2022

My poster showing the genome structure and position of mutations on all SARS-CoV-2 variants appears in the March/April 2022 issue of American Scientist.

Deadly Genomes: Genome Structure and Size of Harmful Bacteria and Viruses (zoom)

An accompanying piece breaks down the anatomy of each genome — by gene and ORF, oriented to emphasize relative differences that are caused by mutations.

# Cancer Cell cover

Sat 23-04-2022

My cover design on the 11 April 2022 Cancer Cell issue depicts cellular heterogeneity as a kaleidoscope generated from immunofluorescence staining of the glial and neuronal markers MBP and NeuN (respectively) in a GBM patient-derived explant.

LeBlanc VG et al. Single-cell landscapes of primary glioblastomas and matched explants and cell lines show variable retention of inter- and intratumor heterogeneity (2022) Cancer Cell 40:379–392.E9.

My Cancer Cell kaleidoscope cover (volume 40, issue 4, 11 April 2022). (more)

# Nature Biotechnology cover

Sat 23-04-2022

My cover design on the 4 April 2022 Nature Biotechnology issue is an impression of a phylogenetic tree of over 200 million sequences.

Konno N et al. Deep distributed computing to reconstruct extremely large lineage trees (2022) Nature Biotechnology 40:566–575.

My Nature Biotechnology phylogenetic tree cover (volume 40, issue 4, 4 April 2022). (more)
