The Earth BioGenome Project (EBP), a moonshot for biology, aims to sequence, catalog and characterize the genomes of all of Earth’s eukaryotic biodiversity over a period of ten years. The articles in this special issue of PNAS explore aspects of the project, including a review of progress, descriptions of major scientific goals, exemplar projects, examinations of ethical, legal, and social issues, and applications of biodiversity genomics.

PNAS cover — Earth BioGenome Project

An enlightened perspective on our planet

Lewin, H.A. et al. (2022) The Earth BioGenome Project 2020: Starting the clock. PNAS 119(4):e2115635118.

1 · Data files

1.1 · GenBank records

download | 1,162 loci records for 806 unique species submitted by 9 consortia: Bat1K Consortium, Genome 10K Community of Scientists, Human Genome Sequencing Center, Molecular Ecology Group, Tribolium Genome Sequencing Consortium, Vertebrate Genomes Project Consortium, Wellcome Sanger Institute, Wellcome Sanger Institute Data Sharing, Wellcome Sanger Tree of Life Programme.

1.2 · Species list

download | 806 unique species

1.3 · Sequence records

download | 1,121 sequence records representing short sequences from all species.

1.4 · Species geolocalization

download | Geolocation mappings for each GenBank record. Localization was done using GBIF. For each continent, species lists of all occurrences (human observation or preserved specimen) were downloaded. If a species was observed on multiple continents, it is denoted by location_type = MULTI in the file; otherwise it is UNIQUE. Where a species did not have an occurrence, its habitat was determined manually through Wikipedia or other sources and denoted by SUPP. Finally, some records were corrected manually (override_location). For species with multiple locations, one location was chosen arbitrarily, but Europe was excluded, since this continent is small on the map and has small sequence capacity (the TSP path is short).

download | The sequence and its location for each species in the design.
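The location rules described for the species geolocalization file can be summarized in a short sketch. This is not the code used to build the file: the function name `assign_location`, the dictionary layouts for the GBIF occurrence lists, manual (SUPP) lookups and manual overrides, and the rule that overrides keep the original location_type are all assumptions for illustration. The "arbitrary" choice among several continents is replaced here by a deterministic one (the first non-Europe continent alphabetically) so the result is reproducible.

```python
from typing import Dict, Optional, Set, Tuple

EXCLUDED = "Europe"  # never chosen when a species occurs on several continents

def assign_location(
    species: str,
    occurrences: Dict[str, Set[str]],  # hypothetical: continent -> species with GBIF occurrences
    supplementary: Dict[str, str],     # hypothetical: species -> continent found manually (SUPP)
    overrides: Dict[str, str],         # hypothetical: species -> manually corrected location
) -> Tuple[str, Optional[str]]:
    """Return (location_type, location) for one species."""
    continents = sorted(c for c, names in occurrences.items() if species in names)
    if len(continents) > 1:
        location_type = "MULTI"
        # Deterministic stand-in for "chosen arbitrarily": first continent that is not Europe.
        location = next(c for c in continents if c != EXCLUDED)
    elif len(continents) == 1:
        location_type, location = "UNIQUE", continents[0]
    else:
        # No GBIF occurrence: fall back to the manually determined habitat.
        location_type, location = "SUPP", supplementary.get(species)
    # Manual corrections (override_location) take precedence over the automatic result.
    if species in overrides:
        location = overrides[species]
    return location_type, location
```

For a species with occurrences in Africa, Asia and Europe, this sketch returns `("MULTI", "Africa")`; a species seen only in Europe stays `("UNIQUE", "Europe")`, because the Europe exclusion applies only when there is a choice to make.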

Neural network primer

Mon 06-02-2023

Nature is often hidden, sometimes overcome, seldom extinguished. —Francis Bacon

In the first of a series of columns about neural networks, we introduce them with an intuitive approach that draws from our discussion about logistic regression.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Nature Methods Points of Significance column: Neural network primer. (read)

Simple neural networks are just a chain of linear regressions, each followed by a nonlinear activation function. Although neural network models can get very complicated, their essence can be understood in terms of relatively basic principles.

We show how neural network components (neurons) can be arranged in a network and discuss the idea of hidden layers. Using a simple data set, we show how even a 3-neuron neural network can model relatively complicated data patterns.
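A minimal sketch of the idea that a neuron is a small regression, assuming logistic (sigmoid) activations as in the logistic regression background reading: each neuron computes a weighted sum of its inputs plus a bias and passes it through the logistic function. Two hidden neurons and one output neuron together produce a "bump" that rises and then falls in x. The weights are hand-picked for illustration and are not taken from the column's data set; the names `neuron` and `three_neuron_net` are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function, the same link used in logistic regression."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    # A linear regression on the inputs, passed through the logistic function.
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def three_neuron_net(x: float) -> float:
    # Hidden layer: two neurons, each a smooth step at a different value of x.
    h1 = neuron([x], [10.0], -3.0)  # switches on near x = 0.3
    h2 = neuron([x], [10.0], -7.0)  # switches on near x = 0.7
    # Output neuron: a regression on the hidden outputs; h1 - h2 is large only between the steps.
    return neuron([h1, h2], [10.0, -10.0], -5.0)

for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(x, round(three_neuron_net(x), 3))
```

The output is high near x = 0.5 and low at both ends. No single logistic regression in x can do this, because σ(ax + b) is monotone in x; the hidden layer is what makes the non-monotone pattern possible.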

Derry, A., Krzywinski, M. & Altman, N. (2023) Points of significance: Neural network primer. Nature Methods 20.

Background reading

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of significance: Logistic regression. Nature Methods 13:541–542.

Cell Genomics cover

Mon 16-01-2023

Our cover on the 11 January 2023 Cell Genomics issue imagines, as a circuit, the process of determining parent of origin using differential methylation of alleles at imprinted regions (iDMRs).

Designed in collaboration with Carlos Urzua.

Our Cell Genomics cover depicts parent-of-origin assignment as a circuit (volume 3, issue 1, 11 January 2023). (more)

Akbari, V. et al. (2023) Parent-of-origin detection and chromosome-scale haplotyping using long-read DNA methylation sequencing and Strand-seq. Cell Genomics 3(1).

Science Advances cover

Thu 05-01-2023

My cover design on the 6 January 2023 Science Advances issue depicts DNA sequencing read translation in high-dimensional space. The image shows 672 bases of sequencing barcodes, generated by three different single-cell RNA sequencing platforms, encoded as oriented triangles on the faces of three 7-dimensional cubes.

More details about the design.

My Science Advances cover that encodes sequence onto hypercubes (volume 9, issue 1, 6 January 2023). (more)

Kijima, Y. et al. (2023) A universal sequencing read interpreter. Science Advances 9.

Regression modeling of time-to-event data with censoring

Mon 21-11-2022

If you sit on the sofa for your entire life, you’re running a higher risk of getting heart disease and cancer. —Alex Honnold, American rock climber

In a follow-up to our Survival analysis — time-to-event data and censoring article, we look at how regression can be used to account for additional risk factors in survival analysis.

We explore accelerated failure time regression (AFTR) and the Cox Proportional Hazards model (Cox PH).
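To make the Cox PH model concrete, here is a minimal sketch of its partial log-likelihood for a single covariate, assuming no tied event times. Under Cox PH the hazard is h(t | x) = h0(t) exp(βx), so exp(β) is the hazard ratio per unit of x. The baseline hazard h0 cancels out of the partial likelihood, and censored subjects contribute only by being in the risk sets of earlier events. The function names are hypothetical, and the grid search is a crude stand-in for the Newton-Raphson iterations that statistical packages typically use; AFT regression, which instead models log survival time directly, is not shown.

```python
import math
from typing import Sequence

def cox_partial_loglik(beta: float, times: Sequence[float],
                       events: Sequence[int], x: Sequence[float]) -> float:
    """Cox PH partial log-likelihood for one covariate (assumes no tied event times)."""
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored: no term of its own, but still counted in risk sets below
        # Risk set: every subject still under observation at time t_i.
        risk = sum(math.exp(beta * x_j) for t_j, x_j in zip(times, x) if t_j >= t_i)
        ll += beta * x[i] - math.log(risk)
    return ll

def fit_beta(times, events, x, lo=-5.0, hi=5.0, steps=10001):
    """Crude maximum partial likelihood estimate of beta by grid search."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda b: cox_partial_loglik(b, times, events, x))

# Subjects with x = 1 tend to fail earlier, so the estimated beta should be positive.
beta_hat = fit_beta([1, 2, 3, 4, 5, 6], [1, 1, 1, 1, 1, 1], [1, 0, 1, 1, 0, 0])
print(beta_hat, math.exp(beta_hat))  # estimate and hazard ratio
```

At β = 0 each event contributes −log(size of its risk set), which is a quick sanity check on the risk-set bookkeeping.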

Nature Methods Points of Significance column: Regression modeling of time-to-event data with censoring. (read)

Dey, T., Lipsitz, S.R., Cooper, Z., Trinh, Q., Krzywinski, M. & Altman, N. (2022) Points of significance: Regression modeling of time-to-event data with censoring. Nature Methods 19.

Music video for Max Cooper's Ascent

Tue 25-10-2022

My 5-dimensional animation sets the visual stage for Max Cooper's Ascent from the album Unspoken Words. I have previously collaborated with Max on telling a story about infinity for his Yearning for the Infinite album.

I provide a walkthrough of the video, describe the animation system I created to generate the frames, and show you all the keyframes.

Frame 4897 from the music video of Max Cooper's Ascent.

The video recently premiered on YouTube.

Renders of the full scene are available as NFTs.

