

# data visualization + art

The BC Cancer Agency’s Personalized Oncogenomics Program (POG) is a clinical research initiative applying genomic sequencing to the diagnosis and treatment of patients with incurable cancers.

# Art of the Personalized Oncogenomics Program

Nature uses only the longest threads to weave her patterns, so that each small piece of her fabric reveals the organization of the entire tapestry.
— Richard Feynman

Art is Science in Love
— E.F. Weisslitz

## what do the circles mean?

The legend can be printed at 4" × 6". The bitmap resolution is 600 dpi.

Quick legend. 5 Years of Personalized Oncogenomics Project at Canada's Michael Smith Genome Sciences Centre. The poster shows 545 cancer cases.

## a case for a visual case summary

For every case, we sequence the DNA to study the genome structure and the RNA to discover which genes are expressed and to what extent. The analysis is quite complex and brings together many steps: sequence alignment, structural variation detection, expression profiling, pathway analysis and so on. Every case is "summarized" by a lengthy report, such as the one below, which can run to over 40 pages.

A report for a typical POG case is about 40–50 pages.

One goal of the 5-year anniversary art was to represent the cases so that their number, classification and diversity are clear. Many metrics could be used; I chose each case's correlation to other cancer types.

## correlation to TCGA cancer database

For every POG case, the gene expression of 1,744 key genes is compared to that of thousands of cases in the TCGA database of cancer samples. For a given cancer type in the TCGA database (e.g. BRCA), we visualize the correlations using box plots. The box plot is ideal for showing the distribution of values in a sample.

Every case is compared to a database of thousands of cases. Shown here are box plots for the Spearman correlation coefficient between the gene expression of the POG case and cancers of a specific type (e.g. BRCA, LUAD, etc).

The 10 largest Spearman correlation coefficients for the case shown above are

| case | corr | type | tissue |
|---|---|---|---|
| POG661 | 0.436 | BRCA | Breast |
| POG661 | 0.371 | PRAD | Urologic |
| POG661 | 0.295 | OV | Gynecologic |
| POG661 | 0.257 | UCEC | Gynecologic |
| POG661 | 0.244 | LUAD | Thoracic |
| POG661 | 0.235 | CESC_CAD | Gynecologic |
| POG661 | 0.225 | MB_Adult | Central Nervous System |
| POG661 | 0.222 | KICH | Urologic |
| POG661 | 0.219 | THCA | Endocrine |
| POG661 | 0.208 | UCS | Gynecologic |
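The comparison described above can be sketched in a few lines. This is only an illustration, not the POG pipeline: the function names, the toy expression vectors and the type labels `TYPE_A` and `TYPE_B` are made up, and summarizing each cancer type by the median correlation is an assumption based on the figure caption for this case.

```python
from statistics import median


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def median_correlation_by_type(case_expr, reference):
    """reference maps a cancer type to a list of expression vectors."""
    return {
        cancer_type: median(spearman(case_expr, sample) for sample in samples)
        for cancer_type, samples in reference.items()
    }


# Hypothetical toy data: 5 genes instead of 1,744, 3 samples instead of thousands.
case = [1, 2, 3, 4, 5]
reference = {
    "TYPE_A": [[2, 4, 6, 8, 10], [1, 3, 2, 5, 4]],
    "TYPE_B": [[5, 4, 3, 2, 1]],
}
medians = median_correlation_by_type(case, reference)
# Largest correlations first, as in the table above.
top = sorted(medians.items(), key=lambda kv: kv[1], reverse=True)[:10]
```

With real data, `case` would hold the expression of the key genes for one POG case and each entry of `reference` the samples of one TCGA cancer type.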

In the figure below I show how the final encoding of the correlations is done. First, the top three correlations are taken—using more generates a busy look and diminishes visual impact. The correlations are encoded as concentric rings.

Because the differences among the top 3 correlations are relatively small in most cases, they are emphasized by scaling the encoding non-linearly: each correlation $r$ is first transformed to $r^3$.

Case POG661. Median gene expression correlations with different cancer types from TCGA database. (A) Top 10 correlations shown as a bar plot. Color coding is by source tissue associated with the cancer type. (B) Top 10 correlations encoded as concentric rings. The width of the ring is proportional to the correlation. (C) Top 3 correlations. (D) Top 3 correlations scaled with a power to emphasize differences.
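A minimal sketch of the ring encoding for POG661, using the top three correlations listed earlier. Two details are assumptions, not taken from the poster: the rings are normalized to fill a unit radius, and the largest correlation is placed innermost. The function name `ring_layout` is hypothetical.

```python
def ring_layout(correlations, top_n=3, power=3, radius=1.0):
    """Return (label, inner_radius, outer_radius) for each ring.

    Ring width is proportional to r**power. Normalizing to a fixed outer
    radius and drawing the largest correlation innermost are assumptions
    made for this sketch. Correlations are assumed to be positive.
    """
    top = sorted(correlations, key=lambda item: item[1], reverse=True)[:top_n]
    scaled = [r ** power for _, r in top]
    total = sum(scaled)
    rings = []
    inner = 0.0
    for (label, _), value in zip(top, scaled):
        width = radius * value / total
        rings.append((label, inner, inner + width))
        inner += width
    return rings


# Top correlations for POG661, taken from the list in the text.
pog661 = [("BRCA", 0.436), ("PRAD", 0.371), ("OV", 0.295), ("UCEC", 0.257)]
rings = ring_layout(pog661)
for label, inner, outer in rings:
    print(f"{label:5s} {inner:.3f} to {outer:.3f}")
```

The cube is what makes the difference visible: the ratio between the first and third correlation is about 1.5 before scaling and about 3.2 after.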

The typeface is Proxima Nova. The colors for each tissue source are

| tissue | RGB |
|---|---|
| Gastrointestinal | 234,62,144 |
| Breast | 237,75,51 |
| Thoracic | 242,130,56 |
| Gynecologic | 253,188,61 |
| Soft tissue | 244,217,59 |
| Skin | 193,216,51 |
| Urologic | 114,197,49 |
| Hematologic | 29,166,68 |
| Head and neck | 43,168,224 |
| Endocrine | 71,82,178 |
| Central nervous system | 127,65,146 |
| Other | 150,150,150 |
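For reuse in plotting code, the palette can be written as a lookup table. This is just a convenience sketch: the RGB values are the ones listed above, while the names `TISSUE_RGB`, `TISSUE_HEX` and `to_hex` are my own.

```python
# Tissue palette transcribed from the list above (RGB, 0-255 per channel).
TISSUE_RGB = {
    "Gastrointestinal": (234, 62, 144),
    "Breast": (237, 75, 51),
    "Thoracic": (242, 130, 56),
    "Gynecologic": (253, 188, 61),
    "Soft tissue": (244, 217, 59),
    "Skin": (193, 216, 51),
    "Urologic": (114, 197, 49),
    "Hematologic": (29, 166, 68),
    "Head and neck": (43, 168, 224),
    "Endocrine": (71, 82, 178),
    "Central nervous system": (127, 65, 146),
    "Other": (150, 150, 150),
}


def to_hex(rgb):
    """Format an (r, g, b) tuple as a lowercase #rrggbb string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


TISSUE_HEX = {tissue: to_hex(rgb) for tissue, rgb in TISSUE_RGB.items()}
```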

# Survival analysis—time-to-event data and censoring

Fri 05-08-2022

Love's the only engine of survival. —L. Cohen

We begin a series on survival analysis in the context of its two key complications: skew (which calls for the use of probability distributions, such as the Weibull, that can accommodate skew) and censoring (required because we almost always fail to observe the event in question for all subjects).

We discuss right, left and interval censoring and how mishandling censoring can lead to bias and loss of sensitivity in tests that probe for differences in survival times.
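To make the bias concrete, here is a small sketch with made-up data. It uses the Kaplan-Meier estimator, a standard way to handle right censoring, and compares it with two naive shortcuts. The data and function names are hypothetical and not taken from the column.

```python
def kaplan_meier(times, observed):
    """Kaplan-Meier survival estimate at each distinct event time.

    times    -- follow-up time for each subject
    observed -- True if the event was seen, False if right-censored
    Returns a list of (time, survival probability) pairs.
    """
    event_times = sorted({t for t, seen in zip(times, observed) if seen})
    survival = 1.0
    curve = []
    for t in event_times:
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        events = sum(1 for u, seen in zip(times, observed) if seen and u == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy data: six subjects, two of them right-censored.
times = [2, 3, 4, 5, 6, 8]
observed = [True, False, True, True, False, True]

km_at_5 = dict(kaplan_meier(times, observed))[5]

# Shortcut 1: pretend every censored time is an event time.
naive_all_events = sum(1 for t in times if t > 5) / len(times)

# Shortcut 2: drop the censored subjects entirely.
event_only = [t for t, seen in zip(times, observed) if seen]
naive_dropped = sum(1 for t in event_only if t > 5) / len(event_only)

print(km_at_5, naive_all_events, naive_dropped)
```

In this toy example both shortcuts give a lower estimate of survival past time 5 than the Kaplan-Meier estimate, because they discard the information that the censored subjects survived at least until they were last seen.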

Nature Methods Points of Significance column: Survival analysis—time-to-event data and censoring.

Dey, T., Lipsitz, S.R., Cooper, Z., Trinh, Q., Krzywinski, M. & Altman, N. (2022) Points of significance: Survival analysis—time-to-event data and censoring. Nature Methods 19:906–908.

# 3,117,275,501 Bases, 0 Gaps

Fri 05-08-2022

See How Scientists Put Together the Complete Human Genome.

My graphic in Scientific American's Graphic Science section in the August 2022 issue shows the full history of the human genome assembly — from its humble shotgun beginnings to the gapless telomere-to-telomere assembly.

Read about the process and methods behind the creation of the graphic.

3,117,275,501 Bases, 0 Gaps. Text by Clara Moskowitz (Senior Editor), art direction by Jen Christiansen (Senior Graphics Editor), source: UCSC Genome Browser.

# Anatomy of SARS-CoV-2

Tue 31-05-2022

My poster showing the genome structure and position of mutations on all SARS-CoV-2 variants appears in the March/April 2022 issue of American Scientist.

Deadly Genomes: Genome Structure and Size of Harmful Bacteria and Viruses

An accompanying piece breaks down the anatomy of each genome — by gene and ORF, oriented to emphasize relative differences that are caused by mutations.


# Cancer Cell cover

Sat 23-04-2022

My cover design on the 11 April 2022 Cancer Cell issue depicts cellular heterogeneity as a kaleidoscope generated from immunofluorescence staining of the glial and neuronal markers MBP and NeuN (respectively) in a GBM patient-derived explant.

LeBlanc VG et al. Single-cell landscapes of primary glioblastomas and matched explants and cell lines show variable retention of inter- and intratumor heterogeneity (2022) Cancer Cell 40:379–392.E9.

My Cancer Cell kaleidoscope cover (volume 40, issue 4, 11 April 2022).

Browse my gallery of cover designs.

A catalogue of my journal and magazine cover designs.

# Nature Biotechnology cover

Sat 23-04-2022

My cover design on the 4 April 2022 Nature Biotechnology issue is an impression of a phylogenetic tree of over 200 million sequences.

Konno N et al. Deep distributed computing to reconstruct extremely large lineage trees (2022) Nature Biotechnology 40:566–575.

My Nature Biotechnology phylogenetic tree cover (volume 40, issue 4, 4 April 2022).


# Nature cover — Gene Genie

Sat 23-04-2022

My cover design on the 17 March 2022 Nature issue depicts the evolutionary properties of sequences at the extremes of the evolvability spectrum.

Vaishnav ED et al. The evolution, evolvability and engineering of gene regulatory DNA (2022) Nature 603:455–463.

My Nature squiggles cover (volume 603, issue 7901, 17 March 2022).
