Martin Krzywinski / Genome Sciences Center / mkweb.bcgsc.ca

words: meaningful


EMBO Practical Course: Bioinformatics and Genome Analysis, 5–17 June 2017.


language + fiction

Dark Matter of the English Language—the unwords

Words are easy, like the wind;
Faithful friends are hard to find.
—William Shakespeare

uncountries

The uncountries are places that don't exist, but perhaps should. If you're starting your own country or are hoping to secede from your current employer (here's looking at you, States of the US), you might find this list useful.

The list of uncountries is generated by training on a list of 257 countries and territories.
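To give a feel for what "training on a list" means, here is a minimal sketch that uses a character-level Markov chain as a stand-in. It is not the neural network used for the names on this page, only a much simpler model of the same kind: it learns which character tends to follow the previous two and then samples new names. The function names and the eight-country sample are assumptions made for illustration.

```python
import random
from collections import defaultdict

def train(names, order=2):
    """Map each `order`-character context to the characters that follow it."""
    model = defaultdict(list)
    for name in names:
        padded = "^" * order + name + "$"   # ^ pads the start, $ marks the end
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model

def generate(model, order=2, rng=random, max_len=30):
    """Sample one name character by character until the end marker is drawn."""
    context, out = "^" * order, []
    while len(out) < max_len:
        nxt = rng.choice(model[context])
        if nxt == "$":
            break
        out.append(nxt)
        context = (context + nxt)[-order:]
    return "".join(out)

# Tiny illustrative training set (the real list has 257 entries)
countries = ["Canada", "Panama", "Ghana", "Guyana",
             "Botswana", "Tanzania", "Albania", "Romania"]
model = train(countries)
rng = random.Random(1)
print([generate(model, rng=rng) for _ in range(5)])
```

With so little training data the output mostly recombines fragments of the input names; a larger list and a longer context give more varied results.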

Here's my bucket list of where I'm going next:

  • Conchar and Pobacia
  • Hzuuland
  • New Kain
  • Rabibus and Megee Islands
  • Sentip and Sitina
  • Sinistan
  • Tuskia
  • Urzenia
  • Vontila

Below are the first two names, in alphabetical order, for each starting letter among the 4–11 letter single-word uncountries. In some cases, fewer than two names (or none) of a given length were generated for a given letter.

—4—
Aani
Aemo
Ball
Bang
Cada
Caga
Dafa
Dalr
Eira
Eran
Fani
Fato
Gaar
Gace
Hana
Hhen
Iaou
Inor
Kala
Kani
Lain
Lale
Mabe
Mage
Naom
Nare
Pein
Peis
Ragi
Raiy
Saen
Saic
Taga
Tans
Ucan
Uica
Venr
Viam
—5—
Aalce
Aanie
Babat
Baica
Caane
Cacae
Daaca
Demga
Eamat
Eanda
Fabta
Fania
Gaane
Gaare
Haemy
Harre
Imina
Imoba
Jacin
Jania
Kande
Kangi
Labua
Laiir
Mabia
Maeky
Nague
Nalgo
Paine
Paiti
Qetia
Qiria
Relat
Renir
Sabga
Sabta
Tabia
Tagan
Ucite
Uenha
Vamar
Vanga
Wiady
Zepek
—6—
Aamade
Afanen
Baaira
Baggia
Cabomo
Caemia
Darlye
Darzan
Eagero
Eattas
Faitia
Farado
Gabbak
Gabiai
Hauima
Henlia
Icalin
Iganda
Jaigio
Jartar
KSonko
Kabolo
Labopa
Lacuua
Mabana
Mabiak
Nalgin
Nandar
Ougagu
Paiada
Paldol
Rabgen
Ramgui
Saadae
Sabite
Taikua
Takkia
Udapor
Uermin
Vandes
Vangan
Wantia
Wengan
—7—
Aanigah
Agentan
Bagtera
Baliadi
Caathea
Cabibia
Dahmomo
Dalhais
Ebaniar
Eboniat
Falinha
Falttin
Gaereta
Gainana
Haratan
Hasland
Iafercy
Icrotii
Jelican
Jeliwia
Kanfono
Kargice
Lacitia
Laitgoe
Mabaden
Maellha
Naendia
Nalhind
Onbagin
Pabapua
Pagonia
Rarralo
Rarrkak
SEerral
Sabaida
Taateaa
Tahiria
Uagayas
Ujoland
Vaairta
Valonds
—8—
Aegtomis
Aeirania
Balcosda
Baltages
Cacbilii
Calnaria
Dargonda
Darirnio
Eatisasa
Eeniicia
Femanlan
Ferlanda
Gacterat
Gadeqtam
Hinaniti
Hndiwiun
Ihibiano
Ilriload
JenogRan
Jenorala
Kcinisan
Keberlan
Laenhuan
Lalostia
Mabdadde
Mabhalin
Naitieta
Naltenis
Oinvalia
Orringia
Pajtbava
Palalian
Qerbacon
Raliutan
Raphenla
SDatelan
Sabmutia
Taigonia
Talbiela
Uginbiam
Uityeate
Vasguace
Velentin
—9—
Aiwontmia
Alememter
Bawelilia
Beciradue
Cankeslas
Cansiaila
Dacucania
Elorbhiad
Epubhulon
Fredelapo
Gakgasdan
Gantiulan
Hotutuias
Ilallasda
Imroldian
Jendiulia
Jitgodien
Kemadicis
Kerndhand
Lazekatis
Lectarada
MInledian
Mabertima
Nacasnand
Nadordinh
Palcotiis
Panciland
Raecsatas
Rentelisy
SDirniata
Saentolia
Tarhhaldi
Tarnoigan
Untensian
Vantanira
Ventalica
Wensatial
—10—
Amodedhani
Andhsituia
Badetcinia
Bandesland
Camegessow
Canoniitia
Damalhania
Denwarinia
Ensriitrui
Eremgosdon
Garilsista
Gebticiita
Hatendacan
Hecrapband
Inteniania
Irrhalipan
Kendestand
Lantunutan
Lenkalland
Macgalland
Malbaninis
Namgelasta
Naniheomie
Pamestitia
Pilimintan
Reboteisia
Ricanlands
Sacgsainas
Samanhalaa
Tenheposda
Tezadtinia
Vagioliale
Veuthalian
—11—
Adrelebcima
Amurenoilan
Berniwhpana
Buenaslatda
Catdhtilard
Cerrertoria
Daniacsalon
Eirniatiars
Gaundaniani
Getnicistan
Kiaghbaliaa
Lurolelicam
Madcesladts
Melhellunds
Naporrestan
Niuritantsa
Pebenigisia
Rancitolian
Sarcestalta
Sengrerolia
Varenchales
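A selection like the one above can be sketched as follows, assuming the generated names are available as a plain list. The helper name `first_per_letter` is hypothetical, and "Kaza" is a made-up entry added only to show the cut-off at two names. A plain sort puts uppercase letters before lowercase ones, which would explain why a miscapitalized name such as KSonko lands ahead of Kabolo.

```python
from collections import defaultdict

def first_per_letter(names, per_letter=2):
    """Group single-word names by (length, first letter) and keep the first few in sorted order."""
    groups = defaultdict(list)
    for name in sorted(set(names)):       # plain sort: uppercase sorts before lowercase
        if " " in name:
            continue                       # multi-word names are handled separately
        key = (len(name), name[0])
        if len(groups[key]) < per_letter:
            groups[key].append(name)
    return dict(groups)

sample = ["Kabolo", "KSonko", "Kanfono", "Kargice", "Kala", "Kani", "Kaza"]
table = first_per_letter(sample)
for length, letter in sorted(table):
    print(length, letter, table[(length, letter)])
```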

And below are uncountries made up of more than one word. The neural network doesn't always do a good job of capitalization.

—6—
Bel Eo
—7—
Ar Neli
Bei Ros
Co Naf,
Es onda
Gob and
Ka -uca
Lex Sen
Mec Len
Se amar
Ton San
—8—
Anru Ran
Baqta Aa
Can Kanc
Dr Belle
Eone Vue
Giinu an
In Gecan
Leen Kon
Mons ald
New Kain
Pobia io
Se Mawan
Tanv Wag
Un Sayth
Vanbo ia
—9—
Arte olia
Ban Tenka
Cui iepes
Dant Sion
Enu Balra
Fem Feriu
Gia honia
Kamen San
Lan Giane
Ma orepan
Nak Manti
Pat Gamia
Saa hetiu
Tar Itlin
Uem Ladde
—10—
Aem Latlia
Banh Cerra
Cairt Aani
Dal Vclcin
Ee Riritan
Gaine Sora
Ken Sonras
Laen Lalor
Mal Rilteh
Nib Carean
Paoth enia
Ran borado
Saed Canua
Tamr eatos
Vamin ores
—11—
Agim Niidea
Baman oshon
Catil Menia
Eil Mitakeo
Freni Niray
Ge Manlando
Jhs Lelland
Ktuct Calia
Leg Saltima
Macan Taman
Namet Gacia
Panan Rerni
Sab Nelieda
Tanc Talind
Vuitan Sera
—12—
Aginh Erpata
Bamen Island
Cancd Samcua
Demun Bondan
Eginan Kerta
Gaicc Iutand
Inanon Gapia
Kanan Island
Lionh Irania
Nameon Raran
Rarve Iitand
Saeci Inlond
Talth Isliin
Unde Narpisa

—13—
Aipen Sabtars
Bab il Rinvee
Caneg Iclands
Dont Calioica
Emanh Naytani
Gaetia esland
Iatin Islands
Jamedh Island
Karch Sartier
Laine Islands
Maind Veltant
Nadtin Launua
Pau Meny ings
Rarciua Oaros
Saini Islands
Tacen Goetian
Urran Teviina
Vopenkb Toppa
—14—
Actatiat Nalto
Bahten Gojilan
Cela Ticgialia
Dototiat Lieda
Eusruuta essan
Fentch Nopyvon
Gaicd Eingasya
Inoral Islands
Jounh Miiticci
Kinera en Cime
Ladten Reperta
Maponit Peraco
Nalton Islands
Pamari eslands
Raqton Beparte
Salnen Islands
Tamokan Asteto
Vekmad Islands
—15—
Aepgenin Veliin
Bemanan Supatii
Caddad Gialerna
Eertiton Ialesd
Fuparrat Asiben
Gamgad Rerradia
Hitetal Iflasds
Jawshanad Meemi
Launiad Istands
Mariadec Geruur
Negend Peregine
Pacabara islans
Rortan Rertaria
Saldusl Islands
Talion Reputlic
Umzenon Islands
—16—
Anevares Inlands
Bangan Aulentand
Caun Nelch afdsa
Dita onu Islands
Eurania Tomontia
Fontican Aviisli
Gaitinl Indipige
Izerlind Reraumo
Jarenran Islands
Krdirtel Cenuria
Lazinan Bestilin
Maltinun Islands
Namnan Sceneilin
Pauran and Rarbe
Ramgiad Cruerava
Sagcas and Narol
Tanin Kenthurian
Urarcan Sudhiety
Vharkiin Ralands
—17—
Avaryi Barsseland
Binicon Areonieme
Cbiathan Islicssa
Dulial and Carora
Eciircan Rarandin
Gainonial Islands
Jumo iten Serlacd
Kaldtan Fiamusaia
Lattioalan CSubia
Matperacd ofdente
Neni Nicch Dorova
Sanon Perbelobgie
Tengepoth Iscands
Wapu et Lat Miban

—18—
Amanes Vontenseiar
Bamntal Asitrcanos
Canomad Rertheqiad
Detuereilin Inlatd
Eeet orwen Seworas
Gantarten Afan and
Kary escin Islands
Lamorho an Lebbati
Mem aynol Islandsi
Nibitayd ordheriio
Oenginh Kinderland
Pacaut Martirlalon
Salend Fe Selerdan
Terviniot Islandss
Vetbited Afobeslen
—19—
Agen Tand and Suoni
Bont Marrtan Toraza
Conchar and Pobacia
Eith Miap Minebanis
Galenta Gueutinanes
Kucaran and Samosea
Loettiruan Atereeti
Mon Varmands Island
Notk Maeg Lemtorban
Pestsian and Kupeta
Sarretonien endanda
Tinba Rand Banciton
Utabtin Anian hulan
—20—
Alenra Varin Mepubla
Cenialas and Malalia
Dagic Islands Island
Ekmhula andi Leprico
Garthda and Geekiree
Khin Matib Gebuyston
Liqan and Nantorepua
Maloil Iulands Ncoty
Nanuu Ieean Iymoldir
Samtes iucis Inlends
Teni emctan Sanucias
Urpoe an Rec Sucilia
—21—
Borne iarran Devarece
Car Rethil Nacinitana
Emytican Geru atgania
Gaind Marchan Islonds
Molticon Cini ondiles
Naund Ramen Arigariar
Pomithan Micrilanimia
Ramtitan Saruan Gaico
Sala am Ton Pameruidi

Here are some lists with common suffixes.

*nia Ariania Aruenia Bamenia Bolsnia Bukania Caminia Carenia Copania Eniania Eruinia Eryinia Eyuinia Fvounia Gapania Gorania Guyinia Imgania Lebania Lepania Mezania Pagonia Pamonia Piainia Pirania Saminia Sesinia Simania Somenia Sorinia Tinonia Turunia Urzenia Badetcinia Damalhania Denwarinia Inteniania Mangevinia Seregiania Tezadtinia Tudennenia Akinia Arenia Arunia Bocnia Boinia Bounia Buinia Burnia Byunia Caunia Eminia Gainia Geenia Geinia Giania Guania Guinia Guonia Gwinia Jhunia Jiinia Jirnia Kcenia Leinia Lornia Neenia Rernia Ruenia Sannia Shinia Siinia Siunia Suinia Uninia Vasnia Arefeonia Bevomania Dacucania Eziboonia Gibstania Klbininia Setrounia Shlatania Suunienia Teroninia EwDirireonia Aeirania Bemginia Bunyonia Canmania Carginia Carnania Cosrania Culiinia Cumiinia Duinania Ezupinia Geziania Guinenia Guurania Konvonia Lalzinia Lertania Marbania Nandania Narnania Nenconia Pastania Sadiania Sazcinia Sigwenia Smeminia Sonconia Surbania Taigonia Tebcania Tendania Unyrania Cania Conia Fania Henia Jania Jonia Kinia Lonia Mania Ninia Nonia Sania Tenia Tonia Vania

*lan Anualan Binelan Biselan Comelan Donolan Eduulan Iferlan Ilaslan Iudelan Papilan Potalan Srinlan Takilan Tamglan Cemuneilan Gehsyanlan Mecineslan Amurenoilan Aralan Cralan Geilan Inilan Innlan Kerlan Nanlan Sorlan Tnulan Beugeilan Condamlan Cunogslan Gantiulan Geevallan Gienyslan Memsinlan Mertorlan Minnaulan Mururolan Neminolan Sandeslan Sennerlan Titorilan Vertonlan Andenlan Betarlan Ceneslan Cunmelan Curislan Femanlan Geamilan Keberlan Larielan Meloelan Menrulan Molielan Otenelan Redallan SDatelan Selenlan Alan Glan Tlan Bolan Bulan Culan Galan Malan Selan Solan

*land Garland Hasland Ujoland Bandesland Benhelland Bhqlalland Dhinioland Lenkalland Macgalland Vuleslland Caland Feland Maland Saland Anderland Cemerland Geunoland Lutkaland Mowurland Panciland Parraland Anreland Asealand Hzuuland Maerland Masrland Memoland Namaland Navaland Ponoland Tuysland Vetaland

*ana Amynana Balpana Burgana Congana Fuubana Gainana Gaulana Guiiana Somuana Tartana Vehcana Cunheqrana Berniwhpana Antana Argana Buvana Mabana Merana Mobana Relana Rucana Semana Sikana Nteradana Gitanana Hana Lana Mana Sana Giana Guana Gvana Toana

*ica Cinuica Deyrica Goitica Maltica Mannica Merlica Peotica Raryica Sortica Stamica Sumhica Tektica Tiumica Utiuica Bemgbicica Aniica Bapica Narica Sanica Selica Sibica Gatuitica Iuperiica Ventalica Buuntica Bwentica Sorgeica Uica Baica Umica

*can Banecan Celican Jelican Pelecan Deslisacan Hatendacan Leucan Noccan Tircan Tlycan Shaylican Suniracan Cerarcan Emunecan Gepuucan Mamescan Salgican Vongican Ucan

*dan Euvadan Gtardan Monmdan Seundan Srisdan Unendan Banitisdan Ringkeldan Bildan Landan Saldan Soldan Sordan Tamdan Gakgasdan Mremaldan Stelosdan Lapardan Siwesdan Srunadan

*stan Baystan Caistan Velstan Gentiastan Getnicistan Naporrestan Gistan Mastan Tengastan Sinistan

*tar Lalatar Sanktar Simntar Somytar Swettar Temitar Burekertar Jartar Tantar Unitar Gornitar Satar
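Suffix lists like these can be built with a few lines of code. The sketch below assumes each name goes into the bucket of the longest suffix it ends with, so that Sinistan would count as *stan rather than a shorter suffix; that rule, and the helper name `by_suffix`, are assumptions rather than a description of how the lists above were made.

```python
def by_suffix(names, suffixes):
    """Assign each name to the longest listed suffix it ends with."""
    ordered = sorted(suffixes, key=len, reverse=True)   # try longer suffixes first
    groups = {s: [] for s in suffixes}
    for name in names:
        for s in ordered:
            if name.lower().endswith(s):
                groups[s].append(name)
                break
    return groups

names = ["Sinistan", "Hasland", "Jartar", "Ucan", "Urzenia"]
for suffix, members in by_suffix(names, ["nia", "land", "stan", "can", "tar"]).items():
    print("*" + suffix, " ".join(members))
```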


news + thoughts

Classification and regression trees

Fri 28-07-2017
Decision trees are a powerful but simple prediction method.

Decision trees classify data by splitting it along the predictor axes into partitions with homogeneous values of the dependent variable. Unlike logistic or linear regression, CART does not develop a prediction equation. Instead, data are predicted by a series of binary decisions based on the boundaries of the splits. Decision trees are very effective and the resulting rules are readily interpreted.

Trees can be built using different metrics that measure how well the splits divide up the data classes: Gini index, entropy or misclassification error.
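For reference, the three metrics can be written down directly from the class proportions in a partition. This is a minimal sketch with hypothetical function names; `split_impurity` weights each side of a split by its share of the observations, which is the usual way to score a candidate split.

```python
from collections import Counter
from math import log2

def proportions(labels):
    """Fraction of observations in each class."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def gini(labels):
    return 1.0 - sum(p * p for p in proportions(labels))

def entropy(labels):
    return -sum(p * log2(p) for p in proportions(labels))

def misclassification(labels):
    return 1.0 - max(proportions(labels))

def split_impurity(left, right, metric):
    """Impurity of a split: size-weighted average of the two sides."""
    n = len(left) + len(right)
    return len(left) / n * metric(left) + len(right) / n * metric(right)

mixed = ["a", "a", "b", "b"]
print(gini(mixed), entropy(mixed), misclassification(mixed))
print(split_impurity(["a", "a"], ["b", "b"], gini))   # a perfect split has zero impurity
```

All three are zero for a pure partition and largest when the classes are evenly mixed.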

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Nature Methods Points of Significance column: Classification and decision trees.

When the response variable is quantitative rather than categorical, regression trees are used. Here, the data are still split but now the response is estimated by its average within the split boundaries. Tree growth can be controlled using the complexity parameter, a measure of the relative improvement of each new split.
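A single split of a regression tree can be sketched as below, assuming one predictor and squared error as the criterion. The function names and the toy data are made up for illustration. Each side of the split is predicted by the mean of its values, and the chosen threshold is the one that minimizes the total squared error.

```python
def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, left_mean, right_mean) minimizing total squared error, or None."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                          # cannot split between equal x values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (cost, threshold, mean(left), mean(right))
    return None if best is None else best[1:]

x = [1, 2, 3, 10, 11, 12]
y = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
print(best_split(x, y))
```

A full tree applies the same search recursively to each side, stopping when the improvement from a new split becomes too small.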

Individual trees can be very sensitive to minor changes in the data and even better prediction can be achieved by exploiting this variability. Using ensemble methods, we can grow multiple trees from the same data.
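One common ensemble approach is bagging: fit each tree to a bootstrap resample of the data and average the predictions. The sketch below uses one-split regression trees to stay short; `fit_stump`, `bagged`, the number of trees and the toy data are all assumptions for illustration, not the procedure from the column.

```python
import random

def fit_stump(x, y):
    """One-split regression tree; returns a function that predicts y for a query value."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        cost = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or cost < best[0]:
            best = (cost, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    if best is None:                          # all x equal: predict the overall mean
        m = sum(y) / len(y)
        return lambda q: m
    _, t, lm, rm = best
    return lambda q: lm if q < t else rm

def bagged(x, y, n_trees=25, seed=0):
    """Average the predictions of stumps fit to bootstrap resamples."""
    rng = random.Random(seed)
    n = len(x)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample rows with replacement
        trees.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return lambda q: sum(tree(q) for tree in trees) / len(trees)

x = [1, 2, 3, 10, 11, 12]
y = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
predict = bagged(x, y)
print(predict(2), predict(11))
```

Because each resample differs slightly, so do the trees, and averaging them smooths out the sensitivity of any single tree.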

Krzywinski, M. & Altman, N. (2017) Points of Significance: Classification and regression trees. Nature Methods 14:757–758.

Background reading

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Logistic regression. Nature Methods 13:541–542.

Altman, N. & Krzywinski, M. (2015) Points of Significance: Multiple linear regression. Nature Methods 12:1103–1104.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Classifier evaluation. Nature Methods 13:603–604.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Model selection and overfitting. Nature Methods 13:703–704.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Regularization. Nature Methods 13:803–804.

...more about the Points of Significance column

Personalized Oncogenomics Program 5 Year Anniversary Art

Wed 26-07-2017

The artwork was created in collaboration with my colleagues at the Genome Sciences Center to celebrate the 5 year anniversary of the Personalized Oncogenomics Program (POG).

5 Years of Personalized Oncogenomics Program at Canada's Michael Smith Genome Sciences Centre. The poster shows 545 cancer cases. (left) Cases ordered chronologically by case number. (right) Cases grouped by diagnosis (tissue type) and then by similarity within group.

The Personalized Oncogenomics Program (POG) is a collaborative research study including many BC Cancer Agency oncologists, pathologists and other clinicians along with Canada's Michael Smith Genome Sciences Centre, with support from BC Cancer Foundation.

The aim of the program is to sequence, analyze and compare the genome of each patient's cancer (the entire DNA and RNA inside tumor cells) in order to understand what enables it and to identify less toxic and more effective treatment options.

Principal component analysis

Thu 06-07-2017
PCA helps you interpret your data, but it will not always find the important patterns.

Principal component analysis (PCA) simplifies the complexity in high-dimensional data by reducing its number of dimensions.

Nature Methods Points of Significance column: Principal component analysis.

To retain trends and patterns in the reduced representation, PCA finds linear combinations of canonical dimensions that maximize the variance of the projection of the data.

PCA is helpful in visualizing high-dimensional data and scatter plots based on 2-dimensional PCA can reveal clusters.
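The idea of a variance-maximizing linear combination can be sketched without any libraries. The example below finds only the first principal component, using power iteration on the sample covariance matrix in place of a full eigendecomposition; the function name, iteration count and toy data are assumptions. Power iteration can fail if the starting direction happens to be orthogonal to the first component, which does not occur for this data.

```python
def first_component(rows, iterations=200):
    """Return (direction, scores) for the direction of maximum variance in `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d)
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d                             # arbitrary starting direction
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(value * value for value in w) ** 0.5
        v = [value / norm for value in w]     # keep the direction at unit length
    # Project each centered point onto the direction
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

points = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9), (5, 5.1)]
direction, scores = first_component(points)
print(direction)
print(scores)
```

For points scattered along a diagonal line, the direction weights both coordinates about equally, and the scores give a one-dimensional summary of each point's position along it.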

Altman, N. & Krzywinski, M. (2017) Points of Significance: Principal component analysis. Nature Methods 14:641–642.

Background reading

Altman, N. & Krzywinski, M. (2017) Points of Significance: Clustering. Nature Methods 14:545–546.

...more about the Points of Significance column

`k` index: a weightlifting and CrossFit performance measure

Wed 07-06-2017

Similar to the `h` index in publishing, the `k` index is a measure of fitness performance.

To achieve a `k` index for a movement you must perform `k` unbroken reps at `k`% 1RM.

The expected value for the `k` index is probably somewhere in the range of `k = 26` to `k = 35`, with higher values progressively more difficult to achieve.
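As a rough sketch, the `k` index could be computed from a training log much like the `h` index is computed from citation counts. The version below assumes that a set of `r` unbroken reps at `p`% 1RM also demonstrates `k` reps at `k`% for any `k` up to the smaller of `p` and `r` (lighter weight and fewer reps being no harder). That assumption, the function name and the sample log are mine and may not match the rules in the introduction article.

```python
def k_index(sets):
    """sets: iterable of (percent_of_1rm, unbroken_reps) pairs for one movement."""
    # Each set supports a k of at most min(percent, reps); take the best one.
    return max((min(int(percent), reps) for percent, reps in sets), default=0)

log = [(30, 28), (25, 40), (50, 12)]   # hypothetical sets for one movement
print(k_index(log))
```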

In my `k` index introduction article I provide a detailed explanation, a rep scheme table and a WOD example.

Dark Matter of the English Language—the unwords

Wed 07-06-2017

I've applied the char-rnn recurrent neural network to generate new words, names of drugs and countries.

The effect is intriguing and facetious—yes, those are real words.

But these are not: necronology, abobionalism, gabdologist, and nonerify.

These places only exist in the mind: Conchar and Pobacia, Hzuuland, New Kain, Rabibus and Megee Islands, Sentip and Sitina, Sinistan and Urzenia.

And these are the imaginary afflictions of the imagination: ictophobia, myconomascophobia, and talmatomania.

And these, of the body: ophalosis, icabulosis, mediatopathy and bellotalgia.

Want to name your baby? Or someone else's baby? Try Ginavietta Xilly Anganelel or Ferandulde Hommanloco Kictortick.

When taking new therapeutics, never mix salivac and labromine. And don't forget that abadarone is best taken on an empty stomach.

And nothing increases the chance of getting that grant funded more than proposing the study of a new –ome! We really need someone to look into the femome and manome.

Dark Matter of the Genome—the nullomers

Wed 31-05-2017

An exploration of things that are missing in the human genome. The nullomers.

Julia Herold, Stefan Kurtz and Robert Giegerich. Efficient computation of absent words in genomic sequences. BMC Bioinformatics (2008) 9:167
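For short words, absent sequences can be found by brute force: list every possible word of length `k` and keep the ones that never appear. The sketch below does this for the forward strand only and is not the algorithm from the paper cited above, which is designed to scale to whole genomes; the function name and the toy sequence are assumptions.

```python
from itertools import product

def nullomers(sequence, k):
    """All length-k DNA words that never occur in `sequence` (forward strand only)."""
    seq = sequence.upper()
    present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    words = ("".join(letters) for letters in product("ACGT", repeat=k))
    return [word for word in words if word not in present]

absent = nullomers("ACGTACGT", 2)
print(len(absent), absent)
```

The number of candidate words grows as 4 to the power `k`, so this approach is only practical for small `k`.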