Martin Krzywinski / Genome Sciences Center / mkweb.bcgsc.ca


Search Globe — Global Visualization of Google Searches by Language

Shown here is a globe visualization of worldwide Google searches, each categorized into one of 21 languages. The visualization was created with the WebGL Globe toolkit and the bundled data from Chrome Experiments.

Data Annotations — Geotagged and Ranked

I have annotated the data with geographical information from MaxMind to include the city, region, and country for each search location. The closest city was determined by finding the entry in the MaxMind data set (2.8M cities) with the smallest haversine distance to the coordinates of the search location. Note that latitude and longitude were provided to 3 decimal places in the original data file but are available to 7 decimal places in the MaxMind set.
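The closest-city lookup can be sketched as follows. This is a minimal illustration, not the script used to build the annotated file: the function names, the tuple layout, and the mean Earth radius of 6371 km are assumptions, and the three-city table simply reuses coordinates quoted elsewhere on this page.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def closest_city(lat, lon, cities):
    """Return the (name, lat, lon) tuple in `cities` nearest to (lat, lon)."""
    return min(cities, key=lambda c: haversine_km(lat, lon, c[1], c[2]))


# Hypothetical stand-in for the MaxMind table (the real one has ~2.8M rows).
cities = [
    ("Mexicali", 32.65, -115.47),
    ("Papeete", -17.54, -149.57),
    ("Yakutsk", 62.04, 129.75),
]
nearest = closest_city(32.7, -115.5, cities)
```

A linear scan like this is fine for a sketch; for the full MaxMind table against every search location, a spatial index would likely be needed.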

The annotated data file includes the following new fields:

  • rank (1-indexed rank of the magnitude of the search data point)
  • cumulative_value (fraction of the total search magnitude contributed by all data points with equal or smaller magnitude)
  • language_name (name of the search language)
  • city (closest city to latitude/longitude of search data point)
  • region (region of closest city)
  • country (country of closest city)
  • city_latitude, city_longitude (coordinates of closest city)
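As a rough sketch of how the rank and cumulative_value fields could be derived from the magnitudes alone: the function name is hypothetical, and the code assumes that rank 1 is the largest magnitude and that tied points receive arbitrary but distinct ranks, neither of which is confirmed above.

```python
def annotate(magnitudes):
    """Return one dict per input value with rank and cumulative_value added."""
    total = sum(magnitudes)
    ascending = sorted(range(len(magnitudes)), key=lambda i: magnitudes[i])

    # Running share of the total, keyed by magnitude. For tied values the
    # last write wins, so each tied point gets the share that includes all ties.
    cumulative = {}
    running = 0.0
    for i in ascending:
        running += magnitudes[i]
        cumulative[magnitudes[i]] = running / total

    rows = [None] * len(magnitudes)
    # Assumption: rank 1 is the largest magnitude.
    for rank, i in enumerate(reversed(ascending), start=1):
        rows[i] = {
            "magnitude": magnitudes[i],
            "rank": rank,
            "cumulative_value": cumulative[magnitudes[i]],
        }
    return rows


rows = annotate([4, 1, 3, 2])
tied = annotate([2, 2, 1])
```

Under this reading, the point with the largest magnitude always has a cumulative_value of 1.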

Download geotagged data

Thanks to Evan Applegate from UC Davis for requesting an explanation of the additional fields. They were not obvious.

By language

View all languages or individual data for the following languages: Arabic Belgian Chinese Dutch English Finnish French German Indonesian Italian Japanese Korean Norwegian Polish Portuguese Romanian Russian Spanish Swedish Thai Turkish

By magnitude

View top 5%, 10%, 15% of data.

By location

View top 10 20 50 100 search locations.

By density

View search density.

Google Search Volume

Showing search density.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca

The color legend was created based on the color scheme used in the original webgl-globe code.

Observations on the data

Illegal Mexican aliens in US

There are 11 locations in the US with searches in Spanish: Dillard, Douglas, Flint Hill, Floyds Knobs, Great Falls, Orrs Island, Redwood Estates, Simpsonville, Spanish Fork, Spanish Fort, and Washington. Conspicuously, Los Angeles is missing.

Concentration of Spanish searches from continental US.

The northernmost town in Mexico with a Spanish search is Mexicali (Baja California, lat 32.65 long -115.47).

Chinese Take Out

The Chinese takeover has been largely overestimated. Only two towns in the US participate in Chinese language searches: Williamsport and Evensville.

Concentration of Chinese searches from continental US.

English Around the World

English in South America

With the exception of Albouystown (Demerara-Mahaica, Guyana) and Paramaribo (Suriname), South America shows no English searches.

Concentration of English searches in South America.

English in Asia

Asia shows interesting patterns. Notably, no English searches are seen from China, presumably because of the country's internet firewall. By country, India leads with 82 searches, followed by Malaysia (64) and Pakistan (11). The full list is India (82), Malaysia (64), Pakistan (11), United (5), Bangladesh (4), Sri (3), Philippines (3), Nepal (3), Korea (3), Japan (2), Iran (2), Singapore (1), Papua (1), Myanmar (1), Maldives (1), Cambodia (1), Brunei (1), Bhutan (1), Afghanistan (1).

Concentration of English searches in Asia.

English in the Far North

There are 25 locations with English-language searches at latitude ≥ 60°. Of these, 15 are cities in Alaska (Anchorage, Barrow, Bethel, Cordova, Delta Junction, Eagle River, Fairbanks, Kenai, Nome, North Pole, Palmer, Seward, Soldotna, Valdez, Wasilla), of which Barrow is furthest north (lat 71.29°). The other 10 cities are mostly in Canada: Lerwick (Shetland Islands, United Kingdom, lat 60.160°), Whitehorse (Yukon Territory, Canada, lat 60.720°), Jarstad (Sogn og Fjordane, Norway, lat 61.360°), Fort Providence (Northwest Territories, Canada, lat 61.380°), Yellowknife (Northwest Territories, Canada, lat 62.450°), Frobisher Bay (Nunavut, Canada, lat 63.750°), Keflavík (Gullbringusysla, Iceland, lat 64.010°), Inuvik (Northwest Territories, Canada, lat 68.340°), Gjoa Haven (Nunavut, Canada, lat 68.630°), Igloolik (Nunavut, Canada, lat 69.380°).

Concentration of English searches in the Far North.

English in the Far South

New Zealand and Australia dominate search locations in the far south. The southernmost English search is from Invercargill (Southland, New Zealand, lat -46.4°); compare this to the northernmost search from Barrow in Alaska at lat 71.29°. In Australia, the southernmost search is from Davenport (Tasmania, Australia, lat -41.17°). In South Africa, the southernmost search is from Hermanus (Western Cape, South Africa, lat -34.42°).

Concentration of English searches in the Far South.

Most Remote Locations

What is the most remote search location? Here, I define distance between locations by the haversine distance.

I tabulate three types of remote locations, by language, by finding

  • most remote, regardless of language of nearest city
  • most remote, with nearest city searching in the same language
  • most remote, with nearest city searching in a different language
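One way to read these three tabulations is sketched below. The interpretation is an assumption inferred from the tables on this page: for each language, take the location whose nearest neighbor (in any language) is farthest away, then split those winners by whether that neighbor searches in the same language. The record layout, function names, and toy coordinates are all hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km; radius_km is an assumed mean Earth radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


def most_remote_by_language(points):
    """Map language -> (distance_km, point, nearest_neighbor) for its most remote point."""
    best = {}
    for p in points:
        # Nearest other location, regardless of language (brute-force scan).
        dist, q = min(
            ((haversine_km(p["lat"], p["lon"], o["lat"], o["lon"]), o)
             for o in points if o is not p),
            key=lambda pair: pair[0],
        )
        if p["lang"] not in best or dist > best[p["lang"]][0]:
            best[p["lang"]] = (dist, p, q)
    return best


def split_by_neighbor_language(best):
    """Partition the per-language winners into same- and different-language neighbors."""
    same = {k: v for k, v in best.items() if v[2]["lang"] == k}
    different = {k: v for k, v in best.items() if v[2]["lang"] != k}
    return same, different


# Toy points on the equator, not real search locations.
points = [
    {"name": "A", "lang": "English", "lat": 0.0, "lon": 0.0},
    {"name": "B", "lang": "English", "lat": 0.0, "lon": 1.0},
    {"name": "C", "lang": "French", "lat": 0.0, "lon": 10.0},
    {"name": "D", "lang": "French", "lat": 0.0, "lon": -30.0},
]
best = most_remote_by_language(points)
same, different = split_by_neighbor_language(best)
```

The pairwise scan is quadratic in the number of locations, so for the full data set it would be slow in pure Python and a spatial index would be a natural replacement.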

Most Remote

Three of the most remote search locations.

Cities, by language, most distant from their closest city.

The most remote search location of all is Papeete, whose closest search data point, Fusi in American Samoa, is 2,287 km away. Also interesting is Belgian-language Westerschelling in the Netherlands: of the per-language most remote locations, it has the smallest distance to its nearest city, only 25 km from Harlingen, Netherlands.

  1. French Papeete (French Polynesia, lat -17.540° long -149.570°) 2287 km from English Fusi (American Samoa, United States)
  2. English Mahé (Beau Vallon, Seychelles, lat -4.620° long 55.440°) 1347 km from English Hamar (Banaadir, Somalia)
  3. Russian Yakutsk (Sakha, Russian Federation, lat 62.040° long 129.750°) 1119 km from Chinese Kuchiku (Heilongjiang, China)
  4. Dutch Godthaab (Vestgronland, Greenland, lat 64.180° long -51.720°) 818 km from English Frobisher Bay (Nunavut, Canada)
  5. Portuguese Boa Vista (Roraima, Brazil, lat 2.820° long -60.670°) 522 km from English Albouystown (Demerara-Mahaica, Guyana)
  6. Indonesian Lette (Indonesia, lat -5.150° long 119.410°) 516 km from Indonesian Balikpapan (Kalimantan Timur, Indonesia)
  7. Spanish San Juan de Miraflores (Loreto, Peru, lat -3.760° long -73.270°) 458 km from Spanish San Martin (San Martin, Peru)
  8. Chinese Hotan (Xinjiang, China, lat 37.110° long 79.920°) 431 km from Chinese Kaschgar (Xinjiang, China)
  9. Arabic Ara`ar (Al Hudud ash Shamaliyah, Saudi Arabia, lat 30.980° long 41.030°) 390 km from Arabic Hael (Ha'il, Saudi Arabia)
  10. Japanese Nase (Kagoshima, Japan, lat 28.380° long 129.490°) 248 km from Japanese Nago (Okinawa, Japan)
  11. Thai Amphoe Muang Ranong (Ranong, Thailand, lat 9.970° long 98.640°) 225 km from Thai Amphoe Muang Nakhon Si Thammarat (Nakhon Si Thammarat, Thailand)
  12. Turkish Thospia (Van, Turkey, lat 38.490° long 43.380°) 177 km from English Sangar-e Beru Khan (Azarbayjan-e Bakhtari, Iran)
  13. Norwegian Guovdagæidno (Finnmark, Norway, lat 69.010° long 23.040°) 107 km from Norwegian Bosekop (Finnmark, Norway)
  14. Swedish Lofsdalen (Jamtlands Lan, Sweden, lat 62.120° long 13.270°) 106 km from Norwegian Nybergsund (Hedmark, Norway)
  15. Finnish Kansela (Oulu, Finland, lat 65.970° long 29.170°) 98 km from Finnish Märkäjärvi (Lapland, Finland)
  16. Romanian Sisesti (Gorj, Romania, lat 45.060° long 23.300°) 68 km from Romanian Drobeta-Turnu Severin (Mehedinti, Romania)
  17. Italian Nuoro (Sardegna, Italy, lat 40.320° long 9.330°) 60 km from Italian Santu Lussurgiu (Sardegna, Italy)
  18. Polish Vlodava (Poland, lat 51.550° long 23.550°) 45 km from Polish Bielawin (Poland)
  19. Korean Bontoku (Kyongsang-bukto, Korea, lat 36.410° long 129.370°) 43 km from Korean Eijitsu (Kyongsang-bukto, Korea)
  20. German Monplaisir (Brandenburg, Germany, lat 53.060° long 14.270°) 39 km from German Prenzlau (Brandenburg, Germany)
  21. Belgian Westerschelling (Friesland, Netherlands, lat 53.360° long 5.220°) 25 km from Belgian Harlingen (Friesland, Netherlands)

Most Remote — nearest city searching in same language

Cities, by language, most distant from their closest city, in which people speak (i.e. search) in the same language.

English searches are the most spread out on the globe. Across all search languages, Mahé in Seychelles is the location furthest from its nearest same-language neighbor: it is 1,347 km from Hamar in Somalia, where English searches are also found.

  1. English Mahé (Beau Vallon, Seychelles, lat -4.620° long 55.440°) 1347 km from English Hamar (Banaadir, Somalia)
  2. Indonesian Lette (Indonesia, lat -5.150° long 119.410°) 516 km from Indonesian Balikpapan (Kalimantan Timur, Indonesia)
  3. Spanish San Juan de Miraflores (Loreto, Peru, lat -3.760° long -73.270°) 458 km from Spanish San Martin (San Martin, Peru)
  4. Chinese Hotan (Xinjiang, China, lat 37.110° long 79.920°) 431 km from Chinese Kaschgar (Xinjiang, China)
  5. Arabic Ara`ar (Al Hudud ash Shamaliyah, Saudi Arabia, lat 30.980° long 41.030°) 390 km from Arabic Hael (Ha'il, Saudi Arabia)
  6. Japanese Nase (Kagoshima, Japan, lat 28.380° long 129.490°) 248 km from Japanese Nago (Okinawa, Japan)
  7. Thai Amphoe Muang Ranong (Ranong, Thailand, lat 9.970° long 98.640°) 225 km from Thai Amphoe Muang Nakhon Si Thammarat (Nakhon Si Thammarat, Thailand)
  8. Norwegian Guovdagæidno (Finnmark, Norway, lat 69.010° long 23.040°) 107 km from Norwegian Bosekop (Finnmark, Norway)
  9. Finnish Kansela (Oulu, Finland, lat 65.970° long 29.170°) 98 km from Finnish Märkäjärvi (Lapland, Finland)
  10. Romanian Sisesti (Gorj, Romania, lat 45.060° long 23.300°) 68 km from Romanian Drobeta-Turnu Severin (Mehedinti, Romania)
  11. Italian Nuoro (Sardegna, Italy, lat 40.320° long 9.330°) 60 km from Italian Santu Lussurgiu (Sardegna, Italy)
  12. Polish Vlodava (Poland, lat 51.550° long 23.550°) 45 km from Polish Bielawin (Poland)
  13. Korean Bontoku (Kyongsang-bukto, Korea, lat 36.410° long 129.370°) 43 km from Korean Eijitsu (Kyongsang-bukto, Korea)
  14. German Monplaisir (Brandenburg, Germany, lat 53.060° long 14.270°) 39 km from German Prenzlau (Brandenburg, Germany)
  15. Belgian Westerschelling (Friesland, Netherlands, lat 53.360° long 5.220°) 25 km from Belgian Harlingen (Friesland, Netherlands)

Most Remote — nearest city searching in different language

Cities, by language, most distant from their closest city, which is foreign (i.e. searching in a different language).

  1. French Papeete (French Polynesia, lat -17.540° long -149.570°) 2287 km from English Fusi (American Samoa, United States)
  2. Russian Yakutsk (Sakha, Russian Federation, lat 62.040° long 129.750°) 1119 km from Chinese Kuchiku (Heilongjiang, China)
  3. Dutch Godthaab (Vestgronland, Greenland, lat 64.180° long -51.720°) 818 km from English Frobisher Bay (Nunavut, Canada)
  4. Portuguese Boa Vista (Roraima, Brazil, lat 2.820° long -60.670°) 522 km from English Albouystown (Demerara-Mahaica, Guyana)
  5. Turkish Thospia (Van, Turkey, lat 38.490° long 43.380°) 177 km from English Sangar-e Beru Khan (Azarbayjan-e Bakhtari, Iran)
  6. Swedish Lofsdalen (Jamtlands Lan, Sweden, lat 62.120° long 13.270°) 106 km from Norwegian Nybergsund (Hedmark, Norway)

Top 10 Locations

About 10% of all searches come from the top 10 locations.

  1. English New York (United States)
  2. French Paris (France)
  3. Turkish Istanbul (Turkey)
  4. English London (United Kingdom)
  5. Portuguese Sao Paulo (Brazil)
  6. English Miami (United States)
  7. German Berlin (Germany)
  8. Spanish Madrid (Spain)
  9. Spanish Mexico City (Mexico)
  10. Thai Bangkok (Thailand)

I am surprised to see Miami here (bored retirees?) as well as Istanbul — I don't have a theory for that one.

Top 10 cities by search volume.

Top 100 Locations

38% of all searches come from the top 100 locations (out of 22,826), with English dominating (33/100) followed by Spanish (11/100).

Top 100 cities by search volume.

The full breakdown for the top 100 locations by language is English (33), Spanish (11), German (8), Japanese (6), Dutch (6), Portuguese (5), French (5), Turkish (4), Italian (4), Chinese (4), Russian (3), Arabic (3), Polish (2), Thai (1), Swedish (1), Romanian (1), Korean (1), Indonesian (1), Finnish (1).

By country, the top 100 locations fall in the United States (11), Germany (6), India (6), Japan (6), Brazil (5), United Kingdom (5), Italy (4), Turkey (4), Australia (3), France (3), Mexico (3), Russian Federation (3), Canada (2), China (2), Colombia (2), Poland (2), Saudi Arabia (2), Spain (2), Vietnam (2), Algeria (1), Argentina (1), Austria (1), Chile (1), Egypt (1), Finland (1), Greece (1), Hong Kong (1), Hungary (1), Indonesia (1), Ireland (1), Israel (1), Korea (1), Malaysia (1), Peru (1), Philippines (1), Romania (1), Serbia (1), Singapore (1), Sweden (1), Switzerland (1), Taiwan (1), Thailand (1), Tunisia (1), Ukraine (1), United Arab Emirates (1), Venezuela (1).
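The share of searches in the top N locations and the per-language counts can be computed along these lines. This is a sketch with made-up magnitudes; the function name and the "magnitude" key are assumptions, while language_name is the field described in the annotated file.

```python
from collections import Counter


def top_n_summary(points, n):
    """Share of total magnitude in the n largest points, plus their language counts."""
    ranked = sorted(points, key=lambda p: p["magnitude"], reverse=True)
    top = ranked[:n]
    share = sum(p["magnitude"] for p in top) / sum(p["magnitude"] for p in points)
    by_language = Counter(p["language_name"] for p in top)
    return share, by_language.most_common()


# Made-up data for illustration only.
points = [
    {"magnitude": 5, "language_name": "English"},
    {"magnitude": 3, "language_name": "Spanish"},
    {"magnitude": 1, "language_name": "English"},
    {"magnitude": 1, "language_name": "German"},
]
share, breakdown = top_n_summary(points, 2)
```

Swapping language_name for country in the Counter gives the by-country breakdown in the same way.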

The top 100 locations are

  1. English New York (New York, United States)
  2. French Saint-Merri (Ile-de-France, France)
  3. Turkish Küçükpazar (Istanbul, Turkey)
  4. English City of London (Essex, United Kingdom)
  5. Portuguese Liberdade (Sao Paulo, Brazil)
  6. English Miami (Florida, United States)
  7. German Berlin (Berlin, Germany)
  8. Spanish Entrevías (Madrid, Spain)
  9. Spanish Ciudad de México (Distrito Federal, Mexico)
  10. Thai Amphoe Bang Rak (Krung Thep, Thailand)
  11. Spanish Bogotá (Cundinamarca, Colombia)
  12. English City of Sydney (New South Wales, Australia)
  13. Spanish Hacienda Huachipa (Lima, Peru)
  14. Spanish San Telmo (Distrito Federal, Argentina)
  15. Italian Roma (Lazio, Italy)
  16. Polish Powisle (Poland)
  17. Italian Mailand (Lombardia, Italy)
  18. English South Melbourne (Victoria, Australia)
  19. English Los Angeles (California, United States)
  20. Portuguese São Cristavem (Rio de Janeiro, Brazil)
  21. Russian Moscou (Moscow City, Russian Federation)
  22. Turkish Maltepe (Ankara, Turkey)
  23. Indonesian Pasarmanggis (Jakarta Raya, Indonesia)
  24. Dutch Ho Chi Minh City (Ho Chi Minh, Vietnam)
  25. Spanish Barcelona (Catalonia, Spain)
  26. English Toronto (Ontario, Canada)
  27. Spanish La Reina (Region Metropolitana, Chile)
  28. Spanish Los Caobas (Distrito Federal, Venezuela)
  29. English Chicago (Illinois, United States)
  30. Russian KievPetrovsky Port (Kyyivs'ka Oblast', Ukraine)
  31. Arabic Az Zahra' (Ar Riyad, Saudi Arabia)
  32. Dutch Xóm Trong (Vietnam)
  33. German München (Bayern, Germany)
  34. English Connaught Place (Delhi, India)
  35. Portuguese Venda Nova (Minas Gerais, Brazil)
  36. Dutch Afini (Attiki, Greece)
  37. English Bangalore (Karnataka, India)
  38. English Kampong Haji Abdullah Hukum (Kuala Lumpur, Malaysia)
  39. German Hamburg (Hamburg, Germany)
  40. Chinese Beijing (Beijing, China)
  41. Arabic Rawd al Faraj (Al Qahirah, Egypt)
  42. English Singapore City (Singapore)
  43. English Houston (Texas, United States)
  44. English Paddington (Essex, United Kingdom)
  45. Turkish Azmir (Izmir, Turkey)
  46. Japanese Nishi-okubo (Tokyo, Japan)
  47. English Spring Hill (Victoria, Australia)
  48. English Bombay Wadala (Maharashtra, India)
  49. Dutch Hakiriah (Tel Aviv, Israel)
  50. French Fourvière (Rhone-Alpes, France)
  51. Chinese Shanghaishih (Shanghai, China)
  52. Arabic Bani Malik (Makkah, Saudi Arabia)
  53. English Daira (Dubai, United Arab Emirates)
  54. Dutch Kiyabo (Manila, Philippines)
  55. German Inner City (Wien, Austria)
  56. Italian Naples (Campania, Italy)
  57. English Montreal (Quebec, Canada)
  58. English Kilmainham (Dublin, Ireland)
  59. German Alt-Wiedikon (Zurich, Switzerland)
  60. Japanese Kyobashi (Osaka, Japan)
  61. Dutch Buda (Budapest, Hungary)
  62. Romanian Bucarest (Bucuresti, Romania)
  63. Chinese Central District (Hong Kong)
  64. Japanese Sengendai (Kanagawa, Japan)
  65. Japanese Hibiyakoen (Tokyo, Japan)
  66. English Thousand Lights (Tamil Nadu, India)
  67. English San Francisco (California, United States)
  68. English Farragut Square (District of Columbia, United States)
  69. English Victoria Park (Manchester, United Kingdom)
  70. Swedish Norrmalm (Stockholms Lan, Sweden)
  71. German Frankford-on-Main (Hessen, Germany)
  72. German Augusta Ubiorum (Nordrhein-Westfalen, Germany)
  73. Chinese Fantzupo (T'ai-pei, Taiwan)
  74. Korean Kyedong (Seoul-t'ukpyolsi, Korea)
  75. English Lambeth (Lambeth, United Kingdom)
  76. German Stutengarten (Baden-Wurttemberg, Germany)
  77. Japanese Sarugakucho (Tokyo, Japan)
  78. English Seattle (Washington, United States)
  79. Finnish Gloet (Southern Finland, Finland)
  80. Italian Borgo Po (Piemonte, Italy)
  81. Spanish Guadalajara (Jalisco, Mexico)
  82. Spanish Alpujarra (Antioquia, Colombia)
  83. French Toulouse (Midi-Pyrenees, France)
  84. English San Diego (California, United States)
  85. English Dallas (Texas, United States)
  86. English Denver (Colorado, United States)
  87. English Dorcol (Serbia)
  88. English Aston (Essex, United Kingdom)
  89. English Romanovskiy (Moskva, Russian Federation)
  90. Polish Kleparz (Poland)
  91. Russian Aptekarskiy (Leningrad, Russian Federation)
  92. Spanish Monterrey (Nuevo Leon, Mexico)
  93. French El Bia (Alger, Algeria)
  94. French Al `Umran (Tunisia)
  95. Portuguese Bahia (Bahia, Brazil)
  96. Portuguese Brasília (Distrito Federal, Brazil)
  97. Turkish Adana (Adana, Turkey)
  98. Japanese Edo (Tokyo, Japan)
  99. English Bhaganagar (Andhra Pradesh, India)
  100. English Mali and Munjeri (Maharashtra, India)

news + thoughts

Ensemble methods: Bagging and random forests

Mon 16-10-2017
Many heads are better than one.

We introduce two common ensemble methods: bagging and random forests. Both of these methods repeat a statistical analysis on many bootstrap samples to improve the accuracy of the predictor. Our column shows these methods as applied to Classification and Regression Trees.

Nature Methods Points of Significance column: Ensemble methods: Bagging and random forests.

For example, we can sample the space of values more finely when using bagging with regression trees because each sample has potentially different boundaries at which the tree splits.

Random forests generate a large number of trees by not only generating bootstrap samples but also randomly choosing which predictor variables are considered at each split in the tree.
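The two ideas can be illustrated with a deliberately small sketch that is not the column's implementation. Each "tree" here is a single-split regression stump, which is an assumed simplification, and all names are hypothetical. With n_features equal to the number of predictors the ensemble is plain bagging; with fewer, each stump sees a random subset of predictors, in the spirit of a random forest.

```python
import random


def fit_stump(rows, targets, features):
    """Best single split by squared error, searched only over `features`."""
    best = None
    for f in features:
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if not left or not right:
                continue
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((t - lmean) ** 2 for t in left)
                   + sum((t - rmean) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lmean, rmean)
    if best is None:  # no valid split: predict the sample mean everywhere
        mean = sum(targets) / len(targets)
        return (None, None, mean, mean)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, lmean, rmean = stump
    if f is None:
        return lmean
    return lmean if row[f] <= threshold else rmean


def fit_ensemble(rows, targets, n_trees, n_features, rng):
    """Fit n_trees stumps, each on a bootstrap sample and a random predictor subset."""
    all_features = list(range(len(rows[0])))
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feats = rng.sample(all_features, n_features)    # random predictor subset
        stumps.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], feats))
    return stumps


def predict_ensemble(stumps, row):
    """Average the individual predictions."""
    return sum(predict_stump(s, row) for s in stumps) / len(stumps)


rows = [(x,) for x in range(10)]
targets = [0] * 5 + [10] * 5
stumps = fit_ensemble(rows, targets, n_trees=25, n_features=1, rng=random.Random(0))
low = predict_ensemble(stumps, (0,))
high = predict_ensemble(stumps, (9,))
```

Because each bootstrap sample can place its split at a different threshold, the averaged prediction changes in smaller steps than any single stump would allow.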

Krzywinski, M. & Altman, N. (2017) Points of Significance: Ensemble methods: bagging and random forests. Nature Methods 14:933–934.

Background reading

Krzywinski, M. & Altman, N. (2017) Points of Significance: Classification and regression trees. Nature Methods 14:757–758.


Classification and regression trees

Mon 16-10-2017
Decision trees are a powerful but simple prediction method.

Decision trees classify data by splitting it along the predictor axes into partitions with homogeneous values of the dependent variable. Unlike logistic or linear regression, CART does not develop a prediction equation. Instead, data are predicted by a series of binary decisions based on the boundaries of the splits. Decision trees are very effective and the resulting rules are readily interpreted.

Trees can be built using different metrics that measure how well the splits divide up the data classes: Gini index, entropy or misclassification error.
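For reference, the three metrics can be written out using their standard textbook definitions on class proportions. The function names are illustrative, and the weighted split score at the end is one common way to compare candidate splits rather than a detail taken from the column.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Fraction of items in each class."""
    counts = Counter(labels)
    return [c / len(labels) for c in counts.values()]


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


def misclassification(labels):
    """Error rate when predicting the majority class."""
    return 1.0 - max(class_proportions(labels))


def split_impurity(left, right, metric):
    """Size-weighted impurity of a binary split; lower is better."""
    n = len(left) + len(right)
    return len(left) / n * metric(left) + len(right) / n * metric(right)


mixed = ["a", "a", "b", "b"]
```

All three metrics are zero for a pure partition and largest when the classes are evenly mixed.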

Nature Methods Points of Significance column: Classification and regression trees.

When the dependent variable is quantitative rather than categorical, regression trees are used. Here, the data are still split, but now the dependent variable is estimated by the average within the split boundaries. Tree growth can be controlled using the complexity parameter, a measure of the relative improvement of each new split.

Individual trees can be very sensitive to minor changes in the data and even better prediction can be achieved by exploiting this variability. Using ensemble methods, we can grow multiple trees from the same data.

Krzywinski, M. & Altman, N. (2017) Points of Significance: Classification and regression trees. Nature Methods 14:757–758.

Background reading

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Logistic regression. Nature Methods 13:541-542.

Altman, N. & Krzywinski, M. (2015) Points of Significance: Multiple Linear Regression Nature Methods 12:1103-1104.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Classifier evaluation. Nature Methods 13:603-604.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Model Selection and Overfitting. Nature Methods 13:703-704.

Lever, J., Krzywinski, M. & Altman, N. (2016) Points of Significance: Regularization. Nature Methods 13:803-804.


Personalized Oncogenomics Program 5 Year Anniversary Art

Wed 26-07-2017

The artwork was created in collaboration with my colleagues at the Genome Sciences Center to celebrate the 5 year anniversary of the Personalized Oncogenomics Program (POG).

5 Years of Personalized Oncogenomics Program at Canada's Michael Smith Genome Sciences Centre. The poster shows 545 cancer cases. (left) Cases ordered chronologically by case number. (right) Cases grouped by diagnosis (tissue type) and then by similarity within group.

The Personalized Oncogenomics Program (POG) is a collaborative research study including many BC Cancer Agency oncologists, pathologists and other clinicians along with Canada's Michael Smith Genome Sciences Centre with support from BC Cancer Foundation.

The aim of the program is to sequence, analyze and compare the genome of each patient's cancer (the entire DNA and RNA inside tumor cells) in order to understand what is enabling it to grow and to identify less toxic and more effective treatment options.

Principal component analysis

Thu 06-07-2017
PCA helps you interpret your data, but it will not always find the important patterns.

Principal component analysis (PCA) simplifies the complexity in high-dimensional data by reducing its number of dimensions.

Nature Methods Points of Significance column: Principal component analysis.

To retain trends and patterns in the reduced representation, PCA finds linear combinations of canonical dimensions that maximize the variance of the projection of the data.
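For two-dimensional data, the direction of maximum projected variance can be found in closed form from the covariance matrix. The sketch below is only an illustration of that idea: the function name is hypothetical, it uses the sample covariance (division by n - 1), and it assumes the data have nonzero variance.

```python
import math


def pca_2d(points):
    """First principal axis (unit vector) and the fraction of variance it explains."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)

    # Eigenvalues of the symmetric covariance matrix [[sxx, sxy], [sxy, syy]].
    mid = (sxx + syy) / 2
    half_gap = math.hypot((sxx - syy) / 2, sxy)
    lam1, lam2 = mid + half_gap, mid - half_gap

    # Eigenvector for lam1; fall back to a coordinate axis when sxy is zero.
    if sxy != 0:
        vx, vy = sxy, lam1 - sxx
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm), lam1 / (lam1 + lam2)


axis, explained = pca_2d([(0, 0), (1, 1), (2, 2)])
```

For points lying exactly on the line y = x, the first axis points along the diagonal and accounts for all of the variance.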

PCA is helpful in visualizing high-dimensional data and scatter plots based on 2-dimensional PCA can reveal clusters.

Altman, N. & Krzywinski, M. (2017) Points of Significance: Principal component analysis. Nature Methods 14:641–642.

Background reading

Altman, N. & Krzywinski, M. (2017) Points of Significance: Clustering. Nature Methods 14:545–546.


