{
"cells": [
{
"cell_type": "markdown",
"id": "cee3d0af",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Introduction to R and Jupyter Notebook\n",
"\n",
"In order to illustrate the use of `jupyter notebook` and `R`, let us prepare for the next lecture using a `notebook`.\n",
"\n",
"Notebooks run online (here on [syzygy.ca](https://syzygy.ca)). They mainly use two types of \"cells\". A cell like this one is a text cell. Text is formatted using `markdown`, a simple markup language with relatively powerful capabilities. See, for instance, [this guide](https://www.markdownguide.org/getting-started/) for details."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "8245d9f0",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"vscode": {
"languageId": "r"
}
},
"outputs": [],
"source": [
"# This is an R code cell \n",
"# The sign \"#\" is used for comments in R, so these lines do nothing"
]
},
{
"cell_type": "markdown",
"id": "a8e10089",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"source": [
"This is a markdown cell."
]
},
{
"cell_type": "markdown",
"id": "1b424914",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"source": [
"Cells are evaluated by pressing Shift+Enter or Ctrl+Enter. The current cell is an evaluated markdown cell, the one just above is an unevaluated markdown cell, and the one before that is an R cell (since it contains only comments, it is hard to tell whether it has been evaluated)."
]
},
{
"cell_type": "markdown",
"id": "37b28f53",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Grabbing the Canadian census data\n",
"\n",
"To illustrate the method, we will consider the evolution of the population of Canada through time. For this, we will grab the census data. We search for \"canada historical census data csv\", since `csv` (comma separated values) is a very easy format to use with `R`. [Here](https://www150.statcan.gc.ca/n1/pub/11-516-x/sectiona/4147436-eng.htm), we find a `csv` for 1851 to 1976. We follow the link to Table A2-14, where we find another link, this time to a `csv` file. This is what we use in `R`.\n",
"\n",
"The function `read.csv` reads in a file (potentially directly from the web). We assign the result to the variable `data_old`. We then use the function `head` to show the first few lines of the result."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "b37083a8",
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "slide"
},
"vscode": {
"languageId": "r"
}
},
"outputs": [
{
"data": {
"text/latex": [
"A data.frame: 6 × 23\n",
"\\begin{tabular}{r|lllllllllllllllllllll}\n",
" & X & Series.A2.14. & Population.of.Canada..by.province..census.dates..1851.to.1976 & X.1 & X.2 & X.3 & X.4 & X.5 & X.6 & X.7 & ⋯ & X.11 & X.12 & X.13 & X.14 & X.15 & X.16 & X.17 & X.18 & X.19 & X.20\\\\\n",
" & & & & & & & & & & & ⋯ & & & & & & & & & & \\\\\n",
"\\hline\n",
"\t1 & NA & & & NA & & & NA & & & & ⋯ & & NA & & NA & & NA & & & NA & NA\\\\\n",
"\t2 & NA & Year & Canada & NA & Newfound- & Prince & NA & Nova & New & Quebec & ⋯ & Saskat- & NA & Alberta & NA & British & NA & Yukon & Northwest & NA & NA\\\\\n",
"\t3 & NA & & & NA & land & Edward & NA & Scotia & Brunswick & & ⋯ & chewan & NA & & NA & Columbia & NA & Territory & Territories & NA & NA\\\\\n",
"\t4 & NA & & & NA & & Island & NA & & & & ⋯ & & NA & & NA & & NA & & & NA & NA\\\\\n",
"\t5 & NA & & 2 & NA & 3 & 4 & NA & 5 & 6 & 7 & ⋯ & 10 & NA & 11 & NA & 12 & NA & 13 & 14 & NA & NA\\\\\n",
"\t6 & NA & & & NA & & & NA & & & & ⋯ & & NA & & NA & & NA & & & NA & NA\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"A data.frame: 6 × 23\n",
"\n",
"| | X <int> | Series.A2.14. <chr> | Population.of.Canada..by.province..census.dates..1851.to.1976 <chr> | X.1 <int> | X.2 <chr> | X.3 <chr> | X.4 <int> | X.5 <chr> | X.6 <chr> | X.7 <chr> | ⋯ ⋯ | X.11 <chr> | X.12 <int> | X.13 <chr> | X.14 <int> | X.15 <chr> | X.16 <int> | X.17 <chr> | X.18 <chr> | X.19 <int> | X.20 <lgl> |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 1 | NA | | | NA | | | NA | | | | ⋯ | | NA | | NA | | NA | | | NA | NA |\n",
"| 2 | NA | Year | Canada | NA | Newfound- | Prince | NA | Nova | New | Quebec | ⋯ | Saskat- | NA | Alberta | NA | British | NA | Yukon | Northwest | NA | NA |\n",
"| 3 | NA | | | NA | land | Edward | NA | Scotia | Brunswick | | ⋯ | chewan | NA | | NA | Columbia | NA | Territory | Territories | NA | NA |\n",
"| 4 | NA | | | NA | | Island | NA | | | | ⋯ | | NA | | NA | | NA | | | NA | NA |\n",
"| 5 | NA | | 2 | NA | 3 | 4 | NA | 5 | 6 | 7 | ⋯ | 10 | NA | 11 | NA | 12 | NA | 13 | 14 | NA | NA |\n",
"| 6 | NA | | | NA | | | NA | | | | ⋯ | | NA | | NA | | NA | | | NA | NA |\n",
"\n"
],
"text/plain": [
" X Series.A2.14.\n",
"1 NA \n",
"2 NA Year \n",
"3 NA \n",
"4 NA \n",
"5 NA \n",
"6 NA \n",
" Population.of.Canada..by.province..census.dates..1851.to.1976 X.1 X.2 \n",
"1 NA \n",
"2 Canada NA Newfound-\n",
"3 NA land \n",
"4 NA \n",
"5 2 NA 3 \n",
"6 NA \n",
" X.3 X.4 X.5 X.6 X.7 ⋯ X.11 X.12 X.13 X.14 X.15 X.16\n",
"1 NA ⋯ NA NA NA \n",
"2 Prince NA Nova New Quebec ⋯ Saskat- NA Alberta NA British NA \n",
"3 Edward NA Scotia Brunswick ⋯ chewan NA NA Columbia NA \n",
"4 Island NA ⋯ NA NA NA \n",
"5 4 NA 5 6 7 ⋯ 10 NA 11 NA 12 NA \n",
"6 NA ⋯ NA NA NA \n",
" X.17 X.18 X.19 X.20\n",
"1 NA NA \n",
"2 Yukon Northwest NA NA \n",
"3 Territory Territories NA NA \n",
"4 NA NA \n",
"5 13 14 NA NA \n",
"6 NA NA "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"data_old = read.csv(\"https://www150.statcan.gc.ca/n1/en/pub/11-516-x/sectiona/A2_14-eng.csv?st=L7vSnqio\")\n",
"head(data_old)"
]
},
{
"cell_type": "markdown",
"id": "183ca6b1",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Obviously, this does not make a lot of sense. This is normal: take a look at the first few lines in the file. They take the form\n",
"```\n",
",Series A2-14.,\"Population of Canada, by province, census dates, 1851 to 1976\",,,,,,,,,,,,,,,,,,,,\n",
",,,,,,,,,,,,,,,,,,,,,,\n",
",Year,Canada,,Newfound-,Prince,,Nova,New,Quebec,Ontario, Manitoba,,Saskat-,,Alberta,,British,,Yukon,Northwest,,\n",
",,,,land,Edward,,Scotia,Brunswick,,,,,chewan,,,,Columbia,,Territory,Territories,,\n",
",,,,,Island,,,,,,,,,,,,,,,,,\n",
",,2,,3,4,,5,6,7,8,9,,10,,11,,12,,13,14,,\n",
",,,,,,,,,,,,,,,,,,,,,,\n",
"```"
]
},
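{
"cell_type": "markdown",
"id": "e7a1c3b5",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"One way to take that look without leaving the notebook is `readLines`, which returns the raw text of a file line by line. The sketch below is only an illustration under stated assumptions: `sample_text`, `tmp`, `raw_lines` and `peek` are hypothetical names, and the few lines written to the temporary file are a shortened, made-up excerpt mimicking the layout of the real file, so that the example runs without network access.\n",
"\n",
"```r\n",
"# Shortened, made-up excerpt mimicking the layout of the real file\n",
"sample_text <- c(\n",
"  ',Series A2-14.,Title of the table,,',\n",
"  ',,,,',\n",
"  ',Year,Canada,,',\n",
"  ',1976,\"22,992,604\",,'\n",
")\n",
"tmp <- tempfile(fileext = '.csv')\n",
"writeLines(sample_text, tmp)\n",
"\n",
"# Look at the first two raw lines before trying to parse anything\n",
"raw_lines <- readLines(tmp, n = 2)\n",
"print(raw_lines)\n",
"\n",
"# Skipping those two lines makes the third line the header\n",
"peek <- read.csv(tmp, skip = 2)\n",
"print(names(peek))\n",
"```\n",
"\n",
"With the real data, the same calls should work with the URL in place of `tmp`."
]
},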
{
"cell_type": "markdown",
"id": "c40d15fd",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"This happens often: the first few lines are there to describe the data; they lay out a simple version of the so-called *metadata*.\n",
"\n",
"- The first line here does this. It is easy to deal with: the function `read.csv` takes the optional argument `skip=`, which indicates how many lines to skip at the beginning.\n",
"- The second line is empty, so let us skip it too."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "241a052c",
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "slide"
},
"vscode": {
"languageId": "r"
}
},
"outputs": [
{
"data": {
"text/latex": [
"A data.frame: 6 × 23\n",
"\\begin{tabular}{r|lllllllllllllllllllll}\n",
" & X & Year & Canada & X.1 & Newfound. & Prince & X.2 & Nova & New & Quebec & ⋯ & Saskat. & X.4 & Alberta & X.5 & British & X.6 & Yukon & Northwest & X.7 & X.8\\\\\n",
" & & & & & & & & & & & ⋯ & & & & & & & & & & \\\\\n",
"\\hline\n",
"\t1 & NA & & & NA & land & Edward & NA & Scotia & Brunswick & & ⋯ & chewan & NA & & NA & Columbia & NA & Territory & Territories & NA & NA\\\\\n",
"\t2 & NA & & & NA & & Island & NA & & & & ⋯ & & NA & & NA & & NA & & & NA & NA\\\\\n",
"\t3 & NA & & 2 & NA & 3 & 4 & NA & 5 & 6 & 7 & ⋯ & 10 & NA & 11 & NA & 12 & NA & 13 & 14 & NA & NA\\\\\n",
"\t4 & NA & & & NA & & & NA & & & & ⋯ & & NA & & NA & & NA & & & NA & NA\\\\\n",
"\t5 & NA & 1976 & 22,992,604 & NA & 557,725 & 118,229 & NA & 828,571 & 677,250 & 6,234,445 & ⋯ & 921,323 & NA & 1,838,037 & NA & 2,466,608 & NA & 21,836 & 42,609 & NA & NA\\\\\n",
"\t6 & NA & & & NA & & & NA & & & & ⋯ & & NA & & NA & & NA & & & NA & NA\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"A data.frame: 6 × 23\n",
"\n",
"| | X <int> | Year <chr> | Canada <chr> | X.1 <int> | Newfound. <chr> | Prince <chr> | X.2 <int> | Nova <chr> | New <chr> | Quebec <chr> | ⋯ ⋯ | Saskat. <chr> | X.4 <int> | Alberta <chr> | X.5 <int> | British <chr> | X.6 <int> | Yukon <chr> | Northwest <chr> | X.7 <int> | X.8 <lgl> |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 1 | NA | | | NA | land | Edward | NA | Scotia | Brunswick | | ⋯ | chewan | NA | | NA | Columbia | NA | Territory | Territories | NA | NA |\n",
"| 2 | NA | | | NA | | Island | NA | | | | ⋯ | | NA | | NA | | NA | | | NA | NA |\n",
"| 3 | NA | | 2 | NA | 3 | 4 | NA | 5 | 6 | 7 | ⋯ | 10 | NA | 11 | NA | 12 | NA | 13 | 14 | NA | NA |\n",
"| 4 | NA | | | NA | | | NA | | | | ⋯ | | NA | | NA | | NA | | | NA | NA |\n",
"| 5 | NA | 1976 | 22,992,604 | NA | 557,725 | 118,229 | NA | 828,571 | 677,250 | 6,234,445 | ⋯ | 921,323 | NA | 1,838,037 | NA | 2,466,608 | NA | 21,836 | 42,609 | NA | NA |\n",
"| 6 | NA | | | NA | | | NA | | | | ⋯ | | NA | | NA | | NA | | | NA | NA |\n",
"\n"
],
"text/plain": [
" X Year Canada X.1 Newfound. Prince X.2 Nova New Quebec ⋯\n",
"1 NA NA land Edward NA Scotia Brunswick ⋯\n",
"2 NA NA Island NA ⋯\n",
"3 NA 2 NA 3 4 NA 5 6 7 ⋯\n",
"4 NA NA NA ⋯\n",
"5 NA 1976 22,992,604 NA 557,725 118,229 NA 828,571 677,250 6,234,445 ⋯\n",
"6 NA NA NA ⋯\n",
" Saskat. X.4 Alberta X.5 British X.6 Yukon Northwest X.7 X.8\n",
"1 chewan NA NA Columbia NA Territory Territories NA NA \n",
"2 NA NA NA NA NA \n",
"3 10 NA 11 NA 12 NA 13 14 NA NA \n",
"4 NA NA NA NA NA \n",
"5 921,323 NA 1,838,037 NA 2,466,608 NA 21,836 42,609 NA NA \n",
"6 NA NA NA NA NA "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"data_old = read.csv(\"https://www150.statcan.gc.ca/n1/en/pub/11-516-x/sectiona/A2_14-eng.csv?st=L7vSnqio\",\n",
" skip = 2)\n",
"head(data_old)"
]
},
{
"cell_type": "markdown",
"id": "02153393",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Here, there is the further issue that, to keep things legible, the table authors spread long names over up to 3 lines (*e.g.*, Prince Edward Island is written over the header line and the first 2 rows). Note, however, that `read.csv` has rightly used the first line it read as the column names."
]
},
{
"cell_type": "markdown",
"id": "bda07fcf",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Because we are only interested in the total population of the country and the year, let us simply get rid of the first 4 rows and of all columns except the second (**Year**) and third (**Canada**)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "34a4f9c6",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"vscode": {
"languageId": "r"
}
},
"outputs": [
{
"data": {
"text/latex": [
"A data.frame: 25 × 2\n",
"\\begin{tabular}{r|ll}\n",
" & Year & Canada\\\\\n",
" & & \\\\\n",
"\\hline\n",
"\t5 & 1976 & 22,992,604\\\\\n",
"\t6 & & \\\\\n",
"\t7 & 1971 & 21,568,311\\\\\n",
"\t8 & 1966 & 20,014,880\\\\\n",
"\t9 & 1961 & 18,238,247\\\\\n",
"\t10 & 1956 & 16,080,791\\\\\n",
"\t11 & 1951 & 14,009,429\\\\\n",
"\t12 & & \\\\\n",
"\t13 & 1941 & 11,506,655\\\\\n",
"\t14 & 1931 & 10,376,786\\\\\n",
"\t15 & 1921 & 8,787,949 \\\\\n",
"\t16 & 1911 & 7,206,643 \\\\\n",
"\t17 & 1901 & 5,371,315 \\\\\n",
"\t18 & & \\\\\n",
"\t19 & 1891 & 4,833,239 \\\\\n",
"\t20 & 1881 & 4,324,810 \\\\\n",
"\t21 & 1871 & 3,689,257 \\\\\n",
"\t22 & 1861 & 3,229,633 \\\\\n",
"\t23 & 1851 & 2,436,297 \\\\\n",
"\t24 & & \\\\\n",
"\t25 & Includes 485 members of the Royal Canadian Navy whose province of residence is not known. & \\\\\n",
"\t26 & Included with Northwest Territories. & \\\\\n",
"\t27 & For the discussion of the ambiguities and under-enumeration contained in these figures consult the notes to series A2-14 in original volume. For completeness of enumeration in censuses of 1961 and later years, & \\\\\n",
"\t28 & see notes to series A15-53. & \\\\\n",
"\t29 & 1848 figure. & \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"A data.frame: 25 × 2\n",
"\n",
"| | Year <chr> | Canada <chr> |\n",
"|---|---|---|\n",
"| 5 | 1976 | 22,992,604 |\n",
"| 6 | | |\n",
"| 7 | 1971 | 21,568,311 |\n",
"| 8 | 1966 | 20,014,880 |\n",
"| 9 | 1961 | 18,238,247 |\n",
"| 10 | 1956 | 16,080,791 |\n",
"| 11 | 1951 | 14,009,429 |\n",
"| 12 | | |\n",
"| 13 | 1941 | 11,506,655 |\n",
"| 14 | 1931 | 10,376,786 |\n",
"| 15 | 1921 | 8,787,949 |\n",
"| 16 | 1911 | 7,206,643 |\n",
"| 17 | 1901 | 5,371,315 |\n",
"| 18 | | |\n",
"| 19 | 1891 | 4,833,239 |\n",
"| 20 | 1881 | 4,324,810 |\n",
"| 21 | 1871 | 3,689,257 |\n",
"| 22 | 1861 | 3,229,633 |\n",
"| 23 | 1851 | 2,436,297 |\n",
"| 24 | | |\n",
"| 25 | Includes 485 members of the Royal Canadian Navy whose province of residence is not known. | |\n",
"| 26 | Included with Northwest Territories. | |\n",
"| 27 | For the discussion of the ambiguities and under-enumeration contained in these figures consult the notes to series A2-14 in original volume. For completeness of enumeration in censuses of 1961 and later years, | |\n",
"| 28 | see notes to series A15-53. | |\n",
"| 29 | 1848 figure. | |\n",
"\n"
],
"text/plain": [
" Year \n",
"5 1976 \n",
"6 \n",
"7 1971 \n",
"8 1966 \n",
"9 1961 \n",
"10 1956 \n",
"11 1951 \n",
"12 \n",
"13 1941 \n",
"14 1931 \n",
"15 1921 \n",
"16 1911 \n",
"17 1901 \n",
"18 \n",
"19 1891 \n",
"20 1881 \n",
"21 1871 \n",
"22 1861 \n",
"23 1851 \n",
"24 \n",
"25 Includes 485 members of the Royal Canadian Navy whose province of residence is not known. \n",
"26 Included with Northwest Territories. \n",
"27 For the discussion of the ambiguities and under-enumeration contained in these figures consult the notes to series A2-14 in original volume. For completeness of enumeration in censuses of 1961 and later years,\n",
"28 see notes to series A15-53. \n",
"29 1848 figure. \n",
" Canada \n",
"5 22,992,604\n",
"6 \n",
"7 21,568,311\n",
"8 20,014,880\n",
"9 18,238,247\n",
"10 16,080,791\n",
"11 14,009,429\n",
"12 \n",
"13 11,506,655\n",
"14 10,376,786\n",
"15 8,787,949 \n",
"16 7,206,643 \n",
"17 5,371,315 \n",
"18 \n",
"19 4,833,239 \n",
"20 4,324,810 \n",
"21 3,689,257 \n",
"22 3,229,633 \n",
"23 2,436,297 \n",
"24 \n",
"25 \n",
"26 \n",
"27 \n",
"28 \n",
"29 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"data_old = data_old[5:nrow(data_old), 2:3]\n",
"data_old"
]
},
{
"cell_type": "markdown",
"id": "3c2c7b61",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Still not perfect: \n",
"1. there are some empty rows;\n",
"2. the last few rows need to be removed too, they contain remarks about the data;\n",
"3. the population counts contain commas;\n",
"4. it would be better if years were increasing.\n",
"\n",
"Let us fix these issues."
]
},
{
"cell_type": "markdown",
"id": "03115ef9",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Issues 1 and 2 are easy to handle: note that the **Canada** column is empty in all the offending rows. Note as well that below **Canada** (and **Year**, for that matter), it is written **< chr >**. This means that entries in the column are of type `character`. Looking for empty content therefore means looking for empty character strings."
]
},
{
"cell_type": "markdown",
"id": "b5baf3ec",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"- So to fix 1 and 2, we keep the rows where **Canada** does not equal the empty string.\n",
"\n",
"- To get rid of commas, we just need to substitute an empty string for \",\".\n",
"  \n",
"- To sort, we find the order for the years and apply it to the entire table.\n",
"  \n",
"- Finally, as remarked above, for now, both the year and the population are stored as character strings. This means that in order to plot anything, we will have to indicate that these are numbers, not characters."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "96a251be",
"metadata": {
"scrolled": false,
"slideshow": {
"slide_type": "slide"
},
"vscode": {
"languageId": "r"
}
},
"outputs": [
{
"data": {
"text/latex": [
"A data.frame: 16 × 2\n",
"\\begin{tabular}{r|ll}\n",
" & Year & Canada\\\\\n",
" & & \\\\\n",
"\\hline\n",
"\t23 & 1851 & 2436297\\\\\n",
"\t22 & 1861 & 3229633\\\\\n",
"\t21 & 1871 & 3689257\\\\\n",
"\t20 & 1881 & 4324810\\\\\n",
"\t19 & 1891 & 4833239\\\\\n",
"\t17 & 1901 & 5371315\\\\\n",
"\t16 & 1911 & 7206643\\\\\n",
"\t15 & 1921 & 8787949\\\\\n",
"\t14 & 1931 & 10376786\\\\\n",
"\t13 & 1941 & 11506655\\\\\n",
"\t11 & 1951 & 14009429\\\\\n",
"\t10 & 1956 & 16080791\\\\\n",
"\t9 & 1961 & 18238247\\\\\n",
"\t8 & 1966 & 20014880\\\\\n",
"\t7 & 1971 & 21568311\\\\\n",
"\t5 & 1976 & 22992604\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"A data.frame: 16 × 2\n",
"\n",
"| | Year <dbl> | Canada <dbl> |\n",
"|---|---|---|\n",
"| 23 | 1851 | 2436297 |\n",
"| 22 | 1861 | 3229633 |\n",
"| 21 | 1871 | 3689257 |\n",
"| 20 | 1881 | 4324810 |\n",
"| 19 | 1891 | 4833239 |\n",
"| 17 | 1901 | 5371315 |\n",
"| 16 | 1911 | 7206643 |\n",
"| 15 | 1921 | 8787949 |\n",
"| 14 | 1931 | 10376786 |\n",
"| 13 | 1941 | 11506655 |\n",
"| 11 | 1951 | 14009429 |\n",
"| 10 | 1956 | 16080791 |\n",
"| 9 | 1961 | 18238247 |\n",
"| 8 | 1966 | 20014880 |\n",
"| 7 | 1971 | 21568311 |\n",
"| 5 | 1976 | 22992604 |\n",
"\n"
],
"text/plain": [
" Year Canada \n",
"23 1851 2436297\n",
"22 1861 3229633\n",
"21 1871 3689257\n",
"20 1881 4324810\n",
"19 1891 4833239\n",
"17 1901 5371315\n",
"16 1911 7206643\n",
"15 1921 8787949\n",
"14 1931 10376786\n",
"13 1941 11506655\n",
"11 1951 14009429\n",
"10 1956 16080791\n",
"9 1961 18238247\n",
"8 1966 20014880\n",
"7 1971 21568311\n",
"5 1976 22992604"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
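"# Keep only the rows where the Canada column is not empty\n",
"data_old = data_old[which(data_old$Canada != \"\"),]\n",
"# Remove the thousands separators\n",
"data_old$Canada = gsub(\",\", \"\", data_old$Canada)\n",
"# Sort by increasing year\n",
"order_data = order(data_old$Year)\n",
"data_old = data_old[order_data,]\n",
"# Convert both columns from character to numeric\n",
"data_old$Year = as.numeric(data_old$Year)\n",
"data_old$Canada = as.numeric(data_old$Canada)\n",
"data_old"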
]
},
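{
"cell_type": "markdown",
"id": "f2d48a90",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"A word of caution about sorting: `order` sorts character strings alphabetically and numbers numerically. The two orderings agree for four-digit years, but not in general. The toy vector below (made-up values, with hypothetical names `years_chr`, `ord_chr` and `ord_num`) sketches the difference.\n",
"\n",
"```r\n",
"# Made-up values stored as character strings\n",
"years_chr <- c('1851', '999', '1976')\n",
"\n",
"# Alphabetical order of the strings: 1851, 1976, 999\n",
"ord_chr <- order(years_chr)\n",
"print(years_chr[ord_chr])\n",
"\n",
"# Numerical order after conversion: 999, 1851, 1976\n",
"ord_num <- order(as.numeric(years_chr))\n",
"print(years_chr[ord_num])\n",
"```\n",
"\n",
"Converting to numbers before sorting avoids relying on all values having the same number of digits."
]
},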
{
"cell_type": "markdown",
"id": "ac38ad95",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"The row numbers are left over from the original table and no longer run from 1 to 16, so let us renumber them."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "f758e698",
"metadata": {
"vscode": {
"languageId": "r"
}
},
"outputs": [
{
"data": {
"text/latex": [
"A data.frame: 16 × 2\n",
"\\begin{tabular}{r|ll}\n",
" & Year & Canada\\\\\n",
" & & \\\\\n",
"\\hline\n",
"\t1 & 1851 & 2436297\\\\\n",
"\t2 & 1861 & 3229633\\\\\n",
"\t3 & 1871 & 3689257\\\\\n",
"\t4 & 1881 & 4324810\\\\\n",
"\t5 & 1891 & 4833239\\\\\n",
"\t6 & 1901 & 5371315\\\\\n",
"\t7 & 1911 & 7206643\\\\\n",
"\t8 & 1921 & 8787949\\\\\n",
"\t9 & 1931 & 10376786\\\\\n",
"\t10 & 1941 & 11506655\\\\\n",
"\t11 & 1951 & 14009429\\\\\n",
"\t12 & 1956 & 16080791\\\\\n",
"\t13 & 1961 & 18238247\\\\\n",
"\t14 & 1966 & 20014880\\\\\n",
"\t15 & 1971 & 21568311\\\\\n",
"\t16 & 1976 & 22992604\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"A data.frame: 16 × 2\n",
"\n",
"| | Year <dbl> | Canada <dbl> |\n",
"|---|---|---|\n",
"| 1 | 1851 | 2436297 |\n",
"| 2 | 1861 | 3229633 |\n",
"| 3 | 1871 | 3689257 |\n",
"| 4 | 1881 | 4324810 |\n",
"| 5 | 1891 | 4833239 |\n",
"| 6 | 1901 | 5371315 |\n",
"| 7 | 1911 | 7206643 |\n",
"| 8 | 1921 | 8787949 |\n",
"| 9 | 1931 | 10376786 |\n",
"| 10 | 1941 | 11506655 |\n",
"| 11 | 1951 | 14009429 |\n",
"| 12 | 1956 | 16080791 |\n",
"| 13 | 1961 | 18238247 |\n",
"| 14 | 1966 | 20014880 |\n",
"| 15 | 1971 | 21568311 |\n",
"| 16 | 1976 | 22992604 |\n",
"\n"
],
"text/plain": [
" Year Canada \n",
"1 1851 2436297\n",
"2 1861 3229633\n",
"3 1871 3689257\n",
"4 1881 4324810\n",
"5 1891 4833239\n",
"6 1901 5371315\n",
"7 1911 7206643\n",
"8 1921 8787949\n",
"9 1931 10376786\n",
"10 1941 11506655\n",
"11 1951 14009429\n",
"12 1956 16080791\n",
"13 1961 18238247\n",
"14 1966 20014880\n",
"15 1971 21568311\n",
"16 1976 22992604"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"row.names(data_old) = 1:nrow(data_old)\n",
"data_old"
]
},
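{
"cell_type": "markdown",
"id": "c91b06e4",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"An equivalent idiom is to assign `NULL` to the row names, which makes `R` fall back to the default numbering 1, 2, 3, and so on. The sketch below uses a small made-up data frame (`toy` is a hypothetical name and its values are not census data).\n",
"\n",
"```r\n",
"# Small made-up data frame\n",
"toy <- data.frame(Year = c(1851, 1861, 1871), Canada = c(10, 20, 30))\n",
"\n",
"# Reordering the rows keeps the old row names: 3, 1, 2\n",
"toy <- toy[c(3, 1, 2), ]\n",
"print(rownames(toy))\n",
"\n",
"# Assigning NULL resets the row names to 1, 2, 3\n",
"rownames(toy) <- NULL\n",
"print(toy)\n",
"```\n",
"\n",
"This form does not need the number of rows, so it keeps working if rows are added or removed later."
]
},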
{
"cell_type": "markdown",
"id": "b27f3803",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Well, that looks about right! Let's see what this looks like in a graph."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "28fc0266",
"metadata": {
"scrolled": false,
"slideshow": {
"slide_type": "subslide"
},
"vscode": {
"languageId": "r"
}
},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"plot without title"
]
},
"metadata": {
"image/png": {
"height": 420,
"width": 420
}
},
"output_type": "display_data"
}
],
"source": [
"plot(data_old$Year, data_old$Canada,\n",
" type = \"b\", lwd = 2,\n",
" xlab = \"Year\", ylab = \"Population\")"
]
},
{
"cell_type": "markdown",
"id": "3a1dc6ac",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"But wait, this only goes up to 1976! Looking around, we find another table [here](https://www12.statcan.gc.ca/census-recensement/2011/dp-pd/vc-rv/index.cfm?LANG=ENG&VIEW=D&TOPIC_ID=1&GEOCODE=01&CFORMAT=html). There is a csv download link there; let us see where it leads. The table is 720 KB, so surely it contains more than just the population. To get a sense of that, we dump the whole data.frame, not just its head."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "462e8627",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"vscode": {
"languageId": "r"
}
},
"outputs": [
{
"data": {
"text/latex": [
"A data.frame: 9960 × 5\n",
"\\begin{tabular}{lllll}\n",
" GEOGRAPHY.NAME & CHARACTERISTIC & YEAR.S. & TOTAL & FLAG\\_TOTAL\\\\\n",
" & & & & \\\\\n",
"\\hline\n",
"\t Canada & Population (in thousands) & 1956 & 16081.0 & \\\\\n",
"\t Canada & Population (in thousands) & 1961 & 18238.0 & \\\\\n",
"\t Canada & Population (in thousands) & 1966 & 20015.0 & \\\\\n",
"\t Canada & Population (in thousands) & 1971 & 21568.0 & \\\\\n",
"\t Canada & Population (in thousands) & 1976 & 22993.0 & \\\\\n",
"\t Canada & Population (in thousands) & 1981 & 24343.0 & \\\\\n",
"\t Canada & Population (in thousands) & 1986 & 25309.0 & \\\\\n",
"\t Canada & Population (in thousands) & 1991 & 27297.0 & \\\\\n",
"\t Canada & Population (in thousands) & 1996 & 28847.0 & \\\\\n",
"\t Canada & Population (in thousands) & 2001 & 30007.0 & \\\\\n",
"\t Canada & Population (in thousands) & 2006 & 31613.0 & \\\\\n",
"\t Canada & Population (in thousands) & 2011 & 33477.0 & \\\\\n",
"\t Canada & Population - \\% change & 1956 to 1961 & 13.4 & \\\\\n",
"\t Canada & Population - \\% change & 1961 to 1966 & 9.7 & \\\\\n",
"\t Canada & Population - \\% change & 1966 to 1971 & 7.8 & \\\\\n",
"\t Canada & Population - \\% change & 1971 to 1976 & 6.6 & \\\\\n",
"\t Canada & Population - \\% change & 1976 to 1981 & 5.9 & \\\\\n",
"\t Canada & Population - \\% change & 1981 to 1986 & 4.0 & \\\\\n",
"\t Canada & Population - \\% change & 1986 to 1991 & 7.9 & \\\\\n",
"\t Canada & Population - \\% change & 1991 to 1996 & 5.7 & \\\\\n",
"\t Canada & Population - \\% change & 1996 to 2001 & 4.0 & \\\\\n",
"\t Canada & Population - \\% change & 2001 to 2006 & 5.4 & \\\\\n",
"\t Canada & Population - \\% change & 2006 to 2011 & 5.9 & \\\\\n",
"\t Canada & Total private dwellings occupied by usual residents & 1996 & 10820050.0 & \\\\\n",
"\t Canada & Total private dwellings occupied by usual residents & 2001 & 11562975.0 & \\\\\n",
"\t Canada & Total private dwellings occupied by usual residents & 2006 & 12435520.0 & \\\\\n",
"\t Canada & Total private dwellings occupied by usual residents & 2011 & 13320614.0 & \\\\\n",
"\t Canada & \\% of the population aged 0 to 14 years & 1921 & 34.4 & \\\\\n",
"\t Canada & \\% of the population aged 0 to 14 years & 1931 & 31.6 & \\\\\n",
"\t Canada & \\% of the population aged 0 to 14 years & 1941 & 27.8 & \\\\\n",
"\t ⋮ & ⋮ & ⋮ & ⋮ & ⋮\\\\\n",
"\t Quebec & Population share & 2001 & 24.1 & \\\\\n",
"\t Quebec & Population share & 2011 & 23.6 & \\\\\n",
"\t Ontario & Population share & 1951 & 32.8 & \\\\\n",
"\t Ontario & Population share & 1961 & 34.2 & \\\\\n",
"\t Ontario & Population share & 1971 & 35.7 & \\\\\n",
"\t Ontario & Population share & 1981 & 35.4 & \\\\\n",
"\t Ontario & Population share & 1991 & 36.9 & \\\\\n",
"\t Ontario & Population share & 2001 & 38.0 & \\\\\n",
"\t Ontario & Population share & 2011 & 38.4 & \\\\\n",
"\t Prairie Provinces & Population share & 1951 & 18.2 & \\\\\n",
"\t Prairie Provinces & Population share & 1961 & 17.4 & \\\\\n",
"\t Prairie Provinces & Population share & 1971 & 16.4 & \\\\\n",
"\t Prairie Provinces & Population share & 1981 & 17.4 & \\\\\n",
"\t Prairie Provinces & Population share & 1991 & 17.0 & \\\\\n",
"\t Prairie Provinces & Population share & 2001 & 16.9 & \\\\\n",
"\t Prairie Provinces & Population share & 2011 & 17.6 & \\\\\n",
"\t British Columbia & Population share & 1951 & 8.3 & \\\\\n",
"\t British Columbia & Population share & 1961 & 8.9 & \\\\\n",
"\t British Columbia & Population share & 1971 & 10.1 & \\\\\n",
"\t British Columbia & Population share & 1981 & 11.3 & \\\\\n",
"\t British Columbia & Population share & 1991 & 12.0 & \\\\\n",
"\t British Columbia & Population share & 2001 & 13.0 & \\\\\n",
"\t British Columbia & Population share & 2011 & 13.1 & \\\\\n",
"\t Territories & Population share & 1951 & 0.2 & \\\\\n",
"\t Territories & Population share & 1961 & 0.2 & \\\\\n",
"\t Territories & Population share & 1971 & 0.2 & \\\\\n",
"\t Territories & Population share & 1981 & 0.3 & \\\\\n",
"\t Territories & Population share & 1991 & 0.3 & \\\\\n",
"\t Territories & Population share & 2001 & 0.3 & \\\\\n",
"\t Territories & Population share & 2011 & 0.3 & \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"A data.frame: 9960 × 5\n",
"\n",
"| GEOGRAPHY.NAME <chr> | CHARACTERISTIC <chr> | YEAR.S. <chr> | TOTAL <dbl> | FLAG_TOTAL <chr> |\n",
"|---|---|---|---|---|\n",
"| Canada | Population (in thousands) | 1956 | 16081.0 | |\n",
"| Canada | Population (in thousands) | 1961 | 18238.0 | |\n",
"| Canada | Population (in thousands) | 1966 | 20015.0 | |\n",
"| Canada | Population (in thousands) | 1971 | 21568.0 | |\n",
"| Canada | Population (in thousands) | 1976 | 22993.0 | |\n",
"| Canada | Population (in thousands) | 1981 | 24343.0 | |\n",
"| Canada | Population (in thousands) | 1986 | 25309.0 | |\n",
"| Canada | Population (in thousands) | 1991 | 27297.0 | |\n",
"| Canada | Population (in thousands) | 1996 | 28847.0 | |\n",
"| Canada | Population (in thousands) | 2001 | 30007.0 | |\n",
"| Canada | Population (in thousands) | 2006 | 31613.0 | |\n",
"| Canada | Population (in thousands) | 2011 | 33477.0 | |\n",
"| Canada | Population - % change | 1956 to 1961 | 13.4 | |\n",
"| Canada | Population - % change | 1961 to 1966 | 9.7 | |\n",
"| Canada | Population - % change | 1966 to 1971 | 7.8 | |\n",
"| Canada | Population - % change | 1971 to 1976 | 6.6 | |\n",
"| Canada | Population - % change | 1976 to 1981 | 5.9 | |\n",
"| Canada | Population - % change | 1981 to 1986 | 4.0 | |\n",
"| Canada | Population - % change | 1986 to 1991 | 7.9 | |\n",
"| Canada | Population - % change | 1991 to 1996 | 5.7 | |\n",
"| Canada | Population - % change | 1996 to 2001 | 4.0 | |\n",
"| Canada | Population - % change | 2001 to 2006 | 5.4 | |\n",
"| Canada | Population - % change | 2006 to 2011 | 5.9 | |\n",
"| Canada | Total private dwellings occupied by usual residents | 1996 | 10820050.0 | |\n",
"| Canada | Total private dwellings occupied by usual residents | 2001 | 11562975.0 | |\n",
"| Canada | Total private dwellings occupied by usual residents | 2006 | 12435520.0 | |\n",
"| Canada | Total private dwellings occupied by usual residents | 2011 | 13320614.0 | |\n",
"| Canada | % of the population aged 0 to 14 years | 1921 | 34.4 | |\n",
"| Canada | % of the population aged 0 to 14 years | 1931 | 31.6 | |\n",
"| Canada | % of the population aged 0 to 14 years | 1941 | 27.8 | |\n",
"| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |\n",
"| Quebec | Population share | 2001 | 24.1 | |\n",
"| Quebec | Population share | 2011 | 23.6 | |\n",
"| Ontario | Population share | 1951 | 32.8 | |\n",
"| Ontario | Population share | 1961 | 34.2 | |\n",
"| Ontario | Population share | 1971 | 35.7 | |\n",
"| Ontario | Population share | 1981 | 35.4 | |\n",
"| Ontario | Population share | 1991 | 36.9 | |\n",
"| Ontario | Population share | 2001 | 38.0 | |\n",
"| Ontario | Population share | 2011 | 38.4 | |\n",
"| Prairie Provinces | Population share | 1951 | 18.2 | |\n",
"| Prairie Provinces | Population share | 1961 | 17.4 | |\n",
"| Prairie Provinces | Population share | 1971 | 16.4 | |\n",
"| Prairie Provinces | Population share | 1981 | 17.4 | |\n",
"| Prairie Provinces | Population share | 1991 | 17.0 | |\n",
"| Prairie Provinces | Population share | 2001 | 16.9 | |\n",
"| Prairie Provinces | Population share | 2011 | 17.6 | |\n",
"| British Columbia | Population share | 1951 | 8.3 | |\n",
"| British Columbia | Population share | 1961 | 8.9 | |\n",
"| British Columbia | Population share | 1971 | 10.1 | |\n",
"| British Columbia | Population share | 1981 | 11.3 | |\n",
"| British Columbia | Population share | 1991 | 12.0 | |\n",
"| British Columbia | Population share | 2001 | 13.0 | |\n",
"| British Columbia | Population share | 2011 | 13.1 | |\n",
"| Territories | Population share | 1951 | 0.2 | |\n",
"| Territories | Population share | 1961 | 0.2 | |\n",
"| Territories | Population share | 1971 | 0.2 | |\n",
"| Territories | Population share | 1981 | 0.3 | |\n",
"| Territories | Population share | 1991 | 0.3 | |\n",
"| Territories | Population share | 2001 | 0.3 | |\n",
"| Territories | Population share | 2011 | 0.3 | |\n",
"\n"
],
"text/plain": [
" GEOGRAPHY.NAME CHARACTERISTIC \n",
"1 Canada Population (in thousands) \n",
"2 Canada Population (in thousands) \n",
"3 Canada Population (in thousands) \n",
"4 Canada Population (in thousands) \n",
"5 Canada Population (in thousands) \n",
"6 Canada Population (in thousands) \n",
"7 Canada Population (in thousands) \n",
"8 Canada Population (in thousands) \n",
"9 Canada Population (in thousands) \n",
"10 Canada Population (in thousands) \n",
"11 Canada Population (in thousands) \n",
"12 Canada Population (in thousands) \n",
"13 Canada Population - % change \n",
"14 Canada Population - % change \n",
"15 Canada Population - % change \n",
"16 Canada Population - % change \n",
"17 Canada Population - % change \n",
"18 Canada Population - % change \n",
"19 Canada Population - % change \n",
"20 Canada Population - % change \n",
"21 Canada Population - % change \n",
"22 Canada Population - % change \n",
"23 Canada Population - % change \n",
"24 Canada Total private dwellings occupied by usual residents\n",
"25 Canada Total private dwellings occupied by usual residents\n",
"26 Canada Total private dwellings occupied by usual residents\n",
"27 Canada Total private dwellings occupied by usual residents\n",
"28 Canada % of the population aged 0 to 14 years \n",
"29 Canada % of the population aged 0 to 14 years \n",
"30 Canada % of the population aged 0 to 14 years \n",
"⋮ ⋮ ⋮ \n",
"9931 Quebec Population share \n",
"9932 Quebec Population share \n",
"9933 Ontario Population share \n",
"9934 Ontario Population share \n",
"9935 Ontario Population share \n",
"9936 Ontario Population share \n",
"9937 Ontario Population share \n",
"9938 Ontario Population share \n",
"9939 Ontario Population share \n",
"9940 Prairie Provinces Population share \n",
"9941 Prairie Provinces Population share \n",
"9942 Prairie Provinces Population share \n",
"9943 Prairie Provinces Population share \n",
"9944 Prairie Provinces Population share \n",
"9945 Prairie Provinces Population share \n",
"9946 Prairie Provinces Population share \n",
"9947 British Columbia Population share \n",
"9948 British Columbia Population share \n",
"9949 British Columbia Population share \n",
"9950 British Columbia Population share \n",
"9951 British Columbia Population share \n",
"9952 British Columbia Population share \n",
"9953 British Columbia Population share \n",
"9954 Territories Population share \n",
"9955 Territories Population share \n",
"9956 Territories Population share \n",
"9957 Territories Population share \n",
"9958 Territories Population share \n",
"9959 Territories Population share \n",
"9960 Territories Population share \n",
" YEAR.S. TOTAL FLAG_TOTAL\n",
"1 1956 16081.0 \n",
"2 1961 18238.0 \n",
"3 1966 20015.0 \n",
"4 1971 21568.0 \n",
"5 1976 22993.0 \n",
"6 1981 24343.0 \n",
"7 1986 25309.0 \n",
"8 1991 27297.0 \n",
"9 1996 28847.0 \n",
"10 2001 30007.0 \n",
"11 2006 31613.0 \n",
"12 2011 33477.0 \n",
"13 1956 to 1961 13.4 \n",
"14 1961 to 1966 9.7 \n",
"15 1966 to 1971 7.8 \n",
"16 1971 to 1976 6.6 \n",
"17 1976 to 1981 5.9 \n",
"18 1981 to 1986 4.0 \n",
"19 1986 to 1991 7.9 \n",
"20 1991 to 1996 5.7 \n",
"21 1996 to 2001 4.0 \n",
"22 2001 to 2006 5.4 \n",
"23 2006 to 2011 5.9 \n",
"24 1996 10820050.0 \n",
"25 2001 11562975.0 \n",
"26 2006 12435520.0 \n",
"27 2011 13320614.0 \n",
"28 1921 34.4 \n",
"29 1931 31.6 \n",
"30 1941 27.8 \n",
"⋮ ⋮ ⋮ ⋮ \n",
"9931 2001 24.1 \n",
"9932 2011 23.6 \n",
"9933 1951 32.8 \n",
"9934 1961 34.2 \n",
"9935 1971 35.7 \n",
"9936 1981 35.4 \n",
"9937 1991 36.9 \n",
"9938 2001 38.0 \n",
"9939 2011 38.4 \n",
"9940 1951 18.2 \n",
"9941 1961 17.4 \n",
"9942 1971 16.4 \n",
"9943 1981 17.4 \n",
"9944 1991 17.0 \n",
"9945 2001 16.9 \n",
"9946 2011 17.6 \n",
"9947 1951 8.3 \n",
"9948 1961 8.9 \n",
"9949 1971 10.1 \n",
"9950 1981 11.3 \n",
"9951 1991 12.0 \n",
"9952 2001 13.0 \n",
"9953 2011 13.1 \n",
"9954 1951 0.2 \n",
"9955 1961 0.2 \n",
"9956 1971 0.2 \n",
"9957 1981 0.3 \n",
"9958 1991 0.3 \n",
"9959 2001 0.3 \n",
"9960 2011 0.3 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"data_new = read.csv(\"https://www12.statcan.gc.ca/census-recensement/2011/dp-pd/vc-rv/download-telecharger/download-telecharger.cfm?Lang=eng&CTLG=98-315-XWE2011001&FMT=csv\")\n",
"data_new"
]
},
{
"cell_type": "markdown",
"id": "89002f5c",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Haha, this looks quite nice but has way more information than we need (we just want the population of Canada). Also, the population of Canada is expressed in thousands, so once we selected what we want, we will need to multiply by 1,000.\n",
"\n",
"There are many ways to select rows. Let us proceed as follows: we want the rows where the geography is \"Canada\" and the characteristic is \"Population (in thousands)\". Let us find those indices of rows that satisfy the first criterion, those that satisfy the second; if we then intersect these two sets of indices, we will have selected the rows we want."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "f116fdc1",
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "subslide"
},
"vscode": {
"languageId": "r"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
\n"
],
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 1\n",
"\\item 2\n",
"\\item 3\n",
"\\item 4\n",
"\\item 5\n",
"\\item 6\n",
"\\item 7\n",
"\\item 8\n",
"\\item 9\n",
"\\item 10\n",
"\\item 11\n",
"\\item 12\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 1\n",
"2. 2\n",
"3. 3\n",
"4. 4\n",
"5. 5\n",
"6. 6\n",
"7. 7\n",
"8. 8\n",
"9. 9\n",
"10. 10\n",
"11. 11\n",
"12. 12\n",
"\n",
"\n"
],
"text/plain": [
" [1] 1 2 3 4 5 6 7 8 9 10 11 12"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"idx_CAN = which(data_new$GEOGRAPHY.NAME == \"Canada\")\n",
"idx_char = which(data_new$CHARACTERISTIC == \"Population (in thousands)\")\n",
"idx_keep = intersect(idx_CAN, idx_char)\n",
"idx_keep"
]
},
{
"cell_type": "markdown",
"id": "09172d01",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Yes, this looks okay, so let us keep only these."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "12b87a3d",
"metadata": {
"vscode": {
"languageId": "r"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"A data.frame: 12 × 5\n",
"\n",
"\t | GEOGRAPHY.NAME | CHARACTERISTIC | YEAR.S. | TOTAL | FLAG_TOTAL |
\n",
"\t | <chr> | <chr> | <chr> | <dbl> | <chr> |
\n",
"\n",
"\n",
"\t1 | Canada | Population (in thousands) | 1956 | 16081 | |
\n",
"\t2 | Canada | Population (in thousands) | 1961 | 18238 | |
\n",
"\t3 | Canada | Population (in thousands) | 1966 | 20015 | |
\n",
"\t4 | Canada | Population (in thousands) | 1971 | 21568 | |
\n",
"\t5 | Canada | Population (in thousands) | 1976 | 22993 | |
\n",
"\t6 | Canada | Population (in thousands) | 1981 | 24343 | |
\n",
"\t7 | Canada | Population (in thousands) | 1986 | 25309 | |
\n",
"\t8 | Canada | Population (in thousands) | 1991 | 27297 | |
\n",
"\t9 | Canada | Population (in thousands) | 1996 | 28847 | |
\n",
"\t10 | Canada | Population (in thousands) | 2001 | 30007 | |
\n",
"\t11 | Canada | Population (in thousands) | 2006 | 31613 | |
\n",
"\t12 | Canada | Population (in thousands) | 2011 | 33477 | |
\n",
"\n",
"
\n"
],
"text/latex": [
"A data.frame: 12 × 5\n",
"\\begin{tabular}{r|lllll}\n",
" & GEOGRAPHY.NAME & CHARACTERISTIC & YEAR.S. & TOTAL & FLAG\\_TOTAL\\\\\n",
" & & & & & \\\\\n",
"\\hline\n",
"\t1 & Canada & Population (in thousands) & 1956 & 16081 & \\\\\n",
"\t2 & Canada & Population (in thousands) & 1961 & 18238 & \\\\\n",
"\t3 & Canada & Population (in thousands) & 1966 & 20015 & \\\\\n",
"\t4 & Canada & Population (in thousands) & 1971 & 21568 & \\\\\n",
"\t5 & Canada & Population (in thousands) & 1976 & 22993 & \\\\\n",
"\t6 & Canada & Population (in thousands) & 1981 & 24343 & \\\\\n",
"\t7 & Canada & Population (in thousands) & 1986 & 25309 & \\\\\n",
"\t8 & Canada & Population (in thousands) & 1991 & 27297 & \\\\\n",
"\t9 & Canada & Population (in thousands) & 1996 & 28847 & \\\\\n",
"\t10 & Canada & Population (in thousands) & 2001 & 30007 & \\\\\n",
"\t11 & Canada & Population (in thousands) & 2006 & 31613 & \\\\\n",
"\t12 & Canada & Population (in thousands) & 2011 & 33477 & \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"A data.frame: 12 × 5\n",
"\n",
"| | GEOGRAPHY.NAME <chr> | CHARACTERISTIC <chr> | YEAR.S. <chr> | TOTAL <dbl> | FLAG_TOTAL <chr> |\n",
"|---|---|---|---|---|---|\n",
"| 1 | Canada | Population (in thousands) | 1956 | 16081 | |\n",
"| 2 | Canada | Population (in thousands) | 1961 | 18238 | |\n",
"| 3 | Canada | Population (in thousands) | 1966 | 20015 | |\n",
"| 4 | Canada | Population (in thousands) | 1971 | 21568 | |\n",
"| 5 | Canada | Population (in thousands) | 1976 | 22993 | |\n",
"| 6 | Canada | Population (in thousands) | 1981 | 24343 | |\n",
"| 7 | Canada | Population (in thousands) | 1986 | 25309 | |\n",
"| 8 | Canada | Population (in thousands) | 1991 | 27297 | |\n",
"| 9 | Canada | Population (in thousands) | 1996 | 28847 | |\n",
"| 10 | Canada | Population (in thousands) | 2001 | 30007 | |\n",
"| 11 | Canada | Population (in thousands) | 2006 | 31613 | |\n",
"| 12 | Canada | Population (in thousands) | 2011 | 33477 | |\n",
"\n"
],
"text/plain": [
" GEOGRAPHY.NAME CHARACTERISTIC YEAR.S. TOTAL FLAG_TOTAL\n",
"1 Canada Population (in thousands) 1956 16081 \n",
"2 Canada Population (in thousands) 1961 18238 \n",
"3 Canada Population (in thousands) 1966 20015 \n",
"4 Canada Population (in thousands) 1971 21568 \n",
"5 Canada Population (in thousands) 1976 22993 \n",
"6 Canada Population (in thousands) 1981 24343 \n",
"7 Canada Population (in thousands) 1986 25309 \n",
"8 Canada Population (in thousands) 1991 27297 \n",
"9 Canada Population (in thousands) 1996 28847 \n",
"10 Canada Population (in thousands) 2001 30007 \n",
"11 Canada Population (in thousands) 2006 31613 \n",
"12 Canada Population (in thousands) 2011 33477 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"data_new = data_new[idx_keep,]\n",
"data_new"
]
},
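{
"cell_type": "markdown",
"id": "c7e4a1b2",
"metadata": {},
"source": [
"Intersecting two sets of row indices is only one of the many ways to select rows. As a minimal sketch, the cell below compares it with a single logical condition. The data frame `toy` and its values are invented for illustration and are not census data; only the column names mimic those of `data_new`.\n",
"\n",
"```r\n",
"# Toy data frame, invented for illustration only\n",
"toy = data.frame(\n",
"  GEOGRAPHY.NAME = c(\"Canada\", \"Canada\", \"Quebec\", \"Quebec\"),\n",
"  CHARACTERISTIC = c(\"Population (in thousands)\", \"Population share\",\n",
"                     \"Population (in thousands)\", \"Population share\"),\n",
"  TOTAL = c(100, 50, 20, 10)\n",
")\n",
"\n",
"# Approach used in this notebook: intersect two sets of row indices\n",
"idx_geo = which(toy$GEOGRAPHY.NAME == \"Canada\")\n",
"idx_char = which(toy$CHARACTERISTIC == \"Population (in thousands)\")\n",
"idx_keep = intersect(idx_geo, idx_char)\n",
"\n",
"# Alternative: combine both conditions into one logical vector\n",
"mask = toy$GEOGRAPHY.NAME == \"Canada\" &\n",
"  toy$CHARACTERISTIC == \"Population (in thousands)\"\n",
"\n",
"# Compare the rows selected by the two approaches\n",
"identical(which(mask), idx_keep)\n",
"toy[mask, ]\n",
"```\n",
"\n",
"One difference to keep in mind: `which` drops comparisons that evaluate to `NA`, whereas indexing directly with a logical vector that contains `NA` returns rows filled with `NA`."
]
},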
{
"cell_type": "markdown",
"id": "a30736ce",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"We want to concatenate this data.frame with the one from earlier. To do this, we need the two data frames to have the same number of columns and, actually, the same column names and entry types (notice that **YEAR.S.** in data_new is a column of characters). \n",
"\n",
"So what remains to do:"
]
},
{
"cell_type": "markdown",
"id": "649bf6e0",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"1. Rename the columns in the pruned old data (data_pruned) to **year** and **population**. Personally, I prefer lowercase column names.. and **population** is more informative than **Canada**.\n",
"2. Keep only the relevant columns in data_new, rename them accordingly and multiply population by 1,000 there.\n",
"3. Transform year in data_new to numbers.\n",
"4. We already have data up to and including 1976 in data_old, so get rid of that in data_new.\n",
"5. Append the rows of data_new to those of data_pruned."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "cc286223",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"vscode": {
"languageId": "r"
}
},
"outputs": [],
"source": [
"colnames(data_old) = c(\"year\", \"population\")\n",
"data_new = data_new[,c(\"YEAR.S.\",\"TOTAL\")]\n",
"colnames(data_new) = c(\"year\", \"population\")\n",
"data_new$year = as.numeric(data_new$year)\n",
"data_new = data_new[which(data_new$year>1976),]\n",
"data_new$population = data_new$population*1000\n",
"\n",
"data = rbind(data_old,data_new)"
]
},
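{
"cell_type": "markdown",
"id": "d2b9f530",
"metadata": {},
"source": [
"To see why the renaming and the type conversion matter before calling `rbind`, here is a minimal sketch on two small data frames. The names `old` and `new` and all values are invented for illustration; they only mimic the shape of `data_old` and `data_new`.\n",
"\n",
"```r\n",
"# Toy data frames, invented for illustration only\n",
"old = data.frame(year = c(1971, 1976), population = c(21000, 23000))\n",
"new = data.frame(YEAR.S. = c(\"1981\", \"1986\"), TOTAL = c(24, 25))\n",
"\n",
"# The column names differ, so rbind(old, new) stops with an error;\n",
"# try() lets us observe this without interrupting the notebook\n",
"res = try(rbind(old, new), silent = TRUE)\n",
"inherits(res, \"try-error\")\n",
"\n",
"# After renaming and converting, the two data frames stack cleanly\n",
"colnames(new) = c(\"year\", \"population\")\n",
"new$year = as.numeric(new$year)          # characters to numbers\n",
"new$population = new$population * 1000   # thousands to individuals\n",
"both = rbind(old, new)\n",
"str(both)\n",
"```\n",
"\n",
"Calling `str` on the result is a quick way to check that both columns ended up numeric."
]
},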
{
"cell_type": "markdown",
"id": "e3234223",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"OK, we are ready now!!"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "5b69fa3d",
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "subslide"
},
"vscode": {
"languageId": "r"
}
},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"plot without title"
]
},
"metadata": {
"image/png": {
"height": 420,
"width": 420
}
},
"output_type": "display_data"
}
],
"source": [
"plot(data$year, data$population,\n",
" type = \"b\", lwd = 2,\n",
" xlab = \"Year\", ylab = \"Population\")"
]
},
{
"cell_type": "markdown",
"id": "2faad8cb",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"In case we need the data elsewhere, we can save it."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "db34ba0e",
"metadata": {
"vscode": {
"languageId": "r"
}
},
"outputs": [],
"source": [
"write.csv(data, file = \"Canada_census.csv\")"
]
}
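,
{
"cell_type": "markdown",
"id": "e8a3c671",
"metadata": {},
"source": [
"By default, `write.csv` also writes the row names as an extra first column. If that column is not wanted, `row.names = FALSE` leaves it out. The sketch below checks this with a round trip through a temporary file; the data frame `toy` and its values are invented for illustration.\n",
"\n",
"```r\n",
"# Toy data frame, invented for illustration only\n",
"toy = data.frame(year = c(1981, 1986), population = c(24000, 25000))\n",
"\n",
"# Use a temporary file so that nothing in the working directory is touched\n",
"tmp = tempfile(fileext = \".csv\")\n",
"write.csv(toy, file = tmp, row.names = FALSE)\n",
"\n",
"# Read the file back and compare it with the original\n",
"toy_back = read.csv(tmp)\n",
"colnames(toy_back)\n",
"all.equal(toy, toy_back)\n",
"```\n",
"\n",
"Whether to keep the row names is a matter of taste; dropping them simply means the file read back has the same two columns as the data frame that was saved."
]
}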
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "4.3.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}