- Learners can apply ten functions from the dplyr R package to generate a subset of data for use in a table or plot.
ds4owd - data science for openwashdata
2023-11-14
… based on the concept of functions as verbs that manipulate data frames:
- `select`: pick columns by name
- `arrange`: reorder rows
- `filter`: pick rows matching criteria
- `relocate`: change the order of the columns
- `mutate`: add new variables
- `summarise`: reduce variables to values
- `group_by`: for grouped operations

Rules of dplyr functions:
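- The first argument, `.data =`, is the data frame to work on.
- The following arguments say what to do with the data, for example `year == 2007` in `filter()`.
- The result is stored under a new name with `<-`, such as `gapminder_2007`, and the pipe `|>` passes a data frame into the function as its first argument.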
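A minimal sketch of these rules. The slide's example uses the gapminder data; to keep the code self-contained, it instead assumes a small hypothetical data frame, `sanitation_excerpt`, copied from the six preview rows of the sanitation data shown in the next table (with the `iso3`, `region_sdg` and `varname_long` columns dropped). It shows that naming `.data =` explicitly and piping give the same result.

```r
library(dplyr)

# Hypothetical excerpt: the six preview rows of the sanitation data
sanitation_excerpt <- data.frame(
  name = "Afghanistan",
  year = 2000,
  varname_short = rep(c("san_bas", "san_lim"), each = 3),
  residence = rep(c("national", "rural", "urban"), times = 2),
  percent = c(21.9, 19.3, 30.9, 5.6, 3.1, 14.5)
)

# Rule 1: the data frame goes into the first argument, `.data`
# Rule 2: the following arguments say what to do with it
urban_explicit <- filter(.data = sanitation_excerpt, residence == "urban")

# The pipe inserts the left-hand side as that first argument
urban_piped <- sanitation_excerpt |>
  filter(residence == "urban")

identical(urban_explicit, urban_piped)
# [1] TRUE
```

Both calls return a new data frame with the two urban rows; `sanitation_excerpt` itself is left unchanged.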
name | iso3 | year | region_sdg | varname_short | varname_long | residence | percent |
---|---|---|---|---|---|---|---|
Afghanistan | AFG | 2000 | Central and Southern Asia | san_bas | basic sanitation services | national | 21.9 |
Afghanistan | AFG | 2000 | Central and Southern Asia | san_bas | basic sanitation services | rural | 19.3 |
Afghanistan | AFG | 2000 | Central and Southern Asia | san_bas | basic sanitation services | urban | 30.9 |
Afghanistan | AFG | 2000 | Central and Southern Asia | san_lim | limited sanitation services | national | 5.6 |
Afghanistan | AFG | 2000 | Central and Southern Asia | san_lim | limited sanitation services | rural | 3.1 |
Afghanistan | AFG | 2000 | Central and Southern Asia | san_lim | limited sanitation services | urban | 14.5 |
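The verbs `select()`, `arrange()`, `mutate()` and `relocate()` can be chained with the pipe. This is a sketch on a hypothetical data frame, `sanitation_excerpt`, that copies the six rows above (some columns dropped); the derived `proportion` column is an illustrative assumption, not part of the dataset.

```r
library(dplyr)

# Hypothetical excerpt: the six rows of the table above
sanitation_excerpt <- data.frame(
  name = "Afghanistan",
  year = 2000,
  varname_short = rep(c("san_bas", "san_lim"), each = 3),
  residence = rep(c("national", "rural", "urban"), times = 2),
  percent = c(21.9, 19.3, 30.9, 5.6, 3.1, 14.5)
)

sanitation_sorted <- sanitation_excerpt |>
  # pick columns by name
  select(varname_short, residence, percent) |>
  # reorder rows: highest percentage first
  arrange(desc(percent)) |>
  # add a new variable (hypothetical): percentage as a proportion
  mutate(proportion = percent / 100) |>
  # change the column order: move `proportion` to the front
  relocate(proportion)
```

The first row of `sanitation_sorted` is the urban basic sanitation estimate (30.9 percent), and `proportion` is now the first column.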
varname_short | varname_long | n |
---|---|---|
san_bas | basic sanitation services | 14742 |
san_lim | limited sanitation services | 14742 |
san_od | no sanitation facilities | 14742 |
san_sm | safely managed sanitation services | 14742 |
san_unimp | unimproved sanitation facilities | 14742 |
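A table of this shape can be produced with dplyr's `count()`, which is shorthand for `group_by()` followed by `summarise(n = n())`. The slides do not show the code behind the table, so this is only a sketch, assuming a hypothetical `sanitation_excerpt` built from the six preview rows, where each indicator appears three times.

```r
library(dplyr)

# Hypothetical excerpt: the six preview rows of the sanitation data
sanitation_excerpt <- data.frame(
  varname_short = rep(c("san_bas", "san_lim"), each = 3),
  varname_long = rep(c("basic sanitation services",
                       "limited sanitation services"), each = 3),
  residence = rep(c("national", "rural", "urban"), times = 2),
  percent = c(21.9, 19.3, 30.9, 5.6, 3.1, 14.5)
)

# count() tallies rows for each combination of the listed columns
counts <- sanitation_excerpt |>
  count(varname_short, varname_long)

# The same tally spelled out with group_by() and summarise();
# .groups = "drop" returns an ungrouped result
counts_manual <- sanitation_excerpt |>
  group_by(varname_short, varname_long) |>
  summarise(n = n(), .groups = "drop")
```

`summarise()` reduces each group to a single row, which is why the result has one row per indicator.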
Find the `md-03a-data-transformation.qmd` file and click on it to open it in the top left window. (15 minutes)
Find the `md-03b-your-turn-filter.qmd` file and click on it to open it in the top left window. (20 minutes)
Break (10 minutes): Please get up and move! Let your emails rest in peace.
Image generated with DALL-E 3 by OpenAI
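For example, `filter()` with conditions such as `residence == "national"`, etc. says what to do with the data: `|>` passes the sanitation data in, and `<-` stores the result as `sanitation_national_2020_sm`.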
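Conditions separated by commas inside `filter()` are combined with AND: a row is kept only if every condition is TRUE. The object name `sanitation_national_2020_sm` suggests further conditions on year and indicator, but the hypothetical `sanitation_excerpt` below (the six preview rows) only covers 2000, so this sketch filters for national basic sanitation instead.

```r
library(dplyr)

# Hypothetical excerpt: the six preview rows of the sanitation data
sanitation_excerpt <- data.frame(
  name = "Afghanistan",
  year = 2000,
  varname_short = rep(c("san_bas", "san_lim"), each = 3),
  residence = rep(c("national", "rural", "urban"), times = 2),
  percent = c(21.9, 19.3, 30.9, 5.6, 3.1, 14.5)
)

# Both conditions must hold for a row to be kept
national_basic <- sanitation_excerpt |>
  filter(residence == "national", varname_short == "san_bas")

national_basic$percent
# [1] 21.9
```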
Use the `filter()` function to create a subset from the `sanitation` data containing urban and rural estimates for Nigeria, and store it as `sanitation_nigeria_urban_rural`.
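To keep rows that match any of several values in one column, `%in%` is a compact alternative to chaining conditions with `|`. Since the hypothetical `sanitation_excerpt` (the six preview rows) has no Nigeria rows, this sketch uses Afghanistan; the same pattern applies to the exercise.

```r
library(dplyr)

# Hypothetical excerpt: the six preview rows of the sanitation data
sanitation_excerpt <- data.frame(
  name = "Afghanistan",
  year = 2000,
  varname_short = rep(c("san_bas", "san_lim"), each = 3),
  residence = rep(c("national", "rural", "urban"), times = 2),
  percent = c(21.9, 19.3, 30.9, 5.6, 3.1, 14.5)
)

# %in% keeps rows whose residence matches any element of the vector
afghanistan_urban_rural <- sanitation_excerpt |>
  filter(name == "Afghanistan", residence %in% c("urban", "rural"))

# Equivalent filter written with an explicit OR
same_rows <- sanitation_excerpt |>
  filter(name == "Afghanistan", residence == "urban" | residence == "rural")
```

Both versions keep four rows and drop the two national estimates.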
Connected scatterplots are great for time series data.
- Use the `ggplot()` function to create a connected scatterplot with `geom_point()` and `geom_line()` for the data you created in Task 1.2.
- Use the `aes()` function to map the `year` variable to the x-axis, the `percent` variable to the y-axis, and the `varname_short` variable to the color and group aesthetics.
- Use `facet_wrap()` to create a separate plot for urban and rural populations.
- Change the colors using `scale_color_colorblind()` from the ggthemes package.
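One possible shape for this plot, wrapped in a hypothetical helper, `plot_connected()`, so it can be reused on any subset with `year`, `percent`, `varname_short` and `residence` columns. It is a sketch, not the official solution, and assumes ggplot2 and ggthemes are installed. The hypothetical `sanitation_excerpt` (six preview rows) only covers 2000, so on it each line has a single point; the connecting lines appear with the multi-year data from the exercise.

```r
library(ggplot2)
library(ggthemes)  # provides scale_color_colorblind()

# Hypothetical helper: connected scatterplot of percent over time,
# one colored line per indicator, one panel per residence type
plot_connected <- function(data) {
  ggplot(data, aes(x = year, y = percent,
                   color = varname_short, group = varname_short)) +
    geom_point() +
    geom_line() +
    facet_wrap(~residence) +
    scale_color_colorblind()
}

# Hypothetical excerpt: the six preview rows of the sanitation data
sanitation_excerpt <- data.frame(
  year = 2000,
  varname_short = rep(c("san_bas", "san_lim"), each = 3),
  residence = rep(c("national", "rural", "urban"), times = 2),
  percent = c(21.9, 19.3, 30.9, 5.6, 3.1, 14.5)
)

# Build the plot object; print(p) would draw it
p <- plot_connected(sanitation_excerpt)
```

Mapping `varname_short` to `group` as well as `color` tells `geom_line()` which points to connect.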
Find the `md-03a-data-transformation.qmd` file and click on it to open it in the top left window. (30 minutes)
Break (10 minutes): Please get up and move! Let your emails rest in peace.
Find the `md-03c-your-turn-summarise.qmd` file and click on it to open it in the top left window. (40 minutes)
Slides created via revealjs and Quarto: https://quarto.org/docs/presentations/revealjs/ Access slides as PDF on GitHub
All material is licensed under Creative Commons Attribution Share Alike 4.0 International.