Course Overview

Thank you for your interest in this course. Your course instructors, Lars Schöbitz, Mian Zhong, and Sophia Skorik, are looking forward to meeting you.

We will meet on Zoom for 10 modules over 17 weeks, at the times listed under Course Calendar and Weekly Structure below.

We will use Posit Cloud infrastructure, so you do not need to install any software. You will hear from us about a week before the course starts.

The registration window for this course has closed, but you can fill out the following form to sign up for the next time we host this course: https://forms.gle/MP5rNYZagBdfG2ZRA

Who can participate?

To participate in this course, you need:

  • to have a stable internet connection
  • to be connected in some way to the broader Water, Sanitation and Hygiene (WASH) sector (yes, public health, solid waste management, global health engineering, and related topics also count)
  • to commit 10 × 2.5 hours to participate in Zoom calls
  • to commit another 2 hours/week for readings and additional exercises for practice
  • to identify a dataset, your own or your organisation's, that you are interested in sharing with the public
  • an openness to new ideas and workflows that disrupt current practice

What do we offer?

This course:

  • is free
  • provides you with a certificate for successful completion
  • uses exclusively tools that are free and open source
  • offers 1:1 coding support between lectures and beyond the course

Course Information

This course provides learners with skills in using the collection of R tidyverse packages as a tool for data analysis, reproducible research, and communication. Lectures will be delivered through participatory live coding, so that students learn how to write code in code-along exercises. We will use publicly available data related to waste management, air quality, and sanitation. Students will learn how to help themselves using large language models (LLMs) and AI tools such as Perplexity, and will build upon the skills they gain by applying them to their own data analysis projects.

Topics include:

  • The data science life-cycle
  • Data organization in spreadsheets
  • Exploratory data analysis using visualization
  • Using AI for software development in R
  • Concept of tidy data and data tidying
  • Data transformation and descriptive statistics
  • Data communication using the Quarto open-source scientific and technical publishing system

Learning Goals

  1. Be able to use a common set of data science tools (R, RStudio IDE, Git, GitHub, tidyverse, Quarto) to illustrate and communicate the results of data analysis projects.

  2. Learn to use the Quarto file format and the RStudio IDE visual editing mode to produce scholarly documents with citations, footnotes, cross-references, figures, and tables.

Textbooks and Materials

We will rely entirely on open source and open access material for this course. We will use “R for Data Science” by Hadley Wickham and “Tidyverse Skills for Data Science” by Carrie Wright, Shannon E. Ellis, Stephanie C. Hicks, and Roger D. Peng as complementary reading and learning material. Additional readings will consist of blog posts, journal articles, and reports. All required readings and class material will be provided through this website.

Course Calendar

Date              Week  Topic                                                                         Module
31 October 2023   1     Welcome & get ready for the course                                            module 1
07 November 2023  2     Data science lifecycle & Exploratory data analysis using visualization        module 2
14 November 2023  3     Data transformation with dplyr                                                module 3
21 November 2023  4     Data import & Data organization in spreadsheets                               module 4
28 November 2023  5     Conditions & Dates & Tables                                                   module 5
05 December 2023  6     Data types & Vectors & For Loops                                              module 6
12 December 2023  7     Pivoting & joining data                                                       module 7
19 December 2023  8     Break                                                                         NA
26 December 2023  9     Break                                                                         NA
02 January 2024   10    Break                                                                         NA
09 January 2024   11    Work on Capstone project                                                      NA
16 January 2024   12    Creating and publishing scholarly articles with Quarto and GitHub pages       module 8
23 January 2024   13    openwashdata webinar: a data sharing workflow that may please the publishers  NA
30 January 2024   14    Using AI for software development in R                                        module 9
06 February 2024  15    Work on Capstone project                                                      NA
13 February 2024  16    Final submission date of Capstone project                                     NA
20 February 2024  17    Graduation party of openwashdata academy                                      module 10

Weekly Structure

Day        Activity
Monday     -
Tuesday    Module from 2 pm to 4:30 pm CET
Wednesday  -
Thursday   Office hours on Zoom (2 pm to 3:30 pm CET)
Friday     -

Assignments

Homework assignments: Each week will have at least one homework assignment. All assignments except those for Week 1 are delivered as Quarto documents with instructions and some sample code. Students are required to submit their work through GitHub.

Readings: Every week, additional readings will be provided to support students in learning the underlying concepts taught in class.

Capstone Project: A final capstone project provides students with an opportunity to apply their skills and techniques to real-world data sets. Detailed instructions for the capstone project will be provided. The project will be delivered as Quarto documents and students are asked to submit their work through GitHub.

Attendance

We hope you can participate in all classes. Class participation is an important component of successful completion of this course.

Code of Conduct

This course and the openwashdata community follow a code of conduct. Please ensure that you have read it.