Documentation for your capstone project

Have you created a repository for your capstone project?

A prerequisite for this homework is that you have created a repository for your capstone project. This was an assignment of module 5. If you haven’t yet, please work through the steps outlined in Assignment 2 of Module 4 before you start with the steps outlined here.

Step 1: Create a new folder

Open your capstone project on posit.cloud.
Navigate to the Files tab in the bottom right window of RStudio.
Click on the data folder in the bottom right window.
Click on the “Folder” button to create a new folder.
Enter the name “processed” in the field and click OK.
Click on the new processed folder in the bottom right window.
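
If you prefer the Console over clicking through the Files tab, the same folder can be created with base R. This is a minimal sketch, assuming your working directory is the project root (the default when the project is open in RStudio).

```r
# Path of the folder from Step 1, relative to the project root.
processed_dir <- file.path("data", "processed")

# Create the folder only if it does not exist yet.
# recursive = TRUE also creates data/ in case it is missing.
if (!dir.exists(processed_dir)) {
  dir.create(processed_dir, recursive = TRUE)
}
```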

Step 2: Create a `README.md` file

Navigate to the Files tab in the bottom right window of RStudio.
Click on the data folder in the bottom right window.
Click on the processed folder in the bottom right window.
Click on the “Blank File” button to create a new file.
Select the option “Text file”.
Enter the name “README.md” in the field and click OK.
Go to: https://raw.githubusercontent.com/ds4owd-001/metadata-readme-template/main/README.md
Copy the content that’s displayed in your browser and paste it into the README.md file you have just created.
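
As an alternative to copying and pasting, the template can be downloaded straight into the folder from the R Console. The sketch below is not part of the course instructions: `download_template()` is a hypothetical helper name, and it relies only on base R's `download.file()`.

```r
# URL of the README template, as given in the steps above.
template_url <- "https://raw.githubusercontent.com/ds4owd-001/metadata-readme-template/main/README.md"

# Hypothetical helper: download a file from `url` and save it at `dest`.
download_template <- function(url, dest) {
  # Make sure the target folder exists before writing into it.
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  download.file(url, destfile = dest, quiet = TRUE)
  invisible(dest)
}
```

Calling `download_template(template_url, file.path("data", "processed", "README.md"))` from the project root should place the file where this step expects it. An existing README.md at that path would be overwritten.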

Step 3: Create a `dictionary.xlsx`

Use a spreadsheet tool of your choice and create a file called dictionary.xlsx.
Add two column names to the spreadsheet: variable_name and description. You do not need to describe all variables yet.
Also save the file in CSV format as dictionary.csv.
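
To illustrate what the CSV version of the dictionary should look like, here is a minimal sketch that builds it in base R instead of a spreadsheet tool. The two variable names and their descriptions are made-up placeholders, and the code assumes your working directory is the project root.

```r
# Placeholder entries; replace them with the variables in your own data.
dictionary <- data.frame(
  variable_name = c("id", "location"),
  description = c(
    "Unique identifier of each observation",
    "Name of the sampling location"
  )
)

# Write the dictionary as a CSV file.
# row.names = FALSE avoids an extra column of row numbers.
dir.create(file.path("data", "processed"), recursive = TRUE, showWarnings = FALSE)
write.csv(
  dictionary,
  file.path("data", "processed", "dictionary.csv"),
  row.names = FALSE
)
```

A file written this way is already inside the project, so it does not need to be uploaded separately.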

Step 4: Upload the dictionary

Open your capstone project on posit.cloud.
Use the Files tab in the bottom right window to upload the data dictionary in CSV format to the data/processed folder.

Step 5: Prepare your analysis-ready (processed) data

We will iterate on this step several times.

This step will involve several iterations, depending on the complexity of your raw data. What matters is that you complete a first pass for this homework assignment, so that we can start evaluating the complexity of your project.

Open the index.qmd file in your capstone project.
Add a code chunk and write library(tidyverse) to load the R packages you have learned to work with.
Import your data by writing the following inside another code chunk (in this example we are using a CSV file):

read_csv(here::here("data/raw/your-file-name.csv"))

Write code to bring your data into a state where it’s ready for analysis (e.g. rename columns, select columns that are relevant for your analysis, remove NAs, join several dataframes, etc.)
Once you have your data in a state where it’s ready for analysis, save it as a CSV file in the data/processed folder.
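
The clean-up and save steps above could look roughly like the sketch below. It is only an illustration: the small `raw_data` table, its column names, and the output file name are made up, and your own data will need different steps.

```r
library(tidyverse)

# Made-up stand-in for raw data. In your project this would come from
# read_csv(here::here("data/raw/your-file-name.csv")).
raw_data <- tibble(
  `Sample ID` = c(1, 2, 3, 4),
  `Location Name` = c("A", "B", "C", "D"),
  `Value (mg/L)` = c(0.5, NA, 1.2, 0.8),
  Notes = c("ok", "ok", "check", "ok")
)

processed_data <- raw_data |>
  # Give the columns short, consistent names.
  rename(
    id = `Sample ID`,
    location = `Location Name`,
    value_mg_l = `Value (mg/L)`
  ) |>
  # Keep only the columns that are relevant for the analysis.
  select(id, location, value_mg_l) |>
  # Remove rows where the measured value is missing.
  drop_na(value_mg_l)

# Save the analysis-ready data in the data/processed folder.
# The file name is a placeholder.
dir.create(here::here("data/processed"), recursive = TRUE, showWarnings = FALSE)
write_csv(processed_data, here::here("data/processed/your-file-name-processed.csv"))
```

Keeping the raw file untouched and writing the result to a separate file means the whole preparation can be re-run from scratch on each iteration.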

Step 6: Commit and push your changes

Navigate to the Git pane in the top-right window of RStudio.
Check the box next to all files to stage them for a commit.
Click on the “Commit” button.
Enter a commit message in the “Commit message” field.
Click on the “Commit” button.
Click on the “Push” button.
Enter your GitHub username and GitHub Personal Access Token (PAT) in the “Username” and “Password” fields.

Do not use your GitHub password

You need to enter the GitHub Personal Access Token (PAT) you created in Step 1 to push your changes back to GitHub.

Step 7: Open an issue on GitHub

Open github.com in your browser.
Navigate to the GitHub organisation for the course.
Find the repository whose name starts with samples- and ends with your GitHub username.
Click on the “Issues” tab.
Click on the green “New issue” button.
In the “Title” field write: “Prepared first iteration of analysis-ready (processed) data”.
In the “Leave a comment” field, tag the course instructors @larnsce @mianzg @sskorik01