read_csv(here::here("data/raw/your-file-name.csv"))
Documentation for your capstone project
A prerequisite for this homework is that you have created a repository for your capstone project. This was an assignment of module 5. If you haven’t yet, please work through the steps outlined in Assignment 2 of Module 4 before you start with the steps outlined here.
Step 1: Create a new folder
Open your capstone project on posit.cloud.
Navigate to the Files tab in the bottom right window of RStudio.
Click on the data folder in the bottom right window.
Click on the “Folder” button to create a new folder.
Enter the name “processed” in the field and click OK.
Click on the new processed folder in the bottom right window.
Step 2: Create a README.md file
Navigate to the Files tab in the bottom right window of RStudio.
Click on the data folder in the bottom right window.
Click on the processed folder in the bottom right window.
Click on the “Blank File” button to create a new file.
Select the option “Text file”.
Enter the name “README.md” in the field and click OK.
Go to: https://raw.githubusercontent.com/ds4owd-001/metadata-readme-template/main/README.md
Copy the content that’s displayed in your browser and paste it into the README.md file you have just created.
Step 3: Create a dictionary.xlsx
Use a spreadsheet tool of your choice and create a file called dictionary.xlsx.
Add two column names to the spreadsheet: variable_name and description. You do not need to describe all variables yet.
Also save the file as dictionary.csv.
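If you would rather start the dictionary from R than type the variable names by hand, the following is a minimal sketch of one way to do it. It is an optional alternative to the spreadsheet route and only produces the CSV version. The data frame example_data and its column names are hypothetical stand-ins for your own data, and the file is written to a temporary folder so that the sketch runs anywhere.

```r
library(tibble)
library(readr)

# Hypothetical stand-in for your own data; replace it with your imported data frame.
example_data <- tibble(
  household_id = c(1, 2, 3),
  water_source = c("tap", "well", "tap")
)

# One row per variable, with an empty description column to fill in later.
dictionary <- tibble(
  variable_name = names(example_data),
  description = NA_character_
)

# Written to a temporary folder so the sketch runs anywhere (assumption);
# in your project the target would be data/processed/dictionary.csv.
dictionary_path <- file.path(tempdir(), "dictionary.csv")
write_csv(dictionary, dictionary_path, na = "")
```

The descriptions still have to be written by you; the sketch only saves typing the variable names.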
Step 4: Upload the dictionary
Open your capstone project on posit.cloud.
Use the Files tab in the bottom right window to upload the data dictionary in CSV format to the data/processed folder.
Step 5: Prepare your analysis-ready (processed) data
This step will involve several iterations, depending on the complexity of your raw data. What matters is that you complete a first iteration for this homework assignment, so that we can start evaluating the complexity of your project.
Open the index.qmd file in your capstone project.
Add a code chunk and write library(tidyverse) to load the R packages you have learned to work with.
Import your data by writing the following inside another code chunk (in this example we are using a CSV file):
read_csv(here::here("data/raw/your-file-name.csv"))
Write code to bring your data into a state where it’s ready for analysis (e.g. rename columns, select the columns that are relevant for your analysis, remove NAs, join several data frames).
Once you have your data in a state where it’s ready for analysis, save it as a CSV file in the data/processed folder.
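As an illustration of what a first iteration might look like, here is a minimal sketch of the rename, select, and remove-NAs steps followed by saving the result. The raw data, its column names, and the output file name are all hypothetical; a small tibble stands in for the imported CSV file, and the result is written to a temporary folder so that the sketch runs on its own. Your own data will likely need different steps.

```r
library(tidyverse)

# Hypothetical raw data standing in for the result of
# read_csv(here::here("data/raw/your-file-name.csv")).
raw_data <- tibble(
  `Household ID` = c(1, 2, 3, 4),
  `Water Source` = c("tap", "well", NA, "tap"),
  `Interviewer Notes` = c("ok", "ok", "ok", "ok")
)

processed_data <- raw_data |>
  # Rename columns to short names without spaces.
  rename(household_id = `Household ID`, water_source = `Water Source`) |>
  # Keep only the columns that are relevant for the analysis.
  select(household_id, water_source) |>
  # Remove rows that contain missing values.
  drop_na()

# Written to a temporary folder so the sketch runs anywhere (assumption);
# in your project the target would be a CSV file in data/processed,
# for example via here::here("data/processed/your-file-name.csv").
processed_path <- file.path(tempdir(), "processed-data.csv")
write_csv(processed_data, processed_path)
```

Keeping these steps in index.qmd means the processed file can be regenerated from the raw data whenever you revise the cleaning code in a later iteration.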
Step 6: Commit and push your changes
- Navigate to the Git pane in the top-right window of RStudio
- Check the box next to all files to stage them for a commit
- Click on the “Commit” button
- Enter a commit message in the “Commit message” field
- Click on the “Commit” button
- Click on the “Push” button
- Enter your GitHub username and GitHub Personal Access Token (PAT) in the “Username” and “Password” fields
You need to enter the GitHub Personal Access Token (PAT) you created in Step 1 to push your changes back to GitHub.
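Before staging your files, it can help to confirm that the files from the earlier steps are where you expect them. The following is a minimal sketch of such a check, not a required part of the assignment. The folder capstone-example is a hypothetical stand-in for your project root, and the placeholder files are created only so that the sketch runs on its own.

```r
# Hypothetical project root (assumption); in your capstone project
# this would be the folder returned by here::here().
project_root <- file.path(tempdir(), "capstone-example")
processed_dir <- file.path(project_root, "data", "processed")
dir.create(processed_dir, recursive = TRUE, showWarnings = FALSE)

# Create empty placeholder files so the sketch runs on its own.
file.create(file.path(processed_dir, c("README.md", "dictionary.csv")))

# Files that the earlier steps should have placed in data/processed.
expected_files <- c("README.md", "dictionary.csv")
found <- file.exists(file.path(processed_dir, expected_files))
missing_files <- expected_files[!found]

if (length(missing_files) == 0) {
  message("All expected files are present.")
} else {
  message("Missing: ", paste(missing_files, collapse = ", "))
}
```

The name of your processed data file is your own choice, so it is not part of the list above; add it to expected_files if you want it checked as well.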
Step 7: Open an issue on GitHub
- Open github.com in your browser.
- Navigate to the GitHub organisation for the course.
- Find the repository whose name starts with samples- and ends with your GitHub username.
- Click on the “Issues” tab.
- Click on the green “New issue” button.
- In the “Title” field write: “Prepared first iteration of analysis-ready (processed) data”.
- In the “Leave a comment” field, tag the course instructors @larnsce @mianzg @sskorik01