library(DataCombine)
library(tidyverse)
library(tidyr)
library(ggplot2)
library(ggthemes)
library(lubridate)
library(readr)
library(readxl)
library(gt)
library(dplyr)
library(knitr)
This project analyzes and interprets raw borehole repair data to support planning and decision making. Boreholes are the main technology used to access groundwater in Uganda (Owor et al. 2022) and a source of drinking water for households in rural communities across Africa, including Uganda (Lapworth et al. 2020), so good-quality data is important for informing decisions and plans. The project uses data collected from two districts in central Uganda where a borehole operation and maintenance program is run. Because professional operation and maintenance is regarded as the future of borehole functionality in Uganda (Smith, Ongom, and Davis 2023), this report contributes further insight to research on the topic.
The data come from a sample of borehole repair records kept by a borehole operation and maintenance company working in central Uganda. Population data was obtained by interviewing a representative of the Local Water User Committees (LWUCs). The technical specifications of each borehole were taken from the company's borehole records file.
We start by reading the raw data from the .csv file.
borehole <- read_csv(here::here("data/raw/borehole_repair_data.csv"))
Renaming the well yield column to a readable variable name and dropping rows with missing values
new_well_yield <- borehole |>
  rename("well_yield" = "well_yield_(m^3/hr)")
processed_borehole_data <- drop_na(new_well_yield)
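The renaming and cleaning step can be tried on a small example first. The sketch below is only an illustration: the tibble `toy` and its values are made up, and only the column name `well_yield_(m^3/hr)` comes from the report. It also shows that `drop_na()` called without column names removes a row if any of its columns is missing, not just the yield.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy data: values are invented for illustration only
toy <- tibble(
  sub_county = c("Kakiri", "Gombe", "Elsewhere"),
  `well_yield_(m^3/hr)` = c(1.2, NA, 3.5)
)

toy_clean <- toy |>
  rename("well_yield" = "well_yield_(m^3/hr)") |>  # new name = old name
  drop_na()                                        # drops rows with an NA in any column

toy_clean
```

If only missing yields should be dropped, `drop_na(well_yield)` restricts the check to that column.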
Writing the processed data ready for analysis into the processed folder
write_csv(processed_borehole_data, here::here("data/processed/processed_borehole_data.csv"))
Creating a new variable from the existing data
district_column <- processed_borehole_data |>
  mutate(district = case_when(
    sub_county == "Gombe" ~ "Wakiso",
    sub_county == "Kakiri" ~ "Wakiso",
    sub_county == "Kakiri Town Council" ~ "Wakiso",
    sub_county == "Namayumba Town Council" ~ "Wakiso",
    sub_county == "Kira" ~ "Wakiso",
    TRUE ~ "Luwero"
  ))
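The same district mapping can be written more compactly with a lookup vector. This is a sketch of an alternative, not the code used in the report: `wakiso_sub_counties`, `toy`, and `toy_district` are hypothetical names, and the sub-county list is copied from the `case_when()` call above.

```r
library(dplyr)
library(tibble)

# Sub-counties assigned to Wakiso in the report; everything else is treated as Luwero
wakiso_sub_counties <- c(
  "Gombe", "Kakiri", "Kakiri Town Council",
  "Namayumba Town Council", "Kira"
)

# Hypothetical toy data for illustration
toy <- tibble(sub_county = c("Kakiri", "Kira", "Elsewhere"))

toy_district <- toy |>
  mutate(district = if_else(sub_county %in% wakiso_sub_counties, "Wakiso", "Luwero"))

toy_district
```

Keeping the sub-county names in one vector makes it easier to add or correct a name later without editing several conditions.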
Figure 1 is a histogram showing the distribution of well depth across two districts.
ggplot(data = district_column,
       mapping = aes(x = well_depth,
                     fill = district)) +
  geom_histogram() +
  xlab("Borehole Depth (m)") +
  ylab("No. of Boreholes") +
  labs(title = "Borehole depth distribution, data from two districts")
The histogram suggests that the average borehole depth is similar in Wakiso and Luwero districts. In both districts, most boreholes are less than 75 m deep. Luwero also has a few extreme cases: three boreholes are deeper than 100 m.
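Readings taken from a histogram can be checked numerically. The sketch below assumes a data frame with `district` and `well_depth` columns, as in the report; the values in `toy` are invented, and the 75 m and 100 m thresholds are the ones mentioned in the text.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: depths are invented for illustration only
toy <- tibble(
  district   = c("Luwero", "Luwero", "Luwero", "Wakiso", "Wakiso"),
  well_depth = c(40, 60, 110, 55, 70)
)

depth_check <- toy |>
  group_by(district) |>
  summarise(
    mean_depth     = mean(well_depth),       # average depth per district
    share_below_75 = mean(well_depth < 75),  # proportion of boreholes under 75 m
    n_above_100    = sum(well_depth > 100)   # number of boreholes deeper than 100 m
  )

depth_check
```

Replacing `toy` with `district_column` would give the corresponding figures for the real data.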
Figure 2 is a scatterplot showing well yield distribution and population served across the two districts.
ggplot(data = district_column,
       mapping = aes(x = population_served,
                     y = well_yield,
                     fill = district,
                     color = district)) +
  geom_point() +
  lims(y = c(0, 100)) +
  xlab("Population Served") +
  ylab("Borehole Yield (m3/hr)") +
  labs(title = "Borehole well yield vs population served in two districts")
The scatter plot shows that a borehole in the two sampled districts serves about 1,000 people on average, and that the average borehole yield is about 12.5 m³/hr. Some boreholes sit well above these averages for both population served and yield; these are cases that merit further investigation.
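Averages are hard to read off a scatter plot, so it can help to compute them directly. The sketch below is an illustration on made-up values: `toy` is hypothetical, and only the column names `population_served` and `well_yield` come from the report. Reporting the median next to the mean shows how far a few large values pull the average up.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: values are invented for illustration only
toy <- tibble(
  population_served = c(200, 400, 1500),
  well_yield        = c(2, 4, 12)
)

# One row with the mean and median of each column,
# named <column>_mean and <column>_median
averages <- toy |>
  summarise(across(
    c(population_served, well_yield),
    list(mean = mean, median = median)
  ))

averages
```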
Figure 3 is a column chart showing borehole numbers repaired by quarter and year.
summary_data <- processed_borehole_data |>
  group_by(repair_date) |>
  summarise(count = n())
ggplot(data = summary_data,
       mapping = aes(x = year(repair_date),
                     y = count,
                     fill = quarter(repair_date))) +
  geom_col() +
  xlab("Repair date / Year") +
  ylab("No. of Boreholes") +
  labs(title = "Boreholes repaired by quarter of the year")
The column chart shows the year and quarter in which most boreholes were repaired. In this data set, most repairs (73 boreholes) took place in 2022. By quarter, all 31 boreholes repaired in 2021 were repaired in the last quarter; in 2022 the largest number (23) were repaired in the first quarter; and in 2023, 19 boreholes were repaired in the second quarter.
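Counts like these can be read from a table instead of the chart. The sketch below assumes `repair_date` is stored as a date; the dates in `toy` are invented, and `repair_year`, `repair_quarter`, and `n_repairs` are hypothetical column names.

```r
library(dplyr)
library(lubridate)
library(tibble)

# Hypothetical toy data: dates are invented for illustration only
toy <- tibble(
  repair_date = as.Date(c(
    "2021-11-03", "2021-12-15", "2022-01-20",
    "2022-02-11", "2022-08-30", "2023-05-02"
  ))
)

repairs_by_quarter <- toy |>
  mutate(
    repair_year    = year(repair_date),     # calendar year of the repair
    repair_quarter = quarter(repair_date)   # quarter 1 to 4
  ) |>
  count(repair_year, repair_quarter, name = "n_repairs")

repairs_by_quarter
```

Each row gives the number of repairs in one year and quarter, which is the quantity stacked in each column of the chart.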
# table creation
tbl_bhr_summary <- district_column |>
  group_by(district) |>
  summarise(
    count = n(),
    mean_popn = mean(population_served),
    sd_popn = sd(population_served),
    median_popn = median(population_served)
  )
# export table to processed folder
write_csv(tbl_bhr_summary, here::here("data/processed/tbl-01-bhr-summary.csv"))
Table 1 shows that boreholes in Wakiso District serve more people on average, with just 11 boreholes, than those in Luwero District, which has 146 boreholes.
Table 1 shows borehole characteristics in the two districts of operation.
# Using kable() to display the bhr-summary table
kable(tbl_bhr_summary)
| district | count | mean_popn | sd_popn | median_popn |
|---|---|---|---|---|
| Luwero | 146 | 487.1507 | 757.8711 | 263 |
| Wakiso | 11 | 1341.1818 | 987.4383 | 988 |
The table above shows the total number of boreholes repaired in each target district, together with the mean, standard deviation, and median of the population served.
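The summary statistics are easier to read when rounded. As a sketch, the `digits` argument of `kable()` can do this at display time; the data frame `table_1` below is a hypothetical name and simply retypes the values printed in Table 1 so the example runs on its own.

```r
library(knitr)

# Values retyped from Table 1 so the example is self-contained
table_1 <- data.frame(
  district    = c("Luwero", "Wakiso"),
  count       = c(146L, 11L),
  mean_popn   = c(487.1507, 1341.1818),
  sd_popn     = c(757.8711, 987.4383),
  median_popn = c(263, 988)
)

# Round numeric columns to one decimal place when rendering
formatted <- kable(table_1, digits = 1)
formatted
```

Rounding only affects how the table is displayed; the underlying values in the data frame are unchanged.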
From this data and the investigation carried out, we can conclude that: