Determinants of borehole’ functionality in Northen Malawi
Author
Affiliation
Mabvuto Yesaya
Malawi University of Business and Applied Sciences,Malawi
Published
February 10, 2024
Project description
The data is from TA Wasambo in Karonga district (Northen Malawi). This data is for boreholes/handpumps that people mostly rely on for drinking water collection, secondary to wells and rivers. The data were collected in June 2013, and my interests are:
Objectives
To know the users of boreholes per village.
To check if there is a relationship between the costs that were considered before setting the tariff and the amount that is agreed to be paid as a tariff per month now
To check if the actual tariff collected per month, tariff collection frequency, distance to where borehole spare parts are, presence of a service provider, actual tariff collected and conducting preventive maintenance lead to the functionality of boreholes
The goal of this document is to explore, clean, and select the final variables from the borehole Malawi dataset. The selected variables will be used in:
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.3 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
#clean column nameslibrary (janitor)
Attaching package: 'janitor'
The following objects are masked from 'package:stats':
chisq.test, fisher.test
#for EDAlibrary(explore)#here here packagelibrary(here)
here() starts at /cloud/project
#making simple tableslibrary(knitr)library(dplyr)#for making dummy variableslibrary(fastDummies)
Thank you for using fastDummies!
To acknowledge our work, please cite the package:
Kaplan, J. & Schlegel, B. (2023). fastDummies: Fast Creation of Dummy (Binary) Columns and Rows from Categorical Variables. Version 1.7.1. URL: https://github.com/jacobkap/fastDummies, https://jacobkap.github.io/fastDummies/.
## data#read_csv or read_csv2 to read in a ; separated file does not work for me after consulting perplexity ai for multiple times. In the meantime, I will use read_delim. borehole_malawi_raw <-read_delim("/cloud/project/data/raw/Borehole_Malawi_-_all_versions_-_labels_-_2023-12-31-03-34-25.csv",delim =";")
New names:
Rows: 197 Columns: 194
── Column specification
──────────────────────────────────────────────────────── Delimiter: ";" chr
(81): Bohole name and ID, Village name, Name of VDC, Main role of the r... dbl
(99): If Functional status is one of not functional, No longer exists o... lgl
(8): HAND PUMP BOREHOLE FUNCTIONALITY SURVEY, Specify...86, Record the... dttm
(3): start, end, _submission_time date (3): today, Which year was the hand pump
borehole rehabilitated, What ...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `What is the current problem?` -> `What is the current problem?...15`
• `What is the current problem?` -> `What is the current problem?...17`
• `Specify` -> `Specify...29`
• `Specify` -> `Specify...86`
• `Specify` -> `Specify...109`
• `Specify` -> `Specify...126`
• `Specify` -> `Specify...128`
• `Is there a service provider or someone responsible for operating and/or
maintaining this hand pump borehole or water system?` -> `Is there a service
provider or someone responsible for operating and/or maintaining this hand
pump borehole or water system?...138`
• `Is there a service provider or someone responsible for operating and/or
maintaining this hand pump borehole or water system?` -> `Is there a service
provider or someone responsible for operating and/or maintaining this hand
pump borehole or water system?...142`
• `Specify` -> `Specify...155`
Data cleaning
#add id numbersborehole_malawi<-borehole_malawi_raw |>mutate(id =seq(1:n())) |>relocate(id)
#objective 1: To know the users of boreholes per Village.borehole_malawi<-borehole_malawi |>rename(village_name=`Village name`,bh_use_num=`How many households use (d) this hand pump borehole?`)
#objective 2: To check if there is a relationship between the costs that were considered before setting the tariff and the amount that is agreed to be paid as a tariff per month nowborehole_malawi<-borehole_malawi |>rename( tariff_costs_consider=`What costs were considered when setting the tariff or user fee?`,tarrif_frequency=`How often is the tariff/user fee collected?`,tariff_amount=`How much is the tariff/user fee (in Kwacha)?`)
#objective 3: To check if the actual tariff collected per month, distance to where borehole spare parts are, presence of a service provider and conducting preventive maintenance lead to the functionality of boreholes borehole_malawi<-borehole_malawi |>rename(tariff_hh_number=`How many households in the community paid a water fee the last time it was collected?`,distance_to_spareparts=`How far away are (were) the materials you use for hand pump borehole repairs (ex. spare parts, tools, etc.)?`,available_service_provider=`Is there a service provider or someone responsible for operating and/or maintaining this hand pump borehole or water system?...142`,preventive_mantainance=`Do you conduct preventive maintenance?`,bh_functional=`Functional status of the borehole`)
#data cleaningborehole_malawi_small<-borehole_malawi_small |>clean_names() |>#remove the entries where the borehole no longer exists or abandonedfilter(bh_functional !='No longer exists or abandoned')#recode functionality columnborehole_malawi_small<-borehole_malawi_small |>mutate(bh_functional =case_when( bh_functional =="Functional"~"Yes", bh_functional =="Not functional"~"No", bh_functional=='Partially functional but in need of repair'~"No" ) )#village name to be factor#bhusenum to be int not double#tarrif costs consider to be faactor#tarrif frequency to be factor#tarrif amount to be int not dbl#tariff hh number to be int not dbl#distance to spare parts to be factor#availability of service provider to be factor#preventive mantainance to be factor#bh_functional tobe factor#convert categories that will be regressed into dummies (1/0)#change all Y/N into 1/0
#shorten accessible within the community but leave others unchanedborehole_malawi_small <- borehole_malawi_small %>%mutate(distance_to_spareparts =case_when( distance_to_spareparts =="Accessible within the community"~"Within community", TRUE~as.character(distance_to_spareparts) ) )#remove the records where village name is missing or Nb borehole_malawi_small <- borehole_malawi_small %>%filter(!is.na(village_name) & village_name !="Nb")
#I think I have done the known ones, know let me do EDA to check every variable#explore(borehole_malawi_small) #main EDA indicator#the number of people that uses a boreholeborehole_malawi_small |>group_by(village_name) |>#selectedsummarise(bh_use_num)
Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
always returns an ungrouped data frame and adjust accordingly.
`summarise()` has grouped output by 'village_name'. You can override using the
`.groups` argument.
#village names were manually typed. I need to check the names for consistency and recode them# Summarize the borehole_malawi_small dataset by village_namesummary_table <- borehole_malawi_small %>%filter(!is.na(village_name) & village_name !="Nb") %>%group_by(village_name) %>%summarise(count =n()) %>%kable()# Print the summary tableprint(summary_table)
#recode the similar names with different spellings. I suspect data entry errorsborehole_malawi_small <- borehole_malawi_small %>%mutate(village_name =case_when( village_name =="Mlirasaambo"~"Mlirasambo", village_name =="Mlirasaambo"~"Mlirasaambo", village_name =="Mulirasambo"~"Mlirasambo", village_name =="Mugabi"~"Mugabe",TRUE~as.character(village_name) ) )
#tariff amount#I will write a funcion to handle the column with comments in it# Define the modify functionmodify <-function(dataset) { dataset <-as.numeric(dataset) # Convert the column to numeric to handle NAs# 1. Bring down values that are greater than 2000 to 2000 dataset[dataset >2000] <-2000# 2. When it finds values that have five digits, the last digit should be deleted dataset <-ifelse(nchar(as.character(dataset)) ==5, as.numeric(substring(as.character(dataset), 1, 4)), dataset)# 3. When it finds a single number, it should add 2 zeros to it dataset <-ifelse(dataset <10, dataset *100, dataset)return(dataset)}# Apply the modify function to the borehole_malawi_small datasetborehole_malawi_small$tariff_amount <-modify(borehole_malawi_small$tariff_amount)
#tariff_hh_number is variable that define the number of households that paid that month when the surveywas conducted in July. If households are tagged that they drink water from a borehole but forthat time they did not pay, NAs should be Zeroborehole_malawi_small <- borehole_malawi_small |>mutate(tariff_hh_number =replace_na(tariff_hh_number, 0))#same for tarrif amountborehole_malawi_small <- borehole_malawi_small |>mutate(tariff_amount =replace_na(tariff_amount, 0))#preventive mantainanceborehole_malawi_small <- borehole_malawi_small |>mutate(preventive_mantainance =replace_na(preventive_mantainance,'No'))#if tarrif frequency is missing it means those people do not pay tarrif beause that question as not applicable to them. Therefore>Noneborehole_malawi_small <- borehole_malawi_small |>mutate(tarrif_frequency =replace_na(tarrif_frequency,'None'))#those who have NA on consideration for setting the tarrif are the ones that drink from boreholes that do not have tarrif systemborehole_malawi_small <- borehole_malawi_small |>mutate(tariff_costs_consider =replace_na(tariff_costs_consider,'No tarrif system set'))
I change the values below because they will go into regression model
Binary
# Replace Yes/No with 1/0 in multiple columnsborehole_malawi_small <- borehole_malawi_small %>%mutate_at(vars(available_service_provider:bh_functional), ~ifelse(. =="Yes", 1, 0))