Kenyan Household Bands Classifier

Categories: R, classification, prediction, MTI
Predicting the Household Economic Bands Into Which University Students Fall for Award of Financial Support from the Kenyan Government.
Published: September 6, 2024

0.1 University Funding in Kenya

0.1.0.1 My Shallow Thoughts on MTI

When I first heard of MTI, my immediate thought was that the government of Kenya had finally embraced Artificial Intelligence on a larger scale and decided to award university students scholarships based on economic bands determined by some novel AI algorithm. Think: a combination of classification algorithms of the ‘high compute, high repute’ kind. A pleasant thought, right? No.

No, because later on I searched for MTI online and found out that it stands for ‘means testing instrument’. If you are deep into data, you would think ‘means’ here denotes averages. After all, ‘testing of means’ is far from uncommon; we come across it all the time in data analytics. The t-test, for instance, is a test of means. However, in the context of MTI, ‘means’ refers to the resources, or assets, that a student has access to and could use to fund their higher education. ‘Means’ can be a confusing word: ‘means of transport’, ‘by all means’, etc.

As it turned out, MTI is a widely used concept in the education and social protection sectors, and I was embarrassingly waaay off in thinking that it had something to do with statistical averages.

I was waaay off yet again, in a different aspect: the students joining university in September 2024 as first-years (freshmen) had never been banded before, so there would be no training data for my imagined AI model! This is the first time banding is being done for university funding in Kenya, so perhaps there will be (enough) data to train a model in 2025, and the freshmen of 2025 could then be banded by AI.

0.1.0.2 Background to MTI

Globally, MTI has been around for a while, with the first documented use in the 1930s, when governments provided relief to households: if a household was deemed able to support itself from its own sources of income, the government benefits were stopped or reduced1. MTI has since been heavily employed in the social protection sector to target anti-poverty benefits at households, communities, and geographies, and to determine individuals’ eligibility for legal aid2. The main reason for preferring MTI over universal provision of support, such as universal basic income, is targeting: MTI directs support to the intended beneficiaries, whereas a universal approach may reach recipients who do not genuinely require it3.

In Kenya, MTI has long been used to identify households in marginalized communities that are eligible for cash transfers4 under the National Safety Net Programme (NSNP). The NSNP brings together several programmes, including the Hunger Safety Net Programme (HSNP) and cash transfers targeting older persons, orphans and vulnerable children, and persons with severe disability.

Literature indicates that MTI has so far worked well in Kenya as implemented under the NSNP, yet it is not without shortcomings. For example, a popular criticism is that it discourages the target population from saving5, which can entrench poverty, a phenomenon known as a poverty trap6. In university funding, MTI has also faced a public uproar over its banding inaccuracy7, which placed students from poor backgrounds in higher bands that require them to dig deep into their pockets to fill the funding gap, pockets which they either do not have, or are torn. The bands range from 1 (least able) to 5 (most able).

Under the hood, an MTI is mainly a regression model, such as a tobit model, that combines various household variables into a single predicted value3, which is then compared with a threshold to determine whether or not the candidate qualifies for the benefit. Principal components analysis (PCA) models have also been used for this purpose4.
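To make that mechanism concrete, here is a toy proxy means test in base R. It is only a sketch under stated assumptions: the household proxies, their effects on consumption, and the 30th-percentile cut-off are all made up, and ordinary least squares (`lm()`) is used in place of a tobit model for simplicity. The last step shows the PCA variant, taking the first principal component of the proxies as an index.

```r
# Toy proxy means test (PMT) on simulated households.
# Assumptions: the proxies, coefficients and cut-off are hypothetical;
# lm() stands in for the tobit models used in the literature.
set.seed(2024)
n <- 500

households <- data.frame(
  household_size = sample(1:10, n, replace = TRUE),
  owns_livestock = rbinom(n, 1, 0.4),
  rooms          = sample(1:6, n, replace = TRUE),
  rural          = rbinom(n, 1, 0.6)
)

# Hypothetical log consumption, as if measured in a household survey
households$log_consumption <- with(households,
  9 - 0.12 * household_size + 0.30 * owns_livestock +
    0.15 * rooms - 0.25 * rural + rnorm(n, sd = 0.3))

# Step 1: regress consumption on easily observed proxies
pmt_model <- lm(log_consumption ~ household_size + owns_livestock + rooms + rural,
                data = households)

# Step 2: the fitted value is each household's PMT score
households$pmt_score <- predict(pmt_model, newdata = households)

# Step 3: households scoring below the cut-off qualify for the benefit
cutoff <- quantile(households$pmt_score, probs = 0.30)
households$eligible <- households$pmt_score < cutoff
table(households$eligible)

# PCA variant: first principal component of the standardised proxies
# (the sign of a principal component is arbitrary)
proxies <- households[, c("household_size", "owns_livestock", "rooms", "rural")]
pca <- prcomp(proxies, scale. = TRUE)
households$pca_index <- pca$x[, 1]
```

The threshold step is what turns a continuous score into a yes/no decision; a banding scheme like Kenya's would instead use several cut-offs to produce ordered bands.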

Now, let us explore how a machine learning (ML) classifier could be used as an alternative to MTI for awarding financial support to university students in Kenya.

0.2 ML Approach to Household Banding

Just as a student needs lessons before sitting an exam, an ML model needs extensive, yet meticulous, training on labelled data before it can be deployed; as noted in Section 0.1.0.1, such data does not yet exist for university banding.

It is not entirely clear how the 5 bands were created or which factors were considered, although gross family income, geographical location, poverty probability index, special circumstances (such as orphans and students with disability), number of dependents, program costs, and gender are some of the variables that have been mentioned8. Because I do not have readily available data covering these variables, I am going to simulate them and use R9, {tidyverse}, {tidymodels}, and other R packages to develop a data processing, modelling, and prediction pipeline around an ML multi-class classification (MCC)10 model of our choice. Note that the outcome, the band, is measured on a discrete ordinal scale.
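A {tidymodels} pipeline of the kind described above might look like the sketch below. Everything in it is an assumption for illustration: the toy data are simulated in place with made-up distributions (not the simulation deferred to a later post), the bands are cut from a hypothetical latent ‘ability to pay’ score, and multinomial logistic regression (`multinom_reg()` with the {nnet} engine) stands in for the as-yet unspecified MCC model.

```r
library(tidymodels)  # the "nnet" engine below also needs {nnet} installed

set.seed(42)
n <- 1000

# Hypothetical toy data reusing the column names of simulated_data;
# the distributions are illustrative assumptions only
toy <- tibble(
  Gross_Family_Income       = round(rlnorm(n, meanlog = 10, sdlog = 0.6)),
  Geographical_Location     = factor(sample(1:47, n, replace = TRUE)),
  Poverty_Probability_Index = runif(n),
  Orphans                   = rbinom(n, 1, 0.10),
  Disability                = rbinom(n, 1, 0.05),
  Number_of_Dependents      = rpois(n, 3) + 1L,
  Program_Costs_KES         = runif(n, 4e5, 6e5),
  Gender                    = factor(rbinom(n, 1, 0.5))
) |>
  mutate(
    # hypothetical latent ability to pay, cut into five equal-sized bands
    ability = as.numeric(scale(log(Gross_Family_Income))) -
      2 * Poverty_Probability_Index - 0.5 * Orphans - 0.5 * Disability -
      0.2 * Number_of_Dependents + rnorm(n(), sd = 0.5),
    Bands = cut(ability,
                breaks = quantile(ability, probs = seq(0, 1, 0.2)),
                labels = 1:5, include.lowest = TRUE)
  ) |>
  select(-ability)

# Stratified train/test split so every band appears in both sets
band_split <- initial_split(toy, prop = 0.75, strata = Bands)
band_train <- training(band_split)
band_test  <- testing(band_split)

# Preprocessing: dummy-code nominal predictors, drop zero-variance
# columns, then centre and scale all numeric predictors
band_recipe <- recipe(Bands ~ ., data = band_train) |>
  step_dummy(all_nominal_predictors()) |>
  step_zv(all_predictors()) |>
  step_normalize(all_numeric_predictors())

# Multinomial logistic regression as one possible MCC model
band_model <- multinom_reg() |>
  set_engine("nnet") |>
  set_mode("classification")

band_fit <- workflow() |>
  add_recipe(band_recipe) |>
  add_model(band_model) |>
  fit(data = band_train)

# Predict bands for held-out students and evaluate
band_preds <- predict(band_fit, new_data = band_test) |>
  bind_cols(band_test |> select(Bands))

accuracy(band_preds, truth = Bands, estimate = .pred_class)
conf_mat(band_preds, truth = Bands, estimate = .pred_class)
```

Multinomial regression treats the five bands as unordered classes. Since the bands are ordinal, an ordinal model such as proportional-odds logistic regression (`MASS::polr()`) is a natural alternative worth comparing against it.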

0.2.0.1 Data Simulation

Simulation of these data is outside the scope of this article and will be covered in a later post.

Table 1 below shows the properties of the variables.

Table 1: Properties of variables
Variable Data type Distribution
Bands Ordinal
Gross family income x ∈ ℝ+
Geographical location Nominal
Poverty probability index x ∈ ℝ+
Special circumstances such as orphans Binary
Students with disability Binary
Number of dependents x ∈ ℕ+
Program costs x ∈ ℝ+
Gender Nominal
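Before modelling, it helps to make the types in Table 1 explicit in R, since the nominal and ordinal variables are stored as numeric codes. This is a minimal sketch: the helper name `apply_table1_types()` is hypothetical, converting the binary columns to logical is my own choice, and the demo rows are the first three rows of the simulated data printed in the Descriptive Analysis section.

```r
library(dplyr)

# Hypothetical helper (not from the original post): coerce raw columns to
# the data types listed in Table 1, using the column names of simulated_data
apply_table1_types <- function(df) {
  df |>
    mutate(
      # ordinal: 1 (least able) to 5 (most able)
      Bands = factor(Bands, levels = 1:5, ordered = TRUE),
      # nominal variables stored as numeric codes
      Geographical_Location = factor(Geographical_Location),
      Gender = factor(Gender),
      # binary indicators stored as 0/1
      Orphans = as.logical(Orphans),
      Disability = as.logical(Disability),
      # count of dependents
      Number_of_Dependents = as.integer(Number_of_Dependents)
    )
}

# Demo on the first three rows of the simulated data
demo_rows <- tibble::tribble(
  ~Bands, ~Gross_Family_Income, ~Geographical_Location, ~Poverty_Probability_Index,
  ~Orphans, ~Disability, ~Number_of_Dependents, ~Program_Costs_KES, ~Gender,
  2, 18621, 36, 0.1751991, 1, 0, 4, 510735.4, 0,
  2,  6235, 39, 0.2081443, 1, 0, 5, 466547.2, 0,
  3, 33812, 33, 0.5994036, 0, 0, 3, 458893.0, 0
)

typed <- apply_table1_types(demo_rows)
str(typed)
```

Declaring `Bands` as an ordered factor keeps the 1 < 2 < … < 5 ordering available to ordinal models and to comparisons such as `typed$Bands[3] > typed$Bands[1]`.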
0.2.0.2 Descriptive Analysis
Disclaimer!

The data is simulated and therefore differs substantially from the actual scenario! It is meant for learning purposes only, and the statistical estimates reported MUST NOT be taken as a true reflection of the real-world situation.

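## load packages
library(xlsx)   # Excel I/O; requires Java via {rJava}
library(here)   # project-relative file paths

library(tidyverse)   # data wrangling and plotting
library(tidymodels)  # modelling framework (recipes, parsnip, workflows, yardstick)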

The first five rows of the simulated data look like this:

#| label: tbl-simulated_data
#| tbl-cap: "Simulated data"
#| tbl-cap-location: bottom

# table display options above must lead the chunk for Quarto to read them;
# intended table classes: .striped .hover .primary .bordered

# load the simulated data from the project's Data folder
simulated_data <- readRDS(here::here("Data", "simulated_data.rds"))

# display the first five rows as a table
knitr::kable(head(x = simulated_data, n = 5))
Bands Gross_Family_Income Geographical_Location Poverty_Probability_Index Orphans Disability Number_of_Dependents Program_Costs_KES Gender
2 18621 36 0.1751991 1 0 4 510735.4 0
2 6235 39 0.2081443 1 0 5 466547.2 0
3 33812 33 0.5994036 0 0 3 458893.0 0
4 30154 22 0.3511708 0 0 3 554462.2 0
2 27145 14 0.1675043 0 0 5 464203.2 1
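A natural first descriptive step is to summarise the predictors by band. The sketch below does this with {dplyr} on the five rows printed above only, so that it runs without the .rds file; on the full data you would pipe `simulated_data` into the same `group_by()`/`summarise()` chain. The summary columns chosen here are my own illustration.

```r
library(dplyr)

# The five rows printed above (simulated data, illustration only)
first_five <- tibble::tribble(
  ~Bands, ~Gross_Family_Income, ~Geographical_Location, ~Poverty_Probability_Index,
  ~Orphans, ~Disability, ~Number_of_Dependents, ~Program_Costs_KES, ~Gender,
  2, 18621, 36, 0.1751991, 1, 0, 4, 510735.4, 0,
  2,  6235, 39, 0.2081443, 1, 0, 5, 466547.2, 0,
  3, 33812, 33, 0.5994036, 0, 0, 3, 458893.0, 0,
  4, 30154, 22, 0.3511708, 0, 0, 3, 554462.2, 0,
  2, 27145, 14, 0.1675043, 0, 0, 5, 464203.2, 1
)

# Count students and average selected predictors within each band
band_summary <- first_five |>
  group_by(Bands) |>
  summarise(
    n = n(),
    mean_income = mean(Gross_Family_Income),
    mean_dependents = mean(Number_of_Dependents),
    .groups = "drop"
  )

band_summary
```

On these five rows, band 2 holds three students with a mean gross family income of about 17,334 and a mean of about 4.7 dependents; with so few rows this is only a check that the summary works, not a finding.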
0.2.0.3 Model Creation

Model Fitting

Model Diagnostics

Band Prediction

0.3 Conclusion

Footnotes

  1. van Oorschot, W. J. H., & Schell, J. (1991). Means-testing in Europe: A growing concern. In M. Adler, C. Bell, J. Clasen, & A. Sinfield (Eds.), The sociology of social security (pp. 187–211). Edinburgh Education and Society Series. Edinburgh University Press.

  2. https://www.gov.uk/guidance/criminal-legal-aid-means-testing

  3. Brown, C., Ravallion, M., & Van de Walle, D. (2016). A poor means test? Econometric targeting in Africa. The World Bank.

  4. Villa, J. M. (2016). A harmonised proxy means test for Kenya’s National Safety Net programme (GDI Working Paper 2016-003). The University of Manchester.

  5. Powers, E. T. (1998). Does means-testing welfare discourage saving? Evidence from a change in AFDC policy in the United States. Journal of Public Economics, 68(1), 33–53. https://doi.org/10.1016/S0047-2727(97)00087-X

  6. Kraay, A., & McKenzie, D. (2014). Do poverty traps exist? Assessing the evidence. Journal of Economic Perspectives, 28(3), 127–148.

  7. https://www.citizen.digital/news/govt-explains-why-many-students-miss-out-on-scholarships-under-the-new-funding-model-n348207

  8. https://kafu.ac.ke/images/2022/Academics/nfm/NEW_FUNDING_MODEL_-_6TH_AUGUST_2024.pdf

  9. R Core Team. (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

  10. Kook, L., Herzog, L., Hothorn, T., Dürr, O., & Sick, B. (2022). Deep and interpretable regression models for ordinal outcomes. Pattern Recognition, 122, 108263.