Kenyan Household Bands Classifier

classification

prediction

MTI

Predicting the Household Economic Bands Into Which University Students Fall for Award of Financial Support from the Kenyan Government.

Author

Cornelius Tanui

Published

September 6, 2024

0.1 University Funding in Kenya

0.1.0.1 My Shallow Thoughts on MTI

When I first heard of MTI, my immediate thought was that the government of Kenya had finally embraced Artificial Intelligence on a larger scale and decided to award university students scholarships based on economic bands decided by some novel AI algorithm. Think of a combination of ‘high compute, high repute’ classification algorithms. A pleasant thought, right? No.

No, because I later searched for MTI online and found out that it stands for ‘means testing instrument’. If you are deep into data, you would think ‘means’ is used here to denote averages. After all, ‘testing of means’ is hardly uncommon; we come across it all the time in data analytics. The t-test is a test of means. However, ‘means’ in the context of MTI stands for the resources, or assets, that a student has access to and that could be used to fund their higher education. ‘Means’ can be a confusing word: ‘means of transport’, ‘by all means’, etc.

As it turned out, MTI is a widely used concept in the education and social protection sectors, and I was embarrassingly waaay off in thinking that it had something to do with statistical averages.

I was waaay off yet again, in a different respect: the students joining university in September 2024 as first-years/freshmen had never been banded before, and therefore there would be no training data for my imagined AI model! This is the first time banding is happening in Kenya as regards university funding, so maybe there will be (enough) data to train a model in 2025, so that the freshmen of 2025 can be banded by AI.

0.1.0.2 Background to MTI

Globally, MTI has been around for a while now, with the first documented use in the 1930s involving provision of relief to households by governments. If a home was deemed able to support itself from the income it had, the government benefits were stopped or reduced¹. MTI has since been heavily employed in the social protection sector to provide targeted anti-poverty benefits to households, communities, and geographies, and civil legal aid to individuals². The obvious reason for preferring MTI to universal provision of support, such as universal basic income, is that MTI directs the support to targeted beneficiaries, whereas under the universal approach some recipients may not genuinely require it³.

In Kenya, MTI has been used for a long time to identify households in marginalized communities that are eligible to benefit from cash transfers⁴ under the National Safety Net Programmes (NSNP). One such safety programme is the Hunger Safety Net Programme (HSNP), which supports older persons, orphans and vulnerable children, and persons with severe disability.

Literature indicates that MTI has worked successfully so far in Kenya as implemented under NSNP, yet it is not without shortcomings. For example, a popular criticism is that it discourages the target population from saving⁵, consequently promoting poverty, a concept known as the poverty trap⁶. In university funding, MTI has sustained an unmitigated uproar over its banding inaccuracy⁷, which led to placement of students from poor backgrounds into higher bands that require them to dig deep into their pockets to fill the gap, pockets which they either do not have, or are torn. The bands range from 1 (least able) to 5 (most able).

Under the hood, MTI is mainly a regression model – such as a tobit model – that aggregates various variables together and provides a value³ which is then compared to a threshold that determines whether the candidate qualifies for the benefit, or does not. Principal components analysis models have also been deployed to this cause⁴.
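To make the score-versus-threshold idea concrete, here is a minimal sketch in base R. It is not the instrument used in Kenya: it swaps the tobit model for ordinary least squares to stay dependency-free, and the survey variables (`household_size`, `owns_land`, `roof_iron`), the coefficients, and the 30% eligibility cutoff are all hypothetical.

```r
set.seed(42)
n <- 200

# Hypothetical household survey (every variable and coefficient is an assumption)
survey <- data.frame(
  household_size = sample(1:8, n, replace = TRUE),
  owns_land      = rbinom(n, 1, 0.4),
  roof_iron      = rbinom(n, 1, 0.6)
)
survey$log_consumption <- 8 -
  0.15 * survey$household_size +
  0.40 * survey$owns_land +
  0.30 * survey$roof_iron +
  rnorm(n, sd = 0.3)

# Step 1: regress the welfare measure on easily observed proxies
# (OLS stands in for the tobit model mentioned in the text)
pmt <- lm(log_consumption ~ household_size + owns_land + roof_iron, data = survey)

# Step 2: aggregate the proxies into a single predicted score per household
survey$score <- predict(pmt, newdata = survey)

# Step 3: compare each score to a threshold (hypothetical: the 30th percentile)
threshold <- quantile(survey$score, probs = 0.3)
survey$eligible <- survey$score <= threshold

table(eligible = survey$eligible)
```

Households whose predicted score falls at or below the threshold qualify for the benefit. Because the proxies are discrete, several households share the same score, so the eligible share can exceed 30%.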

Now, let us explore how a machine learning (ML) classifier could be used as an alternative to MTI to award financial support to university students in Kenya.

0.2 ML Approach to Household Banding

Just as a student requires a couple of lessons before sitting an exam, so does an ML model require massive, yet meticulous, training before it can be deployed for use, as noted in Section 0.1.0.1.

It is not entirely clear which factors were considered, or how, in creating the 5 bands, although gross family income, geographical location, poverty probability index, special circumstances (such as orphans and students with disability), number of dependents, program costs, and gender are some of the variables that have been mentioned⁸. Because I do not have readily available data covering these variables, I am going to simulate them and use R⁹, {tidyverse}, {tidymodels} and other R packages to develop a data processing, modelling, and prediction pipeline using an ML multi-class classification (MCC)¹⁰ model of our choice. Note that the outcome should be on a discrete ordinal scale.
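As a rough illustration of what such a pipeline might look like end to end, the sketch below fits an ordinal classifier to toy data. It deliberately uses `MASS::polr()` (proportional-odds logistic regression, shipped with R) rather than a {tidymodels} workflow, only to keep the example light on dependencies. The predictors, coefficients, sample size, and the quintile-based banding are assumptions made for illustration, not the simulation used in this article.

```r
library(MASS)  # recommended package shipped with R; provides polr()

set.seed(2024)
n <- 600

# Hypothetical predictors (NOT the author's simulated data)
households <- data.frame(
  log_income_c = rnorm(n),               # log gross family income, centred
  ppi          = runif(n),               # poverty probability index in [0, 1]
  dependents   = rpois(n, lambda = 3) + 1  # at least one dependent
)

# Assumed data-generating process: latent "ability to pay" with logistic noise
latent <- 1.5 * households$log_income_c -
  2.0 * (households$ppi - 0.5) -
  0.3 * (households$dependents - 4) +
  rlogis(n)

# Cut the latent score at its quintiles: band 1 (least able) to 5 (most able)
households$Bands <- cut(
  latent,
  breaks = quantile(latent, probs = seq(0, 1, by = 0.2)),
  labels = 1:5,
  include.lowest = TRUE,
  ordered_result = TRUE
)

# Hold out 20% of the households for evaluation
test_idx <- sample(n, size = n %/% 5)
train <- households[-test_idx, ]
test  <- households[test_idx, ]

# Fit the ordinal model on the training set
fit <- polr(Bands ~ log_income_c + ppi + dependents, data = train, Hess = TRUE)

# Predict the most likely band for each held-out household
predicted <- predict(fit, newdata = test, type = "class")
accuracy  <- mean(as.character(predicted) == as.character(test$Bands))

print(table(observed = test$Bands, predicted = predicted))
cat("Hold-out accuracy:", round(accuracy, 3), "\n")
```

An ordinal model is used here because the bands are ordered. A purely nominal multi-class classifier would ignore the fact that band 2 sits closer to band 1 than band 5 does.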

0.2.0.1 Data Simulation

Simulation of these data is outside the scope of this article and will be covered in a later post.

The table below shows the properties of the variables.

Table 1: Properties of variables

Variable	Data type	Distribution
Bands	Ordinal
Gross family income	x ∈ ℝ⁺
Geographical location	Nominal
Poverty probability index	x ∈ ℝ⁺
Special circumstances such as orphans	Binary
Students with disability	Binary
Number of dependents	x ∈ ℕ⁺
Program costs	x ∈ ℝ⁺
Gender	Nominal

0.2.0.2 Descriptive Analysis

Disclaimer!

The data is simulated, and therefore differs substantially from the actual scenario! The data is meant for learning purposes only, and the statistical estimates reported MUST NOT be taken as a true reflection of the real-world situation.

## load packages
library(xlsx)
library(here)

library(tidyverse)
library(tidymodels)

The simulated data looks like this:

#| label: tbl-simulated_data .striped .hover .primary .bordered
#| tbl-cap: "Simulated data"
#| tbl-cap-location: bottom

# table display setup: the `#|` cell options above must be the first lines
# of the chunk for Quarto to pick them up

# load the simulated data from the project's Data folder
simulated_data <- readRDS(here::here("./Data/simulated_data.rds"))

# show the first five rows as a rendered table
knitr::kable(head(x = simulated_data, n = 5))

Bands	Gross_Family_Income	Geographical_Location	Poverty_Probability_Index	Orphans	Number_of_Dependents	Program_Costs_KES	Gender
2	18621	36	0.1751991	1	4	510735.4	0
2	6235	39	0.2081443	1	5	466547.2	0
3	33812	33	0.5994036	0	3	458893.0	0
4	30154	22	0.3511708	0	3	554462.2	0
2	27145	14	0.1675043	0	5	464203.2	1
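Before any modelling, the column types need to match Table 1. The sketch below rebuilds the five rows printed above by hand, since `simulated_data.rds` is not available here, and coerces each column to a suitable type. The meaning of the numeric codes for `Geographical_Location`, `Orphans` and `Gender` is not stated in this article, so treating them as category labels and a 0/1 flag is an assumption.

```r
# The five preview rows shown above, typed in by hand
preview <- data.frame(
  Bands                     = c(2, 2, 3, 4, 2),
  Gross_Family_Income       = c(18621, 6235, 33812, 30154, 27145),
  Geographical_Location     = c(36, 39, 33, 22, 14),
  Poverty_Probability_Index = c(0.1751991, 0.2081443, 0.5994036, 0.3511708, 0.1675043),
  Orphans                   = c(1, 1, 0, 0, 0),
  Number_of_Dependents      = c(4, 5, 3, 3, 5),
  Program_Costs_KES         = c(510735.4, 466547.2, 458893.0, 554462.2, 464203.2),
  Gender                    = c(0, 0, 0, 0, 1)
)

# Outcome: ordered factor over all five bands, even those absent from the preview
preview$Bands <- factor(preview$Bands, levels = 1:5, ordered = TRUE)

# Nominal variables: unordered factors (the codes are labels, not quantities)
preview$Geographical_Location <- factor(preview$Geographical_Location)
preview$Gender                <- factor(preview$Gender)

# Binary flag: logical (assumes 1 means the student is an orphan)
preview$Orphans <- as.logical(preview$Orphans)

# Counts: integers
preview$Number_of_Dependents <- as.integer(preview$Number_of_Dependents)

str(preview)
table(preview$Bands)
```

Declaring all five levels up front keeps bands 1 and 5 in every tabulation, even when a sample happens not to contain them.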

0.2.0.3 Model Creation

Model Fitting

Model Diagnostics

Band Prediction

0.3 Conclusion

Footnotes

1. van Oorschot, W. J. H., & Schell, J. (1991). Means-testing in Europe: A growing concern. In M. Adler, C. Bell, J. Clasen, & A. Sinfield (Eds.), The sociology of social security (pp. 187–211). (Edinburgh education and society series). Edinburgh University Press.
2. https://www.gov.uk/guidance/criminal-legal-aid-means-testing
3. Brown, C., Ravallion, M., & Van de Walle, D. (2016). A poor means test? Econometric targeting in Africa. The World Bank.
4. Villa, J. M. (2016). A harmonised proxy means test for Kenya’s National Safety Net programme. GDI Working Paper 2016-003. Manchester: The University of Manchester.
5. Powers, E. T. (1998). Does means-testing welfare discourage saving? Evidence from a change in AFDC policy in the United States. Journal of Public Economics, 68(1), 33–53. https://doi.org/10.1016/S0047-2727(97)00087-X
6. Kraay, A., & McKenzie, D. (2014). Do poverty traps exist? Assessing the evidence. Journal of Economic Perspectives, 28(3), 127–148.
7. https://www.citizen.digital/news/govt-explains-why-many-students-miss-out-on-scholarships-under-the-new-funding-model-n348207
8. https://kafu.ac.ke/images/2022/Academics/nfm/NEW_FUNDING_MODEL_-_6TH_AUGUST_2024.pdf
9. R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
10. Kook, L., Herzog, L., Hothorn, T., Dürr, O., & Sick, B. (2022). Deep and interpretable regression models for ordinal outcomes. Pattern Recognition, 122, 108263.