I have been reviewing the paper "PNrule: A New Framework for Learning Classifier Models in Data Mining" by Agarwal & Joshi (2000) and the associated technical report. The paper outlines a two-stage rule-induction approach to datasets with a severely imbalanced binary outcome: a first stage learns P-rules that together cover as much of the rare class as possible (high recall), and a second stage learns N-rules on the records covered by the P-rules in order to weed out the accumulated false positives (restoring precision).
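For anyone unfamiliar with the paper, here is a minimal sketch of the two-stage idea in Python, written from my reading of it rather than from any reference implementation. The single-feature threshold rules and the hard N-rule veto are simplifications of my own; the paper's rule induction and its final scoring mechanism are considerably more refined:

```python
import numpy as np

def learn_rules(X, y, target, max_rules=5, min_support=0.01):
    """Greedy sequential covering with single-feature threshold rules.
    (The real PNrule ranks candidate rules on support as well as
    accuracy; precision alone is used here just to keep it short.)"""
    rules = []
    remaining = np.ones(len(y), dtype=bool)
    for _ in range(max_rules):
        best = None
        for j in range(X.shape[1]):
            for thr in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
                for op in (np.greater, np.less_equal):
                    covered = op(X[:, j], thr) & remaining
                    hits = covered & (y == target)
                    if hits.sum() < max(1, min_support * len(y)):
                        continue  # too little support to bother with
                    precision = hits.sum() / covered.sum()
                    if best is None or precision > best[0]:
                        best = (precision, j, thr, op)
        if best is None:
            break
        _, j, thr, op = best
        rules.append((j, thr, op))
        remaining &= ~op(X[:, j], thr)  # remove covered examples

    return rules

def covers(rules, X):
    hit = np.zeros(len(X), dtype=bool)
    for j, thr, op in rules:
        hit |= op(X[:, j], thr)
    return hit

def fit_pnrule(X, y):
    # Stage 1: P-rules, learned on the full data, aim to cover as many
    # of the rare positives as possible (recall first).
    p_rules = learn_rules(X, y, target=1)
    caught = covers(p_rules, X)
    if not caught.any():
        return p_rules, []
    # Stage 2: N-rules, learned only on the P-rule-covered records,
    # target the false positives accumulated in stage 1 (precision).
    n_rules = learn_rules(X[caught], y[caught], target=0)
    return p_rules, n_rules

def predict(p_rules, n_rules, X):
    # Hard veto for simplicity; the paper instead combines the
    # applicable P- and N-rules through a learned scoring mechanism.
    return covers(p_rules, X) & ~covers(n_rules, X)
```

Even this toy version suggests why it resists being packaged as a preprocessing step: the two stages constitute a classifier in their own right, rather than a transformation of the training data.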
The approach seems promising, and stands apart from the fairly homogeneous body of approaches built on cost-sensitive learning and nearest-neighbour-based resampling, which delete majority-class points or synthesise new minority-class ones. The authors also report excellent performance on the KDD Cup 1999 network-intrusion dataset, though other real-world evidence of performance is lacking.
I have been unable to find the algorithm implemented in a package for either Python or R, whether under the name of one of the original authors or otherwise. Is anyone aware of a relevant package? It would be good to know before starting any coding from scratch.
Similarly, does anyone know why this isn't more popular twenty years on? Perhaps the clue is that it feels more like a framework than something you can simply bolt on as a preliminary step in an existing data-science pipeline.
Many thanks.
PS I can see there is a school of thought that people experiencing problems with sensitivity scores due to class imbalance are suffering from an overactive imagination. I would like to be clear that the problems I am trying to solve are real.