Saturday, November 24, 2012

Learning from imbalanced data

I recently read an awesome review article: He & Garcia, 2009. Learning from Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering 21 (9): 1263–1284. I knew that imbalanced data, data with "the presence of underrepresented data and severe class distribution skews", was problematic, and I knew a few ways to deal with it. However, I hadn't quite realized the variety of approaches, the magnitude of the active research effort, or even that there were several conferences devoted solely to this topic.

If you are building a classifier with underrepresented data, I highly recommend reading this article. It covers the problem itself, a range of approaches to it, and evaluation metrics, and it includes an extensive literature review.
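
To make the point about metrics concrete, here is a minimal sketch in plain Python of why accuracy alone can mislead on skewed classes. It is my own toy illustration, not taken from the article: the 990/10 class split, the always-predict-majority "classifier", and the helper names are all made up for the example.

```python
# Hypothetical toy data: 990 negatives (0) and 10 positives (1).
y_true = [0] * 990 + [1] * 10

# A trivial "classifier" that always predicts the majority class.
y_pred = [0] * len(y_true)


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


tp, fp, fn, tn = confusion_counts(y_true, y_pred)

# Fraction of all examples labeled correctly.
accuracy = (tp + tn) / (tp + fp + fn + tn)

# Fraction of true positives that were found; defined as 0.0 if there are none.
recall = tp / (tp + fn) if (tp + fn) else 0.0

# Fraction of predicted positives that were right; 0.0 if nothing was predicted positive.
precision = tp / (tp + fp) if (tp + fp) else 0.0

print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f}")
```

The classifier never finds a single minority example, yet it is right on 990 of 1000 cases, so its accuracy looks excellent while its recall on the class you probably care about is zero.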

