Saturday, November 24, 2012

Learning from imbalanced data

I recently read an excellent review article: He & Garcia, 2009. Learning from Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering 21 (9): 1263-1284. I knew that imbalanced data, that is, data with "the presence of underrepresented data and severe class distribution skews", was problematic, and I knew a few ways to deal with it. However, I hadn't realized how wide the variety of approaches is, how much active research there is, or even that several conferences are devoted solely to this topic.

If you are building a classifier with underrepresented data, I highly recommend reading this article. It covers the problem itself, a range of approaches, evaluation metrics, and an extensive review of the literature.
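To make the metrics point concrete, here is a minimal sketch of my own (not taken from the article) showing why plain accuracy can mislead on skewed classes. The 990-to-10 class split and the helper names `accuracy` and `recall` are assumptions made up for illustration; the "classifier" simply predicts the majority class every time.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted as positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred)
                   if t == positive and p == positive)
    actual_pos = sum(1 for t in y_true if t == positive)
    return true_pos / actual_pos if actual_pos else 0.0


# Hypothetical imbalanced labels: 990 majority (0) and 10 minority (1).
y_true = [0] * 990 + [1] * 10

# A trivial baseline that always predicts the most common class.
majority_label = Counter(y_true).most_common(1)[0][0]
y_pred = [majority_label] * len(y_true)

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, minority recall={rec:.2f}")
```

The baseline scores 99% accuracy while never finding a single minority example, which is one reason to look at metrics beyond accuracy when the classes are this skewed.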

1 comment:

  1. It is just what I was looking for and quite thorough as well. Thanks for posting this, I saw a couple other similar posts but yours was the best so far. The ideas are strongly pointed out and clearly emphasized. :-)

