ML-Inception: understanding where and why models work (and don’t work)
A subgroup discovery-based method has recently been proposed to understand the behavior of models in the (original) feature space. The subgroups it identifies represent regions of the feature space where the model's predictive performance is better or worse than its average. For instance, in the marketing domain, the approach extracts subgroups such as: for younger customers with higher income, the random forest achieves higher-than-average accuracy. Here, we propose the use of metalearning to analyze those subgroups in the metafeature space, where they are characterized in a domain-independent way using statistical and information-theoretic properties. We then use association rules to relate the characteristics of the subgroups to improvement or degradation in model performance. For instance, in the same domain, the approach extracts rules such as: when the class entropy decreases and the mutual information increases in the subgroup data, the random forest achieves lower accuracy. We illustrate the approach with some empirical results.
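
To make the idea of characterizing a subgroup in the metafeature space concrete, the following minimal sketch computes two of the metafeatures named above, class entropy and the mutual information between a feature and the class, for a whole dataset and for one subgroup, and then takes the difference. It is an illustration under stated assumptions, not the authors' implementation: the toy data, the subgroup definition, and the names `class_entropy`, `mutual_information`, and `metafeatures` are all hypothetical.

```python
from collections import Counter
from math import log2


def class_entropy(labels):
    """Shannon entropy (in bits) of the class distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def mutual_information(feature, labels):
    """Mutual information (in bits) between a discrete feature and the class."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    # Sum over observed (x, y) pairs of p(x, y) * log2(p(x, y) / (p(x) p(y))).
    return sum(
        (c / n) * log2((c * n) / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def metafeatures(rows):
    """Hypothetical helper: metafeatures of rows shaped (age, income, class)."""
    income = [r[1] for r in rows]
    labels = [r[2] for r in rows]
    return {
        "class_entropy": class_entropy(labels),
        "mi_income_class": mutual_information(income, labels),
    }


# Hypothetical toy marketing data: (age_group, income_group, class label).
rows = [
    ("young", "high", 1), ("young", "high", 1),
    ("young", "high", 1), ("young", "high", 0),
    ("young", "low", 0), ("old", "high", 1),
    ("old", "low", 0), ("old", "low", 1),
]

# Assumed subgroup description: younger customers.
subgroup = [r for r in rows if r[0] == "young"]

full = metafeatures(rows)
sub = metafeatures(subgroup)

# Change of each metafeature in the subgroup relative to the whole dataset.
delta = {name: sub[name] - full[name] for name in full}
print(delta)
```

In a sketch like this, the sign of each entry in `delta` (metafeature increases or decreases in the subgroup) is the kind of item that could be paired with the model's performance change in that subgroup and passed to an association rule miner.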