Exploring Label Structure and Spatial Attention for Fashion Images Classification
To make decisions, for instance when purchasing a product, people rely on rich and accurate descriptions, which in turn entail multi-label retrieval processes. However, multi-label classification is challenged by high-dimensional, complex feature spaces and by its dependence on large, accurately annotated datasets. Deep learning approaches brought a definite breakthrough in performance across numerous machine learning problems, and image classification was undoubtedly one of the tasks where these approaches had the greatest impact. In this presentation we focus on the classification of fashion images, using deep learning approaches to tackle the multi-class/multi-label problems in order to generate rich image descriptions. Fashion datasets are challenging because they include a vast number of similar-looking images and because they are annotated with a large diversity of attributes but with few labels per exemplar. To address these issues, we exploit domain knowledge to constrain the (otherwise completely data-driven) solutions. Specifically, we first show how to incorporate knowledge about the structure of the annotations. Second, we use context and semantic localization to guide an attention mechanism that shapes the feature space by focusing on visually meaningful regions. We show through thorough experimentation the performance gains achieved in both cases.