Despite its relatively high performance, an image-by-image review of errors suggests that the baseline model still faces significant challenges. As noted in our previous post, most errors occur when the model classifies images taken from unconventional angles (e.g., shots that show only part of the car) and images whose labels are very similar to one another (e.g., an Audi S5 Coupe 2012 misclassified as an Audi S6 Sedan 2011).
Many of these difficulties stem from the characteristics of the Stanford Cars dataset. The dataset consists of 16,185 high-resolution photos of cars spanning 196 fine-grained classes distinguished by make, model, and year, with each class making up roughly 0.5% of the whole. Its difficulty comes from the fine-grained nature of the labels, the small number of images per label, and the modest size of the dataset overall. In short, the dataset is small relative to the number of labels, so our model may not learn robust, generalizable features for each label. We will focus on mitigating the effects of this low ratio of images to labels.
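To make the images-to-labels ratio concrete, the short sketch below divides the figures quoted above. It assumes the classes are roughly balanced (which is what the "about 0.5% per class" figure implies); the helper name `per_class_stats` is hypothetical and only used for illustration.

```python
# Figures quoted above for the Stanford Cars dataset.
NUM_IMAGES = 16_185
NUM_CLASSES = 196


def per_class_stats(num_images: int, num_classes: int) -> tuple[float, float]:
    """Hypothetical helper: return (average images per class, percent of the
    dataset per class), assuming the classes are roughly balanced."""
    avg_images = num_images / num_classes
    share_pct = 100.0 / num_classes
    return avg_images, share_pct


avg, share = per_class_stats(NUM_IMAGES, NUM_CLASSES)
print(f"~{avg:.1f} images per class, ~{share:.2f}% of the dataset each")
# prints "~82.6 images per class, ~0.51% of the dataset each"
```

Roughly 83 images per class, before any of them are held out for validation or testing, is a small budget for learning features that separate near-identical models and years.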
Full article: https://mlconf.com/blog/black-box-image-augmentation-for-better-classification/