Do Deep Neural Networks Suffer from Crowding?

Published 26 Jun 2017 in cs.CV | (1706.08616v1)

Abstract: Crowding is a visual effect suffered by humans, in which an object that can be recognized in isolation can no longer be recognized when other objects, called flankers, are placed close to it. In this work, we study the effect of crowding in artificial Deep Neural Networks for object recognition. We analyze both standard deep convolutional neural networks (DCNNs) and a new version of DCNNs that is 1) multi-scale and 2) has convolution filters whose size changes with eccentricity with respect to the center of fixation. Such networks, which we call eccentricity-dependent, are a computational model of the feedforward path of the primate visual cortex. Our results reveal that the eccentricity-dependent model, trained on target objects in isolation, can recognize such targets in the presence of flankers, if the targets are near the center of the image, whereas DCNNs cannot. Also, for all tested networks, when trained on targets in isolation, we find that recognition accuracy of the networks decreases the closer the flankers are to the target and the more flankers there are. We find that visual similarity between the target and flankers also plays a role and that pooling in early layers of the network leads to more crowding. Additionally, we show that incorporating the flankers into the images of the training set does not improve performance with crowding.

Summary

The paper shows that flankers reduce recognition accuracy, more so when they are closer to the target, more numerous, or visually similar to it, and that pooling in early layers increases crowding.
It compares classical DCNNs with eccentricity-dependent models and finds that the eccentricity-dependent model, which mimics primate vision, is more robust to flankers when the target is near the center of the image.
The study suggests that perception-inspired architecture, rather than training on cluttered images, is the more promising route to robust object recognition.

Do Deep Neural Networks Suffer from Crowding?

This essay summarizes the findings of the paper "Do Deep Neural Networks Suffer from Crowding?" (1706.08616) on crowding in Deep Neural Networks (DNNs). The paper tests both standard Deep Convolutional Neural Networks (DCNNs) and eccentricity-dependent models to see whether artificial systems show the crowding effects observed in human visual perception.

Introduction to Crowding in DNNs

Crowding is a phenomenon in which an object that is recognizable in isolation can no longer be recognized when other objects, called flankers, are placed close to it. The paper asks whether DNNs trained for object recognition suffer from crowding in the way human vision does. To test this, it places flankers next to a target object in the input image and measures how recognition accuracy changes.

Figure 1: Example image used to test the models, with an even MNIST digit as the target and two odd MNIST digits as flankers.
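The layout described in Figure 1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' data pipeline: the canvas size, the spacing, the function names `place_patch` and `make_crowded_image`, and the random patches standing in for MNIST digits are all assumptions made for the example.

```python
import numpy as np

def place_patch(canvas, patch, center_col):
    """Paste `patch` onto `canvas`, vertically centered, around column `center_col`."""
    h, w = patch.shape
    top = (canvas.shape[0] - h) // 2
    left = center_col - w // 2
    if left < 0 or left + w > canvas.shape[1]:
        raise ValueError("patch does not fit on the canvas at this position")
    # Element-wise maximum keeps both objects visible if they overlap.
    region = canvas[top:top + h, left:left + w]
    canvas[top:top + h, left:left + w] = np.maximum(region, patch)

def make_crowded_image(target, flankers, spacing, target_col, size=(60, 200)):
    """Put the target at `target_col` and flankers alternately right and left of it.

    `spacing` is the center-to-center distance in pixels (hypothetical parameter).
    """
    canvas = np.zeros(size, dtype=np.float32)
    place_patch(canvas, target, target_col)
    for i, flanker in enumerate(flankers):
        side = 1 if i % 2 == 0 else -1   # right first, then left
        step = i // 2 + 1                # 1st pair at 1x spacing, 2nd pair at 2x, ...
        place_patch(canvas, flanker, target_col + side * step * spacing)
    return canvas

# Random 28x28 patches stand in for MNIST digits (assumption for illustration).
rng = np.random.default_rng(0)
target = rng.random((28, 28), dtype=np.float32)
flankers = [rng.random((28, 28), dtype=np.float32) for _ in range(2)]
image = make_crowded_image(target, flankers, spacing=40, target_col=100)
print(image.shape)
```

Sweeping `spacing`, the number of flankers, and `target_col` (the target's distance from the image center) gives the kinds of test conditions the summary describes.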

The eccentricity-dependent model introduced in this paper mimics the primate visual cortex by incorporating multi-scale levels where convolution filter sizes increase with eccentricity from the center of fixation. This model aims to achieve object recognition robustness in cluttered environments.

Models Under Experimentation

Classical DCNNs

The study examines DCNN architectures that differ in how they pool over space. Three pooling configurations are analyzed, named "no total pooling", "progressive pooling", and "at end pooling". They differ in how quickly the spatial size of the feature maps shrinks from layer to layer.

Figure 2: DCNN architectures with three convolutional layers and one fully connected layer, investigating the role of pooling in crowding.
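To make the difference between the three pooling configurations concrete, the sketch below tracks only the spatial size of a single feature map through three pooling stages. It is an illustration under stated assumptions: the 3x3 window, the strides, the 32x32 input, and the helper names `max_pool`, `global_max_pool`, and `run_config` are choices made here, and the convolutions themselves are omitted.

```python
import numpy as np

def max_pool(x, k, stride):
    """k x k max pooling over a 2-D array, no padding."""
    out_h = (x.shape[0] - k) // stride + 1
    out_w = (x.shape[1] - k) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + k, c:c + k].max()
    return out

def global_max_pool(x):
    """Collapse the whole map to a single value, kept as a 1 x 1 array."""
    return x.max(keepdims=True)

def run_config(x, name, num_layers=3):
    """Apply one of three assumed pooling schedules (convolutions omitted)."""
    if name not in ("no_total", "progressive", "at_end"):
        raise ValueError(f"unknown configuration: {name}")
    for _ in range(num_layers):
        # Assumption: "progressive" downsamples at every layer, the others do not.
        stride = 2 if name == "progressive" else 1
        x = max_pool(x, k=3, stride=stride)
    if name == "at_end":
        # Assumption: all spatial positions are merged only after the last layer.
        x = global_max_pool(x)
    return x

feature_map = np.random.default_rng(0).random((32, 32))
for name in ("no_total", "progressive", "at_end"):
    print(name, run_config(feature_map, name).shape)
```

With these assumed settings the 32x32 input ends at 26x26, 3x3, and 1x1 respectively. A pooled unit that covers both the target and a flanker can no longer tell them apart, which is one way to read the finding that early pooling increases crowding.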

Eccentricity-Dependent Models

Inspired by biological vision, these models process the image at several scales. They are defined by an inverted-pyramid sampling hierarchy in which receptive field size grows with eccentricity, so the region around the fixation point is sampled finely and the periphery coarsely. The structure is intended to give object representations that are invariant to scale and translation.

Figure 3: Eccentricity-dependent model inverted pyramid with sampling points, each representing a different receptive field size.
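One way to approximate the inverted pyramid is to take concentric crops of increasing size around the fixation point and resample each to the same resolution, so that larger crops cover more of the periphery with coarser sampling. The sketch below illustrates that idea only. The doubling of crop sizes, the block-average resampling, and the names `average_downsample` and `multiscale_crops` are assumptions, not the paper's implementation.

```python
import numpy as np

def average_downsample(x, factor):
    """Downsample a 2-D array by averaging non-overlapping factor x factor blocks."""
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def multiscale_crops(image, fixation, base=16, num_scales=3):
    """Return crops centered on `fixation`, all resampled to base x base.

    Scale s covers a window of side base * 2**s, so each output pixel at
    scale s summarizes a 2**s x 2**s input region (assumed doubling schedule).
    """
    row, col = fixation
    crops = []
    for s in range(num_scales):
        factor = 2 ** s
        half = base * factor // 2
        if (row - half < 0 or col - half < 0
                or row + half > image.shape[0] or col + half > image.shape[1]):
            raise ValueError("crop extends beyond the image")
        crop = image[row - half:row + half, col - half:col + half]
        crops.append(average_downsample(crop, factor))
    return np.stack(crops)

image = np.arange(64 * 64, dtype=np.float32).reshape(64, 64)
pyramid = multiscale_crops(image, fixation=(32, 32))
print(pyramid.shape)  # one base x base map per scale
```

Because every scale ends up at the same resolution, the same convolution filters can be applied to all of them. An object near the fixation point appears in every crop, while an object far from it appears only in the large, coarsely sampled ones.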

Experimental Setup and Results

Training with Isolated Targets

The models were first trained on isolated target objects. At test time, flankers were added in varying numbers and configurations, and recognition accuracy was measured. Accuracy fell as flankers were added, fell further the closer they were to the target, and fell most when the flankers were visually similar to the target. The study links this to pooling, which combines responses from the target and nearby flankers.

Impact of Pooling and Flanker Configurations

The study reports that some configurations, such as "at end pooling" in the eccentricity-dependent models, were considerably more robust to clutter. Configurations that pooled early, and so integrated information across scales early, were less accurate in the presence of flankers. Training on images that included clutter did not generalize, which argues for seeking robustness in the architecture rather than in exhaustive training sets.

Practical and Theoretical Implications

The research concludes that training with clutter does not by itself improve robustness, whereas architectural ideas borrowed from human perception, such as eccentricity dependence, can produce more resilient models. Because the eccentricity-dependent model is most robust when the target is near the center of the image, the findings suggest pairing it with a mechanism for dynamic fixation, analogous to eye movements, so that it can handle targets at varying positions in clutter.

Conclusion

The study shows that DNNs do suffer from crowding and that the severity depends on the model architecture, in particular on where pooling occurs. It favors architectural changes inspired by perception over training on large amounts of cluttered data. Future research may focus on integrating fixation mechanisms to improve recognition in complex real-world scenes.