### abstract ###
While many models of biological object recognition share a common set of broad-stroke properties, the performance of any one model depends strongly on the choice of parameters in a particular instantiation of that model (e.g., the number of units per layer, the size of pooling kernels, or the exponents in normalization operations). Since the number of such parameters is typically large and the computational cost of evaluating one particular parameter set is high, the space of possible model instantiations goes largely unexplored.
Thus, when a model fails to approach the abilities of biological visual systems, we are left uncertain whether this failure is because we are missing a fundamental idea or because the correct parts have not been tuned correctly, assembled at sufficient scale, or provided with enough training.
Here, we present a high-throughput approach to the exploration of such parameter sets, leveraging recent advances in stream processing hardware.
In analogy to high-throughput screening approaches in molecular biology and genetics, we explored thousands of potential network architectures and parameter instantiations, screening those that show promising object recognition performance for further analysis.
We show that this approach can yield significant, reproducible gains in performance across an array of basic object recognition tasks, consistently outperforming a variety of state-of-the-art purpose-built vision systems from the literature.
As the scale of available computational power continues to expand, we argue that this approach has the potential to greatly accelerate progress in both artificial vision and our understanding of the computational underpinnings of biological vision.
### introduction ###
The study of biological vision and the creation of artificial vision systems are naturally intertwined: exploration of the neuronal substrates of visual processing provides clues and inspiration for artificial systems, and artificial systems, in turn, serve as important generators of new ideas and working hypotheses.
The results of this synergy have been powerful: in addition to providing important theoretical frameworks for empirical investigations, biologically-inspired models are routinely among the highest-performing artificial vision systems in practical tests of object and face recognition CITATION CITATION.
However, while neuroscience has provided inspiration for some of the broad-stroke properties of the visual system, much is still unknown.
Even for those qualitative properties that most biologically-inspired models share, experimental data currently provide little constraint on their key parameters.
As a result, even the most faithfully biomimetic vision models necessarily represent just one of many possible realizations of a collection of computational ideas.
Truly evaluating the set of biologically-inspired computational ideas is difficult, since the performance of a model depends strongly on its particular instantiation: the size of the pooling kernels, the number of units per layer, the exponents in normalization operations, and so on. Because the number of such parameters is typically large, and the computational cost of evaluating one particular model is high, it is difficult to adequately explore the space of possible model instantiations.
At the same time, there is no guarantee that even the correct set of principles will work when instantiated on a small scale.
Thus, when a model fails to approach the abilities of biological visual systems, we cannot tell whether this is because the ideas are wrong, or because they are simply not put together correctly or on a large enough scale.
As a result of these factors, the availability of computational resources plays a critical role in shaping what kinds of computational investigations are possible.
Traditionally, this bound has grown according to Moore's Law CITATION; recently, however, advances in highly parallel graphics processing hardware have disrupted this status quo for some classes of computational problems.
In particular, this new class of graphics processing hardware has enabled speed-ups of more than a hundredfold in some of the key computations that most biologically-inspired visual models share.
As is already occurring in other scientific fields CITATION, CITATION, the large quantitative performance improvements offered by this new class of hardware hold the potential to effect qualitative changes in how science is done.
In the present work, we take advantage of these recent advances in graphics processing hardware CITATION, CITATION to more expansively explore the range of biologically-inspired models, including models of larger, more realistic scale.
In analogy to high-throughput screening approaches in molecular biology and genetics, we generated and trained thousands of potential network architectures and parameter instantiations, and we screened the visual representations produced by these models using tasks that engage the core problem of object recognition: tolerance to image variation CITATION CITATION, CITATION, CITATION.
From these candidate models, the most promising were selected for further analysis.
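The generate-screen-select loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the parameter names echo those mentioned in the text, but the candidate values, the uniform random sampling, and the `toy_score` placeholder (which stands in for training a model and measuring its accuracy on a screening task) are hypothetical and are not taken from the actual procedure.

```python
import random

# Hypothetical search space: parameter names follow the text, but the
# candidate values are illustrative assumptions.
SEARCH_SPACE = {
    "units_per_layer": [16, 32, 64, 128],
    "pooling_kernel_size": [3, 5, 7, 9],
    "normalization_exponent": [1.0, 2.0, 10.0],
}


def sample_model(rng):
    """Draw one random parameter instantiation from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def screen(candidates, score_fn, n_keep):
    """Score every candidate and return the n_keep best, highest score first."""
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return ranked[:n_keep]


def toy_score(params):
    """Placeholder for screening-task accuracy (an assumption, not a real evaluation)."""
    size_term = params["units_per_layer"] / 128
    pooling_penalty = abs(params["pooling_kernel_size"] - 5) / 10
    return size_term - pooling_penalty


rng = random.Random(0)  # fixed seed so the sketch is repeatable
candidates = [sample_model(rng) for _ in range(1000)]
top_models = screen(candidates, toy_score, n_keep=5)
```

In a real screen, `toy_score` would be replaced by an expensive train-and-evaluate step, which is where the bulk of the computational cost lies.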
We show that this large-scale screening approach can yield significant, reproducible gains in performance in a variety of basic object recognition tasks, and that it holds the promise of offering insight into which computational ideas are most important for achieving this performance.
Critically, such insights can then be fed back into the design of candidate models, further guiding evolutionary progress.
As the scale of available computational power continues to expand, high-throughput exploration of ideas in computational vision holds great potential both for accelerating progress in artificial vision and for generating new, experimentally testable hypotheses for the study of biological vision.
