
Communications of the ACM


Deep Learning Networks Prefer the Human Voice

[Image: a dog and the word "dog"]

Neural network image classification systems might reach higher levels of performance if they are trained with sound files of spoken labels rather than with conventional numerical label representations, according to a study by Professor Hod Lipson and researchers at Columbia University.

In a side-by-side comparison, the researchers found that a neural network whose training labels consisted of sound files identified objects in images more accurately than a network trained in the traditional manner, using binary label representations.
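The contrast between the two label schemes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the waveform here is a random placeholder standing in for a recording of a spoken word, and the loss functions are common choices (softmax cross-entropy for categorical labels, mean squared error for dense audio targets) assumed for the sake of the example.

```python
import numpy as np

num_classes = 10
label_dim_audio = 1024  # length of a (hypothetical) spoken-word waveform

# Traditional scheme: a categorical "one-hot" label, here for class 3.
one_hot = np.zeros(num_classes)
one_hot[3] = 1.0

# Alternative scheme: the label is a dense audio signal, e.g. a recording
# of a person saying "dog". A random vector stands in for the waveform.
rng = np.random.default_rng(0)
audio_label = rng.standard_normal(label_dim_audio)

def cross_entropy(logits, target_one_hot):
    """Loss for the categorical network: softmax cross-entropy."""
    z = logits - logits.max()              # stabilize the softmax
    log_probs = z - np.log(np.exp(z).sum())
    return -(target_one_hot * log_probs).sum()

def audio_label_loss(prediction, target_audio):
    """Loss for the audio-label network: regress toward the waveform."""
    return ((prediction - target_audio) ** 2).mean()
```

In this framing, the categorical network outputs 10 class scores, while the audio-label network outputs a 1024-dimensional signal and is pushed toward the full spoken-word waveform, a far richer training target.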

"Our findings run directly counter to how many experts have been trained to think about computers and numbers," says researcher Boyuan Chen.

The team describes its work in "Beyond Categorical Label Representations for Image Classification," to be presented in May at ICLR 2021, the Ninth International Conference on Learning Representations.

From Columbia University
