Researchers at artificial intelligence (AI) company OpenAI have engineered two deep learning models, CLIP (Contrastive Language-Image Pre-training) and DALL·E (named for artist Salvador Dalí and Pixar's WALL·E), which combine language and images to improve AI's understanding of text and what it refers to.
CLIP, trained to recognize objects from Internet images and their accompanying captions, links a wide variety of objects with their names and descriptive words; given an image, it predicts which caption, from a random selection of 32,768, is the correct one.
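The caption-matching step described above can be sketched in miniature: CLIP encodes images and captions into a shared vector space and picks the caption whose embedding is most similar to the image's. The hard-coded vectors and captions below are purely hypothetical stand-ins for the encoders' outputs, not the model's actual embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings; a real image/text encoder would produce these.
image_embedding = [0.9, 0.1, 0.0]
caption_embeddings = {
    "a photo of a dog":   [0.8, 0.2, 0.1],
    "a photo of a cat":   [0.1, 0.9, 0.2],
    "a diagram of a car": [0.0, 0.2, 0.9],
}

# CLIP-style selection: choose the caption most similar to the image
# (the real model scores 32,768 candidate captions this way).
best = max(caption_embeddings,
           key=lambda c: cosine(image_embedding, caption_embeddings[c]))
print(best)  # → a photo of a dog
```

Here three candidates stand in for CLIP's 32,768; the principle, ranking captions by embedding similarity, is the same.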
DALL·E, also trained on online text-image pairs, produces matching images when presented with a short natural-language caption.
OpenAI's Aditya Ramesh said DALL·E "can take two unrelated concepts and put them together in a way that results in something kind of functional," like an "avocado armchair."
From MIT Technology Review
Abstracts Copyright © 2021 SmithBucklin, Washington, DC, USA