Researchers at OpenAI, an AI research company, have developed two new neural networks: one creates images from natural-language prompts, and the other identifies and describes what an image shows.

“We believe that these neural networks represent a meaningful step toward multimodal AI systems,” says the company’s announcement.

One of them, called DALL-E, creates illustrations, photographs, and renders from a simple text description. Its name is a portmanteau of the surrealist artist Salvador Dalí and WALL-E, the robot hero of the Pixar animated film of the same name. The name suggests something between a human artist and a robot.

The second one, called CLIP, identifies objects and classifies them based on a simple description.

DALL-E generates its images from scratch rather than retrieving existing ones. It is a 12-billion-parameter version of GPT-3 (Generative Pre-trained Transformer 3), a transformer language model developed by OpenAI that generates human-like text and even software code. DALL-E is trained on text-image pairs, and each pair is presented to the model as a single sequence of "tokens," discrete symbols drawn from a fixed vocabulary.
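
The idea of presenting a text-image pair as one token sequence can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the byte-level text "tokenizer," the tiny image code grid, the vocabulary sizes, and the function names are hypothetical and do not reproduce OpenAI's actual tokenizers.

```python
from typing import List

# Toy sketch: turn a (caption, image) pair into one token sequence.
# All sizes below are illustrative assumptions, not DALL-E's real settings.
TEXT_VOCAB = 256   # assumed: one token id per byte value
IMAGE_VOCAB = 16   # assumed: each image patch is one of 16 codes
TEXT_LEN = 8       # assumed: captions are padded or truncated to 8 tokens
PAD = 0            # assumed padding id (a real tokenizer would reserve one)


def encode_text(caption: str) -> List[int]:
    """Map a caption to exactly TEXT_LEN byte-level token ids."""
    ids = list(caption.encode("utf-8"))[:TEXT_LEN]
    return ids + [PAD] * (TEXT_LEN - len(ids))


def encode_image(grid: List[List[int]]) -> List[int]:
    """Flatten a grid of image codes row by row.

    Ids are shifted past the text vocabulary so that text tokens and
    image tokens never share an id.
    """
    tokens = []
    for row in grid:
        for code in row:
            if not 0 <= code < IMAGE_VOCAB:
                raise ValueError(f"image code out of range: {code}")
            tokens.append(TEXT_VOCAB + code)
    return tokens


def build_sequence(caption: str, grid: List[List[int]]) -> List[int]:
    """Concatenate text tokens and image tokens into a single stream."""
    return encode_text(caption) + encode_image(grid)


if __name__ == "__main__":
    print(build_sequence("cat", [[1, 2], [3, 4]]))
    # [99, 97, 116, 0, 0, 0, 0, 0, 257, 258, 259, 260]
```

A model trained on sequences of this shape can be asked to continue a caption's tokens with image tokens, which is the sense in which the image is generated from the text.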

An illustration of a baby daikon radish in a tutu walking a dog | Image: OpenAI

The neural network can create images of clothing, interiors, furniture, food, or animals. For example, it can generate images for prompts like “an illustration of a baby daikon radish in a tutu walking a dog” or “an armchair in the shape of an avocado.”

A capybara made of voxels sitting in a field | Image: OpenAI

The company's other neural network, CLIP, recognizes what an image shows by matching it against a full natural-language description rather than a single-word tag.

“Deep learning needs a lot of data, and vision models have traditionally been trained on manually labeled datasets that are expensive to construct and only provide supervision for a limited number of predetermined visual concepts. In contrast, CLIP learns from text-image pairs that are already publicly available on the internet,” the CLIP researchers explained.
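
A rough sketch of how matching an image against whole descriptions can work: the image and each candidate description are mapped to vectors in a shared space, and the description whose vector is most similar to the image's vector wins. The vectors below are hand-made placeholders, not outputs of CLIP, and the function names and the temperature value are assumptions chosen for illustration.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores: List[float]) -> List[float]:
    """Convert scores to probabilities (max subtracted for stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(image_vec: List[float],
             text_vecs: Dict[str, List[float]],
             temperature: float = 0.1) -> Dict[str, float]:
    """Score every description against the image; return probabilities."""
    labels = list(text_vecs)
    sims = [cosine(image_vec, text_vecs[label]) / temperature
            for label in labels]
    return dict(zip(labels, softmax(sims)))


# Placeholder embeddings (assumed values, not produced by any real model).
TEXT_EMBEDDINGS = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.0],
    "an armchair in the shape of an avocado": [0.0, 0.1, 0.9],
}
IMAGE_EMBEDDING = [0.8, 0.2, 0.1]

probs = classify(IMAGE_EMBEDDING, TEXT_EMBEDDINGS)
best = max(probs, key=probs.get)
print(best)  # a photo of a dog
```

Because the candidate labels are ordinary sentences, the set of categories can be changed by editing the descriptions alone, with no retraining of a fixed label set.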

Together, DALL-E and CLIP expand the range of tasks to which artificial intelligence can be applied, and they point toward more capable systems that combine language and vision.