Thoughts After Playing with OpenAI's DALL·E 2

DALLE's take on "a computer made of stone from the neolithic era, hyperrealism, black and white"

Today, I played with DALL·E 2 for about an hour. For those not familiar with it, DALL·E 2 is the second generation of DALL·E, a text-to-image AI.

There are plenty of great articles that explain what DALLE 2 is in fine detail. This isn't one of those articles. :) If you want to read more from OpenAI themselves, check it out here.

The short version is that DALLE has learned a lot about the relationship between descriptions of images and how those images actually look. It can generate images de novo, from nothing but a text prompt.

The Good:

A few things stood out right off the bat. First, it is wildly impressive. I had seen preview images, so my expectations were very high, and it fully met them.

"a sentient mushroom shaking hands with a robot in the middle of a forest, in the style of impressionism, van gogh"

DALL·E 2 is amazing at taking complex prompts, which mix concepts in a way that would challenge even a skilled human artist, and turning them into aesthetically pleasing images. Is this exactly what I was picturing when I wrote the prompt? No.

Is it a wildly impressive take on a robot and a sentient mushroom shaking hands? Absolutely. It even added a bouquet of flowers as an extra flourish.

DALLE was trained on a wide variety of images and text, and it certainly shows. It is just as capable of generating complex paintings, like the one below (featuring a battle between the world serpent and the god of war).

"Kratos battle against Jörmungandr painting"

On its own, it added splashes of water, a third figure holding back the serpent, and a bizarre pair of weapons, all set in a surreal fantasy landscape. If I saw this painting in a museum or an art gallery, I would not bat an eye.

The Bad:

Object placement

DALLE struggles with the relationships between objects, and it struggles mightily with where objects are placed. In the image above, the way the weapons are held is not quite right. In other images (sadly not saved), DALLE generated lightsabers held by the blade, daggers held by the blade, and spears broken in half with each half held separately.

The same goes for other kinds of relationships. For example, it doesn't quite understand that a puppet's strings should be attached to its arms.

Object confusion

DALLE also has trouble keeping multiple subjects distinct. When told to generate a bear riding a knight, it (amusingly) generated a bear knight riding another bear, presumably into battle.

Bear Necessities

When asked to create a man made out of balloons holding balloons made of steel, it generated a steel statue of a man with balloons.  Very cool, but definitely different.  

The Weird:

This mainly concerns content policies. DALLE (sensibly) has a lot of banned words and restricted concepts, in a (probably futile) attempt to prevent it from being used to generate pornography, deepfakes, and violent images.

Because of this, I sometimes received either a warning about my prompt (without any indication of which word triggered it) or an uninformative error message. This happened for words like "killing," but not for words like "war." It also triggered on "child" (but, weirdly, not "boy"), and it seemed to trigger on "scifi" for some reason, although it allowed "scifi" in a different prompt.

It also mixes up some common words that can have double meanings...

"a navy seal wearing a mechanical exoskeleton holding a pickaxe while standing on a mountain, digital art, steampunk"

Basically, use common sense, and also a bit of intuition, to avoid getting ban-hammered. :)

You can join the waitlist for DALLE access here!  It's a lot of fun to play with. :)

