Thoughts After Playing with OpenAI's DALL·E 2
Today, I played with DALL·E 2 for about an hour. For those not familiar, DALL·E 2 is the second generation of DALL·E, a text-to-image AI.
There are a lot of great articles on what DALL·E 2 is, in fine detail. This isn't one of those articles. :) If you want to read more from OpenAI themselves, check it out here.
The short version is that DALL·E has learned a lot about the relationship between descriptions of images and the visual appearance of those images. It is able to generate images de novo, just from a text prompt.
A few things struck me right off the bat. First, it immediately struck me as wildly impressive. I had seen preview images, so my expectations were very high. It fully met them.
DALL·E 2 is amazing at taking complex prompts which mix concepts in a way that would challenge even a skilled human artist, and making aesthetically pleasing images out of them. Is this exactly what I was picturing when I wrote the prompt? No.
Is it a wildly impressive take on a robot and sentient mushroom shaking hands? Absolutely. It is even holding a bouquet of flowers, as an added flourish.
DALLE was trained on a wide variety of images and text, and it certainly shows. It is every bit as capable of generating complex paintings like the one below (featuring a battle between the world-serpent and the god of war).
On its own, it added splashes of water, a third figure holding back the serpent, and a bizarre pair of weapons, set in a surreal fantasy landscape. If I saw this painting in a museum, or in an art gallery, I would not bat an eye.
DALLE struggles with the relationships between objects. It struggles mightily with placement of objects. In the image above, the way the weapons are held is not quite normal. In other images (sadly not saved), DALLE generated lightsabers held by the blade, daggers held by the blade, and spears broken in half with each side held separately.
The same happened with other relational issues. It doesn't quite understand that puppets should have strings coming out of their arms.
DALLE also can't distinguish well between multiple subjects. When told to generate a bear riding a knight, it (amusingly) generated a bear knight, riding another bear (presumably) into battle.
When asked to create a man made out of balloons holding balloons made of steel, it generated a steel statue of a man with balloons. Very cool, but definitely different.
This is mainly in regard to policies. DALL·E (sensibly) has a lot of banned words and restricted concepts, in a (probably futile) attempt to prevent it from being used as a tool to generate pornography, deepfakes, and violent images.
Because of this, I ran into some issues where I received either a warning about my prompt (without specifying which word triggered the warning), or an uninformative error message. This triggered for words like "killing," but not words like "war." It also triggered for "child" (but not "boy," weirdly), and seemed to trigger for "scifi..." for some reason, although it allowed "scifi" in a different prompt.
It also mixes up some common words that can have double meanings...
Basically, use common sense, and also a bit of intuition, to avoid getting ban-hammered. :)
You can join the waitlist for DALLE access here! It's a lot of fun to play with. :)