Krista, thank you for sharing what you tried and the (weird) results you got. Wow. It does seem odd that the tool doesn't cache any previous images that it can go back to, and that text isn't handled verbatim!
Thank you, Karen. It's also odd that they offer highlighting functionality for corrections, but the use case they show doesn't involve text. Historically, text recognition in images has been challenging, but we're pretty good at that now. What's weird is that I'm telling ChatGPT's DALL-E what words I want on top of an image, and by the time the image is rendered, those words and their spelling are lost. Are words getting encoded and tokenized along the way? Might that explain it? 🤷🏻‍♀️ If so, it likely needs a more direct route to the output.
If it’s treating the text like pixels after it’s embedded into an image, not as letters with defined shapes, then I can see why the text gets messed up. If you’ve ever added text to a photo in MS Paint, that’s what it does.
Compared to how hard AI already is, IMHO it really shouldn’t be that hard to build a text handling layer for the tool that doesn’t have this serious flaw.
Re: treating text like pixels and a text handling layer: I learned something today. Appreciate the lesson. Thank you!
It’s been a tough problem. I have no idea why. It can be really frustrating. I’ve found DALL-E 3 in ChatGPT to be a different critter from the one in Copilot, which is more surrealistic. But neither can spell worth a tinker’s damn. A few months ago I read something about it that offered a spark of optimism that a fix would arrive by Christmas, but its durability as an issue is sort of amazing when you stop to think about it. I enjoyed your voice in this post!
How interesting that you get different behavior depending on whether you use DALL-E through ChatGPT or Copilot, Terry! I haven’t been following image generation as closely as music, but now I’m curious about progress on text in AI-generated images. 🙂
Generating text inside images is notoriously difficult, and not just for DALL-E.
Also, there is no actual text, as in a text box or a font with a set weight and size. It is just an image. And because of the diffusion process, it reliably gets the text wrong most of the time.
There are some specialized tools that try to mitigate this.
If you are using the ChatGPT web interface, you can try to have it change only the parts of the image that did not work out. Here’s an older example I documented and shared.
Ugh, I can’t add an image here. Link to where I posted this: https://www.linkedin.com/posts/nico-appel_chat-gpt-plus-now-allows-image-editing-activity-7184148295991099392-d5O_
I tried highlighting the mistake in DALL·E, and it didn't work. I couldn't get it to correct the misspelling -- when highlighted -- or to simply erase the mistake. It seems you successfully got DALL·E to remove the AI from the image. What was your prompt?
(I wonder if there are preferred verbs such as "delete" instead of "erase" similar to commands that are more likely to get ChatGPT to respond the right way, or preferred nouns such as "letters" instead of "text" . . .)
I mostly got away from even trying anything with text, to be honest.