What are the obstacles to compositional image generation from text (text-to-image)?

Asked Mar 26 '24 at 11:36

Active Mar 26 '24 at 11:48


I've tried to find an answer but could not. Why is there still no model (AFAIK) capable of correct compositional image generation from text? By that I mean "simple" (but uncommon) compositions of objects related by prepositions such as under/in/on/...

P.S. I suspect some might argue that this question is not answerable and so not allowed here. But papers discussing the difficulties of compositional image generation may well have been published, even though failures are usually not published in the current scientific community. E.g. AI alignment is not solved, yet there is a lot of material on that topic.

edited Mar 26 '24 at 11:48


Alex Martian


0 Answers