DALL-E + GPT-3 = ♥ (medium.com/glan1k)
97 points by GLaNIK on Aug 7, 2022 | hide | past | favorite | 39 comments


I made a quick comparison with Midjourney, without putting as much effort into picking pictures.

https://hackmd.io/@einarmagnus/midjourney-hallucinations

It's quite fascinating to play around with this tool. Like user `stared` mentions below, there are words like epic, realistic etc that you can add to shape the pictures quite a lot. Auto-generated text like this is not so great at producing interesting results.


For these prompts, I like the ones generated by Midjourney much more. While they are less diverse, they are more consistent for a digital art style. Thank you for making this comparison!


Firstly, it's cool. But...

The generated images are underwhelming. Other than the snake king one, they all kind of have the same album-art look. The generated text also seems like it's trying to have some cohesive theme, but doesn't bring it together like one might expect in a short story. And the writing style is filled with cliches.

I can see it being good for inspiration, but it's pretty rough for final consumption.


The city was especially off. It was literally just a city. I realize this would have been an absurd complaint just a few years ago.


Midjourney seems to have a better hunch for artistic drama. Though I'm not very satisfied with the AI outcomes, probably because they don't watch movies (yet!).

Here is the output for the first prompt[0]: https://imgur.com/a/18NRrx4

[0]You see a figure in the distance, waving at you. As you get closer, you realize that the figure is your doppelganger. Your doppelganger is wearing your clothes and has your face, but their eyes are black voids. You hear a voice in your head that says, “We are one. You are me and I am you.”


I have noticed this too. Midjourney seems to have poorer quality and can't really do faces, but is vastly more "creative".

I wonder if there is a trade-off between a larger data set (so, more accurate) and how creative it can be.


Wow, OpenAI should be truly taught in business schools around the world as a case study in product marketing. Despite the dozens or hundreds of similar tools, models, colab notebooks, etc. created by a wide flourishing community, these two products are top of mind for everyone. Both terribly disappointing and envy inspiring for us in the field!


GPT-3 and DALL-E (in my subjective opinion) feel much better than other models at their respective tasks.

DALL-E in particular kind of blows the VQGAN+CLIP messing around I've done out of the water. GPT-3 feels markedly better than other text generation or chatbots I've tried.

These are definitely well marketed, and not the only models around, but they also feel ahead of other things I've tried. Can you point out some of these other tools/models?


It's not as simple as "x is better than y". They all have their own flavours. I've had results from JAX CLIP Guided Diffusion that I can't get from anything else, and some of my early experiments with Disco Diffusion have a quality that is unique. I think people will always mix and match models due to their unique qualities.

Having said that, I'm on the beta for Stable Diffusion ( https://stability.ai/ ) and it's remarkably capable across a broad range of styles. DALL-E probably still has the edge for more complex semantic prompts and photographic coherence, but it's very good and it's got a very open strategy.


OpenAI was the first on GPT-level text generation, as it was (I think) on DALL-E level image generation. It's winner takes all rather than marketing.


Google has similar models and could compete with OpenAI with both a GPT-3 and DALL-E equivalent if they wanted to.

tbh I'm surprised they don't.


They already have; they've developed two projects for AI image generation: Imagen and Parti. Sadly, neither is open to the public yet.


That's what I meant.


I'll dare make a prediction: Transformers will be declared a dead end in AI in a few years, and GPT-4 might be the last in line. The method is able to produce highly convincing gimmicks (which is what it comes to once initial excitement splashes) but that is not how art or intelligence works.


> The method is able to produce highly convincing gimmicks (which is what it comes to once initial excitement splashes) but that is not how art or intelligence works.

Uh, art is often about reproducing highly convincing gimmicks.

See https://en.wikipedia.org/wiki/Realism_(arts)

I think you mean that Transformers will become kitsch, which they probably will.


We've already seen this to some extent with GANs - which now look somewhat outdated in comparison with Dall-e and others


Language model != Transformers. The scaling power of the transformer architecture allows LLMs to exist, but LMs are just one application of transformers. The architecture itself is extremely generic and is applicable to nearly every problem.

I'm pretty certain whatever comes after GPT-4 will still be using layered self-attention.


Everything is eventually called a dead end in AI, no? Once it has moved from the bleeding edge to something the public uses, the limitations become known and everyone says, "of course a computer can do it, but that's not intelligence."


That's because we aren't building systems capable of self-analysis and awareness. As far as I'm aware, there are only AI/ML systems capable of analyzing some data and giving some output, with little to no nuance.


no, the OP says it in a different sense, namely that even if you are duly impressed by these advancements, they will stop in a few years and a new line of research will be needed


Prompts matter - here, it is interesting to see the cascade of prompts, i.e. from ones you pass to GPT3 to those generated by it.

I also like the topic choice - getting DALLE2 esoteric, dreamy, surreal. I find generating things that do not exist more interesting - as it stretches the limits of AI creativity and imagination. (Plug: I generated quite a few religious, symbolic, and esoteric images, both pleasant and gloomy, https://pmigdal.medium.com/dall-e-2-and-transcendence-3a3a40...)


Check out https://text-generator.io for an API-compatible switch from GPT-3 for generating text or code, but at a reasonable price. There are some upcoming alternatives for DALL-E too, like Disco Diffusion or stability.ai; one day it could be an easy switch to a cheaper or better-quality (or both) service from DALL-E, so keep an eye out for competition. There's monopoly-level pricing in OpenAI right now, so be careful y'all.


In addition to their M-x psychoanalyze-zippy energy, these kind of remind me of the creepypasta-fodder PlayStation game LSD: https://www.youtube.com/watch?v=ol4OSIGGukA

Is the fire god going to tell me to write a book report on Call of the Wild? ( https://www.youtube.com/watch?v=KXVdTT1Sis0&t=6m42s )


I totally thought that this was a long wind-up to the author explaining how perfect GPT-3 and DALL-E were for creating Dungeons and Dragons narrative.



this is the first time i have seen an emoji rendered here


That's not an emoji, it's just a standard character[0] - added to Unicode in 1993.

♠♦ There's a whole bunch of them[1]! ♦♠

[0]: https://www.compart.com/en/unicode/U+2665 [1]: https://www.compart.com/en/unicode/block/U+2600
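For anyone who wants to check, the code point and official name are easy to confirm with Python's standard library (a quick sketch using `unicodedata`):

```python
import unicodedata

heart = "\u2665"  # the character from the submission title

# The code point matches the Compart page linked above.
print(hex(ord(heart)))          # 0x2665
print(unicodedata.name(heart))  # BLACK HEART SUIT
```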


Is there any functional difference between any of those characters and emoji? To me they're all just "standard characters"


Yes, functionally those characters are single-tone and will change color in a word processor.

Emoji are full-color and can't have their color changed in a word processor in the same way.

Also, in practice those characters' appearances are tied to the font you've selected, while emoji's appearances are usually tied to the OS you're using (although sometimes the specific app -- e.g. Gmail in the browser on a Mac uses Android-style emoji rather than Mac-style emoji).


They render in color for me. Those characters have 2 ways they can be rendered and the default varies. You can use the U+FE0F Variation Selector-16 to force them to show as emojis or U+FE0E Variation Selector-15 to show them as normal paths.

Though color fonts aren't restricted to emojis, they can be used for any characters, they just aren't very common.
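A minimal Python sketch of the variation-selector trick described above (how the strings actually render still depends on your font, terminal, or OS):

```python
heart = "\u2665"                 # BLACK HEART SUIT

emoji_style = heart + "\ufe0f"   # VS-16 (U+FE0F): request emoji (color) presentation
text_style = heart + "\ufe0e"    # VS-15 (U+FE0E): request text (monochrome) presentation

print(emoji_style)
print(text_style)

# Each variant is two code points, though it displays as a single character:
print(len(emoji_style), len(text_style))  # 2 2
```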


They're just characters from different Unicode blocks; there's no functional difference between them that I'm aware of, other than that characters from some blocks are not allowed here on HN, including all emoji characters.

When I said "standard characters" perhaps I should've been clearer. I meant "they're non-emoji characters". Sorry, I didn't realise it would cause confusion.


Found an interesting source via the Wikipedia article on emoji: basically, Unicode 1 wanted to encode all existing text; it wasn't until Unicode 6 that they widened the scope to include "characters" that don't exist yet:

> For the majority of characters that are not emoji, the UTC looks for evidence of existing usage as text. Proposers need to establish that there is some reasonable body of text, either modern or historic, that uses that character.

...

> For emoji, rather than look for evidence of existing textual use — since emoji effectively cannot exist in text until they are encoded — we look for evidence of likely high usage once they are encoded, plus a number of other factors.

https://unicode.org/emoji/principles.html


Unlike "normal" characters, emoji are often displayed in full color, independent of whatever color the surrounding text is being rendered in.


If this crapola replaces real art, then the human race doesn't need creativity.


The road to hell is paved with good intentions: the hype around both DALL-E 2 and GPT-3 will create a new original wave of disinformation and new levels of reality distortion.

Like GANs being abused for deepfakes, the exact same thing will happen with DALL-E 2, GPT-3, etc., and it always ends in tears.


I have heard ceaseless doomsaying, but never have I seen a documented case, much less one where the leading factor wasn't a desire to believe. Standards of evidence drop through the floor when the speaker seeks to rationalize what they already feel.

I admittedly have my own biases: instead of framing it as a threat of new technology, I see it as a self-flagellating Frankenstein complex married with bikeshedding.


I've seen so much damaging misinformation through viral tweets and tiktoks that no AI-generated imagery is necessary.

It just takes one person with a large number of followers pretending they're an authority to post complete lies on twitter to create something fake "everyone knows".


Site is unusable on mobile. Anyone got a backup link?


It works fine for me on Firefox mobile.



