Youtube Summary
I made GPT-3 write a Jerry Seinfeld routine about cats - and then used a DeepFake voice to perform it. The results are surprisingly good!

Jerry's voice was created using Tacotron 2 and WaveRNN models

GPT-3 is available in beta
Is this the Seinfeld AI you were talking about? Wow, that legitimately impressed me. If that last joke was actually original and not present anywhere in its training data, then AI is farther along than I thought.

Edit: ah, looks like it was posted on Hacker News a few days ago.

I'm looking forward to an always-on pirate livestream of Crow T. Robot, George Carlin, and Strong Bad reacting to current events in real time.
That's a really funny wild idea
I need the Bill Hicks version as well
This reminds me of the bottled heads from Futurama.
This fills me with dread.

It just dawned on me how disruptive this tech is. If you think we are drowning in a sea of shitty content now, just wait until this type of thing is deployed on demand and at scale into people's feeds.

This actually had me laughing, which equals a dopamine hit. I can easily see people endlessly watching rapid-fire equivalents of this in a hypothetical future TikTok usurper.

There are already seas of AI content on YouTube, mostly based on a script that's a long post stolen verbatim from Reddit. SEO pushes people to the pages. Content doesn't even have to be good if it's cheap enough to make.
The future will be filled with ads for things that are real. It will be what’s scarce and exclusive. And half the things they advertise as real will probably be fake anyway.
Personalized AI generated movies/songs/books are on the horizon.
It will reinforce the gate keepers. AI will destroy mediocrity.
AI will be gate kept.
Yes, but it'd be a different AI. Google's algorithm would have to adapt or off to Bing
Also as soon as deep fakes are as prevalent as photoshop, I'd want someone to verify the content. You could imagine hearing "Jerry Seinfeld" repeatedly using the n word. So I'd want to see the NYTimes or Bloomberg trying to verify it.
> just wait until this type of thing is deployed on demand and at scale

South India averages 4 movies a week. Each movie has like 5 songs. So in a year you need lyrics for at least 1000 songs. Sure, lyricists are creative artists, but not that creative. So you hear a lot of catchy songs which have some AI input. Here’s a song with 45 million views - I don’t think a human wrote all of it because it’s mostly garbage, but very catchy, so most likely AI augmented.

Only an AI would write something like - 2 2 2 2 2 2 2 2 2 2, i love u 2 really i love u 2.

Song lyrics aren't hard. Consider:

A B C It's as easy as 1 2 3 A B C A B C You and me!

Or this gem:

A E A E I O U you and sometimes Why!

Both songs were hits.

The lottery isn't hard either, it's just a handful of simple numbers. Consider:

10 19 40 45 58 25


20 36 53 56 69 16

Both worth several hundred million dollars.

I don't agree there are plenty of songs with nonsense lyrics pre-AI

like (Da Da Da - Trio - 1981)

Someone had to take the time to perform the song, there's no reason they wouldn't take a few minutes to change/improve the lyrics, even if an AI produced it.

I suspect it is not AI produced and simply a simple song.

You wouldn't even need your toes to count the number of people who sang like 80% of the songs in Bollywood film history. A random google claims that Kishore Kumar sang 2905 songs, Lata Mangeshkar somewhere around 25 thousand, and Asha Bhosle around 12 thousand.

The time taken to perform a song doesn't have to be much.

edit: that being said, the choreography probably took 50x as much work as the song.

Cool in concept, but shows how important comedic timing is. Example where the semantics requires the pauses and drawn out expressions to really make it work. Might be where the jokes could be written by AI but perfected by humans, especially as comedians use audience feedback to hone their material. Even the best don’t know how any joke will land until it’s delivered. How could any AI incorporate that real world feedback loop?
It also shows the importance of a laugh track. I think a lot of Seinfeld material would sound weird, even performed by Jerry himself, in the absence of audience laughter or other contextual background noise.
I actually think this should be possible to train an AI to do. It could alter timing and monitor for laughs, it will learn when and how to pause, and use intonation to maximize for laughter.
Timing for which audiences, and where? If it uses AI hecklers as above, is it training to entertain other AIs or humans with diverse tastes?
The trouble with jokes is they're only funny to people once. You'd need an adversarial laughing AI to train against. And boy would that thing be loony.
There's lots of people in the world.
I guess if you could show them all the same CAPTCHA with the same 1000 jokes... you could develop a very annoying neural network?
Then why do dads love telling the same bad jokes over and over, and laugh every time?
Because we forget that we said it already
Make it a GAN with AI hecklers
AI Statler and Waldorf.

It would be such a force for good in the world that Disney should allow it to be used for free like Volvo did with their patent for the three-point seat belt.

I wouldn't be surprised if this routine was guided by the creator. As in, they edited it to make it work and that was probably after discarding a bunch of terrible stuff. Nevertheless it's not unlike how Jerry actually works, he would write a lot, and then edit it down later. He worked hard at it and it sounded almost joyless.
All it needs is a layer of contrastive loss at the top, to tell real Seinfeld and SAInfeld (don't bother checking, it's already registered) apart, and boom! Indistinguishable delivery.
I was pretty impressed with it, honestly. But I understand what you're saying - a comedian's delivery is different from normal speech. It may be that GPT-3 just wasn't trained with enough standup routines.

(I dread to think what a Sam Kinison AI would be like)

What’s the training if every comedian has their own style, in words, intonation, and timing? Do we need to build personalized AI comedians? Or write new jokes for specific comedians trained on their previous routines?
To be honest, given the complexity and the contextual connections DALL-E 2 makes, it doesn’t seem all that hard to learn the speech distinctions of a couple of hundred people. It’s really not all that complex.
From listening to too many comedian podcasts, the training is performing the jokes to audiences and adapting to their reactions.

Again and again, year after year, until you've mastered the interaction.

I'll be the annoying guy for this one:

How do we know GPT-3 really wrote this?

See if Seinfeld sues for royalties.
I just tried the prompt in the GPT-3 playground and got a similar result. I tried a bunch of other "A Jerry Seinfeld stand-up routine about X." prompts too - it's an interesting prompt format.

Here's what I got for "A Jerry Seinfeld routine about Datasette." (my open source project). Note that the "." is important - without it, GPT-3 tried to generate me a README file for a new Datasette plugin instead.


Hey, have you ever heard of this new website called Datasette?


No, what is it?


Well, it's basically a way to store and share data sets.


Huh. Sounds kind of boring.


Yeah, I know. But it's actually really useful. For example, say you wanted to find out how many people in your city use the same first name as you.


Why would I want to know that?


I don't know. But with Datasette, you could easily find out.


Really? That sounds kind of useful.


Yeah, it's actually really handy for all sorts of things. So if you ever need to find out anything about a data set, Datasette is the place to go.

I also like how this exhibits a classic GPT-3 trait: it's always biased towards having characters agree with each other. Here George breaks the entire bit by saying "Really? That sounds kind of useful."
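For anyone who wants to try the prompt format described above, here is a minimal sketch of a helper that builds it. The function name `seinfeld_prompt` is made up for illustration; the only things taken from the comment are the "A Jerry Seinfeld routine about X." wording and the observation that the trailing period matters.

```python
def seinfeld_prompt(topic: str) -> str:
    """Build a prompt in the "A Jerry Seinfeld routine about X." format.

    Hypothetical helper, not part of any library.
    """
    # Normalize the topic so we never end up with zero or two periods.
    topic = topic.strip().rstrip(".")
    # Always end with exactly one period: per the comment above, leaving it
    # off led GPT-3 to continue with something else entirely (a README).
    return f"A Jerry Seinfeld routine about {topic}."


print(seinfeld_prompt("Datasette"))
```

The resulting string is what you would paste into the playground or send as the prompt text.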
Saying "yes" is the golden rule of improv.
and "and" =)
In improv that's a technique to collaboratively build a world/scene by adding constraints with no preparation.

It does not apply to standup comedy.

GPT-3 is actually doing improv, not performing jokes. If it were really Seinfeld, it would be paying writers and scribbling notes down in a pocket notebook while golfing.

i.e. It's not Seinfeld, it's an improv guy who you just asked to "Do Seinfeld."

Seinfeld wrote his own stand-up. The whole premise is Seinfeld doing his comedy club routine, not Seinfeld the show.
> GPT-3 is actually doing improv, not performing jokes.

This is the kind of anthropomorphism that I'm consistently surprised to read from people with a technical background.

Nah. It's whatever the opposite of that is, directed at the people who do improv. Improv is just a series of associative, impulsive reactions to stimuli - exactly what these systems are doing, following a series of words with the first (most strongly connected) utterances that come to mind.

It could seem like anthropomorphism if you don't think about it, but it's not anthropomorphism to say robots walk just because humans also walk.

Yes, if you dehumanize human creativity to "just a series of associative, impulsive reactions to stimuli", then computers can think and GPT-3 is sentient.

> …it's not anthropomorphism to say robots walk just because humans also walk.

But it is to say that robots "do improv". Could GPT-3 mimic something that looks like improv by remixing an enormous corpus of human-created improv? Certainly not well, as we've all seen in non-cherry-picked GPT-3 output.

And how do we know that this is actually GPT-3 generated (ad infinitum)?
Funny that in a way we fall into the same pattern as GPT-3, reproducing the same two lines always. The first is "GPT-3 generated this", the second "but how do we know it really generated that?".
This is surely an ironic, hacker news style plug for your web business.
“George: Why would I want to know that? Jerry: I don't know. But with Datasette, you could easily find out. George: Really? That sounds kind of useful.”

Another instance of the lack of memory.

Why would I want? Kind of useful…

Not that it is not impressive but that would not get a pass anywhere.

Actually, to me it kind of distills George, the character.
That was scarily good. I didn’t laugh, but I was also telling myself “this is an AI and I’m not going to laugh.”

Excited and scared to see where this tech is going to be in 10 years.

What will the world be like with computer-generated novels and screenplays?

I actually laughed at "I'm not going to start feeding my cat different foods depending on where exactly he is located".

That's a really great punchline.

I kind of chuckled at that, but it was only because of the awkward cadence of the TTS voice.

To me it sounded "haha that's weird" not "haha that's funny."

I worry less about the future of the tech, and more about hack comedians generating entire sets based on other people's material.
> [I worry more about] hack comedians generating entire sets based on other people's material.

The difference is that GPT-3 produces all kinds of hacky content based on other people's material at scale, producing an invasive species of an impossible-to-control faux-written content. It's all fun and shitty Seinfeld routines now, but GPT-X will make it impossible to find non-computer-generated content on search indexes by 2030.

In contrast, comedians and comedy nerds know who the joke-stealing hacks are (e.g. "Carlos Men-Steal-ia"), and the splash damage is limited.

I laughed because it was an AI. If Seinfeld himself had done this routine I would have been pretty stone-faced
Then I take it you would be stone-faced at most Seinfeld routines. This really captures the essence of his routine.
I didn't laugh in exactly the same way I didn't laugh at actual Seinfeld jokes. In that sense, it was absolutely uncanny.
I think that's because a new cat food commercial a week isn't really a thing. I didn't really laugh; I do laugh at actual Seinfeld probably around 40% of the time, less with his more recent content and more with his older content.
In a word, 'derivative'. Not that pop culture isn't full of remakes and endless sequels, bands that sound like their predecessors, etc. But I am not optimistic that fully AI-generated content offers any exciting novelty; so far at least most of its entertainment value comes from where it fails to convince.
It lacks the cultural awareness to pick a topic that would actually surprise a listener, hence not really funny. But the tone and structure were nearly flawless.
the topic was chosen by the creator, not the AI
Zing! Now that's pretty funny.
I think the parent means the particular beats of the joke. There's clearly something you could do in Seinfeld style with "tuna surprise" that would be unexpected and funny, but the AI is missing all of the cultural context and understanding of comedy that would make it work.
I didn't laugh in exactly the same way that I don't laugh at real Seinfeld jokes. In that sense, it was absolutely uncanny.
It doesn't have a laugh track to prompt you. Here is "Seinfeld / Laugh Track Removed: Soup Nazi".
Just played it for my wife but didn’t tell her it was AI. Watching her facial expressions it looked like at points she was skeptical of the cadence, but she came away with it being Seinfeld, but described it as “definitely not one of his better jokes” and “something about it is really off.”
I think I would have had the exact same thought had I not known it was AI. It seems like AI has this rambling, dream-like quality where things just don't entirely make sense, probably because AI lacks understanding or context (my take as a layperson). The voice is surprisingly accurate sounding, but the cadence is stilted and really lacks any kind of comedic timing.

This seems to fall right into the uncanny valley.

One thing I really want to see in my lifetime is an "AI" I can feed episodes of a TV show and have it generate more episodes. Like a second season of Firefly or Gravity. Or re-create the finale of Battlestar Galactica and give that show the ending it deserved.
Philosophically speaking, what's the difference between a never-ending stream of AI-generated entertainment and heroin?
Heroin addicts don't watch ads.
Snarky answer: If you overdose on heroin, there is Naloxone.

Slightly-less snarky answer: If you find that question interesting, you'll probably enjoy reading Infinite Jest.

This is such an idea I never had! Am I some randomly generated episode?!
I don't think we're going to get there following the GPT path. I'm aware this is a bit of a contrarian take -- and that I have some degree of bias, since I've made literally tens and tens of dollars selling my own fiction -- but even the best generated text that I've seen suggests a few things:

- it can be very good at mimicking narrative styles

- it is good at close range context, what might be (uncharitably) described as "autocomplete on steroids"

- it doesn't understand overall structure

That last one is the contrarian bit, but I think it's absolutely true. Context gives some structure to generated text, but the longer the generated text runs the closer it gets to falling apart. Nothing we have now is remotely good enough to pull off things like "here is something planted in chapter two that will be referenced in chapters eight and fifteen and finally pay off in the climax in chapter twenty-five".

And I don't think this is just a matter of iterating on the current models; at some point this requires an understanding and application of story construction rules, of act structure, of character, that seems fundamentally incompatible with the way neural network-based machine learning currently works. If you feed a bunch of screenplays into an ML model, it can generate text in that style, but it's not going to write what we would consider a good screenplay. If you feed screenwriting textbooks into that same model, it won't help, because it doesn't have the ability to interpret those as structural guidelines for writing entirely unrelated texts.

I don't expect this to become reality.

I'm just saying that if we are about to use "AI" for all kinds of increasingly disturbing and dystopian purposes, at least I want to get a second season of Gravity out of it.

Fair. :)
This is amazing, especially the punchline! Some of the Netflix specials I've watched had jokes that were not much better than this. I'm sure with a proper delivery (tone, pauses) it would get quite a few laughs if it was presented on the stage.
Playing around with GPT-3 in the past, I've noticed it lifts a lot of material and repurposes it. Does some variation of the cat food skit exist in the wild? I wouldn't be surprised.
The bit is structurally similar to this routine:
What’s really effective is the way it maintains a thread through the dialog.
Pretty good! Almost a little bit Mitch Hedberg fused with Seinfeld.
Pretty bad, yet simultaneously surprisingly good. Lots of promise.
I'm having trouble believing this is real and not heavily massaged output. I worked on an AI-like project in 1988 as a programming gig in college, and the output was heavily edited by the composer.

If it is real, I'm actually somewhat frightened.

I'm pretty sure it's real but slightly cherry-picked. As in, you'd have to do 5 similar prompts to get something like this.
It's a lot of fluff
I wonder if they picked Seinfeld because of the 30 Rock episode.

>What's SeinfeldVision?

>Well, I realized that NBC owns hundreds of hours of footage of Seinfeld from his massively successful television series, Seinfeld. So my old tech guys were able to digitally capture Seinfeld, and now we can basically make him do or say whatever we want. So for the month of October, all of our primetime shows will feature a computerized guest appearance from Mr. Jerry...

>...Seinfeld. Does Jerry Seinfeld know you're doing this?

>Uh, Jerry's in Europe with his family right now, but by the time he gets back, SeinfeldVision will be a monster hit!

It seems unlikely that this is the real prompt. Even at temperature=1.0, I repeatedly get "Why did the cat cross the road? To get to the other side!" as the response.
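For readers who haven't played with the temperature setting, here is a rough sketch of what it usually does to a next-token distribution: the scores (logits) are divided by the temperature before the softmax. The logits below are invented for illustration and this is not GPT-3's actual code; it only shows why a strongly preferred continuation can keep coming back even at temperature 1.0.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate tokens (assumption, not real model output).
logits = [2.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top candidate and higher ones spread it out, but if the model's top score is far ahead of the rest, the same answer will still dominate at 1.0.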
I gave it some monologues and expanded the prompt a bit and now it's more talkative, a bit.


Cats are weird. They're like little old men in furry suits. Have you ever seen a cat drink? They stick their tongue in the bowl, and they just go "Nyahhh." It's like they're not even trying. Dogs are different. Dogs are eager to please. They're always happy to see you, they wag their tails, they bring you a stick. They're like, "Hey, look what I found! Isn't this great?" Cats are like, "Eh, what do you want?"


Why does it seem like every time you meet somebody, they're either a cat person or a dog person? What is that all about? You ever meet somebody and they're like, "I'm not really a cat person or a dog person, I'm more of a snake person." Never. You're either a cat person or a dog person. And I've never met a cat person who's also a dog person. Have you? It's always one or the other. It's very rare. In fact, I've never even met a cat person who's also a human person. They don't seem to like us very much, do they?


What's the deal with cats? I don't get it. They're so cute when they're kittens, but then they just turn into these lazy, aloof creatures that don't do anything. Just lay around all day, sleep all day. And they're always like, "I'm not really in the mood." "I don't really feel like it." "I'm not really in the mood for this." What are they in the mood for? What do they want? Nobody knows. Not even the cats. They don't know what they want. They're just like, "I'll know it when I see it."

> You ever meet somebody and they're like, "I'm not really a cat person or a dog person, I'm more of a snake person." Never.

Lol spend some time in Florida for a while.

Wow, all three of those made me laugh...which is quite amazing to me. Comedic timing would be key to performing them.

Would be interested, how did you expand the prompt?

It sounds like a real Seinfeld routine performed on an off night.
Perfectly humorless line, to me sounds exactly like Seinfeld.
Years ago on /r/aww someone posted their black cat named Jerry Seinfeld. This sounds like the kind of comedy that this cat Jerry Seinfeld would come up with.

I remember this cat because I improvised a similar comedy bit for him in the comments. Like "Humans, huh? You never see them hunt or fish, but they have all this meat just lying around!"

Very impressive. The only thing missing from AI is a perfected understanding of context without billions of data points, and it has some trouble with the consistency and novelty that humans intuitively understand because of our shared evolution.
All it needs to do now is to make some unbelievably pretentious waffle about how it's been perfecting a joke its entire career only for the joke to turn out to be "What's the deal with poptarts!?".
The last bit about "food for indoor cats" is hilarious: I'm not going to start feeding my cat different foods depending on where he's located...
Does anybody know what model was used to generate the voice? What's current SoTA in voice cloning right now?
That was every bit as fun as watching Seinfeld :|
It's pretty good. I guess the next step is to generate 100 of these in short order. That'll have stand-ups scared.
how long before this is tiktok
The scariest part of this is the top YouTube comments saying this is a legitimately funny routine.
This is solid! I'd like to hear Jerry Seinfeld's take on it.
The script is good, the voice synthesis and delivery abysmal.
"abysmal"..? Either you're an expert in the field and know better (please send some link!) or I'm not even sure what you're comparing that to. This is the best thing I've heard from speech synthesis since I can't remember.
You don't need to be an expert to notice that some of the verbalization lacks a pause or a more dramatic flair, to make the joke land better.

It sounds like Uncanny Valley Seinfeld. Almost like him. But not really.

Something can be 10x better than the best X (X=fake Seinfeld), an amazing achievement, but still be a bad Y (Y=Seinfeld).

This audio was like using Seinfeld's unconscious body as a puppet.

I'd say the voice was actually really good, the intonation/delivery was just off (literally just been watching Seinfeld properly for the first time).
Abysmal is a huge exaggeration. It sounds convincingly like Seinfeld. Yeah the delivery isn't incredible, but what do you expect?
checks out, just as lame as the real Seinfeld
better than the real thing imo
GPT-3 can take even complete nonsense pretty far before decoherence with some light prompt engineering:


GEORGE: Somebody hacked my apes. They’re all gone.

JERRY: All gone?

GEORGE: Every single one! Funged!

JERRY: You can’t just funge a man’s ape!

GEORGE: That’s society for you - people are sick!

[KRAMER enters smoking a cigar in an ape costume]

GEORGE: What the fuck Kramer - are you smoking my ape?

JERRY: Give him back his NFT! He needs it to gain entry to the Vladimir Club for the big Cryptoland party tonight.

[KRAMER spits out a mouthful of smoke]

KRAMER: What party?

JERRY: The Cryptoland party.

KRAMER: I can’t go to the Cryptoland party.

JERRY: Why not?

KRAMER: I’m wearing an ape.

JERRY: Well take the ape off.

KRAMER: No, I can’t take him off. I’ve been wearing him for days.

JERRY: How many days?

KRAMER: A long time.

GEORGE: A few days?

KRAMER: Yeah, a few days.

JERRY: It’s those damn NFTs.

GEORGE: No, no. I’ve been wearing my ape for days too.

KRAMER: Really?


KRAMER: Well, why didn’t you say something?

GEORGE: I was embarrassed.

JERRY: We need to get those NFTs off these guys.

GEORGE: Yeah, well you’re not going to have much luck with a knife and a can opener.

JERRY: We’re gonna have to cut them off.

GEORGE: There’s no other way?

JERRY: Well they are basically a thin elastic material. I’m sure they can be stretched over a pair of scissors.

GEORGE: All right.

JERRY: All right.

[JERRY and GEORGE hold KRAMER’S arms down on the table while JERRY sneaks the scissors up behind KRAMER’S NFT]

JERRY: Okay Kramer. Here we go. One, two, three.

[JERRY quickly cuts KRAMER’S NFT off]

JERRY: There it is.

KRAMER: Oh, thank you. My ape. You saved my ape.

GEORGE: Yeah, you’re welcome.

KRAMER: Oh my ape.

[KRAMER puts the NFT on his head]

JERRY: No Kramer!

[JERRY and GEORGE quickly pull KRAMER’S NFT off]

JERRY: Oh my god.

[JERRY and GEORGE are holding KRAMER’S NFT in their hands and it is stained with blood]

JERRY: He cut the poor thing right in half.

GEORGE: I doubt we’re going to find anyone who can sew it back together.

JERRY: He’s just gonna have to go to the party with no NFT.

GEORGE: What are you talking about?

JERRY: The party.

GEORGE: The party.

JERRY: The Cryptoland party.

KRAMER: I’m going to the Cryptoland party?

JERRY: Oh yeah.


KRAMER: Oh yes.

GEORGE: You’re really going, huh?


GEORGE: Well listen Kramer, in order to get into the club, you have to have an approved ape.


GEORGE: See, the Vladimir Club is a very exclusive private club for people who have a funged ape.

KRAMER: I don’t understand.

JERRY: What George is trying to say is, if you want to go to the Cryptoland party, you’re gonna need a new ape.

KRAMER: I see.

GEORGE: You’re gonna need a new ape.

KRAMER: I see.

GEORGE: You’re gonna need a new ape.

KRAMER: I see.

GEORGE: You’re gonna need a new ape.

KRAMER: I see.

GEORGE: You’re gonna need a new ape.

KRAMER: I see.

How much of that is the prompt?
As another data point as we chart our path to the singularity, I entered the same prompt into the API playground and got this response:

> A Jerry Seinfeld routine about cats.

> Why do cats always land on their feet? I've never seen a cat land on its feet. They don't; they land on you.

> I was at my friend's house the other day, and his cat was walking across the room. And I said to him, "How does that cat always land on its feet?" And he said, "Well, it's because they're lighter than air."

> And I said, "No, it's because they've got nine lives."

> And he said, "No, it's because they're lighter than air."

> And I said, "No, it's because they've got nine lives."

> And he said, "No, it's because they're lighter than air."

> And I said, "No, it's because they've got nine lives."

And then it just repeated those last two lines until my browser crashed. It cost me about a quarter.
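A rough sketch of how you might cut off this kind of two-line loop on the client side, assuming you have the generated text as a list of lines. The function name and its `period` and `max_repeats` parameters are hypothetical; this is just one simple way to spot a repeating tail, not anything the playground itself does.

```python
def truncate_repeating_tail(lines, period=2, max_repeats=2):
    """Stop once the last `period` lines have occurred `max_repeats + 1`
    times in a row, and keep only `max_repeats` copies of them."""
    kept = []
    needed = period * (max_repeats + 1)
    for line in lines:
        kept.append(line)
        if len(kept) >= needed:
            tail = kept[-needed:]
            block = tail[:period]
            # The tail is a loop if it is the same block repeated end to end.
            if all(tail[i] == block[i % period] for i in range(needed)):
                # Drop the extra repetition and stop reading further output.
                return kept[:-period]
    return kept


said = 'And I said, "No, it\'s because they\'ve got nine lives."'
replied = 'And he said, "No, it\'s because they\'re lighter than air."'
output = ["Why do cats always land on their feet?"] + [replied, said] * 10
for line in truncate_repeating_tail(output):
    print(line)
```

Capping the response length in the request would also have kept the bill down; the check above just makes the loop visible earlier.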

GPT-3 really reminds me of the kind of fluent, but ultimately incoherent, speech someone with Wernicke's aphasia produces. It's like we've done a really good job of artificially building the language part of the brain, isolated from all the other parts.
I'd never heard of that so I googled for a video because I wanted to know if GPT 3 is like that. After watching the video I can see what you mean. Fascinating but somewhat horrifying.

I still imagine Seinfeld doing this bit with an increasing intensity and widened eyes as it progresses to infinity.
Copilot gives me these kinds of two-line loops a lot. This is probably what AI will sit around doing until the sun flames out, after it's killed all of us off.
No, rodents will likely gnaw away at its wiring well before that.
Hah. Someone needs to write this as a short story. Or like a follow-up to the Rats of NIMH.
