HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
Playing a Neural Network's version of GTA V: GAN Theft Auto

sentdex · YouTube · 395 HN points · 6 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention sentdex's video "Playing a Neural Network's version of GTA V: GAN Theft Auto".
YouTube Summary
GAN Theft Auto is a Generative Adversarial Network that recreates the Grand Theft Auto 5 environment. It is created using a GameGAN fork based on NVIDIA's GameGAN research.

With GAN Theft Auto, the neural network *is* the environment and you can play within it.

Github: https://github.com/sentdex/GANTheftAuto/

Unboxing and reviewing the DGX Station A100 80GB: https://www.youtube.com/watch?v=0mAesfFt4us

Neural Networks from Scratch book: https://nnfs.io

Channel membership: https://www.youtube.com/channel/UCfzlCWGWYyIQ0aLC5w48gBQ/join
Discord: https://discord.gg/sentdex
Reddit: https://www.reddit.com/r/sentdex/
Support the content: https://pythonprogramming.net/support-donate/
Twitter: https://twitter.com/sentdex
Instagram: https://instagram.com/sentdex
Facebook: https://www.facebook.com/pythonprogramming.net/
Twitch: https://www.twitch.tv/sentdex

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
> I'm not going to be so stupid as to say 8 cores should be enough for anyone (while attached to a machine with 128), but you have to wonder if the stable-diffusion-style apps running on your desktop are going to be mainstream, or isolated to the few who choose to _need_ them as a hobby or a smaller part of the public that uses them for commercial success. AKA, I can utilize just about every core I'm given with parallel compiles, or rendering a 4K video, but I'm pretty sure I'm the only one in my immediate family that needs that. My wife in the past might have done some simulation work, but these days the heaviest thing she runs on her PC is office products.

See GAN Theft Auto at https://www.youtube.com/watch?v=udPY5rQVoW0

Someone trained a neural network to convert controller input into video output that simulates the game Grand Theft Auto.

If technology keeps improving, I expect many future games will be such 'dreams' of neural networks.
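The setup described above, controller input plus the current frame in, the next frame out, can be sketched roughly as follows. This is a toy numpy stand-in with made-up dimensions; the real project uses NVIDIA's GameGAN (a GAN with a memory module), not a linear map.

```python
import numpy as np

# Toy sketch of the interface described above: a model mapping
# (current frame, controller input) -> next frame. Dimensions and the
# linear "model" are illustrative assumptions, not the real GameGAN.

FRAME = 64 * 48      # flattened low-res frame (hypothetical size)
ACTIONS = 3          # e.g. left / straight / right (assumed)

rng = np.random.default_rng(0)
W = rng.normal(0, 0.01, size=(FRAME + ACTIONS, FRAME))  # stand-in "weights"

def predict_next_frame(frame, action):
    """One autoregressive step: concatenate the frame with a one-hot action."""
    onehot = np.zeros(ACTIONS)
    onehot[action] = 1.0
    x = np.concatenate([frame.ravel(), onehot])
    return np.clip(x @ W, 0.0, 1.0)

# "Play": each predicted frame feeds back in as the next input.
frame = rng.random(FRAME)
for action in (0, 2, 2, 1):
    frame = predict_next_frame(frame, action)
```

The key point is the feedback loop: the network's own output becomes its next input, so it effectively *is* the game environment.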

You are right that running Word or Excel won't really benefit from more cores.

This YouTube video, to me, shows the promise of things to come. AI-generated game worlds. Language models to generate plots and dialog, transformers and GANs to create illustrations. Imagine a game, a truly open-world sandbox, Grand Theft Auto meets AI Dungeon - every NPC is a "real" person with unlimited dialog options, the buildings you drive by you could easily walk into and investigate, unlimited play space - you could type more general instructions and ideas into the plot generator ("add in a vampire romance and murder mystery angle") on the fly.

https://www.youtube.com/watch?v=udPY5rQVoW0

twoodfin
What you describe actually covers the more interesting aspects of the “holodeck” as introduced and explored (some would say too deeply) as a story concept on Star Trek: The Next Generation.

There are more than a few scenes where the intrepid crew members struggle with what we’d now recognize as prompts.

https://youtu.be/p7pPedBtbvk

This video really amazed me in terms of automated game development. My fantasy is something like this plus GPT-3 and a couple of decades of progress.

I think we'll get to genuinely open-ended sandbox games.

https://youtu.be/udPY5rQVoW0

h0l0cube
Predicting the next frame from previous frames and learnt sequences is a neat trick, but I think automated game development is already very possible with simple techniques like genetic algorithms, or even a PRNG. I mean, Rogue and its descendants are very much automated game development, but GPT-3 could be useful for something like dynamic quest generation, world-building, narrative, adaptive NPCs (including dialogue), etc.
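The kind of PRNG-driven generation mentioned here can be shown in a few lines. This is a minimal drunkard's-walk map carver, one of the classic techniques in the family Rogue's descendants use; the grid size and step count are arbitrary choices for illustration.

```python
import random

# Minimal procedural level generation: a "drunkard's walk" carves
# floor tiles ('.') out of solid wall ('#'), driven only by a PRNG.

def generate_dungeon(width=20, height=10, steps=60, seed=42):
    rng = random.Random(seed)            # deterministic for a given seed
    grid = [['#'] * width for _ in range(height)]
    x, y = width // 2, height // 2
    for _ in range(steps):
        grid[y][x] = '.'                 # carve floor at current position
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x = min(max(x + dx, 1), width - 2)   # stay inside the outer wall
        y = min(max(y + dy, 1), height - 2)
    return [''.join(row) for row in grid]

for row in generate_dungeon():
    print(row)
```

Change the seed and you get a different level for free, which is the sense in which this already counts as "automated game development".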
eru
I'm not sure classic procedural content generation like in rogue-likes is all that comparable to using a GAN to run the whole game?

Have a look at the video, it's quite impressive.

h0l0cube
I've seen it before, and I've a basic understanding of GANs, I just don't see it being overly useful. This technique can make a really blurry simulacrum of an actual game, and that's really cool, but I'm not sure how it could be used to make something both truly novel and coherent. There's plenty of low-hanging fruit for AI within an engine, whereas using AI to be the entire engine is somewhat infeasible.
eru
You are right about the technique not being very useful as of today. My fascination stems from my assumption that more resources poured into this approach would yield vastly better results.

Even just watching the video, I came up with several possible improvements to try out. Eg adversarial training that would really home in on the situations and aspects where the model is so far weak, like edge conditions, instead of just using normal gameplay as input.

ehnto
It's definitely an interesting area of research, but for that example you still had to make the whole game in the first place in order to have something to train the model on. Say you have a novel game idea, how could you use that approach to make it a reality? I'm not sure you could, but like you mention it's a really early example and who knows where it ends up.

The other part about that GAN Theft Auto example is that it doesn't actually know what's going on, like there's no game state. All it knows is that "When I have a frame that looks like this, and they press that button, I think the next frame would usually look like this". So it's got no internal game logic, it's just really good at painting what games look like.

eru
About the first one:

Even going about this very naively, you could at least use it to train a model against a supercomputer running the game, and then run the inference on much more modest end-user machines.

But you can be much more ambitious: have you seen eg style transfer? So you could probably do a bit of ML black magic to train your model on GTA, and then point it at the Google Earth data to get a GTA-like set in real-life London.

Or you could use something like style transfer to go for a cartoony look, or add ray-tracing like effects, even if you didn't have these effects in your original engine.

Or you can use a pre-trained model (eg on GTA), and then spend a relatively modest amount of extra training to get a different kind of game, eg one that has magic or so.

About the latter part: I do think their model is already running with some state. But even if it ain't, that's a relatively small thing to add with already known standard techniques (or you can come up with new techniques).

GAN Theft Auto also had a similar dream-like quality, especially with anything involving collisions (cars and how "big" the road was).

https://www.youtube.com/watch?v=udPY5rQVoW0

Jun 21, 2021 · 2 points, 0 comments · submitted by lnyan
Jun 19, 2021 · 5 points, 0 comments · submitted by obiefernandez
Jun 19, 2021 · 388 points, 96 comments · submitted by ALittleLight
jwilber
I like your YouTube videos in general and think this content is a great benefit to the community.

I wouldn’t take the few negative comments personally - I’ve seen many GAN architectures that heavily overfit (including my own bobross pix2pix) get a lot of praise, while ‘less violating’ models (like yours) get more skepticism. Skepticism isn’t bad! But I’d wager in your case it may be because you’re a YouTuber, and other ML YouTubers are notorious for ripping off content (eg Siraj).

Not really related to this, but I’d personally love to see the difference in training times it would take an RL agent to adequately learn to drive a car in gta versus adequately flying a helicopter.

ALittleLight
Because you're using the word "your" I just feel the need to clarify that I didn't create this. I just saw it on YouTube and thought it was neat.
jwilber
Oh, my bad! The dude who made it (sentdex) has replied a ton in this thread; I just assumed he was the OP as well.
senkora
Someone did a similar project with the exact same name for an ML art project at CMU a few years ago.

https://m.youtube.com/watch?v=eP5hHKne_gE&feature=youtu.be

Full list of projects: https://sites.google.com/site/artml2018/showcase/final-proje...

sentdex
Jeez, scared me. Same name yep, totally different project. That project is pix2pix. That is not a GAN-based game engine that you play within.
senkora
Oh, yeah, definitely a different thing but kinda neat that the name has occurred twice. Sorry for the scare!
godelski
Honestly "GAN Theft Auto" is the obvious choice for the name of the project (your project).
emptyparadise
One throwaway line about GAN operating systems now made me want to see a shell GAN. Keypresses as inputs, 80x24 terminal screens as outputs. Could a neural network dream of Unix?
sentdex
I don't see why not. Might be something fun to try tbh.
toxik
I always imagined this for a honeypot, an SSH server that accepts any and all! root/password? You bet!
emptyparadise
Imagine being a pentester, breaking into what looks like a normal Alpine VM, then finding out that it is weird.
reasonabl_human
This exists via recent NLP models, I’ll see if I can dig up a link…

Edit: https://www.reddit.com/r/linux/comments/mtnld7/programmer_cr...

emptyparadise
But this is converting natural language input to commands, right? It's not actually dreaming up the entire shell and the output.
riveducha
As the original creator of that video, it’s a little sad to see people download and then re-upload the entire video to Reddit.

You can find the original video as well as written commentary on my web page: https://riveducha.onfabrica.com/openai-powered-linux-shell

jallbrit
Wow, what an incredible video and showcase. This really puts GPT-3's power into perspective. I can't wait till the public has access to something that powerful- or maybe I should enjoy not receiving GPT-3 phishing emails in my inbox.
whalesalad
GPUs: am I a joke to you? Instead of using them to render polygons, let’s use them to train neural networks that produce models that make them unnecessary. I’m oversimplifying - but pretty wild nonetheless.
slver
Well, neural networks run (fastest) on GPUs.
Sharlin
Wait, you mean that "GPU" doesn't mean "GAN Processing Unit"? ;)
nitrogen
Something I'd like to see is a visualization of subsets of the network's internal state that correlate with simple quantities like compass direction, velocity, position, etc. It'd be really fascinating to see where in the model these things are being learned, whether they are concentrated in a small area or spread out, and whether this is somewhat consistent across different iterations of the model.
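The visualization being asked for here is usually done with a linear "probe": regress from a layer's activations onto the quantity of interest and inspect which units carry the weight. A sketch on synthetic data, where the signal is deliberately planted in the first four units, might look like this (everything here is made up for illustration):

```python
import numpy as np

# Linear-probe sketch: fit activations -> "heading" by least squares,
# then see whether the signal is concentrated or spread out.

rng = np.random.default_rng(1)
n_samples, n_units = 500, 64
acts = rng.normal(size=(n_samples, n_units))   # pretend hidden activations
true_w = np.zeros(n_units)
true_w[:4] = 1.0                               # only units 0..3 encode "heading"
heading = acts @ true_w + 0.1 * rng.normal(size=n_samples)

# Which units best predict the quantity?
w, *_ = np.linalg.lstsq(acts, heading, rcond=None)
top_units = sorted(np.argsort(-np.abs(w))[:4].tolist())
print(top_units)  # the probe recovers units 0..3
```

Repeating this across layers and across retrained models would answer the "concentrated vs. spread out" and "consistent across iterations" questions directly.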
ludwigschubert
Me too! In a much simpler setting a former colleague of mine, Jacob Hilton, tried such an exploration for the vision part of an OpenAI CoinRun model. It’s the first part of this paper: https://distill.pub/2020/understanding-rl-vision/
philipswood
The GitHub repo is here:

https://github.com/sentdex/GANTheftAuto/

okamiueru
Can someone explain a bit more on the long term applicability, or maybe other use cases that might be easier to appreciate?

The reason I ask is that it seems very challenging to generate the training data for such systems. Could someone explain how this can go further than just replicating X? So, assuming some creative freedom, could you give an idea of what the long-term application of this would be?

NB: please take my questions at face value without thinking I'm implying this isn't cool for what it is. I'm all for people having fun. I'm all for projects not needing to tackle some grander issue.

tiborsaas
In the future we might have a fourth common media format besides pictures, videos and audio: GAN records.
okamiueru
That is an interesting thought. I don't fully understand how though. The main challenge is the training data. If you need to first create the interactive experience... What would the added value be?
mdale
https://openai.com/blog/dall-e/

DALL-E, for example, creates novel combinations based on a (large) training set. If the network had enough weights on what looks realistic, it could create novel game experiences based on a prompt of, say, a film or book.

4dahalibut
Hey sentdex, this is absolutely awesome! Playing with exotic target types like generating games is IMO where the fun is in ML :)

Do you see yourself taking this line of play further?

sentdex
We'd like to try some further GTA stuff, as well as some IRL stuff. Have seen some recent IRL GAN stuff, and it looks super interesting.

There's just something about AI-based environments that is particularly intriguing!

TinkersW
Looks interesting, if very far from practical -- too bad it requires a "DGX" station to train.

It seems to flicker/fade things in a lot, like the random poles that keep appearing and disappearing. It seems like there is not enough focus on temporal consistency or something?

bruce343434
If you look at the source output image before it was upscaled, you notice the resolution is too low and thin objects "fall between" the pixels. The upscaler then interprets them as air, it seems.
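The "falling between the pixels" effect is easy to demonstrate: a structure one pixel wide at the native resolution can vanish entirely under naive subsampling, so later stages never see it. The sizes below are made up for illustration.

```python
import numpy as np

# A 1-px-wide vertical "pole" disappears under 2x subsampling.

hi = np.zeros((8, 8))
hi[:, 3] = 1.0             # a 1-px-wide pole at (odd) column 3
lo = hi[::2, ::2]          # 2x subsample keeps only even rows/columns
print(hi.max(), lo.max())  # 1.0 0.0 -- the pole fell between the pixels
```

Real downscalers average rather than drop pixels, but the pole still dims toward the background, which is consistent with the flickering thin objects in the video.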
junon
This is incredible. Took me a minute to realize this isn't an image transform of some kind.

Really well done.

HerrmannM
Great work! I'm curious about what could be achieved in this space in the future.

Why can't you share the GTA5 mod and collection script? I'm curious about that part too -- obtaining good data is always hard.

Cheers and all the best!

dividuum
Impressive. Makes you wonder if at some point in the future there isn't a game engine any more but tons of training material and you play in a generated dream.
slver
Maybe not entirely, because just like a dream, the rules of a neural network tend to drift and be somewhat fuzzy.

Unless it's a high-concept game whose very goal is offering you a dream environment.

But I do believe neural networks will get into everything. They're the last missing piece of our compute model.

jsiepkes
Certainly impressive. And sure, maybe in a distant future. Though I think this is one of those things where creating a working prototype that is 75% complete is the "easy" part. The other 25% (which you need for an actual working product) will take forever. Like self-driving cars, nuclear fusion, etc.
viraptor
AI Dungeon was very much a 75% complete thing, yet it's greatly entertaining in its own right. I would happily play a dream game which just falls apart sometimes.
darepublic
Cool stuff. Looking forward to more realistic NPCs and player decision driven stories in an open world sandbox
boyadjian
WOW, this is awesome. The video gives the impression we are dreaming.
black_puppydog
Woah dude!

sudo python3 inference.py?

Really? :D

dataviz1000
Of all the videos on ML, this one takes the cake, with an animation of hidden layer activations. [0]

[0] https://www.youtube.com/watch?v=gmjzbpSVY1A&t=100s

Randomoneh
I fail to see novelty here. What's the size difference between the model and all of the 64x32 image training data? If the difference is not significant, you're basically almost just scrubbing a video, right?
sentdex
The GAN model is the game environment. You're playing a neural network. The novelty is no game engine, no rules, just learned how to represent the game and you can play it.
rasz
What he meant is you overfitted the network with video footage. There is no game, just seemingly clever stitching and playback of learned footage.

similar concept applied to animations and implemented in a state machine https://www.youtube.com/watch?v=KSTn3ePDt50

and optimized with nn https://www.youtube.com/watch?v=16CHDQK4W5k

ShamelessC
The first link provided seems to need a very detailed human-provided cost function for specific development needs.

The second one is indeed interesting research and seems to be a combination of the prior learned motion mapping working in tandem with a generative model.

I suppose you could say that the automation of the dataset counts as "augmentation", but the difference here is that the dataset is just pixels and inputs rather than all that animation info and simulation data. Yes, a simulation is running, but the GAN only gets the pixels and the input.

There's a similarity there though; you're right. In either case, the explicit goal of the video you posted is to combat runtime constraints of generative models. I'm not certain it's a fair comparison.

The latter video and sentdex's result both seem to generalize to unique scenarios not present in the training set. This may mean they are creating an efficient representation of the underlying data in order to predict future samples more easily than simply overfitting.

The top-level comment here is a shallow dismissal, and Randomoneh could have answered these questions themselves before throwing out a smug comment like "I fail to see novelty here", when this is at the very least the first large-scale GAN successfully trained on GTA V.

rasz
The first link exposes the trick employed by your model.

>animation info and simulation data

but did your model learn any of that?

>explicit goal of the video you posted is to combat runtime constraints

The trick to motion mapping is feeding a lot of data with accompanying inputs to build an atlas you can reference during playback.

>first large-scale GAN successfully trained on GTA V

It's really cool. The problem I had is in the presentation. I immediately felt insincerity bordering on scamming the audience, because I assume someone working in this field would know how the sausage is made. From the YT clip: "the shadow and reflection works", "modeling of physics works". Do they? Or did your model build an atlas of video frames it can play back according to the fed input? I'm guessing weather/time of day was locked when recording training data -- perfect shadow and constant sun position for a nice reflection. Searching for 1:1 matches of generated output in the training set would be interesting and pretty revealing.

kristintynski
Accusations of scamming are serious. What evidence do you have? None as far as I can see. This is wrong and should be remedied.
rasz
I feel scammed when practitioner of the art tries to sell me on his model "learning physics of the simulation. Look, it even figured out where to put the shadow".
kristintynski
No one cares how you feel, come with proof before accusations. Otherwise you are just a troll
magic_quotes
Have you seen the video? The author even goes as far as suggesting the technique might be useful for (generating?) entire operating systems at https://www.youtube.com/watch?v=udPY5rQVoW0&t=853s. That's just wild.
sentdex
No, that's just false. How about a direct quote?

I suggested there could be a "future where many game engines are entirely or even mostly AI based like this. Or even things like operating system or other programs."

The thought here was just a wondering of what the future might be and if we might have far more AI based programs.

I still think the answer is a strong yes, this is a glimpse into the future. Nowhere did I say GameGAN would be that engine. You're just trying your hardest to hate.

magic_quotes
I'd like my OS being deterministic, thank you.

> You're just trying your hardest to hate.

Manipulative much? I don't hate you (well, so far), you aren't being attacked. I'm just noting what a few informed people here don't like about your video. No, they aren't trolls. And, yes, everyone has a different level of tolerance to exaggeration, of course.

sentdex
Odd, pretty sure it was you who misrepresented what I said in attempts to manipulate.

You were also the one who "exaggerat[ed]" my claims. I made a general statement about my thoughts about future AI-based software rather than human-coded.

I still think that's indeed the inevitable future. It doesn't seem remotely outrageous or exaggerated. I never said GameGAN would be that software, but you seem to want to make that be the case so you can put it down.

What makes you believe neural networks aren't or could not be deterministic? What makes you think NNs could not eventually produce far more robust, reliable, and secure operating systems?

Seems obvious to me, but I guess you're more informed than me :)

90211
You, like many YouTubers, made completely exaggerated claims in your commentary. Your model fits a sequence of inputs to a video frame. But you say "wow, look, it even models the movement of the sun!". It's pretty absurd.
sentdex
> I immediately felt insincerity bordering on scamming the audience

MFW I read this. Jeez man. Model size is 173MB. It didn't just memorize every possible combo.

How the hell you went from our excitement about a fun project we shared on YT to accusing us of "scamming" the audience I really don't know. What a terribly rude and hateful attitude you have =/

bobsmooth
The people on this website are terrible sometimes.
slver
I wouldn't call it scamming, but 173MB is not small at all. At the resolution of this model, you can easily fit the entire Titanic movie in 173MB. Maybe even have enough space for audio.

Furthermore, no one is saying the model "memorized every possible combo". However, imagine you have a set of keyframes (maybe even multiple fragments per frame) and you need to interpolate between them. Not that hard of a task, is it?

Models don't care about simulating our "intention" properly. They care about fitting the input in the simplest way possible. Think about a model like a lazy worker merely trying to look like it's working.

None of this makes NN less exciting, but it should inform us you can't go 0 to 60 in one step and hope the NN would have great insight about what it's doing.

We need models that make smaller conceptual jumps, i.e. models that understand 3D space, then models which understand transformations in 3D space, then models which understand cityscapes, etc.

mscharrer
> However imagine you have a set of keyframes (maybe even multiple fragments per frame) and you need to interpolate between them? Not that hard of a task, isn't it.

Interestingly, the video artifacts of this model look somewhat similar to those from simple motion interpolation algorithms such as ffmpeg's minterpolate, especially during fast camera motion. https://ffmpeg.org/ffmpeg-filters.html#minterpolate

Edit: I generated an example with strong artifacts. Input: https://mscharrer.net/tmp/lowfps.webm Output: https://mscharrer.net/tmp/minterpolate.webm

sentdex
Memorizing a static succession of frames with nothing actually being dynamic and interactive isn't the same challenge as this.
mekkkkkk
It sounds like you and others are trying to clarify how this demo doesn't live up to your idealized, subjective expectations. No one is claiming this to be a revolutionary or even useful video game engine.

It's a neural network that recreates a limited, yet fully dynamic gameplay segment only based on player input. It's a really neat and fun project.

slver
I think it's quite telling that you accuse me of having idealized, subjective expectations and then describe the demo as "limited yet fully dynamic gameplay". It rotates the car left or right depending on whether you press left or right.

It's super-interesting but it doesn't recreate limited fully dynamic gameplay. It doesn't recreate any sort of dynamic gameplay. That's your idealized, subjective interpretation.

mekkkkkk
The driving seems pretty dynamic to me. Maybe "fully" was a bit hyperbolic, as I can't really justify or quantify what that would entail. On the other hand, saying that it's not dynamic at all seems equally misguided. Also, you seem to disregard the "limited" and "segment" qualifiers, which were there for a reason.
uh_uh
Don't take it personally. Commenters on HN are famous for dismissing successful ideas (remember Dropbox?).

I have one question: you mentioned that the training data was 100GB. Was it the same resolution as what is output by the model (ignoring supersampling)?

sentdex
We had ~100GB of data (and that was gzip compressed data). The final model is 173MB.

It's simply not large enough to have memorized every combo.

eru
Don't think gzipping helped much with video data?
ShamelessC
Great work! Hacker News still seems to have a deeply skeptical culture with regard to machine learning - not sure why. There's always someone saying it's "not novel" and it's "just doing x".

Overfitting is a known issue in machine learning, people. If you still think all neural networks do is memorize the dataset completely in the year 2021, you might want to revisit the topic. It is one of the first concerns anyone training a deep model will have, and to assume this model is overfit _without_ providing specific examples is arguing in bad faith.

Sentdex has shown his GAN is able to generalize various game logic like collision/friction with vehicles and also learns aspects of rendering such as a proper reflection of the sun on the back of the car.

He also showed weak points where the model is incapable of handling some situations, and it even did the impossible task of "splitting a car in two" to try and resolve a head-on collision. Even though this is a failure case, it should at least provide you with some intuition that the GAN isn't just spitting out frames memorized from the dataset, because that never happens in the dataset.

You will need to apply a little more rigor before outright dismissing these weights as merely overfit.

@sentdex Have you considered a guided diffusion approach now that that's all the rage? It's all rather new still but I believe it could be applied to these concepts as well.

rasz
One of the main problems with ML/NN is it often works like magic, aka the trick works as long as the audience doesn't know the secret behind it. It's fascinating to a gullible audience, mundane bordering on boring to practitioners.

My Tiger repelling rock^^^^^^leopard detection model works great on all animal pictures ... until you feed it a sofa https://web.archive.org/web/20150703094328/http://rocknrolln...

>able to generalize various game logic like collision/friction with vehicles and also learns aspects of rendering such as a proper reflection of the sun on the back of the car

It did none of that. What this model did is learn all the frames of video and their chronological order according to the input.

> impossible task of "splitting a car in two" to try and solve a head-on collision.

It played back both learned versions at once, like reporting confidence of a round thing being 50% ball and 50% orange.

godelski
> My Tiger repelling rock^^^^^^leopard detection model works great on all animal pictures ... until you feed it a sofa

I'm sorry, how is this different from normal software engineering? There are dozens of unit/integration-testing memes poking fun at specifically this (which is a mostly solvable problem in ML, btw, when you train with out-of-distribution data: give your model a third end state that represents "neither").

> id did none of that, what this model did is learn all the frames of video and their chronological order according to the input.

A better explanation is that the network knows what frame to generate given the current frame (and n previous frames) and the current user input. If it were memorizing, then it'd have to generate an extremely large number of scenarios (the number grows exponentially, as any given frame has k possible actions leading to the next frame). If sentdex can run the game for arbitrary lengths and take arbitrary actions, then it is a far more reasonable explanation that the model is generating the frames rather than memorizing them. Apply Occam's Razor.

Edit: Sentdex said the model was ~173MB, so that is not large enough to memorize the gameplay.
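The exponential-growth argument above can be made concrete with back-of-envelope numbers. The action count and decision rate below are illustrative assumptions, not measurements from the project; only the 173 MB figure comes from the thread.

```python
# Count the distinct input trajectories a purely memorizing model
# would need to cover, versus the bits available in a 173 MB model.

actions_per_step = 4           # assumed distinct controller inputs
steps = 10 * 30                # assumed 10 decisions/s over 30 s of play

trajectories = actions_per_step ** steps      # 4**300, about 10**180
model_bits = 173 * 1024 * 1024 * 8            # ~1.45e9 bits

# Astronomically more trajectories than the model could even index,
# let alone store a frame sequence for.
print(trajectories > 10 ** 150, trajectories > model_bits)
```

Even with far more conservative assumptions, the gap is so many orders of magnitude that rote memorization of gameplay is not a viable explanation.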

motohagiography
Maybe I'm misinterpreting, but if you've ever seen a cat freak out about a cucumber (an entire video genre, apparently), ostensibly real intelligences make similar errors.

Beyond rote memorization, it looks like it could be explained by saying the model appears to have found a concept of consonance and dissonance that is bounded within the field of its inputs, and a networked grammar for interacting with the up/down/left/right inputs. Some people might find that technically trivial, but as a layman I am impressed.

The "magic" part is that the response of the network appears to be so complex relative to its inputs, but given the input is so limited from a controller, it's easy to attribute more meaning to it when it is working with a finitely bounded simulated model.

Generally I'd wonder, if the behaviour appears more complex than the stimuli, do we tend to attribute intent to it?

sentdex
In the end, everything is boiling down to matrix math, so you can always make the argument that no neural network is impressive if you want.

The model's size is ~173MB, depending on settings. That's not much space to have memorized every single possible combination of events, nor was our data enough to cover that either.

YeGoblynQueenne
>> The model's size is ~173MB, depending on settings. That's not much space to have memorized every single possible combination of events, nor was our data enough to cover that either.

The resolution of the images output by the model is very low (what is it exactly, btw?). It's not impossible that your model has memorised at least a large part of its data.

In fact the simplest explanation of your model's output (as of much of deep neural networks for machine vision) is that it's a combination of memorisation and interpolation. There was a recent-ish paper by Pedro Domingos that proposed an explanation of deep learning as memorisation of exemplars similar to support vectors (if I understood it correctly -- I only gave it a high-level read).

It's also difficult to see from your demonstration exactly what the relation between the output and the input images is. You're showing some very simple situations in the video (go left, go right), but is that all that was in the input?

For example, I'd like to see what happens when you try to drive the car over the barrier. Was that situation in the input? And if so, how is it modelled in the output?

Finally, how do you see this having real-world applications? I don't mean necessarily right now, but let's say in 30 years time. So far, you need a fully working game engine to model a tiny part of an entire game in very low resolution and very poor detail. Do you see this as somehow being extended to creating a whole novel game from scratch? If so, how?

Edit: on memorisation, it's not necessary to memorise events, only the differences between sets of pixels in different frames. For instance, most of the background and the road stays the same during most of the "game". Again, the resolution is so low that it's not unfathomable that the model has memorised the background and the small changes to it necessary to model the input. So, it interpolates, but can it extrapolate to unseen situations that are nevertheless predicted by the physics you suggest it has learned, like driving over the barrier?

haecceity
Video frame resolution is pretty small...
teruakohatu
> The model's size is ~173MB

That is impressive! Less than twice the size of the ResNet-50 weights. Surely that is within an order of magnitude of an equivalent Unity or Godot game plus models.

fartcannon
Your original self-driving GTA5 videos are what helped me come to understand machine learning in the first place (along with some of Seth Bling's MarI/O, and a bit of Tom7's learn/play-fun magic). I used your tech to make an AI that played Donkey Kong Country in the LSNES emulator shortly before Gym-Retro was released.

So, thanks a bunch, Sentdex. You are rad.

sentdex
Hah, awesome! Any plans to apply GAN Theft Auto to something else? :o
fartcannon
Not offhand, but you've probably inspired a lot of creativity with this across the internet... and a lot of copycats. I'm looking forward to seeing what gets made.
andrepd
> Hacker News still seems to have a deeply skeptical culture with regard to machine learning

Is... that a bad thing? Skepticism is good. When it's about something as hyped as "deep learning", even more so.

godelski
> Skepticism is good

There's skepticism and then there's being a non-expert in a field and talking with high confidence. How do you differentiate these? Conspiracy theorists use the same logic. You're right that skepticism is good, but it is easy to go overboard.

ekianjo
And then there are so-called experts who are charlatans as well. Don't ever forget that possibility.
godelski
Sure, but skepticism should decrease when a community of experts is saying the same thing. As an example, anti-vaxxers often claim skepticism and say they have done their own research. The reason we don't trust them is that we think doctors have greater expertise in the subject than they do (either way, it comes down to trusting someone). Unless you're a virologist, you probably don't have the expertise to actually verify vaccine claims yourself.

So sure, you are right, but in the context of this discussion you're implying that the vast majority of ML researchers (myself included) are charlatans. I'm not sure what the meaningful difference here is. We're publishing results, people are actively reproducing them, and then some person on the internet who doesn't understand the subject comes along and says "you're full of shit." We can even disprove the claims being made (e.g. I've explained why the network can't be memorizing the game in another comment). That is literally happening in this thread (GAN Theft Auto is in fact a replication/extension effort). Is that meaningfully different from the anti-vaxxers?

roystonvassey
I think it becomes a problem when it turns into being skeptical for the sake of it.

I haven't been on HN too long, but the top comment on most threads is a contrarian one (which I truly appreciate, because it provides a different POV). Sadly, because this is encouraged through high upvotes, the crowd tendency is to regress toward that approach, even when the rigour of the critique is lacking.

jcims
>Skepticism is good.

It can be, but it's certainly not an unmitigated good, especially when it leads to aspersions of fraud and conspiratorial thinking (e.g. rasz's comment thread below).

bastawhiz
Skepticism is good when it targets bold claims with vague proof. This is not a bold claim (it's a video demo showing the process) and its proof is not vague (you can inspect the source). Skepticism over something like GPT-2 without more than sample output is good. Skepticism over GPT-2 with a workable demo and source is unhelpful.
andrepd
Funny you mention GPT-2/3, which is by all accounts a glorified chatbot, but which has nevertheless been hyped as one step below AGI by many people.
bastawhiz
Has anyone at OpenAI made that claim?
dkarras
>Is... that a bad thing?

Yes, when it is there for no valid reason, or for ridiculous reasons. Skepticism is not a default position you can take like a toddler refusing to eat their vegetables; you need some informed (and non-fallacious) reasoning behind it. "I'm skeptical about this thing using X because X is so hyped these days" is not such reasoning.

akiselev
"I'm skeptical about this thing using X to do Y because the burden of proof is on people claiming X does Y and historically they have failed to meet that burden"

I don't know what skepticism has to do with ridiculous toddlers - they are almost universally incapable of grasping the nuances of epistemology.

andrepd
Well, it kind of is. Blockchain has been hyped by charlatans as the cure to all the world's ills. That means when you read something about blockchain, you should be especially suspicious.

Similarly, I've read too many people hyping up glorified chatbots as one step below AGI (see the :o reactions to GPT-3), so I'm now extra skeptical about claims about machine learning.

YeGoblynQueenne
Let's agree that "all neural networks are doing" is not "memorizing the dataset completely".

In that case, what are they doing, other than memorising the dataset completely?

If that GAN in the video "is able to generalize various game logic" etc, how does it do that, and what does it mean to "generalize various game logic" in the first place?

sentdex
Heh, yeah, tough crowd I guess. The full code, models, and videos are all released and people are still skeptical.

I feel like 95%+ of papers don't do anything besides tell you what happened and you're just supposed to believe them. Drives me nuts. Not sure why all the hate when you could just see for yourself. I'd welcome someone who can actually prove the model just "memorized" every combo possible and didn't do any generalization. I imagine the original GameGAN researchers from NVIDIA would be interested too.

Interesting @ guided diffusion, not aware of its existence til now. We've had our heads down for a while. Will look into it, thanks!

YeGoblynQueenne
>> The full code, models, and videos are all released and people are still skeptical.

If you're uncomfortable with criticism of your work you should definitely try publishing it, e.g. at a conference or journal. It will help you get comfortable with being criticised very quickly.

alimov
I think he’s pointing out that the “criticism” here is similar to that of a person criticizing a book they’ve never read or even flipped through.
YeGoblynQueenne
Perhaps, but that kind of criticism should be the easiest to ignore. The OP expresses frustration at lay criticism, and I expect that even brief contact with academic criticism would make that frustration fade into irrelevance.
ShamelessC
I've been learning about this stuff for about a year now. Your earlier experiments with learning to drive in GTA V were an inspiration for me - because they hit that perfect intersection of machine learning, accessibility in education, and just plain cool.

The pandemic hit, and OpenAI had released DALL-E and CLIP. I was unemployed and bored with my Python skills, so I decided to just dive in. I found that a nice gentleman named Phil Wang on GitHub had been replicating the DALL-E effort, and decided to start contributing!

You can find that work here

https://github.com/lucidrains/DALLE-pytorch

and you'll find me here:

https://github.com/afiaka87

We have a few checkpoints available with Colab notebooks ready, and there is also a research team with access to more compute who will eventually be able to perform a full replication study at a scale similar to OpenAI's, and then some. We are also working with another brilliant German team, https://github.com/CompVis/ , who has provided us with what they call a "VQGAN" (if you're not familiar): a variational autoencoder for vision tokens, with the neat trick from GAN-land of using a discriminator in order to produce fine details.

https://github.com/CompVis/taming-transformers

We use their pretrained VQGAN to convert an image into digits. We use another pretrained text tokenizer to convert words to digits. Both sets of digits go into a Transformer architecture, and a mask is applied to the image tokens in the transformer so that the text tokens can't see the image tokens. The digits come out, and we decode them back into text and image respectively. Then a perceptual loss is computed. Rinse, wash, repeat. Slowly but surely, text predicts image without ever having been able to actually _see_ the image. Insanity.
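The masking trick above can be sketched as follows. This is a minimal illustration, not the DALLE-pytorch code; the function name and token counts are made up for the example. Since the caption tokens come first in the concatenated sequence, a plain causal (lower-triangular) mask already guarantees that no text token can attend to any image token, while each image token sees the full caption plus the image tokens before it:

```python
import numpy as np

def dalle_style_mask(n_text: int, n_image: int) -> np.ndarray:
    """Causal attention mask over the sequence [text tokens | image tokens].

    Position i may attend to position j iff mask[i, j] == 1. Text
    occupies positions 0..n_text-1, image tokens the rest.
    """
    n = n_text + n_image
    return np.tril(np.ones((n, n), dtype=np.int8))

mask = dalle_style_mask(n_text=3, n_image=4)
# Text rows (0-2) never attend to image columns (3-6):
assert mask[:3, 3:].sum() == 0
# The last image token attends to everything up to and including itself:
assert mask[-1].sum() == 7
```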

Anyway, taking a caption and making a neural network output an image from it has again hit that "perfect intersection of machine learning, accessibility in education, and just plain cool". I don't know if you could fit it into the format of your YouTube channel but perhaps it would be a good match?

codetrotter
FWIW I saw your video a couple of days ago via Reddit and I loved it a lot. Even sent a link to the video to a friend of mine because I think it was a very inspiring and interesting video.

I hope you don't let naysayers get to you :)

fossuser
This is wild - thanks for putting the video together, it’s very cool.
godelski
> I feel like 95%+ of papers don't do anything besides tell you what happened and you're just supposed to believe them.

Honestly, I think there's a big problem with page limits. My team recently had a pre-print that was well over 10 pages and still didn't cover everything, and then when we submitted to NeurIPS we had to cut it to 9! This seems to be a common problem, and it's why you should often check the different versions on arXiv. And we had more experiments and data to convey since the pre-print. The problem is growing as we have to compare more things, and tables can easily take up a whole page. I think this exaggerates the perennial problem of not explaining things in detail and expecting readers to be experts. Luckily, most people now share source code, which helps reveal all the tricks the authors used, and blogging is becoming more common, which helps further.

> I'd welcome someone who can actually prove the model just "memorized" every combo possible

Honestly, that would be impressive in and of itself.

danuker
There's the Hutter Prize [1] - memorizing is useful (and arguably intelligent) if it's compressed.

http://prize.hutter1.net/

justinjlynn
Indeed. Novel, efficient program synthesis is still novel, efficient program synthesis even if it's a novel, efficient data compression codec you're synthesising.
HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.