HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
Building the Software 2.0 Stack by Andrej Karpathy

www.figure-eight.com · 207 HN points · 1 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention www.figure-eight.com's video "Building the Software 2.0 Stack by Andrej Karpathy".
Watch on www.figure-eight.com [↗]
www.figure-eight.com Summary
Figure Eight's Artificial Intelligence Resources Center has been created and curated for data science and machine learning teams
HN Theater Rankings

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
Talking about progress of various players in self-driving industry is like talking about Schrödinger's cat.

They might or might not have made progress but we won't know until we open the box (i.e. until they launch commercially).

No one has any real data about how Cruise compares to Uber compares to Ford compares to Audi compares to ...

The only comparative data we have are yearly reports mandated by California.

Tesla is not there because in 2017, the last year for which we have the data, Tesla did not test on public roads in California in fully autonomous mode so they didn't have to release those stats (https://www.dmv.ca.gov/portal/wcm/connect/f965670d-6c03-46a9...).

At the same time Tesla is taking this as seriously as one can and is working on making it work. From the DMV report:

"As described above, Tesla analyzes data from billions of miles of driving received from our customer fleet via over-the-air (“OTA”) transmissions. We supplement this with data collected from testing of our engineering fleet in non-autonomous mode, and from autonomous testing that is done in other settings, including on public roads in various other locations around the world.

Through all of this data, we are able to develop our self-driving system more efficiently than only by accumulating data from a limited number of autonomous vehicles tested in limited locations."

Karpathy recently gave a talk about Tesla's progress: https://www.figure-eight.com/building-the-software-2-0-stack...

Jun 10, 2018 · 207 points, 94 comments · submitted by fmihaila
dkislyuk
Treating training datasets as dynamic components of ML systems, along with the corresponding tooling and infrastructure, is one of the most under-appreciated points in the field today. Part of the cause, I think, is due to stigma coming from the academic side that dataset collection is a low-level problem not worthy of serious algorithmic investment (good luck submitting a CVPR paper which improves on labeling speed or taxonomy management in your dataset warehouse). Datasets are considered a given, which also has the interesting side effect of massive hyperparameter and architecture overfitting, evidenced by the recent analysis of CIFAR-{10,100} [1]. In more applied and engineering settings though, it's good to see this area seeing a lot more investment, especially on the tooling side.

[1] https://arxiv.org/abs/1806.00451

mousetraps
> stigma coming from the academic side that dataset collection is a low-level problem not worthy of serious algorithmic investment

Agreed it needs more attention, but - for academia - I think it's more of an incentive issue than a stigma issue. E.g. harder to benchmark the performance of two algorithms if they don't operate on the same dataset. Also to be fair, research into things like synthetic data mitigates the problem, just in a different way.

The paper you cited is interesting. Thanks for sharing. Hopefully that spurs more focus on understanding the subtleties of each dataset. IIRC Kaggle also had issues around generalizability, but for different reasons.

Anyways it's still early on... but we're currently building tools to help solve this problem. In particular simplifying the data collection / labeling process for vision systems. Would love to chat further w/ anyone interested in providing feedback. Email is [email protected]

avip
It's indeed "an incentive issue", only not the one you've mentioned but the one OP hinted at. Research is focused on what's publishable, hence tenure-trackable, and not on what's useful for solving real-world problems (of course, the two occasionally coincide).
mousetraps
Why so black and white? There are many incentives at play, and many ways to contribute to solving real world problems.

I’ve spent time in both academic research and industry.

Research is not supposed to be immediately applicable. The goal is to produce new knowledge - more importantly shared knowledge. Publishing is not a bad measure of that. Additionally, ability to secure grants provides incentive to focus on problems others want solved.

No incentive system is perfect, but I don’t really see how this is any different from any organization. And I don’t think it’s fair to judge an entire discipline by the negative examples.

avip
I didn’t judge anything, nor did I say anything “negative”.
mousetraps
Okay fair enough, maybe we’re just talking past each other :).
aub3bhat
I am working on a project to build an ML and CV enabled database for images and videos. It supports visual search/NN as a core primitive and is developed for scalability using Kubernetes.

https://www.deepvideoanalytics.com

ifelsehow
Check out David Mellis (of Arduino fame)'s "IDE" for collecting and training classifiers on gestures:

https://www.youtube.com/watch?v=5nDCG4vkFP0

hyperbole
The argument the speaker makes is extremely weak - 1.0 software programs are built of building-block modules, 2.0 software is machine learning - so what point is being made here?

We still break problems down when solving problems that can apply machine learning. There's no single "drive the car" neural net; rather, the task of driving a car has been broken down into subcomponents: sign detection, pedestrian detection, object-in-front-of-the-car detection - and then there's logic that encapsulates these classifiers, using them as inputs to determine how best to steer and power the car's drive wheels.

It's a bit far-fetched to believe programming has fundamentally changed, at least not yet.

fredguth
I think the point of the talk was to say that engineering features is very different from making features learnable. This is indeed very different from "software 1.0".

He also points out that the challenges are not where academia is focusing.

tejohnso
Software 1.0:

    The car is parked IF it is on the side of the road AND it hasn't moved in X time, AND ... But not if... etc.
Software 2.0:

    The car is parked if the neural net says so.
Okay, if software 2.0 is all about thinking at a higher level, and training the neural net to deal with the details, why is the focus on details like "is the car parked", or "is it raining", or "where is the lane marker?" Why can't we train "this is good driving" / "this is bad driving"?

As a software 1.0 programmer I can see how that seems completely unreasonable, but it does seem to follow the logical direction of the talk.

franciscop
One possible reason (from the video) might be: because software still has to follow the law. Good driving might imply having to push the boundaries of the law from time to time, while self-driving machines cannot do that. So they have to hardcode those rules. On the other hand, they should also try to counterbalance human bias as well (protecting only the driver in an accident comes to mind).
dmvaldman
You must start with labeled data. It is easier to label pictures of parked cars than it is to label pictures of good/bad driving. For labeled video, the dimensionality is out of reach of ML for now, and would add lag to your system.
eptcyka
Are labelled pictures the only kind of input that neural networks can be trained on?
lebek
Perhaps a workaround for neural nets being black boxes? At least this way you can answer questions like "does the car think it's parked?".
halflings
What you're describing would be called "end-to-end learning", and the way things have been progressing is that a lot of systems cobbled together (e.g. speech synthesis, translation, image recognition) have been converted to models (mostly neural nets) that are learned end-to-end. Autonomous driving is not an exception, and they might still get there, but you have to put your pragmatic hat on and do whatever works right now. Benefits of this approach include more interpretable results, potentially improved safety guarantees (e.g. you can limit the failure to a subsystem, things like that).
Gravityloss
Most humans generally don't learn driving only end to end. They are taught how the various components of the car work, traffic rules and conventions etc.

Most people also do some experimentation and calibration. Check how much empty space there is after parking, drive a circle on a snowy parking lot until you spin etc.

rasmi
One downside of Software 2.0 as compared to 1.0, at least as of today: it is incredibly hard to debug in the conventional sense. The focus of this talk was mostly on data-labelling challenges. For a company with software as mission-critical as Tesla, I'm disappointed Andrej did not bring up any of the practical challenges around debugging complex models.
goatlover
So much for TDD.
codetrotter
You can still write tests though. Hand a photo or video to the neural network and compare the list of labeled objects and their bounds to a list of the objects and positions that you are expecting the network to identify?
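
To illustrate, here is a minimal sketch of that kind of test in Python, with the detections hard-coded where the network's output would normally go; the helper names and the IoU threshold are just placeholders.

    from typing import List, Tuple

    Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

    def iou(a: Box, b: Box) -> float:
        """Intersection-over-union of two axis-aligned boxes."""
        ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = ix * iy
        union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
        return inter / union if union > 0 else 0.0

    def missed_objects(detections: List[Tuple[str, Box]],
                       expected: List[Tuple[str, Box]],
                       min_iou: float = 0.5) -> List[str]:
        """Return the expected labels the network failed to find with enough overlap."""
        return [label for label, box in expected
                if not any(d_label == label and iou(d_box, box) >= min_iou
                           for d_label, d_box in detections)]

    # In a real test the detections would come from running the network on a fixture image.
    detections = [("car", (118.0, 82.0, 338.0, 218.0))]
    expected = [("car", (120.0, 80.0, 340.0, 220.0))]
    assert missed_objects(detections, expected) == []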
tzahola
Hmmm. It's called validation, isn't it? And it's pretty much how they evaluate machine learning performance since day 0.
icc97
He touched on this at the end where he spoke about trying to write an IDE for Software 2.0.

He did talk about the problems of complex models. They mostly treat the models as fairly fixed (see the pie chart slide of PhD vs Tesla). Most of the challenges are in labelling data.

rasmi
I watched the whole talk, so I heard the bit about the IDE, but I still think there's a really fundamental ability, being able to walk through the "decision-making logic" of your "code" (in this case, the model), that wasn't touched upon. For example, suppose your model misclassifies a barrier and a car crashes into it as a result [1]. How do you debug this? You can say, "Well, it's a data-labelling problem" and go get more data on barriers, but in the meantime people have died. Model testing and debugging should be an incredibly high priority for use cases like Tesla's. That means some degree of interpretability, testing edge cases, simulation, anything to find flaws like this before they occur in real life.

See here [2] for an example of production ML testing practices. I wonder how much of this is in place at Tesla? I would argue they should be at the forefront of work like this. Something tells me they aren't.

[1] https://news.ycombinator.com/item?id=17257239

[2] https://ai.google/research/pubs/pub46555

telltruth
I love Karpathy but this is Karpathy’s worst talk. People have been doing applied ML for a long time in many real-world settings, and everyone who has done it has gone through the experiences of labeling guidelines ballooning, unclean data, long-tail surprises and so on. This is nothing specific to deep learning, and nobody called it “Software 2.0”. Most ML practitioners already know it’s all about data, and so-called IDEs to manage the data and predictions have taken many different forms. The talk would be more interesting if Karpathy had something new to say, for example, how to avoid the need for long-tail outliers by generalizing on higher-level concepts.
syllogism
If you're interested in implementing this sort of workflow, you might want to have a look at our product Prodigy: https://prodi.gy

Prodigy is an annotation tool that makes it easy to use active learning or other model-in-the-loop features. It's a downloadable library that can start the web server on your local network, allowing 100% data privacy. We've just rolled out experimental image support in v1.5.0.

TimTheTinker
I know this might sound like I’m still in the 1970s, but... it occurs to me that marrying an expert system to these neural systems might help significantly with the long tail of unusual events.

Fundamentally, the problem with these systems (and note, sometimes we say this about people too) seems to be a failure to think logically. Perhaps expert systems with sufficiently detailed logical data sets could enable more complex frameworks for decision making, and allow systems to dynamically create and run judgment calls with the NN classifier IDs and confidence levels as input sources.

colordrops
I'm not involved in automated driving software but that's how I always guessed it worked (in addition to state machines). Is that not the case? Can anyone with experience confirm or deny?
TimTheTinker
A quick google search shows there are scholarly papers (IEEE, etc.) from earlier this year about the use of expert systems in automated driving. Ha, should have googled that idea first.
ccorda
Related blog post from last November: https://medium.com/@karpathy/software-2-0-a64152b37c35
icc97
It's not just related; it's taken pretty much verbatim from that article, images and all.

The only bits added on are references to Tesla.

giacaglia
Thanks for sharing this. This is awesome. It reminds me of something that Minsky said in the 70s: "Computer languages of the future will be more concerned with goals and less with procedures specified by the programmer"
mosselman
Is 'Software 2.0' something that is introduced in the video or is it some bullshit-bingo term that I have missed?
MasterScrat
He introduced it himself end of last year in this article: https://medium.com/@karpathy/software-2-0-a64152b37c35
icc97
Ah, basically this entire talk, except for the bits about Tesla, was taken from that article.
Intermernet
Add it to the bingo card between "cloud" and "serverless"
skocznymroczny
don't forget "blockchain"
tzahola
The latter.
crististm
You're witnessing the birth of bullshit-bingo terms :)
sheeshkebab
It seems this guy’s software 2.0 looks more like a specialized toolset for auto driving/navigation than a general-purpose shift.

If all that we get out of it is fancy data labeling tools, incapable of learning anything new by themselves, it’s going to get old real quick.

zawerf
Why can't the process of "iterating on datasets" in the second half of the talk be automated?

For example:

- automatically learning that a trolley is not a great fit for "car" because they are behaviorally different

- reclassify that cluster as a new entity even if it doesn't know the English term "trolley".

- If it finds the new distinction useful, ping the human that it needs more training examples for that situation

Similar to how a human learner can identify that he's bad at something, figure out what the common problem is, and use that information to focus on what to practice on next.

I am sure he alluded to doing this in his talk but what's the technical term for it?

nl
This is what active learning is. Basically you trace the boundaries in a classification model and provide more examples along those boundaries.
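
For anyone unfamiliar with the term, here is a minimal uncertainty-sampling sketch of the idea, using scikit-learn on synthetic stand-in data: the examples the current model is least sure about are the ones sent back for labeling.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_labeled = rng.normal(size=(100, 2))
    y_labeled = (X_labeled[:, 0] + X_labeled[:, 1] > 0).astype(int)  # toy labels
    X_pool = rng.normal(size=(1000, 2))                              # unlabeled pool

    model = LogisticRegression().fit(X_labeled, y_labeled)

    # Predicted probabilities near 0.5 mean the example sits close to the decision
    # boundary; those are the examples worth sending to human annotators next.
    proba = model.predict_proba(X_pool)[:, 1]
    uncertainty = np.abs(proba - 0.5)
    to_label = np.argsort(uncertainty)[:20]   # the 20 most uncertain pool examples
    print("indices to send to annotators:", to_label)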
zawerf
Does this work even if the output format has to change? I can see that helping with class imbalance but not when a useful class is completely missing or in the wrong "format".

For example in the bright smudges vs raindrops case, it might not have a label for the sun, but it should be able to identify it as an important dimension in the cases it is getting wrong. Better yet, something more abstract like "smudge illuminated by light" or "bright background", which will be hard to annotate (e.g., how bright is bright?).

nl
Yes(ish).

There are some practical software issues around not knowing the number of classes in advance, but those are "just coding".

There is no reason why introducing a new class shouldn't be as simple as providing additional examples to an existing class.

sjg007
Great talk.

I thought their approach to rain sensors was interesting. The vision AI wiper function seems like overkill when a different system is capable of performing it almost flawlessly. I'd guess that the AI has a dedicated circuit though and becomes upgradeable... so those are pluses. But the rain AI system is a good test case and learning task for both the humans and the AI. Hypothetically, if you can't recognize raindrops, how can you recognize cars? It sounds like they learnt a lot trying to make that function work, so hopefully a lot of the knowledge from building that system generalizes/translates over to the rest.

Besides that, modularity is an important design principle. It would be interesting to see how people combine different NN modules and integrate them with 1.0 code. Do you have an NN 2.0 controller? Some kind of self-learning system that you train? I would imagine you'd want to take feedback into account at some point, probably in a Bayesian way.

meken
Interesting thoughts about Software 2.0 IDEs at the end. A desirable property of such a tool (I think) is having a short feedback loop. However, it seems to me that common changes would include re-labeling a significant portion of your data set (e.g. because you realize you need a new class that you didn't think of when you began labeling), and retraining the model. It seems like transfer learning/fine-tuning can alleviate the latter problem some.

Also an interesting take on how complexity has shifted from architecture selection to labeling data. It makes me wonder if there won't be a "Software 3.0" where most of the complexity shifts from creating a good labeling schema to, say, deciding on a good evaluation metric (I think the buck stops here as I can't imagine an AI silver-bullet automatically determining the evaluation metric). Perhaps unsupervised learning will come to the rescue and free us from the complexities of label schema design.
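
On the fine-tuning point above, a rough sketch of what that looks like in practice: reuse the trained backbone and retrain only a new head after the label schema changes, rather than starting from scratch. This uses torchvision's ResNet-18 as a stand-in backbone; the class count and the commented training loop are illustrative.

    import torch
    import torch.nn as nn
    from torchvision import models

    NUM_CLASSES = 11                          # e.g. the old 10 classes plus a new one

    model = models.resnet18(pretrained=True)  # reuse the already-trained backbone
    for param in model.parameters():
        param.requires_grad = False           # freeze everything...

    model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # ...except a new head

    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    # Training loop over the relabeled dataset (dataloader omitted):
    # for images, labels in dataloader:
    #     optimizer.zero_grad()
    #     loss = criterion(model(images), labels)
    #     loss.backward()
    #     optimizer.step()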

FLUX-YOU
>deciding on a good evaluation metric

Have two or more labeling teams labeling the same stuff so you can reach a consensus or flag the differences and review and figure out why there was a difference.

Humans will be doing this for a while, I think it's worth having large companies (as large as Goog/Amzn/MS/FB) dedicated to the task.
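
As a rough sketch of that workflow (the labels and item IDs below are made up): compare two teams' labels for the same items, flag disagreements for review, and track a chance-corrected agreement score.

    from sklearn.metrics import cohen_kappa_score

    item_ids = ["img_001", "img_002", "img_003", "img_004", "img_005"]
    team_a   = ["car",     "car",     "trolley", "truck",   "car"]
    team_b   = ["car",     "trolley", "trolley", "truck",   "truck"]

    # Items the two teams disagree on go back for review.
    disagreements = [i for i, a, b in zip(item_ids, team_a, team_b) if a != b]
    print("items needing review:", disagreements)               # img_002, img_005

    # Cohen's kappa corrects raw agreement for agreement expected by chance.
    print("inter-annotator agreement (kappa):", cohen_kappa_score(team_a, team_b))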

acoye
This reminds me of a relevant XKCD, https://xkcd.com/1838/

Given the technique shown for building a state-of-the-art neural net, I wonder what QA will look like and whether we will be able to reach a sufficiently low probability of failure.

5-sigma reliability will be necessary, at least in some fields (like autonomous driving), for humans to accept relying on it.

mst
I. um. 52 "required" cookies and video only.

What the hell? 52? Really?

tzahola
They need training examples for their neural nets. ¯\_(ツ)_/¯
imranq
I see a lot of ML startups on a daily basis, and most of this rings true. Software 1.0 is for accomplishing tasks by going from need -> logic -> solution.

Software 2.0 is translating human intuition into machine code directly through advances in machine learning. How well someone’s dataset has been labeled will determine how well a “software 2.0” program will work, since that’s where the human intuition lies.

w_t_payne
'labelling is an iterative process' -- I learned that the hard way in 2008.

Fortunately, I now have a fairly refined method for managing data, labels etc...

meken
I'm glad you found a workflow that works for you. I'd be interested in hearing about what it looks like.
w_t_payne
Rigorous configuration management of data and metadata mostly. (Primary records kept as text in a version control system, with a copy loaded into a DB for searching).
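
A minimal sketch of that kind of setup, with made-up file and field names: label records live as one-JSON-per-line text files under version control, and get loaded into SQLite whenever you need to search them.

    import json
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE labels (image TEXT, label TEXT, labeler TEXT)")

    # labels.jsonl is the version-controlled primary record, one JSON object per line.
    with open("labels.jsonl") as f:
        for line in f:
            rec = json.loads(line)
            conn.execute("INSERT INTO labels VALUES (?, ?, ?)",
                         (rec["image"], rec["label"], rec["labeler"]))

    # The same records are now searchable with ordinary SQL.
    rows = conn.execute("SELECT image FROM labels WHERE label = 'trolley'").fetchall()
    print(rows)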
victorai
Machine Learning might not be the answer: https://mobile.nytimes.com/2018/06/01/business/dealbook/revi...
msoad
If you think about AV systems, they're a bunch of input sensors and literally two output numbers (steering and acceleration). This makes them a good candidate for a big black-box deep learning system.

Google tried end-to-end deep learning AV systems and failed for exactly the reasons he went through at the end of his talk.

cicero19
Great talk. Excited to watch what Andrej achieves in his career. Keep up the great work!
rich-w-big-ego
As usual the comments around Tesla software really miss some critical points:

  You can't debug a neural net
Obviously not, but you also cannot hand-write code to do what neural nets do, so what's your point? If you can make your neural net 1000x better than a hand-written algorithm, or if Tesla Autopilot is 1000x better than human drivers, that doesn't matter. It's not "playing with human lives" if the humans around the car are 1000x safer than sleepy, distracted, or violent human drivers.
icebraining
How will we know that the self-driving NN is 1000x better than hand-written code or human drivers before we unleash it en masse onto the world?
chronolitus
Test it with a human ready to intervene at any time. (Keeping this human attentive is another problem - see the Uber accident.)
dboreham
Kool aid being drunk...
rich-w-big-ego
I'm drinking Kool Aid? OK. I'm mostly arguing about the use of Static Analysis and other typical means of verifying programs, and the ineffectiveness of those verification methods on neural networks. People use that as an argument against the safety of NNs.

The set of all computer programs has two subsets.

  S = { s | Static Analysis can be performed on s }

  N = { n | n makes use of a neural network } with N ⊄ S
Let n ∈ N and s ∈ S. There is a certain set of programming tasks

  T = { t | t can be solved with n but not s }
Thus any claim that using n to solve t is "unsafe" because you cannot perform Static Analysis on it is absolute BS, because programs s ∈ S can't even solve the damn problem!
seanhunter
Just using set notation doesn't make an argument precise. You are begging some questions in your definition here.

For example, how do we know that a program solves a particular problem if we can't perform static analysis on it? It may give the appearance of working and then degrade radically under certain conditions. That really matters if you're using it for safety-critical applications and the problem space is large enough that it can't be exhaustively tested.

bsaul
It's a very interesting video, yet am I the only one who thinks the whole presentation looks extremely naïve and light for something that's playing with our lives?

I mean, there are definitely huge shortcomings in the "let's have the computer build the model from examples" approach, and some of them are talked about in the video, others are not:

- rare events are hard to train for (that's talked about). The problem is that it's a long tail of unusual events.

- models generated can't be statically analyzed. You can't predict what's going to work and what's not. You can only hope. One very striking recent example is in this video: https://youtu.be/w2BWmSBog_0?t=220 . Here you can see that an AI trained on the AlphaZero model managed to reach a 3223 Elo rating (so, far beyond human), yet it blundered its queen. And that's just chess, where all the rules are written in advance.

- Models don't build human knowledge. That's more of a philosophical point, but imagine a perfect AI built on neural networks after having read all human knowledge. What can it teach us? AlphaZero chess isn't able to provide any clue or explanation on why it favors one move instead of another. You can only learn by playing against it, but that's all. Not even the developers can tell you what advances in chess theory AlphaZero has made.

wuschel
I share your worries, as I too think that the abstractions used in the trained models in the talk are inherently leaky, meaning that they do not represent reality to a sufficiently high degree (see corn flakes vs ketchup screen wiping action). As we do not (yet?) understand the underlying nature of those "2.0-ish" decision algorithms, we will not be able to make a judgement call on when they will work, and when they will fail.
rich-w-big-ego
The main point you offer is that models can't be statically analysed and will do unexpected things that result in the loss of human life. Here's my response:

- Humans also do unexpected things, like stepping on the gas instead of the brake. If Autopilot does unexpected things at 0.01% the rate that humans do unexpected things, then it is a huge safety bonus to use autopilot.

- How can we solve image recognition any other way? We must move the needle forward on our technology. If we do not struggle against the adverse side-effects of our software and make it better, we will never advance it and we will be stuck requiring human drivers for all driving tasks.

chedine
>Humans also do unexpected things, like stepping on the gas instead of the brake. If Autopilot does unexpected things at 0.01

But the difference is, one human doing the wrong thing does not mean every human will do the same thing, given the same scenario, whereas software running on all cars will do exactly the same wrong thing given the same input. Sure, there is also an advantage (arguably), as fixing it once fixes all cars, but that has its own problems in rolling out the update.

bsaul
You've got a point; the only problem is that we know when humans are more likely to make errors: stress, night time, long hours, high traffic, etc.

With AI we could enter a world where you'll have a 0.0001% chance of having an accident, but it could happen anywhere, anytime, in any random situation (such as a specially shaped cloud in the sky).

This is what makes it unacceptable, IMHO.

TimTheTinker
> If Autopilot does unexpected things at 0.01% the rate that humans do unexpected things, then it is a huge safety bonus to use autopilot.

You’re forgetting a variable - the frequency or likelihood of a situation occurring. Go far enough down the long tail and neural AI can get far more deadly than human drivers.

rich-w-big-ego
It's unclear what your argument is.
TimTheTinker
I’ll try to clarify. In my opinion, the weakness of neural systems is their inability to deal with input for which they have comparatively little or no training. There’s no way around that, except by introducing structures or systems outside the neural nets themselves that provide logical frameworks for dealing with rare events. (And I don’t just mean a bunch of logic in code. Expert systems are one potential approach, for example.)

Your point was that modern autonomous driving systems can drastically reduce fatalities compared to human drivers, and I agree, but only for circumstances for which the car’s neural systems have been well-trained. But the systems ought to be able to handle ~99.99% of the types of circumstances gracefully before most of us will trust them to safely drive us around.

zzzcpan
I didn't watch the video, only the slides. But I got the impression that with this approach it would be pretty much impossible to create safe and reliable self-driving technology. At some point chasing a long tail of unusual events will simply become too costly to continue, but the technology still won't be safe enough to use in the wild.
FLUX-YOU
Self-driving will always be held back by municipalities doing weird things with their traffic lanes and lines (and cones and hazards and signal trucks and road construction) as long as the AI puts as much faith in the lines as it does. And you'll never track down everything because construction is constant in the world, with the possibility of introducing a novel road-weirdness.

It's a road system designed for humans, and with current tech, you'll always have issues chasing the long tail down. Though I'm fairly sure self-driving is still statistically better than humans behind the wheel, especially distracted humans as the information age has made us.

The more immediate solution is white-listing safe roads that have sane paint lines and are relatively straightforward. That way, the chance of the AI getting it wrong is drastically reduced. This covers most highways, where you get the most out of self-driving systems anyway. Bonus points if you can automate this decision-making with recent satellite imagery.

icc97
> as long as the AI puts as much faith in the lines as it does.

The need for lines is only a temporary crutch. AI systems will continually improve until the white lines are guidelines only. There are lots of other factors in an image that can be used to deduce where the lines should be.

But there will have to be handling of uncertainty, which humans have to do too; simply slowing down (not sharp braking) will help in most cases.

phaedrus
Maybe the responsibility (and liability) for accidents due to weirdly designed or badly marked lanes would have to fall upon the municipality. You could make an analogy to trains: if a local government commissioned a rail line that included a junction that can't be navigated without derailing 1% of the time, you would not blame the trains for not being able to deal with it.

Another analogy could be made to data entry: you could allow free-form text input and try to process all of the long tail of unexpected inputs, or else set out a format that the data has to follow and enforce validation at the input.

mrfusion
What is the best practice for handling infrequent data points like the blue stoplights he mentions?
sjg007
Grayscale? Dunno, maybe if they are identical to some other stop light in meaning but have different colors, you would include them there so hopefully the NN figures out blue or green (or something). You might be able to copy-paste the traffic light onto other images as well.
isaac_burbank
Two that come to mind are:

- Using data augmentation to turn the smaller number of examples into enough samples for appropriate representation within the dataset.

- Add a weighting coefficient to the model's cost function to make misclassifying these examples more expensive.

Note: you can do serious harm to your model with either of these approaches if you don't know what you're doing. The safest solution is to collect more examples of the infrequent class.
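
As an illustration of the second approach above (up-weighting the rare class in the loss), here is a small PyTorch sketch; the class indices and weights are made up.

    import torch
    import torch.nn as nn

    # Suppose class 3 is the rare "blue stoplight" class: misclassifying it costs
    # 10x as much as misclassifying any of the common classes.
    class_weights = torch.tensor([1.0, 1.0, 1.0, 10.0])
    criterion = nn.CrossEntropyLoss(weight=class_weights)

    logits = torch.randn(8, 4)            # a batch of 8 predictions over 4 classes
    targets = torch.randint(0, 4, (8,))   # ground-truth class indices
    loss = criterion(logits, targets)
    print(loss.item())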

iosDrone
Such a shame to see someone so talented working for a company that is lightyears behind Waymo and GM in autonomous driving and that is going to go bankrupt.

I believe the last three directors of autopilot have quit in the last 3 years or so. And that's in addition to the mass exodus of executives, some of whom left millions of dollars of stock options on the table.

nicodjimenez
hahahahaha
justicezyx
Not sure what you mean.

Talent is pricier in places that lack it. Waymo can tap into Google's vast talent pool; this talented person would be worth less at Waymo, for sure.

outside1234
Saw this at Spark+AI - highly recommended
Animats
Be afraid. Be very afraid.

If this guy was working on adtech, that would be fine. That's very error-tolerant. But this guy is working on automatic driving.

The basic mindset here is to run image classifiers to classify the objects in an image, then use the classifier output to decide what to do. There's no geometric analysis. That's scary. Classifiers just aren't that good. See the earlier article today about adversarial attacks on classifiers. Classifiers pick obscure details of images and use them to make decisions. Nobody seems to know yet how to prevent that. This problem shrinks with larger data sets, where hopefully the irrelevant details cancel out as noise, but, as the speaker points out, that breaks down when you have few training cases of certain situations.

The Google/Waymo approach is to get a point cloud with LIDAR and radar, profile the terrain and obstacles, and figure out where it's physically possible to go. That's geometry based. In parallel, a classifier system is trying to tag objects in the scene, which feeds into a system which tries to predict what other road users are going to do.

With that approach, a classifier result of "not identified" is fine. The system will detect and avoid it, or stop for it, and make conservative assumptions about its expected behavior. Chris Urmson, in his SXSW talk, showed video of a woman in a powered wheelchair chasing a turkey with a broom. This was not identified by the classifier, but it was clearly an obstruction, so the vehicle stopped for it. That's essential here. It has to do something safe with unidentified or mis-identified objects.

At Tesla, Musk insisted that this could be done with a camera alone because humans can drive on vision alone.[1] So Tesla has people trying to make camera-only driving work. Not very successfully so far.

"November or December of this year (2017), we should be able to go from a parking lot in California to a parking lot in New York, no controls touched at any point during the entire journey." - Musk, in April 2017. This guy is saying what Musk wants to hear.

[1] https://blog.ted.com/what-will-the-future-look-like-elon-mus...

Fricken
I'm to presume that they aren't trying to do SLAM purely with classifiers, though with the evidence of Autopilot's performance thus far, I wouldn't put it past them. That aspect definitely isn't in Karpathy's wheelhouse.

There's a great presentation from Gabe Sibley (now with Cruise) on camera-based SLAM, or "Mobile Robot Perception for Long Term Perception in Novel Environments", that explains the underlying principles quite well.

Gabe concludes at the end that probabilistic perception and modelling using cameras isn't reliable enough and that lidar is likely necessary for safety-critical systems such as autonomous vehicles:

https://vimeo.com/88273779

unityByFreedom
This talk sounds like an intro to machine learning for images. He seems surprised (@ 20:00) that building the dataset takes more of your time in the real world than it did in academia.

If you can't detect rain with reliability, you're not ready for full self-driving.

ckastner
> This guy is saying what Musk wants to hear.

Apparently Musk heard it (or something similar), because Tesla is rolling out the first self-driving features in August: https://news.ycombinator.com/item?id=17282006

chronic288
> Apparently Musk heard it (or something similar), because Tesla is rolling out the first self-driving features in August

Self-driving features? Sorry, there's no such thing as "some self-driving features." You either have full self-driving or you don't. If you don't (which Tesla doesn't), how the hell can you call it self-driving?

inteleng
No true Scotsman. Classic troll comment.
CYHollander
Your comment appears to contradict itself. Using the term "full self-driving" implies that you'd recognize something short of that as "partial self-driving". Such a system would presumably have some, but not all the features of a "full self-driving" car.

In any case, it's quite easy to imagine a plausible meaning for "some self-driving features": perhaps these features enable the car to drive itself [without oversight] in some, but not all situations (e.g. on highways, but not in cities).

agumonkey
Happy to see Waymo's physically sound approach; who else is doing that?

Minute point of irony: Musk seems to act more and more like geohot, whom he criticized.

Ryudas
In April 2017 Elon was planning to have 20 or 10k Model 3's per week by now. That was not the case, as we all know. Maybe we should be receptive to the change of priorities that occurs when a massive problem like that happens. The fate of Tesla as a company is very much tied to Model 3 production, and just saying that this demonstration has had a massive delay, in a vacuum, without the context of the situation, is disingenuous.

Also I would like to put forth the question of who here drives every day. The human body does not, as far as I'm aware, have a volumetric sensor such as lidar. I'm not saying that the use or not of that information isn't important, but we certainly can drive just fine. It's shortsighted that we're dictating the tools we should use to solve a problem before the fact. I shudder to think of the day where I reject an implementation of a system based on the tools used, and not actual performance. To say an approach is wrong because it doesn't use geometric analysis is shortsighted; to say classifiers aren't that good is shortsighted.
jacobush
But the human has lots of hardware built in for adjusting aperture, edge detection, spectral imaging. That's just scraping the surface. Its standard configuration is stereo, giving it substantial redundancy as well as automatic redundancy. It also has helper hardware to adjust for vantage point - it can even shift spatially in 3 dimensions, up to a meter in any direction, to find new vantage points, feeding all sorts of data (including mechanical linkage positioning and angular data) into the integrating system, giving a deluge of data apt for world building.

Edit: not to mention state-of-the-art sensor fusion, with atmospheric composition analyzer, thermometer, multiple high resolution vibration sensors, stereo microphones and an advanced vehicle to vehicle signalling system backed by heuristic fallbacks.

One or two fixed position monochrome, low resolution forward facing cameras such as in a Tesla just won't compare.

I'm not saying it can't be done, but the Tesla engineers have a much harder task than if they'd had human-level sensors. The Tesla "brain" is more like a human looking out of a tank through a periscope or something... and the periscope can't even be turned.

tzahola
>One or two fixed position monochrome, low resolution forward facing cameras such as in a Tesla just won't compare.

I'm pretty sure humans could cope with that. https://www.youtube.com/watch?v=-CITIXlw_T4

Animats
I'm surprised that Tesla isn't doing stereo or trinocular (3 cameras) to get depth. It's just cameras, and it works reasonably well. You can use cameras the width of the windshield apart to get a wide baseline, which increases the useful range. But no. Although Tesla does have multiple cameras, they never mention stereo vision. Mobileye is depth from motion, and apparently, so is Tesla's in-house system.

(3-camera depth is more reliable than 2-camera. Many of the ambiguous situations for two cameras can be resolved with three. Especially if it's 3 cameras in a triangle, not a line.)
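
For reference, the two-camera version of what's being described is only a few lines with OpenCV's block matcher; the image paths below are placeholders and the pair is assumed to be rectified.

    import cv2

    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # placeholder image paths
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = stereo.compute(left, right)   # larger disparity = closer object

    # Depth then follows from the camera geometry:
    #   depth = focal_length_px * baseline_m / disparity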

unityByFreedom
> Mobileye is depth from motion, and apparently, so is Tesla's in-house system.

Monocular depth perception w/out motion is a thing too, e.g. [1], however I doubt it is good enough for safety-critical systems like self-driving cars.

[1] https://github.com/mrharicot/monodepth

HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.