Hacker News Comments on
Haunted by Data - Maciej Ceglowski

O'Reilly · YouTube · 11 HN points · 3 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention O'Reilly's video "Haunted by Data - Maciej Ceglowski".
YouTube Summary
In his Strata+Hadoop keynote, Pinboard.in founder Maciej Ceglowski draws a parallel between the data industry and the troubled nuclear energy industry.

Watch more from Strata + Hadoop NYC 2015: https://goo.gl/UunGPH
Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
Sep 23, 2020 · 1 points, 0 comments · submitted by kevlar1818
Aug 08, 2019 · 1 points, 0 comments · submitted by notduncansmith
https://youtu.be/GAXLHM-1Psk?t=945 - I think this commentary by Maciej Ceglowski rings true here.
nv-vn
Great vid! Thanks for sharing
timonovici
Ah, it comes a bit later, at 19:00 -

"At that point people who are angry, mistrustful, and may not understand a thing about computers will regulate your industry into the ground. You'll be left like those poor saps who work in the nuclear plants, who have to fill out a form in triplicate anytime they want to sharpen a pencil."

I love how some of the tech industry is beginning to see data as a liability rather than an asset. It dramatically reduces governments' capacity for mass surveillance, for two reasons:

1. If companies only collect what they need (to reduce their liability), governments can't demand more than that (or even hack in to get the data illegally).

2. If the industry culture is to limit data collection, governments can't just say, "Well every company does it, so why can't we."

There's a wonderful talk, Haunted By Data, that covers a lot of the societal downsides of treating data as an asset. Highly encourage watching/reading.

Text: http://idlewords.com/talks/haunted_by_data.htm

Video: https://www.youtube.com/watch?v=GAXLHM-1Psk

baxtr
Just to be contrarian: I’m not so sure if that’s a good thing... Data can be used for great things, e.g. longitudinal data in healthcare. I think seeing data as a liability might reduce the speed of progress
rhizome
> Data can be used for great things, e.g. longitudinal data in healthcare. I think seeing data as a liability might reduce the speed of progress

I think the word "can" is acting as a euphemism or term of elision here, since "longitudinal data in healthcare" includes things like the Tuskegee Syphilis Experiment.

diafygi
In the talk, there's a parallel drawn between the nuclear industry 60 years ago and big data now, where nuclear was originally touted as a miracle cure for everything, then disasters happened, then it never really got over the stigma despite its huge potential.

Society decided, for now, the upsides aren't worth the downsides.

Oil is another parallel where society is currently just at the point where we are starting to not value the upsides over the downsides.

Scooty
Seems like oil and nuclear energy are different because both can be replaced with alternative sources of energy. What are the alternative sources of user data?
r00fus
Why is data a requirement to keep?

Currently has value with externalized downsides.

tankerslay
Paper records, human memory....

Comparing growth in data storage versus energy usage per capita is interesting.

Even if you look back to the founding of the U.S., the change in energy use per person is actually only a few fold, definitely less than an order of magnitude.

Harder to compare quantity of data storage but the change would seem much larger. How much data is there, per U.S. person?

daxorid
> governments can't just say, "Well every company does it, so why can't we."

Odd. That's precisely what's happening with GDPR. While the EU governments are demanding that private companies limit collection and use of data, the intelligence arms of these same countries (I'm looking at you, GCHQ) and their FVEY partners are doing everything they can to hoover up and store every single bit of data on the world population, including their own citizens.

And somehow, we think this is fine. An entity with the power to disappear you and render you to black sites can have all the data on you they wish. But Facebook determining that there is an 83.7% chance you are stressed and serving up an ad for vacation rentals is completely verboten.

It's absolutely mad.

giancarlostoro
Wonder what will happen if the US passes laws forcing companies to retain data for a minimum of a year, for the sake of digital forensic integrity in cybercrime cases. On the other hand, maybe GDPR is good for pushing VPN services to be much more transparent.
amelius
I suspect there will simply be a direct pipe for logging to the government, where companies don't need to retain anything.
giancarlostoro
GDPR requires you to make that kind of information available.
dahauns
If I'm not mistaken, GDPR has the usual exemptions in place for that kind of usage (national security etc.), hasn't it?
JumpCrisscross
Do we really believe large Chinese tech companies will be complying with GDPR?
chapium
The thought anyone believes that makes me chuckle, thanks.
confounded
Are there any that do a lot of business in the EU that you’re thinking of? I can think of lots of hardware, but not many traditional consumer Internet companies. Maybe Alibaba for SMEs?
Operyl
Tencent to name one.
13years
However governments often have undisclosed access not even known to the holder of the information.

I doubt GDPR will significantly affect information to which the government wants access

Can I ask the government to delete all information on me?

pjc50
There will undoubtedly be a lawsuit again soon over whether companies are allowed to transfer data out of the EU to US government warrantless requests.
risotto_groupon
I hope it's Greece so we can all be as transparent as possible about the real issue here...

(Democracy)

bleachedsleet
I concur and would like to add that this view of data collection has become common only because of the slew of breaches. Hackers leaking massive customer databases has forced companies to review their collection policies because they don't want to deal with the fallout, technically, politically and otherwise. This is true to the hacker ethos: using radical, often criminal behavior to point out glaring flaws that others have become complacent about. Of course, these days that goal is secondary, if it is considered at all, to the goal of financial gain, and that's a damn shame.
zhte415
Early in my career, which was in finance/banking, my 'Head' (as was the organisational naming structure) sent an email re-forwarding and re-emphasising the data retention policy.

I was inclined to never delete an email rather than comply with 'your inbox should not exceed 500MB'. That was 50 emails a day then; my inbox now exceeds 500MB on a daily basis.

Sure, you can save archive locally or on a shared drive.

But the key idea... we don't want customer data because of liability. If we're storing it for a specific purpose, be that regulatory reporting or clearly defined analytics, fine.

Before the term 'big data', simply 'lots of data' sounded nice.

But don't let it hang around. At some point it will be a liability. You're serving customers. The key part of this is keeping their data safe. Not having it anymore (or having it in a non-accessible, locked-away, auto-deleting repository for legal purposes) is more than good enough; properly storing and managing it is what's better.

Oct 27, 2017 · 4 points, 0 comments · submitted by anon1253
There's a great talk, Haunted by Data, by Maciej Ceglowski about how tech companies are making a mistake by wanting to collect more and more data on their users, because governments are just going to want to come in and take it.

    I want you to go through
    a visualization exercise
    with me. Really imagine
    it.
    
    Nixon's in your datacenter.
    He's got his laptop open.

    He's logged in! He's got
    root! What does he find?

    If you didn't break into a
    cold sweat at the thought,
    congratulations. You are a
    good steward of data.

    But if Tricky Dick in your
    data center scares you,
    then consider what you're
    doing.
Slides: http://idlewords.com/talks/haunted_by_data.htm

Video: https://youtube.com/watch?v=GAXLHM-1Psk

sleepingeights
Horrible analogy, as what did Nixon do with data towards citizens? It'd be more like the FBI, Hoover, CIA, NSA, etc... who have the capacity to bend the data to invent facts to fit some crime and then act on it with force without fear of repercussion/retaliation.

Also, if there were a proponent of this kind of collection, wouldn't it be fine for a company like Google, Facebook, Microsoft, etc... if someone with the position of US President wanted to "sit in the datacenter with an open laptop"? Because then they'd be using data as a currency, which they are already very comfortable and capable of doing to meet their own and the "gov" agendas.

hordeallergy
Both governments and hackers (foreign governments). Just a bigger target all round.
vidarh
But convincing people that their own government can come in with a subpoena is easier than convincing people that their security just isn't likely to be good enough to stop each and every external hacker that tries.

No matter how many examples we get of far better funded companies getting hacked.

ergot
The IPB is slightly tainted by the Snowden disclosures. It's an interesting thought experiment to apply Snowden's revelations to newly implemented surveillance measures by any government. Snowden produced documents which gave us all an intimate understanding of the mechanics and operational details of the NSA & GCHQ. It is clear that the apparatus is already in place for spying and is only a quick click away from being galvanized by broad and sweeping laws which allow such apparatus to operate out in the open.

I think the masses are not scared enough to encrypt their communications and that's why such an apparatus has crept in so brashly and abruptly, sort of a 'surveillance creep'.

The moment the masses are conscious of the fact we are going through our second 'crypto war' is also the moment they might encrypt. Not that crypto is some munition they can use, as is wrongly spouted by the cypherpunks (IMHO), but that crypto can provide viable amounts of privacy for their needs and it doesn't need to be absolute privacy as spouted by the 'go dark' movement. Just enough that I can surf the web without my eyeball hours being monetized or that the pressure cooker I am interested in buying is not a potential tool to be used in a terrorist attack several weeks later.

walrus01
s/nixon/poindexter/

https://www.google.com/search?num=100&client=firefox-b&q=tot...

SixSigma
Let's go the whole hog.

s/nixon/goebbels/

and there's no need to imagine it

http://www.ibmandtheholocaust.com/

_0ffh
Or the story about how the Dutch thought it would be a swell idea to have the religious affiliation of all citizens in their government files. Nowhere else did the rounding up of the Jews go as smoothly as there, once the SS got their paws on those files.
SixSigma
According to the book, the French too. I can't quite remember the story, but the guy in charge of the data managed to delay and confuse, so it was not quite as smooth.
mtgx
If the recently announced Yahoo data breach (which affects a lot of other sites as well if users re-used their passwords, and we know many did) taught us anything, it is that data is a liability, not an asset, and that's how both governments and corporations should treat it. The government at least should've learned that with the OPM hack.
guitarbill
Except that it's only hurting Yahoo because they're trying to sell. Counterexamples include the UK telco TalkTalk, who has managed to increase users and revenue despite the lack of basic security features.

It just doesn't matter that much, because the inconvenience is minimal for the average person, so the backlash is minimal. I mean, most people cannot even be bothered to use different passwords (!). That's how low the bar is. Say something gets hacked, unless you experience identity theft, nothing happens. Banks will reverse any fraudulent charges. Not even a minor inconvenience. So people won't learn and won't care. Brand damage is minimal. Not worth spending on infosec if the maximum fine is less than your CEO earns in a month. Meh.

CaptSpify
Wouldn't that be the ideal place for companies? If a government is dependent on a company to collect data, wouldn't that government support the company in hard times? Sure, if it was a choice between surviving and throwing the company under the bus, the government would choose the latter, but if given the choice, wouldn't the government try to keep one of its most powerful tools?
cryptarch
It works in Russia, China, and less so (perhaps) America (e.g. telcos). It's similarly practical for a government to have unlimited intelligence on its populace.

That doesn't mean it's good for anyone not working for the government, explicitly (as an employee or contractor) or implicitly (as a data collecting company which can be forced to share).

IMO a) collecting data on users and b) doing it in a way that does not preserve user privacy makes you complicit in mass surveillance.

Edit: directly -> explicitly

CaptSpify
> That doesn't mean its good for anyone not working for the government directly or indirectly.

I totally agree. I'd argue that it's objectively bad for anyone not working for the government. But I'm talking about from the company's point of view.

pjc50
And it was "good" for IG Farben to supply Zyklon B to the Nazis. Until they lost and several company executives served prison sentences for crimes against humanity. Amorality certainly pays the bills.
Jan 02, 2016 · 4 points, 1 comments · submitted by timonovici
timonovici
I stumbled across this talk, and it really left an impression, although I'm just a humble web developer.

At the end of the presentation, he talks about sampling and using transient data, rather than storing and mining it. Are there any academic papers that support his statement that a little fresh data is better than a lot of it, with a big chunk being "stale"? What's your experience?

Nov 04, 2015 · 1 points, 0 comments · submitted by cpymchn
HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.