Hacker News Comments on
Translucent Databases

Peter Wayner · 6 HN comments
HN Books has aggregated all Hacker News stories and comments that mention "Translucent Databases" by Peter Wayner.

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this book.
The great irony is that actual privacy requires unique identifiers, like RealID or equiv.

GUIDs unlock the Translucent Databases achievement, actual per field encryption of PII data at rest. TLDR, clever applications of salting and hashing, just like with proper password storage. https://www.amazon.com/Translucent-Databases-Peter-Wayner/dp... http://wayner.org/node/46

I was utterly against RealID, until I figured this out. Much chagrin. Super embarrassing.

Source: Worked on both electronic medical records and protecting voter privacy. Did a translucent database POC for medical records, back in the day.

If there's another technical solution, I haven't found it.

But I think to your point, people generally don't want the sensitive data being collected in the first place. I don't have an answer for that.

Two tangential "yes and" points:

1)

I'm not smart enough to understand differential privacy.

So my noob mental model is: Fuzz the data to create hash collisions. Differential privacy's heuristics guide the effort. Like how much source data and how much fuzz you need to get X% certainty of "privacy". Meaning the likelihood someone could reverse the hash to recover the source identity.

BUT: This is entirely moot if the original (now fuzzed) data set can be correlated with another data set.
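
For comparison with the mental model above, here is a minimal sketch of the textbook Laplace mechanism, which adds calibrated noise to a query result rather than to hashes. The data, the predicate, and the epsilon value are all made up for illustration; this is not a production-grade implementation.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(records, predicate, epsilon, rng):
    # A counting query has sensitivity 1: adding or removing one person
    # changes the true count by at most 1, so the noise scale is 1 / epsilon.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

def over_60(age):
    return age > 60

rng = random.Random(0)               # seeded only to make the sketch repeatable
ages = [23, 35, 41, 58, 62, 67, 71]  # made-up data; the true count over 60 is 3

# Each individual answer is noisy; only the average over many answers
# drifts back toward the true count.
answers = [noisy_count(ages, over_60, epsilon=0.5, rng=rng) for _ in range(2000)]
mean_answer = sum(answers) / len(answers)
```

A smaller epsilon means more noise per answer. Nothing here defends against joining the released answers with a second data set.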

2)

All PII should be encrypted at rest, at the field level.

I really wish Wayner's Translucent Databases was better known. TLDR: Wayner shows clever ways of using salt+hash to protect identity. Just like how properly protected password files should be salt+hash protected.

Again, entirely moot if protected data is correlated with another data set.

http://wayner.org/node/46

https://www.amazon.com/Translucent-Databases-Peter-Wayner/dp...
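
A minimal sketch of the salt+hash idea, assuming a keyed hash (HMAC-SHA256) stands in for the one-way function. This is not code from the book; `SITE_SALT`, `blind`, `store`, and `lookup` are hypothetical names, and the identifier format is invented.

```python
import hashlib
import hmac

# Assumption: a secret salt held by the application, never stored in the table.
SITE_SALT = b"hypothetical-site-secret"

def blind(identifier: str, salt: bytes = SITE_SALT) -> str:
    """One-way, keyed hash of an identifier; the same input always maps to the same key."""
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(salt, normalized, hashlib.sha256).hexdigest()

# The "database": blinded identifier -> payload. The identifier itself is never stored.
records = {}

def store(identifier: str, payload: dict) -> None:
    records[blind(identifier)] = payload

def lookup(identifier: str):
    return records.get(blind(identifier))

store("patient-12345", {"allergy": "penicillin"})
found = lookup("patient-12345")    # the holder of the identifier can find the row
missing = lookup("patient-99999")  # anyone else gets nothing
```

Someone who dumps `records` sees only hex digests as keys. In a real design the payload fields would be encrypted too, which needs a cipher outside the Python standard library.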

Bonus point 3)

The privacy "fix" is to extend property rights to all personal data.

My data is me. I own it. If someone's using my data, for any reason, I want my cut.

Pay me.

All PII must be encrypted at all times. At the field level.

Translucent Databases explains how.

https://www.amazon.com/Translucent-Databases-Peter-Wayner/dp...

http://wayner.org/node/46

Source: Was once an insider. Created and ran electronic medical record exchanges.

re: IRMA

I've been thinking about negotiated disclosure since the mid 90s. Back then we called it faceted personas. In an effort to protect oneself from aggregators of demographic data.

I've gotten nowhere.

TLDR: 99% certain deanonymization will always prevail.

Not saying I'm right. I'm not particularly smart or insightful. I just try to apply ideas foraged from academia to real world problems. Alas, the times I've slogged thru the maths and algos, I'm always left befuddled. I'm just not clever enough to figure out all the attack vectors. (I'd make a terrible criminal.)

--

re: Privacy by Design

That means Translucent Databases. Where all data at rest is encrypted. Just like you salt and hash password files.

This book details clever applications of that strategy to real world problems:

https://www.amazon.com/Translucent-Databases-Peter-Wayner/dp...

Mea culpa: I'm still unclear how GDPR's tokenization of PII in transit works in practice. Anyone have some sample code? And I still don't see how it protects data at rest.
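
For what it's worth, here is a minimal sketch of vault-style tokenization in general. It is not a statement of what GDPR requires, and `TokenVault` is a hypothetical name, not a real library.

```python
import secrets

class TokenVault:
    """Swaps sensitive values for random tokens and keeps the mapping private."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(16)  # random, so it reveals nothing about the value
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        return self._token_to_value[token]

vault = TokenVault()

# Only the token travels in the message; the name (made up) stays in the vault.
message = {"patient": vault.tokenize("Jane Q. Example"), "result": "negative"}
```

The message in transit carries no PII, but the vault itself still holds the plaintext mapping, so this sketch does nothing for data at rest inside the vault.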

--

Source: Designed, implemented, and supported some of the first electronic medical records exchanges (BHIX, NYCLIX, others). Worked on election integrity for a decade, including protecting voter privacy (secret ballot).

--

Prediction: De-anon will always win in the long run. Once we accept that, we'll eventually also accept that privacy has a half-life. To adjust, we'll adapt differential privacy algos to become temporal privacy.

This rule clarification is good in that it acknowledges the participation of third parties. Yay!

But it doesn't change the fact that HIPAA is just kabuki (for show).

I worked on some of the first RHIOs (regional health information exchanges) on the market. We all had yearly HIPAA training. All platitudes and very little actionable advice. As devs, we all had full access to millions of patients.

Accidental disclosure is inevitable. So many participants, so many systems, the weakest link and all that. We all figured it was a matter of time before something bad happened.

I care about privacy. A lot. I researched what's what, legal and technical. Because I want to do a good job. And I have skin in the game (my own medical history).

The month I started on the electronic medical records project, a local hospital had just settled for allowing 100,000s of complete patient records to leak. (A stolen laptop.) So I contacted the lawyers on both sides. Verdict? Try harder next time.

Pretty much nothing has changed (improved) since. Except the disclosure requirements, I guess.

This is a long topic, so I'll just skip to the conclusion:

We will not, cannot protect patient privacy until we assign a universal unique identifier for every single person. This means something akin to RealID.

To protect patient privacy, we need to encrypt the data. But that's not feasible without globally unique identifiers. Because patient demographic data is dirty and a mismatched record can be fatal. So you have matching algorithms that have to look at the original plaintext. And the heuristics are wrong enough that the process requires human oversight.

If we (the USA) had unique identifiers, then we could transition to translucent database designs. That'd be very cool.

http://www.amazon.com/Translucent-Databases-Peter-Wayner/dp/...
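
A minimal sketch of why dirty demographics force matching on plaintext, and why a shared unique identifier would not. The names, dates, key, and identifier format are all invented, and `difflib` stands in for a real record-matching algorithm.

```python
import hashlib
import hmac
from difflib import SequenceMatcher

# Assumption: a key agreed between the two systems exchanging records.
SHARED_KEY = b"hypothetical-exchange-key"

def blind(value: str) -> str:
    """Keyed one-way hash: equal inputs give equal digests, and nothing else survives."""
    return hmac.new(SHARED_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

# The same (made-up) person as typed into two different systems.
hospital_demo = "Jonathan Smith|1970-01-02"
clinic_demo = "Jonathon Smith|1970-01-02"

# On plaintext, a fuzzy matcher can flag the pair for human review.
plaintext_similarity = SequenceMatcher(None, hospital_demo, clinic_demo).ratio()

# Once hashed, a one-letter difference yields unrelated digests, so no fuzzy match is possible.
blinded_demo_match = blind(hospital_demo) == blind(clinic_demo)

# With a shared unique identifier (hypothetical format), exact matching on digests suffices.
hospital_uid = "UID-000123"
clinic_uid = "UID-000123"
blinded_uid_match = blind(hospital_uid) == blind(clinic_uid)
```

The typo leaves the plaintext records nearly identical but makes their digests unrelated, while the identifier matches exactly in blinded form.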

About once a year, I go to a "future of healthcare IT" event. I desperately want to hear that patient privacy is being addressed. Hope springs eternal. Mostly, no one knows what I'm talking about. Until you've worked on the systems and tried to actually implement privacy safeguards, people just don't grok the problem domain, and continue to believe it's a trivially solvable problem.

jaypl
What is this "future of healthcare IT" event? If it's open to the public, then it sounds like something I might be interested in attending.
specialist
Most of the recent events have been mixer events sponsored by body shops (recruiters) hoping to get some business. Put "HL7" and "ICD-10" in your resume and I'm sure you'll get called.

The last "meaty" one I attended was a local MIT Enterprise Forum featuring local healthcare IT professionals. The panel had a device startup, a personal healthcare portal, some consulting goons, and the CIO for a local HMO (she was the only one who made any sense).

I just attended a legislative action committee meeting for my state. Our elected reps touched on what was happening at the state level to implement ACA (aka Obamacare, mostly "meaningful use" stuff and patient eligibility, really basic stuff).

Every state now has a board of some sort for implementing their state and regional healthcare information exchanges per ACA. The meetings are public. That's probably a good way to find your local players.

It's been a few years since I lurked healthcare IT blogs. There might be some good ones to follow.

askingaQ123
I can't find any contact info for you; do you mind posting your email/website? I'd love to chat about your experience in health IT! Thanks. (Not a recruiter…)
specialist
Updated my profile.
newman314
There is no dependency of having unique identifiers in order to be able to encrypt data.

A patient could have multiple identifiers that are only known to him/her.

Think a model like 1Password.
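
One possible reading of that idea, sketched under assumptions: the patient holds a single secret and derives a separate identifier for each provider. `derive_identifier` and the provider names are hypothetical, and this is not how 1Password itself works internally.

```python
import hashlib
import hmac
import secrets

def derive_identifier(patient_secret: bytes, context: str) -> str:
    """Derive a stable, provider-specific identifier from a secret only the patient holds."""
    return hmac.new(patient_secret, context.encode("utf-8"), hashlib.sha256).hexdigest()

# Known only to the patient (for example, kept in a password manager).
patient_secret = secrets.token_bytes(32)

clinic_id = derive_identifier(patient_secret, "clinic.example")
lab_id = derive_identifier(patient_secret, "lab.example")

# The patient can always re-derive the same identifier for the same provider.
clinic_id_again = derive_identifier(patient_secret, "clinic.example")
```

Without the patient's secret, the clinic and the lab have no way to tell that the two identifiers belong to the same person.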

specialist
Not for encryption, for data interchange.
In practical terms, just use HTTPS with rooted certs. It's not expensive for basic usage. And, if you're just doing a for-fun project you can always use self-signed certs.

If you want to go deeper down the rabbit hole check out SRP: http://srp.stanford.edu/

In terms of dealing more securely with data on your server, check out the book, Translucent Databases ( http://www.amazon.com/Translucent-Databases-Peter-Wayner/dp/... )

tptacek
Stay away from SRP. Browsers don't support it natively, and there are (so far as I can tell) no peer-reviewed libraries for it for the major web stacks. SRP is easy to get wrong.
johnm
I did say to use SSL in practice, right?

Reading about SRP would help solve much of the confusion people seem to be having in many of the discussions going on in this thread.

HN Books is an independent project and is not operated by Y Combinator or Amazon.com.