HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
I just want to serve 5 terabytes.

Benjamin Staffin · Youtube · 1130 HN points · 2 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention Benjamin Staffin's video "I just want to serve 5 terabytes.".
Youtube Summary
This video dates back to about 2010, and is the origin of the phrase "I've forgotten how to count that low" that was recently referenced in a blog post and on hacker news. (https://acesounderglass.com/2021/10/20/i-dont-know-how-to-count-that-low/)

Even more related discussion:
https://news.ycombinator.com/item?id=29082014 (2021-11-02)
https://rachelbythebay.com/w/2012/04/06/5tb/
https://rachelbythebay.com/w/2021/10/30/5tb/

Original title: "Broccoli Man (Production Issues)"
Original author: Jon Orwant (https://www.youtube.com/channel/UC0kuyh1g0DOQiaoxb4zIEhg)
HN Theater Rankings
  • Ranked #3 this year (2021)
  • Ranked #13 all time

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
Nov 02, 2021 · 1130 points, 513 comments · submitted by raldi
raldi
Background: https://rachelbythebay.com/w/2021/10/30/5tb/

This video was hugely influential on changing the way Google does internal tools and operations.

inoffensivename
It was hugely influential in identifying the frustration of getting things done at Google. In my experience it's even more true now than it was back then; the number of things you have to deal with has just grown. I've been at Google since 2006 and I feel like I'm losing my mind with all the complexity.
jez
Out of genuine curiosity, what keeps you at Google for 15 years despite perceived increase in complexity to getting things done? I'm wondering whether the answer is like "there's a lot of complexity, but I like the work I do more" or "I like the people more" or some other reason.
inoffensivename
money

lots and lots of money

Dangeranger
I think they call them “golden handcuffs”.
marcyb5st
Googler here.

I think the technical term is Golden cage :)

oblio
Heh, that's <<if>> they want to leave.

If they don't, they call it "comfy job with awesome paycheck and not a lot of pressure" :-p

dTal
(this comment now obsolete)
oblio
There, I fixed it!

https://cheezburger.com/5821507840/his-and-hers-shower-head

dTal
You know, I've made similar remarks on HN before, but you are the first person to actually edit their comment. Amazing. Now I feel compelled to edit mine...
throwawayamz27
This is not true. Google is no longer capable of changing anything internally; it's like a government in that respect: it can make a new process/program (that might on the surface make some things look better), but underneath the old system is still there.
kevin_thibedeau
Well they can launch new semi-independent initiatives under the Alphabet umbrella. They just need to use that to create a web search engine. Something sorely lacking in their current portfolio.
sicromoft
See also the recent discussion here of "I Don’t Know How To Count That Low": https://news.ycombinator.com/item?id=28988281
quelltext
How did things change?
compiler-guy
The most obvious change that came from this video is that Google abandoned the Borgmon readability requirement. At Google, every change needs approval from someone who has passed a detailed style-guide review process in the given language.

Readability is now granted over multiple changes; it used to require one fairly big one. It's still a pain in the languages that require it--which is all the main ones, but very few of the niche ones.

Many other things changed as well. Much of what the video complains about got automated and better documented. But the company has grown so much, and the product lines have diversified so dramatically, that there are still plenty of places to complain about the overhead.

dekhn
Context on borgmon: https://sre.google/sre-book/practical-alerting/

borgmon was a truly weird system.

mikelward
It still is, but it used to be, too.
essnine
an unexpected mitch hedberg line appeared from the tall grass!
StillBored
I've never understood places with rigid style guides policed by people. It's idiotic, because we have computers, and at places like Google presumably a fair number of people know enough basic parsing/lexing that if they can't make a tool like clang-format that automatically reformats on save/commit/whatever, then they can use a tool like clang-tidy to emit warnings during a development/CI/whatever phase.

Putting people in charge of formatting/style is just an excuse for wasting time bikeshedding: either the code is wrong and a tool can tell you, or it's not wrong.

kccqzy
That's not what readability is. There are plenty of automated tools that will give you results from running lint, ClangTidy and other tools. Readability is mostly about structuring your code well so it can be easily read. It's about architecting your code within a single file. It's about telling a junior SWE who reinvented the wheel to use a library function he/she didn't know about instead.
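To make the "reinvented wheel" kind of feedback concrete, here is a minimal C++ sketch; the function names are hypothetical, chosen only for illustration, and are not from any Google codebase:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// The kind of code a reviewer would flag: a hand-rolled linear search.
bool contains_handrolled(const std::vector<std::string>& names,
                         const std::string& target) {
    for (size_t i = 0; i < names.size(); ++i) {
        if (names[i] == target) return true;
    }
    return false;
}

// The suggested fix: the standard library already provides this.
bool contains_idiomatic(const std::vector<std::string>& names,
                        const std::string& target) {
    return std::find(names.begin(), names.end(), target) != names.end();
}
```

Both versions behave identically; the review comment is about knowing that `std::find` exists, which no formatter can teach.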
StillBored
So the rules can be codified sufficiently to test people on, but they can't be codified for a computer?

The only one that sounds more difficult to codify is telling people of the existence of duplicate functions. But as someone who contributes to the linux kernel, I can tell you right now that the only way that works reliably is to have a very large pool of reviewers. Very experienced engineers frequently miss what people are doing in other parts of the source base, the name might not be what they expect, etc, etc, etc. In the case of linux there are a fair number of duplicates, or similar functions, and people write coccinelle patches to replace them on a fairly regular basis after they have been in the kernel for years.

So, I doubt giving someone a formal gatekeeper flag, really helps vs just having wider change review.

anyfoo
> So the rules can be codified sufficiently to test people on, but they can't be codified for a computer?

I mean, given the halting problem alone, and the absence of general AI, yes, I'd say that...

I'm not sure what to tell you otherwise. I've worked at a place with such strong coding style guidelines, and even though I personally did not agree with every individual point, I accepted the tradeoff because being able to count on consistency in a very large code base was extremely helpful.

gravypod
> So the rules can be codified sufficiently to test people on, but they can't be codified for a computer?

Small note: readability isn't a test or quiz you take (asterisk). It's obtained by merging code in the language you want readability for. If you merge code for a language often and the reviewers have very few style-based questions for the code then you will get readability fairly quickly.

> The only one that sounds more difficult to codify is telling people of the existence of duplicate functions. But as someone who contributes to the linux kernel, I can tell you right now that the only way that works reliably is to have a very large pool of reviewers. Very experienced engineers frequently miss what people are doing in other parts of the source base, the name might not be what they expect, etc, etc, etc.

A better example would be knowing when you should use `const std::string&`, `std::string_view` or `char*`. Example: https://abseil.io/tips/1

The best readability advice I have received has been:

1. Direct: "I was confused by X" or "The recommended way to do A is using B", etc.

2. Reasoned: "std::string_view is more efficient and clearer in intention than a char pointer; it also improves type safety, as it is read-only and clear about ownership."

3. Linked: to source material where examples are given (TotW or other examples in the code).
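The `std::string_view` recommendation mentioned above (Abseil's Tip of the Week #1 is linked earlier in the comment) can be sketched minimally, assuming C++17; the function name here is invented for illustration:

```cpp
#include <string>
#include <string_view>

// Taking std::string_view lets callers pass a std::string, a string
// literal, or a char* without forcing a copy; the view is read-only
// and non-owning, which makes the ownership story explicit.
bool has_prefix(std::string_view s, std::string_view prefix) {
    // string_view::substr clamps the count to the view's size,
    // so this is safe even when prefix is longer than s.
    return s.substr(0, prefix.size()) == prefix;
}
```

A `const std::string&` parameter would force callers holding a `char*` to materialize a temporary `std::string`; `string_view` avoids that while keeping the same call-site ergonomics.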

kccqzy
> Very experienced engineers frequently miss what people are doing in other parts of the source base, the name might not be what they expect, etc, etc

Very true. Readability can't help with that, nor is it designed to. It's mostly there to help novices and new hires. Experienced engineers already have readability themselves so they don't need this extra review.

compiler-guy
I know of no automated tools available today that can determine if an identifier is accurately and usefully named. They can all tell if you are using the proper case, but that doesn't really tell you anything.

No tool like that tells you if returning a bool instead of an enum is appropriate here, or that a reference vs a pointer makes more sense given the rest of the code.

I'm sure a clever machine learning algorithm could figure that out with a corpus as large as Google's. Maybe. But no tool like that works today.

And not strangely at all, Google does accept "what clang-tidy does" as the canonical way of formatting text. But readability at Google is far more than just formatting.

Readability is frustrating and annoying, but more than just lint.
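The bool-versus-enum point above is the sort of judgment call a human reviewer makes; a small hedged sketch (all names invented for the example):

```cpp
// Ambiguous at the call site: what does a `true` return mean here?
bool validate_v1(int value) { return value >= 0; }

// Clearer: the enum names each outcome explicitly, and a switch over
// the result lets the compiler warn about unhandled cases.
enum class ValidationResult { kOk, kNegative };

ValidationResult validate_v2(int value) {
    return value >= 0 ? ValidationResult::kOk
                      : ValidationResult::kNegative;
}
```

No linter flags `validate_v1` as wrong; whether the boolean is "appropriate here" depends on how the result is consumed, which is exactly the judgment the review supplies.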

dlubarov
> I know of no automated tools available today that can determine if an identifier is accurately and usefully named.

Naming isn't very language-specific though [*], and poor naming should be flagged in coding interviews. Any Google engineer should generally have good naming, and should be able to critique others' naming in any language.

[*] Granted there are some language-specific naming conventions, like ! and ? in Ruby methods. But I'd say the majority of languages don't have important naming conventions, and those that do often have lints for them.

vkou
> Naming isn't very language-specific though [*], and poor naming should be flagged in coding interviews.

I'm not going to turn a candidate away because they use shorthand names when writing code under time pressure on a whiteboard.

Jensson
This discussion wasn't about hiring but about code reviews. If you use bad names in your code then the reviewer is correct when they tell you to make better names.
compiler-guy
"poor naming should be flagged in coding interviews"

Exactly! And Readability process is exactly the training Google uses to ensure someone knows how to look for this issue, and that someone actually is looking for this issue in code reviews.

There is plenty about the Readability process I don't like (particularly that one has to get it through a fairly artificial process), but checking for things like the above (which I just use as one easily understood example) isn't something that happens automatically in an organization as large as Google, with as many engineers as Google.

A common story on HackerNews is just how bad the average programmer is, how crummy the code they produce is. Taking steps to ensure they do better is a good thing.

We can argue about the process, for sure, and there are many things I would love to change about it. But "yeah, that should be done in code reviews" is exactly the idea here.

gravypod
Formatting and readability are two separate concepts (as other replies have pointed out). I'd like to specifically point to a fantastic example of what we mean when we say "readability": https://www.youtube.com/watch?v=wf-BqAjZb8M

Someone with readability in a language, who keeps up with the style recommendations, will generally produce code that is easier to read by other engineers.

btilly
The hypothetical discussion about readability is pointless.

Let's make it specific. Read https://google.github.io/styleguide/cppguide.html for readability for a language, namely C++. All the things that can be automated, automatic tools have been written for. But, for example, you can't automate "Prefer to use a struct instead of a pair or a tuple whenever the elements can have meaningful names." Because what does it mean for a name to be meaningful?
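The struct-versus-pair rule quoted from the style guide can be sketched like this (the types and field names are hypothetical, chosen only to illustrate the contrast):

```cpp
#include <string>
#include <utility>

// With a pair, the call site has to remember what .first and .second
// mean -- nothing in the type says so.
std::pair<std::string, int> lookup_pair() { return {"alice", 42}; }

// With a struct, the fields carry their meaning to every reader.
struct LookupResult {
    std::string username;
    int request_count;
};

LookupResult lookup_struct() { return {"alice", 42}; }
```

A tool can detect that a `std::pair` is being returned, but deciding whether `username`/`request_count` are "meaningful names" for the elements is the human judgment the rule asks for.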

StillBored
"Because what does it mean for a name to be meaningful? "

Are you optimizing for someone who already knows all the project lingo, or someone who doesn't know any of it?

Are your engineers native English speakers?

There are a whole bunch of things which make the perfect variable name frequently less than perfect, and putting project insiders in charge likely yields the opposite result.

Take: https://elixir.bootlin.com/linux/latest/source/mm/khugepaged...

If you don't know what a vma, pte, pfn, compound_page, young pte, huge page, lru, etc. are, you're going to be unable to even begin to understand what that code is doing, despite those all being pretty reasonable variable names and actually fairly industry-standard concepts. It gets worse as you move to more esoteric topics. Expanding pte to PageTableEntry might help some subset of users, but at the expense of those that work on the code daily. So who do you optimize for? Is it readable if the only people that can read it already know what it does?

iamstupidsimple
Readability is not about formatting, that's an orthogonal issue. It's possible to have terrible code that's perfectly formatted.

It's more about good usage of idiomatic language constructs, which still requires good human judgement to evaluate.

stvvvv
It's also worth noting that Borgmon readability included an awful lot of "how not to shoot yourself in the foot in strange and unexpected ways". That required someone who had shown they knew of those strange and unexpected ways to review your code.
StillBored
And I take it Google has done wide-ranging scientific studies about the variations in coding styles and language constructs, such that it is a secret advantage that they know how to write "readable" code? Implying they tried a bunch of different ways until settling on the one true way that allows a diverse set of people with diverse experiences to read it?

Ever heard of COBOL?

Because readability has always been in the eye of the beholder, and codifying it makes it even worse.

compiler-guy
I'm not a fan of readability exactly the way Google does it, but I'm pretty happy that Google insists on various aspects of it, like good identifier names.

I don't know of any research offhand, but I'm pretty sure the industry consensus is that good identifier names improve the quality of the code (Go style notwithstanding). Readability is one way of training engineers to do it.

joshuamorton
> Because readability has always been in the eye of the beholder, and codifying it makes it even worse.

This is empirically false. Consistency, even if it is unfavorable to your preferences, is superior to inconsistency. So a codified set of best practices is better than none at all.

There are part of Google's style guides that I would change if I could, but I also prefer having a style guide (and one that goes beyond things that are lintable) than none at all, because consistency across the codebase means that I can usually understand code at a glance, or if not, know at a glance that something unusual is happening. (this is in fact precisely the argument in favor of autoformatters like gofmt/black/prettier, but extended to softer concepts that can't always be formatted: consistent style, even if it isn't your favorite, is superior to inconsistent style).

StillBored
Consistency is what you get when you have a defined rule set programmatically enforced. If you're looking for "readability" via human judgment, then you get a very different result.
joshuamorton
A programmatically enforced set of rules is certainly one way to get consistency, but it isn't the only way. You can achieve consistency through culture and training too, and sometimes that's the only way.

Edit: you can look at Google's C-style guide for some examples, https://google.github.io/styleguide/cppguide.html#Structs_vs...

It isn't possible to statically analyze if a class/struct is a POD or if the methods enforce invariants. But it's often very easy to do so with a human eye. And there's value in the distinction!

Similarly, forcing someone to justify using a power-feature (operator overloading, templates, metaclasses, whatever) can only be done by a human. There may be cases where the power feature is warranted and the benefits outweigh the cost, but a linter can't know that. (and ultimately all of this comes back to: things look consistent, and when things are inconsistent, that's a strong signal that something unusual is happening and you should pay close attention)
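The struct/class distinction from the linked style-guide section can be sketched as follows (a minimal illustration with invented names, not Google code):

```cpp
// A plain struct: passive data with no invariants to enforce.
// Per the guideline, structs should carry data and nothing more.
struct Point {
    double x = 0.0;
    double y = 0.0;
};

// A class: the constructor and methods maintain an invariant
// (the denominator is never zero). Whether an invariant exists is
// a semantic property a human can see but a linter can't verify.
class Ratio {
public:
    Ratio(int num, int den) : num_(num), den_(den == 0 ? 1 : den) {}
    double value() const { return static_cast<double>(num_) / den_; }

private:
    int num_;
    int den_;
};
```

A static checker can confirm `Point` has only public fields, but deciding that `Ratio`'s zero-denominator guard is an invariant worth a class is exactly the human-eye call described above.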

Karrot_Kream
> Consistency, even if it is unfavorable to your preferences, is superior to inconsistency. So a codified set of best practices is better than none at all.

Indeed, but at what cost? Every step of added gatekeeping reduces development velocity, and Google does not have a monopoly on high-scale services (cough AWS cough). In an ideal world, for every full-time dev writing code you'd have a full-time reviewer whose job is simply to review that dev's code. But the real world has budgets and deadlines, and humans become discouraged when changes take too long to complete. So the question is whether readability is worth the added gatekeeping cost.

joshuamorton
You're ignoring the cost of unrestrained development without standards. The impression I've gotten is that AWS papers over its lack of development gatekeeping with unreasonable oncall loads. All things considered, the "cost" of readability in terms of development velocity is fairly low. If anyone on your team has readability [in the language], they can review your code. This can be a problem if only one person has it, but usually once one person gets through the process, you can get more people through fairly quickly.

I'm a bit of a polyglot at Google (I have readability in C++, python, and Javascript/Typescript, and am part-way through the Go and Java processes, which covers basically every popular language at the company), and IMO, the readability process for each language has been a net positive, although it was painful for a while when I had no readability and none of my coworkers did (but there are procedures for getting around that today).

For example, I "donate" my readability, and review random changes from people who don't have readability but need a quick turnaround or aren't yet working to gain readability.

Karrot_Kream
> You're ignoring the cost of unrestrained development without standards.

I'm not. I'm not saying that code should be merged without any stylistic commentary; I'm simply proposing that stylistic commentary be provided by other stakeholders in the service you're contributing to, on top of whatever linter or sanitizer is being used. That is, after all, the goal of a human reviewer. To add _another_ layer of gatekeeping is what I'm questioning.

> The impression I've gotten is that a AWS papers over their lack of development gatekeeping with unreasonable oncall loads.

I don't agree with this perception at all. AWS spends a lot of time building frameworks and libraries which attempt to minimize human gatekeeping to only the necessary areas. So think retries, backoff, timeouts, circuit-breakers, etc. Instead of adding gatekeeping at the code level (with things like code style), gatekeeping is added at the architecture and operations stages, which is where most of the work of keeping a production-level service up, available, and fast actually happens. Operational-level gatekeeping within the bounds of a given organization is a much more neatly constrained problem than stylistic gatekeeping, and it happens on a per-service basis rather than a per-CL/review basis. This keeps overhead a lot lower.

My impression from being outside of Google (and having a Xoogler for a partner) is that Google loves bureaucracy and process, and that engineers that continue to stay at Google are fine with navigating these processes. Its answer to problems is to add more bureaucracy, another item for the checklist that a person/committee/reviewer needs to check for. That reflex is hard to shake, IMO.

joshuamorton
> That is, after all, the goal of a human reviewer. To add _another_ layer of gatekeeping is what I'm questioning.

I guess I'm not following the distinction. For most people, that's exactly what happens. Either you, or someone on your direct team, will be the person with readability for your code. The readability process just ensures that you don't end up with silos where style diverges too much, and that the stakeholder who proposes stylistic commentary has a baseline knowledge of linguistic best practices.

With few exceptions, the thing people complain about is attaining readability themselves, not getting changes approved.

> I don't agree with this perception at all. AWS spends a lot of time building frameworks and libraries which attempt to minimize human gatekeeping to only the necessary areas. So think retries, backoff, timeouts, circuit-breakers, etc.

I think we're talking a bit past each other. These things are all table stakes defined by frameworks and updated by centralized processes as best practices change. Human reviewers, for the most part, are only going to say "why are you diverging from the default".

But importantly, that sidesteps my statement entirely, which, rephrasing, is that Amazon carries along more technical debt than Google. Whether that's a good long-term business decision remains to be seen, but the impression I get from ex-Amazon friends is that maintainability is less prioritized, and maintainability is one of the goals of the readability process. Global consistency helps anyone understand and improve anything else, and to an extent avoids haunted graveyards (though... not entirely).

> instead gatekeeping is added at the architecture and operations stages, which is where most of the work of keeping a production level service up, available, and fast. Operational level gatekeeping in the bounds of a given organization is a much more neatly constrained problem

I'm not sure what you mean, but if anything, this sounds more bureaucratic.

> My impression from being outside of Google (and having a Xoogler for a partner) is that Google loves bureaucracy and process, and that engineers that continue to stay at Google are fine with navigating these processes. Its answer to problems is to add more bureaucracy, another item for the checklist that a person/committee/reviewer needs to check for. That reflex is hard to shake, IMO.

FWIW I don't get this impression. It's honestly difficult for me to tell what you even are referring to. Like, other than readability and promo, there's very few processes or bits of bureaucracy that are consistent across the company, and those two have gotten decidedly more streamlined since I joined the company.

Well okay, that's half true; I can think of a bunch of places where bureaucracy has increased, but they're all spurred by legislation.

Karrot_Kream
> I guess I'm not following the distinction. For most people, that's exactly what happens. Either you, or the someone on your direct team, will be the person with readability for your code. The readability process just ensures that you don't end up with silos where style diverges too much, and that the stakeholder who proposes stylistic commentary has a baseline knowlege of linguistic best practices.

Separating these two out, even if one person may fulfill both roles, IMO adds to bureaucracy that I don't care for. Ramping up on a new language sounds like a pretty fraught experience if nobody on your team has readability and the service you're contributing to doesn't have a member who has the time to actually work with you, especially if your work is not relevant to theirs. They even touched on this in the linked video. The whole idea that you can enter a bureaucratic catch-22 at a company seems like an anti-pattern to me, a moment to stop and think; a "modern" engineering organization should try its utmost to accelerate development, with only as many checks as needed for safety and no more.

> With few exceptions, the thing people complain about is attaining readability themselves, not getting changes approved.

That's what I mean by bureaucracy. Even having "another eye" on the code should work. Having a certification process for readability is even more bureaucracy.

> But importantly, that sidesteps my statement entirely, which, rephrasing, is that amazon carries along more technical debt than Google. Whether that's a good long-term business decision remains to be seen, but the impression I get from ex-amazon friends is that maintainability is less prioritized, and maintainability is one of the goals of the readability process. Global consistency helps anyone understand and improve anything else, and to an extend avoids haunted graveyards (though...not entirely).

Ah, that makes a lot more sense. In my career I've never worked at a place that prioritizes code style that much, and I've worked at other high-scale places in the past. This seems like a Google-specific need. You're right that global consistency helps to stave off "there-be-dragons" codebases, but I feel like this is a case of YAGNI. Unless you have engineers constantly cross-cutting across the company, most engineers learn the style guidelines of a team/product/service by ramping up on the team. The optimization for global consistency seems premature to me.

> FWIW I don't get this impression. It's honestly difficult for me to tell what you even are referring to. Like, other than readability and promo, there's very few processes or bits of bureaucracy that are consistent across the company, and those two have gotten decidedly more streamlined since I joined the company.

But promos drive _so much_ of the company culture, at least from what my partner tells me. And the video in the OP certainly makes it sound like Google has lots of bureaucracy, though I'm not sure how many of the requirements outlined in the video were there for hyperbole or not. But especially for services that have low SLOs, it makes no sense to put thought into failover, evacuation, or anything like that.

joshuamorton
> Separating these two out, even if one person may fulfill two roles, IMO adds to bureaucracy that I don't care for. Ramping up onto a new language sounds like a pretty fraught experience if nobody on your team has readability and the service you're contributing doesn't have a member who has the time to actually work with you, especially if your work is not relevant to theirs. They even touched upon this in the linked video. The whole idea that you can enter a bureaucratic catch-22 at a company seems like an anti-pattern to me, a moment to stop and think; a "modern" engineering organization should try its utmost to accelerate development, with only as many checks as needed for safety and no more.

Keep in mind this video is ten years old. While yes, trying to write something from scratch in a language no one on your team has any experience in can be fraught (although this raises other questions: what exactly are you doing, in every case I've used a new lang, my team has been able to find people to review if needed), even in that case, the organization has help for you (see again my prior comment about "donating" readability, and my understanding is that the readability granting process actively prioritizes people who "need" the readability because they have few potential reviewers).

> That's what I mean by bureaucracy. Even having "another eye" on the code should work. Having a certification process for readability is even more bureaucracy.

I think initially the primary driver of readability is C++ (but Java + Guice also...needs it). Take a look at https://abseil.io/tips/. That's 70 tips, which is around a third of the internal ones. I have C++ readability and have internalized some of those. The readability-granting process ensures your code is reviewed by someone who knows all (or nearly all) of those tips, and will recognize when you're doing wrong things in your code and help explain how or why you can improve. When I personally went through the C++ readability process, I got mentorship on how to fix bugs in my code, some of which a peer on my team would have caught, and some of which they wouldn't have. I can now recognize those bug-prone patterns (which are hard to lint for; in my case they mostly had to do with parallelism) and know about tools to debug them (msan and asan, which an experienced C++ user should know, but on my team of people who had basically no C++ experience at Google? Nah).

That follows into other languages. Being granted readability means you have a familiarity with the language that suggests you'll be able to effectively mentor others and not mislead them. You don't have that otherwise.

> Unless you have engineers constantly cross-cutting across the company, most engineers learn the style guidelines of a team/product/service by ramping up on the team.

Google uses a single repo and builds stuff from HEAD. This has up- and downsides, but one of the upsides is that it's very easy to investigate unusual behavior in your dependencies. I was doing something similar just this week, and fixed obscure bugs in like 7 other teams' tools that were causing downstream problems those teams weren't aware of. No bureaucracy. I found the bugs, fixed them, and sent the changes to the owning teams. Imagine other situations, where I'd need to maintain a fork, or file a bug to have them fix it, or coordinate an update with them.

You'll pay the cost no matter what, choosing to do so in a way that also provides value and mentorship makes a lot of sense to me.

> But promos drive _so much_ of the company culture, at least from what my partner tells me. And the video in the OP certainly makes it sound like Google has lots of bureaucracy, though I'm not sure how many of the requirements outlined in the video were there for hyperbole or not.

I mean the entire thing is tongue in cheek and again is a decade old. I joined Google ~5 years ago, and even then most of the problems with resource acquisition were solved ("flex") and borgmon and its associated readability were gone, replaced with monarch, which is centralized and used a python-based (though still admittedly arcane) query language that doesn't require readability or managing your own instance. And since then things have gotten even more turnkey for any service that's...reasonably shaped. And, well, the level to which promo drives company culture is consistently overstated (at least in some ways).

iamstupidsimple
What counts for readability is not set in stone by some language czar as the One True Way. Everyone knows the style guides can't be perfect which is why they're relatively mutable.

In any case, readability will comment on stuff that cannot easily be quantified, such as when to use a certain object hierarchy or dependency injection, etc...

DannyBee
Oh boy. So, to start, we have done scientific studies about the costs and benefits of readability (as codified circa 2018, since that was when the last study was).

It was studied for multiple programming languages Google uses.

It was done, at my request, by the engineering productivity research team, who are experts in this kind of research - you can find public papers they publish[1].

For background: I was the person responsible for production programming languages, and I did this precisely because i did not feel at the time there were recent good studies as to whether readability was really worth the cost.

The answer is "yes, it is".

There is an upfront cost, but readability has a meaningful and net positive effect on engineering velocity.

It is large enough to make the cost back quite quickly.

One could go down the rabbit hole of seeing whether you improve (or make worse) the numbers by changing various parts of the style guides, but like I said, what is there now, has in fact been studied scientifically.

Honestly, i'm not sure why you think it wouldn't be. You come off as a little immature when you sort of just assume people have no idea what they are doing and don't think these things through. You are talking about a company with 60k+ engineers. Every hour of all-engineer time you waste is equivalent to having ~30 SWE do nothing for a year.

Maybe there are companies happy to do that. I worked at IBM for a few years when I was much younger ;). All I can say is that as long as I'm involved in developer tools at Google, I'm gonna try not to waste my customers' time.

[1] I say this in the hopes you don't go making assumptions about whether the studies were done properly. They were properly controlled for tenure, number of reviews, change size, etc.

Karrot_Kream
I'd love to see the research here. Because searching the software engineering Google papers from 2021-2021, the only somewhat relevant papers I found were case studies (such as [1] and [2]), which were based on Googlers' perception of their own processes. While self-perception can indeed be used to measure efficacy, there's a reason why it's used with great caution in surveys. I'll be honest, I have a hard time believing in readability myself.

[1]: https://research.google/pubs/pub47025/

[2]: https://research.google/pubs/pub43835/

DannyBee
We have never published the readability research. I could look into doing so, but it is awfully Google specific. Trying to come up with something generally applicable would be a very different project
Karrot_Kream
Then why do you often need external readability reviewers, especially if you or your team don't contribute to a codebase frequently? Having humans judge other humans' style is a given; it is part of the PR process. But why do you _also_ need someone with a readability rating as opposed to a code owner?

Note: I'm not a Googler nor a Xoogler, but my partner worked at Google for some years and so I've heard her complain a lot about Google and readability.

iamstupidsimple
> But why do you _also_ need someone with a readability rating as opposed to a code owner

FWIW, these can be the same person - if they have the appropriate readabilities. The problem is that it's not easy to get readability in the first place, so occasionally teams have to look for people outside.

Karrot_Kream
Right but the fact that these are two separate roles strikes me both as bureaucratic overhead and a fundamental lack of trust in individual engineers. If I were approached to enact something like readability, I would veto it immediately. We hire engineers because we trust them. Part of that trust means that engineers occasionally wade into codebases they don't understand in languages they don't know and need to make contributions, but we trust them to not make a mess. Do some of them make messes? Undoubtedly. Is the magnitude of the mess, multiplied by the number of engineers, worth the creation of an entire readability certification process? I don't think so.
KKKKkkkk1
Google doesn't have a union. There's no official way to confer power and privilege on people purely based on their seniority (regardless of their achievement). When there's no official way, people find a way. That's readability.
iamstupidsimple
> Is the magnitude of the mess, multiplied by the number of engineers, worth the creation of an entire readability certification process? I don't think so.

And that's an acceptable trade-off for a different sized company. Google arguably has the biggest centralised codebase in the world, and simply has different requirements.

As others have said, most of the overhead is from attaining readability, not the requirement itself.

Karrot_Kream
> And that's an acceptable trade-off for a different sized company. Google arguably has the biggest centralised codebase in the world, and simply has different requirements.

AWS does not operate this way, nor do many CDNs, nor do ISPs. There are other high-scale businesses out there. Google isn't the only one. Using Google's scale to justify business practices is a self-fulfilling prophecy.

iamstupidsimple
> AWS does not operate this way nor do many CDNs operate this way nor do ISPs operate this way.

I'm not denying that, but all the same, we're not talking about thousands of separate codebases. It's a single monorepo codebase with consistent styling, the advantage being that people can switch teams or contribute to other Google software under the same style guide.

The overhead isn't a bug, it's a core feature as consistent style makes it that much easier to switch teams.

m0zg
I've proudly managed to avoid Borgmon in favor of Monarch. Which was new at the time, but worked all right even back then. I have a lot fewer gray hairs because of that. They should have kept and rigidly enforced the Borgmon readability requirement to force people to migrate off that convoluted, idiosyncratic piece of shit.
vechagup
There's been a big investment in server platforms that strive to enable SWEs to build a new service that follows Best Practices with as little knowledge and handholding as possible. These consist of conformance tests that yell at you while you're coding if you are trying something generally thought to be bad, and semi-automated workflows that help you bring your code to production. When everything works as intended, the production workflows set up a decent set of alerts, acquire resources, configure CI/CD pipelines, and launch your jobs with just a few button presses on your part. (In practice, one of the steps will probably require debugging, but eh, it seems way better than the broccoli man video.)
mathteddybear
Broadly speaking, there are tools to automate this or that, some technologies are getting deprecated and replaced by new ones

Also probably the privacy review could be a bigger bottleneck these days ;-)

scottlamb
I think you can read about some of these changes in Google's SRE and SWE books (even if they don't mention this video in particular), at least the ones most likely to be interesting to someone outside Google.

But dropping Borgmon readability was the most immediate and obvious. It was basically true that no one had Borgmon readability. The policy was a catch-22: you couldn't get readability for the simple/formulaic Borgmon macro invocations that were encouraged and often sufficient. You could only get it for doing something "clever". I got it by writing fancy borgmon rules to paper over a problem that (in hindsight) I should have solved elsewhere.

Another was easing quota management. IMHO the most unbelievable thing in the video was that after Broccoli Man told Panda Woman to get quota in two cells, she just said "done". Besides the hassle in transcribing what you needed into the request system [1], various types of quota were chronically unavailable where you needed them, even in tiny amounts. In 2010, I kept a critical infrastructure service running by regularly IMing major clients' on-calls asking them to donate 0.1 cpu(!) of their quota in some cell or another when I didn't have quite enough to grow. There was a "gray market" mailing list where people would trade resources they couldn't get through the primary system. But eventually, they built a system that for small services would make the quota just happen for you.

Overall, it was a kick in the pants for the most basic infrastructure teams that made them see how unnecessarily hard this is for their internal customers, prompting them to make small things just happen while keeping large things possible. In any large organization, it's healthy to get this kind of feedback regularly. The actual specific changes and technologies are pretty specific to Google in 2010...

[1] Many people managed this very very tediously with spreadsheets. I eventually wrote a tool to generate the requests based on comparing your intended production config with your current quota.
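(A minimal sketch of the kind of gap computation such a tool might do — the data shapes and names here are hypothetical illustrations, not the actual internal tool:)

```python
def quota_requests(intended, granted):
    """Return the per-(cell, resource) shortfall between what the intended
    production config needs and what is currently granted.
    Units are integers (e.g. millicores, GiB) to avoid float noise."""
    shortfall = {}
    for key, needed in intended.items():
        missing = needed - granted.get(key, 0)
        if missing > 0:
            shortfall[key] = missing
    return shortfall

# Need 2500 millicores in cell "aa" but only 2400 granted -> request 100 more;
# cell "bb" is fully covered, so no request is generated for it.
intended = {("aa", "cpu_millis"): 2500, ("bb", "ram_gib"): 64}
granted = {("aa", "cpu_millis"): 2400, ("bb", "ram_gib"): 64}
print(quota_requests(intended, granted))  # → {('aa', 'cpu_millis'): 100}
```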

srj
I think I may have used your tool at some point. There were times in geo where we'd run out of bigtable quota and I needed to hit the "emergency loan" button to keep maps from going down globally.
ineedasername
That's why you had these problems: you should have just let maps go down globally and then pointed the finger at the appropriate target to blame for it.

I'm only half joking about that too. The other half is that sometimes it is better to let the system collapse under its own weight when that's what is required to convince the right people the system is broken.

jeffbee
Production priority quota horse trading in the days before it was easy was a real skill. But non-production quota was free and virtually infinite, even in those days.
erik_seaberg
I seem to remember hearing prod quota disputes described as “monkey knife fights” by SRE. SWE would joke about using corp credit cards on EC2 instances rather than waiting on the outcome.
the-rc
Or converting between priority levels or users that were under your product. If you ran a large enough service, there was also begging your users for spare quota, or warning them that their recent donation couldn't be fully deployed because a necessary change in some secondary borgcfg template had eaten into the service's resources, so it would help everybody if they knew anyone with a few cores lying around...

By the way, do you happen to be the jwb that filed a... creative BT quota request?

jeffbee
Yeah that's me.
the-rc
Hah, I was the one that played along and made you keep going ("May you be gifted with a long, healthy life and an exquisite taste in ties"?). And I think SAD was one of the teams that occasionally helped us find resources in this or that cluster, along with Analytics and, later, Social.

AAAAA+ GREAT CUSTOMER WOULD SELL TO AGAIN

tdiggity
wow, that’s incredible. Doesn’t sound like real life.
scottlamb
Thing is, though, something equally ridiculous happens at every large company and at more than a few small companies. Worse in government. The important thing is how they react when it's pointed out.
marcinzm
When people wonder why a large corporation would pay to be on AWS/GCP/Azure rather than running their own infrastructure this is one reason why.
ineedasername
Would it violate Gödel's incompleteness to try and have Google implement its resource provisioning in GCP?
dilyevsky
Every single buzzword uttered in the video has a cloud equivalent except for PCR. These days people just give up on regional (even zonal) outages, take the downtime, and blame their cloud provider in the RCA, so it doesn't matter. The big thing that I think is missing in this thread is just how enormous Google infra was even in 2010, so these problems were and still are sort of unique.
justicezyx
I worked at TI, Planet and later Borg; I did not feel much influence from this video other than a chuckle. Or I might have been too low-level to perceive it.
jeffbee
I think it was a very common perception among application-level SWEs and SREs that TI, platforms, and Borg did not themselves use the stack enough to perceive its flaws.
jrockway
I feel like people forgot after about five years. I remember wasting a week filling out various pieces of paperwork and submitting byzantine configuration CLs so that some contractor would have permission to view a certain webpage through the corporate proxy. (I think what made me most mad is that regular employees could view the website with no additional configuration. I can understand if I was filling out tickets to get approval, or a security review, but the actual configuration of the proxy had to change to allow this, in addition to getting all of those approvals!) My team didn't make the website, and the contractor didn't work on my team, so I'm honestly not sure why I was involved. I just remember being annoyed about it. I'm sure there are some memes about it in the archive.
hnov
This is paradoxically because historically everything was wide open to anyone so access-control and such isn't super fleshed out for most apps behind the proxy. Random internal app X could have been conceived and built with very little oversight and opening it up to a rotating cast of temporary workers is seen as an unnecessary risk. Broadly used apps (e.g. the bug ticket system) tend to have app-level security-controls and are not blocked by the proxy for contractors.
ikiris
Contractor access is its own hellscape.
davidw
I did a stint contracting for Goldman Sachs a while back. I can relate. Don't think I can say anything more without a team descending on my house from a black helicopter, though.
servytor
At night all helicopters are black.
twinkletwinkle_
I once worked at a tiny startup where we were trying to sell a dataset to GS. Before we could even send a sample, they sent over some boilerplate forms for us to sign. I remember two distinct stipulations - anything we sent them was immediately and forever their property, AND they had the right to drug test any of our employees. We ended up not signing so there was no deal. My boss said it was their way of getting rid of us.
dzhiurgis
> anything we sent them was immediately and forever their property,

send them CP

sn_master
what's GS
onceiwasthere
I think it's Goldman Sachs as mentioned in their parent comment.
q3k
... socially, too. That part sucked.
abustamam
Separation of the classes and all
ak217
That's the scary, and perhaps most immoral, part about Google. More so than other megacorps, it relies on a shadow workforce of contractors to do core parts of its work. It's quite the caste system, and most Google FTEs are oblivious or choose to ignore it.

I know a Google contractor who was denied access to a lactation room at her office (she's the type of contractor who works full-time alongside regular employees). When she tried to fix that, she got sent into an infinite loop of support tickets being bounced around between different departments claiming it was the other one's responsibility. Literally no one was able to fix that for her. Her manager was on a different continent, and wasn't able to help either.

amznbyebyebye
Wow that’s so not googley :(
rincebrain
Once, I saw the employees get frustrated in a nominally high security environment with the contractors not being able to even directly access their dev env without scheduling a visit days in advance (the contractors were all hours away) while having the employees stare over their shoulder the entire time. They were trying to debug an issue that sometimes came up in dev/prod but not in the contractors' local copy of dev.

The employees set up TeamViewer on their "dev" server and promised the contractors in writing that this all had sufficient permission, that this was a dev server, and that the credentials on the dev server were not going to get them into anything else that might be troublesome by mistake.

The last of those three statements was accurate. As you might imagine, TeamViewer on a nominally tightly controlled network was not even in the same hemisphere as acceptable...

...and while debugging, the contractors made an incompatible DB schema change on dev to see if it fixed something, only to get a nasty surprise when the employees ran to them within an hour or so asking what they had done, because by "dev" they meant "prod".

I don't really have a great moral for this story, other than maybe "netsec/infosec teams need to actually work with other teams and not just be opaque sources of fiats, or people are going to work around them to get things done instead of trying to work with them, and that's going to end poorly for everyone."

cromwellian
As a Googler, it's often easier for me to setup a GCP consumer account, AWS, or Heroku account to demo something, compared to using anything internal. I remember the most annoying situation was like 10 years ago when me and other engineers ported Quake 2 to run in Chrome, we were in a time crunch to demo it multiplayer, and I ended up setting up an AWS account to serve it. But then I left it running and forgot and ended up getting a few hundred dollars billed to me because the Quake2 server was chewing CPU.
bamboozled
I could imagine you're violating some pretty strict policies doing this?

You're taking proprietary code and running it on a competitors platform?

I like to think I'm pretty open minded about stuff like this, and I've actually done something similar, but I'd be surprised if you didn't get your ass handed to you for that type of thing?

cromwellian
For those interested: The original project https://code.google.com/archive/p/quake2-gwt-port/

GitHub (Stefan Haustein is my genius teammate who did all of the heavy lifting on the OpenGL -> WebGL piece) https://github.com/stefanhaustein/quake2-playn-port

You can still play it here, on AppEngine http://quake2playn.appspot.com/

potatoman22
Can't let the Quake 2 source code escape Google
mabbo
Why? Do you think there's a risk of Amazon stealing code from a customer?

No matter what code they took, the cost would never be worth it for them.

bamboozled
If I took my company's code and hosted it anywhere except where I was authorized to do so, I'd expect flak for it. I'm less worried about Amazon stealing it, but it seems like a silly place to put it nevertheless.
cromwellian
Just to reiterate, that’s not what happened, no proprietary code was exposed externally, as the OP is based off of Java clone of the Quake2 code both of which were released as open source.

I don’t know what the policy of Google is with respect to hosting stuff that’s proprietary or sensitive on AWS but I imagine it requires approval just for lawyercat reasons, especially if any PII is collected given all of the regulations these days.

mgfist
> You're taking proprietary code and running it on a competitors platform?

Do you not realize how dumb this sounds? Companies run critical code/infrastructure on AWS to the tune of many billions of dollars a year.

bamboozled
They do, but you generally also need to do so following some best practices and guidelines. If everyone in every company just put stuff on whatever provider they liked, then there would be a lot of issues.

I mean, do you have any idea how many people have exposed confidential data by hosting it on a publicly accessible s3 bucket?

stvvvv
Google Proprietary code on AWS:

https://cloud.google.com/blog/products/data-analytics/introd...

https://cloud.google.com/anthos/clusters/docs/aws/concepts/a...

etc.

BoorishBears
This is what separates companies that get things done from companies that pay a lot of people to hopefully maybe get things done.

A server for a multiplayer Quake port...

Who is Google paying to hand their ass to them over that?

Who has both the authority to hand their ass to them, and so little discretion that it wouldn't end at "well, in general we don't do that, but I see why you did it and there's little to no risk"?

-

At some companies yes, someone is paid to go "I caught someone putting our proprietary code up on a competitors platform!!!!" and no one will actually think critically about what exactly was proprietary, so someone putting up a quake demo might as well have put up the Coke secret formula

And now OP who actually generated some value at little to no risk gets their ass handed to them and someone who simply lacked the skills to realize the risk profile gets a notch on their "I add value and earn my paycheck" badge.

cromwellian
No, no proprietary code was used, the port was done from the Open source Java clone Jake2: https://en.wikipedia.org/wiki/Jake2

We ported it by using Google Web Toolkit Java->JS compiler, and replaced OpenGL with WebGL, and all of the other bits with Web APIs (websocket, pointer-events, fullscreen-api, filesystem api, etc)

The assets (proprietary artwork, levels, etc) were not hosted on AWS, it simply downloaded the EXE file from ID servers and extracted it in the browser.

bamboozled
You said this:

> As a Googler, it's often easier for me to setup a GCP consumer account, AWS, or Heroku account to demo something, compared to using anything internal.

I get you're trying to make a point of saying you can do something easier elsewhere, but then why even throw in the "as a Googler" bit without clarifying that you're not really working on anything of consequence where you'd actually be asked to host things internally.

You're basically hosting open source projects on AWS.

breakfastduck
Looks like someone is desperate to get into an argument
bamboozled
I couldn’t care less really, I would if someone was just running my code wherever they felt like it though.
cromwellian
Why does it matter whether I'm working on something for production, or working on a 20% project? The point is, it shouldn't have been so hard for me to set up and run a simple server internally (billed to my division's cost center) that I ended up having to pay out of my own pocket to use a competitor's product (and go through the hassle of expensing it). That's a bit of a sad pill for someone working at a company that had one of the most advanced data centers in the world. There are lots of times when developing a product internally that we need to host internal prototypes for demos, and GCP/AWS is a lot easier to use than the internal tooling. Even today, using our internal GCP instance vs the external one is more difficult.

A bigger reason AWS was chosen was because AppEngine, IIRC, back in those days could not support WebSockets (among other things), as Google's frontend Load Balancers that the predecessor to GCP used didn't know how to deal with it. The Chrome team was pushing heavily on new Web APIs like WebSockets, but by the time they were ready to ship to market, our own cloud infrastructure didn't support it yet.

I'd go further and say that using internal infra to host employee 20% projects that are not really for production is the kind of dogfooding that would have generated the pain points necessary to make a product like AWS, because those hobby workloads are far more similar to the workloads the average GCP customer deploys. Because we didn't really prioritize that, we kind of missed the boat on Cloud and were a late entrant, even though we already had most of the foundational technology (e.g. containers) a long time ago.

jcims
Typically the problem is that employees with the ability to just stand up random servers within an environment also have the ability to start using them to run production workloads, handle sensitive information, and just build little silos entirely outside the operating standards, and if regulated, compliance requirements of the company.

These things typically ride on a pendulum over the course of 10 years or so, swinging from high speed to high friction. As the costs associated with approaching one extreme stack up, someone eventually says enough and the direction reverses.

cromwellian
Good point, although Google's internal monitoring systems are pretty good (even aggressive) at detecting abandoned, low-use, or high cost un-approved systems. I've gotten a number of my old projects flagged by automated monitoring, and asked to delete them, shut them down, or move them.

In recent years, there actually is an internal framework for standing up servers quickly along with end-to-end everything else a Google production service usually gets. The frontend uses Wiz. It's still not as easy as say, NextJS/Vercel, or Heroku, or even throwing together a Kube deployment, but it does provide a lot more than any of those, so the medium learning curve pays back quickly.

bamboozled
I said this in my original post but was downvoted to oblivion because my view was unpopular. I fully understand why you did it, I've done it myself for the same reasons.

I was just saying there is a line you can easily cross and you need to be careful when you take something and put it on other platforms. I've seen people get in trouble for enabling a Slack / GitHub integration, which actually made sense when I thought about why it was an issue.

kevan
An interesting contrast: for as long as I can remember (I've been at Amazon for 5 years) we've had burner AWS accounts available for testing/prototyping stuff that get auto-closed after a week. Many devs, myself included, also have a personal long-lived AWS account for infra stacks of the services we're working on. These accounts get billed directly to the team's fleet budget. As long as you're using them for work-related things (i.e. not mining crypto or hosting a minecraft server) and not racking up massive bills without a good reason then it's fine.
cromwellian
OMG, this is exactly what I want. GCP burner accounts that are billed to the team's cost center, perhaps with preemptible VMs that go dormant when unneeded, and eventually expire unless renewed.
bamboozled
This is super easy to setup on GCP? How can you work at Google and not have this?
knorker
> No, no proprietary code was used, the port was done from the Open source Java clone Jake2: https://en.wikipedia.org/wiki/Jake2 We ported it by using Google Web Toolkit Java->JS compiler, and replaced OpenGL with WebGL,

That sounds like you wrote code, code on which Google holds copyright, and you really need to read up on the Google company policy on that. I'm not sure you understand fully what distinct things copyright and software licenses are, and how Google policies apply to them.

There's a reason people who join Google have historically "disappeared" from open source: It's because they can't just contribute.

Back when you did this it definitely was explicitly and strongly violating policy. Nowadays maybe it could get away without opensource review by counting it as a "patch".

https://opensource.google/docs/patching/#no-review

cromwellian
I'm pretty sure when we did this, we didn't just drop it without talking with anyone, after all, it uses proprietary assets and we presented it at I/O. It's been a long time, but IIRC, Chris DiBona was involved at least as far as getting an opinion (I don't recall if OSPO existed back then), as well as some lawyercats because of the proprietary assets it dynamically loads. This was semi-official, it was shown at all-hands by Eric, the media team helped us create the trailer (https://www.youtube.com/watch?v=XhMN0wlITLk), it was hosted by Chrome DevRel who deal with OSS releases all the time, so this wasn't just some random engineers off doing a secret private project and uploading it unannounced.

My first few years at Google, I worked almost exclusively on official OSS projects (no google3 commits).

Edit: looking at the trailer, I forgot we released this on April 1, as an April Fools joke. LOL

knorker
Ah. Yeah ok that paints a different picture than your other comments.

I'd hate to see people get in trouble after reading what you wrote and reason their way to thinking because it's opensource and their own time they can do what they want.

cromwellian
No biggie, my initial wording was problematic. I really only wanted to express at the time I was racing to get a private demo ready for TGIF that we could link people to, and I ended up having to use AWS which was frustrating.

For the most part, I think most Googlers are thoughtful about the rules, god knows we go through enough training decks, but also, as a xoogler(?), you know Googley behavior means people are pretty reasonable about obeying license restrictions and adhering to the desires of OSS authors.

Although as the number of employees at Google skyrockets, the probability of bad actors increases, and company culture at >100k employees is probably much different than when there were only 5-10k.

xiphias2
I wish I would have had this video before 2010. I got paged at night every time there was a PCR failover, and I didn’t know what to do with it. This video is better than all the extensive documentation that we had.
xiphias2
I got 20 points in 1 hour...it seems like other people have similar experiences as well
cletus
Ah, this takes me back (disclaimer: Xooger, 2010-2017). It's painful and funny because it's true. Or was true.

Rumour had it that the Borgmon readability requirement was removed when Sergey saw this video. I don't know if this is true but that's what I heard.

DaiPlusPlus
Pray tell, what is/was Borgmon?
sleepydog
It's a language and supporting infrastructure for collecting and querying time series data for monitoring.

It was replaced a long time ago by a new system called Monarch, but a few holdouts will probably continue using Borgmon until the heat death of the universe.

ikiris
If monarch supported large customers it would take over those holdouts
dekhn
prometheus 0.1
jensensbutton
This is the correct answer.
knorker
To elaborate on this answer: Prometheus was written by ex-Googlers who had no shame in copying Borgmon pretty much exactly, syntax-compatible and all.
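(For readers who never saw Borgmon, its public descendant gives the flavor. Borgmon's own syntax differed, and early Prometheus rule files were closer to it; what follows is a minimal modern Prometheus rules file, with metric names and thresholds purely illustrative:)

```yaml
groups:
  - name: http-serving
    rules:
      # Recording rule: precompute an error rate so dashboards and alerts
      # can reuse the aggregated time series.
      - record: job:http_errors:rate5m
        expr: sum by (job) (rate(http_requests_total{code=~"5.."}[5m]))
      # Alerting rule: page if the precomputed error rate stays high
      # for 10 minutes.
      - alert: HighErrorRate
        expr: job:http_errors:rate5m > 0.05
        for: 10m
        labels:
          severity: page
```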
dmoy
https://sre.google/sre-book/practical-alerting/

Short version: it's a monitoring configuration language.

twinge
A system for alerting based on time-series data, with its own rule language. The language (along with many others) required that authors demonstrate they could adhere to the style guide by going through a process to obtain readability.

https://sre.google/sre-book/practical-alerting/

leg
It is true that Borgmon readability went away due to this video. It wasn't Sergey, it was an eng director.
ikiris
"no one has borgmon readability". years later and i still die laughing.
taldo
Ah, the laughs (Xoogler since 2020). It was a lot easier, at least last year: you'd use "flex" quota from your PA pool (product area) for Spanner and Borg, write some code for your server, a few configs here and there, and you'd be ready and serving.
dilyevsky
That video came out a few years before flex appeared, I think, at a time when they were having a sort of "resource crunch" on the heels of a growth spurt following the GFC.
willidiots
Flex was available for certain things (Colossus IIRC, gave you a ton of flex quota) but for others it wasn't. Because This is Google.
the-rc
It was easier to mint and carve out Colossus quota than e.g. Bigtable. I seem to remember that flex for Borg existed, but only in a few locations with enough capacity to back it. You couldn't just retrofit it in clusters where existing, large customers were already granted and using most of the quota.
the-rc
It wasn't that there was a crunch — that had always existed. There just wasn't all the tooling to implement anything like flex. At least this video was made after "buying Borg quota" was a normal thing. Before it, you had to "buy" regular machines and donate/assimilate them into Borg. Then after X days you'd receive your quota, minus a Borg "tax" of 10% to cover borglet and system daemons' overhead.
exikyut
So before Borgmon existed, the quota unit was basically an entire machine? No virtualization?

Oh wait, this was probably 2008, VMware had only just figured out JITed software virtualization a couple of years prior. Makes a bit more sense now. And now the containerization thing makes a tad more sense: Google basically skipped over the "use QEMU" (or more recently, Firecracker) phase everyone else is now going through.

the-rc
Borgmon and Borg are different things, but yes, before the latter, machines were owned by teams. Brian Grant talks a bit about the predecessors, Babysitter and GWQ: https://softwareengineeringdaily.com/2018/04/27/google-clust...

Borg was started in 2003 and by 2006-2007 was already the default way to run things, even though isolation wasn't perfect then. I think it started with chroot jails, then fake NUMA, then cgroups, which were written for it. It took years before all of web search moved to dedicated Borg machines (their quota belonged to you and nobody else could run on them) and, eventually, shared ones.

dilyevsky
Fun fact - up this thread you replied to a comment from one of the original cgroups/Borg creators
rachelbythebay
Ah, see, should've given 'em to us! Big rectangular state, you know, #3 machine owner at the time. We didn't charge overhead, probably because it never occurred to us to do it.

People brought machines, we gave 'em quota. Easy enough.

So glad to not be doing that any more.

the-rc
I know the service! The folks in NY had the state flag by their cubicle. My team in 2007-2011 was probably one of the largest users and I think I donated machines to y'all in the old Groningen cluster, before quotas were automated. GFS didn't charge for chunkserver overhead, either, and that mistake took years and lots of pain to fix...
menage
And then they got Steamrolled away if you didn't use them enough.
the-rc
Hah, I think that at least at the beginning, the tools wanted to steamroll the Bigtable service, too. As the one who had to vet the changes every week, I had to go explain that it was really neither our quota nor our usage. For Colossus, large new clusters would sometimes go from very idle to super busy and underprovisioned within days or a few weeks, AFTER the steamrolling was already done, just because large users moved in. We then had to bring up new curators (masters) quickly, many times with quota that didn't exist. Quite often, when out of options, I ended up taking precious p360 quota from the storage monitoring user to mint production resources for Colossus. Fun times.
dekhn
you left out monitoring for reliability which is a major part of this video
packetslave
automon is (was?) a magical thing
weeks
Is. :D
bhickey
About six years ago I had a resource manager deny me a database instance the very same day it became available for flex in another product area. I tried to "Hey Mister" resources from someone in that group to no avail. Eventually I wrote a high-durability key-value store on top of our source control system and told them they could give me my database or I'd be deploying to prod.
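A toy sketch of that trick, purely hypothetical and nothing like the actual internal system: the property a source control system buys you is that every write becomes an immutable, content-addressed revision, with a per-key HEAD pointer advanced atomically, so committed data is never lost or overwritten in place:

```python
import hashlib
import json
import os
import tempfile


class VcsKV:
    """Toy key-value store with source-control-style durability:
    each write is stored as an immutable, content-addressed object,
    and the per-key HEAD pointer is updated atomically."""

    def __init__(self, root):
        self.root = root
        os.makedirs(os.path.join(root, "objects"), exist_ok=True)

    def put(self, key, value):
        blob = json.dumps({"key": key, "value": value}).encode()
        digest = hashlib.sha256(blob).hexdigest()
        # Immutable revision object, named by its content hash.
        with open(os.path.join(self.root, "objects", digest), "wb") as f:
            f.write(blob)
        # Atomically advance HEAD for this key (write temp, then rename).
        head = os.path.join(self.root, f"HEAD-{key}")
        tmp = head + ".tmp"
        with open(tmp, "w") as f:
            f.write(digest)
        os.replace(tmp, head)

    def get(self, key):
        with open(os.path.join(self.root, f"HEAD-{key}")) as f:
            digest = f.read()
        with open(os.path.join(self.root, "objects", digest), "rb") as f:
            return json.loads(f.read())["value"]


store = VcsKV(tempfile.mkdtemp())
store.put("greeting", "hello")
store.put("greeting", "hello, world")
print(store.get("greeting"))  # latest revision wins; old ones remain on disk
```

Old revisions stay behind as history, which is exactly what makes "deploying this to prod" such an effective threat.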
plaguuuuuu
>Eventually I wrote a high-durability key-value store on top of our source control system and told them they could give me my database or I'd be deploying to prod.

That's diabolical. I knew Google should never have dropped the "do not be evil" thing

w0mbat
When I first started at Google I got things done a lot faster because I didn't know all those rules existed and nobody stopped me. My service was still plenty fast & reliable. Eventually it all got rewritten by other people to do things properly like the video says.
robocat
Rachel said something similar “My own ‘solution’ to it after far too much thrashing was just to say ‘we cannot get all N types of quota in the same place so we are at the mercy of whatever happens to be available, and if that dries up, we stop running’. Granted, this was for some internal stuff that was seven or eight levels removed from anything that anyone on the outside might ever see, but still, it was stupid and made me feel so dirty. I'm sure my non-solution probably bit someone later. Sorry, whoever.” — https://rachelbythebay.com/w/2021/10/30/5tb/
hiepph
Isn't that how things are supposed to be? Prototype, then polish later. I think teams should work in layers like this, but I guess big companies are just too lazy and assume you're smart enough and have time to take care of everything yourself.
w0mbat
Right. Time was of the essence with this service, and it was still very reliable and scalable in its original simple form, because it leaned on a lot of existing Google infrastructure. Later on it got upgraded to follow the rules better.
dekhn
I managed to deploy a whole system at Google that had the ability to take down all of Google globally by DoS'ing the network, and ran it casually (i.e., starting and stopping it when I felt like it, at the capacity I felt like, with the binary versions I wanted) for 3 years.

In retrospect, this was absolutely crazy! The actual visible outcomes were: 1 cluster drained due to heat rising so fast the alerting thought there was a fire, 1 page to an engineer in the middle of the night (sorry discovery-service) and a whole bunch of complaints about CPU stealing that weren't my fault.

Those were the good old days.

jedberg
The same conversation at Netflix 10 years ago:

I want to serve 5TB of data.

Ok, spin up an instance in AWS and put it there.

I want it production ready.

Ok, replicate it to a second instance. If it breaks we'll page you to fix it.

The funny thing is, for important stuff, we ended up doing similar things to what you see in this video, but for unimportant things, we didn't. I think it was a better system, and it was amusing when we hired people from Google who were confused by the lack of process and approvals.

ignoramous
> I want to serve 5TB of data. Ok, spin up an instance in AWS and put it there... it was amusing when we hired people from Google who were confused by the lack of process and approvals.

Quoting from Velocity in Software Engineering https://queue.acm.org/detail.cfm?id=3352692:

In 2003, at a time in Amazon's history when we were particularly frustrated by our speed of software engineering, we turned to Matt Round, an engineering leader who was a most interesting squeaky wheel in that his team appeared to get more done than any other, yet he remained deeply impatient and complained loudly and with great clarity about how hard it was to get anything done. He wrote a six-pager that had a great hook in the first paragraph: "To many of us Amazon feels more like a tectonic plate than an F-16."

Matt's paper had many recommendations... including the maximization of autonomy for teams and for the services operated by those teams by the adoption of REST-style interfaces, platform standardization, removal of roadblocks or gatekeepers (high-friction bureaucracy), and continuous deployment of isolated components. He also called... for an enduring performance indicator based on the percentage of their time that software engineers spent building rather than doing other tasks. Builders want to build, and Matt's timely recommendations influenced the forging of Amazon's technology brand as "the best place where builders can build."

...leading up to the creation of AWS.

jll29
> we turned to Matt Round, an engineering leader who was a most interesting squeaky wheel in that his team appeared to get more done than any other

Matt went on to study theology, and he's started a church community in Scotland: https://www.linkedin.com/in/mattround/

"Leader Company Name: Hope City Church Edinburgh Dates Employed: Sep 2017 – Present Driving a new church start-up."

bcoughlan
What does "platform standardization" mean in this context?
ryandrake
The "approval paralysis" thing happens at a lot of companies, large and small, not just GiantTech. It creeps up on you slowly: 1. A big problem happens that gains the attention of leadership. 2. The problem is root-caused to some risky thing an employee did trying to accomplish XYZ. 3. To correct this, a process is put in place that must be followed when one wants to do XYZ, and (critically) gatekeepers are anointed who must approve the activity. 4. These gatekeepers are inevitably senior already-busy people who become bottlenecks. Now we can't do this critical thing without hounding approvers. 5. Some other big problem happens and the above cycle starts all over again.

Before you know it, every even slightly risky task you need to do through the course of your job requires the blessing of approvers who are well-intentioned, but all so overloaded they don't even answer their E-mail or chats. They sometimes need to be physically grabbed in the hallway in order to unblock your project. Progress grinds to a halt and it still has not stopped production problems--just those particular classes of problems that the approval processes caught.

EDIT: Not sure what the right solution is, but it must be one that doesn't rely on a particular overloaded human doing something. Maybe an automated approval system that produces a paper trail (to help with postmortem and corrective action later) and ensuring all changes can be rolled back effortlessly. Easier said than done, obviously.

david422
What is the solution?

I've worked at big companies that are mired in process because they would rather spend more time dotting i's and crossing t's than risk breaking something. I can see why.

And I've worked at smaller companies where the clients are small and it's easy to fix things that break. Move fast and break things at a small scale maybe.

But how do you grow to be a big company and still operate like a small company? I can't seem to see an answer.

native_samples
There are many, but the problems are more subtle than this video really gives credit for.

I worked at Google at the time this video was made, and empathized (in fact I had been an SRE for years by that point). Nonetheless, there are flip sides that the video maker obviously didn't consider.

Firstly, why did everything at Google have to be replicated up the wazoo? Why was so much time spent talking about PCRs? The reason is, Google had consciously established a culture up front in which individual clusters were considered "unreliable" and everyone had to engineer around that. This was a move specifically intended to increase the velocity of the datacenter engineering groups, by ensuring they did not have to get a billion approvals to do changes. Consider how slow it'd be to get approval from every user of a Google cluster, let alone an entire datacenter, to take things offline for maintenance. These things had tens of thousands of machines per cluster and that was over a decade ago. They'd easily be running hundreds of thousands of processes, managed by dozens of different groups. Getting them all to synchronize and approve things would be impossible. So Google said - no approvals are necessary. If the SRE/NetOps/HWOPS teams want to take a cluster or even entire datacenter offline then they simply announce they're going to do it in advance, and, everyone else has to just handle it.

This was fantastic for Google's datacenter tech velocity. They had incredibly advanced facilities years ahead of anyone else, partly due to the frenetic pace of upgrades this system allowed them to achieve. The downside: software engineers have to run their services in >1 cluster, unless they're willing to tolerate downtime.

Secondly, why couldn't cat woman just run a single replica and accept some downtime? Mostly because Google had a brand to maintain. When she "just" wanted to serve 5TB, that wasn't really true. She "just" wanted to do it under the Google brand, advertised as a Google service, with all the benefits that brought her. One of the aspects of that brand that we take for granted is Google's insane levels of reliability. Nobody, and I mean nobody, spends serious time planning for "what if Google is down", even though massive companies routinely outsource all their corporate email and other critical infrastructure to them.

Now imagine how hard it'd be to maintain that brand if random services kept going offline for long periods without Google employees even noticing? They could say, sure, this particular service just wasn't important enough for us to replicate or monitor and the DC is under maintenance, we think it'll be back in 3 days, sorry. But customers and users would freak out, and rightly so. How on earth could they guess what Google would or would not find worthy of proper production quality? That would be opaque to them, yet Google has thousands of services. It'd destroy the brand to have some parts that are reliable and others not according to basically random factors nobody outside the firm can understand. The only solution is to ensure every externally visible service is reliable to the same very high degree.

Indeed, "don't trust that service because Google might kill it" is one of the worst problems the brand has, and that's partly due to efforts to avoid corporate slowdown and launch bureaucracy. Google signed off on a lot of dud uncompetitive services that had serious problems, specifically because they hated the idea of becoming a slow moving behemoth that couldn't innovate. Yet it trashed their brand in the end.

A lot of corporate process engineering is like this. It often boils down to tradeoffs consciously made by executives that the individual employee may not care about or value or even know about, but which is good for the group as a whole. Was Google wrong to take an unreliable-DC-but-reliable-services approach? I don't know but I really doubt it. Most of the stuff that SWEs were super impatient to launch and got bitchy about bureaucracy wasn't actually world changing stuff, and a lot ended up not achieving any kind of escape velocity.

edude03
This is a great explanation, thank you.

(I've never worked at google, and maybe this isn't a problem anymore however) It seems like the "solution" here would be to do for Infra what Go did for Concurrency - build an abstraction with sane defaults, and rubber stamp anything that doesn't stray from those defaults. Anything that does - requires further scrutiny.

For example, at the companies where I've been responsible for infrastructure (admittedly much smaller than Google), I've done exactly that (with Kubernetes-specific things like PodDisruptionBudgets and defaulting to 2 replicas), and if users use the default helm chart values, they can ship their service by themselves.
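That "sane defaults, scrutinize deviations" idea can be sketched in a few lines (the field names and defaults here are invented for illustration, not any real platform's API): merge the service spec over platform defaults and flag anything that strays for human review:

```python
# Hypothetical platform defaults that get rubber-stamped automatically.
DEFAULTS = {"replicas": 2, "regions": 2, "disruption_budget": 1}


def plan(spec):
    """Fill in platform defaults; flag deviations for extra scrutiny."""
    merged = {**DEFAULTS, **spec}
    deviations = {k: v for k, v in spec.items()
                  if k in DEFAULTS and v != DEFAULTS[k]}
    # Only specs that stray from the blessed defaults need a human.
    merged["needs_review"] = bool(deviations)
    return merged


print(plan({"name": "catpics"})["needs_review"])                 # False
print(plan({"name": "catpics", "replicas": 1})["needs_review"])  # True
```

Anything on the happy path ships itself; only the unusual requests queue up for the senior people.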

native_samples
They did a lot of stuff like that, but the work to launch a new service wasn't only technical, and some of the non-automated work was there partly to enforce checkpoints on the other stuff. For example, to get your service mapped through the load balancers required you to prove you'd been approved for launch by executives, so the process required filing a ticket. It's probably all different now though.

I should also note that "launch" in Google speak means "make visible to the world". If you only wanted your service to be available for Googlers it was dramatically easier and the infrastructure was entirely automatic with zero approvals being easily possible.

plaguuuuuu
Trust your people! Otherwise don't hire them in the first place.

You want to deploy something to prod? Okay, either call an API or fill out a webform (or message a Slack bot idk) - but the contents will be a checklist of stuff you need to have done.

1. Did you [load test, integration test, whatever]

2. Did your local architect look over and approve the high-level design? (NB: notice how we aren't requiring an architect to sign off, we trust the developer. Because if they lie they're fired lol.)

3. Other stuff. Maybe some taxonomic stuff like tags associated with deployed infrastructure? Swagger endpoints? Go nuts as long as it's stuff actually needed by central planning - the documentation and paper trail aspect is covered here

This is picked up to be ingested into databases, wikis, emails, wherever.

Compare with my last large corp, where we had change approval boards: 25+ people sat in a long meeting and essentially just asked if you'd done the above, and you'd then be greenlit to go to prod (at the time, deployments were pretty manual and required scheduling as well, which is obviously suboptimal). I'm just about to move from a small consulting company to a startup/scaleup, so it's going to be interesting to see how things work there.
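A minimal sketch of such a self-service gate (the check names, record fields, and example values are made up for illustration): the request is approved if and only if every box is ticked, and the structured record itself is the paper trail to ship off to databases, wikis, or email:

```python
from dataclasses import dataclass, field

# Hypothetical checklist items central planning actually needs.
REQUIRED_CHECKS = ["load_tested", "integration_tested", "design_reviewed"]


@dataclass
class DeployRequest:
    service: str
    owner: str
    checks: dict = field(default_factory=dict)
    tags: list = field(default_factory=list)  # taxonomy / paper trail


def review(req):
    """Self-service gate: approve iff every required box is ticked.
    No human approver in the loop; lying gets you fired, per the thread."""
    missing = [c for c in REQUIRED_CHECKS if not req.checks.get(c)]
    if missing:
        return False, f"blocked: missing {', '.join(missing)}"
    return True, f"approved: {req.service} by {req.owner}"


ok, msg = review(DeployRequest(
    service="catpics-frontend",
    owner="broccoli@example.com",
    checks={"load_tested": True, "integration_tested": True,
            "design_reviewed": True}))
print(msg)  # approved: catpics-frontend by broccoli@example.com
```

The same function can sit behind an API, a webform, or a Slack bot; the gate logic doesn't care.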

ignoramous
Autonomy.

Solution to such org woes, in part, is discussed by Clayton Christensen in his work, The Innovator's Solution http://web.mit.edu/6.933/www/Fall2000/teradyne/clay.html: Even after correctly identifying potentially disruptive technologies, firms still must circumvent its hierarchy and bureaucracy that can stifle the free pursuit of creative ideas. Christensen suggests that firms need to provide experimental groups within the company a freer rein. "With a few exceptions, the only instances in which mainstream firms have successfully established a timely position in a disruptive technology were those in which the firms' managers set up an autonomous organization charged with building a new and independent business around the disruptive technology." This autonomous organization will then be able to choose the customers it answers to, choose how much profit it needs to make, and how to run its business.

---

Amazon and Cloudflare are good examples of big-orgs trying their best to implement late Prof. Christensen's ideas.

Andy Jassy on Amazon's approach to innovation: https://www.hbs.edu/forum-for-growth-and-innovation/podcasts...: And then if we like the answers to those first four elements, then we ask, can we put a group of single-threaded, focused people on this initiative, even if it seems like they're overwhelming it with strong senior people. If you try to add really busy people who do the existing business to the big new idea, they will always favor the existing business because it's a surer bet. So we want to peel people away from the existing business and put them just on the new initiative.

Pace of innovation at Cloudflare https://blog.cloudflare.com/the-secret-to-cloudflare-pace-of...: ...it is not unusual for an initial product idea to start with a team small enough to split a pack of Twinkies and for the initial proof of concept to go from whiteboard to rolled out in days. We intentionally staff and structure our teams and our backlogs so that we have flexibility to pivot and innovate. Our Emerging Technology and Incubation team is a small group of product managers and engineers solely dedicated to exploring new products for new markets. Our Research team is dedicated to thinking deeply and partnering with organizations across the globe to define new standards and new ways to tackle some of the hardest challenges.

---

Also read: Clayton Christensen and Stephen Kaufman on "Resources, Process, and Priorities": https://personal.utdallas.edu/~chasteen/Christensen%20-%202n...

Ao7bei3s
Self-service approvals.

Instead of appointing a senior eng to be approver, task the same senior eng with writing down his decision criteria (as text or where it makes sense even as code).

This has advantages for everyone:

1. It lets the engineers who need approval move at their own speed, and plan time for it as a predictable work item like any other, instead of depending on an approver for whom the approvals will usually be at a lower priority and mid-sprint.

2. For the approval policy writer, it turns this into a one time effort with a defined scope that can be planned and prioritized in his/her own backlog, instead of open ended toil that can come at any time, take any time, and not clearly relate to their own current priorities.

3. For the company, writing down the policy brings consistent decision making.

Obviously this requires trust that employees can and will say "no, can't do" when they're tasked with something that is not approvable, which can be culturally difficult (business and otherwise). Checklists (literally a list of checkboxes to click on, "I confirm that...") can help with this.

(As an example of writing down the policy as code: that's any CI/CD pipeline. But it's not limited to engineering decision making - for example, we're using a well-known open source license management tool that promises auto-approval for open source library use depending on policies configured by legal. This works moderately not so well because this particular tool is not great; the idea is sound. We still made it work: now legal wrote down their policies, trained a large number of engineers on them and those are now empowered to make decisions.)
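As a sketch of that policy-as-code idea, using the open-source-license example from the parenthetical (the policy table, license IDs, and decision labels here are all invented for illustration, not any real tool's configuration):

```python
# Hypothetical policy table: legal writes the rules down once, and
# engineers get instant, consistent answers instead of waiting on
# an overloaded human approver.
LICENSE_POLICY = {
    "MIT": "allow",
    "Apache-2.0": "allow",
    "GPL-3.0": "escalate",   # still needs a human, but only this case
    "proprietary": "deny",
}


def check_dependency(name, license_id):
    """Auto-approve per the written policy; unknowns escalate by default."""
    decision = LICENSE_POLICY.get(license_id, "escalate")
    # The returned record doubles as the audit/paper trail.
    return {"dependency": name, "license": license_id, "decision": decision}


print(check_dependency("left-pad", "MIT")["decision"])     # allow
print(check_dependency("somelib", "GPL-3.0")["decision"])  # escalate
```

Note the default: anything the policy doesn't cover escalates to a human, so the written rules only ever shrink the approver's queue, never bypass it unsafely.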

strictfp
In change management they argue that companies tend to purposely slow down change over time to become more predictable and lock in on the "successful route". That certainly mirrors my experience. The only thing I don't understand is why you hire so many people when you let a handful of people gate everything. You might just as well fire 80% of the workforce.
jeffbee
You cannot have both an organization that fastidiously protects the privacy and security of user data, and one that requires no process to build and launch software. It's just not possible.

Anyway the video is just a joke. I've never worked anywhere where it was as easy to just serve 5TB of static data as at Google. Googlers who want to just host junk under their own authority do not need to shop for quota, set up borgmon, etc.

joshuamorton
Right like looking back, they're setting up a production, user facing service. If I want to just store a 5tb blob somewhere, I think that fits in freebie CNS, so I don't even have to provision resources, I just cat the file or whatever (granted, 5tb was a bit bigger 10 years ago).

Having a rule that "your user-facing service needs to be replicated" is a good rule. Replication being difficult was the problem.

bostonsre
Automate as much as possible. Approval gates are there to prevent obvious issues from continuing down the pipeline. If you can automate checks for known issues that you want to prevent from happening, then you should be able to add it as a test step. Then in the catch, log why it failed and point the dev at documentation.

Manual processes suck for everyone involved.

Zababa
I've read on HN that "processes are organizational scar tissue", I think it applies here.
rShergold
That's an excellent phrase. It reminds me of the navy saying "regulations are written in blood"
MauranKilom
It's actually super related, given that (at least in the medical software sector) you won't get anything approved by the FDA before spelling out the entire software development operation in processes.
NaturalPhallacy
Makes sense in the medical domain where people are way more likely to be injured or die.

But most software isn't that critical.

riknos314
Yep. A wise engineer once told me "Runbooks [written SOPs] are just solving bugs with people instead of code"
dpryden
Non-Googler: What do all those words mean?

Noogler: Haha, this video is so funny!

L4 SWE: (Crying because the video is so true)

L5 SWE: Haha, this video is so funny! I should show it to my interns, this will be a good training for them.

L6+ SWE: Why do people think this is funny? This Broccoli Man guy makes some really good points...

belter
L7+ SWE

My life is a waste but the money is too good...

tandr
How good?
riknos314
https://www.levels.fyi/company/Google/salaries/Software-Engi...
tandr
/me falls from the chair.
kubb
This applies to every level, particularly the lower levels.
ikiris
it ain't much, but it's honest crying into piles of money
vanderZwan
Not sure if "SWE" stands for software engineer, or "Sweden" as in Stockholm Syndrome
keville
Why not both? :sob:
frakkingcylons
Oh definitely Sweden.
dpryden
frysquint.jpg

Edit: Sigh, it seems that my terse response was misinterpreted by some?

What I mean is: perhaps you were intending your comment to be a caption for frysquint.jpg?

praptak
Random synapse activation:

A few years ago there was a Swedish tourist at a hotel where I was on vacation. He had a blue-yellow hat with "SWE" written on it in Courier font. I felt an urge to steal his hat because it looked better than most of the Google-branded swag I got as a Google SWE :)

throw1234651234
Non-Googler: What do all those words mean?

Exactly. This wasn't too relatable, even though I have the GCP Certified Architect cert.

NikolaeVarius
Why would internal tooling mean anything to you? And why would GCP knowledge be useful in any way?

It's fairly simple to extract the gist of what these systems do from the script.

jjtheblunt
As an ex-Apple person, I'd say it means there's way too much hierarchy at Google? Not sure I'm reading it right, though.
flatiron
We still had our processes, though. Radar was my least favorite, but they replaced the ant eater app with one that was at least partially usable right before I left.
jjtheblunt
we'd say, about spoken mad scientist style requests, if it's not in radar, it never existed. :)
q3k
IMO/IME it's the clash between tooling, systems, and processes designed for running long-term, highly scalable and reliable services maintained by teams in multiple geographical locations and used by billions of people, and greenfield projects that just want to get things done at an early stage.

Requiring multi-cluster/region deployment, the quota/resource economy system, handling PCRs, code review, readability approval for complex configuration languages (and the existence of such complex languages in the first place)... all of that makes sense in a vacuum; all of it was built to handle real problems and is likely written in the blood of a near-miss outage. But it all comes crashing down on you when you're doing things from scratch for a relatively simple use case that no one really designed for.

dpryden
I can't tell if this comment is implying that my comment is unclear, or if you're agreeing with the first line of my comment.

In either case, though, it's an inside joke precisely because it's more relatable to those who are (or were) inside. In particular, I think it would be most funny to someone who was at Google about a decade ago; when I left Google in 2017 things had already changed enough that this didn't ring quite as true for new hires.

That said, GCP is not very representative of what the internal platform looked like circa 2010. (Or even of what the internal platform looks like now, as far as I know.)

throw1234651234
I agree that as a non-Googler, I don't get the video, that is all. No negative connotation toward your comment.
celtain
>I think it would be most funny to someone who was at Google about a decade ago

I was going to correct you on the timeline, but then I realized that my time at Google was almost 10 years ago now.

Fuck.

cperciva
Where does "these are really good points, but why don't we have tooling which sets everything up automatically?" fit on the scale?
raldi
Or: These are really good points for a visibly-user-facing post-alpha service, but isn't it a bit overengineered for an experimental internal service whose clients can tolerate the risk of occasional downtime?
nostrademons
L5 Xoogler who left for a startup.
SilasX
Yeah, that was my reaction. I get the need for all this reliability/failover, but it's a horrible failure of abstraction/separation of concerns.

There's no reason the serving team should have to learn how to do all of those things on the checklist, since it can be done by anyone who's already learned the infra. You're expecting them to learn all kinds of stuff outside of their specialty, when they should be able to kick the app over the wall and let infra ensure that the app is deployed in two separate PCR zones with the failover plan etc, which should itself be mostly automated.

q3k
> when they should be able to kick the app over the wall and let infra ensure that the app is deployed in two separate PCR zones with the failover plan etc, which should itself be mostly automated

Not entirely - the developers should actively participate in designing the actual failover scenario and making sure the application can handle that (anything from being okay with some downtime due to the failover happening to designing an actual multi-region multi-master application). Making assumptions like 'infra will handle it' is a great way to not only get unexpected outages (because the developers assumed there would be no downtime because failover is magic, or that writes will never be lost) but to also introduce tensions between teams (because you now have an outside team having to wrangle an application into reliability when the original authors don't give a crap about it).

I get and agree with your point, the tooling and processes should definitely be simplified/automated when possible, and developers deserve a working platform that just works. The whole point of a platform team is to abstract away the mundane to let people do their job. But reliability is everyone's job, not just the infra's team, and developers must understand the tradeoffs and technology involved in order to not design broken systems.

SilasX
If that's the point:

A) It's doing a horrible job conveying it. A dev does need to be concerned with how to handle failover, but only at a certain abstraction level. They should be required to specify something in the form "given server A fails and has to pass to B, what do you do?" That does not require you to know the terminology about PCRs and how to make decisions about which cells (or whatever) to pick on deployment, or avoiding the "gotcha" about making sure the two servers are in different PCR zones.

At that point, it's just following a checklist that needs no knowledge of the specifics of the app, and, to the extent that it's accurately representing how Google was, is indicative of bad processes.

B) Many things should be infra's job, as they're cleanly orthogonal to what devs are doing. For example, how to apply a security patch to a DB. That's unrelated to the operation of the app.

I do get your point though, and I wouldn't say something like this about e.g. testing (which was the short, "reasonable" part of the video!) -- the devs have intimate knowledge of what counts as passing and failing and should be writing tests, and not 100% passing it over to QA. But that's precisely because such concerns are deeply tied in to the thing they are concerned with. "SQL 3.4.1 vs 3.4.2" is not.

q3k
Yeah, it seems like we agree :).
dustingetz
Because you have to get it working before you can make it better. Abstraction is quite secondary
SilasX
Yes, but the video is in the context of a mega-scale mega-corp that should have been able to set up clean abstraction boundaries by that point.
dustingetz
that's not how that works - you win by blitzscaling the first thing that works (not the second thing; that would slow you down)
Jensson
They already have done that, this video is 11 years old, at that point Google was half the age it is now and a fraction the size.
SilasX
Google was still huge in 2010. Everyone seems to think that everything was a hundred percent different just <small number> of years ago...
Jensson
> <small number> of years ago

Half a company's lifetime isn't "<small number> of years ago" for that company. You can't compare today's tech ecosystem to 2010's; so many things have gotten standardized since then, and Google was at the forefront back then.

Unlike modern companies, Google had to build everything out itself, since nobody had built those systems or even had experience building such systems. That takes time, but today everything Google learned is common knowledge, captured in papers and similar.

If you disagree, name one company in 2010 that had a one-button script abstracting away things like where the data is stored, failover, data replication, etc. I don't think there was any. Google made it relatively easy to launch such services; having to manually configure the replication script and the zones your data should be stored in wasn't really a big deal.

lumost
Mega-Caps suffer from the following problem:

1. There are more engineers making more divergent architectural solutions such that there is never a single place where you can make changes across the group.

2. Failures keep happening, so process is instituted with many checkboxes for engineers to work through.

3. Engineers on the small scale stuff get stack ranked against the engineers on the big scale stuff. Everyone needs to show that they can do the work and are "fungible". This leads to small internal systems having the same operational standard as large public facing systems.

SilasX
I don't see what that's replying to. Nothing in that list would justify demanding that the app's team have knowledge or preference about which PCR zones to pick and which will just have to be corrected when they inevitably pick the wrong one.
lumost
The point is that every team gets to set their own failure modes. I know of multiple tier-1 services which diverge from at least one best practice.

Think of the scenario where a cloud provider needs to evacuate an AZ. There is no API that would allow the compute team to force-migrate tens of thousands of apps and guarantee that they are both not affected and maintain their redundancy guarantees.

Internal services at Google are in the same boat. However, Google knows about the hard edges and forces everyone to deal with all of that complexity - there is no API the serving team could plug into that would avoid this overhead.

joshuamorton
While what you say is true, I think GP is ultimately correct. You can have a system define a convention and allow bypassing it, instead of forcing everyone to start from scratch. In fact, this is the approach that pretty much any modern service at Google will use.
SilasX
That still at no point requires the application's team to make decisions about which two PCR zones to pick and which cells within it to pick, which [decision] can still be cleanly abstracted away, and would still be a mixing of unrelated concerns, and so your comments are still orthogonal to the point I was bringing up here.

Edit: It might help to check out my comment here, where I clarify what a dev should vs shouldn't have to worry about: https://news.ycombinator.com/item?id=29085638

cmrdporcupine
imho the Google interview process selects for people who thrive on organizational challenges.
GeneralMayhem
I think that was more or less the intended response. And ten years on, most of these things are automated. This video was a kick in the pants internally.
nostrademons
L9+
packetslave
T7-T9 Vision
rahimiali
Is there a page that documents this anecdote? I’d like to link to it next time i use this phrase. Ironically, googling for it doesn’t turn up anything relevant.
tommiegannert
Google has officially apologized. The person in question had to take their blog offline due to bad behavior from readers (unrelated to Google AFAIK). Overall, this is a dark chapter. It's also been scrubbed internally. Not obliterated, but you won't accidentally bump into it as a Noogler.

As much as I think people should take responsibility for their own actions, it's probably for the better to let this one rest now. Who caused it is irrelevant at this point, though. We (Xooglers, and Googlers) can take responsibility for our actions, and not continue perpetuating it.

rahimiali
I use the phrase to describe how leveling is not about what junior people think it’s about. I’m not clear about the responsibility you’re talking about.
nuerow
> Where does "these are really good points, but why don't we have tooling which sets everything up automatically?" fit on the scale?

My guess is it fits nowhere, because the L5s don't have the ability to automate it, and the L6s think it's trivial and, since it's done sparingly, that it doesn't justify the work to do things differently.

And this is why we can't have nice things.

ikiris
More like you aren't going to get promoted for automating someone else's toil. Also, now who's going to support it? Better deprecate it, since the library changed / got deprecated / it's Tuesday.
Jensson
> More like you aren't going to get promoted for automating someone else's toil.

Lots of people were promoted for automating these things. They built easy to use services, got extra headcount since they became important and climbed the ranks. So not sure why you'd think that.

It may be different at other companies, but at Google, building stuff that many other engineers depend on is a major way to get promoted. Of course, if you automate something and nobody uses your automation tooling then you won't get promoted, but if your work gets used by basically every new engineer you'll climb the ranks quickly.

azornathogron
And yet it's been a decade since this video and practically everything it mentions is a non-problem now.

No one is spinning up new borgmon instances. Spanner is replicated by default. Only very low level services need to care about PCRs. If you use one of the approved frameworks it will set up practically all the production configuration for you. Basic alerting for your service is automated, just turn it on, picking cells to run in is automated, scaling your service is automated, etc.

Actually getting quota remains a problem... :-p

Anyway I would argue we can and do have nice things, and that has happened precisely through the efforts of a huge number of people at all levels.

Edit to add: of course, there are always new problems to complain about! It's the march of progress after all.

compiler-guy
Yes. If someone were to make this video today, it wouldn't be about production jobs and PCRs, it would be about privacy reviews and branding approvals.

But the quota issues haven't changed a bit.

omreaderhn
That would be 'Xoogler' because Google's engineering and broader corporate culture does not reward work like that and so when you realize that, you leave.

In general, Googlers have very little idea how far behind the rest of the industry they are when it comes to tooling.

I am a Xoogler.

mwcampbell
I got the impression, based on a blog post by Eric Lawrence [1], that Google's developer tooling was top-notch (except for devs working on open-source projects like Chromium). Did it get worse since 2017, or are you talking about a different kind of tooling?

[1]: https://textslashplain.com/2017/02/01/google-chrome-one-year...

throwawayfgg
Google's developer tooling is top-notch and amazing and constantly improving.
devnull3
At 2:05 the green dude asks if you think your users are scum and do you hate them.

The funny thing is Google as an org ends up hating its users "accidentally" anyway, given its history of pulling the rug out from under services/APIs, etc.

nunez
even more ironic given that google+ came out four years later
munk-a
If the users had properly set up a PCR notification about the change and registered it to a bigdata instance then they would never be surprised about service discontinuations. The moral of the story is that you can't fix stupid users. /s
dang
Recent and related:

I don’t know how to count that low - https://news.ycombinator.com/item?id=28988281 - Oct 2021 (259 comments)

especially this comment: https://news.ycombinator.com/item?id=29032656

birken
Hey... those of us that worked on Google's internal Bigtable service worked very hard so you didn't have to file a ticket to set up replication between your Bigtable cells.

The rest does seem about accurate though.

gcampos
What exactly are these "peer bonuses"? Are they real? Are they what I think they are? Do people actually use them as bargaining chips?
advisedwang
Yes, they are real; each is in the low hundreds of bucks. They must be approved by the recipient's manager. There's also a limit on how many an employee can send (but it's fairly high). There is also "kudos", which comes with no money but has no limits or approvals required.

They are intended to be used for going above and beyond, not for stuff that falls within the scope of one's job. Using them as a bargaining chip is explicitly against policy.

B-Con
You can send a small, semi-official "thanks for a job well done" to someone else and it comes with a few bucks attached. People joke about using them nefariously (as people tend to joke), but I've only seen them used appropriately.
compiler-guy
People don't use them as bargaining chips most of the time--it is explicitly against policy. I'm sure it happens sometimes.

What they do do is send one when someone else does something nice (like fix a bug from a project they have left or whatever else). If you ever need something similar again, the person you peer bonused has warm fuzzies about the experience and a hint that they might get it again.

People also use peer bonuses during perf time to demonstrate that the work they are doing impacts other people enough to earn a somewhat uncommon thank-you.

nunez
you get some money ($150/bonus, IIRC) for helping someone out, assuming manager approval

akin to the usual corporate "thank you" gift card, but more money and generally easier to distribute

q3k
> What exactly are these "peer bonuses"? Is it real? Is it what I think it is?

Each month, you can nominate another employee for a small bonus. This is designed to be given to coworkers who have gone above and beyond what was expected from them.

> Do people actually use them as bargaining chips?

From my experience it's so over-the-top absurd that it would be difficult to have someone interpret such an offer as anything other than a joke or a meta-joke.

https://blog.bonus.ly/a-look-at-googles-peer-to-peer-bonus-s...

tazjin
It's not each month. You can send a lot of them. There's a theoretical limit and a bunch of restrictions but in practice they're unenforced.
guyzero
Each one has to be manually approved by the recipient's manager, so this can't happen. It's a joke.
tazjin
No, they auto-approve after 3 workdays and in my experience this is what usually happens with "questionable" peer bonuses.
jamestimmins
As an external user who has found Google's services to be incomprehensible, it's nice to know it is (was) equally as painful internally.
Spivak
I’m so confused, isn’t this just like basic highly available infrastructure mixed with a toxic SRE culture?

I want to serve 5TB!

Okay, grab two instances in different patching zones, create a bucket in our replicated RADOS storage that can hold your data or create a table/db in our Postgres cluster, write your app with tests, add an entry to the load balancer, add an entry to our big ole distributed job scheduler if you need cron, and submit a PR against the infra repo to add Prometheus metrics and alerts.

And when you're done with that, set up CI/CD, because you shouldn't assume that instances are reliable, and if you don't give us the code to do a deploy we can't recreate your app when the VM goes belly up and we'll have to page you.

Are people not used to what it really takes to “just run some code?”

gliese1337
I am used to it, but

1. It is rare for the details of how to actually accomplish each of those steps to be both documented and the documentation made accessible.

2. If you can describe it that succinctly, it really ought to be automated. If it can't be automated... then you left something out of your instructions, which goes back to point (1).

Spivak
Like, the steps to do all of this are automated, but we can't read your mind. All of this basically boils down to submitting a PR against some repo that says "there shall be two instances in these regions, there shall be a database in this cluster, there shall be a bucket with this name, etc." which the SRE team reviews and merges, triggering an infra deploy.
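The declarative flow that comment describes can be sketched in a few lines of Python. Everything here (the spec fields, the `plan` function, the service names) is hypothetical illustration, not any real SRE tooling: the reviewed spec is diffed against current state, and the deploy applies only the missing pieces.

```python
from dataclasses import dataclass

@dataclass
class ServiceSpec:
    """Desired state, as it might appear in an infra-repo PR."""
    name: str
    regions: list   # e.g. two zones for redundancy
    database: str = ""
    bucket: str = ""

def plan(current, desired):
    """Diff the desired spec against current state; return actions to take."""
    actions = []
    for region in desired.regions:
        if region not in current.get("regions", []):
            actions.append(f"create instance in {region}")
    if desired.database and not current.get("database"):
        actions.append(f"create database {desired.database}")
    if desired.bucket and not current.get("bucket"):
        actions.append(f"create bucket {desired.bucket}")
    return actions

spec = ServiceSpec(name="fivetb", regions=["us-east", "us-west"],
                   database="fivetb_db", bucket="fivetb-data")
print(plan({"regions": ["us-east"]}, spec))
# → ['create instance in us-west', 'create database fivetb_db', 'create bucket fivetb-data']
```

Because the plan is computed from a diff, re-running it against a fully provisioned state yields no actions, which is what makes the "merge the PR, trigger a deploy" workflow safe to repeat.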
svachalek
It totally makes sense for Gmail, but at Google "serve 5TB" means something like sort your manager's inbox, something that someone somewhere has an interest in doing, or trying, but of no real consequence for failure.
q3k
People with HA production experience can easily vibe with points made by Broccoli Man. Yes, these things make a lot of sense if you actually want to get code running reliably, especially at scale (organizational and userbase).

But we must not forget how this can look from the point of view of someone who hasn't had to deal with a page due to an entire datacenter going offline, who's not aware of all the hundreds of small things that can go wrong by doing the 'obvious' thing. I think the video is more of a way to poke fun at the optics of this (and some of the overly arcane stuff involved), rather than at the idea of high availability being useless. At least that's how I've always felt about it, a way to remind SREs to respect their internal users (simplify! automate! document!) and that what makes sense to them might look ridiculous to others.

kgin
Only if you think your users are scum. Do you think your users are scum? Do you? Why do you hate your users?
GauntletWizard
Holy crap! I've been asking for this for forever[1]. Thank you to the leaker!

[1] https://news.ycombinator.com/item?id=21786729

benley
You're welcome <3

Here's hoping Google doesn't get mad about it - though after 12(ish) years there's really nothing secret in that video.

jamestimmins
As a friend of mine explained why she left Google a year or so ago, "I got tired of emailing 30 people to try to figure out who owned a single variable."
raldi
The best and worst parts of being a Google engineer: Impossible things are merely very hard, but on the other hand, easy things are also very hard.
nostrademons
A TL I worked with once had a simple but effective strategy for that:

"Remove it and see who complains."

I did that (with the impenetrably named "PrefetchExperiment", last touched by a branch that lost previous file history in 2007). Turned out it was the source data for Google's DNS to figure out how to route queries to the lowest-latency datacenter, based on their geographic location. In about a month, it would've taken down all Google services. Oops.

It was a very effective way of figuring out who owned the variable and writing a big long comment explaining what it's there for and which team to contact before changing it, though.

AlexanderTheGr8
LMAO, isn't this very similar to Facebook's recent DNS problem?

Also I love the idea of removing it and seeing who complains.

_3u10
It also works great for a product / bug backlog. Just delete the entire thing. If it's a real bug / feature it will get recreated.
zamadatix
Facebook's recent "DNS problem" was a process for checking routing failover capacity on the backbone for maintenance that ended up taking down the backbone links. As a result of being disconnected from the backbone, the servers pulled their BGP advertisements, since they considered their location to be unhealthy (no connection to the backbone).

FB's problem was the lack of routing reachability on its backbone triggering the lack of routing reachability information being sent to the larger internet; this in turn caused problems for DNS, not the other way around.

joshuamorton
Scream tests are always fun. ("break it and see who screams")
ikiris
The hilarious thing is I know exactly what file you're talking about here.
nojvek
The Borgmon readability approvers make me chuckle.

At Stripe, there were language approvers. Only the blessed could approve PRs. Even XML had a set of approvers. I had a fun time getting hold of an XML approver.

nunez
this is making me miss memegen

google had its downs, but wasting hours on memegen was not one of them

dmitrygr
Ask someone still on the inside and you'll hear that the memegen is a mere shadow of its former self

Xoogler 2012-2019

nunez
that's really sad
mikewarot
I didn't realize this was made by Google in the first place when I saw it a few days ago. I hope things are simpler now, but I doubt it.
jazzyjackson
what am I looking at here

EDIT: it has been explained to me: https://rachelbythebay.com/w/2021/10/30/5tb/

Zababa
There's almost some kind of irony in uploading this to YouTube, a feeling of "why can't I deploy my service as easily as people can upload videos to YouTube?"
opinion-is-bad
The multiple repetitions of “This is Google” hit home for me. I never worked as a software engineer so much of the rest is out of scope to my experience, but the constant idolization of Google, and by proxy each other for working at such a place, eventually changed from feeling coy to cultish.
frakkingcylons
For anyone else who'd rather read than watch this video, here's the transcript (from YT's auto-generated captions): https://pastebin.com/8UrFftM6
ChuckMcM
Wow, way to trigger my Google PTSD. :-)

I think the only thing I would have added would be using some service that was recently changed so that someone could get promoted, and which does the same thing the old one did but with different bugs.

shoeshoeshoey
Facebook had its own meme: "Pusher I need a hotfix"
treebog
What does “serve 5TB” refer to? They expect 5TB of network bandwidth over some time period (a month?)? Or their database takes up 5TB on disk?
rachelbythebay
Imagine it as "I want to have a http://foo/~me/ type path where I can park 5 TB of stuff and other people can fetch from it when they feel like it".

5 TB of data made available, not 5 TB of transfer/bandwidth/etc.

drjasonharrison
If you watch the video, it doesn't matter. It's just something they want to serve.
packetslave
IIRC, the impetus for Jon Orwant creating the video was him wanting to make 5TB of data publicly available (the US patent dataset? it was before my time) and all the hassles that were involved.
metalliqaz
i think it just means to put 5TB of data online
raldi
It's a joke that's sort of open to interpretation.

The most straightforward is, "I just want to do this incredibly simple thing; why is it so hard?"

But there's also the level of, "Googlers are so engineeringly pampered that they think serving 5 terabytes is the equivalent of Hello World."

And then there's another level of, "Well, isn't it? After all, this is Google and this is $YEAR."

Reschivon
> It was a site where you could type in a script and it would do text-to-speech and actually animated some goofy characters to lip-sync the words for you

Does anyone still have the code for generating "Fox girl and Broccoli man" videos? I'm thinking of starting a small revival of this meme.

SilasX
Wow, kind of funny that Xtranormal now lives on in the few viral videos that were made with it.

Here's where the company is now (the original domain is used for something else now):

https://en.wikipedia.org/wiki/Nawmal

metanonsense
Move fast and break things! And while you are at it, please, don't break anything.
zzt123
Was this video ever used by Google for anything outside of internal before now? I swear I’ve seen this before (it’s still hilarious!) but I never worked at Google.
codewiz
10 years ago, on my first week at Google, my manager suggested I watch this video to get a fairly accurate idea of what the team does :-)
didip
What is "Borgmon readability" and why was it important. I think that's one of the punch line of the video.
yeputons
If you change source files in language X, someone proficient in language X (aka "has X readability") should approve that it corresponds to Google Code Style in language X.

You start without it and may obtain it once you've written a bunch of code in language X.

I'm not sure if there is really a Borgmon readability. But if there is, it seems like Borgmon configuration files are both common (so that there is a readability requirement) and uncommon (so that there are very few people with readability).

GeneralMayhem
There isn't Borgmon readability anymore because of this video (the video is about 10 years old).

Borgmon is a monitoring system for Borg (https://sre.google/sre-book/practical-alerting/#the-rise-of-...), and it has its own language for configuring monitoring rules. The Borgmon language is infamously obtuse, and the idea of "clean" Borgmon code has basically always been a punchline. To be fair, the problem domain that it's in (declarative computation and reduction over high-dimensional datasets) is tricky to make clear, and if you use it exactly as intended, it can do some really cool stuff. But the upshot is that it has a high learning curve, a high propensity to degenerate into wallpaper, and a very small number of people who are sufficiently familiar with it to be readability reviewers.

It also doesn't help that "just copy-paste someone else's" actually does cover 90% of use cases, which both reinforced the idea that it was a chore and actually made it harder to gain readability, since being granted "readability" status requires writing a meaningful amount of non-trivial, non-copy-paste code.
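Borgmon's rule language itself is internal, but per the SRE book its public descendant is Prometheus, and the "declarative computation and reduction over high-dimensional datasets" idea can be illustrated with a toy Python analogue. All names and data below are made up for illustration: labeled time series are reduced by summing away labels, and an alert fires on a threshold.

```python
from collections import defaultdict

# Each series: a tuple of "label=value" strings -> latest value
# (e.g. per-task error counts reported by each job instance).
series = {
    ("job=web", "zone=us-east", "task=0"): 3.0,
    ("job=web", "zone=us-east", "task=1"): 5.0,
    ("job=web", "zone=us-west", "task=0"): 0.0,
}

def sum_by(series, keep):
    """Reduce away all labels except those named in `keep`, summing values."""
    out = defaultdict(float)
    for labels, value in series.items():
        key = tuple(l for l in labels if l.split("=")[0] in keep)
        out[key] += value
    return dict(out)

# "Rule": aggregate errors per zone, alert where the total exceeds 4.
errors_by_zone = sum_by(series, keep={"zone"})
alerts = [zone for zone, v in errors_by_zone.items() if v > 4]
print(errors_by_zone)  # {('zone=us-east',): 8.0, ('zone=us-west',): 0.0}
print(alerts)          # [('zone=us-east',)]
```

The real systems express this declaratively rather than imperatively, which is exactly where the learning curve (and the "readability" bottleneck) came from.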

mlindner
Wow I saw this somewhere a long time ago. But I don't remember where and in what context.
silentsea90
Actively working towards being a Xoogler so I don't have to live in this dystopia.
Lammy
Personally I'm still waiting for a copy of "Pusher, I have a hotfix."
rachelbythebay
Maaaaaaybe ID 10150136890464739. Maybe.
Lammy
Didn’t work logged-out, but I’ll try again if I ever reactivate my account. Thanks!
romwell
Ah, as a xoogler, I can finally enjoy this video without tears :)
sbpayne
I'm so glad I can see this again. I forgot how much I missed this.
Wonnk13
One of the few things I miss about my time there...

Never did get Java readability :(

anshumankmr
What is this exactly?
ts4z
Xtranormal was a video service that would animate scripts with some stock characters.

Someone made a bit of internal-only snark, and "I just want to serve 5TB" became an in-joke for turning easy problems into exercises in frustration.

Some of these things have, actually, been addressed.

jboggan
Eagerly awaiting the "GoogFellas" video leak.
sunyc
I actually have Borgmon readability! Peer bonus pls.
ineedasername
I think I'm missing something. Does readability have a special meaning in this context or is it really as basic as it sounds, and therefore all the more ridiculous for being such a bottleneck?
goleary
At Google, an employee must have "readability" in a language in order to sign off on code reviews in that language.

You must pass a test in order to get "readability".

adminscoffee
it's funny how you have to have a dictionary in order to understand how to navigate Google's infrastructure
slac
I have the t-shirt!
martini333
Google: The Sunk Cost Fallacy
nunez
still hilarious
b20000
meanwhile google docs is still a slow piece of shit.
loxias
And now, in 2021, Google has inflicted their "clarity" on the rest of the world. I miss jobs from the 2000s, the jobs where you were paid to write software for a living.

You know, engineering! Given a task, or set of requirements, develop software on your computer, software which eventually runs on the customer's computer, where it's used to solve the customer's problem.

My most recent full time employment a year ago was at a great company. Healthy culture, some of the most talented coworkers I've ever had the pleasure to work along side.

Over the year I lasted there, I used for the first time: Docker, Golang, Kubernetes, Terraform, Gitlab, Saltstack, Prometheus, (and probably other middleware that my brain has GCd to free space). I was barely able to get anything done. At least, it always felt that way.

Maybe I'm just an idiot, I don't know. I'd accept it if true! What I do know is that I used to be able to build things for people, be compensated well for it, and get satisfaction from a customer liking what I've built. It was simple.

In this brave new world, with containers, pods and this and that and the other thing, where it can take months before one even understands enough primitives to do a "hello world".... how can anything ever get done?? How can anything inventive, creative, or experimental emerge from our industry when the develop/test/improve cycle has gone from minutes to weeks or months?

I don't know what the future looks like, but the present strikes me as unsustainable in the long run.

(edit: Wow! I expected this to be downvoted to oblivion, not my highest rated comment on the site...)

<tiny>(Forgive this shameless self promotion: if, dear reader-who-is-a-hiring-manager, you have a paid role for a lowly but experienced systems engineer who doesn't know anything about "web" or "apps" or "social" but is quite adroit with C/C++ (and a few others), most "sciencey/mathy" type problems, signal processing, firmware, network protocols, automation/scripting, and more, ... email is welcome!)</>

rvnx
I tend to agree with the green dude :|

It's normal to have a production service replicated on 2 availability regions.

The green guy is annoying because reality is annoying, and reliability is not about luck but about a properly calibrated and tested process.

Yes, you need to write monitoring, you cannot run only with "hope".

Yes it sucks that a DC can go down. Your particular service is not important if it's down, but having a copy of the production data is essential in case of a catastrophe.

Except for the tests that are probably unnecessary, everything else seems to make sense.

The peer bonuses are an issue though.

midasuni
Depends what problem you're trying to solve. In my experience the vast majority of business problems do not need that kind of reliability, and those that do don't need it deployed in such a byzantine way.
zaphar
You say that but then the system goes down and the CTO is walking up to your desk asking why it's down and exactly when can they expect it to be up and don't you know we are bleeding money right now?

What you call byzantine an SRE calls necessary complexity to meet the needs of your business.

midasuni
No it's not. There are of course some systems like that, but for the majority of systems at the majority of companies, that's not the case. If an invoice can't be paid for 6 hours, the world doesn't collapse. If the CFO can't get the statistics for his quarterly PowerPoint at 3am, it's not a major problem.
gopher_space
A lot of it feels like premature optimization. Like I'm laying down a heavy infrastructure to support change but it's already locking me into certain ways of looking at problems.
loxias
Great observation. Perhaps the term "premature infrastructure architecture optimization".
Jensson
It isn't premature though. A service is only as robust as its weakest link, so if you let people write crappy services that easily go down and are hard to bring back when they do, then you will get a huge number of outages in major services, since they depend on so many small ones.
loxias
All of this is true, but I'd wager there are 10, at MOST, entities on the planet large enough to warrant this level of... "architectural overkill". The other 99.99% of us don't need it.

I CERTAINLY don't debate that Google, or Amazon, or Facebook, or Netflix, or the phone system, or anything else that touches a noticeable percentage of the human race needs architecture like this to provide "5 9s".

But, just like when "big data" became a buzzword, and many people thought their problem needed "big data" approaches to solve, the thought that all but a small minority of entities need this is Simply Wrong.

I am reminded of a client doing something with genomics about 9 years ago. They had an over-complicated, "new tool"-infested approach to their "big data" problem, but the run times were taking too long. I was brought in as a consultant to improve it. After I was done, a data-processing run that previously took hours (so employees ran them overnight) took minutes, or seconds. What did I do? I got rid of all the complexity. I replaced their expensive cluster with one studly provisioned machine. I replaced their collection of networked Java microservices with one non-networked multithreaded C program. I replaced their XML-based format for data at rest with something I whipped up, tuned to what they actually needed.

Once their "we need big data!" >10TB data set could fit in a single machine's memory, the rest was easy. What used to "require" a cluster of machines and overnight processing could be done interactively, and quickly enough for the scientists to get into a much more productive "flow", doing dozens of runs per day.

tl;dr: unless you're google (or google scale) you don't need all this crap. :)

ridaj
I think the challenge is you'd expect a company like Google to have more of the setup be automated. If replication and monitoring are such universally good ideas, then why don't they come out of the box?
novok
The green guy should be making all of that a one-click process that sets up a service shell for you, though. Then, as you write it up, an automatic linting & rules engine would highlight what is missing before you make a final pull request to get the necessary human approvals, ONCE.
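The "linting & rules engine" idea suggested there can be sketched as a config lint pass in Python. The rule names, config keys, and hints below are all invented for illustration: each production requirement becomes a named check, and the tool reports what is missing before review.

```python
# Each rule: (name, predicate over the config, hint shown when it fails).
REQUIRED_RULES = [
    ("replication", lambda c: len(c.get("zones", [])) >= 2,
     "serve from at least two zones"),
    ("monitoring", lambda c: bool(c.get("alerts")),
     "define at least one alert"),
    ("backups", lambda c: c.get("backup_policy") is not None,
     "set a backup policy"),
]

def lint(config):
    """Return a human-readable finding for every rule the config fails."""
    return [f"{name}: {hint}" for name, check, hint in REQUIRED_RULES
            if not check(config)]

print(lint({"zones": ["us-east"], "alerts": ["http_5xx"]}))
# → ['replication: serve from at least two zones', 'backups: set a backup policy']
```

The point of such a pass is exactly what the comment asks for: the checkboxes are surfaced automatically and up front, so the single human approval happens once, at the end.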
ethbr0
The current landscape (optimizing for hyperscale, at the cost of complexity) seems like a natural extension of relatively few giant corporations funding the majority of programmers. To such an organization, efficiency & time to market are more important than simplicity.
bcrosby95
Last I looked at these numbers, it's more like 50/50. And as the video points out, it doesn't necessarily make things more efficient.
bcrosby95
There are around 4.4 million programmers in the USA. A few giant corporations don't fund the majority of programmers.
iamstupidsimple
But at least in the FAANG example, time to market is much slower because of said complexity.
thiagocsf
I believe it's now called MANGA.
tester756
not MAGMA?
mwcampbell
> software which eventually runs on the customer's computer

I gather then that you're not a fan of SaaS. True, one can cynically explain the rise of SaaS as rent seeking. But there's undeniably value in selling whatever functionality your software provides without burdening the customer with having to run it on their own computer(s). And when we do that, it's our responsibility to make the service reliable, which is what a lot of these tools are trying to do.

loxias
> I gather then that you're not a fan of SaaS.

I'm neutral, I think? I don't quite see the point of it would be more accurate. I don't think I've used any SaaS in my personal life (other than streaming services. Which I'd prefer as a local app anyway, and I still do, for music, but not video)

I'm sure it's a matter of opinion, not something with an objective answer, but "burden of running software on their own computers" genuinely confused me as I read your comment, I thought "burden? what burden?".

As a user:

If software is designed properly (and most isn't...) you download it once, and it runs. Is the burden the time it takes to do the download? Compared to the noticeable burden of using a webapp, with problems like crappy and frustrating responsiveness, an inability to work without an internet connection, and frequent inability to handle tasks of real complexity, I'd choose a local program any day.

As an employee:

Heck yes SaaS! $/month >>> $$$/customer :D Of course it's rent seeking, and I take (and give) no shame in that.

mrtksn
How do malware people manage to scale to millions of machines and run multimillion- or even multibillion-dollar operations? They do all these things in a hostile environment; people don't spin up a docker container to run their malware, their ransomware is installed with a single click or even less. It runs on pretty much anything.

Yes, sure, for mission-critical stuff you will need everything the green guy talks about. You will need replication, you will need tests, you will need a highly involved deployment protocol, but for most stuff it all seems like overkill that transforms product building into assembly-plant construction when all you need is a workshop.

thrashh
Perhaps we need more specialization but I remember the time before these kind of tools and I hated it.

I’m a lazy person and I absolute love tools. Tools like Docker helped me never have to solve other people’s complex environment problems again. I love metric reporting tools like Prometheus because it helps me front problems before they become weekend emergencies. I use a paid Git GUI so I can fix complex Git problems without ever making a mistake.

pm90
Same. I do not have any nostalgia for when you had to ssh into machines and run scripts. Please no.
karolist
Which paid Git GUI friend?
thrashh
SmartGit
loxias
I'm also a lazy person! Which is why tools like this are a PITA to me.

The one exception is Docker. It's not a regular part of my workflow because of how it makes things harder to get started (making a working Dockerfile takes a bit of time), harder to debug, and slower to build (I just changed one line! Now I have to rebuild the whole image to see if it fixed the problem... &c).

However, for deployment of the final product? I agree Docker's GREAT. But, consider, in that respect it offers nothing I didn't have at the start of my career 20 years ago. Static linking for interpreted languages. :)

thrashh
Remember that Docker caches intermediate layers (each Dockerfile instruction produces one), so when you are developing, avoid editing existing lines; append new ones instead, then combine them back together once you get it working.

Deployment of Docker is gross, but I think that’s because that space IMO is very immature.

From my experience, when something has a poor interface (Backbone.js, AngularJS, UMD, etc.) I avoid learning it because I know something is going to replace it. Kubernetes is currently squarely in that boat as far as I am concerned.

mathteddybear
Reminder that jobs from the 2000s that you were paid to write software for a living include also J2EE and CORBA projects.
loxias
True, true. The mention of J2EE did just make me shudder a bit. ;)

But at least, then, I could write code, run it, make changes, and run it again! And see results! Quickly! My main objection to this "future world" is the vast number of layers of abstraction that you need to fight through just to get your first result.

As you can surely guess from my biases and opinions, my happiest engineering projects are those that only require me, my thinkpad, emacs, some man pages, and a C++ compiler. :D And those are the ones I do in my spare time.

VHRanger
In my team, we often deploy internal "services" as cronjobs on an EC2 instance. This hasn't run into any issues in 24 months.

One of these we decided to move to more serious infrastructure (a set of AWS lambdas). It's failed three times in the six months since, and we're moving it back to being a good old cronjob on a server.

Simple is good.
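For scale: the entire "deployment" of a job like that can be a single crontab entry (the script path and schedule here are made up):

```
# min hour dom mon dow  command
  30  2    *   *   *    /opt/etl/run_etl.sh >> /var/log/etl.log 2>&1
```

One line to schedule, one log file to check when it breaks, which is part of why the cronjob version has fewer failure modes than the lambda version.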

cheeze
What does the cronjob do? Start some service that listens for inbound connections? Or are you talking more about daemons that do some set of work every interval?
VHRanger
ETL and modelling, then pushing the modeled values to tables
SamuelAdams
Just curious what your cost differences are between a dedicated EC2 instance and lambdas. For our organization an EC2 instance was at best 8-10 times more expensive than lambdas.
VHRanger
There are multiple equivalent lambdas on a single EC2 instance, so it's running jobs 10-12 hours per day
callmeal
AWS Lightsail instances are pretty affordable ($10/mo and up)
SanchoPanda
$3.5 USD and up
x0x0
likely dwarfed by eng time, both in dollars and opportunity cost
azmodeus
The question is also what's the cost of troubleshooting the lambda service going down 3 times. 10x more expensive and reliable can be a good trade.
jbverschoor
omg so toxic
throwaway20371
These kinds of organizational problems happen everywhere; that doesn't bug me. What bugs me is when leadership knows about them and doesn't care. Low-level engineers stick their professional necks out to complain in internal town halls and through feedback forms, and leadership gives some bullshit answer that doesn't address or even acknowledge the problem. It would be less infuriating if they just said "I don't give a shit." It's the weasel words and pretending the problem doesn't exist that infuriate me. A lot of the time it doesn't even take much work to begin addressing the issue, like a working group for continuous improvement of highly painful, high-value processes. You don't even have to solve it. Just attempt to address it.
m0zg
At Google back then "leadership" might as well not even show up. It was super bottom-up, and _you_, not "leadership" were supposed to identify and fix issues. No "leadership" would stop you, either, at least in most cases. I don't believe that in all my years there anyone ever told me what to do. It was very easy to start projects, shut down projects, get headcount, get resources (if your business case is sufficiently persuasive to others). Not a complete free for all, but certainly _a lot_ more freedom than you'd normally see in companies of that size. And (IMO) people used that freedom and autonomy pretty well.

That kinda deteriorated over time, culminating with Sundar "McKinsey" Pichai, and then went rapidly downhill from there, and now I flat out reject their recruiters, based on the feedback from friends still employed there.

TideAd
My team has issues deploying builds to test machines. It's like 15 steps and takes an hour. The tooling is atrocious and recently got even worse.

We eventually found the team responsible for this (the org structure is hard to penetrate because no one answers emails). They said they had no idea anyone was dissatisfied. Then they said that it was a low priority so they didn't care and nothing would be done.

In my experience, you can usually convince an engineer that their stuff has a problem and they need to fix it. But it's often impossible to convince management if they aren't on the hook for user satisfaction.

ts4z
To be fair, they did, and many things have improved. And this video was used as an uncomfortable reminder to make some of those changes.
calmlynarczyk
I work at a global corporation with 50,000 employees. Even though I've never been at Google I felt every pain point this video was getting at because our company is trying to implement all of this stuff right now.

"Oh you want to go to production? Here's a list from A-XX stating what you need to accomplish that." Thing is I thought they actually handled this gracefully when I started because lots of requirements were tiered with various criteria you had to meet to move up (mostly for brownie points).

But then one day the Tech Execs lose their minds and decide "everything needs to meet all criteria for every single process." You want to create an S3 bucket to store data? That will be a week of submitting paperwork and another month of meetings and approvals from various teams you've never heard of. Plus you have to register your schema, implement data quality checks, unit tests, regression tests, get a PR and CO approved for your central config change, remediate any CVEs in the tooling that you used, and build all of this using our in-house CI/CD platform we created because we're just soooo special. Now you're allowed to launch. Oh wait, NO because we've put the entire corporation on hold from launching new systems for the last calendar year because we're still trying to agree on the final process everyone needs to follow to go to production.

It's surreal how universally so many orgs make the same mistake of trying to throw more and more process at problems.

unethical_ban
In my previous role, the secdevops groups (matrixed teams) were building custom terraform modules for our devs to use in order to easily deploy compliant AWS infrastructure - and devs could only deploy via terraform/CI-CD. While TF specifically states that custom modules are not meant to be used as wrappers, I thought it was a clever way to try getting security "out of the way" while still enforcing best practices.
darkwater
> While TF specifically states that custom modules are not meant to be used as wrappers

What do you mean by this?

easton
"We do not recommend writing modules that are just thin wrappers around single other resource types. If you have trouble finding a name for your module that isn't the same as the main resource type inside it, that may be a sign that your module is not creating any new abstraction and so the module is adding unnecessary complexity. Just use the resource type directly in the calling module instead."

from https://www.terraform.io/docs/language/modules/develop/index...

If you write different versions of the terraform modules that do some corporate specific magic, I think that would be okay under this rule. It's when you're writing a module that doesn't do any useful magic that they want you to stop and think.

acdha
> It's surreal how universally so many orgs make the same mistake of trying to throw more and more process at problems.

Followed by the inevitable ranting about “shadow IT”, AKA the requirements gathering they really should have done.

tetha
Well, we're small, but our development org is currently starting to build new products and new extensions to its products. I'm pretty happy that everyone is pretty much on board with our situation #1.

There are a few hard requirements, but most of the requirements we as operations put up are tied to the guaranteed service level agreement to the customer and possibly overall user count.

If there is just an entirely lax service level agreement, there might be no need to invest time in clustering non-trivial applications, or in implementing more monitoring than a simple HTTP check. On the other hand, if you're selling 99.95% availability, 24/7, with penalties, the list of must-dos suddenly grows a lot.

The nice thing about approaching it like this: it allows a gradual increase in operational rigor and robustness. A product team doesn't hit a wall of requirements for their first production customer. Rather, they incorporate more requirements as the service becomes more successful. Or they don't, if the idea doesn't work out.

brightsim
Literally every point you made applies also at my workplace. The optimist in me hopes we work at the same place, but I fear that your last statement might just be the truth :-)
oblio
> It's surreal how universally so many orgs make the same mistake of trying to throw more and more process at problems.

It's hard to find the right balance. You want a bit of process, but not too much.

But it's one of the hardest problems in the existence of humanity and whoever solves it should probably get all the Nobel prizes available (including peace and chemistry!).

smartician
It's not much different today. Nowadays you'll also need privacy review, accessibility review, security review, and diversity & inclusion review.
throw10920
> diversity & inclusion review

Is this tongue-in-cheek, or are you serious? Poe's law and all that.

smartician
Partly tongue-in-cheek. These review processes exist, but whether they're required or not depends on the product area and type of project.
rodgerd
Perhaps Google don't want to be in the news for identifying dark-skinned people as monkeys again?
jjeaff
I can't remember which company it was that launched a camera with face identification features, but it didn't recognize any face that wasn't lily white like every single engineer that worked at that company. They could probably have benefited from a diversity and inclusion review. Heck, employing a single brown engineer, or even QA engineer, probably would have been enough to notice that before launch.
NaturalPhallacy
People jump to the racism conclusion here, but it's really just a contrast issue.

Detecting eyes, for example, is simply easier with lighter skin.

Light-skinned black people read just fine, and super-tanned white people are harder to read. It's literally contrast (light) detection, not racism.

But because the media keeps everybody primed for racism to stave off the necessary class power rebalancing, everyone jumps to racism.

KeepFlying
Contrast may be one root of the technical problem, but claiming a product "ready to launch" while it fails to work for people based on their race (especially when the company clearly didn't put effort into preventing the issue ahead of time) is problematic.

By having a diverse team (or making some effort to include diverse opinions) you'd have a chance to discover new ways to detect faces, or new mitigations to the contrast problem.

But claiming a product is ready for release when it excludes people based on race (no matter the technical reason) is a problem.

alasdair_
It’s only a “contrast issue” if the people building the system failed to have roughly half of humanity represented in any meaningful way on their dev team.
erik_seaberg
They need to test on a realistic sample of users. Testing on the dev team is just lazy; they probably have unusually new and expensive hardware in well-lit offices.
throwaway45644
There is really no reason why the dev team should include any particular demographic: how are you supposed to have 90-year-old people on the team to make sure they are recognized correctly? This is a requirements issue which directly impacts validation/test data collection. If your user base is 50% black people, any reasonable protocol will include enough black faces in the test data to detect the problem early on. ML-based systems will always make errors; which errors matter will be defined by market/legal/mission requirements. It may very well be that faces of black people are harder to detect (especially in backlit situations). Should you hold the product because it may not work for everybody? It's a complex decision. Maybe you can just have a good "face detection failed" flow to handle all the errors (think not only black people but also tattooed people, etc.).

Arguing that having quotas of this or that demographic on the dev team will make them more sensitive to diversity issues in general is also unnecessary, because everybody is part of some minority in some situation; hence a minimum of education will make anybody understand first-hand the value of inclusiveness and diversity.

Btw, if the team is using only their own faces to test the system, they won't go far (think about lighting conditions / different environments).

kelseyfrog
> Ml based systems will always make errors

Sure, but errors should be randomly distributed. This is stats 101. Any decent ML practitioner will check for this before releasing a model.
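A minimal version of that per-group error check might look like this (the group labels and data are invented for illustration):

```python
from collections import defaultdict

def error_rate_by_group(y_true, y_pred, groups):
    """Per-group error rates; a large spread across groups is a red flag."""
    errors = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        if truth != pred:
            errors[group] += 1
    return {g: errors[g] / totals[g] for g in totals}

rates = error_rate_by_group(
    y_true=[1, 1, 0, 1, 0, 1],
    y_pred=[1, 0, 0, 1, 1, 0],
    groups=["a", "a", "a", "b", "b", "b"],
)
# Compare rates["a"] against rates["b"] before release; a wide gap
# means the errors are not randomly distributed across groups.
```

In practice you would run this on a held-out set that actually reflects the deployed population, which is exactly the sampling point raised elsewhere in this thread.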

throwaway45644
In theory I agree with you: we want unbiased models, but here we have an input distribution that is not well understood, so things get much more complex. We don't even have a clear definition of what is a face and what isn't.

The model doesn’t work for people with masks: near 100% failure rate on this category of inputs. Should we release it or not?

In general some inputs are harder than others so it is expected to have more errors on those.

That being said, in practice, under normal conditions it is not hard to detect people with dark skin if the proper training data and training are used (btw, if you don't pay attention to how you do things, even a low-light image of a Caucasian will not be recognized), so there is little excuse to exclude a large part of the population just because of sloppiness. Moreover, for this specific category (and of course others), there are ethical and legal considerations that require making sure the system works for them.

Apart from that, in general I really do think that ML systems with no "operator override" are, in many contexts, a hazard. We cannot expect the model creators to have predicted and tested every possible input, and without an override we have no way to manually correct the errors (for instance in lending or border control). Incidentally, it is interesting to note that this will be skilled work that will not be taken over by "AI".

kelseyfrog
I believe we're mostly in agreement. What's not acceptable to me is using "All models are wrong" to imply that it's OK not to understand the ways in which they are wrong, to be willfully ignorant of their failures, or to devalue transparency.

As a professional and practitioner, I have a responsibility to engage in transparency and honesty when I deliver a model. Part of that is understanding and designing for failure modes. That's simply good engineering.

throwaway45644
Indeed, I agree; it even seems that for some use cases training data is no longer the bottleneck but robust test suites are. Interesting times; let's hope we will find a responsible way to use these powerful technologies.
jjeaff
And yet a short while later, they released a patch that fixed the bug. So your physics claim is irrelevant.

The fact remains that they would not have released that software knowing it wouldn't work for Black people. And yet, they didn't notice the bug because they were making no effort to be inclusive.

yawaramin
It may be unintentional, but it shows that they didn't test with anyone with a darker skin tone, which shows the biases at work.
NaturalPhallacy
It doesn't show that. It's literally numerical: dark skin reflects less light than light skin, so the sensors report lower values for the entire face, reducing contrast across the entire face, which is what the recognition systems count on.

Brown eyebrows on brown skin = low contrast.

Brown eyebrows on pale skin = high contrast.

If our races were dark purple hair on bright green skin and bright green hair on dark purple skin, facial recognition systems would have no trouble with either. But that's not how humans render, so our contrast based systems struggle with low contrast.

It's like you're confusing a software/data problem with a photon/physics problem because you're thinking in your box.

yawaramin
Here is what I said:

> it shows that they didn't test with anyone with a darker skin tone

Are you disagreeing and saying they did test on people with darker skin tone, found the issue, and decided to ship anyway? You realize that either way, it doesn't make them look good?

Anyway, leaving all that aside, the article interviewing an actual face recognition software expert, shows that your guesses here are incorrect.

NaturalPhallacy
They aren't guesses. It's physics.
JCharante
It's a design problem. If they had tested it with POC they would have noted down "well, our primitive algorithm works well for light-skinned individuals but not others"

And hopefully someone wouldn't have said "hmm good enough for me, let's ship it!"

NaturalPhallacy
>algorithm

This just proves that you're assuming it's a software problem when it isn't.

This is basic GIGO. The light sensors feed poor quality data in for people with low contrast faces, so there's nothing the software can do about it.

Thorrez
Couldn't it brighten the image in some way?

My webcam has an advanced options panel that lets me adjust both the brightness and the exposure time. I can turn it up so bright that you can't even make out any of my facial features, and I'm in a somewhat dark room lit by a single floor lamp.
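That kind of adjustment can also happen in software; a naive min-max contrast stretch, sketched in Python with toy pixel values (not from any real camera pipeline):

```python
def stretch_contrast(pixels, lo=0, hi=255):
    """Linearly rescale pixel values so they span the full [lo, hi] range."""
    pmin, pmax = min(pixels), max(pixels)
    if pmax == pmin:
        return [lo] * len(pixels)  # flat image: nothing to stretch
    scale = (hi - lo) / (pmax - pmin)
    return [round(lo + (p - pmin) * scale) for p in pixels]

# A "low contrast" row of intensities squeezed into 100-120,
# like a dimly lit face:
row = [100, 105, 110, 115, 120]
stretched = stretch_contrast(row)  # now spans the full 0-255 range
```

Real cameras do something closer to exposure compensation and gamma correction, but the principle is the same: limited dynamic range can be partially recovered in software, at the cost of amplifying sensor noise.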

zorpner
Dark-skinned people are "garbage". Got it.

Your comments here are at best ignorant, and I'll stop there under the principle of charity. You need to spend some time introspecting about why you say the things you say.

jahewson
Here’s a link to an article from the time with an expert on the subject explaining that it’s not a contrast problem:

https://www.baltimoresun.com/bs-mtblog-2009-12-hp_racist_web...

alasdair_
Microsoft Kinect also had this issue. I know the one black engineer (who had little to do with the project) who repeatedly got pulled in to see if the test system worked with non-white people.

He left soon after.

shadowgovt
I wonder if the company failed to give him credit, responsibility, or compensation commensurate with his value to the project?
saltminer
> I can't remember which company it was that launched a camera with face identification features, but that didn't recognize any face that wasn't lilly white like every single engineer that worked at that company

It was HP https://www.youtube.com/watch?v=t4DT3tQqgRM

kevingadd
If you're publishing a dataset in the terabytes it does actually make sense to at least do a pass over it and make sure the data you're using isn't skewed in any undesirable way that would cause problems down the road. For example, if you're releasing 5tb of face photos for training facial recognition nets, it would certainly be a problem if all the faces are white women or asian men - the result would probably be over-fit and not perform as well for people in other categories. It would be correct to call that a diversity/inclusion issue.

Privacy and accessibility reviews serve similar purposes there, you're reducing risk by checking for these various problems and ideally they also spot ways to improve the quality of your outcomes.

dekhn
the 5tb was performance data collected from servers
kevingadd
Sounds like the reviewer would glance at it for 5 seconds and say 'ok'
drexlspivey
What if some servers were excluded?
murph-almighty
It's common in fintech for data/ML models to go through similar review. If you happen to disenfranchise a set of people because your model said not to lend to them, you risk legal jeopardy.

To clarify, I think it's good that this is a practice.

londons_explore
A review doesn't necessarily mean you need to resolve all diversity/inclusion issues. It can merely require that you identify the issues and understand the risks of not resolving them.
charcircuit
The whole point of the model is to find who not to lend to. You are always going to exclude people by definition.
murph-almighty
I should clarify, the point is to not discriminate against a protected class.
criley2
Tell that to the legislators and prosecutors who create laws and enforce laws against you.
Thorrez
Yes, but we should exclude people for valid reasons, not for their race.
myownpetard
There are so many ways you can accidentally systematize racism in software like automated lending.

In the past there were explicitly racist policies like redlining. These left behind a historical data set of loan denials to people in a specific racial group. If that group has other traits that correlate with their race, e.g. the neighborhood they live in, then you could presumably have a model that doesn't explicitly use race as a feature, but uses that historical data and some subset of racially correlated features, and as a result disproportionately excludes people of that race.

belorn
I am not sure how one would remove all ageism, sexism, racism, classism, title-ism, and so on from lending. The whole concept is about making a prediction about the future with suboptimal information: guessing who will default on a loan and who won't. The same goes for insurance.

I have been pretty tempted to lie about where I live in order to reduce my insurance costs. It would reduce the insurance cost by half. It seems pretty disproportionately harsh that I should get lumped together with the people who simply happen to live around me.

Is it possible to make predictions illegal if they are based on historical data from other than the individual customer?

dang
We detached this subthread from https://news.ycombinator.com/item?id=29086292.
dekhn
having launched some product at Google in my day, I know quite well how to skate through that process (although D&I was not part of it when I filled out my forms). Sadly for my friends in privacy and security, it's not hard for product teams to exploit Google's propensity to launch and override privacy and security concerns.
ikiris
Is the super secret process to just have a vp invested in the launch?
dekhn
having executive cover is important, but equally important is knowing exactly how to write "this control is out of scope for my project" in the launch forms, or making an approval "FYI" instead of "Required". It gets harder and harder to do this as your launch requires more and more personal data to operate.
ntaylor
diversity & inclusion review
pangolinplayer
Based
sayhar
Hello, I wasn't aware we were on /r/politicalcompassmemes
thatguy0900
Are you trying to claim based is from some random subreddit
paoda
I think it's rather clear that using "based" in this context is more common in certain communities than others.
thatguy0900
This is how the whole internet uses based
X6S1x6Okd1st
Not HN
ranger_danger
have to make sure there's no trans jokes in there.
cynicalkane
Assuming this is sarcasm, you realize Google has a massive userbase all over the globe from all walks of life, right? Does it make business sense to accidentally exclude certain people? Or ethical sense?
fyd6gexygsydy
I don't understand this line of reasoning, since it assumes inclusion training actually promotes inclusion. My experience has been that it usually means racial/gender intersectionality training that everyone has to swallow regardless of culture or belief, because it's what white people in the US tech industry are passionate about right now.
deanCommie
I mean yeah if your culture or belief involves not treating people of different races or genders equitably, then the goal of the training isn't to change your mind. Swallow, follow, or get out of the way.
ygjb
Yes.

The expectation isn't that you actually adopt or accept the values. The expectation is that you know that if you fail to do so (and you lack sufficient privilege in your organization), then you will be held accountable.

Practically speaking "woke" people would prefer to work with people who share values, but most of us will settle for people who can at least emulate a decent human being while interacting with other people at work.

beaconstudios
Being "woke" goes beyond just being a decent person though, because most people's metric for decency is interpersonal decency. My understanding is that the sociological concepts that go into "wokeness" include intersectional analysis, microaggression theory, critical theory, 3rd wave feminism, gender theory etc. I think these ideas are mostly good (with the exception of microaggression theory), but they go way beyond "just be decent to other people" and into the territory of deep academic and systemic mindsets that are far from the default in the individualist West (and especially the US). I mean damn, half these ideas are French, and France is pretty culturally different from the US, French academia even moreso.

For example: not being racist on an individual level is pretty intuitive and obvious to most people, and mostly comes down to being a decent human being. Being institutionally anti-racist is a totally different thing, and way more involving, because you're not just not being a dick to people of a different race; you're trying to counteract systemic disadvantages.

wpietri
I agree that at one point maybe being oblivious to systemic problems could go along with being decent. But these days I don't see how being a decent human is compatible with either, "I don't want to learn whether you're getting the short end of the stick" or "I know you're getting the short end of the stick but I'll never do anything about it": neither seem decent to me.

I'll also note that although those particular theoretical frameworks were originally popular ways to understand certain problems, there are plenty of other ways to understanding.

As an example, let's take the microagression where white people want to touch black hair. This is a common problem [1][2][3], and one certainly can situate it within a whole host of racist microaggressions and a broader theoretical framework. But one can also just say, "Dude, black people are not pets. Keep your hands to yourself." Or in the middle, the handsy person can listen to black voices on this and get a personal understanding of why it's a demeaning thing to ask/do. That doesn't require any theory, just the sort of empathy and respect that is at the core of human decency.

[1] https://www.forbes.com/sites/janicegassam/2020/01/08/stop-as...

[2] https://www.ft.com/content/b5c3fa4e-e6c0-11e9-9743-db5a37048...

[3] http://www.cnn.com/2011/LIVING/07/25/touching.natural.black....

beaconstudios
> But these days I don't see how being a decent human is compatible with either, "I don't want to learn whether you're getting the short end of the stick" or "I know you're getting the short end of the stick but I'll never do anything about it": neither seem decent to me.

The novel part isn't the interpersonal part like "don't try to touch black people's hair" - that's just basic common sense, and it's extremely cringey that there are people who do that and think it's OK. The novel part is the systemic aspects of progressive thinking; my primary academic (hobby) interest is in systems theory and cybernetics, so through experience I can say for a fact that most people find systems thinking to be unnatural or alien. It's a different way of looking at the world to thinking in terms of intent and individual actions, which is the norm in the West.

wpietri
For sure, which is why I said there were other ways to understanding.

A lot of my education here has come from people just talking about their daily lives on social media and in person. Many years ago I ended up going from having a pony tail to shaving my head. I was telling a group of friends that it was weird how differently people treated me. E.g., seeing people cross the street rather than walk near me. A black member of the group said, "Well now you know."

The systems-thinking aspect of it came to me later, as I was looking for explanations for all of the little bits of data that I kept coming across. It was only then that I found the more academic takes useful.

But I think these days it's very, very hard for a white person to credibly and honestly have no understanding that there are big problems with race in the US. Which I think is why we're seeing the right-wing moral panic around "Critical Race Theory", which few can define but many are sure is such a problem that we need laws to prevent white children from learning about actual white history.

garfieldnate
I hate that you've called this "aggression", like it's a kind of morally reprehensible violence. People like to feel puffy hair, black person or not. Ask a white person with dreads if anyone has ever felt their hair. In Japan, people like to feel my arm hair, which is blonde and almost invisible and completely foreign to East Asians. It's harmless. It's completely natural that humans are interested in the physical variety of other humans. Now, you can obviously say or do something racist or mean while touching that hair, but the act of touching hair cannot itself be deemed aggressive without knowing the context. You would have to understand the social context of black people (apparently) being tired of being touched all the time in order to know that you should avoid doing this specific thing, which makes this a "faux pas".
ygjb
> People like to feel puffy hair, black person or not.
> ...
> In Japan, people like to feel my arm hair, which is blonde and almost invisible and completely foreign to East Asians. It's harmless.

Do you extend this perspective on unwanted touching to other parts of the body without consent? People like to do things to and with other people. The thing that makes doing that a form of aggression is doing those things without consent.

> Ask a white person with dreads if anyone has ever felt their hair.

Please point to the cultural and historical legacy of white people being stripped of their freedom, dignity and agency when comparing the experiences of white folks to Black folks. That sets aside the entire discussion of cultural appropriation related to dreadlocks, which is related to, but not at the core of, the point I am trying to make.

> Now, you can obviously say or do something racist or mean while touching that hair, but the act of touching hair cannot itself be deemed aggressive without knowing the context. You would have to understand the social context of black people (apparently) being tired of being touched all the time in order to know that you should avoid doing this specific thing, which makes this a "faux pas".

The fact that you dismiss the documented experience of Black people as "apparently" being tired of being touched all the time says pretty much everything anyone in the audience needs to know about whether you are arguing in good or bad faith. The point is further driven home by the fact that you are contrasting your own anecdotal experience with an awareness of the social context of why this is an issue.

garfieldnate
I wrote more thoughts above in response to wpietri, but I just wanted to clarify that my usage of the word "apparently" was meant with the opposite intention: someone on the internet said something about black people, and I can't claim it through personal experience to be the truth. From wpietri's reply above, I believe they are also basing their thoughts on hearsay, which makes it double hearsay.
knorker
To address what parent post asked in the first sentence though: Is it "aggression"? Isn't that like calling "a dirty look" a "microrape"?

I'm not saying I don't understand how it can be an unwelcome experience to be touched and prodded, but "microaggression" is a term out there like "silence is violence" and "words are violence" in the possibly most literal real-life version of Orwell's writings, even (in its overtness) possibly outweighing the real life society he was describing at the time.

wpietri
If I come up to you, violate your personal space, and start running my hands over your body, you will absolutely see it as aggressive.

If you think that's not the case, go out and try doing that to the first 10 men you see on the street. Heck, try it with a couple of cops.

So yes, calling more modest unwanted touching a microaggression is perfectly appropriate.

knorker
"oooh, nice hair, can I touch it?" is also counted as "microaggression". Just the words.

A misunderstood social signal (e.g. a raised eyebrow) can be called a micro-aggression.

I don't think you actually know what "microaggression" means. You should educate yourself on its definition.

The main source of microaggressions is in fact words, and words that, while rooted in ignorance (and people, like you, should educate themselves), are not in fact in any shape or form "aggressive", nor do they even have any form of negativity associated with them.

Take the "you're the whitest black person I know", from the "I, Too, Am Harvard". Well, that's sure a stupid thing to say. But is it "aggression"? Obviously not.

wpietri
Ah, condescension from an anonymous goof who's sure his knowledge is superior. Sorry, but I don't have enough time or energy to talk you out of your willful ignorance. This is one you'll have to figure out for yourself.
wpietri
One other way to look at this it through the broader system. Since America's founding, black people have been treated as inferior. How have they been kept in what white people saw as their place?

Some of it has been open violence, of course, with lynching and race massacres being the most obvious. There was also plenty of more quiet violence, the unmarked graves and the vicious but survivable beatings.

But that's relatively rare because it is backed up by a host of more subtle things. Things that might lead to violence, especially if an uppity person persisted in acting like an equal. Threats, of course, but also menacing looks, harsh words, bad attitudes, etc.

This is summed up in ADL's pyramid of hate. The top layer is built on the layers beneath: https://www.adl.org/sites/default/files/documents/pyramid-of...

So we talk about microaggressions because the societal system of white supremacy uses both macroaggressions and microaggressions as a continuum of actions that maintain the racist status quo, continuously informing both black and white people of their assigned place.

beaconstudios
Given he was writing about a mix of Stalinist Russia and Nazi Germany, I don't think the tendency for social justice people to assign overly provocative words to their ideas is worse than those places.
knorker
I don't think you're familiar with Orwell, and the meaning of Orwellian.

Did my comment read to you like I didn't know what real life societies he was writing about? I explicitly described how they fit into my point exactly in order to make people not reply with comments like yours.

beaconstudios
Maybe you are just bad at writing clearly.
knorker
I see now that my failure was in assuming an educated reader. And you're right that having an accurate model of the reader is indeed important for clear communication.
ygjb
First, touching someone without permission isn't a micro-aggression. Depending on the jurisdiction, the location of the unwanted touching, and the age or power disparity between the two, it can range from harassment to assault.

The fact that this discussion is even happening in the context of Black people and their hair is frustrating because the implicit bias is that somehow individual curiosity overrides another person's expectation of freedom from interference or right to not be fondled. If we were talking about a casual grope of a woman's breast because people are naturally curious, I would expect that most people would be moderately outraged.

Also, while a "dirty look" is subjective, "leering" is a form of sexual harassment in many jurisdictions.

While I appreciate your perspective that microaggressions, "silence is violence", and "words are violence" are Orwellian, that perspective also reveals a pretty clear ignorance of the nuance and impact that these slogans capture. I don't know if it's an ignorance that stems from a lack of knowledge and experience, or a more insidious and willful ignorance that stems from the type of thinking that allows for or encourages "marketplaces of ideas" that tolerate and debate some of the most awful and toxic values, but it doesn't really matter. Up in the thread I stated, and I stand by it, workplace inclusiveness and diversity training is intended to reach those who can be taught, and inform those who can't of the consequences of failing to at least act in a baseline socially acceptable fashion for the duration of the work day.

It would be a better world if more people cared about the impact of what they do and say, but in the absence of that, most of us will settle for people who can at least act like they care.

knorker
> First, touching someone without permission isn't a micro-aggression.

Now you're changing the subject. You and I both know that "oooh, nice hair, can I touch it?" is also counted as "microaggression".

A misunderstood social signal (e.g. a raised eyebrow) can be called a micro-aggression.

Most microaggressions involve nothing physical, nor any ill intent. That doesn't make them right. They can still be hurtful. E.g. "you're the whitest black person I know" sure is a stupid thing to say.

"Hey, nice hair" is also a thing banned in these trainings. Because the receiver can infer that their hair is unique, exotic, and that they are different and maybe don't belong here.

So "hey, nice haircut" is banned from workplaces under all circumstances. Ok, fine. Nobody needs to comment on appearance in the workplace, why would they?

But it's not "aggression". It's nothing like it.

And this is what "microaggression" is. This is what's being stamped out.

> individual curiosity overrides another persons expectation of freedom from interference or right to not be fondled

It does not, I agree.

> If we were talking about a casual grope of a woman's breast because people are naturally curious,

Jesus Christ, you're going way overboard in changing the subject. I got it already: you want to change the subject.

> Also, while a "dirty look" is subjective, "leering" is a form of sexual harassment in many jurisdictions.

But is it a "microrape"? The difference here is a controlling use of language.

If someone walks down the street and gets checked out by a passer-by, they were not "almost raped". To say that they were is insulting to rape victims, to a perfectly normal person who just looked at their surroundings, and to language itself.

> workplace inclusiveness and diversity training is intended to reach those who can be taught, and inform those who can't of the consequences of failing to at least act in a baseline socially acceptable fashion for the duration of the work day.

Yup. But I think it's failing at it. There's plenty of bad behavior to stamp out. But it's also being replaced by other bad behavior. Like telling people that being white means that you as an individual have these attributes, and shutting down a colleague saying "you are a man, and therefore can't be a part of this conversation or decision".

I don't know if you bought into the "intent doesn't matter" crowd, but if you have, then the fact that inclusivity and diversity training has good intentions doesn't matter.

> It would be a better world if more people cared about the impact of what they do and say

Diversity & inclusiveness activists at companies don't have a monopoly on these values. And I wish they would stop pretending that they did. Because they sure don't actually live their stated gospel.

ygjb
I want to say that based on your response, I don't believe that you are arguing in good faith, but I will give you the benefit of the doubt.

I didn't change the subject. I rejected your claim that touching someone without permission is a microaggression. It's not; it falls on a spectrum from harassment to assault, depending on who you touch, where you touch them, and where you are when you do it. That's not a microaggression.

> "microrape"

Unless we are talking about moths, please provide a serious academic or published document that actually proposes this as a generally accepted term. I confess that the first time I saw it, I thought you were being flippant, but it appears that you actually think this is a commonly used or generally accepted term.

I spent some time reading about the term, and asking about it among the D&I folks that I know, and based on that, it's not really a thing that people are concerned with, aside from some fringe groups on the edges of D&I activism. Most references are related to some shitty humour on reddit and other sites meant to mock folks rather than engage in actual discourse.

Aside from your use of the term, I think the question you are really asking is "Does a 'dirty look' count as sexual harassment?", and the answer is, yes, depending on the jurisdiction. I already said that.

> "intent doesn't matter"

Yeah, intent doesn't matter. This isn't a new concept - look up the etymology of the phrase "the road to hell is paved with good intentions"; it's a well-understood concept and proverb that dates back at least 500 years, farther if you torture some of the translations and transliterations. This isn't to say that intent doesn't actually matter; it's a slogan that illustrates that even well-intentioned actions that have a negative outcome are still the responsibility of the person who took the action, and that positive intent doesn't balance out negative outcomes.

That said, it doesn't really matter what your opinion is on D&I activism, or your thoughts on the role they play in business. I fall back to my original statement that the vast majority of D&I training and related activities are risk mitigation activities.

If you don't want to change your beliefs, that's fine. Just act like a decent human being, and treat others with respect while you are operating in a professional context.

As for the rest of your claims, it is obvious to me that you are more concerned with your perceived harms to your own freedoms than you are with considering the perspectives of others - unless you have something more meaningful and evidence based to add to the conversation, there isn't much point to continuing it.

knorker
> I want to say that based on your response, I don't believe that you are arguing in good faith, but I will give you the benefit of the doubt.

That's good. Because it's very easy, especially on the internet, to go through the cycle of:

1. This person disagrees with me. They must simply not be informed. Let me explain.

2. Oh, they still disagree. They must just be trolling, then, because what rational person would disagree with me once the facts are out?

3. Oh, they actually do disagree? They must be evil.

And it's a fallacy that's easy to slip into, and part of the reason there's so much hate out there.

>> "microrape"

> Unless we are talking about moths, please provide a serious academic or published document that actually proposes this as a generally accepted term.

It's not. The closest thing is immature girls saying they were "almost raped" when actually what they got is an unwanted look, or declined an advance.

My point, though, was to give an example of this clearly incorrect term, to compare it with what I'm saying is the completely incorrect term of "microaggression".

It can be misogynistic, racist, insensitive, lacking in empathy, and many other things. But "aggressive" seems to me like it's a term chosen for its political weight, not for its accuracy.

> Yeah, intent doesn't matter.

But it clearly does. Obviously it does. The whole legal topic of Mens Rea is dedicated to this.

Murder is morally and legally distinct from manslaughter.

But manslaughter is still a crime. And it's a crime because the perpetrator is morally culpable.

But they're not equivalent crimes.

Hitting someone with your car by accident is CLEARLY very different from doing it with intent.

"Intent doesn't matter" is another phrase that has a very specific meaning in one setting ("by that I mean that you can't give a sexual comment at work just because it's a compliment"), but is used in its literal form to bully people who admit to making mistakes and improving. It's used to call people unfixably evil, instead of allowing them to improve their behaviour when they didn't realize it was hurtful.

Do you remember this woman: https://www.dailymail.co.uk/news/article-2964489/I-really-ob...

She mock-yelled at a cemetery, which people saw as hugely disrespectful. And if that had been her intent then it would have been bad.

But it turns out she had a collection of photos of herself violating signs. E.g. wearing no shirt and no shoes in front of a sign reading "no shirt, no shoes, no service", a cigarette and a bottle in hand in front of "no smoking, no drinking", walking past a "STOP" sign. (I don't remember the other examples exactly.)

Does that context not matter at all to you, for moral culpability?

"Intent doesn't matter" is in a way like "Defund/abolish the police". It's a big slogan, but most people say "oh we don't actually mean that", but there definitely are ones that do. So you should say what you mean, instead, because it's hurting more than it's helping.

"Intent doesn't excuse"

> If you don't want to change your beliefs, that's fine. Just act like a decent human being, and treat others with respect while you are operating in a professional context.

I think the biggest violators of that recommendation are D&I activists.

I'm perfectly able to act as a decent human being without a mob of people calling me an inherently evil white male, born with original sin I cannot wash away no matter how I act, thank you very much.

> As for the rest of your claims, it is obvious to me that you are more concerned with your perceived harms to your own freedoms than you are with considering the perspectives of others

I'm sorry we've had such a huge misunderstanding. That is not an accurate description of my opinion.

But take a specific example: for about a year the "lab leak theory" was censored from social media, and called "racist". The "harm to others" here was actually shutting down a reasonable discussion by calling it "racist".

I still have no idea why it's a racist theory. Like, how does it even help to be a racist, to have this view? (Isn't it more racist to criticize wet markets?)

Of course nowadays it's actually a mainstream theory, and let's all just forget that the D&I mob mobilized against people who said that it's at least possible that the lab that experimented with the viruses could have possibly been involved.

If we're talking aggression, then shutting down anyone you disagree with, on any topic, by calling them racist with no logical connection: that's (macro)aggressive and not considering the perspective of others.

Nobody wants to be called a racist. Very few want to be racist. It's a big hammer, that leaves a wound that doesn't go away. You'd better be sure.

Another one of those is "pedo". You don't call someone a pedo publicly unless you literally mean that, and you're sure. There's no taking that back, for the accused.

wpietri
I'm sure you do hate it. Many people hate recognizing that their own behavior has been harmful to others.

> the act of touching hair cannot itself be deemed aggressive without knowing the context

I have never in my life had random people walk up and start petting me. Be honest. If I walked down the street feeling the hair of each man I passed, how long do you think it would be before I got punched?

So white people already know perfectly well you don't just go around touching strangers. It's just that some will make an exception for black people because they are seen as other/lesser.

I would add that your notion that a microaggression is ok due to white ignorance of the experience of black people at the hands of white people is itself a racist notion.

And giving an example in Japan doesn't change much for me, as Japan is a notoriously racist place. (For those unsure, a quick Google of "racism in Japan" will help.) And I think you could understand that what's harmless to you as a high-status foreigner is not always going to be harmless for other people. Especially, say, a marginalized group whose inferior status was established at America's founding and persists to this day.

garfieldnate
When reading your original post, I thought there was an implicit assumption that any case of a white person feeling a black person's hair was automatically classified as a racist "micro-aggression". Re-reading your original post, I think I probably misunderstood you, but I'll explain my thoughts a bit here since we have a thread started.

>Many people hate recognizing that their own behavior has been harmful to others.

I've never felt a black person's hair. I'm generally not a touchy person.

>I have never in my life had random people walk up and start petting me.

I actually think we're talking about different things. In my mind I was seeing a friend ask another, "hey, sorry, I know it's weird, but can I feel your hair? I'm curious what it feels like." The friend says "yes" or "no" and the interaction goes on from there. There are countries/cultures where strangers will touch others, but it's a pretty foreign concept to me.

>your notion that a micro-aggression is ok due to white ignorance

The word "aggression" implies willing injury or intimidation of another person. Hurting someone's feelings by accident is also bad, but it doesn't make sense to label them the same way. You're absolutely right to say that the context was completely different in Japan, and that's exactly the point. You can't unilaterally label an action like touching hair as aggressive in all contexts. If someone thinks fuzzy hair is neat and they don't have any sense of a racial divide, then they would feel curly Caucasian red hair or African dreadlocks and not think anything was different about the two actions.

If you classify all interactions between all white people and all black people in terms of their racial differences, then how do we properly get rid of racism? If in the US a white and black person have to keep slavery in the back of their minds during every interaction, how are they ever supposed to act normally or integrate? How do we ever expect to overcome our differences if we have to constantly remind ourselves of them?

I really like concrete initiatives for helping those that have been historically and presently disadvantaged: paying meaningful reparations, fixing police and the justice system, UBI, etc. But I dislike social notions that impede communication and drive wedges between people. After all of the actions that we take to help everyone in our society to thrive, the end goal has to be social harmony, and I think we need to be careful not to attribute all unpleasant interactions to voluntary aggression or racism.

wpietri
> I've never felt a black person's hair. I'm generally not a touchy person.

My point is not about hair. It's that it's the white people who have done very little reflection on this topic that have strong enough feelings that they have to argue endlessly about when it's ok to point out America's endemic racism. DiAngelo's paper on this covers the topic well: https://libjournal.uncg.edu/ijcp/article/viewFile/249/116

> The word "aggression" implies willing injury or intimidation of another person.

No. People often do things without making choices fully conscious of roots of their feelings and the broader implications. Indeed, that's the human default. See Kahneman's System 1 vs System 2 work.

> How do we ever expect to overcome our differences if we have to constantly remind ourselves of them?

You already know the answer to this. Imagine a junior developer asking, "How can we ever get anything done if we have to be worrying about all the possible ways something would break?!?" Is that a problem while learning? Yes. Does it prevent progress? No, just the opposite.

America has always been a racist place. For a long time it was carefully and consciously structured that way. We have been making spotty, two-steps-forward-one-step-back progress since Reconstruction, where we removed many of the formal, legal supports. But that's just the most visible surface of the problem. What drove the laws was white attitudes, beliefs, and behaviors handed down over generations. Those still persist. (For more on this, see Kendi, Oluo, Mills, and Loewen.)

To truly end that, white people are going to have to step up, pay attention, and root those things out. It's a multi-generation project. One that, given the US Right's self-generated panic about teaching white kids about America's realities around race, we are backsliding on.

I get that this makes you uncomfortable. I also spent years avoiding the necessity to face it. Social harmony is a good long term goal, but we cannot measure progress toward that by looking at white comfort.

skyde
How do you “counteract systemic disadvantages” without simply disadvantaging all white people (that would be racist against white people)?

Or by simply giving an extra benefit (affirmative action) to one group?

As we have seen with affirmative action, it puts people from China, India, and Japan in the same bucket and gives them less preferential treatment compared to African Americans. So it just seems that the minority which speaks loudest about injustice is the one that gets the most benefit.

I agree that systemic racism is a thing, but I have never seen a single proposed solution which is not simply “reverse racism” or positive racism.

We should be able to give equal opportunity to all groups without explicitly helping or disadvantaging any one group!

beaconstudios
Let me frame this up in terms of white/black disparity in the US, as it's the clearest case: black people have for a long time been explicitly discriminated against at an institutional level, and even when you remove this, they will collectively remain at a disadvantage until corrective action is taken. Traditionally the suggested solution is reparations, but organisations have decided that to do their part they should engage in affirmative action. Of course affirmative action slightly disadvantages white people on an individual level, but the argument is that black people are disadvantaged on a societal level from said discriminatory history, so it balances out.

Affirmative action isn't actually a systemic solution though, it's an individualistic solution. A systemic solution would be something like creating a government fund to invest in infrastructure and enterprises in historically redlined areas and help to bootstrap the economic uplifting of poorer black communities.

Do bear in mind though that I'm from the UK so this is just my understanding of an issue I'm not personally familiar with.

eyelidlessness
> For example: not being racist on an individual level is pretty intuitive and obvious to most people, and mostly comes down to being a decent human being. Being institutionally anti-racist is a totally different thing, and way more involving, because you're not just not being a dick to people of a different race; you're trying to counteract systemic disadvantages.

Sincerely acknowledging this may be confusing: it’s the equivalent of recognizing that you’re being graded or compensated fairly while you see someone else not being treated that fairly… and then not shrugging it off.

It’s not a deep philosophical concept. It’s living in a society with responsibility to everyone else in your society.

_3u10
Do you have a copy of this social contract I am a party to? As far as I know the duty to labor for another has been outlawed under common law since 1033. And in the US since at least 1865.
_3u10
It also presupposes such systemic disadvantages exist. Not sure why so many people from other countries immigrate to places that are so obviously systemically biased against them.

Or why when institutions such as Harvard actually do systemically discriminate against Asians it’s routinely ignored by the woke crowd.

Can anyone explain to me why Asians despite having some of the highest scores and GPAs have the lowest rate of admissions to some of the wokest institutions in America?

Why is the difference in incarceration rate between men and women or the police shooting rate not presented as systemic discrimination?

beaconstudios
Your arguments betray that you don't actually understand what systemic disadvantages are. It's not used to mean intentional discrimination; it means that the way our society is set up results in discriminatory outcomes even if nobody is actively being discriminatory.

To address your gender disparity in incarceration example: yes, that is a systemic problem. Men commit more crime than women, and if you dig into the reasons why, it's going to relate to things like lack of opportunity to compete and succeed through legitimate means. People in poverty-stricken areas have much less of a chance to succeed through legitimate economic means, so the ambitious turn to crime. That's a systemic problem.

evancox100
> Men commit more crime than women, and if you dig into the reasons why, it's going to relate to things like lack of opportunity to compete and succeed through legitimate means.

Pretty sure that the main driver here comes down to testosterone and men’s overall higher levels of impulsivity.

beaconstudios
yeah sorry I should've said explicitly that men are also more likely to be aggressive and competitive, so given the constraints of poverty, are more likely to go into crime to fulfil those needs.
orand
Bingo. Read the recent book T: The Story of Testosterone, the Hormone that Dominates and Divides Us by Carole Hooven. It covers this and related issues in great scientific detail. https://www.amazon.com/dp/1250236061/
eru
> Can anyone explain to me why Asians despite having some of the highest scores and GPAs have the lowest rate of admissions to some of the wokest institutions in America?

You mean they have some of the lowest rates of admission when corrected for GPA? Or lowest rates in some absolute sense?

> Why is the difference in incarceration rate between men and women or the police shooting rate not presented as systemic discrimination?

https://tvtropes.org/pmwiki/pmwiki.php/Main/AcceptableTarget...

_3u10
Basically, the GPA / exam scores required for Asians to be accepted are far higher than for any other group.

To exclude them Harvard gives Asians low personality scores. https://www.nytimes.com/2018/06/15/us/harvard-asian-enrollme...

judge2020
> It also presupposes such systemic disadvantages exist.

But it would be incorrect to presuppose that absolutely none exist, which is why they have a culture of reviewing and examining such a possibility.

_3u10
Agreed. Most of the evidence is extremely weak or points to problems that wokeism doesn't want to exist. My general experience in dealing with the woke is that the systemic systems systemizing systemically must exist, and then they go looking for evidence instead of analyzing the evidence and drawing conclusions from the evidence.

E.g. if the difference in incarceration rates is due to systemic racism, then a much larger degree of systemic sexism must exist in the justice system to explain the incarceration rate of men vs. women.

_3u10
Of course. I think the vast, vast majority of people abhor racism and discrimination in the US and Canada, yet our countries are described by the woke as bastions of this kind of thought, when the vast majority of the evidence points to them being two of the most welcoming and accepting of differences.
cto_of_antifa
Even if true, that's definitionally unrelated to whether or not systemic racism exists. It just means that you've personally defined some ambiguous bar of entry for whether or not you think marginalized voices require honest thought.
zapita
> Can anyone explain to me why Asians despite having some of the highest scores and GPAs have the lowest rate of admissions to some of the wokest institutions in America?

Because of legacy admissions, also known as “rich white kids skipping the line in spite of low GPAs”. There’s nothing “woke” about Ivy Leagues…

https://www.theguardian.com/us-news/2019/jan/23/elite-school...

Zababa
> Why is the difference in incarceration rate between men and women or the police shooting rate not presented as systemic discrimination?

I've tried multiple times the argument "if we have quotas in top positions like boards of directors and high-prestige public institutions, we should also have them in bottom positions. Where are the inclusivity programs for prisons?". The answer that I've always received was "these are totally different", as in you end up on a board of directors due to chance and privilege, but you end up in prison due to your own actions.

While this argument is a bit stupid and not really constructive, I find it surprising how easily it reveals that people apply very different standards to different social issues. It seems that for most people, the mechanism which makes men dominate society is totally different from the mechanism which puts men at the bottom of society. My explanation for that is that the glass ceiling comes with a glass floor.

I personally haven't found other people talking about things this way, but that may be me not researching enough. I also find it unfair that some people would be in this "glass box" just because of how they were born. But I'll admit that I find it troubling when I hear people talking about "breaking the glass ceiling" all the time, which seems to benefit mostly people already well-off in society who want even more (at least for positions like board of directors), while leaving people to rot in prison because they're male.

_3u10
That's because wokeism is communism in disguise. Ask any woke person about the economic improvements in Britain under Thatcher and how she put an end to patriarchal Labour rule: no gushing about the glass ceiling, no gushing about how progressive the Tories are, etc.
_3u10
Men likely just commit more crimes. I do think there is huge bias in sentencing, though. You have to know how to interact with police, and I think many men's intuition on how to do this is lacking.

Lawrence Summers got in a lot of trouble discussing this.

https://en.wikipedia.org/wiki/Lawrence_Summers#Differences_b... https://en.wikipedia.org/wiki/Variability_hypothesis

Zababa
> Men likely just commit more crimes.

I think the thing that makes men commit more crimes is the same one that puts them at the "top". Thank you for that link about the variability hypothesis; that seems to be what I'm thinking about.

nasmorn
Committing crimes without getting caught is also how you can become a board director. There are other ways too, obviously. Mostly wage theft, anti-competitive crimes, or some embezzlement. The kind of crimes privileged people would commit.
Zababa
Fair point, though I have honestly no idea of the percentage of board directors that have committed white collar crime.
_3u10
It's zero or near zero. Committing crimes generally bars one from being a director, especially public companies. White collar crime sentences generally preclude serving on any board until their dues are paid to society. Even personal bankruptcy can preclude being a director.
tester756
>accidentally exclude certain people?

E.g. how? Could you provide some examples, e.g. two?

There's a lot of talk about this stuff when it comes to MAGMA, yet docs still use some auto-generated translations which suck.

sangnoir
> could you provide some examples e.g two?

There was that time when Google Photos started labeling black people as gorillas[1] in uploaded pictures. I suspect the training data for their classifiers "accidentally exclude[d] certain people": diversity & inclusion review would have avoided that kerfuffle.

1. https://www.usatoday.com/story/tech/2015/07/01/google-apolog...

KKKKkkkk1
That's a cherry-picked example designed to stoke outrage. Image classifiers will never be 100% accurate. Nobody would have cared if Google Photos misclassified white people as teacups. What Google should have done is to push back against the bad-faith accusations of racism that were made against it.
derivagral
HP did this in '09. Microsoft did it in '10.

[0] http://www.cnn.com/2009/TECH/12/22/hp.webcams/index.html [1] https://www.pcworld.com/article/504514/is_microsoft_kinect_r...

davidcbc
https://sitn.hms.harvard.edu/flash/2020/racial-discriminatio...

https://futurism.com/delphi-ai-ethics-racist

https://www.nytimes.com/2019/04/25/lens/sarah-lewis-racial-b...

tester756
It seems like these kinds of problems occur mostly within some specific areas, while OP seems to suggest that this kind of review should be applied to everything.
davidcbc
Not every kind of review is applicable for every single launch, but diversity and inclusion is applicable to more than just AI (in general, I don't know what the review process or requirements are for D&I)
mgfist
Ah I know. Let's have a review to see if we need a D&I review!!
shadowgovt
Sounds unnecessary. You just roll it into the D&I review. There is such a thing as a D&I review that comes down to a couple paragraphs on how this product has no features that are relevant for diversity or inclusiveness.

But Google added the review because they found that, in general, the average software engineer does not have the background or technical experience to make an educated guess on that topic.

ygjb
From a practical business perspective, performing a diversity and inclusiveness review is a risk management activity.

It doesn't really matter if the business strongly supports or opposes a particular set of diversity and inclusiveness goals from a fiduciary perspective, but it sure does matter if the business keeps losing money or missing targets because it is embroiled in scandals, paying out settlements to staff that have suffered discrimination, or being hauled in front of regulators to air their dirty laundry.

One would hope that being a decent place to work, and treating people fairly would be enough of an incentive, but for everyone else, there are risk management processes designed to have repeatable processes to identify business risks, escalate them to leadership, and presumably either accept the risk, or steer the project towards a solution that has a more acceptable risk profile.

kukx
By the same logic we can justify any [social issue] division. The sad thing is that the rules are arbitrary and do not help in solving the issue. Actually it is in the interest of the division to create or exaggerate problems to justify its existence.
fy20
Slightly OT, but a lot of products that are launched in multiple regions - Google included - exclude people who live in a country but don't speak the native language.

I work for a company from an English speaking country, and every time I need to reauthenticate with my Google account, it defaults to the native language of the country I'm in. They do have an option to change the language (in native language), but it's weird it defaults to that given I was last logged in with an account that is set to "English (US)" and my computer is set to the same.

Recently a large clothing retailer launched that is available in many other European countries, but it's only possible to use the native language here. It's even the same app, they just see your account is set to this country and only lets you view in that language.

xmprt
I agree with you but it sometimes seems like Google doesn't care at all about it when they have the kind of customer support processes that they have.
kevingadd
Customer support is after the fact, reviews are before the fact. It's very cheap to do these reviews before launch and then you can point at those to say "we're trying!" while not providing any customer support.
Igelau
You think? Who is doing diversity and inclusion reviews? Do you think they're getting paid call-center wages?
ineedasername
The customer base is larger than the # of projects to review by many orders of magnitude. So yes, I think internal review will cost less when a single reviewed project/product might have millions of users.
brailsafe
Does it make sense to serve a dataset without approval that it's inclusive enough? Yes, because that's typically how things in the world work.
geofft
I don't understand this argument. It's okay for things to be a certain way, because things are typically that way?

Apart from the circular reasoning, the practical impact is that you should also drop privacy review because corporations steal your data, security review because everyone gets hacked, readability review because there's a lot of legacy code, etc.

mlcrypto
I miss the internet where people just created what they wanted and organically found users
KKKKkkkk1
Diversity and inclusion is an American concept. What value does an American DEI consultant who usually knows very little about the world outside of the US have to contribute in a global context?

The reason why DEI review is necessary is in order to shield the company from ideological mobs on English-speaking Twitter and inside the company.

wyager
That’s not what D(E)I refers to.
bbarnett
Nothing is all inclusive. Nothing.
mikepurvis
Sure, but that's not a reason to not even ask the question. Maybe not every DI initiative turns out to be helpful or productive, but as someone who's privileged on pretty much every axis there is, I'd be grateful for the kind of internal support system that could give me an early warning sign for "hey, this design decision that made sense to you and your team has the potential to alienate user base X and there's a real possibility that if we launch in this state it's going to explode into a minor Twitter scandal."
brailsafe
Isn't this just called user testing? Also, this is in the context of a fucking dataset. If data needs to go through DI review in case something blows up on Twitter, I guess it's a sad state we're in.
davidcbc
If, for example, the dataset only contains white faces and is intended to train facial recognition then yes, it needs to go through some kind of DI review.
brailsafe
Wouldn't this review be done on the data collection and planning side, rather than at point of publishing though? Surely you can publish datasets of just white faces or just black faces if during planning that's what you intended to do for some reason?
davidcbc
I mean, maybe, but you still might need it to be reviewed. You don't have to wait until you're about to launch to start these kinds of reviews and if you know that some kind of DI review is necessary for your project you should start talking to the reviewers as early as possible, especially if you are making a potentially controversial design choice.
Volundr
Wouldn't it be both? With a legal review I would make sure that we take into account any legal requirements in the planning stage, then at completion I'd still want legal to look at it and make sure those requirements were met. I don't see why this would be any different. Planning review: "Here's how we're going to make sure we get a suitably diverse set of faces". Pre-publishing review: "Let's look at the data and make sure we have everything we planned on. Oops, looks like we missed New Zealanders somehow, better fix that before we publish."
brailsafe
Seems sensible enough.
Volundr
Does it? Seems to me data is a prime place for exclusion to occur. Example: a dataset of tagged photos for training a neural net to analyze facial expressions. All the photos are of white faces.
belorn
Maybe they should run a study on diversity-approved data sets and see how well they match the demographics where they are being used. Then they could compare them to data sets without diversity reviews and see which has better representation of the actual demographics. A kind of performance test for the diversity review.
brailsafe
Yeah, I imagine the real utility of a given dataset or review would be determined by what it's sampling and what its intention is.
bawolff
no code is 100% perfect, yet people still do code review and the point of CR is not to make the code 100% perfect.
pilsetnieks
Perfect is the enemy of good.
bufferoverflow
Death and taxes.
teawrecks
Science is always wrong. Always.
oriki
Is your argument here supposed to be "Nothing is all inclusive, therefore we shouldn't even bother trying"? If so, I'd argue that's a lot more ridiculous than a review process designed to help catch major inclusivity issues before they become problems.
protomyth
Google can talk when they stop using a license by a domain-squatting org who revised their history and has a pretty offensive line on their front page: COMMUNITY-LED DEVELOPMENT "THE APACHE WAY". Indeed. Worse, most of the links on Google search point to the org and not the actual tribes.
Igelau
Everyone knows the org is called Apache because they Jump on it! Jump on it! and not because they're appropriating Native Americans.
killerstorm
Businesses exclude people all the time. E.g. many videos are geoblocked, and there's no way to view or purchase them in some countries.

Here are some other examples: I can use free version of Google Colab from Ukraine, but I can't pay for Pro version. (I can pay for Google Cloud, though.)

OpenAI blocks API dashboard access to IP addresses from Ukraine. (But it is OK if I use VPN LOL.)

So it seems blocking ppl is the norm. I guess "diversity and inclusion" is mostly about social topics within US, not about not excluding people.

dustintrex
You're running into US sanctions issues (Crimea), not woke Google policy.
KKKKkkkk1
Ukraine is not under sanctions over Crimea, Russia is.
1cvmask
Some people call US foreign policy woke imperialism.
gsich
Doesn't matter. Also sanctions are not against Ukraine, that would be stupid.
woodruffw
> Also sanctions are not against Ukraine, that would be stupid.

The sanctions explicitly include Ukraine, due to financial entanglements between Ukrainian and Russian corporations[1].

[1]: https://www.state.gov/ukraine-and-russia-sanctions/

jfrunyon
Your link:

> authorizes sanctions on individuals and entities

> a number of Russian and Ukrainian entities

> sanctions on individuals and entities

> impose sanctions on those persons

etc...

Your claim:

> The sanctions explicitly include Ukraine

Your claim is false by your own "evidence". The sanctions are not on Ukraine, they are on a few people in Ukraine.

IncRnd
All you did was read the page's url and take that to mean Ukraine is sanctioned. That's 100% false, which you can see by reading the page's contents.
Kranar
I went over the link and it does not explicitly include Ukraine. Instead it explicitly lists out the specific individuals and entities subject to the sanctions in this incredibly long and detailed list:

https://www.treasury.gov/ofac/downloads/sdnlist.txt

All your link does is reference the rationale for why certain individuals and entities are part of the SDN list, namely violating the territorial integrity of Ukraine, but nowhere in anything you've linked to does it state that Ukraine is in general subject to sanctions.

dendriti
Because businesses exclude people all the time, there's no need for diversity and inclusion? I think you've got that backwards.
skydhash
Cry in Haiti
cto_of_antifa
This sentiment is sad, to be honest. Not for the hypothetical actual Haitian people that deserve normal respect that your comment flippantly uses as a gag insult, but because it betrays a perverse resistance to even considering the possibility that having any sort of interaction with people in the larger world might, in fact, make your own internal world better and more knowledgeable, and expose you to things you could never even imagine without doing so. It's possible that you'll never get to be that person unless you let yourself self-consciously deconstruct what brought you here, in all honesty, inside yourself, and actually make the decision to grow a little bit. And if you feel like you should double down and die on this hill, that's not just a big yikes: it's kind of self-destructive, ultimately. The kind of situation where, if it weren't so clearly done outside external influence, it would be the kind of thing one might pity you for.
deanCommie
> social topics within US

I loved having conversations with Europeans like this when I spent a few years living in Western Europe.

"Sexism and Racism are US-invented concepts. We don't have that here"

Oh, honey. And yet you also tell me in the same conversation, "It's not racist to assume a black person I meet is a drug dealer. It's just statistics."

In your case, just because Ukraine doesn't have black people doesn't mean you don't have your own problems with racism and sexism. You just don't need to worry about it right now when you have to worry about Russia invading your country. Your problems are about fundamental survival, a totally different spot on the Maslow hierarchy of needs.

So you should hope that in a few years the situation improves enough that you can start worrying about the same issues as The West.

d1a2n
L0L. This might be a new level of wokeism. Virtue signalling via condescending, taunting language directed towards someone the woke has identified as dealing with more significant problems than themselves. Truly brave and stunning. Praise George Floyd, most deserving martyr.
j16sdiz
Of course racism is everywhere.

But the “diversity review” is so US-centric that it never captures other forms of racism.

Learning about American Latino and Black history while being silent about more local (non-American) issues is just “inclusive drama”.

Retric
I strongly dislike Google, but they do try to mitigate non-US forms of racism. Which isn’t to say they do a great job, just that it’s considered.
eru
I came in contact with Facebook-wide internal 'diversity' campaigns that were blatantly US-centric, while working for Facebook in South East Asia.
conductr
> So you should hope that in a few years the situation improves enough that you can start worrying about the same issues as The West.

Do you believe the opposite is true? That as soon as things in the West (US) get unstable, all the progress made will be unwound? Slavery will come back. Women will lose their voting rights. LGBTxyz will be persecuted. Etc. I don’t see why a society couldn’t “improve” while still having larger concerns. Having a common enemy can actually be unifying. Unfortunately, I also see how in the US things could revert significantly; they have simultaneously improved and reverted in the recent past in ways I probably never would have guessed. But I don’t think they will. At least not in the same ways. Whites simply don’t have the numbers to put chains on another race again. I could see it becoming more of a caste system where diversity exists within the castes.

selimthegrim
They have Roma to kick around in place of black people
randallsquared
To this random bystander, this comes across as deeply condescending, in case you didn't intend it that way.
londons_explore
In general it's about not accidentally excluding people. All the cases you propose are deliberate blocks for various (mostly legal) reasons. The deliberate blocks are considered in the review, and as long as there is a sound business case for launching with the exclusion, it goes ahead.
j16sdiz
To rephrase: as long as the discrimination is systematic or enforced by a government, it's OK.

/s

Uyghurs in Xinjiang cry.

danans
That's only for public launches (and I'd add QA review to the list), and I'd argue that each of them are critical.

For serving analytics data internally, you only need privacy review, for obvious reasons.

Security and auditing is built into the tools used for querying and serving any such data internally.

Imnimo
What I don't get is why they wouldn't just use MongoDB. MongoDB is web-scale.
hinkley
/dev/null is also web-scale
sondr3
And available as a SaaS: https://devnull-as-a-service.com/
DeepYogurt
Is /dev/null fast? I will use /dev/null if it is fast.
flatiron
does it support sharding?
closeparen
It supports sharting: https://github.com/dcramer/mangodb
323
But is it planet-scale?
swalsh
That's out of date, we're now in the days of IPFS.
vorticalbox
Maybe because MongoDB had been out less than a year in 2010?
gnabgib
I think you missed the /s from GP.
vorticalbox
Quite likely.
nostrademons
That was a major impetus for this video, IIRC. The "MongoDB is web-scale" video went around Google about a month before Broccoli Man and some enterprising Googler figured they could use the same software to make a satire of Google's internal tools.
hedgehog
Link for the Mongo video: https://www.youtube.com/watch?v=b2F-DItXtZs

And bonus lean startup video: https://www.youtube.com/watch?v=3J9KhpgYVB0

fragmede
MongoDB is web-scale: https://www.youtube.com/watch?v=b2F-DItXtZs

NSFWish; it gets a bit personal around 3:11

mlindner
I miss 2010.
vinay_ys
Ah, 2010, when web scale and its secret sauce, sharding, were all the rage.
alexjplant
I had a similar conversation with a heavily intoxicated MongoDB sales guy in a diner at 1 AM after the second day of KubeCon 2019. My concerns were primarily around data consistency issues during denormalization and the lack of schema. His pitch was essentially "Who cares?! I'm getting [three-letter agency] to move _everything_ to Mongo because it's so cheap and easy! It's all just JSON! Why does it need a schema?!"

He probably made more than I did that year so maybe he has a point ¯\_(ツ)_/¯

sergiotapia
Much more infamous.

"Mongo DB Is Web Scale" - https://www.youtube.com/watch?v=b2F-DItXtZs

keymone
s/in//
shirleyquirk
famous ternal?
tusharsadhwani
that would be s/in//g :)
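(For anyone outside the joke, here is a minimal shell sketch of the distinction being corrected above: without the trailing `g` flag, sed's `s` command replaces only the first match on each line; with `g`, it replaces every match.)

```shell
# Without g: only the first "in" on the line is removed.
echo "infamous internal" | sed 's/in//'
# -> famous internal

# With g: every "in" on the line is removed.
echo "infamous internal" | sed 's/in//g'
# -> famous ternal
```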
CephalopodMD
Broccoli man is incredible. I hope someone leaks the rest...
bluefox
BitTorrent existed since 2001. Get on with the times.
mseepgood
This monotonous speech synthesis is annoying to listen to. The delivery of the jokes is awful. Who can sit through a 3 min video like that?
zucked
This was an output of a free (now defunct) service that used to accept transcripts and pump out these videos with TTS audio. It led to some hilarious results, usually within niche communities. Around ~2010 these things were everywhere.
pas
https://m.youtube.com/watch?v=b2F-DItXtZs
zaphar
I think it mostly works best when you've lived it. Which I did. And the resurfacing of that video brought back a lot of memories.
kgin
Somehow it makes it funnier to me
drannex
That's what makes this even more funny.
dekhn
Thank you, whomever did this! I asked for it in a comment recently.

This video basically is making fun of a common situation of Google at the time, where a person wants to serve up some data for analytics, but the sysadmins expect the person to follow a process intended for much more complex and high availability services run by teams of skilled engineers.

It parodies SRE as a BOFH sysadmin, even though in general SREs are quite easygoing and helpful.

It helped poke fun at a number of overly stuffy processes and also helped push people to make hosting modest datasets (like this 5TB one) easier.

Regarding the phrase "I don't know how to count that low", here's the video where it originated. Frankly I'm astonished this hasn't been shared before, and I hope I don't get in trouble for posting it (it's like 11 years old, surely nobody cares at this point, right?)

https://www.youtube.com/watch?v=3t6L-FlfeaI

HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.
~ [email protected]