HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
Felienne Hermans: How patterns in variable names can make code easier to read

Strange Loop Conference · Youtube · 138 HN points · 0 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention Strange Loop Conference's video "Felienne Hermans: How patterns in variable names can make code easier to read".
Youtube Summary
Felienne Hermans is an Associate Professor at Leiden University in the Netherlands. In this presentation she looks at how thinking about the content and shape of names can make programmers more productive.

---

On April 27, 2022, It Will Never Work in Theory ran its first live event: lightning talks from leading software engineering researchers presenting immediate, actionable results from their work. Our audience learned:
- powerful new ways to test modern software,
- how to do better, smarter code reviews,
- what effective remote onboarding means during the pandemic,
- whether test-driven development actually makes you more productive,
- and what "productive" really means for programmers.

Their slides, and over 250 reviews of software engineering research papers, are all available on https://neverworkintheory.org.

We are grateful to Strange Loop, Mozilla, and Taylor & Francis for their support, and we hope you'll join us at Strange Loop 2022 in September for more insights.

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
May 19, 2022 · 138 points, 62 comments · submitted by zdw
reidjs
Loved how short and to the point it was. If you don’t have time to watch, the idea is it’s incredibly rare for two devs to come up with the same names for vars. To increase the odds of coming up with the same names for vars, you should agree on naming conventions (name molds) as a team. Sounds obvious, but great science is often confirmations or denials of the obvious.
extrememacaroni
Patterns in the way code looks in general are invaluable for parsing code quickly especially in areas that you're somewhat familiar with. You can discard/ignore big chunks of code very quickly and go straight to where you think the relevant part is if they look as you'd expect at a glance. If they don't, it's sort of like a cache miss. "What the hell why don't people autoformat their goddamn files before saving" and then read those bits of code just to make sure they're not hiding any surprises, before formatting them properly.

It's the difference between taking, say, 2 seconds to read a method, and 10 or more.

I can only assume people who don't treat code formatting as a rule read every.single.thing.line.by.line.every.time.

tobr
Unpopular opinion: if big chunks of code look “the same”, that might be unnecessary boilerplate you should get rid of.
shikoba
> I can only assume people who don't treat code formatting as a rule read every.single.thing.line.by.line.every.time.

That's what I assume too. I don't understand how one can code like that.

nickjj
Aren't linguistic names a form of mold?

I find Rails' conventions are very good around this, for example datetime fields end with _at and dates end with _on. This way you end up with variable names like published_at or published_on depending on if you care about the time or not. It sounds so natural.

The idea of using ? to end a variable name for booleans is great too.

It's the opposite of cognitive load because you can glance at a name and know what it is without knowing more about it. If the implementer of a linguistic named function does bad things to break the expected behavior then you shouldn't blame the method -- that's a user error.
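The suffix convention described above can be sketched in Python. Python has no `?` suffix, so only the `_at` and `_on` rules are shown. The `Article` class and the `follows_convention` helper are hypothetical names made up for illustration and are not part of Rails or any library.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class Article:
    # Hypothetical model: "_at" carries a time of day, "_on" is a calendar date only.
    published_at: datetime
    archived_on: date


def follows_convention(name: str, value: object) -> bool:
    """Check a field against the assumed "_at" / "_on" suffix rules."""
    if name.endswith("_at"):
        return isinstance(value, datetime)
    if name.endswith("_on"):
        # datetime is a subclass of date, so exclude it explicitly.
        return isinstance(value, date) and not isinstance(value, datetime)
    return True  # no rule applies to other names


article = Article(
    published_at=datetime(2022, 5, 19, 14, 30),
    archived_on=date(2022, 6, 1),
)
# Map each field name to whether its value matches what the suffix promises.
checks = {name: follows_convention(name, value) for name, value in vars(article).items()}
```

A reader who knows the rule can guess the type from the name alone, and a check like this could run in a test suite to catch fields that break the promise.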

Personally I find consistent names more important for CLI tools, kubectl's CLI is good in this department for being consistent. You can predict how each command works by knowing the pattern. They went with a "verb noun" style. I don't think one is necessarily better than the other but being consistent does help for CLIs because you often need to recall what to run by memory, CTRL+r history or running the command incorrectly to get a help menu on what you can run. However a code editor gives you a lot more help with auto-complete or buffer-complete for function or variable names.

For naming things in programming, I'm not 100% convinced a hard pattern based standard makes sense because naming is very subtle, sometimes you want the emphasis on the "thing" or an emphasis on the "action" depending on the context -- basically which one is more important for that specific instance.

For her open question of "what would you name a variable for storing the maximum number of orders per month", that's an incomplete question. What's the context behind it? Is this variable defined as a constant somewhere? What other functions are in that module or class? How do you plan to use the variable? Will it be used in more than 1 spot? Is it part of a library that third party folks can use or limited to 1 code base? Will there be other similar variables, such as getting weekly or yearly orders?

hinkley
These are the sorts of 'style guides' that we need. I started boycotting 'style' meetings at new companies ages ago because it always turned into a bunch of people using up all of their time, energy, and social capital arguing about where the curly brackets go and how whitespace should be handled. These are things a machine can do for you. We shouldn't be wasting our breath on them.

As far as 'consistent' names go, there are multiple dimensions of sameness. Using the same word for all instances of the same concept, not using the same word for other concepts, using consistent pluralization. Using same adjective/adverb/gerund form for related concepts. You are telegraphing sameness in these cases, and difference in others.

We have tried things similar to what you describe before, we just have dialed it in wrong. New-ish, good ideas often fall prey to bad execution. Hungarian notation, for instance, dictates that the variable name stays the same when the sense of the data changes, but is supposed to change when the implementation details shift. Which is exactly the opposite of what we want. If I fix a Y2K bug or a 2038 bug in due_on, I'm going to end up with a slightly different structure, but the deadline it represents is still 12 midnight. And if it's not, well, maybe we need a different convention for calendar day versus business day deadlines.

sokoloff
I agree that a machine can enforce conventions, but it can’t decide them for you. The primary point of conventions (IMO) is to aid the reading of code. If one style is more readable than another, that should be chosen.

If you think opening braces should go at the end of the line and I (incorrectly ;) ) think they should go on a line by themselves, our team style guide should probably pick exactly one of those. That’s a human choice after human debate, followed by machine enforcement.

nicoburns
I also really like is_ and has_ prefixes.
DonHopkins
My name is Don, so every time I see a column called "createdon" I think it's a boolean flag that you can set true to create me. I wish the db designer would use snake case instead of mashing all the words together. But then again, I keep my ssh key in a file called donkey.pem.

The "big-endian naming mould" suggests naming it orders_per_month_max, since orders is the object (most significant), per_month is a count of orders (secondary significance), and max is a constraint of the order per month count (least significant).

Then you can use other parallel names in the same big-endian pattern, like orders_per_month, orders_per_year, orders_per_year_max, orders_per_second_min, refunds_per_year_average, etc., and they will all sort next to their closely related names, instead of the "inline max" or "prefix max" scrambling the alphabetical order.
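The sorting claim is easy to check with a small Python sketch. The two lists reuse names from the comment above, and the "prefix" variants are assumed spellings of the same concepts with the qualifier moved to the front. The helper `distance_when_sorted` is a hypothetical name.

```python
# The same five concepts, named two ways (the prefix spellings are assumptions).
big_endian = [
    "refunds_per_year_average",
    "orders_per_year_max",
    "orders_per_month",
    "orders_per_year",
    "orders_per_month_max",
]
prefix_style = [
    "average_refunds_per_year",
    "max_orders_per_year",
    "orders_per_month",
    "orders_per_year",
    "max_orders_per_month",
]


def distance_when_sorted(names, first, second):
    """How many positions apart two names land in an alphabetical listing."""
    ordered = sorted(names)
    return abs(ordered.index(first) - ordered.index(second))


# A count and its own maximum: adjacent in one scheme, separated in the other.
near = distance_when_sorted(big_endian, "orders_per_month", "orders_per_month_max")
far = distance_when_sorted(prefix_style, "orders_per_month", "max_orders_per_month")
```

With the big-endian names the value and its constraint sit side by side, while the prefix style groups every `max_` name together and pulls each one away from the quantity it constrains.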

Zecc
In some tables it's possible to set whether you want to create a donut C.

(createdonutc)

DonHopkins
A donut C is a donut with one big bite taken out of it, all the way through to the hole.
bjourne
Very interesting video. I'm convinced that this is a very under-explored area of software engineering and that proper naming is at least 50% of developer productivity. Often it doesn't matter how well-structured a code base is, if the function and variable names are nonsensical the code will still be very hard to read.
throwawayboise
This is actually not a new idea at all.

I once worked in a place in the 1990s that took it to such an extreme that every table name, column name, and variable name had to be approved by a naming standards committee before it could go into production. IIRC the committee met once a month, maybe twice? Which was not ideal for the developers but changes only went to production once a month during a "change window" anyway.

Naming conventions can help with code readability, but don't let the process become more important than the goals.

bjourne
Not a new idea, but a not well-researched one.
prettyStandard
Agree. The name mold I like best is nounAdjective like Spanish rather than adjectiveNoun like English.

I wouldn't mind you poking holes in my logic here.

https://soft-wa.re/naming-conventions

To use her example. I would have chosen ordersPerMonthMax. Which would probably sort alphabetically nicely with ordersPerDayMin and ordersPerYearAverage.

Now that I know "name-mold" would be a good query, I might find something better than the Spanish name-mold.

jackblemming
This was noted in Code Complete too, so you're probably in good company.
RhysU
Definitely nounAdjective.

Alphabetically {a, b, c} × {Min, Max} is soo much nicer than the converse. Especially in lists dozens of items long.

jonahx

               Wide Scope     Narrow Scope
             +-------------+-------------
    Function | Short Name  |  Long Name
             +-------------+-------------
    Variable | Long Name   |  Short Name
             +-------------+-------------

    I can’t quite explain why this works

I'll take a shot...

The general principle uniting all 4 quadrants of the table is: "Use names just long enough to be clear, but not longer."

Here's an illuminating exception to the heuristics: The use of the very short global "DB" for database.

We are really trying to balance two competing goals:

1. Brevity -- Don't explain what I already know. You mention this in relation to a tight loop variable: "I bet you didn’t need me to explain dL stood for Drivers License. It might have even annoyed you if I had spelled it out."

2. Clarity -- Don't confuse me. Don't make me look something up to figure it out.

Maximize brevity while retaining clarity.

Clarity is related to frequency of use. This relates to your comment: "How come the jQuery constructor feels much more natural than the native version? document.querySelectorAll('#appContainer')". It is annoying because we use it all the time... we don't need or want a verbose description.

If the thing is used everywhere, and especially if it is a general convention, assume familiarity. Sure, someone might be confused by "DB" the first time they ever see it, but it will quickly become part of their lexicon and remain so through repeated exposure. However, the same cannot be said for "CGTAO" as a stand in for "cudaGetTextureAlignmentOffset". In that case, the long form is what I want.

We handle these principles effortlessly with our use of "he" vs "John" vs "John Smith" vs "the John Smith you went to high school with" but for some reason have trouble with them when writing code.

sodapopcan
Agreed. As with all rules there are always exceptions. I say this applies to concepts that are ubiquitous across unrelated codebases. id, repo, min, max, and enum are some that come to mind. Otherwise, all business domain terms should always be spelled out in full in the same way people refer to them in speech (ie, their ubiquitous language). So the only time acronyms are ok here is if that is how people talk about the particular term in every day speech (like "sku" instead of "stock keeping unit").
kortex
It's all about managing entropy. Less surprising means you can use fewer letters. More surprising means more letters. That shouldn't be too controversial.

The part where it gets tricky is when a concept is widely used, but is a complex concept. What if you have dozens of calls to cudaGetTextureAlignmentOffset in a function, and hundreds in a codebase? Heck, even CUDA is an acronym, Compute Unified Device Architecture.

There's a similar complication when you have several of these big names with slight differences. Made up example, say you also had cudaGetTextureAccessKey, cudaGetVectorAlignmentOffset, etc. I actually find these sometimes worse than the initialisms, as my eyes skip over these long names. The acronyms (CGTAO, CGTAK, CGVAO) have a higher ratio of different letters to total length. But then obviously the abbreviations are very opaque.

jonahx
> It's all about managing entropy. Less surprising means you can use fewer letters.

I like this framing.

> The part where it gets tricky is when a concept is widely used, but is a complex concept. What if you have dozens of calls to cudaGetTextureAlignmentOffset in a function, and hundreds in a codebase?

You have to predict the knowledge of the developers (current and future) working on the system, and let that guide what you can assume. This is necessarily an art and you'll miss the mark sometimes. One approach is to always be overly verbose, but this is too simple: it destroys readability when practiced without restraint.

> There's a similar complication when you have several of these big names with slight differences. Made up example, say you also had cudaGetTextureAccessKey, cudaGetVectorAlignmentOffset, etc.

One technique here is to introduce a namespacing object/module/<whatever you call it in your language>. So something like "cuda.textureAccessKey", "cuda.vectorAlignmentOffset", etc. Sometimes repetition of a long name is the least of all evils, sometimes it's not.
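A minimal Python sketch of the namespacing idea, using the made-up names from the thread. Nothing here is the real CUDA API: the two functions are placeholders with invented return values, and `cuda` is a hypothetical namespace object built with `types.SimpleNamespace`.

```python
from types import SimpleNamespace


def _texture_access_key() -> str:
    # Placeholder body; a real binding would call into the GPU library.
    return "texture-access-key"


def _vector_alignment_offset() -> int:
    # Placeholder body standing in for the long-named function.
    return 0


# Hypothetical namespace object: the shared "cudaGet" prefix becomes "cuda.".
cuda = SimpleNamespace(
    texture_access_key=_texture_access_key,
    vector_alignment_offset=_vector_alignment_offset,
)

offset = cuda.vector_alignment_offset()
key = cuda.texture_access_key()
```

The shared prefix is stated once, so the part of each name that differs comes first after the dot. In a real project a module or class would usually play this role.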

ojintoad
The Programmer's Brain is my favorite read this year, highly recommend

https://www.manning.com/books/the-programmers-brain

teddyh
Making Wrong Code Look Wrong: https://www.joelonsoftware.com/2005/05/11/making-wrong-code-...
theranger
Could you please update the title with [video] so that we know what to expect from that link.
azeirah
It says youtube in the link
marcosdumay
Still, that's the convention for the name of things that point to a video here.
raffraffraff
Strange that we're now watching a disagreement between two people about whether or not the link should follow a naming convention.
beebeepka
Like most things, it's a double edged sword. I haven't worked with world-class developers so most of my experience is dealing with people who would benefit immensely from any linguistic practice.

If you think someone comes up with bad names, wait till they have to write a few sentences, or paragraphs.

funstuff007
She also has a number of talks on Excel (for the HN crowd) up on YouTube that are worth the watch.
armchairhacker
Sometimes I create variable names like "runProcessAsync" (instead of "asynchronouslyRunProcess"), "setIsActive" (instead of "setActive"), and even use shorthand vs non-shorthand (e.g. "src" vs "source") in different contexts.

It abuses the English language but makes code much easier for me to read. Most of the time I don't even realize I'm doing it.

But does it make the code easier for others to read? The first 2 steps I think so, and I've seen them in other projects. The last one probably not, and I try to avoid it and use more descriptive names (like "srcPath" and "srcData") when I spot myself making it.

gfaregan
My only rule for variable names is to never use abbreviations.
DonHopkins
I strongly agree. There's only one correct way to spell a word, but many different possible abbreviations. The hard part is remembering just WHICH letters to leave out, not typing the letters.
elevaet
I have a terrible habit of mixing camelCase with snake_case. I'll start out using snake_case because I find it slightly more readable, but then use some library that has camelCase methods, and before I know it, it's all a bit of a hodge-podge. (Or is that hodge_Podge?)
epgui
You're a monster.
astrange
It seems to me that ideally, if a variable name is so predictable that you can name it by rules, that’d be an opportunity for the language to not require a name.

But in practice $0 and Haskell’s point-free styles can be annoying to read, so maybe what I want is the IDE to insert obvious names.

marcosdumay
You still have to say what of the many obvious things you are using here.

Point-free syntax has a different kind of namelessness, where if you have a single thing, you don't have to name it. And the $0 is really a limitation of the language, nobody ever thought it was a good thing.

astrange
The main case I’m thinking of is “for thing in things” loops. “thing” is an obvious name, but tools would have to understand English to generate it.

Surely Perl programmers think a name like $0 is good, that’s their whole aesthetic.

marcosdumay
Hum... Perl programmers are supposed to shift or die.

The $0 syntax screams of somebody who hasn't used it since the '90s.

eckza
This is, in essence, Hungarian notation. It's great.

I wrote about it a few months ago:

https://dev.to/jmpavlick/hungary-for-the-power-a-closer-look...

Supermancho
While there is evidence that hungarian notation + camelCase is better for token usage - eg variables

Underscores are better for the readability of other kinds of things like unit test names or filenames. Because humans tend to shortcut, it becomes camelCase for everything, including other inappropriate attributes out of laziness, which is aggravating. It's too bad that distinction has not been properly subjected to rigor yet.

http://www.cs.loyola.edu/~lawrie/papers/lawrieICPC09.pdf

Kiro
You can use Hungarian notation with underscores as well so not sure what you're getting at here. Sounds like you're talking about camelCase vs snake_case.
DonHopkins
Or snake_that_swallowed_halfACamelCase.

What Did Terrence Eat?

https://www.youtube.com/watch?v=CAUxhXIeSc8

shikoba
> While there is evidence that hungarian notation + camelCase is better for token usage - eg variables

What evidence? Where?

DonHopkins
I submit to you, the Motif Angst Page.

http://www.art.net/~hopkins/Don/unix-haters/x-windows/motif....

TLDR:

s_CALLBACK_CUR_INSERT Lexical_Bindings_For_XmTextVerifyCallbackStruct XtPointer call_data XLTYPE_CALLBACKOBJ /* How long can this go on???? */ Set_Call_Data_For_XmTextVerifyCallbackStruct Wcb_Meta_Callbackproc XmAnyCallbackStruct doit newInsert XmCR_MODIFYING_TEXT_VALUE cdr car ep /* do nothing for most cases... */ Cvt_XmRXmString_to_LVAL GetValues_Union Resource_Instance WINTERP_MOTIF_111 XmStrings XtGetValues XtPointer_value cv_xmstring XmBulletinBoard XmNdialogTitle XmNnoMatchString XmNlabelString XmNtitleString XmRowColumn XtGetValues XmStringFree /* This is so totally ridiculous: there's NO WAY to tell Motif that any button can select a menu item. Only one button can have that honor. */ /* If this function looks like it does a lot more work than it needs to, you're right. Blame the Motif scrollbar for not being smart about updating its appearance. */ xm_update_scrollbar widget_instance Widget widget scrollbar_values pane_maximum widget_sliderSize new_sliderSize h_water l_water XtVaGetValues XmNheight XmNpaneMaximum XmNsliderSize widget_sliderSize XmNrefigureMode maximum minimum INT_MAX percent XmScrollBarSetValues ARMANDACTIVATE_KLUDGE DND_KLUDGE *dialog*button1.accelerators:#override Ctrl<KeyPress>m: ArmAndActivate() /* sets the parent window to 0 to fool Motif into not generating a grab */ USE_MOTIF xlw_unmunge_class_resize XlwMenuResize

shikoba
What a proof...
eckza
Hungarian notation is not "camelCase".

It's a system for semantic variable naming.

Supermancho
Made an edit to better serve your sensibilities SMH
Kiro
Your post before your edit made no sense so don't give us that "SMH" BS.
hinkley
I had a situation recently where I was naming states in a state machine and someone took issue with why I used a different conjugation for one of the states from the rest.

All of the states in the machine are stable, process driven, and widely publicized within the group. All that is, except one, which is machine driven, and which people will only encounter if the system is not functioning properly, or you're trying to learn the internals. And then if you search for it, you will find two implementations of the same concept, and in the middle of an unplanned system state you don't want to be reading conceptually similar but operationally distinct code.

Of course our local 'word guru' took issue with this, but I don't generally take advice from people that far along the Dunning-Kruger spectrum, so said my piece and ended that conversation through trickery when he was not satisfied.

DonHopkins
I like to use "big-endian" naming molds (love that term!) to define sets of names that when you alphabetize them place related variables next to each other. (i.e. in a completion menu or browser.)

For example, left_foo and right_foo are little-endian, since the least significant word comes first, so they'll be a long distance away from each other in an alphabetized list.

But foo_left and foo_right are big-endian, since foo is more significant than left or right. So they will appear one after the other in an alphabetized list.

Common suffix words are _x _y _z or _min _max, or _left _right _top _bottom, or even singletons like _enabled _loaded _error etc.

But when you combine multiple dimensions together in names, you need to think of which dimensions are more significant, based on how the variables are used, so use foo_x_min foo_x_max, if the positions are important, or foo_min_x foo_min_y, if the ranges are more important.

Sometimes it's hard to decide or ambiguous, so just try to be predictable and the same as all the other code. Think of which variables should appear closest to each other in an alphabetical list.

And avoid middle-endian or random-endian (or sentence-grammar-order-endian) like the plague. A variable name should probably not be a grammatically correct sentence.

Another really annoying linguistic naming smell is "smurfing," where all of class Smurf's instance variables have smurf_ prefixes. Or where all the classes, methods, or instance variables have an "xyz_" prefix where "xyz" is the name of the project or library. Arrgh!!!
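A small Python sketch of the "smurfing" smell, using a hypothetical `Order` class rather than any code from the thread. The first version repeats the class name in every field, and the second lets the class supply that context.

```python
from dataclasses import dataclass


@dataclass
class SmurfedOrder:
    # Every field repeats the owner's name, so call sites read order.order_id.
    order_id: int
    order_total_cents: int


@dataclass
class Order:
    # The class already says "order", so the fields can drop the prefix.
    id: int
    total_cents: int


smurfed = SmurfedOrder(order_id=7, order_total_cents=1250)
plain = Order(id=7, total_cents=1250)

# Same data, but the second access does not say "order" twice.
same_id = smurfed.order_id == plain.id
```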

mankyd
There's an interesting question that arises when you say "when you alphabetize them place related variables next to each other".

Let's say you have some non-trivial class that includes, among others, some 2d rectangular data: An x, y, width, and height. They're all related, but they don't naturally occur near each other without a little massaging:

coordX, coordY, sizeWidth, sizeHeight?

xMin, xMax, yMin, yMax?

coordXMin, coordXMax, coordYMin, coordYMax?

I generally agree with your sentiment, but there's a reason "naming things" is one of the hardest problems in computer science :)

saghm
I'm not quite able to verbalize exactly why, but when I see the set of { "coord", "x", "min" }, it sounds to me like the most intuitive way to put it would be "x_coord_min", but this seems to violate the rule that GP gave, since "coord" here seems to be "greater" than "x" given that "x" is an answer to "what kind of coordinate?". The best explanation I can come up with is that "x coordinate" feels like a coherent logical unit and that splitting it would make it more work for me to parse as a reader, and then "min" follows that because "min_x_coord" sounds like it would be something like the "minimum x coordinate" for a given window or something. I wish I could come up with some consistent universal rule for how to order these things, but I can't really come up with any other process to describe how to get what's the most intuitive other than "look at all of them and see what sounds right". I guess it's not unreasonable to say that ordering three "words" is fairly easy to brute force looking at all of them, and beyond that it's probably worth reconsidering the naming (and perhaps scoping) of the variables you need to disambiguate, but it's not nearly as satisfying as having some sort of objective rule.
lmm
"Smurfing" and "big-endian" are the same thing though!

IMO a big alphabetical list of everything in your project is not a useful or important thing. Use a language that has good support for hierarchical namespaces, and use them.

dcuthbertson
I think big-endian naming was useful for programming with editors that supported tab completion. At one point, the suggestions were only displayed alphabetically. Nowadays, editors use a more sophisticated algorithm (is there a name for it? Fuzzy search, perhaps?) that suggests words containing the sequence of characters already typed anywhere within it.
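The difference between the two completion styles can be sketched in Python. This is a plain subsequence matcher, which is only an assumption about how such suggestions might work; real editors rank and filter candidates in their own ways. The function names and the sample identifiers are hypothetical.

```python
def prefix_matches(typed, names):
    """Older-style completion: only names that start with the typed text."""
    return [name for name in names if name.startswith(typed)]


def subsequence_matches(typed, names):
    """Fuzzy-style completion: typed characters appear in order, anywhere."""

    def is_subsequence(query, candidate):
        remaining = iter(candidate)
        # "ch in remaining" advances the iterator, so order is preserved.
        return all(ch in remaining for ch in query)

    return [name for name in names if is_subsequence(typed, name)]


names = ["left_foo", "right_foo", "foo_left", "foo_right", "bar_left"]
by_prefix = prefix_matches("foo", names)
by_subsequence = subsequence_matches("foo", names)
```

With prefix completion only the big-endian names show up after typing `foo`. With subsequence matching the little-endian names are found as well, which is the reason the comment gives for the naming order mattering less in modern editors.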
pvg
> And avoid middle-endian or random-endian

Also applicable to boiled eggs, the primordial application of endianness theory.

SnowHill9902
Agreed. When dealing with real values, it’s favorable to make the units explicit: weight_lb, length_cm.
bottled_poe
You don’t really need that if you instead apply stronger types.
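One way to read "stronger types" is to give each unit its own type so the unit lives in the type instead of the variable name. The sketch below is an illustration in Python under that assumption. `Centimeters` and `Pounds` are hypothetical classes, and the check happens at runtime because Python does not enforce type hints by itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Centimeters:
    value: float

    def __add__(self, other: "Centimeters") -> "Centimeters":
        # Refuse to add anything that is not also a length in centimeters.
        if not isinstance(other, Centimeters):
            return NotImplemented
        return Centimeters(self.value + other.value)


@dataclass(frozen=True)
class Pounds:
    value: float


length = Centimeters(120.0) + Centimeters(30.0)

try:
    Centimeters(120.0) + Pounds(5.0)
    mixed_units_rejected = False
except TypeError:
    # Python raises TypeError when neither operand supports the addition.
    mixed_units_rejected = True
```

A variable can then be called `length` rather than `length_cm`, since mixing units fails loudly. As the reply below points out, this only helps where the system lets you define such types.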
SnowHill9902
Not every system supports that. How would you do it in SQL?
lmm
Life's too short to use crappy systems.
SnowHill9902
I laughed but really what’s the mature alternative to logging and fetching sensor data?
lmm
Did you mix up your threads? How does logging and fetching sensor data require SQL?
DonHopkins
Yes, explicit unit suffixes are good smurfs!

Also: eschew Bill and Ted's Excellent Postfix "_not", which inverts the meaning of the variable name. That's a most totally bogus code smell, dude.

elliekelly
I really like this concept but I find it a bit frustrating that the name for the naming convention doesn’t follow its own convention. Shouldn’t it be called “endian-big”? ;)
DemocracyFTW2
Also from an LTR standpoint why is it big-endian when the left is not the end but the start? so it should be big-startian or, according to you, startian-big.
HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.