HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
How Does SHA-256 Work?

learnmeabitcoin · Youtube · 5 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention learnmeabitcoin's video "How Does SHA-256 Work?".
Youtube Summary
An explanation of how SHA-256 works, with animations of the operations used inside the hash function.

I'm not a cryptographer though, so I can't explain the reasons behind the design (at the moment).

00:00 - Introduction
↳ 02:20 - Bitcoin Mining
05:05 - Basic Operations
08:27 - Functions
10:58 - Constants
12:23 - Hash Computation
↳ 12:41 - Message
↳ 13:19 - Padding
↳ 14:24 - Message Blocks
↳ 14:58 - Message Schedule
↳ 17:42 - Compression
↳ 22:30 - Final Hash Value

Source code for animations (and text guide): https://github.com/in3rsha/sha256-animation

Official specification: https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf


====
FAQs:
====


=== What setup are you using for your computer? ===

* Operating System = Ubuntu (Xubuntu)
* Desktop Environment = XFCE
* Appearance Style = Arc-Dark
* Appearance Icons = Papirus-Dark
* Window Manager Style = Numix
* Wallpaper = A simple gradient from dark grey to grey.
* Shell = zsh (using zsh-autosuggestions)
* Terminal Font = Hack Regular


=== How do you work out the constants? ===

The constants are created using the first 32 bits from the fractional part of the cube roots of prime numbers.

For example, the first constant is the cube root of the first prime number (2), so:

∛2 = 1.2599210498948732

The fractional part of this is:

= 0.2599210498948732

However, we want 32 bits' worth of this fractional part. To get these 32 bits, we multiply the fraction by 2^32:

= 1116352408.8404646

The integer part of this is our 32-bit constant, which we can convert to binary:

= 1116352408
= 0b01000010100010100010111110011000

Here's some code showing how it works: https://github.com/in3rsha/sha256-animation/blob/master/constants.rb



=== How does the padding work if my message is X bits in length? ===

Lets say we have a message that is exactly 448 bits.

We always need to include the `1` separator and the 64 bit message size in the padding, which takes the message+padding up to 513 bits. This exceeds the 512 bit message block size we're after, so we pad with 511 zeros to take us up to 1024 bits (the next multiple of 512).

The zeros go between the separator and the size, like so:

[message] [separator] {zeros} [size]

So in other words, if your message and the initial padding takes you beyond the size of a message block, you pad all the way up to the next multiple of 512.
HN Theater Rankings

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
I found this video to be an excellent explanation and sufficient to implement the algorithm: https://www.youtube.com/watch?v=f9EbD6iY9zI

Code: https://gist.github.com/void4/6f5ff23a3df81d6115fceb6adefddd...

This site contains a nice visualization: https://sha256algorithm.com/

Oh this is great. When we taught SHA-256 last semester, we linked to this YouTube video: https://youtu.be/f9EbD6iY9zI. Next time we do it, we'll probably link to both. Having several different ways to visualize the same thing is very helpful, and I like that this one moves quickly.

A couple of details missing from this visualization are how you pad a message to be a multiple of the block size, and how you chain blocks together to form a longer message. In the pseudocode at https://en.wikipedia.org/wiki/SHA-2#Pseudocode, that's the "Pre-processing (Padding)" part and the "for each chunk" loop just below it. I get why you'd want to leave those things out, since they're not really the interesting part, and the screen is already pretty packed as it is.

If anyone's feeling curious about implementing this yourself, take a look at these project notes: https://github.com/oconnor663/applied_crypto_2021_fall/tree/.... At some point I'll clean that up for public consumption, but for now just ignore the parts about grades and cheating :)

miki725
Was about to reply with the link to the project. If anyone is curious about sha2 highly highly recommend to go thorough the project. Jack did an amazing job explaining everything step by step. Writing the code really helps to understand all the concepts much better.
Drdrdrq
Thank you for the link to your repo, this it the first time I heard about length extension atracks. TIL, appreciate it! This SO answer explains them nicely, if anyone is curious: https://crypto.stackexchange.com/questions/3978/understandin...
Dowwie
What course did you teach?! Have you got a syllabus?
oconnor663
Applied Cryptography (CS-GY 6903) at NYU Tandon. You can find all our programming problem sets in the same repo: https://github.com/oconnor663/applied_crypto_2021_fall.
manceraio
Thanks for the feedback and I am glad you'll use it for teaching (which was the main goal of this project)! The padding part it's briefly explained on the "dynamic" notes on the left column, but yes, can be improved. Typing on the input gives you some sense of what is doing on the background, specially if it jumps to two blocks.

The "for each chunk" is also implemented (which was one of the most difficult parts to synchronize with the UI), but I agree too, I should come up with some way to represent it better. Thanks again :)

fragmede
Minor nit: input could also take hex.
This looks great, though it requires quite sometime to go through it (and figure out what could possibly be understood by someone with knowledge of programming and bit wise operators, and what can just be skipped because it’s something only cryptographers can understand).

If someone were to make a slower video explaining it with all the sections from this dissection, that’d be even more awesome.

Edit: Seems like the author created a video for this — https://www.youtube.com/watch?v=f9EbD6iY9zI (thanks to 1_player’s comment at https://news.ycombinator.com/item?id=23165906)

On a tangent, here’s an animation (by someone else) explaining AES encryption — https://m.youtube.com/watch?v=gP4PqVGudtg (thanks to harrigan’s comment at https://news.ycombinator.com/item?id=23165821)

May 13, 2020 · folli on Show HN: SHA-256 Animation
You need to watch his video, very cool, it really helps to understand how this works: https://www.youtube.com/watch?v=f9EbD6iY9zI
kebman
That video is really, really awesome! And it won't leave you feeling "Japanese" either. (Which is a great people, btw. I'd really like to go there someday, mostly for the food and language and history. And Anime also, I'm forced to admit.)
kebman
You guys really need to chill out. If you've got something to say, then say it.
Phenomenit
I think you're being down-voted because your comment doesn't really add anything to the discussion at hand.
kebman
Oh, to this site punishes people for adding a positive remark. Great... I'll keep that in mind then. Anyway, thanks for letting me know!
Dylan16807
Just being positive and nothing else is what the upvote button is for. And turning your comment slightly gray is not really a punishment.
tomhoward
Positive, supportive comments have always been welcome on HN.
Dylan16807
Being positive and supportive is a good quality, but it is not enough to make a comment good. Comments are supposed to have thought and substance too.

Shallow praise is better than a shallow dismissal, but not by enough.

tomhoward
I'm pretty sure I've seen dang or pg say that quick, low-effort comments that express enthusiasm for a comment or post are fine (whereas low-effort, drive-by dismissals are very much not fine). I've tried searching but it's not obvious what search terms would turn up such a comment.

But I hope my recollection about that is right. Yes we want comments to be substantive in general, but we don't want to be surly or even Grinch-like when someone is just expressing excitement and affirmation for someone else's contribution. I'm sure that's not what pg or dang would want here.

dang
That's too harsh, and not in the spirit of the site. Right from the beginning, pg made this distinction:

Empty comments can be ok if they're positive. There's nothing wrong with submitting a comment saying just "Thanks." What we especially discourage are comments that are empty and negative—comments that are mere name-calling.

https://news.ycombinator.com/newswelcome.html

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...

chias
There's a few things in there that are factually incorrect -- in particular, the false notion that "every input has a unique output" can be quite dangerous in some cryptographic settings.

That said, the purpose of this talk is about the mechanics of the function, and not its properties or how to use it safely. So don't let that detract from what is, really, an awesome presentation.

tridentboy
I'm sorry, could you please elaborate? I was always under the assumption that hash functions have to be deterministic, and thus, that "every input has a unique output" was a correct statement.

AFAIK the contrary is invalid, so that "not every output is the result of one and only one input".

apeescape
There are 2^256 potential outputs for SHA-256, while the number of potential inputs is infinite. Therefore, the same output can be generated with different inputs, although finding such "collisions" by chance is extremely unlikely
surye
The claim is not that every output has a unique input, which would not be correct, and seems to be what you are addressing.
riquito
I see what you mean, but it sounds like the output is unique, and we probably agree that in this field you need to use sentences that cannot be easily misinterpreted.
chias
at 1:08 in the video, that is exactly what he claims:

"So every piece of data in the world has its own unique hash digest."

This is false for the reasons apeescape describes: every piece of data in the world has its own hash digest, but these hash digests are not unique.

ChristianBundy
On the other hand, if we can count "every piece of data in the world" then we can estimate the probability of having a collision.
infogulch
Yes that sentence is technically incorrect, but practically correct. We've never found a collision and though we expect it to be theoretically possible, even common if you consider "all possible inputs" and the pigeonhole principle, for practical purposes hash outputs are unique because nobody considers "all possible inputs" when evaluating probabilities.

I'm saying that for a layman explanation, it's reasonable to say that hash outputs are unique. Because following that with "technically, it's more 'practically' unique, theoretically there are collisions but you won't encounter them with probability > 2^-256" (or whatever it is) just confuses the topic to them more than just summarizing. You have to admit that most people won't go on a 200h adventure to learn about the state space of 256+ bits and how to conceptualize tiny statistical probabilities, so there must be a point where you have to cut the explanation to an approximation of the truth. This is true in every field.

chias
> I'm saying that for a layman explanation, it's reasonable to say that hash outputs are unique. [...] theoretically there are collisions but you won't encounter them

You could have said exactly the same thing about MD5 right up until you couldn't. Then you could have said "oh yeah well MD5 is broken, but it's safe to assume you'll never find one for SHA-1", right up until we did. So if you say "oh yeah well SHA-1 is broken, but it's safe to assume you'll never find one for SHA-256", I disagree.

It would be one thing if collisions in hash functions were found by just repeatedly hashing things until you find a collision. If that were the case, then yes, I'd agree with you on those 1-in-2^256 odds, at least for a while. But by and large, that's not what happens. Over time, weaknesses are found in algorithms which allow you shrink the search space, which significantly changes your odds.

chrisweekly
Kind of agree w you, but still feel adding a few words by way of a disclaimer about collisions is much better than presenting as plain truth something that merely approaches it.
tialaramex
I don't like to leave holes like this in people's comprehension. It's OK if people don't end up with an intuitive feeling for how relatively unlikely different things that don't actually happen are, but I want them to be aware of that category as distinct from things which can't happen because the type of argument needed is different.

The air molecules in the room you're in can't all gather in one corner because that's not possible, it's forbidden by conservation rules.

But they won't gather in two opposite corners only because that's so tremendously unlikely, it would be allowed by conservation but statistically it's ludicrous.

The same is true at the opposite end of the spectrum. Almost all real numbers are normal (in all bases) but the nature of "Almost all" in mathematics is different in an important way from "All" and I want people to grasp this difference when I'm discussing properties of numbers. It definitely is not true that all real numbers are normal, you probably rarely think about any normal numbers at all.

infogulch
> I don't like to leave holes like this in people's comprehension.

I agree. I think this wording would be better than in my previous comment, what do you think?

    it's reasonable to say that hash outputs are *almost surely* unique
chias
A function being deterministic means that any input will have a single output. But it is not unique for any hash function, SHA-256 included. The definition of a hash function is any function which takes an arbitrary length input and outputs an n-bit output for some fixed value of n. By virtue of the fact that you have infinite inputs and finite outputs, the outputs cannot be unique.

Generally when people make this claim, what they're actually referring to is what's called Collision Resistance (CR) and/or Weak Collision Resistance (WCR), which instead make claims on difficulty of finding such collisions (of which infinitely many exist).

WCR, necessary for almost any cryptographic use, states that for any given input it should be difficult to find a different input which hashes to the same value. CR, generally desirable for cryptographic hash functions, states that it should be difficult to find two different inputs such that their hashes are equal. CR implies WCR, but WCR does not imply CR -- for example, SHA-256 (currently) exhibits CR but SHA-1 only exhibits WCR.

tripzilch
> of which infinitely many exist

If we're going for "factually correct", there's a finite number of 256 bit strings.

Here's the author doing an in-depth explanation of how SHA-256 works using this code: https://www.youtube.com/watch?v=f9EbD6iY9zI

I'm halfway through, but looks very well done, thanks!

inersha
My pleasure, thank you.
newscracker
Wow, this was just what I asked for in a comment before seeing this comment. Thanks.
HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.
~ yaj@
;laksdfhjdhksalkfj more things
yahnd.com ~ Privacy Policy ~
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.