HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
How a File Format Led to a Crossword Scandal - Saul Pwanson

csvconf · Youtube · 174 HN points · 0 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention csvconf's video "How a File Format Led to a Crossword Scandal - Saul Pwanson".
Youtube Summary
In 2016 I designed a plain-text file format for crossword puzzle data, and then spent a couple of months building a micro-data-pipeline, scraping tens of thousands of crosswords from various sources. Then, having all those crosswords in a simple format, I wanted to see if there were any common grid patterns--and discovered egregious plagiarism by a major crossword editor that had gone on for years. This talk would cover the file format, data pipeline, and the design choices that aided rapid exploration; the evidence for the scandal, from the initial anomalies to the final damning visualization; and what it's like for a data project to get 15 minutes of fame.

Slides from Saul Pwanson's Presentation (https://doi.org/10.5281/zenodo.2836892#.XNyBaUXUB1Y)


https://csvconf.com/

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
Jun 13, 2020 · 171 points, 39 comments · submitted by luu
thaumasiotes
> Despite Parker’s denial, many in the crossword world see willful plagiarism in Parker’s puzzles, and they see the database that revealed the repetition as a tool of justice. “It’s like a murder mystery solved 50 years later with DNA evidence,” Matt Gaffney, a professional crossword constructor, told me.

There's a response to postmodernism that says "reality is that which, when you ignore it, doesn't go away".

I have a hard time seeing this as a "scandal"; it's firmly in the class of things that aren't a problem unless someone tells you you have a problem. A murder has victims. But if you're unhappy that the crossword puzzle you solved last week was secretly a rerun from 10 years ago, it's not obvious whether the blame for your unhappiness should go to the guy who reran the puzzle, or the guy who told you it was a rerun.

stordoff
I wouldn't really see the players as victims, but A) crossword constructors are potentially having their work ripped off and/or receiving less work and B) Uclick/USA Today are paying someone to do something when they could have just rerun old puzzles and got a similar result. A comparison to a murder investigation is maybe a bit much, but I can see where people are coming from.
fsckboy
it should not be summed up as a comparison to a murder investigation, but rather as a comparison to DNA evidence.
VMG
> "reality is that which, when you ignore it, doesn't go away".

"Reality is that which, when you stop believing in it, doesn't go away." (Philip K. Dick, I Hope I Shall Arrive Soon)

labster
If you’re unhappy after someone told you you worked a rerun crossword puzzle, maybe blame yourself? The only thing changing is your interpretation of your own experiences.
thaumasiotes
I was confused by this response, because it appears to just repeat the content of my original comment.

Now I'm more confused that my comment was upvoted and this was downvoted.

labster
I'm guessing it's because you said it in an indirect way, and I said it directly. And people don't like being told that their gut feelings and outrage are only in their own head. I'm never really sure which is the right way to approach people -- the indirect approach goes over some people's heads sometimes (like me, a little this time) but the direct approach often gets outright rejected from confirmation bias. Teaching is hard, man.
mkl
I don't think it's the rerunning that's the problem, but the misattribution, claiming others' work as their own or denying them credit (and presumably royalties).
thaumasiotes
If the originality of the crossword has no value, why would it matter whether someone's claim that it is original is true or false? The most logical basis for attributing value to the claim of originality is that there is value in the originality that bleeds through to the claim.

Compare e.g. someone being jailed for resisting arrest when there was no justification for arresting him in the first place.

ericsoderstrom
Sorry... what? Why would you say the originality of the crossword has no value? And what on earth does that have to do with resisting arrest?
TheRealPomax
The law tends to disagree: crossword puzzles are copyrightable material just like any other published text is, so their value comes from the material that they help sell, whether that's a newspaper, or a crossword puzzle book, or a website, or any other published, in the legal sense, work.
thaumasiotes
But misattribution is not a problem at all in that analysis. It's just as illegal to violate a copyright with proper attribution as it is if I claim the work is my own.

The law doesn't care whether you claim a copyrighted work is yours or not. It cares whether, if you copy a copyrighted work, you have the license to do so.

tzs
Here's the FiveThirtyEight article about this mentioned a few times in the video [1].

[1] https://fivethirtyeight.com/features/a-plagiarism-scandal-is...

ireflect
There's a footnote about Saul's interesting name, which leads to:

  Pawanson is a bit quirky — his unusual
  last name is the product of a decision
  he made years ago.
  
  "I was born Paul Swanson," he said.
  "But I thought, 'there are lots of Paul
  Swansons out there.' So I changed it."
Amazing!
mdonahoe
Nice! I too was intrigued by his unusual name, and went on a quest to see if he had it changed from the more common "Paul Swanson".

It's a very interesting choice to just swap the letters like that instead of going for a completely different name.

It would be funny if his name gets included in a crossword puzzle, and people second guess the spelling.

wolfhumble
I don't know anything about him or his decision to change from Paul to Saul, but Paul/Saul is one of the most important Christian apostles. As both Jew and Roman citizen, his Jewish name was Saul (from the Jewish king Saul in the Old Testament maybe?) and his Roman name was Paul. So just changing the first letter might not be completely random. :-)

https://en.wikipedia.org/wiki/Paul_the_Apostle#Names

xorcist
The question that directly pops up is why not Pwanson?
gruturo
Actually it is Pwanson, indeed. A couple slides in the video confirm it.

Pawanson is a typo.

Erwin
Saul Pawnson would be a cute hacker alias, however.
servercobra
As someone with a name completely unique in the history of the world (so far as I have found), there are certain advantages! I wouldn't be surprised if people do this more often in the future. It is pretty nice that if you Google my name, you only get results about me.
matt-attack
Interesting. I relish the fact that when you google my name you get pages and pages of a semi-famous figure that, honestly, most people haven't heard of.

I cherish the anonymity.

busyant
I have a teacher friend named Mike Pence.

He tells me that it's impossible for students to snoop on him because he is "google-proofed."

abalaji
This is an awesome story; I especially like the speaker's organization of the narrative, taking us along for the ride. Maybe this will be the push I need to do a better job learning Unix tools.
colmvp
The delivery was very engaging and a good example to other engineers on how crafting a compelling narrative can help drive home the importance of your work.
smitty1e
> "It's not hoarding if it's organized."

Oh, that's getting thugged.

TheRealPomax
If it's organized, now it's archiving.
Erwin
The author's biography is quite fascinating. If there's a museum of visualization, it'd be an exhibit: https://www.saul.pw/biograph/
paxys
It's interesting that while sophisticated plagiarism detecting software is commonplace for writing submissions at newspapers, book publishers, universities etc., they don't bother doing the same with crosswords (and probably other puzzles like Sudoku).
xmprt
I didn't realize unix tools were this powerful. That's an amazing story.
fiddlerwoaroof
Yeah, Unix utilities and the whole text processing paradigm can do some amazing things if you know how to design for it. I’ve been doing some CloudFormation work recently, and it’s so easy just to throw together dashboards to watch the progress and outcome of a deploy.
smabie
I think the point is that they're not, usually, this powerful. Saul made a deliberate choice to create a file format that would be extremely amenable to these tools.
ericsoderstrom
Were the misattributed authors aware that they were being given credit for puzzles they didn't write? I'm assuming they must have been.
rabidrat
The misattributed authors are fake names, admitted by Timothy Parker himself. "Tim Burr" is one mentioned in the talk.
rafaelturk
Data is beautiful
rabidrat
Hi, this is Saul. If you like this kind of simplicity-first data-exploration approach, you might want to check out another project of mine, VisiData (visidata.org). It's specifically for lightweight data exploration and analysis and it runs directly in the terminal.
manjalyc
Hey, just wanted to say that this looks like a really cool and useful project. I work with a few medical databases and sometimes I just need a very quick breakdown of specific data and while I usually just write a short script, the utility and portability of this code looks fantastic to me. Which brings me to a question, how well does this program handle moderately large databases (~100GB-1TB) in your (or anyone else's) experience? In other words does it try to load everything into memory, or does it query as needed when given a database?
rabidrat
It loads everything into memory, so it works well with <1GB datasets. The architecture could be changed to allow for larger datasets like yours, but that would likely be a large undertaking and would probably be a paid feature.
rolandog
Hey Saul, your talk was great and engaging! Great work!
matthuggins
Good job selling yourself this time!
May 22, 2019 · 3 points, 0 comments · submitted by Tinyyy
HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.