HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
Cicoparser: Full game reverse engineering

Gabriel Archandel · Youtube · 52 HN points · 0 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention Gabriel Archandel's video "Cicoparser: Full game reverse engineering".
Youtube Summary
Conversion of game into C++ with cicoparser and IDA disassembler. It took just 25 minutes to get a fully working application from an assembly file. All kinds of problems are covered:
- indirect calls
- cross-function jumps
- infinite loops
- sp-analysis failures
- MS-DOS API interrupt calls

Tools used and full explanation: https://x.valky.eu/cico
Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
Apr 20, 2021 · 52 points, 15 comments · submitted by gabonator
Cloudef
There's a similar set of tools by notaz[1] that was used to statically recompile StarCraft, Diablo, Diablo 2, and Jazz Jackrabbit to ARM Linux. You can read more about the recompilation here[2].

1: https://github.com/notaz/ia32rtools
2: https://www.giantpockets.com/starcraft-pandora-port-came/

gitowiec
This was great to watch. I wish it could work with Linux. First game I would like to recompile is Dune.
wts42
Excellent pick. Mentat approved.
gabonator
It should work with Linux without any problem. cicoparser was initially developed on Windows and does not use any libraries besides std, so you can build it using the gcc compiler. The host application is based on SDL2, so it should work really anywhere without any extra work.
madmoose
We've started reverse engineering Dune a bit for a ScummVM engine and I think it would pose a severe challenge to cicoparser.

All the areas where you currently need to manually change cicoparser, Dune does everywhere. It jumps based on flags, makes indirect jumps and calls (Dune dynamically loads in video, pcm audio and midi drivers), it jumps wildly around, into, and out of the middle of other procedures, and so on. I'd love to see you try, though.

Secondly, I don't really see a great advantage of cicoparser over emulating the CPU. You've converted the disassembled code to assembler-in-C, but you haven't reverse engineered the application. If I've successfully converted an application with cicoparser, yes, I can run it, but I haven't learned much about how the original application works. It can be a good starting point for reverse engineering though.

We had an engine in ScummVM that was created with a similar process. The application was organized well enough (with clear function boundaries, etc.) that the engine could gradually be transitioned to proper C.

tibbydudeza
It gave me flashbacks to using DOS with Norton Commander :).
tralarpa
I got very excited when I saw the description of the video "Conversion of game into C++ with cicoparser and IDA disassembler". I thought "neat, a new decompiler".

But then I understood what CicoParser is doing: it translates machine instructions into C statements, i.e. when your binary contains an instruction like "mov 123,sp", the output will be a C source file with a statement "memory16(_ds,123)=_sp;". On the GitHub page, they say it is not a CPU emulator, but I would rather say it is a CPU emulator with AOT compilation of the binary.
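To make the kind of output described above concrete, here is a minimal sketch of a segment:offset store in C++. It is not cicoparser's actual runtime: `Machine`, `linearAddress`, `store16`, `load16` and `translatedStoreSp` are hypothetical names chosen for illustration, and the 1 MiB buffer with an assert-based range check is an assumption about how such a host could be set up.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical machine state: a flat 1 MiB buffer plus two registers.
struct Machine {
    std::vector<uint8_t> memory = std::vector<uint8_t>(1u << 20);
    uint16_t ds = 0;
    uint16_t sp = 0;
};

// Real-mode address translation: segment * 16 + offset.
inline uint32_t linearAddress(uint16_t segment, uint16_t offset) {
    return (static_cast<uint32_t>(segment) << 4) + offset;
}

// Store a 16-bit value in little-endian order, with a range check.
inline void store16(Machine& m, uint16_t segment, uint16_t offset, uint16_t value) {
    const uint32_t addr = linearAddress(segment, offset);
    assert(addr + 1 < m.memory.size());
    m.memory[addr] = static_cast<uint8_t>(value & 0xFF);
    m.memory[addr + 1] = static_cast<uint8_t>(value >> 8);
}

// Load a 16-bit little-endian value, with the same range check.
inline uint16_t load16(const Machine& m, uint16_t segment, uint16_t offset) {
    const uint32_t addr = linearAddress(segment, offset);
    assert(addr + 1 < m.memory.size());
    return static_cast<uint16_t>(m.memory[addr] | (m.memory[addr + 1] << 8));
}

// What a translated "store SP at DS:123" could look like: one plain call,
// with no instruction decoding left at run time.
inline void translatedStoreSp(Machine& m) {
    store16(m, m.ds, 123, m.sp);
}
```

With `ds = 0x1000` and `sp = 0xFFFE`, calling `translatedStoreSp` writes the two bytes at linear address `0x1000 * 16 + 123` and the one after it, and `load16` on the same segment and offset reads the value back.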

albertzeyer
But what exactly is the difference to a decompiler then?
tralarpa
Probably depends on your definition of a decompiler. For me, a decompiler reverses to some extent the operation of a compiler. Variables instead of registers, function call arguments instead of stack pushes, etc.

Of course, you could also say that a decompiler is any tool that produces something from a binary that you can compile again. But in that case, I could claim that this here is also a decompiled program:

   byte[] programbinary={ put binary of the program here };
   runEmulator(programbinary);
teawrecks
A compiler has optimization steps. Rather than going straight from human readable C to binary, it compiles to an IR and then uses some heuristics to create binary that is more efficient for the machine to execute.

I feel like you're effectively asking for an optimization step. Decompile to an IR, and then use some heuristics to get back to C that is more efficient for humans to read.

And if a compiler without an optimizer is still a compiler, then a decompiler without an "optimizer" should still be called a decompiler.

CodeArtisan
A decompiler translates bytecode into a structured program.

https://en.wikipedia.org/wiki/Structured_programming

gabonator
If it were a CPU emulator, it would update all the flags every time it performs an ALU operation (I have seen this approach in one source-to-source compiler). Actually, there is not much you can do: if the instruction stores SP into DS:123, it converts the instruction into a simple assignment, *((WORD*)&memory[ds*16+123]) = sp. All the ALU operations are calculated directly using the target instruction set, and the flags register is updated only when necessary. The memory is not emulated either; the code accesses the memory buffer directly (in the video there are just extra range checks, and even the *16 operation can be optimized away by replacing ds/es with memory pointers). The only thing that is emulated is the EGA adapter.
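The point about flags can be illustrated with a small C++ sketch that contrasts the two styles. It is an illustration of the general idea only, not code generated by cicoparser: `Flags`, `subWithFlags` and `sumDown` are hypothetical names, and only three of the x86 flags are modelled.

```cpp
#include <cstdint>

// Hypothetical subset of the x86 flags register.
struct Flags {
    bool carry = false;
    bool zero = false;
    bool sign = false;
};

// Emulator style: every 16-bit SUB recomputes the flags,
// whether or not a later instruction ever reads them.
inline uint16_t subWithFlags(uint16_t a, uint16_t b, Flags& f) {
    const uint16_t result = static_cast<uint16_t>(a - b);
    f.carry = a < b;                    // borrow out of bit 15
    f.zero = result == 0;
    f.sign = (result & 0x8000) != 0;
    return result;
}

// Translated style for a loop such as "add total, cx / dec cx / jnz loop":
// the arithmetic maps to native operators, and the only flag logic kept
// is the zero test that the conditional jump actually needs.
inline uint32_t sumDown(uint16_t cx) {
    uint32_t total = 0;
    do {
        total += cx;
        cx = static_cast<uint16_t>(cx - 1);   // dec cx
    } while (cx != 0);                        // jnz loop
    return total;
}
```

In the second function the host compiler sees ordinary C++ arithmetic and a plain comparison, so it is free to optimize the loop like any other native code.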
habibur
Right. From a bird's-eye view it might look more like assembly. But look closer and you see it summarizes a bunch of idiomatic assembly sequences into C code.

And it will improve over time if the developers continue to give it effort.

tralarpa
Thanks for the explanation. Very nice project. I guess self-modifying code does not work with this technique, does it? (I don't know much about DOS games and how common self-modifying code was on PCs).

Concerning access to video memory: I saw that you treat them "manually" in some cases. I am wondering whether you could avoid that by using virtual memory. You could mark the pages as invalid and when an instruction tries to access them, you catch it and replace the memory[...] access instruction by a call to memoryVideoGet. The JIT-Compiler of the Amiga emulator uses a similar technique for indirect accesses to hardware registers.

gabonator
Good point. In the set of games I was porting (10 in total, release dates up to 1991), I found only one that used this nasty technique. And it rewrote only a single byte of code (something like rewriting a nop instruction into a return). So a very simple case so far. Of course, using cicoparser doesn't mean that you get working code without any manual work. You will always need to fix some issues by hand. Virtual memory does not solve anything in this case. Writing to EGA video RAM means that you want to display some pixels. But the write operation goes through some extra logic which decides what to do with the byte being written (extra rotation, masking...), and by reading the same address you are not guaranteed to get the same value back. The EGA control registers handle this process and you simply need to emulate this behaviour somehow.
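A rough C++ sketch of why a plain memory buffer is not enough for EGA video RAM follows. It is a deliberately simplified model, not the emulation used in the video: it assumes four bit planes, a map mask, a bit mask, a rotate count and latches, and it leaves out set/reset, logical operations and the other write modes. `EgaModel` and its member names are hypothetical.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Simplified, hypothetical model of EGA planar memory.
struct EgaModel {
    std::array<std::vector<uint8_t>, 4> planes;
    std::array<uint8_t, 4> latches{};  // refreshed by every read
    uint8_t mapMask = 0x0F;            // bit p set: writes reach plane p
    uint8_t bitMask = 0xFF;            // 1 = bit from CPU data, 0 = bit from latch
    uint8_t rotate = 0;                // rotate-right count, 0..7
    uint8_t readPlane = 0;             // plane returned by a read

    EgaModel() {
        for (auto& plane : planes) plane.resize(0x10000);  // 64 KiB per plane, zeroed
    }

    // A single CPU write may change up to four bytes, one per enabled plane.
    void write(uint16_t offset, uint8_t value) {
        const unsigned r = rotate & 7u;
        const uint8_t rotated =
            static_cast<uint8_t>((value >> r) | (value << ((8u - r) & 7u)));
        for (unsigned p = 0; p < 4; ++p) {
            if ((mapMask & (1u << p)) == 0) continue;
            planes[p][offset] = static_cast<uint8_t>(
                (rotated & bitMask) | (latches[p] & ~bitMask));
        }
    }

    // A read loads all four latches but returns only the selected plane.
    uint8_t read(uint16_t offset) {
        for (unsigned p = 0; p < 4; ++p) latches[p] = planes[p][offset];
        return planes[readPlane & 3u][offset];
    }
};
```

For example, with `mapMask = 0x05` and `bitMask = 0x0F`, writing `0xFF` to offset 0 and reading it back yields `0x0F` from plane 0 and `0x00` from plane 1, so the value read differs from the value written.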
HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.