There's a discouragement that comes in the RE community that to be useful at all y...

boppo1 · on March 29, 2023

> I could probably teach a solid generalist who cared to get to the level of being able to disassemble something and say, "yeah, this is dodgy" or not in an afternoon.

Please write and post this guide.

myself248 · on March 29, 2023

Seriously. There's lots of "you're already a rocket scientist so let's talk details" content out there, but very little with this incredibly-useful-sounding aim.

The trick is calibrating what a "solid generalist" means. I think I'd describe myself that way, but perhaps not among the HN crowd. Would be very interested in being a soundboard for such content, if that's helpful.

motohagiography · on March 29, 2023

Thanks! It would take longer to write, but a basic entry point starts with what you want to know about something: who, what, when, where, why, and how. Trouble is, we tend to start with a complete picture of How without a sense of the rest to guide us.

What you want to know about a strange binary (barring obfuscation, sandbox escapes, and other nasties) is: who does it talk to (IP addresses, hostnames, sockets, etc.), what does it open (files, registry entries, APIs, services), when does it do these things (e.g. runtime conditions, magic packets, port knocks, triggers, checking for other software), where does it write or read data (directories, file handles, remote sites, etc.), and why does it do this given its stated purpose (why does it have an encrypted section and where is its key, is it using weird encoding to bypass filters, etc.). Finally, How it does all these things falls out of answering those other questions.
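As a rough illustration of the who/what/where questions, here is a minimal Python sketch that sorts strings already pulled out of a binary into buckets. The bucket names and regular expressions are my own assumptions, not anything from a standard tool, and they are deliberately loose: expect false positives (version numbers that look like IP addresses, for example).

```python
import re

# Hypothetical buckets keyed to the questions above. The patterns are loose
# heuristics and will produce false positives.
PATTERNS = {
    "who_ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "who_url": re.compile(r"\b(?:https?|ftp)://[^\s\"'<>]+", re.I),
    "what_registry": re.compile(r"\bHK(?:LM|CU|CR|EY_[A-Z_]+)\\[^\s\"']+"),
    "where_path": re.compile(r"(?:[A-Za-z]:\\|/(?:etc|tmp|var|usr|home)/)[^\s\"']+"),
}


def triage(strings):
    """Group indicator-like substrings by the question they help answer."""
    buckets = {name: set() for name in PATTERNS}
    for s in strings:
        for name, pattern in PATTERNS.items():
            buckets[name].update(pattern.findall(s))
    # Sort each bucket so the output is stable between runs.
    return {name: sorted(found) for name, found in buckets.items()}


if __name__ == "__main__":
    sample = [
        "connect to 10.0.0.5:443",
        "http://example.com/x.php",
        r"HKLM\Software\Run",
        "/tmp/.cache",
    ]
    for bucket, hits in triage(sample).items():
        print(bucket, hits)
```

Anything that lands in a bucket is only a lead to follow up on, not a verdict.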

I think the hardest part of analysis is having an organized way of knowing what you are looking for, because we don't know the right questions to ask and we tend to work at the edge of limited knowledge. Should this rando binary be talking to some app hosting site, and why? Why would a developer encode endpoint names in a lookup table that only constructs and returns them at runtime? Why would someone use any of these libraries or data formats on purpose? The harder it is to answer these questions, the more suspicious I get.

If you start with the 5 W's, the How falls out of that a lot faster. If you can answer these questions about a binary, you're easily 50% there in determining whether it behaves as expected. Having an organized goal can take you from zero to basically useful. The rest is just screenshots of menu items in Ghidra and maybe CyberChef for purely static extraction.

I feel like I should pile on caveats here about how most malware isn't obfuscated or using novel techniques, a lot of it is just spyware capabilities you clicked through to accept, or a repackaged legit binary with some downloaded RAT attached and some nested compressed libraries. I'm sure someone who is more serious about this will say, "that's misleadingly simple!" but once you have a why and a what, the how is a work problem.
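One cheap way to spot the kind of compressed or encrypted blobs mentioned above is byte entropy: packed or encrypted data tends to look close to random, while code and plain text do not. The sketch below is an assumption-laden illustration, not a detection method; the 256-byte window and the 7.0 bits-per-byte threshold are arbitrary values I picked for the example.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def high_entropy_windows(data: bytes, window: int = 256, threshold: float = 7.0):
    """List (offset, entropy) for fixed-size windows at or above the threshold.

    Window size and threshold are illustrative assumptions; tune them per file.
    """
    hits = []
    for offset in range(0, len(data) - window + 1, window):
        h = shannon_entropy(data[offset:offset + window])
        if h >= threshold:
            hits.append((offset, round(h, 2)))
    return hits
```

A high-entropy region is not suspicious by itself (ordinary installers ship compressed resources), but it tells you where the "why is this here?" question applies.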

Dynamic debugging and stepping through is the next stage. It's also basic, but when you are goal-oriented rather than trying to reproduce every usable code path, it's more achievable. If you get the IP addresses out of a random binary, what protocols it's talking, and maybe what files it accesses, it means you've set up your analysis environment and done the initial checks, and that's valuable grunt work you can pass on to someone with deeper skills.

If you can go from zero to this, that's an afternoon well spent, imo. It's not trivial, as it assumes a lot of knowledge about system architecture and network protocols, but the questions above necessarily have answers, so I can guarantee you can find them with some directed effort. I don't mean to trivialize more advanced analysis, this isn't the same thing, but as an entry point, this is how I would recommend approaching it.

myself248 · on March 29, 2023

That's an incredibly useful model for how to approach the problem! And it sounds like exactly the questions I find myself asking about random suspected-malware, which is often precisely your original example -- a burned CD included with some aliexpress hardware.

I'm familiar with 'strings' and I've been playing with 'binwalk' to take apart files, but I'm out of my depth when it comes to loading something up in a debugger or whatever (is ghidra a debugger or what's the difference?) and looking at code. I don't speak C, and everything seems to look like C when it's shown in the examples of these things. How do I know if I'm looking at a sensible decompilation with actual runnable code or just gibberish because I'm trying to interpret a jpeg as an executable?
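On the "am I interpreting a jpeg as an executable" worry: most file formats start with a fixed signature ("magic bytes"), and checking it first is what the Unix `file` command does in far more detail. A minimal Python sketch follows; the table covers only a handful of formats and the labels are my own wording.

```python
# A small, incomplete table of well-known file signatures.
MAGIC = [
    (b"\x7fELF", "ELF executable or shared object"),
    (b"MZ", "DOS/Windows PE executable"),
    (b"\xcf\xfa\xed\xfe", "Mach-O 64-bit executable"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"PK\x03\x04", "ZIP archive (also APK, JAR, DOCX)"),
]


def identify(data: bytes) -> str:
    """Guess a file type from its leading bytes; 'unknown' if nothing matches."""
    for magic, label in MAGIC:
        if data.startswith(magic):
            return label
    return "unknown"


def identify_file(path: str) -> str:
    """Read just enough of the file to compare against the table."""
    with open(path, "rb") as fh:
        return identify(fh.read(16))
```

If the signature says JPEG, a disassembly of that file is gibberish no matter how plausible individual instructions look.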

I don't know if that makes me teachable or beyond help, but I'd be an eager student.

doktrin · on March 29, 2023

> I'm out of my depth when it comes to loading something up in a debugger or whatever (is ghidra a debugger or what's the difference?)

When you hear "debugger", think "breakpoints". It's any tool that lets you do things like set breakpoints and step through code execution.

Most debuggers will let you view machine code or bytecode respectively, but they won't decompile binaries or bytecode into the original higher level language.

Ghidra does include a basic debugger, but it can also do lots of other stuff (including decompilation).

> I don't know if that makes me teachable or beyond help, but I'd be an eager student.

It would probably help to get some baseline familiarity with systems programming. Check out the "15-213" CS course. The lectures are on YouTube, the reference book is probably online, and the labs are here:

https://www.cs.cmu.edu/~213/labs.html

pmoriarty · on March 29, 2023

"I don't speak C, and everything seems to look like C when it's shown in the examples of these things."

If you know how to program you could probably already make sense of a lot of C, and for the rest you could try asking an AI to explain it to you.

andai · on March 29, 2023

And if you learn a bit of assembly first, C will seem like a high level language again!

extrememacaroni · on March 29, 2023

When it comes to stuff that results in calls to dynamically linked libs e.g. OpenFile or whatever, you can also use Frida to intercept the calls and print out info about them/manipulate the inputs and/or return values. The advantage of Frida is that it uses JS to do this.

You need to run the executable to do this tho so maybe use a VM.

I used frida a few times to do random stuff like making foobar2000 always play the same mp3 regardless of what is in the playlist, and made a game's speed adjustable by intercepting calls to system gettime and changing the value.

Use Ghidra to check what the executable imports and intercept found functions in Frida.

DyslexicAtheist · on March 29, 2023

> I'm sure someone who is more serious about this will say, "that's misleadingly simple!" but once you have a why and a what, the how is a work problem.

Loved your post. I'm by far a lot less experienced, for sure. There is one thing in this sentence that stood out, because my order of what to address first is always what and how.

Only at the end might the why (e.g. motive) become visible, or not. It has saved me from jumping to premature conclusions (or attribution) in the past ...

Most films in the "who-dunnit" genre are based on making you believe the why is the ultimate goal. For me, though, the why is a by-product of addressing the what/how, and I find things remain smoother, with fewer rabbit holes to get lost in.

andai · on March 29, 2023

>most malware isn't obfuscated or using novel techniques, a lot of it is just spyware capabilities you clicked through to accept

How does an antivirus tell the difference between e.g. TeamViewer and a repackaged app with a RAT?

hegzploit · on March 29, 2023

Here's one way I love to think about it: a RAT will go all the way to try to persist, hide from AV, and load other components from some remote endpoint. It will trigger so many events that can be detected by an AV. On the other hand, TeamViewer will not try to hide what it's doing. There's also a lot more at play here, since this is just heuristic analysis; AVs tend to be more complex and incorporate more methods of analysis, like signature-based detection, integrity checking, etc.

j-bos · on March 29, 2023

I humbly second this ask.

cancerhacker · on March 29, 2023

You can get a long way just by running /usr/bin/strings against an executable, and maybe a platform specific version of otool -L. You should have a basic idea of how your OS does linking, shared libraries, etc.
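If you don't have `strings` handy, the core idea is small enough to sketch in Python: scan for runs of printable ASCII at least a few characters long. This is an approximation of the idea, not a reimplementation of the real tool (it ignores wide/UTF-16 strings, for instance), and the function name is mine.

```python
import re


def extract_strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII (0x20-0x7e) at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


def strings_in_file(path: str, min_len: int = 4):
    """Convenience wrapper that reads the whole file into memory first."""
    with open(path, "rb") as fh:
        return extract_strings(fh.read(), min_len)
```

Imported function names, URLs, and error messages usually show up in this output, which is why it gets you so far.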

youngtaff · on March 29, 2023

Yes please do

Tried Ghidra for the first time on the weekend to look at some 8051 firmware

Got stuck with disassembly as it seems to be misinterpreting some data sections as code - can see English strings in a hex editor but Ghidra is trying to convert them to asm

HelloNurse · on March 29, 2023

You are supposed to annotate what every part of the file is and how you want to display it. It's usually easy to distinguish reasonable assembler code from nonsense instructions interspersed with undecodable islands.

Disassembling all sections just in case they contain code is a common conservative policy for disassemblers: even without malicious payload-hiding tricks, sections that look like they are never executed could still contain embedded executable code.

youngtaff · on March 31, 2023

Thanks, I'll try that approach

It's been a while since I've looked at asm in anger so it's taking me a while to get back into it (plus this is a side project ATM)

neoncontrails · on March 29, 2023

Is cantor dust available as a plugin now? I remember watching the creator's tech talk as a young dev and being incredibly inspired by it. But I've looked it up a few times over the years, didn't find any evidence that the tool described was ever released.

motohagiography · on March 29, 2023

https://github.com/Battelle/cantordust

0d0a · on March 29, 2023

> There's a discouragement that comes in the RE community that to be useful at all you need to be able to write your own exotic packer decoders

Unless you are talking about obfuscated / virtualized payloads, isn't it common to just "cheat" by running it in an emulator / debugger, then taking the unpacked code section from memory and working from there? It was the approach I took in a CTF task: https://nevesnunes.github.io/blog/2021/10/03/CTF-Writeup-TSG...

motohagiography · on March 29, 2023

Non-Ghidra example, but just the other week I was pulling apart a commercial phishing kit that had implemented its own version of AES in JavaScript, and then created a kind of conceptual virtual file system based on nested layers of b64 and a "custom" rot-20k encoding that turned everything into Unicode, where one blob was the image with offsets, and then different parts of the malware would be pulled out, decoded, and decrypted at runtime, rendering the static analysis that AV and WAF tools do useless.

I used a REPL to manually do the steps you describe dynamically, but doing it statically means writing a decoder. You really need a proper sandbox to do dynamic analysis because you don't know what's going to actually detonate, whereas static analysis gives you a whiff of how off it seems, and that's sufficient for most security and privacy purposes. It was also common in Android apps several years ago now, not sure what the current state of the art is though. Android isn't my problem anymore.
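To give a flavour of what "writing a decoder" means for just the nested base64 part, here is a minimal Python sketch that keeps decoding until the data stops being valid base64. It is not the kit's actual scheme: the rot-20k step and the custom AES are not reproduced, and the function name and layer limit are assumptions for illustration.

```python
import base64
import binascii


def peel_base64(blob: bytes, max_layers: int = 10):
    """Strip nested base64 layers; return (innermost_bytes, layers_removed).

    Caveat: short plaintext that happens to be valid base64 (e.g. b"test")
    would be decoded one layer too far, so inspect each layer by hand on
    real samples.
    """
    layers = 0
    while layers < max_layers:
        try:
            decoded = base64.b64decode(blob, validate=True)
        except (binascii.Error, ValueError):
            break  # not valid base64 any more: this is the innermost layer
        if not decoded:
            break  # decoding produced nothing; keep the previous layer
        blob = decoded
        layers += 1
    return blob, layers


if __name__ == "__main__":
    wrapped = b"alert('hi')"
    for _ in range(3):
        wrapped = base64.b64encode(wrapped)
    print(peel_base64(wrapped))
```

Unlike running the sample, this never executes any of the decoded content, which is the appeal of the static route.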

Officially, I suck at this and I defer to more skilled people because I am a much better writer than hacker, but when they aren't around, you go to war with the army you have. :)

amatecha · on March 29, 2023

What do you mean by "installer driver package", like literally the setup.exe that the vendor provides? Or like, extract the resources out of that and open _those_ in Ghidra?

philsnow · on March 29, 2023

I've seen some sketchy crap while pulling apart mac .pkg files to see their preinstall/postinstall scripts. In particular one video conferencing company's installer did some "growth hacky" things a few years ago (I checked a recent one just now and it seems benign).

amatecha · on March 29, 2023

Ah yeah I remember those particular installer shenanigans for sure. Indeed, installers are often granted elevated permissions which is a perfect opportunity to drop in "extra" functionality :-O

zeeshanmh215 · on March 29, 2023

What you do is very interesting and might be helpful for budding REs and also the privacy-focused general public. Can you point me in a direction where I can learn that stuff?

dataflow · on March 29, 2023

How do you tell something is dodgy from the call graph? Don't you have to decipher what FUN_918243 represents, or whatever?

saagarjha · on March 29, 2023

Generally you can click around some code and see what functions it calls, what strings it references, etc. to get a basic understanding of what it does.

amrb · on March 29, 2023

You can see imported libraries, strings, and possible network calls. Working backwards, you're going to see red flags if it's a basic app.

intelVISA · on March 29, 2023

Easy: all nonfree software you have to decompile to view the 'source' is dodgy by design.

Tools like Ghidra et al. merely lay bare the truth you already know.

biggieshellz · on March 29, 2023

What if someone gives you a binary that they claim is built from a particular source code? If you don't decompile it, how do you know if that's true or not? Or what if you can't trust your compiler (a la https://www.win.tue.nl/~aeb/linux/hh/thompson/trust.html)?

JCWasmx86 · on March 29, 2023

Reproducible builds. Sure, not every project can be built in a reproducible manner, but it at least reduces the chances of getting shady binaries.
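The check itself is simple once a build is reproducible: build from the claimed source yourself and compare hashes with the binary you were given. A minimal Python sketch, with hypothetical function and parameter names:

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_build(vendor_binary: str, local_build: str) -> bool:
    """True only if the two files are byte-for-byte identical."""
    return sha256_of(vendor_binary) == sha256_of(local_build)
```

A mismatch does not prove tampering, since timestamps, build paths, and toolchain versions all leak into binaries unless the project has worked to remove them. A match is strong evidence the binary came from that source.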

pxc · on March 29, 2023

Check out `guix challenge` for a concrete example of how tidily this can be done with a system that supports reproducible builds well!

https://guix.gnu.org/manual/en/html_node/Invoking-guix-chall...

intelVISA · on March 29, 2023

I could never trust a bin I didn't build myself (with my own C compiler ofc).

arjvik · on March 29, 2023

Did you build that C compiler yourself? Using what compiler? Unless you bootstrapped it from a handwritten assembler, you'll need to consider the attack outlined in Reflections on Trusting Trust

intelVISA · on March 29, 2023

I did, but I foolishly relied on GCC before it was self-hosted. Now I guess I should scrap the whole thing and build by hand.

pxc · on March 29, 2023

There's actually someone out there who has done some impressive work on this, believe it or not!

https://savannah.nongnu.org/projects/stage0/

saagarjha · on March 29, 2023

[flagged]

dang · on March 29, 2023

Please don't do this here.

saagarjha · on March 29, 2023

What would you suggest is an appropriate response to that comment? I could of course ignore or downvote it, but I'm not sure this actually conveys my sentiment towards it.

dang · on March 29, 2023

You can't just express unprocessed annoyance. You have to let the annoyance metabolize inside yourself until one of two things happens: either (1) you have something genuinely interesting to contribute; or (2) the need to respond goes away.

saagarjha · on March 31, 2023

On the contrary, I have spent quite a while considering how to respond to comments like these and this was the best I could come up with. I'm open to suggestions on what I might do instead but I will point out that the current options you've put forward either 1. make it very asymmetric to respond to stupid comments or 2. allow them to proliferate, which drives away and buries interesting conversation.

dang · on March 31, 2023

Responding to "stupid" comments fuels them, so it's better to post nothing. It's certainly much better to post nothing than to post something like https://news.ycombinator.com/item?id=35351862.

Re "allowing them to proliferate", the solution for that is flagging. (In case anyone doesn't know: to flag a comment, click on its timestamp to go to its page, then click the 'flag' link at the top. There's a small karma threshold before flag links appear.) And downvoting, of course (the karma threshold for that is higher). If the comment was bad enough to respond the way you did, why didn't you downvote and/or flag it?

saagarjha · on April 3, 2023

I don't flag comments very often unless they egregiously break the rules, but I've downvoted things like this before. Usually what happens is the person gets upset that they've been penalized by the system if the downvotes stick, or someone comes along and actually thinks the comment is good, and the effects of downranking it are reversed. I don't want to propose that my opinions on comments should overrule everyone else's, but I know you agree that just because people upvote vapid or clichéd content doesn't make it appropriate for Hacker News. So, I haven't really found your solution to work in practice.

What I could absolutely do is sit down and write a long reply about why I think the comment missed the mark, and how it could improve. I have done this in the past too. The problem here is that doing so is a lot of work. My thought process is that most of the people who are posting like this know that they're just posting low quality stuff, and if there's an easy reminder to stop doing that, they will. If they happen to reply with "no, I'm serious, here's why…" then there's no harm done; otherwise it signals (to other people too) that we're looking for something better than that.

dang · on April 3, 2023

I understand that writing a substantive comment can be a lot of work, but how can we be arguing about https://news.ycombinator.com/item?id=35351862? It was obviously unsubstantive and provocative, and would even probably land as a personal attack. It was even a case of the "vapid and cliché" that you're wanting to counteract.

Downvoting a bad comment is fine. Not posting is fine. If you want to neutrally let someone know that their comment isn't substantive enough for a good HN post, that's ok as long as you're careful how you do it. Dropping an insult on them is never helpful.

saagarjha · on April 6, 2023

I think the next steps here are that I refrain from doing this, continue downvoting, and reach out when I feel it isn't working and have more concrete feedback to provide.

graderjs · on March 29, 2023

Can you make a youtube video tutorial series on this? Would be great!

flangola7 · on March 29, 2023

How do you feel about models like GPT-4 using tools like that to RE?

amrb · on March 29, 2023

It's like a summary; it speeds up the process by giving you a possible context.