Process Injection Part 2 | QueueUserAPC()

In the first process injection post, we talked about CreateRemoteThread(), the vanilla method of process injection that most threat actors use because of its simplicity and reliability. Today we’re going to look at QueueUserAPC(), which queues an asynchronous procedure call (APC) to a specific thread. This API has several benefits, the most appreciated of which is its ability to circumvent Sysmon.

This post will be broken down into four (4) parts:

  • Process Injection Primer – For the injection technique in question, we will review how this type of injection works programmatically.
  • Analyze High Level Windows API Calls – Use the MSDN Documented methods and functions.
    • API Call Analysis
    • Sysmon events and logging
  • Analyze Medium Level Windows Syscalls Using LoadLibrary – Use the NTAPI Undocumented functions via ntdll.dll
    • API Call Analysis
    • Sysmon events and logging
  • Analyze Low Level Windows Syscalls Using x86 Assembly – Custom via Rolling Our Own Syscalls 🔥
    • API Call Analysis
    • Sysmon events and logging

For the Sysmon analysis, I am using the same Sysmon config as in part 1 with one slight adjustment. Because QueueUserAPC() will bypass Sysmon detection, I have expanded the Process Accessed (Event ID 10) parameters based on ion-storm’s configuration. With this event, we can detail the specifics of the injection by correlation, without an event that reports the injection itself.

Before we go any deeper, let me define what I mean when I say High, Medium, and Low level APIs:

  • High-Level API – This is the MSDN (proper/safe) method of execution. In other words, if you were a developer and needed to use these functions legitimately, you would use the MSDN documentation to accomplish your goal. These functions are High-Level because they are translated by the OS to Medium/Low level functions and instructions. This is done for several reasons, the most important of which is ease of development.
  • Medium-Level API – The OS maps High-Level API calls to Lower-Level API calls (which in the context of this post we’re calling medium-level). For our purposes, a large majority of those calls live within ntdll.dll and kernel32.dll. We get the “Medium-Level” by cutting out the middle man (the OS) and calling the Low-Level API calls ourselves. The reason I call it Medium is that we are still using DLLs that live on the Windows OS, and when we map and call functions from those DLLs, the OS and other system monitoring software can still see those calls.
  • Low-Level API – The “Low-Level” is defined by rolling our own everything! We do not use ntdll.dll or kernel32.dll to accomplish our process injection (i.e., we do not map any functions). In this case, we have a custom x86 assembly file (custom.asm) and a corresponding header file (custom.h). These files map direct syscall functions in order to circumvent both the High and Medium level APIs (essentially). Regarding the process injection, we do not load external resources or rely on any OS translation; we do it ourselves.

All source code is written in C++ or x86 ASM. For continuity, I will be compiling all my builds for the x64 architecture.


Reference Material

  • NTAPI Undocumented Functions for all Low-Level API Calls.
    • NtSuspendThread
    • NtAllocateVirtualMemory
    • NtWriteVirtualMemory
    • NtQueueApcThread
    • NtResumeThread
  • Tools
    • Sysmon: For the Sysmon Analysis portion, I am using SwiftOnSecurity’s Sysmon Configuration for basic analysis. This is a great starter configuration that should be amended/changed to meet your organization’s threat model/needs. A good example of what can be done with Swift’s stock configuration is the one that ion-storm developed.
    • API Monitor: We use this tool to review the true API calls for all levels of APIs.
    • Process Hacker: A great tool for analyzing processes in general.
    • Procmon: Similar to Process Hacker but with some more advanced features such as determining API calls from user to kernel land.
    • SysWhispers: @Jackson_T’s python tool that generates x86 ASM that can be directly imported into your C++ Project.

Process Injection Primer

QueueUserAPC() queues an Asynchronous Procedure Call (APC) to a thread. Let’s break down what that means:

  • Asynchronous – not simultaneous or concurrent in time.
  • Procedure Call – The details of a specific, singular, procedure. In our example, it’s the shellcode that will execute notepad.exe.

This means a few things. First, a queued procedure does not run immediately: the target thread executes it only when it enters an alertable state, so we cannot control exactly when a thread that is already running will pick it up. Working with a suspended thread gives us a known point at which to hand it a procedure. Second, we need to open the thread (OpenThread()) and assign it a procedure. Last, we need to resume the thread to obtain code execution.

In the examples to follow, I spawn an nslookup.exe process in the C++ code and then inject the shellcode into it. However, an attacker may want to be more dynamic with their injection. There are really three ways to go about QueueUserAPC Injection:
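To make the "asynchronous" part concrete: per Microsoft's QueueUserAPC() documentation, a queued user-mode APC runs only once the target thread enters an alertable state. The block below is a toy, portable C++ model of that delivery rule. It is not Windows code and performs no injection; ModelThread and all of its methods are hypothetical names invented purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

// Toy model of one thread's user-mode APC queue. This is NOT the Windows
// implementation; every name here is hypothetical and exists only to
// illustrate when a queued procedure actually runs.
class ModelThread {
public:
    // Plays the role of QueueUserAPC(): the callback is stored, not executed.
    void queue_apc(std::function<void()> apc) {
        assert(apc && "an APC needs a callable target");
        pending_.push(std::move(apc));
    }

    // Ordinary (non-alertable) execution never drains the queue.
    void run_non_alertable_work() {}

    // Plays the role of an alertable wait: pending APCs are delivered in
    // first-in, first-out order. Returns the number of APCs that ran.
    std::size_t enter_alertable_state() {
        std::size_t delivered = 0;
        while (!pending_.empty()) {
            std::function<void()> apc = std::move(pending_.front());
            pending_.pop();
            apc();
            ++delivered;
        }
        return delivered;
    }

    // Number of procedures still waiting to be delivered.
    std::size_t pending() const { return pending_.size(); }

private:
    std::queue<std::function<void()>> pending_;
};
```

In this model, queuing a callback and then calling run_non_alertable_work() leaves pending() at 1; the callback runs only when enter_alertable_state() is called. That gap between queuing and delivery is what "asynchronous" means here.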

  1. Start a suspended process (CreateProcess()), inject into it, resume threads.
  2. Have a predefined process name (such as explorer.exe) that we know is going to be running on the system, enumerate processes for PIDs, enumerate the threads of the selected PID, suspend the threads, inject shellcode, resume threads.
    1. I have provided an example below within the Real World Scenario section.
  3. Have a list/array of predefined process names that the code will check for running instances, enumerate processes for PIDs, enumerate the threads of the selected PID, suspend the threads, inject shellcode, resume threads.

VirtualAllocEx() → WriteProcessMemory()

Just like CreateRemoteThread(), we allocate memory in the external process’s memory space and write our shellcode to that newly allocated space.

QueueUserAPC() Process Injection - writing memory

SuspendThread()

NOTE: In the examples to come, I create the process in a suspended state. However, if you have selected an already running process, you will have to enumerate the threads associated with that process yourself and suspend the ones you are going to inject into. I mention this because you will not see the OpenProcess() or SuspendThread() API calls in my code.

Once a process has been selected and you have the thread IDs, you can suspend all of them or a single thread with the SuspendThread() API call. In a debugger, I have created an nslookup.exe process in a suspended state. We can look at the process in Process Hacker to verify that our thread is in fact suspended.

QueueUserAPC() Process Injection - suspending threads

QueueUserAPC() → ResumeThread()

We assign the procedure call (execute the shellcode) to the suspended nslookup.exe thread via the QueueUserAPC() API. Next, we resume the thread to move the thread’s state from Wait:Suspended to Running. Once the thread starts, it will execute the procedure call (our shellcode) and terminate itself.


High Level API

API Call Analysis

To analyze API calls, I am using API Monitor with the following filters: [Data Access and Storage, NT Native, System Services].

Item                          Count
Number of API Calls           174
Total Amount of Memory Used   116 KB

QueueUserAPC() Process Injection - High Level API Analysis

We’ve mapped each API call during the High Level API execution flow. We can easily see which Low Level API function the OS translates each High Level API call into at runtime (e.g., VirtualAllocEx → NtAllocateVirtualMemory). This is an important point of interest, as it’s common practice for AV / EDR systems to hook these API calls before they are handed off to the Windows kernel as a syscall. Our goal is to go lower and therefore avoid such hooks.
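When reading API Monitor output, it can help to keep the high-level to native pairs side by side. The lookup below is a small sketch of that mapping. The post itself only spells out VirtualAllocEx → NtAllocateVirtualMemory; the remaining pairs are matched from the Nt* functions listed in the reference material, and native_counterpart() is a hypothetical helper name.

```cpp
#include <cassert>
#include <map>
#include <string>

// Returns the ntdll.dll function that a documented API call is translated
// into, or an empty string when the call is not part of this post's flow.
// Hypothetical helper: an aid for reading API traces, nothing more.
std::string native_counterpart(const std::string& high_level_api) {
    assert(!high_level_api.empty());
    static const std::map<std::string, std::string> kMapping = {
        {"VirtualAllocEx",     "NtAllocateVirtualMemory"},
        {"WriteProcessMemory", "NtWriteVirtualMemory"},
        {"SuspendThread",      "NtSuspendThread"},
        {"QueueUserAPC",       "NtQueueApcThread"},
        {"ResumeThread",       "NtResumeThread"},
    };
    std::map<std::string, std::string>::const_iterator it =
        kMapping.find(high_level_api);
    return it == kMapping.end() ? std::string() : it->second;
}
```

For example, native_counterpart("QueueUserAPC") yields "NtQueueApcThread", while a name outside the table yields an empty string.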


Sysmon Analysis

Sysmon is unable to detect process injection via QueueUserAPC(). This is, from my limited understanding, because we are not creating a new thread within the victim process. We are enumerating the threads the process has instantiated, opening a thread, suspending it, giving it a procedure call (our shellcode), and resuming it. We are simply accessing a process and telling it to execute some procedure, which is a bit less invasive.

QueueUserAPC() Process Injection - High Level Sysmon Analysis

Looking at the image above, we have keyed in on Sysmon Event ID 10: Process Accessed. During an injection into an existing process, we request access to that process via OpenProcess() or NtOpenProcess(). My example does not open a process this way, since I have used CreateProcess() for demonstration purposes. However, we can still see that Sysmon has detected that the program PI_QUA_High_Level.exe has accessed C:\Windows\System32\nslookup.exe. The interesting part is that Sysmon records a full Call Trace for the event. This could possibly be used for heuristic detection.
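Besides the Call Trace, Event ID 10 also carries a GrantedAccess field, a hexadecimal access mask. As a defensive sketch, the helpers below decode a few of the process access-right bits from the Windows SDK and flag masks that would permit writing into another process's memory (WriteProcessMemory() needs both PROCESS_VM_OPERATION and PROCESS_VM_WRITE). The constants are redefined locally so the sketch builds without Windows headers; the function names are hypothetical, and this is a triage aid under those assumptions rather than a complete detection rule.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Process access-right bits as defined in the Windows SDK, redefined here
// so the sketch compiles on any platform.
constexpr std::uint32_t kProcessCreateThread  = 0x0002;
constexpr std::uint32_t kProcessVmOperation   = 0x0008;
constexpr std::uint32_t kProcessVmRead        = 0x0010;
constexpr std::uint32_t kProcessVmWrite       = 0x0020;
constexpr std::uint32_t kProcessSuspendResume = 0x0800;

// Parses a GrantedAccess string such as "0x1FFFFF" (base 16, optional 0x).
std::uint32_t parse_granted_access(const std::string& text) {
    return static_cast<std::uint32_t>(std::stoul(text, nullptr, 16));
}

// Lists the names of the known bits that are set in the mask.
std::vector<std::string> decode_granted_access(std::uint32_t mask) {
    struct Bit { std::uint32_t value; const char* name; };
    static const Bit kBits[] = {
        {kProcessCreateThread,  "PROCESS_CREATE_THREAD"},
        {kProcessVmOperation,   "PROCESS_VM_OPERATION"},
        {kProcessVmRead,        "PROCESS_VM_READ"},
        {kProcessVmWrite,       "PROCESS_VM_WRITE"},
        {kProcessSuspendResume, "PROCESS_SUSPEND_RESUME"},
    };
    std::vector<std::string> names;
    for (const Bit& bit : kBits) {
        if (mask & bit.value) names.push_back(bit.name);
    }
    return names;
}

// True when the mask grants both rights needed to write remote memory.
bool allows_remote_write(std::uint32_t mask) {
    const std::uint32_t needed = kProcessVmOperation | kProcessVmWrite;
    return (mask & needed) == needed;
}
```

A mask of 0x1FFFFF (PROCESS_ALL_ACCESS) is flagged by allows_remote_write(), whereas 0x1000 (PROCESS_QUERY_LIMITED_INFORMATION) is not. Plenty of legitimate software requests broad access, so a check like this is only useful when correlated with the source image and the Call Trace.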


Medium Level API

Let’s move a bit lower this time. As stated in the beginning of this post, the designation Medium Level API is simply my nomenclature which means we are avoiding the OS translation and are going to map/call the Nt* functions directly.

API Call Analysis

To analyze API calls, I am using API Monitor with the following filters: [Data Access and Storage, NT Native, System Services].

Item                          Count
Number of API Calls           199
Total Amount of Memory Used   127 KB

QueueUserAPC() Process Injection - Medium Level API Analysis

The total number of API calls was 199, which is noticeably larger than the High Level API’s 174 calls. That’s pretty noisy but not unexpected, considering the number of functions that we needed to map from ntdll.dll and kernel32.dll. This creates a lot of events that would not be necessary if we mapped the functions within a custom implementation, which is what we are going to do in the Low Level API.

One thing I would find very interesting is to analyze legitimate uses of QueueUserAPC() and determine whether the sample above can easily be classified as malicious based on call activity alone.

Overall, though, we did decrease the High Level API calls regarding the injection itself, which is a step in the right direction.


Sysmon Analysis

As expected, Sysmon did not report any indication of process injection. Similar to the High Level code, we were able to see which processes were accessed, with details containing the Call Trace. Again, this could be very helpful to a reverse engineer or triage analyst who is responding to a possible compromise.

QueueUserAPC() Process Injection - Medium Level Sysmon Analysis

Low Level API

API Call Analysis

To analyze API calls, I am using API Monitor with the following filters: [Data Access and Storage, NT Native, System Services].

Item                          Count
Total Number of Calls         165
Total Amount of Memory Used   112 KB

QueueUserAPC() Process Injection - Low Level API Analysis

We have a very low number of calls overall, which makes us much quieter. To top it off, the only call we see is WaitForSingleObject(), which is called right after we execute CreateProcess() in order to let nslookup.exe initialize. We do not see any of our process injection calls within API Monitor. It stands to reason, then, that AV / EDR systems would have a very difficult time hooking not only direct syscalls but also a process injection technique that is undetectable by Sysmon (that does not mean other systems can’t detect it; they can).


Sysmon Analysis

The Sysmon output is identical to that of the High and Medium level analysis. Honestly, I just put the image here for continuity’s sake.

QueueUserAPC() Process Injection - Low Level Sysmon Analysis

Real World Scenario

Let’s take a second and look at a real world example using this process injection technique. The reason for this section is to analyze the reliability of QueueUserAPC() injection and detail ways to make it more consistent. The code sample below uses the SysWhispers direct syscall methodology, just like the Low Level API example above. You will notice several differences within this source, however. The first is that we are not creating a process; we are in fact looking for explorer.exe, obtaining a handle to the process, enumerating the process’s threads, and injecting into five (5) of those threads.

When utilizing this form of process injection, it’s necessary to inject into 3-5 threads for reliability. That’s the first reason we hard code a five (5) thread injection limit. The second reason is that we don’t want 20-50 threads to execute our shellcode and give us 20-50 remote callbacks to our C2. That’s just way too loud! For example, we are injecting into explorer.exe, which is always going to be running. At any time, that process is going to have 20-50 threads running, so we limit our impact by only using 5.

The thread limit could actually be avoided via a check for C2 traffic and a mutex, but I’m not going to detail that here.

QueueUserAPC() Process Injection - Real World Example Analysis

Once we execute the binary, we can see that the program found 35 threads associated with explorer.exe, and we injected into five (5) of those threads. We only received three (3) notepad.exe instances meaning that of the five (5) threads, only three (3) of those threads successfully executed our shellcode. In my experience, only 60%-70% of the injected threads successfully execute and it’s extremely variable with limited consistency.
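The numbers above fit a simple back-of-the-envelope model. Assuming, purely for illustration, that each thread succeeds independently with the same probability p (the post does not establish independence, and the function names below are hypothetical), the expected number of successes and the chance of at least one success can be worked out as follows:

```cpp
#include <cassert>
#include <cmath>

// Expected number of threads that run the queued procedure, assuming each
// of `threads` attempts succeeds independently with probability p.
double expected_successes(int threads, double p) {
    assert(threads >= 0 && p >= 0.0 && p <= 1.0);
    return threads * p;
}

// Probability that at least one of `threads` attempts succeeds under the
// same independence assumption: 1 - (1 - p)^threads.
double prob_at_least_one(int threads, double p) {
    assert(threads >= 0 && p >= 0.0 && p <= 1.0);
    return 1.0 - std::pow(1.0 - p, threads);
}
```

With p = 0.6, five threads give an expected 3 successes, which matches the three notepad.exe instances observed here, and the chance of at least one success is about 0.99, compared with 0.6 for a single thread. Under this simplified model, that is why a single-thread attempt is unreliable while a handful of threads is usually enough.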

The other issue I often see is that we crash the application we inject into. This is the case with several forms of process injection, but with QueueUserAPC() there is a very high probability of crashing explorer.exe, considering the example above. As a matter of fact, we do crash explorer.exe during our injection.

QueueUserAPC() Process Injection - Real World Example Sysmon Analysis

This example shows the variability and overall success probability that should be taken into account each time you use this technique. We don’t want to crash systems, and we don’t want to impact employees’ day-to-day operations, so choose the process carefully and limit the overall exposure.


Conclusion

The ability to execute our shellcode using direct syscalls rather than Windows dependencies (documented API, ntdll.dll, kernel32.dll) allows for a much quieter and more streamlined compromise. In addition, we were able to circumvent common process injection detection via Sysmon by using QueueUserAPC() instead of CreateRemoteThread(). The image below details the QueueUserAPC Process Injection API calls that were observed via API Monitor.

QueueUserAPC() Process Injection - All API Analysis

QueueUserAPC() Vs. CreateRemoteThread()

Injection Type         API Level   Total API Calls   Total Memory Used
CreateRemoteThread()   High        298               192 KB
CreateRemoteThread()   Medium      309               196 KB
CreateRemoteThread()   Low         288               186 KB
QueueUserAPC()         High        174               116 KB
QueueUserAPC()         Medium      199               127 KB
QueueUserAPC()         Low         165               112 KB
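To put the table into relative terms, here is a short sketch that stores the measurements and computes the percentage reduction between two counts. The struct and function names are hypothetical; the figures are copied from the table above.

```cpp
#include <cassert>

// One row of the comparison table above.
struct Measurement {
    const char* injection_type;
    const char* api_level;
    int api_calls;
    int memory_kb;
};

// Figures copied from the table; nothing here is newly measured.
const Measurement kResults[] = {
    {"CreateRemoteThread()", "High",   298, 192},
    {"CreateRemoteThread()", "Medium", 309, 196},
    {"CreateRemoteThread()", "Low",    288, 186},
    {"QueueUserAPC()",       "High",   174, 116},
    {"QueueUserAPC()",       "Medium", 199, 127},
    {"QueueUserAPC()",       "Low",    165, 112},
};

// Percentage by which `value` is smaller than `baseline`.
double percent_reduction(int baseline, int value) {
    assert(baseline > 0);
    return 100.0 * (baseline - value) / baseline;
}
```

Comparing like for like, QueueUserAPC() made roughly 41.6% fewer API calls than CreateRemoteThread() at the High level, 35.6% fewer at the Medium level, and 42.7% fewer at the Low level. Within QueueUserAPC() itself, going from High to Low only saves about 5.2% of the calls.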

It’s worth mentioning the differences between the two process injection techniques we’ve looked at so far. It’s apparent that CreateRemoteThread() is louder, easier to detect, and the most common API call for process injection. CreateRemoteThread() is also very reliable, whereas QueueUserAPC() can be a bit unpredictable if you’re opening an already instantiated process and injecting into one of its threads. However, creating a process and injecting into it seems to give a much higher probability of success, as seen in all the examples above.

It’s fairly obvious that if you want to be quieter / more stealthy, QueueUserAPC() has the advantage.
