Process Injection Part 1 | CreateRemoteThread()

In this new series, I am going to dive deep into Windows Process Injection. The purpose of this series is to dig into how each injection technique works at its core. Each post is going to be broken down into four (4) parts:

  1. Process Injection Primer – Subject to the injection technique, we will review how this type of injection works programmatically.
  2. Analyze High Level Windows API Calls – Use the MSDN Documented methods and functions.
    1. API Call Analysis
    2. Sysmon events and logging
  3. Analyze Medium Level Windows Syscalls Using LoadLibrary – Use the NTAPI Undocumented functions via ntdll.dll
    1. API Call Analysis
    2. Sysmon events and logging
  4. Analyze Low Level Windows Syscalls Using x86 Assembly – Custom via Rolling Our Own Syscalls 🔥
    1. API Call Analysis
    2. Sysmon events and logging

I am piggy backing off the phenomenal research conducted by Outflank as well as a project developed by @Jackson_T called SysWhispers that auto generates a x86 ASM functions and header files. Incredible work to say the least.

Each post in this series will contain source code that is written in C++ with both High Level API calls and corresponding Low Level Syscalls. For continuity, I am going to be compiling all my builds for x64 bit architectures.

Reference Material

For high level Windows API Calls. We will use the official Microsoft MSDN Documentation. For all the undocumented functions, which is what we will be using when we want to conduct direct system calls, we will reference the NTAPI Undocumented Functions. If you’re unfamiliar with Syscalls and the Windows API’s I will provide a small Process Injection primer however, I am not detailing how Windows User v. Kernel mode works and the associated rings. I highly suggest you read Outflanks Blog Post in order to understand more.

Code Examples:

  • All code examples use the same 64-bit shellcode generated from the Metasploit Frameworks Msfvenom tool.
    • msfvenom -p windows/x64/exec CMD=notepad.exe -f c
    • The shellcode executes Notepad.exe.

System Configuration / Tools:

Process Injection Primer

In regards to CreateRemoteThread() process injection, there are really three (3) main objectives that need to happen:

  1. VirtualAllocEx() – Be able to access an external process in order to allocate memory within its virtual address space.
  2. WriteProcessMemory() – Write shellcode to the allocated memory.
  3. CreateRemoteThread() – Have the external process execute said shellcode within another thread.



We first need to allocate a chunk of memory that is the same size as our shellcode. VirtualAllocEx is the Windows API we need to call in order to initialize a buffer space that resides in a region of memory within the virtual address space of a specified process (i.e., the process we want to inject into).

  • VirtualAllocEx – Reserves, Commits, or Changes the state of memory within a specified process. This API call takes an additional parameter, compared to VirtualAlloc, (HANDLE hProcess) which is a Handle to the victim process.

Looking at example above, we have a HANDLE to an external process (nslookup.exe in this case). With this handle, we can allocate a buffer the same size as our shellcode within the victim processes virtual memory pages.

Debugging Documented Process Injection

The image above is a snapshot of a Visual Studio Debugging session. I set a break point at the VirtualAllocEx CALL and then stepped over it in order to execute it. We can see that VirtualAllocEx() allocated a buffer located at 0x000001efdc9d000. This memory allocation should be within the nslookup.exe process space. To confirm, we can open the nslookup.exe process in ProcessHacker -→ properties -→ memory and look for the memory region we see in the debugger.

ProcessHacker.exe reviewing VirtualAlloc and WriteProcessMemory


Now that we have allocated a buffer the same size as our shellcode, we can write our shellcode into that buffer.

In the Visual Studio Debugger, I step forward once again which executes the WriteProcessMemory CALL. This writes the contents of our shellcode into the victim processes allocated memory space. In ProcessHacker, we can conduct a memory dump of the nslookup.exe and when we specifically analyze the memory we allocated via the VirtualAllocEx CALL, we can see that our shellcode was properly written to the nslookup.exe buffer.

ProcessHacker.exe analysis of WriteProcessMemory


With the shellcode loaded into the allocated virtual memory space of the victim process, we can now tell the victim process to create a new thread starting at the address of our shellcode buffer.

  • CreateRemoteThread() – Creates a thread that runs in the virtual address space of another process.

Stepping forward for the last time, we execute CreateRemoteThread and get a Notpad.exe instance.

Final Process Injection executing Notepad.exe

High Level Windows API

In the example below, I create a 64-bit Nslookup.exe process and then inject into it using default Metasploit shellcode that simply creates an instance of Notepad.exe. This is not a very “clean” method of injection but for the purpose of this example, it works. This example details the proper / MSDN documented method of executing code within a different process space.

API Call Analysis

To analyze API calls, I am using API Monitor with the following filters: [Data Access and Storage, NT Native, System Services, Undocumented].

Number of API Calls298
Total Amount of Memory User192 KB
API Monitor Analysis of High-Level API calls

Basic analysis of the API calls makes it very objective that the program is accessing and manipulating memory of an external process. When we narrow in and only look at,CRT_High_Level_API.exe and KERNELBASE.DLL we see each and every function call. The great part or API Monitor is that we can also analyze the decoded function parameters and the memory page at the time of the instruction/CALL.

API Monitor Analysis of High-Level API calls

As a mental note, look at the image above and look at each CRT_High_Level_API.exe function call (High Level API) and look at the lower level API functions the OS Calls. There is a distinct difference here and it’s one worth noting as we will call those functions’ direction when we look at the Low-Level API calls.

Sysmon Analysis

Sysmon detected five (5) distinct events related to CRT_High_Level_API.exe:

  1. Process Create – CRT_High_Level_API.exe
  2. Process Create – nslookup.exe
  3. CreateRemoteThread – Process Injection into nslookup.exe
  4. Process Terminated – CRT_High_Level_API.exe exit
  5. Process Create – nslookup.exe executes shellcode which opensnotepad.exe
Sysmon Analysis of High-Level API calls

For all testing, these events are going to be symmetrical considering how the Sysmon driver hooks syscalls. This is a great tool for Blue Team because, no matter how low we go, Sysmon “Should” be able to catch our CreateRemoteThread() process injection. But with a claim like that, let’s look at why Sysmon is so damn powerful.

Sysmon’s Power

Procmon Analysis of High-Level API execution in User-Land and Kernel-Land

The image above depicts the User-Land vs. Kernel-Land. In the example above, we executed the ntdll.dll!NtQueryVirtualMemory function and that set a chain of events that eventually ended up in Kernel-Land issuing a syscall. Many AV / EDR solutions run in User-Land and do not (can’t) touch Kernel-Land. That’s why later in this post, we are going to circumvent running to Kernel-Land to execute a syscall and just do it ourselves. But, Sysmon is different in that a driver is loaded (SysmonDrv.sys) that will still hook and enumerate the syscall regardless of where the instruction is executed.

Sysmon Driver hooking

Medium Level API – Ntdll.dll

We have set up a program that moves away from using the High-Level API and directly calls the undocumented functions that are resident within ntdll.dll. You’ll notice that we now have a series of new structs and custom types that are necessary in order to load the required functions and execute them properly.

The goal of this code is to mitigate the High-Level function calls that the OS then translates to several Lower-Level API calls by simply calling the lower-level API ourselves.

API Call Analysis

To analyze API calls, I am using API Monitor with the following filters: [Data Access and Storage, NT Native, System Services].

Number of API Calls309
Total Amount of Memory User196 KB

There was an uptick in overall API Calls and size of memory used (not a big deal). But, we successfully bypassed using the High-Level API to execute shellcode. When we look at the actual API Calls (in the image below), we can see a distinct difference. All the API calls were called via the executable rather than being passed to KERNELBASE.dll. This is because we loaded ntdll.dll and then dynamically loaded the functions we needed in order to inject into nslookup.exe.

API Monitor Analysis of Medium-Level API calls

Sysmon Analysis

Sysmon caught all the same events as the High-Level API calls. This is to be expected since the driver has the capability to hook events subject to the configuration.

Sysmon Analysis of Medium-Level API calls

To get a better idea as to where Sysmon was hooking our syscalls, we can review the process stack right before shellcode execution in nslookup.exe to see if the SysmonDrv.sys driver was loaded. And that’s exactly what we see. Procmon is a bit limited as to the overall functions we can see within the stack trace (regardless of proper symbol loading) but we can easily discern our own event timeline subject to the Process Create, Process Start, and Thread Create operations.

Prcmon.exe Analysis of Medium-Level API calls

Reviewing the image above shows that right when nslookup.exe was executing the shellcode in a new thread, Sysmon observes an event that the Sysmon Configuration file says we are interested in so, it hooks it and analyzes it.

Low Level API – Direct Syscalls

This is where the fun really starts. To this point, we’ve used the Windows High-Level MSDN Documented methods of accessing process memory, changing process memory, and creating a remote thread within an external process. Next, we went one step lower and manually mapped the Nt* functions residing within ntdll.dll to our program and called them directly. Now, we’re going to completely remove any Windows DLL imports and manually conduct the syscalls with our own custom assembly rather than having ntdll.dll or kernelbase.dll do it for us.

I’ve used a tool called SysWhispers to generate both the header file (common.h) and the x86 assembly file (common.asm). I followed the repositories instructions on how to load both files into the Visual Studio C++ project and then compiled common.asm and included the common.h header.

Let’s go through exactly how this works. First, common.asm contains the proper Windows Syscall integer values for many of the undocumented Windows functions. Common.asm not only maps the function name to the proper syscall integer but, it also takes OS Version into account in order to mitigate syscall issues subject to value changes over time. Let’s take a look at the NtAllocateVirtualMemory() syscall assembly that SysWhispers generated for us.

Subject to the OS Versioning, the NtAllocateVirualMemory instructions will set the correct integer value and load that value into the EAX Register. Next, it will jmp to NtAllocateVirtuallMemory_Epilouge which makes the syscall. The Windows Version that I am developing on is Windows 10.0.17763 (1809) which means the proper syscall value for NtAllocateVirtualMemory is 0x18 or decimal 24. And just to double check, we can look at a table that details all the functions and syscall values subject to OS Version here.

Let’s look at this in a debugger. I set a break point on NtAllocateVirtualMemory within the C++ source. Once I hit that break point I will step through the common.asm instructions until the EAX Register is updated which means an OS Version match was found and the Syscall Integer value was loaded into EAX (lower 32-bits of RAX).

NOTE: If you’re not sure how x86 and x86_64 bit registers correlate to each other here is a fantastic graphic that details all the registers:

How x86 registers fit into x86_64 registers

As expected, we’ve matched one of the OS Version values and the EAX register is set to 0x18. Next, we jump to the epilogue to execute the NtAllocateVirtualMemory syscall. Keep in mind, we did not load any external functions to help us execute this syscall. We directly issued this syscall.

Direct Syscall observed in the Visual Studio Debugger

API Call Analysis

To analyze API calls, I am using API Monitor with the following filters: [Data Access and Storage, NT Native, System Services].

Number of API Calls288
Total Amount of Memory User186 KB
API Monitor Analysis of Low-Level Direct Syscalls

We don’t see ANY of the standard API calls we saw with the High-Level and Medium-Level API calls. That’s because we did not load any external resources to conduct the syscall. But, the Syscall still happened, and we can confirm that by reviewing the Sysmon event logs. Remember, Sysmon hooking is running in Kernel-Land (SYSTEM) and as such, we can’t really hide ourselves from it unless we disable it (Need to be admin).

Sysmon Analysis

Once again, Sysmon was able to hook and log the five (5) events.

Sysmon Analysis of Low-Level Direct Syscalls


Red Team

The lower we can go, the better. We can evade AV / EDR systems that hook in User-Land and do all kinds of fancy things by rolling our own syscalls. Mixing this technique with a plethora of others such as Arbitrary Code Guard, off-binary payload ingestion, etc. can allow us to operate with less noise as well as arm Blue Team with new detection capabilities.

Blue Team

Use Sysmon! As we saw, the Sysmon driver was able to hook all of our remote thread activity. This is a Free tool with several features that should be rolled into your detection processes. For more information:

1 Comment

Post a Comment