# Source

https://breakingmalware.com/injection-techniques/atombombing-brand-new-code-injection-for-windows/

# AtomBombing: Code Injection for Windows

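AtomBombing is a new code injection technique that abuses Windows atom tables and APCs (Asynchronous Procedure Calls). At the time of writing, this attack goes undetected by common security solutions that focus on detecting intrusions.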

Code injection has been a favorite weapon of attackers for years. For background on code injection and its various uses in APT scenarios, see:

http://blog.ensilo.com/atombombing-a-code-injection-that-bypasses-current-security-solutions

## Overview

I started poking around to see how hard it would be for a threat actor to find a new method that security vendors are unaware of and that bypasses most security products. It also needed to work on different processes rather than being tailored to fit a specific process.

Below, I describe AtomBombing, a new code injection technique for Windows.

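AtomBombing works in three main stages:

1. Write-What-Where – Write arbitrary data to an arbitrary location in the target process's address space.
2. Execution – Hijack a thread of the target process to execute the code written in stage 1.
3. Restoration – Clean up and restore execution of the thread hijacked in stage 2.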

## AtomBombing Stage 1: Write-What-Where

I came across two interesting API calls:

- GlobalAddAtom – Adds a character string to the global atom table and returns a unique value (an atom) identifying the string.
- GlobalGetAtomName – Retrieves a copy of the character string associated with the specified global atom.

By calling GlobalAddAtom, one can store a null-terminated buffer in the global atom table. This table is accessible from every other process on the system. The buffer can then be retrieved by calling GlobalGetAtomName. GlobalGetAtomName expects a pointer to an output buffer, so the caller chooses where the null-terminated buffer will be stored.

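In theory, I could add a buffer containing shellcode to the global atom table by calling GlobalAddAtom, and then somehow get the target process to call GlobalGetAtomName. This would copy the code from my process into the target process without calling WriteProcessMemory.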
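The write-what-where primitive rests on one detail: GlobalGetAtomName writes to whatever buffer address its caller passes. The sketch below is a portable C model of that behavior, not the Win32 API. The `sim_` names are hypothetical, the "table" is just an array inside the program, and reference counting and duplicate handling are left out. It also shows a consequence of the null-terminated interface: bytes after the first NUL are never stored.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of a global atom table. NOT the Win32 API:
 * no reference counting, no case-insensitive lookup, no sharing between
 * processes; the table is an array inside this program. */
#define SIM_MAX_ATOMS 16
#define SIM_MAX_NAME 255           /* atom names are limited to 255 bytes */
#define SIM_FIRST_ATOM 0xC000u     /* string atoms live in 0xC000..0xFFFF */

static char sim_table[SIM_MAX_ATOMS][SIM_MAX_NAME + 1];
static unsigned sim_count = 0;

/* Model of GlobalAddAtom: store a null-terminated string and return its atom
 * (0 on failure). Anything after the first NUL byte is never stored. */
unsigned short sim_global_add_atom(const char *name)
{
    size_t len = strlen(name);
    if (len > SIM_MAX_NAME || sim_count >= SIM_MAX_ATOMS)
        return 0;
    memcpy(sim_table[sim_count], name, len + 1);
    return (unsigned short)(SIM_FIRST_ATOM + sim_count++);
}

/* Model of GlobalGetAtomName: copy the string into a buffer the CALLER
 * chooses (the "where"), truncating to fit and always null-terminating.
 * Returns the number of characters copied, excluding the terminator. */
unsigned sim_global_get_atom_name(unsigned short atom, char *buffer, int size)
{
    unsigned idx;
    size_t len;
    if (atom < SIM_FIRST_ATOM || size <= 0)
        return 0;
    idx = atom - SIM_FIRST_ATOM;
    if (idx >= sim_count)
        return 0;
    len = strlen(sim_table[idx]);
    if (len >= (size_t)size)
        len = (size_t)size - 1;
    memcpy(buffer, sim_table[idx], len);
    buffer[len] = '\0';
    return (unsigned)len;
}
```

For example, retrieving an atom into `target + 10` writes exactly there and leaves `target[9]` untouched: the destination is entirely the caller's choice.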

Calling GlobalAddAtom from my process is straightforward, but how can I get the target process to call GlobalGetAtomName?

Through APCs (Asynchronous Procedure Calls):

QueueUserApc – Adds a user-mode APC object to the APC queue of the specified thread.

```c
DWORD WINAPI QueueUserAPC(
  _In_ PAPCFUNC  pfnAPC,
  _In_ HANDLE    hThread,
  _In_ ULONG_PTR dwData
);
```

QueueUserApc receives a pointer to an APCProc, which is defined as follows:

```c
VOID CALLBACK APCProc(
  _In_ ULONG_PTR dwParam
);
```

The prototype of GlobalGetAtomName is:

```c
UINT WINAPI GlobalGetAtomName(
  _In_  ATOM   nAtom,
  _Out_ LPTSTR lpBuffer,
  _In_  int    nSize
);
```

GlobalGetAtomName expects 3 parameters, but an APCProc receives only 1, so QueueUserApc cannot be used to make the target process call GlobalGetAtomName.

Let's take a look at the internals of QueueUserApc:

Figure 1: QueueUserApc

As Figure 1 shows, QueueUserApc calls the NtQueueApcThread syscall to add the APC to the target thread's APC queue.

Interestingly enough, NtQueueApcThread receives a pointer to a function that is to be called asynchronously in the target thread, but the function being passed is not the original APCProc function the caller passed to QueueUserApc. Instead, the function being passed is ntdll!RtlDispatchAPC, and the original APCProc function passed to QueueUserApc is passed as a parameter to ntdll!RtlDispatchAPC.

Let's take a look at ntdll!RtlDispatchAPC:

Figure 2: ntdll!RtlDispatchAPC

It starts by checking whether the third parameter is valid, which means that an ActivationContext needs to be activated before the APC is dispatched.

If an ActivationContext needs to be activated:

Figure 3: ntdll!RtlDispatchAPC – RtlActivateActivationContextUnsafeFast

ntdll!RtlDispatchAPC then does the following:

1. The passed ActivationContext (currently in ESI) is activated by calling RtlActivateActivationContextUnsafeFast.
2. The parameter to the original APCProc function (i.e. the third parameter passed to QueueUserApc) is pushed onto the stack, because we are about to call the original APCProc function.
3. Right before dispatching the APC, a call to CFG (__guard_check_icall_fptr) is made to make sure the APC target is a CFG-valid function.
4. A call to the original APCProc is made, and that's it: the APC has been dispatched.

Once APCProc returns, the activation context is deactivated:

Figure 4: ntdll!RtlDispatchAPC – RtlDeactivateActivationContextUnsafeFast

If, on the other hand, no activation context needs to be activated:

Figure 5: ntdll!RtlDispatchAPC – no activation context

The code skips all the activation context related stuff and simply dispatches the APC right away after calling CFG.

What does all this mean? When calling QueueUserApc we are forced to pass an APCProc which expects one parameter. However, under the hood QueueUserApc uses NtQueueApcThread to call ntdll!RtlDispatchAPC which expects 3 parameters.

What was our goal? To call GlobalGetAtomName. How many parameters does it expect? 3. Can we do this? Yes. How? NtQueueApcThread!

See main_ApcWriteProcessMemory in AtomBombing’s GitHub repository.
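The mismatch between a 1-parameter APCProc and a 3-parameter dispatch can be made concrete with a small model. In the sketch below, an illustration under stated assumptions rather than ntdll's code, each queue entry holds a routine plus three pointer-sized arguments. `sim_queue_user_apc` always wraps the caller's APCProc in a stand-in for RtlDispatchAPC, while `sim_nt_queue_apc` accepts any 3-argument routine directly, which is what makes a GlobalGetAtomName-shaped call possible. All names are hypothetical, and the activation-context and CFG steps are omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one thread's user-mode APC queue. Each entry holds a
 * routine with three pointer-sized arguments, mirroring what the text says
 * NtQueueApcThread accepts. Names are illustrative, not ntdll's. */
typedef void (*apc_routine3)(uintptr_t a1, uintptr_t a2, uintptr_t a3);
typedef void (*apc_proc1)(uintptr_t param);      /* shape of an APCProc */

struct apc_entry { apc_routine3 routine; uintptr_t a1, a2, a3; };

#define APC_QUEUE_CAP 8
static struct apc_entry apc_queue[APC_QUEUE_CAP];
static size_t apc_len = 0;

/* Stand-in for NtQueueApcThread: queue any 3-argument routine. */
int sim_nt_queue_apc(apc_routine3 routine, uintptr_t a1, uintptr_t a2, uintptr_t a3)
{
    if (apc_len >= APC_QUEUE_CAP)
        return 0;
    apc_queue[apc_len].routine = routine;
    apc_queue[apc_len].a1 = a1;
    apc_queue[apc_len].a2 = a2;
    apc_queue[apc_len].a3 = a3;
    apc_len++;
    return 1;
}

/* Stand-in for ntdll!RtlDispatchAPC: a1 = original APCProc, a2 = its
 * parameter, a3 = activation context (ignored here; CFG check omitted). */
static void sim_rtl_dispatch_apc(uintptr_t proc, uintptr_t param, uintptr_t actctx)
{
    (void)actctx;
    ((apc_proc1)proc)(param);
}

/* Stand-in for QueueUserApc: always wraps the 1-argument APCProc. */
int sim_queue_user_apc(apc_proc1 proc, uintptr_t param)
{
    return sim_nt_queue_apc(sim_rtl_dispatch_apc, (uintptr_t)proc, param, 0);
}

/* Run every queued APC in order, as when the thread becomes alertable. */
void sim_enter_alertable_wait(void)
{
    size_t i;
    for (i = 0; i < apc_len; i++)
        apc_queue[i].routine(apc_queue[i].a1, apc_queue[i].a2, apc_queue[i].a3);
    apc_len = 0;
}
```

Converting between function pointers and `uintptr_t` is implementation-defined in ISO C but works on mainstream compilers; here it mirrors the text's point that the original APCProc reaches RtlDispatchAPC as an ordinary parameter.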

## AtomBombing Stage 2: Execution

Obviously I could never hope to consistently find RWX code caves in my target processes. I needed a way to consistently allocate RWX memory in the target process without calling VirtualAllocEx within the context of the injecting process. Sadly, I could not find any such function that I could invoke via APC and would allow me to allocate executable memory or change the protection flags of already allocated memory.

What do we have so far? Write-what-where + a burning desire to get some executable memory. I thought long and hard about how to get over this hurdle, and then it hit me. When DEP was invented, its creators thought, “that’s it, data is no longer executable, therefore no one will ever be able to exploit vulnerabilities again”. Unfortunately, that was not the case; a new exploitation technique was invented solely to bypass DEP: ROP – Return Oriented Programming.

How can we use ROP to our advantage in order to execute our shellcode in the target process?

We can copy our code to an RW code cave in the target process (using the method described in stage 1). Then use a meticulously crafted ROP chain to allocate RWX memory, copy the code from the RW code cave to the newly allocated RWX memory, and finally jump to the RWX memory and execute it.

Finding an RW code cave is not a big problem. For this proof of concept, I decided to use the unused space after the data section of kernelbase.

See main_GetCodeCaveAddress in AtomBombing’s GitHub repository.
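One plausible way to locate such a cave, assuming it is the slack between the end of a section's data and the end of its last mapped page, is simple arithmetic on the PE section header. Sections are mapped in multiples of the image's SectionAlignment, so the bytes between VirtualAddress + VirtualSize and the aligned end are zero-filled but carry the section's protection. The helper below is a hypothetical sketch of that calculation; it is not taken from main_GetCodeCaveAddress.

```c
#include <stdint.h>

/* Hypothetical helper: given a section's VirtualAddress and VirtualSize (as in
 * a PE IMAGE_SECTION_HEADER) and the image's SectionAlignment, compute the
 * slack between the end of the section's data and the end of its mapping. */
struct cave { uint32_t start_rva; uint32_t size; };

static uint32_t align_up(uint32_t value, uint32_t alignment)
{
    return (value + alignment - 1) / alignment * alignment;
}

struct cave section_tail_cave(uint32_t virtual_address, uint32_t virtual_size,
                              uint32_t section_alignment)
{
    struct cave c;
    uint32_t mapped_end = virtual_address + align_up(virtual_size, section_alignment);
    c.start_rva = virtual_address + virtual_size;   /* first byte past the data */
    c.size = mapped_end - c.start_rva;              /* 0 if already aligned */
    return c;
}
```

Adding the module's base address to `start_rva` would give the cave's address in a process that has the module loaded.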

### The ROP Chain

Our ROP chain needs to do 3 things:

1. Allocate RWX memory
2. Copy the shellcode from the RW code cave to the newly allocated RWX memory
3. Execute the newly allocated RWX memory

### ROP Chain Step 1: Allocating RWX Memory

We would like to allocate some RWX memory. The first function that comes to mind is VirtualAlloc – a very useful function that can be used to allocate RWX memory. The only problem is that the function returns the newly allocated RWX memory in EAX which would make our ROP chain complicated by having to find a way to pass the value VirtualAlloc stored in EAX to the next function in the chain.

A very neat trick can be employed in order to simplify our ROP chain and make it more sophisticated. Instead of using VirtualAlloc, we can use ZwAllocateVirtualMemory, which returns the newly allocated RWX memory as an output parameter. This way we can actually set up our stack so that ZwAllocateVirtualMemory stores the newly allocated memory further along the stack, effectively passing the address to the next function in the chain (see Table 1).

### ROP Chain Step 2: Copying the Shellcode

The next function we need is a function that will copy memory from one buffer to another. Two options come to mind: memcpy and RtlMoveMemory. When creating this kind of ROP chain one might be initially inclined to go with RtlMoveMemory because it uses the stdcall calling convention, meaning it will clean up the stack after itself. This is a special case though. We need to copy memory to an address (placed on the stack by ZwAllocateVirtualMemory) and then somehow this address needs to be called. If we used RtlMoveMemory, it will pop the address of the RWX shellcode right off the stack upon its return. On the other hand, if we use memcpy, the first entry on the stack would be the return address of memcpy, followed by the destination parameter of memcpy (i.e. the RWX shellcode).

### ROP Chain Step 3: Executing the Newly Allocated RWX Memory

We have allocated RWX memory and copied our shellcode to it. We are about to return from memcpy but the address of the RWX shellcode on the stack is 4 bytes away from the return address. Therefore, all we have to do is add an extremely simple gadget to our ROP chain. This simple gadget will execute the opcode “ret”. memcpy will return to this simple gadget which will “ret” right into our RWX shellcode.

See main_FindRetGadget in AtomBombing’s GitHub repository.
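A “ret” gadget is just a 0xC3 byte (the x86 near-return opcode) somewhere in executable memory; it does not even have to sit on an instruction boundary. The hypothetical helper below scans a byte buffer for the first such byte. It is a minimal sketch, and the repository's main_FindRetGadget may search differently.

```c
#include <stddef.h>

/* Hypothetical helper: return the offset of the first 0xC3 ("ret") byte in
 * code[0..len), or -1 if there is none. The caller is assumed to pass bytes
 * from an executable region, since the gadget must be executable. */
long find_ret_gadget(const unsigned char *code, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        if (code[i] == 0xC3)
            return (long)i;
    return -1;
}
```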

For those who have to see it to believe it:

Set EIP to point to ZwAllocateVirtualMemory, and ESP to point to this ROP chain:

| Address | Value | Comment |
|---|---|---|
| 0x30000000 | ntdll!memcpy | Return address from ZwAllocateVirtualMemory |
| 0x30000004 | 0xffffffff | Pseudo handle to the current process |
| 0x30000008 | 0x30000020 | Where to store the allocated memory |
| 0x3000000C | NULL | Irrelevant |
| 0x30000010 | 0x30000028 | Pointer to the size of the needed memory |
| 0x30000014 | MEM_COMMIT | Commit and not reserve |
| 0x30000018 | PAGE_EXECUTE_READWRITE | RWX |
| 0x3000001C | POINTER_TO_SOME_RET_INSTRUCTION | Return address from memcpy, our extremely simple ret gadget |
| 0x30000020 | NULL | Where the allocated memory will be saved and the destination parameter of memcpy. This will store the address of the RWX shellcode. |
| 0x30000024 | CODE_CAVE_ADDRESS | The RW code cave containing the shellcode to be copied |
| 0x30000028 | SHELLCODE_SIZE | The size of the shellcode to be allocated |

Table 1: The whole ROP chain.

See main_BuildROPChain in AtomBombing’s GitHub repository.
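To check that Table 1 lines up, the sketch below walks the chain in a toy 32-bit stack model. Every address is a hypothetical placeholder; only the layout and the calling conventions are modeled: ZwAllocateVirtualMemory (stdcall) pops its own six arguments, while cdecl memcpy leaves its arguments on the stack. Passing `memcpy_is_stdcall = 1` models the RtlMoveMemory pitfall described in Step 2.

```c
#include <stdint.h>
#include <string.h>

/* Toy model of Table 1. Every address here is a hypothetical placeholder;
 * only the stack layout and the calling conventions are modeled. */
#define CHAIN_BASE   0x30000000u
#define ADDR_ZWALLOC 0x77000010u  /* stands in for ZwAllocateVirtualMemory */
#define ADDR_MEMCPY  0x77000020u  /* stands in for ntdll!memcpy */
#define ADDR_RET     0x77000030u  /* stands in for a lone "ret" gadget */
#define CODE_CAVE    0x75000000u  /* RW code cave holding the shellcode */
#define NEW_RWX      0x00A00000u  /* address the modeled allocator hands out */
#define SC_SIZE      0x200u
#define CHAIN_WORDS  16

static uint32_t chain[CHAIN_WORDS];   /* models 0x30000000..0x3000003C */
static uint32_t scratch;              /* absorbs out-of-range accesses */

static uint32_t *slot(uint32_t addr)
{
    if (addr < CHAIN_BASE || (addr - CHAIN_BASE) / 4 >= CHAIN_WORDS) {
        scratch = 0;
        return &scratch;
    }
    return &chain[(addr - CHAIN_BASE) / 4];
}

void build_chain(void)
{
    memset(chain, 0, sizeof chain);
    *slot(0x30000000u) = ADDR_MEMCPY;  /* return address of ZwAllocateVirtualMemory */
    *slot(0x30000004u) = 0xFFFFFFFFu;  /* pseudo handle to the current process */
    *slot(0x30000008u) = 0x30000020u;  /* BaseAddress (out) */
    *slot(0x3000000Cu) = 0;            /* ZeroBits */
    *slot(0x30000010u) = 0x30000028u;  /* RegionSize (in/out) */
    *slot(0x30000014u) = 0x1000u;      /* MEM_COMMIT */
    *slot(0x30000018u) = 0x40u;        /* PAGE_EXECUTE_READWRITE */
    *slot(0x3000001Cu) = ADDR_RET;     /* return address of memcpy */
    *slot(0x30000020u) = 0;            /* receives the RWX address = memcpy dest */
    *slot(0x30000024u) = CODE_CAVE;    /* memcpy src */
    *slot(0x30000028u) = SC_SIZE;      /* RegionSize, later memcpy's count */
}

struct sim_result { uint32_t final_eip, copy_dst, copy_src, copy_len; };

/* Walk the chain starting with EIP = ZwAllocateVirtualMemory, ESP = CHAIN_BASE. */
struct sim_result run_chain(int memcpy_is_stdcall)
{
    struct sim_result r = {0, 0, 0, 0};
    uint32_t eip = ADDR_ZWALLOC, esp = CHAIN_BASE;
    int step;

    for (step = 0; step < 8; step++) {
        if (eip == ADDR_ZWALLOC) {               /* stdcall, 6 arguments */
            uint32_t *size = slot(*slot(esp + 16));
            *slot(*slot(esp + 8)) = NEW_RWX;     /* *BaseAddress = new region */
            *size = (*size + 0xFFFu) & ~0xFFFu;  /* *RegionSize rounded to a page */
            eip = *slot(esp);
            esp += 4 + 6 * 4;                    /* callee pops return address + args */
        } else if (eip == ADDR_MEMCPY) {         /* cdecl unless told otherwise */
            r.copy_dst = *slot(esp + 4);
            r.copy_src = *slot(esp + 8);
            r.copy_len = *slot(esp + 12);
            eip = *slot(esp);
            esp += memcpy_is_stdcall ? 4 + 3 * 4 : 4;
        } else if (eip == ADDR_RET) {            /* "ret": pop the next address */
            eip = *slot(esp);
            esp += 4;
        } else {
            break;                               /* reached shellcode (or garbage) */
        }
    }
    r.final_eip = eip;
    return r;
}
```

With the cdecl copy, the model ends with EIP equal to the allocated RWX address; with the stdcall variant, the ret gadget pops a slot past the chain and the model ends on garbage. The model also shows that memcpy's count is whatever was written back into the in/out RegionSize slot, here the page-rounded 0x1000 rather than SC_SIZE.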

### Invoking the ROP Chain

But wait: APCs only let me pass 3 parameters, while the ROP chain needs 11 values on the stack. Our best bet is to pivot the stack to some RW memory that contains our ROP chain (e.g. the RW code cave in kernelbase).

How could we pivot the stack?

```c
NTSYSAPI NTSTATUS NTAPI NtSetContextThread(
  _In_       HANDLE  hThread,
  _In_ const CONTEXT *lpContext
);
```

This syscall will set the context (register values) of hThread to the values contained in lpContext. If we can get the target process to call this syscall with an lpContext that will set ESP to point to our ROP chain and set EIP to point to ZwAllocateVirtualMemory, then our ROP chain will execute. The execution of the ROP chain will eventually lead to the execution of our shellcode.

How do we get the target process to make this call? APC has been good to us so far, but this syscall expects 2 parameters and not 3, so when it returns the stack will be corrupt, and the behavior will be undefined. That said, if we pass a handle to the current thread as hThread, then the function will never return. The reason is that once execution gets passed on to the kernel, the context of the thread will be set to the context specified by lpContext, and there will be no trace that NtSetContextThread was ever called. If everything works out as we hope, we will have successfully hijacked a thread and got it to execute our malicious shellcode.

See main_ApcSetThreadContext in AtomBombing’s GitHub repository.

## AtomBombing Stage 3: Restoration

We do have one problem, though. The thread that we hijacked had a purpose before we had hijacked it. If we don’t restore its execution, there is no telling what kind of effect we could have on the target process.

How do we restore execution? I’d like to remind you that we are now in the context of an APC. When the APC function completes, somehow execution is restored safely. Let’s look at the dispatching of APCs from the target process’s point of view.

The function in charge of dispatching APCs appears to be ntdll!KiUserApcDispatcher (in this example, the APC is delivered while the thread is waiting in WaitForSingleObjectEx).

Figure 6: KiUserApcDispatcher

We can see 3 “calls” in this block of code. The first call is to CFG, the next call is to ECX (which is the address of the APC function), and finally a call to the undocumented ZwContinue.

ZwContinue expects to receive a pointer to a CONTEXT structure and resumes the execution. Actually the kernel will check if there are any more APCs in the thread’s APC queue, and dispatch them before finally resuming the thread’s original execution, but we can ignore that.

The CONTEXT structure being passed to ZwContinue is stored in EDI before calling the APC function (stored in ECX). We can save EDI’s value at the beginning of our shellcode, and call ZwContinue with EDI’s original value at the end of the shellcode, thereby restoring execution safely.

See AtomBombingShellcode in AtomBombing’s GitHub repository.
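Saving EDI and later calling ZwContinue with it amounts to “capture a context, run something else, then resume the captured context.” As a loose analogy only, the portable C sketch below uses `setjmp` and `longjmp` in the roles of the CONTEXT record and ZwContinue. It does not touch real thread contexts, and `longjmp` restores far less state than ZwContinue does; the function names are hypothetical.

```c
#include <setjmp.h>

/* Analogy only: jmp_buf plays the role of the CONTEXT whose address
 * KiUserApcDispatcher keeps in EDI, and longjmp plays the role of ZwContinue. */
static jmp_buf saved_context;     /* "the CONTEXT pointed to by the saved EDI" */
static int payload_ran = 0;

static void hijacked_payload(void)
{
    payload_ran = 1;              /* the injected work would happen here */
    longjmp(saved_context, 1);    /* "ZwContinue(saved EDI)": resume original flow */
}

/* Returns 1 if execution came back to the original flow after the payload. */
int run_with_restore(void)
{
    if (setjmp(saved_context) == 0) {   /* first pass: context captured */
        hijacked_payload();             /* never returns normally */
        return -1;                      /* not reached */
    }
    return payload_ran;                 /* second pass: resumed via longjmp */
}
```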

We have to make sure that the value of EDI is not overwritten by the call to NtSetContextThread, since that call modifies register values. This is easily accomplished by setting ContextFlags (a member of the CONTEXT structure passed to NtSetContextThread) to CONTEXT_CONTROL, which means that only EBP, EIP, SEGCS, EFLAGS, ESP, and SEGSS are affected. As long as ContextFlags does not include CONTEXT_INTEGER, i.e. (CONTEXT.ContextFlags & CONTEXT_INTEGER) != CONTEXT_INTEGER, we should be ok.
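The flag test deserves care. On x86, winnt.h defines both CONTEXT_CONTROL and CONTEXT_INTEGER as CONTEXT_i386 (0x00010000) ORed with a low bit, so a plain bitwise AND of the two is non-zero even though CONTEXT_CONTROL by itself does not touch EDI. The sketch below copies those three x86 values so it builds anywhere and contrasts a full-mask check with a naive non-zero check; the two function names are hypothetical.

```c
#include <stdint.h>

/* x86 values as defined in winnt.h, copied so this builds on any platform. */
#define CONTEXT_i386     0x00010000u
#define CONTEXT_CONTROL  (CONTEXT_i386 | 0x00000001u) /* EBP, EIP, SegCs, EFlags, ESP, SegSs */
#define CONTEXT_INTEGER  (CONTEXT_i386 | 0x00000002u) /* EAX, EBX, ECX, EDX, ESI, EDI */

/* Full-mask check: are the integer registers (including EDI) requested? */
int touches_integer_regs(uint32_t context_flags)
{
    return (context_flags & CONTEXT_INTEGER) == CONTEXT_INTEGER;
}

/* Naive check: wrong, because the shared CONTEXT_i386 bit always matches. */
int naive_touches_integer_regs(uint32_t context_flags)
{
    return (context_flags & CONTEXT_INTEGER) != 0;
}
```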

Figure 7: AtomBombing chrome.exe

And there you have it, we have injected code into chrome.exe. Our injected code spawned the classic calc.exe proving that it works.

Let’s try to inject code into vlc.exe:

Figure 8: AtomBombing vlc.exe

The complete implementation can be found on GitHub. It has been tested against Windows 10 x64 Build 1511 (WOW) and Windows 10 x86 Build 10240. Compile for “release”.

Let’s do the same with mspaint.exe:

Figure 9: AtomBombing mspaint.exe

Oh no, it crashed.

## Final Step

How do we proceed from here? I have worked it out, but at this point I’d rather leave this as an exercise to the reader. As an initial hint, I suggest you take a look at my previous blog post (https://breakingmalware.com/documentation/documenting-undocumented-adding-control-flow-guard-exceptions/). I’m sure you’ll also find creative ideas for handling this problem that I myself haven’t found, and I’d be happy to start this discussion.

You can use the comments below or catch me @tal_liberman. Through Twitter, I’ll also release some tidbits throughout the week. At any rate, I will publish my solution next week.

## Appendix: Finding Alertable Threads

One thing we have not yet mentioned is that QueueUserApc only works on threads that are in an alertable state. How does a thread enter an alertable state?

According to Microsoft:

> A thread can only do this by calling one of the following functions with the appropriate flags:
>
> - SleepEx
> - WaitForSingleObjectEx
> - WaitForMultipleObjectsEx
> - SignalObjectAndWait
> - MsgWaitForMultipleObjectsEx
>
> When the thread enters an alertable state, the following events occur:
>
> 1. The kernel checks the thread’s APC queue. If the queue contains callback function pointers, the kernel removes the pointer from the queue and sends it to the thread.
> 2. The thread executes the callback function.
> 3. Steps 1 and 2 are repeated for each pointer remaining in the queue.
> 4. When the queue is empty, the thread returns from the function that placed it in an alertable state.
>
> https://msdn.microsoft.com/en-us/library/windows/desktop/aa363772(v=vs.85).aspx

For our technique to be effective the target process must have at least one thread that is in an alertable state, or will enter an alertable state at some point, otherwise our APCs will never actually execute.

I’ve checked various software, and I’ve noticed that most of the programs I’ve checked have at least one alertable thread. Examples: Chrome.exe, Iexplore.exe, Skype.exe, VLC.exe, MsPaint.exe, WmiPrvSE.exe, etc.

So now we have to find an alertable thread in the target process. There are many ways of doing this. I chose to use a method that is trivial, works in most cases, and is easy to implement and explain.

We’ll create an event for each thread in the target process, then ask each thread to set its corresponding event. We’ll wait on the event handles, until one is triggered. The thread whose corresponding event was triggered is an alertable thread.

How can an event be set? By calling SetEvent(HANDLE hEvent).

How will we get the threads in the target process to call SetEvent? APC of course. Since SetEvent receives exactly one parameter we can use QueueUserApc to call it. The actual details of the implementation can be found in main_FindAlertableThread in AtomBombing’s GitHub repository.
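The detection scheme can be modeled without any Windows calls. In the hypothetical sketch below, each simulated thread becomes alertable at some tick (or never), a pending “SetEvent” APC runs only once its thread is alertable, and the wait returns the lowest signaled index, assuming the wait behaves like WaitForMultipleObjects with bWaitAll set to FALSE. The model ignores the APCs that stay queued on the other threads.

```c
#include <stddef.h>

/* Hypothetical model: a simulated thread becomes alertable at tick
 * alertable_at (never, if negative). Its queued APC runs only then. */
#define MAX_THREADS 8

struct sim_thread { int alertable_at; int pending_set_event; };

/* Returns the index of the first thread whose "SetEvent" APC ran within
 * max_ticks, or -1 if none did (the real wait would time out instead). */
int find_alertable_thread(struct sim_thread *threads, size_t n, int max_ticks)
{
    int event_signaled[MAX_THREADS] = {0};
    size_t i;
    int tick;

    if (n > MAX_THREADS)
        return -1;
    for (i = 0; i < n; i++)
        threads[i].pending_set_event = 1;   /* "QueueUserApc(SetEvent, thread i, event i)" */

    for (tick = 0; tick < max_ticks; tick++) {
        for (i = 0; i < n; i++) {
            if (threads[i].pending_set_event && threads[i].alertable_at >= 0
                && threads[i].alertable_at <= tick) {
                threads[i].pending_set_event = 0;
                event_signaled[i] = 1;      /* the APC ran: "SetEvent(event i)" */
            }
        }
        for (i = 0; i < n; i++)             /* wait-any: lowest signaled index wins */
            if (event_signaled[i])
                return (int)i;
    }
    return -1;
}
```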

Lonnia

AtomBombing: Brand New Code Injection for Windows