ethical.blue Magazine

// Cybersecurity clarified.

Shellcode for Windows x86-64 (x64)

2022-05-30   Dawid Farbaniec
...
Typical computer programs depend on external references, such as system functions (APIs), and on application data (strings, numbers, etc.). Injectable shellcode, on the other hand, should be self-contained: it must carry everything it needs to operate. Thanks to this, it works properly wherever it is placed in memory, and it can even move fragments of itself to other locations (self-relocation).

Calling APIs Is a Little Difficult

Processor instructions that do not reference API functions (for example MOV, ADD, DEC, XOR, etc.) can be easily executed anywhere in memory.

However, in native Windows applications the WinAPI functions provide the main functionality. The system API allows one to create a file, read a file, make a network connection, display a window, and much more. In typical programs, a WinAPI function is called (in simplified terms) by giving its name and arguments, if any; the name is resolved to an address (a numerical value specifying the function's location in memory), and that address is called.

Injectable shellcode placed at some memory location has to call Windows API functions in an unusual way. In an ordinary Assembly language program, the call is made with the CALL instruction, giving the name of the function, with the required arguments passed through the appropriate registers or the stack.

The shellcode described here cannot refer to a function by name. Function addresses cannot be hardcoded either, because they change (for example, between Windows builds and, with address space layout randomization, between reboots). The standard way of retrieving a function's address, GetProcAddress, is not directly available either, because one must first know the address of GetProcAddress itself.

Microsoft x64 Calling Convention

A function call in x64 Assembly, also known as a subroutine call, transfers control elsewhere in your code. When the code block determined by the function is executed, a return takes place, which is possible through the return address previously pushed on the program stack. Functions are called with the CALL processor instruction. It pushes the previously mentioned return address on the stack and passes control to the called subroutine.

There is no single way to call a function. The details depend on the architecture, and the way a call and its related operations work is defined by a calling convention.

MASM32 Assembly for the x86-32 architecture uses the stdcall convention, which is the default for the Windows API. Cleaning up the stack is the job of the called function, so a programmer calling such a function does not need to do it. Arguments (also known as parameters) are pushed onto the stack from right to left, that is, starting with the last one. If the function returns a result, it is placed in the accumulator register EAX. Some functions, when the result is larger than 32 bits, return it in the EDX:EAX register pair. In the stdcall convention, if our function modifies ESI, EDI, EBP, or EBX, it should save their values (for example, on the stack) and restore them before returning to Windows.

Calling CreateFile in Visual C++
hFile = CreateFile(szFileName,
    GENERIC_WRITE,
    0, 0,
    CREATE_NEW,
    FILE_ATTRIBUTE_NORMAL, 0);

Calling CreateFile in MASM32 Assembly (stdcall convention)
push 0 ;no template file
push FILE_ATTRIBUTE_NORMAL ;file attributes
push CREATE_NEW ;create a new file
push 0 ;default security attributes
push 0 ;default sharing mode
push GENERIC_WRITE ;open for writing
push offset szFileName ;name of the file to create
call CreateFileA ;call the WinAPI function
mov hFile, eax

MASM x64 for the x86-64 architecture (x64 for short) uses the Microsoft x64 calling convention. It is the responsibility of the caller to clean up the stack. The first four integer or pointer arguments are passed in the registers RCX, RDX, R8, and R9, in that order. The stack is used when there are more than four arguments; these are placed "from the end", that is, from right to left. The caller also reserves 32 bytes of stack (the shadow space) for the four register arguments, which is why the fifth argument in the example below lands at [rsp+20h]. The function result, if it fits in 64 bits, is returned in the RAX accumulator register. In the Microsoft x64 convention, if our function modifies RBP, RBX, RDI, RSI, RSP, R12, R13, R14, or R15, it should save their values (for example, on the stack) and restore them before returning to Windows. Also, be sure to keep the stack aligned to 16 bytes: the amount of space reserved on the stack plus the 8-byte return address should be divisible by 16 with no remainder.

Calling CreateFile in MASM x64 (Microsoft x64 convention)
sub rsp, 38h ;allocate space on the stack
mov qword ptr [rsp+30h], 0 ;no template file
mov qword ptr [rsp+28h], FILE_ATTRIBUTE_NORMAL ;file attributes
mov qword ptr [rsp+20h], CREATE_NEW ;create a new file
mov r9, 0 ;default security attributes
mov r8, 0 ;default sharing mode
mov rdx, GENERIC_WRITE ;open for writing
mov rcx, offset szFileName ;name of the file to create
call CreateFileA ;call the WinAPI function
mov hFile, rax
add rsp, 38h ;release the stack space
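
The 38h and 28h values used with sub rsp in this article follow from the rules above. As a minimal sketch, assuming RSP is 8 modulo 16 when the calling function starts and that nothing else has been pushed since, the reservation can be computed as below. The helper name stack_reservation is hypothetical and exists only for this illustration; it is not part of any Windows API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes to subtract from RSP before a CALL.
   Assumes RSP is 8 modulo 16 on entry to the caller (its own return
   address was just pushed) and no other pushes happened since. */
size_t stack_reservation(size_t arg_count)
{
    const size_t shadow_space = 32;   /* home space for RCX, RDX, R8, R9 */
    size_t stack_args = arg_count > 4 ? arg_count - 4 : 0;
    size_t size = shadow_space + stack_args * 8;   /* 8 bytes per stack argument */

    assert(size % 8 == 0);            /* every slot is 8 bytes wide */

    /* size plus the 8-byte return address must be a multiple of 16 */
    if ((size + 8) % 16 != 0)
        size += 8;
    return size;
}
```

For the seven-argument CreateFileA call this yields 38h; for any call with up to five arguments it yields 28h.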

Thread Environment Block (TEB)

The starting point for obtaining WinAPI function addresses in the shellcode is the base address of the system library kernel32.dll. The first step is to learn about the TEB (also known as the TIB).

The Thread Environment Block is a data structure that contains information about the thread currently executing. It can be accessed through the GS segment register in 64-bit mode and the FS register in 32-bit mode.

Importantly, although this is an internal system structure with some secrecy around it, its layout can be obtained without any reverse engineering (RCE). The winternl.h header file for Visual C++ in the Visual Studio environment contains a definition of this structure:
//file winternl.h
typedef struct _TEB {
    PVOID Reserved1[12];
    PPEB ProcessEnvironmentBlock;
    PVOID Reserved2[399];
    BYTE Reserved3[1952];
    PVOID TlsSlots[64];
    BYTE Reserved4[8];
    PVOID Reserved5[26];
    PVOID ReservedForOle;  // Windows 2000 only
    PVOID Reserved6[4];
    PVOID TlsExpansionSlots;
} TEB, *PTEB;

Please note that these are internal APIs and structures and are subject to change. A comment in the winternl.h file says:
Quote:
winternl.h - This module defines the internal NT APIs and data structures that are intended for the use only by internal core Windows components. These APIs and data structures may change at any time. These APIs and data structures are subject to changes from one Windows release to another Windows release. To maintain the compatiblity of your application, avoid using these APIs and data structures.

To summarize this subsection: learning about this structure brings us closer to the goal, because the TEB structure gives access to another structure, the PEB.
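
The offset of ProcessEnvironmentBlock inside the TEB can be worked out from the declaration above: twelve 8-byte pointers precede it, so on x64 it sits at 12 × 8 = 60h. A small C sketch of that arithmetic follows. The type names are made up for the illustration, and pointers are modeled as 64-bit integers so that the result does not depend on the machine compiling it.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

typedef uint64_t ptr64_t;   /* stand-in for PVOID on x64 */

/* Illustrative mirror of the first two members of _TEB from winternl.h. */
typedef struct {
    ptr64_t Reserved1[12];
    ptr64_t ProcessEnvironmentBlock;
} teb64_prefix_t;

/* Compile-time check of the offset used with the GS segment register. */
static_assert(offsetof(teb64_prefix_t, ProcessEnvironmentBlock) == 0x60,
              "PEB pointer is expected at TEB + 60h on x64");
```

This is the 60h that appears in gs:[60h] in the assembly listings of this article.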

Process Environment Block (PEB)

The PEB, like the TEB, is an internal system structure.

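One can get the PEB base address, for example, with:
mov rax, gs:[60h] ;RAX = PEB structure
The GS base itself points to the TEB. It can be read from gs:[30h] or, where the operating system enables it, with the rdgsbase instruction; the PEB pointer then sits at offset 60h:
rdgsbase rax ;RAX = TEB structure
mov rax, qword ptr [rax + 60h] ;RAX = PEB structure
In Visual C++ the PEB base address can be retrieved with intrinsics:
//x86-64 (64-bit)
PVOID peb_x64 = (PVOID) __readgsqword(0x60);
//x86 (32-bit)
PVOID peb_x86 = (PVOID) __readfsdword(0x30);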

Structure definition from winternl.h (Visual C++):
//winternl.h
typedef struct _PEB {
    BYTE Reserved1[2];
    BYTE BeingDebugged;
    BYTE Reserved2[1];
    PVOID Reserved3[2];
    PPEB_LDR_DATA Ldr;
    PRTL_USER_PROCESS_PARAMETERS ProcessParameters;
    PVOID Reserved4[3];
    PVOID AtlThunkSListPtr;
    PVOID Reserved5;
    ULONG Reserved6;
    PVOID Reserved7;
    ULONG Reserved8;
    ULONG AtlThunkSListPtr32;
    PVOID Reserved9[45];
    BYTE Reserved10[96];
    PPS_POST_PROCESS_INIT_ROUTINE PostProcessInitRoutine;
    BYTE Reserved11[128];
    PVOID Reserved12[1];
    ULONG SessionId;
} PEB, *PPEB;

Notice the field called Ldr. This field is a pointer to a _PEB_LDR_DATA structure, which is defined as:
typedef struct _PEB_LDR_DATA {
    BYTE Reserved1[8];
    PVOID Reserved2[3];
    LIST_ENTRY InMemoryOrderModuleList;
} PEB_LDR_DATA, *PPEB_LDR_DATA;
The _PEB_LDR_DATA structure contains the field InMemoryOrderModuleList, from which one can get the kernel32.dll base address.

The _LIST_ENTRY structure is:
typedef struct _LIST_ENTRY {
    struct _LIST_ENTRY *Flink;
    struct _LIST_ENTRY *Blink;
} LIST_ENTRY, *PLIST_ENTRY, *RESTRICTED_POINTER PRLIST_ENTRY;
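
The offsets of Ldr and InMemoryOrderModuleList follow from the sizes of the fields that precede them in the declarations above. Here is a hedged C sketch of that calculation, assuming x64 layout rules (8-byte pointers aligned to 8 bytes). The type names are invented for the illustration and only the leading members of each structure are mirrored.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

typedef uint64_t ptr64_t;   /* stand-in for a 64-bit pointer */

typedef struct {
    ptr64_t Flink;
    ptr64_t Blink;
} list_entry64_t;

/* Leading members of _PEB, up to and including Ldr. */
typedef struct {
    uint8_t Reserved1[2];
    uint8_t BeingDebugged;
    uint8_t Reserved2[1];
    ptr64_t Reserved3[2];   /* aligned up to offset 8 */
    ptr64_t Ldr;
} peb64_prefix_t;

/* Mirror of _PEB_LDR_DATA. */
typedef struct {
    uint8_t Reserved1[8];
    ptr64_t Reserved2[3];
    list_entry64_t InMemoryOrderModuleList;
} peb_ldr_data64_t;

static_assert(offsetof(peb64_prefix_t, Ldr) == 0x18,
              "Ldr is expected at PEB + 18h on x64");
static_assert(offsetof(peb_ldr_data64_t, InMemoryOrderModuleList) == 0x20,
              "module list is expected at PEB_LDR_DATA + 20h on x64");
```

These are the 18h and 20h displacements used when walking the structures in assembly.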

Get Kernel32.dll Base Address

Knowing the size in bytes of the individual fields of the structures described above, it is possible to traverse them. It comes down to supplying the appropriate offset values and dereferencing memory with the ptr operator (qword ptr for 64-bit values).

;Find the kernel32.dll address
;by walking the
;ProcessEnvironmentBlock (PEB) structure

;R10 = TEB.ProcessEnvironmentBlock
mov r10, gs:[60h]

;R10 = ProcessEnvironmentBlock->Ldr
mov r10, qword ptr [r10 + 18h]

;R11 = Ldr->InMemoryOrderModuleList.Flink (1st entry)
mov r11, qword ptr [r10 + 20h]

;R10 = 1st entry->Flink (2nd entry)
mov r10, qword ptr [r11]

;R11 = 2nd entry->Flink (3rd entry)
mov r11, qword ptr [r10]

;R10 = base address stored in the 3rd entry
;R10 = kernel32.dll base address!
mov r10, qword ptr [r11 + 20h]

The above MASM x64 code returns the base address of the kernel32.dll module in the R10 register. After reaching the InMemoryOrderModuleList list, it moves to the list's third element, because the third module is the kernel32.dll we are looking for: the first module is the running executable file and the second is ntdll.dll. The final read at offset 20h from the list link fetches the module's base address (the DllBase field of the loader entry).
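
The walk can also be modeled in C. The sketch below is a simulation, not code that touches the real loader data: the two structures are simplified stand-ins whose member order follows the LDR_DATA_TABLE_ENTRY declaration in winternl.h, and the function name module_base_at is hypothetical. It illustrates why the assembly reads at +20h: the list links point into the middle of each entry, at InMemoryOrderLinks, and DllBase lies 20h bytes further on when pointers are 8 bytes wide.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for LIST_ENTRY. */
typedef struct list_entry {
    struct list_entry *Flink;
    struct list_entry *Blink;
} list_entry_t;

/* Simplified stand-in for the start of LDR_DATA_TABLE_ENTRY. */
typedef struct {
    void        *Reserved1[2];
    list_entry_t InMemoryOrderLinks;   /* list links point here */
    void        *Reserved2[2];
    void        *DllBase;
} module_entry_t;

/* Return DllBase of the module at `index` (0 = first module in the list),
   or NULL when the list has fewer entries. */
void *module_base_at(list_entry_t *head, size_t index)
{
    assert(head != NULL);

    list_entry_t *link = head->Flink;
    for (size_t i = 0; i < index && link != head; ++i)
        link = link->Flink;
    if (link == head)
        return NULL;   /* wrapped around: list is too short */

    /* Step back from the embedded links to the start of the entry. */
    module_entry_t *entry = (module_entry_t *)
        ((char *)link - offsetof(module_entry_t, InMemoryOrderLinks));
    return entry->DllBase;
}
```

With a list holding the executable, ntdll.dll, and kernel32.dll in that order, module_base_at(head, 2) corresponds to what the assembly leaves in R10.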

The application below additionally compares the base address of the kernel32.dll module found through the PEB structure with the address obtained in the usual way, by calling the LoadLibraryA function. Of course, this check is only for debugging and learning purposes, and this snippet will not be part of the final shellcode.

; Get kernel32.dll base address
; through Process Environment Block
; ethical.blue 2019

extrn ExitProcess : proc
extrn LoadLibraryA : proc
extrn MessageBoxA : proc

.code
Main proc
;Find the kernel32.dll address
;by walking the
;ProcessEnvironmentBlock (PEB) structure

;R10 = TEB.ProcessEnvironmentBlock
mov r10, gs:[60h]

;R10 = ProcessEnvironmentBlock->Ldr
mov r10, qword ptr [r10 + 18h]

;R11 = Ldr->InMemoryOrderModuleList.Flink (1st entry)
mov r11, qword ptr [r10 + 20h]

;R10 = 1st entry->Flink (2nd entry)
mov r10, qword ptr [r11]

;R11 = 2nd entry->Flink (3rd entry)
mov r11, qword ptr [r10]

;R10 = base address stored in the 3rd entry
;R10 = kernel32.dll base address!
mov r10, qword ptr [r11 + 20h]

;Correctness check below:
;compare the kernel32.dll base
;address that was found with the
;address returned by LoadLibraryA

;save the kernel32.dll
;address that was found
;on the stack
push r10

;get the kernel32.dll address
;in the usual way
;(the push above already moved RSP by 8,
;so 20h keeps the stack 16-byte aligned)
sub rsp, 20h
mov rcx, offset kernel32dll
call LoadLibraryA
add rsp, 20h

;pop the kernel32.dll address
;off the program stack
pop rcx

;compare the addresses
cmp rcx, rax
jne _bad

_good:
mov rdx, offset szFound
jmp _msgbox

_bad:
mov rdx, offset szNotFound

;display the appropriate message
_msgbox:
sub rsp, 28h
xor r9, r9
xor r8, r8
;rdx was set earlier
xor rcx, rcx
call MessageBoxA
add rsp, 28h

_exit:
sub rsp, 28h
xor rcx, rcx
call ExitProcess

kernel32dll db "kernel32.dll", 0
szFound db "kernel32.dll address found.", 0
szNotFound db "kernel32.dll address NOT found.", 0
Main endp
end

Sample Educational Win64 Shellcode Template

We have now arrived at a code template that simplifies writing shellcode. For building the code step by step, see: x64 Assembly Project in Visual Studio 2022.

;☣️ Custom payload template ☣️
;
;✔️ In R15 register you have GetProcAddress.
;✔️ In RDI register you have LoadLibraryA.
;
;📖 Windows API is at your disposal. 📖
;
;How to build executable (cmd.exe):
;ml64.exe prog1.asm /link /entry:Main /subsystem:windows

.code
    Main proc
        nop
        nop
        nop
        
        ;✂️--- CUT HERE ---✂️
        
        push rbp
        push rbx
        push rdi
        push rsi
        push rsp
        push r12
        push r13
        push r14
        push r15
        
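        ;RBX = kernel32.dll base address (PEB->Ldr, third module)
        mov r10, gs:[60h]
        mov r10, qword ptr [r10 + 18h]
        mov r11, qword ptr [r10 + 20h]
        mov r10, qword ptr [r11]
        mov r11, qword ptr [r10]
        mov rbx, qword ptr [r11 + 20h]
        ;R9 = address of the export data directory entry
        ;(PE header at [base + 3Ch], optional header at +18h, DataDirectory at +70h)
        mov r9d, dword ptr [rbx + 3Ch]
        add r9, rbx
        add r9, 18h + 70h
        ;R8 = export directory
        mov r11d, dword ptr [r9]
        lea r8, qword ptr [rbx + r11]
        ;RCX = NumberOfNames, R12 = AddressOfNames
        mov ecx, dword ptr [r8 + 18h]
        mov r12d, dword ptr [r8 + 20h]
        add r12, rbx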
        
        _search_loop:
        ;RDI = name at index RCX - 1 (valid indices are 0 to NumberOfNames - 1)
        lea r10, qword ptr [r12 + rcx * sizeof dword - sizeof dword]
        mov edi, dword ptr [r10]
        add rdi, rbx
        lea rsi, qword ptr [szGetProcAddress]
        
        _compare_str:
        cmpsb
        jne _function_not_found
        mov al, byte ptr [rsi]
        test al, al
        
        jz _function_found
        jmp _compare_str
        
        _function_not_found:
        loop _search_loop
        jmp _exit
        
        _function_found:
        ;name index -> ordinal (AddressOfNameOrdinals)
        mov r10d, dword ptr [r8 + 24h]
        add r10, rbx
        mov cx, word ptr [r10 + rcx * sizeof word - sizeof word]
        ;ordinal -> function address (AddressOfFunctions)
        mov r10d, dword ptr [r8 + 1Ch]
        add r10, rbx
        mov eax, dword ptr [r10 + rcx * sizeof dword]
        add rax, rbx
        ;R15 = GetProcAddress
        mov r15, rax
        
        ;GetProcAddress(kernel32 base address, "LoadLibraryA");
        mov rcx, "Ayra"
        push rcx
        mov rcx, "rbiLdaoL"
        push rcx
        mov rdx, rsp
        mov rcx, rbx
        sub rsp, 30h
        call rax ;call GetProcAddress
        add rsp, 30h + 10h
        ;RDI = LoadLibraryA
        mov rdi, rax
        
        ;In R15 register you have GetProcAddress.
        ;In RDI register you have LoadLibraryA.
        ;(...)
        ;Your function calls HERE.
        ;(...)
        ;Windows API is at your disposal.
        
        pop r15
        pop r14
        pop r13
        pop r12
        pop rsp
        pop rsi
        pop rdi
        pop rbx
        pop rbp
        
        jmp _exit
            szGetProcAddress db "GetProcAddress", 0
        _exit:
        ret
        
        ;✂️--- CUT HERE ---✂️
        
        nop
        nop
        nop
    Main endp
end
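
The search performed by the template can be summarized as three table lookups in the module's export directory: find the index of the wanted name, use that index to read an ordinal, and use the ordinal to read the function's address. Below is a minimal C sketch over mock tables, under simplifying assumptions: the type and function names are hypothetical, names are ordinary C strings rather than RVAs, and the scan runs forward with strcmp, whereas the template scans from the end and compares byte by byte.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of the three parallel export tables. In a real PE
   image these fields hold RVAs relative to the module base. */
typedef struct {
    uint32_t        number_of_names;
    const char    **names;       /* models AddressOfNames        */
    const uint16_t *ordinals;    /* models AddressOfNameOrdinals */
    const uint32_t *functions;   /* models AddressOfFunctions    */
} export_tables_t;

/* Return the function RVA exported under `wanted`, or 0 if not found. */
uint32_t find_export_rva(const export_tables_t *t, const char *wanted)
{
    assert(t != NULL && wanted != NULL);

    for (uint32_t i = 0; i < t->number_of_names; ++i) {
        if (strcmp(t->names[i], wanted) == 0) {
            /* The name index selects an ordinal, and the ordinal
               selects the entry in the function table. */
            uint16_t function_index = t->ordinals[i];
            return t->functions[function_index];
        }
    }
    return 0;
}
```

In the template, the value read from the function table is then added to the kernel32.dll base address (RBX) to form the final address of GetProcAddress.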

Educational Shellcode/Payload Generator

Quote:
Educational x64 shellcode/payload generator for ethical hacking. This educational application can help You learn about relocatable and injectable code. There are 4 harmless built-in payloads presented as MASM x64 Assembly source code. Program contains also Custom payload source code template and generator to obfuscate payload bytes against easy reverse engineering. Application does not contain executable or malicious files. All code examples are presented as colorized HTML source.

Get it from Microsoft: https://apps.microsoft.com/store/detail/ethical7/9NWG8CDLW3T6

Bibliography

Advanced Micro Devices Inc., 2017 – AMD64 Architecture Programmer's Manual
Intel Corporation, 2019 – Intel 64 and IA-32 Architectures Software Developer's Manual
https://docs.microsoft.com/en-us/cpp/assembler/masm/masm-for-x64-ml64-exe [access: 2020-07-28]
https://docs.microsoft.com/en-us/windows/win32/api/winternl/ns-winternl-peb [access: 2020-07-28]