pmeerw's blog

Thu, 26 Nov 2020

QEMU user-mode emulation

qemu can emulate all kinds of architectures and processors, including x86 and x86_64. It has presets for a long list of CPUs ([1], 486, pentium, Haswell, etc.).

I've tried this using qemu 4.2.1 on Ubuntu 20.04, latest is 5.1.0.

qemu does full-system emulation AND user-mode emulation. While the former allows running a wide range of operating systems on any supported architecture [2], the latter runs programs built for another Linux or BSD target on top of the native kernel.

       Full-system                     User-mode
+---------------------+         +---------------------+
| Userspace emulation |         | Userspace emulation |
+----------+----------+         +----------+----------+
           |                               |
 +---------+--------+              +-------+-------+
 | Kernel emulation |              | Kernel native |
 +---------+--------+              +-------+-------+
           |                               |
+----------+---------+            +--------+--------+
| Hardware emulation |            | Hardware native |
+--------------------+            +-----------------+

Let's compile the following simple program (hello.c):

#include <stdio.h>
int main() {
  printf("hello world %p\n", main);
  return 0;
}

Link it statically so the binary is self-contained; qemu can handle dynamically linked executables just fine as well.

To compile and link for 32-bit ARM [3]: arm-linux-gnueabihf-gcc -static -o hello-arm hello.c
For 64-bit x86: gcc -static -o hello-x86_64 hello.c

Let's check:
$ file hello-arm
hello-arm: ELF 32-bit LSB executable, ARM, EABI5 version 1 (GNU/Linux), statically linked, for GNU/Linux 3.2.0, not stripped
$ file hello-x86_64
hello-x86_64: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 3.2.0, not stripped

On Ubuntu, we need qemu-user [4], and can then execute both binaries:
$ qemu-arm -- ./hello-arm
hello world 0x10425
$ qemu-x86_64 -- ./hello-x86_64
hello world 0x401ce5
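
Which qemu-user binary fits a given executable can be read from the ELF header, the same field that file reports. The following is a minimal sketch, not part of qemu: the function names and the small mapping table are made up for illustration, and only little-endian files are handled.

```python
import struct

# e_machine values as defined by the ELF specification; the mapping to
# qemu-user binary names is an illustrative subset, not a complete table.
QEMU_USER_BY_MACHINE = {
    3: "qemu-i386",       # EM_386
    40: "qemu-arm",       # EM_ARM
    62: "qemu-x86_64",    # EM_X86_64
    183: "qemu-aarch64",  # EM_AARCH64
}

def qemu_user_for(header: bytes) -> str:
    """Return a qemu-user binary name for the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if header[5] != 1:  # EI_DATA: 1 = little-endian, 2 = big-endian
        raise ValueError("only little-endian files are handled in this sketch")
    # e_type and e_machine are 16-bit fields at offsets 16 and 18
    _e_type, e_machine = struct.unpack_from("<HH", header, 16)
    if e_machine not in QEMU_USER_BY_MACHINE:
        raise ValueError(f"no mapping for e_machine={e_machine}")
    return QEMU_USER_BY_MACHINE[e_machine]

def synthetic_header(ei_class: int, e_machine: int) -> bytes:
    """Build a minimal little-endian ELF header prefix for demonstration."""
    e_ident = b"\x7fELF" + bytes([ei_class, 1, 1]) + bytes(9)
    return e_ident + struct.pack("<HH", 2, e_machine)  # e_type 2 = ET_EXEC

if __name__ == "__main__":
    print(qemu_user_for(synthetic_header(1, 40)))  # 32-bit ARM
    print(qemu_user_for(synthetic_header(2, 62)))  # 64-bit x86
```

For a real file, the header can be obtained with open("hello-arm", "rb").read(20).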

qemu translates the input binary into code for the native CPU, even when the guest and host architectures match. The translator (TCG, the Tiny Code Generator) works on internal micro ops, an intermediate representation, which can be dumped before and after optimization [5]:
qemu-x86_64 -d op -- ./hello-x86_64
qemu-x86_64 -d op_opt -- ./hello-x86_64

For example:

 mov_i64 tmp0,r13
 mov_i64 tmp1,r13
 and_i64 cc_dst,tmp0,tmp1
 discard cc_src
 discard loc10

The input (guest) and output (host) assembler code can be dumped as well:
qemu-x86_64 -d in_asm -- ./hello-x86_64
qemu-x86_64 -d out_asm -- ./hello-x86_64

[1] qemu-x86_64 -cpu help
[2] arm, m68k, mips, mips64, ppc, sparc, sparc64, etc.
[3] apt install gcc-arm-linux-gnueabihf
[4] apt install qemu-user
[5] To show log items: qemu-x86_64 -d help

posted at: 23:45 | path: /programming | permanent link

Tue, 05 May 2020

Statically checking C/C++ for unused return values

A seemingly simple problem: check C/C++ code statically for unused return values. Surprisingly, there is no easily available tooling. Let's look at some options:

  1. C++17 has the [[nodiscard]] attribute, e.g. the following code (unused-return.cpp)
    int foo() {
      return 42;
    }

    [[nodiscard]]
    int bar() {
      return 23;
    }

    int main() {
      foo();
      bar();
      return 0;
    }
    when compiled with g++-8 unused-return.cpp, will result in
    unused-return.cpp: In function ‘int main()’:
    unused-return.cpp:12:6: warning: ignoring return value of ‘int bar()’, declared with attribute nodiscard [-Wunused-result]
    unused-return.cpp:6:5: note: declared here
     int bar() {
    (tested with GCC 8.4 / Ubuntu)

    No warning is printed for foo(); only bar(), which is annotated with [[nodiscard]], triggers one.

  2. With GCC and clang, an attribute can be added to the function declaration, e.g. unused-return.c:
    __attribute__ ((warn_unused_result))
    int bar() {
      return 23;
    }
    resulting in a warning
    unused-return.c: In function ‘main’:
    unused-return.c:12:3: warning: ignoring return value of ‘bar’, declared with attribute warn_unused_result [-Wunused-result]
    when compiled with gcc unused-return.c (GCC 8.4/Ubuntu). Enabling additional compiler warnings does not produce a similar warning for the unannotated function foo().
  3. Synopsys Coverity can be used, at least it will report a warning when the return value of a function is checked inconsistently. The tool is costly and probably a bit overkill...
  4. A linter can be used, e.g. the free splint tool (splint unused-return.c), but the output is quite verbose and it doesn't cover C++:
    Splint 3.1.2 --- 20 Feb 2018
    unused-return.c: (in function main)
    unused-return.c:11:3: Return value (type int) ignored: foo()
      Result returned by function call is not used. If this is intended, can cast
      result to (void) to eliminate message. (Use -retvalint to inhibit warning)
    unused-return.c:12:3: Return value (type int) ignored: bar()
    Finished checking --- 2 code warnings
  5. The clang-query tool can be used to more or less interactively query the AST of the program. This is explored in more detail below...

Stack Overflow provides all the basics: a clang-query script which matches call expressions in the abstract syntax tree (AST) of the program and then restricts the matches to the 'interesting cases'.

For a nice intro to clang-query, see this devblog article.

I've added the -w switch to suppress clang warnings when processing the input program, and some bind trickery to make the output a bit nicer.

# Run clang-query to report unused return values.

# When --dump, print the AST of matching syntax.
if [ "x$1" = "x--dump" ]; then
  dump="set output dump"
  shift
fi

clang-query-9 -extra-arg="-w" -c="set bind-root false" -c="$dump" -c="$query" "$@" --
The output should look like
Match #1:

unused-return.c:11:3: note: "unused-return" binds here

Match #2:

unused-return.c:12:3: note: "unused-return" binds here
2 matches.
A recent clang version is needed, tested with clang 9 / Ubuntu; clang 6 did not work.
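
The matching idea, a call expression that sits directly in statement position so that its value is dropped, can be illustrated without clang. The following sketch is an analogy only: it uses Python's own ast module on Python source, not clang-query on C/C++, and the function name discarded_calls as well as the sample program are made up for illustration.

```python
import ast

def discarded_calls(source: str):
    """Return sorted (line, column, callee) tuples for calls whose result is dropped."""
    found = []
    for node in ast.walk(ast.parse(source)):
        # An expression statement that wraps a call means nobody uses the value.
        if isinstance(node, ast.Expr) and isinstance(node.value, ast.Call):
            func = node.value.func
            name = func.id if isinstance(func, ast.Name) else ast.unparse(func)
            found.append((node.lineno, node.col_offset, name))
    return sorted(found)

SAMPLE = """\
def foo():
    return 42

def bar():
    return 23

def main():
    foo()
    x = bar()
    bar()
    return x
"""

if __name__ == "__main__":
    for line, col, name in discarded_calls(SAMPLE):
        print(f"{line}:{col}: return value ignored: {name}()")
```

As with the clang-query matcher, a real check would then filter the matches further, for example by skipping callees that return nothing.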

All files for download:

Update (2020-05-05): MSVC has _Check_return_ and _Must_inspect_result_, for good measure.
Update (2020-05-06): clang-tidy has bugprone-unused-return-value to check for unused return values of certain configured functions, such as std::async(), std::unique_ptr::release(), std::remove()
Update (2020-05-06): see reddit

posted at: 01:30 | path: /programming | permanent link

Thu, 03 Oct 2019

Windows RtlAddFunctionTable, the missing documentation

Microsoft Windows offers the RtlAddFunctionTable function which "adds a dynamic function table to the dynamic function table list."

  BOOLEAN RtlAddFunctionTable(
    PRUNTIME_FUNCTION FunctionTable,
    DWORD EntryCount,
    DWORD64 BaseAddress
  );

A function table associates unwind information with functions in order to help 64-bit Windows unwind the call stack, and it is also used to register an exception handler for a function. For an executable image (PE), the compiler and linker normally put the function table into the .pdata section. If code is generated at runtime (e.g. by a just-in-time (JIT) compiler) and stack unwinding or exception handling is required, a new function table can be added using the RtlAddFunctionTable API.

A process may be composed of several modules (one executable and zero or more DLLs); each module can provide a function table. In addition, there is a list of dynamic function tables, managed by RtlAddFunctionTable and RtlDeleteFunctionTable. A function table is relative to an (image) base address, as it contains RVAs, relative virtual addresses.

Observations (Windows 10):

  1. RtlAddFunctionTable succeeds, even if the BaseAddress coincides with an image with a static function table.
  2. A static function table (.pdata) takes precedence over a dynamic function table for the entire image size.
  3. Consequently, no runtime info (i.e. exception handler) can be added to a function loaded from an image.

The RtlLookupFunctionEntry function can be used to find the function table entry corresponding to a code address (i.e. program counter, PC, or instruction pointer, IP):

  PRUNTIME_FUNCTION RtlLookupFunctionEntry(
    DWORD64 ControlPc,
    PDWORD64 ImageBase,
    PUNWIND_HISTORY_TABLE HistoryTable
  );

Observations (Windows 10):

  1. ImageBase is an output parameter; it is unmodified if the lookup fails.
  2. HistoryTable can be NULL; it can be used to speed up the lookup and should initially point to a zero-initialized HistoryTable.
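
The lookup behaviour observed above can be modelled in a few lines of Python. This is a toy model of the observations, not the Windows implementation: all class and function names, the image size and the end RVA of main are made up for illustration, and treating the end RVA as exclusive is an assumption.

```python
from typing import List, NamedTuple, Tuple

class RuntimeFunction(NamedTuple):
    begin: int   # RVA of the first byte of the function
    end: int     # RVA just past the function (assumed exclusive)
    unwind: int  # RVA of the unwind info

class Image(NamedTuple):
    base: int
    size: int
    table: List[RuntimeFunction]  # static function table, i.e. .pdata

Dynamic = Tuple[int, List[RuntimeFunction]]  # (base address, table)

def search(base, table, pc):
    """Find the entry covering pc; RVAs are relative to base."""
    rva = pc - base
    for entry in table:
        if entry.begin <= rva < entry.end:
            return entry, base
    return None

def lookup(pc, images, dynamic):
    """Model of the observed precedence: static tables win for the whole image."""
    for image in images:
        if image.base <= pc < image.base + image.size:
            return search(image.base, image.table, pc)
    for base, table in dynamic:
        hit = search(base, table, pc)
        if hit is not None:
            return hit
    return None

# Hypothetical executable image with one entry for main.
EXE = Image(0x140000000, 0x20000, [RuntimeFunction(0x1050, 0x1200, 0x1E000)])
# Dynamic table for runtime-generated code at 0x20000.
JIT = (0x20000, [RuntimeFunction(0x0, 0x9, 0x100C)])
# Dynamic table whose base coincides with the image: registered, never used.
SHADOW = (0x140000000, [RuntimeFunction(0x3000, 0x3010, 0x3100)])

if __name__ == "__main__":
    print(lookup(0x20005, [EXE], [JIT, SHADOW]))
    print(lookup(0x140003004, [EXE], [JIT, SHADOW]))
```

The second lookup returns None although SHADOW covers the address, which mirrors observation 3 of the first list: nothing can be attached at runtime to code that lives inside an image.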

// sample code to demonstrate Windows RtlAddFunctionTable API
// (c) 2019 Peter Meerwald-Stadler,
// 64-bit only, compile with: cl rtlft.c /link /fixed

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#include "windows.h"

typedef uint8_t UBYTE;
typedef uint16_t USHORT;

typedef union _UNWIND_CODE {
	struct {
		UBYTE CodeOffset;
		UBYTE UnwindOp : 4;
		UBYTE OpInfo   : 4;
	};
	USHORT FrameOffset;
} UNWIND_CODE;

typedef struct _UNWIND_INFO {
	UBYTE Version       : 3;
	UBYTE Flags         : 5;
	UBYTE SizeOfProlog;
	UBYTE CountOfCodes;
	UBYTE FrameRegister : 4;
	UBYTE FrameOffset   : 4;
	UNWIND_CODE UnwindCode[1];
/*	UNWIND_CODE MoreUnwindCode[((CountOfCodes + 1) & ~1) - 1];
 *	OPTIONAL ULONG ExceptionHandler;
 *	OPTIONAL ULONG ExceptionData[]; */
} UNWIND_INFO;

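typedef struct {
	uint8_t code[0x1000];
	RUNTIME_FUNCTION function_table[1];
	UNWIND_INFO unwind_info[1];
} DYNSECTION; // type name assumed, it is missing from the listing

static EXCEPTION_DISPOSITION handler(PEXCEPTION_RECORD ExceptionRecord, ULONG64 EstablisherFrame, PCONTEXT ContextRecord, PDISPATCHER_CONTEXT DispatcherContext) {
	ContextRecord->Rip += 3; // skip the faulting 3-byte instruction
	return ExceptionContinueExecution;
}

int main() {
	int ret;
	// The allocation is missing from the listing; an executable and
	// writable VirtualAlloc mapping is an assumption.
	DYNSECTION *dynsection = (DYNSECTION *) VirtualAlloc(NULL, sizeof(DYNSECTION), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
	if (dynsection == NULL) {
		printf("VirtualAlloc() failed, exit.\n");
		exit(EXIT_FAILURE);
	}

	uint8_t *code = dynsection->code;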
	size_t p = 0;
	code[p++] = 0xb8; // mov eax, 42 (zero-extends into rax)
	code[p++] = 0x2a;
	code[p++] = 0x00;
	code[p++] = 0x00;
	code[p++] = 0x00;
	code[p++] = 0xc6; // mov byte [rax], 0  -- raises exception!
	code[p++] = 0x00;
	code[p++] = 0x00;
	code[p++] = 0xc3; // ret

	size_t trampoline = p;
	code[p++] = 0x48; // mov rax, imm64 (handler address, patched below)
	code[p++] = 0xb8;
	size_t patch_handler_address = p;
	code[p++] = 0x00; // address to handler patched here
	code[p++] = 0x00;
	code[p++] = 0x00;
	code[p++] = 0x00;
	code[p++] = 0x00;
	code[p++] = 0x00;
	code[p++] = 0x00;
	code[p++] = 0x00;
	code[p++] = 0xff; // jmp rax
	code[p++] = 0xe0;
	PRUNTIME_FUNCTION q;
	DWORD64 dyn_base = 0;
	q = RtlLookupFunctionEntry((DWORD64) code, &dyn_base, NULL);
	printf("lookup 'code' %p %llx\n", q, dyn_base); // no function table entry

	DWORD64 image_base = 0;
	q = RtlLookupFunctionEntry((DWORD64) main, &image_base, NULL);
	printf("lookup 'main' %p %llx\n", q, image_base); // there is a function table entry

	dyn_base = (DWORD64) dynsection;
	UNWIND_INFO *unwind_info = dynsection->unwind_info;
	unwind_info[0].Version = 1;
	unwind_info[0].Flags = UNW_FLAG_EHANDLER;
	unwind_info[0].SizeOfProlog = 0;
	unwind_info[0].CountOfCodes = 0;
	unwind_info[0].FrameRegister = 0;
	unwind_info[0].FrameOffset = 0;
	*(DWORD *) &unwind_info[0].UnwindCode = (DWORD64) &code[trampoline] - dyn_base;

	RUNTIME_FUNCTION *function_table = dynsection->function_table;
	function_table[0].BeginAddress = (DWORD64) &code[0] - dyn_base; // set RVA of dynamic code start
	function_table[0].EndAddress = (DWORD64) &code[trampoline] - dyn_base; // RVA of dynamic code end
	function_table[0].UnwindInfoAddress = (DWORD64) unwind_info - dyn_base; // RVA of unwind info

	*(DWORD64 *) &code[patch_handler_address] = (DWORD64) handler; // VA of handler

	printf("main VA %016llx\n", (DWORD64) main);	
	printf("code VA %016llx\n", (DWORD64) code);
	printf("function table VA %016llx\n", (DWORD64) function_table);
	printf("unwind info VA %016llx\n", (DWORD64) unwind_info);
	printf("handler VA %016llx\n", (DWORD64) handler);
	printf("RUNTIME_FUNCTION begin RVA %08x, end RVA %08x, unwind RVA %08x\n",
		function_table[0].BeginAddress, function_table[0].EndAddress,
		function_table[0].UnwindInfoAddress);
	printf("UNWIND_INFO handler RVA %08x\n", *(DWORD *) &unwind_info[0].UnwindCode);

	if (!RtlAddFunctionTable(function_table, 1, dyn_base)) {
		printf("RtlAddFunctionTable() failed, exit.\n");
		exit(EXIT_FAILURE);
	}

	q = RtlLookupFunctionEntry((DWORD64) code, &dyn_base, NULL);
	printf("lookup 'code' %p %llx\n", q, dyn_base); // should return address of function table entry

	uint64_t (*call)() = (uint64_t (*)()) code;
	uint64_t result = (*call)();
	printf("result = %llx\n", result);	

	if (!RtlDeleteFunctionTable(function_table)) {
		printf("RtlDeleteFunctionTable() failed, exit.\n");
		exit(EXIT_FAILURE);
	}

	return EXIT_SUCCESS;
}

The output should show that the exception has been handled and look similar to:
C:\>cl rtlft.c /link /fixed
lookup 'code' 0000000000000000 0
lookup 'main' 000000014001F00C 140000000
main VA 0000000140001050
code VA 0000000000020000
function table VA 0000000000021000
unwind info VA 000000000002100c
handler VA 0000000140001000
RUNTIME_FUNCTION begin RVA 00000000, end RVA 00000009, unwind RVA 0000100c
UNWIND_INFO handler RVA 00000009
lookup 'code' 0000000000021000 20000
result = 2a
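
The addresses and RVAs in this output follow directly from the layout of the dynamic section. As a cross-check, here is a small Python sketch that mirrors the layout with ctypes and packs the UNWIND_INFO header by hand. The class and function names are made up for illustration, the unwind info is modelled as 8 raw bytes, and the bit-field packing (Version in the low 3 bits) reflects the documented x64 unwind format.

```python
import ctypes
import struct

UNW_FLAG_EHANDLER = 0x1  # value from the Windows SDK headers

class RuntimeFunction(ctypes.Structure):
    _fields_ = [
        ("BeginAddress", ctypes.c_uint32),
        ("EndAddress", ctypes.c_uint32),
        ("UnwindInfoAddress", ctypes.c_uint32),
    ]

class DynSection(ctypes.Structure):
    _fields_ = [
        ("code", ctypes.c_uint8 * 0x1000),
        ("function_table", RuntimeFunction * 1),
        ("unwind_info", ctypes.c_uint8 * 8),  # 4 header bytes + handler RVA
    ]

def describe(base: int, trampoline: int) -> dict:
    """Compute the VAs and RVAs printed by the C sample for a given base."""
    return {
        "function table VA": base + DynSection.function_table.offset,
        "unwind info VA": base + DynSection.unwind_info.offset,
        "begin RVA": DynSection.code.offset,
        "end RVA": DynSection.code.offset + trampoline,
        "unwind RVA": DynSection.unwind_info.offset,
    }

def pack_unwind_info(handler_rva: int, version: int = 1,
                     flags: int = UNW_FLAG_EHANDLER) -> bytes:
    """Pack an UNWIND_INFO with no unwind codes, followed by the handler RVA."""
    first = (version & 0x7) | ((flags & 0x1F) << 3)
    # SizeOfProlog, CountOfCodes and FrameRegister/FrameOffset are all zero.
    return struct.pack("<BBBBI", first, 0, 0, 0, handler_rva)

if __name__ == "__main__":
    for name, value in describe(0x20000, 9).items():
        print(f"{name}: {value:#x}")
    print(pack_unwind_info(9).hex())
```

With base 0x20000 and the trampoline at offset 9 this reproduces the function table VA, the unwind info VA and the three RVAs shown above.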

The story here is related to how C++ exceptions work on Windows (video)...

posted at: 23:58 | path: /programming | permanent link

Sun, 16 Jun 2019

Assembler at your fingertips: rappel

rappel is a Linux-based assembly REPL (read-eval-print loop) supporting Intel syntax. Quite handy to try out various instructions:

rax: 0x0000000000000001	rbx: 0x0000000000000002	rcx: 0x0000000000000000	rdx: 0x0000000000000000
rsi: 0x0000000000000000	rdi: 0x0000000000000000	r8 : 0x0000000000000000	r9 : 0x0000000000000000
r10: 0x0000000000000000	r11: 0x0000000000000000	r12: 0x0000000000000000	r13: 0x0000000000000000
r14: 0x0000000000000000	r15: 0x0000000000000000
rip: 0x0000000000400006	rsp: 0x00007fffd64d8f10	rbp: 0x0000000000000000
flags: 0x0000000000000202 [cf:0, zf:0, of:0, sf:0, pf:0, af:0, df:0]
> add eax,ebx
rax: 0x0000000000000003	rbx: 0x0000000000000002	rcx: 0x0000000000000000	rdx: 0x0000000000000000
rsi: 0x0000000000000000	rdi: 0x0000000000000000	r8 : 0x0000000000000000	r9 : 0x0000000000000000
r10: 0x0000000000000000	r11: 0x0000000000000000	r12: 0x0000000000000000	r13: 0x0000000000000000
r14: 0x0000000000000000	r15: 0x0000000000000000
rip: 0x0000000000400003	rsp: 0x00007fffd64d8f10	rbp: 0x0000000000000000
flags: 0x0000000000000206 [cf:0, zf:0, of:0, sf:0, pf:1, af:0, df:0]

Under the hood, it just runs nasm and observes register values. FP/XMM is supported as well...
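
The change of the flags register from 0x202 to 0x206 in the session above is the parity flag: the result 3 has an even number of set bits in its low byte. A small Python model of a 32-bit add makes this explicit. It is only a sketch for illustration (add32 is a made-up name, and it is not how rappel works); the flag bit positions are those of the x86 flags register.

```python
CF, PF, AF, ZF, SF, OF = 0x1, 0x4, 0x10, 0x40, 0x80, 0x800

def add32(a: int, b: int, flags: int = 0x202):
    """Model 'add eax, ebx': return (result, flags) for 32-bit operands."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    full = a + b
    result = full & 0xFFFFFFFF
    flags &= ~(CF | PF | AF | ZF | SF | OF)  # status flags are recomputed
    if full >> 32:
        flags |= CF                          # carry out of bit 31
    if bin(result & 0xFF).count("1") % 2 == 0:
        flags |= PF                          # even parity of the low byte
    if (a ^ b ^ result) & 0x10:
        flags |= AF                          # carry out of bit 3
    if result == 0:
        flags |= ZF
    if result & 0x80000000:
        flags |= SF
    # Signed overflow: operands share a sign that differs from the result's.
    if (~(a ^ b) & (a ^ result)) & 0x80000000:
        flags |= OF
    return result, flags

if __name__ == "__main__":
    result, flags = add32(1, 2)
    print(f"rax: {result:#018x} flags: {flags:#018x}")
```

Bit 1 (always set) and the interrupt flag 0x200 are carried over from the initial value, which is why the register reads 0x202 before any arithmetic.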

posted at: 23:00 | path: /programming | permanent link

Sat, 04 May 2019

Rewriting Windows binary with Python's pefile

This script uses the Python module pefile to rewrite a Windows PE/PE+ file (I think both 32-bit and 64-bit are supported; only 64-bit was tested). The goal is to append a new section, point the executable's entry point at the new section, and jump from there back to the original entry point (OEP).

#!/usr/bin/env python3

import pefile

def adjust_SectionSize(sz, align):
  if sz % align: sz = ((sz + align) // align) * align
  return sz

pe = pefile.PE('../hello.exe')

last_section = pe.sections[-1]

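new_section = pefile.SectionStructure(pe.__IMAGE_SECTION_HEADER_format__)

# fill with zeros (statement assumed, it is missing from the listing)
new_section.__unpack__(bytearray(new_section.sizeof()))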

# place section header after last section header (assume there is enough room)
new_section.set_file_offset(last_section.get_file_offset() + last_section.sizeof())

new_section.Name = b'.test'
new_section_size = 100

new_section.SizeOfRawData = adjust_SectionSize(new_section_size, pe.OPTIONAL_HEADER.FileAlignment)
new_section.PointerToRawData = len(pe.__data__)

new_section.Misc = new_section.Misc_PhysicalAddress = new_section.Misc_VirtualSize = new_section_size
new_section.VirtualAddress = last_section.VirtualAddress + adjust_SectionSize(last_section.Misc_VirtualSize, pe.OPTIONAL_HEADER.SectionAlignment)

new_section.Characteristics = 0x40000000 | 0x20000000 | 0x20 # read | execute | code

# create new section data containing jump to OEP
reljmp = pe.OPTIONAL_HEADER.AddressOfEntryPoint - (new_section.VirtualAddress + 5)
print('rel jmp %08x' % (reljmp))
new_section_data = bytearray(new_section.SizeOfRawData)
new_section_data[0] = 0xe9  # jmp rel32
# the displacement is 4 bytes long, so the slice must cover bytes 1..4
new_section_data[1:5] = reljmp.to_bytes(4, byteorder='little', signed=True)

# change address of entry point to beginning of new section
pe.OPTIONAL_HEADER.AddressOfEntryPoint = new_section.VirtualAddress

# increase size of image
pe.OPTIONAL_HEADER.SizeOfImage += adjust_SectionSize(new_section_size, pe.OPTIONAL_HEADER.SectionAlignment)

# increase number of sections
pe.FILE_HEADER.NumberOfSections += 1

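# append new section to structures
# (statements assumed, they are missing from the listing)
pe.sections.append(new_section)
pe.__structures__.append(new_section)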
# add new section data to file
pe.__data__ = bytearray(pe.__data__) + new_section_data
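
The two calculations the script relies on, rounding a size up to an alignment and encoding the relative jump back to the OEP, can be tried in isolation. The following is a minimal sketch that does not need pefile; the function names and the example RVAs are made up for illustration.

```python
def align_up(size: int, alignment: int) -> int:
    """Round size up to the next multiple of alignment."""
    return -(-size // alignment) * alignment

def jmp_rel32(source_rva: int, target_rva: int) -> bytes:
    """Encode 'jmp rel32' placed at source_rva that lands on target_rva."""
    # The displacement is relative to the end of the 5-byte instruction.
    rel = target_rva - (source_rva + 5)
    return b"\xe9" + rel.to_bytes(4, byteorder="little", signed=True)

def jmp_target(source_rva: int, code: bytes) -> int:
    """Decode the target RVA of a 'jmp rel32' located at source_rva."""
    if len(code) < 5 or code[0] != 0xE9:
        raise ValueError("not a jmp rel32")
    rel = int.from_bytes(code[1:5], byteorder="little", signed=True)
    return source_rva + 5 + rel

if __name__ == "__main__":
    # Hypothetical values: new section at RVA 0x7000, OEP at RVA 0x1320.
    code = jmp_rel32(0x7000, 0x1320)
    print(code.hex(), hex(jmp_target(0x7000, code)))
    print(hex(align_up(100, 0x200)), hex(align_up(0x1001, 0x1000)))
```

Because the displacement is relative, the same five bytes work regardless of where the image is loaded, so no relocation entry is needed for the jump.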


posted at: 10:33 | path: /programming | permanent link
