# ASLR-Guard
					Presented @ Computer and Communications Security 2015
					
					
					A general prerequisite for a code-reuse attack
					is to locate the code you want to reuse.
					By randomizing the location of code
					at each execution, it becomes harder for
					an attacker to precompute a payload.
					Theoretically, it becomes impossible.
					
					
					What's been shown by research in the last
					year is that attackers can sometimes
					ignore the randomization.
					
					
					Say you have some vulnerability
					that allows attackers to read data
					from the stack.
					With the standard x86 calling convention,
					the return address of a procedure
					is pushed onto the stack by `call`.
					`[ ... | stored_rip | stored_rbp | ... ]`
					`<<<<<<<<^~~~~~~~~~<<<<<<<<<<<<<<<<<<<<<`
					
					
					Given that the attacker knows `WHERE` the procedure
					is supposed to return, a successful read
					from the stack reveals the location
					of `WHERE` in memory.
					This gives the attacker
					a strong clue about the base address
					of the module that contains the `WHERE` instruction.
						base_address = &WHERE - offset_of_WHERE
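					A minimal sketch of that arithmetic in `python`,
					with made-up numbers (not taken from any real binary):
						# hypothetical values: a return address leaked from the stack, and the
						# offset of that return site inside its module (known from the binary itself)
						leaked_return_address = 0x7f3a12c45f2a
						offset_of_WHERE       = 0x45f2a
						base_address = leaked_return_address - offset_of_WHERE
						print(hex(base_address))   # 0x7f3a12c00000 -- any other code address is now base + known offset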
					
					
					Hacker's jargon gives this kind
					of vulnerability the name *information leak*.
					That is, attackers obtain code location
					information **after** ASLR has done its job.
					While this can be technically
					challenging for mortals, it's exploitable in theory
					-- and kickass CTFers did it for real.
					
					
					The paper tries to come up with
					a solution for this kind of vulnerability
					by instrumenting code to behave in a
					more secure way.
					Leaks of return addresses are not
					the only dangerous ones.
					Indeed, the paper tries to prevent
					all potential leaks of *code locators*.
					
					
					As defined by the paper, **code locator**
					is
					> "... any pointer or data that can
					be used to infer code addresses."
					
					
					*Generating* a code locator means
					somehow *building* it, then storing
					it in a register. From that point on,
					it might be saved in memory, *e.g.*
					used in a variable assignment.
					Lu *et al.* tried to categorize them all,
					then came up with different strategies
					to protect them.
					
					
					4 different categories have been defined,
					based on the program's life cycle.
					+ **load-time**
					+ **runtime**
					+ **OS-injected** 
					+ **correlated**
					
					
					The *correlated* category
					represents any information about the position of data in memory,
					where the data sits at a known, fixed offset from code.
					Assuming the attacker knows how to correctly perform addition/subtraction,
					that information is dangerous too if leaked (from a defender's perspective).
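					A minimal sketch of that arithmetic, again with
					made-up numbers: a leaked pointer to a global
					variable plus the fixed code-to-data delta
					yields a code address.
						# hypothetical values: a leaked address of a global variable, and the fixed
						# (non-randomized) distance between that variable and a target function
						leaked_global_addr   = 0x7f3a12e01040
						delta_to_target_func = -0x201040
						target_func_addr = leaked_global_addr + delta_to_target_func
						print(hex(target_func_addr))   # 0x7f3a12c00000 -- a code address, from a data leak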
					
					
					A `python` tool (~1K lines) has been developed
					to analyse memory *before* and *after* specified hooks.
					For example, *before* and *after* syscall 42: if, while
					analysing memory *after* the syscall, some 8-byte chunk
					is found pointing to a byte
					inside an executable segment of memory, we now know that syscall 42
					generates a code locator and injects it somewhere in memory.
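					A minimal `python` sketch of that core check
					(not the paper's tool): scan a dump 8 bytes at a
					time and flag any value falling inside an
					executable mapping.
						import struct

						def find_code_locators(dump, dump_base, exec_ranges):
							# dump: bytes of a memory region; dump_base: its virtual address;
							# exec_ranges: (start, end) pairs of executable mappings (e.g. from /proc/<pid>/maps)
							hits = []
							for off in range(0, len(dump) - 7, 8):
								value = struct.unpack_from("<Q", dump, off)[0]
								if any(start <= value < end for start, end in exec_ranges):
									hits.append((dump_base + off, value))   # a code locator sitting in memory
							return hits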
					
					
					This memory analysis tool has been used
					both to understand how the kernel injects
					code locators into the process's address space,
					and to validate the *static* deductions
					about how the rest of the code locators are generated
					(deductions made by reading the source code of
					`ld`, `as`, `cc` and `ld.so`).
					
					
					## How to catch code locators for different categories?
					
					
					## Load-time 'locators
					Any code locator generated at load time
					relies on relocation information, assuming
					ASLR is active.
					Indeed, code locators generated at load time
					depend on a state known at load time only.

					That is, before being used they must be relocated.
					By hooking the loader's relocation procedure,
					any code locator generated at load time can
					be checked and properly protected.
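					A rough `python` sketch of the idea (the
					relocation records and the callback below are
					invented for illustration; the real hook lives
					inside `ld.so`): whenever a relocated value lands
					inside a code segment, protect it instead of
					writing the plaintext pointer.
						def apply_relocations(relocations, load_base, code_ranges, encrypt):
							# relocations: (target, addend) pairs relative to load_base
							# load_base:   randomized base chosen by ASLR for this module
							# code_ranges: (start, end) ranges of executable segments after loading
							# encrypt:     callback that hides a code locator (e.g. AG-RandMap encryption)
							memory = {}
							for target, addend in relocations:
								value = load_base + addend
								in_code = any(start <= value < end for start, end in code_ranges)
								# a relocated value pointing into code is a load-time code locator:
								# store its protected form instead of the plaintext pointer
								memory[load_base + target] = encrypt(value) if in_code else value
							return memory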
					
					
					## Runtime 'locators
					Any `call`, which `mov`es the current `%rip`
					onto the stack, generates a code locator
					(leaking the position of code *to* the stack).
					
					
					## Runtime 'locators
					`lea {offset}(%rip), ...`
					
					used, for example, when loading a pointer
					with the address of a local function
						def fn():
							def g(): pass
							ptr = g  # taking g's address materializes a code locator
					
					
					## Runtime 'locators
					With `{set,long}jmp`, a
					code locator is saved into the
					`jmp_buf` (by `setjmp`), then dereferenced
					(by `longjmp`).
					
					(`goto`?, `try/catch`?)
					
					
					## OS-injected
					Apparently, the program entry point
					is pushed onto the stack by the kernel
					(in the ELF auxiliary vector).
					Also, the whole register context
					(`%rip` included)
					is saved in the process address space
					when a signal is delivered.
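					A small `python` sketch of the first point, on
					Linux: the entry point really is exposed in
					process memory via the auxiliary vector (this is
					just an observation aid, not part of ASLR-Guard).
						import struct

						AT_ENTRY = 9   # auxv tag for the program entry point

						def program_entry_point(pid="self"):
							with open(f"/proc/{pid}/auxv", "rb") as f:
								data = f.read()
							# the auxv is a sequence of (tag, value) machine-word pairs
							for off in range(0, len(data), 16):
								tag, value = struct.unpack_from("<QQ", data, off)
								if tag == AT_ENTRY:
									return value
							return None

						print(hex(program_entry_point() or 0))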
					
					
					# How to protect each category?
					
					
					## Correlated
					When code and data sections
					are mapped in the same segment,
					there might be logic, in code,
					that accesses data using an offset
					(possibly from the current value of `%rip`).
					That means that even leaks of
					data positions in memory can be
					dangerous.
					Randomizing the sections independently makes
					the offset between code and data sections
					random, and known at load time only.
					
					
					## OS-injected
					Use two stacks.
					One, whose top is stored
					as usual in `%rsp`, is the *AG-Stack*.
					In `%r15`, the top of a second
					stack is kept.
					The AG-Stack is used for storing
					sensitive information (*return addresses*)
					and other data pushed by the kernel after
					a syscall or a handled signal.
					The other, unsafe, stack is used
					for general program data.
					
					
					Since the AG-Stack never contains
					program data (parameters or variables),
					there won't be code referencing it.
					Its location is randomized too,
					and its address never leaves `%rsp`.
					
					
					## Runtime
					Return addresses are stored in the AG-Stack.
					Code locators generated by the *GetPC*
					or *GetRet* sets of instructions are
					**encrypted** instead.
					
					
					# Encryption of code locators.
					
					
					When a code locator is hard to keep
					isolated in safe memory, and instead
					must live in unsafe memory, it is
					*somehow* encrypted.
					
					
					This way, even if the attacker
					succeeds in reading unsafe memory, she
					will only read the encrypted version
					of the code locator.
					
					
					## How.
					A table is used: *AG-RandMap*.
					Each entry is a 16-byte chunk,
					consisting of
					`[ code locator ] [ ... 0 ... ] [ nonce ]`
					
					
					When a code locator needs to be encrypted,
					a random nonce with 32 bits of entropy
					is generated.
					The code locator and four zero bytes are prepended
					to the nonce, and the result is inserted into the AG-RandMap
					table at an offset generated on the fly,
					with 32 bits of entropy too.
					The encrypted code locator returned is
					an 8-byte chunk consisting of
					`[ random offset ] [ nonce ]`
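					A minimal `python` model of the scheme (a dict
					stands in for the `%gs`-based table; alignment
					and exact entropy are simplified, and this is
					not the paper's code):
						import os, struct

						AG_RANDMAP = {}   # offset -> 16-byte entry; stands in for the %gs-based table

						def encrypt_locator(code_locator):
							nonce  = int.from_bytes(os.urandom(4), "little")          # 32-bit random nonce
							offset = int.from_bytes(os.urandom(4), "little") & ~0xf   # random 16-byte-aligned slot
							# entry layout in memory: [ 8-byte locator ][ 4 zero bytes ][ 4-byte nonce ]
							AG_RANDMAP[offset] = struct.pack("<QII", code_locator, 0, nonce)
							# what is handed back to unsafe memory: [ 4-byte offset ][ 4-byte nonce ]
							return struct.pack("<II", offset, nonce)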
					
					
					Whenever an encrypted code locator
					is used, assuming it's stored
					in `%rax` and the base address
					of the AG-RandMap is in `%gs`,
					it can be decrypted via
						...
						xor  %gs:8(%eax), %rax   # xor with the entry's second qword: [ 0 | nonce ]
						call *%gs:(%rax)         # %rax now holds the plain offset; fetch and call the locator
						...
					(the 32-bit `%eax` in the first operand means
					only the offset half of the encrypted value
					is used to index the table).
					At the end of the decryption "*routine*",
					`%rax` will contain the correct offset
					to fetch the right code locator
					only if the nonce is the same one generated
					during the encryption routine.
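					Continuing the model above, the same
					lookup-and-xor in `python`:
						def decrypt_locator(enc):
							e = int.from_bytes(enc, "little")   # encrypted chunk as an integer (offset in the low half)
							offset = e & 0xffffffff             # the 32-bit %eax truncation
							second_qword = int.from_bytes(AG_RANDMAP[offset][8:16], "little")   # [ 0 | nonce ]
							index = e ^ second_qword            # equals `offset` only if the nonces match
							return struct.unpack_from("<Q", AG_RANDMAP[index])[0]   # otherwise KeyError ~ crash

						enc = encrypt_locator(0x7f3a12c45f2a)
						assert decrypt_locator(enc) == 0x7f3a12c45f2a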
					
					
					The rest of the code locators are stored
					in plaintext in an isolated data structure
					called the *safe vault*, which is guaranteed
					to remain isolated by randomizing its
					base address and never saving it in memory:
					any reference to it is handled
					in registers only.
					
					
					## How is this accomplished?
					Via a static toolchain and a dynamic loader.
					That means that, for a binary to be hardened
					by this technique, the source code of the program,
					as well as of all the loaded modules, must be
					available.
					
					
					### Compiler
					+ Reserve the `%r15` register
					for the regular/unsafe stack.
					+ Prefer `mov` instructions
					to `push/pop`, `enter/leave`
					-- to avoid `%rsp` modification.
					
					
					### Assembler
					+ Append the encryption routine
					right after a code locator is
					generated by one (or a set of)
					instruction(s).
					+ Prepend the decryption routine
					when dereferencing encrypted
					code locators.
					
					
					### Static linker
					+ Strip encryption/decryption
					routines when addresses are
					generated to access data.
					Indeed, the assembler appends the encryption
					routine conservatively. The linker, however,
					knows what is supposed to be
					code and what is data; thus it can strip
					the routines that aren't needed.
					
					
					### Dynamic loader
					+ Initialize the stack(s!), allocate
					space for the random mapping table,
					and isolate it (i.e. randomize its base
					address and store it in the `%gs` segment register).
					+ Encrypt all code locators generated
					at load time by hooking the relocation routine.
					
					
					
					From a theoretical point of view,
					if the target is a binary compiled
					with the ASLR-Guard toolchain, and
					all the loaded modules are as well,
					what is the chance of success for
					an attacker to hijack the control
					flow to an address `x`?
					She could either rewrite the
					content of the safe vault -- but
					she needs to locate it first, with
					a chance of `2**-28` --
					or she could rewrite an encrypted
					code locator -- assuming at least
					one entry for `x` exists in the random
					mapping table; yet she needs to find
					the correct nonce, with a chance
					of success `<= 2**-32`.
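					Just to put numbers on those bounds
					(plain arithmetic, not from the paper's evaluation):
						p_locate_vault = 2**-28   # chance of guessing the safe vault's location
						p_guess_nonce  = 2**-32   # chance of guessing the nonce of an encrypted locator
						print(f"{p_locate_vault:.2e}", f"{p_guess_nonce:.2e}")   # 3.73e-09 2.33e-10
						print(max(p_locate_vault, p_guess_nonce) == 2**-28)      # the overall bound: True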
					
					
					In both cases, the chance of success
					is `<= 2**-28`.
					That means an ASLR-Guard-instrumented
					binary should provide at least the same
					security that *plain* ASLR provides.
					
					
					Empirically, the memory analysis tool
					is used one more time, hooking
					the program's entry/exit points and
					right after every syscall. The whole
					memory is dumped there.
					The entire software suite of the
					SPEC CPU2006 benchmark is used, with
					the following results:
					+ Not a single *plaintext* code locator
					is left in unsafe memory.
					+ Encrypted code locators are less than `10%`
					for most programs; for many of them `~20`.
					
					
					### `nginx v1.4.0`
					Since a spawned nginx worker, if it crashes,
					won't cause the entire server to crash,
					a buffer overflow vulnerability can be used
					to repeatedly rewrite the return address
					until the correct one is found,
					hence obtaining a code locator *after*
					ASLR has done its randomization.
					`BROP` is a tool that automatically
					exploited `nginx v1.4.0` this way.
					After rebuilding `nginx` with
					the ASLR-Guard toolchain, `BROP`
					fails to exploit nginx. Indeed,
					the return address isn't even present
					on the stack BROP is reading!
					
					
					### Performance
					Averaging 10 executions
					of the SPEC benchmark
					programs, an overhead of less than `1%`
					has been measured as far as run time
					is concerned.
					Building the software takes longer too,
					with an overhead of `~6%`, while loading
					is still very fast, `~1μs`.
					File size grows by `6%` on average,
					while memory use is `~2MB` larger,
					due to the structures kept
					in memory that are not loaded
					for a *non*-hardened binary.