# ASLR-Guard
					Presented @ Computer and Communications Security 2015
					
					
					A general prerequisite for a code-reuse attack
					is to locate the code you want to reuse.
					By randomizing the location of code
					at each execution, it becomes harder for
					an attacker to precompute a payload.
					Theoretically, it becomes impossible.
					
					
					What's been shown by research in the last
					year is that attackers can sometimes
					ignore the randomization.
					
					
					Say you have some vulnerability
					that allows attackers to read data
					from the stack.
					With the standard x86 calling convention,
					the return address of a procedure
					is pushed onto the stack by `call`.
					`[ ... | stored_rip | stored_rbp | ... ]`
					`<<<<<<<<^~~~~~~~~~<<<<<<<<<<<<<<<<<<<<<`
					
					
					Given that the attacker knows `WHERE` the procedure
					is supposed to return, a successful read
					from the stack reveals the location
					of `WHERE` in memory.
					This gives the attacker
					a strong clue about the base address
					of the module that contains the `WHERE` instruction.
						base_address = &WHERE - offset_of_WHERE
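					A minimal sketch of that arithmetic in `python`,
					with made-up numbers (not taken from any real binary):
						# hypothetical values: a return address leaked from the stack, and the
						# offset of that return site inside its module (known from the binary itself)
						leaked_return_address = 0x7f3a12c45f2a
						offset_of_WHERE       = 0x45f2a
						base_address = leaked_return_address - offset_of_WHERE
						print(hex(base_address))   # 0x7f3a12c00000 -- any other code address is now base + known offset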
					
					
					Hacker's jargon gives this kind
					of vulnerability the name *information leak*.
					That is, attackers obtain code location
					information **after** ASLR has done its job.
					While this can be technically
					challenging for mortals, it's exploitable in theory
					-- and kickass CTFers did it for real.
					
					
					The paper tries to come up with
					a solution for this kind of vulnerability
					by instrumenting code to behave in a
					more secure way.
					Leaks of return addresses are not
					the only dangerous ones.
					Indeed, the paper tries to prevent
					all potential leaks of *code locators*.
					
					
					As defined by the paper, **code locator**
					is
					> "... any pointer or data that can
					be used to infer code addresses."
					
					
					*Generating* a code locator means
					somehow *building* it, then storing
					it in a register. From that point on,
					it might be saved in memory, *e.g.*
					used in a variable assignment.
					Lu *et al.* tried to categorize them all,
					then came up with different strategies
					to protect them.
					
					
					4 different categories have been defined,
					based on the program's life cycle.
					+ **load-time**
					+ **runtime**
					+ **OS-injected** 
					+ **correlated**
					
					
					The *correlated* category
					represents any information about the position of data in memory,
					where the data sits at a known, fixed offset from code.
					Assuming the attacker knows how to correctly perform addition/subtraction,
					that information is dangerous too if leaked (from a defender's perspective).
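					A minimal sketch of that arithmetic, again with
					made-up numbers: a leaked pointer to a global
					variable plus the fixed code-to-data delta
					yields a code address.
						# hypothetical values: a leaked address of a global variable, and the fixed
						# (non-randomized) distance between that variable and a target function
						leaked_global_addr   = 0x7f3a12e01040
						delta_to_target_func = -0x201040
						target_func_addr = leaked_global_addr + delta_to_target_func
						print(hex(target_func_addr))   # 0x7f3a12c00000 -- a code address, from a data leak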
					
					
					A `python` tool (~1K lines) has been developed
					to analyse memory *before* and *after* specified hooks.
					For example, *before* and *after* syscall 42: if, while
					analysing memory *after* the syscall, some 8-byte chunk
					is found pointing to a byte
					inside an executable segment of memory, we now know that syscall 42
					generates a code locator and injects it somewhere in memory.
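					A minimal `python` sketch of that core check
					(not the paper's tool): scan a dump 8 bytes at a
					time and flag any value falling inside an
					executable mapping.
						import struct

						def find_code_locators(dump, dump_base, exec_ranges):
							# dump: bytes of a memory region; dump_base: its virtual address;
							# exec_ranges: (start, end) pairs of executable mappings (e.g. from /proc/<pid>/maps)
							hits = []
							for off in range(0, len(dump) - 7, 8):
								value = struct.unpack_from("<Q", dump, off)[0]
								if any(start <= value < end for start, end in exec_ranges):
									hits.append((dump_base + off, value))   # a code locator sitting in memory
							return hits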
					
					
					This memory analysis tool has been used
					both to understand how the kernel injects
					code locators into the process's address space,
					and to validate the *static* deductions
					about how the rest of the code locators are generated
					(deductions made by reading the source code of
					`ld`, `as`, `cc` and `ld.so`).
					
					
					## How to catch code locators for different categories?
					
					
					## Load-time 'locators
					Any code locator generated at load time
					relies on relocation information, assuming
					ASLR is active.
					Indeed, code locators generated at load time
					depend on a state known at load time only.

					That is, before being used they must be relocated.
					By hooking the loader's relocation procedure,
					any code locator generated at load time can
					be checked and properly protected.
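					A rough `python` sketch of the idea (the
					relocation records and the callback below are
					invented for illustration; the real hook lives
					inside `ld.so`): whenever a relocated value lands
					inside a code segment, protect it instead of
					writing the plaintext pointer.
						def apply_relocations(relocations, load_base, code_ranges, encrypt):
							# relocations: (target, addend) pairs relative to load_base
							# load_base:   randomized base chosen by ASLR for this module
							# code_ranges: (start, end) ranges of executable segments after loading
							# encrypt:     callback that hides a code locator (e.g. AG-RandMap encryption)
							memory = {}
							for target, addend in relocations:
								value = load_base + addend
								in_code = any(start <= value < end for start, end in code_ranges)
								# a relocated value pointing into code is a load-time code locator:
								# store its protected form instead of the plaintext pointer
								memory[load_base + target] = encrypt(value) if in_code else value
							return memory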
					
					
					## Runtime 'locators
					Any `call`, which `mov`es the current `%rip`
					onto the stack, generates a code locator
					(leaking the position of code *to* the stack).
					
					
					## Runtime 'locators
					`lea {offset}(%rip), ...`
					
					used, for example, when loading a pointer
					with the address of a local function
						def fn():
							def g(): pass
							ptr = g  # taking g's address materializes a code locator
					
					
					## Runtime 'locators
					With `{set,long}jmp`, a
					code locator is saved into the
					`jmp_buf` (by `setjmp`), then dereferenced
					(by `longjmp`).
					
					(`goto`?, `try/catch`?)
					
					
					## OS-injected
					Apparently, the program entry point
					is pushed onto the stack by the kernel
					(in the ELF auxiliary vector).
					Also, the whole register context
					(`%rip` included)
					is saved in the process address space
					when a signal is delivered.
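					A small `python` sketch of the first point, on
					Linux: the entry point really is exposed in
					process memory via the auxiliary vector (this is
					just an observation aid, not part of ASLR-Guard).
						import struct

						AT_ENTRY = 9   # auxv tag for the program entry point

						def program_entry_point(pid="self"):
							with open(f"/proc/{pid}/auxv", "rb") as f:
								data = f.read()
							# the auxv is a sequence of (tag, value) machine-word pairs
							for off in range(0, len(data), 16):
								tag, value = struct.unpack_from("<QQ", data, off)
								if tag == AT_ENTRY:
									return value
							return None

						print(hex(program_entry_point() or 0))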
					
					
					# How to protect each category?
					
					
					## Correlated
					When code and data sections
					are mapped in the same segment,
					there might be logic, in code,
					that accesses data using an offset
					(possibly from the current value of `%rip`).
					That means that even leaks of
					data positions in memory can be
					dangerous.
					Randomizing the sections independently makes
					the offset between code and data sections
					random, and known at load time only.
					
					
					## OS-injected
					Use two stacks.
					One, whose top is stored
					as usual in `%rsp`, is the *AG-Stack*.
					In `%r15`, the top of a second
					stack is kept.
					The AG-Stack is used for storing
					sensitive information (*return addresses*)
					and other data pushed by the kernel after
					a syscall or a handled signal.
					The other, unsafe, stack is used
					for general program data.
					
					
					Since the AG-Stack never contains
					program data (parameters or variables),
					there won't be code referencing it.
					Its location is randomized too,
					and its address never leaves `%rsp`.
					
					
					## Runtime
					Return addresses are stored in the AG-Stack.
					Code locators generated by the *GetPC*
					or *GetRet* sets of instructions are
					**encrypted** instead.
					
					
					# Encryption of code locators.
					
					
					When a code locator is hard to keep
					isolated in safe memory, and instead
					must live in unsafe memory, it is
					*somehow* encrypted.
					
					
					This way, even if the attacker
					succeeds in reading unsafe memory, she
					will only read the encrypted version
					of the code locator.
					
					
					## How.
					A table is used: *AG-RandMap*.
					Each entry is a 16-byte chunk,
					consisting of
					`[ code locator ] [ ... 0 ... ] [ nonce ]`
					
					
					When a code locator needs to be encrypted,
					a random nonce with 32 bits of entropy
					is generated.
					The code locator and four zero bytes are prepended
					to the nonce, and the result is inserted into the AG-RandMap
					table at an offset generated on the fly,
					with 32 bits of entropy too.
					The encrypted code locator returned is
					an 8-byte chunk consisting of
					`[ random offset ] [ nonce ]`
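					A minimal `python` model of the scheme (a dict
					stands in for the `%gs`-based table; alignment
					and exact entropy are simplified, and this is
					not the paper's code):
						import os, struct

						AG_RANDMAP = {}   # offset -> 16-byte entry; stands in for the %gs-based table

						def encrypt_locator(code_locator):
							nonce  = int.from_bytes(os.urandom(4), "little")          # 32-bit random nonce
							offset = int.from_bytes(os.urandom(4), "little") & ~0xf   # random 16-byte-aligned slot
							# entry layout in memory: [ 8-byte locator ][ 4 zero bytes ][ 4-byte nonce ]
							AG_RANDMAP[offset] = struct.pack("<QII", code_locator, 0, nonce)
							# what is handed back to unsafe memory: [ 4-byte offset ][ 4-byte nonce ]
							return struct.pack("<II", offset, nonce)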
					
					
					Whenever an encrypted code locator
					is used, assuming it's stored
					in `%rax` and the base address
					of the AG-RandMap is in `%gs`,
					it can be decrypted via
						...
						xor  %gs:8(%eax), %rax   # xor with the entry's second qword: [ 0 | nonce ]
						call *%gs:(%rax)         # %rax now holds the plain offset; fetch and call the locator
						...
					(the 32-bit `%eax` in the first operand means
					only the offset half of the encrypted value
					is used to index the table).
					At the end of the decryption "*routine*",
					`%rax` will contain the correct offset
					to fetch the right code locator
					only if the nonce is the same one generated
					during the encryption routine.
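					Continuing the model above, the same
					lookup-and-xor in `python`:
						def decrypt_locator(enc):
							e = int.from_bytes(enc, "little")   # encrypted chunk as an integer (offset in the low half)
							offset = e & 0xffffffff             # the 32-bit %eax truncation
							second_qword = int.from_bytes(AG_RANDMAP[offset][8:16], "little")   # [ 0 | nonce ]
							index = e ^ second_qword            # equals `offset` only if the nonces match
							return struct.unpack_from("<Q", AG_RANDMAP[index])[0]   # otherwise KeyError ~ crash

						enc = encrypt_locator(0x7f3a12c45f2a)
						assert decrypt_locator(enc) == 0x7f3a12c45f2a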
					
					
					The rest of the code locators are stored
					in plaintext in an isolated data structure
					called the *safe vault*, which is guaranteed
					to remain isolated by randomizing its
					base address and never saving it in memory:
					any reference to it is handled
					in registers only.
					
					
					## How is this accomplished?
					Via a static toolchain and a dynamic loader.
					That means that, for a binary to be hardened
					by this technique, the source code of the program,
					as well as of all the loaded modules, must be
					available.
					
					
					### Compiler
					+ Reserve the `%r15` register
					for the regular/unsafe stack.
					+ Prefer `mov` instructions
					to `push/pop`, `enter/leave`
					-- to avoid `%rsp` modification.
					
					
					### Assembler
					+ Append the encryption routine
					right after a code locator is
					generated by one (or a set of)
					instruction(s).
					+ Prepend the decryption routine
					when dereferencing encrypted
					code locators.
					
					
					### Static linker
					+ Strip encryption/decryption
					routines when addresses are
					generated to access data.
					Indeed, the assembler appends the encryption
					routine conservatively. The linker, however,
					knows what is supposed to be
					code and what is data; thus it can strip
					the routines that aren't needed.
					
					
					### Dynamic loader
					+ Initialize the stack(s!), allocate
					space for the random mapping table,
					and isolate it (i.e. randomize its base
					address and store it in the `%gs` segment register).
					+ Encrypt all code locators generated
					at load time by hooking the relocation routine.
					
					
					
					From a theoretical point of view,
					if the target is a binary compiled
					with the ASLR-Guard toolchain, and
					all the loaded modules are as well,
					what is the chance of success for
					an attacker to hijack the control
					flow to an address `x`?
					She could either rewrite the
					content of the safe vault -- but
					she needs to locate it first, with
					a chance of `2**-28` --
					or she could rewrite an encrypted
					code locator -- assuming at least
					one entry for `x` exists in the random
					mapping table; yet she needs to find
					the correct nonce, with a chance
					of success `<= 2**-32`.
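					Just to put numbers on those bounds
					(plain arithmetic, not from the paper's evaluation):
						p_locate_vault = 2**-28   # chance of guessing the safe vault's location
						p_guess_nonce  = 2**-32   # chance of guessing the nonce of an encrypted locator
						print(f"{p_locate_vault:.2e}", f"{p_guess_nonce:.2e}")   # 3.73e-09 2.33e-10
						print(max(p_locate_vault, p_guess_nonce) == 2**-28)      # the overall bound: True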
					
					
					In both cases, the chance of success
					is `<= 2**-28`.
					That means an ASLR-Guard-instrumented
					binary should provide at least the same
					security that *plain* ASLR provides.
					
					
					Empirically, the memory analysis tool
					is used one more time, hooking
					the program's entry/exit points and
					right after every syscall. The whole
					memory is dumped there.
					The entire software suite of the
					SPEC CPU2006 benchmark is used, with
					the following results:
					+ Not a single *plaintext* code locator
					is left in unsafe memory.
					+ Encrypted code locators are less than `10%`
					for most programs; for many of them `~20`.
					
					
					### `nginx v1.4.0`
					Since a spawned nginx worker, if it crashes,
					won't cause the entire server to crash,
					a buffer overflow vulnerability can be used
					to repeatedly rewrite the return address
					until the correct one is found,
					hence obtaining a code locator *after*
					ASLR has done its randomization.
					`BROP` is a tool that automatically
					exploited `nginx v1.4.0` this way.
					After rebuilding `nginx` with
					the ASLR-Guard toolchain, `BROP`
					fails to exploit nginx. Indeed,
					the return address isn't even present
					on the stack BROP is reading!
					
					
					### Performance
					Averaging 10 executions
					of the SPEC benchmark
					programs, an overhead of less than `1%`
					has been measured as far as run time
					is concerned.
					Building the software takes longer too,
					with an overhead of `~6%`, while loading
					is still very fast, `~1μs`.
					File size grows by `6%` on average,
					while memory use is `~2MB` larger,
					due to the structures kept
					in memory that are not loaded
					for a *non*-hardened binary.