An abridged history of Linux kernel security


Russell Currey


Everything Open 2023

Introductions!

  • I’m a Linux kernel developer
  • I lead the Linux on Power hardening team at IBM
  • I’m based in Canberra, Australia
  • I’ve been awake since 4am

A few ground rules…

Focused on technology

I’m going to talk primarily about what went into the kernel

Focused on upstream

A lot of kernel security features originated outside of the upstream kernel in projects like grsecurity, but my focus will be on upstream mainline Linux

Focused on security of the kernel itself

We’re talking about the kernel protecting itself against attacks, rather than security features the kernel exposes to userspace

specifically, kernel hardening / kernel self protection

A journey through time!

  • the before times
  • the formative years
  • ramping up
  • the modern era

The before times

  • bugs reported and fixed, eventually
  • security not at the forefront of developers’ minds
  • extremely primitive tooling

“First the problem [with] Linux is that there are too many people ‘hacking’ the code. It has reached a complexity where the ‘I-hack-quickly-some-code’ approach doesn’t work anymore.” - Paul Starzetz

“It’s 12 new CVEs, from just one guy with grep and a few days in his spare time (and some of the CVEs cover multiple vulnerabilities).” - Brad “spender” Spengler

Linux looks pretty secure when your point of comparison is Windows

The need for kernel hardening

  • kernel has enormous scope
  • kernel changes very rapidly
  • kernel does lots of stuff on lots of different devices
  • bugs are inevitable

The formative years: 2007

Linux v2.6.15

I’ve just started grade 9

Everyone’s wearing this shirt

What does read-only mean?

The holy trinity

  • R (read)
  • W (write)
  • X (execute)

ELF sections

Idx Name          Size      VMA               LMA               File off  Algn
  0 .head.text    00008000  c000000000000000  0000000000000000  00010000  2**7
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .text         02794574  c000000000008000  0000000000008000  00018000  2**8
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .rodata       00e4b446  c0000000027a0000  00000000027a0000  027b0000  2**8
                  CONTENTS, ALLOC, LOAD, DATA

CONFIG_DEBUG_RODATA

“Mark the kernel read-only data as write-protected in the pagetables, in order to catch accidental (and incorrect) writes to such const data.” - Arjan van de Ven

kernel init

/*
 * Ok, we have completed the initial bootup, and
 * we're essentially up and running. Get rid of the
 * initmem segments and start the user-mode stuff..
 */
free_initmem();
unlock_kernel();
mark_rodata_ro();
system_state = SYSTEM_RUNNING;

“Debug feature”

  • now known as STRICT_KERNEL_RWX / “kernel memory protections”
  • code (.text) should be R-X
  • data (.data) should be RW-
  • read-only data (.rodata) should be R--

The formative years: 2008

Linux v2.6.24

I become a Linux user

The kernel gets a canary

Stack protections

compiler feature, now used for the kernel itself

canary inserted into the stack

kernel panic if canary changes

return address can’t be modified without defeating the canary
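
A toy model of the canary check, as a sketch rather than what the compiler actually emits. struct toy_frame, TOY_CANARY and toy_call() are hypothetical names, and the fixed canary value is an assumption for illustration; real canaries are random.

```c
#include <stdint.h>
#include <string.h>

/*
 * Toy stack frame: a linear overflow of buf has to run through the
 * canary before it can reach the saved return address.
 */
struct toy_frame {
    char      buf[16];
    uintptr_t canary;
    uintptr_t return_address;
};

/* Hypothetical fixed value; a real canary is random, not a constant. */
#define TOY_CANARY ((uintptr_t)0x5ca1ab1eu)

/*
 * Copy len bytes into the frame starting at buf, the way an unchecked
 * copy would. Returns 0 if the canary survived, -1 if it was clobbered
 * (the point at which the kernel would panic).
 */
static int toy_call(const void *input, size_t len)
{
    struct toy_frame frame = {
        .canary = TOY_CANARY,
        .return_address = 0x1234,
    };

    if (len > sizeof(frame))
        len = sizeof(frame);   /* keep the toy itself memory-safe */

    memcpy(&frame, input, len);   /* buf is the first member */

    return frame.canary == TOY_CANARY ? 0 : -1;
}
```

An 8-byte input fits in buf and returns 0. A 32-byte input overwrites the canary on its way to return_address and returns -1.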

The formative years: 2010

Linux v2.6.33

I’m in my senior year of high school

Kernel modules

  • dynamically loaded
  • loaded into their own region of the kernel’s virtual address space
  • still using RWX permissions everywhere

DEBUG_SET_MODULE_RONX

  • same concepts as DEBUG_RODATA for modules
  • code R-X, data RW-, rodata R--
  • was going to be renamed to HARDENED_MODULE_MAPPINGS before finally becoming STRICT_MODULE_RWX

Ramping up: 2011

Linux v3.0

User memory and kernel memory

  • userspace and kernel share the same virtual memory (usually)
  • lack of isolation makes attacks on the kernel easy
  • logically very different, but no way to enforce the barrier

Overwrite any function pointer and you win!

MMUs are getting smarter

  • only so much can be done just by software
  • software interventions often costly to performance
  • bigger focus on security from silicon vendors

Supervisor Mode Execution Prevention (SMEP)

  • also known as PXN, KUEP and others
  • MMU feature that prevents the kernel from executing user memory
  • attackers can’t trick the kernel into executing user code
    • getting your exploit code inside kernel memory is much harder

Ramping up: 2012

Linux v3.2 to v3.7…

yeah they’re sane now

The world ended

This happened

Kernel accesses to user memory

  • a necessary evil
  • functions like copy_to_user() and copy_from_user()
  • tricking the kernel into accessing user memory is a powerful exploit primitive

Supervisor Mode Access Prevention (SMAP)

  • MMU feature that prevents the kernel from accessing user memory
    • but we need to do that!
  • user accesses enabled at the start of copy_to/from_user() and disabled after
  • much more complex than SMEP. can’t just set and forget
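
A toy model of the "open the window, copy, close the window" pattern, as a sketch only. Every toy_* name is hypothetical and a plain boolean stands in for the hardware state; on x86 the real window is opened and closed with the STAC and CLAC instructions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model of SMAP, not kernel code. */
static bool toy_user_access_enabled = false;

/*
 * Stands in for any kernel load from a user address: refused unless
 * the access window is currently open.
 */
static int toy_read_user(void *dst, const void *user_src, size_t len)
{
    if (!toy_user_access_enabled)
        return -1;   /* real hardware would fault here */
    memcpy(dst, user_src, len);
    return 0;
}

/*
 * Stands in for copy_from_user(): open the window, do the copy,
 * and close the window again before returning.
 */
static int toy_copy_from_user(void *dst, const void *user_src, size_t len)
{
    toy_user_access_enabled = true;
    int ret = toy_read_user(dst, user_src, len);
    toy_user_access_enabled = false;
    return ret;
}
```

A bare toy_read_user() call fails, and the same read through toy_copy_from_user() succeeds. This is why dedicated accessors matter: the window only has to be managed in one place.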

Why do we have copy_{to/from}_user() anyway?

  • Windows doesn’t have SMAP, because they don’t have dedicated accessors
  • without these functions, we’d have to replace thousands of sites
  • thankfully present since Linux v0.01!
    • Linus wanted to play with as many i386 features as possible

Ramping up: 2013

Linux v3.8 to v3.12

Linux wins Jeopardy!

Kernel Address Space Layout Randomization (KASLR)

  • randomises where the kernel is placed in memory at boot
  • attackers can’t rely on the location of kernel symbols

What’s a __ptrval__?

  • kernel must be careful to prevent info leaks
  • kernel big, mistakes easy
  • KASLR-enabled kernels are harder to debug
  • info leak prevention spreads through the kernel

Debatable effectiveness

  • “industry standard” KASLR defeats are pretty quick
  • many different mechanisms to locate the kernel
  • KASLR makes attacks harder, but not substantially
  • only takes one info leak, and they’re annoying to fix
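
Why one leak is enough, as a sketch. It assumes the simple form of KASLR where the whole kernel image is shifted by a single random slide. The addresses and toy_* names are made up for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical link-time addresses, as an attacker could read them from
 * their own copy of the same kernel build. Values are invented.
 */
#define TOY_LINK_LEAKED_FN UINT64_C(0xffffffff81234560)
#define TOY_LINK_TARGET_FN UINT64_C(0xffffffff81abcde0)

/*
 * The whole image moves by one slide, so a single leaked runtime
 * pointer to a known symbol reveals that slide.
 */
static uint64_t toy_slide_from_leak(uint64_t leaked_runtime_addr)
{
    return leaked_runtime_addr - TOY_LINK_LEAKED_FN;
}

/* With the slide known, every other symbol can be located again. */
static uint64_t toy_runtime_addr(uint64_t link_addr, uint64_t slide)
{
    return link_addr + slide;
}
```

Feed toy_slide_from_leak() one leaked pointer, then pass the result to toy_runtime_addr() with TOY_LINK_TARGET_FN, and the randomisation is undone for the rest of that boot.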

Ramping up: 2015

Linux v3.19 - v4.3

Kernel gets worse

commit affddff69c55eb68969448f35f59054a370bc7c1
Author: Russell Currey <ruscur@russell.cc>
Date:   Fri Nov 27 17:23:07 2015 +1100

    powerpc/powernv: Add a kmsg_dumper that flushes console output on panic

Kernel Address Sanitizer (KASAN)

  • dynamic memory safety error detector
  • finds memory misuse bugs like out-of-bounds accesses and use-after-frees
  • uses shadow memory to record & check if accesses are safe
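
A toy model of the shadow memory idea, as a sketch and not the kernel’s implementation. It borrows the encoding used by generic KASAN (one shadow byte per 8 bytes of memory: 0 means all 8 bytes are valid, 1 to 7 means only the first N are, negative means none are). All toy_* names and the 256-byte "memory" are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_GRANULE  8     /* bytes of memory described by one shadow byte */
#define TOY_MEM_SIZE 256   /* size of the pretend memory region */

static int8_t toy_shadow[TOY_MEM_SIZE / TOY_GRANULE];

/* Mark the whole region as inaccessible. */
static void toy_poison_all(void)
{
    memset(toy_shadow, -1, sizeof(toy_shadow));
}

/*
 * Mark [off, off + size) as valid, like an allocator handing out an
 * object. off must be granule-aligned and the range must fit inside
 * TOY_MEM_SIZE (the caller's job in this toy).
 */
static void toy_unpoison(size_t off, size_t size)
{
    size_t g = off / TOY_GRANULE;

    for (; size >= TOY_GRANULE; size -= TOY_GRANULE)
        toy_shadow[g++] = 0;          /* whole granule valid */
    if (size)
        toy_shadow[g] = (int8_t)size; /* only the first `size` bytes valid */
}

/* The check that would run before a one-byte access at offset off. */
static bool toy_access_ok(size_t off)
{
    if (off >= TOY_MEM_SIZE)
        return false;

    int8_t s = toy_shadow[off / TOY_GRANULE];
    if (s == 0)
        return true;
    if (s < 0)
        return false;
    return (off % TOY_GRANULE) < (size_t)s;
}
```

After toy_poison_all() and toy_unpoison(64, 13), offsets 64 to 76 pass the check. Offset 77, one byte past the end of the 13-byte object, fails it.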

~70% of security issues are memory misuse bugs

Memory corruption with KASAN:

The buggy address belongs to the object at ffff8801f44ec300
 which belongs to the cache kmalloc-128 of size 128
The buggy address is located 123 bytes inside of
 128-byte region [ffff8801f44ec300, ffff8801f44ec380)
The buggy address belongs to the page:
page:ffffea0007d13b00 count:1 mapcount:0 mapping:ffff8801f7001640 index:0x0
flags: 0x200000000000100(slab)
raw: 0200000000000100 ffffea0007d11dc0 0000001a0000001a ffff8801f7001640
raw: 0000000000000000 0000000080150015 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory corruption without KASAN:

  • best case: you crash, and it’s obvious why
  • bad case: you crash, and it’s not obvious why
  • worst case: you don’t notice

Ramping up: 2016

Linux v4.4 - v4.9

My first LCA!

Virtually mapped kernel stacks

  • reworking the kernel to allocate its own stacks from virtually mapped (vmalloc) memory
  • allows stack overflows to be detected and reported instead of silently corrupting memory
  • kernel stacks no longer have to be physically contiguous

These don’t have to be next to each other any more!

The modern era

Spectre & Meltdown

see my talk “How can we effectively test transient execution mitigations?” from Linux Security Summit North America 2022 for more

Android

syzkaller

more sanitisers!

  • memory safety (KASAN)
  • concurrency (KCSAN)
  • uninitialised memory (KMSAN)
  • undefined behaviour (UBSAN)

Rust

  • can’t (won’t?) rewrite the kernel in it
  • won’t solve all our problems
  • won’t take us to the promised land
  • …but it is good and it will help

thanks for watching history

i hope i didn’t miss anything

Questions!