In 2017, McAfee researchers disclosed a memory corruption bug in the Linux kernel’s UDP fragmentation offload (UFO) code that allowed a local attacker to escalate privileges. The bug affected both the IPv4 and IPv6 code paths in Ubuntu Xenial’s 4.8.0 kernel and was fixed in commit 85f1bd9.
Now, Google’s Project Zero team has shared details of a similar yet much simpler bug that can lead to complete system compromise. Researchers dubbed it a “straightforward Linux kernel locking bug” that they exploited against Debian Buster’s 4.19.0-13-amd64 kernel.
About the Bug
According to the Project Zero blog post, the bug was located in tiocspgrp(), the handler for the TIOCSPGRP ioctl. The handler modifies the pgrp member of the terminal side (real_tty), adjusting the reference counts of the old and new process groups with put_pid() and get_pid().
The lock, however, is taken on tty, which, depending on the file descriptor passed to ioctl(), can be either end of the pseudoterminal pair. By calling the TIOCSPGRP ioctl on both sides of the pseudoterminal concurrently, the researchers triggered data races between accesses to the pgrp member, causing the reference counts to become skewed.
Jann Horn of Google’s Project Zero observed that in both outcomes of the race, the old struct pid’s refcount ends up decremented once too often, while either A’s or B’s new pid is incremented once too often.
The team has also released a proof of concept.
How the Bug Is Exploited
Further research revealed that the memory corruption bug lets an attacker skew the refcount of a struct pid downward, whichever way the race resolves. The researchers found that repeatedly running colliding TIOCSPGRP calls from two threads skewed the refcount fairly often, although they could not tell exactly how many skews had occurred.
Moreover, when the object was freed, the SLUB allocator replaced its first 8 bytes with an XOR-obfuscated freelist pointer, clobbering the count and level fields of the freed struct pid.
“This means that the load from pid->numbers[pid->level] will now be at some random offset from the pid, in the range from zero to 64 GiB. As long as the machine doesn’t have tons of RAM, this will likely cause a kernel segmentation fault,” Horn wrote in a blog post.
Hence, a somewhat more straightforward way to exploit a dangling reference to a SLUB object is to reallocate the object through the same kmem_cache it came from, without ever letting the page reach the page allocator.
Another way to exploit the use-after-free at the SLUB allocator level is to flush the page out to the page allocator, also known as the buddy allocator. This is the lowest level of dynamic memory allocation on a Linux system; from there, the page can end up in almost any context.
“At the point where the victim page has reached the page allocator’s freelist, it’s essentially game over – at this point, the page can be reused as anything in the system, giving us a broad range of options for exploitation. In my opinion, most defenses that act after we’ve reached this point are fairly unreliable,” the blog post read.
Page tables are one type of allocation served directly from the page allocator, and the ability to modify one can be abused by enabling the read/write bit in a page table entry (PTE) that maps a file page intended to be read-only. This can yield write access to part of a setuid binary’s .text segment, which can then be overwritten with malicious code.
Although the attacker cannot know at which offset inside the victim page the victim object is located, a page table is effectively an array of 8-byte-aligned, 8-byte elements, and the victim object’s alignment is a multiple of that. So as long as all elements of the victim page table are sprayed, the attacker does not need to know the object’s offset.
“Struct pid has the same alignment as a PTE, and it starts with a 32-bit refcount, so that refcount is guaranteed to overlap the first half of a PTE, which is 64-bit. Therefore we can increment one of the PTEs by repeatedly triggering get_pid(), which tries to increment the refcount of the freed object. If the kernel notices the Dirty bit later on, that might trigger writeback, which could crash the kernel if the mapping isn’t set up for writing.”