Validation mechanism for user-supplied kernel object pointers #2025

Closed
nashif opened this issue May 24, 2017 · 23 comments

nashif commented May 24, 2017

Reported by Andrew Boie:

Since we are using the same kernel APIs for userspace, we are going to need to validate any kernel object pointers passed in via system calls. When userspace passes in a pointer to a kernel object, we need to enforce that pointer is valid, lives in a region of memory controlled by the kernel, and actually corresponds to the requested kernel object type.

Brief summary, from an email thread on the subject:
{quote}
We want to have the same kernel APIs with and without memory protection. This will mean that kernel APIs called from userland will still take pointers to kernel objects as arguments even though the kernel objects live in memory private to the kernel.
When userspace passes in a kernel object pointer, we will need to verify that it indeed points to a kernel object of the expected type.

So we have 2 cases of pointers from userspace: pointers to buffers, and pointers to kernel objects. I think we need safe_memcpy for the latter case.

The proposed method (credit to Inaky) IIRC is as follows. Any given kernel object (let's use struct k_sem as an example), with memory protection turned on, will always have as its first member a struct:

struct kernel_object {
	uint32_t encrypted_ptr;
	/* ... other metadata as needed */
};

struct k_sem {
#ifdef CONFIG_MEMORY_PROTECTION
	struct kernel_object ko;
#endif
	/* ... regular struct k_sem members */
};

At boot for every kernel object type, the kernel will randomly generate some XOR keys, stored in memory only visible to the kernel. When a kernel object is created, the pointer value of that object is XOR'd with that key and the encrypted value stored in encrypted_ptr.
When userspace passes the kernel an object with address A:

  1. Validate that the pointer can be safely dereferenced by ensuring that it falls within the RAM range reserved for kernel objects.
  2. Knowing that A points to 4 bytes of memory that we can read, the value of A XOR'd with the encryption key for k_sem objects should equal ((struct kernel_object *)A)->encrypted_ptr. This will prove that A is a valid instance of struct k_sem.
    {quote}

(Imported from Jira ZEP-2187)
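A minimal sketch of the check described above (illustrative only; the key table and the kernel-object RAM bounds below are hypothetical names, and this is not the mechanism that was eventually merged):

/* Hypothetical illustration of the XOR-cookie validation quoted above.
 * otype_keys[], kobj_ram_start and kobj_ram_end are made-up names; assumes
 * 32-bit pointers.
 */
extern uint32_t otype_keys[];                 /* per-type keys generated at boot */
extern char kobj_ram_start[], kobj_ram_end[]; /* RAM reserved for kernel objects */

static int validate_kobject(void *a, int otype)
{
	/* Step 1: pointer must lie within the kernel object RAM range */
	if ((char *)a < kobj_ram_start ||
	    (char *)a > kobj_ram_end - sizeof(struct kernel_object)) {
		return -1;
	}

	/* Step 2: the stored cookie must equal the address XOR'd with the
	 * per-type key, proving 'a' is a live object of the requested type
	 */
	struct kernel_object *ko = a;

	if (ko->encrypted_ptr != ((uint32_t)a ^ otype_keys[otype])) {
		return -1;
	}

	return 0;
}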


nashif commented May 25, 2017

by Andrew Boie:

I started to look into an implementation of this; this could even be used by itself outside of memory protection if there is suspicion that garbage or uninitialized pointers are being passed as kernel objects.

I've found a problem though. This mechanism depends on run-time initialization of kernel objects, since any kernel object will need its encrypted pointer value stored. For example:

enum k_objects {
	K_OBJ_THREAD,
	K_OBJ_MUTEX,
	K_OBJ_SEM,
	K_OBJ_ALERT,
	K_OBJ_MSGQ,
	K_OBJ_MBOX,
	K_OBJ_PIPE,
	K_OBJ_QUEUE,
	K_OBJ_LIFO,
	K_OBJ_STACK,
	K_OBJ_MEM_SLAB,
	K_OBJ_MEM_POOL,
	K_OBJ_TIMER,
	K_OBJ_POLL_EVENT,
	K_OBJ_POLL_SIGNAL,

	K_OBJ_LAST
};

struct k_object {
	u32_t enc_ptr;
};

extern void _k_object_validate(void *obj, enum k_objects otype);
extern void _k_object_init(void *obj, enum k_objects otype);

Any kernel object will have struct k_object as its first member, so you can just do a cast:

struct k_sem {
	struct k_object obj;
	_wait_q_t wait_q;
	unsigned int count;
	unsigned int limit;
	_POLL_EVENT;

	_OBJECT_TRACING_NEXT_PTR(k_sem);
};

This is fine for runtime initialization: you just stick a _k_object_init() call in the init function.
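For instance (a simplified sketch; the real k_sem_init() body is elided here, only the registration call is the point):

void k_sem_init(struct k_sem *sem, unsigned int initial_count,
		unsigned int limit)
{
	/* ... normal initialization of wait_q, count, limit ... */

	/* Record this address as a valid, initialized K_OBJ_SEM so that
	 * later validation recognizes it.
	 */
	_k_object_init(sem, K_OBJ_SEM);
}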
However, for almost all kernel objects we also have static initializer macros:

#define K_SEM_INITIALIZER(obj, initial_count, count_limit) \
	{ \
	.wait_q = SYS_DLIST_STATIC_INIT(&obj.wait_q), \
	.count = initial_count, \
	.limit = count_limit, \
	_POLL_EVENT_OBJ_INIT \
	_OBJECT_TRACING_INIT \
	}

It doesn't work for this case, since k_sem_init() is never called.
In addition, these initializers can be embedded within other kernel objects; for example, a k_alert has a k_sem inside it:

#define K_ALERT_INITIALIZER(obj, alert_handler, max_num_pending_alerts) \
	{ \
	.handler = (k_alert_handler_t)alert_handler, \
	.send_count = ATOMIC_INIT(0), \
	.work_item = K_WORK_INITIALIZER(_alert_deliver), \
	.sem = K_SEM_INITIALIZER(obj.sem, 0, max_num_pending_alerts), \
	_OBJECT_TRACING_INIT \
	}

I need to figure out some kind of preprocessor voodoo such that all objects of a particular type initialized this way, even ones embedded within other objects, will get their pointer values stuck in a special section that can be iterated over at boot to set the encrypted pointer value.

This would be a lot easier to do if the K_*_INITIALIZER() macros were private to the kernel and not public. If we can do that, and enforce that only K_DEFINE() macros are public, then this is easy: the data structures already get put in a special section that can be iterated over.
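One possible shape of that special-section approach, purely as illustration (the section name, registration macro, and linker symbols below are assumptions, and this only covers simple toplevel objects, not ones embedded inside other initializers):

/* Record statically-initialized objects in a dedicated section and walk it
 * at boot to set up their validation metadata.
 */
struct kobj_init_entry {
	void *obj;
	enum k_objects otype;
};

#define _K_OBJECT_REGISTER(obj, otype)                          \
	static const struct kobj_init_entry                     \
	__attribute__((used, section(".kobject_init")))         \
	kobj_entry_##obj = { &obj, otype }

/* Symbols placed around the .kobject_init section by the linker script */
extern const struct kobj_init_entry __kobj_init_start[];
extern const struct kobj_init_entry __kobj_init_end[];

static void init_static_kobjects(void)
{
	for (const struct kobj_init_entry *e = __kobj_init_start;
	     e < __kobj_init_end; e++) {
		_k_object_init(e->obj, e->otype);
	}
}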


nashif commented May 25, 2017

by Inaky Perez-Gonzalez:

Hmm, good point

This might be a security issue if we do it at link time, since if we know the location and the crypted key, we could guess the cookie.

We could, however, at link time, record in a section the list of pointers that have to be crypted and with which type cookie and at kernel initialization time, go over that table, generate the cookies and then free up the table.

Would this work?


nashif commented May 25, 2017

by Andrew Boie:

{quote}
This might be a security issue if we do it at link time, since if we know the location and the crypted key, we could guess the cookie.
{quote}
I was pursuing an approach where these keys were randomly generated at boot.
I guess we could look into generating them at build time, but you could unfortunately just read them out of flash.

{quote}
We could, however, at link time, record in a section the list of pointers that have to be crypted and with which type cookie and at kernel initialization time, go over that table, generate the cookies and then free up the table.
{quote}

I'm planning on doing that. The trick is that it's easy if people use K_DEFINE(). I already have some code which does this.
I think it's impossible to know about those pointers if people use K_*_INITIALIZER() for embedded kernel objects.
I talked to Benjamin Walsh and it was not intentional for K_*_INITIALIZER() to be public. So I'm going to deprecate them and make them private APIs.


nashif commented May 25, 2017

by Inaky Perez-Gonzalez:

Oh yes, I agree, they have to be randomly generated at boot--sorry I made it confusing.

Unless there is a need to keep K_*_INITIALIZER() around, this seems a very sensible solution to me.


nashif commented Jun 19, 2017

by David Brown:

I think I'm ok with using this XOR'd marker to validate objects, but we really shouldn't use the term "encrypted". This isn't encryption in any sense. Perhaps "token" or something, as just a validator of the object.


nashif commented Jun 19, 2017

by David Brown:

Also, 32-bits isn't all that big and may not offer all that much protection against rogue pointers passed in.


nashif commented Jun 19, 2017

by Andrew Boie:

{quote}Perhaps "token" or something as just a validator of the object.{quote}

Fine with me.

{quote}Also, 32-bits isn't all that big and may not offer all that much protection against rogue pointers passed in.{quote}

You lost me here. For a bad pointer to pass this check, the first 4 bytes of memory it points to, XOR'd with the validator, would have to equal the bad pointer value itself. Considering that these bad pointers will also be bounds-checked to be within the kernel's memory area, the odds of this happening by accident seem astronomical to me.


nashif commented Jun 20, 2017

by David Brown:

{quote}You lost me here. For a bad pointer to pass this check, the first 4 bytes of memory it points to, XOR'd with the validator, would have to equal the bad pointer value itself. Considering that these bad pointers will also be bounds-checked to be within the kernel's memory area, the odds of this happening by accident seem astronomical to me.{quote}

As a way of detecting bugs and such, 4 bytes would be perfectly adequate. I wouldn't consider this adequate for security, since 2^32 is a feasible number of attempts for a malicious agent to retry.


nashif commented Jun 20, 2017

by Andrew Boie:

David Brown that's reasonable, thanks


nashif commented Jun 20, 2017

by Inaky Perez-Gonzalez:

While 4G tries on an MCU might take quite a while, the risk is there. However, considering that:

  • there will also be a type check (the object has to be the right type for the call)--so there is a secondary random key to guess here for the type check
  • the crypto/hashing/salting is done with process-specific keys

what is your concern here?

Is the concern that process A would be able to find process B's pointer? It'd have to try a max of 2^64 combinations (assuming byte alignment) and then multiply that by the combinations of the type key (so another 2^32). I think hitting one out of 2^96 provides good protection there (of course, this is assuming a proper RNG).

As well, as Andrew mentioned, there will be bounds checks, so we know we are not being directed to a hyperspace location for DoS.

What other concerns do you have? I'm curious to make sure we close all the possible holes or have rationales for them.


nashif commented Jun 20, 2017

by David Brown:

I think my main concern is that we're effectively trying to do too much, and will end up moving ourselves away from a microcontroller OS. I think in light of both a debugging use, as well as protection against some malicious uses, the 32-bit value will be fine. It would not be adequate on a platform where untrusted users are able to run arbitrary code. But in our environment, the code is generally controlled, and this will reasonably protect these structures.

At least as long as we don't try to call it encryption.


nashif commented Jun 21, 2017

by Andrew Boie:

{quote}I think my main concern is that we're effectively trying to do too much, and will end up moving ourselves away from a microcontroller OS{quote}

David Brown can you please elaborate here?
The end-state goal we are pursuing is to augment Zephyr with optional functionality in the model of FreeRTOS-MPU, without significantly changing the kernel APIs or getting in the way of users that don't want/need thread protection.
We want to introduce unprivileged threads that can't clobber other threads' stacks or corrupt the kernel itself.
We want this to be something that can be turned on or off, not a required part of the kernel.
Do you believe FreeRTOS-MPU is doing too much? How are we moving ourselves away from being a microcontroller OS?
If you think what FreeRTOS-MPU is doing is a bad idea, I don't think there's much opportunity for consensus here, but I would at least like to know what specifically in the plans that have been made is concerning to you.


nashif commented Jun 21, 2017

by David Brown:

I think looking at FreeRTOS-MPU is useful. I just fear we are implementing a bunch of stuff, and I'm not sure we know what our threat model even is. It seems to be easy to say "We have an MPU/MMU, what can we do with it?", rather than figuring out what our use cases are, what threats we have, and prioritizing those threats. This is important as we have significantly limited resources here, and Zephyr wants to work across a fairly diverse range of device capabilities (from something with 16K of RAM and 8 MPU slots, to things much more capable).

Having a kernel/user-space separation is a fairly drastic step; it doesn't fit particularly well with the current APIs, and as I understand it, early in Zephyr's life there was an explicit decision not to support it.

For example, it is perfectly reasonable to support an MPU/MMU with everything running in privileged mode. If our threats are accidental errors in code, and things like errors resulting in rogue reads that could leak data, we can still protect against a lot; even though truly malicious code could just reprogram the MPU/MMU, a lot of code won't.

I'm not saying we shouldn't separate threads, or do any of the things discussed, but we need to understand our use cases and what threats we have, and weigh them against the costs of implementing these features.


nashif commented Jun 21, 2017

by Andrew Boie:

David Brown, unless I'm greatly misunderstanding something, nothing planned here will prevent Zephyr from continuing to run on the lowest-end devices. They may not be able to use all the new features we are bringing in, but we are not saying good-bye to any class of device with this effort. I would also contend that you are underestimating how much we have considered our needs for this feature; maybe you weren't part of the discussion or didn't see the mailing list threads, I don't know.

The threat we (Intel) are primarily interested in is to make it easier for developers to write complex multi-threaded applications and be able to debug them more effectively. Security for untrusted code on the system is a secondary goal, which we would like to also support well, but right now the outside requests we are getting are all to have thread protection, with FreeRTOS-MPU's design specifically called out as an example.

As I discussed on the mailing list, this effort has several layers which can be supported on devices depending on their capabilities, in increasing order of complexity:

  1. Boot-time configuration of memory regions. We have this now on ARM and x86. Produce a CPU exception if we read/write pointers that don't map to real memory, try to execute RAM that doesn't have code, etc etc. The memory policy configuration is fixed at boot and doesn't change.

  2. Simple stack protection. Throw an exception if a thread exceeds its stack space. Massively useful for debugging; normally when a stack overflows you get all kinds of weird and unpredictable behavior. Requires some simple runtime reconfiguration of memory policy on context switch.

  3. Thread protection. Introduce user vs supervisor threads. User threads can't touch any stack but theirs, and can't touch kernel memory. System calls to do privilege elevation. Reconfiguration of memory regions on context switch. For debugging complex multi-threaded applications that might otherwise stomp on each other's memory or crash the kernel itself.

  4. "Secure" thread protection. Run untrusted code sandboxed in a user thread and have full confidence that it will not be able to attack the rest of the system. The technique described in this JIRA (Validation mechanism for user-supplied kernel object pointers #2025) would not be appropriate for this use-case, but we can drop in an alternative implementation (that might be slower or involve indirection handles) to support it.

  5. Full virtual memory. Use MMU to implement processes with their own virtual memory. We don't have plans to do this at this time.

Right now the focus on our side for this effort is on No. 3. No. 4 is desirable as an iterative refinement.

{quote}doesn't fit particularly well with the current APIs{quote}
Please be specific. For example, I sent a mail earlier this week discussing ISR callbacks (I don't think we really have a problem here). Can you at least reply to that?
Other than defining some allocators for kernel objects, the general feeling we have is that we can leave the existing kernel APIs alone; this is not going to be a re-write, and users who are uninterested in this feature should not be impacted.

{quote}
and as I understand, early in Zephyr's life was an explicit decision to not support.
{quote}
Zephyr (in its previous branding as Viper OS) had fully virtualized userspace for a long time. If you look at the very first commit in the git history, I believe the MMU enabling code is there. It was removed because full MMU virtualization was felt to be too heavyweight. However, we have a lot of people clamoring for some method of not having threads stomp on each other.

nashif commented Jun 22, 2017

by Anas Nashif:

David Brown, not sure you were present when Alex presented the outcome of the kernel dive-in we had. The outcome was captured in this slide:

(screenshot attachment: screenshot-1.png)

from this original slide set:

(attachment: 2017-01-12_1_Zephyr_Kernel_assets.pptx)

Thread isolation is the bare minimum we need to be able to call a system secure. I know Ruud Derwig was present in this discussion at least.
So I mostly agree with Andrew Boie here; I still have an issue calling this a debugging feature, given the amount of work and resources we are investing to make this happen ;-)


nashif commented Jun 27, 2017

by Andrew Boie:

I'm still debugging and enabling validation for various kernel objects, but a sneak preview of this work can be seen here:

https://github.com/andrewboie/zephyr/tree/kobject

Certain network stack data structures have kernel objects embedded within them; I need to figure out the best way to deal with those.


nashif commented Jul 18, 2017

by Andrew Boie:

zephyrproject-rtos/zephyr#834


nashif commented Jul 25, 2017

by Andrew Boie:

Taking a different approach than the XOR method. I have an RFC drafted and will present to the memory protection working group tomorrow morning, and a revision after that discussion will be posted here.


nashif commented Aug 1, 2017

by Andrew Boie:

Problem statement:

There should be no way for userspace threads, either through mistakes or
malice, to corrupt the Zephyr kernel itself via misuse of kernel objects,
such that the consequences of this misuse extend past the threads directly
involved in the use of that kernel object.

Terminology:

When we speak of "kernel objects" here, this includes all the typical
kernel objects defined in include/kernel.h that require a privilege elevation
to interact with them, plus all driver 'struct device *' instances fetched
either directly or via device_get_binding(). The validation of drivers
will be done at the subsystem level; we want to touch the actual drivers
themselves as little as possible.

Design goals:

We would like to have the same APIs for both userspace and kernel space,
which would mean that userspace will be passing pointers to the kernel
objects as part of making kernel APIs calls.

  • Some users of thread protection may want to only turn it on during
    the development phase and shut it off for production, without changing
    their code.
  • Maintaining a parallel set of userspace APIs that work in terms of
    descriptors instead of kernel object pointers is not something we want
    to do at this time.
  • We need a solution that is both secure and has completely predictable
    overhead; we are going to want to be able to sandbox interpreters or
    other untrusted code in a userspace thread and have confidence that
    it won't be able to crash, corrupt, or spy on anything but itself.

The scope of this proposal is to show how we intend to verify, once we do the
privilege elevation from user to supervisor mode, that the kernel object
pointers provided in an API call are:

  • Pointing to valid memory for a kernel object
  • Pointing to memory that corresponds to a kernel object of the expected
    type, or a driver instance of the expected driver subsystem
  • Pointing to an object that has been properly initialized
  • Pointing to an object that the accessing thread has permission to use

We are NOT trying to protect against malfeasance in supervisor mode; by
definition you cannot, although these validation features can be helpful
for catching bugs in supervisor mode threads.

We are only trying to limit the scope of kernel object misuse, not catch all
potential issues. If user thread A holds a semaphore and then crashes,
user thread B waiting on that semaphore will still wait forever. But the
effect is limited to just threads A and B: the kernel does not explode, and
other threads not involved with that semaphore are unaffected.

TLDR proposal:

At build time, parse the ELF binary to find all the kernel objects declared
in the system and build up a list of their memory addresses. Use this list to
construct a build-time perfect hashtable, such that if we want to test at
runtime whether a particular memory address is indeed a kernel object, this
hash table will confirm or deny this in O(1) time. The hash table will map to a
compact array of object metadata which will, for each entry, indicate the
thread permissions, type, and initialization state of the object address that
maps to it.

The constraint of this approach is that, at minimum for objects referenced by
user threads, all these objects need to be declared at build time; it will not
be allowed to dynamically create objects on a kernel stack or heap. Augmented
APIs for kernel object pools will be provided for dynamic object use-cases.

For objects referenced by supervisor threads, heap allocation can be allowed
if either the validation mechanism is shut off for supervisor threads, or
we create a supplemental runtime hash table for these dynamic objects with the
understanding that lookup performance may not be O(1). Neither of these will
be enabled by default, and users should know exactly what they are doing, as
many of the protections given by this mechanism are defeated by them.

Permission tracking of kernel objects will be done with bitfields, imposing
a maximum number of threads in the system, tunable via Kconfig. Permission
bits are only enforced for user threads.

Implementation Details

Creating the perfect hash table

We need to know where all the kernel objects are. These come in several flavors:

  • Allocated in bss and initialized at runtime with an init function
  • Allocated in data and initialized at build time with macros like
    K_SEM_DEFINE();
  • Embedded within a larger data structure, for example many driver runtime
    data structures have kernel objects embedded deeply within them.

We can find all of these using the DWARF debugging data inside the ELF binaries
created by the build. Using the Python elftools library, we can scan the entire
kernel, find all instances of kernel objects, and build up a list of their
memory addresses, type, and whether they have been statically initialized, and
do some sanity checking to make sure that these are, for instance, actual
kernel objects and not from some alternate definition with the same name.

This information will then be used to create a perfect hashtable:

https://en.wikipedia.org/wiki/Perfect_hash_function

Creating this table will be easy, we can just use GNU gperf to do it:

https://www.gnu.org/software/gperf/manual/gperf.html

This emits some C code which we can build and link into the kernel. Our
build system already has a notion of a 2-pass build: we create a
'zephyr_prebuilt.elf' first and feed that to various build-time tools
which create data structures that end up in the final binary. We do this
for creating the MMU page tables, IDT, and GDT on x86, and also to create
the IRQ vector tables for other architectures. Care must be taken that when
the kernel is re-linked with the code/data from the gperf output, this
does not shift the location of any previously existing code/data in the system.

Object Metadata and Permissions

The hash function created by gperf will only tell us that a particular memory
address is valid for a kernel object, but says nothing about that object's
type, permissions, or initialization state. We are initially going to create
an array of:

struct __packed k_object {
	char perms[CONFIG_NUM_THREAD_BITS]; /* Default 2, max of 16 threads */
	char type;  /* Some value in an enumeration of all kernel objects
	             * and driver subsystems */
	char flags; /* Bit 0 indicates initialization state */
};

We will need to implement this array such that given the address of a kernel
object, we can fetch the index in this array in constant time using the
hash function provided by gperf.

With a default of max 16 threads, this whole thing will fit in a u32_t. There
will be a 1:1 mapping between instances of struct k_object and all the kernel
objects instantiated at build time in the system. This data structure will
be created at build time using a post-build step just like the gperf code/data.
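As a sketch of how this might be consumed at runtime (the function names below are assumptions; gperf's generated lookup treats the pointer value itself as a short byte-string key):

extern struct k_object *kobject_gperf_find(const char *key, unsigned int len);

static struct k_object *kobject_find(void *obj)
{
	/* Hash the address bytes; returns NULL if 'obj' is not a known
	 * kernel object.
	 */
	return kobject_gperf_find((const char *)&obj, sizeof(obj));
}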

The permissions field will be initially zero. This is a bitfield; when threads
are created they will be assigned an index in this bitfield. Threads running
in supervisor mode can access any kernel object they want. Threads running in
user mode will be restricted:

  • If a user thread initializes a kernel object that was previously
    uninitialized, that thread will be granted permission on that object.
  • A user thread that has permission on a kernel object may grant
    permission to other threads for that object through an API call.
  • A supervisor thread can always grant permission to an object to other
    threads via API call, or grant permission to itself before it does the
    one-way operation to drop permissions down to user mode.

The API call to grant permission would be something like:

k_object_grant_access(void *object, struct k_thread *thread);

Validation will be done to ensure that the 'object' parameter actually points
to a valid instance of a kernel object and, if the caller is a user thread,
that the caller already had permission to manipulate it.
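Hypothetical usage: a supervisor thread grants a worker thread access to a statically defined semaphore before the worker drops to user mode (the final API name and signature may differ from the proposal above):

K_SEM_DEFINE(work_ready, 0, 1);

void prepare_worker(struct k_thread *worker)
{
	/* 'worker' may now pass &work_ready to system calls */
	k_object_grant_access(&work_ready, worker);
}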

System call code flow

Note that in the cases below, "fail" will typically mean "throw an exception
which kills the calling thread", although there might be some scenarios where
we set errno and return; this will be decided on a case-by-case basis.

It's also worth noting that not all kernel APIs will be exposed to userspace
in the system call layer. For example, APIs which register callbacks that
run in interrupt or supervisor thread context will be supervisor-only.

The code flow, for a userspace thread making an API call, would look roughly
as follows:

user code:
...
k_sem_give(&my_sem);
...

What happens next is performed by the system call stubs, which are going to be
auto-generated via preprocessor/linker/post-build black magic:

  1. The &my_sem parameter is marshaled through the system call interface
    and a privilege elevation is done via arch-specific mechanism to get
    execution context in kernel mode at a designated entry point, typically
    via software interrupt or special instruction like SVC, SYSENTER, etc.

  2. The &my_sem is looked up in the hash table. A negative result indicates
    that this isn't a kernel object and we fail.

  3. We get a pointer to the corresponding entry in the k_object metadata array. We check
    that the object type matches the type of system call we were making,
    if &my_sem actually pointed to a struct k_work (for example) and not
    a struct k_sem, we fail.

  4. We check the initialization bit in the metadata. If the bit is unset
    and this was not an init call, we fail.

  5. We get the ID of the calling thread and check that the corresponding bit
    in the perms field is lit up.

  6. If the caller provides additional buffers for data exchange (not the
    case for k_sem, but definitely for other objects and driver APIs),
    walk the page tables or MPU configuration to ensure that these buffers
    live in RAM that the calling thread has access to.
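Put together, the kernel-side handler for such a stub might look roughly like this (a sketch only; kobject_find(), the perms/flags layout, the thread's perm_index field, and oops() are illustrative names, not a real API):

/* Rough sketch of steps 2-5 for a k_sem_give() system call */
void handle_k_sem_give(struct k_sem *sem, struct k_thread *caller)
{
	struct k_object *ko = kobject_find(sem);        /* step 2 */

	if (ko == NULL) {
		oops(caller);        /* not a kernel object at all */
	}
	if (ko->type != K_OBJ_SEM) {                    /* step 3 */
		oops(caller);        /* wrong object type */
	}
	if (!(ko->flags & 0x01)) {                      /* step 4 */
		oops(caller);        /* never initialized */
	}
	if (!(ko->perms[caller->perm_index / 8] &
	      (1 << (caller->perm_index % 8)))) {       /* step 5 */
		oops(caller);        /* no permission granted */
	}

	/* Step 6 (buffer checks) does not apply to k_sem_give(); proceed
	 * to the normal in-kernel semaphore give logic.
	 */
}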

For kernel threads making API calls, much or all of this can be skipped,
certainly steps 1, 5, and 6. We are never going to try to prevent malfeasance
by a thread running in supervisor mode. However, this validation can be useful
for catching bugs, so we can optionally allow some validation for APIs called
from supervisor context and perform steps 2, 3, and 4.

Constraints

Constraints imposed by any protection implementation

Regardless of whether this or some other method is used, it is a hard
requirement that user threads may not read/write kernel objects directly;
all they can do is pass their address to system calls. Kernel objects
contain data that is private to the kernel, and if corrupted could potentially
take the entire system down. They must only be manipulated in a very controlled
manner.

This means that no matter what, putting kernel objects onto a user thread's
stack is forbidden, as that is outside the kernel's memory space. If
kernel objects are declared toplevel, and the CONFIG_APPLICATION_MEMORY
option is enabled (which grants read/write access to all toplevel globals
defined in application object code), either the __kernel decorator must be
used, or macros like K_SEM_DEFINE(), which ensure the object ends up in the
right place in RAM.
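For example, with CONFIG_APPLICATION_MEMORY enabled, toplevel declarations in application code would look like this (illustrative only):

K_SEM_DEFINE(ok_sem, 0, 1);         /* OK: macro places the object in kernel RAM */
__kernel struct k_sem also_ok_sem;  /* OK: decorator forces kernel RAM */
struct k_sem bad_sem;               /* Forbidden: would land in application
                                     * memory, writable by user threads */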

User-allocated data structures that don't live in kernel memory will not be
able to contain kernel objects directly embedded within them; they will need to
store pointers to such objects instead. Complex data structures that live
in kernel memory and are only manipulated by supervisor contexts (such as
drivers) are allowed to have embedded kernel objects; the build-time DWARF
parsing will be able to find them.

User threads will not be able to allocate kernel objects at runtime on
generic kernel memory pools. free() carries no context about the particular
type of memory being freed, and it's possible to defeat the validation mechanism
by doing a system call on a kernel object pointer that was initialized at
some point but later freed. If a supervisor thread allocates an object onto
a kernel heap and then grants permission to a user thread to access it, that
should be considered a security bug.

Additional constraints imposed by this proposal

With this proposal, and without making some potential exceptions (described
below), the set of all kernel objects must be known at build time.
They can't be allocated on a stack or heap even if that memory is within
the kernel's RAM.

To compensate for this, the plan is to augment the existing
k_mem_slab APIs to make it very simple for users to reserve pools of kernel
objects for dynamic use-cases. Object pools are much easier to deal with:
the memory is reserved ahead of time, the DWARF parsing can locate all the
kernel objects within them at build time, and this memory could never get
overwritten with random data by a user thread.
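As a rough illustration of the pool idea using today's k_mem_slab API (the augmented object-pool APIs mentioned above are not designed yet; this is just the foundation they would build on):

/* Reserve 8 k_sem-sized blocks at build time */
K_MEM_SLAB_DEFINE(sem_pool, sizeof(struct k_sem), 8, 4);

struct k_sem *sem_pool_alloc(void)
{
	void *block;

	if (k_mem_slab_alloc(&sem_pool, &block, K_NO_WAIT) != 0) {
		return NULL;	/* pool exhausted */
	}
	k_sem_init(block, 0, 1);
	return block;
}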

If, for whatever reason, an application just has to be able to put objects
on a kernel heap, there are some options:

  • Disable all validation checks for kernel APIs when the caller is in
    supervisor mode; we won't care whether the object is in the hashtable
    or not.
  • Create an additional run-time hash table to register kernel objects
    defined in this way, with the caveat that lookup time could be as bad
    as O(n), where n is the number of dynamic kernel objects. Sharing
    these objects with user threads would still be considered a security bug.

Both of these are perilous and should only be enabled as a last resort.

Lifecycle management:

For the initial implementation, very little lifecycle management is planned,
but more advanced schemes could be a later enhancement:

  • With all objects statically allocated, their memory is never going to go
    away or be repurposed, there's nothing a user thread could do to make
    RAM reserved for a k_sem become garbage.

  • k_thread objects do get recycled, but every k_thread_create() call will
    default to generating a new thread ID rather than re-using an old one.
    Otherwise, permission bits granted to the previous lifetime of the thread
    object would be inherited, and this needs to be a clean slate every time.
    We could consider re-using retired thread IDs, but this would require
    walking the entire metadata array and clearing bits in the perms field;
    we would probably make this a Kconfig option.

  • Thread objects contained within a slab pool, that are released back to the
    pool, would have their permissions zeroed, and initialization bits
    cleared in their corresponding struct k_object; only the type information
    would remain. For objects where initialization requires no parameters,
    instead of clearing the initialization bit, the object is instead restored
    to some initial state.

Other Comments

For the initial implementation, we are trying to change kernel APIs as little
as possible, and even trivial access to kernel objects will require privilege
elevation.

In the future, we may try to see if some objects could be implemented without
system calls. It would be desirable, for instance, to see if we can implement
k_mutex in userspace. Once the initial implementation of thread
protection is out, there will be a great deal of interest in optimization
and reducing the need for privilege elevation wherever possible.


nashif commented Aug 24, 2017

by Andrew Boie:

I have the DWARF parsing working, and can generate the hashtable with gperf.

Now for the fun part: ensuring that including the text/data generated by gperf does not shift the memory addresses that we are hashing.


nashif commented Aug 28, 2017

by Andrew Boie:

This is now available for code review:

zephyrproject-rtos/zephyr#1276
