Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zephyr Networking Stack Architectural Updates #7591

Closed
10 of 11 tasks
laperie opened this issue May 16, 2018 · 16 comments
Closed
10 of 11 tasks

Zephyr Networking Stack Architectural Updates #7591

laperie opened this issue May 16, 2018 · 16 comments
Assignees
Labels
area: Sockets Networking sockets Enhancement Changes/Updates/Additions to existing features priority: high High impact/importance bug

Comments

@laperie
Copy link
Collaborator

laperie commented May 16, 2018

This is an umbrella issue, covering all works we currently do on Zephyr networking arch updates:

Port protocols to use sockets

@nashif nashif added the Enhancement Changes/Updates/Additions to existing features label May 17, 2018
@andrewboie
Copy link
Contributor

I suggest we add a line items to implement system calls for the new socket APIs, and convert as many of the examples as we can to run in user mode. Being able to parse outside network data, and do upper level protocols in user mode is something we get asked about a lot.

@lpereira
Copy link
Collaborator

If we're going through the trouble of making it possible for user mode applications to use the network stack, I would go as far as saying that it would be great to have the network stack itself run in user mode. It's a big attack surface that would be better off with lesser privileges.

Another thing that I'd like to maybe see is to restrict which thread has access to the network. I don't know yet how to do this: we have ACLs for kernel objects, but I don't know which object we would use to control this permission.

@andrewboie
Copy link
Contributor

I completely agree with @lpereira. Ideally the kernel part would just be syscalls to the hardware driver such as Ethernet.
I am not familiar with network stack code. What would be impediments to running it all in user mode? Does it use IRQ locking, or kernel objects unavailable to user mode such as k_mem_pool? Interrupt-context callbacks?

Another thing that I'd like to maybe see is to restrict which thread has access to the network. I don't know yet how to do this: we have ACLs for kernel objects, but I don't know which object we would use to control this permission.

Might be as simple as access to the underlying hardware driver.

@lpereira
Copy link
Collaborator

lpereira commented May 17, 2018

It's been quite a while since I took a hard look at the network layer, but if I remember correctly, there shouldn't be a lot of changes to make this happen, at least from the point of view of handling layer 2 stuff (transmitting and receiving frames).

Drivers are all over the place, but a common pattern is to have a fixed chunk of memory (usually the size of an ethernet frame, or 1500 bytes) that the ethernet device will DMA to; the interrupt will then allocate a network packet and copy that buffer. Network packets are comprised of fragments, that are usually 128 bytes long, so there might be some "memory allocation" going on, but all allocation happens with K_NO_WAIT timeouts. Once all fragments were copied to a packet, the interrupt then notifies the network layer that it should process it.

In order to send packets, again, a common pattern is just to DMA the network packet fragments to the ethernet controller. Once it's done transmitting the frame, it will unref the packet.

So we would have to maybe modify some things and invert how things are done: instead of the driver notifying the network layer that there's a new packet that needs processing, the network layer will block on an interface object (maybe on some syscall such as net_iface_receive_frame(struct net_iface *, u8_t *frame, int timeout), and call something like net_iface_transmit_frame(iface, frame, timeout) when it's time to push something through the wire.

In the receive case, we already have a memory copy, so I think this is fine. For the transmission case, we'll introduce a new memory copy, which might be fine.

I don't know however how this would play out with the network layer being exposed as syscalls. Would have to look at the code again; I remember seeing multiple receive queues and traffic classes, which were things that were not there when I used to work in that subsystem, so a lot of things will be unfamiliar and we might have some caveats to this plan.

@pfalcon
Copy link
Contributor

pfalcon commented May 18, 2018

I would go as far as saying that it would be great to have the network stack itself run in user mode. It's a big attack surface that would be better off with lesser privileges.

It's not just attack surface, it's also an attack bastion: for many usecases, integrity of networking data is critical. And letting one user thread fiddle with networking data of another thread (as implied by "running stack in user mode") doesn't seem like exactly a good choice.

@andrewboie
Copy link
Contributor

andrewboie commented May 23, 2018

I talked a bit to @laperie and @nashif today about system calls for the network stack. We discussed where to do the privilege elevation, either at the socket layer or much lower in the system. We agreed that for the initial implementation, doing the syscalls at the socket layer would likely be simplest.

Looking at include/net/socket.h I don't think this would even be particularly time-consuming to implement. The steps required for declaring new syscalls are documented here: http://docs.zephyrproject.org/kernel/usermode/syscalls.html

Here are some key things to think about:

The first is that system calls have to be able to validate all their arguments. The socket file descriptor in this implementation is actually a pointer to a struct net_context, so struct net_context would need to be turned into a kernel object.

Right now, it looks like you have a build-time array of struct net_context for allocating them. Perfect. Since struct net_context is added to the set of kernel objects, and they are all declared at build time as an array managed by net_context.c, then this is dead easy. You just need to apply:

diff --git a/kernel/userspace.c b/kernel/userspace.c
index ab1cc4db9..a739922c8 100644
--- a/kernel/userspace.c
+++ b/kernel/userspace.c
@@ -17,6 +17,7 @@
 #include <device.h>
 #include <init.h>
 #include <logging/sys_log.h>
+#include <net/net_context.h>
 
 #define MAX_THREAD_BITS                (CONFIG_MAX_THREAD_BYTES * 8)
 
diff --git a/scripts/gen_kobject_list.py b/scripts/gen_kobject_list.py
index f14665601..70b868645 100755
--- a/scripts/gen_kobject_list.py
+++ b/scripts/gen_kobject_list.py
@@ -23,6 +23,7 @@ kobjects = [
     "k_thread",
     "k_timer",
     "_k_thread_stack_element",
+    "net_context",
     "device"
 ]
 

And then struct net_context can be validated as kernel objects and the contexts array will automatically be tracked.

Then the socket APIs need to be turned into syscalls. The key thing is that a new handler function must be implemented which rigorously validates the arguments. This includes validating any kernel object pointers passed in, validating any data buffers, and possibly copying some data if passed in by the user and kept by the kernel after the call completes.

Example (this code is only build-tested btw):

(in include/net/socket.h)

diff --git a/include/net/socket.h b/include/net/socket.h
index f7890ab1c..d5ee7ad54 100644
--- a/include/net/socket.h
+++ b/include/net/socket.h
@@ -67,8 +67,8 @@ ssize_t zsock_send(int sock, const void *buf, size_t len, int flags);
 ssize_t zsock_recv(int sock, void *buf, size_t max_len, int flags);
 ssize_t zsock_sendto(int sock, const void *buf, size_t len, int flags,
                     const struct sockaddr *dest_addr, socklen_t addrlen);
-ssize_t zsock_recvfrom(int sock, void *buf, size_t max_len, int flags,
-                      struct sockaddr *src_addr, socklen_t *addrlen);
+__syscall ssize_t zsock_recvfrom(int sock, void *buf, size_t max_len, int flags,
+                                struct sockaddr *src_addr, socklen_t *addrlen);
 int zsock_fcntl(int sock, int cmd, int flags);
 int zsock_poll(struct zsock_pollfd *fds, int nfds, int timeout);
 int zsock_inet_pton(sa_family_t family, const char *src, void *dst);
@@ -185,4 +185,6 @@ static inline void freeaddrinfo(struct zsock_addrinfo *ai)
  * @}
  */
 
+#include <syscalls/socket.h>
+
 #endif /* __NET_SOCKET_H */

Here all I had to do was tag the API as a syscall and include an auto-generated header. Build system does the rest.

(in socket.c)

diff --git a/subsys/net/lib/sockets/sockets.c b/subsys/net/lib/sockets/sockets.c
index 5bcd771f5..8d220816d 100644
--- a/subsys/net/lib/sockets/sockets.c
+++ b/subsys/net/lib/sockets/sockets.c
@@ -17,6 +17,7 @@
 #include <net/net_context.h>
 #include <net/net_pkt.h>
 #include <net/socket.h>
+#include <syscall_handler.h>
 
 #define SOCK_EOF 1
 #define SOCK_NONBLOCK 2
@@ -454,8 +455,8 @@ ssize_t zsock_recv(int sock, void *buf, size_t max_len, int flags)
        return zsock_recvfrom(sock, buf, max_len, flags, NULL, NULL);
 }
 
-ssize_t zsock_recvfrom(int sock, void *buf, size_t max_len, int flags,
-                      struct sockaddr *src_addr, socklen_t *addrlen)
+ssize_t _impl_zsock_recvfrom(int sock, void *buf, size_t max_len, int flags,
+                            struct sockaddr *src_addr, socklen_t *addrlen)
 {
        struct net_context *ctx = INT_TO_POINTER(sock);
        enum net_sock_type sock_type = net_context_get_type(ctx);
@@ -471,6 +472,47 @@ ssize_t zsock_recvfrom(int sock, void *buf, size_t max_len, int flags,
        return 0;
 }
 
+#ifdef CONFIG_USERSPACE
+Z_SYSCALL_HANDLER(zsock_recvfrom, sock, buf, max_len, flags, src_addr,
+                 addrlen_param)
+{
+       socklen_t addrlen_copy;
+       socklen_t *addrlen_ptr = (socklen_t *)addrlen_param;
+       ssize_t ret;
+
+       if (Z_SYSCALL_OBJ(sock, K_OBJ_NET_CONTEXT)) {
+               return -EINVAL;
+       }
+
+       struct net_context *ctx = INT_TO_POINTER(sock);
+       enum net_sock_type sock_type = net_context_get_type(ctx);
+
+       if (sock_type != SOCK_DGRAM && sock_type != SOCK_STREAM) {
+               return -EINVAL;
+       }
+
+       Z_OOPS(Z_SYSCALL_MEMORY_WRITE(buf, max_len));
+       if (src_addr) {
+               /* addrlen is input/output, need to make a copy */
+               Z_OOPS(Z_SYSCALL_MEMORY_WRITE(addrlen_ptr,
+                                             sizeof(socklen_t *)));
+               addrlen_copy = *addrlen_ptr;
+               Z_OOPS(Z_SYSCALL_MEMORY_WRITE(src_addr, addrlen_copy));
+       } else {
+               addrlen_copy = 0;
+       }
+
+       ret = _impl_zsock_recvfrom(sock, (void *)buf, max_len, flags,
+                                  (struct sockaddr *)src_addr, &addrlen_copy);
+
+       if (src_addr) {
+               *addrlen_ptr = addrlen_copy;
+       }
+
+       return ret;
+}
+#endif /* CONFIG_USERSPACE */
+
 /* As this is limited function, we don't follow POSIX signature, with
  * "..." instead of last arg.
  */

Here I've renamed the implementation function per the internal naming convention, and implemented the handler function.

  1. I validate that sock, which is a struct net_context, is valid, the caller has permission on it, and it's initialized (the implementation of zsock_socket() would need to grant the caller permission and activate initialized state on the allocated context before it returns the pointer, zsock_close would clear the initialization state and revoke all permissions on the object)
  2. I do the same check of the socket type done in the implementation function, but return an error instead of failing an assertion. You could just do this in the implementation function if you wanted to instead of asserting.
  3. I validate that the provided buffer is a region of memory that the caller has write access to.
  4. If we provide src_addr, then addrlen is an input/output parameter. These have to be treated carefully to avoid TOCTOU attacks. After validating the memory pointed to by addrlen, I make a copy of it before I do anything to it. I then use it to validate the buffer represented by src_addr since according to the man page that represents its size.
  5. I then call the implementation function, using the copy I made of addrlen.
  6. I copy addrlen back to the source pointer and return to the caller.

I believe implementing the rest of the syscalls would be fairly straightforward. One gap is that we don't have a Z_SYSCALL.xxxx() macro for validating C strings, but that wouldn't be hard to add.

@pfalcon
Copy link
Contributor

pfalcon commented Jun 5, 2018

Some comments based on today's Networking Forum meeting follow.

@pfalcon
Copy link
Contributor

pfalcon commented Jun 5, 2018

doing the syscalls at the socket layer

As I don't think there was a clear assignment for this, let give my 2 cents: I'm not much interested in userspace/kernel separation. As a cure/punishment for such an attitude, I can take the task of implementing these syscalls - but only as a low-priority one. I.e. I'd still roughly target for 1.13, but as it's summer/vacation time, it may slip.

Summing up, if there's a party who think that implementing syscalls is high priority, please go for it. Otherwise, feel free to leave it to me as sockets subsys maintainer. (In this case I'd be interested in diving into the userspace separation stuff myself, and would find comments on @andrewboie sufficient to start digging into it myself, as I'm sure he has more important stuff to do).

@pfalcon
Copy link
Contributor

pfalcon commented Jun 5, 2018

DNS to use sockets #7587

As mentioned by @jukkar, DNS is abstracted away behind POSIX getaddrinfo() call. I.e., it's not important how it's implemented - with sockets or native lib (like it is now). Again, it may become more important later due to the proverbial userspace/kernel separation, but for now I'd strike it out from the list.

@pfalcon
Copy link
Contributor

pfalcon commented Jun 5, 2018

HTTP to use sockets #7585

I would be kinda interested in this. But the underlying motivation is apparently long discussions with @jukkar about his net_app-based client, so I'm keen "to do it differently" ;-). And I'm afraid of getting into the same situation as with #5985, so can take it only if my boss assigns me to it. And it would be sufficiently different to existing client, as mentioned above.

So, the same note applies - if there's a party who needs it in some specific way, please go for this. If not, I can take it. (Actually, I'd rather spend time porting existing libs, just in case of HTTP, I guess I still arrogant enough to think that "I can do it better" (where better == smaller)).

@pfl
Copy link
Collaborator

pfl commented Jun 6, 2018

I validate that sock, which is a struct net_context, is valid, the caller has permission on it, and it's initialized (the implementation of zsock_socket() would need to grant the caller permission and activate initialized state on the allocated context before it returns the pointer, zsock_close would clear the initialization state and revoke all permissions on the object)

Is this part of the code validating that the net_context is one that belongs to the calling thread? In the above code, the requested net_context int/pointer is found at least to reside in the global array of net_context entries, but can the caller calculate another pointer that would still end up in the net_context array, but belong to a different user space (or kernel space!) thread?

@andrewboie any thoughts about this?

@andrewboie
Copy link
Contributor

Summing up, if there's a party who think that implementing syscalls is high priority, please go for it.

We have some interest in at least getting some basic functionality up, and I have reserved some bandwidth in 1.13 cycle so I can spend some time now to get the ball rolling, with the understanding that this will be handed off at some point as discussed in the meeting.

@nashif nashif added Feature A planned feature with a milestone and removed Enhancement Changes/Updates/Additions to existing features labels Jul 10, 2018
@nashif nashif added the priority: high High impact/importance bug label Aug 21, 2018
@carlescufi
Copy link
Member

@laperie can we tick the first item now? setsockopt with TLS is merged, isn't it?

@pfalcon
Copy link
Contributor

pfalcon commented Sep 6, 2018

setsockopt with TLS is merged, isn't it?

Yep, so let's tick it, I'm sure @laperie will ack that.

@carlescufi
Copy link
Member

@pfalcon thanks Paul

@pfalcon pfalcon added the area: Sockets Networking sockets label Dec 19, 2018
@nashif nashif removed their assignment Feb 13, 2019
@nashif nashif added Enhancement Changes/Updates/Additions to existing features and removed Feature A planned feature with a milestone labels Feb 21, 2019
@laperie
Copy link
Collaborator Author

laperie commented Mar 19, 2019

Marking as fixed as all the substantial modifications of the stack have been done.
We'll keep engineering debt reduction open for the forthcoming releases.
THANKS EVERYBODY FOR WORKING HARD ON THIS ONE!

@laperie laperie closed this as completed Mar 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Sockets Networking sockets Enhancement Changes/Updates/Additions to existing features priority: high High impact/importance bug
Projects
None yet
Development

No branches or pull requests

10 participants