Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: buf: Fix net_buf struct issue due to frags and node elem union #38830

Conversation

XavierChapron
Copy link
Contributor

net_buf elements are used in net_pool which are based on lifo
implementation. Yet, net_buf structure doesn't respect lifo requierement
that the first word must be reserved for the lifo kernel implementation.

In most cases, this is fine as this word is mostly accessed when the
element is not allocated, however, this is not always true.
In such situation, node element is written, ehnce frags element value is
not NULL anymore and anything might happen...

This fixes #38829 issue.

Signed-off-by: Xavier Chapron xavier.chapron@stimio.fr

net_buf elements are used in net_pool which are based on lifo
implementation. Yet, net_buf structure doesn't respect lifo requierement
that the first word must be reserved for the lifo kernel implementation.

In most cases, this is fine as this word is mostly accessed when the
element is not allocated, however, this is not always true.
In such situation, node element is written, ehnce frags element value is
not NULL anymore and anything might happen...

This fixes zephyrproject-rtos#38829 issue.

Signed-off-by: Xavier Chapron <xavier.chapron@stimio.fr>
@XavierChapron XavierChapron force-pushed the xch/net-buf-implem-issue branch from fd639f2 to 80e5a7b Compare September 24, 2021 15:33
@XavierChapron XavierChapron changed the title subsys: net: buf: Fix net_buf struct issue due to frags and node elem union net: buf: Fix net_buf struct issue due to frags and node elem union Sep 24, 2021
Copy link
Member

@jhedberg jhedberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way net_buf is currently designed (there may be bugs there however) is that when the buffer is inside a FIFO the frags member is never used. Instead there's a separate flag bit that's set in the buffer flags to indicate that there are related fragments coming after the buffer in the FIFO. Take a look at the net_buf_get() and net_buf_put() implementations which should be used to insert and remove buffers from a FIFO.

The memset to zero looks like a potentially valid fix, but the removal of the union doesn't. Or rather, if we are going to remove the union then there's much more rework to be done than just a simple bug fix.

FWIW, there has been plans to eliminate this feature with fragments and FIFOs and have a separate fragments list, however it needs some careful consideration because it increases the net_buf struct size, which can be quite critical for memory constrained systems that use a lot of buffers.

@@ -265,6 +265,7 @@ struct net_buf *net_buf_alloc_len(struct net_buf_pool *pool, size_t size,
irq_unlock(key);

buf = pool_get_uninit(pool, uninit_count);
memset(&buf->node, 0, sizeof(buf->node));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this one, it might be better placed near buf->frags = NULL; below.

@XavierChapron
Copy link
Contributor Author

The way net_buf is currently designed (there may be bugs there however) is that when the buffer is inside a FIFO the frags member is never used. Instead there's a separate flag bit that's set in the buffer flags to indicate that there are related fragments coming after the buffer in the FIFO. Take a look at the net_buf_get() and net_buf_put() implementations which should be used to insert and remove buffers from a FIFO.

@jhedberg
Mcumgr net buf implementation uses net_buf_unref()

mcumgr_buf_free(struct net_buf *nb)

And net_buf_unref() cares about the frags member.

Maybe it means that we should more simply update mcumgr_buf_free() to use net_buf_put instead()?

However, I don't find it safe to have a "reserved member for kernel", used in a union which is memset at each alloc and shared with some other things.... I understand the optimisation, but I don't think that's the best 4 bytes to spare...

@XavierChapron
Copy link
Contributor Author

Take a look at the net_buf_get() and net_buf_put() implementations which should be used to insert and remove buffers from a FIFO.

@jhedberg This rule seems not to be correctly applied in multiples parts of the code. I've not find a single usage of NET_BUF_POOL_DEFINE not calling net_buf_alloc().

I think we either need to fixed this quitte everywhere or sanitize the usage of net_buf_alloc() and net_buf_unref().

@jhedberg
Copy link
Member

Maybe it means that we should more simply update mcumgr_buf_free() to use net_buf_put instead()?

No, net_buf_put() is for a completely different purpose than net_buf_unref(). put() is for transferring buffers through a FIFO whereas unref() is for releasing a reference to the buffer that you own.

@jhedberg This rule seems not to be correctly applied in multiples parts of the code. I've not find a single usage of > NET_BUF_POOL_DEFINE not calling net_buf_alloc().

I'm not following. What's wrong with net_buf_alloc()?

@XavierChapron
Copy link
Contributor Author

Maybe it means that we should more simply update mcumgr_buf_free() to use net_buf_put instead()?

No, net_buf_put() is for a completely different purpose than net_buf_unref(). put() is for transferring buffers through a FIFO whereas unref() is for releasing a reference to the buffer that you own.

@jhedberg This rule seems not to be correctly applied in multiples parts of the code. I've not find a single usage of > NET_BUF_POOL_DEFINE not calling net_buf_alloc().

I'm not following. What's wrong with net_buf_alloc()?

@jhedberg It seem I didn't get your first message, I've understant that you were telling me that rather than updating net_buf structure, one should rather use net_buf_get() and net_buf_put() when the buffer allocation mechanism is based on a FIFO.
It's seems that my understanding wasn't correct.

I'm not following. What's wrong with net_buf_alloc()?

To me, net_buf_alloc() clear the first word of net_buf allocated elements, which doesn't feels safe even if it might not be a problem for current implementation.

Anyway, my problem was more related to NET_BUF_POOL_DEFINE which elements are usually allocated with net_buf_alloc() and freed with net_buf_unref(). Indeed NET_BUF_POOL_DEFINE allocation mechanism is based on a LIFO, which sometimes sets the net_buf element first word which leads to a misuse when releasing the element with net_buf_unref().
@jhedberg Do we agree on this part?

@jhedberg
Copy link
Member

@jhedberg It seem I didn't get your first message, I've understant that you were telling me that rather than updating net_buf structure, one should rather use net_buf_get() and net_buf_put() when the buffer allocation mechanism is based on a FIFO.

I simply meant to look at those implementations to see how they solve fragment buffer accounting with the help of buf->flags when k_fifo will overwrite the first four bytes.

There might be a missing resetting of the first four bytes somewhere, but least both net_buf_get() and net_buf_alloc_len() do clear them before returning the buffer (in the net_buf_get() case this is only true if the buffer had no associated fragment buffers).

@jhedberg
Copy link
Member

@XavierChapron now that I look at your net_buf_alloc_len() change, that seems unnecessary since the same function sets buf->frags = NULL; just after the success: label, i.e. before returning the buffer.

@XavierChapron
Copy link
Contributor Author

Ok my bad, it's already fixed in zephyr 2.6... See #33780

In my opinion there are two others ways to fix this:

  • remove the union in net_buf struct losing 4 byte per element
  • update list implementation so that at least, the node elem is removed automatically when the node is pop from the list. Something like:
diff --git a/include/sys/list_gen.h b/include/sys/list_gen.h
index 9f15deefb7..3f08410e51 100644
--- a/include/sys/list_gen.h
+++ b/include/sys/list_gen.h
@@ -170,6 +170,7 @@
#define Z_GENLIST_GET_NOT_EMPTY(__lname, __nname)			\
	static inline sys_ ## __nname ## _t *				\
	sys_ ## __lname ## _get_not_empty(sys_ ## __lname ## _t *list)	\
	{								\
		sys_ ## __nname ## _t *node =				\
				sys_ ## __lname ## _peek_head(list);	\
									\
		z_ ## __lname ## _head_set(list,			\
				z_ ## __nname ## _next_peek(node));	\
		if (sys_ ## __lname ## _peek_tail(list) == node) {	\
                        z_ ## __lname ## _tail_set(list,                \
                                sys_ ## __lname ## _peek_head(list));   \
                }                                                       \
+               z_ ## __nname ## _next_set(node, NULL); \
                                                                        \
                return node;                                            \
        } 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: API Changes to public APIs area: Networking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

net_buf issue leads to unwanted elem free
3 participants