
Recently added HighCache does not appear to be re-entrant/atomic #365

Closed
MRawlings opened this issue Jan 20, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@MRawlings

Some observations on the 'HighCache' added between v0.18.4 and v0.18.5.

We are using this library indirectly via the libopenapi_validator.

We've fallen behind in our release updates and are now trying to catch back up
again.

We use the libraries in a concurrent manner to validate REST calls, using a
standard REST server supporting concurrent request processing.

Because we use a concurrent server we perform our unit-tests with:

go test --race --count 3 ./...

This gives us a good chance of hitting concurrency issues during our
unit-tests.

Since race conditions may not occur in every test run due to scheduling
randomness, we run tests 3 times (--count 3) to increase the statistical
probability of hitting race conditions.

  • libopenapi v0.18.4 works fine for us with this setup.

  • libopenapi v0.18.5 breaks due to race conditions.

    The functions implicated are: GetHighCache(), AddHit(), AddMiss().

    Whilst found in v0.18.5, these observations also apply to the current mainline.

NOTE: I am checking with my company for permission to contribute
pull-requests to this project. But, I wanted to get this information to
you in the meantime.

Much of the code of the library is already written with concurrency in mind,
as is evidenced by the use of mutexes and atomic operations.

Below are some observations on the new cache code and a couple of suggestions
for making the wrapped atomic operations actually atomic.

We don't intend to imply these suggestions will fix all concurrency issues.
But they seem like simple wins, and they help to make clear how atomic
operations can be wrapped and still remain atomic.

These observations all concern the file index/cache.go, which, based on
the subset of functions we are using, is the only place we are seeing
failures in v0.18.5.

GetHighCache() line 40:

The structure of this function appears to be an anti-pattern for
reentrancy and atomic operation.

We feel that initialisation of the index.highModelCache should not be
included in a function that is ostensibly a Get operation.

The inclusion of the initialisation makes this function non-reentrant.

Maybe the initialisation of this structure could/should be moved to an
explicit NewSpecIndex function, so that CreateNewCache() is
guaranteed to be called only once. As it is, two goroutines can be
executing GetHighCache() in parallel, both call CreateNewCache(), and
both set index.HighModelCache, resulting in non-deterministic operation.

AddHit()/AddMiss() lines 102-105, 108-111

AddHit() and AddMiss() do not provide an atomic operation.

Whilst they use two atomic functions, Add and Load, only the individual
operations are atomic; the sequence of atomic operations is not.
Rescheduling can occur between the Add and the Load, producing
non-deterministic operation.

The atomic package provides individual, low-level, atomic operations; not
sequences of atomic operations. Once you have a sequence of atomic
operations, and you need determinism, you need to use a mutex. (We don't
think you do need a mutex here, since we don't think you need the sequence.)

For the two AddHit()/AddMiss() functions we would suggest simply returning
the result of the atomic Add, which is the atomic result you need, without
the Load and the rescheduling window it opens.

SetHits()/SetMisses() Lines 114-117, 120-123

Besides the fact that these do not provide Set/Store operations as the
comments imply, but instead replicate the Add operations, they have the
same issue as AddHit()/AddMiss(): they do not provide atomic operation.

Even if they used Store() instead of Add(), being defined to return a
value implies a Load() operation.
As with AddHit() and AddMiss(), a sequence of Store and Load calls
is not atomic.

We suggest a couple of possible approaches:

  • Simply remove the functions; they don't appear to be used directly.

  • Since these are 'store' operations, there is no necessity to return a
    value.

    Removing the return value would remove the need for a Load operation
    and therefore the need for a non-atomic sequence of atomic operations.

    Also, it seems a little 'strange' that the atomic.Store() operations
    don't return a value, but these abstracted versions do.

    (We appreciate the change in function signature would be a breaking
    change.)

  • If we really need to return the value, simply return the value passed
    in directly, not via the Load() operation.

I hope this is as useful as it is intended to be.

@daveshanley (Member)

Thank you for this, I have been meaning to return to this code as there are other reports of this (I have verified). I spent a little time on it but I got stumped a little.

Thanks for your detailed breakdown, fresh eyes are always appreciated. I will review and try again on tackling this issue.

@daveshanley daveshanley added the bug Something isn't working label Jan 20, 2025
@MRawlings (Author)

Thanks for the update/response.

Just to record/add.

HighCacheHit()/HighCacheMiss() Lines 18-21, 24-27

These also have the same issue of directly/indirectly using a sequence of atomic operations.

We believe both can simply remove the use of highModelCache.GetHits(), highModelCache.GetMisses() and return the result of the highModelCache.AddHit(), highModelCache.AddMiss() calls which are now atomic in their own right.

daveshanley added a commit that referenced this issue Feb 1, 2025
Removed non atomic code. Breaking changes too.
@daveshanley daveshanley mentioned this issue Feb 1, 2025
@daveshanley (Member)

I applied these changes to fix this issue in v0.21.0.

Thanks.
