Introducing Quiescent State-Based Reclamation to Chapel #8182
Conversation
runtime/src/chpl-privatization.c
```c
// Determines current instance. (MUST BE ATOMIC)
atomic_int_least8_t currentInstanceIdx;

chpl_priv_block_t chpl_priv_block_create() {
```
I never remember this, but in C, 0-argument functions should take "void". In fact, I just messed this up yesterday: #8168
Fixed, although I'm not 100% certain I understand the actual need (something about it being possible for someone to pass arguments to a no-argument function and mess with the stack contents... kinda interested in what happens in that case, actually...)
For compatibility with old C, a no-argument prototype actually means "don't assume or check anything about the args", whereas a void signature means "this is a 0-argument function" -- https://stackoverflow.com/questions/693788/is-it-better-to-use-c-void-arguments-void-foovoid-or-not-void-foo
runtime/include/chpl-privatization.h
```c
@@ -27,12 +27,7 @@ void chpl_privatization_init(void);

void chpl_newPrivatizedClass(void*, int64_t);

// Implementation is here for performance: getPrivatizedClass can be called
// frequently, so putting it in a header allows the backend to fully optimize.
```
This was moved in #6212, and it had a pretty significant impact on the performance of prk stencil. We should definitely do some performance analysis of this PR before merging.
Unfortunately, it's going to result in a minor performance regression for chpl_getPrivatizedClass in any case, though hopefully not by too much; on the plus side, both chpl_newPrivatizedClass (when it doesn't need to allocate more space) and chpl_clearPrivatizedClass will now be on par with it... hopefully. This is bleeding-edge stuff, after all.
Overall, I can say that there is one way that would likely counter any regression, but that involves a complete overhaul of privatization as a whole... that'd be an interesting GSoC student project, though :)
@LouisJenkinsCS - I have some feedback for you about this.
May I ask in what way these look like reader-writer locks? While writers do require mutual exclusion, readers are not blocked waiting for a writer to complete, nor by other readers. (Perhaps you could open a code review listing which parts I need to elaborate on?)
RCU is all about atomics for readers, normally in the form of memory barriers (such as the fact that reads must first atomically read the current instance using
I agree with you - it's just that I didn't see anything I recognized as rcu_ terms.
I'm having some trouble figuring out what's going on in liburcu, but how would it compare with what you have done here? What would make us use liburcu instead of this mechanism, or vice versa? How is the performance different? Would liburcu help with your distributed ideas?
Truth be told, the major differences boil down to optimization. Keep in mind that my algorithm was developed Chapel-side, where I had to work around problems with abstraction and the lack of certain features (ahem, task-local storage), and it was ported to C in the span of a single day. My algorithm was also built around maintenance over an entire cluster, while LibURCU's is built around maintenance over a single SMP system. Lastly, my algorithm was devised to solve a single problem (which it did), while LibURCU was meant to be reused anywhere (although my algorithm can, apparently, also be used similarly). Performance-wise, I have not attempted to produce benchmarks between the two (nor had the time to do so), but I'd imagine mine runs at a small fraction of LibURCU's speed on a single node, merely due to optimization. One plus is that my code is significantly smaller and less complex, making it easier to implement in languages that lack certain features (ahem). Finally, LibURCU wouldn't help much for my purposes, in that all I needed was the basic premise (read-side critical sections, wait-for-readers, single-writer, etc.); I've gotten all I could out of that concept.
I don't really follow - are you saying you couldn't use LibURCU for the distributed case? I seem to be confused about something here. Anyway, for the specific matter of this PR - the privatization arrays - I think we'll need a sense of the performance impact of this change in order to decide whether we proceed with it or not.
LibURCU is, AFAIK, for an SMP system and so wouldn't have usage outside of a single node (at least not with the implementation I saw)... but now that I think about it, readers on each node could use LibURCU, and we could elect one writer over the entire cluster to perform a LibURCU write/update on each node.
Naturally, re-using liburcu (or any other single-node RCU implementation) has the advantage that we get better within-node performance (since these implementations are tuned, etc.). Does having an RCU across multiple locales necessarily mean we have to start from scratch? Let's think about that some more.
It depends on the application... for the case of distributed arrays that are both indexable and resizable, no: the issue of recycling memory has been addressed (making resizing possible), and loosening the classification of 'reads' to include writes to returned references is what leads to significant performance improvements. The RCU itself is just used for memory management (again, if Chapel had garbage collection, we wouldn't really need this at all). LibURCU can do the job for that. I guess what I'm trying to say is that the RCU itself should be seen as just a memory management tool.
As well, if the goal here is to implement LibURCU, that'd be a nightmare of a time. It requires Thread-Local Storage, and I mean it, and each implementation makes assumptions about it that could be disastrous (for example, the classic build uses

I understand if this won't be accepted (didn't really expect it to, but sorry to disappoint), but I want to focus on the application it was originally purposed for: Global Atomic Objects (or in this case, distributed indexable resizable arrays).
I did some quick perf testing, and unfortunately it looks like this adds a significant amount of overhead (2000x slowdown for prk-stencil):

```
cd $CHPL_HOME/test/studies/prk/Stencil/optimized/
chpl stencil-opt.chpl --fast --set iterations=3 --set order=8000 --no-local
./stencil-opt
```

For master:

For RCU-Privatization:
Does the benchmark do more read or write operations? I'll investigate it myself, but 2000x slower is higher than I expected; I would have suspected no worse than 10x, unless it's nearly all writes, in which case it'd be expected.
Okay, I did a bit of profiling... I added an atomic counter for read and write operations respectively... the number of reads I'm seeing is really large, like wow.

Privatized Reads: 1214851089, Writes: 0

That's for one iteration... that's 1.2B reads... I see now why you guys had

I think I'd also be interested in helping out with it more.
It should be almost entirely reads and very, very few writes. The part that slows down is https://github.com/chapel-lang/chapel/blob/master/test/studies/prk/Stencil/optimized/stencil-opt.chpl#L151-L164. With param unfolding, I think that will be ~10 calls to getPrivatizedCopy per loop iteration. Even in the fast path for reading, I think your code still does at least 2 atomic operations, which I think is just going to be way too much overhead. Some rough numbers:
Something like #6184 would alleviate the number of times getPrivatizedCopy is called, but I don't think we're going to get to that any time soon (and even if we did, LICM can't always run, so I'm not sure we could pay this kind of cost).
Yeah, as you're seeing, there are a lot of calls to

The code we generate (especially with the param unfolding) starts to get pretty unwieldy, but it's worth noting that our performance is on par with the reference MPI+OpenMP version up to at least 256 locales.
It is actually interesting... there's another hit taken due to the amount of indirection needed (basically from void ** -> void ****), but that's required to make the algorithm work as a whole (having 2 instances, segmenting data into blocks rather than contiguous memory, etc.) and probably has some significant impact too. Also, I didn't know that each index into the array is a call to
I wonder... Do you think that under the FIFO tasking layer, it would be safe to use TLS? I'm thinking of trying it out.
Okay, I have another idea to make this work... What if we disable preemption when you call

I believe threads must be registered before use, but that can be performed during
We don't multiplex tasks for fifo, so using thread local storage should be fine. For qthreads you might be able to use task local storage. Note that chpl_task_getId() does use task local storage for qthreads.

Also note that qthreads does not have preemptive scheduling; qthreads is a cooperative scheduler. If t1 and t2 are scheduled on pthread1, the only way for the tasks to switch is either an explicit call to chpl_task_yield()/qthread_yield() or some higher-level call that will end up calling them.
I'm almost satisfied with the fact that the data structure managed to perform ~30M ops/sec (honestly, I don't think it's possible yet for any data structure to compare to an unprotected read). I have revised it yet again to make use of TLS, and interestingly the benchmark time didn't change much, if at all. I rebuilt the runtime too (and had to correct a few errors here and there), so I know it's running the new version. Right now, RCU readers perform absolutely zero RMW atomic operations (just 2 atomic reads), and do only a volatile write to their own thread-specific node; I maintain a global table (similar to Hazard Pointers) and use a lock-free approach to append a new TLS node to it once (the first time it's used), which allows the writer to see all 'thread-specific' data. The changes I made did yield a better runtime on swan, from 110 seconds to 70 seconds, but that means the issue has to be due to the added indirection inherent in the design of my data structure, and at this point there isn't anything else I can do.
Actually... perhaps there is another thing... the reason for the indirection is so that indexes can count as 'reads' and so that updates from one instance carry into the other. However, if the most important thing is
I believe I have done enough research to say that the type of RCU-like memory reclamation I was performing was epoch-based (without even knowing it), but there is another, more efficient scheme without any actual need for memory barriers, called Quiescent State-Based Reclamation. Unlike the epoch-based approach, where we declare the critical section in which we are making use of memory, here we instead inject 'checkpoints' which declare that we are not using the memory. The only place where this would be appropriate, I believe, is

The significance is that we do not require any memory barriers, but TLS is still required (good thing I've managed to handle this myself). This will actually allow us to place the

Edit: I have another idea... I believe a reader-writer lock for

Man, I really need to write this stuff down in a journal rather than polluting the pull request.
I've done it: there is now zero-overhead RCU implemented based on Quiescent State-Based Reclamation, and it passes both tests/distributions/privatization/* and that 'stencil-ppk' or whatever it's called, within the same amount of time as before. I injected calls for the quiescent state
Quick misc portability notes:
Okay, so I see now that while there is no regression in reads, writes are hit too hard right now (I also think I may be deadlocking on writers right now), and I'm beginning to see that I need to insert checkpoints in more places. @ronawho Since you're on, do you know if there is a particular callback for when a task finishes? It seems that when a task is finished,
There is a callback interface that you can find at

If you just want to play around, you could add calls to the task shims (like you did with chpl_task_yield). See chapel_wrapper in qthreads (or search for the chpl_task_cb_event_kind_end sentinel to look for places where tasks finish).
Now passes all of test/distributions/privatization and aces the stencil-ppk benchmarks. Although I haven't tested for memory leakage, it should be apparent that leakage is impossible, since writers always finish and they always delete the previous instance before doing so. I think this is 100% successful @mppf.
Output of Stencil...
@LouisJenkinsCS - now that we are doing Quiescent State-Based Reclamation, it feels like real RCU to me. What would it take to generalize this into an RCU interface available to the C runtime? Is that possible, so we could use RCU in other places in the C runtime or from Chapel code? Or is this necessarily a one-off solution for some reason? (Note, I haven't dug into the code yet.)
…matically register the main thread
… value is zero or non-zero
…rnings of unused variables
…emented Michael's suggested changes; made adjustments to runtime's makefile so that it will add the appropriate variables without overwriting any of the others.
43c3259 to 50307e7
```chapel
@@ -0,0 +1,16 @@
use PrivatizationWrappers;
```
This test is missing a .good file, could you add it & check that start_test on it passes?
Meant to add a 'notest' file for that, sorry.
The quickstart / fifo configuration seems to cause core dumps, e.g. [Error matching program output for release/examples/hello3-datapar]. Could you have a look?
GASNet (local) testing failed 1 test with an apparent core dump:

Does this program use a lot of memory or something? The core dump happened during a parallel test run, but I'm not reproducing it in 100 trials.
… qthread.h so craycc stops complaining about it
…_test; it was added to profile the average time for privatization
I'm manually running quickstart myself to ensure it's fixed. I've been using GASNet + Qthreads local the entire time, so my guess is that the issue is specific to not clobbering third-party/qthread and rebuilding it? I'll look into it anyway.
Wait, when you said "I can't reproduce it in 100 trials", did you mean it's a race condition?
I don't know what it is, but it's some sort of intermittent failure. It might be that the test will only fail on a loaded system.

(I'll have to deal with it later, working on a paper right now.)
Passed a gasnet testing run twice.

Passed uGNI testing for test/release/example/primers.
Comment out unused get_defer_list in chpl-qsbr.c

It's unused and that causes compilation errors in some configurations. This is a follow-on to PR #8182. Trivial and not reviewed.
I see that this has been reverted already, but FWIW there were some outstanding issues, and ones that should be addressed prior to remerging this. I don't see an open issue for this, but if you have a list somewhere please add:
I introduce Quiescent State-Based Reclamation, a memory reclamation algorithm that can be used from the confines of the runtime and potentially from Chapel user code (with some GOTCHAS). The memory reclamation algorithm comes with very little performance regression and can ensure the eventual cleanup of memory so long as checkpoints are periodically called from all threads (although their placement is up for debate).
QSBR can come in handy for any performance-critical data structure, and it is currently used in Chapel's privatization table. Future uses could include making Chapel's callback system thread-safe, and perhaps more dynamic task-local storage. In the future, a separate QSBR table is planned for users, one that will not be able to interfere with the runtime.
Potential Uses of QSBR:
There are many other potential uses, the sky is the limit.
Reviewed by @mppf, @gbtitus, @ronawho.
Testing: