Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hidden implication cache gets corrupted, breaks method reordering #3058

Closed
fingolfin opened this issue Nov 27, 2018 · 6 comments
Closed

Hidden implication cache gets corrupted, breaks method reordering #3058

fingolfin opened this issue Nov 27, 2018 · 6 comments
Labels
kind: bug Issues describing general bugs, and PRs fixing them
Milestone

Comments

@fingolfin
Copy link
Member

The method reordering code we merged for GAP 4.11 caused all kinds of new test failures (see issue #2818 and PR #2835 among others), most of them caused by actual bugs that slipped through our radar for years. One of the last remaining ones looks like this in the test logs, after using LoadAllPackages():

# Input is:
IsSolvable(DihedralGroup(IsFpGroup,24));
# Expected output:
true
# But found:
Error, Record Element: '<rec>.absolutelyIrreducible' must have an assigned value

Note that this issue does not always occur! E.g. for PR #2835, it appears in the 32bit build, but not the 64 bit build. On my computer, in 64bit builds, I can sometimes reproduce it and sometimes not. I have not yet determined a pattern telling me when and why, though :-/.

Anyway, after managing to successfully reproduce it often enough, further analysis showed that this is already triggered by DihedralGroup(IsFpGroup,24), which in turn leads to IrreducibleModules being invoked. For that operation, there are two different methods, and the wrong one was called... And that despite the correct one actually having higher rank.

Further inspection revealed that the two methods were ordered incorrectly. Indeed, one can fix the problem by asking GAP to reorder methods again after LoadAllPackages() (which also reorders methods):

gap> LoadAllPackages();
...
gap> PRINT_REORDERED_METHODS:=true;
true
gap> RECALCULATE_ALL_METHOD_RANKS();
IsFinite 1 args. Moving method 11 (IsFinite: pcgs computable groups from GAPROOT/lib/grppcatr.gi:225) to position 10
Size 1 args. Moving method 49 (Size: pcgs computable groups from GAPROOT/lib/grppcatr.gi:238) to position 32
IsNilpotentGroup 1 args. Moving method 12 (IsNilpotentGroup: method for pc groups from GAPROOT/lib/grppcprp.gi:14) to position 11
IsSupersolvableGroup 1 args. Moving method 4 (IsSupersolvableGroup: method for pc groups from GAPROOT/lib/grppcprp.gi:25) to position 3
ChiefSeriesUnderAction 2 args. Moving method 2 (ChiefSeriesUnderAction: method for a pcgs computable group from GAPROOT/lib/grppc.gi:2162) to position 1
CoreOp 2 args. Moving method 5 (CoreOp: pcgs computable groups from GAPROOT/lib/grppc.gi:1010) to position 2
IsomorphismFpGroup 2 args. Moving method 8 (IsomorphismFpGroup: pc groups from GAPROOT/lib/grppcfp.gi:97) to position 7
StabilizerFuncOp 5 args. Moving method 6 (StabilizerFuncOp: G (solv.), pnt, gens, gens, act from GAPROOT/lib/oprtpcgs.gi:488) to position 5
StabilizerFuncOp 5 args. Moving method 8 (StabilizerFuncOp: G (solv.), pnt, gens, gens, act from GAPROOT/lib/oprtpcgs.gi:458) to position 7
StabilizerFuncOp 6 args. Moving method 6 (StabilizerFuncOp: G (solv.), D,pnt, gens, gens, act from GAPROOT/lib/oprtpcgs.gi:498) to position 5
StabilizerFuncOp 6 args. Moving method 8 (StabilizerFuncOp: G (solv.), D,pnt, gens, gens, act from GAPROOT/lib/oprtpcgs.gi:473) to position 7
AbsolutIrreducibleModules 3 args. Moving method 2 (AbsolutIrreducibleModules: generic method for groups with pcgs from GAPROOT/lib/grppcrep.gi:586) to position 1
IrreducibleModules 3 args. Moving method 2 (IrreducibleModules: generic method for groups with pcgs from GAPROOT/lib/grppcrep.gi:604) to position 1
gap>

Going further, I determined that the method reordering code is fine; the problem seems to be in the handling of hidden implication, esp. the WITH_HIDDEN_IMPS_FLAGS_CACHE. To get to that conclusion, I first applied this patch to GAP as a kind of "breakpoint" whenever it sorted those methods:

diff --git a/lib/oper.g b/lib/oper.g
index 481e6165a..9804b8501 100644
--- a/lib/oper.g
+++ b/lib/oper.g
@@ -1993,6 +1993,7 @@ BIND_GLOBAL( "RECALCULATE_ALL_METHOD_RANKS", function()
             meths := METHODS_OPERATION(oper, n);
             entrysize := BASE_SIZE_METHODS_OPER_ENTRY+n;
             nmethods := LENGTH(meths)/entrysize;
+if NAME_FUNC(oper) = "IrreducibleModules" and n = 3 then Error("breakpoint"); fi;
             for i in [1..nmethods] do
                 base := (i-1)*entrysize;
                 # data for this method is meths{[base+1..base+entrysize]}

Now when starting GAP, and when loading package, one is thrown onto a break loop:

$ ./gap
Error, breakpoint at GAPROOT/lib/oper.g:1999 called from
RECALCULATE_ALL_METHOD_RANKS(  ); at GAPROOT/lib/filter.g:236 called from
<function "ResumeMethodReordering">( <arguments> )
 called from read-eval loop at GAPROOT/lib/init.g:981
you can 'quit;' to quit to outer loop, or
you can 'return;' to continue
brk> List(MethodsOperation(IrreducibleModules,3), x -> [x.info,List(x.argFilt, RankFilter)]);
[ [ "IrreducibleModules: generic method for groups with pcgs", [ 40, 61, 18 ] ],
  [ "IrreducibleModules: generic method for groups and finite field", [ 35, 59, 18 ] ] ]
brk> # note that the '40' is the rank of CanEasilyComputePcgs:
brk> RankFilter(CanEasilyComputePcgs);
40
brk> return;
gap> LoadAllPackages();
...
Error, breakpoint at GAPROOT/lib/oper.g:1999 called from
RECALCULATE_ALL_METHOD_RANKS(  ); at GAPROOT/lib/filter.g:236 called from
<function "ResumeMethodReordering">( <arguments> )
 called from read-eval loop at GAPROOT/lib/init.g:981
you can 'quit;' to quit to outer loop, or
you can 'return;' to continue
brk> List(MethodsOperation(IrreducibleModules,3), x -> [x.info,List(x.argFilt, RankFilter)]);
[ [ "IrreducibleModules: generic method for groups with pcgs", [ 40, 73, 18 ] ],
  [ "IrreducibleModules: generic method for groups and finite field", [ 43, 71, 18 ] ] ]
brk> # notice how the above does *NOT* match the rank of CanEasilyComputePcgs anymore
brk> RankFilter(CanEasilyComputePcgs);
50
brk> # force clear the hidden implication cache
brk> CLEAR_HIDDEN_IMP_CACHE(IsObject);
brk> # now the filter rank of the first argument of the first method changed from 40 (incorrect) to 50:
brk> List(MethodsOperation(IrreducibleModules,3), x -> [x.info,List(x.argFilt, RankFilter)]);
[ [ "IrreducibleModules: generic method for groups with pcgs", [ 50, 73, 18 ] ],
  [ "IrreducibleModules: generic method for groups and finite field", [ 43, 71, 18 ] ] ]

A simple fix is to change CLEAR_HIDDEN_IMP_CACHE to always clear the whole cache. That resolves the issue in my tests. But I have not yet really understood why this is -- and that concerns me, I don't like fixes that I don't fully understand.

One thing I did notice is that strictly speaking, CLEAR_HIDDEN_IMP_CACHE right now is not quite correct for a hash table with open addressing: By replacing entries with NULL, it can break the "chains" used by open addressing. But that should only lead to entries getting "lost" in the hash table, not to bad values getting inserted. So I don't quite see how this should cause the issue at hand.

Anybody got any ideas? In particular people familia with this code, such as @frankluebeck , @ThomasBreuer , @stevelinton ?

@stevelinton
Copy link
Contributor

Justa very quick thought based on your last paragraph. If you break chains and then insert new entries, you can restore the chains in ways you may not expect or want.

@fingolfin
Copy link
Member Author

Yes, and I even experimented with a patch where instead of zeroing "deleted" entries, I replaced the key with a dummy value (Fail), and adjusted the other code suitably. Didn't help, but even if it did, I'd still be at a loss as to why: because CLEAR_HIDDEN_IMP_CACHE(filterWIthNewImplication) does not walk chains, it just go through every entry in the hash table, linearly, and removes any that might be invalid due to filterWIthNewImplication, i.e., whose "value" contains filterWIthNewImplication.

So breaking chains at worst should cause some entries to appear twice in the table, but only valid ones.

So the other idea is that perhaps we call CLEAR_HIDDEN_IMP_CACHE not in all cases were we really should be calling it. But I found no evidence for that; I even experimented with calling it in all kinds of places, to no effect. At this point, I think I might be misunderstanding something about how it works... Anyway, I thought it best to let more people look at it, esp. since I am distracted with some other things right now :)

@fingolfin fingolfin added the kind: bug Issues describing general bugs, and PRs fixing them label Dec 5, 2018
@fingolfin fingolfin added this to the GAP 4.11 milestone Dec 5, 2018
@fingolfin
Copy link
Member Author

This issue is IMHO a blocker for GAP 4.11. We must either resolve it, or remove / disable the method reordering.

@wilfwilson
Copy link
Member

I agree. Unfortunately I don't think I have the necessary skills to help to resolve the problem.

@stevelinton
Copy link
Contributor

@fingolfin Do you have instructions to reproduce this? For me with current master and a fairly recent set of packages I can't reproduce it.

@fingolfin
Copy link
Member Author

I wasn't able to reproduce this again recently, so perhaps it was a fluke?!? I am closing this for now, and will reopen should I manage to trigger it again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: bug Issues describing general bugs, and PRs fixing them
Projects
None yet
Development

No branches or pull requests

3 participants