-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recent changes of ZAP broke czmq zauth #2762
Comments
Thanks for the analysis - I'll have a look shortly |
I don't know czmq, and I cannot see from this information which test case failed exactly. |
It's a bit tricky, as technically it is a public API change :-/ |
Every bug fix is a public API change. Fixing a bug breaks code depending on the presence of the bug. A different option to fix this, would be to use some "dummy" ZAP domain by default if any of the CURVE keys is set, which must be explicitly cleared if one wishes to disable ZAP authentication. Maybe this should not be fixed in a 4.2.3, but I think it should be fixed in a 4.3.0. The current master contains many changes that are beyond bug fixes, which IMO should not be released as a 4.2.3 at all. While they may not break existing tests, they change behaviour that is not tested explicitly. I would advise to keep changes in any PATCH version as minimal as possible. |
So this is what the failing test does:
The helper function binds and connects the server and client, and tries to bounce messages.
That's not always the case though - most of the times a bug is just a bug, ie: something does not work. Here the problem is that there are a lot of examples where the previous, wrong, behaviour was used, and that users could have copied. We ourselves often suggest users to go and look at the self tests to see how to use the API. So I think your suggestion of initialising the domain to a default one is be a pretty good compromise. This way we fix the broken implementation of the protocol, but also we avoid breaking existing applications - if they didn't set the domain before, then it won't matter if it's going to be called something like "zap_default" or something like that.
Yes if there are backward compatible changes in behaviour (ie: new APIs move out of draft state) then we'll bump the minor. |
Ok, I admit it is rather unlikely that someone depends on the presence of a crash for example ;) Still, it is a change in visible behavior. Besides the fact that the ZAP protocol was violated, the behaviour of the different mechanisms was inconsistent (which may also be viewed as a bug). This inconsistency was deliberately resolved by #2752. IIRC, before #2752
Now, after #2752
If we use a default ZAP domain for CURVE and PLAIN, this will change to
I am not completely convinced that this is really a good idea, as it will partly defeat the attempt to make behaviour consistent across mechanisms. In #2711, I suggested at some point to introduce an additional socket option that is solely used to activate/deactivate ZAP handling. For 4.2.x, the default could be to resort to the "legacy" behaviour, but for 4.3.x I would rather suggest that the default is that ZAP handling is enabled, but a ZAP domain must be set explicitly. I admit that this is a kind of two "second-order bugs":
|
keep it broken- or dummies like me [zyre+curve] won't fix it the right way until it breaks again.. :) course- defaulting to "global" or something is what i'm about to PR to zyre, so maybe that's not such a bad idea either [less code up high]. speaking from "i'm learning ZAP" this feels to me as a "you'll know when you need domains when you need them and until then you're just gonna use a global default anyway..." |
also- the way it's re-fixed today- teh czmq tests fail, but in other code i've been testing with (up in python land), the only reason i noticed this, is because i wasn't seeing the zauth messages i thought i would have (all of a sudden, they were there ~13 days ago). so this has the potential to fail silently if you implemented it improperly too... spent a few hours scratching my head on that one, finally rebuilt to see the czmq build failures with make check... |
This is the required diff for CZMQ: Usage of PLAIN is the worst offender - the domain was basically never set. There is also one incompatibility for CURVE - before with the ZAP handler available but with no domain set, the connection was rejected, but now it is allowed. So perhaps a new option as you suggested to keep legacy behaviour as expected for now would be a good solution instead, with a big, fat warning in the NEWS to let users know the default is going to change. A big problem I see is that there not only applications, but a million and a half bindings as well. Many are well maintained, but some are not, and breaking backward compatiblity means making the latter group useless. Then when we tag v5.0.0 we can break backward compatibility in this regard and switch the defaults around. What do you guys think? |
i agree. breaking something at a patch level seems... awkward at best [may make things worse if zauth just starts ignoring stuff]. i'm poking through zyre to at-least make sure the defaults are set [and configurable]. we may wanna clean up the czmq zauth text a bit: https://github.com/zeromq/czmq/blob/master/include/zauth.h#L54 that may have been where my brain suggested "i guess you don't need to set the domain" [and then it worked OK and ...]. even though the tests had set the domain a few times.. (may help bindings users with adoption if it's consistent imo). i guess this might answer the social contract question- which is more important, the API or the RFC.. [and i think the 'API' is probably the right answer, the RFC is something we aspire to be..]. :) |
@bluca here are the ones' i'm testing if you wanna send your PR first (no hurry, once you get the feedback you feel comfortable with) https://github.com/zeromq/czmq/compare/master...wesyoung:fix/zap-gossip?expand=1 they initially appear to work OK, i did a little more re-factoring now that i'm wrapping my head around the gsl generated bits, so they're a little bigger- but figure you can fix the tests and i'll add this.. which should pass OK once we're settled on a direction in this issue. also- i think i know how to apply the whole CURVE + ZAP thing to the zproto/gsl layer so the defaults are laid properly to new protocols moving forward. |
@bluca sounds acceptable to me. I might add that option, but I fear it will at least until end of next week until I can work on that again. so if anyone else might implement that, don't hesitate ;) I will be happy to give it a look in terms of a review :) |
Solution: add ZMQ_ZAP_DOMAIN_REQUIRED to hide backward incompatible change and make it disabled by default. In a future release that breaks API compatibility we can then switch the default to enabled in order to achieve full RFC compatibility. Fixes zeromq#2762
@sigiesec this is my proposal: bluca@0800772 I'll send a PR in a while once the CI is clean. I've tested it with both CZMQ and Malamute, both before and after fixing them to set the domain, and it seems to work in all cases. |
I gave your proposal a quick look. It's somewhat different from what I had in mind. How would you evolve this for a future 5.0 version? Remove it? Change the default to 1 and disallow setting any other value? I had a ZMQ_ENABLE_ZAP option in mind that defaults to -1 now, which means legacy behaviour. Other values are 0=disable and 1=enable. In 5.0.0, the -1 value would be removed, and the default change to either 1 or 0. This would also allow to change the default to something else already now via a compile-time option. |
Yes the idea is that in 5.0.0 the default would change to true, and in 6.0.0 we then can remove it completely. I'm not sure about ZMQ_ENABLE_ZAP - the issue I see is that it could be misleading, as ZAP can already be enabled, just not with 100% spec compliance. |
Honestly, I think both may be confusing in different ways but to a similar degree. |
|
I'll open a PR with ZMQ_ZAP_ENFORCE_DOMAIN and then we can iterate from that after we get feedback, this way we can move on, ok? |
Solution: add ZMQ_ZAP_ENFORCE_DOMAIN to hide backward incompatible change and make it disabled by default. In a future release that breaks API compatibility we can then switch the default to enabled in order to achieve full RFC compatibility. Fixes zeromq#2762
Solution: add ZMQ_ZAP_ENFORCE_DOMAIN to hide backward incompatible change and make it disabled by default. In a future release that breaks API compatibility we can then switch the default to enabled in order to achieve full RFC compatibility. Fixes zeromq#2762
@bluca Sorry it took some time. Actually, I do not like this, but apparently it is too late. In #2711, we (or at least I ;) ) came to the conclusion that we can save the effort/complexity to add a new socket option to make controlling the activation/use/... of ZAP explicit, but that controlling it explicitly would be more desirable. Now a new socket option is introduced, so we have added complexity, but still the control is not explicit. |
Sorry about that - but as they say "perfect" is the enemy of "done" and given backward compatibility was affected I wanted to fix it sooner rather than later :-) |
@bluca Never mind :) I just opened a new issue so that we/I do not forget about this. |
make check on czmq fails
I run the bisect on libzmq code and the result is
After reverting that one, czmq tests passed again. However it's not possible to do so on master as there is a queue of identity related changes. cc @bluca, @sigiesec please take a look on a problem, or I'll revert the changes. We'll have workshop this weekend, so need working libzmq master :)
I run bisect with following script
The text was updated successfully, but these errors were encountered: