Cleaning up the APIs with the major version updated #1763
Awesome! Thanks for writing this up. So my two cents here is the gRPC being the only API is not ideal. What I would like to see is a library with a public interface that is similar to what we have today, but is also trimmed down to what is needed. Let me try to describe the differences between the RPC and Library API's and why we should have both.
My concern with limiting the API surface to the gRPC server is not only for the developer, but for security and safety. If the only way to manage the server is through gRPC, then anyone can modify anything, and implementing ACLs on top of BGP sounds like a pain. In my specific case it is not acceptable to run the gRPC server; I intentionally do not use the gRPC server inside this repository. Everything else sounds great!
I think you are on the right path here, these are great high level goals. The only thing I don't entirely understand is the server refactoring you mention:
What is the motive for those changes? I think there is a lot of value not just to users, but most of all to the GoBGP project, in being able to spin up reliable but short-lived BGP servers for use in your unit tests. I think once some of the changes in #1331 are made to accept context it would be much easier to test against. You could quickly spin it up for a unit test and then know it was shut down before the next test began. My thought is that it would be great if the server package was broken into smaller sub-packages. This would force the various structures to cross package boundaries via a public API. I believe that helps prevent things like race conditions and makes it easier to prove code is correct, since it's not possible to access private fields. It also gives you a nice perspective on how your users talk to your API, since you have to eat your own dog food, so to speak. Spitballing this to find a balance between effort and value, I would say to break server into:
If you started by just separating the packages at first, and only refactored what was needed to keep things running identically to before, it would give a nice holistic view of what a public API would look like. Some issues would probably surface that would be best solved with context.Context, and having all the packages split up would make small incremental changes easiest (which, to be clear, I would be happy to help with; if you want to delegate anything to me feel free to ask). It might not make sense to do this, but it's food for thought. I think some of the challenges around contributions, features and refactoring may stem from the monolithic design of the gRPC / BGP server. Breaking them up may make it easier for people to contribute small, measurable changes that are easy to prove correct.
@jeffbean @cstockton thanks a lot for the comments.
I really want to minimize public APIs (and structures) to easily evolve the code later. For example, currently, BgpServer's public methods use various structures for arguments and returned values. How about making BgpServer's public methods correspond to the gRPC APIs?
You can create a BgpServer object and call these methods directly (without playing with gRPC). Also, we could make these methods handle context properly if necessary. cmd/gobgpd/main.go calls api.RegisterGobgpApiServer with a BgpServer object.
Might make sense, but I'm not sure until I see the code. However, I have no intention of exporting such code to external applications; it goes in internal/pkg/ instead.
Yeah, that's the plan. I'll try after the next release.
I see, I think that is fair since it will give you more freedom while you're refactoring everything. One thing to consider when designing the internal server API is that anyone who uses the gRPC API will need to test their code. Right now it takes a pretty hefty docker-compose file to run unit tests for my GoBGP-related code, and it's kind of difficult to set up and tear down each unit test since the state is all shared. The test code needs test code to ensure it is correct. So maybe once the private version solidifies, a public version could be exported and cmd/gobgp{,d} would vendor that public version. I'm looking forward to seeing how everything evolves; it sounds great so far. Let me know if you want any help, I am in the GoBGP Slack as well. Thanks for the work you're doing.
Finished the following.
No real refactoring yet. I tried to finish this with minimal code change. Let me know now if you have any concerns. I'll push it in a couple of days and start the real work.
@fujita I somehow missed this issue's update, but I just looked at the …
@cstockton all the native APIs for Go use structures in the api package, so that API breakage should not happen.
@fujita I try to avoid returning channels with "watch" APIs since it forces you to define a bullet-point list of semantics around cancellation and draining the channels. One thing that comes to mind in your examples is that I don't see any context, leaving no way for the caller to signal they are finished other than a synchronous call to pm.Close(), making it difficult to stop while blocked in a receive. I imagine part of your design is influenced by the gRPC API you are using? I had a discussion with the gRPC authors about the difficult ergonomics of the streaming API in grpc/grpc-go#2159 - unfortunately I failed to convey the challenges properly. To summarize, it is very difficult to ensure message delivery in real-world systems where requests can be canceled. In general I think there are two types of streaming APIs:

1. Sequential, terminal streams that enumerate a finite set of existing state (e.g. fetching the global RIB).
2. Long-lived "watch" streams that deliver updates indefinitely (e.g. monitoring peers).
I've come to prefer a contextual iterator:

```go
// EachGlobalRIB will block the caller and send the current global rib until the
// given context is done or fn returns false, whichever comes first.
//
// Note: here you don't really need to even describe that if iteration is halted
// by ctx being done that ctx.Err() is returned, it's just assumed. When fn
// returns false it's fair to just return nil, since it was control flow that
// was the signal.
func (c *Client) EachGlobalRIB(ctx context.Context, fn func(*api.Path) bool) error {}
```

Since EachGlobalRIB is sequential and terminal, users will typically call it with their current request- or worker-scoped context; they don't need to create a second context.

For the second case the above doesn't feel as nice, because it may be a very long time between updates and there is no fixed boundary, so halting a search range is never a user requirement. Anytime I have used such an API it's always been in some type of worker context; in fact that is what I do with GoBGP's peer monitor API. I have a worker that simply stays open as long as the program is running to update the health check system that drives metrics and alerting. For these I believe it's perfectly reasonable to depend on the incoming context to signal completion, since in 99% of cases you're not exiting until the app has received a SIGTERM and the top-level context has been canceled anyways.

```go
// MonitorPeers will block and send peer changes to the caller until the
// given context becomes done or a connection error occurs.
//
// Note: this function can only return non-nil errors, since it doesn't exit
// unless the connection dies or the ctx becomes done. This is what I expect
// because I can always check if the returned err == the current ctx.Err() if I
// have a case where I explicitly cancel (as long as the ctx.Err() does not get
// wrapped).
func (c *Client) MonitorPeers(ctx context.Context, fn func(*api.Peer)) error {}
```

I've come to prefer regular functions for these callbacks:

```go
type healthMonitor struct {
	c *client.Client
	h *health.Manager
}

func (hm *healthMonitor) peerMonitor(p *api.Peer) {
	if p.BGP.Status == "UP" {
		hm.h.up()
	} else {
		hm.h.down()
	}
}

func (hm *healthMonitor) Work(ctx context.Context) error {
	return hm.c.MonitorPeers(ctx, hm.peerMonitor)
}
```

To summarize, the two most important things to me are:

1. The caller's context is the mechanism for signaling cancellation.
2. The returned error is not wrapped, so callers can compare it against ctx.Err().
If either of those two things does not hold true, unpleasant ergonomics are generally guaranteed to emerge eventually to supplement them.
osrg#1763 (comment) Follow Chris's proposal; consistent with the rest of the APIs. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
@cstockton Thanks a lot! About the second type of streaming API (i.e. watching), I think that your proposal is much better; it is consistent with the rest of the APIs and looks more handy. I changed the APIs in the following way:
About the first type of streaming API, are you talking about replacing the current ListSomething APIs? Currently, the APIs return everything in one shot, but if you prefer the streaming API, I'll change all of them.
@fujita Glad to help, the changes look great.

Small nit: if you decide to leave the list functions, it may be worth considering deferring the slice backing allocation until you have the size:

```go
func (s *BgpServer) ListPolicy(ctx context.Context, r *api.ListPolicyRequest) ([]*api.Policy, error) {
	var l []*api.Policy
	s.mgmtOperation(func() error {
		v := s.policy.GetPolicy(r.Name)
		l = make([]*api.Policy, len(v))
		copy(l, v)
		return nil
	}, false)
	return l, nil
}
```
osrg#1763 Follow Chris's proposal Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
osrg#1763 (comment) Follow Chris's proposal; more consistent with gRPC streaming API. Also supports context properly. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
@cstockton I feel that supporting context instead of ignoring it, and being consistent with the gRPC streaming API, is nicer than returning everything in one shot. Thanks for pointing out the slice allocation issue. I put some fixes in the above PR but there are still lots left, I suppose. After the next major release, I'll try to improve that internal stuff.
API-wise it looks fantastic @fujita, I'm impressed by the great work you've done over the past several months. Thanks for being so receptive to feedback from your users, we all appreciate it.
If you just use the gobgp and gobgpd binaries, you can stop reading here. The next release (early July) will be the last release of the 1.X version. However, 2.X (likely early September) should work as before (e.g. with the same configuration file).
This is a summary of the possible changes to the APIs.
Removing the binary type in gRPC
As pointed out in #1457, for example, the AddPath gRPC API uses the BGP on-wire binary format for path attributes. Non-Go languages cannot use this API without parsing the binary themselves. With the new gRPC APIs, protobuf structures are used for all path attributes and BGP capabilities.
Clarifying the internal and public libraries
For example, I think that the table/ code is for internal usage only, but clearly there are some users who use it in their own applications. To make clear which libraries can be safely used by external applications, I'll use Go's internal package mechanism together with a directory layout change.
I'll move code for internal usage to internal/pkg/; e.g. table, config, zebra, etc.
I'll move code for external usage to pkg/; e.g. server/ and probably packet/.
api/ has protobuf structures and minimal helper functions that introduce no external package dependencies. I'll try to apply a rule that api/ must not import packages within the gobgp repository, as proposed in #1721.
Also, I'll try to follow the widely used directory layout (e.g. move gobgp and gobgpd to /cmd).
https://github.com/golang-standards/project-layout
Making the gRPC APIs only public APIs for simplicity
The server/BgpServer struct has methods corresponding to the gRPC APIs. I'll delete them; you access BgpServer via the gRPC APIs. Probably, the struct will have only two public methods, like Start() and Stop(). If someone is interested in having something like gobgp-sdk-go at GitHub/osrg, please let me know.
Miscellaneous
There should be some minor changes to gRPC APIs like #1436. GetROA() should use a stream interface (because the response could be huge).
I intend to switch to gogo/protobuf.
I'm sure that I'm missing something.