Breaking changes to table.Table without any available workaround and massive allocation increase #1721
First, sorry about the breakage.
I think that only the gRPC APIs are guaranteed to be stable. A huge mistake I made was not using the structures generated by protobuf (those structures are a port of the gRPC API) to implement gobgp (client). Instead we convert these structures into internal structures (the table package) and use them. That leads users to the wrong assumption that these internal structures are stable. I plan more changes to Path, e.g., deleting the nlri member from it to reduce its footprint. I think that there are two options:
|
Thanks for the clarification around API stability; this will help me pick a new path forward for now. But you did not address my concerns about the changes to the Table. I think it's possible it may just be a bug, since your unit tests don't check the string format, only the length. If the change is intentional I strongly oppose it for the reasons I've covered and have better illustrated in the code below. I know opposing a change without offering something in its place is not helpful, so I provided a potential alternative under the premise that your change to the key format was for performance reasons. Measuring the drawbacks I've listed for your change against the simplicity and overall benefits of implementing my alternative (zero-allocation reads, everywhere that calls nlri.String() benefits, simple to implement, no API changes), I feel it's hard to justify your change. If my premise for the change is wrong, my apologies; if you wouldn't mind explaining in more detail to help me better understand, that would be great. To be clear, what I am challenging here is the change to map keys in
I would be happy to implement and measure a change that allows consistent map keys in Table and revert the more restrictive lookup signature.

package main
import (
"fmt"
"net"
"testing"
"unicode/utf8"
"github.com/osrg/gobgp/packet/bgp"
"github.com/osrg/gobgp/table"
)
func inc(ip net.IP) net.IP {
for j := len(ip) - 1; j >= 0; j-- {
ip[j]++
if ip[j] > 0 {
break
}
}
return ip
}
func recov(fn func()) (err error) {
defer func() {
if r := recover(); r != nil {
err = fmt.Errorf("%v", r)
}
}()
fn()
return
}
func main() {
// My main opposition is the inconsistent key format, I feel it's going to
// lead to regressions. Two identical prefixes are stored in two separate
// formats:
//
// // tbl.lookupKey(T)
// key(0a0a0a0018)
// - when T is "10.10.10.10/24" of *IPAddrPrefix
// key(31302e31302e31302e31302f3234)
// - when T is "10.10.10.10/24" of *LabeledIPAddrPrefix
//
{
a1 := bgp.NewIPAddrPrefix(24, "10.10.10.10")
d1 := table.NewDestination(a1, 0)
d1Str := d1.GetNlri().String()
t1 := table.NewTable(bgp.RF_IPv4_UC, d1)
m1 := t1.GetDestinations()
a2 := bgp.NewLabeledIPAddrPrefix(24, "10.10.10.10", bgp.MPLSLabelStack{
Labels: []uint32{0x32, 0x64},
})
d2 := table.NewDestination(a2, 0)
d2Str := d2.GetNlri().String()
t2 := table.NewTable(bgp.RF_IPv4_MPLS, d2)
m2 := t2.GetDestinations()
fmt.Println("Both nlri return the same String():",
d1Str == d2Str) // true
fmt.Println("However the RF_IPv4_UC and RF_IPv4_MPLS tables have different keys:")
for k, v := range m1 {
fmt.Printf(" m1 key(%x) val(%v)\n", k, v)
}
for k, v := range m2 {
fmt.Printf(" m2 key(%x) val(%v)\n", k, v)
}
}
// it can also produce invalid unicode string sequences:
{
dest := table.NewDestination(bgp.NewIPAddrPrefix(32, "0.0.0.128"), 0)
tbl := table.NewTable(bgp.RF_IPv4_UC, dest)
for k := range tbl.GetDestinations() {
if !utf8.ValidString(k) {
fmt.Println("invalid unicode!")
}
}
}
// The change causes the potential for panics where none existed:
{
fmt.Println("panic:", recov(func() {
a1 := bgp.NewIPAddrPrefix(24, "10.10.10.10")
d1 := table.NewDestination(a1, 0)
a2 := bgp.NewLabeledIPAddrPrefix(24, "10.10.10.10", bgp.MPLSLabelStack{
Labels: []uint32{0x32, 0x64},
})
d2 := table.NewDestination(a2, 0)
tbl := table.NewTable(bgp.RF_IPv4_UC, d1, d2)
tbl.GetDestinations()
// interface conversion: bgp.AddrPrefixInterface is \
// *bgp.LabeledIPAddrPrefix, not *bgp.IPAddrPrefix
}))
}
// While being subtle:
{
a1 := bgp.NewIPAddrPrefix(24, "10.10.10.10")
d1 := table.NewDestination(a1, 0)
tbl := table.NewTable(bgp.RF_IPv4_MPLS, d1)
tbl.GetDestinations()
// The above won't panic.
fmt.Println("The above won't panic until a lookup is performed")
}
// The change prevents cross referencing table types:
{
fmt.Println("panic:", recov(func() {
a1 := bgp.NewIPAddrPrefix(24, "10.10.10.10")
d1 := table.NewDestination(a1, 0)
a2 := bgp.NewLabeledIPAddrPrefix(24, "10.10.10.10", bgp.MPLSLabelStack{
Labels: []uint32{0x32, 0x64},
})
tbl := table.NewTable(bgp.RF_IPv4_MPLS, d1)
tbl.GetDestination(a2)
// interface conversion: bgp.AddrPrefixInterface is \
// *bgp.LabeledIPAddrPrefix, not *bgp.IPAddrPrefix
}))
}
// I need a temporary nlri JUST to do a map lookup to the tune of N allocs
// for even a basic prefix type:
{
a2 := bgp.NewLabeledIPAddrPrefix(24, "10.10.10.10", bgp.MPLSLabelStack{
Labels: []uint32{0x32, 0x64},
})
d2 := table.NewDestination(a2, 0)
tbl := table.NewTable(bgp.RF_IPv4_MPLS, d2)
// If I wanted to use the GoBGP Table object (the api pkg returns this type
// in several calls like GetRib) I will have to store the NLRIs along with
// the prefix string in my app to do lookups.
a1 := bgp.NewIPAddrPrefix(24, "10.10.10.10")
// This sounds okay until I realize I will need to store a separate table
// lookup key for each route family. Unreasonable. Only other option is to
// construct a temporary lookup addr:
allocs := testing.AllocsPerRun(1000, func() {
rf := tbl.GetRoutefamily()
afi, safi := bgp.RouteFamilyToAfiSafi(rf)
pfx := a1.String()
tmpLookup, err := bgp.NewPrefixFromRouteFamily(afi, safi, pfx)
if err != nil {
panic(err)
}
tbl.GetDestination(tmpLookup)
})
// 12 allocations ... for a single map lookup using the new API.
fmt.Println(allocs)
}
} |
The consistent key format is nice, but the memory usage with full routes is a real issue that I must fix (note: the memory usage, not the number of allocations). Also, I don't assume that a table created with bgp.RF_IPv4_MPLS will have IPv4 unicast destinations inserted into it. As I wrote yesterday, I think that table/*.go should not be public. |
I think making the table package internal is reasonable. I do thank you for telling me that the rationale for the change was memory reduction. I decided to profile:

cc42a31 - Showing nodes accounting for 162.04MB, 90.96% of 178.15MB total
dc9fe2b - Showing nodes accounting for 142.04MB, 95.26% of 149.11MB total (next commit, API break)
cstockton - Showing nodes accounting for 54.09MB, 99.08% of 54.59MB total (cc42a31 + my diff)

Below are the raw notes I took while auditing memory usage in GoBGP. The main reason for allocations is the constant transient type conversions, for example Path ToLocal():

func (p *Path) ToLocal() *Path {
...
n := nlri.(*bgp.LabeledVPNIPAddrPrefix)
_, c, _ := net.ParseCIDR(n.IPPrefix())
ones, _ := c.Mask.Size()
...

We see it call IPPrefix() below, which converts to a prefix string using fmt.Sprintf:

func (l *LabeledVPNIPAddrPrefix) IPPrefix() string {
masklen := l.IPAddrPrefixDefault.Length - uint8(8*(l.Labels.Len()+l.RD.Len()))
return fmt.Sprintf("%s/%d", l.IPAddrPrefixDefault.Prefix, masklen)
}

The caller then immediately converts the IP parsed from net.ParseCIDR back into a string to construct a new prefix:

func (p *Path) ToLocal() *Path {
...
ones, _ := c.Mask.Size()
nlri = bgp.NewIPAddrPrefix(uint8(ones), c.IP.String())
nlri.SetPathLocalIdentifier(pathId)

Now, once in the NewIPAddrPrefix call frame, the caller's string is immediately converted to another net.IP using net.ParseIP(prefix):

func NewIPAddrPrefix(length uint8, prefix string) *IPAddrPrefix {
p := &IPAddrPrefix{
IPAddrPrefixDefault{
Length: length,
},
4,
}
p.IPAddrPrefixDefault.decodePrefix(net.ParseIP(prefix).To4(), length, 4)
return p
}

Before returning to the caller there will be one more allocation of addrlen bytes, which seems like an attempt to avoid retaining the 16 bytes allocated by the net.IP implementation. However, in most cases I've seen (and the profiling reflects this) the original net.IP is being retained in another structure anyway, so it's not saving allocations but increasing them:

func (r *IPAddrPrefixDefault) decodePrefix(data []byte, bitlen uint8, addrlen uint8) error {
bytelen := (int(bitlen) + 7) / 8
if len(data) < bytelen {
eCode := uint8(BGP_ERROR_UPDATE_MESSAGE_ERROR)
eSubCode := uint8(BGP_ERROR_SUB_MALFORMED_ATTRIBUTE_LIST)
return NewMessageError(eCode, eSubCode, nil, "network bytes is short")
}
b := make([]byte, addrlen)
copy(b, data[:bytelen])
// clear trailing bits in the last byte. rfc doesn't require
// this but some bgp implementations need this...
rem := bitlen % 8
if rem != 0 {
mask := 0xff00 >> rem
lastByte := b[bytelen-1] & byte(mask)
b[bytelen-1] = lastByte
}
r.Prefix = b
return nil
}

Many of these transient allocations used to cross package boundaries are released at the next GC, so they may seem unimportant when the goal is memory footprint and any number of short-lived allocations is acceptable in the name of that goal. However, these operations have a heavy hidden cost in how they have influenced the design, which I believe is indirectly the root cause of the memory consumption when you consider that there is no clarity around immutability: each caller may or may not read or write fields directly, and may choose to copy or reuse slice backings.

Consider the following design choice: every object is immutable and copy-on-write. This would greatly reduce the memory footprint of GoBGP, because much of it comes from the fact that duplicate objects that are otherwise immutable are constantly being stored, for example the originator id:

func NewPathAttributeOriginatorId(value string) *PathAttributeOriginatorId {
t := BGP_ATTR_TYPE_ORIGINATOR_ID
return &PathAttributeOriginatorId{
PathAttribute: PathAttribute{
Flags: PathAttrFlags[t],
Type: t,
Length: 4,
},
Value: net.ParseIP(value).To4(),
}
}

Here you parse a value that will be the same for any path from this originator; each one stores a 4-byte copy of the net.IP (or 16 bytes for IPv6), which adds up very fast. This same pattern is followed for ALL path attributes that share common values: Community attributes are duplicated for every path in each adj-rib, as are Aggregator and ClusterLists, and even smaller structures like PathAttributeAigp start to add up when you consider that thousands of paths may share the same metric; there we have 4 bytes in PathAttribute, another 3 words in

To test my theory I made a simple set of changes. First I added *FromIP factories to various GoBGP functions to reuse the same IP address:

NewPathAttributeAggregatorFromIP(as interface{}, ip net.IP) *PathAttributeAggregator
NewPathAttributeOriginatorIdFromIP(ip net.IP) *PathAttributeOriginatorId
NewPathAttributeClusterListFromIPS(value []net.IP) *PathAttributeClusterList
NewPathAttributeAs4AggregatorFromIP(as uint32, ip net.IP) *PathAttributeAs4Aggregator
// Continued for many other packet types

Another big area of allocations was slice backings that were over-allocated; in many places append is used on slice backings that may double the capacity.

The IP reuse and more precise slice backings yielded a 3x memory improvement:

cc42a31 - Showing nodes accounting for 162.04MB, 90.96% of 178.15MB total
dc9fe2b - Showing nodes accounting for 142.04MB, 95.26% of 149.11MB total (next commit, API break)
cstockton - Showing nodes accounting for 54.09MB, 99.08% of 54.59MB total (cc42a31 + my diff)

This did cause a small performance hit of about 15-20% for startup; this is due to a few locations calling

@@ -8896,10 +8906,10 @@ func GetPathAttribute(data []byte) (PathAttributeInterface, error) {
type BGPUpdate struct {
WithdrawnRoutesLen uint16
- WithdrawnRoutes []*IPAddrPrefix
+ WithdrawnRoutes []AddrPrefixInterface
TotalPathAttributeLen uint16
PathAttributes []PathAttributeInterface
- NLRI []*IPAddrPrefix
+ NLRI []AddrPrefixInterface
}

And then I reused the slice backings (including attrs) in the

@@ -1522,12 +1522,12 @@ func TestProcessBGPUpdate_bestpath_lost_ipv4(t *testing.T) {
pathAttributes1 := []bgp.PathAttributeInterface{
origin1, aspath1, nexthop1, med1, localpref1,
}
- nlri1 := []*bgp.IPAddrPrefix{bgp.NewIPAddrPrefix(24, "10.10.10.0")}
+ nlri1 := []bgp.AddrPrefixInterface{bgp.NewIPAddrPrefix(24, "10.10.10.0")}
bgpMessage1 := bgp.NewBGPUpdateMessage(nil, pathAttributes1, nlri1)
// path 1 withdraw
w1 := bgp.NewIPAddrPrefix(24, "10.10.10.0")
- w := []*bgp.IPAddrPrefix{w1}
+ w := []bgp.AddrPrefixInterface{w1}
bgpMessage1_w := bgp.NewBGPUpdateMessage(w, nil, nil)
peer1 := peerR1()

Which adds a lot of noise to an already pretty big diff:

.../osrg/gobgp$ git diff --stat
packet/bgp/bgp.go | 1227 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------
packet/bgp/helper.go | 4 +-
packet/bgp/validate_test.go | 2 +-
table/destination.go | 15 +-
table/destination_test.go | 20 +-
table/message.go | 14 +-
table/message_test.go | 22 +-
table/path.go | 10 +-
table/path_test.go | 20 +-
table/policy_test.go | 90 +++----
table/table_manager.go | 68 ++---
table/table_manager_test.go | 54 ++--
table/table_test.go | 10 +-
13 files changed, 1172 insertions(+), 384 deletions(-)
.../osrg/gobgp$ git diff |wc -l
3407

I understand you may be hesitant to accept such a big pull request; I've had difficulty finding the right process for contributions in the past, so I decided to summarize my findings and give you the option to implement any of them in your own way if you find it suitable to do so. You may close this issue out. I'll be avoiding using anything outside the |
If the api.Path gives structures for path attributes, you don't need something like ToNativePath()? |
As a general guide, when we are concerned about performance it is usually shown through benchmark tests, so that contributors and the community can see the reasoning and guidelines behind changes. This change in particular is a strange one that has no benchmark tests to show the results. I would love to see a before and after for both the original change and this proposed solution. |
@fujita I think it would be nice to not have to do any serialization on paths; right now the gRPC API feels more like a "shim" between the gRPC server and the private libraries. I think the only way to be certain the gRPC API is a truly portable and independent interface between the GoBGP server and your users is for it to satisfy a single rule:

@jeffbean I did post the memory consumption before & after for my changes, with a detailed analysis of how I arrived at them, though it could easily be missed in my wall of text; which is why I didn't include the full benchmark output like I did in #1741. The benchmark tests were not tests, but an end-to-end approach using http/pprof, because it seemed to be the method the GoBGP authors use. Most of the test setup was just arduous Makefile and docker-compose generation in a Python for loop; I could post it if you are interested though.

cc42a31 - Showing nodes accounting for 162.04MB, 90.96% of 178.15MB total
dc9fe2b - Showing nodes accounting for 142.04MB, 95.26% of 149.11MB total (next commit, API break)
cstockton - Showing nodes accounting for 54.09MB, 99.08% of 54.59MB total (cc42a31 + my diff)

Closing this issue, no more actionable items remaining. Replaced my previous usage of
Thanks for investigating this issue! |
Fine by me. I guess that with that rule, the main issue is how to make structures in api/ handy for golang users, for example, nlri and path attributes.
|
Once I finish the api stuff, I'll take a closer look at the memory analysis. About the trie stuff, IIRC, the algorithm used in the Linux networking stack is patented. |
I think you could still define a common interface in the api package, just by making sure all attributes implement it. You could have an interface like:

type Attributer interface {
AttrType() AttrType // akin to bgp.BGPAttrType ?
}
func (m *OriginAttribute) AttrType() AttrType { return OriginAttr } // akin to BGPOriginAttr constant

Or you could try something a little new: since we only need a way to enforce type safety, and the concrete type carries the attribute type, maybe something like this would work out:

type Prefixer interface {
Prefix() Prefixer // represents NLRI - various IPAddr interfaces?
}
// Maybe path & destinations defines Prefix ?
func (m *Path) Prefix() Prefixer { return m.Nlri }
func (m *Destination) Prefix() Prefixer { return m.Nlri }
// Prefix metadata
type PrefixInfo struct {
Address net.IP
Length uint8
// any private or public fields, needed to unify introspection of
// all the various pfx types.
}
// Which also allows more performant options to be considered, like reusing
// buffers / Network ips etc if a user wants to via:
func (PrefixInfo) Reset(pfx Prefixer) {
}
// Constructing them can happen in the API package, keeping type safety:
func GetPrefixInfo(attr Prefixer) PrefixInfo {
switch attr.(type) {
case *Path:
return PrefixInfo{
// bgp meta data
}
case *Destination:
return PrefixInfo{}
default:
// etc
return PrefixInfo{}
}
}
// Which could also translate in a similar fashion for AttributeInfo.
type Attributer interface {
Attribute() Attributer
}
// all attributes satisfy the interface by returning themselves
func (m *OriginAttribute) Attribute() Attributer { return m }
func (m *AsSegment) Attribute() Attributer { return m }
type AttributeInfo struct {
// more granular bgp meta data fields ?
Flags uint
}
func GetAttrInfo(attr Attributer) AttributeInfo {
switch T := attr.(type) {
case *OriginAttribute:
return AttributeInfo{
// no flags!
}
case *AsSegment:
return AttributeInfo{
Flags: T.Flags,
}
default:
return AttributeInfo{}
}
}

The main spirit behind this sort of design is to keep the concrete types very light and, more importantly, concise, not compromising on their memory layout or design to achieve a unified interface, and then to provide specific utilities that accept the concrete structure and return the common properties. This gives callers the option to use the utilities from the API for debugging, while guaranteeing that each concrete type wholly represents itself. Any approach would be okay as long as it encouraged / enforced callers to construct their own normalized views, for example converting an

I think you're on a great path though, and I'm looking forward to seeing the API stabilize in the future. Thanks a lot for the work you're doing. Let me know if I can be of any assistance. |
Here's the summary of the API breakage updates in my mind. Comments are appreciated. |
I use the gRPC API for GoBGP and made heavy use of Tables for path selection. Previously destinations were all prefix strings in CIDR notation, i.e. "10.10.10.10/24". This was very nice to work with because I could easily cross API boundaries from my internal libraries to GoBGP using the prefix. However, the latest changes made huge changes to the table package, changing some but not all function signatures that do key lookups to require an NLRI instead of the prefix. I am not sure of the rationale for this change; I am assuming it is to support this https://github.com/osrg/gobgp/blob/master/table/table.go#L293 function, which as far as I can tell is an attempt to optimize away the nlri.String() call. I don't think this change is safe or has had all its design implications fully considered, based on the following rationale:

- Some signatures still take a prefix string, like GetLongerPrefixDestinations(key string), while some, like the one I use on nearly every single request, GetDestination(nlri bgp.AddrPrefixInterface), now take an NLRI where they once took a prefix. So now callers can no longer query for a longer prefix, since the lookupKey function is home-grown and private.
- Across the bgp.AddrPrefixInterface implementations it leaves more inconsistency, which will likely surface as bugs in other areas of the code base and cause more workarounds by allocating transient structures to support compatibility.
- It is possible to call GetDestination with a different implementation of bgp.AddrPrefixInterface, pass this check, and result in a panic. I fixed this in the short term with table: fix potential panics in tableKey #1720.

So the root problem seems to be that one-off optimization cases surface for expensive conversion operations from bgp.AddrPrefixInterface to CIDR notation (nlri.String) or other formats such as AddrToRadixkey. Since the solution is implemented on a one-off basis at call sites in hot spots, slightly different constraints and approaches are taken at each. This leaves the code base with MANY allocating nlri.String() calls (for things as simple as logging) and has a high code-complexity cost.

Instead I propose fixing the problem at its root, in the concrete implementations of bgp.AddrPrefixInterface. The simplest approach to illustrate my proposal would involve adding a private field (nameMe) on IPAddrPrefixDefault that would hold the pre-computed value of the current String() implementation, fmt.Sprintf("%s/%d", r.Prefix.String(), r.Length). This basic approach would cost one additional allocation and a small two-word memory-footprint increase, with the benefit that every caller of nlri.String() immediately stops allocating and no special cases need to exist. Each bgp.AddrPrefixInterface could also define its own String method, potentially opting out of this optimization.

This would leave out other temporary operations like AddrToRadixkey, but those could be solved the same way. There are a lot of approaches to this problem, but the key takeaway is to shift conversion operations from the current on-demand model to a compute-once model, regardless of whether it's done lazily or pre-computed.

At this point, if the API change is concrete I'll work around it. If that is the case, do the GoBGP authors have any guidelines for which APIs are safe or unsafe to use? Or maybe a roadmap I can follow to know which APIs I should avoid for stability purposes?