
Breaking changes to table.Table without any available work around and massive allocation increase #1721

Closed
cstockton opened this issue Jun 4, 2018 · 11 comments


@cstockton

cstockton commented Jun 4, 2018

I use the gRPC API for GoBGP and made heavy use of Tables for path selection. Previously, destinations were all prefix strings in CIDR notation, i.e. "10.10.10.10/24". This was very nice to work with because I could easily cross API boundaries from my internal libraries to GoBGP using the prefix. However, the latest commits made sweeping changes to the table package, changing some but not all function signatures that do key lookups to require an NLRI instead of the prefix. I am not sure of the rationale for this change; I am assuming it is to support this https://github.com/osrg/gobgp/blob/master/table/table.go#L293 function, which as far as I can tell is an attempt to optimize away the nlri.String() call. I don't think this change is safe, or has had all its design implications fully considered, based on the following rationale:

  • The API is now inconsistent: some functions require a prefix, like GetLongerPrefixDestinations(key string), while others, like the one I used on nearly every single request, GetDestination(nlri bgp.AddrPrefixInterface), took a prefix before this change. Callers can no longer query by prefix string since the lookupKey function is home-grown and private.
  • Since this change only covers two implementations of bgp.AddrPrefixInterface, it leaves further inconsistency that will likely surface as bugs in other areas of the code base and force workarounds that allocate transient structures for compatibility.
  • Knowing the above, we can show it is unsafe: the type assertion is only guarded by a conditional check in a switch statement on the routeFamily. We can call GetDestination with a different implementation of bgp.AddrPrefixInterface that passes this check and causes a panic. I fixed this in the short term with table: fix potential panics in tableKey #1720.
  • Finally, this has a massive performance implication for reads that do not start from a bgp.AddrPrefixInterface. I am unsure about the GoBGP code base itself, but I always have the prefix in CIDR notation stored in a string, so querying the map used to cause zero allocations; now it causes at least one. With tens of thousands of entries at any given time, what used to be a nearly free operation has turned into milliseconds for me, and I imagine GoBGP was affected in similar ways.

So the root problem seems to be one-off optimizations surfacing at call sites for expensive conversion operations from bgp.AddrPrefixInterface to CIDR notation (nlri.String) or other formats such as AddrToRadixkey. Since the solution is implemented on a one-off basis at hot-spot call sites, each has slightly different constraints and takes a slightly different approach. This leaves the code base with MANY allocating nlri.String() calls (for things as simple as logging) and carries a high code-complexity cost.

Instead I propose fixing the problem at its root, in the concrete implementations of bgp.AddrPrefixInterface. The simplest approach to illustrate my proposal would involve adding a private field (nameMe as a placeholder) to IPAddrPrefixDefault that holds the pre-computed value of the current String() implementation, i.e. fmt.Sprintf("%s/%d", r.Prefix.String(), r.Length). This basic approach would cost one additional allocation and a small two-word memory footprint increase, with the benefit that every caller of nlri.String() immediately stops allocating and no special cases need to exist. Each bgp.AddrPrefixInterface could also define its own String method, potentially opting out of this optimization.
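A minimal sketch of the pre-computed flavor of this proposal, using a standalone stand-in type rather than GoBGP's actual structs (the field and constructor names here are illustrative assumptions, not the real API):

```go
package main

import (
	"fmt"
	"net"
)

// ipAddrPrefix mimics the shape of bgp.IPAddrPrefixDefault for illustration.
type ipAddrPrefix struct {
	Prefix net.IP
	Length uint8
	str    string // the proposed private field holding the pre-computed CIDR form
}

// newIPAddrPrefix computes the string form exactly once, at construction time.
func newIPAddrPrefix(length uint8, prefix string) *ipAddrPrefix {
	p := &ipAddrPrefix{Prefix: net.ParseIP(prefix).To4(), Length: length}
	p.str = fmt.Sprintf("%s/%d", p.Prefix.String(), p.Length)
	return p
}

// String returns the cached value, so every later call allocates nothing.
func (p *ipAddrPrefix) String() string { return p.str }

func main() {
	p := newIPAddrPrefix(24, "10.10.10.10")
	fmt.Println(p.String()) // 10.10.10.10/24
}
```

The two-word cost mentioned above is the string header itself; the one extra allocation is the backing bytes produced by fmt.Sprintf at construction.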

This would leave out other transient operations like AddrToRadixkey, but those could be solved the same way. There are a lot of approaches to this problem, but the key takeaway is to shift conversion operations from the current on-demand model to a compute-once model, regardless of whether it's done lazily or pre-computed.
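The lazy flavor of compute-once can be sketched with sync.Once; the type and the stand-in key value below are hypothetical, meant only to show that the expensive conversion runs at most once no matter how many call sites ask for it:

```go
package main

import (
	"fmt"
	"sync"
)

// lazyKey defers an expensive conversion (think AddrToRadixkey) until
// first use, then caches the result for every subsequent call.
type lazyKey struct {
	once    sync.Once
	key     string
	compute func() string
}

// Key runs compute exactly once and returns the cached result thereafter.
func (l *lazyKey) Key() string {
	l.once.Do(func() { l.key = l.compute() })
	return l.key
}

func main() {
	calls := 0
	k := &lazyKey{compute: func() string {
		calls++ // count how many times the expensive path actually runs
		return "0a0a0a00/24"
	}}
	k.Key()
	k.Key()
	fmt.Println(k.Key(), calls) // 0a0a0a00/24 1
}
```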

At this point, if the API change is final I'll work around it. If that is the case, do the GoBGP authors have any guidelines for which APIs are safe or unsafe to use? Or maybe a roadmap I can follow to know which APIs I should avoid for stability purposes?

@fujita
Member

fujita commented Jun 5, 2018

First, sorry about the breakage.

At this point, if the API change is final I'll work around it. If that is the case, do the GoBGP authors have any guidelines for which APIs are safe or unsafe to use? Or maybe a roadmap I can follow to know which APIs I should avoid for stability purposes?

I think that only the gRPC API is guaranteed to be stable. A huge mistake I made was not using the structures generated by protobuf (those structures are a port of the gRPC API) to implement gobgp (the client). Instead we convert these structures into internal structures (the table package) and use them. That leads users to the wrong assumption that these internal structures are stable.

I plan more changes to Path, e.g., deleting the nlri member from it to reduce its footprint. I think that there are two options:

  1. Developers use protobuf structures. We could provide some handy functions.
  2. Make table/*.go a part of the public API with the protobuf structures; gobgpd uses different structures.

@cstockton
Author

Thanks for the clarification around API stability; this will help me pick a new path forward for now.

But you did not address my concerns about the changes to the Table. I think it's possible it may just be a bug, since your unit tests don't check the string format, only the length. If the change is intentional, I strongly oppose it for the reasons I've covered and have better illustrated in the code below.

I know opposing a change without offering something in its place is not helpful, so I provided a potential alternative under the premise that your change to the key format was for performance reasons. Measuring the drawbacks I've listed against the simplicity and overall benefits of my alternative (zero-allocation reads, everywhere that calls nlri.String() benefits, simple to implement, no API changes), I feel it's hard to justify your change. If my premise for the change is wrong, my apologies; if you wouldn't mind explaining in more detail to help me better understand, that would be great.

To be clear here what I am challenging is the change to map keys in table.Table.

  • table.lookupKey produces different keys depending on the concrete type
  • The API provides functions to look up by both bgp.AddrPrefixInterface and string
  • Cross-referencing tables is impossible without expensive transient lookup structures.
  • Requiring a more restrictive interface to support table.lookupKey for only 2 of the many lookup methods has no clear benefit.

I would be happy to implement and measure a change that restores consistent map keys in Table and reverts the more restrictive string -> bgp.AddrPrefixInterface requirement for map lookups.

package main

import (
	"fmt"
	"net"
	"testing"
	"unicode/utf8"

	"github.com/osrg/gobgp/packet/bgp"
	"github.com/osrg/gobgp/table"
)

func inc(ip net.IP) net.IP {
	for j := len(ip) - 1; j >= 0; j-- {
		ip[j]++
		if ip[j] > 0 {
			break
		}
	}
	return ip
}

func recov(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	fn()
	return
}

func main() {

	// My main opposition is the inconsistent key format; I feel it's going
	// to lead to regressions. Two identical prefixes are stored in two
	// separate formats:
	//
	//   // tbl.lookupKey(T)
	//   key(0a0a0a0018)
	//     - when T is "10.10.10.10/24" of *IPAddrPrefix
	//   key(31302e31302e31302e31302f3234)
	//     - when T is "10.10.10.10/24" of *LabeledIPAddrPrefix
	//
	{
		a1 := bgp.NewIPAddrPrefix(24, "10.10.10.10")
		d1 := table.NewDestination(a1, 0)
		d1Str := d1.GetNlri().String()
		t1 := table.NewTable(bgp.RF_IPv4_UC, d1)
		m1 := t1.GetDestinations()

		a2 := bgp.NewLabeledIPAddrPrefix(24, "10.10.10.10", bgp.MPLSLabelStack{
			Labels: []uint32{0x32, 0x64},
		})
		d2 := table.NewDestination(a2, 0)
		d2Str := d2.GetNlri().String()
		t2 := table.NewTable(bgp.RF_IPv4_MPLS, d2)
		m2 := t2.GetDestinations()

		fmt.Println("Both nlri return the same String():",
			d1Str == d2Str) // true

		fmt.Println("However the RF_IPv4_UC and RF_IPv4_MPLS tables have different keys:")
		for k, v := range m1 {
			fmt.Printf("  m1 key(%x) val(%v)\n", k, v)
		}
		for k, v := range m2 {
			fmt.Printf("  m2 key(%x) val(%v)\n", k, v)
		}
	}

	// it can also produce invalid unicode string sequences:
	{
		dest := table.NewDestination(bgp.NewIPAddrPrefix(32, "0.0.0.128"), 0)
		tbl := table.NewTable(bgp.RF_IPv4_UC, dest)
		for k := range tbl.GetDestinations() {
			if !utf8.ValidString(k) {
				fmt.Println("invalid unicode!")
			}
		}
	}

	// The change causes the potential for panics where none existed:
	{
		fmt.Println("panic:", recov(func() {
			a1 := bgp.NewIPAddrPrefix(24, "10.10.10.10")
			d1 := table.NewDestination(a1, 0)

			a2 := bgp.NewLabeledIPAddrPrefix(24, "10.10.10.10", bgp.MPLSLabelStack{
				Labels: []uint32{0x32, 0x64},
			})
			d2 := table.NewDestination(a2, 0)

			tbl := table.NewTable(bgp.RF_IPv4_UC, d1, d2)
			tbl.GetDestinations()
			// interface conversion: bgp.AddrPrefixInterface is \
			//   *bgp.LabeledIPAddrPrefix, not *bgp.IPAddrPrefix

		}))
	}

	// While being subtle:
	{
		a1 := bgp.NewIPAddrPrefix(24, "10.10.10.10")
		d1 := table.NewDestination(a1, 0)
		tbl := table.NewTable(bgp.RF_IPv4_MPLS, d1)
		tbl.GetDestinations()

		// The above won't panic.
		fmt.Println("The above won't panic until a lookup is performed")
	}

	// The change prevents cross referencing table types:
	{
		fmt.Println("panic:", recov(func() {
			a1 := bgp.NewIPAddrPrefix(24, "10.10.10.10")
			d1 := table.NewDestination(a1, 0)

			a2 := bgp.NewLabeledIPAddrPrefix(24, "10.10.10.10", bgp.MPLSLabelStack{
				Labels: []uint32{0x32, 0x64},
			})

			tbl := table.NewTable(bgp.RF_IPv4_MPLS, d1)
			tbl.GetDestination(a2)
			// interface conversion: bgp.AddrPrefixInterface is \
			//   *bgp.LabeledIPAddrPrefix, not *bgp.IPAddrPrefix
		}))
	}

	// I need a temporary nlri JUST to do a map lookup to the tune of N allocs
	// for even a basic prefix type:
	{
		a2 := bgp.NewLabeledIPAddrPrefix(24, "10.10.10.10", bgp.MPLSLabelStack{
			Labels: []uint32{0x32, 0x64},
		})
		d2 := table.NewDestination(a2, 0)
		tbl := table.NewTable(bgp.RF_IPv4_MPLS, d2)

		// If I wanted to use the GoBGP Table object (the api pkg returns this type
		// in several calls like GetRib) I will have to store the NLRIs along with
		// the prefix string in my app to do lookups.
		a1 := bgp.NewIPAddrPrefix(24, "10.10.10.10")

		// This sounds okay until I realize I will need to store a separate table
		// lookup key for each route family. Unreasonable. Only other option is to
		// construct a temporary lookup addr:
		allocs := testing.AllocsPerRun(1000, func() {
			rf := tbl.GetRoutefamily()
			afi, safi := bgp.RouteFamilyToAfiSafi(rf)
			pfx := a1.String()

			tmpLookup, err := bgp.NewPrefixFromRouteFamily(afi, safi, pfx)
			if err != nil {
				panic(err)
			}
			tbl.GetDestination(tmpLookup)
		})

		// 12 allocations ... for a single map lookup using the new API.
		fmt.Println(allocs)
	}
}

@fujita
Member

fujita commented Jun 5, 2018

The consistent key format is nice, but the memory usage with full routes is a real issue that I must fix (note: the issue is memory usage, not the number of allocations). Also, I don't assume use cases like creating a table with bgp.RF_IPv4_MPLS and inserting an IPv4 destination into it. As I wrote yesterday, I think that table/*.go should not be public.

@cstockton
Author

I think making an internal *table.Table package and leaving a public-facing table for your users would have been a much better solution. Mostly because the gRPC API has functions that return ToNativeT for all types, which really encourages the use of those objects: an *api.Path gives you no access to any path attributes, only a [][]byte, so the first thing you do is call ToNativePath(), which brings you into an unsupported package.

I do thank you for telling me the rationale of the change was memory reduction. I decided to profile gobgp again for memory usage (vs. allocations last time) to see if I could revert the change I disagree with in *table.Table and regain your memory footprint reduction elsewhere. The TL;DR is that I was able to reduce the memory footprint by about 3x, as shown below.

  cc42a31 - Showing nodes accounting for 162.04MB, 90.96% of 178.15MB total
  dc9fe2b - Showing nodes accounting for 142.04MB, 95.26% of 149.11MB total (next commit, API break)
  cstockton - Showing nodes accounting for 54.09MB, 99.08% of 54.59MB total (cc42a31 + my diff)

Below are the raw notes I took while auditing memory usage in GoBGP.


The main reason for allocations is the constant transient type conversions, for example in Path's ToLocal():

  func (p *Path) ToLocal() *Path {
    ...
  	n := nlri.(*bgp.LabeledVPNIPAddrPrefix)
  	_, c, _ := net.ParseCIDR(n.IPPrefix())
  	ones, _ := c.Mask.Size()
    ...

We see it call IPPrefix() below, which converts to a prefix using fmt.Sprintf:

  func (l *LabeledVPNIPAddrPrefix) IPPrefix() string {
    masklen := l.IPAddrPrefixDefault.Length - uint8(8*(l.Labels.Len()+l.RD.Len()))
    return fmt.Sprintf("%s/%d", l.IPAddrPrefixDefault.Prefix, masklen)
  }

The caller then immediately converts the IP parsed from net.ParseCIDR back into
a string with IP.String(), which makes another string:

  func (p *Path) ToLocal() *Path {
    ...
  	ones, _ := c.Mask.Size()
  	nlri = bgp.NewIPAddrPrefix(uint8(ones), c.IP.String())
  	nlri.SetPathLocalIdentifier(pathId)

Now, once in the NewIPAddrPrefix call frame, the caller's string is immediately converted back into another net.IP using net.ParseIP(prefix):

  func NewIPAddrPrefix(length uint8, prefix string) *IPAddrPrefix {
  	p := &IPAddrPrefix{
  		IPAddrPrefixDefault{
  			Length: length,
  		},
  		4,
  	}
  	p.IPAddrPrefixDefault.decodePrefix(net.ParseIP(prefix).To4(), length, 4)
  	return p
  }

Before returning to the caller there will be one more allocation of addrlen bytes, which seems like an attempt to avoid storing the 16 bytes allocated by the net.IP implementation. However, in most cases I've seen (and the profiling reflects this) the original net.IP is being retained in another structure, so it's not saving allocations but increasing them:

  func (r *IPAddrPrefixDefault) decodePrefix(data []byte, bitlen uint8, addrlen uint8) error {
  	bytelen := (int(bitlen) + 7) / 8
  	if len(data) < bytelen {
  		eCode := uint8(BGP_ERROR_UPDATE_MESSAGE_ERROR)
  		eSubCode := uint8(BGP_ERROR_SUB_MALFORMED_ATTRIBUTE_LIST)
  		return NewMessageError(eCode, eSubCode, nil, "network bytes is short")
  	}
  	b := make([]byte, addrlen)
  	copy(b, data[:bytelen])
  	// clear trailing bits in the last byte. rfc doesn't require
  	// this but some bgp implementations need this...
  	rem := bitlen % 8
  	if rem != 0 {
  		mask := 0xff00 >> rem
  		lastByte := b[bytelen-1] & byte(mask)
  		b[bytelen-1] = lastByte
  	}
  	r.Prefix = b
  	return nil
  }

Many of these transient allocations used to cross package boundaries are released on the next GC, so they may seem unimportant when the goal is memory footprint and any number of short-lived allocations is acceptable in its name. However, these operations have a heavy hidden cost in how they have influenced the design, which I believe is indirectly the root cause of the memory consumption once you consider there is no clarity around immutability: each caller may or may not read or write fields directly, and may choose to copy or reuse slice backings.

Consider the following design choice: every object is immutable and copy-on-write. This would greatly reduce the memory footprint of GoBGP, because much of it comes from the fact that duplicate objects which are otherwise immutable are constantly being stored, for example the originator id:

    func NewPathAttributeOriginatorId(value string) *PathAttributeOriginatorId {
    	t := BGP_ATTR_TYPE_ORIGINATOR_ID
    	return &PathAttributeOriginatorId{
    		PathAttribute: PathAttribute{
    			Flags:  PathAttrFlags[t],
    			Type:   t,
    			Length: 4,
    		},
    		Value: net.ParseIP(value).To4(),
    	}
    }

Here you parse a value that will be the same for any path from this originator, yet each one stores a 4-byte copy of the net.IP (or 16 bytes for IPv6), which adds up very fast. This same pattern is followed for ALL path attributes that share common values: Community attributes are duplicated for every path in each adj-rib, as are Aggregator and ClusterLists. Even smaller structures like PathAttributeAigp start to add up when you consider that thousands of paths may share the same metric; there we have 4 bytes in PathAttribute, another 3 words in Values for the slice header, which is only 28 bytes, so you end up with 32 after alignment plus 1 word for the metric. All metrics could instead be stored in a sync.RWMutex-guarded map[uint64]*AigpTLVIgpMetric and shared. The same could be done for other common attributes, like IP addresses inside a patricia trie.
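The shared-metric idea above amounts to interning: keep one canonical object per value behind a lock-guarded map, so thousands of paths point at the same allocation. A sketch, with a hypothetical stand-in type rather than GoBGP's real AigpTLVIgpMetric:

```go
package main

import (
	"fmt"
	"sync"
)

// aigpTLVIgpMetric mimics bgp.AigpTLVIgpMetric for illustration only.
type aigpTLVIgpMetric struct{ Metric uint64 }

// metricCache interns metric objects keyed by value, as suggested above.
var metricCache = struct {
	sync.RWMutex
	m map[uint64]*aigpTLVIgpMetric
}{m: make(map[uint64]*aigpTLVIgpMetric)}

// internMetric returns the single shared object for a given metric value,
// creating it on first use. Reads take only the read lock.
func internMetric(v uint64) *aigpTLVIgpMetric {
	metricCache.RLock()
	t, ok := metricCache.m[v]
	metricCache.RUnlock()
	if ok {
		return t
	}
	metricCache.Lock()
	defer metricCache.Unlock()
	if t, ok = metricCache.m[v]; ok { // re-check under the write lock
		return t
	}
	t = &aigpTLVIgpMetric{Metric: v}
	metricCache.m[v] = t
	return t
}

func main() {
	a, b := internMetric(100), internMetric(100)
	fmt.Println(a == b) // true: both callers share one allocation
}
```

This only works safely if the interned objects are treated as immutable, which is exactly the copy-on-write discipline proposed above.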

To test my theory I made a simple set of changes. First, I added *FromIP factories to various GoBGP functions to reuse the same IP address:

NewPathAttributeAggregatorFromIP(as interface{}, ip net.IP) *PathAttributeAggregator
NewPathAttributeOriginatorIdFromIP(ip net.IP) *PathAttributeOriginatorId
NewPathAttributeClusterListFromIPS(value []net.IP) *PathAttributeClusterList
NewPathAttributeAs4AggregatorFromIP(as uint32, ip net.IP) *PathAttributeAs4Aggregator
// Continued for many other packet types

Another big area of allocations was over-allocated slice backings: in many places append is used in a way that may double the capacity of the backing array. For example, the packet/bgp encoding often does not fully pre-compute slice backings by first iterating the to-be-serialized objects and calling Size(). In many places the original buffer may have a size set, followed by a conditional but frequent append which doubles the slice backing. One of the biggest contributors to high len/cap disparity is in table/message.go: functions that conditionally add values to PathAttributes in update messages use plain append(), which doubles the backing of slices that previously often had len == cap. So I went through just a handful of the biggest over-allocators and made the slice backings more precise.
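The pre-sizing pattern described above can be sketched as follows; the serializable interface and attr type are hypothetical stand-ins for GoBGP's packet types, which expose a similar Size()/Serialize() shape:

```go
package main

import "fmt"

// serializable mimics the Size()/Serialize() shape of packet types.
type serializable interface {
	Size() int
	Serialize(buf []byte) []byte
}

type attr struct{ payload []byte }

func (a attr) Size() int                   { return len(a.payload) }
func (a attr) Serialize(buf []byte) []byte { return append(buf, a.payload...) }

// serializeAll sums Size() up front so the backing array is allocated
// exactly once at full capacity; append never grows (and so never
// doubles) it mid-way.
func serializeAll(items []serializable) []byte {
	total := 0
	for _, it := range items {
		total += it.Size()
	}
	buf := make([]byte, 0, total) // exact capacity
	for _, it := range items {
		buf = it.Serialize(buf)
	}
	return buf
}

func main() {
	items := []serializable{attr{[]byte{1, 2}}, attr{[]byte{3, 4, 5}}}
	buf := serializeAll(items)
	fmt.Println(len(buf), cap(buf)) // 5 5: len == cap, no over-allocation
}
```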

The IP reuse and more precise slice backings yielded a 3x memory improvement:

  cc42a31 - Showing nodes accounting for 162.04MB, 90.96% of 178.15MB total
  dc9fe2b - Showing nodes accounting for 142.04MB, 95.26% of 149.11MB total (next commit, API break)
  cstockton - Showing nodes accounting for 54.09MB, 99.08% of 54.59MB total (cc42a31 + my diff)

This did cause a small performance hit of about 15-20% at startup, due to a few locations calling copy() during several phases of growth to maintain the len == cap invariant. To see if I could win that time back, I found the biggest allocator was the BGPUpdate struct processing in ProcessMessage: it currently performs a very large number of allocations because all of the []*bgp.IPAddrPrefix values are converted to []bgp.AddrPrefixInterface directly after decoding. I changed the structure to:

@@ -8896,10 +8906,10 @@ func GetPathAttribute(data []byte) (PathAttributeInterface, error) {

 type BGPUpdate struct {
 	WithdrawnRoutesLen    uint16
-	WithdrawnRoutes       []*IPAddrPrefix
+	WithdrawnRoutes       []AddrPrefixInterface
 	TotalPathAttributeLen uint16
 	PathAttributes        []PathAttributeInterface
-	NLRI                  []*IPAddrPrefix
+	NLRI                  []AddrPrefixInterface
 }

I then reused the slice backings (including attrs) in the ProcessMessage function by setting the length of attrs to 0 and filter-appending while switching on reachability. This change not only won back the startup cost but halved it. However, it unfortunately required updating a lot of unit tests, i.e.:

@@ -1522,12 +1522,12 @@ func TestProcessBGPUpdate_bestpath_lost_ipv4(t *testing.T) {
 	pathAttributes1 := []bgp.PathAttributeInterface{
 		origin1, aspath1, nexthop1, med1, localpref1,
 	}
-	nlri1 := []*bgp.IPAddrPrefix{bgp.NewIPAddrPrefix(24, "10.10.10.0")}
+	nlri1 := []bgp.AddrPrefixInterface{bgp.NewIPAddrPrefix(24, "10.10.10.0")}
 	bgpMessage1 := bgp.NewBGPUpdateMessage(nil, pathAttributes1, nlri1)

 	// path 1 withdraw
 	w1 := bgp.NewIPAddrPrefix(24, "10.10.10.0")
-	w := []*bgp.IPAddrPrefix{w1}
+	w := []bgp.AddrPrefixInterface{w1}
 	bgpMessage1_w := bgp.NewBGPUpdateMessage(w, nil, nil)

 	peer1 := peerR1()

This adds a lot of noise to an already pretty big diff:

.../osrg/gobgp$ git diff --stat
 packet/bgp/bgp.go           | 1227 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------
 packet/bgp/helper.go        |    4 +-
 packet/bgp/validate_test.go |    2 +-
 table/destination.go        |   15 +-
 table/destination_test.go   |   20 +-
 table/message.go            |   14 +-
 table/message_test.go       |   22 +-
 table/path.go               |   10 +-
 table/path_test.go          |   20 +-
 table/policy_test.go        |   90 +++----
 table/table_manager.go      |   68 ++---
 table/table_manager_test.go |   54 ++--
 table/table_test.go         |   10 +-
 13 files changed, 1172 insertions(+), 384 deletions(-)
.../osrg/gobgp$ git diff |wc -l
3407
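The filter-append reuse mentioned above (truncate to length 0, then append back only what passes a predicate) can be sketched with a simplified stand-in using int elements rather than path attributes:

```go
package main

import "fmt"

// filterInPlace truncates the slice to length 0 and appends back only
// the elements that pass keep. The original backing array is reused,
// so no new allocation occurs; the caller must not use the input
// slice afterwards, since its backing was overwritten.
func filterInPlace(attrs []int, keep func(int) bool) []int {
	out := attrs[:0]
	for _, a := range attrs {
		if keep(a) {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	attrs := []int{1, 2, 3, 4, 5}
	even := filterInPlace(attrs, func(n int) bool { return n%2 == 0 })
	fmt.Println(even) // [2 4]
}
```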

I understand you may be hesitant to accept such a big pull request. I've had difficulty finding the right process for contributions in the past, so I decided to summarize my findings and give you the option to implement any of them in your own way if you find them suitable. You may close this issue out; I'll avoid using anything outside the api package until you declare the rest stable. Thanks a lot for the heads up.

@fujita
Member

fujita commented Jun 12, 2018

*api.Path gives you no access to any path attributes, only a [][]byte so the first thing you do is call ToNativePath() which brings you into an unsupported package.

If api.Path gave you structures for the path attributes, you wouldn't need something like ToNativePath()?
#1729

@jeffbean
Contributor

As a general guide, when we are concerned about performance, the evidence is usually shown through benchmark tests so the contributors and community can see the reasoning and guidelines behind changes.

This change in particular is a strange one, with no benchmark tests to show the results. I would love to see before and after for both the original change and this proposed solution.

@cstockton
Author

@fujita I think it would be nice to not have to do any serialization on paths; right now the gRPC API feels more like a "shim" between the gRPC server and the private libraries. I think the only way to be certain the gRPC API is a truly portable and independent interface between the GoBGP server and your users is to satisfy a single rule: package api must not import packages within the gobgp repo. As that rule is satisfied, the demand for more structures will present itself as you go along.

@jeffbean I did post the memory consumption before and after my changes, with a detailed analysis of how I arrived at them, though it could easily be missed in my wall of text; that is why I didn't include the full benchmark output like I did in #1741. The benchmarks were not tests but an end-to-end approach using net/http/pprof, because that seemed to be the method the GoBGP authors use. Most of the test setup was just arduous Makefile and docker-compose generation in a Python for loop; I could post it if you are interested.

  cc42a31 - Showing nodes accounting for 162.04MB, 90.96% of 178.15MB total
  dc9fe2b - Showing nodes accounting for 142.04MB, 95.26% of 149.11MB total (next commit, API break)
  cstockton - Showing nodes accounting for 54.09MB, 99.08% of 54.59MB total (cc42a31 + my diff)

Closing this issue, no more actionable items remaining. I replaced my previous usage of table.Table with an implementation of an LC-trie similar to the FIB implementation in the Linux kernel, which provides attribution to:

An experimental study of compression methods for dynamic tries
Stefan Nilsson and Matti Tikkanen. Algorithmica, 33(1):19-33, 2002.
http://www.csc.kth.se/~snilsson/software/dyntrie2/

Thanks for investigating this issue!

@fujita
Member

fujita commented Jun 19, 2018

Package api must not import packages within the gobgp repo

Fine by me. I guess that with that rule, the main issue is how to make the structures in api/ handy for Go users, for example nlri and path attributes.

  1. Direct usage of protobuf structures. There is no interface type, so we can't have a method that returns the nlri nicely like table.Path.GetNlri().
  2. Helper functions that return attributes from bgp/ in client/ (or somewhere else). We can't have a method on protobuf structures, but we can add something like GetPathAttributes(p *api.Path) []bgp.PathAttributeInterface. Or we could create something like client.Path and embed api.Path.

@fujita
Member

fujita commented Jun 19, 2018

Once I finish the api stuff, I'll take a closer look at the memory analysis. About the trie stuff, IIRC the algorithm used in the Linux networking stack is patented.

@cstockton
Author

Fine by me. I guess that with that rule, the main issue is how to make structures in api/ handy for golang users, for example, nlri and path attributes.

direct usage of protobuf structures. No interface type so we can't have a method that returns nlri nicely like table.Path.GetNlri().

I think you could still define a common interface in the api package, just by making sure all attributes implement it. You could have an interface like PathAttributeInterface:

type Attributer interface {
    AttrType() AttrType // akin to bgp.BGPAttrType ? 
}
func (m *OriginAttribute) AttrType() AttrType { return OriginAttr } // akin to a BGPOriginAttr constant

Or you could try something a little new: since we only need a way to enforce type safety, and the concrete type carries the attribute type, maybe something like this would work out:

type Prefixer interface {
	Prefix() Prefixer // represents NLRI - various IPAddr interfaces?
}

// Maybe path & destinations defines Prefix ?
func (m *Path) Prefix() Prefixer        { return m.Nlri }
func (m *Destination) Prefix() Prefixer { return m.Nlri }

// Prefix metadata
type PrefixInfo struct {
	Address net.IP
	Length  uint8
	// any private or public fields, needed to unify introspection of
	// all the various pfx types.
}

// Which also allows more performant options to be considered, like reusing
// buffers / Network ips etc if a user wants to via:
func (PrefixInfo) Reset(pfx Prefixer) {
}

// Constructing them can happen in the API package, keeping type safety:
func GetPrefixInfo(attr Prefixer) PrefixInfo {
	switch attr.(type) {
	case *Path:
		return PrefixInfo{
			// bgp meta data
		}
	case *Destination:
		return PrefixInfo{}
	default:
		// etc
		return PrefixInfo{}
	}
}

// Which could also translate in a similar fashion for AttributeInfo.
type Attributer interface {
	Attribute() Attributer
}

// all attributes satisfy the interface by returning themselves
func (m *OriginAttribute) Attribute() Attributer { return m }
func (m *AsSegment) Attribute() Attributer       { return m }

type AttributeInfo struct {
	// more granular bgp meta data fields ?
	Flags uint
}

func GetAttrInfo(attr Attributer) AttributeInfo {
	switch T := attr.(type) {
	case *OriginAttribute:
		return AttributeInfo{
			// no flags!
		}
	case *AsSegment:
		return AttributeInfo{
			Flags: T.Flags,
		}
	default:
		return AttributeInfo{}
	}
}

The main spirit behind this sort of design is to keep the concrete types very light and, more importantly, concise, not compromising their memory layout or design to achieve a unified interface, and then to provide specific utilities that accept the concrete structure and return the common properties. This gives callers the option to use the utilities from the API for debugging while guaranteeing that each concrete type wholly represents itself.

Any approach would be okay as long as it encouraged or enforced callers to construct their own normalized views. For example, converting an api.Path to a table.Path would happen in table.PathFromAPI, and a bgp.PathAttributeInterface would come from bgp.AttrFromAPI(attr Attributer). This reverses the direction of the dependency graph, and I think a lot of positive side effects will follow for library users.
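A minimal sketch of this inverted dependency direction, where the conversion lives on the consuming side rather than in the api package (all type and function names here are hypothetical stand-ins, not GoBGP's real API):

```go
package main

import "fmt"

// APIPath stands in for a protobuf-generated api type: plain data,
// importing nothing from the rest of the repo.
type APIPath struct {
	Prefix string
	Length uint8
}

// Path stands in for the internal table-side type. It is built *from*
// the API type, so table depends on api instead of the reverse.
type Path struct {
	Key string
}

// PathFromAPI is the hypothetical consuming-side converter: the table
// package normalizes the API view into its own representation.
func PathFromAPI(p *APIPath) *Path {
	return &Path{Key: fmt.Sprintf("%s/%d", p.Prefix, p.Length)}
}

func main() {
	p := PathFromAPI(&APIPath{Prefix: "10.10.10.0", Length: 24})
	fmt.Println(p.Key) // 10.10.10.0/24
}
```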

I think you're on a great path though, and I'm looking forward to seeing the API stabilize. Thanks a lot for the work you're doing; let me know if I can be of any assistance.

@fujita
Member

fujita commented Jun 25, 2018

Here's the summary of the API breakage updates in my mind. Comments are appreciated.
#1763
