Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VRF-HLD document #242

Closed
wants to merge 9 commits into from
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Binary file added doc/VRF_HIGH_LEVEL_DIAGRAM.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
343 changes: 343 additions & 0 deletions doc/sonic-vrf-hld-v0.2.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,343 @@
SONiC VRF support design spec draft

Table of Contents

Document History
================

| Version | Date | Author | Description |
|---------|------------|--------------|--------------------------------------------------|
| v.01 | 06/07/2018 | Shine/Andrew | Initial version |
| v.02 | 06/08/2018 | Shine | Revised per Guohan/prince(MSFT) opinion |
| v.03 | 09/18/2018 | Guohan | Format document |

Abbreviations
=============

| **Term** | **Definition** |
|----------|-------------------------------------------------------------------------------------------------------------------------------------------------|
| VRF | Virtual routing forwarding |
| FRR | FRRouting is an IP routing protocol suite for Linux and Unix platforms which includes protocol daemons for BGP, IS-IS, LDP, OSPF, PIM, and RIP. |
| Quagga | Open IP routing protocol suite |
| RIB | Routing Information Base |

1. References

- VRF feature Requirement

- Add or Delete VRF instance
- Bind VRF to a L3 interface.
- Static IP route with VRF
- Enable eBGP/OSPF VRF aware in SONiC
- Support fall through lookup
- TBD: VRF route leaking between VRFs.

Note: linux kernel use VRF master device to support VRF and it supports admin
up/down on VRF maser dev. But we don't plan to support it on SONIC.

Dependencies
============

VRF feature needs the following software package/upgrade

1. Linux kernel 4.9

Linux Kernel 4.9 support generic IP VRF with L3 master net device. Every L3
master net device has its own FIB. The name of the master device is the
VRF’s name. Real network interface can join the VRF by becoming the slave of
the master net device.

Application can get creation of deletion of VRF master device via RTNETLINK,
as well as information about slave net device joining a VRF.

Linux kernel supports VRF forwarding using PBR scheme. It will fall to main
routing table to check IP lookup. VRF also can have its own default network
instruction in case VRF lookup fails.

2. FRRouting is needed to support BGP/OSPF VRF aware routing.

3. IProute2 version should be ss161212 or later to support iproute2 CLIs to
configure the switch.

Example of using iprout2:

```
VRF name: vrf-blue,fib-table-id: 10

$ ip link add name vrf-blue type vrf table 10

//enable VRF

$ ip link set dev vrf-blue up

// disable global VRF lookup

$ ip [-6] route add table 10 unreachable default

//binding sw1p3 device to vrf-blue

$ ip link set dev sw1p3 master vrf-blue

// descend local table pref

ip [-6] rule add pref 32765 table local && ip [-6] rule del pref 0
```

4. SAI VRF support

SAI right now does not seem having VRF concept, it does have VR.

We propose to implement VR as “virtual router” and VRF as “virtual router
forwarding”

VR is defined as an logical routing system. VRF is defined as forwarding
domain within a VR.

As this stage, we assume one VR per system. Only implement VRFs within this
VR.

Accordingly, we need to add vrf_id to sai_Route_entry and add vrf attribute
to sai_routeInterface object.

An alternative method is using VR as VRF. But it is needed to add two
attribution to VR object.

```
/*
* \@brief if it is global vrf
*
* \@type bool
* \@flags CREATE_AND_SET
* \@default true
*/
SAI_VIRTUAL_ROUTER_ATTR_GLOBAL

/*
* \@brief continue to do global fib lookup while current vrf fib lookup
* missed
*
* \@type bool
* \@flags CREATE_AND_SET
* \@default false
*/
SAI_VIRTUAL_ROUTER_ATTR_FALL_THROUGH
```

SONiC system diagram for VRF
============================

The following is high level diagram of modules with VRF support.

## The schema changes

1. Adding VRF related configuration in config_db.json

```
"VRF": {
"VRF-blue": {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why you are using VRF-blue instead of vrf-blue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why you are using VRF-blue instead of vrf-blue?

I think vrf-blue is okay. I will revise it @nikos-github

"fall_through":"1" //enable global fib lookup while vrf fib lookup missed

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change fall_through to fallback. The term fall_through tends to be used for routemaps but for vrfs the industry established term is fallback.

There should be no restriction that the fallback be the global/main table even if that may be the only one we plan to support (and I don't see why we plan to support only global/main as fallback).

Your document needs to specify how the fallback is specified. I assume the table id will be used since linux doesn't have a name for the global/main table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikos-github Fallback term is better.I will revise it. The fallback global feature is very useful for specify VRF user access internet through global/main route which is defined by RFC4364. Some enterprise user still use this to access internet on vpn environment. Different vrf route leak is beyond this document scope.

},
"VRF-red":{
"fall_through": "1"
},
"VRF-yellow":{
"fall_through":"0" //disable global fib lookup while vrf fib lookup missed
}
},

"INTERFACE":{
"Ethernet0":{
"mtu":"1500",
"vrf":"vrf-blue"
},
"Ethernet1":{
"mtu":"1500”,
"vrf":"vrf-red"
},
"Ethernet0|11.11.11.1/24": {},
"Ethernet1|12.12.12.1/24": {},
"Ethernet2|13.13.13.1/24": {},
"Ethernet3|14.14.14.1/24": {},
},

"VLAN_INTERFACE": {
"Vlan100":{
"mtu":"1500",
"vrf":"vrf-blue"
},
"Vlan100|15.15.15.1/24": {},
}
```

2. **Adding a VRF_TABLE** in APP_DB

```
;defines virtual routing forward table
;
;Status: stable

key = VRF_TABLE:VRF_NAME ;
fall_through = "1"/"0"
```

3. **Breaking up app-intf-table into app-intf-table and app-intf-prefix-table**

app-intf-table is defined as the following:

```
;defines logical network interfaces, an attachment to a PORT name
;
;Status: stable

key = INTF_TABLE:ifname
if_mtu = 1\*4DIGIT ; MTU for the interface
VRF_NAME = 1\*64VCHAR ;
```

app-intf-prefix-table is defined as the following:

```
;defines logical network interfaces with IP-prefix, an attachment to a PORT and
list of 0 or more ip prefixes;

;Status: stable
key = INTF_TABLE:ifname:IPprefix ; an instance of this key will be repeated for
Copy link

@nikos-github nikos-github Oct 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the advantage of having multiple keys?

Please provide how the json looks for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikos-github There is two reason why break up app-intf-table into app-intf-table and app-intf-prefix-table.

  1. Multiple ip addresses can be configured on one interface.So I put interface common attribute into app-intf-table and keep ip-prefix specific attribute on app-intf-prefix-table.
  2. Interface can be put to specific VRF before ip address is configured on it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jason file looks like below:
"Ethernet1":{
"mtu":"1500”,
"vrf":"vrf-red"
},
"Ethernet0|11.11.11.1/24": {},
"Ethernet1|12.12.12.1/24": {},
"Ethernet2|13.13.13.1/24": {},
"Ethernet3|14.14.14.1/24": {},

each prefix
IPprefix = IPv4prefix / IPv6prefix ; an instance of this key/value pair will be repeated for each prefix
scope = "global" / "local" ; local is an interface visible on this localhost only
if_mtu = 1\*4DIGIT ; MTU for the interface (move to INTF_TABLE:ifname table)
family = "IPv4" / "IPv6" ; address family
```

4. **Adding VRF key to app-route-table key list**

```
;Stores a list of routes
;Status: Mandatory

key = ROUTE_TABLE:VRF_NAME:prefix ;
nexthop = \*prefix, ;IP addresses separated “,” (empty indicates no gateway)
intf = ifindex? PORT_TABLE.key ; zero or more separated by “,” (zero indicates no interface)
blackhole = BIT ; Set to 1 if this route is a blackhole (or null0)
```

Since global vrf name is null, global vrf key will becomes ROUTE_TABLE:prefix.
Copy link

@nikos-github nikos-github Sep 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest you use ROUTE_TABLE:DEFAULT:prefix or ROUTE_TABLE:GLOBAL:prefix or even better, the tableid.


## Agent changes

### VrfMgrd

Listening VRF related configuration in config_db such as VRF
creation/deletion, VRF binding to any interface. Once detected, update
kernel using iproute2 CLIs and write VRF information to app-VRF-table and
app-intf-table.

VrfMgrd process will be placed in swss docker. In case of “swss restart”,
VRF device will be still retained in kernel. When VrfMgrd starts up it
querys all master device from kernel and clean up all vrf related device and
restore vrf device per configdb.

VRFOrch has three member routerCnt , neighCnt and rifCnt which record VRF
related route number , neigh number and rif number.VRFOrch is added to
RouteOrch , NeighOrch and intfsOrch member to update routeCnt , neighCnt and
rifCnt.

When VRFOrch receives vrf-delete event VRF object won’t  be deleted until
routerCnt  ,neighCnt and rifCnt  is decreased to zero.

When device binds to specified VRF, the ip address of slave device will be
removed and kernel will delete all neigh associated slave device.

### fpmsyncd

with added VRF ID, fpmsyncd can use rtnl_route_get_table to acquire table id.
Hence can send VRF routes further down. The messages from FRR has nh (next hop)
information which contain further information about (nexthop_ipaddress and
interface index),tableid can be derived from the interface index.

Fpmsyncd can build ```<tableid, vrf_name>``` pairs using rtnetlink api.

### vrforch (new)

Monitorying VRF_TABLE in APPDB,Using sai_create_virtual_router_fn or

sai_remove_virtual_router_fn defined in saivirtualrouter.h to track
(VR,VRF) creation/deletion.and save (vrf_name, vrf-vid) pairs.

### intfsorch

Adding following logics:

- adding vrforch as one member to intfsorch
- intfsorch monitors both app-intf-table和app-intf-prefix-table,when
app-intf-table has changes,handle updating vrf attribute on
routerintf,request vrforch for tableid/vrf-id .

### routeorch

Adding the following logics:

- Adding vrforch member to routesorch
- Once app-route-table has new udpate,get tableid from vrforch for route
add/delete.

When query nexthop,keys now are (tableid, ipaddress),tableid of nexthop
can be acquired using nexthop interface.

### neighorch

Adding the following logic:

- the Key of NextHop now is changed from only ipaddress to a pair of
(ipaddress, interface_name)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be (ipaddress, tableid) and not (ipaddress, interface_name). Expect to be doing a lot of conversions and lookups if you leave as (ipaddress, interface_name).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we point a route in vrf red to a nexthop (neighbor) in vrf blue? @prsunny , do we need to change the nexthop format in app db?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lguohan Yes you can have routes in vrf red with nexthops in vrf blue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikos-github , yes it is possible but the format in APP_DB is same for route since every route has a nexthop IP with the interface name. Currently orchagent doesn't look for the interface which requires a fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikos-github When set neigh to SAI layer (ipaddress, interface_name) would be more comfortable. And relatedAPP_DB/ASIC_DB needn't be changed.


### TODO

- (Mirror,tunnel,PBR) to be designed in future.

## CLI

VRF configureation can be done via SONiC build-in CLIs (to implement)

sonic CLIs are proposed as followings:

```
config vrf <add|del> <VRF-name>
config vrf <VRF-name> member <add|del> interface <interface-name>
config vrf <VRF-name> global-lookup <enable|disable>
config route add [vrf <vrf-name>] prefix <route_prefix/mask> nexthop [vrf <vrf-name>] <nh>
config route del [vrf <vrf-name>] prefix <route_prefix/mask> nexthop [vrf <vrf-name>] <nh>
```

Impact to other service after import VRF feature
================================================

For apps that don't care VRF they don't need to modify after sonic import VRF.

Linux supports “VRF-global” socket from kernel 4.5.  The socket listened by
service are VRF-global by default unless the VRF instance is specified. It
means the service can accept connection over all VRFs. Connected sockets are
bound to the VRF domain in which the connection originates.

Take teamd as an example. Teamd is layer2 apps and it doesn't care VRF
attribute. Teamd code is as followed with removing some exceptional code. It
uses VRF-global socket for every port-channel member port.

```
{
sock = socket(PF_PACKET, type, 0);
err = attach_filter(sock, fprog, alt_fprog);
memset(&ll_my, 0, sizeof(ll_my));
ll_my.sll_family = AF_PACKET;
ll_my.sll_ifindex = ifindex;
ll_my.sll_protocol = family;
ret = bind(sock, (struct sockaddr \*) &ll_my, sizeof(ll_my));
}
```

Put port-channel in different VRF instance doesn't affect vrf-global socket
to receive lacp protocol packet from member port. So teamd doesn't  need to
be modified or restarted for VRF binding event.

For layer 3 apps such as snmpd or ntpd they are using vrf-global socket too.
So they are vrf-transparent too.