Comparing changes

Base repository: iqiyi/dpvs (base ref: v1.8.10)
Head repository: iqiyi/dpvs (head ref: master)

Commits on May 20, 2021

  1. IPVS: fix ipvs rr/wrr/wlc problem of uneven load distribution across dests.

     Different workers should start the scheduling algorithm from dests that are
     evenly distributed across the whole dest list. This avoids connections
     clustering on a few dests in the early phase after service setup, especially
     for scheduling methods such as rr/wrr/wlc (a minimal sketch of the idea
     follows this entry).
    
    Signed-off-by: ywc689 <ywc689@163.com>
    ywc689 committed May 20, 2021
    8a8bf22
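
    A minimal sketch of the idea (illustrative only, not the actual DPVS code; the
    names num_dests, worker_id and num_workers are hypothetical):

        /* Spread each worker's initial round-robin position evenly over the
         * dest list, so early connections don't cluster on the first dests. */
        #include <stddef.h>

        /* Returns the dest index a given worker starts scheduling from. */
        size_t rr_initial_index(size_t num_dests, unsigned worker_id,
                                unsigned num_workers)
        {
            if (num_dests == 0 || num_workers == 0)
                return 0;
            /* worker i starts i/num_workers of the way into the dest list */
            return ((size_t)worker_id * num_dests / num_workers) % num_dests;
        }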

Commits on May 27, 2021

  1. Add UDP_CHECK health checkers

    1. The MISC_CHECK method consumes more CPU resources: with hundreds of RSs, CPU usage gets close to 100%.
    2. The UDP_CHECK method uses less CPU: usage stays below 100% even at 10,000 RSs.

    Usage example (a sketch of the check logic follows this entry):
    real_server 10.xxx.xxx.xxx 8000 {
        weight 1
        inhibit_on_failure
        UDP_CHECK {
            retry 3
            connect_timeout 5
            connect_port 8000
            payload hello world
            require_reply hello world
            min_reply_length 3
            max_reply_length 15
        } !UDP_CHECK
    } !real_server
    weiyanhua committed May 27, 2021
    f6414b0
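
    For reference, a rough sketch of what one UDP_CHECK probe amounts to, matching
    the options above (payload, require_reply, reply length bounds, connect_timeout).
    It illustrates the configured semantics only, not the keepalived implementation;
    the function name and buffer size are made up:

        #include <string.h>
        #include <sys/socket.h>
        #include <sys/time.h>
        #include <sys/types.h>

        /* One UDP probe: send the payload, wait for a reply within timeout_sec,
         * then validate the reply prefix and length. Returns 0 if healthy. */
        int udp_probe_once(int sockfd, const char *payload, size_t payload_len,
                           const char *require_reply,
                           size_t min_len, size_t max_len, int timeout_sec)
        {
            char buf[4096];
            ssize_t n;
            struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };

            setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
            if (send(sockfd, payload, payload_len, 0) < 0)
                return -1;                          /* send failed */
            n = recv(sockfd, buf, sizeof(buf) - 1, 0);
            if (n < 0)
                return -1;                          /* timeout or error */
            buf[n] = '\0';
            if ((size_t)n < min_len || (size_t)n > max_len)
                return -1;                          /* reply length out of bounds */
            if (require_reply && strncmp(buf, require_reply,
                                         strlen(require_reply)) != 0)
                return -1;                          /* reply content mismatch */
            return 0;                               /* realserver considered healthy */
        }

    The checker repeats such a probe up to `retry` times before marking the realserver
    down; with `inhibit_on_failure`, the failed server's weight is set to 0 instead of
    removing it from the service.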

Commits on Jun 9, 2021

  1. Merge pull request #730 from ywc689/ipvs_sched_bugfix

    IPVS: fix ipvs rr/wrr/wlc problem of uneven load distribution across dests.
    ywc689 authored Jun 9, 2021

    076ad88
  2. Merge pull request #731 from weiyanhua100/udp-health

    Add UDP_CHECK health checkers
    ywc689 authored Jun 9, 2021

    f7be2d9

Commits on Jun 11, 2021

  1. doc: update tutorial doc of section 'Full-NAT with Keepalived (one-arm)'

    Signed-off-by: ywc689 <ywc689@163.com>
    ywc689 committed Jun 11, 2021
    1b2af29

Commits on Jun 23, 2021

  1. Fix bonding mode 4 problem caused by LACP failure.

    The problem is discussed in detail in Issue #725.
    
    Signed-off-by: ywc689 <ywc689@163.com>
    ywc689 committed Jun 23, 2021
    5579aff
  2. netif: add config option "dedicated_queues" for bonding mode 4 (802.3ad)

    It helps avoid the LACP failure problem with some PMD drivers (e.g. mlx5)
    when dedicated queues are enabled in 802.3ad bonding mode.
    
    Signed-off-by: ywc689 <ywc689@163.com>
    ywc689 committed Jun 23, 2021
    ed4ffd2
  3. Merge pull request #734 from ywc689/bugfix-bond4

    Bugfix bond4
    ywc689 authored Jun 23, 2021

    860587d
  4. Merge pull request #735 from ywc689/update-doc

    doc: update tutorial doc of section 'Full-NAT with Keepalived (one-arm)'
    ywc689 authored Jun 23, 2021

    c7e9ef9

Commits on Jun 24, 2021

  1. 1
    3eed601
  2. dd24c38
  3. netif_flow: bugfix

    ywc689 committed Jun 24, 2021
    e4e5486
  4. 5df6130
  5. c7604f4
  6. 6f7b712
  7. makefile: update meson build for DPDK

    The DPDK build infrastructure has moved out of Makefile into meson. Add meson
    build support for extracting cflags and libs from the meson-installed pkg-config
    path. Mitigate the error about missing inline function definitions when they are
    not present in the source C file.
    
    Signed-off-by: Vipin Varghese <vipinpv@gmail.com>
    vipinpv85 authored and ywc689 committed Jun 24, 2021
    f10db1e
  8. doc: update the README for meson build

    Update the steps for building DPDK with meson-ninja, and the install path.
    
    Signed-off-by: Vipin Varghese <vipinpv@gmail.com>
    vipinpv85 authored and ywc689 committed Jun 24, 2021
    b8da933
  9. fix meson build failure problem

    Signed-off-by: ywc689 <ywc689@163.com>
    ywc689 committed Jun 24, 2021
    6741ce9
  10. merge dpdk-stable-20.11.x (abandon dpdk-stable-18.11.x)

    Signed-off-by: ywc689 <ywc689@163.com>
    ywc689 committed Jun 24, 2021
    ff650eb
  11. refactor Makefile and fix some bugs after merging dpdk 20.11

    Signed-off-by: ywc689 <ywc689@163.com>
    ywc689 committed Jun 24, 2021
    fa7bc4b
  12. patch: add patches for dpdk-stable-20.11.1

    Signed-off-by: ywc689 <ywc689@163.com>
    ywc689 committed Jun 24, 2021
    0e37d91
  13. script: add helper script to facilitate dpdk build.

    Signed-off-by: ywc689 <ywc689@163.com>
    ywc689 committed Jun 24, 2021
    4cb2913
  14. patch: remove patches of old dpdk versions

    Signed-off-by: ywc689 <ywc689@163.com>
    ywc689 committed Jun 24, 2021
    39760a1
  15. main: add dpdk version check

    Signed-off-by: ywc689 <ywc689@163.com>
    ywc689 committed Jun 24, 2021
    c53a255
  16. doc: update docs with dpdk 20.11

    Signed-off-by: ywc689 <ywc689@163.com>
    ywc689 committed Jun 24, 2021
    1918017
  17. makefile: update config.mk

    ywc689 committed Jun 24, 2021
    c57a0db
  18. 9a5d303
  19. ci: adapt ci to dpdk-20.11

    Signed-off-by: ywc689 <ywc689@163.com>
    ywc689 committed Jun 24, 2021
    82f66c5
  20. patch: add dpdk 20.11.1 bonding mode 4 patch for mlx5

    Signed-off-by: ywc689 <ywc689@163.com>
    ywc689 committed Jun 24, 2021
    5e23cc9

Commits on Jun 29, 2021

  1. Bugfix patch for invalid rte_flow processing with a single worker

    huangyichen committed Jun 29, 2021
    291f4a4

Commits on Jul 8, 2021

  1. Comment the reason for the error return in cons_parse_ntuple_filter()

    Also safely free the rte_flow_list item in rte_flow_destroy()
    huangyichen authored and ywc689 committed Jul 8, 2021
    2ec96f0

Commits on Jul 9, 2021

  1. 2947963

Commits on Jul 22, 2021

  1. Merge pull request #737 from ywc689/dpdk2011-rebase

    Dpdk2011 rebase
    ywc689 authored Jul 22, 2021

    4eebe78

Commits on Jul 28, 2021

  1. fix dpvs_sockopts sockoptid_t register duplicated in sockopts_exist

    Signed-off-by: linjianying <linjianying.dev@gmail.com>
    林剑影 committed Jul 28, 2021
    4cdb9e5

Commits on Aug 2, 2021

  1. patch: allow bonding slaves from different numa nodes

    Note the patch may have a negative influence on performance.
    It is not good practice to bond slaves across NUMA nodes.
    
    Signed-off-by: ywc689 <ywc689@163.com>
    ywc689 committed Aug 2, 2021
    c68bd02

Commits on Aug 4, 2021

  1. netif: make bonding numa node configurable

    Signed-off-by: ywc689 <ywc689@163.com>
    ywc689 committed Aug 4, 2021
    19b3475
  2. netif: fix kni mac address update problem

    Signed-off-by: ywc689 <ywc689@163.com>
    ywc689 committed Aug 4, 2021
    531b178
  3. update bonding mode 4 patch

    huangyichen committed Aug 4, 2021
    1e9d036
  4. clean whitespace

    huangyichen committed Aug 4, 2021
    7dae118

Commits on Aug 5, 2021

  1. neigh: fix -Wpacked-not-aligned error

    Signed-off-by: ywc689 <ywc689@163.com>
    ywc689 committed Aug 5, 2021
    a1fa2a0
  2. Merge pull request #7 from you-looks-not-tasty/dpdk2011-rebase

    user space multicast packet fetch miss bug fix
    ywc689 authored Aug 5, 2021

    5457e8c

Commits on Aug 6, 2021

  1. Merge pull request #748 from githubljy/devel

    fix dpvs_sockopts sockoptid_t register duplicated in sockopts_exist
    ywc689 authored Aug 6, 2021

    c53553b
  2. Merge pull request #750 from ywc689/bonding-bugfix

    Bonding bugfix
    ywc689 authored Aug 6, 2021

    4163ee7

Commits on Aug 11, 2021

  1. patch: don't drop multicast/broadcast packets when all-multicast isn't enabled in rx_burst_8023ad
    
    Signed-off-by: ywc689 <ywc689@163.com>
    ywc689 committed Aug 11, 2021
    2352f54

Commits on Aug 17, 2021

  1. netif: fix several logging problems

    1. correct the log for enabling the bonding mode 4 dedicated queue
    2. polish the logs in netif_tx_burst and increase the error log level
    
    Signed-off-by: ywc689 <ywc689@163.com>
    ywc689 committed Aug 17, 2021
    625286a

Commits on Aug 26, 2021

  1. patch: don't drop lacp packets received from worker queues when dedicated queue is enabled
    
    Signed-off-by: ywc689 <ywc689@163.com>
    ywc689 committed Aug 26, 2021
    b5368f7

Commits on Aug 27, 2021

  1. netif: don't flush flow filters after port startup

    PMD drivers may preset some flow rules before the netif port starts.
    For example, an ethertype flow is set by the bond driver when the dedicated
    queue is enabled in 802.3ad mode. Flushing flow filters after the port starts
    up would invalidate those preset rules, so we do nothing here and expect the
    device drivers to reset all flow filters at the initial stage of bootup.
    
    Signed-off-by: ywc689 <ywc689@163.com>
    ywc689 committed Aug 27, 2021
    14a9fb7
  2. Merge pull request #752 from ywc689/bond4-bugfix

    patch: don't drop multicast/broadcast packets when all-multicast isn't enabled in rx_burst_8023ad
    ywc689 authored Aug 27, 2021

    be1d364
  3. bugfix: fix dpvs build problem

    Signed-off-by: ywc689 <ywc689@163.com>
    ywc689 committed Aug 27, 2021
    7ab7052
  4. version: release v1.9.0

    Signed-off-by: ywc689 <ywc689@163.com>
    ywc689 committed Aug 27, 2021
    06ea842
Showing 658 changed files with 105,024 additions and 12,555 deletions.
39 changes: 39 additions & 0 deletions .github/workflows/build-lts.yaml
@@ -0,0 +1,39 @@
name: BUILD-LTS

on:
push:
branches:
- 'DPVS-1.9-LTS'
release:
branches:
- 'DPVS-1.9-LTS'
types:
- published
pull_request:
branches:
- 'DPVS-1.9-LTS'
types:
- labeled

jobs:
build-basic:
runs-on: self-hosted
env:
PKG_CONFIG_PATH: /data/dpdk/20.11.10/dpdklib/lib64/pkgconfig
steps:
- name: Checkout Code
uses: actions/checkout@v4
- name: Build
run: make -j

build-all:
runs-on: self-hosted
env:
PKG_CONFIG_PATH: /data/dpdk/20.11.10/dpdklib/lib64/pkgconfig
steps:
- name: Checkout Code
uses: actions/checkout@v4
- name: Config
run: sed -i 's/=n$/=y/' config.mk
- name: Build
run: make -j
69 changes: 27 additions & 42 deletions .github/workflows/build.yaml
@@ -1,59 +1,44 @@
name: Build
name: BUILD

on:
push:
branches: [master, devel]
branches:
- 'master'
- 'devel'
release:
branches: [master]
types: [published]
branches:
- 'master'
- 'devel'
types:
- published
schedule:
- cron: '30 2 * * 1'
pull_request:
branches: [master, devel]
types: [labeled]
branches:
- 'master'
- 'devel'
types:
- labeled

jobs:
build-basic:
runs-on: self-hosted
env:
RTE_SDK: /data/dpdk/intel/dpdk-stable-18.11.2
RTE_TARGET: x86_64-native-linuxapp-gcc
PKG_CONFIG_PATH: /data/dpdk/24.11/dpdklib/lib64/pkgconfig
steps:
- uses: actions/checkout@v2
- name: make
run: make -j32

build-mlnx:
runs-on: self-hosted
env:
RTE_SDK: /data/dpdk/mlnx/dpdk-stable-18.11.2
RTE_TARGET: x86_64-native-linuxapp-gcc
steps:
- uses: actions/checkout@v2
- name: config
run: sed -i 's/^CONFIG_MLX5=./CONFIG_MLX5=y/' src/config.mk
- name: make
run: make -j32

build-debug:
runs-on: self-hosted
env:
RTE_SDK: /data/dpdk/intel/dpdk-stable-18.11.2
RTE_TARGET: x86_64-native-linuxapp-gcc
steps:
- uses: actions/checkout@v2
- name: config
run: sed -i 's/#CFLAGS +=/CFLAGS +=/' src/config.mk && sed -i 's/^#DEBUG := 1/DEBUG := 1/' src/Makefile
- name: make
run: make -j32
- name: Checkout Code
uses: actions/checkout@v4
- name: Build
run: make -j

build-olddpdk:
build-all:
runs-on: self-hosted
env:
RTE_SDK: /data/dpdk/intel/dpdk-stable-17.11.6
RTE_TARGET: x86_64-native-linuxapp-gcc
PKG_CONFIG_PATH: /data/dpdk/24.11/dpdklib/lib64/pkgconfig
steps:
- uses: actions/checkout@v2
- name: make
run: make -j32

- name: Checkout Code
uses: actions/checkout@v4
- name: Config
run: sed -i 's/=n$/=y/' config.mk
- name: Build
run: make -j
32 changes: 32 additions & 0 deletions .github/workflows/run-lts.yaml
@@ -0,0 +1,32 @@
name: RUN-LTS

on:
push:
branches:
- 'DPVS-1.9-LTS'
release:
branches:
- 'DPVS-1.9-LTS'
types:
- published
pull_request:
types:
- labeled
branches:
- 'DPVS-1.9-LTS'

jobs:
run-dpvs:
runs-on: self-hosted
env:
PKG_CONFIG_PATH: /data/dpdk/20.11.10/dpdklib/lib64/pkgconfig
#ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION: true
steps:
- name: Checkout Code
uses: actions/checkout@v4
- name: Build
run: make -j
- name: Install
run: make install
- name: Run DPVS
run: sudo dpvsci $(pwd)/bin/dpvs
36 changes: 22 additions & 14 deletions .github/workflows/run.yaml
@@ -1,28 +1,36 @@
name: Run
name: RUN

on:
push:
branches: [master, devel]
branches:
- 'master'
- 'devel'
release:
branches: [master]
types: [published]
branches:
- 'master'
- 'devel'
types:
- published
schedule:
- cron: '30 3 * * 1'
- cron: '30 3 * * 1'
pull_request:
branches: [master, devel]
types: [labeled]
types:
- labeled
branches:
- 'master'
- 'devel'

jobs:
run-dpvs:
runs-on: self-hosted
env:
RTE_SDK: /data/dpdk/intel/dpdk-stable-18.11.2
RTE_TARGET: x86_64-native-linuxapp-gcc
PKG_CONFIG_PATH: /data/dpdk/24.11/dpdklib/lib64/pkgconfig
steps:
- uses: actions/checkout@v2
- name: make
run: make -j32
- name: install
- name: Checkout Code
uses: actions/checkout@v4
- name: Build
run: make -j
- name: Install
run: make install
- name: run-dpvs
- name: Run DPVS
run: sudo dpvsci $(pwd)/bin/dpvs
167 changes: 167 additions & 0 deletions Dockerfile
@@ -0,0 +1,167 @@
# Important Notes:
#
# Two local dependencies should be ready before build container image with the Dockerfile.
# - MLNX_OFED: Please download it from the official website
# `https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/`
# to the local fileserver indicated by the ARG `MLNX_OFED`.
# We cannot download it in the Dockerfile automatically for the authentication
# restriction of the website.
# - RPM_PKGCONFIG: The `pkg-config` tool of v0.29.2 is required to build DPVS.
# However, the default installation version on centos7 is v0.27.1. You need to
# download it or build the v0.29.2 RPM from source and put it on the local
# fileserver indicated by the ARG `RPM_PKGCONFIG`. Alternatively, building a
# binary `pkg-config` and installing it in the local binary path is also ok.
#
# No kernel dependencies of dpdk/dpvs or network driver are built and installed.
# You should ensure the host has installed the drivers before running a dpvs
# container on it.
#

ARG BASE_IMAGE=centos:centos7.9.2009

###### `builder` stage builds the docker image for DPVS devel environments ######
FROM $BASE_IMAGE as builder

# replace it with the address of your own file server
ARG FILE_SERVER=127.0.0.1

LABEL maintainer="IQiYi/QLB team"
LABEL email="iig_cloud_qlb@qiyi.com"
LABEL project="https://github.com/iqiyi/dpvs"
LABEL image_maker="docker build --target builder -t github.com/iqiyi/dpvs-builder:{version} ."

# download the tarball from https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/
# FIXME: remove the file server dependency
ARG MLNX_OFED=http://$FILE_SERVER/deploy/MLNX_OFED/MLNX_OFED_LINUX-5.6-2.0.9.0-rhel7.9-x86_64.tgz

# the pkgconfig default installed version is 0.27.1 on centos7, update it to 0.29.2
# the 0.29.2 rpm is built from source based on the rpm spec file of 0.27.1.
# FIXME: remove the file server dependency
ARG RPM_PKGCONFIG=http://$FILE_SERVER/deploy/rpms/centos7/pkgconfig-0.29.2-1.el7.x86_64.rpm

# golang install files
ARG GO_PACKAGE=https://go.dev/dl/go1.20.4.linux-amd64.tar.gz

# go-swagger binary
ARG GO_SWAGGER_BIN=https://github.com/go-swagger/go-swagger/releases/download/v0.30.4/swagger_darwin_amd64

ENV PKG_CONFIG_PATH=/dpvs/dpdk/dpdklib/lib64/pkgconfig
ENV PATH=$PATH:/usr/local/go/bin

COPY . /dpvs/
WORKDIR /dpvs

RUN set -x \
&& yum install -y epel-release \
&& yum install -y tcl tk iproute wget vim patch meson python36 emacs-filesystem \
gcc make lsof libnl3 ethtool libpcap pciutils numactl-libs numactl-devel \
openssl-devel automake popt-devel ninja-build meson libnl3-devel cgdb git \
&& mkdir deps \
&& rpm -Uvh $RPM_PKGCONFIG \
&& wget $GO_PACKAGE -P deps \
&& tar -C /usr/local -xzf deps/go*.gz \
&& curl -L -o /usr/local/bin/swagger $GO_SWAGGER_BIN \
&& chmod 544 /usr/local/bin/swagger \
&& wget $MLNX_OFED -P deps \
&& tar xf deps/$(basename $MLNX_OFED) -C deps \
&& pushd deps/$(basename $MLNX_OFED | sed 's/.tgz//') \
&& ./mlnxofedinstall --user-space-only --upstream-libs \
--dpdk --without-fw-update --force \
&& popd \
&& sed -i 's/Denable_kmods=true/Denable_kmods=false/' scripts/dpdk-build.sh \
&& ./scripts/dpdk-build.sh \
&& sed -i 's/CONFIG_DPVS_AGENT=n/CONFIG_DPVS_AGENT=y/' config.mk \
&& make -j && make install \
&& rm -rf deps && yum clean all

RUN set -x \
&& mkdir libraries \
&& ldd bin/dpvs | grep "=> /" | awk '{print $3}' | xargs -I '{}' cp '{}' libraries \
&& ldd bin/ipvsadm | grep "=> /" | awk '{print $3}' | xargs -I '{}' cp '{}' libraries \
&& ldd bin/dpip | grep "=> /" | awk '{print $3}' | xargs -I '{}' cp '{}' libraries \
&& ldd bin/keepalived | grep "=> /" | awk '{print $3}' | xargs -I '{}' cp '{}' libraries

ENTRYPOINT ["/bin/bash"]


###### `runner` stage builds the docker image for DPVS product environments ######
#
# docker run --name dpvs \
# -d --privileged --network host \
# -v /dev:/dev \
# -v /sys:/sys \
# -v /lib/modules:/lib/modules \
# -v {dpvs-directory}:/dpvs \
# github.com/iqiyi/dpvs:{version} \
# -c /dpvs/dpvs.conf -p /dpvs/dpvs.pid -x /dpvs/dpvs.ipc \
# -- -a {nic-pci-bus-id}
#
# docker run --name ipvsadm \
# --rm --network none \
# -v {dpvs-directory}:/dpvs \
# -e DPVS_IPC_FILE=/dpvs/dpvs.ipc \
# --entrypoint=/usr/bin/ipvsadm \
# github.com/iqiyi/dpvs:{version} \
# ...
#
# docker run --name dpip \
# --rm --network none \
# -v {dpvs-directory}:/dpvs \
# -e DPVS_IPC_FILE=/dpvs/dpvs.ipc \
# --entrypoint=/usr/bin/dpip \
# github.com/iqiyi/dpvs:{version} \
# ...
#
# docker run --name keepalived \
# -d --privileged --network host \
# --cap-add=NET_ADMIN --cap-add=NET_BROADCAST --cap-add=NET_RAW \
# -v {dpvs-directory}:/dpvs \
# -e DPVS_IPC_FILE=/dpvs/dpvs.ipc \
# --entrypoint=/usr/bin/keepalived github.com/iqiyi/dpvs:{version} \
# -D -n -f /dpvs/keepalived.conf \
# --log-console --log-facility=6 \
# --pid=/dpvs/keepalived.pid \
# --vrrp_pid=/dpvs/vrrp.pid \
# --checkers_pid=/dpvs/checkers.pid
#
# docker run --name dpvs-agent \
# -d --network host \
# -v {dpvs-directory}:/dpvs \
# --entrypoint=/usr/bin/dpvs-agent \
# github.com/iqiyi/dpvs:{version} \
# --log-dir=/dpvs/logs/dpvs-agent \
# --ipc-sockopt-path=/dpvs/dpvs.ipc\
# --host=0.0.0.0 --port=6601
#
# docker run --name healthcheck \
# -d --network host \
# -v {dpvs-directory}:/dpvs \
# --entrypoint=/usr/bin/healthcheck \
# github.com/iqiyi/dpvs:{version} \
# -log_dir=/dpvs/logs/healthcheck \
# -lb_iface_addr=localhost:6601
#
FROM $BASE_IMAGE as runner

LABEL maintainer="IQiYi/QLB team"
LABEL email="iig_cloud_qlb@qiyi.com"
LABEL project="https://github.com/iqiyi/dpvs"
LABEL image_maker="docker build --target runner -t github.com/iqiyi/dpvs:{version} ."

RUN set -x \
&& yum install -y iproute wget ncat nmap tcpdump socat \
&& yum clean all

COPY --from=builder /dpvs/bin/ /usr/bin
COPY --from=builder /dpvs/libraries /usr/lib64

# Other available entrypoints are:
# * /usr/bin/keepalived
# * /usr/bin/dpvs-agent
# * /usr/bin/healthcheck
# * /usr/bin/ipvsadm
# * /usr/bin/dpip
# * /bin/bash
# use `docker run --entrypoint ...` to override the default entrypoint.

ENTRYPOINT ["/usr/bin/dpvs"]
6 changes: 6 additions & 0 deletions Makefile
@@ -21,6 +21,7 @@
MAKE = make
CC = gcc
LD = ld
RM = rm

SUBDIRS = src tools

@@ -29,6 +30,8 @@ export INSDIR

export KERNEL = $(shell /bin/uname -r)

include $(CURDIR)/config.mk

all:
for i in $(SUBDIRS); do $(MAKE) -C $$i || exit 1; done

@@ -43,3 +46,6 @@ distclean:
install:all
-mkdir -p $(INSDIR)
for i in $(SUBDIRS); do $(MAKE) -C $$i install || exit 1; done

uninstall:
-$(RM) -f $(TARGET) $(INSDIR)/*
205 changes: 131 additions & 74 deletions README.md


64 changes: 31 additions & 33 deletions conf/dpvs.bond.conf.sample
@@ -14,14 +14,17 @@
global_defs {
log_level WARNING
! log_file /var/log/dpvs.log
! log_async_mode off
! pdump off
! <init> log_async_mode off
! <init> kni on
! <init> pdump off
lldp on
}

! netif config
netif_defs {
<init> pktpool_size 1048575
<init> pktpool_cache 256
<init> fdir_mode perfect

<init> device dpdk0 {
rx {
@@ -32,14 +35,11 @@ netif_defs {
tx {
queue_number 8
descriptor_number 1024
}
fdir {
mode perfect
pballoc 64k
status matched
mbuf_fast_free on
}
! mtu 1500
! promisc_mode
! allmulticast
! kni_name dpdk0.kni
}

@@ -52,14 +52,11 @@ netif_defs {
tx {
queue_number 8
descriptor_number 1024
}
fdir {
mode perfect
pballoc 64k
status matched
mbuf_fast_free on
}
! mtu 1500
! promisc_mode
! allmulticast
! kni_name dpdk1.kni
}

@@ -73,14 +70,11 @@ netif_defs {
tx {
queue_number 8
descriptor_number 1024
}
fdir {
mode perfect
pballoc 64k
status matched
mbuf_fast_free on
}
! mtu 1500
! promisc_mode
! allmulticast
! kni_name dpdk2.kni
}

@@ -93,14 +87,11 @@ netif_defs {
tx {
queue_number 8
descriptor_number 1024
}
fdir {
mode perfect
pballoc 64k
status matched
mbuf_fast_free on
}
! mtu 1500
! promisc_mode
! allmulticast
! kni_name dpdk3.kni
}

@@ -109,6 +100,7 @@ netif_defs {
slave dpdk0
slave dpdk1
primary dpdk0
! numa_node 1 ! /sys/bus/pci/devices/[slaves' pci]/numa_node
kni_name bond0.kni
}

@@ -117,6 +109,7 @@ netif_defs {
slave dpdk2
slave dpdk3
primary dpdk2
! numa_node 1 ! /sys/bus/pci/devices/[slaves' pci]/numa_node
kni_name bond1.kni
}
}
@@ -250,7 +243,7 @@ worker_defs {
<init> worker cpu8 {
type slave
cpu_id 8
icmp_redirect_core
! icmp_redirect_core
port bond0 {
rx_queue_ids 7
tx_queue_ids 7
@@ -265,17 +258,18 @@ worker_defs {
}
}

!<init> worker cpu9 {
!<init> worker cpu17 {
! type kni
! cpu_id 9
! cpu_id 17
! port bond0 {
! rx_queue_ids 8
! tx_queue_ids 8
! }
! port bond1 {
! rx_queue_ids 8
! tx_queue_ids 8
! }
!}

}

! timer config
@@ -287,7 +281,12 @@ timer_defs {
! dpvs neighbor config
neigh_defs {
<init> unres_queue_length 128
<init> timeout 60
timeout 60
}

! dpvs ipset config
ipset_defs {
<init> ipset_hash_pool_size 131072
}

! dpvs ipv4 config
@@ -319,9 +318,6 @@ ctrl_defs {
sync_msg_timeout_us 20000
priority_level low
}
ipc_msg {
<init> unix_domain /var/run/dpvs_ctrl
}
}

! ipvs config
@@ -331,7 +327,7 @@ ipvs_defs {
<init> conn_pool_cache 256
conn_init_timeout 3
! expire_quiescent_template
! fast_xmit_close
! <init> fast_xmit_close
! <init> redirect off
}

@@ -340,6 +336,7 @@ ipvs_defs {
uoa_mode opp
uoa_max_trail 3
timeout {
oneway 60
normal 300
last 3
}
@@ -366,7 +363,7 @@ ipvs_defs {
mss 1452
ttl 63
sack
! wscale
! wscale 0
! timestamp
}
! defer_rs_syn
@@ -386,5 +383,6 @@ ipvs_defs {

! sa_pool config
sa_pool {
pool_hash_size 16
<init> pool_hash_size 16
<init> flow_enable on
}
64 changes: 43 additions & 21 deletions conf/dpvs.conf.items
@@ -11,17 +11,22 @@

! global config
global_defs {
#daemon <disable>
log_level INFO <none>
log_file /var/log/dpvs.log <none>
<init> log_async_mode off <off, on|off>
<init> pdump off <off, on|off>
#daemon <disable>
log_level INFO <none>
log_file /var/log/dpvs.log <none>
log_with_timestamp off <off, on|off> # note: only effective for async log now
<init> log_async_mode off <off, on|off>
<init> log_async_pool_size 16383 <16383, 1023-unlimited>
<init> pdump off <off, on|off>
<init> kni on <on, on|off>
lldp on <off, on|off>
}

! netif config
netif_defs {
<init> pktpool_size 2097151 <65535, 1023-134217728>
<init> pktpool_cache 256 <256, 32-8192>
<init> fdir_mode perfect <perfect, perfect|signature> # only for ixgbe

<init> device dpdk0 {
rx {
@@ -33,15 +38,12 @@ netif_defs {
tx {
queue_number 6 <16, 0-16>
descriptor_number 512 <512, 16-8192>
}
fdir {
<init> filter on <on, on/off>
mode perfect <perfect, none|signature|perfect|perfect_mac_vlan|perfect_tunnel>
pballoc 64k <64k, 64k|128k|256k>
status matched <matched, close|matched|always>
mbuf_fast_free on <on, on|off> ## Disable it when the ports used for two-arm forwarding
## are located at different NUMA nodes.
}
! mtu 1500 <1500,0-9000>
! promisc_mode <disable>
! allmulticast <disable>
! kni_name dpdk0.kni <char[32]>
}

@@ -58,19 +60,31 @@ netif_defs {
}
! mtu 1500
! promisc_mode
! allmulticast <disable>
! kni_name dpdk1.kni
}

<init> device bond0 {
<init> bonding bond0 {
mode 4 <0-6>
slave dpdk0 <device name>
slave dpdk1 <device name>
primary dpdk0 <device name, use primary slave queue conf for bond>
numa_node 0 <0, int value from /sys/bus/pci/devices/[pci_bus]/numa_node>
kni_name bond0.kni <char[32]>

! supported options:
! dedicated_queues=on|enable|off|disable, default on
options OPT1=VAL1;OPT2=VAL2;...
}
}

! worker config (lcores)
! notes:
! 1. rx(tx) queue ids MUST start from 0 and be continuous
! 2. cpu ids and rx(tx) queue ids MUST be unique; repeated ids are forbidden
! 3. cpu ids identify dpvs workers only, and do not correspond to physical cpu cores.
! If you are to specify cpu cores on which to run dpvs, please use dpdk eal options,
! such as "-c", "-l", "--lcores". Use "dpvs -- --help" for supported eal options.
worker_defs {
<init> worker cpu0 {
cpu_id 0
@@ -139,9 +153,11 @@ worker_defs {
cpu_id 5
icmp_redirect_core
port dpdk0 {
rx_queue_ids 4
tx_queue_ids 6
}
port dpdk1 {
rx_queue_ids 4
tx_queue_ids 4
}
}
@@ -159,6 +175,11 @@ neigh_defs {
timeout 60 <60, 1-3600>
}

! dpvs ipset config
ipset_defs {
<init> ipset_hash_pool_size 131072 <131072, 65536-524288>
}

! dpvs ipv4 config
ipv4_defs {
forwarding off <off, on/off>
@@ -175,29 +196,27 @@ ipv4_defs {
ipv6_defs {
disable off <off, on/off>
forwarding off <off, on/off>
addr_gen_mode eui64 <eui64,none,stable-privacy,random>
stable_secret "" <128-bit hexadecimal string, used in stable-privacy mode >
<stable_secret can be produced by `uuidgen | sed 's/-//g'>
route6 {
<init> method "hlist" <"hlist"/"lpm">
recycle_time 10 <10, 1-36000>
lpm {
<init> lpm6_max_rules 1024 <1024, 16-2147483647>
<init> lpm6_num_tbl8s 65536 <65536, 16-2147483647>
<init> rt6_array_size 65536 <65536, 16-2147483647>
<init> rt6_hash_bucket 256 <256, 2-2147483647>
<init> lpm6_max_rules 1024 <1024, 16-65536>
<init> lpm6_num_tbl8s 16384 <16384, 256-1048576>
<init> rt6_hash_bucket 256 <256, 16-65536>
}
}
}

! control plane config
ctrl_defs {
lcore_msg {
#bucket_number 256
<init> ring_size 4096 <4096, 256-524288>
sync_msg_timeout_us 2000 <2000, 1-∞>
priority_level low <low, low|norm|high|ign>
}
ipc_msg {
<init> unix_domain /var/run/dpvs_ctrl </var/run/dpvs_ctrl, max chars: 256>
}
}

! ipvs config
@@ -216,6 +235,7 @@ ipvs_defs {
uoa_mode opp <opp for private protocol by default, or ipo for IP-option mode>
uoa_max_trail 3 <max trails for send UOA for a connection>
timeout { <1-31535999>
oneway 300 <300>
normal 300 <300>
last 3 <3>
}
@@ -242,9 +262,10 @@ ipvs_defs {
mss 1452 <1452, 1-65535>
ttl 63 <63, 1-255>
sack <enable>
! wscale <disable>
! wscale <0, 0-14>
! timestamp <disable>
}
!close_client_window <disable>
!defer_rs_syn <disable>
rs_syn_max_retry 3 <3, 1-99>
ack_storm_thresh 10 <10, 1-999>
@@ -262,4 +283,5 @@ ipvs_defs {

sa_pool {
<init> pool_hash_size 16 <16, 1-128>
<init> flow_enable on <on, on|off>
}
54 changes: 32 additions & 22 deletions conf/dpvs.conf.sample
@@ -14,14 +14,17 @@
global_defs {
log_level WARNING
! log_file /var/log/dpvs.log
! log_async_mode on
! pdump off
! <init> log_async_mode on
! <init> kni on
! <init> pdump off
lldp on
}

! netif config
netif_defs {
<init> pktpool_size 1048575
<init> pktpool_cache 256
<init> fdir_mode perfect

<init> device dpdk0 {
rx {
@@ -32,14 +35,11 @@ netif_defs {
tx {
queue_number 8
descriptor_number 1024
}
fdir {
mode perfect
pballoc 64k
status matched
mbuf_fast_free on
}
! mtu 1500
! promisc_mode
! allmulticast
kni_name dpdk0.kni
}

@@ -52,14 +52,11 @@ netif_defs {
tx {
queue_number 8
descriptor_number 1024
}
fdir {
mode perfect
pballoc 64k
status matched
mbuf_fast_free on
}
! mtu 1500
! promisc_mode
! allmulticast
kni_name dpdk1.kni
}

@@ -69,10 +66,17 @@ netif_defs {
! slave dpdk1
! primary dpdk0
! kni_name bond0.kni
! options dedicated_queues=off # for mode 4 only
!}
}

! worker config (lcores)
! notes:
! 1. rx(tx) queue ids MUST start from 0 and be continuous
! 2. cpu ids and rx(tx) queue ids MUST be unique; repeated ids are forbidden
! 3. cpu ids identify dpvs workers only, and do not correspond to physical cpu cores.
! If you are to specify cpu cores on which to run dpvs, please use dpdk eal options,
! such as "-c", "-l", "--lcores". Use "dpvs -- --help" for supported eal options.
worker_defs {
<init> worker cpu0 {
type master
@@ -201,7 +205,7 @@ worker_defs {
<init> worker cpu8 {
type slave
cpu_id 8
icmp_redirect_core
! icmp_redirect_core
port dpdk0 {
rx_queue_ids 7
tx_queue_ids 7
@@ -216,17 +220,18 @@ worker_defs {
}
}

!<init> worker cpu9 {
!<init> worker cpu17 {
! type kni
! cpu_id 9
! cpu_id 17
! port dpdk0 {
! rx_queue_ids 8
! tx_queue_ids 8
! }
! port dpdk1 {
! rx_queue_ids 8
! tx_queue_ids 8
! }
!}

}

! timer config
@@ -241,6 +246,11 @@ neigh_defs {
timeout 60
}

! dpvs ipset config
ipset_defs {
<init> ipset_hash_pool_size 131072
}

! dpvs ipv4 config
ipv4_defs {
forwarding off
@@ -270,9 +280,6 @@ ctrl_defs {
sync_msg_timeout_us 20000
priority_level low
}
ipc_msg {
<init> unix_domain /var/run/dpvs_ctrl
}
}

! ipvs config
@@ -282,7 +289,7 @@ ipvs_defs {
<init> conn_pool_cache 256
conn_init_timeout 3
! expire_quiescent_template
! fast_xmit_close
! <init> fast_xmit_close
! <init> redirect off
}

@@ -291,6 +298,7 @@ ipvs_defs {
uoa_mode opp
uoa_max_trail 3
timeout {
oneway 60
normal 300
last 3
}
@@ -317,9 +325,10 @@ ipvs_defs {
mss 1452
ttl 63
sack
! wscale
! wscale 0
! timestamp
}
close_client_window
! defer_rs_syn
rs_syn_max_retry 3
ack_storm_thresh 10
@@ -337,5 +346,6 @@ ipvs_defs {

! sa_pool config
sa_pool {
pool_hash_size 16
<init> pool_hash_size 16
<init> flow_enable on
}
45 changes: 23 additions & 22 deletions conf/dpvs.conf.single-bond.sample
@@ -14,7 +14,9 @@
global_defs {
log_level WARNING
! log_file /var/log/dpvs.log
! log_async_mode on
! <init> log_async_mode on
! <init> kni on
lldp on
}

! netif config
@@ -32,13 +34,9 @@ netif_defs {
queue_number 8
descriptor_number 1024
}
fdir {
mode perfect
pballoc 64k
status matched
}
! mtu 1500
! promisc_mode
! allmulticast
! kni_name dpdk0.kni
}

@@ -52,22 +50,20 @@ netif_defs {
queue_number 8
descriptor_number 1024
}
fdir {
mode perfect
pballoc 64k
status matched
}
! mtu 1500
! promisc_mode
! allmulticast
! kni_name dpdk2.kni
}

<init> bonding bond0 {
mode 0
mode 4
slave dpdk0
slave dpdk2
primary dpdk0
! numa_node 1 ! /sys/bus/pci/devices/[slaves' pci]/numa_node
kni_name bond0.kni
options dedicated_queues=off
}
}

@@ -158,7 +154,7 @@ worker_defs {
<init> worker cpu8 {
type slave
cpu_id 8
icmp_redirect_core
! icmp_redirect_core
port bond0 {
rx_queue_ids 7
tx_queue_ids 7
@@ -167,14 +163,14 @@ worker_defs {
}
}

!<init> worker cpu9 {
!<init> worker cpu17 {
! type kni
! cpu_id 9
! cpu_id 17
! port bond0 {
! rx_queue_ids 8
! tx_queue_ids 8
! }
!}

}

! timer config
@@ -189,6 +185,11 @@ neigh_defs {
timeout 60
}

! dpvs ipset config
ipset_defs {
<init> ipset_hash_pool_size 131072
}

! dpvs ipv4 config
ipv4_defs {
forwarding off
@@ -218,9 +219,6 @@ ctrl_defs {
sync_msg_timeout_us 20000
priority_level low
}
ipc_msg {
<init> unix_domain /var/run/dpvs_ctrl
}
}

! ipvs config
@@ -230,7 +228,7 @@ ipvs_defs {
<init> conn_pool_cache 256
conn_init_timeout 3
! expire_quiescent_template
! fast_xmit_close
! <init> fast_xmit_close
! <init> redirect off
}

@@ -239,6 +237,7 @@ ipvs_defs {
uoa_mode opp
uoa_max_trail 3
timeout {
oneway 60
normal 300
last 3
}
@@ -265,9 +264,10 @@ ipvs_defs {
mss 1452
ttl 63
sack
! wscale
! wscale 0
! timestamp
}
close_client_window
! defer_rs_syn
rs_syn_max_retry 3
ack_storm_thresh 10
@@ -285,5 +285,6 @@ ipvs_defs {

! sa_pool config
sa_pool {
pool_hash_size 16
<init> pool_hash_size 16
<init> flow_enable on
}
35 changes: 19 additions & 16 deletions conf/dpvs.conf.single-nic.sample
@@ -14,7 +14,9 @@
global_defs {
log_level WARNING
! log_file /var/log/dpvs.log
! log_async_mode on
! <init> log_async_mode on
! <init> kni on
lldp on
}

! netif config
@@ -32,13 +34,9 @@ netif_defs {
queue_number 8
descriptor_number 1024
}
fdir {
mode perfect
pballoc 64k
status matched
}
! mtu 1500
! promisc_mode
! allmulticast
kni_name dpdk0.kni
}
}
@@ -130,7 +128,7 @@ worker_defs {
<init> worker cpu8 {
type slave
cpu_id 8
icmp_redirect_core
! icmp_redirect_core
port dpdk0 {
rx_queue_ids 7
tx_queue_ids 7
@@ -139,14 +137,14 @@ worker_defs {
}
}

!<init> worker cpu9 {
!<init> worker cpu17 {
! type kni
! cpu_id 9
! cpu_id 17
! port dpdk0 {
! rx_queue_ids 8
! tx_queue_ids 8
! }
!}

}

! timer config
@@ -161,6 +159,11 @@ neigh_defs {
timeout 60
}

! dpvs ipset config
ipset_defs {
<init> ipset_hash_pool_size 131072
}

! dpvs ipv4 config
ipv4_defs {
forwarding off
@@ -190,9 +193,6 @@ ctrl_defs {
sync_msg_timeout_us 20000
priority_level low
}
ipc_msg {
<init> unix_domain /var/run/dpvs_ctrl
}
}

! ipvs config
@@ -202,7 +202,7 @@ ipvs_defs {
<init> conn_pool_cache 256
conn_init_timeout 3
! expire_quiescent_template
! fast_xmit_close
! <init> fast_xmit_close
! <init> redirect off
}

@@ -211,6 +211,7 @@ ipvs_defs {
uoa_mode opp
uoa_max_trail 3
timeout {
oneway 60
normal 300
last 3
}
@@ -237,9 +238,10 @@ ipvs_defs {
mss 1452
ttl 63
sack
! wscale
! wscale 0
! timestamp
}
close_client_window
! defer_rs_syn
rs_syn_max_retry 3
ack_storm_thresh 10
@@ -257,5 +259,6 @@ ipvs_defs {

! sa_pool config
sa_pool {
pool_hash_size 16
<init> pool_hash_size 16
<init> flow_enable on
}
33 changes: 33 additions & 0 deletions config.mk
@@ -0,0 +1,33 @@
# configs
export CONFIG_DPVS_MAX_SOCKET=2
export CONFIG_DPVS_MAX_LCORE=64

## modules
export CONFIG_DPVS_AGENT=n
export CONFIG_DPVS_LOG=y
export CONFIG_PDUMP=y
export CONFIG_ICMP_REDIRECT_CORE=n

# debugging and logging
export CONFIG_DEBUG=n
export CONFIG_DPVS_NEIGH_DEBUG=n
export CONFIG_RECORD_BIG_LOOP=n
export CONFIG_DPVS_SAPOOL_DEBUG=n
export CONFIG_DPVS_IPVS_DEBUG=n
export CONFIG_DPVS_SERVICE_DEBUG=n
export CONFIG_SYNPROXY_DEBUG=n
export CONFIG_TIMER_MEASURE=n
export CONFIG_TIMER_DEBUG=n
export CONFIG_DPVS_CFG_PARSER_DEBUG=n
export CONFIG_NETIF_BONDING_DEBUG=n
export CONFIG_TC_DEBUG=n
export CONFIG_DPVS_IPVS_STATS_DEBUG=n
export CONFIG_DPVS_IP_HEADER_DEBUG=n
export CONFIG_DPVS_MBUF_DEBUG=n
export CONFIG_DPVS_IPSET_DEBUG=n
export CONFIG_NDISC_DEBUG=n
export CONFIG_MSG_DEBUG=n
export CONFIG_DPVS_MP_DEBUG=n
export CONFIG_DPVS_NETIF_DEBUG=n
export CONFIG_DPVS_ICMP_DEBUG=n
export CONFIG_DPVS_ROUTE_DEBUG=n
743 changes: 743 additions & 0 deletions doc/IPset.md


15 changes: 8 additions & 7 deletions doc/TODO.md
@@ -3,16 +3,17 @@ DPVS TODO list

* [x] IPv6 Support
* [x] Documents update
* [ ] NIC without Flow-Director (FDIR)
* [x] NIC without Flow-Director (FDIR)
- [x] Packet redirect to workers
- [ ] RSS pre-calculating
- [ ] Replace fdir with Generic Flow(rte_flow)
- [x] Replace fdir with Generic Flow(rte_flow)
* [x] Merge DPDK stable 18.11
* [ ] Merge DPDK stable 20.11
* [x] Merge DPDK stable 20.11
* [x] Merge DPDK stable 24.11
* [x] Service whitelist ACL
* [ ] IPset Support
* [x] IPset Support
- [ ] SNAT ACL with IPset
- [ ] TC policing with IPset
- [x] TC policing with IPset
* [x] Refactor Keepalived (porting latest stable keepalived)
* [ ] Keepalived stability test and optimization.
* [x] Packet Capture and Tcpdump Support
@@ -21,13 +22,13 @@ DPVS TODO list
- [ ] Session based logging (creation, expire, statistics)
* [x] CI, Test Automation Setup
* [ ] Performance Optimization
- [ ] Performance test tools and docs
- [x] Performance test tools and docs
- [x] CPU Performance Tuning
- [x] Memory Performance Tuning
- [ ] Numa-aware NIC
- [ ] Minimal Running Resource
- [x] KNI performance Tuning
- [ ] Multi-core Performance Tuning
- [x] Multi-core Performance Tuning
- [x] TC performance Tuning
* [x] 25G/40G NIC Supports
* [ ] VxLAN Support
73 changes: 70 additions & 3 deletions doc/Worker-Performance-Tuning.md
@@ -11,7 +11,7 @@ DPVS is a multi-thread DPDK application program. It is based on the "polling" fr
* **Isolate Receiving Worker**: optional workers that take over packet reception from the *Forwarding Worker* to reduce NIC packet drops (imissed).
* **KNI Worker**: an optional worker that handles kni-related jobs to avoid performance disturbance caused by the work loads of the *Master/Forwarding Worker*.

As all other DPDK applications, each DPVS Worker is bound to a distinct CPU core to avoid they interfere with each other. By default, the first N CPUs of the system are bound with DPVS Workers. The performance may not good enough when many other work loads are scheduled into these CPUs by the system. For example, CPU0, the first CPU core in the system, is generally a lot busier than other CPU cores, because many processes, interrupts, and kernel threads run on it by default. The following of this doc would tell you how to alleviate/offload irrelative work load on DPVS Workers.
Like other DPDK applications, each DPVS Worker is bound to a distinct CPU core so that workers do not interfere with each other. By default, the first N CPUs of the system are bound to DPVS Workers. Performance may not be good enough when many other work loads are scheduled onto these CPUs by the system. For example, CPU0, the first CPU core in the system, is generally a lot busier than other CPU cores, because many processes, interrupts, and kernel threads run on it by default. The rest of this doc tells you how to alleviate/offload irrelevant work load from DPVS Workers.

### When do you need to consider this performance tuning?

@@ -30,7 +30,7 @@ In case of the following situations, you should consider this performance tuning
* There exists big worker loops.

> To observe worker loop time, you should uncomment the macro "CONFIG_RECORD_BIG_LOOP" in src/config.mk,recompile DPVS program and run it.
> To observe worker loop time, you should set "CONFIG_RECORD_BIG_LOOP=y" in `config.mk`, recompile the DPVS program and run it.
>
> Besides, macros "BIG_LOOP_THRESH_SLAVE" and "BIG_LOOP_THRESH_MASTER" define the threshold time of worker loop. Modify them if needed.
@@ -61,7 +61,7 @@ Generally speaking, we may follow some practical rules below to choose the CPU c
You can get the CPU layout of your system with the `cpu_layout.py` script provided by DPDK, as in the example shown below.

```
[root@~ dpdk]# python dpdk-stable-18.11.2/usertools/cpu_layout.py
[root@~ dpdk]# python [DPDK-SOURCE]/usertools/cpu_layout.py
======================================================================
Core and Socket Information (as reported by '/sys/devices/system/cpu')
======================================================================
@@ -282,3 +282,70 @@ KiB Swap: 4194300 total, 4194300 free, 0 used. 16171432 avail Mem
```

### Assign a dedicated worker for KNI

As the diagram below shows, KNI traffic is processed by default on the Master and Forwarding Workers. But we can configure a dedicated worker for KNI traffic to avoid possible disturbances caused by an overloaded dataplane.

![kni-flow](pics/kni-flow-2.png)

The configuration for the KNI Worker is almost the same as for the Forwarding Workers, except that the `type` field should be set to `kni`. Rx/Tx queues should be configured for the target NICs, receiving packets from network devices and transmitting to the corresponding KNI devices, or vice versa. Note that we can configure either Rx or Tx queues only, which isolates processing of inbound or outbound KNI traffic onto the KNI Worker, respectively.
Rx queues are required by DPVS's KNI address flow, which directs KNI inbound traffic to the dedicated Rx queue using DPDK rte_flow. If no Rx queue is configured, creating a KNI address flow would fail; in that case the Forwarding Workers are responsible for packet reception and hand the received packets over to the KNI Worker, which then forwards them to the KNI interfaces. On the other hand, Tx queues must be configured if the KNI Worker is enabled, or the outbound traffic from KNI interfaces is dropped due to the lack of a Tx queue, as shown in the diagram below.

![kni-flow](pics/kni-flow-1.png)

**The steps to use dedicated worker for KNI**

* S1. Add KNI worker configurations to `dpvs.conf`. For example:

```
<init> worker cpu9 {
type kni
cpu_id 9
port bond0 {
rx_queue_ids 8
tx_queue_ids 8
}
}
```
* S2. Boot up DPVS and bring the KNI interface up. For example, we configured a KNI interface on bond0.101.

```
55: bond0.101.kni: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 98:03:9b:1b:40:a4 brd ff:ff:ff:ff:ff:ff
inet 192.168.88.88/24 scope global bond0.101.kni
valid_lft forever preferred_lft forever
inet6 2001::88/64 scope global
valid_lft forever preferred_lft forever
```
Now you can ping 192.168.88.88 and 2001::88, and both should work.

> Notes: If the KNI IPs are matched by DPVS routes, you should add `kni_host` routes for them.
* S3. (If supported) Configure KNI address flow.

```
dpip flow add type kni 192.168.88.88 dev bond0.101
dpip flow add type kni 2001::88 dev bond0.101
```
Now, all packets destined to 192.168.88.88 or 2001::88 are sent to Rxq8 on bond0.


**Performance tests**

We designed 5 cases to examine the performance of KNI worker, and listed the test results below.

| Test Cases | ping (min/avg/max/mdev) | bandwidth (iperf tcp) | forwarding rate |
| ---------------------------------------------------- | ----------------------------- | --------------------- | --------------- |
| no kni worker (idle dataplane) | 0.468/3.196/6.893/1.547 ms | 2.64 Gbits/sec | 243K packets/s |
| with kni worker, no addr flow (idle dataplane) | 0.050/2.102/5.288/0.565 ms | 4.54 Gbits/sec | 413K packets/s |
| with kni worker, with addr flow (idle dataplane) | 0.409/2.346/11.650/1.179 ms | 4.57 Gbits/sec | 416K packets/s |
| with kni worker, no addr flow (overload dataplane) | 0.628/29.880/42.010/12.026 ms | 341 Mbits/sec | 29K packets/s |
| with kni worker, with addr flow (overload dataplane) | 0.544/2.139/3.554/0.406 ms | 4.53 Gbits/sec | 410K packets/s |

*Notes: Overload dataplane is simulated by adding 1ms delay to each loop of forwarding workers.*

We got the following conclusions from the test results.

1. Dedicated KNI worker increases bandwidth of KNI interfaces.
2. KNI address flow protects KNI traffic from load disturbances of dataplane.

47 changes: 47 additions & 0 deletions doc/client-address-conservation-in-fullnat.md
@@ -0,0 +1,47 @@
Client Address Conservation in Fullnat
---

The original client addresses are substituted with DPVS's local addresses in Fullnat forwarding mode, so an auxiliary mechanism is required to pass them to realservers. Three solutions have been developed in DPVS for this problem -- *TOA*, *UOA*, and *Proxy Protocol* -- each with its own pros and cons. This document elaborates on them.

* **TOA**

Client address is encapsulated in a private TCP option (opcode 254) by DPVS, and parsed into the connected TCP socket on the realserver by a kernel module named [toa.ko](../kmod/toa/). By default, it requires no changes in realserver application programs for fnat44 and fnat66. But an extra syscall to `getsockopt` with parameters `IPPROTO_IP` and `TOA_SO_GET_LOOKUP` is required for fnat64 to retrieve the original IPv6 client address from toa (a minimal sketch of this call is given after this list).

* **UOA**

UOA is the counterpart for the UDP protocol. It supports two modes: *IP Option Mode* (ipo) and *Private Protocol Mode* (opp). The client address is encapsulated into a private IPv4 option (opcode 31) in ipo mode, and into a private layer-4 protocol named "option protocol" (protocol number 248) in opp mode, respectively. Similarly, a kernel module named [uoa.ko](../kmod/uoa/) is required to parse the original client address from raw packets on the realserver. Realserver application programs should use `getsockopt` with parameters `IPPROTO_IP` and `UOA_SO_GET_LOOKUP` immediately after user data reception to retrieve the original address from uoa. Note that not all network switches or routers support private IPv4 options or private layer-4 protocols. Be aware of your network restrictions before using UOA.

* **Proxy Protocol**:

[Proxy Protocol](https://www.haproxy.org/download/2.9/doc/proxy-protocol.txt) is a widely-used protocol for client address conservation on reverse proxies. It was drafted by haproxy.org and has two versions so far. Version v1 is a human-readable format which supports TCP only, while version v2 is a binary format supporting both TCP and UDP. DPVS implements both versions, and users can choose which one to use on a per-service basis. Moreover, if configured in insecure mode, DPVS allows for clients that already carry proxy protocol data, which is often the case when DPVS's virtual IP is behind other reverse proxies such as nginx, envoy, or another DPVS; in that case DPVS doesn't insert the client address by itself, but retains the client address encapsulated in the packet and performs protocol version translation if necessary. Proxy protocol has the advantages of broad and uniform support for layer-3 and layer-4 protocols (including IP, IPv6, TCP, UDP), conveying both source and destination addresses, no dependency on kernel modules, and tolerance of network infrastructure differences. The client addresses are encapsulated at the very beginning of the layer-4 payload. Application programs on realservers must receive the data and parse it to obtain the original client addresses immediately on establishment of the TCP/UDP connection. Otherwise, the client address data may be taken as application data by mistake, resulting in unexpected behavior in the application program. Fortunately, parsing the client address from proxy protocol is quite straightforward, and a variety of well-known proxy servers already support it. Actually, proxy protocol is becoming a de facto standard in this area.
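
As a hedged illustration of the `getsockopt` calls mentioned above, the sketch below shows the fnat64 TOA case. The option name comes from the text above, but the numeric value and the result struct layout here are placeholders; take the real definitions from [toa.ko](../kmod/toa/) and [uoa.ko](../kmod/uoa/) (see the example servers referenced in the second table).

```c
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

/* Placeholders -- use the definitions shipped with kmod/toa in real code. */
#ifndef TOA_SO_GET_LOOKUP
#define TOA_SO_GET_LOOKUP 4096           /* hypothetical value */
#endif
struct toa_nat64_peer {                  /* hypothetical layout */
    struct in6_addr saddr;               /* original IPv6 client address */
    uint16_t        sport;               /* original client port */
};

/* Call right after accept() on the realserver for an fnat64 connection.
 * For fnat44/fnat66 no extra call is needed: toa.ko already makes
 * accept()/getpeername() return the original client address. */
int get_orig_client6(int connfd, struct toa_nat64_peer *peer)
{
    socklen_t len = sizeof(*peer);
    return getsockopt(connfd, IPPROTO_IP, TOA_SO_GET_LOOKUP, peer, &len);
}
```

The UOA case is analogous: a UDP server calls `getsockopt(fd, IPPROTO_IP, UOA_SO_GET_LOOKUP, ...)` right after receiving a datagram, with the lookup struct defined by uoa.ko.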

Next, let's compare the three client address conservation solutions in detail in the following two tables.

The first table below lists the forwarding modes (FNAT44/FNAT66/FNAT64) and L4 protocols (TCP/UDP) supported by different solutions.

| | toa | uoa (ipo) | uoa (opp) | proxy protocol (v1) | proxy protocol (v2) |
| ------ | ---- | --------- | --------- | --------------------- | --------------------- |
| FNAT44 | ✓ | ✓ | ✓ | ✓ | ✓ |
| FNAT66 | ✓ | × | ✓ | ✓ | ✓ |
| FNAT64 | ✓ | × | ✓ | ✓ | ✓ |
| TCP | ✓ | × | × | ✓ | ✓ |
| UDP | × | ✓ | ✓ | × | ✓ |

The second table details the differences among toa, uoa and proxy protocol from the aspects of functional features, configuration, application adaptation, and examples.

| | toa | uoa (ipo mode) | uoa (opp mode) | proxy protocol (v1 & v2) |
| --------------------------------------- | ------------------------------------------------------------ | --------------------------------------- | --------------------------------------- | ---------------------------------------------------------- |
| configuration switch | always on | global, default off | global, default on | per-service, mutually exclusive with toa/uoa |
| where client address resides | tcp option | ipv4 option | private ip protocol | beginning of tcp/udp payload |
| standardization | private standard | private implementation | private implementation | de facto standard |
| application intrusiveness | transparent | transparent (only fnat44 supported) | transparent when uoa.ko installed | intrusive |
| client address resolution intrusiveness | transparent for fnat44/fnat66; intrusive for fnat64 | intrusive | intrusive | intrusive |
| client source address resolution | support | support | support | support |
| client destination address resolution | not support | not support | not support | support |
| kernel module requirement on realserver | toa.ko, not compulsory when client addresses aren't concerned | uoa.ko | uoa.ko | no kernel module required |
| load balancer cascading | not support | not support | not support | support |
| retransmission | support | fixed times, default 3 | fixed times, default 3 | support for tcp, not support for udp |
| underlay network supports | good | bad | medium | good |
| client address loss cases | when there is not enough tcp option room in the first ack seg | general udp packet loss | general udp packet loss | no loss for tcp, general udp packet loss for udp |
| well-known application supports | - | - | - | haproxy, nginx, envoy, ... |
| intrusive application server examples | [fnat64](../kmod/toa/example_nat64/server.c) | [udp_serv](../kmod/uoa/example/udp_serv.c) | [udp_serv](../kmod/uoa/example/udp_serv.c) | [tcp_server](../test/proxy_protocol/tcp_server.c), [udp_server](../test/proxy_protocol/udp_server.c), [official sample code](https://www.haproxy.org/download/2.9/doc/proxy-protocol.txt) |
353 changes: 353 additions & 0 deletions doc/containerized/README.md


271 changes: 271 additions & 0 deletions doc/containerized/dpvs.conf
@@ -0,0 +1,271 @@
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! This is dpvs default configuration file.
!
! The attribute "<init>" denotes a configuration item for the initialization stage. Items of
! this type are configured one-shot and are not reloadable. If an invalid value is configured
! in the file, dpvs would use its default value.
!
! Note that dpvs configuration file supports the following comment type:
! * line comment: using '#' or '!'
! * inline range comment: using '<' and '>', put comment in between
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

! global config
global_defs {
log_level WARNING
! log_file /var/log/dpvs.log
! log_async_mode on
! kni on
! pdump off
}

! netif config
netif_defs {
<init> pktpool_size 1048575
<init> pktpool_cache 256
<init> fdir_mode perfect

<init> device dpdk0 {
rx {
queue_number 8
descriptor_number 1024
rss all
}
tx {
queue_number 8
descriptor_number 1024
}
! mtu 1500
! promisc_mode
! allmulticast
kni_name dpdk0.kni
}
}

! worker config (lcores)
! notes:
! 1. rx(tx) queue ids MUST start from 0 and be continuous
! 2. cpu ids and rx(tx) queue ids MUST be unique; repeated ids are forbidden
! 3. cpu ids identify dpvs workers only, and do not correspond to physical cpu cores.
! If you are to specify cpu cores on which to run dpvs, please use dpdk eal options,
! such as "-c", "-l", "--lcores". Use "dpvs -- --help" for supported eal options.
worker_defs {
<init> worker cpu0 {
type master
cpu_id 0
}

<init> worker cpu1 {
type slave
cpu_id 1
port dpdk0 {
rx_queue_ids 0
tx_queue_ids 0
! isol_rx_cpu_ids 9
! isol_rxq_ring_sz 1048576
}
}

<init> worker cpu2 {
type slave
cpu_id 2
port dpdk0 {
rx_queue_ids 1
tx_queue_ids 1
! isol_rx_cpu_ids 10
! isol_rxq_ring_sz 1048576
}
}

<init> worker cpu3 {
type slave
cpu_id 3
port dpdk0 {
rx_queue_ids 2
tx_queue_ids 2
! isol_rx_cpu_ids 11
! isol_rxq_ring_sz 1048576
}
}

<init> worker cpu4 {
type slave
cpu_id 4
port dpdk0 {
rx_queue_ids 3
tx_queue_ids 3
! isol_rx_cpu_ids 12
! isol_rxq_ring_sz 1048576
}
}

<init> worker cpu5 {
type slave
cpu_id 5
port dpdk0 {
rx_queue_ids 4
tx_queue_ids 4
! isol_rx_cpu_ids 13
! isol_rxq_ring_sz 1048576
}
}

<init> worker cpu6 {
type slave
cpu_id 6
port dpdk0 {
rx_queue_ids 5
tx_queue_ids 5
! isol_rx_cpu_ids 14
! isol_rxq_ring_sz 1048576
}
}

<init> worker cpu7 {
type slave
cpu_id 7
port dpdk0 {
rx_queue_ids 6
tx_queue_ids 6
! isol_rx_cpu_ids 15
! isol_rxq_ring_sz 1048576
}
}

<init> worker cpu8 {
type slave
cpu_id 8
! icmp_redirect_core
port dpdk0 {
rx_queue_ids 7
tx_queue_ids 7
! isol_rx_cpu_ids 16
! isol_rxq_ring_sz 1048576
}
}

!<init> worker cpu17 {
! type kni
! cpu_id 17
! port dpdk0 {
! rx_queue_ids 8
! tx_queue_ids 8
! }
!}
}

! timer config
timer_defs {
# cpu job loops to schedule dpdk timer management
schedule_interval 500
}

! dpvs neighbor config
neigh_defs {
<init> unres_queue_length 128
timeout 60
}

! dpvs ipset config
ipset_defs {
<init> ipset_hash_pool_size 131072
}

! dpvs ipv4 config
ipv4_defs {
forwarding off
<init> default_ttl 64
fragment {
<init> bucket_number 4096
<init> bucket_entries 16
<init> max_entries 4096
<init> ttl 1
}
}

! dpvs ipv6 config
ipv6_defs {
disable off
forwarding off
route6 {
<init> method hlist
recycle_time 10
}
}

! control plane config
ctrl_defs {
lcore_msg {
<init> ring_size 4096
sync_msg_timeout_us 20000
priority_level low
}
}

! ipvs config
ipvs_defs {
conn {
<init> conn_pool_size 2097152
<init> conn_pool_cache 256
conn_init_timeout 3
! expire_quiescent_template
! fast_xmit_close
! <init> redirect off
}

udp {
! defence_udp_drop
uoa_mode opp
uoa_max_trail 3
timeout {
oneway 60
normal 300
last 3
}
}

tcp {
! defence_tcp_drop
timeout {
none 2
established 90
syn_sent 3
syn_recv 30
fin_wait 7
time_wait 7
close 3
close_wait 7
last_ack 7
listen 120
synack 30
last 2
}
synproxy {
synack_options {
mss 1452
ttl 63
sack
! wscale 0
! timestamp
}
close_client_window
! defer_rs_syn
rs_syn_max_retry 3
ack_storm_thresh 10
max_ack_saved 3
conn_reuse_state {
close
time_wait
! fin_wait
! close_wait
! last_ack
}
}
}
}

! sa_pool config
sa_pool {
pool_hash_size 16
flow_enable on
}
122 changes: 122 additions & 0 deletions doc/containerized/keepalived.conf
@@ -0,0 +1,122 @@
global_defs {
script_user root
enable_script_security
}

local_address_group laddr_g1 {
192.168.88.240 dpdk0.102
192.168.88.241 dpdk0.102
}

vrrp_instance vrrp_instance_107 {
state MASTER
interface dpdk0.102.kni
dpdk_interface dpdk0.102

higher_prio_send_advert
garp_lower_prio_repeat 3
garp_master_refresh 30
garp_master_refresh_repeat 1

virtual_router_id 107
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass 12345
}

virtual_ipaddress {
192.168.88.1
}
virtual_ipaddress_excluded {
2001::1
}
}

virtual_server_group 192.168.88.1-80-TCP_QLB {
192.168.88.1 80
}

virtual_server group 192.168.88.1-80-TCP_QLB {
delay_loop 3
lb_algo rr
lb_kind FNAT
protocol TCP

establish_timeout 60
daddr_group_name 192.168.88.1_deny
laddr_group_name laddr_g1

real_server 192.168.88.30 80 {
weight 100
inhibit_on_failure
TCP_CHECK {
retry 1
connect_timeout 1
connect_port 80
}
}

real_server 192.168.88.130 80 {
weight 100
inhibit_on_failure
TCP_CHECK {
retry 1
connect_timeout 1
connect_port 80
}

}
}

virtual_server_group 2001::1-80-TCP_QLB {
2001::1 80
}

virtual_server group 2001::1-80-TCP_QLB {
delay_loop 3
lb_algo conhash
lb_kind FNAT
protocol TCP

daddr_group_name 2001::1_deny
laddr_group_name laddr_g1

real_server 192.168.88.30 80 {
weight 100
inhibit_on_failure
TCP_CHECK {
retry 1
connect_timeout 1
connect_port 80
}
}

real_server 192.168.88.130 8080 {
weight 100
inhibit_on_failure
TCP_CHECK {
retry 1
connect_timeout 1
connect_port 8080
}
}

real_server 192.168.88.30 8080 {
weight 100
inhibit_on_failure
TCP_CHECK {
retry 1
connect_timeout 1
connect_port 8080
}
}
}

deny_address_group 192.168.88.1_deny {
}

deny_address_group 2001::1_deny {
}

116 changes: 116 additions & 0 deletions doc/containerized/run.sh
@@ -0,0 +1,116 @@
#!/bin/sh

## install host requirements
if [ $# -ge 1 -a _$1 = _initial ]; then
## FIXME: use proper dpdk drivers for different nics
modprobe uio
modprobe uio_pci_generic
dpdk-devbind -b uio_pci_generic 0000:01:00.1

echo 8192 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
echo 8192 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages

mkdir /var/run/dpvs
mkdir -p /var/run/dpvs/logs/dpvs-agent
mkdir -p /var/run/dpvs/logs/healthcheck
fi

## stop and clean the running containers
docker stop dpvs && docker rm dpvs
docker stop keepalived && docker rm keepalived
docker stop dpvs-agent && docker rm dpvs-agent
docker stop healthcheck && docker rm healthcheck
rm -f /var/run/dpvs/*.pid /var/run/dpvs/dpvs.ipc

## TODO: prepare config file: /var/run/dpvs/dpvs.conf

## start dpvs
docker run --name dpvs \
-d --privileged --network host \
-v /dev:/dev \
-v /sys:/sys \
-v /lib/modules:/lib/modules \
-v /var/run/dpvs:/dpvs \
github.com/iqiyi/dpvs:v1.9.5 \
-c /dpvs/dpvs.conf -p /dpvs/dpvs.pid -x /dpvs/dpvs.ipc \
-- -a 0000:01:00.1
sleep 10

## start dpvs-agent
docker run --name dpvs-agent \
--cap-add=NET_ADMIN \
-d --network host \
-v /var/run/dpvs:/dpvs \
--entrypoint=/usr/bin/dpvs-agent \
github.com/iqiyi/dpvs:v1.9.5 \
--log-dir=/dpvs/logs/dpvs-agent \
--ipc-sockopt-path=/dpvs/dpvs.ipc \
--host=0.0.0.0 --port=6601
sleep 3


## set command line tools
alias ipvsadm='docker run --name ipvsadm --rm --network none -v /var/run/dpvs:/dpvs -e DPVS_IPC_FILE=/dpvs/dpvs.ipc --entrypoint=/usr/bin/ipvsadm github.com/iqiyi/dpvs:v1.9.5'
#docker run --name ipvsadm \
# --rm --network none \
# -v /var/run/dpvs:/dpvs \
# -e DPVS_IPC_FILE=/dpvs/dpvs.ipc \
# --entrypoint=/usr/bin/ipvsadm \
# github.com/iqiyi/dpvs:v1.9.5 \
# ...
alias dpip='docker run --name dpip --rm --network none -v /var/run/dpvs:/dpvs -e DPVS_IPC_FILE=/dpvs/dpvs.ipc --entrypoint=/usr/bin/dpip github.com/iqiyi/dpvs:v1.9.5'
#docker run --name dpip \
# --rm --network none \
# -v /var/run/dpvs:/dpvs \
# -e DPVS_IPC_FILE=/dpvs/dpvs.ipc \
# --entrypoint=/usr/bin/dpip \
# github.com/iqiyi/dpvs:v1.9.5 \
# ...

## configure host network
#dpip vlan add dpdk0.102 link dpdk0 id 102
#dpip addr add 192.168.88.28/24 dev dpdk0.102
#dpip addr add 2001::28/64 dev dpdk0.102
#ip addr add 192.168.88.28/24 dev dpdk0.102.kni
#ip addr add 2001::28/64 dev dpdk0.102.kni
#ip link set dpdk0.102.kni up
curl -X PUT "http://127.0.0.1:6601/v2/device/dpdk0.102/vlan" -H "Content-type:application/json" -d "{\"device\":\"dpdk0\", \"id\":\"102\"}"
curl -X PUT "http://127.0.0.1:6601/v2/device/dpdk0.102/addr" -H "Content-type:application/json" -d "{\"addr\":\"192.168.88.28/24\"}"
curl -X PUT "http://127.0.0.1:6601/v2/device/dpdk0.102/addr" -H "Content-type:application/json" -d "{\"addr\":\"2001::28/64\"}"
curl -X PUT "http://127.0.0.1:6601/v2/device/dpdk0.102.kni/netlink/addr" -H "Content-type:application/json" -d "{\"addr\":\"192.168.88.28/24\"}"
curl -X PUT "http://127.0.0.1:6601/v2/device/dpdk0.102.kni/netlink/addr" -H "Content-type:application/json" -d "{\"addr\":\"2001::28/64\"}"
curl -X PUT "http://127.0.0.1:6601/v2/device/dpdk0.102.kni/netlink"

## start keepalived and deploy test services
docker run --name keepalived \
-d --privileged --network host \
--cap-add=NET_ADMIN --cap-add=NET_BROADCAST --cap-add=NET_RAW \
-v /var/run/dpvs:/dpvs \
-e DPVS_IPC_FILE=/dpvs/dpvs.ipc \
--entrypoint=/usr/bin/keepalived github.com/iqiyi/dpvs:v1.9.5 \
-D -n -f /dpvs/keepalived.conf \
--log-console --log-facility=6 \
--pid=/dpvs/keepalived.pid \
--vrrp_pid=/dpvs/vrrp.pid \
--checkers_pid=/dpvs/checkers.pid

## deploy a test service with dpvs-agent api
curl -X PUT "http://127.0.0.1:6601/v2/vs/2001::2-80-tcp" -H "Content-type:application/json" -d "{\"SchedName\":\"wrr\"}"
curl -X PUT "http://127.0.0.1:6601/v2/device/dpdk0.102/addr" -H "Content-type:application/json" -d "{\"addr\":\"2001::2\"}"
curl -X PUT "http://127.0.0.1:6601/v2/vs/2001::2-80-tcp/laddr" -H "Content-type:application/json" -d "{\"device\":\"dpdk0.102\", \"addr\":\"192.168.88.241\"}"
curl -X PUT "http://127.0.0.1:6601/v2/vs/2001::2-80-tcp/laddr" -H "Content-type:application/json" -d "{\"device\":\"dpdk0.102\", \"addr\":\"192.168.88.242\"}"
curl -X PUT "http://127.0.0.1:6601/v2/device/dpdk0.102/addr?sapool=true" -H "Content-type:application/json" -d "{\"addr\":\"192.168.88.241\"}"
curl -X PUT "http://127.0.0.1:6601/v2/device/dpdk0.102/addr?sapool=true" -H "Content-type:application/json" -d "{\"addr\":\"192.168.88.242\"}"
curl -X PUT "http://127.0.0.1:6601/v2/vs/2001::2-80-tcp/rs" -H "Content-type:application/json" -d "{\"Items\":[{\"ip\":\"192.168.88.30\", \"port\":80, \"weight\":100}]}"
curl -X PUT "http://127.0.0.1:6601/v2/vs/2001::2-80-tcp/rs" -H "Content-type:application/json" -d "{\"Items\":[{\"ip\":\"192.168.88.130\", \"port\":8080, \"weight\":100}]}"
curl -X PUT "http://127.0.0.1:6601/v2/vs/2001::2-80-tcp/rs" -H "Content-type:application/json" -d "{\"Items\":[{\"ip\":\"10.1.1.1\", \"port\":80, \"weight\":100}]}"

## start healthcheck for dpvs-agent
docker run --name healthcheck \
-d --network host \
-v /var/run/dpvs:/dpvs \
--entrypoint=/usr/bin/healthcheck \
github.com/iqiyi/dpvs:v1.9.5 \
-log_dir=/dpvs/logs/healthcheck \
-lb_iface_addr=localhost:6601

253 changes: 253 additions & 0 deletions doc/dest-check.md

Large diffs are not rendered by default.

55 changes: 45 additions & 10 deletions doc/faq.md
@@ -18,6 +18,8 @@ DPVS Frequently Asked Questions (FAQ)
* [Does DPVS support Bonding/VLAN/Tunnel ?](#vir-dev)
* [Why CPU usages are 100% when running DPVS ?](#cpu-100)
* [Does iptables conflict with DPVS ?](#iptables)
* [Why DPVS exits due to "Cause: failed to init tc: no memory"?](#no-memory)
* [Why IPv6 is not supported by my Keepalived?](#keepalived-ipv6)

-------------------------------------------------

@@ -27,12 +29,12 @@ DPVS Frequently Asked Questions (FAQ)

Please try to follow `README.md` and `doc/tutorial.md` first. If you still have problems, possible reasons are:

1. NIC do not support DPDK or *flow-director* (`fdir`), please check this [answer](#nic).
2. DPDK not compatible with Kernel Version, it cause build error, please refer to [DPDK.org](https://www.dpdk.org/) or consider upgrade the Kernel.
1. NIC does not support DPDK or *flow control* (`rte_flow`), please check this [answer](#nic).
2. DPDK is not compatible with the kernel version, which causes build errors; please refer to [DPDK.org](https://www.dpdk.org/) or consider upgrading the kernel.
3. CPU core (`lcore`) and NIC queue configurations are mismatched.
Please read `conf/*.sample`; note that worker CPUs and NIC queues are mapped 1:1 and you need one more cpu for the master.
4. DPDK NIC's link is not up? Please check the NIC cable first.
5. `curl` VIP in FullNAT mode fails (or sometime fails)? Please check if NIC support [fdir](#nic).
5. `curl` to VIP in FullNAT mode fails (or sometimes fails)? Please check whether the NIC supports [rte_flow](#nic).
6. `curl` still fails? Please check routes and arp entries with `dpip route show` and `dpip neigh show`.
7. The patches in `patch/` are not applied.

@@ -42,16 +44,28 @@ And you may find other similar issues and solutions from Github's issues list.

### Does my NIC support DPVS ?

Actaully, it's the question about if the NIC support DPDK as well as "flow-director (fdir)".
Actually, it's a question of whether the NIC supports DPDK as well as "flow control" (`rte_flow`).

First, please make sure the NIC support `DPDK`, you can check the [link](https://core.dpdk.org/supported/). Second, DPVS's FullNAT/SNAT mode need flow-director feature, *unless you configure only one worker*. For `fdir` support, this [link](http://doc.dpdk.org/guides/nics/overview.html#id1) can be checked.
First, please make sure the NIC supports `DPDK`; you can check the [link](https://core.dpdk.org/supported/). Second, DPVS's FullNAT/SNAT mode needs the flow control (`rte_flow`) feature, *unless you configure only one worker*. For `rte_flow` support, check this [link](http://doc.dpdk.org/guides/nics/overview.html#id1).

Please find the DPDK driver name according to your NIC by the first link. And check `fdir` support for each drivers from the matrix in the second link.
Please find the DPDK driver name for your NIC via the first link, then check `rte_flow` support for each driver in the matrix from the second link.

1. https://core.dpdk.org/supported/
2. http://doc.dpdk.org/guides/nics/overview.html#id1

> `Fdir` is replaced with `rte_flow` in the lastest DPDK. DPVS is making efforts to adapt to the change.
The PMD of your NIC should support the following rte_flow items,

* ipv4
* ipv6
* tcp
* udp

and at least the following rte_flow actions (a minimal probe sketch is given after the note below).

* queue
* drop

> If you are using only one worker, you can turn off dpvs flow control by setting `sa_pool/flow_enable` to `off` in dpvs.conf.
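The sketch below is a minimal probe, not DPVS code: under the assumption of an already-initialized DPDK port, it builds an `eth / ipv4 / tcp` pattern with a `queue` action and asks the PMD to validate it through the generic `rte_flow` API. The function name and queue index are made up for illustration; a PMD that rejects this rule (or its ipv6/udp/drop variants) will not work for DPVS FullNAT/SNAT with multiple workers.

```c
#include <rte_ethdev.h>
#include <rte_flow.h>

/* Return 0 if the PMD behind 'port_id' accepts an ipv4+tcp -> queue rule,
 * or a negative error code otherwise (details are filled into 'error'). */
static int probe_ipv4_tcp_queue(uint16_t port_id)
{
    struct rte_flow_attr attr = { .ingress = 1 };

    /* pattern: any ethernet / ipv4 / tcp packet (no spec, default masks) */
    struct rte_flow_item pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_ETH  },
        { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
        { .type = RTE_FLOW_ITEM_TYPE_TCP  },
        { .type = RTE_FLOW_ITEM_TYPE_END  },
    };

    /* action: steer matched packets to rx queue 0 */
    struct rte_flow_action_queue queue = { .index = 0 };
    struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };

    struct rte_flow_error error;

    /* validate only: nothing is programmed into the NIC */
    return rte_flow_validate(port_id, &attr, pattern, actions, &error);
}
```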
<a id="high-avail" />

@@ -106,17 +120,17 @@ Yes, it does support UDP. In order to get the real client IP/port in FullNAT mod
### Does DPVS support IP fragment ?
No, since connection table is per-lcore (per-CPU), and RSS/fdir are used for FNAT. Assuming RSS mode is TCP and fdir uses L4 info `<lip, lport>`. Considered that IP fragment doesn't have L4 info, it needs reassembling first and re-schedule the pkt to **correct** lcore which the 5-tuple flow (connection) belongs to.
No, since the connection table is per-lcore (per-CPU), and RSS/rte_flow are used for FNAT. Assume the RSS mode is TCP and rte_flow uses the L4 info `<lip, lport>`. Since an IP fragment doesn't carry L4 info, it would need to be reassembled first and then re-scheduled to the **correct** lcore which the 5-tuple flow (connection) belongs to.
May be someday in the future, we will support "pkt re-schedule" on lcores or use L3 (IP) info only for `RSS`/`FDIR`, then we may support fragment. But even we support fragment, it may hurt the performance (reassemble, re-schedule effort) or security.
May be someday in the future, we will support "packet re-schedule" on lcores or use L3 (IP) info only for `RSS` or `flow control`, then we may support fragment. But even we support fragment, it may hurt the performance (reassemble, re-schedule effort) or security.
Actually, IPv4 fragment is not recommended, while IPv6 even not support fragment by fixed header, and do not allow re-fragment on middle-boxes. The applications, especially for the datagram-oriented apps, like UDP-apps, should perform PMTU discover algorithm to avoid fragment. TCP is sending sliced *segments*, notifying MSS to peer side and *PMTU discover* is built-in, TCP-app should not need worry about fragment.
<a id="vm" />
### How to launch DPVS on Virtual Machine ?
Please refer to the [tutorial.md](../doc/tutorial.md), there's an exmaple to run DPVS on `Ubuntu`. Basically, you may need to reduce memory usage. And for VM's NIC, `fdir` is not supported, so if you want to config FullNAT/SNAT mode, you have to configure **only one** worker (cpu), and another CPU core for master.
Please refer to [tutorial.md](../doc/tutorial.md); there's an example of running DPVS on `Ubuntu`. Basically, you may need to reduce memory usage. And since `rte_flow` is not supported on a VM's NIC, if you want to configure FullNAT/SNAT mode, you have to configure **only one** worker (cpu), plus another CPU core for the master.
<a id="monitor" />
@@ -211,3 +225,24 @@ It's normal, not issue. Since DPDK application is using busy-polling mode. Every
### Does iptables conflict with DPVS ?
Yes. DPDK is a kernel-bypass solution: forwarding traffic in the data plane does not go through the Linux kernel, which means `iptables` (Netfilter) won't work for that kind of traffic.
<a id="no-memory" />
### Why DPVS exits due to "Cause: failed to init tc: no memory"?
1. Check the hugepage configuration on your system. Adequate free hugepages must be available for DPVS. Generally, 8GB of free hugepages on each NUMA node is enough for running DPVS with default configs.
2. Check the NUMA support of your DPDK build. DPVS uses 2 NUMA nodes by default. If your system is not NUMA-aware, set the macro `CONFIG_DPVS_MAX_SOCKET=1` in config.mk. Otherwise, ensure the `numactl-devel` package is installed before compiling DPDK.
<a id="keepalived-ipv6" />
### Why IPv6 is not supported by my Keepalived?
Keepalived IPv6 support requires libnl3. Please install the `libnl3-devel` package and recompile DPVS.
```sh
make clean
make distclean
make
```
Binary file added doc/pics/health_check-master-lcore.drawio.png
Binary file added doc/pics/health_check-slave-lcores.drawio.png
169 changes: 169 additions & 0 deletions doc/pics/health_check.drawio

Large diffs are not rendered by default.

Binary file added doc/pics/ipset-arch.png
Binary file added doc/pics/kni-flow-1.png
Binary file added doc/pics/kni-flow-2.png
1 change: 1 addition & 0 deletions doc/pics/kni-flow.drawio
@@ -0,0 +1 @@
<mxfile host="app.diagrams.net" modified="2022-01-25T10:28:00.929Z" agent="5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36" etag="ix_6reo9NY_pv04skLY5" version="16.4.3" type="device" pages="2"><diagram id="Ep--Fc18097FFTvmNgfS" name="Page-1">7Vtdc5s4FP01nj5tBwkE5tFxkjazu53OZHbaPHUwyEAjI6+QY3t//UpGMh/CDnXsmHrshwy6EgLdc8+RdEUG9ni2+sSCefI3jTAZQCtaDezbAYS+5Yq/0rAuDMiyCkPM0qgwgdLwmP6HlVE3W6QRzmsNOaWEp/O6MaRZhkNeswWM0WW92ZSS+lPnQYwNw2MYENP6LY14UliH0Cvtn3EaJ/rJwPWLmlmgG6uR5EkQ0WXFZN8N7DGjlBdXs9UYE+k77ZfivvsdtdsXYzjjXW6AyeLH6OYf72Hhj8Jo9PJ19uPbH0B1k/O1HjGOhANUkTKe0JhmAbkrrTeMLrIIy24tUSrb/EXpXBiBMP7EnK8VmsGCU2FK+IyoWvHGbP1d3b8pPMnCR6SLt6tq5e1alcwh69enCxbifeNUoROwGPM97VDRTvqg8gDl0E+YzrB4H9GAYRLw9KUeJIGKtXjbroRDXChEfgEdb/g7oTOlGVedAqDKY0oo27y5bW1+8q5Vyr+XPYrSk36yuC47l4X3Q95zewW96vclIAv1pC8PYyMa6lgvk5Tjx3mw8cdSyHEd150+fMGM49XeQetaR4mZUnPHtovysqKNqklSkUXbOpGboOGmP788yJsyjtlUeuLcLkPI7pfLfHgOUekqDocTHXUkug96RXRkRPA9XQYsSrN4AF0iRnIzYeIqlldLyp4xy98W1NOUkIrrp1PshqGw55zRZ1ypiTx/sh+U7jSArtsvGgydS5xbD6eP23We9HpFH3fHBFAw5Tckiu3Wp1gEWogCWohin2oRanjYJE4WjeReS5RCEuR5GtYdWff6Tie9GlIVF6AWrdC2zpGnnvCViuVCiUADAMfx6j0UVFI3VTdcjX6MxZLjftSrTN1XQTejrw1Q24G/Ya43sBPB/oy5OYOI6OR10OpRntEMNyihTAFJ40wij+WaSxhkrKdiBz1SFbM0ijbq2Ua3jqHRnT9uY7kFHYM+bkvswJOtUM2V/BYDmSQIEzF6aD1nqXx6FDGZpSCCS/1EiAQTTG7EAOKNvSKC95tffa4a7pmqcuGDMDneQttz6sAPTd0EVotwQnQq5G0D+YxePKxHQBIBVJ8CkWdACVs4DE7FYdvk8Br3VUP7BaXfWPb7cq17VjDR8ArmgUtTWJdYsTQ9N5iuycyrxnbBEup9XF80dniWtFR/9+Ne1w15v/LW/vXIYvDWI4vO0Pv9ysV4ZjKm2Naw1b8D+94IDJbQ2WSRv56I+TXl3ItkIfz6KPdIW49tlnLvxOi3pTfhqeQUnZeHJfWeqnVH4iGosLDk5NF56HXkITw2DTe3jhgL1pUGc5klyis9N/JNyK+nPhD0G2FU9HjUDBPQ33OcKdDA4BIEv2ug9U3wzdxwIfj8wgXf8V9P0r+r3GvqX9qhrj6sfX8JfhscZzlc7BEcPduUmFmCQqfUKeFFSxVo5MWhudEH+ivNqlY5pzpQ9M1ZI2IixA3GEJLO8125lQoEQT4vPv2cpivJn12Y6DRPAw7T7c2D32GI2w9+J0PkoGNl13QGRM8prrmFsL2WOcU7EU7Auh5gHAYlAn6dc7Z5CPmuyTVgOQaU16R3N/20G1h65lLvnbE0PxkrT5Tlty8RzT7IfWEoxn2FuIPyOg2IoQlx2zcDB0AsiuUX/8UevPy3Cfvufw==</diagram><diagram id="ShCZfiz8j5f69a4PKt4G" 
name="Page-2">7Vxdl6I4EP01nn1qD0n4fOyPmdnZnd3xjDuzs097okRlGsVF7Nb+9ZtAQGLSii0B7Gl90ASDkFv3UlUU6aHb+eZDjJezPyKfhD1o+JseuutBCBCA9IP1bLMe27Gzjmkc+PxHu45h8ER4p8F714FPVsIPkygKk2Apdo6jxYKME6EPx3H0KP5sEoXivy7xlEgdwzEO5d6/Az+ZZb0udHb9v5JgOsv/GdhetmWO8x/zM1nNsB89lrrQux66jaMoyb7NN7ckZJOXz8vtj2vXHrsD//N/w9/ez58Wi+3Xq2xn708ZUpxCTBbJi3f9gP96/BR/e5r/O4KDDz8+f/nqfr/i5/qAwzWfL36uyTafwDhaL3zCdmL00M3jLEjIcInHbOsjNRnaN0vmIW0B+nUSLRJuA8ClbR+vZulYwBsDnCQkXqQ90IBsSBCGt1EYxenfId8irm/S/mmM/YCecGmb5/iG49BtqySO7klpiwtHyLbpFn4+JE7IZs8AjsweKCClXCDRnCTxlo7jezH5THEWmIgbxePOppy8b1ayJ2TzTszteFrseocV/cLhOgE6YEtQEZ+aPm8uogX9uBHRi+JkFk2jBQ4/RdGSw/KDJMmWg4bXSSQiKiBYhjdvl1C4dtlbhY+Rvgp82IG+AB16stE6HpMDv0NqFGMS4iR4EP9UBQkfOogCejgF+sBxBPgtdw/VBMdTkvBRe8AWh3EG1t4lYk0Rjbff+eGkjX/KjbuN0NrmrU2QpIP6JjJ5m40DfcMAvL0byhrlkQMSB3S2Scz7GrE3eklL8T/wu0xYzzDMqlpx6ChLMn9L0YwjNnAQYmo9dak+NDSKMEB7Kmy6kgoDF8oqbEJNEwusg8TccfDdrvd8nu6I1bdOoZYaJ33kMCuSw2qTG6bEja8rEq/qZcSekzOx2Fvt5IzSt0pY7fSl08mxgcAvZMpeTuEOl/nl1ODkqI9XdnJ+GvZYFdnjtMke4DTimbw2zLxWMXPfMHsBZsBoFbRmQoBXB9ozV7xmQLMk30IGMQyD5YocdynwapnlxybBhiFc2cd43o845H3o8jEsyxV9eFfhY9iyi5H31R8bHc6jqF348Tp+KCLl+kgHBMrtGFiRdLvQ2SoHzgeDZiHm181apyJrUVNSq8y2KEiqV2nhS6UWwHZAUyep2lRaR1LayaPPVCWK70l8JIjoSHoD5WHUfpaxnGRWRF+erugLHXbuL1MaYUe10bsIbURNa6NpnCSOaWs/Cdy0Yiqg7J5iyrf+Lk8xC+exM4p5OLR+U8w6aZaXGHRcMs1GJHNf414SC7QBWPeEMT/Ki1ZGy+uaMh7OX70pY61Eyyutuq2Mh2+eanAmnbMC7VJlgumVge8bhnUQ/DYqE1Q20EGxBa9AbB2jY2JrymFanQ5GPcnG9orE8prWo/KY1Y20xg0oceN+EfSgHdJjvhlRctjTJJ0uG8+Z6S9GK/ZxSdQBeWlOZ6gju+sXQJ2iGk87daoWvLVMHbniLecNA6PH6s3zybP/W0cZjRB22bvclTFsjlfswr27KmW7ooeW7S3n4SUwDlodYxyQK7C+DIfSZNJpSMQZw2EwZZXkYzodzK26YZMVjHF4zTfMA9/PKu7IKnjCo3RXbG6XrFA3PQ/rpmfdsX1R0q64rycRjHui5RuseVdT4ZwpVmZZyOgbQMLNVuCGdOEGmylAr8N931NdZNkl3TX6nuue7L43Gs/BypmuNkUXyomTGcEhNU4qtzMyvq9XIPcfHMHEnYx7yqoGh2CbGKoLJz0JMppolVvx0YHigZByzbLqyRFLG28bD7uhaZeJe8W0yznC3mM3ci6j4AFWTsS0Slw5CL/5MKC7+jwcvP8ZSeuIzxl0gLOXVbLZkTgfVo3zQavBCmy+tNNEp8BbSYs7EqBWx7xdyZVzO7IR/GSloQhKQY4su4rSUFBDaai6ELDxCEdQXXCElpf2aG7VIAa2qsZIjmLemGmLNzgcFTMV/hDQlqWV5fP3Pz/Sjo8sFTRhcFxEMg7k6eZ8Yg0kT6zjyjOrb1kBaWK/0Bie9gxJ/PD2XGTadqFs/tqei1SiZCs0Skhv7wDKc9lsw1WW4rymPwBguZET3RS4e5JQkA1/+ZB+RGS1+CVNmNM5ZKcyitZJ1QS4nLOtmlitnttVWZxok+ryTw0qicRUj5OTW1BJSzaTOlK0ajLL17K7wTc5t175enZwfZdigRw5aOeoKmg9SV9sADW88axXzsvUD5HtihCZCEoQuZ6MkOnoQki++VE3k+eYzisFCRrZ3WPhXjE7J9+P6cckpNi98Vp1kTbER9YsoPJ+QIO8VmTac+hWS7x4udWQkKrAzgiynV2UEcRRQoOEiO0lrc7TZRTQdvpAdN4sA/Q9eUUShPoOVPnGfQPqsg/Zi6vHPoJJLiPFrXCDbIIVE5pXYjVsTReNWuJZfaP8QqIJmQr/HyikRav1KAJeHWmPc2/evDjloSMZeZCHR2swYVNFmOqHulTVMnsWMKV4L59lBV+9kZdY9Ipg6Fx3DfUtgR+upUoJKvhharv0NnMnpo47nRWs+4A5nGGMfGi+iN42lyxxRMYeac08afE9U8w97e9G89J7ynTg8UoynlWVLqA+TjDDIFt0zUhiPJkEY4VXLndJ7Qty1YXQUAway2FfVUWvdHU+ybbzOiu5pgqoEptQl7wgICHYVXk5tZCiSa2BXt8rvRwH1aI9x3arW4vkrPc5WjQu1oCU5ehNW7RpS25LqiSCItOvT2qeTz2dFySeU2m9l4AQze+1RJhIa2LC3buLbnpFqFjOSpiyqQGvyF+cYG20uVsEPZO63VLy6N3/</diagram></mxfile>
Binary file removed doc/pics/kni.png
4 changes: 4 additions & 0 deletions doc/pics/kni.svg
92 changes: 92 additions & 0 deletions doc/pics/tutorial.drawio
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
<mxfile host="app.diagrams.net" agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36" border="50" scale="3" compressed="false" locked="false" version="24.7.13">
<diagram id="eFARZ7ye3ZyHCSxxOhFb" name="kni">
<mxGraphModel dx="989" dy="545" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="827" pageHeight="1169" math="0" shadow="0">
<root>
<mxCell id="0" />
<mxCell id="1" parent="0" />
<mxCell id="j9tfoosSE3qILXxTiQy9-3" value="&lt;font style=&quot;font-size: 24px;&quot;&gt;&lt;b style=&quot;&quot;&gt;DPVS&lt;/b&gt;&lt;/font&gt;&lt;div&gt;&lt;br&gt;&lt;/div&gt;&lt;div&gt;&lt;br&gt;&lt;/div&gt;&lt;div&gt;&lt;br&gt;&lt;/div&gt;&lt;div&gt;&lt;br&gt;&lt;/div&gt;&lt;div&gt;&lt;br&gt;&lt;/div&gt;&lt;div&gt;&lt;br&gt;&lt;/div&gt;" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#1ba1e2;strokeColor=#006EAF;fontColor=#ffffff;" vertex="1" parent="1">
<mxGeometry x="450" y="490" width="211.88" height="120" as="geometry" />
</mxCell>
<mxCell id="j9tfoosSE3qILXxTiQy9-5" value="&lt;font style=&quot;font-size: 14px;&quot;&gt;&lt;b&gt;dpdk0&lt;/b&gt;&lt;/font&gt;" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#fff2cc;strokeColor=#d6b656;" vertex="1" parent="1">
<mxGeometry x="501.88" y="523" width="70" height="30" as="geometry" />
</mxCell>
<mxCell id="j9tfoosSE3qILXxTiQy9-7" value="&lt;font style=&quot;font-size: 14px;&quot;&gt;&lt;b&gt;dpdk1&lt;/b&gt;&lt;/font&gt;" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#fff2cc;strokeColor=#d6b656;" vertex="1" parent="1">
<mxGeometry x="581.88" y="523" width="70" height="30" as="geometry" />
</mxCell>
<mxCell id="O6jyKfrmbOcH9zmBw9ze-5" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0.5;exitY=1;exitDx=0;exitDy=0;strokeColor=#FF0000;strokeWidth=2;startArrow=classicThin;startFill=0;endArrow=classicThin;endFill=0;dashed=1;" edge="1" parent="1" source="j9tfoosSE3qILXxTiQy9-5" target="O6jyKfrmbOcH9zmBw9ze-1">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="IBzB3Yzv9ICFWQqCQZ6M-7" value="&lt;font style=&quot;font-size: 18px;&quot;&gt;&lt;b&gt;kernel&amp;nbsp;&lt;/b&gt;&lt;/font&gt;&lt;div&gt;&lt;font style=&quot;font-size: 18px;&quot;&gt;&lt;b&gt;TCP/IP Stack&lt;/b&gt;&lt;/font&gt;&lt;div&gt;&lt;font style=&quot;font-size: 18px;&quot;&gt;&lt;br&gt;&lt;/font&gt;&lt;/div&gt;&lt;div&gt;&lt;font style=&quot;font-size: 18px;&quot;&gt;&lt;br&gt;&lt;/font&gt;&lt;/div&gt;&lt;/div&gt;" style="rounded=1;whiteSpace=wrap;html=1;strokeColor=#CC6600;strokeWidth=2;" vertex="1" parent="1">
<mxGeometry x="150" y="670" width="210" height="110" as="geometry" />
</mxCell>
<mxCell id="O6jyKfrmbOcH9zmBw9ze-7" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.25;entryY=0;entryDx=0;entryDy=0;strokeWidth=2;dashed=1;startArrow=classicThin;startFill=0;endArrow=classicThin;endFill=0;strokeColor=#FF0000;" edge="1" parent="1" source="IBzB3Yzv9ICFWQqCQZ6M-3" target="IBzB3Yzv9ICFWQqCQZ6M-4">
<mxGeometry relative="1" as="geometry">
<Array as="points">
<mxPoint x="246" y="660" />
<mxPoint x="188" y="660" />
</Array>
</mxGeometry>
</mxCell>
<mxCell id="IBzB3Yzv9ICFWQqCQZ6M-3" value="&lt;div&gt;&lt;span style=&quot;color: rgb(0, 0, 0);&quot;&gt;&lt;font style=&quot;font-size: 18px;&quot;&gt;&lt;b&gt;Kernel Based&lt;/b&gt;&lt;/font&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style=&quot;color: rgb(0, 0, 0);&quot;&gt;&lt;font size=&quot;3&quot; style=&quot;&quot;&gt;&lt;b&gt;Applications&lt;/b&gt;&lt;/font&gt;&lt;/span&gt;&lt;br&gt;&lt;/div&gt;&lt;div&gt;&lt;span style=&quot;color: rgb(0, 0, 0);&quot;&gt;&lt;font style=&quot;font-size: 14px;&quot;&gt;&lt;br&gt;&lt;/font&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style=&quot;color: rgb(0, 0, 0);&quot;&gt;&lt;font style=&quot;font-size: 14px;&quot;&gt;sshd|keepalived|ospfd|bird|...&lt;/font&gt;&lt;/span&gt;&lt;/div&gt;" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#ffff88;strokeColor=#36393d;" vertex="1" parent="1">
<mxGeometry x="140" y="490" width="212.5" height="122" as="geometry" />
</mxCell>
<mxCell id="O6jyKfrmbOcH9zmBw9ze-8" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.5;entryY=1;entryDx=0;entryDy=0;strokeWidth=2;dashed=1;startArrow=classicThin;startFill=0;endArrow=classicThin;endFill=0;strokeColor=#FF0000;" edge="1" parent="1" source="IBzB3Yzv9ICFWQqCQZ6M-4" target="IBzB3Yzv9ICFWQqCQZ6M-8">
<mxGeometry relative="1" as="geometry">
<Array as="points">
<mxPoint x="205" y="790" />
<mxPoint x="435" y="790" />
</Array>
</mxGeometry>
</mxCell>
<mxCell id="IBzB3Yzv9ICFWQqCQZ6M-4" value="&lt;font color=&quot;#cc6600&quot; style=&quot;font-size: 14px;&quot;&gt;&lt;b&gt;dpdk0.kni&lt;/b&gt;&lt;/font&gt;" style="rounded=1;whiteSpace=wrap;html=1;strokeColor=#CC6600;" vertex="1" parent="1">
<mxGeometry x="170" y="730" width="70" height="30" as="geometry" />
</mxCell>
<mxCell id="IBzB3Yzv9ICFWQqCQZ6M-6" value="&lt;font color=&quot;#cc6600&quot; style=&quot;font-size: 14px;&quot;&gt;&lt;b&gt;dpdk1.kni&lt;/b&gt;&lt;/font&gt;" style="rounded=1;whiteSpace=wrap;html=1;strokeColor=#CC6600;" vertex="1" parent="1">
<mxGeometry x="270" y="730" width="70" height="30" as="geometry" />
</mxCell>
<mxCell id="O6jyKfrmbOcH9zmBw9ze-9" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0;entryY=0.5;entryDx=0;entryDy=0;exitX=0.75;exitY=0;exitDx=0;exitDy=0;strokeWidth=2;dashed=1;startArrow=classicThin;startFill=0;endArrow=classicThin;endFill=0;strokeColor=#FF0000;" edge="1" parent="1" source="IBzB3Yzv9ICFWQqCQZ6M-8" target="j9tfoosSE3qILXxTiQy9-5">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="IBzB3Yzv9ICFWQqCQZ6M-8" value="&lt;font style=&quot;font-size: 18px;&quot;&gt;KNI kmod&lt;/font&gt;&lt;div&gt;&lt;font style=&quot;font-size: 12px;&quot;&gt;rte_kni.ko|vhost.ko&lt;/font&gt;&lt;/div&gt;" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#f5f5f5;strokeColor=#666666;gradientColor=#b3b3b3;" vertex="1" parent="1">
<mxGeometry x="380" y="670" width="110" height="50" as="geometry" />
</mxCell>
<mxCell id="oO2gNdg5D-aUiUnC3wbW-1" value="&lt;div&gt;&lt;font style=&quot;font-size: 18px;&quot;&gt;Linux Drivers&lt;/font&gt;&lt;/div&gt;&lt;font style=&quot;font-size: 12px;&quot;&gt;UIO|VFIO|Bifurcated Driver&lt;/font&gt;" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#f5f5f5;gradientColor=#b3b3b3;strokeColor=#666666;" vertex="1" parent="1">
<mxGeometry x="497.88" y="670" width="164" height="50" as="geometry" />
</mxCell>
<mxCell id="oO2gNdg5D-aUiUnC3wbW-3" value="" style="endArrow=none;dashed=1;html=1;rounded=0;strokeWidth=2;dashPattern=1 1;strokeColor=#808080;" edge="1" parent="1">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="100" y="640" as="sourcePoint" />
<mxPoint x="740" y="640" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="oO2gNdg5D-aUiUnC3wbW-4" value="" style="endArrow=none;dashed=1;html=1;rounded=0;strokeWidth=2;dashPattern=1 1;strokeColor=#808080;" edge="1" parent="1">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="94" y="805" as="sourcePoint" />
<mxPoint x="734" y="805" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="O6jyKfrmbOcH9zmBw9ze-4" value="" style="group" vertex="1" connectable="0" parent="1">
<mxGeometry x="510" y="830" width="132.87" height="64" as="geometry" />
</mxCell>
<mxCell id="O6jyKfrmbOcH9zmBw9ze-1" value="" style="sketch=0;pointerEvents=1;shadow=0;dashed=0;html=1;strokeColor=none;fillColor=#505050;labelPosition=center;verticalLabelPosition=bottom;verticalAlign=top;outlineConnect=0;align=center;shape=mxgraph.office.devices.nic;" vertex="1" parent="O6jyKfrmbOcH9zmBw9ze-4">
<mxGeometry width="52.87" height="30" as="geometry" />
</mxCell>
<mxCell id="O6jyKfrmbOcH9zmBw9ze-2" value="" style="sketch=0;pointerEvents=1;shadow=0;dashed=0;html=1;strokeColor=none;fillColor=#505050;labelPosition=center;verticalLabelPosition=bottom;verticalAlign=top;outlineConnect=0;align=center;shape=mxgraph.office.devices.nic;" vertex="1" parent="O6jyKfrmbOcH9zmBw9ze-4">
<mxGeometry x="80" width="52.87" height="30" as="geometry" />
</mxCell>
<mxCell id="O6jyKfrmbOcH9zmBw9ze-3" value="&lt;font style=&quot;font-size: 18px;&quot;&gt;&lt;b&gt;NICs&lt;/b&gt;&lt;/font&gt;" style="text;html=1;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;" vertex="1" parent="O6jyKfrmbOcH9zmBw9ze-4">
<mxGeometry x="39.879999999999995" y="34" width="60" height="30" as="geometry" />
</mxCell>
<mxCell id="IBzB3Yzv9ICFWQqCQZ6M-1" value="&lt;b&gt;&lt;font style=&quot;font-size: 14px;&quot;&gt;DPDK PMD&lt;/font&gt;&lt;/b&gt;" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#fff2cc;strokeColor=#d6b656;" vertex="1" parent="1">
<mxGeometry x="501.88" y="569" width="150" height="34" as="geometry" />
</mxCell>
<mxCell id="O6jyKfrmbOcH9zmBw9ze-10" value="&lt;font style=&quot;font-size: 18px;&quot;&gt;socket&lt;/font&gt;" style="text;html=1;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;" vertex="1" parent="1">
<mxGeometry x="186" y="617" width="60" height="30" as="geometry" />
</mxCell>
<mxCell id="O6jyKfrmbOcH9zmBw9ze-13" value="&lt;b&gt;&lt;font style=&quot;font-size: 10px;&quot; color=&quot;#ffffff&quot;&gt;rte_ring or&amp;nbsp;&lt;/font&gt;&lt;span style=&quot;font-size: 10px; background-color: initial; color: rgb(255, 255, 255);&quot;&gt;vhost_user&lt;/span&gt;&lt;/b&gt;" style="text;html=1;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;rotation=90;" vertex="1" parent="1">
<mxGeometry x="439.5" y="569.5" width="82.24" height="11" as="geometry" />
</mxCell>
</root>
</mxGraphModel>
</diagram>
</mxfile>
333 changes: 322 additions & 11 deletions doc/tc.md

Large diffs are not rendered by default.

222 changes: 187 additions & 35 deletions doc/tutorial.md

Large diffs are not rendered by default.

15 changes: 8 additions & 7 deletions include/conf/blklst.h
@@ -24,22 +24,23 @@

#include "inet.h"
#include "conf/sockopts.h"
#include "conf/ipset.h"

struct dp_vs_blklst_entry {
union inet_addr addr;
};

struct dp_vs_blklst_conf {
typedef struct dp_vs_blklst_conf {
/* identify service */
int af;
uint8_t proto;
union inet_addr vaddr;
uint16_t vport;
uint32_t fwmark;
uint8_t proto;
uint8_t af;

/* for set */
union inet_addr blklst;
};
/* subject and ipset are mutually exclusive */
union inet_addr subject;
char ipset[IPSET_MAXNAMELEN];
} dpvs_blklst_t;

struct dp_vs_blklst_conf_array {
int naddr;
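/* ---------------------------------------------------------------------------
 * Editor's sketch (not part of blklst.h): one hypothetical way to fill the
 * dpvs_blklst_t above, attaching an ipset named "blocked" as the black-list
 * source of a TCP service on 192.168.88.1:80. The set name, the network byte
 * order of vport, and the sockopt channel used to deliver the struct to dpvs
 * are assumptions, not taken from this header.
 * ------------------------------------------------------------------------- */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include "conf/blklst.h"

static void fill_blklst_by_ipset(dpvs_blklst_t *conf)
{
    memset(conf, 0, sizeof(*conf));

    /* identify the virtual service */
    conf->af    = AF_INET;
    conf->proto = IPPROTO_TCP;
    inet_pton(AF_INET, "192.168.88.1", &conf->vaddr.in);
    conf->vport = htons(80);                /* network byte order assumed */

    /* subject and ipset are mutually exclusive: use the set, keep subject zeroed */
    snprintf(conf->ipset, IPSET_MAXNAMELEN, "%s", "blocked");
}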
42 changes: 37 additions & 5 deletions include/conf/common.h
@@ -22,7 +22,7 @@
#include <unistd.h>
#include <ctype.h>
#include <sys/types.h>
#include <linux/if_ether.h>
#include <sys/socket.h>

typedef uint32_t sockoptid_t;

@@ -138,10 +138,6 @@ extern const char *dpvs_strerror(int err);

int get_numa_nodes(void);

int linux_set_if_mac(const char *ifname, const unsigned char mac[ETH_ALEN]);
int linux_hw_mc_add(const char *ifname, const uint8_t hwma[ETH_ALEN]);
int linux_hw_mc_del(const char *ifname, const uint8_t hwma[ETH_ALEN]);

/* read "n" bytes from a descriptor */
ssize_t readn(int fd, void *vptr, size_t n);

@@ -165,4 +161,40 @@ static inline char *strlwr(char *str) {
return str;
}

/* convert hexadecimal string to binary sequence, return the converted binary length
 * note: buflen should be at least half of len */
int hexstr2binary(const char *hexstr, size_t len, uint8_t *buf, size_t buflen);

/* convert binary sequence to hexadecimal string, return the converted string length
 * note: buflen should be at least twice len */
int binary2hexstr(const uint8_t *hex, size_t len, char *buf, size_t buflen);

/* convert binary sequence to printable or hexadecimal string, return the converted string length
 * note: buflen should be at least three times len in the worst case */
int binary2print(const uint8_t *hex, size_t len, char *buf, size_t buflen);

/* get prefix from network mask */
int mask2prefix(const struct sockaddr *addr);

/* get host addresses and corresponding interfaces
*
* Loopback addresses, ipv6 link local addresses, and addresses on linked-down
 * or not-running interfaces are ignored. If multiple addresses match, return
 * the address with the smallest prefix length.
*
* Params:
* @ifname: preferred interface where to get host address, can be NULL
* @result4: store ipv4 address found, can be NULL
* @result6: store ipv6 address found, can be NULL
* @ifname4: interface name of ipv4 address, can be NULL
* @ifname6: interface name of ipv6 address, can be NULL
* Return:
* 1: only ipv4 address found
* 2: only ipv6 address found
* 3: both ipv4 and ipv6 address found
* dpvs error code: error occurred
* */
int get_host_addr(const char *ifname, struct sockaddr_storage *result4,
struct sockaddr_storage *result6, char *ifname4, char *ifname6);

#endif /* __DPVS_COMMON_H__ */
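/* ---------------------------------------------------------------------------
 * Editor's sketch (not part of common.h): a hypothetical caller of
 * get_host_addr() declared above. It assumes dpvs' common code is linked in,
 * that the ifname4/ifname6 buffers are IFNAMSIZ bytes, and that a negative
 * return is a dpvs error code; the 1/2/3 checks follow the comment above.
 * ------------------------------------------------------------------------- */
#include <stdio.h>
#include <string.h>
#include <net/if.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include "conf/common.h"

static void show_host_addrs(void)
{
    struct sockaddr_storage v4, v6;
    char ifname4[IFNAMSIZ] = "", ifname6[IFNAMSIZ] = "";
    char buf[INET6_ADDRSTRLEN];
    int ret;

    memset(&v4, 0, sizeof(v4));
    memset(&v6, 0, sizeof(v6));

    /* NULL ifname: no preferred interface, search them all */
    ret = get_host_addr(NULL, &v4, &v6, ifname4, ifname6);
    if (ret < 0) {
        fprintf(stderr, "get_host_addr failed: %d\n", ret);
        return;
    }

    if (ret == 1 || ret == 3) {     /* an ipv4 address was found */
        inet_ntop(AF_INET, &((struct sockaddr_in *)&v4)->sin_addr, buf, sizeof(buf));
        printf("ipv4 %s on %s\n", buf, ifname4);
    }
    if (ret == 2 || ret == 3) {     /* an ipv6 address was found */
        inet_ntop(AF_INET6, &((struct sockaddr_in6 *)&v6)->sin6_addr, buf, sizeof(buf));
        printf("ipv6 %s on %s\n", buf, ifname6);
    }
}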
97 changes: 46 additions & 51 deletions include/conf/dest.h
Original file line number Diff line number Diff line change
@@ -18,7 +18,7 @@
#ifndef __DPVS_DEST_CONF_H__
#define __DPVS_DEST_CONF_H__

#include "conf/service.h"
#include "conf/match.h"
#include "conf/conn.h"

/*
@@ -36,76 +36,71 @@ enum dpvs_fwd_mode {
};

enum {
DPVS_DEST_F_AVAILABLE = 0x1<<0,
DPVS_DEST_F_OVERLOAD = 0x1<<1,
DPVS_DEST_F_AVAILABLE = 0x1<<0, // dest removed
DPVS_DEST_F_OVERLOAD = 0x1<<1, // too many conns
DPVS_DEST_F_INHIBITED = 0x1<<2, // dest forwarding failure
};

struct dp_vs_dest_conf {
typedef struct dp_vs_dest_compat {
/* destination server address */
int af;
union inet_addr addr;
uint16_t port;
uint16_t proto;
uint32_t weight; /* destination weight */
union inet_addr addr;

uint16_t conn_flags; /* flags passed on to connections */
uint16_t flags; /* dest flags */

enum dpvs_fwd_mode fwdmode;
/* real server options */
unsigned conn_flags; /* connection flags */
int weight; /* destination weight */

/* thresholds for active connections */
uint32_t max_conn; /* upper threshold */
uint32_t min_conn; /* lower threshold */
};

struct dp_vs_dest_entry {
int af;
union inet_addr addr; /* destination address */
uint16_t port;
unsigned conn_flags; /* connection flags */
int weight; /* destination weight */

uint32_t max_conn; /* upper threshold */
uint32_t min_conn; /* lower threshold */
uint32_t max_conn; /* upper threshold */
uint32_t min_conn; /* lower threshold */

uint32_t actconns; /* active connections */
uint32_t inactconns; /* inactive connections */
uint32_t persistconns; /* persistent connections */
uint32_t actconns; /* active connections */
uint32_t inactconns; /* inactive connections */
uint32_t persistconns; /* persistent connections */

/* statistics */
struct dp_vs_stats stats;
};

struct dp_vs_get_dests {
/* which service: user fills in these */
int af;
uint16_t proto;
union inet_addr addr; /* virtual address */
uint16_t port;
uint32_t fwmark; /* firwall mark of service */
} dpvs_dest_compat_t;

/* number of real servers */
unsigned int num_dests;
typedef struct dp_vs_dest_table {
int af;
uint16_t proto;
uint16_t port;
uint32_t fwmark;
union inet_addr addr;

lcoreid_t cid;
unsigned int num_dests;

char srange[256];
char drange[256];
char iifname[IFNAMSIZ];
char oifname[IFNAMSIZ];
struct dp_vs_match match;

/* the real servers */
struct dp_vs_dest_entry entrytable[0];
};
lcoreid_t cid;
lcoreid_t index;

struct dp_vs_dest_user {
int af;
union inet_addr addr;
uint16_t port;
dpvs_dest_compat_t entrytable[0];
} dpvs_dest_table_t;

unsigned conn_flags;
int weight;
#define dp_vs_get_dests dp_vs_dest_table
#define dp_vs_dest_entry dp_vs_dest_compat
#define dp_vs_dest_conf dp_vs_dest_compat

uint32_t max_conn;
uint32_t min_conn;
};
#ifdef CONFIG_DPVS_AGENT
typedef struct dp_vs_dest_front {
uint32_t af;
uint16_t proto;
uint16_t port;
uint32_t fwmark;
union inet_addr addr;
unsigned int num_dests;
struct dp_vs_match match;
uint32_t cid;
uint32_t index;
} dpvs_dest_front_t;
#define dp_vs_dest_detail dp_vs_dest_compat
#endif

#endif /* __DPVS_DEST_CONF_H__ */
4 changes: 2 additions & 2 deletions include/conf/eal_mem.h
@@ -35,7 +35,7 @@ enum {
};

typedef struct eal_mem_seg_ret_s {
uint64_t phys_addr;
uint64_t iova;
uint64_t virt_addr;
uint64_t len;
uint64_t hugepage_sz;
@@ -52,7 +52,7 @@ typedef struct eal_all_mem_seg_ret_s {

typedef struct eal_mem_zone_ret_s {
char name[EAL_MEM_NAME_LEN];
uint64_t phys_addr;
uint64_t iova;
uint64_t virt_addr;
uint64_t len;
uint64_t hugepage_sz;
10 changes: 5 additions & 5 deletions include/conf/flow.h
@@ -26,6 +26,7 @@
#define __DPVS_FLOW_CONF_H__

#include <netinet/in.h>
#include "inet.h"

/* linux:include/uapi/route.h */
#define RTF_UP 0x0001 /* route usable */
@@ -45,11 +46,10 @@
#define RTF_LOCALIN 0x0800
#define RTF_DEFAULT 0x1000
#define RTF_KNI 0X2000
#define RTF_OUTWALL 0x4000

struct rt6_prefix {
struct in6_addr addr;
int plen;
};
typedef struct rt_addr {
union inet_addr addr;
int plen; /*prefix len*/
} rt_addr_t;

#endif /* __DPVS_FLOW_CONF_H__ */
2 changes: 2 additions & 0 deletions include/conf/inet.h
@@ -89,6 +89,7 @@ static inline const char *inet_proto_name(uint8_t proto)
const static char *proto_names[256] = {
[IPPROTO_TCP] = "TCP",
[IPPROTO_UDP] = "UDP",
[IPPROTO_SCTP] = "SCTP",
[IPPROTO_ICMP] = "ICMP",
[IPPROTO_ICMPV6] = "ICMPV6",
};
@@ -159,6 +160,7 @@ static inline int inet_addr_range_parse(const char *param,
port1 = port2 = NULL;
}

*af = 0;
memset(range, 0, sizeof(*range));

if (strlen(ip1) && inet_pton(AF_INET6, ip1, &range->min_addr.in6) > 0) {
37 changes: 33 additions & 4 deletions include/conf/inetaddr.h
@@ -34,6 +34,7 @@ enum {

/* leverage IFA_F_XXX in linux/if_addr.h*/
#define IFA_F_SAPOOL 0x10000 /* if address with sockaddr pool */
#define IFA_F_LINKLOCAL 0x20000 /* ipv6 link-local address */

/* ifa command flags */
#define IFA_F_OPS_VERBOSE 0x0001
@@ -50,15 +51,16 @@ typedef enum ifaddr_ops {

struct inet_addr_entry {
int af;
uint32_t valid_lft;
uint32_t prefered_lft;
uint32_t flags;
char ifname[IFNAMSIZ];
union inet_addr addr;
union inet_addr bcast;
union inet_addr addr;
uint8_t plen;
uint8_t scope;
lcoreid_t cid;
uint32_t valid_lft;
uint32_t prefered_lft;
uint32_t flags;
uint8_t nop;
} __attribute__((__packed__));

struct inet_addr_stats {
@@ -85,4 +87,31 @@ struct inet_addr_data_array {
struct inet_addr_data addrs[0];
} __attribute__((__packed__));

#ifdef CONFIG_DPVS_AGENT
struct inet_addr_stats_detail {
union inet_addr addr;
uint32_t sa_used;
uint32_t sa_free;
uint32_t sa_miss;
};

struct inet_addr_front {
int count;
int data[0];
};
#endif /* CONFIG_DPVS_AGENT */

struct inet_maddr_entry {
char ifname[IFNAMSIZ];
union inet_addr maddr;
int af;
uint32_t flags;
uint32_t refcnt;
} __attribute__((__packed__));

struct inet_maddr_array {
int nmaddr;
struct inet_maddr_entry maddrs[0];
} __attribute__((__packed__));

#endif /* __DPVS_INETADDR_CONF_H__ */
108 changes: 99 additions & 9 deletions include/conf/ipset.h
@@ -22,21 +22,111 @@
#ifndef __DPVS_IPSET_CONF_H__
#define __DPVS_IPSET_CONF_H__

#include <net/if.h>
#include "conf/inet.h"
#include "conf/sockopts.h"

struct dp_vs_ipset_conf {
int af;
union inet_addr addr;
#define IPSET_MAXNAMELEN 32
#define IPSET_MAXCOMLEN 32

#define IPSET_F_FORCE 0x0001

enum ipset_op {
IPSET_OP_ADD = 1,
IPSET_OP_DEL,
IPSET_OP_TEST,
IPSET_OP_CREATE,
IPSET_OP_DESTROY,
IPSET_OP_FLUSH,
IPSET_OP_LIST,
IPSET_OP_MAX
};

struct ipset_option {
union {
struct {
int32_t hashsize;
uint32_t maxelem;
uint8_t comment;
} __attribute__((__packed__)) create;
struct {
char padding[8];
uint8_t nomatch;
} __attribute__((__packed__)) add;
};
uint8_t family;
} __attribute__((__packed__));

struct ipset_param {
char type[IPSET_MAXNAMELEN];
char name[IPSET_MAXNAMELEN];
char comment[IPSET_MAXCOMLEN];
uint16_t opcode;
uint16_t flag;
struct ipset_option option;

uint8_t proto;
uint8_t cidr;
struct inet_addr_range range; /* port in host byteorder */
char iface[IFNAMSIZ];
uint8_t mac[6];

/* for type with 2 nets */
uint8_t padding;
uint8_t cidr2;
struct inet_addr_range range2;
//uint8_t mac[2];
};

struct dp_vs_multi_ipset_conf {
int num;
struct dp_vs_ipset_conf ipset_conf[0];
struct ipset_member {
char comment[IPSET_MAXCOMLEN];

union inet_addr addr;
uint8_t cidr;
uint8_t proto;
uint16_t port;
char iface[IFNAMSIZ];
uint8_t mac[6];
uint8_t nomatch;

/* second net */
uint8_t cidr2;
uint16_t port2;
uint8_t padding[2];
union inet_addr addr2;
};

struct ipset_info {
char name[IPSET_MAXNAMELEN];
char type[IPSET_MAXNAMELEN];
uint8_t comment;

uint8_t af;
uint8_t padding[2];

union {
struct ipset_bitmap_header {
uint8_t cidr;
uint8_t padding[3];
struct inet_addr_range range;
} bitmap;
struct ipset_hash_header {
uint8_t padding[4]; // aligned for dpvs-agent
int32_t hashsize;
uint32_t maxelem;
} hash;
};

uint32_t size;
uint32_t entries;
uint32_t references;

void *members;
};

struct dp_vs_ipset_conf_array {
int nipset;
struct dp_vs_ipset_conf ips[0];
struct ipset_info_array {
uint32_t nipset;
struct ipset_info infos[0];
} __attribute__((__packed__));

#endif /* __DPVS_IPSET_CONF_H__ */
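/* ---------------------------------------------------------------------------
 * Editor's sketch (not part of ipset.h): fill an ipset_param to add the
 * member 192.168.88.0/24 to an existing set named "blocked". The type string
 * "hash:net" and the sockopt channel used to send the request to dpvs are
 * assumptions, not taken from this header.
 * ------------------------------------------------------------------------- */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include "conf/ipset.h"

static void fill_ipset_add(struct ipset_param *param)
{
    memset(param, 0, sizeof(*param));

    snprintf(param->name, IPSET_MAXNAMELEN, "%s", "blocked");
    snprintf(param->type, IPSET_MAXNAMELEN, "%s", "hash:net"); /* assumed type name */
    param->opcode = IPSET_OP_ADD;
    param->option.family = AF_INET;

    /* a single network member: min_addr == max_addr, prefix length in cidr */
    inet_pton(AF_INET, "192.168.88.0", &param->range.min_addr.in);
    param->range.max_addr = param->range.min_addr;
    param->cidr = 24;
}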
51 changes: 51 additions & 0 deletions include/conf/kni.h
@@ -0,0 +1,51 @@
/*
* DPVS is a software load balancer (Virtual Server) based on DPDK.
*
* Copyright (C) 2021 iQIYI (www.iqiyi.com).
* All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
*/
#ifndef __DPVS_KNI_CONF_H__
#define __DPVS_KNI_CONF_H__

#include <net/if.h>
#include "conf/inet.h"

enum kni_data_type {
KNI_DTYPE_ADDR_FLOW = 1,
};

struct kni_addr_flow_entry {
int af;
union inet_addr addr;
};

struct kni_addr_flow_info {
int nentries;
struct kni_addr_flow_entry entries[0];
} __attribute__((__packed__));

struct kni_conf_param {
enum kni_data_type type;
char ifname[IFNAMSIZ];
union {
struct kni_addr_flow_entry flow;
} data;
} __attribute__((__packed__));

struct kni_info {
int len;
struct kni_conf_param entries[0];
} __attribute__((__packed__));

#endif /* __DPVS_KNI_CONF_H__ */
34 changes: 28 additions & 6 deletions include/conf/laddr.h
@@ -24,6 +24,7 @@

#include "inet.h"
#include "net/if.h"
#include "conf/match.h"
#include "conf/sockopts.h"

struct dp_vs_laddr_entry {
@@ -33,18 +34,17 @@ struct dp_vs_laddr_entry {
uint32_t nconns;
};

struct dp_vs_laddr_conf {
typedef struct dp_vs_laddr_conf {
/* identify service */
int af_s;
uint8_t proto;
union inet_addr vaddr;
uint16_t vport;
uint32_t fwmark;
char srange[256];
char drange[256];
char iifname[IFNAMSIZ];
char oifname[IFNAMSIZ];

struct dp_vs_match match;
lcoreid_t cid;
lcoreid_t index;

/* for set */
int af_l;
@@ -54,6 +54,28 @@ struct dp_vs_laddr_conf {
/* for get */
int nladdrs;
struct dp_vs_laddr_entry laddrs[0];
};
} dpvs_laddr_table_t;

#ifdef CONFIG_DPVS_AGENT
typedef struct dp_vs_laddr_detail {
uint32_t af;
uint32_t conns;
uint64_t nport_conflict;
union inet_addr addr;
char ifname[IFNAMSIZ];
} dpvs_laddr_detail_t;

typedef struct dp_vs_laddr_front {
uint32_t af;
uint32_t port;
uint32_t proto;
uint32_t fwmark;
uint32_t cid;
uint32_t count;
union inet_addr addr;
struct dp_vs_match match;
struct dp_vs_laddr_detail laddrs[0];
} dpvs_laddr_front_t;
#endif /* CONFIG_DPVS_AGENT */

#endif /* __DPVS_LADDR_CONF_H__ */
41 changes: 41 additions & 0 deletions include/conf/lldp.h
@@ -0,0 +1,41 @@
/*
* DPVS is a software load balancer (Virtual Server) based on DPDK.
*
* Copyright (C) 2021 iQIYI (www.iqiyi.com).
* All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
*/
#ifndef __DPVS_LLDP_CONF_H__
#define __DPVS_LLDP_CONF_H__

#include <net/if.h>
#include "conf/sockopts.h"

#define LLDP_MESSAGE_LEN 4096

#define DPVS_LLDP_NODE_LOCAL 0
#define DPVS_LLDP_NODE_NEIGH 1
#define DPVS_LLDP_NODE_MAX 2


struct lldp_param {
uint16_t node; /* DPVS_LLDP_NODE_xxx */
char ifname[IFNAMSIZ];
};

struct lldp_message {
struct lldp_param param;
char message[LLDP_MESSAGE_LEN];
};

#endif /* __DPVS_LLDP_CONF_H__ */
37 changes: 37 additions & 0 deletions include/conf/match.h
@@ -42,6 +42,41 @@ static inline bool is_empty_match(const struct dp_vs_match *match)
return !memcmp(match, &zero_match, sizeof(*match));
}

static inline int dp_vs_match_parse(const char *srange, const char *drange,
const char *iifname, const char *oifname,
int af, struct dp_vs_match *match)
{
int s_af = 0, d_af = 0, err;

memset(match, 0, sizeof(*match));

if (srange && strlen(srange)) {
err = inet_addr_range_parse(srange, &match->srange, &s_af);
if (err != EDPVS_OK)
return err;
}

if (drange && strlen(drange)) {
err = inet_addr_range_parse(drange, &match->drange, &d_af);
if (err != EDPVS_OK)
return err;
}

if (s_af && d_af && s_af != d_af) {
return EDPVS_INVAL;
}
match->af = s_af | d_af;

if (af && match->af && af != match->af) {
return EDPVS_INVAL;
}

snprintf(match->iifname, IFNAMSIZ, "%s", iifname ? : "");
snprintf(match->oifname, IFNAMSIZ, "%s", oifname ? : "");

return EDPVS_OK;
}

static inline int parse_match(const char *pattern, uint8_t *proto,
struct dp_vs_match *match)
{
@@ -58,6 +93,8 @@ static inline int parse_match(const char *pattern, uint8_t *proto,
*proto = IPPROTO_TCP;
} else if (strcmp(tok, "udp") == 0) {
*proto = IPPROTO_UDP;
} else if (strcmp(tok, "sctp") == 0) {
*proto = IPPROTO_SCTP;
} else if (strcmp(tok, "icmp") == 0) {
*proto = IPPROTO_ICMP;
} else if (strcmp(tok, "icmp6") == 0) {
22 changes: 13 additions & 9 deletions include/conf/neigh.h
@@ -33,15 +33,19 @@ enum {
};

struct dp_vs_neigh_conf {
int af;
uint8_t flag;
uint32_t state;
union inet_addr ip_addr;
struct ether_addr eth_addr;
uint32_t que_num;
char ifname[IFNAMSIZ];
uint8_t cid;
}__attribute__((__packed__));
int af;
uint32_t state;
union inet_addr ip_addr;
#ifdef __DPVS__
struct rte_ether_addr eth_addr;
#else
struct ether_addr eth_addr;
#endif
uint32_t que_num;
char ifname[IFNAMSIZ];
uint8_t flag;
uint8_t cid;
}__attribute__((__packed__, aligned(2)));

struct dp_vs_neigh_conf_array {
int neigh_nums;
37 changes: 29 additions & 8 deletions include/conf/netif.h
@@ -90,30 +90,33 @@ typedef struct netif_nic_list_get
/* basic nic info specified by port_id */
typedef struct netif_nic_basic_get
{
portid_t port_id;
char name[32];
char name[0x20];
char addr[0x20];
char link_status[0x10];
char link_duplex[0x10];
char link_autoneg[0x10];
uint32_t link_speed; /* ETH_SPEED_NUM_ */
uint8_t nrxq;
uint8_t ntxq;
char addr[32];
uint8_t padding[0x3];
uint8_t socket_id;
portid_t port_id;
uint16_t mtu;
uint32_t link_speed; /* ETH_SPEED_NUM_ */
char link_status[16];
char link_duplex[16];
char link_autoneg[16];
uint16_t promisc:1; /* promiscuous mode */
uint16_t allmulticast:1;
uint16_t fwd2kni:1;
uint16_t tc_egress:1;
uint16_t tc_ingress:1;
uint16_t ol_rx_ip_csum:1;
uint16_t ol_tx_ip_csum:1;
uint16_t ol_tx_tcp_csum:1;
uint16_t ol_tx_udp_csum:1;
uint16_t lldp:1;
uint16_t ol_tx_fast_free:1;
} netif_nic_basic_get_t;

/* nic statistics specified by port_id */
typedef struct netif_nic_stats_get {
portid_t port_id;
uint32_t mbuf_avail;/* Number of available mbuf in pktmempool */
uint32_t mbuf_inuse;/* Number of used mbuf in pktmempool */
uint64_t ipackets; /* Total number of successfully received packets. */
@@ -136,8 +139,22 @@ typedef struct netif_nic_stats_get {
/* Total number of successfully transmitted queue bytes. */
uint64_t q_errors[RTE_ETHDEV_QUEUE_STAT_CNTRS];
/* Total number of queue packets received that are dropped. */
uint16_t padding[0x3];
portid_t port_id;
} netif_nic_stats_get_t;

struct netif_nic_xstats_entry {
uint64_t id;
uint64_t val;
char name[64];
};

typedef struct netif_nic_xstats_get {
portid_t pid;
uint16_t nentries;
struct netif_nic_xstats_entry entries[0];
} netif_nic_xstats_get_t;

/* dev info specified by port_id */
struct netif_nic_dev_get
{
@@ -222,6 +239,8 @@ typedef struct netif_nic_set {
char macaddr[18];
uint16_t promisc_on:1;
uint16_t promisc_off:1;
uint16_t allmulticast_on:1;
uint16_t allmulticast_off:1;
uint16_t link_status_up:1;
uint16_t link_status_down:1;
uint16_t forward2kni_on:1;
@@ -230,6 +249,8 @@ typedef struct netif_nic_set {
uint16_t tc_egress_off:1;
uint16_t tc_ingress_on:1;
uint16_t tc_ingress_off:1;
uint16_t lldp_on:1;
uint16_t lldp_off:1;
} netif_nic_set_t;

typedef struct netif_bond_set {
37 changes: 37 additions & 0 deletions include/conf/netif_addr.h
@@ -0,0 +1,37 @@
/*
* DPVS is a software load balancer (Virtual Server) based on DPDK.
*
* Copyright (C) 2021 iQIYI (www.iqiyi.com).
* All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
*/
#ifndef __DPVS_NETIF_ADDR_CONF_H__
#define __DPVS_NETIF_ADDR_CONF_H__

enum {
HW_ADDR_F_FROM_KNI = 1, // from linux kni device in local layer
};

struct netif_hw_addr_entry {
char addr[18];
uint32_t refcnt;
uint16_t flags;
uint16_t sync_cnt;
} __attribute__((__packed__));

struct netif_hw_addr_array {
int count;
struct netif_hw_addr_entry entries[0];
} __attribute__((__packed__));

#endif