cmd-nsc-vpp: issues with NOT default NSM_NAME #1826

Open
richardstone opened this issue Jun 17, 2021 · 20 comments
Labels: bug (Something isn't working)

Comments

@richardstone

Hi!

When I set the NSM_NAME parameter of cmd-nsc-vpp to anything other than the default, I get this error:
Jun 17 16:19:42.714 [ERRO] [cmd:/bin/cmd-nsc-vpp] (19.1) proxyListener unable to listen on /tmp/memifproxy/endpoint-nsc-795886dc88-577t6-f96ec20f-ede0-499f-b0cc-819e8f735869/memif.socket: listen unixpacket /tmp/memifproxy/endpoint-nsc-795886dc88-577t6-f96ec20f-ede0-499f-b0cc-819e8f735869/memif.socket: bind: invalid argument

The interface seems to be in place for a second, but there are no neighbors and the pod keeps restarting.
vppctl show interface address
local0 (dn):
memif1/0 (up):
L3 172.16.1.96/32

I followed this guide: https://github.com/networkservicemesh/deployments-k8s/tree/main/examples/use-cases/Memif2Memif
If I don't set the NSM_NAME parameter, it works correctly.

Is it possible that the given name is not handled correctly somewhere? Or could you help me figure out where to look for the problem?

@denis-tingaikin denis-tingaikin added the bug Something isn't working label Jun 17, 2021
@denis-tingaikin
Member

@richardstone Could you share your changes to the deployment?

@richardstone
Author

richardstone commented Jun 17, 2021

Yes, that is exactly how I'd like to use it, but if I uncomment the mentioned part, it does not work.

Here is my deployment:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nsc
  labels:
    app: nsc
spec:
  selector:
    matchLabels:
      app: nsc
  replicas: 1
  template:
    metadata:
      labels:
        app: nsc
    spec:
      serviceAccountName: endpoint-nsc
      containers:
        - name: {{ template "endpoint.name" . }}-nsc
          image: {{ template "endpoint.nsc.image" . }}
          imagePullPolicy: {{ .Values.images.nsc.pullPolicy }}
          env:
            - name: NSM_REQUEST_TIMEOUT
              value: 1m
            - name: SPIFFE_ENDPOINT_SOCKET
              value: unix:///run/spire/sockets/agent.sock
            # - name: NSM_NAME
            #   valueFrom:
            #     fieldRef:
            #       fieldPath: metadata.name
            - name: NSM_NETWORK_SERVICES
              value: {{ .Values.type.nsc }}://icmp-responder/nsm-1
            - name: NSM_DIAL_TIMEOUT
              value: "60s"
            - name: NSM_REQUEST_TIMEOUT
              value: "300s"
          volumeMounts:
            - name: spire-agent-socket
              mountPath: /run/spire/sockets
              readOnly: true
            - name: nsm-socket
              mountPath: /var/lib/networkservicemesh
              readOnly: true
      volumes:
        - name: spire-agent-socket
          hostPath:
            path: /run/spire/sockets
            type: Directory
        - name: nsm-socket
          hostPath:
            path: /var/lib/networkservicemesh
            type: DirectoryOrCreate
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: endpoint-nsc

@denis-tingaikin
Member

OK, thanks! Could you also share logs from the container?

@richardstone
Author

Here is the log: cmd-nsc-vpp.txt

@denis-tingaikin
Member

@d-uzlov Could you have a look at this issue ASAP?

@denis-tingaikin denis-tingaikin added Planning Issue related to SOW ASAP The issue should be completed as soon as possible labels Jun 17, 2021
@d-uzlov
Contributor

d-uzlov commented Jun 18, 2021

@richardstone Could you provide info about the cluster and operating system you are using?

Also, which exact names did you test besides "endpoint-nsc"? Are the errors in logs the same for all of the names you tested? If not, could you post logs for those different cases?
Could you provide logs of a successful run with default settings?

Am I understanding correctly that short names also don't work? For example, just "nsc", as in the deployment config you posted.

I wasn't able to reproduce the issue with the name you provided. However, I was able to get the same error when the name is too long. On my system, names like "endpoint-nsc-7f9c9cddc9-hjsk5" are not too long; I had to add ~15 characters to trigger the error, but the limit may be different on your system.

@d-uzlov
Contributor

d-uzlov commented Jun 18, 2021

Here are the logs I get when I try to set the NSM_NAME to the value from your logs:

Jun 18 08:42:29.556 [INFO] [cmd:/bin/cmd-nsc-vpp] Setting env variable DLV_LISTEN_CMD_NSC_VPP to a valid dlv '--listen' value will cause the dlv debugger to execute this binary and listen as directed.
Jun 18 08:42:29.556 [INFO] [cmd:/bin/cmd-nsc-vpp] there are 5 phases which will be executed followed by a success message:
Jun 18 08:42:29.556 [INFO] [cmd:/bin/cmd-nsc-vpp] the phases include:
Jun 18 08:42:29.556 [INFO] [cmd:/bin/cmd-nsc-vpp] 1: get config from environment
Jun 18 08:42:29.556 [INFO] [cmd:/bin/cmd-nsc-vpp] 2: run vpp and get a connection to it
Jun 18 08:42:29.556 [INFO] [cmd:/bin/cmd-nsc-vpp] 3: retrieve spiffe svid
Jun 18 08:42:29.556 [INFO] [cmd:/bin/cmd-nsc-vpp] 4: create network service client
Jun 18 08:42:29.556 [INFO] [cmd:/bin/cmd-nsc-vpp] 5: connect to all passed services
Jun 18 08:42:29.556 [INFO] [cmd:/bin/cmd-nsc-vpp] a final success message with start time duration
Jun 18 08:42:29.556 [INFO] [cmd:/bin/cmd-nsc-vpp] executing phase 1: get config from environment (time since start: 31.6µs)
This application is configured via the environment. The following environment
variables can be used:

KEY                       TYPE                           DEFAULT                                           REQUIRED    DESCRIPTION
NSM_NAME                  String                         cmd-nsc-vpp                                                   Name of Endpoint
NSM_DIAL_TIMEOUT          Duration                       5s                                                            timeout to dial NSMgr
NSM_REQUEST_TIMEOUT       Duration                       15s                                                           timeout to request NSE
NSM_CONNECT_TO            URL                            unix:///var/lib/networkservicemesh/nsm.io.sock                url to connect to
NSM_MAX_TOKEN_LIFETIME    Duration                       24h                                                           maximum lifetime of tokens
NSM_NETWORK_SERVICES      Comma-separated list of URL                                                                  A list of Network Service Requests
Jun 18 08:42:29.557 [INFO] [cmd:/bin/cmd-nsc-vpp] Config: &main.Config{Name:"endpoint-nsc-7f9c9cddc9-hjsk5", DialTimeout:5000000000, RequestTimeout:60000000000, ConnectTo:url.URL{Scheme:"unix", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/var/lib/networkservicemesh/nsm.io.sock", RawPath:"", ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}, MaxTokenLifetime:86400000000000, NetworkServices:[]url.URL{url.URL{Scheme:"memif", Opaque:"", User:(*url.Userinfo)(nil), Host:"icmp-responder", Path:"/nsm-1", RawPath:"", ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}}
Jun 18 08:42:29.557 [INFO] [cmd:/bin/cmd-nsc-vpp] [duration:555.5µs] completed phase 1: get config from environment
Jun 18 08:42:29.557 [INFO] [cmd:/bin/cmd-nsc-vpp] executing phase 2: run vpp and get a connection to it (time since start: 613µs)
Jun 18 08:42:29.557 [INFO] Configuration file: "/etc/vpp/helper/vpp.conf" not found, using defaults
Jun 18 08:42:29.558 [INFO] [cmd:/bin/cmd-nsc-vpp] [duration:1.6467ms] completed phase 2: run vpp and get a connection to it
Jun 18 08:42:29.558 [INFO] [cmd:/bin/cmd-nsc-vpp] executing phase 3: retrieving svid, check spire agent logs if this is the last line you see (time since start: 2.3336ms)
Jun 18 08:42:29.557 [INFO] [cmd:vpp] vpp[11]: clib_elf_parse_file: open `linux-vdso.so.1': No such file or directory
Jun 18 08:42:29.557 [INFO] [cmd:vpp] vpp[11]: clib_sysfs_prealloc_hugepages:260: pre-allocating 64 additional 2048K hugepages on numa node 0
Jun 18 08:42:29.557 [INFO] [cmd:vpp] vpp[11]: buffer: vlib_physmem_shared_map_create: pmalloc_map_pages: failed to mmap 64 pages at 0x1000000000 fd 5 numa 0 flags 0x11: Cannot allocate memory
Jun 18 08:42:29.557 [INFO] [cmd:vpp]
Jun 18 08:42:29.557 [INFO] [cmd:vpp] vpp[11]: buffer: falling back to non-hugepage backed buffer pool
Jun 18 08:42:29.557 [INFO] [cmd:vpp] vpp[11]: vat-plug/load: vat_plugin_register: oddbuf plugin not loaded...
Jun 18 08:42:30.592 [INFO] SVID: "spiffe://example.org/ns/ns-mr7gh/sa/default"
Jun 18 08:42:30.592 [INFO] [cmd:/bin/cmd-nsc-vpp] [duration:1.0340512s] completed phase 3: retrieving svid
Jun 18 08:42:30.592 [INFO] [cmd:/bin/cmd-nsc-vpp] executing phase 4: create network service client (time since start: 1.0364296s)
Jun 18 08:42:30.592 [INFO] [cmd:/bin/cmd-nsc-vpp] executing phase 5: connect to all passed services (time since start: 1.0365073s)
Jun 18 08:42:30.593 [INFO] [cmd:/bin/cmd-nsc-vpp] (1) ⎆ sdk/pkg/networkservice/common/updatepath/updatePathClient.Request()
Jun 18 08:42:30.594 [INFO] [cmd:/bin/cmd-nsc-vpp] (1.1)   request={"connection":{"id":"endpoint-nsc-7f9c9cddc9-hjsk5-6c1ca3fd-01f3-4d6d-a3b1-c3e5b874122d","network_service":"icmp-responder"}}
Jun 18 08:42:30.594 [INFO] [cmd:/bin/cmd-nsc-vpp] (1.2)   request-diff={"connection":{"path":{"path_segments":{"+0":{"name":"endpoint-nsc-7f9c9cddc9-hjsk5","id":"endpoint-nsc-7f9c9cddc9-hjsk5-6c1ca3fd-01f3-4d6d-a3b1-c3e5b874122d"}}}}}
Jun 18 08:42:30.594 [INFO] [cmd:/bin/cmd-nsc-vpp] (2)  ⎆ sdk/pkg/networkservice/common/serialize/serializeClient.Request()
Jun 18 08:42:30.594 [INFO] [cmd:/bin/cmd-nsc-vpp] (3)   ⎆ sdk/pkg/networkservice/common/refresh/refreshClient.Request()
Jun 18 08:42:30.594 [INFO] [cmd:/bin/cmd-nsc-vpp] (4)    ⎆ sdk/pkg/networkservice/utils/metadata/metaDataClient.Request()
Jun 18 08:42:30.595 [INFO] [cmd:/bin/cmd-nsc-vpp] (5)     ⎆ sdk/pkg/networkservice/core/adapters/serverToClient.Request()
Jun 18 08:42:30.595 [INFO] [cmd:/bin/cmd-nsc-vpp] (6)      ⎆ sdk/pkg/networkservice/common/heal/healServer.Request()
Jun 18 08:42:30.595 [INFO] [cmd:/bin/cmd-nsc-vpp] (7)       ⎆ sdk/pkg/networkservice/common/clienturl/clientURLServer.Request()
Jun 18 08:42:30.595 [INFO] [cmd:/bin/cmd-nsc-vpp] (8)        ⎆ sdk/pkg/networkservice/common/connect/connectServer.Request()
Jun 18 08:42:30.598 [INFO] [cmd:/bin/cmd-nsc-vpp] (9)         ⎆ sdk/pkg/networkservice/utils/metadata/metaDataClient.Request()
Jun 18 08:42:30.599 [INFO] [cmd:/bin/cmd-nsc-vpp] (10)          ⎆ sdk/pkg/networkservice/core/next/nextClient.Request()
Jun 18 08:42:30.599 [INFO] [cmd:/bin/cmd-nsc-vpp] (11)           ⎆ sdk-vpp/pkg/networkservice/up/peerup/peerupClient.Request()
Jun 18 08:42:30.599 [INFO] [cmd:/bin/cmd-nsc-vpp] (12)            ⎆ sdk-vpp/pkg/networkservice/up/upClient.Request()
Jun 18 08:42:30.599 [INFO] [cmd:/bin/cmd-nsc-vpp] [duration:189.1µs] [vppapi:WantInterfaceEvents] (12.1)              completed
Jun 18 08:42:30.599 [INFO] [cmd:/bin/cmd-nsc-vpp] (13)             ⎆ sdk/pkg/networkservice/core/next/nextClient.Request()
Jun 18 08:42:30.599 [INFO] [cmd:/bin/cmd-nsc-vpp] (14)              ⎆ sdk-vpp/pkg/networkservice/connectioncontext/mtu/mtuClient.Request()
Jun 18 08:42:30.599 [INFO] [cmd:/bin/cmd-nsc-vpp] (14.1)                request-diff={"connection":{"context":{"MTU":9000}}}
Jun 18 08:42:30.599 [INFO] [cmd:/bin/cmd-nsc-vpp] (15)               ⎆ sdk-vpp/pkg/networkservice/connectioncontext/ipcontext/routes/routesClient.Request()
Jun 18 08:42:30.600 [INFO] [cmd:/bin/cmd-nsc-vpp] (16)                ⎆ sdk-vpp/pkg/networkservice/connectioncontext/ipcontext/ipaddress/ipaddressClient.Request()
Jun 18 08:42:30.600 [INFO] [cmd:/bin/cmd-nsc-vpp] (17)                 ⎆ sdk/pkg/networkservice/core/next/nextClient.Request()
Jun 18 08:42:30.600 [INFO] [cmd:/bin/cmd-nsc-vpp] (18)                  ⎆ sdk-vpp/pkg/networkservice/mechanisms/memif/memifClient.Request()
Jun 18 08:42:30.600 [INFO] [cmd:/bin/cmd-nsc-vpp] (18.1)                    request-diff={"mechanism_preferences":{"+0":{"cls":"LOCAL","type":"MEMIF"}}}
Jun 18 08:42:30.600 [INFO] [cmd:/bin/cmd-nsc-vpp] (19)                   ⎆ sdk-vpp/pkg/networkservice/mechanisms/memif/memifproxy/memifProxyClient.Request()
Jun 18 08:42:30.600 [INFO] [cmd:/bin/cmd-nsc-vpp] (20)                    ⎆ sdk/pkg/networkservice/common/mechanisms/sendfd/sendFDClient.Request()
Jun 18 08:42:30.600 [INFO] [cmd:/bin/cmd-nsc-vpp] (21)                     ⎆ sdk/pkg/networkservice/common/mechanisms/recvfd/recvFDClient.Request()
Jun 18 08:42:30.600 [INFO] [cmd:/bin/cmd-nsc-vpp] (22)                      ⎆ sdk/pkg/networkservice/common/heal/healClient.Request()
Jun 18 08:42:30.601 [INFO] [cmd:/bin/cmd-nsc-vpp] (23)                       ⎆ sdk/pkg/networkservice/common/null/nullClient.Request()
Jun 18 08:42:30.601 [INFO] [cmd:/bin/cmd-nsc-vpp] (24)                        ⎆ api/pkg/api/networkservice/networkServiceClient.Request()
Jun 18 08:42:31.584 [INFO] [cmd:/bin/cmd-nsc-vpp] (24.1)                          response={"id":"endpoint-nsc-7f9c9cddc9-hjsk5-6c1ca3fd-01f3-4d6d-a3b1-c3e5b874122d","network_service":"icmp-responder","mechanism":{"cls":"LOCAL","type":"MEMIF","parameters":{"inodeURL":"inode://1048793/163769"}},"context":{"ip_context":{"src_ip_addrs":["172.16.1.101/32"],"dst_ip_addrs":["172.16.1.100/32"],"src_routes":[{"prefix":"172.16.1.100/32"}],"dst_routes":[{"prefix":"172.16.1.101/32"}]},"MTU":9000},"labels":{"nodeName":"kind-worker"},"path":{"path_segments":[{"name":"endpoint-nsc-7f9c9cddc9-hjsk5","id":"endpoint-nsc-7f9c9cddc9-hjsk5-6c1ca3fd-01f3-4d6d-a3b1-c3e5b874122d","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20tc3lzdGVtL3NhL2RlZmF1bHQiLCJleHAiOjE2MjQwMDc2MzYsInN1YiI6InNwaWZmZTovL2V4YW1wbGUub3JnL25zL25zLW1yN2doL3NhL2RlZmF1bHQifQ.a2WOQb1X4EXBOBNLBF-8uWKp7kRqFC3W3XoF4mKt53LQmTyRqwEJgHq7TnoNYCMkTt6vyETTea6_3JKYMxH-Sw","expires":{"seconds":1624007636}},{"name":"nsmgr-jt54m","id":"2ffda5b0-2497-4d18-99b1-825e74a08c48","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20tc3lzdGVtL3NhL2RlZmF1bHQiLCJleHAiOjE2MjQwMDgzNDMsInN1YiI6InNwaWZmZTovL2V4YW1wbGUub3JnL25zL25zbS1zeXN0ZW0vc2EvZGVmYXVsdCJ9.OIm7VMS88AEE2goqvVAJj9N0kpZEkJ9mc9SoxYPNhsXszC7_khM6QdUYs2AA46_sUbie9QQ7fokYOIuQwXNuiA","expires":{"seconds":1624008343}},{"name":"forwarder-vpp-z49mx","id":"8508bbd8-bdd9-4a06-8f00-f5fc31803316","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20tc3lzdGVtL3NhL2RlZmF1bHQiLCJleHAiOjE2MjQwMDgzNDMsInN1YiI6InNwaWZmZTovL2V4YW1wbGUub3JnL25zL25zbS1zeXN0ZW0vc2EvZGVmYXVsdCJ9.n1wOTd6G0E8XzSLA5nvmdUAFB_M3IM2dsLUVUT8MbknGBTeiMjZS-IuWYrvm04Y5HeAX_w9njfY0pB-UZFdcwQ","expires":{"seconds":1624008343},"metrics":{"client_drops":"0","client_rx_bytes":"0","client_rx_packets":"0","client_tx_bytes":"0","client_tx_packets":"0"}},{"name":"nsmgr-jt54m","id":"6a37f647-0305-4be4-b5b8-3da627134f
ac","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9ucy1tcjdnaC9zYS9kZWZhdWx0IiwiZXhwIjoxNjI0MDA4MzQzLCJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20tc3lzdGVtL3NhL2RlZmF1bHQifQ.Yv1neG-lDEeLWvJKJQWVH1NOoJA3d-P9WwXZR5sLqD1nU6QZWNDda4jDzN9Jgh7xZ0DG1ioXf8htkPT7aQdZgQ","expires":{"seconds":1624007636}},{"name":"nse-memif-9b6679887-8xbj2","id":"bbef3225-f148-4013-8097-036fe58134e7","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20tc3lzdGVtL3NhL2RlZmF1bHQiLCJleHAiOjE2MjQwMDc2MzYsInN1YiI6InNwaWZmZTovL2V4YW1wbGUub3JnL25zL25zLW1yN2doL3NhL2RlZmF1bHQifQ.QAgt7lw4RyAi7qQN_p6nA0NBN8jt6lDz2GblGn58SfdqQVxg4auGSOfkc-IXaaf3kaYHFFEamZF-K4uhj7VfoA","expires":{"seconds":1624007636}}]},"network_service_endpoint_name":"c632436f-5d67-44f1-b984-0d13f91383c8-nse-memif-9b6679887-8xbj2","payload":"ETHERNET"}
Jun 18 08:42:31.584 [INFO] [cmd:/bin/cmd-nsc-vpp] (21.1)                       response-diff={"mechanism":{"parameters":{"inodeURL":"file:///proc/1/fd/10"}}}
Jun 18 08:42:31.585 [INFO] [cmd:/bin/cmd-nsc-vpp] (19.1)                     response-diff={"mechanism":{"parameters":{"inodeURL":"file:///tmp/memifproxy/endpoint-nsc-7f9c9cddc9-hjsk5-6c1ca3fd-01f3-4d6d-a3b1-c3e5b874122d/memif.socket"}}}
time="2021-06-18T08:42:31Z" level=info msg="No subscription found for the notification message." msg_id=81 msg_size=19
time="2021-06-18T08:42:31Z" level=info msg="No subscription found for the notification message." msg_id=81 msg_size=19
Jun 18 08:42:29.557 [INFO] [cmd:vpp] vpp[11]: memif_plugin: clib_file_add fd 11 private_data 0 idx 4
Jun 18 08:42:29.557 [INFO] [cmd:vpp] vpp[11]: memif_plugin: clib_file_add fd 14 private_data 0 idx 5
Jun 18 08:42:31.595 [INFO] [cmd:/bin/cmd-nsc-vpp] (8.1)          request-diff={"connection":{"context":{"ip_context":{"dst_ip_addrs":{"+0":"172.16.1.100/32"},"dst_routes":{"+0":{"prefix":"172.16.1.101/32"}},"src_ip_addrs":{"+0":"172.16.1.101/32"},"src_routes":{"+0":{"prefix":"172.16.1.100/32"}}}},"labels":{"+nodeName":"kind-worker"},"mechanism":{"cls":"LOCAL","parameters":{"+inodeURL":"file:///tmp/memifproxy/endpoint-nsc-7f9c9cddc9-hjsk5-6c1ca3fd-01f3-4d6d-a3b1-c3e5b874122d/memif.socket"},"type":"MEMIF"},"network_service_endpoint_name":"c632436f-5d67-44f1-b984-0d13f91383c8-nse-memif-9b6679887-8xbj2","path":{"path_segments":{"+1":{"name":"nsmgr-jt54m","id":"2ffda5b0-2497-4d18-99b1-825e74a08c48","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20tc3lzdGVtL3NhL2RlZmF1bHQiLCJleHAiOjE2MjQwMDgzNDMsInN1YiI6InNwaWZmZTovL2V4YW1wbGUub3JnL25zL25zbS1zeXN0ZW0vc2EvZGVmYXVsdCJ9.OIm7VMS88AEE2goqvVAJj9N0kpZEkJ9mc9SoxYPNhsXszC7_khM6QdUYs2AA46_sUbie9QQ7fokYOIuQwXNuiA","expires":{"seconds":1624008343}},"+2":{"name":"forwarder-vpp-z49mx","id":"8508bbd8-bdd9-4a06-8f00-f5fc31803316","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20tc3lzdGVtL3NhL2RlZmF1bHQiLCJleHAiOjE2MjQwMDgzNDMsInN1YiI6InNwaWZmZTovL2V4YW1wbGUub3JnL25zL25zbS1zeXN0ZW0vc2EvZGVmYXVsdCJ9.n1wOTd6G0E8XzSLA5nvmdUAFB_M3IM2dsLUVUT8MbknGBTeiMjZS-IuWYrvm04Y5HeAX_w9njfY0pB-UZFdcwQ","expires":{"seconds":1624008343},"metrics":{"client_drops":"0","client_rx_bytes":"0","client_rx_packets":"0","client_tx_bytes":"0","client_tx_packets":"0"}},"+3":{"name":"nsmgr-jt54m","id":"6a37f647-0305-4be4-b5b8-3da627134fac","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9ucy1tcjdnaC9zYS9kZWZhdWx0IiwiZXhwIjoxNjI0MDA4MzQzLCJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20tc3lzdGVtL3NhL2RlZmF1bHQifQ.Yv1neG-lDEeLWvJKJQWVH1NOoJA3d-P9WwXZR5sLqD1nU6QZWNDda4jDzN9Jgh7xZ0DG1ioXf8htkPT7aQdZgQ","expires":{"seconds":1624007636}},"+4":{"name":"n
se-memif-9b6679887-8xbj2","id":"bbef3225-f148-4013-8097-036fe58134e7","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20tc3lzdGVtL3NhL2RlZmF1bHQiLCJleHAiOjE2MjQwMDc2MzYsInN1YiI6InNwaWZmZTovL2V4YW1wbGUub3JnL25zL25zLW1yN2doL3NhL2RlZmF1bHQifQ.QAgt7lw4RyAi7qQN_p6nA0NBN8jt6lDz2GblGn58SfdqQVxg4auGSOfkc-IXaaf3kaYHFFEamZF-K4uhj7VfoA","expires":{"seconds":1624007636}},"0":{"expires":{"seconds":1624007636},"token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20tc3lzdGVtL3NhL2RlZmF1bHQiLCJleHAiOjE2MjQwMDc2MzYsInN1YiI6InNwaWZmZTovL2V4YW1wbGUub3JnL25zL25zLW1yN2doL3NhL2RlZmF1bHQifQ.a2WOQb1X4EXBOBNLBF-8uWKp7kRqFC3W3XoF4mKt53LQmTyRqwEJgHq7TnoNYCMkTt6vyETTea6_3JKYMxH-Sw"}}},"payload":"ETHERNET"},"mechanism_preferences":{"-0":{"cls":"LOCAL","type":"MEMIF"}}}

@denis-tingaikin denis-tingaikin removed ASAP The issue should be completed as soon as possible Planning Issue related to SOW labels Jun 18, 2021
@richardstone
Author

Hi!

Thanks for the fast response.
You're right, the name was much longer when it failed. The strange thing is that the kernel2kernel example works fine with the longer name. Is there any chance you could make cmd-nsc-vpp accept a longer name, just like cmd-nsc? Or do you know where this limitation comes from in the VPP image?

@d-uzlov
Contributor

d-uzlov commented Jun 18, 2021

Oh, it's good to know that we correctly identified the cause of the issue!

The limitation comes from the fact that memif connections use Unix domain sockets whose file names contain the connection ID, and cmd-nsc-vpp uses the name from its config as part of the connection ID.
Linux has a hard limit on the length of a Unix socket path (the sun_path field of struct sockaddr_un is 108 bytes, including the terminating NUL), so when the container name is very long, the socket path can exceed the limit.

I guess we should either change the way we generate the socket name, or find some workaround.

@richardstone
Author

richardstone commented Jun 18, 2021

Yes, thanks a lot for the investigation!

Is it possible to change how the socket name is generated, so that memif and kernel NSCs have the same name length limits?

@d-uzlov
Contributor

d-uzlov commented Jun 18, 2021

Yeah, I believe we will fix it to remove this limitation.

@richardstone
Author

Very good, thanks in advance!

@edwarnicke
Member

@richardstone @d-uzlov Good catch.

@d-uzlov Is there any reason to set a Connection.ID at all in the cmd-nsc rather than letting the normal connection id generation kick in?

@edwarnicke
Member

@richardstone Looking more closely at this: while I wholeheartedly support your recommended workaround of not including the name in the connection ID in cmd-nsc as a fix for NSM 1.0, we need a more comprehensive fix after NSM 1.0 :)

I am seeing some reports that using relative paths can provide a partial workaround. If true, that should give us a reliable way to work around the issue. Thoughts?

@richardstone
Author

@edwarnicke The best for me would be that I'd be able to use the same long name for the memif nsc that I use for the kernel one.

As a workaround, I started looking for a way to use a substring of the full pod name as the NSM_NAME variable, but I haven't found a solution so far.
I also suspect that having many NSCs with the same name (if I leave the parameter out of my deployment, NSM_NAME gets its default value in every replica of the NSC) would cause issues when I increase the number of replicas, so it would be good to keep the name unique.

@d-uzlov
Contributor

d-uzlov commented Jun 18, 2021

> @d-uzlov Is there any reason to set a Connection.ID at all in the cmd-nsc rather than letting the normal connection id generation kick in?

I don't think there is any real benefit in manually setting the connection ID here. We already have a name for each path segment, which is what is usually used to identify which components the connection passed through.

But we probably don't want to restrict which connection IDs users can use, so I think memif should support long IDs.

> I am seeing some reports that using relative paths can provide a partial workaround. If true, that should give us a reliable way to work around the issue. Thoughts?

I was thinking about keeping a map of [connection ID -> UUID], and using these UUIDs for socket names instead of connection IDs.
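A minimal sketch of that idea in Go, assuming a single process owns all memif sockets and that socket directories live under /tmp/memifproxy as in the logs. The socketNames type, its methods, and newUUID are hypothetical; newUUID hand-rolls a version 4 UUID with crypto/rand only to keep the example dependency-free.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"sync"
)

// newUUID returns a random (version 4) UUID string using only the
// standard library.
func newUUID() string {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err)
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
}

// socketNames is a hypothetical registry mapping arbitrary (possibly very
// long) connection IDs to short, fixed-length socket directory names.
type socketNames struct {
	mu    sync.Mutex
	names map[string]string
}

func newSocketNames() *socketNames {
	return &socketNames{names: make(map[string]string)}
}

// pathFor returns a stable socket path for connID, allocating a UUID the
// first time the ID is seen so that refreshes of the same connection reuse it.
func (s *socketNames) pathFor(connID string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	id, ok := s.names[connID]
	if !ok {
		id = newUUID()
		s.names[connID] = id
	}
	return "/tmp/memifproxy/" + id + "/memif.socket"
}

// release forgets connID, e.g. when the connection is closed.
func (s *socketNames) release(connID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.names, connID)
}

func main() {
	s := newSocketNames()
	longID := "a-very-long-pod-name-that-would-not-fit-6c1ca3fd-01f3-4d6d-a3b1-c3e5b874122d"
	p := s.pathFor(longID)
	fmt.Println(p, len(p)) // length is always 16 + 36 + 13 = 65
	fmt.Println(p == s.pathFor(longID))
}
```

The design choice here is that the socket path length no longer depends on the connection ID at all: it is always 65 bytes, however long NSM_NAME is. The cost is extra state that has to be released when a connection closes.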

Relative paths should work too, though I'm not sure they will be convenient, since we would be changing the current working directory of the program, which some clients may not expect. Also, I haven't researched this properly, but we could run into issues with multithreading, because the working directory is shared by all threads of the process.

@edwarnicke
Member

@richardstone Net-net: are you OK for NSM 1.0 if we simply don't use the pod name in the connection ID, and revisit a better solution after NSM 1.0?

edwarnicke added a commit to edwarnicke/cmd-nsc-vpp that referenced this issue Jun 18, 2021
This is a workaround for:
networkservicemesh/deployments-k8s#1826

Signed-off-by: Ed Warnicke <hagbard@gmail.com>
@richardstone
Author

@edwarnicke It's okay for me. Thanks!

denis-tingaikin pushed a commit to networkservicemesh/cmd-nsc-vpp that referenced this issue Jun 18, 2021
This is a workaround for:
networkservicemesh/deployments-k8s#1826

Signed-off-by: Ed Warnicke <hagbard@gmail.com>
nsmbot pushed a commit that referenced this issue Jun 18, 2021
…cmd-nsc-vpp@main networkservicemesh/cmd-nsc-vpp#

networkservicemesh/cmd-nsc-vpp PR link: https://github.com/networkservicemesh/cmd-nsc-vpp/pull/

networkservicemesh/cmd-nsc-vpp commit message:
commit 43bb24c9953cf220ea08e52ab50fa853beb9084a
Author: Ed Warnicke <hagbard@gmail.com>
Date:   Fri Jun 18 08:39:34 2021 -0500

    Fix use of nsc name in connection ignored (#174)

    This is a workaround for:
    #1826

    Signed-off-by: Ed Warnicke <hagbard@gmail.com>

Signed-off-by: NSMBot <nsmbot@networkservicmesh.io>
@edwarnicke
Member

@richardstone The temporary fix is in. Let's leave this issue open to get a longer-term fix.


4 participants