Posture Check #1803

Open
GascPT opened this issue Apr 5, 2024 · 23 comments
Labels
bug Something isn't working client self-hosting

Comments

@GascPT

GascPT commented Apr 5, 2024

Describe the problem

I have a setup with multiple sites. In each site, one peer acts as a gateway and advertises that site's routes.
I have set a posture check so that the routes are not advertised to a mobile peer while it is located locally on the site.
The posture check looks at the IP range.

When this endpoint goes home and reconnects to NetBird, it does not acquire the routes to Site A.

To correct this, we need to remove the peer from the distribution group and add it again in the NetBird admin panel.

Expected behavior

We expected the posture check to be re-evaluated when the peer changes networks.

Are you using NetBird Cloud?

No, it is a self-hosted deployment

NetBird version

All versions we tested, from v0.26.3 to v0.27.1

@pappz
Contributor

pappz commented Apr 8, 2024

Hello!

Could you send me more details about your system?

  • Peer’s OS
  • Policy in which the posture is applied, how you configured it
  • The network route configuration

@GascPT
Author

GascPT commented Apr 8, 2024

The peer OS is Ubuntu, but it also happens on macOS and Windows.

The policy checks the IP range the peer is connecting from. If the peer is connecting from the IP range of Site A, the ACL does not allow the connection between the peers.

And I am applying this ACL to the network route.


@bcmmbaga
Contributor

bcmmbaga commented Apr 8, 2024

Hello @GascPT, could you confirm whether the peer is part of the source group(s) specified in the access control policy? Also, please check if the peer contains the local network 10.10.0.0/24 by running the command ifconfig -a.

@GascPT
Author

GascPT commented Apr 9, 2024

Yes, the peer is part of the source group.
The policy applies correctly when this peer is in the Site A network 10.10.0.0/24: it doesn't acquire the route.

When the peer moves to another place with a network range other than 10.10.0.0/24, the posture check is not re-evaluated. We need to remove the peer from the group and add it again.

@bcmmbaga
Contributor

bcmmbaga commented Apr 9, 2024

The posture check could have already been re-evaluated but failed because the peer still contains the Site A network in its network interfaces.

Please share the result of ifconfig -a when the peer is connected to Site A and when it is connected to another network.

@GascPT
Author

GascPT commented Apr 9, 2024

When it is connected at Site A:

enx0826ae3dc148: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.10.0.104  netmask 255.255.255.0  broadcast 10.10.0.255
        inet6 2001:8a0:8015:10:a3cb:1ebc:c782:c7ee  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::5c82:bd12:b67a:3cd6  prefixlen 64  scopeid 0x20<link>
        inet6 2001:8a0:8015:10:d6b4:3f84:fd2a:f94a  prefixlen 64  scopeid 0x0<global>
        ether 08:26:ae:3d:c1:48  txqueuelen 1000  (Ethernet)
        RX packets 136548  bytes 90034413 (90.0 MB)
        RX errors 0  dropped 4878  overruns 0  frame 0
        TX packets 153699  bytes 37866299 (37.8 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

wt0: flags=209<UP,POINTOPOINT,RUNNING,NOARP>  mtu 1280
        inet 100.95.28.93  netmask 255.255.0.0  destination 100.95.28.93
        unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  txqueuelen 1000  (UNSPEC)
        RX packets 865212  bytes 928817092 (928.8 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 510435  bytes 53566908 (53.5 MB)
        TX errors 2957  dropped 78 overruns 0  carrier 0  collisions 0

When it is connected elsewhere via WiFi:

wlp0s20f3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.7.154  netmask 255.255.255.0  broadcast 192.168.7.255
        inet6 fe80::cf8b:ae41:418e:d58b  prefixlen 64  scopeid 0x20<link>
        ether 74:04:f1:43:9d:51  txqueuelen 1000  (Ethernet)
        RX packets 12929526  bytes 11159631682 (11.1 GB)
        RX errors 0  dropped 183229  overruns 0  frame 0
        TX packets 1380964  bytes 2762954292 (2.7 GB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

wt0: flags=209<UP,POINTOPOINT,RUNNING,NOARP>  mtu 1280
        inet 100.95.28.93  netmask 255.255.0.0  destination 100.95.28.93
        unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  txqueuelen 1000  (UNSPEC)
        RX packets 6  bytes 544 (544.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 10  bytes 1136 (1.1 KB)
        TX errors 44  dropped 11 overruns 0  carrier 0  collisions 0

I waited a couple of minutes, but the peer doesn't acquire the route, and netbird status -d doesn't show the connection to the peer either.

$ ip route show table netbird
10.0.0.0/24 dev wt0 
10.27.0.0/24 dev wt0 
10.27.10.0/24 dev wt0 
10.27.16.0/24 dev wt0 
10.55.0.0/24 dev wt0 
10.55.10.0/24 dev wt0 
10.55.16.0/24 dev wt0

bcmmbaga added the bug label Apr 9, 2024
@bcmmbaga
Contributor

bcmmbaga commented Apr 9, 2024

Thanks, this could potentially be a bug. I will try to reproduce the issue. Can you confirm whether, when connecting to the other network, you stopped NetBird and started it again, or only changed the network?

@GascPT
Author

GascPT commented Apr 9, 2024

We tried both scenarios:
changing networks without stopping the service, and changing networks with the service stopped. The result was the same.

@mlsmaycon
Collaborator

This should be fixed with: #1693

@GascPT
Author

GascPT commented Apr 29, 2024

Waiting for the release :) to try.

GascPT closed this as completed Apr 29, 2024
@GascPT
Author

GascPT commented May 2, 2024

The issue persists in version 0.27.4.

GascPT reopened this May 2, 2024
@bcmmbaga
Contributor

bcmmbaga commented May 2, 2024

Hi @GascPT, #1693 has not yet been released and is currently under review.

@GascPT
Author

GascPT commented May 2, 2024

Sorry, I didn't see that.

@mohamed-essam
Contributor

@bcmmbaga
I was able to reproduce the issue, and I think I understand the root cause.
When any source peer fails a posture check, the network map is updated for the routing peers, but when the check succeeds later, the network map isn't updated for them. For now, the workaround is to do literally anything that updates the networkMap and syncs it to the routing peers.

Here are some easy steps to reproduce it (done on self-hosted 0.34.0):

  1. Have 2 peers, each in a separate group (Source and Dest).
  2. The peer in Source runs netbird 0.33.0.
  3. The peer in Dest runs netbird 0.34.0 (not strictly required).
  4. Only 1 policy, ALL, with no posture check, with Source as Source group, and Dest as Destination.
  5. Check connectivity between peers (should be connected normally).
  6. Add posture check to policy with min. netbird version 0.34.0
  7. Check connectivity between peers (should be no connection).
  8. Update Source peer from 0.33.0 to 0.34.0.
  9. Check connectivity (should have connection, but doesn't).
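
The minimum-version condition used in steps 6 to 8 can be pictured with a small Go sketch. It assumes plain MAJOR.MINOR.PATCH version strings, and `parseVersion` and `meetsMinVersion` are hypothetical helpers written for this illustration rather than NetBird functions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "0.34.0" (or "v0.34.0") into its three numeric parts.
// Hypothetical helper: it only handles plain MAJOR.MINOR.PATCH strings.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version format: %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("bad version component %q: %w", p, err)
		}
		out[i] = n
	}
	return out, nil
}

// meetsMinVersion reports whether peerVersion >= minVersion.
func meetsMinVersion(peerVersion, minVersion string) (bool, error) {
	peer, err := parseVersion(peerVersion)
	if err != nil {
		return false, err
	}
	minimum, err := parseVersion(minVersion)
	if err != nil {
		return false, err
	}
	// Compare major, then minor, then patch; the first difference decides.
	for i := range peer {
		if peer[i] != minimum[i] {
			return peer[i] > minimum[i], nil
		}
	}
	return true, nil
}

func main() {
	for _, v := range []string{"0.33.0", "0.34.0"} {
		ok, err := meetsMinVersion(v, "0.34.0")
		fmt.Println(v, "meets min 0.34.0:", ok, err)
	}
}
```

The check itself flips from false to true once the Source peer is upgraded, so the missing connectivity in step 9 points at how the result is propagated rather than at the comparison.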

Source peer logs show

2024-12-04T14:46:40Z INFO [peer: 7NwRG57B/W0XwBtxz8CuFKdiAJV+rDDzWV2N18vMD3A=] client/internal/peer/guard/guard.go:138: start listen for reconnect events...
Peers detail:
 ip-172-31-39-178.netbird.selfhosted:
  NetBird IP: 100.120.178.92
  Public key: 7NwRG57B/W0XwBtxz8CuFKdiAJV+rDDzWV2N18vMD3A=
  Status: Disconnected
  -- detail --
  Connection type: 
  ICE candidate (Local/Remote): -/-
  ICE candidate endpoints (Local/Remote): -/-
  Relay server address: 
  Last connection update: -
  Last WireGuard handshake: -
  Transfer status (received/sent) 0 B/0 B
  Quantum resistance: false
  Routes: -
  Latency: 0s

OS: linux/amd64
Daemon version: 0.34.0
CLI version: 0.34.0
Management: Connected to https://REDACTED:443
Signal: Connected to https://REDACTED:443
Relays: 
  [stun:REDACTED:3478] is Available
  [turn:REDACTED:3478?transport=udp] is Available
  [rels:REDACTED//] is Available
Nameservers: 
FQDN: ip-172-31-32-232.netbird.selfhosted
NetBird IP: 100.120.154.137/16
Interface type: Kernel
Quantum resistance: false
Routes: -
Peers count: 0/1 Connected

Destination peer logs show nothing regarding source peer.

2024-12-04T14:46:40Z ERRO signal/client/grpc.go:413: error while handling message of Peer [key: /rUhBSUgEDUq50dhe9l9cqljamdrSWjkfvns9aM9QFA=] error: [wrongly addressed message /rUhBSUgEDUq50dhe9l9cqljamdrSWjkfvns9aM9QFA=]
2024-12-04T14:46:50Z DEBG client/internal/engine.go:1381: received management probe request, healthy: true
2024-12-04T14:46:50Z DEBG client/internal/engine.go:1373: received signal probe request, healthy: true
2024-12-04T14:46:50Z DEBG util/net/dialer_dial.go:52: Dialing udp REDACTED:3478
2024-12-04T14:46:50Z DEBG client/internal/relay/relay.go:66: stun probe received address from stun:REDACTED:3478: REDACTED:56740
2024-12-04T14:46:50Z DEBG util/net/listener_listen.go:119: Listener resolved IP for REDACTED:3478: REDACTED
2024-12-04T14:46:50Z DEBG client/internal/relay/relay.go:158: turn probe relay address from turn:REDACTED:3478?transport=udp: REDACTED:49302
2024-12-04T14:46:50Z DEBG client/internal/engine.go:1401: received relay probe request, healthy: true
2024-12-04T14:46:50Z DEBG client/internal/engine.go:1408: received wg probe request
2024-12-04T14:47:56Z DEBG client/internal/engine.go:1381: received management probe request, healthy: true
2024-12-04T14:47:56Z DEBG client/internal/engine.go:1373: received signal probe request, healthy: true
2024-12-04T14:47:56Z DEBG util/net/dialer_dial.go:52: Dialing udp REDACTED:3478
2024-12-04T14:47:56Z DEBG client/internal/relay/relay.go:66: stun probe received address from stun:REDACTED:3478: REDACTED:44685
2024-12-04T14:47:56Z DEBG util/net/listener_listen.go:119: Listener resolved IP for REDACTED:3478: REDACTED
2024-12-04T14:47:56Z DEBG client/internal/relay/relay.go:158: turn probe relay address from turn:REDACTED:3478?transport=udp: REDACTED:53835
2024-12-04T14:47:56Z DEBG client/internal/engine.go:1401: received relay probe request, healthy: true
2024-12-04T14:47:56Z DEBG client/internal/engine.go:1408: received wg probe request

and its status shows it doesn't know about Source peer:

Peers detail:
OS: linux/amd64
Daemon version: 0.34.0
CLI version: 0.34.0
Management: Connected to https://REDACTED:443
Signal: Connected to https://REDACTED:443
Relays: 
  [stun:REDACTED:3478] is Available
  [turn:REDACTED:3478?transport=udp] is Available
  [rels://REDACTED:443] is Available
Nameservers: 
FQDN: ip-172-31-39-178.netbird.selfhosted
NetBird IP: 100.120.178.92/16
Interface type: Kernel
Quantum resistance: false
Routes: -
Peers count: 0/0 Connected

@mohamed-essam
Contributor

I'm guessing fixing this would require either:

  1. Incrementing the network serial when a peer's posture check status changes (and subsequently syncing the new network to peers).
  2. Syncing peers in a peer's networkMap when its posture check status changes, and ignoring networkSerial in clients for peer updates (just like this part in engine.go).

In both cases, management needs to track peer-postureCheck status somehow.
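
A minimal Go sketch of option 1, assuming management keeps the last posture result per peer. The `account` type, its fields, and `recordPosture` are hypothetical names for this illustration and do not correspond to the real management code.

```go
package main

import "fmt"

// account is a hypothetical stand-in for management-side account state.
type account struct {
	networkSerial uint64
	postureOK     map[string]bool // last known posture result per peer ID
}

// recordPosture stores the latest posture result for a peer and reports
// whether it differs from the previous one. Any transition (pass to fail
// or fail to pass) bumps the serial so peers would receive a fresh map.
// A peer seen for the first time also counts as a change.
func (a *account) recordPosture(peerID string, ok bool) bool {
	prev, seen := a.postureOK[peerID]
	a.postureOK[peerID] = ok
	if seen && prev == ok {
		return false
	}
	a.networkSerial++
	return true
}

func main() {
	acct := &account{postureOK: make(map[string]bool)}
	// Fail, fail again (no change), then pass after the peer is upgraded.
	for _, result := range []bool{false, false, true} {
		changed := acct.recordPosture("source-peer", result)
		fmt.Printf("ok=%v changed=%v serial=%d\n", result, changed, acct.networkSerial)
	}
}
```

The point of the sketch is symmetry: the fail-to-pass transition bumps the serial just like pass-to-fail does, which is the direction that goes missing in the reproduction above.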

What do you think?

@mlsmaycon
Collaborator

@mohamed-essam are you running a self-hosted version of the management service?

@mohamed-essam
Contributor

@mlsmaycon Yes, running on 0.34.0, made a small fresh install for testing some issues.

@mlsmaycon
Collaborator

ok, we included a fix for the posture checks in the 0.34 version of the management server, have you tested that in the upgraded version?

@mohamed-essam
Contributor

Yes, just double checked now on management 0.34.0 with the exact scenario above

management-1  | 2024-12-05T12:50:11Z INFO [context: SYSTEM] management/cmd/management.go:337: management server version 0.34.0

Destination Peer Status:

Peers detail:
OS: linux/amd64
Daemon version: 0.34.0
CLI version: 0.34.0
Management: Connected to https://REDACTED:443
Signal: Connected to https://REDACTED:443
Relays: 
  [stun:REDACTED:3478] is Available
  [turn:REDACTED:3478?transport=udp] is Available
  [rels://REDACTED:443] is Available
Nameservers: 
FQDN: ip-172-31-39-178.netbird.selfhosted
NetBird IP: 100.120.178.92/16
Interface type: Kernel
Quantum resistance: false
Routes: -
Peers count: 0/0 Connected

Source Peer Status:

Peers detail:
 ip-172-31-39-178.netbird.selfhosted:
  NetBird IP: 100.120.178.92
  Public key: 7NwRG57B/W0XwBtxz8CuFKdiAJV+rDDzWV2N18vMD3A=
  Status: Disconnected
  -- detail --
  Connection type: 
  ICE candidate (Local/Remote): -/-
  ICE candidate endpoints (Local/Remote): -/-
  Relay server address: 
  Last connection update: -
  Last WireGuard handshake: -
  Transfer status (received/sent) 0 B/0 B
  Quantum resistance: false
  Routes: -
  Latency: 0s

OS: linux/amd64
Daemon version: 0.34.0
CLI version: 0.34.0
Management: Connected to https://REDACTED:443
Signal: Connected to https://REDACTED:443
Relays: 
  [stun:REDACTED:3478] is Available
  [turn:REDACTED:3478?transport=udp] is Available
  [rels://REDACTED:443] is Available
Nameservers: 
FQDN: ip-172-31-32-232.netbird.selfhosted
NetBird IP: 100.120.154.137/16
Interface type: Kernel
Quantum resistance: false
Routes: -
Peers count: 0/1 Connected

@mlsmaycon
Copy link
Collaborator

Thanks for confirming that. We will have a look at it.

@mohamed-essam
Contributor

mohamed-essam commented Dec 5, 2024

@mlsmaycon Hello, I found the issue:
When a peer logs in, UpdateMetaIfNew is called. It updates the meta but doesn't call updateAccountPeers. Then, when sync is called, it sees that the meta didn't change and doesn't sync peers.

Here are some logs from the management service with some added manual logs (based on current main; the extra logs from my side are marked with >>> <<<):

management-1  | 2024-12-05T11:47:52Z DEBG [context: GRPC, requestID: f8a4e992-2888-40df-85cb-9097f6c5c7fa] server/grpcserver.go:422: Login request from peer [/rUhBSUgEDUq50dhe9l9cqljamdrSWjkfvns9aM9QFA=] [15.160.166.51]
management-1  | 2024-12-05T11:47:52Z INFO server/peer.go:741: Peer needs login
management-1  | 2024-12-05T11:47:52Z TRAC [peerID: /rUhBSUgEDUq50dhe9l9cqljamdrSWjkfvns9aM9QFA=, context: GRPC, requestID: f8a4e992-2888-40df-85cb-9097f6c5c7fa, accountID: ct6r2p30aofs73dkj2tg] server/sql_store.go:139: acquiring read lock for ID ct6r2p30aofs73dkj2tg
management-1  | 2024-12-05T11:47:52Z TRAC [requestID: f8a4e992-2888-40df-85cb-9097f6c5c7fa, accountID: ct6r2p30aofs73dkj2tg, peerID: /rUhBSUgEDUq50dhe9l9cqljamdrSWjkfvns9aM9QFA=, context: GRPC] server/sql_store.go:122: acquiring write lock for ID /rUhBSUgEDUq50dhe9l9cqljamdrSWjkfvns9aM9QFA=
management-1  | 2024-12-05T11:47:52Z DEBG peer/peer.go:218: >>> Peer ct6rlaj0aofs73dkj300 UI Version is empty <<<
management-1  | 2024-12-05T11:47:52Z DEBG peer/peer.go:227: >>> Peer ct6rlaj0aofs73dkj300 meta updated <<<
management-1  | 2024-12-05T11:47:52Z TRAC [context: GRPC, requestID: f8a4e992-2888-40df-85cb-9097f6c5c7fa, accountID: ct6r2p30aofs73dkj2tg, peerID: /rUhBSUgEDUq50dhe9l9cqljamdrSWjkfvns9aM9QFA=] server/sql_store.go:131: released write lock for ID /rUhBSUgEDUq50dhe9l9cqljamdrSWjkfvns9aM9QFA= in 11.439885ms
management-1  | 2024-12-05T11:47:52Z TRAC [context: GRPC, requestID: f8a4e992-2888-40df-85cb-9097f6c5c7fa, accountID: ct6r2p30aofs73dkj2tg, peerID: /rUhBSUgEDUq50dhe9l9cqljamdrSWjkfvns9aM9QFA=] server/account_request_buffer.go:61: requesting account ct6r2p30aofs73dkj2tg with backpressure
management-1  | 2024-12-05T11:47:52Z TRAC [context: SYSTEM] server/account_request_buffer.go:82: getting account ct6r2p30aofs73dkj2tg in batch took 1.290081ms
management-1  | 2024-12-05T11:47:52Z TRAC [context: GRPC, requestID: f8a4e992-2888-40df-85cb-9097f6c5c7fa, accountID: ct6r2p30aofs73dkj2tg, peerID: /rUhBSUgEDUq50dhe9l9cqljamdrSWjkfvns9aM9QFA=] server/account_request_buffer.go:66: got account with backpressure after 102.264446ms
management-1  | 2024-12-05T11:47:52Z TRAC [context: GRPC, requestID: f8a4e992-2888-40df-85cb-9097f6c5c7fa, accountID: ct6r2p30aofs73dkj2tg, peerID: /rUhBSUgEDUq50dhe9l9cqljamdrSWjkfvns9aM9QFA=] server/sql_store.go:148: released read lock for ID ct6r2p30aofs73dkj2tg in 113.958675ms
management-1  | 2024-12-05T11:47:52Z TRAC [peerID: /rUhBSUgEDUq50dhe9l9cqljamdrSWjkfvns9aM9QFA=, context: GRPC, requestID: 0c38578f-dc55-49b1-accf-8a0b166e483c] server/grpcserver.go:300: acquiring peer lock for ID /rUhBSUgEDUq50dhe9l9cqljamdrSWjkfvns9aM9QFA=
management-1  | 2024-12-05T11:47:52Z TRAC [context: GRPC, requestID: 0c38578f-dc55-49b1-accf-8a0b166e483c, peerID: /rUhBSUgEDUq50dhe9l9cqljamdrSWjkfvns9aM9QFA=] server/grpcserver.go:306: acquired peer lock for ID /rUhBSUgEDUq50dhe9l9cqljamdrSWjkfvns9aM9QFA= in 17.68µs
management-1  | 2024-12-05T11:47:52Z DEBG [peerID: /rUhBSUgEDUq50dhe9l9cqljamdrSWjkfvns9aM9QFA=, context: GRPC, requestID: 0c38578f-dc55-49b1-accf-8a0b166e483c, accountID: ct6r2p30aofs73dkj2tg] server/grpcserver.go:175: Sync request from peer [/rUhBSUgEDUq50dhe9l9cqljamdrSWjkfvns9aM9QFA=] [15.160.166.51]
management-1  | 2024-12-05T11:47:52Z TRAC [context: GRPC, requestID: 0c38578f-dc55-49b1-accf-8a0b166e483c, accountID: ct6r2p30aofs73dkj2tg, peerID: /rUhBSUgEDUq50dhe9l9cqljamdrSWjkfvns9aM9QFA=] server/sql_store.go:139: acquiring read lock for ID ct6r2p30aofs73dkj2tg
management-1  | 2024-12-05T11:47:52Z TRAC [context: GRPC, requestID: 0c38578f-dc55-49b1-accf-8a0b166e483c, accountID: ct6r2p30aofs73dkj2tg, peerID: /rUhBSUgEDUq50dhe9l9cqljamdrSWjkfvns9aM9QFA=] server/sql_store.go:122: acquiring write lock for ID /rUhBSUgEDUq50dhe9l9cqljamdrSWjkfvns9aM9QFA=
management-1  | 2024-12-05T11:47:52Z DEBG peer/peer.go:218: >>> Peer ct6rlaj0aofs73dkj300 UI Version is empty <<<
management-1  | 2024-12-05T11:47:52Z DEBG peer/peer.go:223: >>> Peer ct6rlaj0aofs73dkj300 meta is equal <<<
management-1  | 2024-12-05T11:47:52Z TRAC [context: GRPC, requestID: 0c38578f-dc55-49b1-accf-8a0b166e483c, accountID: ct6r2p30aofs73dkj2tg, peerID: /rUhBSUgEDUq50dhe9l9cqljamdrSWjkfvns9aM9QFA=] server/peer.go:682: >>> peer ct6rlaj0aofs73dkj300 metadata is the same <<<
management-1  | 2024-12-05T11:47:52Z TRAC [context: GRPC, requestID: 0c38578f-dc55-49b1-accf-8a0b166e483c, accountID: ct6r2p30aofs73dkj2tg, peerID: /rUhBSUgEDUq50dhe9l9cqljamdrSWjkfvns9aM9QFA=] server/peer.go:171: saving peer status for peer ct6rlaj0aofs73dkj300 is connected: true
management-1  | 2024-12-05T11:47:52Z DEBG [context: GRPC, requestID: 0c38578f-dc55-49b1-accf-8a0b166e483c, accountID: ct6r2p30aofs73dkj2tg, peerID: /rUhBSUgEDUq50dhe9l9cqljamdrSWjkfvns9aM9QFA=] server/peer.go:121: mark peer ct6rlaj0aofs73dkj300 connected: true
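
The login-then-sync sequence in these logs can be modelled with a short Go sketch. `UpdateMetaIfNew` and `updateAccountPeers` are names taken from the comment above, but every type and function body here is a simplified assumption, and `loginFixed` shows only one possible shape of a fix, not necessarily what #2991 does.

```go
package main

import "fmt"

// peerMeta and peer are hypothetical, heavily simplified stand-ins.
type peerMeta struct{ NetbirdVersion string }

type peer struct{ meta peerMeta }

// updateMetaIfNew stores the new meta and reports whether it changed.
func (p *peer) updateMetaIfNew(m peerMeta) bool {
	if p.meta == m {
		return false
	}
	p.meta = m
	return true
}

// server counts how many times peer updates would have been pushed.
type server struct {
	peer        *peer
	updatesSent int
}

func (s *server) updateAccountPeers() { s.updatesSent++ }

// loginBuggy stores the new meta but never propagates the change.
func (s *server) loginBuggy(m peerMeta) { s.peer.updateMetaIfNew(m) }

// loginFixed propagates when the meta changed (assumed fix shape).
func (s *server) loginFixed(m peerMeta) {
	if s.peer.updateMetaIfNew(m) {
		s.updateAccountPeers()
	}
}

// sync only propagates when it observes a meta change itself.
func (s *server) sync(m peerMeta) {
	if s.peer.updateMetaIfNew(m) {
		s.updateAccountPeers()
	}
}

func main() {
	old := peerMeta{NetbirdVersion: "0.33.0"}
	upgraded := peerMeta{NetbirdVersion: "0.34.0"}

	buggy := &server{peer: &peer{meta: old}}
	buggy.loginBuggy(upgraded) // "meta updated", nothing pushed
	buggy.sync(upgraded)       // "meta is equal", nothing pushed
	fmt.Println("buggy flow, peer updates sent:", buggy.updatesSent) // 0

	fixed := &server{peer: &peer{meta: old}}
	fixed.loginFixed(upgraded)
	fixed.sync(upgraded)
	fmt.Println("fixed flow, peer updates sent:", fixed.updatesSent) // 1
}
```

In the buggy flow the change is consumed at login, so the later sync has nothing to compare against and the other peers never learn that the posture check now passes.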

@mohamed-essam
Contributor

I tested a management image built from the code in #2991, and the scenario is resolved.

@mlsmaycon
Collaborator

Soon we will have a new container with your fix, @mohamed-essam: https://github.com/netbirdio/netbird/releases/tag/v0.34.1

5 participants