Track bytes #79

Merged
merged 6 commits into from
Mar 20, 2021

@ewust ewust commented Mar 20, 2021

Currently halfPipe() uses io.CopyBuffer(), which splices the two sockets together, so we don't hear from either until one closes or errors. That means we only see how many bytes are traversing our app in very coarse-grained steps, when connections close, which makes it difficult to tell whether data is flowing.

This fixes that by switching to Read/Write with a 32KB buffer and updating stats whenever we get data. I tested this in a couple of configurations and found no significant performance difference on curveball: 11.6s to download 40MB vs 11.5s for io.CopyBuffer(), a difference that seemed within the noise of the experiments. I watched CPU during the experiment with htop and both were below 2%, but perhaps this test is too network-constrained to really tell us what's happening (e.g. https://acln.ro/articles/go-splice suggests the CPU performance is noticeably different).

I would propose we merge this experimentally and see how it impacts our system load on the stations, and revert if it is much worse. If there's no difference, having visibility into bytes transferred seems worth it.

@jmwample jmwample merged commit 55d86ce into master Mar 20, 2021
@jmwample jmwample deleted the ewust/stats-bytes branch March 20, 2021 18:18
jmwample added a commit that referenced this pull request Jul 12, 2021
* first commit towards removing pf_ring submodule

* build PRs to test build process

* remove PF_RING

* remove local PF_RING build steps from CI build

* missed required sudo

* missed apt update

* missed golang dependency

* last missed change

* the pointer created by Enum was getting garbage collected resulting in a null pointer for registration source

* Two liveness checks when using decoy registration with sharing over API  (#59)

* updated wrapping struct for clientToStations and added a flag to registrations for prescanned

* rust compilation issue fixed

* updated log format for flow IP addresses to accommodate IPv6 which we were butchering before (#60)

* generated protobuf for conjure rust lib with newer protobuf-codegen to prevent compile time warnings

* Update to use accessors and log in the registration api

* typo

* Add Config options for subnet filtering (#61)

* Update detector to send source/decoy addresses to the application over the C2S struct

* log addresses during new registration in application

* implementation, but not integrated yet.

* updated addrs, added small tests, implemented check in registration pipeline.

* added checks for nil pointers to prevent crashes and switched to checking Covert which is what this should have been from the start.

* rust compile warning fixes.

* added logging for registrations dropped by blocklist. parse hostport covert format which should always be received from client

* Seems to be implemented properly, still untested (#62)

* Early registration lookup in station tracking (#63)

* Implemented and tested in golang tests, not yet tested in staging

* added validity tag and tracking function to registration manager to track (and check tracking of) new registrations immediately

* fixed mutex deadlock created by calling registrationExists from register, minor fixes. tested in staging - working

* Client IP Logging (#64)

* add option to moderate client ip logging, default off

* added option parse and client ip logging in registration API

* fixed mistake and removed last client IPs and covert logging

* race condition in registration tracking resulted in small data structure refactor. All regs tracking and retiring properly

* null dereference in registration api and debug line removed from registration tracking

* Defer close of ZMQ sockets

Although we don't technically care about closing the sockets (as
the program will run until either all sockets fail or the process
dies), the Go GC will collect the sockets if there isn't some
reference to them; these defers retain that reference so they aren't
collected.

* Also defer close on pubSock

* Prod Patches (#66)

* update to prevent zmq_proxy crash and to limit concurrent TCP socket use by liveness tests. also allows disable of v4 or v6 to limit unused registrations

* fixes deadlock in most conditions. seems to still lock when all TCP sockets get used up.

* dial liveness with timeout so that connections close and goroutines return in an expected amount of time.

* moved calls capable of blocking out of get_zmq_updates main thread

* Prod patch2 (#67)

* change removeOldRegistrations to only block intermittently instead of holistically

* tested working on single station staging

* typo

* Prod Patch 3 - Socket usage and Logging cleanup (#68)

* limit connection logging, and limit redis reconnections

* update covert Blocklist to include domains and valid addresses for golang TCP dial

* exercising redis usage through multithreaded testing

* updated pubsub send on golang application side and pubsub receive on detector side with tests

* checkpoint

* validated reg API client address handling behavior with testing

* updated tests and client address handling in application

* If client registers with v6, only create and track registrations for IPv6

* prevent client Address logging and only add v6 registration when client registers using v6

* ensure that a client who registers with v4 will propagate registrations for both v4 and v6, but a client who registers with v6 only propagates v6

* client ip logging based on env var in session logging

* invalidate registrations sent over v6 if they pick a v4 phantom address

* small enhancements to logging

* added logging if zmq_proxy escapes work loops

* add phantom blocklist to station application config including tests

* Fix ZMQ race condition from proxy (#70)

* Fix ZMQ race condition from proxy

* Add test for concurrency on ZMQ proxy

* fixing race condition caused by un-locked totalRegistrations count method. (#71)

* added explicit lock drop in sessions tracking. Passing tests (#72)

* Docker and Related (#73)

* Add Docker support

* Added multistage Dockerfile for each service (except registration)
* Added Readme with a quick start information
* Added detector-entrypoint.sh which includes routing and tun interface configuration (similar to on-reboot.sh)
* Added simple entrypoints for other services
* Added docker-compose.yaml with sample configuration and mandatory variables

Major difference is DNAT-ing to localhost instead of public IP by default and enabling net.ipv4.conf.all.route_localnet. This simplifies minimum required configuration.

* Added list of variables

* Change phantom_blocklist behavior (#75)

* Still send phantom blocklisted IPs to registration API, but ignore them locally

* updated comment

Co-authored-by: Jack Wampler <jmwample@users.noreply.github.com>

* Build changes to get conjure working on rockypika (#74)

* add basic install instructions

* add/update zbalance sysconfig for conjure-only

* Add needed zbalance configs to conjure.conf

* Add erspan pfring patch to a station-specific repo

* Read/use PARSE_GRE_OFFSET env variable in detector

* Add default (0) PARSE_GRE_OFFSET to conf file

* Update stats in conjure-app (#78)

* Make halfPipe log both up and down stats

* Add stats tracking and periodic reporting

* Track bytes (#79)

* Make halfPipe log both up and down stats

* Add stats tracking and periodic reporting

* break up read/writes so we can track bytes as they are transferred, not just at the end of the connection

* track new instead of absolute local/api regs

* Correct detection for local registrations

Co-authored-by: Jack Wampler <jmwample@users.noreply.github.com>

* Externalize Phantom Subnets to Configuration File (#76)

* add external file to manage subnet generations so we don't have to rebuild the station to update subnets in use.

* parsing error in V6Only subnet functor filter

* parsing error in V6Only subnet functor filter

* more test cleaning

* more small testing fixes

* added parsing for extended X-Forwarded-For headers to allow proxy API registration in ipv4 (#80)

* Added externalised phantoms file (#81)

* Added default value for PHANTOM_SUBNET_LOCATION environment variable
* Copy test phantom_subnets.toml file to $PHANTOM_SUBNET_LOCATION path
* Added mention of PHANTOM_SUBNET_LOCATION into README.md
* Added a bind mount for a local copy of the phantom_subnets.toml
* Included a copy of the test phantom_subnets.toml file into docker directory

* add cap util

* Quickfix (#82)

* parsing error in V6Only subnet functor filter

* quick fix for types missed in #76

* disentangle API registration counts (#85)

* update default registration tracking timeout from 2 mins to 6 hrs (#88)

* remove inner attribute set by protoc which causes err in cargo build with rust 1.52.1 - temporary fix until protoc has a more stable update (#89)

* first commit towards removing pf_ring submodule

* build PRs to test build process

* remove PF_RING

* remove local PF_RING build steps from CI build

* missed required sudo

* missed apt update

* missed golang dependency

* last missed change

* readme update (not yet complete)

* Use Ubuntu PF_RING package instead of submodule. (#91)

* Modified .dockerignore to improve effectiveness of docker caching
* Changed Dockerfile to use ntop pfring package
* Added a script to add ntop repository and install pfring and ZC on the host system (Debian and Ubuntu)
* Update docker default phantom_subnets.toml file
* Added some sanity checks into zbalance container
	* Check if hugepages number is 512
	* Check if ZC driver is loaded
	* If check not passed show some hints what can be wrong

* update readme build instructions, modified on-reboot to isolate env vars to conjure.conf, and updated det service to correct bin name

* updated zbalance systemd entrypoint to use zbalance_ipc in PATH instead of local

Co-authored-by: Carson Hoffman <c@rsonhoffman.com>
Co-authored-by: rgennt <kov2novych@gmail.com>
Co-authored-by: Eric Wustrow <ewust@colorado.edu>
Co-authored-by: rgennt <r.kovtunovych@psiphon.ca>