Prod Patches #66
Merged
…use by liveness tests. also allows disable of v4 or v6 to limit unused registrations
…ockets get used up.
…eturn in an expected amount of time.
jmwample added a commit that referenced this pull request Jul 12, 2021
* first commit towards removing pf_ring submodule * build PRs to test build process * remove PF_RING * remove local PF_RING build steps from CI build * missed required sudo * missed apt upate * missed golang dependency * last missed change * the pointer created by Enum was getting garbage collected resulting in a null pointer for registration source * Two liveness checks when using decoy registration with sharing over API (#59) * updated wrapping struct for clientToStations and added a flag to registrations for presccanned * rust compilation issue fixed * updated log format for flow IP addresses to accomodate IPv6 which we were butchering before (#60) * generated protobuf for conjure rust lib with newer protobuf-codegen to prevent compile time warnings * Update to use accessors and log in the registration api * typo * Add Config options for subnet filtering (#61) * Update detector to send source/decoy addresses to the application over the C2S struct * log addresses during new registration in application * implementatoin, but not integrated yet. * updated addrs, added small tests, implemented check in registration pipeline. * added checks for nil pointers to prevent crashes and switched to checking Covert which is what this should have been from the start. * rust compile warning fixes. * added logging for registrations dropped by blocklist. parse hostport covert format which should always be received from client * Seems to be implemented properly, still untested (#62) * Early registration lookup in station tracking (#63) * Implemented and tested in golang tests, not yet tested in staging * added validity tag and tracking function to registration manager to track (and check tracking of) new registrations immediately * fixed mutex deadlock created by calling registrationExists from register, minor fixes. 
tested in staging - working * Client IP Loggging (#64) * add option to moderate client ip logging, default off * added option parse and client ip logging in registration API * fixed mistake and removed last client IPs and covert logging * race condition in registration tracking resulted in small data structure refactor. All regs tracking and retiring properly * null dereference in registration api and debug line removed from registration tracking * Defer close of ZMQ sockets Although we don't technically care about closing the sockets (as the program will run until either all sockets fail or the process dies), the Go GC will collect the sockets if there isn't some reference to them; these defers retain that reference so they aren't collected. * Also defer close on pubSock * Prod Patches (#66) * update to prevent zmq_proxy crash and to limit concurrent TCP socket use by liveness tests. also allows disable of v4 or v6 to limit unused registrations * fixes deadlock in most conditions. seems to still lock when all TCP sockets get used up. * dial liveness with timeout so that connections close and goroutines return in an expected amount of time. 
* moved calls capable of blocking out of get_zmq_updates main thread * Prod patch2 (#67) * change removeOldRegistrations to only block intermittently instead of whollistically * tested working on single station staging * typo * Prod Patch 3 - Socket usage and Logging cleanup (#68) * limit connection logging, and limit redis reconnections * update covert Blocklist to include domains and valid addresses for golang TCP dial * exercising redis usage through multithreaded testing * updated pubsub send on golang application side and pubsub receive on detector side with tests * checkpoint * validated reg API client address handling behavior with testing * updated tests and client address handling in application * If client registers with v6, only create and track registrations for IPv6 * prevent client Address logging and only add v6 registration when client registers using v6 * ensure that a client who registers with v4 will propogate registrations for both v4 and v6, but a client who registers with v6 only propogates v6 * client ip logging based on env var in session logging * invalidate registrations sent over v6 if they pick a v4 phantom address * small enhancements to logging * added loging if zmq_proxy escapes work loops * add phantom blocklist to station application config including tests * Fix ZMQ race condition from proxy (#70) * Fix ZMQ race condition from proxy * Add test for concurrency on ZMQ proxy * fixing race condition caused by un-locked totalRegistrations count method. (#71) * added explicit lock drop in sessions tracking. 
Passing tests (#72) * Docker and Related (#73) * Add Docker support * Added multistage Dockerfile for each service (except registration) * Added Readme with a quick start information * Added detector-entrypoint.sh which includes routing and tun interface configuration (simmilar to on-reboot.sh) * Added simple entrypoints for other services * Added docker-compose.yaml with sample configuration and mandatory variables Major difference is DNAT-ing to localhost instead of public IP by default and enabling net.ipv4.conf.all.route_localnet. This simplifies minimum required configuration. * Added list of variables * Change phantom_blocklist behavior (#75) * Still send phantom blocklisted IPs to registration API, but ignore them locally * updated comment Co-authored-by: Jack Wampler <jmwample@users.noreply.github.com> * Build changes to get conjure working on rockypika (#74) * add basic install instructions * add/update zbalance sysconfig for conjure-only * Add neeeded zbalance configs to conjure.conf * Add erspan pfring patch to a station-specific repo * Read/use PARSE_GRE_OFFSET env variable in detector * Add default (0) PARSE_GRE_OFFSET to conf file * Update stats in conjure-app (#78) * Make halfPipe log both up and down stats * Add stats tracking and periodic reporting * Track bytes (#79) * Make halfPipe log both up and down stats * Add stats tracking and periodic reporting * break up read/writes so we can track bytes as they are transferred, not just at the end of the connection * track new instead of absolute local/api regs * Correct detection for local registrations Co-authored-by: Jack Wampler <jmwample@users.noreply.github.com> * Extenalize Phantom Subnets to Configuration File (#76) * add external file to manage subnet generations so we don't have to rebuild the station to update subnets in use. 
* parsing error in V6Only subnet functor filter * parsing error in V6Only subnet functor filter * more test cleaning * more small testing fixes * added parsing for extended X-Forwarded-For headers to allow proxy API registration in ipv4 (#80) * Added externalised phantoms file (#81) * Added default value for PHANTOM_SUBNET_LOCATION environment variable * Copy test phantom_subnets.toml file to $PHANTOM_SUBNET_LOCATION path * Added mention of PHANTOM_SUBNET_LOCATION into README.md * Added a bind mount for a local copy of the phantom_subnets.toml * Included a copy of the test phantom_subnets.toml file into docker directory * add cap util * Quickfix (#82) * parsing error in V6Only subnet functor filter * quick fix for types missed in #76 * disentangle API regisrtation counts (#85) * update default registration tracking timeout from 2 mins to 6 hrs (#88) * remove inner attribute set by protoc which causes err in cargo build with rust 1.52.1 - temporary fix until protoc has a more stable update (#89) * first commit towards removing pf_ring submodule * build PRs to test build process * remove PF_RING * remove local PF_RING build steps from CI build * missed required sudo * missed apt upate * missed golang dependency * last missed change * readme update (not yet complete) * Use Ubuntu PF_RING package instead of submodule. 
(#91) * Modified .dockerignore to improve effectiveness of docker caching * Changed Dockerfile to use ntop pfring package * Added a script to add ntop repository and install pfring and ZC on the host system (Debian and Ubuntu) * Update docker default phantom_subnets.toml file * Added some sanity checks into zbalance container * Check if hugepages number is 512 * Check if ZC driver is loaded * If check not passed show some hints what can be wrong * update readme build instructions, modified on-reboot to isolate env vars to conjure.conf, and updated det service to correct bin name * updated zbalance systemd entrypoint to use zbalance_ipc in PATH instead of local Co-authored-by: Carson Hoffman <c@rsonhoffman.com> Co-authored-by: rgennt <kov2novych@gmail.com> Co-authored-by: Eric Wustrow <ewust@colorado.edu> Co-authored-by: rgennt <r.kovtunovych@psiphon.ca>
Issues
This handles a number of issues that surfaced unexpectedly in production:
* Go doesn't automatically clean up sockets on interrupt in zmq_proxy, which can cause the zmq_proxy thread to crash. When that happens the station stops ingesting registrations and misses all connections.
* The liveness testing is turned up too high, which results in too many TCP sockets being used concurrently.
* A data race in registration tracking, caused by the early tracking and lookup added in "Early registration lookup in station tracking" (#63).
* "too many files open" errors, which cause failures around locks and deadlock the golang station.
Solutions
To fix the socket cleanup, a deferred close is added for the sockets in both directions. This seems to prevent the sockets from ending up in a state where they fail to connect and crash the station.
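A minimal sketch of the deferred-close pattern follows. The `socket` type and `runProxy` function are hypothetical stand-ins, not the station's actual ZMQ code; the sockets are returned only so the result can be inspected. As the referencing commit notes, the defers also hold a reference to each socket so the Go GC does not collect them while the proxy runs.

```go
package main

import "fmt"

// socket is a hypothetical stand-in for a ZMQ socket; only Close matters here.
type socket struct {
	closed bool
}

// Close marks the socket as closed.
func (s *socket) Close() error {
	s.closed = true
	return nil
}

// runProxy opens both directions and defers their Close calls. The deferred
// calls run when runProxy returns or panics, after work has finished, and
// they keep both sockets referenced for the lifetime of the call.
func runProxy(work func(front, back *socket) error) (front, back *socket, err error) {
	front = &socket{}
	defer front.Close()
	back = &socket{}
	defer back.Close()
	return front, back, work(front, back)
}

func main() {
	front, back, err := runProxy(func(f, b *socket) error {
		fmt.Println("inside work loop, closed:", f.closed, b.closed)
		return nil
	})
	fmt.Println("after return, closed:", front.closed, back.closed, "err:", err)
}
```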
The number of parallel TCP connections per registration is turned down from 8 to 3. This lowers the absolute confidence that any given phantom is not live, but it should still be sufficient. Also, no production station currently supports IPv6 connections, so all IPv6 registrations are wasted. This adds an option to the station config to disable IPv6, so that IPv6 registrations won't be ingested, tracked, or liveness tested. Because clients currently always register both v4 and v6, this cuts the current number of liveness probes in half.
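One way to bound socket use during liveness testing is a shared counting semaphore combined with a dial timeout. The sketch below is illustrative only: `probeFunc`, `dialProbe`, `isLive`, the semaphore capacity of 16, and the rule that a phantom counts as live if any probe connects are all assumptions, not the station's actual implementation.

```go
package main

import (
	"fmt"
	"net"
	"sync"
	"time"
)

// probeFunc reports whether one connection attempt to addr succeeded.
type probeFunc func(addr string) bool

// dialProbe returns a probeFunc that dials TCP with a timeout, so each probe
// and the goroutine running it finish within a bounded amount of time.
func dialProbe(timeout time.Duration) probeFunc {
	return func(addr string) bool {
		conn, err := net.DialTimeout("tcp", addr, timeout)
		if err != nil {
			return false
		}
		conn.Close() // release the file descriptor right away
		return true
	}
}

// isLive runs `probes` attempts against addr and reports whether any of them
// succeeded. sem is a counting semaphore shared by all callers; its capacity
// caps how many probes (and therefore sockets) are open at the same time.
func isLive(addr string, probes int, sem chan struct{}, probe probeFunc) bool {
	var wg sync.WaitGroup
	results := make(chan bool, probes) // buffered so senders never block
	for i := 0; i < probes; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it when the probe returns
			results <- probe(addr)
		}()
	}
	wg.Wait()
	close(results)
	for ok := range results {
		if ok {
			return true
		}
	}
	return false
}

func main() {
	sem := make(chan struct{}, 16) // hypothetical process-wide cap
	probe := dialProbe(500 * time.Millisecond)
	// 192.0.2.1 is a documentation-only address; 3 probes per registration.
	fmt.Println("live:", isLive("192.0.2.1:443", 3, sem, probe))
}
```

Passing the probe as a function keeps the concurrency limit testable without opening real sockets.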
To prevent the RegisteredDecoys struct from stepping on its own mutex, I have added "private" functions for track and registrationExists, which are used by other functions of RegisteredDecoys (i.e. both are used by register). "Public" function equivalents, Track and RegistrationExists, have been added to be used by external structs so that the mutex is never stepped on. This seems to work for this issue, preventing both data races and deadlock by mutex abuse.
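The locking split described above can be sketched as follows. This is a simplified stand-in: the method names come from the description, but the `regs` map, the string IDs, the `NewRegisteredDecoys` constructor, and the method bodies are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// RegisteredDecoys is a simplified stand-in; the regs field is hypothetical.
type RegisteredDecoys struct {
	m    sync.Mutex
	regs map[string]bool
}

// NewRegisteredDecoys is a hypothetical constructor for this sketch.
func NewRegisteredDecoys() *RegisteredDecoys {
	return &RegisteredDecoys{regs: make(map[string]bool)}
}

// track and registrationExists assume the caller already holds r.m.
func (r *RegisteredDecoys) track(id string)                   { r.regs[id] = true }
func (r *RegisteredDecoys) registrationExists(id string) bool { return r.regs[id] }

// Track is the locking wrapper for external callers.
func (r *RegisteredDecoys) Track(id string) {
	r.m.Lock()
	defer r.m.Unlock()
	r.track(id)
}

// RegistrationExists is the locking wrapper for external callers.
func (r *RegisteredDecoys) RegistrationExists(id string) bool {
	r.m.Lock()
	defer r.m.Unlock()
	return r.registrationExists(id)
}

// register takes the lock once and calls only the unexported helpers.
// Calling Track or RegistrationExists here would try to lock r.m a second
// time and block forever, because sync.Mutex is not reentrant.
func (r *RegisteredDecoys) register(id string) bool {
	r.m.Lock()
	defer r.m.Unlock()
	if r.registrationExists(id) {
		return false // already tracked
	}
	r.track(id)
	return true
}

func main() {
	r := NewRegisteredDecoys()
	fmt.Println(r.register("reg-1"), r.register("reg-1"), r.RegistrationExists("reg-1"))
}
```

Because the check and the insert in register happen under a single lock acquisition, two concurrent registrations of the same ID cannot both succeed.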