Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Milestone 2 #3

Merged
merged 13 commits into from
Aug 15, 2022
Merged

Milestone 2 #3

merged 13 commits into from
Aug 15, 2022

Conversation

scilio
Copy link
Collaborator

@scilio scilio commented Feb 11, 2022

I unintentionally deviated from the project plan somewhat.

Milestone 2 was to contain:

  • 2a. Validation of all incoming data against a live node
  • 2b. Daily aggregation and broadcasting of the finalized transaction
  • 2c. Basic reorg protection
  • 2d. Automated e2e testing

Milestone 3 was to contain:

  • 3a. Data provisioning using encrypted bundles
  • 3b. Inter-server communication
  • 3c. e2e tests that spin up local servers 1-n, demonstrating the workflow against a mock GRIN node

(3a) was implemented in milestone 1 already, which I felt was necessary in order to design the best possible api.

Also, to be sure the code was working correctly, I chose to implement (3c) for a single server in this pull request, without realizing it was part of milestone 3.

Finally, to minimize the review complexity and avoid overwhelming reviewers, I chose not to include (2c) in this pull request.


I'll work with the community council to handle these discrepancies from the PM side, but for review purposes, this PR contains:

  • 2a. Validation of all incoming data against a live node
  • 2b. Daily aggregation and broadcasting of the finalized transaction
  • 2d. Automated e2e testing
  • 3a. Data provisioning using encrypted bundles - Already reviewed
  • 3c. e2e tests that spin up a single local server, demonstrating the workflow against a mock GRIN node

This pull request does not include:

  • 2c. Basic reorg protection
  • 3b. Inter-server communication
  • 3c. e2e tests that spin up multiple local servers, demonstrating the multi-server workflow against a mock GRIN node

Sorry about the confusion 😅

Copy link
Collaborator

@tromp tromp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some in-line remarks...

@phyro
Copy link
Member

phyro commented Apr 14, 2022

@scilio any updates on this?

@scilio scilio changed the title wip: milestone 2 milestone 2 Apr 27, 2022
@scilio scilio changed the title milestone 2 Milestone 2 Apr 27, 2022
@scilio scilio marked this pull request as ready for review April 27, 2022 21:03
@yeastplume
Copy link
Member

Many thanks for the latest updates! I need to give this all a proper look over as well as try what's there for myself, so will likely be a few days.

src/secp.rs Outdated
let mut h = [0; COM_SIGNATURE_SIZE];
for i in 0..cmp::min(v.len(), COM_SIGNATURE_SIZE) {
h[i] = v[i];
pub fn new(pub_nonce: &Commitment, s: &SecretKey, t: &SecretKey) -> ComSignature {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: I think it would be nice to add a brief description of what this does, something along the lines of

P = v*H + r*G  # pubkey

# Sign message M with P
R = k1*H + k2*G             # Random nonce
e = H(M | P | R)            # Compute challenge
s, t = (e*v+k1), (e*r+k2)
sig = (s, t, R)             # This is our signature for M

# Verify signature of message M for P given (s, t, R)
e = H(M | P | R)            # Compute challenge
e*P + R = s*H + t*G         # Verify the equation

I've not checked this actually correctly reflects the code below, just an example of what I had in mind. It may make it easier for someone new reading the code to grasp the underlying idea.

src/secp.rs Outdated
let k_2 = SecretKey::new(&secp, &mut thread_rng());

let commitment = secp.commit(amount, blind.clone())?;
let nonce_commitment = secp.commit_blind(k_1.clone(), k_2.clone())?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: might make sense to rename this to pub_nonce for consistency

@phyro
Copy link
Member

phyro commented Apr 28, 2022

Very glad to see this moving! I unfortunately won't be able to do a thorough review in the next few days as I'll be away from the computer on which I can test this. So I'll only be going through the code until next week.

@davidtavarez
Copy link

davidtavarez commented May 6, 2022

I would suggest to write unit tests for the methods, positive tests and negative tests. Also, if taking tests code into separate files it helps to read the code more easily imo.

}

fn real_main() -> std::result::Result<(), Box<dyn std::error::Error>> {
let yml = load_yaml!("../mwixnet.yml");
Copy link

@davidtavarez davidtavarez May 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe ../mwixnet.yml could be store as global constant somewhere else, and use the absolute path instead of relative.

src/main.rs Outdated
};
let password = args.value_of("pass").ok_or(Error::new(ErrorKind::LoadConfigError))?;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think about create a variable for each arguments you could can validate each arguments before using them, and instead use the variable directly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give an example?

wallet_owner_url: SocketAddr,
}

/// Writes the server config to the config_path given, encrypting the server_key first.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment should ideally be spat out in the config file, similar to grin and grin-wallet (can be done later)

src/server.rs Outdated
}

lazy_static! {
static ref SERVER_STATE: Mutex<Vec<Submission>> = Mutex::new(Vec::new());
static ref SERVER_STATE: Mutex<HashMap<Commitment, Submission>> = Mutex::new(HashMap::new());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a plan to serialize this, or store this in disk or somewhere else in memory in case the process needs to be restarted? Or do we just lose all of it if the process dies (I don't have an answer, just wondering). Also, we'll likely need to think about defining some limits here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto recovery from process interruptions seems like a useful function to have. It's only the first node which would really need to save to disk. The others only need to be active when the round is executing, thus won't have multiple hours worth of cache entries. I can do this as a standalone PR between milestones 2 and 3.

I'll add limits and make them configurable. When the limits are hit, shall we just stop accepting swap requests? That's okay, but probably not great. Alternative is to automatically execute the round when the limit is hit. There could be privacy implications of that, but it's effectively what Wasabi does, so there's precedent for it. Do you have a preference?

@yeastplume
Copy link
Member

A few specific points that came to mind as I went through it above; note I haven't gone through any encryption/decryption or primitive manipulation code in detail, I know a few others have already focused on it and I'll have a closer look at those specifically as we go.

Putting milestones aside, what we seem to have in place at the moment is:

  • Some library type code containing primitives and structs that will needed to be support mixing operations. What's there seems good and supported by adequate unit testing.
  • Some rudimentary config/error handing and other boilerplate (some specific comments above)
  • A rudimentary server that collects swap requests, then merges them all into a single transaction and joins them at configured intervals. I wasn't able to observe this in action as the server won't start up due to the wallet communication initiation issue outlined above.

What's there so far seems fine, but unfortunately I think we're still a good way off of having a deployable service here. Aside from the basic plumbing stuff (config, error handling, documentation), the guts of the service, e.g. the sorting/shuffling and the back and forth interactions between the servers is still largely undefined. This is the meat of the service, and given the multi-server nature of it is the part that needs the most engineering consideration, discussion and testing. I know we have a line item that says 'multi-server coming in final milestone', but in my mind this is not a case of 'just add multiple server support and we're done'. This is most of the service and it's still largely outstanding.

I know much of this is not your fault, and I'm going to take up the expectations and management of this project up with the rest of the community. I also know that the community has been under-resourced and a little bit inactive in the past, so it may have been difficult to get answers or have discussions. But I think we're getting over this and are starting to see a resumption of activity. So if we're to move forward I would like to see:

  • Smaller, more focused PRs on an ongoing basis rather than large code dumps
  • Much more engagement with the rest of the community as the project develops
  • More definition around the tasks remaining. We don't need up-front design docs, but continual engagement with the rest of the community on the issues and decisions as they're being taken works just as well.

@scilio
Copy link
Collaborator Author

scilio commented Jun 16, 2022

I know we have a line item that says 'multi-server coming in final milestone', but in my mind this is not a case of 'just add multiple server support and we're done'. This is most of the service and it's still largely outstanding.

I think you may be overlooking the several parts which were built for multi-server support. The most obvious example is the onion module, with its creation and peeling logic that tell quite a bit about the roles each server will play. Plenty of thought and discussion went into how that would work, and although we only have the ability to host it on a single server for now, that code is all tested e2e for the multi-server workflow.

What's not yet implemented is the passing back-and-forth of matrices, and the short-lived caches necessary to perform the mixing. But these are well-defined and understood, as they've been documented and discussed thoroughly in John's CoinSwap proposal. This comment uses an example to describe the workflow in full detail.

I agree that getting the servers working together will not be trivial, but to say that we haven't spent time defining or discussing it is not an accurate statement. I have a very clear vision for what that communication will look like, even if the documentation for it is not yet as organized as we would like.

@davidtavarez
Copy link

I'm trying to run it with a testnet node and wallet:

$ RUST_BACKTRACE=1 ./mwixnet --config_file "/Users/david/.grin/main/mwixnet-config.toml"

Server password:
Wallet password:
Opening wallet at 127.0.0.1:3420
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { inner:    0: backtrace::backtrace::libunwind::trace
             at /Users/david/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.62/src/backtrace/libunwind.rs:93:5
      backtrace::backtrace::trace_unsynchronized
             at /Users/david/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.62/src/backtrace/mod.rs:66:5
   1: backtrace::backtrace::trace
             at /Users/david/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.62/src/backtrace/mod.rs:53:14
   2: backtrace::capture::Backtrace::create
             at /Users/david/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.62/src/capture.rs:176:9
   3: backtrace::capture::Backtrace::new_unresolved
             at /Users/david/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.62/src/capture.rs:170:9
   4: failure::backtrace::internal::InternalBacktrace::new
             at /Users/david/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.8/src/backtrace/internal.rs:46:44
   5: failure::backtrace::Backtrace::new
             at /Users/david/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.8/src/backtrace/mod.rs:121:35
   6: failure::context::Context<D>::new
             at /Users/david/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.8/src/context.rs:84:40
   7: <mwixnet::error::Error as core::convert::From<grin_api::rest::Error>>::from
             at /Users/david/Projects/mwixnet/src/error.rs:126:11
   8: <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/result.rs:2105:27
   9: mwixnet::wallet::HttpWallet::send_json_request
             at /Users/david/Projects/mwixnet/src/wallet.rs:189:13
  10: mwixnet::wallet::HttpWallet::init_secure_api
             at /Users/david/Projects/mwixnet/src/wallet.rs:143:4
  11: mwixnet::wallet::HttpWallet::open_wallet
             at /Users/david/Projects/mwixnet/src/wallet.rs:111:20
  12: mwixnet::real_main
             at /Users/david/Projects/mwixnet/src/main.rs:120:15
  13: mwixnet::main
             at /Users/david/Projects/mwixnet/src/main.rs:28:2
  14: core::ops::function::FnOnce::call_once
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/ops/function.rs:248:5
  15: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/sys_common/backtrace.rs:122:18
  16: std::rt::lang_start::{{closure}}
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/rt.rs:145:18
  17: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/ops/function.rs:280:13
      std::panicking::try::do_call
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:492:40
      std::panicking::try
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:456:19
      std::panic::catch_unwind
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panic.rs:137:14
      std::rt::lang_start_internal::{{closure}}
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/rt.rs:128:48
      std::panicking::try::do_call
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:492:40
      std::panicking::try
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:456:19
      std::panic::catch_unwind
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panic.rs:137:14
      std::rt::lang_start_internal
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/rt.rs:128:20
  18: std::rt::lang_start
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/rt.rs:144:17
  19: _main


GRIN API Error: Request error: Wrong response code: 401 Unauthorized with data Body(Empty) }', src/main.rs:28:17
stack backtrace:
   0: rust_begin_unwind
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:142:14
   2: core::result::unwrap_failed
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/result.rs:1785:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/result.rs:1078:23
   4: mwixnet::main
             at /Users/david/Projects/mwixnet/src/main.rs:28:2
   5: core::ops::function::FnOnce::call_once
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

This is my config file:

encrypted_key = "0212128e04473c1609022a988a84a341594eae46ab4178c4d828667101ba694db11e9a91879d2031282a99c8c5b3ee1c"
salt = "be454ca8543f3439"
nonce = "53067d9b453378aeb784d3f1"
interval_s = 43200
addr = "0.0.0.0:3000"
grin_node_url = "127.0.0.1:13413"
grin_node_secret_path = "/Users/david/.grin/test/.api_secret"
wallet_owner_url = "127.0.0.1:3420"
wallet_owner_secret_path = "/Users/david/.grin/test/.owner_api_secret"

@davidtavarez
Copy link

davidtavarez commented Aug 4, 2022

@scilio trying to consume the API without success:

echo '{"jsonrpc": "2.0", "id": 1, "method" : "swap", "params": [{}]}' | http POST http://127.0.0.1:8990/v1


http: error: Request timed out (0s).

I'm running the service like this (changed port to 8990):

RUST_BACKTRACE=1 ./target/debug/mwixnet --config_file "/Users/david/.grin/main/mwixnet-config.toml"
Server password:
Wallet password:
Opening wallet at 127.0.0.1:3420
Connected to wallet
Server listening on 0.0.0.0:8990

@scilio
Copy link
Collaborator Author

scilio commented Aug 4, 2022

@davidtavarez It's working for me:

Server password: 
Wallet password: 
Opening wallet at 127.0.0.1:3420
Connected to wallet
Server listening on 0.0.0.0:8990
echo '{"jsonrpc": "2.0", "id": 1, "method" : "swap", "params": [{}]}' | http POST http://127.0.0.1:8990/v1
HTTP/1.1 200 OK
content-length: 100
content-type: application/json; charset=utf-8
date: Thu, 04 Aug 2022 15:35:53 GMT

{
    "error": {
        "code": -32602,
        "message": "Invalid params: missing field `onion`."
    },
    "id": 1,
    "jsonrpc": "2.0"
}

Maybe a firewall issue? Do you see it listening if you run sudo netstat -tunlp | grep 8990

@scilio scilio closed this Aug 4, 2022
@scilio scilio reopened this Aug 4, 2022
Copy link
Member

@phyro phyro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm I get the same result as @scilio

curl -d '{"jsonrpc": "2.0", "id": 1, "method" : "swap", "params": [{}]}' -H "Content-Type: application/json" -X POST http://127.0.0.1:3000/v1                                  11:57:56
{"jsonrpc":"2.0","error":{"code":-32602,"message":"Invalid params: missing field `onion`."},"id":1}

Does anyone happen to have code that creates a Swap request for a commitment in the wallet to try this out?

use std::path::PathBuf;

const GRIN_HOME: &str = ".grin";
const CHAIN_NAME: &str = "main";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we add an option for testnet config

./mwixnet init-config --testnet

This would also mean changing a bit the ports since the default node port 13414

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be fine.

@yeastplume
Copy link
Member

I was able to get it running and take a closer look at the onion wrapping, and I see the changes done for error handling and other engineering concerns, so thank you for all of those updates.

I agree that getting the servers working together will not be trivial, but to say that we haven't spent time defining or discussing it is not an accurate statement. I have a very clear vision for what that communication will look like, even if the documentation for it is not yet as organized as we would like.

I understand that the protocol and the role of each server has been well discussed and defined, but I'm taking more about the engineering side of things. What happens if the middle or last servers aren't available? What if messages get lost cause a server is down temporarily. Do servers store data in-memory or cache? Are there alerts when one goes down? Are messages retried? How often are they retried? Those sorts of details that take time and effort to finalise.

I'm happy to see this merged and appreciate the work done to date. But moving forward, I'll echo again what I said above:

  • A more detailed delivery plan should be worked out instead of whatever had been agreed constitutes 'phase 3'.
  • Future work should be done in much smalller, more atomic PRs on a more frequent basis. It is very difficult to review or even coordinate reviews of extremely large PRs which leads to a lot of delay.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants