Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Precharge fn loading gas cost #6772

Merged
merged 18 commits into from
Jun 2, 2022

Conversation

jakmeier
Copy link
Contributor

@jakmeier jakmeier commented May 9, 2022

The code prepartion steps for executing a smart contract function call
used to be charged after the work has been done. This protocol change
reorders this.

The effect of the procotocol change is that when a function call fails
during contract loading, it will now have a non-zero gas cost.

Incidentally, this change also brings wasmtime failure behavior
during loading in-line with wasmer0 and wasmer2.

Test plan

Tests for this feature must ensure several things.

  1. The change does not affect the behavior with disabled feature.
  2. Enabled feature has the desired effect of charging gas for
    loading code even when contract loading fails on new versions.
  3. Future changes do not accidentally make old versions incompatible.

Point 1 is mostly covered by existing tests that already check that
gas is zero during failures of all kinds. These tests on the stable
version are unchanged but the nightly_protocol_feature version checks
that the expected amount of gas is charged.

The test test_gas_exceed_loadinghas been added to also check the
scenario where gas runs out during contract loading.

Upon feature stabilization, these tests will only check the new
behaviour, therefore they cannot cover point 3.

Point 2 and 3 are covered with new tests in the module
fix_contract_loading_cost_protocol_upgrade
which compare the results of different control flows in old/new version.

The code prepartion steps for executing a smart contract function call
used to be charged after the work has been done. This protocol change
reorders this.

The effect of the procotocol change is that when a function call fails
during contract loading, it will now have a non-zero gas cost.

Incidentally, this change also brings wasmtime failure behavior
during loading in-line with wasmer0 and wasmer2.

Test plan
---------
Tests for this feature must ensure several things.
1. The change does not affect the behavior with disabled feature.
2. Enabled feature has the desired effect of charging gas for
loading code even when contract loading fails on new versions.
3. Future changes do not accidentally make old versions incompatible.

Point 1 is mostly covered by existing tests that already check that
gas is zero during failures of all kinds. These tests on the stable
version are unchanged but the nightly_protocol_feature version checks
that the expected amount of gas is charged.

The test `test_gas_exceed_loading`has been added to also check the
scenario where gas runs out during contract loading.

Upon feature stabilization, these tests will only check the new
behaviour, therefore they cannot cover point 3.

Point 2 and 3 are covered with new tests in the module
`fix_contract_loading_cost_protocol_upgrade`
which compare the results of different control flows in old/new version.
@jakmeier jakmeier requested review from nagisa and matklad May 9, 2022 14:44
@jakmeier jakmeier requested a review from a team as a code owner May 9, 2022 14:44
@jakmeier
Copy link
Contributor Author

jakmeier commented May 9, 2022

Hey, so I have been working on this for a while and I think I could really use some feedback at this stage.

In particular, I am undecided about adding gas_counter as an argument to VM::run(). The motivation is that I believe it makes sense to have a way to charge gas for the function call execution before entering VM specific code. I tried a few ways to do tgat but didn't really like any of the options. In this PR, you can see the version I dislike the least, which includes said gas_counter argument. Maybe someone has a better idea or maybe you generally disagree with my goal to charge this from outside VM specific code.

I can always just "inline" this code into the three implementations to avoid the extra argument and still implement the change. But I personally tend towards deduplicating the code even if it means we have to change the interface. Curious to hear what other people think.

Copy link
Contributor

@matklad matklad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a first-pass review! At a glance, this looks about right! I want to take a closer look at two things:

  • run_with_vm_kind feels like it probably should be a method of the VM instead -- we sepcifically removed such a function a while ago
  • we had to do a bunch of changes to test code -- I wish we didn't have to. I think that's the pre-existing effect of our testing infra being fragile, but maybe there's something smart we can do here.

Not approving yet, just wanted to give a first batch of encouraging comments!

@@ -2663,6 +2653,7 @@ impl<'a> VMLogic<'a> {
}
}

#[cfg(not(feature = "protocol_feature_fix_contract_loading_cost"))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should remove this fn from logic altogether, and just open-code it in the runner? Or even add as a free-standing fn inside the runner, than we'll be able to re-use it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I mean the gas_counter is a private field of VMLogic and probably should remain private. So I don't think I can easily move that code out of this module.

Comment on lines -533 to -530
let artifact =
cache::wasmer2_cache::compile_module_cached_wasmer2(code, &self.config, cache);
let artifact = match into_vm_result(artifact) {
Ok(it) => it,
Err(err) => return VMResult::nop_outcome(err),
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, so this case is actually interesting I think. It's not clear to me if we should charge contract loading cost if compilation error happens, as we don't actually load the contract in this case.

Hypothetically, if, at some point, we add shape-dependent loading costs (based, eg, on the number of fns in the contract), than we'll have to charge this cost after compilation. This seems to be cleaner in some way.

OTOH, the current strategy of just always charging this cost seems much simpler. And, even with shape-dependent costs and JIT compilation we might want to charge flat cost before we start compiling, and than shaped cost after we parsed WASM.

So, I guess the current approach is a good one, but, still, an interesting find I didn't realise before!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we don't necessarily have to charge the loading fee in case of a compilation error. But I to keep it simple until having an actual good reason to make it more complex.

even with shape-dependent costs and JIT compilation we might want to charge flat cost before we start compiling

Completely agreed. This is the main justification for me why doing it this way really doesn't seem to hurt.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we can't be sure if a FunctionCall will result in a compilation or not, and we AFAIU don't want conditionally or unconditionally charge the possibly very significant fees for every function invocation, it makes complete sense to me to just charge the “executable loading” fee before compilation.

@matklad matklad requested a review from akhi3030 May 13, 2022 15:54
@stale
Copy link

stale bot commented May 27, 2022

This PR has been automatically marked as stale because it has not had recent activity in the 2 weeks.
It will be closed in 3 days if no further activity occurs.
Thank you for your contributions.

@stale stale bot added the S-stale label May 27, 2022
@stale stale bot removed S-stale labels May 31, 2022
@jakmeier jakmeier requested a review from matklad May 31, 2022 09:56
@jakmeier
Copy link
Contributor Author

Did a first-pass review! At a glance, this looks about right! I want to take a closer look at two things:

* `run_with_vm_kind` feels like it probably should be a method of the VM instead -- we sepcifically removed such a function a while ago

* we had to do a bunch of changes to test code -- I wish we didn't have to. I _think_ that's the pre-existing effect of our testing infra being fragile, but maybe there's something smart we can do here.

Not approving yet, just wanted to give a first batch of encouraging comments!

Can you take another look please? :)

My position hasn't change much here: I don't like adding the argument to VM::run() (and how we have to touch all tests as a result of it) but I don't see a better way forward right now.

jakmeier added 6 commits June 1, 2022 11:28

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
@jakmeier
Copy link
Contributor Author

jakmeier commented Jun 1, 2022

Thanks to a helpful discussion with @matklad, I think I managed to simplify this PR a lot in terms of files and LOC changed. It should be much easier to review.

Also pinging @nagisa, as I told you before that you should probably wait before reviewing this. I think now it is in a decent state to review.

Copy link
Collaborator

@nagisa nagisa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall approach LGTM, see a bunch of nits inline, nothing blocking.

/// Perform VM independent checks that happen after the instantiation of
/// VMLogic but before loading the code. This includes pre-charging gas
/// costs for loading the code.
pub fn before_code_loading(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name is somewhat ambiguous, there are two interpretations that come to mind:

  • before loading contract web assembly code from the database;
  • before loading the compiled machine code for contract into memory.

The comment here does not really clarify which one it is, either. One can intuit that it is the 2nd from the fact that this method accepts the wasm code length, but also my initial reaction was “why is this called before_code_loading if it needs wasm code to be already loaded?”.

I wonder if there's a better name to be had here. “loading” portion is correct, so maybe changing “code” to “executable” or something along those lines would work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, the naming is indeed confusing. I like "executable" to make it clearer.

Comment on lines -533 to -530
let artifact =
cache::wasmer2_cache::compile_module_cached_wasmer2(code, &self.config, cache);
let artifact = match into_vm_result(artifact) {
Ok(it) => it,
Err(err) => return VMResult::nop_outcome(err),
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we can't be sure if a FunctionCall will result in a compilation or not, and we AFAIU don't want conditionally or unconditionally charge the possibly very significant fees for every function invocation, it makes complete sense to me to just charge the “executable loading” fee before compilation.

jakmeier added 2 commits June 1, 2022 16:02
Reason: It is an expected change that compilation failures (or cache
errors for that matter) will burn the loading gas in new versions.
Copy link
Collaborator

@akhi3030 akhi3030 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please feel free to ignore all my suggestions if you don't think they are handy. I am just trying my hand at reviewing.

Comment on lines 2682 to 2683
/// A smarter fee could be added, although that would have to happen slightly
/// later, as this dumb fee can be pre-charged without looking at the code.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does slightly later here mean in a future version of the code or later in the process of loading the contract? It is not super clear to me and I wonder if we should consider clearing it up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Later, as in after the loading. Because we need the information from the loading phase to know the structure.

I rephrased it, I hope it is clearer now.

)
.outcome_error()

if let Some(runtime) = vm_kind.runtime(config) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Should we consider keeping the previous style namely because it has fewer indentations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, not sure why I changed it. It happened after several round of refactoring around.

Comment on lines 138 to 139
/// Includes some hard-coded parameter values that would have to be updated
/// if they change in the future.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a brief comment as to how these hardcoded values were derived please? To help newbies like myself 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I elaborated it some more. But the key here is that parameter is a technical term. It is a runtime parameter that is part of protocol configuration.

Copy link
Contributor

@matklad matklad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jakmeier
Copy link
Contributor Author

jakmeier commented Jun 2, 2022

I am going to merge this now, as it the change itself has been reviewed sufficiently.

But before stabilising, it has been pointed out to me that I should add a protocol-level test as well.

@near-bulldozer near-bulldozer bot merged commit cb429d7 into near:master Jun 2, 2022
nikurt pushed a commit that referenced this pull request Jun 2, 2022
The code prepartion steps for executing a smart contract function call
used to be charged after the work has been done. This protocol change
reorders this.

The effect of the procotocol change is that when a function call fails
during contract loading, it will now have a non-zero gas cost.

Incidentally, this change also brings wasmtime failure behavior
during loading in-line with wasmer0 and wasmer2.

Test plan
---------
Tests for this feature must ensure several things.
1. The change does not affect the behavior with disabled feature.
2. Enabled feature has the desired effect of charging gas for
loading code even when contract loading fails on new versions.
3. Future changes do not accidentally make old versions incompatible.

Point 1 is mostly covered by existing tests that already check that
gas is zero during failures of all kinds. These tests on the stable
version are unchanged but the nightly_protocol_feature version checks
that the expected amount of gas is charged.

The test `test_gas_exceed_loading`has been added to also check the
scenario where gas runs out during contract loading.

Upon feature stabilization, these tests will only check the new
behaviour, therefore they cannot cover point 3.

Point 2 and 3 are covered with new tests in the module
`fix_contract_loading_cost_protocol_upgrade`
which compare the results of different control flows in old/new version.
jakmeier added a commit to jakmeier/nearcore that referenced this pull request Jul 12, 2022
# Feature to stabilize
The gas cost for loading a contract executable used to be charged afterwards. However, the general rule for gas charges is that they should be charged before the cost occurs. This is only noticable when the execution fails during that step. But it is nevertheless important, as an attacker could potentially abuse this to fill up block space without being charged for it.

Note that "loading" in this context is a technical term for preparing and populating all memory regions for an executable. Reading the data from the database is NOT included in this.

On a side note, the actual gas parameter values have not been updated, yet. We know that `wasm_contract_loading_bytes` in particular is too low. But it is counter balanced by `action_function_call` which is higher than it needs to be. Stabilizing this feature paves the floor to adjust those parameters.

Main implementation PR: near#6772
Main issue: near#5962

# Testing and QA
## Tests
The original PR (near#6772) includes tests on the VM level that ensure old versions are unaffeceted while new versions charge the cost correctly.
Integration tests in near#7169 checks the same again but on the apply-block level.

## Impact on existing smart contracts
This change does not affect anything except for function calls that fail during preparation. Since those were already failing, it should not impact anyone if the error type and amount of gas charged changes.

# Checklist
- [ ] TODO Link to nightly nayduck run (`./scripts/nayduck.py`, [docs](https://github.com/near/nearcore/blob/master/nightly/README.md#scheduling-a-run)):  https://nayduck.near.org/
- [x] Update CHANGELOG.md to include this protocol feature in the `Unreleased` section.
jakmeier added a commit that referenced this pull request Jul 15, 2022
# Feature to stabilize
The gas cost for loading a contract executable used to be charged afterwards. However, the general rule for gas charges is that they should be charged before the cost occurs. This is only noticeable when the execution fails during that step. But it is nevertheless important, as an attacker could potentially abuse this to fill up block space without being charged for it.

Note that "loading" in this context is a technical term for preparing and populating all memory regions for an executable. Reading the data from the database is NOT included in this.

On a side note, the actual gas parameter values have not been updated, yet. We know that `wasm_contract_loading_bytes` in particular is too low. But it is counter balanced by `action_function_call` which is higher than it needs to be. Stabilizing this feature paves the floor to adjust those parameters.

Main implementation PR: #6772
Main issue: #5962

# Testing and QA
## Tests
The original PR (#6772) includes tests on the VM level that ensure old versions are unaffected while new versions charge the cost correctly.
Integration tests in #7169 checks the same again but on the apply-block level.

## Impact on existing smart contracts
This change does not affect anything except for function calls that fail during preparation. Since those were already failing, it should not impact anyone if the error type and amount of gas charged changes.

# Checklist
- [x] Link to nightly nayduck: https://nayduck.near.org/#/run/2562
- [x] Update CHANGELOG.md to include this protocol feature in the `Unreleased` section.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants