simulate: resource population #6015

Open
wants to merge 48 commits into
base: master

joe-p
Contributor

@joe-p joe-p commented Jun 5, 2024

Summary

When a user calls simulate with UnnamedResources enabled, simulate should suggest to the user how they can populate the resource arrays in their transactions to properly send the transaction group to the network.

Test Plan

  • Test ResourcePopulator works with simple local (not group sharing) resources
  • Test ResourcePopulator with group sharing
  • Test ResourcePopulator resource limit detection with group sharing (ie. it is able to find the correct transaction to put a resource in)
  • Test Simulate with ResourcePopulator functionality
  • Test /simulate endpoint with ResourcePopulator functionality
  • Write smaller tests for better ledger/simulation/resources.go coverage
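To make the "resource limit detection with group sharing" case concrete, here is a minimal sketch of the placement idea, not the ResourcePopulator from this PR. The `txnSlots` type, the `place` function, and the `limit` parameter are all hypothetical names introduced for illustration. The sketch treats every resource as an opaque string and assumes a single per-transaction cap, whereas a real populator would presumably need to distinguish resource kinds and their separate limits.

```go
package main

import "fmt"

// txnSlots is a hypothetical stand-in for one transaction's resource arrays.
// All references are modelled as opaque strings for simplicity.
type txnSlots struct {
	refs []string
}

// place assigns each unnamed resource to the first transaction in the group
// that still has fewer than limit references. Because resources are shared
// across the group, any transaction with room is a valid home.
// It mutates group in place and returns an error if a resource cannot fit;
// on error the group may be partially updated.
func place(group []txnSlots, unnamed []string, limit int) ([]txnSlots, error) {
	for _, res := range unnamed {
		placed := false
		for i := range group {
			if len(group[i].refs) < limit {
				group[i].refs = append(group[i].refs, res)
				placed = true
				break
			}
		}
		if !placed {
			return nil, fmt.Errorf("no room left in group for resource %q", res)
		}
	}
	return group, nil
}

func main() {
	// Two transactions, an assumed limit of 2 references each.
	group := []txnSlots{{refs: []string{"app-5"}}, {}}
	out, err := place(group, []string{"app-11", "acct-X"}, 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, t := range out {
		fmt.Println(i, t.refs)
	}
}
```

A first-fit strategy like this is the simplest choice; it keeps the result deterministic, which makes it straightforward to assert in tests which transaction each resource landed in.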

@joe-p joe-p changed the title from "Feat/populate_resources" to "resource population" Jun 5, 2024
@joe-p joe-p force-pushed the feat/populate_resources branch from 466fd50 to 5ba0a9a Compare June 5, 2024 23:06
@joe-p joe-p changed the title from "resource population" to "simulate: resource population" Jun 5, 2024

codecov bot commented Jun 6, 2024

Codecov Report

Attention: Patch coverage is 89.17526% with 42 lines in your changes missing coverage. Please review.

Project coverage is 52.03%. Comparing base (d52e3dd) to head (989e746).
Report is 4 commits behind head on master.

Files with missing lines                 Patch %   Lines
daemon/algod/api/server/v2/utils.go      0.00%     29 Missing ⚠️
ledger/simulation/resources.go           97.40%    5 Missing and 4 partials ⚠️
ledger/simulation/simulator.go           66.66%    2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6015      +/-   ##
==========================================
+ Coverage   51.84%   52.03%   +0.19%     
==========================================
  Files         643      643              
  Lines       86384    86772     +388     
==========================================
+ Hits        44783    45152     +369     
- Misses      38737    38753      +16     
- Partials     2864     2867       +3     


@joe-p
Contributor Author

joe-p commented Oct 9, 2024

In 292a9b9 I updated the API model to avoid using map[int] but for some reason it's not encoding the response properly: msgpack decode error [pos 50]: no matching struct field found when decoding stream map with key PopulatedResourceArrays.

If I print the raw response I see this:

��last-round��txn-groups���PopulatedResourceArrays��app-budget-added����app-budget-consumed��failed-at���failure-message��transaction SSG3ROSUBRSMXPTZYOORYAXCDYURLGM5OJJF5J3LMJ74LFEWJJAA: logic eval error: invalid Account reference CV6S42NRDBJZKQDDQPUVXSD4KUJ5FSYHXENR74ZPTTIEQASD2WBRBFODRY. Details: app=1006, pc=57, opcodes=store 2; load 0; balance�txn-results���app-budget-consumed��txn-result��pool-error��txn��sig�@۔sq��3b�l3���^�J��b�=�29�IұH=� �P��I��r����-e��A���}��q����?<��txn��apid���fee���fv��gh� ��Gp��y8�d��"���\>-c�P�2�P��݆~ݢlv����snd� ��ʏe�Lz�Y��W�[�+!�����O���P�Фtype�appl�version

So for some reason PopulatedResourceArrays is present where I would expect it to be omitted (and if it were included, I would expect the key to be populated-resource-arrays), given that it's defined as:

// PreEncodedSimulateTxnResult mirrors model.SimulateTransactionResult
type PreEncodedSimulateTxnResult struct {
	Txn                      PreEncodedTxInfo                        `codec:"txn-result"`
	AppBudgetConsumed        *uint64                                 `codec:"app-budget-consumed,omitempty"`
	LogicSigBudgetConsumed   *uint64                                 `codec:"logic-sig-budget-consumed,omitempty"`
	TransactionTrace         *model.SimulationTransactionExecTrace   `codec:"exec-trace,omitempty"`
	UnnamedResourcesAccessed *model.SimulateUnnamedResourcesAccessed `codec:"unnamed-resources-accessed,omitempty"`
	FixedSigner              *string                                 `codec:"fixed-signer,omitempty"`
	PopulatedResourceArrays  *model.ResourceArrays                   `codec:"populated-resource-arrays,omitempty"`
}

Any ideas on what might be happening here?

@jannotti
Contributor

jannotti commented Oct 9, 2024

It almost seems like the codec line was completely ignored for encoding, since it has the default name and omitempty was ineffective. Yet, decoding was surprised to see the capitalized form. I don't know the context of your testing - is there any chance you encoded that bytestream before the codec line was added, then decoded it after?
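The symptom described above can be reproduced in miniature. The sketch below uses the standard library's `encoding/json` as a stand-in for the msgpack codec used by algod (an assumption made so the example has no external dependencies; the tag key is `json` here rather than `codec`). The `resourceArrays`, `untagged`, and `tagged` types are hypothetical. It shows that an encoder that does not see the tag emits the Go field name and ignores `omitempty`, which matches the raw output in the earlier comment.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resourceArrays is a hypothetical stand-in for model.ResourceArrays.
type resourceArrays struct {
	Accounts []string `json:"accounts,omitempty"`
}

// untagged models a struct as the encoder sees it when the tag is absent
// (for example, a stale binary built before the tag was added).
type untagged struct {
	PopulatedResourceArrays *resourceArrays
}

// tagged models the struct with the intended key name and omitempty.
type tagged struct {
	PopulatedResourceArrays *resourceArrays `json:"populated-resource-arrays,omitempty"`
}

// encode marshals v and returns the result as a string.
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encode(untagged{})) // {"PopulatedResourceArrays":null}
	fmt.Println(encode(tagged{}))   // {}
}
```

If the encoding side produces the first form while the decoding side expects the second, the decoder meets a key it has no field for, which is consistent with the "no matching struct field found" error.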

@kylebeee

kylebeee commented Oct 9, 2024

At a glance, it seems to me like you might have a bug somewhere where you're assigning *simulation.PopulatedResourceArrays to PreEncodedSimulateTxnResult.PopulatedResourceArrays instead of *model.ResourceArrays, but it's not super clear where that would be happening and I don't know the go-algorand code base well enough to say definitively.

Where are you printing the raw response?

@joe-p
Contributor Author

joe-p commented Jan 17, 2025

Edit2: The error is actually in simulate. TestPopulateResources/mixed_resources is currently failing.

@joe-p
Contributor Author

joe-p commented Jan 21, 2025

After having this on the back burner for a while, I've come back to working on this and discovered why I was slow to make progress once I started to implement the endpoint. I was making two mistakes:

  1. I was not building algod before running the e2e tests. In hindsight this seems obvious, but I was used to go test picking up the changes automatically for me. With the e2e tests the built algod is spawned as a separate task, so algod needs to be explicitly rebuilt after any change to it.

  2. The test cache was not being properly invalidated. Most likely because of the first problem, I was running tests and getting incorrect cached results. This led to me making changes that actually broke things while I was under the impression they were still working. This made debugging harder because I was breaking things without realizing it (see 035ef72, fixed by 41d63dd).
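One way to guard against the first mistake is a staleness check that fails fast when the binary under test is older than its sources. The following is only a sketch under stated assumptions: `newestGoSource` and `isStale` are hypothetical helpers, not part of go-algorand, and comparing modification times is a heuristic rather than a proper build-dependency check.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// newestGoSource returns the latest modification time (Unix nanoseconds)
// of any .go file under root.
func newestGoSource(root string) (int64, error) {
	var newest int64
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() || !strings.HasSuffix(path, ".go") {
			return nil
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		if t := info.ModTime().UnixNano(); t > newest {
			newest = t
		}
		return nil
	})
	return newest, err
}

// isStale reports whether a binary modified at binaryTime predates the
// newest source change.
func isStale(binaryTime, newestSource int64) bool {
	return binaryTime < newestSource
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: stalecheck <binary> <source-root>")
		os.Exit(2)
	}
	info, err := os.Stat(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	newest, err := newestGoSource(os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if isStale(info.ModTime().UnixNano(), newest) {
		fmt.Fprintln(os.Stderr, "binary is older than its sources; rebuild before running e2e tests")
		os.Exit(1)
	}
	fmt.Println("binary is up to date")
}
```

For the second mistake, passing `-count=1` to `go test` is the idiomatic way to bypass cached test results, which is useful when the test outcome depends on an external binary that the Go test cache does not track.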

Now with 41d63dd all tests are passing, although I am experiencing an intermittent issue with database tables being locked when testing, which is seemingly causing a tracked app to be missing:

--- FAIL: TestPopulatorWithGlobalResources (0.00s)
    resources_test.go:431: 
                Error Trace:    /Users/joe/git/algorand/go-algorand/ledger/simulation/resources_test.go:431
                Error:          elements differ
                            
                                extra elements in list B:
                                ([]interface {}) (len=1) {
                                 (basics.AppIndex) 3
                                }
                            
                            
                                listA:
                                ([]basics.AppIndex) (len=2) {
                                 (basics.AppIndex) 11,
                                 (basics.AppIndex) 5
                                }
                            
                            
                                listB:
                                ([]basics.AppIndex) (len=3) {
                                 (basics.AppIndex) 5,
                                 (basics.AppIndex) 11,
                                 (basics.AppIndex) 3
                                }
                Test:           TestPopulatorWithGlobalResources
time="2025-01-21T15:46:24.630756 -0500" level=warning msg="db.LoggedRetry: 5 retries (last err: database table is locked: accountbase)" file=dbutil.go function=github.com/algorand/go-algorand/util/db.LoggedRetry line=171
time="2025-01-21T15:46:24.630995 -0500" level=warning msg="db.LoggedRetry: 6 retries (last err: database table is locked: accountbase)" file=dbutil.go function=github.com/algorand/go-algorand/util/db.LoggedRetry line=171
time="2025-01-21T15:46:24.631008 -0500" level=warning msg="db.LoggedRetry: 7 retries (last err: database table is locked: accountbase)" file=dbutil.go function=github.com/algorand/go-algorand/util/db.LoggedRetry line=171
time="2025-01-21T15:46:24.631220 -0500" level=warning msg="db.LoggedRetry: 8 retries (last err: database table is locked: acctrounds)" file=dbutil.go function=github.com/algorand/go-algorand/util/db.LoggedRetry line=171

Here is a gist showing the full output with 2/10 runs failing because of the above: https://gist.github.com/joe-p/860cf28908a99db2f58c5010cb378894

I have not yet tried to reproduce on the e2e tests, but I was running them extensively last week and never saw this issue.
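For readers unfamiliar with the failure output above: the "elements differ" message is what an order-insensitive list comparison reports when the two lists do not hold the same elements. The sketch below is a hypothetical re-implementation of that kind of check (the `extraIn` helper is not from go-algorand or any assertion library), using plain `uint64` values in place of `basics.AppIndex` and the two lists from the log.

```go
package main

import "fmt"

// extraIn returns the elements of b that have no counterpart in a,
// treating both slices as multisets so that order does not matter.
func extraIn(a, b []uint64) []uint64 {
	counts := make(map[uint64]int, len(a))
	for _, v := range a {
		counts[v]++
	}
	var extra []uint64
	for _, v := range b {
		if counts[v] > 0 {
			counts[v]-- // matched one occurrence in a
			continue
		}
		extra = append(extra, v)
	}
	return extra
}

func main() {
	listA := []uint64{11, 5}
	listB := []uint64{5, 11, 3}
	fmt.Println(extraIn(listA, listB)) // [3]
	fmt.Println(extraIn(listB, listA)) // []
}
```

The ordering difference between the two lists (11, 5 versus 5, 11) is therefore not the problem; the only mismatch is that app 3 appears in one list and not the other.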

Once this issue is resolved the only remaining work is to make some smaller unit tests to test the "bad" cases and make sure things fail gracefully.

joe-p added 8 commits January 28, 2025 15:57
as per the comment, these checks should never be needed due to the logic
in eval context, but felt safer to add just in case
previously non appls weren't properly added to the populator, meaning
their fields were not accounted for when checking for availability. This
is actually probably fine since these sorts of duplicates should be
prevented by the logic in evalcontext, but as mentioned in previous
commits it feels safer to check here just in case
@joe-p joe-p marked this pull request as ready for review January 31, 2025 12:03
@joe-p
Contributor Author

joe-p commented Jan 31, 2025

I believe all comments have been addressed at this point and test coverage is near 100%. The only problem is that I'm still occasionally getting "database table is locked" errors when running tests locally. So far it has only happened with TestPopulatorWithGlobalResources. I tried running just this test with parallel testing disabled, but I'm still seeing the same error occasionally. I believe this is just a problem with the test harness, so I'm not sure whether it should be considered a blocker. I'd be interested to know if others can replicate it.


5 participants