-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rename oni/lotus-soup imports to testplans/lotus-soup and go get lotus@master on trigger #4995
Conversation
@@ -43,5 +43,3 @@ require ( | |||
replace github.com/filecoin-project/filecoin-ffi => ../../extern/filecoin-ffi | |||
|
|||
replace github.com/supranational/blst => ../../extern/blst | |||
|
|||
replace github.com/filecoin-project/lotus => ../../ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this plays well with Testground. filecoin-ffi
and blst
are bundled with the base image (as they don't change often), but we are not including ../../
when we ship the test plan to Testground, so I am a bit puzzled how this compiled and worked over the last few days.
b0fdf9a
to
bd6d814
Compare
12e9e8a
to
301e2d0
Compare
I am marking this as ready for review, because I think it is an improvement to I think that many of the issues described are not directly related to the test plan, and something that might involve a bit more effort to address in the underlying APIs within Lotus. For reference the |
|
||
FROM golang:${GO_VERSION}-buster | ||
|
||
RUN apt-get update && apt-get install -y ca-certificates llvm clang mesa-opencl-icd ocl-icd-opencl-dev jq gcc git pkg-config bzr libhwloc-dev | ||
|
||
ARG FILECOIN_FFI_COMMIT=1985275547f222e8c97a8ab70b5cc26bc1fa50b1 | ||
ARG FILECOIN_FFI_COMMIT=1d9cb3e8ff53f51f9318fc57e5d00bc79bdc0128 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to extract this from the Docker image and compile on every test run. filecoin-ffi
library is changing much more often than originally anticipated, so the fact that we have to maintain the correct
version in the base
image we use for the testplan is not ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, agree. In fact, it's already outdated since you submitted the PR and I'm reviewing this. We can rely on docker caching.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can rely on docker caching.
Not sure what you mean with that. Could you elaborate?
I think we should just rely on Lotus and clone Lotus for a given commit. We discarded that option early on, but time and time again Lotus is changing underneath these tests and we have to replicate the Lotus setup - for example we no longer need blst
to be included as an extern
.
|
||
ARG LOTUS_COMMIT=1a170e18a | ||
ARG LOTUS_COMMIT=b13226bc2f8a0233b3beb9bd18ab266547bc29c7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is necessary for the debug
image as we bundle lotus-miner
and lotus
binaries, so that we can inspect miners and clients while tests are running. This is NOT the version under test, but if it gets old enough, lotus
and lotus-miner
might stop working (due to CBOR serialisation changes for example).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be worth rescuing the change log from README-old-from-oni.md? To keep tabs on the changes we're making.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be useful to document what this image is for. Maybe add comments to the top of the Dockerfile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(also to the others)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be worth rescuing the change log from README-old-from-oni.md? To keep tabs on the changes we're making.
I'd rather lock these tests with a specific Lotus commit. The initial thinking that "filecoin-ffi" does not change often is just not correct at this stage, we have updated it tens of times, it should be regarded as any other dependency in the go.mod
of Lotus in my opinion.
@@ -0,0 +1,67 @@ | |||
[metadata] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am bundling a few more composition files for easier debugging, when we notice that tests are broken / flaky.
@@ -26,7 +26,7 @@ | |||
miners = "2" | |||
genesis_timestamp_offset = "0" | |||
balance = "20000000" # These balances will work for maximum 100 nodes, as TotalFilecoin is 2B | |||
sectors = "10" | |||
sectors = "4" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are moving to 8MiB sectors, so no need to preseal 10 sectors when starting the miners, as it takes unnecessary time.
_ = log.SetLogLevel("*", "DEBUG") | ||
_ = log.SetLogLevel("dht", "WARN") | ||
_ = log.SetLogLevel("swarm2", "WARN") | ||
_ = log.SetLogLevel("addrutil", "WARN") | ||
_ = log.SetLogLevel("stats", "WARN") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've enabled DEBUG logging by default, so that we can monitor TaaS and try to fix flakiness continuously, and also reduced verbosity on some of the louder subsystems.
301e2d0
to
4d69947
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Needs updating FFI and possibly BLST in the images (we really do need to find a better system; as @nonsense mentioned, we expected these modules to be mostly static, but they change more frequently than anticipated).
Also, needs a merge from master and a dependency upgrade, I think.
|
||
FROM golang:${GO_VERSION}-buster | ||
|
||
RUN apt-get update && apt-get install -y ca-certificates llvm clang mesa-opencl-icd ocl-icd-opencl-dev jq gcc git pkg-config bzr libhwloc-dev | ||
|
||
ARG FILECOIN_FFI_COMMIT=1985275547f222e8c97a8ab70b5cc26bc1fa50b1 | ||
ARG FILECOIN_FFI_COMMIT=1d9cb3e8ff53f51f9318fc57e5d00bc79bdc0128 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, agree. In fact, it's already outdated since you submitted the PR and I'm reviewing this. We can rely on docker caching.
|
||
FROM golang:${GO_VERSION}-buster | ||
|
||
RUN apt-get update && apt-get install -y ca-certificates llvm clang mesa-opencl-icd ocl-icd-opencl-dev jq gcc git pkg-config bzr libhwloc-dev | ||
|
||
ARG FILECOIN_FFI_COMMIT=1985275547f222e8c97a8ab70b5cc26bc1fa50b1 | ||
ARG FILECOIN_FFI_COMMIT=1d9cb3e8ff53f51f9318fc57e5d00bc79bdc0128 | ||
ARG FFI_DIR=/extern/filecoin-ffi | ||
|
||
ARG BLST_COMMIT=1cbb16ed9580dcd3e9593b71221fcf2a048faaef |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above? (for FFI)
|
||
ARG LOTUS_COMMIT=1a170e18a | ||
ARG LOTUS_COMMIT=b13226bc2f8a0233b3beb9bd18ab266547bc29c7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be worth rescuing the change log from README-old-from-oni.md? To keep tabs on the changes we're making.
|
||
ARG LOTUS_COMMIT=1a170e18a | ||
ARG LOTUS_COMMIT=b13226bc2f8a0233b3beb9bd18ab266547bc29c7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be useful to document what this image is for. Maybe add comments to the top of the Dockerfile?
|
||
ARG LOTUS_COMMIT=1a170e18a | ||
ARG LOTUS_COMMIT=b13226bc2f8a0233b3beb9bd18ab266547bc29c7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(also to the others)
// give some time to the miner, otherwise, we get errors like: | ||
// deal errored deal failed: (State=26) error calling node: publishing deal: GasEstimateMessageGas | ||
// error: estimating gas used: message execution failed: exit 19, reason: failed to lock balance: failed to lock client funds: not enough balance to lock for addr t0102: escrow balance 0 < locked 0 + required 640297000 (RetCode=19) | ||
time.Sleep(50 * time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally the miner would be able to publish an event when it's ready to start accepting deals.
docker build -t "iptestground/oni-runtime:v5" -f "docker-images/Dockerfile.oni-runtime" "docker-images" | ||
docker build -t "iptestground/oni-buildbase:v13-lotus" -f "docker-images/Dockerfile.oni-buildbase" "docker-images" | ||
docker build -t "iptestground/oni-runtime:v7" -f "docker-images/Dockerfile.oni-runtime" "docker-images" | ||
docker build -t "iptestground/oni-runtime:v8-debug" -f "docker-images/Dockerfile.oni-runtime-debug" "docker-images" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to keep this -debug version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we want to keep it, -debug
includes the lotus
binary and the lotus-miner
binary, so it is useful to have it. Actually at the moment the manifest.toml
of the testplan refers to the debug version.
Co-authored-by: Łukasz Magiera <magik6k@users.noreply.github.com>
…otus-version-in-testplan rename oni/lotus-soup imports to testplans/lotus-soup and go get lotus@master on trigger
This PR is:
go.mod
to be used with the testplans ran on TaaS (currently based ongo get lotus@master
but could be changed to use the commit from CircleCI as well).filecoin-project/oni
.replace github.com/filecoin-project/lotus => ../../
since the parent folder is not included with the testplan, and this doesn't work with the test plan.It's worth noting that without making too many changes to the tests from one version of Lotus to another, we are sometimes getting different flakyness / different success rates. These tests were very stable for a few months, then they got broken for a month, and now they seem to pass some of the time. It is worth integrating them and running them soon as part of the Lotus test suite, so that we get them to a stable state, which is not easy since most of the logic really happens behind multiple APIs and is abstracted away from the test itself.
I am not sure how much of this is timing issues due to much lower block delays / propagation delays / etc. vs actual undergoing work on Lotus / go-fil-markets / etc.
PRIOR TO MERGE:
lock-lotus-version-in-testplan
branch from.circleci/config.yaml
TODO:
miner
client
3. e2e test is flaky: sometimes one of the clients blocks in
StorageDealAwaitingPreCommit
state4. WindowPoSt deadlines appear to have a one-off error with the configs in
init.go
. (This is not a problem for any of the existing tests, but worth figuring out and fixing. For the current tests, usually by the time the first WindowPoSt is to be submitted the deals should have been sealed by miners and retrieved from the clients).