Migrate integration tests to Go testing framework #1839
Conversation
I support the idea of moving to Go, but Compose seems to be adding a lot of weight - is it justified?
@bboreham It's a good question. My idea (which is not visible in this PoC) is that you need a compose file per scenario, not per test suite. A scenario is a cluster setup (i.e. storage based on DynamoDB+S3, or another scenario with a gossip ring instead of consul, etc.) upon which you can run multiple tests. Each test suite runs against a scenario, and within a single suite you can have multiple semi-isolated tests, each running with a different tenant ID (so the "framework" may give each test a unique tenant ID to use).
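To make the intent concrete, here is a rough sketch of how a test under that design could look. This is purely illustrative: `NewScenario`, `Shutdown` and the way tenant IDs are handed out are hypothetical names, not the actual framework API in this PR.

```go
package integration

import "testing"

// Hypothetical sketch only: NewScenario / Shutdown are illustrative names,
// not the framework API introduced by this PR.
func TestChunksStorageScenario(t *testing.T) {
	// One scenario == one cluster setup (e.g. DynamoDB+S3, or a gossip ring
	// instead of consul), described by its own compose file.
	s, err := NewScenario("cortex-1")
	if err != nil {
		t.Fatal(err)
	}
	defer s.Shutdown()

	// Multiple semi-isolated tests can run against the same scenario,
	// each one using its own tenant ID.
	t.Run("push and query", func(t *testing.T) {
		tenantID := "tenant-push-and-query" // the framework may hand out a unique tenant ID per test
		_ = tenantID
		// ... start Cortex services, push series for tenantID, query them back ...
	})
}
```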
Force-pushed from 71418db to 3aa987f
Force-pushed from 456213c to 3de4cd0
I like this a lot. Tests are readable and checks make more sense. They are also easy to run using standard go test tools. Big thumbs up! Thanks Marco for your effort.
integration/framework/scenario.go (Outdated)

    // Setup the docker network
    if out, err := RunCommandAndGetOutput("docker", "network", "create", s.networkName); err != nil {
Would it be possible to detect the "Error response from daemon: network with name cortex-1 already exists" message and ignore it? Or perhaps try to delete the network first? It's the most common issue I've run into when trying to run the tests (esp. when I cancel a test in the middle).
The problem is not removing or reusing the network if it already exists; the problem is that after that error you will also get errors for already existing containers. You get into this state if the previous test run panicked (i.e. a test timeout), and automatically cleaning up containers in a safe way is non-trivial.
Please let me know if you ended up in this situation without a panic.
It was pretty common to run into problems with containers left behind when I didn't have CORTEX_IMAGE and CORTEX_INTEGRATION_DIR set correctly.
With CORTEX_INTEGRATION_DIR unset, Docker starts the minio container, but minio then fails. However, since this is reported as an error in the StartService code, the service is not appended to the list of started services and is therefore never cleaned up.
After talking to Peter, we actually found an issue: if service.Start() returned an error (for any reason), the container may have been created but was not cleaned up. I've pushed a fix for this (commit).
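For reference, a minimal sketch of what such a fix can look like, assuming the RunCommandAndGetOutput helper already used by the framework; the Scenario/Service types and fields here are illustrative, not the exact code from the commit.

```go
// Illustrative sketch, not the actual commit: if starting the container fails,
// force-remove whatever container may have been created before returning.
func (s *Scenario) startService(service *Service) error {
	if err := service.Start(); err != nil {
		// Start() can fail after the container has already been created, so
		// remove it explicitly; otherwise the next run trips over the leftover.
		_, _ = RunCommandAndGetOutput("docker", "rm", "-f", service.name)
		return err
	}
	// Only successfully started services are tracked for the regular cleanup.
	s.services = append(s.services, service)
	return nil
}
```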
In the previous version I force-removed all containers and the network before running each scenario. (That is what stop-components.sh did.)
Ok. I've added logic to clean up the Docker containers and the network before each test, so that we can now automatically recover from a spurious state left over from a previous run. The change is in this commit.
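A rough sketch of what this pre-test cleanup can look like (assuming the RunCommandAndGetOutput helper plus the standard strings package; the function name is illustrative and the docker CLI flags are standard):

```go
// Illustrative sketch: remove containers and the network left over from a
// previous (possibly panicked) run before starting a new scenario.
func cleanupLeftovers(networkName string) {
	// Force-remove any container still attached to the scenario's network.
	if out, err := RunCommandAndGetOutput("docker", "ps", "-aq", "--filter", "network="+networkName); err == nil {
		for _, id := range strings.Fields(string(out)) {
			_, _ = RunCommandAndGetOutput("docker", "rm", "-f", id)
		}
	}
	// Then remove the network itself, ignoring the error if it doesn't exist.
	_, _ = RunCommandAndGetOutput("docker", "network", "rm", networkName)
}
```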
Force-pushed from 3de4cd0 to 502bc57
Some nits below; more importantly I haven't had a clean run yet, and have broadly no idea how to debug.
$ CORTEX_IMAGE=quay.io/cortexproject/cortex:latest go test -timeout 300s -v -count=1 ./integration
=== RUN TestBackwardCompatibilityWithChunksStorage
Starting dynamodb
Starting consul
dynamodb: Initializing DynamoDB Local with the following configuration:
dynamodb: Port: 8000
dynamodb: InMemory: true
dynamodb: DbPath: null
dynamodb: SharedDb: true
dynamodb: shouldDelayTransientStatuses: false
dynamodb: CorsParams: *
consul: ==> Starting Consul agent...
consul: ==> Consul agent running!
consul: Version: 'v0.9.4'
consul: Node ID: '09f38a58-45f1-6e51-66e2-88b92af6b841'
consul: Node name: 'consul'
consul: Datacenter: 'dc1' (Segment: '<all>')
consul: Server: true (Bootstrap: false)
consul: Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 8600)
consul: Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
consul: Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
consul: ==> Log data will now stream in as it occurs:
Starting table-manager
Starting ingester-1
Starting distributor
consul: ==> Newer Consul version available: 1.6.3 (currently running: 0.9.4)
Starting ingester-2
Stopping ingester-1
ingester-1: level=warn ts=2020-02-07T11:36:00.848025258Z caller=transfer.go:423 msg="transfer attempt failed" err="cannot find ingester to transfer chunks to: no pending ingesters" attempt=1 max_retries=10
Starting querier
querier: 2020/02/07 11:36:01 Request.HasNextPage deprecated. Use Pagination type for configurable pagination of API operations
Stopping querier
Starting querier
querier: 2020/02/07 11:36:02 Request.HasNextPage deprecated. Use Pagination type for configurable pagination of API operations
Stopping querier
Killing ingester-2
ingester-2: level=warn ts=2020-02-07T11:36:02.827192046Z caller=transfer.go:422 msg="transfer attempt failed" err="cannot find ingester to transfer chunks to: no pending ingesters" attempt=1 max_retries=10
Killing distributor
Killing table-manager
Killing consul
Killing dynamodb
--- PASS: TestBackwardCompatibilityWithChunksStorage (7.94s)
=== RUN TestIngesterFlushWithChunksStorage
Starting dynamodb
Starting consul
dynamodb: Initializing DynamoDB Local with the following configuration:
dynamodb: Port: 8000
dynamodb: InMemory: true
dynamodb: DbPath: null
dynamodb: SharedDb: true
dynamodb: shouldDelayTransientStatuses: false
dynamodb: CorsParams: *
consul: ==> Starting Consul agent...
consul: ==> Consul agent running!
consul: Version: 'v0.9.4'
consul: Node ID: 'db75608b-b074-d963-8c06-eabdc3d6a4b0'
consul: Node name: 'consul'
consul: Datacenter: 'dc1' (Segment: '<all>')
consul: Server: true (Bootstrap: false)
consul: Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 8600)
consul: Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
consul: Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
consul: ==> Log data will now stream in as it occurs:
Starting table-manager
Starting ingester-1
Starting querier
Starting distributor
querier: 2020/02/07 11:36:08 Request.HasNextPage deprecated. Use Pagination type for configurable pagination of API operations
Stopping ingester-1
Killing distributor
Killing querier
Killing table-manager
Killing consul
Killing dynamodb
--- PASS: TestIngesterFlushWithChunksStorage (6.15s)
=== RUN TestIngesterHandOverWithBlocksStorage
Starting minio
Starting consul
minio: Attempting encryption of all config, IAM users and policies on MinIO backend
minio: API: SYSTEM()
minio: Time: 11:36:10 UTC 02/07/2020
minio: DeploymentID: 69cd1dbf-5fa9-4fda-a7cd-2223e17064cc
minio: Error: disk path full
minio: 8: cmd/fs-v1-helpers.go:320:cmd.fsCreateFile()
minio: 7: cmd/fs-v1.go:1013:cmd.(*FSObjects).putObject()
minio: 6: cmd/fs-v1.go:927:cmd.(*FSObjects).PutObject()
minio: 5: cmd/config-common.go:62:cmd.saveConfig()
minio: 4: cmd/config-encrypted.go:305:cmd.migrateConfigPrefixToEncrypted()
minio: 3: cmd/config-encrypted.go:109:cmd.handleEncryptedConfigBackend()
minio: 2: cmd/server-main.go:208:cmd.initSafeMode()
minio: 1: cmd/server-main.go:373:cmd.serverMain()
minio: API: SYSTEM()
minio: Time: 11:36:10 UTC 02/07/2020
minio: DeploymentID: 69cd1dbf-5fa9-4fda-a7cd-2223e17064cc
minio: Error: disk path full
minio: 8: cmd/fs-v1-helpers.go:320:cmd.fsCreateFile()
minio: 7: cmd/fs-v1.go:1013:cmd.(*FSObjects).putObject()
minio: 6: cmd/fs-v1.go:927:cmd.(*FSObjects).PutObject()
minio: 5: cmd/config-common.go:62:cmd.saveConfig()
minio: 4: cmd/config-encrypted.go:305:cmd.migrateConfigPrefixToEncrypted()
minio: 3: cmd/config-encrypted.go:109:cmd.handleEncryptedConfigBackend()
minio: 2: cmd/server-main.go:208:cmd.initSafeMode()
minio: 1: cmd/server-main.go:373:cmd.serverMain()
consul: ==> Starting Consul agent...
consul: ==> Consul agent running!
consul: Version: 'v0.9.4'
consul: Node ID: '5ce784bb-6181-052a-18b3-620571ab9a86'
consul: Node name: 'consul'
consul: Datacenter: 'dc1' (Segment: '<all>')
consul: Server: true (Bootstrap: false)
consul: Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 8600)
consul: Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
consul: Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
consul: ==> Log data will now stream in as it occurs:
consul: ==> Newer Consul version available: 1.6.3 (currently running: 0.9.4)
Killing consul
Killing minio
minio: Exiting on signal: TERMINATED
--- FAIL: TestIngesterHandOverWithBlocksStorage (16.41s)
ingester_hand_over_test.go:18:
Error Trace: ingester_hand_over_test.go:18
ingester_hand_over_test.go:54
ingester_hand_over_test.go:14
Error: Received unexpected error:
the service minio is not ready
Test: TestIngesterHandOverWithBlocksStorage
=== RUN TestIngesterHandOverWithChunksStorage
Starting dynamodb
Starting consul
dynamodb: Initializing DynamoDB Local with the following configuration:
dynamodb: Port: 8000
dynamodb: InMemory: true
dynamodb: DbPath: null
dynamodb: SharedDb: true
dynamodb: shouldDelayTransientStatuses: false
dynamodb: CorsParams: *
consul: ==> Starting Consul agent...
consul: ==> Consul agent running!
consul: Version: 'v0.9.4'
consul: Node ID: 'd5d51e32-0e52-e71c-0c3a-391411e89b86'
consul: Node name: 'consul'
consul: Datacenter: 'dc1' (Segment: '<all>')
consul: Server: true (Bootstrap: false)
consul: Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 8600)
consul: Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
consul: Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
consul: ==> Log data will now stream in as it occurs:
Starting table-manager
Starting ingester-1
Starting querier
Starting distributor
querier: 2020/02/07 11:36:31 Request.HasNextPage deprecated. Use Pagination type for configurable pagination of API operations
Starting ingester-2
Stopping ingester-1
ingester-1: level=warn ts=2020-02-07T11:36:31.426057707Z caller=transfer.go:422 msg="transfer attempt failed" err="cannot find ingester to transfer chunks to: no pending ingesters" attempt=1 max_retries=10
Killing ingester-2
ingester-2: level=warn ts=2020-02-07T11:36:31.843151654Z caller=transfer.go:422 msg="transfer attempt failed" err="cannot find ingester to transfer chunks to: no pending ingesters" attempt=1 max_retries=10
Killing distributor
Killing querier
Killing table-manager
Killing consul
Killing dynamodb
--- PASS: TestIngesterHandOverWithChunksStorage (6.69s)
FAIL
FAIL github.com/cortexproject/cortex/integration 37.188s
FAIL
    # Make sure the working directory is within the GOPATH, otherwise
    # "go test" will not download module dependencies.
We should really turn on -mod=vendor to avoid this. Not part of this PR.
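For example (not part of this PR), the run from earlier in this thread would then look something like the following, with dependencies taken from the repository's vendor/ directory instead of being downloaded:

```sh
# -mod=vendor makes "go test" resolve dependencies from vendor/, so the
# GOPATH requirement for downloading modules goes away.
CORTEX_IMAGE=quay.io/cortexproject/cortex:latest go test -mod=vendor -timeout 300s -v -count=1 ./integration
```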
Force-pushed from 7d7be2d to c78d894
@bboreham According to the output, I can see TestIngesterHandOverWithBlocksStorage failed because the minio service was not ready. Looking at the minio logs (starting with "Error: disk path full"), minio apparently couldn't write its config to disk. Not sure about the root cause though, unless the disk of the machine where you run the Docker host is full (I assume that's not the case).
I got the tests to run through cleanly; I think my disk did fill up and I just didn't notice the error message.
Then, as an experiment, I made a breaking change in cortex.proto, and although the unit tests passed, the integration tests failed, so they work!
Great job, thanks @pracucci
What this PR does:
This PR introduces an easy-to-use framework to write Cortex integration tests in Go. The current integration tests written in Bash have been ported (and enriched) to Go, and a couple of new tests have been added.
Worth noting:
- make test doesn't run the tests in the integration/ folder
- Tests run against the cortex:latest Docker image, so make sure it is updated before running them. On my local machine a single integration test takes about 10s, and re-iterating on it (while developing it) is pretty handy
Which issue(s) this PR fixes:
No issue
Checklist
- CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]