
Migrate integration tests to Go testing framework #1839

Conversation

pracucci
Contributor

@pracucci pracucci commented Nov 19, 2019

What this PR does:
This PR introduces an easy-to-use framework for writing Cortex integration tests in Go. The current integration tests, written in Bash, have been ported to Go (and enriched), and a couple of new tests have been added:

  1. Backward compatibility test
  2. Ingester transfer with blocks storage

Worth noting:

  1. In CircleCI, the configs DB integration tests and the new tests have been split into two pipelines, so they run concurrently
  2. I've tried to fine-tune the Minio and Consul config to reduce the startup time (I haven't found any possible optimization for DynamoDB local)
  3. make test doesn't run tests in the integration/ folder
  4. Since it's based on the Go testing framework, I can run integration tests directly in the IDE (VSCode), as long as the cortex:latest Docker image is up to date. On my local machine a single integration test takes about 10s, which makes iterating on a test while developing it pretty handy (see the sketch after this list)
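
To give a concrete flavour, here is a minimal sketch of what such a test can look like. All helper names (NewScenario, NewConsulService, NewDynamoDBService) are illustrative assumptions, not necessarily the exact API introduced by this PR:

package integration

import (
	"testing"

	"github.com/stretchr/testify/require"
)

// Illustrative sketch only: the Scenario/Service helpers are assumed names.
func TestIngesterHandOverSketch(t *testing.T) {
	// Create a dedicated Docker network for this test run.
	s, err := NewScenario()
	require.NoError(t, err)
	// Stop all started containers and remove the network at the end.
	defer s.Shutdown()

	// Start the dependencies and wait until each service reports ready.
	require.NoError(t, s.StartService(NewConsulService()))
	require.NoError(t, s.StartService(NewDynamoDBService()))

	// From here, start the Cortex components based on the local
	// cortex:latest image, push series through the distributor and
	// assert on the query results.
}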

Which issue(s) this PR fixes:
No issue

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@bboreham
Contributor

I support the idea of moving to Go, but Compose seems to be adding a lot of weight - is it justified?
It looks like we will need another 50-line Compose file for each test?

@pracucci
Contributor Author

Compose seems to be adding a lot of weight - is it justified?
It looks like we will need another 50-line Compose file for each test?

@bboreham It's a good question. My idea (which is not visible in this PoC) is that you need a compose file per scenario, not per test suite. A scenario is a cluster setup (i.e. storage based on DynamoDB+S3, or another scenario with a gossip ring instead of Consul, etc.) upon which you can run multiple tests. Each test suite runs against a scenario, and within a single suite you can have multiple semi-isolated tests, each running with a different tenant ID (so the "framework" may give each test a unique tenant ID to use). A sketch of this split is shown below.
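
For illustration, a sketch of that scenario/suite split (package clause and imports omitted; every helper name here is hypothetical): the scenario is started once per suite, and each test gets its own tenant ID for semi-isolation:

// Hypothetical sketch: startChunksStorageScenario is assumed to bring up
// Consul, DynamoDB and the Cortex components on a shared Docker network.
func TestChunksStorageSuite(t *testing.T) {
	s := startChunksStorageScenario(t)
	defer s.Shutdown()

	tests := []struct {
		name string
		run  func(t *testing.T, tenantID string)
	}{
		{"write then read back", testWriteReadBack}, // hypothetical test funcs
		{"ingester hand-over", testIngesterHandOver},
	}
	for i, tc := range tests {
		tc := tc
		// A unique tenant ID keeps tests semi-isolated within the shared scenario.
		tenantID := fmt.Sprintf("tenant-%d", i)
		t.Run(tc.name, func(t *testing.T) {
			tc.run(t, tenantID)
		})
	}
}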

@pracucci pracucci added the keepalive Skipped by stale bot label Feb 3, 2020
@pracucci pracucci force-pushed the setup-integration-tests-with-docker-compose-and-golang branch 4 times, most recently from 71418db to 3aa987f on February 4, 2020 16:51
@pracucci pracucci changed the title Proposal: integration tests based on docker-compose and go testing framework Migrate integration tests to Go testing framework Feb 4, 2020
@pracucci pracucci marked this pull request as ready for review February 4, 2020 17:09
@pracucci pracucci force-pushed the setup-integration-tests-with-docker-compose-and-golang branch from 456213c to 3de4cd0 on February 5, 2020 09:40
@pracucci pracucci mentioned this pull request Feb 6, 2020
Contributor

@pstibrany pstibrany left a comment

I like this a lot. Tests are readable and checks make more sense. They are also easy to run using standard go test tools. Big thumbs up! Thanks Marco for your effort.

}

// Setup the docker network
if out, err := RunCommandAndGetOutput("docker", "network", "create", s.networkName); err != nil {
Contributor

Would it be possible to detect Error response from daemon: network with name cortex-1 already exists message and ignore it? Or perhaps try to delete it first. It's the most common issue I've run into when trying to run tests (esp. when I cancel the test in the middle).
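
For instance, the "ignore if it already exists" variant could look like this hypothetical sketch (as discussed in the replies below, this alone would not be enough, because leftover containers from the previous run would still collide):

// Hypothetical: treat an already-existing network as success.
if out, err := RunCommandAndGetOutput("docker", "network", "create", s.networkName); err != nil {
	if !strings.Contains(string(out), "already exists") {
		return fmt.Errorf("creating docker network %s: %s: %w", s.networkName, string(out), err)
	}
}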

Contributor Author

The problem is not removing or reusing the network if it already exists: after that error, you will also get errors for the already-existing containers. You get this error if the previous test run panicked (e.g. on a test timeout), and automatically cleaning up containers in a safe way is non-trivial.

Please let me know if you ended up in this situation without a panic.

Contributor

It was pretty common to run into problems with containers left behind when I didn't have CORTEX_IMAGE and CORTEX_INTEGRATION_DIR set correctly.

With CORTEX_INTEGRATION_DIR unset, Docker starts the Minio container, but Minio then fails. However, since this is reported as an error in the StartService code, the service is not appended to the list of started services, and is therefore never cleaned up.

Contributor Author

@pracucci pracucci Feb 6, 2020

After talking to Peter, we actually found an issue: if service.Start() returned an error (for any reason), the container may have been created but was not cleaned up. I've pushed a fix for this (commit).
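
The general shape of such a fix could be the following sketch (Scenario, Service and the name field are assumptions for illustration; the actual change is in the linked commit):

// Illustrative sketch, not the actual fix: force-remove a container that
// may have been created by a failed Start(), so it cannot leak into the
// next run.
func (s *Scenario) StartService(service *Service) error {
	if err := service.Start(s.networkName); err != nil {
		// "docker run" may have created the container even though the
		// service failed to start, so remove it before returning.
		_, _ = RunCommandAndGetOutput("docker", "rm", "--force", service.name)
		return err
	}
	// Register the service for shutdown only once it started successfully.
	s.services = append(s.services, service)
	return nil
}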

Contributor

In the previous version I force-removed all containers and the network before running each scenario.
(That is what stop-components.sh did)

Contributor Author

OK, I've added the logic to clean up the Docker containers and network before each test, so that we can now automatically recover from a spurious state left over from a previous run. The change is in this commit.
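
A self-contained sketch of this kind of pre-test cleanup (the exact docker invocations here are illustrative, not necessarily what the commit does):

package framework

import (
	"os/exec"
	"strings"
)

// cleanupLeftovers force-removes any containers attached to the test
// network and then the network itself, so a run that panicked or was
// cancelled cannot break the next one. Errors are deliberately ignored:
// a failure usually just means there was nothing to clean up.
func cleanupLeftovers(networkName string) {
	out, err := exec.Command("docker", "ps", "-aq", "--filter", "network="+networkName).Output()
	if err == nil {
		for _, containerID := range strings.Fields(string(out)) {
			_ = exec.Command("docker", "rm", "--force", containerID).Run()
		}
	}
	_ = exec.Command("docker", "network", "rm", networkName).Run()
}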

@pracucci pracucci force-pushed the setup-integration-tests-with-docker-compose-and-golang branch from 3de4cd0 to 502bc57 on February 6, 2020 11:05
Contributor

@bboreham bboreham left a comment

Some nits below; more importantly I haven't had a clean run yet, and have broadly no idea how to debug.

$ CORTEX_IMAGE=quay.io/cortexproject/cortex:latest go test -timeout 300s -v -count=1 ./integration
=== RUN   TestBackwardCompatibilityWithChunksStorage
Starting dynamodb
Starting consul
dynamodb: Initializing DynamoDB Local with the following configuration:
dynamodb: Port:	8000
dynamodb: InMemory:	true
dynamodb: DbPath:	null
dynamodb: SharedDb:	true
dynamodb: shouldDelayTransientStatuses:	false
dynamodb: CorsParams:	*
consul: ==> Starting Consul agent...
consul: ==> Consul agent running!
consul: Version: 'v0.9.4'
consul: Node ID: '09f38a58-45f1-6e51-66e2-88b92af6b841'
consul: Node name: 'consul'
consul: Datacenter: 'dc1' (Segment: '<all>')
consul: Server: true (Bootstrap: false)
consul: Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 8600)
consul: Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
consul: Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
consul: ==> Log data will now stream in as it occurs:
Starting table-manager
Starting ingester-1
Starting distributor
consul: ==> Newer Consul version available: 1.6.3 (currently running: 0.9.4)
Starting ingester-2
Stopping ingester-1
ingester-1: level=warn ts=2020-02-07T11:36:00.848025258Z caller=transfer.go:423 msg="transfer attempt failed" err="cannot find ingester to transfer chunks to: no pending ingesters" attempt=1 max_retries=10
Starting querier
querier: 2020/02/07 11:36:01 Request.HasNextPage deprecated. Use Pagination type for configurable pagination of API operations
Stopping querier
Starting querier
querier: 2020/02/07 11:36:02 Request.HasNextPage deprecated. Use Pagination type for configurable pagination of API operations
Stopping querier
Killing ingester-2
ingester-2: level=warn ts=2020-02-07T11:36:02.827192046Z caller=transfer.go:422 msg="transfer attempt failed" err="cannot find ingester to transfer chunks to: no pending ingesters" attempt=1 max_retries=10
Killing distributor
Killing table-manager
Killing consul
Killing dynamodb
--- PASS: TestBackwardCompatibilityWithChunksStorage (7.94s)
=== RUN   TestIngesterFlushWithChunksStorage
Starting dynamodb
Starting consul
dynamodb: Initializing DynamoDB Local with the following configuration:
dynamodb: Port:	8000
dynamodb: InMemory:	true
dynamodb: DbPath:	null
dynamodb: SharedDb:	true
dynamodb: shouldDelayTransientStatuses:	false
dynamodb: CorsParams:	*
consul: ==> Starting Consul agent...
consul: ==> Consul agent running!
consul: Version: 'v0.9.4'
consul: Node ID: 'db75608b-b074-d963-8c06-eabdc3d6a4b0'
consul: Node name: 'consul'
consul: Datacenter: 'dc1' (Segment: '<all>')
consul: Server: true (Bootstrap: false)
consul: Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 8600)
consul: Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
consul: Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
consul: ==> Log data will now stream in as it occurs:
Starting table-manager
Starting ingester-1
Starting querier
Starting distributor
querier: 2020/02/07 11:36:08 Request.HasNextPage deprecated. Use Pagination type for configurable pagination of API operations
Stopping ingester-1
Killing distributor
Killing querier
Killing table-manager
Killing consul
Killing dynamodb
--- PASS: TestIngesterFlushWithChunksStorage (6.15s)
=== RUN   TestIngesterHandOverWithBlocksStorage
Starting minio
Starting consul
minio: Attempting encryption of all config, IAM users and policies on MinIO backend
minio: API: SYSTEM()
minio: Time: 11:36:10 UTC 02/07/2020
minio: DeploymentID: 69cd1dbf-5fa9-4fda-a7cd-2223e17064cc
minio: Error: disk path full
minio: 8: cmd/fs-v1-helpers.go:320:cmd.fsCreateFile()
minio: 7: cmd/fs-v1.go:1013:cmd.(*FSObjects).putObject()
minio: 6: cmd/fs-v1.go:927:cmd.(*FSObjects).PutObject()
minio: 5: cmd/config-common.go:62:cmd.saveConfig()
minio: 4: cmd/config-encrypted.go:305:cmd.migrateConfigPrefixToEncrypted()
minio: 3: cmd/config-encrypted.go:109:cmd.handleEncryptedConfigBackend()
minio: 2: cmd/server-main.go:208:cmd.initSafeMode()
minio: 1: cmd/server-main.go:373:cmd.serverMain()
minio: API: SYSTEM()
minio: Time: 11:36:10 UTC 02/07/2020
minio: DeploymentID: 69cd1dbf-5fa9-4fda-a7cd-2223e17064cc
minio: Error: disk path full
minio: 8: cmd/fs-v1-helpers.go:320:cmd.fsCreateFile()
minio: 7: cmd/fs-v1.go:1013:cmd.(*FSObjects).putObject()
minio: 6: cmd/fs-v1.go:927:cmd.(*FSObjects).PutObject()
minio: 5: cmd/config-common.go:62:cmd.saveConfig()
minio: 4: cmd/config-encrypted.go:305:cmd.migrateConfigPrefixToEncrypted()
minio: 3: cmd/config-encrypted.go:109:cmd.handleEncryptedConfigBackend()
minio: 2: cmd/server-main.go:208:cmd.initSafeMode()
minio: 1: cmd/server-main.go:373:cmd.serverMain()
consul: ==> Starting Consul agent...
consul: ==> Consul agent running!
consul: Version: 'v0.9.4'
consul: Node ID: '5ce784bb-6181-052a-18b3-620571ab9a86'
consul: Node name: 'consul'
consul: Datacenter: 'dc1' (Segment: '<all>')
consul: Server: true (Bootstrap: false)
consul: Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 8600)
consul: Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
consul: Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
consul: ==> Log data will now stream in as it occurs:
consul: ==> Newer Consul version available: 1.6.3 (currently running: 0.9.4)
Killing consul
Killing minio
minio: Exiting on signal: TERMINATED
--- FAIL: TestIngesterHandOverWithBlocksStorage (16.41s)
    ingester_hand_over_test.go:18: 
        	Error Trace:	ingester_hand_over_test.go:18
        	            				ingester_hand_over_test.go:54
        	            				ingester_hand_over_test.go:14
        	Error:      	Received unexpected error:
        	            	the service minio is not ready
        	Test:       	TestIngesterHandOverWithBlocksStorage
=== RUN   TestIngesterHandOverWithChunksStorage
Starting dynamodb
Starting consul
dynamodb: Initializing DynamoDB Local with the following configuration:
dynamodb: Port:	8000
dynamodb: InMemory:	true
dynamodb: DbPath:	null
dynamodb: SharedDb:	true
dynamodb: shouldDelayTransientStatuses:	false
dynamodb: CorsParams:	*
consul: ==> Starting Consul agent...
consul: ==> Consul agent running!
consul: Version: 'v0.9.4'
consul: Node ID: 'd5d51e32-0e52-e71c-0c3a-391411e89b86'
consul: Node name: 'consul'
consul: Datacenter: 'dc1' (Segment: '<all>')
consul: Server: true (Bootstrap: false)
consul: Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 8600)
consul: Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
consul: Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
consul: ==> Log data will now stream in as it occurs:
Starting table-manager
Starting ingester-1
Starting querier
Starting distributor
querier: 2020/02/07 11:36:31 Request.HasNextPage deprecated. Use Pagination type for configurable pagination of API operations
Starting ingester-2
Stopping ingester-1
ingester-1: level=warn ts=2020-02-07T11:36:31.426057707Z caller=transfer.go:422 msg="transfer attempt failed" err="cannot find ingester to transfer chunks to: no pending ingesters" attempt=1 max_retries=10
Killing ingester-2
ingester-2: level=warn ts=2020-02-07T11:36:31.843151654Z caller=transfer.go:422 msg="transfer attempt failed" err="cannot find ingester to transfer chunks to: no pending ingesters" attempt=1 max_retries=10
Killing distributor
Killing querier
Killing table-manager
Killing consul
Killing dynamodb
--- PASS: TestIngesterHandOverWithChunksStorage (6.69s)
FAIL
FAIL	github.com/cortexproject/cortex/integration	37.188s
FAIL

Comment on lines +119 to +120
# Make sure the working directory is within the GOPATH, otherwise
# "go test" will not download module dependencies.
Contributor

We should really turn on -mod=vendor to avoid this. Not part of this PR.

@pracucci pracucci force-pushed the setup-integration-tests-with-docker-compose-and-golang branch from 7d7be2d to c78d894 on February 7, 2020 14:38
@pracucci
Contributor Author

pracucci commented Feb 7, 2020

more importantly I haven't had a clean run yet

@bboreham According to the output, I can see:

        	Error:      	Received unexpected error:
        	            	the service minio is not ready
        	Test:       	TestIngesterHandOverWithBlocksStorage

And looking at the Minio logs (lines starting with minio:) I can see:

minio: Error: disk path full

Not sure about the root cause though, unless the disk of the machine running your Docker host is full (I assume that's not the case).

Contributor

@bboreham bboreham left a comment

I got the tests to run through cleanly; I think my disk did fill up and I just didn't notice the error message.
Then, as an experiment, I made a breaking change in cortex.proto: although unit tests passed, the integration tests failed, so they work!

Great job, thanks @pracucci

@pracucci pracucci merged commit 6ecd6f6 into cortexproject:master Feb 12, 2020
@pracucci pracucci mentioned this pull request Feb 12, 2020