Created docker files for an integ test cluster (#601) #986
Cluster contains:
* Spark master
* Spark worker
* OpenSearch server
* OpenSearch dashboards
* Minio server

Signed-off-by: Norman Jordan <norman.jordan@improving.com>
@normanj-bitquill thanks!!
Let's try to reuse the existing IT Python scripts.
Signed-off-by: Norman Jordan <norman.jordan@improving.com>
@YANG-DB I am part way through altering the integ test script to run against the docker containers. I have been able to create the indices for Some tests now pass when they were expected to fail. This could be caused by more recent changes. Some tests fail when they were expected to pass. These fall into 3 categories:
I will continue to update the script for running the tests to also get the report at the end.
@normanj-bitquill how would Spark Connect be used?
Would it be via Python or Scala?
Could you please describe the use case?
@YANG-DB I have been repurposing the script: With that it is: This would be an initial phase in this PR. The follow-up PR would make use of the Scala integration test framework already in place, updating it to connect with Spark Connect and run tests.
I'm not sure EMR supports Spark Connect...
I doubt that EMR would support Spark Connect. I am keeping that in mind, but I don't have an obvious solution for Spark EMR as yet. In the end the integration tests need to be able to run queries against either standard Spark containers or Spark EMR. The integration tests should not care which they are using. When I get to creating docker files for integration tests with Spark EMR, I will find a solution to this problem. It may require altering how integration tests connect to run queries, but for now I'd like to get a starting point out.
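One way to picture "the integration tests should not care which they are using" is a small runner interface that each backend implements. The sketch below is only an illustration, not code from this PR: `QueryRunner`, `SparkConnectRunner`, `CannedRunner` and `check_query` are hypothetical names, and the `sc://localhost:15002` URL assumes Spark Connect's default port is exposed by the cluster.

```python
from typing import Dict, List, Protocol


class QueryRunner(Protocol):
    """Anything that can run a query string and return rows (hypothetical interface)."""

    def run(self, query: str) -> List[tuple]:
        ...


class SparkConnectRunner:
    """Runs queries over Spark Connect; needs a pyspark build with Spark Connect support."""

    def __init__(self, url: str = "sc://localhost:15002") -> None:
        # Imported lazily so the rest of the sketch works without pyspark installed.
        from pyspark.sql import SparkSession

        # The URL is an assumption; use whatever host and port the cluster exposes.
        self._spark = SparkSession.builder.remote(url).getOrCreate()

    def run(self, query: str) -> List[tuple]:
        return [tuple(row) for row in self._spark.sql(query).collect()]


class CannedRunner:
    """Stand-in backend returning canned rows, e.g. a placeholder for a future EMR runner."""

    def __init__(self, canned: Dict[str, List[tuple]]) -> None:
        self._canned = canned

    def run(self, query: str) -> List[tuple]:
        return self._canned[query]


def check_query(runner: QueryRunner, query: str, expected: List[tuple]) -> bool:
    """The test only sees the QueryRunner interface, never the backend behind it."""
    return runner.run(query) == expected
```

With this shape, a test written against `check_query` would not change if an EMR-specific runner later replaced `SparkConnectRunner`; only the object passed in would.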
The Python script for integration tests was updated to run queries against the docker cluster. The required indices are created as part of the script. The queries for the Python script were likely out of date. These have been updated when the fix for the query was obvious. There are still 6 tests that fail. Signed-off-by: Norman Jordan <norman.jordan@improving.com>
@YANG-DB I have updated this PR so that the Python script for integration tests will now run against the docker cluster. Below is one idea for the long-term solution of running integration tests. Let me know what you think and if we should discuss this elsewhere.

Proposal

Create a directory structure for the tests.
Create a Spark App that makes use of the The Spark (either master container or EMR container) have the following directories mounted:
The integration tests (run from After the tests finish, the integration tests (run from
@normanj-bitquill
Looks great. Can you please add a link to the ./script/README.md file from our main README.md file, right below this:
- pip install requests pandas openpyxl
+ pip install requests pandas openpyxl pyspark setuptools pyarrow grpcio grpcio-status protobuf
  ```
Please also mention that both ppl-spark-integration-assembly-0.7.0-SNAPSHOT.jar and flint-spark-integration-assembly-0.7.0-SNAPSHOT.jar need to be built before the docker cluster can run, using:
sbt clean sparkSqlApplicationCosmetic/assembly
sbt clean sparkPPLCosmetic/assembly
Added this section.
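A pre-flight check along these lines could catch the missing-jar case before the containers are started. This is a hedged sketch rather than anything in the PR: the jar names are the ones quoted in the comment above, while `missing_jars` and the idea of searching the whole repository tree are assumptions, since the thread does not say where the jars end up.

```python
from pathlib import Path
from typing import Iterable, List

# Jar names as quoted in the review comment above.
REQUIRED_JARS = (
    "ppl-spark-integration-assembly-0.7.0-SNAPSHOT.jar",
    "flint-spark-integration-assembly-0.7.0-SNAPSHOT.jar",
)


def missing_jars(repo_root: Path, names: Iterable[str] = REQUIRED_JARS) -> List[str]:
    """Return the required jar names that are not found anywhere under repo_root."""
    root = Path(repo_root)
    # rglob searches recursively, so the exact output directory need not be known.
    return [name for name in names if not any(root.rglob(name))]
```

If `missing_jars(Path("."))` returns a non-empty list, the two `sbt clean .../assembly` commands from the comment above would need to be run first.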
integ-test/script/README.md
Outdated
  ```
- You need to replace the placeholders with your actual values of URL_ADDRESS, DATASOURCE_NAME and USERNAME, PASSWORD for authentication to your endpoint.
+ You need to replace the placeholders with your actual values of URL_ADDRESS, OPENSEARCH_URL and USERNAME, PASSWORD for authentication to your endpoint.

  For more details of the command line parameters, you can see the help manual via command:
  ```shell
  python SanityTest.py --help

  usage: SanityTest.py [-h] --base-url BASE_URL --username USERNAME --password PASSWORD --datasource DATASOURCE --input-csv INPUT_CSV
What is an example value for ${URL_ADDRESS}? Is it Spark's URL?
Please mention that.
Fixed this up. It should actually be SPARK_URL. Also provided an example value.
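For readers following the parameter discussion, the usage line quoted in the hunk above corresponds to an `argparse` setup roughly like the one below. It is a sketch reconstructed only from that usage line; `build_parser` is a hypothetical name, help texts are omitted because the hunk does not show them, and the flags in the updated script may differ given the renames discussed in this thread.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose required flags mirror the quoted usage line."""
    parser = argparse.ArgumentParser(prog="SanityTest.py")
    for flag in ("--base-url", "--username", "--password", "--datasource", "--input-csv"):
        # Every flag is shown without brackets in the usage line, so all are required.
        parser.add_argument(flag, required=True)
    return parser
```

Calling `build_parser().parse_args([...])` exposes the values as `args.base_url`, `args.input_csv`, and so on, because `argparse` converts dashes in flag names to underscores.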
Signed-off-by: Norman Jordan <norman.jordan@improving.com>
Signed-off-by: Norman Jordan <norman.jordan@improving.com>
Added a link in the top-level README.md
Signed-off-by: Norman Jordan <norman.jordan@improving.com>
@YANG-DB I have added a section to the integ test README.md to describe the test indices.
Description
Created a cluster that can later be used for integration tests. It contains a docker-compose.yml file that can be used to start the whole cluster.

Cluster contains:
* Spark master
* Spark worker
* OpenSearch server
* OpenSearch dashboards
* Minio server
Currently the Minio server is unused.
Spark nodes are configured to include the Flint and PPL extensions as well as to be able to query the OpenSearch server.
The OpenSearch dashboards are configured to connect to the OpenSearch server.
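Because the cluster consists of several containers that start at different speeds, a test script may want to wait until each service accepts connections before running queries. The helper below is a minimal sketch of that idea using only the standard library; `wait_for_port` is a hypothetical name, and the host and port to pass in depend on the port mappings in docker-compose.yml, which are not shown in this conversation.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only shows the port is open, not that the service is healthy.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 9200)` would wait for OpenSearch if its default HTTP port 9200 is mapped to the host unchanged, which is an assumption here.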
Related Issues
#601
Check List
--signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.