Multinode-HA Vespa Setup for Local Testing (#1071)
Co-authored-by: yihanzhao <yihan@marqo.ai>
vicilliar and papa99do authored Feb 25, 2025
1 parent ca5f03b commit e556293
Showing 32 changed files with 1,740 additions and 328 deletions.
36 changes: 1 addition & 35 deletions .github/workflows/largemodel_unit_test_CI.yml
@@ -113,41 +113,7 @@ jobs:
mvn clean package
- name: Start Vespa
run: |
# Define these for checking if Vespa is ready
export VESPA_CONFIG_URL=http://localhost:19071
export VESPA_DOCUMENT_URL=http://localhost:8080
export VESPA_QUERY_URL=http://localhost:8080
cd marqo/scripts/vespa_local
set -x
python vespa_local.py start
set +x
echo "Waiting for Vespa to start"
for i in {1..20}; do
echo -ne "Waiting... $i seconds\r"
sleep 1
done
echo -e "\nDone waiting."
# Zip up schemas and services
sudo apt-get install zip -y
zip -r vespa_tester_app.zip services.xml schemas
# Deploy application with test schema
curl --header "Content-Type:application/zip" --data-binary @vespa_tester_app.zip http://localhost:19071/application/v2/tenant/default/prepareandactivate
# wait for vespa to start (document url):
timeout 10m bash -c 'until curl -f -X GET $VESPA_DOCUMENT_URL >/dev/null 2>&1; do echo " Waiting for Vespa document API to be available..."; sleep 10; done;' || \
(echo "Vespa (Document URL) did not start in time" && exit 1)
echo "Vespa document API is available. Local Vespa setup complete."
# Delete the zip file
rm vespa_tester_app.zip
echo "Deleted vespa_tester_app.zip"
run: python marqo/scripts/vespa_local/vespa_local.py full-start --Shards ${{ inputs.number_of_shards || 1 }} --Replicas ${{ inputs.number_of_replicas || 0 }}

- name: Run Large Model Unit Tests
run: |
87 changes: 77 additions & 10 deletions .github/workflows/unit_test_200gb_CI.yml
@@ -1,9 +1,36 @@
name: unit_test_200gb_CI
run-name: Unit Tests with ${{ inputs.number_of_shards || 1 }} shards and ${{ inputs.number_of_replicas || 0 }} replicas
# runs unit tests on AMD64 machine

on:
workflow_call:
inputs:
number_of_shards:
type: number
description: 'Number of shards (content nodes per group in Vespa). Minimum of 1.'
required: true
default: 1

number_of_replicas:
type: number
description: 'Number of replicas (groups in Vespa minus 1). Minimum of 0.'
required: true
default: 0

workflow_dispatch:
inputs:
number_of_shards:
type: number
description: 'Number of shards (content nodes per group in Vespa). Minimum of 1.'
required: true
default: 1

number_of_replicas:
type: number
description: 'Number of replicas (groups in Vespa - 1)'
required: true
default: 0

push:
branches:
- mainline
@@ -16,7 +43,7 @@ on:
- releases/*

concurrency:
group: integ-tests-${{ github.ref }}
group: unit-tests-${{ github.ref }}-${{ inputs.number_of_shards }}-${{ inputs.number_of_replicas }}
cancel-in-progress: true

permissions:
@@ -39,9 +66,9 @@ jobs:
run: |
cd marqo
set -x
# Determine BASE_COMMIT and HEAD_COMMIT based on the event type
if [[ "${GITHUB_EVENT_NAME}" == "pull_request" ]]; then
if [[ "${GITHUB_EVENT_NAME}" == "pull_request" || "${GITHUB_EVENT_NAME}" == "pull_request_review" ]]; then
BASE_COMMIT=${{ github.event.pull_request.base.sha }}
HEAD_COMMIT=${{ github.event.pull_request.head.sha }}
elif [[ "${GITHUB_EVENT_NAME}" == "push" ]]; then
@@ -70,11 +97,46 @@ jobs:
echo "doc_only=true" >> $GITHUB_OUTPUT
fi
Start-Runner:
name: Start self-hosted EC2 runner
Determine-Vespa-Setup:
needs:
- Check-Changes
runs-on: ubuntu-latest
if: ${{ needs.Check-Changes.outputs.doc_only == 'false' }} # Run only if there are non-documentation changes
outputs:
VESPA_MULTINODE_SETUP: ${{ steps.set_var.outputs.VESPA_MULTINODE_SETUP }}
MULTINODE_TEST_ARGS: ${{ steps.set_var.outputs.MULTINODE_TEST_ARGS }}
steps:
- name: Determine VESPA_MULTINODE_SETUP
id: set_var
run: |
# For single node, initialize as false
echo "VESPA_MULTINODE_SETUP=false" >> $GITHUB_OUTPUT
# Only enforce coverage check if single node
echo "MULTINODE_TEST_ARGS=--cov-fail-under=69" >> $GITHUB_OUTPUT
echo "First assuming single node Vespa setup."
# Extract inputs safely, defaulting to 1 (for shards), 0 (for replicas) if not present
NUMBER_OF_SHARDS="${{ inputs.number_of_shards || 1 }}"
NUMBER_OF_REPLICAS="${{ inputs.number_of_replicas || 0 }}"
# Convert inputs to integers
NUMBER_OF_SHARDS_INT=$(echo "$NUMBER_OF_SHARDS" | awk '{print int($0)}')
NUMBER_OF_REPLICAS_INT=$(echo "$NUMBER_OF_REPLICAS" | awk '{print int($0)}')
# Evaluate the conditions
if [[ "$NUMBER_OF_SHARDS_INT" -gt 1 || "$NUMBER_OF_REPLICAS_INT" -gt 0 ]]; then
echo "Now using multi-node Vespa setup. Shards are $NUMBER_OF_SHARDS_INT and replicas are $NUMBER_OF_REPLICAS_INT."
echo "VESPA_MULTINODE_SETUP=true" >> $GITHUB_OUTPUT
# If multinode vespa, ignore unrelated tests to save time and prevent errors
echo "MULTINODE_TEST_ARGS=--multinode --ignore=tests/integ_tests/core/index_management/test_index_management.py --ignore=tests/integ_tests/core/inference --ignore=tests/integ_tests/processing --ignore=tests/integ_tests/s2_inference" >> $GITHUB_OUTPUT
fi
Start-Runner:
needs:
- Determine-Vespa-Setup
- Check-Changes
name: Start self-hosted EC2 runner
runs-on: ubuntu-latest
if: ${{ needs.Check-Changes.outputs.doc_only == 'false' }} # Run only if there are non-documentation changes
outputs:
label: ${{ steps.start-ec2-runner.outputs.label }}
@@ -93,7 +155,8 @@ jobs:
mode: start
github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
ec2-image-id: ${{ vars.MARQO_CPU_AMD64_TESTS_INSTANCE_AMI }}
ec2-instance-type: m6i.xlarge
# m6i.xlarge if single node vespa, but m6i.2xlarge if multinode vespa
ec2-instance-type: ${{ needs.Determine-Vespa-Setup.outputs.VESPA_MULTINODE_SETUP == 'true' && 'm6i.2xlarge' || 'm6i.xlarge' }}
subnet-id: ${{ secrets.MARQO_WORKFLOW_TESTS_SUBNET_ID }}
security-group-id: ${{ secrets.MARQO_WORKFLOW_TESTS_SECURITY_GROUP_ID }}
aws-resource-tags: > # optional, requires additional permissions
@@ -111,9 +174,13 @@ jobs:
needs:
- Check-Changes # required to start the main job when the runner is ready
- Start-Runner # required to get output from the start-runner job
- Determine-Vespa-Setup
if: ${{ needs.Check-Changes.outputs.doc_only == 'false' }} # Run only if there are non-documentation changes
runs-on: ${{ needs.start-runner.outputs.label }} # run the job on the newly created runner
environment: marqo-test-suite
env:
VESPA_MULTINODE_SETUP: ${{ needs.Determine-Vespa-Setup.outputs.VESPA_MULTINODE_SETUP }}
MULTINODE_TEST_ARGS: ${{ needs.Determine-Vespa-Setup.outputs.MULTINODE_TEST_ARGS }}
steps:
- name: Checkout marqo repo
uses: actions/checkout@v3
@@ -171,7 +238,7 @@ jobs:
mvn clean package
- name: Start Vespa
run: python marqo/tests/api_tests/v1/scripts/start_vespa.py
run: python marqo/scripts/vespa_local/vespa_local.py full-start --Shards ${{ inputs.number_of_shards || 1 }} --Replicas ${{ inputs.number_of_replicas || 0 }}

- name: Run Unit Tests
run: |
@@ -186,10 +253,10 @@ jobs:
cd marqo
export PYTHONPATH="./src"
export PYTHONPATH="./src:."
set -o pipefail
pytest --ignore=tests/integ_tests/test_documentation.py \
--durations=100 --cov=src --cov-branch --cov-context=test --cov-fail-under=69 \
pytest ${{ env.MULTINODE_TEST_ARGS }} --ignore=tests/integ_tests/test_documentation.py \
--durations=100 --cov=src --cov-branch --cov-context=test \
--cov-report=html:cov_html --cov-report=xml:cov.xml --cov-report term:skip-covered \
--md-report --md-report-flavor gfm --md-report-output pytest_result_summary.md \
tests/integ_tests/ | tee pytest_output.txt
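The shell logic in the `Determine-Vespa-Setup` job can be restated in Python as a sketch. It mirrors the YAML (defaults of 1 shard and 0 replicas; multinode when shards > 1 or replicas > 0) but is not code from the repository, and `determine_vespa_setup` is a hypothetical name.

```python
# Test paths the workflow skips for multinode runs (copied from MULTINODE_TEST_ARGS).
IGNORED_IN_MULTINODE = [
    "tests/integ_tests/core/index_management/test_index_management.py",
    "tests/integ_tests/core/inference",
    "tests/integ_tests/processing",
    "tests/integ_tests/s2_inference",
]


def determine_vespa_setup(number_of_shards=None, number_of_replicas=None):
    """Return (VESPA_MULTINODE_SETUP, MULTINODE_TEST_ARGS) as the job computes them.

    Missing or zero inputs fall back to 1 shard / 0 replicas, like `inputs.x || default`,
    and values are truncated to integers, like the awk `int()` conversion.
    """
    shards = int(float(number_of_shards or 1))
    replicas = int(float(number_of_replicas or 0))

    if shards > 1 or replicas > 0:
        # Multinode: skip unrelated suites; no coverage threshold is passed.
        args = ["--multinode"] + [f"--ignore={path}" for path in IGNORED_IN_MULTINODE]
        return "true", " ".join(args)

    # Single node: keep enforcing the coverage threshold.
    return "false", "--cov-fail-under=69"
```

In the multinode case the `--cov-fail-under=69` threshold disappears entirely, because the commit also removes it from the fixed `pytest` arguments.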
2 changes: 1 addition & 1 deletion .github/workflows/unit_tests.yml
@@ -52,7 +52,7 @@ jobs:
- name: Run Unit Tests
run: |
cd marqo
export PYTHONPATH="./src"
export PYTHONPATH="./src:."
pytest tests/unit_tests/ --durations=100 --cov=src --cov-branch --cov-context=test --cov-report=html:cov_html --cov-report=lcov:lcov.info
- name: Upload Test Report
52 changes: 52 additions & 0 deletions .github/workflows/unit_tests_with_shards_and_replicas.yml
@@ -0,0 +1,52 @@
# Runs unit tests on 4 cases:
# 1. single node vespa
# 2. multinode vespa: 1 shard, 1 replica
# 3. multinode vespa: 2 shards, 0 replicas
# 4. multinode vespa: 2 shards, 1 replica
# Runs only once on PR approval

name: Unit Tests with Shards and Replicas

on:
workflow_dispatch:
pull_request_review:
types: [submitted]
branches:
- mainline
- 'releases/*'

permissions:
contents: read

jobs:
Unit-Tests-1-Shard-0-Replica:
uses: ./.github/workflows/unit_test_200gb_CI.yml
secrets: inherit
if: github.event_name == 'workflow_dispatch' || github.event.review.state == 'approved'
with:
number_of_shards: 1
number_of_replicas: 0

Unit-Tests-1-Shard-1-Replica:
uses: ./.github/workflows/unit_test_200gb_CI.yml
secrets: inherit
if: github.event_name == 'workflow_dispatch' || github.event.review.state == 'approved'
with:
number_of_shards: 1
number_of_replicas: 1

Unit-Tests-2-Shard-0-Replica:
uses: ./.github/workflows/unit_test_200gb_CI.yml
secrets: inherit
if: github.event_name == 'workflow_dispatch' || github.event.review.state == 'approved'
with:
number_of_shards: 2
number_of_replicas: 0

Unit-Tests-2-Shard-1-Replica:
uses: ./.github/workflows/unit_test_200gb_CI.yml
secrets: inherit
if: github.event_name == 'workflow_dispatch' || github.event.review.state == 'approved'
with:
number_of_shards: 2
number_of_replicas: 1
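All four jobs call `unit_test_200gb_CI.yml` on the same `github.ref`. Because that workflow sets `cancel-in-progress: true`, its old concurrency key would presumably have put all four runs in one group, letting each new run cancel the previous one; adding the shard and replica counts to the key gives each configuration its own group. A sketch of the key strings, using a made-up ref:

```python
# (shards, replicas) pairs passed by the four jobs above
CASES = [(1, 0), (1, 1), (2, 0), (2, 1)]


def old_group(ref):
    # Previous key: integ-tests-${{ github.ref }}
    return f"integ-tests-{ref}"


def new_group(ref, shards, replicas):
    # New key: unit-tests-${{ github.ref }}-<number_of_shards>-<number_of_replicas>
    return f"unit-tests-{ref}-{shards}-{replicas}"


ref = "refs/heads/example-branch"  # hypothetical ref for illustration
old_keys = {old_group(ref) for _ in CASES}
new_keys = {new_group(ref, s, r) for s, r in CASES}
print(len(old_keys), len(new_keys))  # 1 4
```

On `push` runs the inputs are empty, so the new key reduces to `unit-tests-<ref>--`, which still groups pushes to the same ref together.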
12 changes: 11 additions & 1 deletion .gitignore
@@ -149,5 +149,15 @@ dump.rdb

.DS_Store

# Local vespa artifacts
# Tester app for unit tests
scripts/vespa_local/vespa_tester_app.zip
scripts/vespa_local/vespa_tester_app.zip

# Dynamically generated files for multinode vespa
scripts/vespa_local/docker-compose.yml
scripts/vespa_local/services.xml
scripts/vespa_local/hosts.xml

scripts/vespa_local/multinode/docker-compose.yml
scripts/vespa_local/multinode/services.xml
scripts/vespa_local/multinode/hosts.xml
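The `.gitignore` entries show that `docker-compose.yml`, `services.xml` and `hosts.xml` are generated for multinode runs. To illustrate the last of these, here is a minimal standard-library sketch that writes a Vespa `hosts.xml`, which maps each hostname to an alias that `services.xml` can refer to. The hostnames and aliases (`config-0`, `api-0`, `content-0`, and so on) and the node counts in the example are hypothetical, not necessarily what `vespa_local.py` generates.

```python
import xml.etree.ElementTree as ET


def build_hosts_xml(config_nodes, api_nodes, content_nodes):
    """Return a hosts.xml string with one <host> (plus <alias>) per node.

    Names are hypothetical; in a Docker Compose setup the host name would be
    the container's hostname on the shared Docker network.
    """
    hosts = ET.Element("hosts")
    for role, count in (("config", config_nodes), ("api", api_nodes), ("content", content_nodes)):
        for i in range(count):
            name = f"{role}-{i}"
            host = ET.SubElement(hosts, "host", name=name)
            ET.SubElement(host, "alias").text = name
    ET.indent(hosts)  # pretty-print; requires Python 3.9+
    return ET.tostring(hosts, encoding="unicode")


print(build_hosts_xml(3, 2, 4))
```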
6 changes: 5 additions & 1 deletion requirements.dev.txt
@@ -4,4 +4,8 @@ pytest==8.3.4
pytest-cov==6.0.0
diff-cover==9.2.0
pytest-md-report==0.6.2
pytest-asyncio==0.23.8
pytest-asyncio==0.23.8

# For vespa_local setup
docker==7.1.0
PyYAML==6.0.2
Empty file added scripts/__init__.py
58 changes: 58 additions & 0 deletions scripts/vespa_local/README.md
@@ -0,0 +1,58 @@
# Setting up Vespa locally
When running Marqo or the unit test suite locally, you need a running Vespa node or cluster. This directory contains
scripts to set up either single-node Vespa (1 container) or multi-node HA Vespa on your machine.

## Set Vespa version
- By default, the script uses Vespa 8.431.32, as defined in `vespa_local.py`. To use a different version, set the `VESPA_VERSION`
environment variable. For example:
```commandline
export VESPA_VERSION="latest"
```
## Single Node Vespa (default & recommended)
- Runs 1 Vespa container on your machine, which serves as the config, API, and content node.
- This is equivalent to running Vespa with 1 shard and 0 replicas.
- Start with this command:
```commandline
python vespa_local.py start
```
- This runs the Vespa Docker container and then copies `services.xml` from the `singlenode/` directory into
this directory, where it is bundled into the Vespa application on deployment.
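Starting the single-node container can be sketched with the Docker SDK for Python (`docker==7.1.0` is added to `requirements.dev.txt` for this setup). This is an illustration, not the script's code: the container name `vespa` matches the Notes section of this README, the ports (8080 for documents and queries, 19071 for the config server) are the ones the CI workflow uses, and the hostname `vespa-container` is an assumption.

```python
import os

DEFAULT_VESPA_VERSION = "8.431.32"  # default stated in this README


def single_node_run_kwargs():
    """Build keyword arguments for docker's `client.containers.run`."""
    version = os.environ.get("VESPA_VERSION", DEFAULT_VESPA_VERSION)
    return {
        "image": f"vespaengine/vespa:{version}",
        "name": "vespa",  # the name stop/restart look for
        "hostname": "vespa-container",  # assumption
        "ports": {"8080/tcp": 8080, "19071/tcp": 19071},
        "detach": True,
    }


def start_single_node():
    """Start the container; requires a running Docker daemon."""
    import docker  # imported lazily so the kwargs can be inspected without Docker

    client = docker.from_env()
    return client.containers.run(**single_node_run_kwargs())
```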

## Multi-node Vespa
- Runs a Vespa cluster with the following nodes:
  - 3 config nodes
  - `m` content nodes, where `m` is `number_of_shards * (1 + number_of_replicas)`
  - `n` API nodes, where `n` is `max(2, number_of_shards)`
- For example, with 2 shards and 1 replica, it runs 4 content nodes and 2 API nodes.
- Start with this command:
```commandline
python vespa_local.py start --Shards 2 --Replicas 1
```
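The node counts follow from Vespa's grouping as used here: each replica adds one group, and each group holds one content node per shard. A sketch of that arithmetic, with hypothetical function and node names. The API-node count uses `max(2, number_of_shards)` because that reproduces the worked example (2 shards and 1 replica give 2 API nodes); treat it as an assumption and check `vespa_local.py` for the actual rule.

```python
def vespa_topology(number_of_shards, number_of_replicas):
    """Return node counts and the content-node layout per group."""
    if number_of_shards < 1 or number_of_replicas < 0:
        raise ValueError("need at least 1 shard and at least 0 replicas")
    groups = 1 + number_of_replicas  # replicas = groups - 1
    layout = [
        [f"content-{g}-{s}" for s in range(number_of_shards)]  # hypothetical names
        for g in range(groups)
    ]
    return {
        "config": 3,
        "content": number_of_shards * groups,
        "api": max(2, number_of_shards),  # assumption, see lead-in
        "groups": layout,
    }


topo = vespa_topology(2, 1)
print(topo["content"], topo["api"])  # 4 2
print(topo["groups"])  # [['content-0-0', 'content-0-1'], ['content-1-0', 'content-1-1']]
```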

## Deployment
- After starting the Vespa node(s), you can deploy the Vespa application with the files in this directory using:
```commandline
python vespa_local.py deploy-config
```
- For a single node, check readiness using:
```commandline
curl -s http://localhost:19071/state/v1/health
```
- For multi-node, the start script will output a list of URLs corresponding to the API and content nodes.
You can curl each one to check for readiness.
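Deploying and then waiting for readiness can be sketched in Python with only the standard library. The deploy step mirrors the commands removed from `largemodel_unit_test_CI.yml` (zip `services.xml` and `schemas`, then POST the zip to `/application/v2/tenant/default/prepareandactivate` on port 19071), and the polling loop mirrors that workflow's ten-minute `curl -f` retry loop with 10-second sleeps. This is an illustration under those assumptions, not the code behind `deploy-config`; the function names are hypothetical, and including `hosts.xml` when present is an assumption about what a multi-node package needs.

```python
import io
import os
import time
import urllib.request
import zipfile

CONFIG_URL = "http://localhost:19071"  # config server port used by the CI workflow
DEPLOY_PATH = "/application/v2/tenant/default/prepareandactivate"


def build_app_zip(app_dir):
    """Zip services.xml, hosts.xml (if present) and schemas/ into an in-memory package."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in ("services.xml", "hosts.xml"):
            path = os.path.join(app_dir, name)
            if os.path.exists(path):
                zf.write(path, arcname=name)
        for root, _dirs, files in os.walk(os.path.join(app_dir, "schemas")):
            for fname in files:
                full = os.path.join(root, fname)
                zf.write(full, arcname=os.path.relpath(full, app_dir))
    return buf.getvalue()


def deploy(app_dir, config_url=CONFIG_URL):
    """POST the package with Content-Type application/zip; return the HTTP status."""
    req = urllib.request.Request(
        config_url + DEPLOY_PATH,
        data=build_app_zip(app_dir),
        headers={"Content-Type": "application/zip"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def http_ok(url, timeout=5.0):
    """True if GET url returns 2xx (like `curl -f`); any error counts as not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError and socket timeouts all subclass OSError
        return False


def wait_until_ready(check, timeout_s=600.0, interval_s=10.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call check() every interval_s seconds until it succeeds or timeout_s passes."""
    deadline = clock() + timeout_s
    while not check():
        if clock() >= deadline:
            return False
        sleep(interval_s)
    return True
```

For example, after `deploy(".")` from this directory, a single-node check could be `wait_until_ready(lambda: http_ok("http://localhost:19071/state/v1/health"))`; for multi-node, the same `http_ok` check can be applied to each URL the start script prints.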

## Other Commands
### Stop Vespa
```commandline
python vespa_local.py stop
```
### Restart Vespa
```commandline
python vespa_local.py restart
```
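The `stop` and `restart` commands need to know which setup is running, and according to the Notes below they look for a container named `vespa`. A sketch of that check with the Docker SDK for Python (the function name is hypothetical, and including stopped containers is an assumption). It compares exact names on purpose: Docker's `name` filter matches partial names, so `filters={"name": "vespa"}` would also match multi-node containers whose names contain `vespa`.

```python
def detect_setup(client):
    """Return "single" if a container named exactly "vespa" exists, else "multi".

    `client` is a docker.DockerClient, e.g. docker.from_env().
    """
    # all=True also lists stopped containers, which matters for restart.
    names = {c.name for c in client.containers.list(all=True)}
    return "single" if "vespa" in names else "multi"
```

Usage: `import docker; print(detect_setup(docker.from_env()))`.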

## Notes
- The `stop` and `restart` commands check for a container named `vespa`. If one exists, they assume a single-node setup;
otherwise, they assume a multi-node setup.
- For multi-node, expect config and API nodes to use about 1 GB of memory and content nodes about 500 MB each. Adjust your
resource allocation accordingly.
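As a rough worked estimate, assume the figures above are per node and take the 2-shard, 1-replica cluster from the Multi-node section ($m = 4$ content nodes and, per its example, $n = 2$ API nodes):

```math
M \approx \underbrace{3 \times 1\,\mathrm{GB}}_{\text{config}} + \underbrace{n \times 1\,\mathrm{GB}}_{\text{API}} + \underbrace{m \times 0.5\,\mathrm{GB}}_{\text{content}} = 3 + 2 + 2 = 7\,\mathrm{GB}
```

That leaves room for the test process itself on the `m6i.2xlarge` runner (32 GiB) that CI selects for multi-node runs.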
Empty file added scripts/vespa_local/__init__.py
34 changes: 0 additions & 34 deletions scripts/vespa_local/schemas/test_vespa_client.sd

This file was deleted.
