[BUG] Unable to run opensearch-benchmark in test mode #245

Closed
kotwanikunal opened this issue Mar 27, 2023 · 22 comments
Labels
bug Something isn't working

Comments

@kotwanikunal
Member

Describe the bug

  • opensearch-benchmark fails without running the benchmark on a local OpenSearch server
  • Logs below

To Reproduce

  1. Install latest version from pypi (pip3 install opensearch-benchmark)
  2. Execute the test command
opensearch-benchmark execute_test --target-host=localhost:9200 --workload=nyc_taxis --pipeline=benchmark-only --test-mode --kill-running-processes

Expected behavior

  • Benchmarks to run in test mode

Logs

2023-03-27 23:25:05,311 ActorAddr-(T|:49871)/PID:73989 osbenchmark.actor ERROR Error in test execution orchestrator
Traceback (most recent call last):

  File "/Users/kkotwani/.pyenv/versions/3.8.16/lib/python3.8/site-packages/osbenchmark/actor.py", line 92, in guard
    return f(self, msg, sender)

  File "/Users/kkotwani/.pyenv/versions/3.8.16/lib/python3.8/site-packages/osbenchmark/test_execution_orchestrator.py", line 108, in receiveMsg_Setup
    self.coordinator.setup(sources=msg.sources)

  File "/Users/kkotwani/.pyenv/versions/3.8.16/lib/python3.8/site-packages/osbenchmark/test_execution_orchestrator.py", line 195, in setup
    self.current_workload = workload.load_workload(self.cfg)

  File "/Users/kkotwani/.pyenv/versions/3.8.16/lib/python3.8/site-packages/osbenchmark/workload/loader.py", line 192, in load_workload
    repo = workload_repo(cfg)

  File "/Users/kkotwani/.pyenv/versions/3.8.16/lib/python3.8/site-packages/osbenchmark/workload/loader.py", line 290, in workload_repo
    return GitWorkloadRepository(cfg, fetch, update)

  File "/Users/kkotwani/.pyenv/versions/3.8.16/lib/python3.8/site-packages/osbenchmark/workload/loader.py", line 331, in __init__
    self.repo.update(distribution_version)

  File "/Users/kkotwani/.pyenv/versions/3.8.16/lib/python3.8/site-packages/osbenchmark/utils/repo.py", line 68, in update
    branch = versions.best_match(git.branches(self.repo_dir, remote=self.remote), distribution_version)

  File "/Users/kkotwani/.pyenv/versions/3.8.16/lib/python3.8/site-packages/osbenchmark/utils/git.py", line 44, in probe
    return f(src, *args, **kwargs)

  File "/Users/kkotwani/.pyenv/versions/3.8.16/lib/python3.8/site-packages/osbenchmark/utils/git.py", line 123, in branches
    return _cleanup_remote_branch_names(process.run_subprocess_with_output(

  File "/Users/kkotwani/.pyenv/versions/3.8.16/lib/python3.8/site-packages/osbenchmark/utils/git.py", line 137, in _cleanup_remote_branch_names
    return [(b[b.index("/") + 1:]).strip() for b in branch_names if not b.endswith("/HEAD")]

  File "/Users/kkotwani/.pyenv/versions/3.8.16/lib/python3.8/site-packages/osbenchmark/utils/git.py", line 137, in <listcomp>
    return [(b[b.index("/") + 1:]).strip() for b in branch_names if not b.endswith("/HEAD")]

ValueError: substring not found

More Context (please complete the following information):

  • Workload(Share link for custom workloads): nyc_taxis
  • Service(E.g OpenSearch): OpenSearch
  • Version (E.g. 1.0): 3.0.0/Latest main

Additional context

  • Running on an arm64 M1 Macbook

kotwanikunal added the bug (Something isn't working) and untriaged labels on Mar 27, 2023

@IanHoang
Collaborator

IanHoang commented Mar 28, 2023

Thanks for bringing this to our attention Kunal. I've also been experiencing this issue recently in the integration tests and am currently in the process of identifying a fix. Looks like the issue arises for users who are starting from scratch and is not detected for users who have workloads already preloaded and unzipped.

@IanHoang
Collaborator

Curious to see if this issue only occurs when we provision an opensearch cluster via OSB. I've been using an external host and found that it works with that. This is good to note since we can further be sure to test changes with external and internal hosts.

@kartg
Member

kartg commented Mar 28, 2023

Curious to see if this issue only occurs when we provision an opensearch cluster via OSB

I'm currently hitting this issue when running OSB against a remote endpoint:

opensearch-benchmark execute_test --workload geonames --workload-params "bulk_indexing_clients:1" --pipeline benchmark-only --target-hosts [endpoint]:[port]

Note that I get the same stacktrace and error on running opensearch-benchmark list workloads

@IanHoang
Collaborator

Thanks for the heads up @kartg. I've been able to get it to work with an external host. Will dive further into this.

$ ~ % opensearch-benchmark execute_test --target-host=<endpoint> --client-options="basic_auth_user:'<username>',basic_auth_password:'<password>'" --workload=nyc_taxis --pipeline=benchmark-only --test-mode --kill-running-processes


   ____                  _____                      __       ____                  __                         __
  / __ \____  ___  ____ / ___/___  ____ ___________/ /_     / __ )___  ____  _____/ /_  ____ ___  ____ ______/ /__
 / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \   / __  / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ /  __/ / / /__/ /  __/ /_/ / /  / /__/ / / /  / /_/ /  __/ / / / /__/ / / / / / / / / /_/ / /  / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/   \___/_/ /_/  /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/  /_/|_|
    /_/

[INFO] You did not provide an explicit timeout in the client options. Assuming default of 10 seconds.
[INFO] Executing test with workload [nyc_taxis], test_procedure [append-no-conflicts] and provision_config_instance ['external'] with version [1.1.0].

[WARNING] merges_total_time is 94 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] indexing_total_time is 253 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] refresh_total_time is 289 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] flush_total_time is 86 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
Running delete-index                                                           [100% done]
Running create-index                                                           [100% done]
Running check-cluster-health                                                   [100% done]
Running index                                                                  [100% done]
Running refresh-after-index                                                    [100% done]
Running force-merge                                                            [100% done]
Running refresh-after-force-merge                                              [100% done]
Running wait-until-merges-finish                                               [100% done]
Running default                                                                [100% done]
Running range                                                                  [100% done]
Running distance_amount_agg                                                    [100% done]
Running autohisto_agg                                                          [100% done]
Running date_histogram_agg                                                     [100% done]

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------

|                                                         Metric |                     Task |       Value |   Unit |
|---------------------------------------------------------------:|-------------------------:|------------:|-------:|
|                     Cumulative indexing time of primary shards |                          |  0.00363333 |    min |
|             Min cumulative indexing time across primary shards |                          |           0 |    min |
|          Median cumulative indexing time across primary shards |                          | 0.000366667 |    min |
|             Max cumulative indexing time across primary shards |                          |       0.001 |    min |
|            Cumulative indexing throttle time of primary shards |                          |           0 |    min |
|    Min cumulative indexing throttle time across primary shards |                          |           0 |    min |
| Median cumulative indexing throttle time across primary shards |                          |           0 |    min |
|    Max cumulative indexing throttle time across primary shards |                          |           0 |    min |
|                        Cumulative merge time of primary shards |                          |  0.00156667 |    min |
|                       Cumulative merge count of primary shards |                          |           1 |        |
|                Min cumulative merge time across primary shards |                          |           0 |    min |
|             Median cumulative merge time across primary shards |                          |           0 |    min |
|                Max cumulative merge time across primary shards |                          |  0.00156667 |    min |
|               Cumulative merge throttle time of primary shards |                          |           0 |    min |
|       Min cumulative merge throttle time across primary shards |                          |           0 |    min |
|    Median cumulative merge throttle time across primary shards |                          |           0 |    min |
|       Max cumulative merge throttle time across primary shards |                          |           0 |    min |
|                      Cumulative refresh time of primary shards |                          |       0.005 |    min |
|                     Cumulative refresh count of primary shards |                          |         228 |        |
|              Min cumulative refresh time across primary shards |                          |           0 |    min |
|           Median cumulative refresh time across primary shards |                          | 0.000216667 |    min |
|              Max cumulative refresh time across primary shards |                          |  0.00348333 |    min |
|                        Cumulative flush time of primary shards |                          |  0.00143333 |    min |
|                       Cumulative flush count of primary shards |                          |           8 |        |
|                Min cumulative flush time across primary shards |                          |           0 |    min |
|             Median cumulative flush time across primary shards |                          |    0.000225 |    min |
|                Max cumulative flush time across primary shards |                          | 0.000266667 |    min |
|                                        Total Young Gen GC time |                          |           0 |      s |
|                                       Total Young Gen GC count |                          |           0 |        |
|                                          Total Old Gen GC time |                          |           0 |      s |
|                                         Total Old Gen GC count |                          |           0 |        |
|                                                     Store size |                          | 0.000688318 |     GB |
|                                                  Translog size |                          |  5.6345e-07 |     GB |
|                                         Heap used for segments |                          |   0.0620499 |     MB |
|                                       Heap used for doc values |                          |    0.021553 |     MB |
|                                            Heap used for terms |                          |   0.0317993 |     MB |
|                                            Heap used for norms |                          |  0.00402832 |     MB |
|                                           Heap used for points |                          |           0 |     MB |
|                                    Heap used for stored fields |                          |  0.00466919 |     MB |
|                                                  Segment count |                          |          10 |        |
|                                                 Min Throughput |                    index |     2793.59 | docs/s |
|                                                Mean Throughput |                    index |     2793.59 | docs/s |
|                                              Median Throughput |                    index |     2793.59 | docs/s |
|                                                 Max Throughput |                    index |     2793.59 | docs/s |
|                                        50th percentile latency |                    index |     290.706 |     ms |
|                                       100th percentile latency |                    index |     310.121 |     ms |
|                                   50th percentile service time |                    index |     290.706 |     ms |
|                                  100th percentile service time |                    index |     310.121 |     ms |
|                                                     error rate |                    index |           0 |      % |
|                                                 Min Throughput | wait-until-merges-finish |        4.33 |  ops/s |
|                                                Mean Throughput | wait-until-merges-finish |        4.33 |  ops/s |
|                                              Median Throughput | wait-until-merges-finish |        4.33 |  ops/s |
|                                                 Max Throughput | wait-until-merges-finish |        4.33 |  ops/s |
|                                       100th percentile latency | wait-until-merges-finish |     198.273 |     ms |
|                                  100th percentile service time | wait-until-merges-finish |     198.273 |     ms |
|                                                     error rate | wait-until-merges-finish |           0 |      % |
|                                                 Min Throughput |                  default |        4.53 |  ops/s |
|                                                Mean Throughput |                  default |        4.53 |  ops/s |
|                                              Median Throughput |                  default |        4.53 |  ops/s |
|                                                 Max Throughput |                  default |        4.53 |  ops/s |
|                                       100th percentile latency |                  default |     398.297 |     ms |
|                                  100th percentile service time |                  default |     177.316 |     ms |
|                                                     error rate |                  default |           0 |      % |
|                                                 Min Throughput |                    range |        4.96 |  ops/s |
|                                                Mean Throughput |                    range |        4.96 |  ops/s |
|                                              Median Throughput |                    range |        4.96 |  ops/s |
|                                                 Max Throughput |                    range |        4.96 |  ops/s |
|                                       100th percentile latency |                    range |      380.03 |     ms |
|                                  100th percentile service time |                    range |     178.118 |     ms |
|                                                     error rate |                    range |           0 |      % |
|                                                 Min Throughput |      distance_amount_agg |        4.57 |  ops/s |
|                                                Mean Throughput |      distance_amount_agg |        4.57 |  ops/s |
|                                              Median Throughput |      distance_amount_agg |        4.57 |  ops/s |
|                                                 Max Throughput |      distance_amount_agg |        4.57 |  ops/s |
|                                       100th percentile latency |      distance_amount_agg |     390.726 |     ms |
|                                  100th percentile service time |      distance_amount_agg |     171.877 |     ms |
|                                                     error rate |      distance_amount_agg |           0 |      % |
|                                                 Min Throughput |            autohisto_agg |        4.96 |  ops/s |
|                                                Mean Throughput |            autohisto_agg |        4.96 |  ops/s |
|                                              Median Throughput |            autohisto_agg |        4.96 |  ops/s |
|                                                 Max Throughput |            autohisto_agg |        4.96 |  ops/s |
|                                       100th percentile latency |            autohisto_agg |     378.538 |     ms |
|                                  100th percentile service time |            autohisto_agg |     176.765 |     ms |
|                                                     error rate |            autohisto_agg |           0 |      % |
|                                                 Min Throughput |       date_histogram_agg |        4.57 |  ops/s |
|                                                Mean Throughput |       date_histogram_agg |        4.57 |  ops/s |
|                                              Median Throughput |       date_histogram_agg |        4.57 |  ops/s |
|                                                 Max Throughput |       date_histogram_agg |        4.57 |  ops/s |
|                                       100th percentile latency |       date_histogram_agg |     398.352 |     ms |
|                                  100th percentile service time |       date_histogram_agg |     179.242 |     ms |
|                                                     error rate |       date_histogram_agg |           0 |      % |


--------------------------------
[INFO] SUCCESS (took 16 seconds)

@IanHoang
Collaborator

IanHoang commented Mar 28, 2023

@kartg @kotwanikunal Could we have more context on your setups:

  • @kotwanikunal Since you are not specifying --distribution-version in your OSB command, have you already set up an OpenSearch cluster locally?
  • @kartg Which Operating System are you running on? Also, what OpenSearch version are you running with?
  • For both of @kotwanikunal @kartg: Could you visit ~/.benchmark/benchmarks/workloads/default/ and run git status and provide the output? Curious to see what branch it's defaulted to.

@kotwanikunal
Member Author

  • @kotwanikunal Since you are not specifying --distribution-version in your OSB command, have you already set up an OpenSearch cluster locally?

Yes, I have set up a cluster locally.

  • For both of @kotwanikunal @kartg: Could you visit ~/.benchmark/benchmarks/workloads/default/ and run git status and provide the output? Curious to see what branch it's defaulted to.
 ~ % cd ~/.benchmark/benchmarks/workloads/default/
default % git status
On branch main
Your branch is behind 'origin/main' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)

Tried pulling the latest and running again, still the same issue.

@IanHoang
Collaborator

@kotwanikunal Could you try the following:

  • Try testing with an earlier version for OpenSearch (such as 2.6.0) to see if the error still exists.
  • Pull/Run the docker image for OpenSearch 2.6.0 and retry.
    Let us know if you still encounter the issue in either of these attempts.

@kartg
Member

kartg commented Mar 28, 2023

@kartg Which Operating System are you running on? Also, what OpenSearch version are you running with?

I'm running OSB on macOS 12.6.3 on an Intel-powered Macbook. My target cluster is on OpenSearch 1.3

For both of @kotwanikunal @kartg: Could you visit ~/.benchmark/benchmarks/workloads/default/ and run git status and provide the output? Curious to see what branch it's defaulted to.

$ cd ~/.benchmark/benchmarks/workloads/default/
$ git status
On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean

@kartg
Member

kartg commented Mar 28, 2023

@IanHoang the error seems to originate from executing this git command:

"git -C {src} for-each-ref refs/remotes/ --format='%(refname:short)'".format(src=clean_src)))

and then trying to parse its output:

return [(b[b.index("/") + 1:]).strip() for b in branch_names if not b.endswith("/HEAD")]

Here's the output of that git command against the directory where the workloads repo is cloned:

$ git -C ~/.benchmark/benchmarks/workloads/default for-each-ref refs/remotes/ --format='%(refname:short)'
origin/1
origin/2
origin/3
origin/6
origin/7
origin
origin/main

I'm guessing that the "origin" entry in this list is causing the parsing to fail (since it has no /)


EDIT: Can confirm that I can repro this error signature with a simple Python script:

$ cat test.py 
import os
import sys

dir = sys.argv[1]
stream = os.popen("git -C {src} for-each-ref refs/remotes/ --format='%(refname:short)'".format(src=dir))
branch_names = stream.readlines()
for b in branch_names:
  print((b[b.index("/") + 1:]).strip())

$ python test.py ~/.benchmark/benchmarks/workloads/default/
1
2
3
6
7
Traceback (most recent call last):
  File "/Users/gkart/test.py", line 8, in <module>
    print((b[b.index("/") + 1:]).strip())
             ^^^^^^^^^^^^
ValueError: substring not found

Seems related to the new main branch (cc @tlfeng) that was added to the workloads repo recently - https://github.com/opensearch-project/opensearch-benchmark-workloads/branches

@kartg
Member

kartg commented Mar 28, 2023

It looks like fixing this will need either:

In the meantime, here's a rather ugly workaround:

  • First, run:
git -C ~/.benchmark/benchmarks/workloads/default update-ref -d refs/remotes/origin/main
  • Then, run the following command to open up the git config in your editor:
git -C ~/.benchmark/benchmarks/workloads/default config -e

and then change the following line:

fetch = +refs/heads/*:refs/remotes/origin/*

to something like:

fetch = +refs/heads/*:refs/real-remotes/origin/*

(or any string other than refs/remotes/). This should now allow OSB to execute normally

@kotwanikunal
Member Author

It looks like fixing this will need either:

In the meantime, here's a rather ugly workaround:

  • First, run:
git -C ~/.benchmark/benchmarks/workloads/default update-ref -d refs/remotes/origin/main
  • Then, run the following command to open up the git config in your editor:
git -C ~/.benchmark/benchmarks/workloads/default config -e

and then change the following line:

fetch = +refs/heads/*:refs/remotes/origin/*

to something like:

fetch = +refs/heads/*:refs/real-remotes/origin/*

(or any string other than refs/remotes/). This should now allow OSB to execute normally

Thanks @kartg! @IanHoang this worked for me.

@gkamat
Collaborator

gkamat commented Mar 28, 2023

While trying to reproduce this scenario by cloning the workloads repository and checking the refs:

$ git clone https://github.com/opensearch-project/opensearch-benchmark-workloads
$ git -C opensearch-benchmark-workloads for-each-ref refs/remotes/ --format='%(refname:short)'
origin/1
origin/2
origin/3
origin/6
origin/7
origin/HEAD
origin/main

There are no entries without a /. Where is the offending ref coming from?

@kartg
Member

kartg commented Mar 28, 2023

@gkamat interesting, maybe it's a change in the git command? The version on my machine simply drops the /HEAD suffix

$ git clone https://github.com/opensearch-project/opensearch-benchmark-workloads

$ git for-each-ref refs/remotes/ --format='%(refname:short)'
origin/1
origin/2
origin/3
origin/6
origin/7
origin
origin/main

$ git --version
git version 2.40.0

@kotwanikunal
Member Author

I have the same output as @kartg

workspace % git -C opensearch-benchmark-workloads for-each-ref refs/remotes/ --format='%(refname:short)'
origin/1
origin/2
origin/3
origin/6
origin/7
origin
origin/main
workspace % git --version
git version 2.40.0

@gkamat
Collaborator

gkamat commented Mar 29, 2023

I'm using the version that gets installed by yum on AL2:

$ git --version
git version 2.39.2

Which platform are you using?

@kotwanikunal
Member Author

I'm using the version that gets installed by yum on AL2:

$ git --version
git version 2.39.2

Which platform are you using?

M1 Macbook Pro on Ventura 13.2.1

@gkamat
Collaborator

gkamat commented Mar 29, 2023

That is a rather new version of git. It will need to be built from source -- even the Linux tarballs at https://git-scm.com/download/linux end at 2.39.2.

@kartg
Member

kartg commented Mar 29, 2023

Nah, FTP just sorts 2.40 after 2.4.* 😄 https://mirrors.edge.kernel.org/pub/software/scm/git/git-2.40.0.tar.sign
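
For anyone puzzled by the listing order, here is a small sketch of how a plain string sort places 2.40.0 right after the 2.4.x releases instead of after 2.39.x. The version strings are illustrative picks, and `version_key` is a hypothetical helper, not something from OSB or the mirror.

```python
def version_key(version):
    """Turn "2.40.0" into (2, 40, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# Illustrative version strings only.
versions = ["2.40.0", "2.5.0", "2.39.2", "2.4.12"]

# Plain string comparison goes character by character, so "2.40.0"
# lands between "2.4.12" and "2.5.0".
lexicographic = sorted(versions)

# Comparing integer tuples gives the release order instead.
numeric = sorted(versions, key=version_key)

print(lexicographic)
print(numeric)
```

With a string sort, 2.40.0 ends up far from 2.39.2, which would make it easy to miss in a directory listing.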

@gkamat
Collaborator

gkamat commented Mar 29, 2023

Just rebuilt git from source. Yes, the behaviour of the new version is different, as suspected. A change will be needed to fix this.

@gkamat
Collaborator

gkamat commented Mar 29, 2023

@IanHoang, changing the git command to use the full refname and indexing from the right should likely fix this issue:

        return _cleanup_remote_branch_names(process.run_subprocess_with_output(
                "git -C {src} for-each-ref refs/remotes/ --format='%(refname)'".format(src=clean_src)))

def _cleanup_remote_branch_names(branch_names):
    return [(b[b.rindex("/") + 1:]).strip() for b in branch_names if not b.endswith("/HEAD")]
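
A minimal standalone sketch of the same idea, for trying it out without OSB or a git checkout. The function name `cleanup_remote_branch_names` and the hard-coded ref list are assumptions for illustration; the list mirrors the branches shown earlier in this thread, written as full ref names the way `%(refname)` prints them.

```python
def cleanup_remote_branch_names(ref_names):
    """Return bare branch names from full remote ref names.

    Hypothetical standalone variant of the helper above. It expects
    input in the style of `git for-each-ref --format='%(refname)'`,
    for example "refs/remotes/origin/main".
    """
    names = []
    for ref in ref_names:
        ref = ref.strip()
        # Skip blank lines and the symbolic HEAD ref.
        if not ref or ref.endswith("/HEAD"):
            continue
        # rindex finds the last "/", so the slice keeps only the final
        # path component of the ref.
        names.append(ref[ref.rindex("/") + 1:])
    return names


# Sample input (assumed): the branches listed in this thread as full refs.
full_refs = [
    "refs/remotes/origin/1",
    "refs/remotes/origin/2",
    "refs/remotes/origin/3",
    "refs/remotes/origin/6",
    "refs/remotes/origin/7",
    "refs/remotes/origin/HEAD",
    "refs/remotes/origin/main",
]

print(cleanup_remote_branch_names(full_refs))
```

One caveat with taking everything after the last `/`: a branch named like `feature/x` would be shortened to `x`. None of the workloads branches listed in this thread contain a slash, so that should not matter here.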

@IanHoang
Collaborator

IanHoang commented Mar 29, 2023

@kartg Thanks for diving into this! Just caught up on the thread.

TL;DR:

Reproduced the issue in a clean Ubuntu environment and can confirm that the issue lies in Git versions 2.40.0+. I propose we use @gkamat's recommended fix after testing it out.

I'll open a PR to address this fix.
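
To check quickly whether a machine falls in the affected range, something like the sketch below could be used on the output of `git --version`. The 2.40.0 threshold comes only from the versions reported in this thread, and both function names are hypothetical, not part of OSB.

```python
import re


def parse_git_version(version_output):
    """Extract (major, minor, patch) from text like "git version 2.40.0"."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", version_output)
    if match is None:
        raise ValueError(f"unrecognized git version output: {version_output!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def reported_as_affected(version_output):
    """True for versions at or above 2.40.0 (threshold assumed from this thread)."""
    return parse_git_version(version_output) >= (2, 40, 0)


# Version strings reported by participants in this thread.
for output in ("git version 2.33.0", "git version 2.39.2", "git version 2.40.0"):
    print(output, "->", reported_as_affected(output))
```

This is only a diagnostic aid; the parsing fix itself should not depend on the git version.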

I'm in the same boat as Govind and am using an earlier version of Git, which doesn't drop the /HEAD suffix from origin/HEAD:

hoangia@3c22fbd0d988 default % git for-each-ref refs/remotes/ --format='%(refname:short)'
origin/1
origin/2
origin/3
origin/6
origin/7
origin/HEAD
origin/main
hoangia@3c22fbd0d988 default % git --version
git version 2.33.0

Reproduced issue in Ubuntu:

Confirmed that the issue depends on the Git version.

  1. Started with git version 2.33.0
ubuntu@ip-172-31-80-80:~$ opensearch-benchmark list workloads

   ____                  _____                      __       ____                  __                         __
  / __ \____  ___  ____ / ___/___  ____ ___________/ /_     / __ )___  ____  _____/ /_  ____ ___  ____ ______/ /__
 / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \   / __  / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ /  __/ / / /__/ /  __/ /_/ / /  / /__/ / / /  / /_/ /  __/ / / / /__/ / / / / / / / / /_/ / /  / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/   \___/_/ /_/  /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/  /_/|_|
    /_/

Available workloads:

Name           Description                                                                                                        Documents    Compressed Size    Uncompressed Size    Default TestProcedure         All TestProcedures
-------------  -----------------------------------------------------------------------------------------------------------------  -----------  -----------------  -------------------  ----------------------------  ------------------------------------------------------------------------------------------------------------------------------------------------------------------
pmc            Full text benchmark with academic papers from PMC                                                                  574,199      5.5 GB             21.7 GB              append-no-conflicts           append-no-conflicts,append-no-conflicts-index-only,append-sorted-no-conflicts,append-fast-with-conflicts
nested         StackOverflow Q&A stored as nested docs                                                                            11,203,029   663.3 MB           3.4 GB               nested-search-test-procedure  nested-search-test-procedure,index-only
geoshape       Shapes from PlanetOSM                                                                                              60,523,283   13.4 GB            45.4 GB              append-no-conflicts           append-no-conflicts
percolator     Percolator benchmark based on AOL queries                                                                          2,000,000    121.1 kB           104.9 MB             append-no-conflicts           append-no-conflicts
so             Indexing benchmark using up to questions and answers from StackOverflow                                            36,062,278   8.9 GB             33.1 GB              append-no-conflicts           append-no-conflicts
noaa           Global daily weather measurements from NOAA                                                                        33,659,481   949.4 MB           9.0 GB               append-no-conflicts           append-no-conflicts,append-no-conflicts-index-only,top_metrics,aggs
http_logs      HTTP server log data                                                                                               247,249,096  1.2 GB             31.1 GB              append-no-conflicts           append-no-conflicts,append-no-conflicts-index-only,append-sorted-no-conflicts,append-index-only-with-ingest-pipeline,update,append-no-conflicts-index-reindex-only
geopointshape  Point coordinates from PlanetOSM indexed as geoshapes                                                              60,844,404   470.8 MB           2.6 GB               append-no-conflicts           append-no-conflicts,append-no-conflicts-index-only,append-fast-with-conflicts
geopoint       Point coordinates from PlanetOSM                                                                                   60,844,404   482.1 MB           2.3 GB               append-no-conflicts           append-no-conflicts,append-no-conflicts-index-only,append-fast-with-conflicts
geonames       POIs from Geonames                                                                                                 11,396,503   252.9 MB           3.3 GB               append-no-conflicts           append-no-conflicts,append-no-conflicts-index-only,append-sorted-no-conflicts,append-fast-with-conflicts,significant-text
nyc_taxis      Taxi rides in New York in 2015                                                                                     165,346,692  4.5 GB             74.3 GB              append-no-conflicts           append-no-conflicts,append-no-conflicts-index-only,append-sorted-no-conflicts-index-only,update,searchable-snapshot
eventdata      This benchmark indexes HTTP access logs generated based sample logs from the elastic.co website using a generator  20,000,000   756.0 MB           15.3 GB              append-no-conflicts           append-no-conflicts,transform

-------------------------------
[INFO] SUCCESS (took 1 seconds)
-------------------------------
ubuntu@ip-172-31-80-80:~$ git version
git version 2.34.1
  1. Updated git to 2.40.0 and observed an error when listing workloads:
ubuntu@ip-172-31-80-80:~/.benchmark/benchmarks/workloads/default$ git --version
git version 2.40.0
ubuntu@ip-172-31-80-80:~/.benchmark/benchmarks/workloads/default$ opensearch-benchmark list workloads

   ____                  _____                      __       ____                  __                         __
  / __ \____  ___  ____ / ___/___  ____ ___________/ /_     / __ )___  ____  _____/ /_  ____ ___  ____ ______/ /__
 / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \   / __  / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ /  __/ / / /__/ /  __/ /_/ / /  / /__/ / / /  / /_/ /  __/ / / / /__/ / / / / / / / / /_/ / /  / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/   \___/_/ /_/  /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/  /_/|_|
    /_/

[ERROR] Cannot list. substring not found.

The logs show the same error:

  File "/home/ubuntu/opensearch-benchmark/osbenchmark/utils/git.py", line 123, in branches
    return _cleanup_remote_branch_names(process.run_subprocess_with_output(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/opensearch-benchmark/osbenchmark/utils/git.py", line 137, in _cleanup_remote_branch_names
    return [(b[b.index("/") + 1:]).strip() for b in branch_names if not b.endswith("/HEAD")]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/opensearch-benchmark/osbenchmark/utils/git.py", line 137, in <listcomp>
    return [(b[b.index("/") + 1:]).strip() for b in branch_names if not b.endswith("/HEAD")]
               ^^^^^^^^^^^^
ValueError: substring not found
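
The traceback points at `str.index`, which raises `ValueError` when the separator is absent. A minimal reproduction, assuming the branch list contains an entry with no `/` in it. The function name and sample branch names below are hypothetical; the thread does not show what git 2.40.0 actually printed.

```python
def cleanup_with_index(branch_names):
    # Same comprehension as the line shown in the traceback (git.py, line 137).
    return [(b[b.index("/") + 1:]).strip() for b in branch_names if not b.endswith("/HEAD")]


# Hypothetical inputs, for illustration only.
with_slashes = ["origin/main", "origin/HEAD", "origin/1.0"]
without_slash = ["origin/main", "origin"]

# Every entry contains "/", so the remote prefix is stripped cleanly.
print(cleanup_with_index(with_slashes))

# An entry without "/" makes str.index raise, as in the log above.
try:
    cleanup_with_index(without_slash)
except ValueError as exc:
    print(f"ValueError: {exc}")
```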

Applying the recommended fix by @gkamat

Added the fix recommended by @gkamat to opensearch-benchmark/osbenchmark/utils/git.py and ran the list subcommand in environments with Git versions 2.40.0 and 2.33.0.

  1. Remove :short from refname:short in this line:
    https://github.com/IanHoang/opensearch-benchmark/blob/a1f45502b5b9e69bad1c8d10e7a6c30bd0ed8469/osbenchmark/utils/git.py#L123-L124

  2. Update index to rindex in this line:
    https://github.com/IanHoang/opensearch-benchmark/blob/a1f45502b5b9e69bad1c8d10e7a6c30bd0ed8469/osbenchmark/utils/git.py#L136-L137

  3. Reran python3 -m pip install -e . to reinstall OSB in development mode.

  4. Reran opensearch-benchmark list workloads and received the successful output:

ubuntu@ip-172-31-80-80:~/opensearch-benchmark$ opensearch-benchmark list workloads

   ____                  _____                      __       ____                  __                         __
  / __ \____  ___  ____ / ___/___  ____ ___________/ /_     / __ )___  ____  _____/ /_  ____ ___  ____ ______/ /__
 / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \   / __  / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ /  __/ / / /__/ /  __/ /_/ / /  / /__/ / / /  / /_/ /  __/ / / / /__/ / / / / / / / / /_/ / /  / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/   \___/_/ /_/  /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/  /_/|_|
    /_/

Available workloads:

Name           Description                                                                                                        Documents    Compressed Size    Uncompressed Size    Default TestProcedure         All TestProcedures
-------------  -----------------------------------------------------------------------------------------------------------------  -----------  -----------------  -------------------  ----------------------------  ------------------------------------------------------------------------------------------------------------------------------------------------------------------
pmc            Full text benchmark with academic papers from PMC                                                                  574,199      5.5 GB             21.7 GB              append-no-conflicts           append-no-conflicts,append-no-conflicts-index-only,append-sorted-no-conflicts,append-fast-with-conflicts
nested         StackOverflow Q&A stored as nested docs                                                                            11,203,029   663.3 MB           3.4 GB               nested-search-test-procedure  nested-search-test-procedure,index-only
geoshape       Shapes from PlanetOSM                                                                                              60,523,283   13.4 GB            45.4 GB              append-no-conflicts           append-no-conflicts
percolator     Percolator benchmark based on AOL queries                                                                          2,000,000    121.1 kB           104.9 MB             append-no-conflicts           append-no-conflicts
so             Indexing benchmark using up to questions and answers from StackOverflow                                            36,062,278   8.9 GB             33.1 GB              append-no-conflicts           append-no-conflicts
noaa           Global daily weather measurements from NOAA                                                                        33,659,481   949.4 MB           9.0 GB               append-no-conflicts           append-no-conflicts,append-no-conflicts-index-only,top_metrics,aggs
http_logs      HTTP server log data                                                                                               247,249,096  1.2 GB             31.1 GB              append-no-conflicts           append-no-conflicts,append-no-conflicts-index-only,append-sorted-no-conflicts,append-index-only-with-ingest-pipeline,update,append-no-conflicts-index-reindex-only
geopointshape  Point coordinates from PlanetOSM indexed as geoshapes                                                              60,844,404   470.8 MB           2.6 GB               append-no-conflicts           append-no-conflicts,append-no-conflicts-index-only,append-fast-with-conflicts
geopoint       Point coordinates from PlanetOSM                                                                                   60,844,404   482.1 MB           2.3 GB               append-no-conflicts           append-no-conflicts,append-no-conflicts-index-only,append-fast-with-conflicts
geonames       POIs from Geonames                                                                                                 11,396,503   252.9 MB           3.3 GB               append-no-conflicts           append-no-conflicts,append-no-conflicts-index-only,append-sorted-no-conflicts,append-fast-with-conflicts,significant-text
nyc_taxis      Taxi rides in New York in 2015                                                                                     165,346,692  4.5 GB             74.3 GB              append-no-conflicts           append-no-conflicts,append-no-conflicts-index-only,append-sorted-no-conflicts-index-only,update,searchable-snapshot
eventdata      This benchmark indexes HTTP access logs generated based sample logs from the elastic.co website using a generator  20,000,000   756.0 MB           15.3 GB              append-no-conflicts           append-no-conflicts,transform

-------------------------------
[INFO] SUCCESS (took 0 seconds)
-------------------------------
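
Steps 1 and 2 above can be sketched as follows. This is an illustration rather than the patched `git.py`: the function name and sample ref names are hypothetical, and it assumes that without `:short` the branch names arrive as full refs such as `refs/remotes/origin/main`.

```python
def cleanup_with_rindex(branch_names):
    # rindex locates the LAST "/", so the whole "refs/remotes/<remote>/"
    # prefix of a full ref name is dropped in one slice.
    return [(b[b.rindex("/") + 1:]).strip() for b in branch_names if not b.endswith("/HEAD")]


# Hypothetical full ref names, for illustration only.
full_refs = [
    "refs/remotes/origin/HEAD",
    "refs/remotes/origin/main",
    "refs/remotes/origin/1.0",
]

print(cleanup_with_rindex(full_refs))
```

One consequence of slicing at the last slash is that a branch whose own name contains `/` is reduced to its final path segment.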

IanHoang pushed a commit to IanHoang/opensearch-benchmark that referenced this issue Mar 29, 2023
IanHoang added a commit that referenced this issue Mar 29, 2023
… (#246)

Signed-off-by: Ian Hoang <hoangia@amazon.com>
Co-authored-by: Ian Hoang <hoangia@amazon.com>

IanHoang commented Mar 30, 2023

Closing issue as this has been resolved in PR #246
