Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extracted KnnIndexer class out of KnnGraphTester and moving to a knn package #254

Merged
merged 2 commits into from
Apr 2, 2024

Conversation

nitirajrathore
Copy link
Contributor

@nitirajrathore nitirajrathore commented Feb 2, 2024

In this PR

  1. moved the current KnnGraphTester to knn package.
  2. Extracted KnnIndexer class out of KnnGraphTester code so that the logic can be reused.
  3. Extracted some more classes from KnnGraphTester to avoid bloating it as a coding best practice.

Tests :

ant build

python src/python/knnPerfTest.py

Also tested by passing the -indexPath to KnnGraphTester from knnPerfTest.py. Works fine.

Was able to run the command correctly and get output like

recall avgCpuTime numDocs fanout maxConn beamWidth totalVisited reindexTimeMsec selectivity prefilter
0.971 0.71 10000 0 16 100 1339 7013 1.00 post-filter

@nitirajrathore
Copy link
Contributor Author

Hi @msokolov : If you find time please review and help with merge.

Copy link
Collaborator

@msokolov msokolov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you undo the output format change? I guess if markdown output is desired we could either post-process or add a command line option to control the format?

@@ -557,7 +565,22 @@ private void testSearch(Path indexPath, Path queryPath, Path outputPath, int[][]
totalVisited /= numIters;
System.out.printf(
Locale.ROOT,
"%5.3f\t%5.2f\t%d\t%d\t%d\t%d\t%d\t%d\t%.2f\t%s\n",
"|%s\t|%s\t|%s\t|%s\t|%s\t|%s\t|%s\t|%s\t|%s\t|%s|\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with e code re-org but would prefer to keep the output in TSV format so it can be consumed by other tooling

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds right. Will revert this part. My thought was that the output is generally pasted in the comments here so it would be better to pre-format it into md table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reverted to csv format.

@msokolov msokolov merged commit ece3efd into mikemccand:master Apr 2, 2024
@msokolov
Copy link
Collaborator

msokolov commented Apr 2, 2024

sorry I lost track of this! merging now...

@benwtrent
Copy link
Collaborator

I am working on fixing this, as this breaks quantization, and adding options to quantize to 4bits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants