Feature/gho 16 #3

Merged: 8 commits, Apr 7, 2023
3 changes: 3 additions & 0 deletions .eslintignore
@@ -0,0 +1,3 @@
node_modules/
.eslintrc.cjs
.prettierrc.cjs
20 changes: 15 additions & 5 deletions .eslintrc.cjs
@@ -1,10 +1,20 @@
module.exports = {
plugins: ["vitest", "@typescript-eslint"],
parser: "@typescript-eslint/parser",
plugins: ['vitest', '@typescript-eslint'],
parser: '@typescript-eslint/parser',
extends: [
"plugin:vitest/recommended",
"eslint:recommended",
"plugin:@typescript-eslint/recommended",
'plugin:vitest/recommended',
'eslint:recommended',
'plugin:@typescript-eslint/recommended',
'plugin:prettier/recommended',
],
root: true,
rules: {
eqeqeq: ['error', 'smart'],
'prettier/prettier': [
'warn',
{
endOfLine: 'auto',
},
],
},
};
11 changes: 11 additions & 0 deletions .prettierrc.cjs
@@ -0,0 +1,11 @@
module.exports = {
singleQuote: true,
bracketSpacing: true,
trailingComma: 'es5',
requirePragma: false,
arrowParens: 'always',
bracketSameLine: false,
tabWidth: 2,
printWidth: 120,
endOfLine: 'lf',
};
46 changes: 38 additions & 8 deletions README.md
@@ -69,23 +69,53 @@ const result = index.searchKnn(query, numNeighbors);
console.table(result);
```

## More on hnswlib
HNSW (Hierarchical Navigable Small World) is a graph-based index structure for efficient similarity search in high-dimensional spaces. It has several parameters that can be tuned to control the trade-off between search quality and index size or construction time. Here are some of the key parameters:
# HNSW Algorithm Parameters for hnswlib-wasm
This section gives an overview of the HNSW algorithm parameters and how they affect performance when using the hnswlib-wasm library.
HNSW (Hierarchical Navigable Small World) is a graph-based index structure for efficient similarity search in high-dimensional spaces.

- M: This controls the maximum number of connections each node can have in the graph. Increasing M can improve search quality at the cost of index size and construction time.
![](https://d33wubrfki0l68.cloudfront.net/1fcaebe70c031d408ae082da355bfe0c6ecc04ac/ba768/images/similarity-search-indexes16.jpg) Image from [pinecone.io](https://www.pinecone.io/learn/hnsw/)

- efConstruction: This controls the maximum number of nodes that can be visited during the construction of the graph. Increasing efConstruction can improve search quality at the cost of construction time.

- efSearch: This controls the maximum number of nodes that can be visited during a search. Increasing efSearch can improve search quality at the cost of search time.
HNSW has several parameters that can be tuned to control the trade-off between search quality and index size or construction time. The key parameters are described below.

- levelMult: This controls the number of connections between nodes at adjacent levels in the graph. Increasing levelMult can improve search quality at the cost of index size and construction time.
## Search Parameters
### efSearch
efSearch is the size of the dynamic list for the nearest neighbors used during the search. Higher efSearch values lead to more accurate but slower searches. efSearch cannot be set lower than the number of queried nearest neighbors k and can be any value between k and the size of the dataset.
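
The bounds on efSearch described above can be sketched as a small clamp. This is a minimal illustration only: `clampEfSearch` is a hypothetical helper and is not part of hnswlib-wasm.

```typescript
// Hypothetical helper (not part of hnswlib-wasm): keep a requested efSearch
// inside the range [k, datasetSize] described in the text.
function clampEfSearch(requested: number, k: number, datasetSize: number): number {
  if (k < 1 || datasetSize < k) {
    throw new RangeError('k must be at least 1 and no larger than the dataset size');
  }
  // Raise values below k up to k, and cap values above the dataset size.
  return Math.min(Math.max(Math.floor(requested), k), datasetSize);
}

console.log(clampEfSearch(4, 10, 1000)); // 10 (raised to k)
console.log(clampEfSearch(5000, 10, 1000)); // 1000 (capped at dataset size)
```

The clamped value could then be passed to `index.setEfSearch(...)` before calling `index.searchKnn(...)`.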

- randomSeed: This sets the seed for the random number generator used in the construction of the graph. Setting the seed can ensure reproducibility of results.
## Construction Parameters
### M
M is the number of bi-directional links created for every new element during index construction. A reasonable range for M is 2-100. Higher M values work better on datasets with high intrinsic dimensionality and/or when high recall is required, while lower M values work better for datasets with low intrinsic dimensionality and/or when lower recall is acceptable. The parameter also determines the algorithm's memory consumption, which is roughly M * 8-10 bytes per stored element.
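
As a rough worked example of the memory rule of thumb above, the sketch below multiplies it out for a given index size. `estimateIndexMemory` is a hypothetical helper, not a library function, and the vector-storage line assumes 32-bit floats (4 bytes per dimension), which the text above does not state.

```typescript
// Hypothetical helper (not part of hnswlib-wasm): back-of-the-envelope memory estimate.
interface MemoryEstimate {
  linkBytesLow: number; // M * 8 bytes per element
  linkBytesHigh: number; // M * 10 bytes per element
  vectorBytes: number; // raw vector data (assumption: float32 storage)
}

function estimateIndexMemory(numElements: number, m: number, dimensions: number): MemoryEstimate {
  return {
    linkBytesLow: numElements * m * 8,
    linkBytesHigh: numElements * m * 10,
    // Assumption: each dimension is stored as a 4-byte float.
    vectorBytes: numElements * dimensions * 4,
  };
}

// Example: 10,000 elements, M = 48, 1536-dimensional vectors.
const estimate = estimateIndexMemory(10000, 48, 1536);
console.log(estimate);
// { linkBytesLow: 3840000, linkBytesHigh: 4800000, vectorBytes: 61440000 }
```

Under these assumptions the graph links cost a few megabytes, while the raw vectors dominate at about 61 MB, which matters for an index held in browser memory.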

- distance: This specifies the distance metric to be used in the similarity search. The choice of distance metric depends on the nature of the data being indexed.
### efConstruction
efConstruction controls the index construction time and accuracy. Larger efConstruction values lead to longer construction times but better index quality. Beyond some point, increasing efConstruction no longer improves the quality of the index. To check whether the selected efConstruction value is appropriate, measure recall for an M-nearest-neighbor search (k = M) with efSearch set equal to efConstruction. If the recall is lower than 0.9, there is room for improvement.
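
The recall check described above needs a ground truth to compare against. A minimal sketch, assuming a brute-force exact search over plain number arrays with squared L2 distance: `exactKnn` and `recall` are hypothetical helpers, and the approximate labels would come from the index (for example the `neighbors` field returned by `searchKnn`).

```typescript
type Vector = readonly number[];

// Squared Euclidean distance between two vectors of equal length.
function squaredL2(a: Vector, b: Vector): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Brute-force exact k nearest neighbors; labels are positions in `data`.
function exactKnn(data: readonly Vector[], query: Vector, k: number): number[] {
  return data
    .map((v, label) => ({ label, dist: squaredL2(v, query) }))
    .sort((x, y) => x.dist - y.dist)
    .slice(0, k)
    .map((x) => x.label);
}

// Fraction of the exact neighbors that the approximate result also returned.
function recall(approx: readonly number[], exact: readonly number[]): number {
  if (exact.length === 0) return 1;
  const found = new Set(approx);
  const hits = exact.filter((label) => found.has(label)).length;
  return hits / exact.length;
}

const data: Vector[] = [[0, 0], [1, 0], [0, 2], [5, 5]];
const exact = exactKnn(data, [0, 0], 2); // labels of the two closest points
console.log(recall([0, 3], exact)); // 0.5: one of the two true neighbors was found
```

Averaging `recall` over a sample of queries gives the figure to compare against the 0.9 threshold.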

## Parameter Selection for hnswlib-wasm

When using hnswlib-wasm, it is essential to choose appropriate values for M, efSearch, and efConstruction based on your dataset's size and dimensionality. Because hnswlib-wasm runs in the browser, you should also consider the available memory and performance limitations. Here are some recommendations:

### M:
Choose a value in the range of 12-48, as it works well for most use cases. You may need to experiment to find the optimal value for your specific dataset.

### efSearch:
Start with a value close to M and adjust it based on your desired trade-off between search speed and accuracy. Lower values will be faster but less accurate, while higher values will be more accurate but slower.

### efConstruction:
Set this value considering the expected query volume. If you anticipate low query volume, you can set a higher value for efConstruction to improve recall with minimal impact on search time, especially when using lower M values.

Remember that higher M values will increase the memory usage of the index, so you should balance performance and memory constraints when choosing your parameters for hnswlib-wasm.

## Resources

[Learn hnsw by pinecone](https://www.pinecone.io/learn/hnsw/)

[Vector indexes by pinecone](https://www.pinecone.io/learn/vector-indexes/)

Images from [pinecone.io](https://www.pinecone.io/learn/hnsw/)
![](https://d33wubrfki0l68.cloudfront.net/f8df59c49b28522dea11e4293307af2e4f8d97ed/a6992/images/hnsw-9.jpg)
![](https://d33wubrfki0l68.cloudfront.net/e5194e6f5b1aad4b940e0d3f1957b71bf6c2f25b/40135/images/hnsw-10.jpg)
![](https://d33wubrfki0l68.cloudfront.net/1b0b0b0b5b1b0b0b0b0b0b0b0b0b0b0b0b0b0b0b/40135/images/hnsw-11.jpg)

# Other Notes
## License

hnswlib-wasm is available as open source under the terms of the [Apache-2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
84 changes: 84 additions & 0 deletions bench/HierarchicalNSW.1.bench.test.ts
@@ -0,0 +1,84 @@
import { bench, describe } from 'vitest';
import { defaultParams, HierarchicalNSW, hnswParamsForAda } from '../dist/hnswlib';

describe('benchmark initIndex with defaults and 1536 dimensions', () => {
let index: HierarchicalNSW;
const setup = async () => {
// testHnswlibModule is not defined in this file; it is presumably a global provided by the test setup.
index = new testHnswlibModule.HierarchicalNSW('l2', hnswParamsForAda.dimensions);
};
const baseIndexSize = 1000;
bench.skip(
`${baseIndexSize} points`,
async () => {
const newIndexSize = baseIndexSize;
index.initIndex(newIndexSize, ...defaultParams.initIndex);
},
{
setup,
}
);

bench(
`${baseIndexSize * 10} points`,
async () => {
const newIndexSize = baseIndexSize * 10;
index.initIndex(newIndexSize, ...defaultParams.initIndex);
},
{
setup,
}
);

bench(
`${baseIndexSize * 50} points`,
async () => {
const newIndexSize = baseIndexSize * 50;
index.initIndex(newIndexSize, ...defaultParams.initIndex);
},
{
setup,
}
);
});

describe('benchmark initIndex with hnswParamsForAda', () => {
let index: HierarchicalNSW;
const setup = async () => {
// testHnswlibModule is not defined in this file; it is presumably a global provided by the test setup.
index = new testHnswlibModule.HierarchicalNSW('l2', hnswParamsForAda.dimensions);
};
const baseIndexSize = 1000;
bench.skip(
`${baseIndexSize} points`,
async () => {
index.initIndex(baseIndexSize, hnswParamsForAda.m, hnswParamsForAda.efConstruction, 200, true);
},
{
setup,
}
);

bench(
`${baseIndexSize * 10} points`,
async () => {
const newIndexSize = baseIndexSize * 10;
index.initIndex(newIndexSize, hnswParamsForAda.m, hnswParamsForAda.efConstruction, 200, true);
},
{
setup,
}
);

bench(
`${baseIndexSize * 50} points`,
async () => {
const newIndexSize = baseIndexSize * 50;
index.initIndex(newIndexSize, hnswParamsForAda.m, hnswParamsForAda.efConstruction, 200, true);
},
{
setup,
}
);
});
99 changes: 99 additions & 0 deletions bench/HierarchicalNSW.2.bench.test.ts
@@ -0,0 +1,99 @@
import { bench, describe, expect } from 'vitest';
import { createVectorData } from '~test/testHelpers';
import {
addItemsWithPtrsHelper,
defaultParams,
HierarchicalNSW,
hnswParamsForAda,
} from '../dist/hnswlib';

describe('benchmark initIndex and addPoints', () => {
const baseIndexSize = 10;

describe.skip(`${baseIndexSize * 10} points`, () => {
let index: HierarchicalNSW;
const newIndexSize = baseIndexSize * 10;
const setup = async () => {
index = new testHnswlibModule.HierarchicalNSW('l2', hnswParamsForAda.dimensions);
};

bench(
`vectors`,
async () => {
index.initIndex(newIndexSize, ...defaultParams.initIndex);
const testVectorData = createVectorData(newIndexSize, hnswParamsForAda.dimensions);

index.addItems(testVectorData.vectors, testVectorData.labels, ...defaultParams.addPoint);
expect(index.getCurrentCount()).toBe(newIndexSize);
},
{
setup,
iterations: 5,
}
);

bench(
`pointers`,
async () => {
index.initIndex(newIndexSize, ...defaultParams.initIndex);
const testVectorData = createVectorData(newIndexSize, hnswParamsForAda.dimensions);
addItemsWithPtrsHelper(
testHnswlibModule,
index,
testVectorData.vectors,
testVectorData.labels,
...defaultParams.addPoint
);
expect(index.getCurrentCount()).toBe(newIndexSize);
},
{
setup,
iterations: 5,
}
);
});
describe(`${baseIndexSize * 100} points`, () => {
let index: HierarchicalNSW;
const newIndexSize = baseIndexSize * 100;
const setup = async () => {
index = new testHnswlibModule.HierarchicalNSW('l2', hnswParamsForAda.dimensions);
};

bench(
`vectors`,
async () => {
index.initIndex(newIndexSize, ...defaultParams.initIndex);
const testVectorData = createVectorData(newIndexSize, hnswParamsForAda.dimensions);

index.addItems(testVectorData.vectors, testVectorData.labels, ...defaultParams.addPoint);
expect(index.getCurrentCount()).toBe(newIndexSize);
},
{
setup,
iterations: 5,
}
);

bench(
`pointers`,
async () => {
index.initIndex(newIndexSize, ...defaultParams.initIndex);
const testVectorData = createVectorData(newIndexSize, hnswParamsForAda.dimensions);
addItemsWithPtrsHelper(
testHnswlibModule,
index,
testVectorData.vectors,
testVectorData.labels,
...defaultParams.addPoint
);
expect(index.getCurrentCount()).toBe(newIndexSize);
},
{
setup,
iterations: 5,
}
);
});
});
61 changes: 61 additions & 0 deletions bench/HierarchicalNSW.3.bench.test.ts
@@ -0,0 +1,61 @@
/* eslint-disable prefer-const */

import { beforeAll, bench, describe, expect } from 'vitest';
import { createVectorData, sleep } from '~test/testHelpers';
import {
defaultParams,
HierarchicalNSW,
hnswParamsForAda,
addItemsWithPtrsHelper,
} from '../dist/hnswlib';

let index: HierarchicalNSW;

async function setupBefore() {
const baseIndexSize = 10000;
const testVectorData = createVectorData(baseIndexSize, hnswParamsForAda.dimensions);
index = new testHnswlibModule.HierarchicalNSW('l2', hnswParamsForAda.dimensions);
index.initIndex(baseIndexSize, hnswParamsForAda.m, hnswParamsForAda.efConstruction, 200, true);

await sleep(20);

// Add vectors in chunks of 2500, pausing briefly between chunks
const chunkSize = 2500;
for (let i = 0; i < baseIndexSize; i += chunkSize) {
console.log('chunk', i);
const chunkVectors = testVectorData.vectors.slice(i, i + chunkSize);
const chunkLabels = testVectorData.labels.slice(i, i + chunkSize);
//index.addItems(chunkVectors, chunkLabels, ...defaultParams.addPoint);
addItemsWithPtrsHelper(testHnswlibModule, index, chunkVectors, chunkLabels, ...defaultParams.addPoint);
await sleep(20);
}

return { baseIndexSize, testVectorData };
}

const { baseIndexSize, testVectorData } = await setupBefore();

describe('benchmark searchKnn with 10,000 points and hnswParamsForAda', async () => {
beforeAll(async () => {
expect(index.getCurrentCount()).toBe(baseIndexSize);
});

const setup = async () => {
// Apply the search-time parameter before the benchmark runs.
index.setEfSearch(hnswParamsForAda.efSearch);
};

bench(
`hnswParamsForAda: m=${hnswParamsForAda.m}, efConstruction=${hnswParamsForAda.efConstruction}, efSearch=${hnswParamsForAda.efSearch}`,
async () => {
index.setEfSearch(hnswParamsForAda.efSearch);
const data = index.searchKnn(testVectorData.vectors[10], 10, undefined);
expect(data.neighbors).toHaveLength(10);
},
{
setup,
iterations: 100,
}
);
});
3 changes: 3 additions & 0 deletions package.json
@@ -35,14 +35,17 @@
"@typescript-eslint/parser": "^5.57.1",
"@xn-sakina/phoenix": "^1.0.3",
"eslint": "^8.37.0",
"eslint-config-prettier": "^8.8.0",
"eslint-define-config": "^1.17.0",
"eslint-plugin-jest": "^27.2.1",
"eslint-plugin-jest-extended": "^2.0.0",
"eslint-plugin-node": "^11.1.0",
"eslint-plugin-prettier": "^4.2.1",
"eslint-plugin-vitest": "^0.0.57",
"fake-indexeddb": "^4.0.1",
"happy-dom": "^9.1.7",
"husky": "^8.0.3",
"prettier": "^2.8.7",
"tsembind": "^1.1.0",
"typescript": "^5.0.3",
"vite": "^4.2.1",
24 changes: 16 additions & 8 deletions src/defaultTypes.ts → src/constants.ts
@@ -7,11 +7,19 @@ export const defaultParams = {
* @param {boolean} allowReplaceDeleted The flag to replace deleted element when adding new element
*
*/
initIndex: [16, 200, 100, false],
/**
* @param {boolean} replaceDeleted — The flag to replace a deleted element (default: false)
*/
addPoint: [false],
} as const;

export type defaultParamtersTypes = keyof typeof defaultParams;

export const hnswParamsForAda = {
m: 48,
efSearch: 24,
efConstruction: 32,
numNeighbors: 8,
// OpenAI text-embedding-ada-002 vectors have 1536 dimensions.
dimensions: 1536,
} as const;