💫 Finalise vector support and add vector specs to model meta #1457
Labels: docs, enhancement, models, 🌙 nightly
Related issues: #1092, #1341, #1204
Finalise vector support
The vector support of the en_core_web_sm model in v2.0 is still being finalised. However, the stable version will definitely include some vectors, and will let you get context-sensitive token vectors from the tensorizer. This needs to be wired up properly again.
Documentation of model vector specs
The way the included word vectors are documented in the current models documentation and the new v2.0 model directory still isn't ideal. Vector details are currently only present in the "description"; instead, they should be added to their own "vectors" key in the meta.json. The details could be read off the model automatically after training, e.g. by spacy train. This would also mean that users training their own models would have this information added automatically.
Example
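A minimal sketch of the proposal, not the actual spaCy implementation: it adds a "vectors" object to a model's meta and shows how a consumer such as the model directory could build a display row only when that object is present. The function names and the field names "entries" and "width" are assumptions made for illustration, and the numbers are placeholders rather than real model specs.

```python
import json


def add_vectors_to_meta(meta, n_entries, width):
    """Return a copy of the meta dict with a hypothetical "vectors" key.

    The field names "entries" and "width" are assumptions, not a
    confirmed spaCy schema.
    """
    updated = dict(meta)
    updated["vectors"] = {"entries": int(n_entries), "width": int(width)}
    return updated


def vectors_row(meta):
    """Return display text for a vectors row, or None if the meta
    has no "vectors" object (e.g. older model releases)."""
    vectors = meta.get("vectors")
    if not vectors:
        return None
    return "{entries} entries, {width} dimensions".format(**vectors)


# Placeholder values for illustration only, not real model specs.
meta = {"lang": "en", "name": "core_web_sm", "description": "..."}
meta = add_vectors_to_meta(meta, n_entries=1000, width=300)

# Serialise the way a meta.json file might be written after training.
print(json.dumps(meta, indent=2))
print(vectors_row(meta))
```

Keeping the row logic tolerant of a missing "vectors" key means older releases without the new field would still render, just without the extra row.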
The v2.0 model directory requests each model's meta.json and uses this info to populate the model details. This ensures that the website is always up to date with the latest release. On the front-end, all that has to be done is add a row for the vectors info, and populate it via the ModelLoader script if a "vectors" object is present in the meta. We'll also need to update our internal model build process to make sure the vectors info is added to each individual model release.
Other documentation
While fixing this, we also need to revisit the word vectors & similarity guide to make sure it doesn't contain any misleading information about the vectors included in the models.