29 Apr 08:39

ebca29f

v1.8.0-rc.2 🪼 Pre-release

Pre-release

⚠️ Since this is a release candidate (RC), we do NOT recommend using it in a production environment. Is something not working as expected? We welcome bug reports and feedback about new features.

What's Changed

Remove useless analytics by @irevoire in #4578
Stop crashing when panic occurs in thread pool by @Kerollmops in #4593
Fix embedders api by @ManyTheFish in #4600
Fix embeddings settings update by @ManyTheFish in #4597

Contributors

Kerollmops, ManyTheFish, and irevoire

Assets 7

18 Apr 10:40

ManyTheFish

v1.8.0-rc.1

a04012c

v1.8.0-rc.1 🪼 Pre-release

Pre-release

⚠️ Since this is a release candidate (RC), we do NOT recommend using it in a production environment. Is something not working as expected? We welcome bug reports and feedback about new features.

What's Changed

Avoid clearing db in transform by @ManyTheFish in #4504
Update the search logs by @irevoire in #4580
Always show facet numbers in alpha order in the facet distribution by @Kerollmops in #4581
increase the default search time budget from 150ms to 1.5s by @irevoire in #4576
Update charabia v0.8.9 by @ManyTheFish in #4583
- Remove pinyin normalization
- \t is now part of the default separators

Contributors

Kerollmops, ManyTheFish, and irevoire

Assets 7

15 Apr 07:49

curquiza

v1.8.0-rc.0

0661c86

v1.8.0-rc.0 🪼 Pre-release

Pre-release

⚠️ Since this is a release candidate (RC), we do NOT recommend using it in a production environment. Is something not working as expected? We welcome bug reports and feedback about new features.

Meilisearch v1.8 introduces new changes and optimizations related to the Hybrid search with the addition of new models and embedders like REST embedders and the Ollama model. This version also focuses on stability by adding more security around the search requests. Finally, we introduce the negative operator to exclude specific terms from a search query.

New features and improvements 🔥

Hybrid search improvements

Full description of hybrid search changes here.

🗣️ This is an experimental feature and we need your help to improve it! Share your thoughts and feedback on this GitHub discussion.

Done by @dureuill and @jakobklemm in #4456, #4537, #4509, #4548, #4549.

⚠️ Breaking changes of hybrid search usage

To ease the search answer speed and bandwidth, Meilisearch no longer returns the query vector in the search response. The vector field will not be displayed.
_semanticScore is no longer returned in the search response. The _rankingScore field has the same value as the _semanticScore, and should be used in its place. To get the _rankingScore value, add "showRankingScore": true to the search query.
When adding "showRankingScoreDetails": true to a semantic search query, the vector and its value are no longer displayed to improve the search speed and bandwidth use.

New embedders: generic REST embedder and Ollama model

New embedder sources have been added

ollama source
rest source

REST embedder

Meilisearch now supports any REST embedder. You can set them up with the following configuration:

"default": {
  "source": "rest", // 👈 Use the REST source
  "url": "http://localhost:12345/api/v1/embed",
  // ☝️ Mandatory, full URL to the embedding endpoint
  "apiKey": "187HFLDH97CNHN",
  // ☝️ optional, will be passed as Bearer in the Authorization header
  "dimensions": 512,
  // ☝️ optional, inferred with a dummy request if missing
  "documentTemplate": "blabla",
  "inputField": ["data", "text"],
  // ☝️ inject texts in data.text in the query
  // Optional, defaults to []
  "inputType": "text", // text or textArray
  // ☝️ inject a single text
  // Optional, defaults to text
  "query": {
    // A JSON object describing other fields to send in a query
    // for example
    "model": "name-of-your-model",
    "dimensions": 512
  },
  // ☝️ A JSON object describing other fields to send in a query
  // Optional, defaults to {}
  "pathToEmbeddings": ["data"],
  // ☝️ look at embeddings in "data" in the response
  // Optional, defaults to []
  "embeddingObject": ["embedding"]
  // ☝️ look at the embedding inside of "embedding"
  // Optional, defaults to []
}

Here is an example of setting OpenAI embedder with the rest source:

{
  "source": "rest",
  "apiKey": "<your-openai-api-key>",
  "dimensions": 1536,
  "url": "https://api.openai.com/v1/embeddings",
  "query": {
    "model": "text-embedding-ada-002"
  },
  "inputField": ["input"],
  "inputType": "textArray",
  "pathToEmbeddings": ["data"],
  "embeddingObject": ["embedding"]
}

Ollama model

Here is how to set up the Ollama model:

"default": {
  "source": "ollama", // 👈 Use the Ollama source
  "url": "http://localhost:11434/api/embeddings",
  // ☝️ optional, fetched from MEILI_OLLAMA_URL environment variable if missing
  "apiKey": "<foobarbaz>",
  // ☝️ optional
  "model": "nomic-embed-text",
  "documentTemplate": "blabla" // like for openAI and huggingFace sources
}

Expose the `distribution` shift setting

When setting an embedder, you can now set the distribution shift.

"default": {
  "source": "huggingFace", // supported for any source
  "model": "some/model",
  "distribution": {  // describes the natural distribution of results
    "mean": 0.7, // mean value
    "sigma": 0.3 // variance
  }
}

The “distribution shift” is an affine transformation applied to the _rankingScore of a semantic search result with the aim of making the comparison to the _rankingScore of a keyword search result more meaningful.

Other hybrid search improvements

Hide the API key in settings and task queue (#4533) @dureuill
Return the keyword search results even in case of a failure of the embedding (#4548) @dureuill
For hybrid or semantic search requests, add a semanticHitCount field at the top of the search response indicating the number of hits originating from the semantic search (#4548) @dureuill

Support negative keyword when searching

Search queries can now contain a negative keyword to exclude terms from the search. Use the - operator in front of a word or a phrase to make sure no document that contains those words are shown in the results.

-escape returns a placeholder search without any document contains the escape word.
-escape room returns only documents containing the room word but not the escape one.
-"on demand" returns a placeholder search but without any document containing the "on demand" phrase.

Done by @Kerollmops in #4535.

Search robustness improvements

Add a search cutoff

To avoid any crash and performance issues, Meilisearch now stops search requests lasting more than 150ms.

If you want to customize this value, you can update the searchCutoffMs settings (value in ms):

curl \
  -X PATCH 'http://localhost:7700/indexes/movies/settings' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "searchCutoffMs": 50
  }'

The default value of the searchCutoffMs setting is null and corresponds to 150ms.

Done by @irevoire in #4466.

Limit concurrent search requests

Meilisearch now limits the number of search requests waiting to be processed to avoid consuming an unbounded amount of RAM and crashing. So a queue of search requests waiting to be processed has been introduced.

👉 This change does NOT impact the search performance, but only the number of enqueued search requests to prevent from any security issues.

The default number of requests in the queue is 1000.

To change this limit, use the experimental CLI flag:

./meilisearch --experimental-search-queue-size 100

🗣️ This is an experimental flag and we need your help to improve it! Share your thoughts and feedback on this GitHub discussion.

Done by @irevoire in #4536

Other improvements

The sortFacetValuesBy setting now impacts the /facet-search route (#4476) @Kerollmops
Related to Prometheus experimental feature: add status code label to the HTTP request counter (#4373) @rohankmr414
Tokenizer improvements by bumping charabia to 0.8.8 (#4511) @6543
- Support markdown formatted code blocks
- Improve Korean segmentation to correctly use the context ID registered in the dictionary

Fixes 🐞

Related to Prometheus experimental feature: fix the HTTP request duration histogram bucket boundaries to follow the OpenTelemetry spec (#4530) @rohankmr414
Related to Hybrid search experimental feature: fix an error on Windows when generating embeddings (#4549) @dureuill

Misc

Dependencies upgrade
- Bump mio from 0.8.9 to 0.8.11 (#4457)
- Upgrade rustls to 0.21.10 and ring to 0.17 (#4400) @hack3ric
CIs and tests
- Add automation to create openAPI issue (#4520) @curquiza
- Add tests when the field limit is reached (#4463) @irevoire
- Allow running benchmarks without sending results to the dashboard (#4475) @dureuill
- Create automation when creating Milestone to create update-version issue (#4416) @curquiza
- Fix reason param when benches are triggered from a comment (#4483) @dureuill
Documentation
- Fix milli link in contributing doc (#4499) @mohsen-alizadeh
- Fix some typos in comments (#4546) @redistay
- Remove repetitive words in Benchmark docs (#4526) @availhang
- Remove repetitive words in code-base comments (#4491) @shuangcui
- Update sprint_issue.md (#4516) @curquiza
- Add documentation for benchmarks (#4477) @dureuill
- Fix typos (#4542) @brunoocasali
Misc
- Update cargo version (#4474) @curquiza
- Fix milli/Cargo.toml for usage as dependency via git (#4547) @Toromyx

❤️ Thanks again to our external contributors:

Meilisearch: @availhang, @hack3ric, @jakobklemm, @mohsen-alizadeh, @redistay, @rohankmr414, @shuangcui, @Toromyx, @6543.
Charabia: @mosuka, @6543

Contributors

mosuka, Kerollmops, and 13 other contributors

Assets 7

11 Apr 16:58

dureuill

v1.7.6

c8c8c03

v1.7.6 🐇

Fixes 🪲

Update grenad to fix rare DB corruption by @dureuill in #4562

Contributors

dureuill

Assets 8

08 Apr 08:45

irevoire

v1.7.5

217fbc7

v1.7.5 🐇

After a security flaw has been discovered in the implementation of http2, it’s possible for an attacker to slow down your instance; see this link for more information.

This PR updates our web stack to the latest version containing a fix against this attack.

What's Changed

update h2 by @irevoire in #4553

Full Changelog: v1.7.4...v1.7.5

Contributors

irevoire

Assets 8

28 Mar 08:04

dureuill

v1.7.4

0259ad6

v1.7.4 🐇

Fixes 🪲

Fix regression introduced in v1.7.1, where adding sortable and filterable nested attributes would never trigger a reindexing operation by @dureuill in #4539

Thanks @curquiza for the report ❤️

Contributors

curquiza and dureuill

Assets 8

21 Mar 13:49

irevoire

v1.7.3

414fc14

v1.7.3 🐇

This new release doesn’t contain any fixes or features.
We make it only because the release-v1.7.2 had an issue and didn’t contain all the required assets (Linux, macOS, and Windows x86 binaries were missing).

What's Changed

Update version for the next release (v1.7.3) in Cargo.toml by @meili-bot in #4519

Full Changelog: v1.7.2...v1.7.3

Contributors

meili-bot

Assets 8

20 Mar 13:18

irevoire

v1.7.2

f2f1367

v1.7.2 🐇

What's Changed

Adds a timeout to the webhook in #4508

Some assets weren’t built correctly, but the docker image is available.

Full Changelog: v1.7.1...v1.7.2

Assets 5

14 Mar 09:55

Kerollmops

v1.7.1

d2f77e8

v1.7.1 🐇

Indexing Speed Improvement 🏇

Skip reindexing when modifying unknown faceted fields by @Kerollmops in #4479

Contributors

Kerollmops

Assets 8

11 Mar 09:02

curquiza

v1.7.0

ee3076d

v1.7.0 🐇

Meilisearch v1.7.0 focuses on improving v1.6.0 features, indexing speed and hybrid search.

🧰 All official Meilisearch integrations (including SDKs, clients, and other tools) are compatible with this Meilisearch release. Integration deployment happens between 4 to 48 hours after a new version becomes available.

Some SDKs might not include all new features—consult the project repository for detailed information. Is a feature you need missing from your chosen SDK? Create an issue letting us know you need it, or, for open-source karma points, open a PR implementing it (we'll love you for that ❤️).

New features and improvements 🔥

Improved AI-powered search — Experimental

To activate AI-powered search, set vectorStore to true in the /experimental-features route. Consult the Meilisearch documentation for more information.

🗣️ This is an experimental feature and we need your help to improve it! Share your thoughts and feedback on this GitHub discussion.

New OpenAI embedding models

When configuring OpenAI embedders), you can now specify two new models:

text-embedding-3-small with a default dimension of 1536.
text-embedding-3-large with a default dimension of 3072.

These new models are cheaper and improve search result relevancy.

Custom OpenAI model dimensions

You can configure dimensions for sources using the new OpenAI models: text-embedding-3-small and text-embedding-3-large. Dimensions must be bigger than 0 and smaller than the model size:

"embedders": {
  "new_model": {
    "source": "openAi",
    "model": "text-embedding-3-large",
    "dimensions": 512 // must be >0, must be <= 3072 for "text-embedding-3-large"
  },
  "legacy_model": {
    "source": "openAi",
    "model": "text-embedding-ada-002"
  }
}

You cannot customize dimensions for older OpenAI models such as text-embedding-ada-002. Setting dimensions to any value except the default size of these models will result in an error.

Done in #4375 by @Gosti.

GPU support when computing Hugging Face embeddings

Activate CUDA to use Nvidia GPUs when computing Hugging Face embeddings. This can significantly improve embedding generation speeds.

To enable GPU support through CUDA for HuggingFace embedding generation:

Install CUDA dependencies
Clone and compile Meilisearch with the cuda feature: cargo build --release --package meilisearch --features cuda
Launch your freshly compiled Meilisearch binary
Activate vector search
Add a Hugging Face embedder

Done by @dureuill in #4304.

Improved indexing speed and reduced memory crashes

Auto-batch task deletion to reduce indexing time (#4316) @irevoire
Improved indexing speed for vector store (Hybrid search experimental feature indexing time more than 10 times faster) (#4332) @Kerollmops @irevoire
Capped the maximum memory of grenad sorters to reduce memory usage (#4388) @Kerollmops
Added multiple technical and internal indexing improvements (#4350) @ManyTheFish
Enhance facet incremental indexing (#4433) @ManyTheFish
Change the threshold triggering incremental indexing (#4462) @ManyTheFish

Stabilized `showRankingScoreDetails`

The showRankingScoreDetails search parameter, first introduce as an experimental feature in Meilisearch v1.3.0, is now a stable feature.

Use it with the /search endpoint to view detailed scores per ranking rule for each returned document:

curl \
  -X POST 'http://localhost:7700/indexes/movies/search' \
  -H 'Content-Type: application/json' \
  --data-binary '{ "q": "Batman Returns", "showRankingScoreDetails": true }'

When showRankingScoreDetails is set to true, returned documents include a _rankingScoreDetails field:

"_rankingScoreDetails": {
  "words": {
    "order": 0,
    "matchingWords": 1,
    "maxMatchingWords": 1,
    "score": 1.0
  },
  "typo": {
    "order": 1,
    "typoCount": 0,
    "maxTypoCount": 1,
    "score": 1.0
  },
  "proximity": {
    "order": 2,
    "score": 1.0
  },
  "attribute": {
    "order": 3,
    "attributes_ranking_order": 0.8,
    "attributes_query_word_order": 0.6363636363636364,
    "score": 0.7272727272727273
  },
  "exactness": {
    "order": 4,
    "matchType": "noExactMatch",
    "matchingWords": 0,
    "maxMatchingWords": 1,
    "score": 0.3333333333333333
  }
}

Done by @dureuill in #4389.

Improved logging

Done by @irevoire in #4391

Log output modified

Log messages now follow a different pattern:

# new format ✅
2024-02-06T14:54:11Z INFO actix_server::builder: 200: starting 10 workers
# old format ❌
[2024-02-06T14:54:11Z INFO  actix_server::builder] starting 10 workers

⚠️ This change may impact you if you have any automated tasks based on log output.

Log output format — Experimental

You can now configure Meilisearch to output logs in JSON.

Relaunch your instance passing json to the --experimental-logs-mode command-line option:

./meilisearch --experimental-logs-mode json

--experimental-logs-format accepts two values:

human: default human-readable output
json: JSON structured logs

🗣️ This feature is experimental and we need your help to improve it! Share your thoughts and feedback on this GitHub discussion.

⚠️ Experimental features may be incompatible between Meilisearch versions.

New `/logs/stream` and `/logs/stderr` routes — Experimental

Meilisearch v1.7 introduces 2 new experimental API routes: /logs/stream and /logs/stderr.

Use the /experimental-features route to activate both routes during runtime:

curl \
  -X PATCH 'http://localhost:7700/experimental-features/' \
  -H 'Content-Type: application/json'  \
--data-binary '{
    "logsRoute": true
  }'

🗣️ This feature is experimental, and we need your help to improve it! Share your thoughts and feedback on this GitHub discussion.

⚠️ Experimental features may be incompatible between Meilisearch versions.

`/logs/stream`

Use the POST endpoint to output logs in a stream. The following example disables actix logging and keeps all other logs at the DEBUG level:

curl \
  -X POST http://localhost:7700/logs/stream \
  -H 'Content-Type: application/json' \
  --data-binary '{
      "mode": "human",
      "target": "actix=off,debug"
    }'

This endpoint requires two paramaters:

target: defines the log level and on which part of the engine you want to apply it. Must be a string formatted as code_part=log_level. Omit code_part= to set a single log level for the whole strram. Valid values for log level are: trace, debug, info, warn, error, or off
mode: accepts fmt (basic) or profile (verbose trace)

Use the DELETE endpoint of /logs/stream to interrupt a stream:

curl -X DELETE http://localhost:7700/logs/stream

You may only have one listener at a time. Meilisearch log streams are not compatible with xh or httpie.

`/logs/stderr`

Use the POST endpoint to configure the default log output for non-stream logs:

curl \
  -X POST http://localhost:7700/logs/stream \
  -H 'Content-Type: application/json' \
  --data-binary '{
      "target": "debug"
    }'

/logs/stderr accepts one parameter:

target: defines the log level and on which part of the engine you want to apply it. Must be a string formatted as code_part=log_level. Omit code_part= to set a single log level for the whole strram. Valid values for log level are: trace, debug, info, warn, error, or off

Other improvements

Prometheus experimental feature: add job variable to Grafana dashboard (#4330) @capJavert
Multiple language support improvements, including expanded Vietnamese normalization (Ð and Đ into d). Now uses Charabia v0.8.7. (#4365) @agourlay, @choznerol, @ngdbao, @timvisee, @xshadowlegendx, and @ManyTheFish
New experimental feature: change the behavior of Meilisearch in a few ways to run meilisearch in a cluster by externalizing the task queue.
Add the content type to the webhook (#4450) @irevoire

Fixes 🐞

Make update file deletion atomic (#4435) @irevoire
Do not omit vectors when importing a dump (#4446) @dureuill
Put a bound on OpenAI timeout (#4459) @dureuill

Misc

Dependencies upgrade
- Bump rustls-webpki from 0.101.3 to 0.101.7 (#4263)
- Bump h2 from 0.3.20 to 0.3.24 (#4345)
- Update the dependencies (#4332) @Kerollmops
CIs and tests
- Update SDK test dependencies (#4293) @curquiza
- Remove tests on nightly (#4353) @dureuill
- Add subcommand to run ben...

Contributors

agourlay, timvisee, and 12 other contributors

Assets 8

Releases: meilisearch/meilisearch

v1.8.0-rc.2 🪼

What's Changed

Contributors

v1.8.0-rc.1 🪼

What's Changed

Contributors

v1.8.0-rc.0 🪼

New features and improvements 🔥

Hybrid search improvements

⚠️ Breaking changes of hybrid search usage

New embedders: generic REST embedder and Ollama model

Expose the distribution shift setting

Other hybrid search improvements

Support negative keyword when searching

Search robustness improvements

Add a search cutoff

Limit concurrent search requests

Other improvements

Fixes 🐞

Misc

Contributors

v1.7.6 🐇

Fixes 🪲

Contributors

v1.7.5 🐇

What's Changed

Contributors

v1.7.4 🐇

Fixes 🪲

Contributors

v1.7.3 🐇

What's Changed

Contributors

v1.7.2 🐇

What's Changed

v1.7.1 🐇

Indexing Speed Improvement 🏇

Contributors

v1.7.0 🐇

New features and improvements 🔥

Improved AI-powered search — Experimental

New OpenAI embedding models

Custom OpenAI model dimensions

GPU support when computing Hugging Face embeddings

Improved indexing speed and reduced memory crashes

Stabilized showRankingScoreDetails

Improved logging

Log output modified

Log output format — Experimental

New /logs/stream and /logs/stderr routes — Experimental

/logs/stream

/logs/stderr

Other improvements

Fixes 🐞

Misc

Contributors

Expose the `distribution` shift setting

Stabilized `showRankingScoreDetails`

New `/logs/stream` and `/logs/stderr` routes — Experimental

`/logs/stream`

`/logs/stderr`