Investigate additional Search Engines. #178
I have been having a look at FlexSearch especially due to its faster speeds and lower bundle size, but I am still working out how the index is constructed and whether Zola could support it. If that doesn't work, I am happy to try and fix tinysearch, but I have limited time currently so it might take me a while either way. |
flexsearch is a smaller script: 5.87 kB versus 18.05 kB for elasticlunr. However, elasticlunr loads faster and uses less memory, which might be important on mobile devices. The author of uFuzzy set up the benchmarks, and he admits that he may not have the other search engines tuned perfectly, but he made a best effort wherever documentation was available. You can see the full table at the bottom of this GitHub page: https://github.com/leeoniya/uFuzzy On the flip side, flexsearch performs a search in 3 ms and elasticlunr takes 14 ms, but I think this difference is inconsequential compared to the enormous difference in load speed and memory usage. elasticlunr uses 89 MB and takes 978 ms to load |
Yeah, it is probably a bit pointless to chase those extra milliseconds, especially on a static site, and even more so if it increases load times. |
I have been on the lookout for something that is better across the board (mostly because elasticlunr is no longer maintained). pagefind is interesting because it chunks the index, so if you had a really enormous site with lots of articles, it would not download the entire index, only the chunks relevant to your search. However, I think that until a site is pretty big, elasticlunr would actually perform better than pagefind, but that is just my hunch without having actually tried pagefind. |
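The chunking idea can be illustrated with a toy sketch. This is not pagefind's actual index format or loading logic: the chunk-by-first-letter scheme and the names `buildChunks`, `chunksNeeded`, and `searchChunks` are made up for illustration. The only point is that a query touches a few chunks, so the rest never has to be downloaded.

```javascript
// Toy illustration of a chunked search index (NOT pagefind's real format).
// Assumption: words are grouped into chunks by their first character.

// Build a map of chunkKey -> Map(word -> [page urls]).
function buildChunks(pages) {
  const chunks = new Map();
  for (const page of pages) {
    const words = new Set(page.body.toLowerCase().match(/[a-z0-9]+/g) || []);
    for (const word of words) {
      const key = word[0];
      if (!chunks.has(key)) chunks.set(key, new Map());
      const chunk = chunks.get(key);
      if (!chunk.has(word)) chunk.set(word, []);
      chunk.get(word).push(page.url);
    }
  }
  return chunks;
}

// Which chunk keys does a query need? Only these would be fetched.
function chunksNeeded(query) {
  const words = query.toLowerCase().match(/[a-z0-9]+/g) || [];
  return [...new Set(words.map((w) => w[0]))];
}

// Answer a query using only the needed chunks. `loadChunk` stands in for a
// network fetch and is called at most once per needed chunk.
function searchChunks(query, loadChunk) {
  const words = query.toLowerCase().match(/[a-z0-9]+/g) || [];
  const loaded = new Map();
  const urls = new Set();
  for (const word of words) {
    const key = word[0];
    if (!loaded.has(key)) loaded.set(key, loadChunk(key) || new Map());
    for (const url of loaded.get(key).get(word) || []) urls.add(url);
  }
  return [...urls];
}
```

With a single monolithic index, the whole structure would be downloaded before the first query, which is why the trade-off only starts to favour chunking once the index is large.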
All of that said, I like the idea of Abridge being flexible, so if anyone wants to submit a pull request to support a given search engine, I would likely add it as long as it didn't cause an unavoidable problem. |
I was looking for a search engine that supports fuzzy matching (elasticlunr, tinysearch and stork all don't seem to) to help provide a better user experience. That being said, pagefind looks to support this (and also supports indexing other files, like PDFs, which I have also been looking for), and I might have a go at seeing what I can do to support it. I'm not sure how well I'll go with chunking and manipulating the index, but I'll give it a try :) |
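For readers unsure what fuzzy matching buys you, here is a minimal sketch based on edit distance. It is a generic technique and not a description of how pagefind or any other engine named in this thread matches terms; `editDistance` and `fuzzyFind` are illustrative names.

```javascript
// Classic dynamic-programming edit distance (Levenshtein): the number of
// single-character insertions, deletions, or substitutions turning a into b.
function editDistance(a, b) {
  // prev[j] = distance between the first i-1 chars of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return the candidate words within maxDistance edits of the query.
function fuzzyFind(query, words, maxDistance = 1) {
  const q = query.toLowerCase();
  return words.filter((w) => editDistance(q, w.toLowerCase()) <= maxDistance);
}
```

A search box using something like this would still return a hit when the user types one wrong or missing letter, which an exact-match index would miss.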
Just checking what you think of this flow for the search process. I think it would have to pass through a node script in the build step, which I guess is already done in the process for elasticlunr. I don't know quite yet how to support multilingual sites, but at least pagefind seems to support that out of the box. |
Yes, that is exactly what I was thinking. I believe that if we configured Zola to output a JSON index, we should be able to send it to pagefind through their node API using this method: https://pagefind.app/docs/node-api/#indexaddcustomrecord If the JSON format that Zola outputs is missing some things we need, then maybe we can look at the recently added fuse JSON output and create a pull request to Zola adding an additional JSON index format that is compatible with pagefind. Once the pagefind index is built, I am guessing we would use this to save it: |
It is pretty simple to both construct an index, however it does seem to have a heavy reliance on … I wrote this script based off this information:

import * as pagefind from 'pagefind';

async function createIndex() {
  // Create a new Pagefind index
  const { index } = await pagefind.createIndex({
    forceLanguage: 'en', // Force the language to English
  });

  // Define your data (static sample records, for testing only)
  const data = [{
    "title": "Abridge Zola Theme",
    "url": "https://abridge.netlify.app/overview-abridge/",
    "meta": "Abridge is a fast and lightweight Zola theme using semantic html, only ~6kb css before svg icons and syntax highlighting, no mandatory JS, and perfect…",
    "body": "Abridge is a fast and lightweight Zola theme using semantic html..."
  }, {
    "title": "Code Blocks and Themes",
    "url": "https://abridge.netlify.app/overview-code-blocks/",
    "meta": "This article shows various Code Blocks allowing to easily compare sublime themes.\n",
    "body": "This article shows various Code Blocks allowing to easily compare sublime themes..."
  }, {
    "title": "Markdown and Style Guide",
    "url": "https://abridge.netlify.app/overview-markdown-and-style/",
    "meta": "This article offers a sample of basic Markdown syntax that can be used in Zola content files, also it shows if basic HTML elements are decorated with …",
    "body": "This article offers a sample of basic Markdown syntax that can be used in Zola content files, also it shows if basic HTML elements are decorated with CSS in a Zola theme..."
  }, {
    "title": "Image Shortcodes",
    "url": "https://abridge.netlify.app/overview-images/",
    "meta": "This post covers the imghover and img shortcodes. Images can also be embedded directly using markdown ![Ferris](ferris.svg), but it is better to use a…",
    "body": "This post covers the imghover and img shortcodes. Images can also be embedded directly using markdown..."
  }, {
    "title": "Rich Content",
    "url": "https://abridge.netlify.app/overview-rich-content/",
    "meta": "Several custom shortcodes are included to augment CommonMark (courtesy of d3c3nt theme), in addition to those already provided by Zola. video, image, …",
    "body": "Several custom shortcodes are included to augment CommonMark (courtesy of d3c3nt theme), in addition to those already provided by Zola. video, image, gif,..."
  }, {
    "title": "Embedded Youtube Videos",
    "url": "https://abridge.netlify.app/overview-embedded-youtube/",
    "meta": "Zola has many shortcodes, and new are easily added, this example shows youtube.\n",
    // Inner double quotes are escaped so the string literal stays valid
    "body": "Zola has many shortcodes, and new are easily added, this example shows youtube.\nYoutube\nwith yt(id=\"the_id_here\")\n\nid: the video id (mandatory)\nplaylist: the playlist id (optional)\nclass: a class to add to the <div> surrounding the iframe (optional)\nautoplay: when set to \"true\", the video autoplays on load (optional)\ntitle - set alt title for the iframe (optional, defaults to \"Youtube\")\ncookie - set to \"true\" if you want tracking cookies, otherwise it defaults to false.\n\n\n\t\n\n"
  }, {
    "title": "Embedded Vimeo Videos",
    "url": "https://abridge.netlify.app/overview-embedded-vimeo/",
    "meta": "Zola has many shortcodes, and new are easily added, this example shows vimeo.\n",
    "body": "Zola has many shortcodes, and new are easily added, this example shows vimeo.\nVimeo\nwith vm(id=\"id_here\")\n\nid: the video id (mandatory)\nclass: a class to add to the <div> surrounding the iframe (optional)\nautoplay: when set to \"true\", the video autoplays on load (optional)\nloop: when set to \"true\", the video plays on a loop (optional)\nnoautopause: when set to \"true\", the video will not autopause (optional)\ntitle - set alt title for the iframe (optional, defaults to \"Vimeo\")\ncookie - set to \"true\" if you want tracking cookies, otherwise it defaults to false.\n\n\n\t\n\n"
  }, {
    "title": "Mathematical Notations",
    "url": "https://abridge.netlify.app/overview-math/",
    "meta": "You can use KaTeX to render mathematical notations.\nYou can enable the $\\KaTeX$ support globally, per-section or per-page basis.\n",
    "body": "You can use KaTeX to render mathematical notations.\nYou can enable the $\\KaTeX$ support globally, per-section or per-page basis.\nEnable..."
  }];

  // Add each record to the index
  for (const record of data) {
    await index.addCustomRecord({
      url: record.url,
      content: record.body,
      language: 'en',
      meta: {
        title: record.title,
        description: record.meta,
      }
    });
  }

  // Write the index files to disk
  await index.writeFiles({
    outputPath: 'public/pagefind'
  });
  console.log('Index created successfully!');
}

createIndex().catch(console.error);

Then search is even easier (it handles the chunking for you!), it could easily be put in: …

async function search() {
  // Load the client-side bundle that writeFiles() produced
  const pagefind = await import("./public/pagefind/pagefind.js");
  pagefind.init();
  const search = await pagefind.search("zola");
  // Each result loads its data lazily
  const oneResult = await search.results[0].data();
  console.log(oneResult);
}

search();

The index it spits out is pretty small, only … I am just a bit confused, as this would require users to rerun the index build process on every … |
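Building on the search snippet above, here is a hedged sketch of turning results into a small list for display. Each entry in pagefind's `results` array loads its details lazily through `data()`, as the snippet above already shows; the helper name `topResults`, the `limit` parameter, and the returned object shape are my own. The pagefind module is passed in as a parameter so the helper can be tried against a stub.

```javascript
// Run a query and load details for only the first `limit` results.
// `pagefind` is the module returned by import(".../pagefind.js").
async function topResults(pagefind, query, limit = 5) {
  const search = await pagefind.search(query);
  const firstFew = search.results.slice(0, limit);
  // data() fetches the fragment for one result; load them in parallel.
  const loaded = await Promise.all(firstFew.map((r) => r.data()));
  return loaded.map((d) => ({
    url: d.url,
    title: (d.meta && d.meta.title) || d.url, // fall back to the URL
    excerpt: d.excerpt || "",
  }));
}
```

Loading only the first few results keeps the number of fragment requests small, which fits the chunked design discussed earlier in the thread.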
That is what I was thinking, yet pagefind will still pull ahead for sites with a ton of content, because as you add content the elasticlunr index gets bigger and bigger. Yes, every time you do a zola build it generates a new index; that much is true of elasticlunr, tinysearch, stork, etc. |
Does And
Cool! I am just not quite sure how to hook into that. |
I would configure your … https://www.getzola.org/documentation/content/search/#fuse

# config.toml
[search]
index_format = "fuse_json"

After you do that, just issue a … Then for your script that you wrote, you could wrap it up in a function within …
EDIT: Currently you have some static data within your function to feed into pagefind for the index for testing purposes. If you look in …
EDIT: elasticlunr does NOT load all those js files; those files are for other languages, so they only get used on pages that use other languages. (In Google Chrome you can press ctrl+shift+i, load the abridge demo, search for something in the searchbox, and see exactly which files get loaded for elasticlunr.)
EDIT: ah yes, or in Firefox, which I prefer: |
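As a rough sketch of replacing the static test data with Zola's output: the helpers below map entries from the JSON index to the argument of `index.addCustomRecord()`. The assumption, which should be checked against the file Zola actually writes into `public/`, is that the index is an array of objects with `url`, `title`, `body`, and optionally `description`. The names `toPagefindRecord` and `addZolaEntries` are hypothetical.

```javascript
// Map one entry of the Zola JSON index to an addCustomRecord() argument.
// Assumption: entries look like { url, title, description, body }.
function toPagefindRecord(entry, language = "en") {
  return {
    url: entry.url,
    content: entry.body || "",
    language,
    meta: {
      title: entry.title || entry.url,
      description: entry.description || "",
    },
  };
}

// Feed every entry to a pagefind index (the `index` object returned by
// pagefind.createIndex()). Returns how many records were added.
async function addZolaEntries(index, entries, language = "en") {
  let added = 0;
  for (const entry of entries) {
    if (!entry.url) continue; // a record without a URL cannot be linked to
    await index.addCustomRecord(toPagefindRecord(entry, language));
    added++;
  }
  return added;
}
```

In a build script, `entries` would come from reading and `JSON.parse`-ing the index file after `zola build`, followed by the same `index.writeFiles()` call used in the script earlier in the thread.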
I was just wondering if you think it is better for the users to install |
Because node is required to build the index, we might as well just have it as a dependency that gets installed as a node package. Any JavaScript for the client-side search can of course go into static/js, but anything related to building the index can just be installed as a node package. |
Now that I have merged your pull request for pagefind, I am thinking I will go ahead and close this for now. Thanks again for your work on implementing pagefind. |
No problem! |
...and Jekyll too... |
uploaded a pagefind demo: https://abridge-pagefind.pages.dev/ |
@Hysterelius where does this pagefind.js file come from? https://github.com/Jieiku/abridge/blob/master/static/js/pagefind.js I ask in case I ever need to update it. I poked around on the pagefind repo but have not found it yet. If you had to build the file, let me know what is required to do that.
I noticed that there are 51 global variables: http://yellowlab.tools:8282/result/gys1dxx7b8/rule/globalVariables I plan to open an issue with pagefind to see if that could be fixed, but I would like to understand where this file comes from before I do that.
Edit: It looks like the file is created from this line: abridge/static/js/pagefind.index.cjs Line 67 in 41ea5d3
Edit2: Since we are creating that file, maybe I can wrap it in an anonymous function; I will see what I can do.
Edit3: Figured it out: 39606e8 http://yellowlab.tools:8282/result/gys4idfdw3/rule/globalVariables |
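For context on the anonymous-function idea, here is a generic sketch of wrapping generated script source in an immediately invoked function expression at build time. It is not taken from the commit referenced above, and `wrapInIife` is a hypothetical helper name.

```javascript
// Wrap script source so that its top-level `var` and function declarations
// become local to the wrapper instead of properties of the global object.
function wrapInIife(source) {
  return `(function () {\n${source}\n})();\n`;
}

// Example: the declaration below no longer leaks when the result is run
// as a classic script.
const wrapped = wrapInIife("var internalHelper = 1;");
```

Two caveats apply to this approach: anything the page still needs from the script has to be exposed deliberately (for example by assigning it to `window`), and source containing top-level `import` or `export` statements cannot be wrapped this way.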
I was going to say that if we remove the file from the bundle, the search functions wouldn't be accessible to the |
yep, I inlined the file. This demo gets built at every commit to abridge: https://abridge-pagefind.pages.dev/ |
elasticlunr was the first implemented in abridge because index generation was directly supported by Zola.
elasticlunr also supports CJK, stemmers, and stop words, so it is a good solution for a wide range of people.
I then implemented both tinysearch and stork; the demos are here:
https://jieiku.github.io/abridge-tinysearch/
https://jieiku.github.io/abridge-stork/
Those demos are static builds from an older version of abridge. I lost interest in stork because it actually used more bandwidth than elasticlunr. I am, however, interested in getting tinysearch working again.
Zola now supports building a JSON-based index:
getzola/zola#2507
https://www.getzola.org/documentation/content/search/#fuse
I think I may have looked at flexsearch, but I cannot remember all the details; it has been a while. Another one I am interested in is pagefind: CloudCannon/pagefind#277
I opened a new issue at tinysearch, but I don't have time to work on it at the moment: tinysearch/tinysearch#178
https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy,FlexSearch,Elasticlunr,Fuse&search=super%20ma
After looking here, it looks like flexsearch supports stemmers and CJK: nextapps-de/flexsearch#51
I am not sure how automatic that support is, but it looks like flexsearch is worth looking into.