Will there be support for deno deploy + esm.sh #22

Open
deepak-coding-art opened this issue Mar 17, 2023 · 8 comments

@deepak-coding-art

deepak-coding-art commented Mar 17, 2023

I tried to load the npm package with esm.sh but it gives an error:

 Import 'https://esm.sh/v111/@dqbd/tiktoken@1.0.2/deno/tiktoken_bg.wasm.js' failed: 522 <unknown status code>
    at https://esm.sh/v111/@dqbd/tiktoken@1.0.2/deno/tiktoken.js:2:419

Here is the import statement:

import {
  encoding_for_model,
  get_encoding,
} from "https://esm.sh/@dqbd/tiktoken@1.0.2";

I want to use it in Supabase Edge Functions. They use Deno Deploy, which does not support npm packages.

Another error:

error: invalid utf-8 sequence of 1 bytes from index 9
    at https://esm.sh/v111/@dqbd/tiktoken@1.0.2/deno/tiktoken.js:2:419
@dqbd
Owner

dqbd commented Mar 19, 2023

Hello, Deno / Supabase Edge Functions are currently not supported, but you might get far enough with the npm compatibility layer.

Tested on Deno 1.29.1

import tiktoken from "npm:@dqbd/tiktoken";

console.log(tiktoken.get_encoding("gpt2").encode("Hello World!"));

@dqbd
Owner

dqbd commented Mar 19, 2023

However, Deno Deploy will most likely not work yet, as npm specifiers are not supported there. Quoting the Deno blog:

"Yes, Deno Deploy is an excellent choice to host your Deno code, especially if you want it to be executed close to your users on the edge. Deploying to our V8 isolate cloud takes seconds and can be accessed around the world within seconds. Also, npm specifiers aren't currently supported in Deploy, but will be soon."

https://deno.com/blog/npm-and-deno-anywhere

@dqbd
Owner

dqbd commented Mar 19, 2023

Digging a little bit deeper, you can use @dqbd/tiktoken/lite via esm.sh with the following code:

import { init, Tiktoken } from "https://esm.sh/@dqbd/tiktoken/lite/init";
import cl100k_base from "https://esm.sh/@dqbd/tiktoken/encoders/cl100k_base.json" assert { type: "json" };

await init(async (imports) => {
  const req = await fetch(
    "https://esm.sh/@dqbd/tiktoken/lite/tiktoken_bg.wasm"
  );
  return WebAssembly.instantiate(await req.arrayBuffer(), imports);
});

const enc = new Tiktoken(
  cl100k_base.bpe_ranks,
  cl100k_base.special_tokens,
  cl100k_base.pat_str
);
const tokens = enc.encode("Hello world");
enc.free();

There seems to be a caching bug in esm.sh that serves the lite WASM file even when the full version is requested; see esm-dev/esm.sh#552.
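
The init callback above hands whatever the server returns straight to WebAssembly.instantiate. A minimal sketch of a more defensive fetch follows, assuming you want a readable error when esm.sh answers with a non-2xx status (such as the 522 reported at the top of this thread) or with a body that is not a WASM binary. `fetchWasmBytes`, `FetchLike` and `MinimalResponse` are hypothetical names, not part of @dqbd/tiktoken; the fetch function is passed in as a parameter so the helper can be exercised without network access.

```typescript
// Hypothetical helper, not part of @dqbd/tiktoken.
// Only the parts of a fetch Response that this sketch needs.
interface MinimalResponse {
  ok: boolean;
  status: number;
  arrayBuffer(): Promise<ArrayBuffer>;
}
type FetchLike = (url: string) => Promise<MinimalResponse>;

// Every WebAssembly binary starts with the bytes 0x00 'a' 's' 'm'.
const WASM_MAGIC = [0x00, 0x61, 0x73, 0x6d];

async function fetchWasmBytes(
  url: string,
  fetchImpl: FetchLike,
): Promise<ArrayBuffer> {
  const res = await fetchImpl(url);
  if (!res.ok) {
    throw new Error(`Fetching ${url} failed with HTTP status ${res.status}`);
  }
  const bytes = await res.arrayBuffer();
  // Look at the first four bytes only; shorter bodies cannot be valid WASM.
  const head = new Uint8Array(bytes, 0, Math.min(4, bytes.byteLength));
  const looksLikeWasm =
    head.length === 4 && WASM_MAGIC.every((b, i) => head[i] === b);
  if (!looksLikeWasm) {
    throw new Error(`Response from ${url} is not a WebAssembly binary`);
  }
  return bytes;
}
```

Inside the init callback this would stand in for the bare fetch call, for example by passing the runtime's global fetch as `fetchImpl` and returning `WebAssembly.instantiate(await fetchWasmBytes(url, fetch), imports)`. It does not work around the esm.sh caching bug; it only makes a bad response fail early with a clearer message.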

dqbd changed the title from "Will there be support for deno deploy" to "Will there be support for deno deploy + esm.sh" on Mar 19, 2023
dqbd pinned this issue on Mar 19, 2023
@arthur5005

I gave the snippet above a go, David (@dqbd), but it fails for me randomly.

About 50% of the time it initializes successfully, and the other 50% it fails. My uneducated guess is that this is because the code runs in resource-limited serverless functions on Supabase, but I really have no idea.

Error: `unwrap_throw` failed
    at Y (https://esm.sh/v119/@dqbd/tiktoken@1.0.7/deno/lite/init.js:2:5691)
    at <anonymous> (wasm://wasm/00307922:1:458526)
    at <anonymous> (wasm://wasm/00307922:1:427757)
    at <anonymous> (wasm://wasm/00307922:1:14850)
    at new y (https://esm.sh/v119/@dqbd/tiktoken@1.0.7/deno/lite/init.js:2:2567)
    at getEncoder (file:///src/index.ts:16:15)
    at async file:///src/index.ts:48:21
    at async Server.<anonymous> (file:///_shared/serveClient.ts:40:20)
    at async Server.#respond (https://deno.land/std@0.168.0/http/server.ts:301:18)

@arthur5005

@dqbd ... actually, specifying a version in the fetch URL seems to have stabilized it:

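export async function getEncoder() {
  await init(async (imports) => {
    const req = await fetch(
      "https://esm.sh/@dqbd/tiktoken@1.0.7/lite/tiktoken_bg.wasm"
    );
    return WebAssembly.instantiate(await req.arrayBuffer(), imports);
  });
  // ... (the rest of the function is cut off in the original comment)
}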
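
The thread does not establish why pinning helped. One plausible reading, which is an assumption and not something confirmed here, is that an unpinned URL can resolve to a different package version than the JS glue code expects. Under that assumption, a small sketch that derives every fetched esm.sh URL from one version constant keeps the WASM and encoder files in step. `TIKTOKEN_VERSION` and `tiktokenUrl` are hypothetical names introduced for illustration.

```typescript
// Assumption: 1.0.7 is the version used in the comment above; substitute your own.
const TIKTOKEN_VERSION = "1.0.7";

// Hypothetical helper: builds an esm.sh URL for a path inside the
// @dqbd/tiktoken package, always with an explicit version.
function tiktokenUrl(path: string, version: string = TIKTOKEN_VERSION): string {
  const cleanPath = path.replace(/^\/+/, ""); // tolerate a leading slash
  return `https://esm.sh/@dqbd/tiktoken@${version}/${cleanPath}`;
}

// URLs for the two files that the snippets in this thread fetch or import.
const WASM_URL = tiktokenUrl("lite/tiktoken_bg.wasm");
const ENCODER_URL = tiktokenUrl("encoders/cl100k_base.json");
```

Static import specifiers must be string literals, so the version in the `import { init, Tiktoken } from "..."` line still has to be kept in sync by hand.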

@gregnr

gregnr commented May 7, 2023

@arthur5005 sounds like you've solved this. FWIW, we also use the fetch() approach at Supabase to load JSON within edge functions (Deno):
https://github.com/supabase/supabase/blob/d730ea9ca066b3a2ebe232195149725257c3eb25/supabase/functions/common/tokenizer.ts#L4

Worth noting: JSON imports should be fixed as of last week, so fetch() may no longer be required if you're on the latest edge-runtime:
supabase/edge-runtime#74
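
For runtimes where the JSON import still fails, the fetch() approach mentioned above could look roughly like the sketch below. This is not the code from the linked Supabase file. `loadEncoderData`, `EncoderData`, `JsonResponse` and `JsonFetch` are hypothetical names, and the field types are an assumption based on how the earlier snippet passes `bpe_ranks`, `special_tokens` and `pat_str` to the Tiktoken constructor. The fetch function is a parameter so the helper can be tried without network access.

```typescript
// Assumed shape of an encoder JSON file such as encoders/cl100k_base.json.
interface EncoderData {
  bpe_ranks: string;
  special_tokens: Record<string, number>;
  pat_str: string;
}

// Only the parts of a fetch Response that this sketch needs.
interface JsonResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type JsonFetch = (url: string) => Promise<JsonResponse>;

// Hypothetical helper: loads the encoder data with fetch() instead of a
// JSON import and checks that the expected fields are present.
async function loadEncoderData(
  url: string,
  fetchImpl: JsonFetch,
): Promise<EncoderData> {
  const res = await fetchImpl(url);
  if (!res.ok) {
    throw new Error(`Fetching ${url} failed with HTTP status ${res.status}`);
  }
  const data = (await res.json()) as Partial<EncoderData> | null;
  if (
    !data ||
    typeof data.bpe_ranks !== "string" ||
    typeof data.pat_str !== "string" ||
    typeof data.special_tokens !== "object" ||
    data.special_tokens === null
  ) {
    throw new Error(`Response from ${url} does not look like encoder data`);
  }
  return data as EncoderData;
}
```

The result would then be passed on in the same way as the imported JSON in the earlier snippet, i.e. `new Tiktoken(data.bpe_ranks, data.special_tokens, data.pat_str)`, with the runtime's global fetch supplied as `fetchImpl`.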

@hourianto

hourianto commented Sep 2, 2024

Can the Deno entry in the Compatibility section of https://github.com/dqbd/tiktoken?tab=readme-ov-file#compatibility please be fixed? The linked issue only concerns Deno Deploy; the normal WASM tiktoken works perfectly fine in Deno itself (when using it as a runtime, similar to Node or Bun).

@niieani

niieani commented Oct 2, 2024

Hey folks, if you're still looking for a tokenizer that works in Deno / via esm.sh, you can give gpt-tokenizer a go:

import tokenizer from "https://esm.sh/gpt-tokenizer@2.4.0/model/gpt-4o"

tokenizer.encode('hello, this is a test') // returns `[24912, 11, 495, 382, 261, 1746]`
