The answer is no, but as of Python 3.7 you can use the simple dict from the Python standard library with just keys (and values as None) for the same purpose.
Ordered dictionaries are just like regular dictionaries but have some extra capabilities relating to ordering operations. They have become less important now that the built-in dict class gained the ability to remember insertion order (this new behavior became guaranteed in Python 3.7).
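As a minimal sketch of the idea quoted above (assuming Python 3.7 or later; the variable names are made up for illustration and are not taken from the repo), a plain dict with `None` values can stand in for an ordered set:

```python
# A plain dict used as an insertion-ordered set (relies on Python 3.7+ ordering).
words = ["b", "a", "b", "c", "a"]  # hypothetical input with duplicates

# dict.fromkeys keeps each key once, at the position of its first occurrence.
ordered = dict.fromkeys(words)

unique_in_order = list(ordered)  # keys come back in insertion order
has_a = "a" in ordered           # membership test, as with a set

ordered["d"] = None              # "add" an element at the end
ordered.pop("a", None)           # "discard" an element; no error if missing
remaining = list(ordered)

print(unique_in_order, has_a, remaining)
```

Set operations such as union or intersection are not available on a dict directly, so this only covers the cases where ordered membership and iteration are enough.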
B.
I was looking up tokenization [list], i.e. BPE etc. for LLMs, and dropped in on your repo by accident. Such tokenizers were first made for English and are mostly optimized for it, with Icelandic and German an afterthought, if that; Chinese has at least been worked on. I agree with Karpathy: I want tokenizers gone, at least in the long run. They are a solution, but also a problem for current LLMs. Do you do any work on such tokenizers or on LLMs?
A.
@vthorsteinsson I see you added OrderedDict (and OrderedSet) in late 2019, when Python 3.6 was still in use and dict insertion order was not yet a language guarantee.
If you only support Python 3.7 and higher, it seems you could simplify the code. I'm not sure whether it would be faster, but hopefully so:
https://stackoverflow.com/questions/1653970/does-python-have-an-ordered-set
946ffc7
https://docs.python.org/3/library/collections.html
https://deepsource.com/blog/python-performance-three-easy-tips
https://stackoverflow.com/questions/18422995/why-is-ordereddict-10x-slower-than-dict-and-list
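Whether replacing OrderedDict with dict is actually faster could be checked with a quick measurement. The following is only a rough sketch under assumed conditions: the workload size `N`, the repeat count, and the helper `fill` are hypothetical and do not reflect how the repo uses these types.

```python
import timeit
from collections import OrderedDict

N = 10_000  # hypothetical workload size, chosen only for illustration


def fill(mapping_type):
    """Insert N integer keys into a fresh mapping of the given type."""
    m = mapping_type()
    for i in range(N):
        m[i] = None
    return m


# Time 20 fills of each mapping type; absolute numbers depend on the machine.
t_dict = timeit.timeit(lambda: fill(dict), number=20)
t_ordered = timeit.timeit(lambda: fill(OrderedDict), number=20)

# Both types should yield the keys in the same (insertion) order on 3.7+.
same_order = list(fill(dict)) == list(fill(OrderedDict))

print(f"dict: {t_dict:.4f}s  OrderedDict: {t_ordered:.4f}s  same order: {same_order}")
```

Results will vary with Python version and access pattern, so a benchmark on the real code paths would be more telling than this synthetic loop.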