OrderedDict not needed, and question and comment #53

Open
PallHaraldsson opened this issue Dec 13, 2024 · 0 comments
PallHaraldsson commented Dec 13, 2024

A.
@vthorsteinsson I see you added OrderedDict (and OrderedSet) in late 2019, when Python 3.6 was still in use and the built-in dict was not yet guaranteed to preserve insertion order.

If you only support Python 3.7 and higher, it seems you could simplify the code; I'm not sure whether it would be faster, but hopefully so:

https://stackoverflow.com/questions/1653970/does-python-have-an-ordered-set

The answer is no, but as of Python 3.7 you can use the simple dict from the Python standard library with just keys (and values as None) for the same purpose.
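A minimal sketch of what I mean, assuming OrderedSet/OrderedDict are only used for de-duplication and insertion-order iteration (the names below are just for illustration, not from this repo):

```python
# Ordered-set behaviour with a plain dict (Python 3.7+): keys carry the
# elements, values are just None, and insertion order is preserved.
def ordered_unique(items):
    """Return the unique items of `items` in first-seen order."""
    return list(dict.fromkeys(items))


# A plain dict can likewise stand in for OrderedDict in most cases,
# since insertion order is guaranteed from Python 3.7 onward.
d = {}
d["b"] = 2
d["a"] = 1
assert list(d) == ["b", "a"]  # insertion order kept

print(ordered_unique(["b", "a", "b", "c", "a"]))  # -> ['b', 'a', 'c']
```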

946ffc7

https://docs.python.org/3/library/collections.html

Ordered dictionaries are just like regular dictionaries but have some extra capabilities relating to ordering operations. They have become less important now that the built-in dict class gained the ability to remember insertion order (this new behavior became guaranteed in Python 3.7).

[make sure to read the rest there.]

https://deepsource.com/blog/python-performance-three-easy-tips

When initializing a new dictionary, using {} is much more performant than calling the dict built-in.

https://stackoverflow.com/questions/18422995/why-is-ordereddict-10x-slower-than-dict-and-list
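If you want to check those claims on your own interpreter, here is a quick micro-benchmark sketch (numbers will of course vary by Python version and machine):

```python
import timeit

# The {} literal avoids the global name lookup and call overhead of dict().
print("{}            ", timeit.timeit("{}", number=1_000_000))
print("dict()        ", timeit.timeit("dict()", number=1_000_000))
print("OrderedDict() ", timeit.timeit(
    "OrderedDict()",
    setup="from collections import OrderedDict",
    number=1_000_000))
```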

B.
I was looking up tokenizers (BPE etc.) for LLMs and dropped in on your repo by accident. Such tokenizers were made first for English and mostly optimized for it; Icelandic and German are an afterthought at best, though Chinese has at least been worked on. I agree with Karpathy: I want tokenizers gone, at least in the long run. They are a solution, but also a problem for current LLMs. Do you do any work on such tokenizers/LLMs?
