F-REQ: If the pip installer doesn't find Rust, it should install the pure python version of the tokenizer #227

Open
Emasoft opened this issue Dec 10, 2023 · 2 comments

Comments

@Emasoft

Emasoft commented Dec 10, 2023

Currently, Tiktoken (and with it every OpenAI-related Python library that depends on it) cannot be installed on systems and platforms that cannot install Rust or are forbidden to. This is a big issue, and it has been raised here many times.

See:
#36
#57
#94
#134
josephrocca/gpt-2-3-tokenizer#2
pyodide/pyodide#3875
pyodide/pyodide#3663
pyodide/pyodide#3543
emscripten-forge/recipes#660
psymbio/tiktoken_rust_wasm
#94 (comment)

There are already two pure Python implementations of the tokenizer:

In the educational version:
https://github.com/openai/tiktoken/blob/main/tiktoken/_educational.py
In this fork, courtesy of @kechan:
https://github.com/kechan/tiktoken
As discussed here: #36
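
To illustrate what a pure Python tokenizer core involves, here is a minimal sketch of the greedy byte-pair merge loop. It is not the code from `_educational.py` or from the fork; the function name `bpe_encode` and the toy rank table `TOY_RANKS` are made up for this example, and a real encoding would load tens of thousands of merge ranks and pre-split the text with a regex first.

```python
from __future__ import annotations


def bpe_encode(data: bytes, ranks: dict[bytes, int]) -> list[int]:
    """Encode bytes by repeatedly merging the adjacent pair with the lowest rank.

    Assumption: every single byte that occurs in `data` has an entry in `ranks`.
    """
    parts = [bytes([b]) for b in data]
    while len(parts) > 1:
        best_idx = None
        best_rank = None
        # Find the mergeable adjacent pair with the lowest rank (leftmost on ties).
        for i in range(len(parts) - 1):
            rank = ranks.get(parts[i] + parts[i + 1])
            if rank is not None and (best_rank is None or rank < best_rank):
                best_idx, best_rank = i, rank
        if best_idx is None:
            break  # no adjacent pair is in the vocabulary
        parts[best_idx:best_idx + 2] = [parts[best_idx] + parts[best_idx + 1]]
    return [ranks[p] for p in parts]


# Hypothetical toy vocabulary, for illustration only.
TOY_RANKS = {b"a": 0, b"b": 1, b"c": 2, b"ab": 3, b"abc": 4}

print(bpe_encode(b"abcab", TOY_RANKS))  # [4, 3]
```

A loop like this needs nothing beyond the standard library, which is why it runs anywhere Python runs; the trade-off is speed, since the scan over pairs is repeated for every merge.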

Since everything is in place, the solution would be simple: if the pip installer doesn't find Rust, it should install the pure Python version of the tokenizer.
Please consider it. Making Rust mandatory for using the OpenAI API is inconvenient and only makes the API accessible to fewer users and companies. It is in OpenAI's best interest to make its tools as portable as possible, and Python is the perfect language for this. Thanks!
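
As a rough sketch of the "doesn't find Rust" check, a build script could look for the toolchain on `PATH` before deciding what to build. The function names and the plan labels below are hypothetical and are not part of tiktoken's actual build; the only real API used is `shutil.which`.

```python
import shutil
from typing import Callable, Optional

# Type of a PATH lookup function such as shutil.which.
Which = Callable[[str], Optional[str]]


def rust_toolchain_available(which: Which = shutil.which) -> bool:
    """Return True only if both `cargo` and `rustc` are found on PATH."""
    return which("cargo") is not None and which("rustc") is not None


def choose_build_plan(which: Which = shutil.which) -> str:
    """Pick a (hypothetical) build plan label based on toolchain availability."""
    return "rust-extension" if rust_toolchain_available(which) else "pure-python"


if __name__ == "__main__":
    print(choose_build_plan())
```

Passing the lookup function as a parameter keeps the decision testable on machines with or without Rust installed.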

@Emasoft
Author

Emasoft commented Dec 29, 2023

Any update on this?
Maybe some devs at OpenAI are underestimating the importance of Tiktoken in the OpenAI ecosystem. Every small tool accessing GPT has to use it. It is a key element that should run on EVERY platform, including in-browser Python interpreters and headless VMs/Docker containers with severe restrictions on compiled binaries. Pure Python is perfect for such universal portability, but the mandatory Rust binary in Tiktoken stops this key element from being cross-platform, as a true Python program should be, and turns it into a troubling stumbling block for many devs. Please consider this issue. Thanks. 🙏

@dbold

dbold commented Nov 15, 2024

The Rust dependency, which is only needed for this one (minor) library, greatly increases the image size. It's almost ridiculous how large it is; quite unusual for a Python library.

This should have a pure Python implementation by default and provide a tiktoken[fast] or tiktoken[rust] extra which introduces the Rust variant.
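
With an optional extra like that, the package would still have to pick a backend at import time. A minimal sketch of that selection, assuming a compiled module that may or may not be installed: the module name `_hypothetical_rust_tokenizer` and the `"compiled"` / `"python"` labels are invented for illustration and do not describe tiktoken's real layout.

```python
import importlib
from typing import Any, Optional, Sequence, Tuple


def load_backend(
    candidates: Sequence[str] = ("_hypothetical_rust_tokenizer",),
) -> Tuple[str, Optional[Any]]:
    """Try each compiled module name in order; fall back to pure Python.

    Returns ("compiled", module) for the first candidate that imports,
    or ("python", None) if none of them is installed.
    """
    for name in candidates:
        try:
            return "compiled", importlib.import_module(name)
        except ImportError:
            continue  # extra not installed; try the next candidate
    return "python", None


kind, module = load_backend()
print(kind)
```

The caller would then route encode calls to the compiled module when `kind` is `"compiled"` and to the pure Python code otherwise, so the default install never requires a compiler.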
