feat: read_csv #1112
Comments
I would generally prefer to keep narwhals's "just-pass-me-the-df" philosophy. We could infer which namespace to use based on which module is already imported?

    import pandas as pd
    import narwhals as nw
    df = nw.read_csv("data.csv")  # <- uses pandas

But then, what to do with the already imported pandas? If you are importing it, you might as well use it for I/O. The only major reason to have I/O support (that I can think of) would be if you wanted to switch an entire "narwhals workflow/script" with one setting. Another way I could think of:

    nw.set_io_backend("pandas")
    df = nw.read_csv("data.csv")
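To make the two ideas above concrete, here is a minimal sketch of how a backend could be resolved: an explicit setting wins, otherwise the first candidate module that is already imported is used. Everything here is hypothetical (`set_io_backend`, `resolve_io_backend`, and the candidate order are assumptions for illustration, not existing narwhals API).

```python
import sys
from typing import Optional, Sequence

# Hypothetical module-level setting; not an actual narwhals attribute.
_io_backend: Optional[str] = None


def set_io_backend(name: Optional[str]) -> None:
    """Explicitly choose the backend used for I/O (None clears the choice)."""
    global _io_backend
    _io_backend = name


def resolve_io_backend(
    candidates: Sequence[str] = ("polars", "pandas", "pyarrow"),
) -> Optional[str]:
    """Return the explicit backend if set, else the first candidate that
    the user has already imported, else None."""
    if _io_backend is not None:
        return _io_backend
    for name in candidates:
        # sys.modules only contains modules that have been imported so far.
        if name in sys.modules:
            return name
    return None
```

One thing the sketch makes visible: inference is ambiguous when several candidates are imported, so the candidate order silently becomes part of the behaviour, whereas the explicit setting is unambiguous.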
This sounds interesting. As a library user, how would somebody use it? At the moment, the "give me some-kinda-df, get back some-kinda-df" approach gives a neat boundary for figuring out what the end user is expecting. If I were writing a library with Narwhals' IO, would I do something like this?

    def get_a_csv_and_do_some_stuff(namespace: str) -> nw.DataFrame:
        library = get_library_from_namespace_name(namespace)
        return nw.read_csv("data.csv", native_namespace=library).with_columns(z=nw.col("x") * nw.col("y"))

I'm thinking that, for this to be useful, a library needs a way of figuring out the namespace an end user wants. Would Narwhals do this, or would that be the library maintainer's responsibility?
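For illustration, the `get_library_from_namespace_name` helper in the snippet above could be as small as the following if it lived on the library maintainer's side. This is only a sketch: the helper name comes from the comment above, but the `supported` parameter and its default values are assumptions.

```python
import importlib
from types import ModuleType
from typing import Sequence


def get_library_from_namespace_name(
    namespace: str,
    supported: Sequence[str] = ("pandas", "polars", "pyarrow"),  # assumed list
) -> ModuleType:
    """Map a backend name chosen by the end user to the imported module."""
    if namespace not in supported:
        raise ValueError(
            f"Unsupported namespace {namespace!r}; expected one of {list(supported)}"
        )
    # Import lazily so that only the backend the user asked for is required.
    return importlib.import_module(namespace)
```

With something like this, the library only needs the end user to pass a string, and the chosen backend is imported on demand.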
I would like to work on this issue
Thanks!
Yup that's right
I was initially hesitant about adding IO methods, the idea being "users provide their own dataframe, we just deal with how to process it", but we already have from_dict, and ImperialCollegeLondon/pycsvy#83 and Temporian look like good use cases for read_csv.
pandas and Polars each have dozens of read_csv methods... so we may need to be careful here about which ones we add, and perhaps only start with the most common ones.
The API would be something like
We could do:
- nw.read_csv: this is eager-only and always returns nw.DataFrame
- nw.scan_csv: this is the most generic one, and returns nw.LazyFrame if possible (e.g. Polars), else nw.DataFrame
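The split between the two functions could be sketched roughly as below. This is not the proposed implementation: it is a duck-typed illustration that returns whatever the native namespace returns (a real version would wrap the result in nw.DataFrame or nw.LazyFrame), and the `native_namespace` keyword is an assumption borrowed from the comments in this thread.

```python
from typing import Any


def read_csv(source: str, *, native_namespace: Any) -> Any:
    """Eager-only: always delegate to the backend's read_csv."""
    return native_namespace.read_csv(source)


def scan_csv(source: str, *, native_namespace: Any) -> Any:
    """Lazy if the backend supports it, otherwise fall back to an eager read."""
    scan = getattr(native_namespace, "scan_csv", None)
    if scan is not None:
        # e.g. Polars exposes scan_csv and returns a lazy frame
        return scan(source)
    # e.g. pandas has no scan_csv, so the result is eager
    return native_namespace.read_csv(source)
```

The fallback in `scan_csv` is what makes it "the most generic one": callers can always use it, at the cost of not knowing statically whether they get a lazy or an eager object back.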
Alternatives
Keep the status-quo: users are responsible for doing their own IO