Commit
chore: bump datasets from 2.16.1 to 2.19.1 in /presets/tuning/tfs (#380)
Bumps [datasets](https://github.com/huggingface/datasets) from 2.16.1 to 2.19.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/huggingface/datasets/releases">datasets's releases</a>.</em></p>
<blockquote>
<h2>2.19.1</h2>
<h2>Bug fixes</h2>
<ul>
<li>Fix download for dict of dicts of URLs by <a href="https://github.com/albertvillanova"><code>@albertvillanova</code></a> in <a href="https://redirect.github.com/huggingface/datasets/pull/6871">huggingface/datasets#6871</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/huggingface/datasets/compare/2.19.0...2.19.1">https://github.com/huggingface/datasets/compare/2.19.0...2.19.1</a></p>
<h2>2.19.0</h2>
<h2>Dataset Features</h2>
<ul>
<li>Add Polars compatibility by <a href="https://github.com/psmyth94"><code>@psmyth94</code></a> in <a href="https://redirect.github.com/huggingface/datasets/pull/6531">huggingface/datasets#6531</a>
<ul>
<li>convert to a Polars dataframe using <code>.to_polars()</code>;
<pre lang="python"><code>import polars as pl
from datasets import load_dataset
ds = load_dataset("DIBT/10k_prompts_ranked", split="train")
ds.to_polars() \
    .groupby("topic") \
    .agg(pl.len(), pl.first()) \
    .sort("len", descending=True)
</code></pre>
</li>
<li>Use Polars formatting to return Polars objects when accessing a dataset:
<pre lang="python"><code>ds = ds.with_format("polars")
ds[:10].group_by("kind").len()
</code></pre>
</li>
</ul>
</li>
<li>Add <code>fsspec</code> support for <code>to_json</code>, <code>to_csv</code>, and <code>to_parquet</code> by <a href="https://github.com/alvarobartt"><code>@alvarobartt</code></a> in <a href="https://redirect.github.com/huggingface/datasets/pull/6096">huggingface/datasets#6096</a>
<ul>
<li>Save on HF in any file format:
<pre lang="python"><code>ds.to_json("hf://datasets/username/my_json_dataset/data.jsonl")
ds.to_csv("hf://datasets/username/my_csv_dataset/data.csv")
ds.to_parquet("hf://datasets/username/my_parquet_dataset/data.parquet")
</code></pre>
</li>
</ul>
</li>
<li>Add <code>mode</code> parameter to <code>Image</code> feature by <a href="https://github.com/mariosasko"><code>@mariosasko</code></a> in <a href="https://redirect.github.com/huggingface/datasets/pull/6735">huggingface/datasets#6735</a>
<ul>
<li>Set images to be read in a certain mode like "RGB"
<pre lang="python"><code>dataset = dataset.cast_column("image", Image(mode="RGB"))
</code></pre>
</li>
</ul>
</li>
<li>Add CLI function to convert script-dataset to Parquet by <a href="https://github.com/albertvillanova"><code>@albertvillanova</code></a> in <a href="https://redirect.github.com/huggingface/datasets/pull/6795">huggingface/datasets#6795</a>
<ul>
<li>run command to open a PR in script-based dataset to convert it to Parquet:
<pre><code>datasets-cli convert_to_parquet &lt;dataset_id&gt;
</code></pre>
</li>
</ul>
</li>
<li>Add Dataset.take and Dataset.skip by <a href="https://github.com/lhoestq"><code>@lhoestq</code></a> in <a href="https://redirect.github.com/huggingface/datasets/pull/6813">huggingface/datasets#6813</a>
<ul>
<li>same as IterableDataset.take and IterableDataset.skip
<pre lang="python"><code>ds = ds.take(10)  # take only the first 10 examples
</code></pre>
</li>
</ul>
</li>
</ul>
<h2>General improvements and bug fixes</h2>
<ul>
<li>Bump huggingface-hub lower version to 0.21.2 by <a href="https://github.com/albertvillanova"><code>@albertvillanova</code></a> in <a href="https://redirect.github.com/huggingface/datasets/pull/6713">huggingface/datasets#6713</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/huggingface/datasets/commit/bb2664cf540d5ce4b066365e7c8b26e7f1ca4743"><code>bb2664c</code></a> Release 2.19.1 (<a href="https://redirect.github.com/huggingface/datasets/issues/6872">#6872</a>)</li>
<li><a href="https://github.com/huggingface/datasets/commit/a5a76a410a5b6407f43479357eba2b1c370bb9c1"><code>a5a76a4</code></a> Fix download for dict of dicts of URLs (<a href="https://redirect.github.com/huggingface/datasets/issues/6871">#6871</a>)</li>
<li><a href="https://github.com/huggingface/datasets/commit/0d3c7462bc67407c42d3ad102b7f9d5914219d9d"><code>0d3c746</code></a> Release: 2.19.0 (<a href="https://redirect.github.com/huggingface/datasets/issues/6825">#6825</a>)</li>
<li><a href="https://github.com/huggingface/datasets/commit/0bc709af303c8dc64c973a17016bd5aa5db2f3d5"><code>0bc709a</code></a> Fix parquet export infos (<a href="https://redirect.github.com/huggingface/datasets/issues/6822">#6822</a>)</li>
<li><a href="https://github.com/huggingface/datasets/commit/2a14271263da2fda9f966af41c7bd885bfa42256"><code>2a14271</code></a> Make convert_to_parquet CLI command create script branch (<a href="https://redirect.github.com/huggingface/datasets/issues/6809">#6809</a>)</li>
<li><a href="https://github.com/huggingface/datasets/commit/5eb93f61f9f6e7fefba5d800defe21e50ddf8c58"><code>5eb93f6</code></a> Support indexable objects in <code>Dataset.__getitem__</code> (<a href="https://redirect.github.com/huggingface/datasets/issues/6817">#6817</a>)</li>
<li><a href="https://github.com/huggingface/datasets/commit/8983a3b4dec315bf25331a6065cb74de9017f0e8"><code>8983a3b</code></a> add allow_primitive_to_str and allow_decimal_to_str instead of allow_number_t...</li>
<li><a href="https://github.com/huggingface/datasets/commit/a188022dc43a76a119d90c03832d51d6e4a94d91"><code>a188022</code></a> Extract data on the fly in packaged builders (<a href="https://redirect.github.com/huggingface/datasets/issues/6784">#6784</a>)</li>
<li><a href="https://github.com/huggingface/datasets/commit/ed8860faef3e751f3b77c08e09ce723a74d2c2e5"><code>ed8860f</code></a> Remove <code>os.path.relpath</code> in <code>resolve_patterns</code> (<a href="https://redirect.github.com/huggingface/datasets/issues/6815">#6815</a>)</li>
<li><a href="https://github.com/huggingface/datasets/commit/55eb1d9a34a91dbf2418166f9f1d92f7181e778b"><code>55eb1d9</code></a> Add Dataset.take and Dataset.skip (<a href="https://redirect.github.com/huggingface/datasets/issues/6813">#6813</a>)</li>
<li>Additional commits viewable in <a href="https://github.com/huggingface/datasets/compare/2.16.1...2.19.1">compare view</a></li>
</ul>
</details>
<br />

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Ishaan Sehgal <ishaanforthewin@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ishaan Sehgal <ishaanforthewin@gmail.com>