
Error in loading from Huggingface #19

Open
BenHamm opened this issue Jul 25, 2024 · 4 comments


BenHamm commented Jul 25, 2024

When I try to run the following code in Colab:

from datasets import load_dataset
dataset = load_dataset("xinrongzhang2022/InfiniteBench")

I get the following error:

DatasetGenerationCastError: An error occurred while generating the dataset

All the data files must have the same columns, but at some point there are 1 missing columns ({'options'})

This happened while the json dataset builder was generating data using

hf://datasets/xinrongzhang2022/InfiniteBench/kv_retrieval.jsonl (at revision 2c3c9fe62808833ab783026bbf8e7a47539a28c6)

Please either edit the data files to have matching columns, or separate them into different configurations (see docs at https://hf.co/docs/hub/datasets-manual-configuration#multiple-configurations)
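The error means the files do not share one column set: `kv_retrieval.jsonl` has no `options` column, while the schema built from the earlier data includes it. Below is a minimal, stdlib-only sketch (not how `datasets` checks internally) for spotting such a mismatch across JSONL files and grouping files by identical column set, which is roughly what "separate them into different configurations" amounts to. The file name `multiple_choice.jsonl` and all record values are hypothetical.

```python
import json
from collections import defaultdict


def columns_per_file(files):
    """Map each file name to the set of top-level keys found across its JSONL records."""
    result = {}
    for name, text in files.items():
        cols = set()
        for line in text.splitlines():
            if line.strip():  # skip blank lines
                cols.update(json.loads(line).keys())
        result[name] = frozenset(cols)
    return result


def missing_columns(files):
    """For each file, list the columns seen in other files but absent from it."""
    per_file = columns_per_file(files)
    all_cols = set().union(*per_file.values())
    return {
        name: sorted(all_cols - cols)
        for name, cols in per_file.items()
        if all_cols - cols
    }


def group_by_schema(files):
    """Group file names by identical column set (one group per would-be configuration)."""
    groups = defaultdict(list)
    for name, cols in columns_per_file(files).items():
        groups[tuple(sorted(cols))].append(name)
    return dict(groups)


# Hypothetical miniature files: "multiple_choice.jsonl" and all record values are
# made up; only the missing "options" column mirrors the reported error.
files = {
    "multiple_choice.jsonl": '{"id": 0, "context": "...", "input": "Q?", "answer": ["A"], "options": ["A", "B"]}\n',
    "kv_retrieval.jsonl": '{"id": 0, "context": "...", "input": "key?", "answer": ["v"]}\n',
}

print(missing_columns(files))  # {'kv_retrieval.jsonl': ['options']}
for cols, names in group_by_schema(files).items():
    print(cols, names)
```

Running this on local copies of the real files (read into the `files` dict) would show which ones lack `options`.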

@tuantuanzhang (Collaborator)

We have re-uploaded the files, which fixes this problem. Please try load_dataset("xinrongzhang2022/InfiniteBench") again.

@HivaMohammadzadeh1

I'm running into this issue as well.

@tuantuanzhang (Collaborator)

Please set features explicitly when loading the dataset:

# Fixed imports: the package is "datasets", and Features/load_dataset must be imported too
from datasets import load_dataset, Features, Value, Sequence
ft = Features({"id": Value("int64"), "context": Value("string"), "input": Value("string"), "answer": Sequence(Value("string")), "options": Sequence(Value("string"))})
dataset = load_dataset("xinrongzhang2022/InfiniteBench", features=ft)

@tuantuanzhang (Collaborator)

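from datasets import load_dataset, Features, Value, Sequence

# Declare the schema explicitly instead of letting it be inferred from the data.
# "options" is listed even though kv_retrieval.jsonl has no such column;
# declared columns missing from a file should then be filled with nulls.
ft = Features({
"id": Value("int64"),
"context": Value("string"),
"input": Value("string"),
"answer": Sequence(Value("string")),
"options": Sequence(Value("string"))
})
dataset = load_dataset("xinrongzhang2022/InfiniteBench", features=ft)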
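Passing `features` replaces schema inference with one declared schema. As far as I can tell from the `datasets` JSON builder, a declared column that a file lacks (here `options` in `kv_retrieval.jsonl`) is filled with nulls instead of triggering the cast error. The sketch below imitates that idea with plain Python dicts; `SCHEMA` and `conform` are hypothetical names, the Python types are only a rough stand-in for the `Features` above, and this is not how `datasets` is implemented.

```python
# Hypothetical, stdlib-only analogue of casting records to a declared schema.
# Python types stand in for the datasets Features used above; this is not the datasets API.
SCHEMA = {
    "id": int,
    "context": str,
    "input": str,
    "answer": list,   # stands in for Sequence(Value("string"))
    "options": list,  # stands in for Sequence(Value("string"))
}


def conform(record, schema=SCHEMA):
    """Return a copy of `record` with exactly the schema's keys.

    Missing keys become None; unexpected keys or wrong types raise an error.
    """
    extra = set(record) - set(schema)
    if extra:
        raise ValueError(f"unexpected columns: {sorted(extra)}")
    out = {}
    for key, expected in schema.items():
        value = record.get(key)  # None when the column is absent
        if value is not None and not isinstance(value, expected):
            raise TypeError(
                f"column {key!r}: expected {expected.__name__}, got {type(value).__name__}"
            )
        out[key] = value
    return out


# A kv_retrieval-style record without "options" (values are made up).
row = conform({"id": 0, "context": "...", "input": "key?", "answer": ["v"]})
print(row["options"])  # None
```

If the library behaves the same way, rows from `kv_retrieval.jsonl` loaded with `features=ft` should have `options` set to `None`, so downstream code should not assume that column is always a list.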


3 participants