/opt/conda/lib/python3.8/site-packages/octis/dataset/dataset.py:330: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
final_df = df[df[1] == 'train'].append(df[df[1] == 'val'])
/opt/conda/lib/python3.8/site-packages/octis/dataset/dataset.py:331: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
final_df = final_df.append(df[df[1] == 'test'])
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/opt/conda/lib/python3.8/site-packages/octis/dataset/dataset.py in load_custom_dataset_from_folder(self, path, multilabel)
335
--> 336 self.__corpus = [d.split() for d in final_df[0].tolist()]
337 if len(final_df.keys()) > 2:
/opt/conda/lib/python3.8/site-packages/octis/dataset/dataset.py in <listcomp>(.0)
335
--> 336 self.__corpus = [d.split() for d in final_df[0].tolist()]
337 if len(final_df.keys()) > 2:
AttributeError: 'int' object has no attribute 'split'
During handling of the above exception, another exception occurred:
Exception Traceback (most recent call last)
<ipython-input-16-28e6bd2fc3cd> in <module>
1 dataset = Dataset()
----> 2 dataset.load_custom_dataset_from_folder("/mnt/mydata/notebooks")
/opt/conda/lib/python3.8/site-packages/octis/dataset/dataset.py in load_custom_dataset_from_folder(self, path, multilabel)
356 self._load_document_indexes(self.dataset_path + "/indexes.txt")
357 except:
--> 358 raise Exception("error in loading the dataset:" + self.dataset_path)
359
360 def fetch_dataset(self, dataset_name, data_home=None, download_if_missing=True):
Exception: error in loading the dataset:/mnt/mydata/notebooks
The [Load a Custom Dataset] section says that the dataset should include a vocabulary file, but my dataset is just a CSV file. How can I generate this vocabulary file? Does the pipeline generate it automatically?
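For illustration, a vocabulary file can be derived from the documents themselves. The sketch below is only a minimal example under stated assumptions: the file name `vocabulary.txt` and the one-word-per-line layout are assumptions not confirmed anywhere in this thread, `build_vocabulary` and `min_count` are hypothetical names, and the whitespace tokenization simply mirrors the `d.split()` call visible in the traceback. Whether the pipeline can generate the file automatically is not settled here.

```python
from collections import Counter


def build_vocabulary(documents, min_count=1):
    """Return the sorted list of whitespace-separated tokens that occur
    at least `min_count` times across all documents (hypothetical helper)."""
    counts = Counter(tok for doc in documents for tok in str(doc).split())
    return sorted(word for word, n in counts.items() if n >= min_count)


# Toy documents standing in for the 'documents' column of the CSV file.
docs = ["topic models find topics", "models need a vocabulary"]
vocab = build_vocabulary(docs)

# Assumed layout: one word per line in a file named vocabulary.txt.
with open("vocabulary.txt", "w", encoding="utf-8") as fh:
    fh.write("\n".join(vocab) + "\n")
```

Raising `min_count` drops rare tokens, which may be useful if the corpus has not been preprocessed.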
Description
Hello,
I am having trouble loading my custom dataset. I followed the guide in the main README, but loading fails with the errors shown in the traceback.
What I Did
from octis.dataset.dataset import Dataset
import pandas as pd
df = pd.read_csv("/mnt/mydata/notebooks/data.csv")
# Write the 'documents' column as a headerless TSV (to_csv also writes the row index by default)
df.to_csv('corpus.tsv', sep="\t", header=False, columns=['documents'])
dataset = Dataset()
dataset.load_custom_dataset_from_folder("/mnt/mydata/notebooks")
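Judging only from the traceback, the loader calls `.split()` on column 0 of `corpus.tsv` and compares column 1 against `'train'`, `'val'`, and `'test'`. Since `to_csv` writes the integer row index as the first column unless `index=False` is passed, one plausible cause of the `AttributeError` is that column 0 holds integers rather than text. The sketch below shows one way to write a file in that two-column shape. The helper name `to_corpus_frame`, the split fractions, and the toy data are all hypothetical, and the expected layout is inferred from the traceback rather than confirmed by the library's documentation.

```python
import pandas as pd


def to_corpus_frame(df, text_col="documents", val_frac=0.15, test_frac=0.15):
    """Build a two-column frame: document text (column 0) and a
    partition label (column 1). Hypothetical helper for illustration."""
    text = (
        df[text_col]
        .dropna()
        .astype(str)                              # guard against numeric cells
        .str.replace(r"\s+", " ", regex=True)     # no tabs/newlines inside a field
        .reset_index(drop=True)
    )
    n = len(text)
    n_val = int(n * val_frac)
    n_test = int(n * test_frac)
    n_train = n - n_val - n_test
    labels = ["train"] * n_train + ["val"] * n_val + ["test"] * n_test
    return pd.DataFrame({0: text, 1: labels})


# Toy stand-in for pd.read_csv("/mnt/mydata/notebooks/data.csv")
df = pd.DataFrame({"documents": [
    "first doc here", "second doc", "third one", "fourth text",
    12345, "sixth", "seventh doc",
]})

corpus = to_corpus_frame(df)
# index=False keeps the integer row index out of the file.
corpus.to_csv("corpus.tsv", sep="\t", header=False, index=False)
```

The file should be written into the folder that is later passed to `load_custom_dataset_from_folder`; in the snippet above, `corpus.tsv` is written to the current working directory, which may not be `/mnt/mydata/notebooks`.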