mldata 1.0 #2
The storage requirement will instead be controlled by the driver (hdf5).
…unsupervised learning.
… datasets folders.
__iter__() and __getitem__() are not ordinary methods: Python looks them up on the class rather than on the instance, which makes redefining them on the fly quite tricky. The old solution reassigned the correct definition in the __init__ method. However, since the assignment happens at the class level, it also affects other objects (which might need the other iterator or getter). Thus, the best way to make it work is the naive one: check on each call whether dataset.target is None.
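The naive approach described in this commit message can be sketched as follows. This is only an illustration: the `Dataset` class, its constructor, and the sample values are hypothetical stand-ins, not the actual mldata code. The last lines also show why patching a special method on a single instance has no effect on implicit iteration.

```python
class Dataset:
    """Hypothetical stand-in for the dataset class discussed above."""

    def __init__(self, data, target=None):
        self.data = data
        self.target = target

    def __getitem__(self, key):
        # Naive per-call check: the behaviour depends on this instance only,
        # so no class-level attribute is ever reassigned.
        if self.target is None:
            return self.data[key]
        return self.data[key], self.target[key]

    def __iter__(self):
        # Same check for iteration.
        if self.target is None:
            return iter(self.data)
        return zip(self.data, self.target)


unsupervised = Dataset([1, 2, 3])
supervised = Dataset([1, 2, 3], target=["a", "b", "c"])

# Special methods are looked up on the type, so an instance-level
# override is ignored by iter(), list(), for-loops, and so on.
unsupervised.__iter__ = lambda: iter(["patched"])
patch_was_ignored = list(unsupervised) == [1, 2, 3]
```

Both objects share one class definition, yet each iterates according to its own `target`, which is the property the old class-level reassignment could not guarantee.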
@@ -1,6 +1,6 @@
language: python
python:
  - "3.3"
  - "3.4"
Support for Python 3.4 has not been released yet on Travis. See travis-ci/travis-ci#1989
By the time you get around to fixing CI, support should be available.
I know we discussed it, but are we planning on supporting both Python 2.7+ and Python 3+?
def setup_module():
    # save current config file
    os.rename(cfg.CONFIGFILE, cfg.CONFIGFILE +".bak")
This line fails when $HOME/.mldataConfig does not already exist.
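One possible way to guard the rename is sketched below. The helper name `backup_if_exists` is hypothetical and is not part of mldata; the demonstration runs in a temporary directory rather than in `$HOME`.

```python
import os
import tempfile


def backup_if_exists(path, suffix=".bak"):
    """Rename `path` to `path + suffix` and return the backup path.

    Returns None, without raising, when `path` does not exist.
    Hypothetical helper, shown only to illustrate the guard.
    """
    if not os.path.exists(path):
        return None
    backup = path + suffix
    os.rename(path, backup)
    return backup


with tempfile.TemporaryDirectory() as tmp:
    config_file = os.path.join(tmp, ".mldataConfig")

    # No config file yet: nothing to back up, and no exception.
    result_missing = backup_if_exists(config_file)

    # Once the file exists, it is renamed as before.
    with open(config_file, "w") as handle:
        handle.write("[datasets]\n")
    result_present = backup_if_exists(config_file)
    backup_created = os.path.exists(result_present)
```

A matching teardown would then restore the backup only when the returned value is not None.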
Good catch, will be fixed.
Why is there a capitalized D in
There is no folder
Oops, you're right about Regarding the uppercase D, I think it should be lowercase because it refers to the file you are testing: that is
def _create_default_config():
    """ Build and save a default config file for MLData.

    The default config is saved as ``.MLDataConfig`` in the ``$HOME`` folder
Typo: you mean .mldataConfig
path = None
if cfg.dataset_exists(dset_name):
    path = cfg.get_dataset_path(dset_name)
return _load_from_file(dset_name + '_' + version_name, path, lazy)
If the dataset is not found in the config file, _load_from_file will fail on the os.path.join with None. Maybe we should display a better error message.
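A minimal sketch of what a clearer failure could look like, assuming a plain dictionary `known_paths` as a stand-in for the config lookup (`cfg.dataset_exists` and `cfg.get_dataset_path` are not reproduced here). The function name `resolve_dataset_path` and the example path are hypothetical.

```python
import os


def resolve_dataset_path(dset_name, known_paths):
    """Return the folder registered for `dset_name`, or raise a clear error.

    `known_paths` is a hypothetical dict standing in for the config file.
    """
    path = known_paths.get(dset_name)
    if path is None:
        raise ValueError(
            "Dataset '{0}' is not registered in the config file.".format(dset_name)
        )
    return path


known_paths = {"mnist": os.path.join("datasets", "mnist")}
found = resolve_dataset_path("mnist", known_paths)

# The failure mode from the review: joining with None raises an
# uninformative TypeError deep inside the loading code.
try:
    os.path.join(None, "mnist_v1")
    join_error = None
except TypeError as error:
    join_error = error

# With the explicit check, the message names the missing dataset.
try:
    resolve_dataset_path("unknown", known_paths)
    lookup_error = None
except ValueError as error:
    lookup_error = str(error)
```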
The ``splits`` can now be given either in the form (nb_train, ..., nb_test) or in the form (nb_train, ..., nb_train + ... + nb_valid)
Creates a tuple of generators, each iterating over one part of the dataset, following the given splits.
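The two accepted forms of ``splits`` can be illustrated with the sketch below. It is not the mldata implementation: the function names are hypothetical, and the rule used to tell the two forms apart (sizes are assumed when they sum to the dataset length, cumulative boundaries otherwise) is an assumption made for this example.

```python
def normalize_splits(splits, nb_examples):
    """Convert either accepted form into cumulative end indices."""
    splits = list(splits)
    if sum(splits) == nb_examples:
        # Form (nb_train, ..., nb_test): per-part sizes.
        bounds, total = [], 0
        for size in splits:
            total += size
            bounds.append(total)
        return bounds
    # Form (nb_train, ..., nb_train + ... + nb_valid): cumulative
    # boundaries, with the last part taking whatever remains.
    if splits != sorted(splits) or splits[-1] > nb_examples:
        raise ValueError("splits match neither accepted form")
    return splits + [nb_examples]


def _part(data, start, stop):
    # One lazy generator over data[start:stop].
    return (data[i] for i in range(start, stop))


def split_generators(data, splits):
    """Return a tuple of generators, one per part of `data`."""
    bounds = normalize_splits(splits, len(data))
    starts = [0] + bounds[:-1]
    return tuple(_part(data, a, b) for a, b in zip(starts, bounds))


data = list(range(10))
by_size = [list(g) for g in split_generators(data, (6, 2, 2))]
by_bound = [list(g) for g in split_generators(data, (6, 8))]
```

Under this rule, both calls yield the same three parts: six training examples, two validation examples, and two test examples.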
Promote the use of itertools.cycle(iter) instead.
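As a brief illustration of that suggestion, `itertools.cycle` turns any finite iterable into an endless one, and `itertools.islice` bounds how much of it is consumed. The list below is a hypothetical stand-in for a dataset.

```python
from itertools import cycle, islice

examples = [1, 2, 3]          # stand-in for a finite dataset
endless = cycle(examples)     # repeats the examples indefinitely

# Take seven examples, wrapping around the dataset as needed.
first_seven = list(islice(endless, 7))
```

Note that `cycle` keeps a copy of every element it has seen, so memory use grows with the size of the iterable being cycled.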
dataset = LazyDataset(lazy_functions)
datasetFile = h5py.File(file_to_load, mode='r', driver='core')

data = datasetFile['/']["data"]
If lazy == False, do we want data to be an ndarray? Right now it is an HDF5 dataset, but it still supports iteration and indexing like a numpy array.
Adds :