This repository has been archived by the owner on Sep 18, 2024. It is now read-only.
Describe the problem
When using the ImageDataGenerator.flow* methods to yield image batches, the seed parameter modifies NumPy's global random number generator. Similar behaviour has been identified in other parts of the Keras library, e.g. this issue.
Any calls to numpy.random.* made after a batch is yielded (at which point the global seed has been set) return the same sequence of numbers. In my case I wanted to select and view random images from a batch, and I kept seeing the same images being selected. I include an example below in which I use the flow_from_dataframe method to load 8-bit RGB images from my local disk.
Example
I submitted a question to Data Science Stack Exchange after seeing this behavior in which I include a worked example. The code is below but see SE for more information.

    import numpy as np
    from keras_preprocessing.image import ImageDataGenerator

    # Step 1
    # Set up image data flow
    img_generator = ImageDataGenerator(rescale=1/255.)
    train_gen = img_generator.flow_from_dataframe(
        img_df,  # filenames are read from column "filename"
        img_dir,  # local directory containing image files
        y_col=None,
        target_size=(512, 512),
        class_mode=None,
        shuffle=False,  # I'm using separate mask images so no shuffling here
        batch_size=16,
        seed=42  # behavior occurs when using seed
    )

    # Step 2
    # Generate and print 8 random indices
    # No batch of images retrieved yet; no use of seed
    print(np.random.randint(16, size=8))
    >>> [ 7 15 13  3  6  3  2 14] # always random

    # Step 3
    # Now get a batch of images; seed is used
    batch = next(train_gen)

    # Step 4
    # Generate and print 8 random indices
    print(np.random.randint(16, size=8))
    >>> [ 6138111319] # always the same result

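Until this is changed in the library, one possible user-side workaround is to snapshot NumPy's global RNG state before pulling a batch and restore it afterwards. The sketch below is only an illustration under stated assumptions: `seeded_batch` is a hypothetical stand-in for `next(train_gen)` that reseeds the global generator, and `preserve_global_rng` is a helper name made up here, not part of Keras or keras-preprocessing.

```python
import contextlib

import numpy as np


@contextlib.contextmanager
def preserve_global_rng():
    """Save NumPy's global RNG state and restore it on exit (hypothetical helper)."""
    state = np.random.get_state()
    try:
        yield
    finally:
        np.random.set_state(state)


def seeded_batch(seed=42):
    """Hypothetical stand-in for next(train_gen): it reseeds the global RNG."""
    np.random.seed(seed)
    return np.zeros((16, 4, 4, 3))


def draw_after_batch(protect):
    """Fetch a batch, then draw 8 random indices from the global RNG."""
    if protect:
        with preserve_global_rng():
            seeded_batch()
    else:
        seeded_batch()
    return np.random.randint(16, size=8)
```

With `protect=False` every call returns the same indices, which mirrors Step 4 above; with `protect=True` the draw continues from whatever state the global generator had before the batch was fetched.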
Proposed Solution
It appears that the culprit is the base Iterator class, specifically the _flow_index method, which seeds NumPy's global generator. Similar to the approach taken in the Keras repo (PR 12259), I would suggest implementing a local RNG.
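As a rough sketch of what a local RNG could look like, the generator below draws its shuffle order from its own `np.random.RandomState` instead of calling `np.random.seed`. This is a simplified illustration, not the library's actual `_flow_index` code; the function name `flow_index` and its parameters are assumptions made for the example.

```python
import numpy as np


def flow_index(n, batch_size, shuffle=True, seed=None):
    """Yield batches of indices without touching NumPy's global RNG.

    Hypothetical, simplified index generator for illustration only.
    """
    total_batches_seen = 0
    batch_index = 0
    index_array = np.arange(n)
    while True:
        if batch_index == 0:
            # Start of an epoch: rebuild (and optionally shuffle) the indices.
            index_array = np.arange(n)
            if shuffle:
                local_seed = None if seed is None else seed + total_batches_seen
                rng = np.random.RandomState(local_seed)  # local, not global
                index_array = rng.permutation(n)
        start = batch_index * batch_size
        yield index_array[start:start + batch_size]
        total_batches_seen += 1
        batch_index += 1
        if batch_index * batch_size >= n:
            batch_index = 0
```

Because the seed only ever reaches the private `RandomState`, two generators built with the same seed still produce the same batches, while calls to `numpy.random.*` elsewhere in the program are left unaffected.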
System information
OS: macOS BigSur 11.2
Python: 3.8.2
Environment checklist
Check that you are up-to-date with the master branch of keras-preprocessing. You can update with: pip install git+git://github.com/keras-team/keras-preprocessing.git --upgrade --no-deps
Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).