Fix Hang in Python Dataset Reader with DistConv (#2457)
* Internally track mini batch index

* Remove redundant minibatch index
fiedorowicz1 authored Jun 13, 2024
1 parent 5abdcbc commit 0305a0e
Showing 1 changed file with 8 additions and 1 deletion.
9 changes: 8 additions & 1 deletion src/data_ingestion/readers/data_reader_python_dataset.cpp
@@ -186,7 +186,14 @@ void python_dataset_reader::shuffle_responses(DataType* responses_ptr)

   execution_mode mode = exec_mode_from_string(get_role());
   dataset& ds = get_trainer().get_data_coordinator().get_dataset(mode);
-  uint64_t global_mb_size = ds.get_current_mini_batch_size();
+  uint64_t global_mb_size{};
+  if (m_dataset_minibatch_offset < (ds.get_num_iterations_per_epoch() - 1)) {
+    global_mb_size = ds.get_mini_batch_size();
+  }
+  else if (m_dataset_minibatch_offset ==
+           (ds.get_num_iterations_per_epoch() - 1)) {
+    global_mb_size = ds.get_last_mini_batch_size();
+  }

   uint64_t local_mb_size = global_mb_size / nprocs;
   uint64_t extra_samples = global_mb_size % nprocs;
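
The hunk above selects the global mini-batch size from the reader's own `m_dataset_minibatch_offset` and then splits it across `nprocs` ranks. A minimal standalone sketch of that arithmetic follows. It is not LBANN code: `MiniBatchInfo`, `global_mini_batch_size`, and `local_mini_batch_size` are hypothetical names, the zero-iteration guard is an addition not present in the commit, and the rule that the first `extra_samples` ranks each take one extra sample is an assumption, since the rest of the function is not shown in this diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the dataset bookkeeping queried in the diff.
struct MiniBatchInfo {
  std::uint64_t mini_batch_size;          // size of every full mini-batch
  std::uint64_t last_mini_batch_size;     // size of the final (possibly partial) one
  std::uint64_t num_iterations_per_epoch; // number of mini-batches per epoch
};

// Pick the global mini-batch size from an internally tracked index,
// following the same two branches as the diff. Like the zero-initialized
// global_mb_size in the diff, the result is 0 when the index is past the
// last iteration.
std::uint64_t global_mini_batch_size(const MiniBatchInfo& ds,
                                     std::uint64_t mb_index)
{
  // Guard added for this sketch only: avoids unsigned underflow in (n - 1).
  if (ds.num_iterations_per_epoch == 0) {
    return 0;
  }
  const std::uint64_t last = ds.num_iterations_per_epoch - 1;
  if (mb_index < last) {
    return ds.mini_batch_size;
  }
  if (mb_index == last) {
    return ds.last_mini_batch_size;
  }
  return 0;
}

// Split a global mini-batch across ranks.
// Assumption: ranks [0, extra_samples) each receive one extra sample.
std::uint64_t local_mini_batch_size(std::uint64_t global_mb_size,
                                    std::uint64_t nprocs,
                                    std::uint64_t rank)
{
  assert(nprocs > 0 && rank < nprocs);
  const std::uint64_t local_mb_size = global_mb_size / nprocs;
  const std::uint64_t extra_samples = global_mb_size % nprocs;
  return local_mb_size + (rank < extra_samples ? 1 : 0);
}
```

With a mini-batch size of 64, a last mini-batch of 40, and 3 iterations per epoch, indices 0 and 1 select 64 and index 2 selects 40. Splitting 40 samples over 16 ranks under the stated assumption gives 3 samples to ranks 0 through 7 and 2 samples to ranks 8 through 15.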