Releases: bshoshany/thread-pool
BS::thread_pool v5.0.0
v5.0.0 (2024-12-19)
- A major new release with many new features, improvements, bug fixes, and performance optimizations! Please note that code written using previous releases may need to be modified to work with the new release. The changes needed to migrate to the new API are explicitly indicated below for your convenience.
- Highlights:
  - Added support for C++20 and C++23, while maintaining full C++17 compatibility. In C++20, the library can now optionally be imported as a module using `import BS.thread_pool` on Clang, GCC, and MSVC. In C++23, both the library itself and the test program can now optionally import the C++ Standard Library as a module using `import std` on supported compilers and platforms. Extensive documentation has been added to `README.md` on how to use these features, to ease the transition.
  - Optional features are now enabled via a bitmask template parameter instead of macros, using the flags `BS::tp::priority`, `BS::tp::pause`, and `BS::tp::wait_deadlock_checks`. This makes the optional features easier to use, allows multiple thread pools with different features to coexist, and makes the library compatible with C++20 modules. Exception handling is now disabled automatically if exceptions are disabled, instead of using a macro.
  - Added optional native extensions for non-portable features using the operating system's native API: setting the priority and affinity for processes and threads, and setting thread names. These have been tested on the latest versions of Windows, Ubuntu, and macOS.
  - This library is now back to being a true single-header library, with a single header file `BS_thread_pool.hpp`. The utility classes have been combined into the main header file. `BS::timer` has been removed, `BS::signaller` has been replaced with `BS::binary_semaphore` and `BS::counting_semaphore` (in C++17 mode only), and `BS::synced_stream` now supports multiple output streams.
  - Cleanup functions can now be defined to complement the initialization functions. Both initialization and cleanup functions can now optionally take the index of the thread as an argument.
  - Parallelization member functions no longer need type casting or template parameters if the start and end indices are of different types.
  - The worker function no longer incorrectly reads shared variables while the mutex is unlocked.
  - The type aliases `BS::this_thread::optional_index` and `BS::this_thread::optional_pool` have been removed. Instead, `BS::this_thread::get_index()` returns `std::optional<std::size_t>`, and `BS::this_thread::get_pool()` returns `std::optional<void*>`. The latter must be cast to the correct instantiation of the `BS::thread_pool` class template before using any member functions.
  - The thread pool version is now accessible using the object `BS::thread_pool_version`, a `constexpr struct` of type `BS::version` with the members `major`, `minor`, and `patch`. This works even if importing the library as a C++20 module, unlike the version macros.
  - The type `priority_t`, used to set priorities, is now defined as `std::int8_t`, which means it takes values from -128 to +127. The pre-defined priorities in `BS::pr`, such as `BS::pr::highest` or `BS::pr::lowest`, have been updated accordingly.
  - Exceptions thrown by detached tasks are now caught and prevented from propagating, so that they do not terminate the program. Exceptions thrown by submitted tasks are still rethrown when calling `get()` on the future, as before.
  - Parallelization member functions no longer destruct objects prematurely under certain circumstances.
  - The test program has been expanded with many new tests for both old and new features. It can also import both the thread pool module using `import BS.thread_pool` (in C++20 and later) and the C++ Standard Library module using `import std` (in C++23) if the appropriate macros are defined, and read default command line arguments from a `default_args.txt` file for debugging purposes.
  - Added new and improved benchmarks using a highly-optimized multithreaded algorithm which generates a plot of the Mandelbrot set, utilizing a normalized iteration count algorithm and linear interpolation to create smooth coloring.
  - The type `BS::concurrency_t` has been removed; use `std::size_t` instead.
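Two of the highlights above concern the thread index: `BS::this_thread::get_index()` now returns `std::optional<std::size_t>`, and initialization and cleanup functions may optionally take the index as an argument. The standard-library-only sketch below shows one way both can work. `sketch::current_index`, `get_index()`, `call_with_optional_index()`, and `run_workers()` are hypothetical names, and `run_workers()` simply starts one thread per index instead of running a real pool.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <thread>
#include <type_traits>
#include <vector>

namespace sketch {

// Index of the current worker thread; empty on threads not started by run_workers().
inline thread_local std::optional<std::size_t> current_index;

inline std::optional<std::size_t> get_index()
{
    return current_index;
}

// Call f(index) if f accepts an index, otherwise call f() with no arguments.
template <typename F>
void call_with_optional_index(F&& f, std::size_t index)
{
    if constexpr (std::is_invocable_v<F, std::size_t>)
        f(index);
    else
        f();
}

// Start `count` threads. Each one records its index, then runs init, work, and cleanup,
// any of which may optionally take the index as an argument.
template <typename Init, typename Work, typename Cleanup>
void run_workers(std::size_t count, Init init, Work work, Cleanup cleanup)
{
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < count; ++i)
    {
        threads.emplace_back([=] {
            current_index = i;
            call_with_optional_index(init, i);
            call_with_optional_index(work, i);
            call_with_optional_index(cleanup, i);
            current_index.reset();
        });
    }
    for (std::thread& t : threads)
        t.join();
}

} // namespace sketch
```

Dispatching with `if constexpr` and `std::is_invocable_v` is just one C++17 approach; the notes state that the library itself uses concepts in C++20 and SFINAE in C++17 for its signature checks.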
- C++20 and C++23 support:
  - This library now officially supports C++20 and C++23 in addition to C++17. If compiled with C++20 and/or C++23 support (e.g. using the compiler flag `-std=c++23` in Clang/GCC or `/std:c++latest` on MSVC), the library will make use of newly available features for maximum performance, reliability, and usability.
    - To be clear, the library is still fully compatible with any C++17 standard-compliant compiler. I have no plans to remove C++17 support at the moment, as it is still the most widely used C++ standard among developers, but that might change in the future.
  - If C++20 features are available, the library can be imported as a module using `import BS.thread_pool`. This is now the officially recommended way to use the library, as it has many benefits, such as faster compilation times, better encapsulation, no namespace pollution, no include order issues, easier maintainability, simpler dependency management, and more.
    - The module file itself is `BS.thread_pool.cppm`, located in the `modules` folder, and it is just a thin wrapper around the header file `BS_thread_pool.hpp`.
    - The `constexpr` flag `BS::thread_pool_module` indicates whether the thread pool library was compiled as a module.
    - To my knowledge, `BS::thread_pool` is one of the only popular C++ libraries that are currently available as a C++20 module (and certainly the only thread pool library). This feature has been tested with the latest versions of Clang, GCC, and MSVC. Unfortunately, C++20 modules are still (4 years later!) not fully implemented in all compilers, and each compiler implements them differently; for instructions on how to compile and import the `BS.thread_pool` module in each compiler, please see `README.md`.
    - Known issues:
      - GCC v14.2.0 (latest version at the time of writing) appears to have an internal compiler error when compiling programs containing modules (or at least, this particular module) with any optimization flags other than `-Og` enabled. Until this is fixed, if you wish to use compiler optimizations, please either include the library as a header file or use a different compiler.
      - On macOS, Apple Clang v16.0.0 (latest version at the time of writing) does not support C++20 modules. Please either install the latest version of LLVM Clang using Homebrew, or include the library as a header file.
      - Visual Studio Code's C/C++ extension v1.23.2 (latest version at the time of writing) does not yet support modules. My temporary solution for that, as demonstrated in the test program, is to define the macro `BS_THREAD_POOL_TEST_IMPORT_MODULE` (see below) when compiling the test program, but not when editing in the IDE. If the macro is enabled, the module is imported via `import BS.thread_pool`, otherwise the header file is included using `#include "BS_thread_pool.hpp"` as usual.
  - If C++23 features are available, both the library and the test program can now import the C++ Standard Library as a module using `import std`. To enable this, define the macro `BS_THREAD_POOL_IMPORT_STD` at compilation time. This is currently only officially supported by recent versions of MSVC with Microsoft STL or LLVM Clang (not Apple Clang) with LLVM libc++. It is not supported by GCC with any standard library, Clang with any standard library other than libc++, any compiler with GNU libstdc++, or any other compiler.
    - If `BS_THREAD_POOL_IMPORT_STD` is defined, then you must also import the library itself as a module. If the library is included as a header file, this will force the program that included the header file to also import `std`, which is not desirable and can lead to compilation errors if the program `#include`s any Standard Library header files.
    - Defining the macro before importing the module will not work, as modules cannot access macros defined in the program that imported them. Instead, define the macro as a compiler flag, e.g. `-D BS_THREAD_POOL_IMPORT_STD` (or `/D` for MSVC).
    - The `constexpr` flag `BS::thread_pool_import_std` indicates whether the thread pool library was compiled with `import std`. Note that the flag will be `false` if `BS_THREAD_POOL_IMPORT_STD` is defined but the compiler or standard library does not support importing the C++ Standard Library as a module.
  - If C++20 features are available, the pool will use `std::jthread` instead of `std::thread`. This allows considerable simplification and added safety, since the threads no longer need to be manually joined, and `std::stop_token` is used to stop the workers automatically when destructing the threads. This eliminates the need for the `destroy_threads()` member function, as well as the `workers_running` flag, which are now only used in C++17 mode.
  - If C++20 features are available, the library will use concepts to enforce the signature of the initialization function and to selectively enable member functions related to pausing only if pausing is enabled. In C++17 mode, the library will use SFINAE to achieve essentially the same effect.
  - If C++23 features are available, the task queue will use `std::move_only_function<void()>` instead of `std::function<void()>`. This allows `submit_task()` to work without using a shared pointer, which should increase performance.
  - API migration: All of the C++20/C++23 features listed above are either automatically applied based on compiler sett...
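The combination of a feature bitmask template parameter and C++17 SFINAE described in these notes can be sketched as follows. The enum `sketch::opt`, the helper `has()`, the class template `pool`, and the trait `has_pause` are hypothetical stand-ins; they mirror the idea behind `BS::tp::pause` and the other flags, not their actual definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

namespace sketch {

// Hypothetical feature flags, combined with | into a single template argument.
enum class opt : std::uint8_t
{
    none = 0,
    priority = 1 << 0,
    pause = 1 << 1,
    wait_deadlock_checks = 1 << 2,
};

constexpr opt operator|(opt a, opt b)
{
    return static_cast<opt>(static_cast<std::uint8_t>(a) | static_cast<std::uint8_t>(b));
}

constexpr bool has(opt set, opt flag)
{
    return (static_cast<std::uint8_t>(set) & static_cast<std::uint8_t>(flag)) != 0;
}

template <opt Features = opt::none>
class pool
{
public:
    static constexpr bool pause_enabled = has(Features, opt::pause);
    static constexpr bool priority_enabled = has(Features, opt::priority);

    // C++17 SFINAE: these members only take part in overload resolution if the pause flag is set.
    template <opt F = Features, typename = std::enable_if_t<has(F, opt::pause)>>
    void pause()
    {
        paused = true;
    }

    template <opt F = Features, typename = std::enable_if_t<has(F, opt::pause)>>
    bool is_paused() const
    {
        return paused;
    }

private:
    bool paused = false;
};

// Trait that detects whether T has a callable pause() member.
template <typename T, typename = void>
struct has_pause : std::false_type
{
};

template <typename T>
struct has_pause<T, std::void_t<decltype(std::declval<T&>().pause())>> : std::true_type
{
};

} // namespace sketch
```

Because `pause()` is only viable when the `pause` flag is set, calling it on `pool<>` is a compile-time error rather than a silent no-op, and two pools with different flags are simply two different types that can coexist in one program.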
BS::thread_pool v4.1.0
v4.1.0 (2024-03-22)
- This library is now published in SoftwareX! If you use it in published research, please cite it as follows: Barak Shoshany, "A C++17 Thread Pool for High-Performance Scientific Computing", doi:10.1016/j.softx.2024.101687, SoftwareX 26 (2024) 101687, arXiv:2105.00613
  - Updated the source files, as well as `README.md`, `CITATION.bib`, and `CITATION.cff` with the new citation.
- A new macro, `BS_THREAD_POOL_DISABLE_EXCEPTION_HANDLING`, allows the user to disable exception handling in `submit_task()` if it is not needed, or if exceptions are explicitly disabled in the codebase. See #139.
  - Note that this macro can be defined independently of `BS_THREAD_POOL_ENABLE_WAIT_DEADLOCK_CHECK`. Disabling exception handling removes the `try`-`catch` block from `submit_task()`, while enabling wait deadlock checks adds a `throw` expression to `wait()`, `wait_for()`, and `wait_until()`.
  - If the feature-test macro `__cpp_exceptions` is undefined, `BS_THREAD_POOL_DISABLE_EXCEPTION_HANDLING` is automatically defined, and `BS_THREAD_POOL_ENABLE_WAIT_DEADLOCK_CHECK` is automatically undefined.
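The effect of `BS_THREAD_POOL_DISABLE_EXCEPTION_HANDLING` on `submit_task()` can be pictured with a small stand-alone function: the task runs inside a `try`-`catch` block that stores any exception in the future, and that block is compiled out when the macro is defined, which happens automatically if `__cpp_exceptions` is undefined. The macro `SKETCH_DISABLE_EXCEPTION_HANDLING` and the function `run_submitted()` are hypothetical, and the task runs on the calling thread rather than in a pool.

```cpp
#include <cassert>
#include <exception>
#include <future>
#include <stdexcept>
#include <type_traits>

// Hypothetical macro in the spirit of BS_THREAD_POOL_DISABLE_EXCEPTION_HANDLING: it is defined
// automatically when the feature-test macro __cpp_exceptions is undefined (exceptions disabled).
#if !defined(__cpp_exceptions) && !defined(SKETCH_DISABLE_EXCEPTION_HANDLING)
#define SKETCH_DISABLE_EXCEPTION_HANDLING
#endif

namespace sketch {

// Run a task and deliver its result, or the exception it threw, through a future.
// When exception handling is disabled, the try-catch block is compiled out entirely.
template <typename F, typename R = std::invoke_result_t<std::decay_t<F>&>>
std::future<R> run_submitted(F&& task)
{
    std::promise<R> promise;
    std::future<R> future = promise.get_future();
#ifndef SKETCH_DISABLE_EXCEPTION_HANDLING
    try
    {
#endif
        if constexpr (std::is_void_v<R>)
        {
            task();
            promise.set_value();
        }
        else
        {
            promise.set_value(task());
        }
#ifndef SKETCH_DISABLE_EXCEPTION_HANDLING
    }
    catch (...)
    {
        // Stored in the shared state and rethrown later by future.get().
        promise.set_exception(std::current_exception());
    }
#endif
    return future;
}

} // namespace sketch
```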
- Replaced `#pragma once` with old-school include guards using the macros `BS_THREAD_POOL_HPP` and `BS_THREAD_POOL_UTILS_HPP`. There are two main reasons for this:
  - Even though `#pragma once` is supported by the vast majority of modern compilers, it is still a non-standard feature, so using it technically made the library not standards compliant.
  - Include guards make it possible to include the library twice in the same project (for example, once with priority enabled and once without) by undefining the include guard and putting the second include in its own namespace.
- Included a description of the destructor behavior for the `BS::thread_pool` class in `README.md`, in the library reference section. See #143.
- Removed unnecessary locking in `reset()` if pausing is not enabled.
BS::thread_pool v4.0.1
v4.0.1 (2023-12-28)
- Fixed linkage issue caused by the global variables `BS::this_thread::get_index` and `BS::this_thread::get_pool` not being defined as `inline`. See #134 and #137.
- Fixed redundant cast in the `BS::thread_pool::blocks` class, and added `-Wuseless-cast` to the GCC warning flags in `BS_thread_pool_test.ps1` to catch similar issues in the future. See #133.
- Each of the three files `BS_thread_pool_test.cpp`, `BS_thread_pool.hpp`, and `BS_thread_pool_utils.hpp` now contains three macros indicating the major, minor, and patch version of the file. In addition, `BS_thread_pool_test.cpp` now checks whether the versions of all three files match, and aborts compilation if they do not.
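A compile-time version check of the kind described in the last item might look like the sketch below; every macro name in it is assumed for illustration. For comparison, it also defines a `constexpr` version object with `major`, `minor`, and `patch` members, similar in spirit to the `BS::thread_pool_version` object introduced in v5.0.0; `sketch::version` and `sketch::pool_version` are hypothetical.

```cpp
#include <cassert>

// Hypothetical version macros for two files of the same release.
// In a real project each group would live in its own header file.
#define SKETCH_POOL_VERSION_MAJOR 4
#define SKETCH_POOL_VERSION_MINOR 0
#define SKETCH_POOL_VERSION_PATCH 1

#define SKETCH_UTILS_VERSION_MAJOR 4
#define SKETCH_UTILS_VERSION_MINOR 0
#define SKETCH_UTILS_VERSION_PATCH 1

// Compilation stops here if the two files come from different releases.
static_assert(SKETCH_POOL_VERSION_MAJOR == SKETCH_UTILS_VERSION_MAJOR &&
                  SKETCH_POOL_VERSION_MINOR == SKETCH_UTILS_VERSION_MINOR &&
                  SKETCH_POOL_VERSION_PATCH == SKETCH_UTILS_VERSION_PATCH,
              "the pool and utilities headers come from different versions");

namespace sketch {

// A constexpr object, unlike a macro, can also be exported from a C++20 module.
struct version
{
    int major;
    int minor;
    int patch;
};

constexpr bool operator==(const version& a, const version& b)
{
    return a.major == b.major && a.minor == b.minor && a.patch == b.patch;
}

inline constexpr version pool_version{SKETCH_POOL_VERSION_MAJOR, SKETCH_POOL_VERSION_MINOR,
                                      SKETCH_POOL_VERSION_PATCH};

} // namespace sketch
```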
BS::thread_pool v4.0.0
v4.0.0 (2023-12-27)
- A major new release with numerous changes, additions, fixes, and improvements. Many frequently requested features have been added, and performance has been optimized. Please note that code written using previous releases will need to be modified to work with the new release. The changes needed to migrate to the new API are explicitly indicated below for your convenience.
- Highlights:
  - The light thread pool has been removed. However, by default, the thread pool is in "light mode". Optional features that may affect performance must be enabled by defining suitable macros.
  - This library now ships with two stand-alone header files:
    - `BS_thread_pool.hpp` contains the main `BS::thread_pool` class and the `BS::multi_future` helper classes, and is the only file needed to use the thread pool itself.
    - `BS_thread_pool_utils.hpp` contains the additional utility classes `BS::signaller`, `BS::synced_stream`, and `BS::timer`, which are fully independent of the thread pool itself and can be used either with or without it.
  - It is now possible to assign priorities to tasks. Tasks with higher priorities will be executed first.
  - Member functions for submitting tasks and loops have been renamed for consistency, e.g. `detach_task()` and `submit_task()`, where the prefix `detach` means no future will be returned and `submit` means a future or `BS::multi_future` will be returned.
  - There are now two ways to parallelize loops into blocks:
    - `detach_blocks()` and `submit_blocks()` behave the same as loop parallelization in previous releases, running the loop function once per block.
    - `detach_loop()` and `submit_loop()` have a simpler syntax, where the loop function is run once per index, so the user doesn't have to manually run the internal loop for each block.
  - The new member functions `detach_sequence()` and `submit_sequence()` allow submitting a sequence of tasks enumerated by indices.
  - It is now possible to run an initialization function in each thread before it starts to execute any submitted tasks.
  - Tasks submitted with `detach_task()` or `submit_task()` can no longer have arguments. Tasks with arguments must be enclosed inside lambda expressions. This simplifies the API and provides better readability. Tasks can still have return values.
  - Various ways to obtain information about the threads in the pool have been introduced:
    - The member function `get_thread_ids()` obtains the unique thread identifiers, and `get_native_handles()` obtains the underlying implementation-defined thread handles.
    - The new namespace `BS::this_thread` allows obtaining the thread's index in the pool using `BS::this_thread::get_index()` and a pointer to the pool that owns the thread using `BS::this_thread::get_pool()`.
  - Member functions for waiting for tasks have been renamed for brevity: `wait()`/`wait_for()`/`wait_until()`. In addition, these functions can now optionally throw an exception if the user tries to call them from within a thread of the same pool, which would result in a deadlock.
  - The first index must now be specified explicitly when parallelizing blocks, loops, and sequences, and it must not be greater than the last index. Also, both indices must now have the same type, or the template parameter should be explicitly specified.
  - Optimized the way `detach_blocks()`, `submit_blocks()`, `detach_loop()`, and `submit_loop()` split the range of the loop into blocks.
  - Added a utility class `BS::signaller` to allow simple signalling between threads.
  - `BS::multi_future<T>` is now a specialization of `std::vector<std::future<T>>` with additional member functions.
- Breaking changes:
  - The light thread pool has been removed. The original idea was that the light thread pool would allow the user to sacrifice functionality for increased performance. However, in my testing I found that there was no actual performance benefit to the light thread pool. Therefore, there is no reason to keep it.
    - However, by default, the thread pool is in "light mode". Optional features that may affect performance due to additional checks or more complicated algorithms must be enabled by defining suitable macros before including the library:
      - `BS_THREAD_POOL_ENABLE_PAUSE` to enable pausing.
      - `BS_THREAD_POOL_ENABLE_PRIORITY` to enable task priority.
      - `BS_THREAD_POOL_ENABLE_WAIT_DEADLOCK_CHECK` to enable wait deadlock checks.
    - API migration:
      - If you previously used `BS_thread_pool_light.hpp`, simply use `BS_thread_pool.hpp` instead.
      - If you previously used the pausing feature, define the macro `BS_THREAD_POOL_ENABLE_PAUSE` before including `BS_thread_pool.hpp` to enable it.
  - Member functions have been renamed for better consistency. Each function has a `detach` variant which does not return a future, and a `submit` variant which does return a future (or a `BS::multi_future`):
    - `detach_task()` and `submit_task()` for single tasks.
    - `detach_blocks()` and `submit_blocks()` for loops to be split into blocks, where the loop function is executed once per block and must have an internal loop, as in previous releases.
    - `detach_loop()` and `submit_loop()` for loops to be split into blocks, where the loop function is executed once per index and the pool takes care of the internal loop.
    - `detach_sequence()` and `submit_sequence()` for sequences of enumerated tasks.
    - API migration: Use the new names of the functions:
      - `push_task()` -> `detach_task()`
      - `submit()` -> `submit_task()`
      - `push_loop()` -> `detach_blocks()`
      - `parallelize_loop()` -> `submit_blocks()`
  - `wait_for_tasks()`, `wait_for_tasks_duration()`, and `wait_for_tasks_until()` have been renamed to `wait()`, `wait_for()`, and `wait_until()` respectively.
    - API migration: Use the new names of the functions:
      - `wait_for_tasks()` -> `wait()`
      - `wait_for_tasks_duration()` -> `wait_for()`
      - `wait_for_tasks_until()` -> `wait_until()`
  - Functions for parallelizing loops no longer have dedicated overloads for the special case where the first index is 0. These overloads essentially amount to giving the first function argument a default value, which is not allowed in C++, and can be confusing. In addition, indicating the first index explicitly is better for readability.
    - API migration: Add the first index 0 manually as the first argument if it was omitted.
  - Functions for parallelizing loops no longer allow the last index to be smaller than the first index. Previously, e.g. `detach_blocks(5, 0, ...)` was equivalent to `detach_blocks(0, 5, ...)`. However, this led to confusing results. Since the first argument is the first index and the second argument is the index after the last index (i.e. 0 to 5 actually means 0, 1, 2, 3, 4), the user might get the wrong impression that `detach_blocks(5, 0, ...)` will count 5, 4, 3, 2, 1 instead. This option was removed to avoid this confusion.
    - Sometimes the user might actually want to make a loop that counts down instead of up. This cannot be done by flipping the order of the arguments to e.g. `detach_blocks()` (nor could it be done in previous releases). However, it can be done by simply defining a suitable loop function. For example, if you call `detach_blocks(0, 10, loop, 2)` and define the loop function as `for (T i = 9 - start; i > 9 - end; --i)`, then the first block will count 9, 8, 7, 6, 5 and the second block will count 4, 3, 2, 1, 0.
    - `detach_loop()`, `submit_loop()`, `detach_sequence()`, and `submit_sequence()` work the same way. The first index must be smaller than the last index, but you can count down by writing a suitable loop or sequence function.
    - API migration: Any loop parallelization that used a first index greater than the last index will work exactly the same after switching the first and second arguments so that the smaller index appears first.
  - Functions for parallelizing loops no longer accept first and last indices of different types. The reason for allowing this previously was that otherwise, writing something like `detach_blocks(0, x, ...)` where `x` is not an `int` would result in a compilation error, since `0` is by default an `int` and therefore the arguments `0` and `x` have different types. However, this behavior, which used `std::common_type` to determine the common type of the two indices, sometimes completely messed up the range of the loop. For example, the `std::common_type` of `int` and `unsigned int` is `unsigned int`, which means the loop will only use non-negative indices even if the `int` start index was negative, resulting in an integer overflow.
    - API migration: If you want to invoke e.g. `detach_blocks(0, x, ...)` where `x` is not an `int`, you can either:
      - Make the `0` have the desired type using a cast or a suffix. For example, if `x` is an `unsigned int`, write `(unsigned int)0` or `0U` instead of `0`.
      - Specify the template parameter explicitly. For example, if `x` is a `size_t`, write `detach_blocks<size_t>(0, x, ...)`.
  - `detach_task()` and `submit_task()` no longer accept arguments for the submitted task. Instead, you must enclose the function in a lambda expression. In other words, instead of `detach_task(task, args...)` you should write `detach_task([] { task(args...); })`, indicating in the capture list `[]` whether to capture the task itself, and each of the arguments, by value or reference. Please see `README.md` for examples. This was changed for the following reasons:
    - Consistency wi...
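Two of the index rules above can be checked with a short sketch: the count-down loop from the `detach_blocks(0, 10, loop, 2)` example, applied here directly to the blocks `[0, 5)` and `[5, 10)` instead of through the pool, and the `std::common_type` pitfall that motivated requiring both indices to have the same type. `count_down_block()` and `iterations_in_common_type()` are hypothetical helper names.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

namespace sketch {

// Body of a block function that counts down. With blocks [0, 5) and [5, 10), the first call
// visits 9, 8, 7, 6, 5 and the second 4, 3, 2, 1, 0. T should be a signed type: for the second
// block, 9 - end is -1, which an unsigned index type cannot represent.
template <typename T>
std::vector<T> count_down_block(T start, T end)
{
    std::vector<T> visited;
    for (T i = 9 - start; i > 9 - end; --i)
        visited.push_back(i);
    return visited;
}

// The pitfall behind requiring both indices to have the same type: the common type of int and
// unsigned int is unsigned int, so a negative int start index wraps around to a huge value.
static_assert(std::is_same_v<std::common_type_t<int, unsigned int>, unsigned int>);

// Count the iterations of a loop from `first` to `last` run in that common type.
// For first = -3 and last = 3 one would expect 6 iterations, but the loop runs 0 times.
inline unsigned int iterations_in_common_type(int first, unsigned int last)
{
    using common = std::common_type_t<int, unsigned int>;
    unsigned int count = 0;
    for (common i = static_cast<common>(first); i < static_cast<common>(last); ++i)
        ++count;
    return count;
}

} // namespace sketch
```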
BS::thread_pool v3.5.0
v3.5.0 (2023-05-25)
- `BS_thread_pool.hpp` and `BS_thread_pool_light.hpp`:
  - Added a new member function, `purge()`, to the full (non-light) thread pool. This function purges all the tasks waiting in the queue. Tasks that are currently running will not be affected, but any tasks still waiting in the queue will be removed and will never be executed by the threads. Please note that there is no way to restore the purged tasks.
  - Fixed a bug which caused `wait_for_tasks()` to only block the first thread that called it. Now it blocks every thread that calls it, which is the expected behavior. In addition, all related deadlocks have now been completely resolved. This also applies to the variants `wait_for_tasks_duration()` and `wait_for_tasks_until()` in the non-light version. See #110.
    - Note: You should never call `wait_for_tasks()` from within a thread of the same thread pool, as that will cause it to wait forever! This fix is relevant for situations when `wait_for_tasks()` is called from an auxiliary `std::thread` or a separate thread pool.
  - `push_task()` and `submit()` now avoid creating unnecessary copies of the function object. This should improve performance, especially if large objects are involved. See #90.
  - Optimized the way condition variables are used by the thread pool class. Shared variables are now modified while owning the mutex, but condition variables are notified after the mutex is released, if possible. See #84.
  - Instead of a variable `tasks_total` to keep track of the total number of tasks (queued + running), the thread pool class now uses a variable `tasks_running` to keep track only of the number of running tasks, with the number of tasks in the queue obtained via `tasks.size()`. This makes more sense in terms of the internal logic of the class.
  - All atomic variables have been converted to non-atomic. They are now all governed by `tasks_mutex`, so they do not need to be atomic. This eliminates redundant locking, and may improve performance a bit.
  - `running` has been renamed to `workers_running` and `task_done_cv` has been renamed to `tasks_done_cv`.
  - The worker now only notifies the condition variable `tasks_done_cv` if all the tasks are done, not just a single task. Checking if the tasks are done is cheaper than notifying the condition variable, so since the worker no longer notifies the condition variable every single time it finishes a task, this should improve performance a bit if `wait_for_tasks()` is used.
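The condition-variable changes above (counters guarded by one mutex instead of atomics, a `tasks_running` counter alongside the queue, notifying only when all tasks are done, and notifying after the mutex is released) can be illustrated with a stripped-down sketch. Apart from the member names taken from the notes, `task_tracker`, `push()`, and `run_one()` are hypothetical, and tasks run on the calling thread instead of on worker threads.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

namespace sketch {

class task_tracker
{
public:
    void push(std::function<void()> task)
    {
        const std::scoped_lock lock(tasks_mutex);
        tasks.push(std::move(task));
    }

    // Run one queued task on the calling thread; returns false if the queue was empty.
    bool run_one()
    {
        std::function<void()> task;
        {
            const std::scoped_lock lock(tasks_mutex);
            if (tasks.empty())
                return false;
            task = std::move(tasks.front());
            tasks.pop();
            ++tasks_running;
        }
        task();
        bool all_done = false;
        {
            // Shared variables are modified while owning the mutex...
            const std::scoped_lock lock(tasks_mutex);
            --tasks_running;
            all_done = (tasks_running == 0 && tasks.empty());
        }
        // ...but the condition variable is notified after the mutex is released,
        // and only once nothing is queued or running.
        if (all_done)
            tasks_done_cv.notify_all();
        return true;
    }

    // Block until nothing is queued or running. Any number of threads may wait at once.
    void wait_for_tasks()
    {
        std::unique_lock lock(tasks_mutex);
        tasks_done_cv.wait(lock, [this] { return tasks_running == 0 && tasks.empty(); });
    }

private:
    std::queue<std::function<void()>> tasks;
    std::size_t tasks_running = 0; // queued tasks are counted by tasks.size() instead
    std::mutex tasks_mutex;
    std::condition_variable tasks_done_cv;
};

} // namespace sketch
```

Because `wait_for_tasks()` waits on a predicate and completion is signalled with `notify_all()`, every thread that calls it stays blocked until the work is done, which matches the behavior the `wait_for_tasks()` fix describes.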
- `BS_thread_pool_test.cpp`:
  - Combined the tests for the full and light versions into one program. The file `BS_thread_pool_light_test.cpp` has been removed.
  - The tests for the light version are now much more comprehensive. The only features that are not tested in the light version are those that do not exist in it.
  - Added a test for the new `purge()` member function.
  - Added a test to ensure that `push_task()` and `submit()` do not create unnecessary copies of the function object.
  - Added a test to ensure that `push_task()` and `submit()` correctly accept arguments passed by value, reference, and constant reference.
  - Added a test to ensure that `wait_for_tasks()` blocks all external threads that call it.
  - `_CRT_SECURE_NO_WARNINGS` is now set only if it has not already been defined, to prevent errors in MSVC projects which already have it set as part of the default build settings. See #72.
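A test along the lines of the copy test above can use a callable that counts its own copies. `copy_counter` and `forwarding_queue` are hypothetical; the sketch assumes the queue stores tasks as `std::function<void()>` and shows that perfect forwarding moves a temporary into the queue, so only an lvalue argument is copied.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

namespace sketch {

// A trivial callable that counts how many times it is copied (moves are not counted).
struct copy_counter
{
    int* copies;
    explicit copy_counter(int* counter) : copies(counter) {}
    copy_counter(const copy_counter& other) : copies(other.copies) { ++*copies; }
    copy_counter(copy_counter&&) noexcept = default;
    void operator()() const {}
};

// Hypothetical queue whose push_task() forwards the callable into storage, so an rvalue
// is moved rather than copied and only an lvalue argument costs a copy.
class forwarding_queue
{
public:
    template <typename F>
    void push_task(F&& task)
    {
        tasks.emplace_back(std::forward<F>(task));
    }

    void run_all()
    {
        for (std::function<void()>& task : tasks)
            task();
        tasks.clear();
    }

private:
    std::deque<std::function<void()>> tasks;
};

} // namespace sketch
```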
- `README.md`:
  - Added documentation for the new `purge()` member function.
  - Added an explanation for how to pass arguments by reference or constant reference when submitting functions to the queue, using the wrappers `std::ref()` and `std::cref()` respectively. See #83.
  - Added a link to my lecture notes for a course taught at McMaster University, for the benefit of beginner C++ programmers who wish to learn some of the advanced techniques and programming practices used in developing this library.
  - Removed the sample test results, since the complete log file (including the deadlock tests) is now over 500 lines long.
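The `std::ref()`/`std::cref()` advice can be illustrated with a hypothetical queue that binds arguments to the task at submission time, as `push_task(task, args...)` allowed before v4.0.0. Using `std::bind` to store the arguments is an assumption made for this sketch; `bound_task_queue`, `increment()`, and `copy_into()` are illustrative names.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical queue whose push_task() binds the arguments to the task at submission time.
// std::bind stores decayed copies of its arguments, so a plain argument reaches the task as
// a copy even when the task's parameter is a reference; std::ref and std::cref avoid that.
class bound_task_queue
{
public:
    template <typename F, typename... A>
    void push_task(F&& task, A&&... args)
    {
        tasks.push_back(std::bind(std::forward<F>(task), std::forward<A>(args)...));
    }

    void run_all()
    {
        for (std::function<void()>& task : tasks)
            task();
        tasks.clear();
    }

private:
    std::vector<std::function<void()>> tasks;
};

inline void increment(int& x) { ++x; }
inline void copy_into(const int& source, int* destination) { *destination = source; }

} // namespace sketch
```

Passing `counter` directly increments only the copy stored inside the bound task, while `std::ref(counter)` increments the caller's variable; likewise, `std::cref(value)` lets the task observe changes made to `value` after submission.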
- Other:
  - A `.clang-format` file with the project's formatting conventions is now included in the GitHub repository. The pull request template now asks to format any new code using this file, so that it is consistent with the rest of the library.
  - A PowerShell script, `BS_thread_pool_test.ps1`, is now provided in the GitHub repository to make running the test on multiple compilers and operating systems easier. Since it is written in PowerShell, it is fully portable and works on Windows, Linux, and macOS. The script will automatically detect if Clang, GCC, and/or MSVC are available, compile the test program using each available compiler, and then run each compiled test program 5 times and report on any errors. The pull request template now recommends using this script for testing.
  - Since the root folder has become a bit crowded, the header files `BS_thread_pool.hpp` and `BS_thread_pool_light.hpp` have been moved to the `include` subfolder, and the test file `BS_thread_pool_test.cpp` has been moved to the `tests` subfolder, which also contains the new test script `BS_thread_pool_test.ps1`.
BS::thread_pool v3.4.0
v3.4.0 (2023-05-12)
- `BS_thread_pool.hpp` and `BS_thread_pool_light.hpp`:
  - Resolved an issue which could have caused `tasks_total` to not be synchronized in some cases. See #70.
  - Resolved a deadlock which could rarely be caused when the pool was destructed or reset. See #93, #100, #107, and #108.
  - Resolved a deadlock which could be caused when `wait_for_tasks()` was called more than once.
  - Two new member functions have been added to the non-light version: `wait_for_tasks_duration()` and `wait_for_tasks_until()`. They allow waiting for the tasks to complete, but with a timeout. `wait_for_tasks_duration()` will stop waiting after the specified duration has passed, and `wait_for_tasks_until()` will stop waiting after the specified time point has been reached.
  - Renamed `BS_THREAD_POOL_VERSION` in `BS_thread_pool_light.hpp` to `BS_THREAD_POOL_LIGHT_VERSION` and removed the `[light]` tag. This allows including both header files in the same program in case we want to use both the light and non-light thread pools simultaneously.
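Waiting with a timeout, as `wait_for_tasks_duration()` and `wait_for_tasks_until()` do, maps naturally onto the predicate overloads of `std::condition_variable::wait_for()` and `wait_until()`. The class `completion_counter` below is a hypothetical sketch, and returning `true` when all tasks finished in time (`false` on timeout) is a choice made for the example.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>

namespace sketch {

// Hypothetical counter of unfinished tasks, showing how timed waits can be built on
// std::condition_variable. Both wait functions return true if all tasks finished in time.
class completion_counter
{
public:
    void add_task()
    {
        const std::scoped_lock lock(counter_mutex);
        ++unfinished;
    }

    void finish_task()
    {
        bool all_done = false;
        {
            const std::scoped_lock lock(counter_mutex);
            --unfinished;
            all_done = (unfinished == 0);
        }
        if (all_done)
            done_cv.notify_all();
    }

    // Like wait_for_tasks_duration(): give up after `duration` has elapsed.
    template <typename Rep, typename Period>
    bool wait_for(const std::chrono::duration<Rep, Period>& duration)
    {
        std::unique_lock lock(counter_mutex);
        return done_cv.wait_for(lock, duration, [this] { return unfinished == 0; });
    }

    // Like wait_for_tasks_until(): give up once `deadline` has been reached.
    template <typename Clock, typename Duration>
    bool wait_until(const std::chrono::time_point<Clock, Duration>& deadline)
    {
        std::unique_lock lock(counter_mutex);
        return done_cv.wait_until(lock, deadline, [this] { return unfinished == 0; });
    }

private:
    std::size_t unfinished = 0;
    std::mutex counter_mutex;
    std::condition_variable done_cv;
};

} // namespace sketch
```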
- `BS_thread_pool_test.cpp` and `BS_thread_pool_light_test.cpp`:
  - Fixed an issue that caused a compilation error when using MSVC and including `Windows.h`. See #72.
  - The number and size of the vectors in the performance test (`BS_thread_pool_test.cpp` only) are now guaranteed to be multiples of the number of threads, for optimal performance.
  - In `count_unique_threads()`, moved the condition variables and mutexes to the function scope to prevent cluttering the global scope.
  - Three new tests have been added to `BS_thread_pool_test.cpp` to check for the deadlock issues that were resolved in this release (see above). The tests rely on the new wait for tasks with timeout feature, so they are not available in the light version.
    - One test checks for deadlocks when calling `wait_for_tasks()` more than once.
    - Two tests check for deadlocks when destructing and resetting the pool respectively. They are turned off by default, since they take a long time to complete, but can be turned on by setting `enable_long_deadlock_tests` to `true`.
  - Two new tests have been added to the non-light version to check the new member functions `wait_for_tasks_duration()` and `wait_for_tasks_until()`.
  - The test programs now return the number of failed tests upon exit, instead of just 1 if any number of tests failed, which was the case in previous versions. Also, if any tests failed, `std::quick_exit()` is invoked instead of `return`, to avoid getting stuck due to any lingering tasks or deadlocks.
- `README.md`:
  - Added documentation for the two new member functions, `wait_for_tasks_duration()` and `wait_for_tasks_until()`.
  - Fixed Markdown rendering incorrectly on Visual Studio. See #77.
  - The sample performance tests are now taken from a 40-core / 80-thread dual-CPU computing node, which is a more typical use case for high-performance scientific software.
BS::thread_pool v3.3.0
v3.3.0 (2022-08-03)
- `BS_thread_pool.hpp`:
  - The public member variable `paused` of `BS::thread_pool` has been made private for future-proofing (in case future versions implement a more involved pausing mechanism) and better encapsulation. It is now accessible only via the `pause()`, `unpause()`, and `is_paused()` member functions. In other words:
    - Replace `pool.paused = true` with `pool.pause()`.
    - Replace `pool.paused = false` with `pool.unpause()`.
    - Replace `if (pool.paused)` (or similar) with `if (pool.is_paused())`.
  - The public member variable `f` of `BS::multi_future` has been renamed to `futures` for clarity, and has been made private for encapsulation and simplification purposes. Instead of operating on the vector `futures` itself, you can now use the `[]` operator of the `BS::multi_future` to access the future at a specific index directly, or the `push_back()` member function to append a new future to the list. The `size()` member function tells you how many futures are currently stored in the object.
  - The explicit casts of `std::endl` and `std::flush`, added in v3.2.0 to enable flushing a `BS::synced_stream`, caused ODR (One Definition Rule) violations if `BS_thread_pool.hpp` was included in two different translation units, since they were mistakenly not defined as `inline`. To fix this, I decided to make them static members of `BS::synced_stream` instead of global variables, which also makes the code better organized in my opinion. These objects can now be accessed as `BS::synced_stream::endl` and `BS::synced_stream::flush`. I also added an example for how to use them in `README.md`. See #64.
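Both points can be seen in a short stand-alone C++17 sketch. `std::endl` is a function template, so it cannot be deduced when passed to a variadic template such as `print()`. Casting it to a concrete function-reference type selects the `std::ostream` specialization, and declaring the resulting references as `inline static` members gives them a single definition across translation units. The class `toy_synced_stream` is hypothetical and only approximates the idea; the library's actual declarations may differ.

```cpp
#include <mutex>
#include <ostream>
#include <sstream>
#include <utility>

// Hypothetical, simplified stand-in for a synchronized output stream.
class toy_synced_stream {
public:
    explicit toy_synced_stream(std::ostream& out) : out_(out) {}

    // Prints all items under one lock, so output from different threads does not interleave.
    template <typename... T>
    void print(T&&... items) {
        const std::scoped_lock lock(mutex_);
        (out_ << ... << std::forward<T>(items));
    }

    // std::endl and std::flush are function templates; these casts pick the
    // std::ostream specialization so they can be passed to print().
    // `inline static` (C++17) yields one definition even if the header is
    // included in several translation units.
    inline static std::ostream& (&endl)(std::ostream&) =
        static_cast<std::ostream& (&)(std::ostream&)>(std::endl);
    inline static std::ostream& (&flush)(std::ostream&) =
        static_cast<std::ostream& (&)(std::ostream&)>(std::flush);

private:
    std::ostream& out_;
    std::mutex mutex_;
};
```

With this sketch, `stream.print("answer: ", 42, toy_synced_stream::endl);` writes `answer: 42` followed by a newline and flushes the stream.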
- `BS_thread_pool_light.hpp`:
- This package started out as a very lightweight thread pool, but over time has expanded to include many additional features, and at the time of writing it has a total of 340 lines of code, including all the helper classes. Therefore, I have decided to bundle a light version of the thread pool in a separate and stand-alone header file, `BS_thread_pool_light.hpp`, with only 170 lines of code (half the size of the full package). This file does not contain any of the helper classes, only a new `BS::thread_pool_light` class, which is a minimal thread pool with only the 5 most basic member functions: `get_thread_count()`, `push_loop()`, `push_task()`, `submit()`, and `wait_for_tasks()`.
- A separate test program, `BS_thread_pool_light_test.cpp`, tests only the features of the lightweight `BS::thread_pool_light` class. In the spirit of minimalism, it does not generate a log file and does not do any benchmarks.
- To be perfectly clear, each header file is 100% stand-alone. If you wish to use the full package, you only need `BS_thread_pool.hpp`, and if you wish to use the light version, you only need `BS_thread_pool_light.hpp`. Only a single header file needs to be included in your project.
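To illustrate what a thread pool limited to a handful of basic operations involves, here is a minimal sketch that uses only the standard library and waits on condition variables rather than sleeping. The class `toy_thread_pool` is hypothetical. It reuses the names `get_thread_count()`, `push_task()`, `submit()`, and `wait_for_tasks()` from the list above, omits `push_loop()` for brevity, and should not be taken as the code in `BS_thread_pool_light.hpp`.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical minimal thread pool; a sketch, not BS::thread_pool_light.
class toy_thread_pool {
public:
    explicit toy_thread_pool(unsigned int thread_count) {
        for (unsigned int i = 0; i < thread_count; ++i)
            threads_.emplace_back([this] { worker(); });
    }

    ~toy_thread_pool() {
        wait_for_tasks();
        {
            const std::scoped_lock lock(mutex_);
            stopping_ = true;
        }
        task_available_cv_.notify_all();
        for (std::thread& t : threads_)
            t.join();
    }

    [[nodiscard]] std::size_t get_thread_count() const { return threads_.size(); }

    // Enqueues a task without creating a future (the task must not throw in this sketch).
    template <typename F>
    void push_task(F&& task) {
        {
            const std::scoped_lock lock(mutex_);
            tasks_.emplace(std::forward<F>(task));
            ++unfinished_;
        }
        task_available_cv_.notify_one();
    }

    // Enqueues a task and returns a future for its result (or exception).
    template <typename F, typename R = std::invoke_result_t<std::decay_t<F>>>
    [[nodiscard]] std::future<R> submit(F&& task) {
        auto packaged = std::make_shared<std::packaged_task<R()>>(std::forward<F>(task));
        std::future<R> result = packaged->get_future();
        push_task([packaged] { (*packaged)(); });
        return result;
    }

    // Blocks until every queued and running task has finished.
    void wait_for_tasks() {
        std::unique_lock<std::mutex> lock(mutex_);
        tasks_done_cv_.wait(lock, [this] { return unfinished_ == 0; });
    }

private:
    void worker() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                task_available_cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty())
                    return; // stopping and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();
            {
                const std::scoped_lock lock(mutex_);
                --unfinished_;
            }
            tasks_done_cv_.notify_all();
        }
    }

    std::mutex mutex_;
    std::condition_variable task_available_cv_;
    std::condition_variable tasks_done_cv_;
    std::queue<std::function<void()>> tasks_;
    std::size_t unfinished_ = 0;
    bool stopping_ = false;
    std::vector<std::thread> threads_;
};
```

`submit()` wraps the move-only `std::packaged_task` in a `std::shared_ptr` so the queued wrapper stays copyable, as `std::function` requires.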
BS::thread_pool v3.2.0
v3.2.0 (2022-07-28)
- `BS_thread_pool.hpp`:
- Main `BS::thread_pool` class:
- Added a new member function, `push_loop()`, which does the same thing as `parallelize_loop()`, except that it does not return a `BS::multi_future` with the futures for each block. Just like `push_task()` vs. `submit()`, this avoids the overhead of creating the futures, but the user must use `wait_for_tasks()` or some other method to ensure that the loop finishes executing; otherwise, the program may continue, or even exit, while the loop's blocks are still running.
- `push_task()` and `submit()` now utilize perfect forwarding in order to support more types of tasks, in particular member functions, which in previous versions could not be submitted unless wrapped in a lambda. To submit a member function, use the syntax `submit(&class::function, &object, args)`. More information can be found in `README.md`. See #9.
- `push_loop()` and `parallelize_loop()` now have overloads where the first argument (the first index in the loop) is omitted, in which case it is assumed to be 0. This is for convenience, as the case where the first index is 0 is very common.
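Submitting a member function works once the callable and its arguments are forwarded unchanged and then called with `std::invoke` semantics, which accept the `(&class::function, &object, args)` form directly. A minimal C++17 sketch, assuming a hypothetical helper `toy_make_task()` that packages the call into a `std::packaged_task` the way a pool might before queueing it (the library's own `submit()` may be structured differently):

```cpp
#include <future>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical helper: binds a callable and its arguments into a nullary
// std::packaged_task whose future delivers the result.
template <typename F, typename... A,
          typename R = std::invoke_result_t<std::decay_t<F>, std::decay_t<A>...>>
[[nodiscard]] std::packaged_task<R()> toy_make_task(F&& task, A&&... args) {
    return std::packaged_task<R()>(
        [task = std::forward<F>(task),
         args = std::make_tuple(std::forward<A>(args)...)]() mutable -> R {
            // std::apply calls through std::invoke, so a pointer to a member
            // function followed by an object pointer works like a plain call.
            return std::apply(task, std::move(args));
        });
}
```

For example, given a hypothetical `struct adder { int base; int add(int x) const; };` and an instance `a`, `toy_make_task(&adder::add, &a, 2)` yields a task that, when run, calls `a.add(2)` with no wrapping lambda at the call site.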
- Helper classes:
- `BS::synced_stream` now utilizes perfect forwarding in the member functions `print()` and `println()`.
- Previously, it was impossible to pass the flushing manipulators `std::endl` and `std::flush` to `print()` and `println()`: both manipulators are themselves function templates, so the compiler could not deduce which specialization to pass. The new objects `BS::endl` and `BS::flush` are explicit casts of these manipulators, whose sole purpose is to enable passing them to `print()` and `println()`.
- `BS::multi_future::get()` now rethrows exceptions generated by the futures, even if the futures return `void`. See #62.
- Added a new helper class, `BS::blocks`, which is used by `parallelize_loop()` and `push_loop()` to divide a range into blocks. This class is not documented in `README.md`, as it most likely will not be of interest to most users, but it is still publicly available, in case you want to parallelize something manually but still benefit from the built-in algorithm for splitting a range into blocks.
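Splitting a range into blocks can be illustrated with a short sketch. The function `toy_split_into_blocks()` is hypothetical and uses one simple policy (equal-sized blocks, with the remainder going to the last block, and never more blocks than indices); `BS::blocks` may distribute the indices differently.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: split the integer range [first, last) into at most
// num_blocks contiguous blocks, returned as (start, end) pairs.
template <typename T>
std::vector<std::pair<T, T>> toy_split_into_blocks(T first, T last, std::size_t num_blocks) {
    std::vector<std::pair<T, T>> blocks;
    if (last <= first)
        return blocks; // empty range: nothing to split
    const std::size_t total = static_cast<std::size_t>(last - first);
    if (num_blocks == 0)
        num_blocks = 1;
    if (num_blocks > total)
        num_blocks = total; // never create empty blocks
    const std::size_t block_size = total / num_blocks;
    for (std::size_t i = 0; i < num_blocks; ++i) {
        const T start = static_cast<T>(first + static_cast<T>(i * block_size));
        // The last block absorbs any remainder so the whole range is covered.
        const T end = (i == num_blocks - 1) ? last : static_cast<T>(start + static_cast<T>(block_size));
        blocks.emplace_back(start, end);
    }
    return blocks;
}
```

For instance, splitting `[0, 10)` into 3 blocks gives `[0, 3)`, `[3, 6)`, and `[6, 10)`; each pair could then be handed to a separate task.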
- `BS_thread_pool_test.cpp`:
- Added plenty of new tests for the new features described above.
- Fixed a bug in `count_unique_threads()` that caused it to get stuck on certain systems.
- `dual_println()` now also flushes the stream using `BS::endl`, so that if the test gets stuck, the log file will still contain everything up to that point. (Note: It is a common misconception that `std::endl` and `'\n'` are interchangeable. `std::endl` not only prints a newline character, it also flushes the stream, which is not always desirable, as it may reduce performance.)
- The performance test has been modified as follows:
- Instead of generating random vectors using `std::mersenne_twister_engine`, which proved to be inconsistent across different compilers and systems, the test now generates each element via an arbitrarily-chosen numerical operation. In my testing, this provided much more consistent results.
- Instead of using a hard-coded vector size, a suitable vector size is now determined dynamically at runtime.
- Instead of using `parallelize_loop()`, the test now uses the new `push_loop()` function to squeeze out a bit more performance.
- Instead of setting the test parameters to achieve a fixed single-threaded mean execution time of 300 ms, the test now aims to achieve a fixed multi-threaded mean execution time of 50 ms when the number of blocks is equal to the number of threads. This allows for more reliable results on very fast CPUs with a very large number of threads, where the mean execution time when using all the threads could previously be below a statistically significant value.
- The number of vectors is now restricted to be a multiple of the number of threads, so that the blocks are always all of the same size.
- `README.md`:
- Added instructions and examples for the new features described above.
- Rewrote the documentation for `parallelize_loop()` to make it clearer.
BS::thread_pool v3.1.0
v3.1.0 (2022-07-13)
- `BS_thread_pool.hpp`:
- Fixed an issue where `wait_for_tasks()` would sometimes get stuck if `push_task()` was executed immediately before `wait_for_tasks()`.
- Both the thread pool constructor and the `reset()` member function now determine the number of threads to use in the pool as follows. If the parameter is a positive number, then the pool will be created with this number of threads. If the parameter is non-positive, or a parameter was not supplied, then the pool will be created with the total number of hardware threads available, as obtained from `std::thread::hardware_concurrency()`. If the latter returns 0 (which the standard permits when the value is not computable or well-defined), then the pool will be created with just one thread. See #51 and #52.
- Added the `[[nodiscard]]` attribute to classes and class members, in order to warn the user when accidentally discarding an important return value, such as a future or the return value of a function with no useful side-effects. For example, if you use `submit()` and don't save the future it returns, the compiler will now generate a warning. (If a future is not needed, then you should use `push_task()` instead.)
- Removed the `explicit` specifier from all constructors, as it prevented the default constructor from being used with static class members. See #48.
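The thread-count rule and the effect of `[[nodiscard]]` can both be seen in a tiny sketch. `toy_determine_thread_count()` is a hypothetical free function, not the library's code. It assumes an `unsigned int` parameter (the return type of `std::thread::hardware_concurrency()`), so the only non-positive value is 0.

```cpp
#include <thread>

// Hypothetical helper mirroring the rule described above: a positive request
// is used as-is; otherwise fall back to the hardware thread count, and to a
// single thread if that count is unknown (reported as 0).
[[nodiscard]] unsigned int toy_determine_thread_count(const unsigned int requested) {
    if (requested > 0)
        return requested;
    const unsigned int hardware = std::thread::hardware_concurrency();
    return hardware > 0 ? hardware : 1;
}
```

Because of `[[nodiscard]]`, a statement such as `toy_determine_thread_count(0);` that ignores the result should make the compiler emit a warning.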
- `BS_thread_pool_test.cpp`:
- Improved `count_unique_threads()` using condition variables, to ensure that each thread in the pool runs at least one task regardless of how quickly the tasks finish.
- When appropriate, `check()` now explicitly reports what the obtained result was and what it was expected to be.
- `check_task_monitoring()` and `check_pausing()` now explicitly report the results of the monitoring at each step.
- Changed all instances of `std::vector<std::atomic<bool>>` to `std::unique_ptr<std::atomic<bool>[]>`. See #44.
- Converted a few more C-style casts to C++ cast expressions.
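One relevant property of `std::atomic<bool>` is that it is neither copyable nor movable, so a `std::vector` of atomics cannot be resized or copied. By contrast, `std::make_unique<std::atomic<bool>[]>(n)` simply allocates a fixed-size array of value-initialized (hence `false`) flags. The sketch below shows that pattern; `toy_mark_flags()` is a hypothetical function in which each thread sets its own flag, and it is not taken from the test program.

```cpp
#include <atomic>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical sketch: each of `count` threads sets its own flag.
// make_unique<T[]> value-initializes the elements, so every flag starts as false.
[[nodiscard]] std::unique_ptr<std::atomic<bool>[]> toy_mark_flags(const std::size_t count) {
    auto flags = std::make_unique<std::atomic<bool>[]>(count);
    std::vector<std::thread> threads;
    threads.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        threads.emplace_back([&flags, i] { flags[i] = true; });
    for (std::thread& t : threads)
        t.join(); // all writes are complete before the array is returned
    return flags;
}
```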
- `README.md`:
- Added instructions for using this package with the Conan C/C++ package manager. Please refer to this package's page on ConanCenter to learn how to use Conan to include this package in your project with various build systems.
- If you found this project useful, please consider starring it on GitHub! This allows me to see how many people are using my code, and motivates me to keep working to improve it.
BS::thread_pool v3.0.0
v3.0.0 (2022-05-30)
- This is a major new release with many changes and improvements! Please note that code written using previous releases will need to be slightly modified to work with the new release. The changes needed to migrate to the new API are explicitly indicated below for your convenience.
- Breaking changes to the library header file:
- The header file has been renamed to `BS_thread_pool.hpp` to avoid potential conflict with other thread pool libraries.
- API migration: The library must now be included by invoking `#include "BS_thread_pool.hpp"`.
- All the definitions in the library, including the `thread_pool` class and the helper classes, are now located in the namespace `BS`. This namespace will also be used for my other C++ projects, and is intended to ensure consistency between my projects while avoiding potential name conflicts with other libraries.
- API migration: The thread pool class should now be invoked as `BS::thread_pool`. Alternatively, it is possible to employ `using BS::thread_pool` or even `using namespace BS` and then invoke `thread_pool` directly. Same for the `BS::synced_stream` and `BS::timer` helper classes.
- The macro `THREAD_POOL_VERSION`, which contains the version number and release date of the library, has been renamed to `BS_THREAD_POOL_VERSION` to avoid potential conflicts.
- API migration: The version must now be read from the macro `BS_THREAD_POOL_VERSION`.
- The public member `sleep_duration` has been removed. The thread pool now uses condition variables instead of sleep to facilitate waiting. This significantly improves performance (by 10%-50% in my testing), drastically decreases idle CPU utilization, and eliminates the need to set an optimal sleep time. This was a highly-requested change; see issue #1, issue #12, and pull request #23.
- API migration: Remove any code that relates to the public member `sleep_duration`.
- The template specializations for `submit()` have been merged. Now instead of two versions, one for functions with a return value and one for functions without a return value, there is just one version, which can accept any function. This makes the code more compact (and elegant). If a function with no return value is submitted, an `std::future<void>` is returned (the previous version returned an `std::future<bool>`).
- API migration: To wait for a task with no return value, simply call `wait()` or `get()` on the corresponding `std::future<void>`.
- `parallelize_loop()` now returns a future in the form of a new `BS::multi_future` helper class template. The member function `wait()` of this future allows waiting until all of the loop's blocks finish executing. In previous versions, calling `parallelize_loop()` both parallelized the loop and waited for the blocks to finish; now it is possible to do other stuff while the loop executes.
- API migration: Since `parallelize_loop()` no longer automatically blocks, you should either store the result in a `BS::multi_future` object and call its `wait()` member function, or simply call `parallelize_loop().wait()` to reproduce the old behavior.
- Non-breaking changes to the library header file:
- It is now possible to use `parallelize_loop()` with functions that have return values and get these values from all blocks at once through the `get()` member function of the `BS::multi_future`.
- The template specializations for `push_task()` have been merged. Now instead of two versions, one for functions with arguments and one for functions without arguments, there is just one version, which can accept any function.
- Constructors have been made `explicit`. See issue #28.
- `submit()` now uses `std::make_shared` instead of `new` to create the shared pointer. This means only one memory allocation is performed instead of two, which should improve performance. In addition, all unique pointers are now created using `std::make_unique`.
- A new helper class template, `BS::multi_future`, has been added. It's basically just a wrapper around `std::vector<std::future<T>>`. This class is used by the new implementation of `parallelize_loop()` to allow waiting for the entire loop, consisting of multiple tasks with their corresponding futures, to finish executing.
- `BS::multi_future` can also be used independently to handle multiple futures at once. For example, you can now keep track of several groups of tasks by storing their futures inside separate `BS::multi_future` objects and use either `wait()` to wait for all tasks in a specific group to finish or `get()` to get an `std::vector` with the return values of every task in the group.
- Integer types are now chosen in a smarter way to improve portability, allow for better compatibility with 32-bit systems, and prevent potential conversion errors.
- Added a new type, `BS::concurrency_t`, equal to the return type of `std::thread::hardware_concurrency()`. This is probably pointless, since the C++ standard requires this to be `unsigned int`, but it seems to me to make the code slightly more portable, in case some non-conforming compiler chooses to use a different integer type.
- C-style casts have been converted to C++ cast expressions for added clarity.
- Miscellaneous minor optimizations and style improvements.
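A wrapper of this kind can be approximated in a few lines of C++17. `toy_multi_future` is a hypothetical sketch, not the library's class. Its `get()` returns a vector of results for non-`void` types, and it propagates stored exceptions for every type, including `void`, because `std::future<T>::get()` rethrows them.

```cpp
#include <cstddef>
#include <future>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical wrapper around std::vector<std::future<T>>; a sketch, not BS::multi_future.
template <typename T>
class toy_multi_future {
public:
    void push_back(std::future<T> future) { futures_.push_back(std::move(future)); }
    [[nodiscard]] std::size_t size() const { return futures_.size(); }
    [[nodiscard]] std::future<T>& operator[](const std::size_t i) { return futures_[i]; }

    // Blocks until every stored future is ready.
    void wait() const {
        for (const std::future<T>& future : futures_)
            future.wait();
    }

    // Collects all results (nothing for void); each future's get() rethrows
    // any exception stored in it.
    auto get() {
        if constexpr (std::is_void_v<T>) {
            for (std::future<T>& future : futures_)
                future.get();
        } else {
            std::vector<T> results;
            results.reserve(futures_.size());
            for (std::future<T>& future : futures_)
                results.push_back(future.get());
            return results;
        }
    }

private:
    std::vector<std::future<T>> futures_;
};
```

Keeping separate `toy_multi_future` objects per group of tasks lets each group be waited on or collected independently, which is the use case described above.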
- Changes to the test program:
- The program has been renamed to `BS_thread_pool_test.cpp` to avoid potential conflict with other thread pool libraries.
- The program now returns `EXIT_FAILURE` if any of the tests failed, for automation purposes. See pull request #42.
- Fixed incorrect check order in `check_task_monitoring()`. See pull request #43.
- Added a new test for `parallelize_loop()` with a return value.
- Improved some of the tests to make them more reliable. For example, `count_unique_threads()` now uses futures (stored in a `BS::multi_future<void>` object).
- The program now uses `std::vector` instead of matrices, for both consistency checks and benchmarks, in order to simplify the code and considerably reduce its length.
- The benchmarks have been simplified. There's now only one test: filling a specific number of vectors of fixed size with random values. This may be replaced with something more practical in a future release, but at least on the systems I've tested on, it does demonstrate a very significant multi-threading speedup.
- In addition to multi-threaded tests with different numbers of tasks, the benchmark now also includes a single-threaded test. This allows for more accurate benchmarks compared to previous versions, as the (slight) parallelization overhead is now taken into account when calculating the maximum speedup.
- The program decides how many vectors to use for benchmarking by testing how many are needed to reach a target duration in the single-threaded test. This ensures that the test takes approximately the same amount of time on different systems, and is thus more consistent and portable.
- Miscellaneous minor optimizations and style improvements.
- Changes to `README.md`:
- Many sections have been rewritten and/or polished.
- Explanations and examples of all the new features have been added.
- Added an acknowledgements section.
- Miscellaneous changes:
- Added a `CITATION.bib` file (in BibTeX format) to the GitHub repository. You can use it to easily cite this package if you use it in any research papers.
- Added a `CITATION.cff` file (in YAML format) to the GitHub repository. This should add an option to get a citation in different formats directly from the GitHub repository by clicking on "cite this repository" on the sidebar to the right.
- Added templates for GitHub issues and pull requests.