
<cxx-clause id="parallel.alg">
  <h1>Parallel algorithms</h1>

  <cxx-section id="parallel.alg.general">
    <h1>In general</h1>

    This clause describes components that C++ programs may use to perform operations on containers
    and other sequences in parallel.

    <cxx-section id="parallel.alg.general.user">
      <h1>Requirements on user-provided function objects</h1>

      <p>
        Function objects passed into parallel algorithms as objects of type <code>BinaryPredicate</code>,
        <code>Compare</code>, and <code>BinaryOperation</code> shall not directly or indirectly modify
        objects via their arguments.
      </p>
    </cxx-section>

    <cxx-section id="parallel.alg.general.exec">
      <h1>Effect of execution policies on algorithm execution</h1>

      <p>
        Parallel algorithms have template parameters named <code>ExecutionPolicy</code> which describe
        the manner in which the execution of these algorithms may be parallelized and the manner in
        which they apply the element access functions.
      </p>

      <p>
        The invocations of element access functions in parallel algorithms invoked with an execution
        policy object of type <code>sequential_execution_policy</code> execute in sequential order in
        the calling thread.
      </p>

      <p>
        The invocations of element access functions in parallel algorithms invoked with an execution
        policy object of type <code>parallel_execution_policy</code> are permitted to execute in an
        unordered fashion in either the invoking thread or in a thread implicitly created by the library
        to support parallel algorithm execution. Any such invocations executing in the same thread are
        indeterminately sequenced with respect to each other. 

        <cxx-note>
          It is the caller's responsibility to ensure correctness, for example that the invocation does
          not introduce data races or deadlocks.
        </cxx-note>
      </p>

      <cxx-example><pre>using namespace std::experimental::parallel;
int a[] = {0,1};
std::vector&lt;int&gt; v;
for_each(par, std::begin(a), std::end(a), [&amp;](int i) {
  v.push_back(i*2+1);
});
</pre>

        The program above has a data race because of the unsynchronized access to the container
        <code>v</code>.
      </cxx-example>

      <cxx-example><pre>
using namespace std::experimental::parallel;
std::atomic&lt;int&gt; x{0};
int a[] = {1,2};
for_each(par, std::begin(a), std::end(a), [&amp;](int n) {
  x.fetch_add(1, std::memory_order_relaxed);
  // spin wait for another iteration to change the value of x
  while (x.load(std::memory_order_relaxed) == 1) { }
});</pre>

        The above example depends on the order of execution of the iterations, and is therefore
        undefined (may deadlock).
      </cxx-example>

      <cxx-example><pre>
using namespace std::experimental::parallel;
int x=0;
std::mutex m;
int a[] = {1,2};
for_each(par, std::begin(a), std::end(a), [&amp;](int) {
  m.lock();
  ++x;
  m.unlock();
});</pre>

        The above example synchronizes access to object <code>x</code> ensuring that it is
        incremented correctly.
      </cxx-example>

      <p>
        The invocations of element access functions in parallel algorithms invoked with an execution
        policy of type <code>parallel_vector_execution_policy</code>
        are permitted to execute in an unordered fashion in unspecified threads, and unsequenced
        with respect to one another within each thread.
        <cxx-note>
          This means that multiple function object invocations may be interleaved on a single thread.
        </cxx-note>

        <cxx-note>
          This overrides the usual guarantee from the C++ standard, Section 1.9 [intro.execution] that
          function executions do not interleave with one another.
        </cxx-note>

          Since <code>parallel_vector_execution_policy</code> allows the execution of element access functions to be
          interleaved on a single thread, synchronization, including the use of mutexes, risks deadlock. Thus the
          synchronization with <code>parallel_vector_execution_policy</code> is restricted as follows:

          A standard library function is <em>vectorization-unsafe</em> if it is specified to synchronize with
          another function invocation, or another function invocation is specified to synchronize with it, and if
          it is not a memory allocation or deallocation function. Vectorization-unsafe standard library functions
          may not be invoked by user code called from <code>parallel_vector_execution_policy</code> algorithms.

          <cxx-note>
            Implementations must ensure that internal synchronization inside standard library routines does not
            induce deadlock.
          </cxx-note>
      </p>

      <cxx-example><pre>
using namespace std::experimental::parallel;
int x=0;
std::mutex m;
int a[] = {1,2};
for_each(par_vec, std::begin(a), std::end(a), [&amp;](int) {
  m.lock();
  ++x;
  m.unlock();
});</pre>

        The above program is invalid because the applications of the function object are not
        guaranteed to run on different threads.
      </cxx-example>

      <cxx-note>
        The application of the function object may result in two consecutive calls to
        <code>m.lock</code> on the same thread, which may deadlock.
      </cxx-note>

      <cxx-note>
        The semantics of the <code>parallel_execution_policy</code> or the
        <code>parallel_vector_execution_policy</code> invocation allow the implementation to fall back to
        sequential execution if the system cannot parallelize an algorithm invocation due to lack of
        resources.
      </cxx-note>

      <p>
        Algorithms invoked with an execution policy object of type <code>execution_policy</code>
        execute internally as if invoked with the contained execution policy object.
      </p>

      <p>
        The semantics of parallel algorithms invoked with an execution policy object of
        implementation-defined type are implementation-defined.
      </p>
    </cxx-section>

    <cxx-section id="parallel.alg.overloads">
      <h1><code>ExecutionPolicy</code> algorithm overloads</h1>

      <p>
        The Parallel Algorithms Library provides overloads for each of the algorithms named in
        Table 1, corresponding to the algorithms with the same name in the C++ Standard Algorithms Library.
        
        For each algorithm in <cxx-ref to="tab.parallel.algorithms"></cxx-ref>, if there are overloads for
        corresponding algorithms with the same name
        in the C++ Standard Algorithms Library,
        the overloads shall have an additional template type parameter named
        <code>ExecutionPolicy</code>, which shall be the first template parameter.
        
        In addition, each such overload shall have a new first function parameter of type
        <code>ExecutionPolicy&amp;&amp;</code>.
      </p>

      <p>
        Unless otherwise specified, the semantics of <code>ExecutionPolicy</code> algorithm overloads
        are identical to those of the corresponding overloads without an <code>ExecutionPolicy</code> parameter.
      </p>
        
      <p>
        Parallel algorithms shall not participate in overload resolution unless
        <code>is_execution_policy&lt;decay_t&lt;ExecutionPolicy&gt;&gt;::value</code> is <code>true</code>.
      </p>
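      <p>
        The constraint above can be sketched with <code>enable_if</code>. This is only an illustration, not
        the library's implementation: the namespace <code>sketch</code>, the empty policy type, and the locally
        defined <code>is_execution_policy</code> trait are stand-ins assumed for this example, and both
        <code>equal</code> overloads run sequentially. The point is that a call passing four pointers would
        otherwise also match the policy overload's signature.
      </p>

```cpp
#include <type_traits>

namespace sketch {  // hypothetical namespace, not part of the specification

struct sequential_execution_policy {};  // stand-in for the real policy type

// Stand-in trait: true only for types registered as execution policies.
template <class T> struct is_execution_policy : std::false_type {};
template <> struct is_execution_policy<sequential_execution_policy> : std::true_type {};

// Non-policy overload taking two complete ranges.
template <class It1, class It2>
bool equal(It1 first1, It1 last1, It2 first2, It2 last2) {
  for (; first1 != last1 && first2 != last2; ++first1, ++first2)
    if (!(*first1 == *first2)) return false;
  return first1 == last1 && first2 == last2;
}

// Policy overload: removed from overload resolution unless the decayed type
// of the first argument is an execution policy.
template <class ExecutionPolicy, class It1, class It2,
          class = typename std::enable_if<is_execution_policy<
              typename std::decay<ExecutionPolicy>::type>::value>::type>
bool equal(ExecutionPolicy&&, It1 first1, It1 last1, It2 first2) {
  for (; first1 != last1; ++first1, ++first2)
    if (!(*first1 == *first2)) return false;
  return true;
}

}  // namespace sketch
```

      <p>
        With the constraint in place, <code>sketch::equal(a, a + 3, b, b + 2)</code> for two <code>int</code>
        arrays can only select the four-iterator overload.
      </p>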
        
      <table is="cxx-table" id="tab.parallel.algorithms" class="list">
        <caption>Table of parallel algorithms</caption>
          <tr>
            <td><code>adjacent_difference</code></td>
            <td><code>adjacent_find</code></td>
            <td><code>all_of</code></td>
            <td><code>any_of</code></td>
          </tr>
          <tr>
            <td><code>copy</code></td>
            <td><code>copy_if</code></td>
            <td><code>copy_n</code></td>
            <td><code>count</code></td>
          </tr>
          <tr>
            <td><code>count_if</code></td>
            <td><code>equal</code></td>
            <td><code>exclusive_scan</code></td>
            <td><code>fill</code></td>
          </tr>
          <tr>
            <td><code>fill_n</code></td>
            <td><code>find</code></td>
            <td><code>find_end</code></td>
            <td><code>find_first_of</code></td>
          </tr>
          <tr>
            <td><code>find_if</code></td>
            <td><code>find_if_not</code></td>
            <td><code>for_each</code></td>
            <td><code>for_each_n</code></td>
          </tr>
          <tr>
            <td><code>generate</code></td>
            <td><code>generate_n</code></td>
            <td><code>includes</code></td>
            <td><code>inclusive_scan</code></td>
          </tr>
          <tr>
            <td><code>inner_product</code></td>
            <td><code>inplace_merge</code></td>
            <td><code>is_heap</code></td>
            <td><code>is_heap_until</code></td>
          </tr>
          <tr>
            <td><code>is_partitioned</code></td>
            <td><code>is_sorted</code></td>
            <td><code>is_sorted_until</code></td>
            <td><code>lexicographical_compare</code></td>
          </tr>
          <tr>
            <td><code>max_element</code></td>
            <td><code>merge</code></td>
            <td><code>min_element</code></td>
            <td><code>minmax_element</code></td>
          </tr>
          <tr>
            <td><code>mismatch</code></td>
            <td><code>move</code></td>
            <td><code>none_of</code></td>
            <td><code>nth_element</code></td>
          </tr>
          <tr>
            <td><code>partial_sort</code></td>
            <td><code>partial_sort_copy</code></td>
            <td><code>partition</code></td>
            <td><code>partition_copy</code></td>
          </tr>
          <tr>
            <td><code>reduce</code></td>
            <td><code>remove</code></td>
            <td><code>remove_copy</code></td>
            <td><code>remove_copy_if</code></td>
          </tr>
          <tr>
            <td><code>remove_if</code></td>
            <td><code>replace</code></td>
            <td><code>replace_copy</code></td>
            <td><code>replace_copy_if</code></td>
          </tr>
          <tr>
            <td><code>replace_if</code></td>
            <td><code>reverse</code></td>
            <td><code>reverse_copy</code></td>
            <td><code>rotate</code></td>
          </tr>
          <tr>
            <td><code>rotate_copy</code></td>
            <td><code>search</code></td>
            <td><code>search_n</code></td>
            <td><code>set_difference</code></td>
          </tr>
          <tr>
            <td><code>set_intersection</code></td>
            <td><code>set_symmetric_difference</code></td>
            <td><code>set_union</code></td>
            <td><code>sort</code></td>
          </tr>
          <tr>
            <td><code>stable_partition</code></td>
            <td><code>stable_sort</code></td>
            <td><code>swap_ranges</code></td>
            <td><code>transform</code></td>
          </tr>
          <tr>
            <td><code>transform_exclusive_scan</code></td>
            <td><code>transform_inclusive_scan</code></td>
            <td><code>transform_reduce</code></td>
            <td><code>uninitialized_copy</code></td>
          </tr>
          <tr>
            <td><code>uninitialized_copy_n</code></td>
            <td><code>uninitialized_fill</code></td>
            <td><code>uninitialized_fill_n</code></td>
            <td><code>unique</code></td>
          </tr>
          <tr>
            <td><code>unique_copy</code></td>
            <td></td>
            <td></td>
            <td></td>
          </tr>
      </table>

      <cxx-note>
      Not all algorithms in the Standard Library have counterparts in <cxx-ref to="tab.parallel.algorithms"></cxx-ref>.
      </cxx-note>
    </cxx-section>
  </cxx-section>

  <cxx-section id="parallel.alg.defns">
    <h1>Definitions</h1>

    <p>
      Define <code><em>GENERALIZED_SUM</em>(op, a1, ..., aN)</code> as follows:

      <ul>
        <li><code>a1</code> when <code>N</code> is <code>1</code></li>
      
        <li>
          <code>op(<em>GENERALIZED_SUM</em>(op, b1, ..., bK), <em>GENERALIZED_SUM</em>(op, bM, ..., bN))</code> where
           
          <ul>
            <li><code>b1, ..., bN</code> may be any permutation of <code>a1, ..., aN</code> and</li>

            <li><code>1 &lt; K+1 = M &le; N</code>.</li>
          </ul>
        </li>
      </ul>
    </p>
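    <p>
      To make the freedom in this definition concrete, the following sketch enumerates every value that
      <code><em>GENERALIZED_SUM</em></code> may take for a small list of <code>int</code> operands. The
      function name <code>generalized_sum_values</code> is hypothetical, and the exhaustive search is an
      illustration only; it is not how an implementation would evaluate the sum.
    </p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <vector>

// Returns every value GENERALIZED_SUM(op, a[0], ..., a[n-1]) may take.
// r[mask] holds the possible results for the operands selected by the bits
// of `mask`. Cost is exponential, so this is meant for tiny inputs only.
template <class Op>
std::set<int> generalized_sum_values(const std::vector<int>& a, Op op) {
  const std::size_t n = a.size();
  assert(n < 16);  // guard against accidental use on large inputs
  std::vector<std::set<int>> r(std::size_t(1) << n);
  for (std::size_t mask = 1; mask < r.size(); ++mask) {
    if ((mask & (mask - 1)) == 0) {  // exactly one operand: the N == 1 case
      std::size_t i = 0;
      while ((std::size_t(1) << i) != mask) ++i;
      r[mask].insert(a[i]);
      continue;
    }
    // Picking a permutation and a split point K amounts to picking which
    // operands go to the left call and which go to the right call.
    for (std::size_t left = (mask - 1) & mask; left != 0;
         left = (left - 1) & mask) {
      const std::size_t right = mask ^ left;
      for (int x : r[left])
        for (int y : r[right]) r[mask].insert(op(x, y));
    }
  }
  return r.back();
}
```

    <p>
      For the operands 1, 2, 3, addition yields the single value 6, whereas subtraction, which is neither
      associative nor commutative, yields the five values -4, -2, 0, 2 and 4.
    </p>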

    <p>
      Define <code><em>GENERALIZED_NONCOMMUTATIVE_SUM</em>(op, a1, ..., aN)</code> as follows:

      <ul>
        <li><code>a1</code> when <code>N</code> is <code>1</code></li>

        <li>
          <code>op(<em>GENERALIZED_NONCOMMUTATIVE_SUM</em>(op, a1, ..., aK), <em>GENERALIZED_NONCOMMUTATIVE_SUM</em>(op, aM, ..., aN))</code>
          where <code>1 &lt; K+1 = M &le; N</code>.
        </li>
      </ul>
    </p>
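    <p>
      A minimal sketch of one admissible evaluation of <code><em>GENERALIZED_NONCOMMUTATIVE_SUM</em></code>
      follows. It always splits at the midpoint, which is just one of the permitted choices of
      <code>K</code>; the name <code>noncommutative_sum</code> and the index-based interface are assumptions
      made for this example. Operand order is preserved, only the grouping varies.
    </p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Evaluates op over a[lo], ..., a[hi-1] in order, grouping by midpoint split.
template <class T, class Op>
T noncommutative_sum(const std::vector<T>& a, std::size_t lo, std::size_t hi,
                     Op op) {
  assert(lo < hi && hi <= a.size());  // at least one operand is required
  if (hi - lo == 1) return a[lo];     // the N == 1 case
  const std::size_t mid = lo + (hi - lo) / 2;  // one permitted split point
  return op(noncommutative_sum(a, lo, mid, op),
            noncommutative_sum(a, mid, hi, op));
}
```

    <p>
      String concatenation is associative but not commutative, so any grouping gives the same in-order
      result. Subtraction is not associative: for the operands 10, 3, 2 this grouping computes
      <code>10 - (3 - 2)</code>, which differs from the left-to-right value <code>(10 - 3) - 2</code>.
    </p>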
  </cxx-section>

  <cxx-section id="parallel.alg.ops">
    <h1>Non-Numeric Parallel Algorithms</h1>

    <cxx-section id="parallel.alg.ops.synopsis">
      <h1>Header <code>&lt;experimental/algorithm&gt;</code> synopsis</h1>

      <pre>
namespace std {
namespace experimental {
namespace parallel {
inline namespace v1 {
  template&lt;class ExecutionPolicy,
           class InputIterator, class Function&gt;
    void for_each(ExecutionPolicy&amp;&amp; exec,
                  InputIterator first, InputIterator last,
                  Function f);
  template&lt;class InputIterator, class Size, class Function&gt;
    InputIterator for_each_n(InputIterator first, Size n,
                             Function f);
  template&lt;class ExecutionPolicy,
           class InputIterator, class Size, class Function&gt;
    InputIterator for_each_n(ExecutionPolicy&amp;&amp; exec,
                             InputIterator first, Size n,
                             Function f);
}
}
}
}
</pre>
    </cxx-section>

    <cxx-section id="parallel.alg.foreach">
      <h1>For each</h1>

      <cxx-function>
        <cxx-signature>template&lt;class ExecutionPolicy,
      class InputIterator, class Function&gt;
void for_each(ExecutionPolicy&amp;&amp; exec,
              InputIterator first, InputIterator last,
              Function f);</cxx-signature>

        <cxx-effects>
          Applies <code>f</code> to the result of dereferencing every iterator in the range <code>[first,last)</code>.

          <cxx-note>
            If the type of <code>first</code> satisfies the requirements of a mutable iterator, <code>f</code> may
            apply nonconstant functions through the dereferenced iterator.
          </cxx-note>
        </cxx-effects>

        <cxx-complexity>
          Applies <code>f</code> exactly <code>last - first</code> times.
        </cxx-complexity>
        
        <cxx-remarks>
          If <code>f</code> returns a result, the result is ignored.
        </cxx-remarks>

        <cxx-notes>
          Unlike its sequential form, the parallel overload of <code>for_each</code> does not return a copy of
          its <code>Function</code> parameter, since parallelization may not permit efficient state
          accumulation.
        </cxx-notes>
        <cxx-requires>
          Unlike its sequential form, the parallel overload of <code>for_each</code> requires
          <code>Function</code> to meet the requirements of <code>CopyConstructible</code>.
        </cxx-requires>
      </cxx-function>
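      <p>
        Because the parallel overload does not return the function object, any state has to be accumulated
        outside it. The sketch below captures a <code>std::atomic</code> counter by reference. It calls the
        sequential <code>std::for_each</code> so that it compiles without an implementation of this
        specification; the helper name <code>count_even</code> is hypothetical. The assumption is that the
        same lambda could be passed to the <code>par</code> overload, since the atomic increment does not
        introduce a data race.
      </p>

```cpp
#include <algorithm>
#include <atomic>
#include <vector>

// Counts even elements using state held outside the function object.
int count_even(const std::vector<int>& v) {
  std::atomic<int> evens{0};
  // Sequential stand-in for for_each(par, ...): the lambda itself holds no
  // state, so nothing is lost when the algorithm does not return it.
  std::for_each(v.begin(), v.end(), [&evens](int x) {
    if (x % 2 == 0) evens.fetch_add(1, std::memory_order_relaxed);
  });
  return evens.load();
}
```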

      <cxx-function>
        <cxx-signature>template&lt;class InputIterator, class Size, class Function&gt;
InputIterator for_each_n(InputIterator first, Size n,
                         Function f);</cxx-signature>

        <cxx-requires>
          <code>Function</code> shall meet the requirements of <code>MoveConstructible</code>.
          
          <cxx-note>
            <code>Function</code> need not meet the requirements of <code>CopyConstructible</code>.
          </cxx-note>
        </cxx-requires>

        <cxx-effects>
          Applies <code>f</code> to the result of dereferencing every iterator in the range
          <code>[first,first + n)</code>, starting from <code>first</code> and proceeding to <code>first + n - 1</code>.

          <cxx-note>
            If the type of <code>first</code> satisfies the requirements of a mutable iterator,
            <code>f</code> may apply nonconstant functions through the dereferenced iterator.
          </cxx-note>
        </cxx-effects>

        <cxx-returns>
          <code>first + n</code> for non-negative values of <code>n</code> and <code>first</code> for negative values.
        </cxx-returns>

        <cxx-remarks>
          If <code>f</code> returns a result, the result is ignored.
        </cxx-remarks>
      </cxx-function>
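      <p>
        The effects and return value described above can be sketched as a short sequential loop. The name
        <code>for_each_n_sketch</code> is hypothetical and the code is an illustration under the stated
        semantics, not a reference implementation.
      </p>

```cpp
// Applies f to *first, *(first + 1), ..., *(first + n - 1) in that order.
// For a negative n the loop body never runs and first is returned unchanged.
template <class InputIterator, class Size, class Function>
InputIterator for_each_n_sketch(InputIterator first, Size n, Function f) {
  for (Size i = 0; i < n; ++i, ++first) {
    f(*first);  // any result of f is ignored
  }
  return first;  // first + n when n is non-negative
}
```

      <p>
        Since <code>f</code> is taken by value and only invoked, a move-only function object is sufficient
        here.
      </p>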

      <cxx-function>
        <cxx-signature>template&lt;class ExecutionPolicy,
      class InputIterator, class Size, class Function&gt;
InputIterator for_each_n(ExecutionPolicy&amp;&amp; exec,
                         InputIterator first, Size n,
                         Function f);</cxx-signature>

        <cxx-effects>
          Applies <code>f</code> to the result of dereferencing every iterator in the range
          <code>[first,first + n)</code>, starting from <code>first</code> and proceeding to <code>first + n - 1</code>.

          <cxx-note>
            If the type of <code>first</code> satisfies the requirements of a mutable iterator,
            <code>f</code> may apply nonconstant functions through the dereferenced iterator.
          </cxx-note>
        </cxx-effects>

        <cxx-returns>
        <code>first + n</code> for non-negative values of <code>n</code> and <code>first</code> for negative values.
        </cxx-returns>

        <cxx-remarks>
          If <code>f</code> returns a result, the result is ignored.
        </cxx-remarks>

        <cxx-notes>
          Unlike its sequential form, the parallel overload of <code>for_each_n</code> requires
          <code>Function</code> to meet the requirements of <code>CopyConstructible</code>.
        </cxx-notes>
      </cxx-function>
    </cxx-section>
  </cxx-section>

  <cxx-section id="parallel.alg.numeric">
    <h1>Numeric Parallel Algorithms</h1>

    <cxx-section id="parallel.alg.numeric.synopsis">
      <h1>Header <code>&lt;experimental/numeric&gt;</code> synopsis</h1>

      <pre>
namespace std {
namespace experimental {
namespace parallel {
inline namespace v1 {
  template&lt;class InputIterator&gt;
    typename iterator_traits&lt;InputIterator&gt;::value_type
      reduce(InputIterator first, InputIterator last);
  template&lt;class ExecutionPolicy,
           class InputIterator&gt;
    typename iterator_traits&lt;InputIterator&gt;::value_type
      reduce(ExecutionPolicy&amp;&amp; exec,
             InputIterator first, InputIterator last);
  template&lt;class InputIterator, class T&gt;
    T reduce(InputIterator first, InputIterator last, T init);
  template&lt;class ExecutionPolicy,
           class InputIterator, class T&gt;
    T reduce(ExecutionPolicy&amp;&amp; exec,
             InputIterator first, InputIterator last, T init);
  template&lt;class InputIterator, class T, class BinaryOperation&gt;
    T reduce(InputIterator first, InputIterator last, T init,
             BinaryOperation binary_op);
  template&lt;class ExecutionPolicy, class InputIterator, class T, class BinaryOperation&gt;
    T reduce(ExecutionPolicy&amp;&amp; exec,
             InputIterator first, InputIterator last, T init,
             BinaryOperation binary_op);

  template&lt;class InputIterator, class OutputIterator,
           class T&gt;
    OutputIterator
      exclusive_scan(InputIterator first, InputIterator last,
                     OutputIterator result,
                     T init);
  template&lt;class ExecutionPolicy,
           class InputIterator, class OutputIterator,
           class T&gt;
    OutputIterator
      exclusive_scan(ExecutionPolicy&amp;&amp; exec,
                     InputIterator first, InputIterator last,
                     OutputIterator result,
                     T init);
  template&lt;class InputIterator, class OutputIterator,
           class T, class BinaryOperation&gt;
    OutputIterator
      exclusive_scan(InputIterator first, InputIterator last,
                     OutputIterator result,
                     T init, BinaryOperation binary_op);
  template&lt;class ExecutionPolicy,
           class InputIterator, class OutputIterator,
           class T, class BinaryOperation&gt;
    OutputIterator
      exclusive_scan(ExecutionPolicy&amp;&amp; exec,
                     InputIterator first, InputIterator last,
                     OutputIterator result,
                     T init, BinaryOperation binary_op);

  template&lt;class InputIterator, class OutputIterator&gt;
    OutputIterator
      inclusive_scan(InputIterator first, InputIterator last,
                     OutputIterator result);
  template&lt;class ExecutionPolicy,
           class InputIterator, class OutputIterator&gt;
    OutputIterator
      inclusive_scan(ExecutionPolicy&amp;&amp; exec,
                     InputIterator first, InputIterator last,
                     OutputIterator result);
  template&lt;class InputIterator, class OutputIterator,
           class BinaryOperation&gt;
    OutputIterator
      inclusive_scan(InputIterator first, InputIterator last,
                     OutputIterator result,
                     BinaryOperation binary_op);
  template&lt;class ExecutionPolicy,
           class InputIterator, class OutputIterator,
           class BinaryOperation&gt;
    OutputIterator
      inclusive_scan(ExecutionPolicy&amp;&amp; exec,
                     InputIterator first, InputIterator last,
                     OutputIterator result,
                     BinaryOperation binary_op);
  template&lt;class InputIterator, class OutputIterator,
           class BinaryOperation, class T&gt;
    OutputIterator
      inclusive_scan(InputIterator first, InputIterator last,
                     OutputIterator result,
                     BinaryOperation binary_op, T init);
  template&lt;class ExecutionPolicy,
           class InputIterator, class OutputIterator,
           class BinaryOperation, class T&gt;
    OutputIterator
      inclusive_scan(ExecutionPolicy&amp;&amp; exec,
                     InputIterator first, InputIterator last,
                     OutputIterator result,
                     BinaryOperation binary_op, T init);

  template&lt;class InputIterator, class UnaryOperation,
           class T, class BinaryOperation&gt;
    T transform_reduce(InputIterator first, InputIterator last,
                       UnaryOperation unary_op,
                       T init, BinaryOperation binary_op);
  template&lt;class ExecutionPolicy,
           class InputIterator, class UnaryOperation,
           class T, class BinaryOperation&gt;
    T transform_reduce(ExecutionPolicy&amp;&amp; exec,
                       InputIterator first, InputIterator last,
                       UnaryOperation unary_op,
                       T init, BinaryOperation binary_op);

  template&lt;class InputIterator, class OutputIterator,
           class UnaryOperation, class T, class BinaryOperation&gt;
    OutputIterator
      transform_exclusive_scan(InputIterator first, InputIterator last,
                               OutputIterator result,
                               UnaryOperation unary_op,
                               T init, BinaryOperation binary_op);
  template&lt;class ExecutionPolicy,
           class InputIterator, class OutputIterator,
           class UnaryOperation, class T, class BinaryOperation&gt;
    OutputIterator
      transform_exclusive_scan(ExecutionPolicy&amp;&amp; exec,
                               InputIterator first, InputIterator last,
                               OutputIterator result,
                               UnaryOperation unary_op,
                               T init, BinaryOperation binary_op);

  template&lt;class InputIterator, class OutputIterator,
           class UnaryOperation, class BinaryOperation&gt;
    OutputIterator
      transform_inclusive_scan(InputIterator first, InputIterator last,
                               OutputIterator result,
                               UnaryOperation unary_op,
                               BinaryOperation binary_op);
  template&lt;class ExecutionPolicy,
           class InputIterator, class OutputIterator,
           class UnaryOperation, class BinaryOperation&gt;
    OutputIterator
      transform_inclusive_scan(ExecutionPolicy&amp;&amp; exec,
                               InputIterator first, InputIterator last,
                               OutputIterator result,
                               UnaryOperation unary_op,
                               BinaryOperation binary_op);

  template&lt;class InputIterator, class OutputIterator,
           class UnaryOperation, class BinaryOperation, class T&gt;
    OutputIterator
      transform_inclusive_scan(InputIterator first, InputIterator last,
                               OutputIterator result,
                               UnaryOperation unary_op,
                               BinaryOperation binary_op, T init);
  template&lt;class ExecutionPolicy,
           class InputIterator, class OutputIterator,
           class UnaryOperation, class BinaryOperation, class T&gt;
    OutputIterator
      transform_inclusive_scan(ExecutionPolicy&amp;&amp; exec,
                               InputIterator first, InputIterator last,
                               OutputIterator result,
                               UnaryOperation unary_op,
                               BinaryOperation binary_op, T init);
}
}
}
}
</pre>
    </cxx-section>

    <cxx-section id="parallel.alg.reduce">
      <h1>Reduce</h1>

      <cxx-function>
        <cxx-signature>template&lt;class InputIterator&gt;
typename iterator_traits&lt;InputIterator&gt;::value_type
    reduce(InputIterator first, InputIterator last);</cxx-signature>

        <cxx-effects>
          Same as <code>reduce(first, last, typename iterator_traits&lt;InputIterator&gt;::value_type{})</code>.
        </cxx-effects>
      </cxx-function>

      <cxx-function>
        <cxx-signature>template&lt;class InputIterator, class T&gt;
T reduce(InputIterator first, InputIterator last, T init);</cxx-signature>

        <cxx-effects>
          Same as <code>reduce(first, last, init, plus&lt;&gt;())</code>.
        </cxx-effects>
      </cxx-function>

      <cxx-function>
        <cxx-signature>template&lt;class InputIterator, class T, class BinaryOperation&gt;
T reduce(InputIterator first, InputIterator last, T init,
         BinaryOperation binary_op);</cxx-signature>

        <cxx-returns>
          <code><em>GENERALIZED_SUM</em>(binary_op, init, *first, ..., *(first + (last - first) - 1))</code>.
        </cxx-returns>

        <cxx-requires>
          <code>binary_op</code> shall not invalidate iterators or subranges, nor modify elements in the
          range <code>[first,last)</code>.
        </cxx-requires>

        <cxx-complexity>
          O(<code>last - first</code>) applications of <code>binary_op</code>.
        </cxx-complexity>

        <cxx-notes>
          The primary difference between <code>reduce</code> and <code>accumulate</code> is that the behavior
          of <code>reduce</code> may be non-deterministic for non-associative or non-commutative <code>binary_op</code>.
        </cxx-notes>
      </cxx-function>
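      <p>
        The note above can be illustrated with floating-point addition, which is not associative. The sketch
        below evaluates one grouping that <code><em>GENERALIZED_SUM</em></code> permits (a recursive midpoint
        split over <code>init</code> followed by the elements) and can be compared with the left-to-right
        result of <code>std::accumulate</code>. The names <code>tree_sum</code> and <code>reduce_sketch</code>
        are hypothetical, and a real implementation is free to choose a different grouping.
      </p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Combines xs[lo], ..., xs[hi-1] by splitting the operand list at its midpoint.
template <class T, class Op>
T tree_sum(const std::vector<T>& xs, std::size_t lo, std::size_t hi, Op op) {
  assert(lo < hi);
  if (hi - lo == 1) return xs[lo];
  const std::size_t mid = lo + (hi - lo) / 2;
  return op(tree_sum(xs, lo, mid, op), tree_sum(xs, mid, hi, op));
}

// One possible result of reduce(first, last, init, op) for the range v.
template <class T, class Op>
T reduce_sketch(const std::vector<T>& v, T init, Op op) {
  std::vector<T> operands;
  operands.reserve(v.size() + 1);
  operands.push_back(init);  // init is an ordinary operand of the sum
  operands.insert(operands.end(), v.begin(), v.end());
  return tree_sum(operands, 0, operands.size(), op);
}
```

      <p>
        For the <code>double</code> values 1e20, -1e20, 1.0 with <code>init</code> 0.0, left-to-right
        accumulation gives 1.0, while this grouping computes <code>(0.0 + 1e20) + (-1e20 + 1.0)</code> and
        loses the final 1.0 to rounding.
      </p>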
    </cxx-section>

    <cxx-section id="parallel.alg.exclusive.scan">
      <h1>Exclusive scan</h1>

      <cxx-function>
        <cxx-signature>template&lt;class InputIterator, class OutputIterator, class T&gt;
OutputIterator exclusive_scan(InputIterator first, InputIterator last,
                              OutputIterator result,
                              T init);</cxx-signature>

        <cxx-effects>
          Same as <code>exclusive_scan(first, last, result, init, plus&lt;&gt;())</code>.
        </cxx-effects>
      </cxx-function>

      <cxx-function>
        <cxx-signature>template&lt;class InputIterator, class OutputIterator, class T, class BinaryOperation&gt;
OutputIterator exclusive_scan(InputIterator first, InputIterator last,
                              OutputIterator result,
                              T init, BinaryOperation binary_op);</cxx-signature>

        <cxx-effects>
          Assigns through each iterator <code>i</code> in <code>[result,result + (last - first))</code> the
          value of <code><em>GENERALIZED_NONCOMMUTATIVE_SUM</em>(binary_op, init, *first, ..., *(first + (i - result) - 1))</code>.
        </cxx-effects>

        <cxx-returns>
          The end of the resulting range beginning at <code>result</code>.
        </cxx-returns>

        <cxx-requires>
          <code>binary_op</code> shall not invalidate iterators or subranges, nor modify elements in the
          ranges <code>[first,last)</code> or <code>[result,result + (last - first))</code>.
        </cxx-requires>

        <cxx-complexity>
          O(<code>last - first</code>) applications of <code>binary_op</code>.
        </cxx-complexity>

        <cxx-notes>
          The difference between <code>exclusive_scan</code> and <code>inclusive_scan</code> is that
          <code>exclusive_scan</code> excludes the <code>i</code>th input element from the <code>i</code>th
          sum. If <code>binary_op</code> is not mathematically associative, the behavior of
          <code>exclusive_scan</code> may be non-deterministic.
        </cxx-notes>
      </cxx-function>
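      <p>
        As a sequential point of reference, the effects above can be sketched with a running value. The
        function name <code>exclusive_scan_sketch</code> and the vector-based interface are assumptions for
        this example; a parallel implementation may group the applications of <code>binary_op</code>
        differently.
      </p>

```cpp
#include <functional>
#include <vector>

// out[i] combines init with in[0], ..., in[i-1]; in[i] itself is excluded.
template <class T, class BinaryOperation>
std::vector<T> exclusive_scan_sketch(const std::vector<T>& in, T init,
                                     BinaryOperation binary_op) {
  std::vector<T> out;
  out.reserve(in.size());
  T running = init;
  for (const T& x : in) {
    out.push_back(running);           // written before x is folded in
    running = binary_op(running, x);  // becomes the value for the next slot
  }
  return out;
}
```

      <p>
        With the input 1, 2, 3, 4, <code>init</code> 0 and <code>plus&lt;&gt;</code>, the output is
        0, 1, 3, 6.
      </p>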
    </cxx-section>

    <cxx-section id="parallel.alg.inclusive.scan">
      <h1>Inclusive scan</h1>

      <cxx-function>
        <cxx-signature>template&lt;class InputIterator, class OutputIterator&gt;
OutputIterator inclusive_scan(InputIterator first, InputIterator last,
                              OutputIterator result);</cxx-signature>

        <cxx-effects>
          Same as <code>inclusive_scan(first, last, result, plus&lt;&gt;())</code>.
        </cxx-effects>
      </cxx-function>

      <cxx-function>
        <cxx-signature>template&lt;class InputIterator, class OutputIterator, class BinaryOperation&gt;
OutputIterator inclusive_scan(InputIterator first, InputIterator last,
                              OutputIterator result,
                              BinaryOperation binary_op);</cxx-signature>
        <cxx-signature>template&lt;class InputIterator, class OutputIterator, class BinaryOperation, class T&gt;
OutputIterator inclusive_scan(InputIterator first, InputIterator last,
                              OutputIterator result,
                              BinaryOperation binary_op, T init);</cxx-signature>

        <cxx-effects>
          Assigns through each iterator <code>i</code> in <code>[result,result + (last - first))</code> the value of
          <code><em>GENERALIZED_NONCOMMUTATIVE_SUM</em>(binary_op, *first, ..., *(first + (i - result)))</code> or
          <code><em>GENERALIZED_NONCOMMUTATIVE_SUM</em>(binary_op, init, *first, ..., *(first + (i - result)))</code>
          if <code>init</code> is provided.
        </cxx-effects>

        <cxx-returns>
          The end of the resulting range beginning at <code>result</code>.
        </cxx-returns>

        <cxx-requires>
          <code>binary_op</code> shall not invalidate iterators or subranges, nor modify elements in the 
          ranges <code>[first,last)</code> or <code>[result,result + (last - first))</code>.
        </cxx-requires>

        <cxx-complexity>
          O(<code>last - first</code>) applications of <code>binary_op</code>.
        </cxx-complexity>

        <cxx-notes>
          The difference between <code>exclusive_scan</code> and <code>inclusive_scan</code> is that
          <code>inclusive_scan</code> includes the <code>i</code>th input element in the <code>i</code>th sum.
          If <code>binary_op</code> is not mathematically associative, the behavior of
          <code>inclusive_scan</code> may be non-deterministic.
        </cxx-notes>
      </cxx-function>
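
      <p>
        The effects above can be illustrated with a minimal sequential sketch. It assumes that a
        left-to-right fold is one admissible grouping of <code><em>GENERALIZED_NONCOMMUTATIVE_SUM</em></code>.
        The namespace <code>sketch</code> and the helper <code>example_inclusive_scan</code> are hypothetical
        names chosen for illustration; this is not how a parallel implementation would be written.
      </p>

```cpp
#include <cassert>
#include <functional>
#include <iterator>
#include <vector>

namespace sketch {  // hypothetical namespace, for illustration only

// Sequential reference: element k of the output is the fold of inputs 0..k.
template <class InputIterator, class OutputIterator, class BinaryOperation>
OutputIterator inclusive_scan(InputIterator first, InputIterator last,
                              OutputIterator result, BinaryOperation binary_op) {
  if (first == last) return result;
  // The running sum starts at the first input element.
  typename std::iterator_traits<InputIterator>::value_type acc = *first;
  *result++ = acc;
  for (++first; first != last; ++first) {
    acc = binary_op(acc, *first);
    *result++ = acc;
  }
  return result;
}

// Overload with init: the running sum starts at init instead.
template <class InputIterator, class OutputIterator, class BinaryOperation, class T>
OutputIterator inclusive_scan(InputIterator first, InputIterator last,
                              OutputIterator result, BinaryOperation binary_op,
                              T init) {
  for (; first != last; ++first) {
    init = binary_op(init, *first);
    *result++ = init;
  }
  return result;
}

// Overload without an operation: same as passing plus<>().
template <class InputIterator, class OutputIterator>
OutputIterator inclusive_scan(InputIterator first, InputIterator last,
                              OutputIterator result) {
  return sketch::inclusive_scan(first, last, result, std::plus<>());
}

// Hypothetical usage helper: prefix sums of {1, 2, 3, 4}.
inline std::vector<int> example_inclusive_scan() {
  const std::vector<int> in{1, 2, 3, 4};
  std::vector<int> out(in.size());
  auto end = sketch::inclusive_scan(in.begin(), in.end(), out.begin());
  assert(end == out.end());  // returns the end of the resulting range
  (void)end;
  return out;  // {1, 3, 6, 10}
}

}  // namespace sketch
```

      <p>
        With <code>plus&lt;&gt;()</code> and an <code>init</code> of 10, the same input would yield
        11, 13, 16, 20 under this sketch.
      </p>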
    </cxx-section>

    <cxx-section id="parallel.alg.transform.reduce">
      <h1>Transform reduce</h1>

      <cxx-function>
        <cxx-signature>template&lt;class InputIterator, class UnaryOperation, class T, class BinaryOperation&gt;
T transform_reduce(InputIterator first, InputIterator last,
                   UnaryOperation unary_op, T init, BinaryOperation binary_op);</cxx-signature>

        <cxx-returns>
          <code><em>GENERALIZED_SUM</em>(binary_op, init, unary_op(*first), ..., unary_op(*(first + (last - first) -</code><br>
          <code>1)))</code>.
        </cxx-returns>

        <cxx-requires>
          Neither <code>unary_op</code> nor <code>binary_op</code> shall invalidate subranges, or modify elements in the range <code>[first,last)</code>.
        </cxx-requires>

        <cxx-complexity>
          O(<code>last - first</code>) applications each of <code>unary_op</code> and <code>binary_op</code>.
        </cxx-complexity>

        <cxx-notes>
          <code>transform_reduce</code> does not apply <code>unary_op</code> to <code>init</code>.
        </cxx-notes>
      </cxx-function>
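
      <p>
        A minimal sequential sketch of the returned value follows. It assumes that folding from left to
        right is one admissible ordering and grouping of <code><em>GENERALIZED_SUM</em></code>; a parallel
        implementation is free to choose another. The namespace <code>sketch</code> and the helper
        <code>example_transform_reduce</code> are hypothetical names used only for illustration.
      </p>

```cpp
#include <cassert>
#include <vector>

namespace sketch {  // hypothetical namespace, for illustration only

// Sequential reference using the parameter order of the signature above:
// unary_op is applied to each element, never to init.
template <class InputIterator, class UnaryOperation, class T, class BinaryOperation>
T transform_reduce(InputIterator first, InputIterator last,
                   UnaryOperation unary_op, T init, BinaryOperation binary_op) {
  for (; first != last; ++first) {
    init = binary_op(init, unary_op(*first));
  }
  return init;
}

// Hypothetical usage helper: 10 + 1*1 + 2*2 + 3*3.
inline int example_transform_reduce() {
  const std::vector<int> v{1, 2, 3};
  return sketch::transform_reduce(
      v.begin(), v.end(),
      [](int x) { return x * x; },         // unary_op: square each element
      10,                                  // init: left untransformed
      [](int a, int b) { return a + b; }); // binary_op: addition
}

}  // namespace sketch
```

      <p>
        Under this sketch the helper returns 24 rather than 114, because <code>init</code> is combined as
        is and is not squared.
      </p>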
    </cxx-section>

    <cxx-section id="parallel.alg.transform.exclusive.scan">
      <h1>Transform exclusive scan</h1>

      <cxx-function>
        <cxx-signature>template&lt;class InputIterator, class OutputIterator,
      class UnaryOperation,
      class T, class BinaryOperation&gt;
OutputIterator transform_exclusive_scan(InputIterator first, InputIterator last,
                                        OutputIterator result,
                                        UnaryOperation unary_op,
                                        T init, BinaryOperation binary_op);</cxx-signature>

        <cxx-effects>
          Assigns through each iterator <code>i</code> in <code>[result,result + (last - first))</code> the value of
          <code><em>GENERALIZED_NONCOMMUTATIVE_SUM</em>(binary_op, init, unary_op(*first), ..., unary_op(*(first + (i</code><br>
          <code>- result) - 1)))</code>.
        </cxx-effects>

        <cxx-returns>
          The end of the resulting range beginning at <code>result</code>.
        </cxx-returns>

        <cxx-requires>
          Neither <code>unary_op</code> nor <code>binary_op</code> shall invalidate iterators or subranges, or modify elements in the
          ranges <code>[first,last)</code> or <code>[result,result + (last - first))</code>.
        </cxx-requires>

        <cxx-complexity>
          O(<code>last - first</code>) applications each of <code>unary_op</code> and <code>binary_op</code>.
        </cxx-complexity>

        <cxx-notes>
          The difference between <code>transform_exclusive_scan</code> and <code>transform_inclusive_scan</code> is that <code>transform_exclusive_scan</code>
          excludes the <code>i</code>th input element from the <code>i</code>th sum. If <code>binary_op</code> is not mathematically associative, the behavior of
          <code>transform_exclusive_scan</code> may be non-deterministic. <code>transform_exclusive_scan</code> does not apply <code>unary_op</code> to <code>init</code>.
        </cxx-notes>
      </cxx-function>
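
      <p>
        The following sequential sketch illustrates the effects described above, again assuming that a
        left-to-right fold is one admissible grouping of
        <code><em>GENERALIZED_NONCOMMUTATIVE_SUM</em></code>. The namespace <code>sketch</code> and the
        helper <code>example_transform_exclusive_scan</code> are hypothetical names, not part of the
        interface specified here.
      </p>

```cpp
#include <cassert>
#include <vector>

namespace sketch {  // hypothetical namespace, for illustration only

// Sequential reference: element k of the output combines init with the
// transformed inputs 0..k-1, so the kth input is excluded from the kth sum.
template <class InputIterator, class OutputIterator, class UnaryOperation,
          class T, class BinaryOperation>
OutputIterator transform_exclusive_scan(InputIterator first, InputIterator last,
                                        OutputIterator result,
                                        UnaryOperation unary_op, T init,
                                        BinaryOperation binary_op) {
  for (; first != last; ++first) {
    // Read the input before writing, so that result == first would still work.
    // (This computes one unused sum for the last element; still linear.)
    T next = binary_op(init, unary_op(*first));
    *result++ = init;
    init = next;
  }
  return result;
}

// Hypothetical usage helper: squares of {1, 2, 3} scanned with init 10.
inline std::vector<int> example_transform_exclusive_scan() {
  const std::vector<int> in{1, 2, 3};
  std::vector<int> out(in.size());
  auto end = sketch::transform_exclusive_scan(
      in.begin(), in.end(), out.begin(),
      [](int x) { return x * x; }, 10,
      [](int a, int b) { return a + b; });
  assert(end == out.end());  // returns the end of the resulting range
  (void)end;
  return out;  // {10, 11, 15}
}

}  // namespace sketch
```

      <p>
        The first output element is <code>init</code> itself, and the square of the last input (9) never
        appears in the output.
      </p>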
    </cxx-section>

    <cxx-section id="parallel.alg.transform.inclusive.scan">
      <h1>Transform inclusive scan</h1>

      <cxx-function>
        <cxx-signature>template&lt;class InputIterator, class OutputIterator,
      class UnaryOperation,
      class BinaryOperation&gt;
OutputIterator transform_inclusive_scan(InputIterator first, InputIterator last,
                                        OutputIterator result,
                                        UnaryOperation unary_op,
                                        BinaryOperation binary_op);</cxx-signature>

        <cxx-signature>template&lt;class InputIterator, class OutputIterator,
      class UnaryOperation,
      class BinaryOperation, class T&gt;
OutputIterator transform_inclusive_scan(InputIterator first, InputIterator last,
                                        OutputIterator result,
                                        UnaryOperation unary_op,
                                        BinaryOperation binary_op, T init);</cxx-signature>

        <cxx-effects>
          Assigns through each iterator <code>i</code> in <code>[result,result + (last - first))</code> the value of
          <code><em>GENERALIZED_NONCOMMUTATIVE_SUM</em>(binary_op, unary_op(*first), ..., unary_op(*(first + (i -</code><br>
          <code>result))))</code> or
          <code><em>GENERALIZED_NONCOMMUTATIVE_SUM</em>(binary_op, init, unary_op(*first), ..., unary_op(*(first + (i</code><br>
          <code>- result))))</code>
          if <code>init</code> is provided.
        </cxx-effects>

        <cxx-returns>
          The end of the resulting range beginning at <code>result</code>.
        </cxx-returns>

        <cxx-requires>
          Neither <code>unary_op</code> nor <code>binary_op</code> shall invalidate iterators or subranges, or modify elements in the ranges <code>[first,last)</code>
          or <code>[result,result + (last - first))</code>.
        </cxx-requires>

        <cxx-complexity>
          O(<code>last - first</code>) applications each of <code>unary_op</code> and <code>binary_op</code>.
        </cxx-complexity>

        <cxx-notes>
          The difference between <code>transform_exclusive_scan</code> and <code>transform_inclusive_scan</code> is that <code>transform_inclusive_scan</code>
          includes the <code>i</code>th input element in the <code>i</code>th sum. If <code>binary_op</code> is not mathematically associative, the behavior of
          <code>transform_inclusive_scan</code> may be non-deterministic. <code>transform_inclusive_scan</code> does not apply <code>unary_op</code> to <code>init</code>.
        </cxx-notes>
      </cxx-function>
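
      <p>
        A minimal sequential sketch of both overloads follows, assuming once more that a left-to-right
        fold is one admissible grouping of <code><em>GENERALIZED_NONCOMMUTATIVE_SUM</em></code>. The
        namespace <code>sketch</code> and the helper <code>example_transform_inclusive_scan</code> are
        hypothetical names used only for illustration.
      </p>

```cpp
#include <cassert>
#include <vector>

namespace sketch {  // hypothetical namespace, for illustration only

// Overload with init: the running sum starts at init, which is not transformed.
template <class InputIterator, class OutputIterator, class UnaryOperation,
          class BinaryOperation, class T>
OutputIterator transform_inclusive_scan(InputIterator first, InputIterator last,
                                        OutputIterator result,
                                        UnaryOperation unary_op,
                                        BinaryOperation binary_op, T init) {
  for (; first != last; ++first) {
    init = binary_op(init, unary_op(*first));
    *result++ = init;
  }
  return result;
}

// Overload without init: the running sum starts at the first transformed element.
template <class InputIterator, class OutputIterator, class UnaryOperation,
          class BinaryOperation>
OutputIterator transform_inclusive_scan(InputIterator first, InputIterator last,
                                        OutputIterator result,
                                        UnaryOperation unary_op,
                                        BinaryOperation binary_op) {
  if (first == last) return result;
  auto acc = unary_op(*first);
  *result++ = acc;
  for (++first; first != last; ++first) {
    acc = binary_op(acc, unary_op(*first));
    *result++ = acc;
  }
  return result;
}

// Hypothetical usage helper: running sums of the squares of {1, 2, 3}.
inline std::vector<int> example_transform_inclusive_scan() {
  const std::vector<int> in{1, 2, 3};
  std::vector<int> out(in.size());
  auto end = sketch::transform_inclusive_scan(
      in.begin(), in.end(), out.begin(),
      [](int x) { return x * x; },
      [](int a, int b) { return a + b; });
  assert(end == out.end());  // returns the end of the resulting range
  (void)end;
  return out;  // {1, 5, 14}
}

}  // namespace sketch
```

      <p>
        Passing an <code>init</code> of 10 to the second overload would give 11, 15, 24 under this sketch,
        since every output element then includes <code>init</code> as well as the <code>i</code>th
        transformed input.
      </p>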
    </cxx-section>
  </cxx-section>
</cxx-clause>