Fixed The C Loop So you don't have to. #285

JustForCollege · 2024-12-22T05:16:57Z

Added a Casey-inspired macro.

Thanks,

He-Pin · 2024-12-22T07:45:59Z

loops/c/code.c

 int main (int argc, char** argv) {
  int u = atoi(argv[1]);               // Get an input number from the command line
  srand(time(NULL));                   // FIX random seed
  int r = rand() % 10000;              // Get a random integer 0 <= r < 10k
  int32_t a[10000] = {0};              // Array of 10k elements initialized to 0
  for (int i = 0; i < 10000; i++) {    // 10k outer loop iterations
    for (int j = 0; j < 100000; j++) { // 100k inner loop iterations, per outer loop iteration
-      a[i] = a[i] + j%u;               // Simple sum
+      a[i] = a[i] + REM(j, u);         // Simple sum


Cool, but why doesn't the compiler do this itself?

I don't know, to be honest. The only time I've seen a compiler optimize a modulo is when the divisor is a power of 2; in that case it replaces the operation with an AND instruction. For example, if

a = 35
b = 32

a % b = a & (b - 1)   (valid when b is a power of 2 and a is non-negative)

which in this case is 3
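A minimal C sketch of the power-of-two rewrite described above; the helper name `mod_pow2` is hypothetical. The rewrite is exact for unsigned operands. For a negative signed dividend, C's `%` truncates toward zero (`-35 % 32` is `-3`) while the masked value would be `29`, which is why compilers emit extra fix-up instructions for signed types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: a % b for unsigned a when b is a power of two.
 * b - 1 has every bit below b's single set bit turned on, so the AND
 * keeps exactly the low bits that make up the remainder. */
uint32_t mod_pow2(uint32_t a, uint32_t b) {
    /* b must be a nonzero power of two, i.e. exactly one bit set. */
    assert(b != 0 && (b & (b - 1)) == 0);
    return a & (b - 1);
}
```

For example, `mod_pow2(35, 32)` returns 3, matching `35 % 32`.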

PEZ

Thanks for contributing! 🙏

Can you explain this for someone not familiar with Casey? What are the trade-offs? (I am assuming there are trade-offs, since the compiler doesn't do this optimization.) Please also provide benchmarks showing the effect of the change, to motivate the use of a macro.

JustForCollege · 2024-12-22T13:17:24Z

@PEZ @He-Pin A full explanation can be found in this video (https://www.youtube.com/watch?v=RrHGX1wwSYM). The short version: the original code can't be vectorized, because it uses the idiv instruction (on x86 machines, for example), which is very slow. This version also uses division, but in a form the compiler can vectorize with SIMD (Single Instruction, Multiple Data), where one instruction operates on several pieces of data at once instead of just one. Why the compiler doesn't already do this is beyond my knowledge, to be honest.
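To make the idea concrete, here is a hypothetical C sketch. It is not the `REM` macro from this PR (its definition isn't shown in this thread); `REM_F64`, `sum_rem_f64`, and `sum_rem_int` are illustrative names. It assumes the approach works as described above: x86 SSE/AVX provide packed floating-point division but no packed integer division, so computing the quotient in `double` and recovering the remainder from it gives the compiler an operation it can map onto SIMD lanes. Whether a given compiler actually vectorizes the loop depends on the target flags, so it is worth checking the generated assembly rather than assuming.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical remainder macro (NOT the PR's REM): a - trunc(a / b) * b,
 * with the quotient computed in double precision.
 * Assumes a >= 0 and b > 0, both fitting in int32_t. For that range a
 * double's 53-bit mantissa keeps the rounded quotient below the next
 * integer, so truncation yields the exact integer quotient.
 * Like any function-style macro, it evaluates its arguments twice. */
#define REM_F64(a, b) ((a) - (int32_t)((double)(a) / (double)(b)) * (b))

/* Same shape as the inner loop in loops/c/code.c, using REM_F64.
 * The int64_t accumulator avoids signed overflow for larger n or u. */
int64_t sum_rem_f64(int32_t n, int32_t u) {
    assert(n >= 0 && u > 0);
    int64_t acc = 0;
    for (int32_t j = 0; j < n; j++) {
        acc += REM_F64(j, u);
    }
    return acc;
}

/* Reference version using the integer % operator (idiv on x86). */
int64_t sum_rem_int(int32_t n, int32_t u) {
    assert(n >= 0 && u > 0);
    int64_t acc = 0;
    for (int32_t j = 0; j < n; j++) {
        acc += j % u;
    }
    return acc;
}
```

Comparing the two functions over the same range is a cheap correctness check before timing them against each other.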

alberts8 · 2024-12-23T21:29:32Z

Without enabling a more advanced instruction set like AVX2 or AVX-512, this change might actually slow things down.
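One way to confirm which instruction set a build actually targets is to check the feature macros that GCC and Clang predefine when flags such as `-mavx2`, `-mavx512f`, or `-march=native` are passed. The sketch below (the names `simd_target` and `print_simd_target` are hypothetical) reports the widest x86 level among those macros; on other architectures, or without such flags, it falls through to a baseline string.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reports the widest x86 SIMD extension this
 * translation unit was compiled for, based on GCC/Clang predefined
 * macros. These reflect compile-time flags, not the CPU that runs it. */
const char *simd_target(void) {
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "baseline (no x86 SIMD macros defined)";
#endif
}

/* Prints the compile-time SIMD target, e.g. from a benchmark's main. */
void print_simd_target(void) {
    printf("compiled for: %s\n", simd_target());
}
```

On a default x86-64 GCC or Clang build this typically reports SSE2; adding `-mavx2` makes it report AVX2. To see whether the loop itself was vectorized, GCC's `-fopt-info-vec` and Clang's `-Rpass=loop-vectorize` print vectorization remarks.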

Additionally, the same change could be made for many other languages.

JustForCollege added 2 commits December 22, 2024 07:15

Update code.c · e84704c

Update code.c · 35a7e4c

He-Pin reviewed Dec 22, 2024

PEZ reviewed Dec 22, 2024

Update code.c · 20cd818

Ichoran mentioned this pull request Dec 22, 2024: trigger SIMD compiler optimizations #292 (Closed)
