Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inconsistent alignment of cl_* types on i386 #149

Open
anbe42 opened this issue Jan 29, 2021 · 5 comments
Open

inconsistent alignment of cl_* types on i386 #149

anbe42 opened this issue Jan 29, 2021 · 5 comments

Comments

@anbe42
Copy link

anbe42 commented Jan 29, 2021

While looking further into pocl/pocl#801 on Debian i386 I found some inconsistencies in the alignment of cl_* types resulting in a struct {cl_int; cl_long;}; to have size 12 on i386 but 16 elsewhere (including struct {int; long;}; in OpenCL).
This causes mismatches when passing such a struct as a kernel argument.
We all know that i386 is 'special' with its alignment requirements, but does this need to carry on into cl_*?
I found it especially curious to see differences between results in a i386 chroot and a 32-bit program built on the x86-64 host system with -m32.
(All this was observed on Debian sid/testing with gcc-10, I haven't tested this on other distributions.)

I can't conclude from the standard that cl_long in C should have an alignment of 8 (only for the OpenCL long type this is required), but maybe I've just missed something?
But doesn't a cl_long without well defined alignment somehow break its purpose?

A small program to show the alignments of some types:

#define CL_TARGET_OPENCL_VERSION 220
#include <CL/opencl.h>
#include <stdio.h>

struct cl_int_long {
        cl_int a;
        cl_long b;
};

#define S(t) #t
#define X(t) printf("%20s  sizeof()=%2d _Alignof()=%2d\n", S(t), sizeof(t), _Alignof(t))

int main()
{
        X(int);
        X(long);
        X(int64_t);
        X(float);
        X(double);
        X(cl_int);
        X(cl_long);
        X(cl_long2);
        X(cl_float);
        X(cl_double);
        X(struct cl_int_long);
}
// platform: x86-64
                 int  sizeof()= 4 _Alignof()= 4
                long  sizeof()= 8 _Alignof()= 8
             int64_t  sizeof()= 8 _Alignof()= 8
               float  sizeof()= 4 _Alignof()= 4
              double  sizeof()= 8 _Alignof()= 8
              cl_int  sizeof()= 4 _Alignof()= 4
             cl_long  sizeof()= 8 _Alignof()= 8
            cl_long2  sizeof()=16 _Alignof()=16
            cl_float  sizeof()= 4 _Alignof()= 4
           cl_double  sizeof()= 8 _Alignof()= 8
  struct cl_int_long  sizeof()=16 _Alignof()= 8
// platform: x86-64 with -m32
                 int  sizeof()= 4 _Alignof()= 4
                long  sizeof()= 4 _Alignof()= 4
             int64_t  sizeof()= 8 _Alignof()= 4
               float  sizeof()= 4 _Alignof()= 4
              double  sizeof()= 8 _Alignof()= 4
              cl_int  sizeof()= 4 _Alignof()= 4
             cl_long  sizeof()= 8 _Alignof()= 8
            cl_long2  sizeof()=16 _Alignof()=16
            cl_float  sizeof()= 4 _Alignof()= 4
           cl_double  sizeof()= 8 _Alignof()= 8
  struct cl_int_long  sizeof()=16 _Alignof()= 8
// platform: i386
                 int  sizeof()= 4 _Alignof()= 4
                long  sizeof()= 4 _Alignof()= 4
             int64_t  sizeof()= 8 _Alignof()= 4
               float  sizeof()= 4 _Alignof()= 4
              double  sizeof()= 8 _Alignof()= 4
              cl_int  sizeof()= 4 _Alignof()= 4
             cl_long  sizeof()= 8 _Alignof()= 4 !!!
            cl_long2  sizeof()=16 _Alignof()=16
            cl_float  sizeof()= 4 _Alignof()= 4
           cl_double  sizeof()= 8 _Alignof()= 4 !!!
  struct cl_int_long  sizeof()=12 _Alignof()= 4 !!!
// platform: armhf
                 int  sizeof()= 4 _Alignof()= 4
                long  sizeof()= 4 _Alignof()= 4
             int64_t  sizeof()= 8 _Alignof()= 8
               float  sizeof()= 4 _Alignof()= 4
              double  sizeof()= 8 _Alignof()= 8
              cl_int  sizeof()= 4 _Alignof()= 4
             cl_long  sizeof()= 8 _Alignof()= 8
            cl_long2  sizeof()=16 _Alignof()=16
            cl_float  sizeof()= 4 _Alignof()= 4
           cl_double  sizeof()= 8 _Alignof()= 8
  struct cl_int_long  sizeof()=16 _Alignof()= 8
@bashbaug
Copy link
Contributor

FWIW, we used to have alignment attributes on these types, but removed them a couple of years ago:

#40

The POCL issue linked above seems to be about passing structs from host code to kernel code. Is that correct? Just trying to understand the bounds of the problem. Thanks!

@anbe42
Copy link
Author

anbe42 commented Jan 29, 2021

The test in question is https://github.com/pocl/pocl/blob/master/tests/regression/test_structs_as_args.cpp
The kernel gets a struct as argument: kernel void test_kernel(test_struct input, global int* output), the struct is declared in C at line 55 and at line 79 in the OpenCL kernel.

The error output is F(4: 0 != 5) F(5: -2147483648 != 6) F(6: 0 != 7) F(7: 0 != 8), which means

struct test_struct {
    cl_int elementA;  // OK
    cl_int elementB;  // OK
    cl_long elementC;  // OK
    cl_char elementD;  // OK
    // C inserts 3 bytes padding on i386 and 7 bytes elsewhere; OpenCL kernel inserts 7 bytes padding
    cl_long elementE;  // host set value 5, kernel read 0 (after truncation to (int))
    cl_float elementF;  // host set value 6, kernel read some fp value, after conversion to (int) that yielded -2147483648 (aka CL_INT_MIN)
    cl_short elementG;  // host set value 7, kernel read 0
    cl_double elementH;  // host set value 8, kernel read 0
};

I think dropping the __attribute__((aligned(8))) was wrong (at least when "funny ABIs" like the SysV I386 ABI are in use, I don't know about Windows).

Should there be tests for alignment expectations and struct paddings?

@bashbaug
Copy link
Contributor

Thank you for your patience and sorry for the slow reply.

I'm not sure that there is a "right answer" to this problem.

On the one hand, including the alignment attribute is convenient, assuming it it works, and when using the "cl"-prefixed types.

On the other hand, including the alignment attribute may not be supported everywhere, has its own set of problems (the GCC STL warning mentioned in #40 was especially ugly), and doesn't "just work" when using standard C or C++ types vs. the "cl"-prefixed types.

Could / should alignment attributes be placed on the struct members in the test host code itself (and perhaps in the device code also?) to ensure a consistent layout between the host and the device?

@anbe42
Copy link
Author

anbe42 commented Feb 17, 2021

What about a patch like https://gist.github.com/anbe42/3e3c6fadf25b0619abaa8206b46efc87 ?
It only adds back the __attribute__((aligned(8))) (and therefore the 'ignoring attributes on template argument' warning in C++) on the types and platforms where the alignment of the underlying types is weaker than mandated by OpenCL.

That does solve the alignment inconsistency for the cl_* types and it fixes struct {cl_int; cl_long;};, but it does not fix std::tuple<cl_int, cl_long> (but at least you get a (confusing) warning in this case). (Can I even pass std::tuple<> to a kernel?)

I do not know whether this issue affects the 32-bit windows world and if it needs/can/should be fixed there, too.

Of course it can be fixed in the struct used in the pocl testcase, but if writing portable OpenCL code using structs requires manual struct alignment that would really be a pity:

struct cl_int_long {
        cl_int a;
        cl_long b __attribute__((aligned(8)));
};

@anbe42
Copy link
Author

anbe42 commented Feb 17, 2021

Enhanced version of my test program, supporting both C and C++:

#define CL_TARGET_OPENCL_VERSION 220
#include <CL/opencl.h>
#include <stdio.h>

#ifdef __cplusplus
#include <tuple>

typedef std::tuple<cl_int, cl_long> tuple_cl_int_long;
typedef std::tuple<cl_int, cl_int2> tuple_cl_int_int2;

#define myalignof alignof
#else
#define myalignof _Alignof
#endif

struct cl_int_long {
        cl_int a;
        cl_long b;
};

struct cl_int_long_aligned {
        cl_int a;
        cl_long b __attribute__((aligned(8)));
};

#define S(t) #t
#define S_(t) S(t)
#define X(t,a) do { \
        printf("%30s  sizeof()=%2zd  " S_(myalignof) "()=%2zd", S(t), sizeof(t), myalignof(t)); \
        if (myalignof(t) != (a)) printf(" != %d", (int)(a)); \
        printf("\n"); \
        } while(0)

#ifdef __i386__
#define i386_align(i) 4
#else
#define i386_align(i) (i)
#endif

int main()
{
        X(int, 4);
        X(long, i386_align(sizeof(long)));
        X(int64_t, i386_align(8));
        X(float, 4);
        X(double, i386_align(8));
        X(cl_int, 4);
        X(cl_int2, 8);
        X(cl_long, 8);
        X(cl_long2, 16);
        X(cl_float, 4);
        X(cl_double, 8);
        X(struct cl_int_long, 8);
        X(struct cl_int_long_aligned, 8);
#ifdef __cplusplus
        X(tuple_cl_int_long, 8);
        X(tuple_cl_int_int2, 8);
#endif
}

anbe42 added a commit to anbe42/pocl that referenced this issue Jun 29, 2023
 141/143 Test  pocl#85: regression/struct_kernel_arguments ..................................................***Failed  Error regular expression found in output. Regex=[FAIL]  4.58 sec
 CMake Error at /build/pocl-1.4/cmake/run_test.cmake:34 (message):
  FAIL: Test exited with nonzero code (1):
  /build/pocl-1.4/obj-i686-linux-gnu/tests/regression/test_structs_as_args

  STDOUT:

  F(4: 0 != 5) F(5: -2147483648 != 6) F(6: 0 != 7) F(7: 0 != 8)

  STDERR:

 -- OK

on i386, the default alignment is 4 for 64-bit types, too

the OpenCL standard is only explicit about alignment requirements for
OpenCL types, but not for the corresponding cl_* types in C
KhronosGroup/OpenCL-Headers#149

fixes: pocl#801
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants