CBLAS/LAPACKE extension for 64bit integer #666

Closed
mkrainiuk opened this issue May 5, 2022 · 31 comments

@mkrainiuk
Contributor

mkrainiuk commented May 5, 2022

Intro

CBLAS/LAPACKE wrappers support 32-bit and 64-bit integers under the same function names, located in different libraries. This approach does not allow mixing both libraries in one environment because the symbols conflict.

This FR is for discussing a potential solution that avoids this problem and gives stable behavior in any environment for applications/libraries that depend on CBLAS/LAPACKE built with 64-bit integers, or on any other project with a similar API.

Problem

Similar to Fortran API where integer with no size specified is used, CBLAS/LAPACKE API integer type can be selected by special build option during compilation.
Example for CBLAS:

#ifdef WeirdNEC
#define CBLAS_INT int64_t
#else
#define CBLAS_INT int32_t
#endif

Since C/Fortran symbol names generally do not encode the data types (unlike C++), the linker cannot detect whether the resolved symbols use the correct integer type. As a result, a symbol mismatch can cause unexpected behavior such as incorrect results, data corruption, or segmentation faults.

Example:

A plugin "X" uses the CBLAS API with 64-bit integers because it works with element counts that do not fit in 32 bits. Another plugin "Y" in the same complex application environment loads a "libcblas.so" built with 32-bit integers as a dependency. Because the first loaded "libcblas.so" is also used to resolve the symbols for plugin "X", the application hits a segmentation fault during execution.
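
The failure mode above can be illustrated with a small sketch. It assumes a callee built for 32-bit integers that only ever looks at the low 32 bits of the count it receives; the helper name `lp64_view_of` is hypothetical and is not part of CBLAS or LAPACKE.

```c
#include <stdint.h>

/* Hypothetical helper: models what a routine built with 32-bit integers
 * would "see" if a caller handed it a 64-bit count. Only the low 32 bits
 * survive; the high bits are silently dropped. (Converting a value above
 * INT32_MAX to int32_t is implementation-defined before C23.) */
int32_t lp64_view_of(int64_t n)
{
    uint32_t low = (uint32_t)((uint64_t)n & 0xFFFFFFFFu);
    return (int32_t)low;
}
```

With a count of 2^32 + 5 elements, such a callee would process only 5 of them without any diagnostic, which is why the mismatch shows up as wrong results or crashes rather than as a link error.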

Related Work

Proposal

  • Define a suffix for API with 64bit integer support so that users can control integer type and right API call in the code and avoid any runtime conflicts

    • The 64bit integer API was selected to be extended with the suffix because the 32bit integer API is the default for most libraries. So it could be easier to change only the code where the 64bit integer API is required, and keep the default API unchanged.
  • Suffix considerations

    • Julia uses OpenBLAS with 64_ suffix for all C and Fortran symbols.
      Please note, from the API perspective it will be two different suffixes for C and Fortran APIs:
      • _64 for Fortran API, because e.g., snrm2/snrm2_64 functions will be converted to snrm2_/snrm2_64_ symbols (default gfortran/ifort compilers behavior on Unix)
      • 64_ for C API in order to get same symbol suffix, because e.g. cblas_snrm2/cblas_snrm264_ functions will be converted to cblas_snrm2/cblas_snrm264_ symbol (default gcc/icc/clang compilers behavior)
    • FlexiBLAS considered adding _64, it will improve readability for function names with numbers, e.g.: for API DGGES3 it is better to have DGGES3_64 instead of DGGES364.
    • Intel oneMKL and NVIDIA cuBLAS use _64 suffix
  • Proposed suffix for CBLAS/LAPACK API: _64
    Please note, _64 must be at the end of the function name for the symbol identification simplicity, including cases like LAPACKE_dgeqrf_work, ILP64 API will be LAPACKE_dgeqrf_work_64, not LAPACKE_dgeqrf_64_work

Example:

// Current API
float      cblas_snrm2(const CBLAS_INT N, const float *X, const CBLAS_INT incX);
lapack_int LAPACKE_dgeev( int matrix_layout, char jobvl, char jobvr, lapack_int n, double* a, lapack_int lda, ... );

// 64bit fixed width integer API
float   cblas_snrm2_64(const int64_t N, const float *X, const int64_t incX); 
int64_t LAPACKE_dgeev_64( int matrix_layout, char jobvl, char jobvr, int64_t n, double* a, int64_t lda, ... );

int main() {
    int64_t N, incX;      // 64bit arguments for the suffixed API
    int32_t n, lda, ...;  // 32bit arguments for the default API
    ...
    cblas_snrm2_64(N, X, incX);
    LAPACKE_dgeev(matrix_layout, jobvl, jobvr, n, a, lda, ...);
    ...
}

Other Considerations

  • Update APIs that do not have dependency on the Integer type, e.g.
    void cblas_srotg(float *a, float *b, float *c, float *s);
  • Do we need to consider other data types that could benefit from this approach?
  • Add function mangling to the header so that users can convert the names of all functions in the application at compile time, the same way it is currently supported for the integer type: 32bit by default, 64bit if a special macro is defined.
  • Extend the Fortran API. This should partially be done anyway in order to enable the extensions for the CBLAS/LAPACKE wrappers.
@mkrainiuk
Contributor Author

Summoning people who participated in the discussions mentioned in the Related Work section:
Julia @staticfloat @ViralBShah
FlexiBLAS @grisuthedragon
BLIS @devinamatthews
OpenBLAS @xianyi

@ViralBShah

ViralBShah commented Jun 16, 2023

For Julia: @amontoison
For OpenBLAS: @martin-frbg

Also related: #824

@martin-frbg
Collaborator

Yes, I'm lurking... OpenBLAS currently does it by running a big objcopy after the initial build, which is far from ideal. And standardization of the expected ILP64 suffix across (at least) Julia/Go/(Num|Sci)Py is obviously desirable.

@grisuthedragon
Contributor

And standardization of the expected ILP64 suffix across (at least) Julia/Go/(Num|Sci)Py is obviously desirable.

OK, I have compiled what @mkrainiuk proposed, what some of us already discussed in mpimd-csc/flexiblas#12, and what I discussed with a colleague.

Let us distinguish between suffixed and not suffixed. First, we have the ones without suffixes:

  • libblas.so - This contains the 32-bit integer build with no suffixes. This is the default name and maintains compatibility with old codes, makefiles, build systems. So no one with old code will be affected by the change.
  • libblas64.so - This contains the 64-bit integer build without suffixes.

I would like to introduce the 32-bit and 64-bit builds with suffixes. This could be for example

  • libblas_32.so - The 32-bit integer build with suffixes.
  • libblas_64.so - The 64-bit integer build with suffixes.

Regarding the rules for suffixes for each symbol, my suggestion is this: we look at all symbols from the point of view of their native interface, that is, before the compiler does any name mangling. This means that the normal BLAS and LAPACK routines are treated in the Fortran context, and CBLAS and LAPACKE from a C perspective. This keeps the approach invariant even under unusual compilers and ABI definitions, such as the IBM/Sun compilers or macOS (as @ViralBShah reported in mpimd-csc/flexiblas#12 (comment)).

As for the suffixes themselves, I thought of the following:

  • BLAS/LAPACK: DGEMM -> DGEMM_32 and DGEMM_64. The addition of the underscore is important so that routines like DGEQP3 do not end in DGEQP332 and DGEQP364.
  • CBLAS: cblas_dgemm -> cblas_dgemm_32 and cblas_dgemm_64
  • LAPACKE: LAPACKE_dgeqrf -> LAPACKE_dgeqrf_64. But in case of the _work routines, we could either go to LAPACKE_GEQRF_WORK -> LAPACKE_GEQRF_64_WORK or LAPACKE_GEQRF_WORK -> LAPACKE_GEQRF_WORK_64. I prefer the latter for simplicity when writing macros.
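
The preference for putting the suffix at the very end can be sketched with token pasting: a single macro then covers plain routines and `_work` routines alike, while a `_64_work` form would need the name to be split first. The macro names below (`EXAMPLE_ILP64`, `EXAMPLE_STR`) are hypothetical and not taken from any LAPACK header.

```c
/* Hypothetical helper macros, for illustration only. */
#define EXAMPLE_ILP64(name) name##_64   /* append the suffix at the very end */
#define EXAMPLE_STR_(x) #x
#define EXAMPLE_STR(x) EXAMPLE_STR_(x)  /* expand the argument, then stringify */

/* Each function returns the ILP64 name that the one macro produces. */
const char *ilp64_name_of_dgeqrf(void)
{
    return EXAMPLE_STR(EXAMPLE_ILP64(LAPACKE_dgeqrf));
}

const char *ilp64_name_of_dgeqrf_work(void)
{
    return EXAMPLE_STR(EXAMPLE_ILP64(LAPACKE_dgeqrf_work));
}

const char *ilp64_name_of_dgeqp3(void)
{
    return EXAMPLE_STR(EXAMPLE_ILP64(dgeqp3));
}
```

The underscore in the suffix also keeps names that end in a digit readable, since `dgeqp3` becomes `dgeqp3_64` rather than `dgeqp364`.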

@martin-frbg
Collaborator

I do not like the additional complexity of having a libblas64.so with non-suffixed symbols, but I guess it is inevitable for backwards compatibility ? Fully agreed on the leading underscore, and on not treating the "work" interfaces as something special - after all, they are (or may be) using ILP64 internally themselves, not just doing some work for another function that happens to be ILP64.
However, do we really need the _32 suffixes too, or could we just assume that any non-suffixed symbol is an implied _32 ? (I have to admit that enforcing suffixes would solve a potential problem with purely internal functions in OpenBLAS without having to mark them as hidden)

Cc @rgommers for NumPy and SciPy as the discussion seems to be gravitating towards here (we had something similar planned on a smaller scope) Maybe @kortschak or @vladimir-ch for Gonum if there is interest ?

@kortschak

Gonum has a focus on floats, so I'm not sure how much interest there is in this from us.

@martin-frbg
Collaborator

Thanks - discussion is entirely about large array addressing not fixed-point math.

@grisuthedragon
Contributor

I do not like the additional complexity of having a libblas64.so with non-suffixed symbols, but I guess it is inevitable for backwards compatibility ?

That allows recompiling projects with -fdefault-integer-8 or -i8 without changing any code (as long as it is purely Fortran).

Fully agreed on the leading underscore, and on not treating the "work" interfaces as something special - after all, they are (or may be) using ILP64 internally themselves, not just doing some work for another function that happens to be ILP64. However, do we really need the _32 suffixes too, or could we just assume that any non-suffixed symbol is an implied _32 ? (I have to admit that enforcing suffixes would solve a potential problem with purely internal functions in OpenBLAS without having to mark them as hidden)

The _32 suffixes are only for completeness, so I have no problem dropping this variant.

@grisuthedragon
Contributor

grisuthedragon commented Jun 20, 2023

@mkrainiuk
I saw you added the first steps towards the _64 in 4c2236a. Some tests showed the following problems:

  • the _64 is built although BUILD_INDEX64=OFF
  • the _64 is built together with the symbols without the suffix. Thus, linking 32 bit and 64 bit interfaces is impossible again.
  • I do not know if going through the preprocessor and the command line is the best way. Consider the following situation: we have a maximum command line length between 32 kB and 2 MB. With 32 kB and around 2000 BLAS/LAPACK routines, we have 16 characters per definition on average. With -DDGEQP3=DGEQP3_64 we get 19 characters (including a whitespace to separate them). This leads to strange situations on some platforms. I think a better way would be to provide a header file containing the translation.
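
The command-line arithmetic in the last bullet can be checked with a short sketch. The function names are hypothetical; the flag shape `-DNAME=NAME_64` and the 32 kB / 2 MB limits are the ones quoted above.

```c
#include <stddef.h>
#include <string.h>

/* Length of one "-DNAME=NAME_64" flag plus one separating space. */
size_t define_flag_length(const char *name)
{
    size_t n = strlen(name);
    return strlen("-D") + n + strlen("=") + n + strlen("_64") + 1;
}

/* Nonzero if `count` flags of average length `avg_len` exceed `limit` bytes. */
int exceeds_limit(size_t count, size_t avg_len, size_t limit)
{
    return count * avg_len > limit;
}
```

For `DGEQP3` the flag takes 19 bytes, so about 2000 such flags need roughly 38000 bytes, which is over a 32 kB limit but far below a 2 MB one.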

In general, it looks like it fits the suggestions we made here.

@mkrainiuk
Contributor Author

Thanks for the great feedback!

Hi @grisuthedragon

  • the _64 is built although BUILD_INDEX64=OFF

BUILD_INDEX64 and BUILD_INDEX64_EXT_API are completely independent options:

  • The main idea of BUILD_INDEX64_EXT_API option is to build additional set of symbols with "_64" suffix for ILP64 only that can be mixed with standard API for LP64 in one application or in one library.
  • BUILD_INDEX64 option changes the integer type from 32-bit to 64-bit for the standard API and the library name from cblas to cblas64, but in some cases having different library names can't guarantee that there won't be symbol conflicts when both libraries are loaded to the same env. So I guess the long term solution could be migrating to "_64" API for ILP64 and dropping BUILD_INDEX64 option completely so that the standard API will be always for LP64.
  • the _64 is built together with the symbols without the suffix. Thus, linking 32 bit and 64 bit interfaces is impossible again.

It's possible when the standard API is built for LP64 and the extended _64 API is built for ILP64, but it requires some code modifications for using _64 in an application:

int32_t m1, n1, lda1, incx1, incy1;
int64_t m2, n2, lda2, incx2, incy2;
...
cblas_dgemm(CblasColMajor, CblasNoTrans, m1, n1, ... );     //LP64 API
cblas_dgemm_64(CblasColMajor, CblasNoTrans, m2, n2, ...);   //ILP64 API
  • I do not know if going through the preprocessor and the command line is the best way. Consider the following situation: we have a maximum command line length between 32 kB and 2 MB. With 32 kB and around 2000 BLAS/LAPACK routines, we have 16 characters per definition on average. With -DDGEQP3=DGEQP3_64 we get 19 characters (including a whitespace to separate them). This leads to strange situations on some platforms. I think a better way would be to provide a header file containing the translation.

This is a good point. I considered a new header file, but it requires a manual update for every new function, so for CBLAS, where the number of functions is relatively small, I decided to generate the macros on the fly instead. I agree that this approach won't work for LAPACKE, so I'm looking for another solution that can also do this Fortran symbol renaming automatically during the build.

@grisuthedragon
Contributor

grisuthedragon commented Jun 22, 2023

@mkrainiuk

BUILD_INDEX64 option changes the integer type from 32-bit to 64-bit for the standard API and the library name from cblas to cblas64, but in some cases having different library names can't guarantee that there won't be symbol conflicts when both libraries are loaded to the same env. So I guess the long term solution could be migrating to "_64" API for ILP64 and dropping BUILD_INDEX64 option completely so that the standard API will be always for LP64.

In my opinion, as mentioned before, we need three build variants. First, the standard build, without any special options, suffixes, or similar, which leads to the LP64 variant, as you mentioned. Second, an ILP64 build without suffixes (and with 64 or _ilp64 added to the library name). This is required to rebuild applications with -i8 or -fdefault-integer-8 without code changes. And finally, the ILP64 build with suffixes, which, as you mentioned, should be the standard for ILP64 usage.
As far as I know from Julia (@ViralBShah), they are using the suffixed ILP64 library internally, and thus it is no problem to load other code that is linked against an LP64 BLAS. From this point of view, having the non-suffixed and the suffixed symbols in one library would be bad.

but in some cases having different library names can't guarantee that there won't be symbol conflicts when both libraries are loaded to the same env.

That's something nobody can guarantee and it is up to the programmer.

This is a good point, I considered new header file, but it requires manual update for any new function, so for CBLAS since the number of functions is relatively small I decided to generate the macro on the fly instead. I agree that this approach won't work for LAPACKE, so I'm looking for another solution that also can make this Fortran symbol renaming automatically during the build.

Since the set of functions changes only slowly, one can provide such a header file and eventually a script, which generates a new header from all the sources. One advantage of such a header file would be that one can provide it to the user as well to allow an easy migration between LP64 and ILP64 mode.
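
A translation header of the kind described here could look roughly like the following sketch. Everything in it is hypothetical: `toy_count` and `toy_count_64` stand in for a real LP64 routine and its `_64` twin, and `EXAMPLE_USE_ILP64` and `example_int` are made-up names, not existing CBLAS or LAPACKE identifiers.

```c
#include <stdint.h>

/* Toy stand-ins for an LP64 routine and its ILP64 "_64" twin. */
int32_t toy_count(int32_t n)    { return n; }
int64_t toy_count_64(int64_t n) { return n; }

/* --- what a hypothetical translation header could contain --- */
#define EXAMPLE_USE_ILP64 1        /* normally set by the user, e.g. via -D */
#ifdef EXAMPLE_USE_ILP64
typedef int64_t example_int;
#define toy_count toy_count_64     /* one such line per routine */
#else
typedef int32_t example_int;
#endif

/* User code is written once against the unsuffixed name. */
example_int call_through_header(example_int n)
{
    return toy_count(n);           /* resolves to toy_count_64 here */
}
```

Because the mapping lives in a header rather than on the command line, it is not subject to command-line length limits, and the same file could be shipped to users for migrating between LP64 and ILP64 mode.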

@langou
Contributor

langou commented Jun 22, 2023

I agree with @martin-frbg that the set of functions in LAPACK and LAPACKE changes slowly. For LAPACKE, at this point, there is no script that I am aware of and we are writing the LAPACKE layer functions by hand. This is possible because we only add a few routines at each release. Adding a layer related to LP64 / ILP64 / etc variants by hand would not be too much of an overkill. There is an exponential growth here though, but that would work. I am not asking for more work by hand but we are already generating S, C, D, Z by hand, and while not ideal that work-ish. The point is that LAPACK is slowly growing. @martin-frbg is correct.

@martin-frbg
Collaborator

Different Martin but of course I agree with him :)

@mkrainiuk
Contributor Author

In my opinion, as mentioned before, we need three build variants. First, the standard build, without any special options, suffixes, or similar, which leads to the LP64 variant, as you mentioned. Second, an ILP64 build without suffixes (and with 64 or _ilp64 added to the library name). This is required to rebuild applications with -i8 or -fdefault-integer-8 without code changes. And finally, the ILP64 build with suffixes, which, as you mentioned, should be the standard for ILP64 usage.
As far as I know from Julia (@ViralBShah), they are using the suffixed ILP64 library internally, and thus it is no problem to load other code that is linked against an LP64 BLAS. From this point of view, having the non-suffixed and the suffixed symbols in one library would be bad.

The first two build variants are supported. Could you please share more details on why two sets of symbols in one library would be bad? If one loads one set of symbols from the library (without suffixes) and does not load the other set (with suffixes), or vice versa, what kind of problems could it cause?

@mkrainiuk
Contributor Author

That's something nobody can guarantee and it is up to the programmer.

Right, but with the suffixes we can give a chance to programmers to ensure the correct ILP64 symbols are always used.

Since the set of functions changes only slowly, one can provide such a header file and eventually a script, which generates a new header from all the sources. One advantage of such a header file would be that one can provide it to the user as well to allow an easy migration between LP64 and ILP64 mode.

Agreed, so a manually updated header file could work too if I don't find a nice automatic solution.

@grisuthedragon
Contributor

A case where both sets of symbols in one library cause problems is easily constructed... On the one hand, Julia and Python load it via dlopen, where flags like RTLD_GLOBAL and RTLD_NOW can be specified. On the other hand, there are strange cross dependencies through third-level projects like qrupdate, arpack,... In combination with different linker options and link orders, this leads to hard-to-debug problems.

@ViralBShah

ViralBShah commented Jun 22, 2023

We implement the _64 suffix symbols in our LAPACK in Julia in a grotesque way: https://github.com/JuliaPackaging/Yggdrasil/blob/feaab2720976d2db53b80d408a0fd19a1f5042d1/L/LAPACK/common.jl#L291

Also, Apple is using a different convention in Accelerate for LAPACK ILP64. We use libblastrampoline to dispatch to those routines on macOS: JuliaLinearAlgebra/libblastrampoline#113

@mkrainiuk
Contributor Author

mkrainiuk commented Jun 22, 2023

The case where both symbols in one Library cause problems is easily constructed... On the one hand Julia, Python load it via dlopen and flags like RTLD_GLOBAL and RTLD_NOW can be specified. And on the other hand strange cross dependencies over third level projects Like qrupdate, arpack,... In combination with different linker options and orders this leads to hard-to-debug problems.

I'd expect that having an explicit _64 in the symbol name could help in the described case, because regardless of the loaded library it always points to the ILP64 implementation, whereas the standard name could be either LP64 or ILP64, depending on which library is picked up for symbol resolution.

@mkrainiuk
Contributor Author

Also, Apple is using a different convention in Accelerate for LAPACK ILP64. We use libblastrampoline to dispatch to those routines on macOS: JuliaLinearAlgebra/libblastrampoline#113

Thank you for bringing it up. It's an interesting approach to add to the Related Work.

@rgommers

Hi all, thanks for the very useful discussion and progress on this topic.

The issue description is pretty clear about the two ways this is currently done (the Julia/OpenBLAS way and the MKL/cuBLAS way), and proposes to go with the MKL/cuBLAS way - which is implemented in the master branch of this repo since a few weeks. However I think it did leave out some relevant context on other projects, as well as on the work needed to adapt to the choice. So I'd like to delve into that a bit to make sure that we're indeed all on the same page and will actually be able to converge to what's decided here.

C/Fortran API naming vs binary symbol naming

The _64 MKL style proposal starts from the API name: it appends _64 for both the C and Fortran APIs, and then the binary symbol names become that plus whatever compiler mangling makes of that. E.g. for gfortran on Linux: dgemm + _64 + _ -> dgemm_64_.

The 64_ Julia/OpenBLAS style applies compiler mangling first, and then appends the suffix. E.g., for gfortran on Linux: dgemm + _ + 64_ -> dgemm_64_.

For the most important/common cases we get a single trailing underscore and hence end up with the same binary symbol names for BLAS. And different ones for CBLAS:

suffix choice | base API name | binary symbol name | call from Fortran code | call from C code
------------- | ------------- | ------------------ | ---------------------- | -------------------
MKL _64       | dgemm         | dgemm_64_          | dgemm_64(...)          | dgemm_64_(...)
OpenBLAS 64_  | dgemm         | dgemm_64_          | dgemm_64(...)          | dgemm_64_(...)
MKL _64       | cblas_dgemm   | cblas_dgemm_64     | n/a                    | cblas_dgemm_64(...)
OpenBLAS 64_  | cblas_dgemm   | cblas_dgemm64_     | n/a                    | cblas_dgemm64_(...)

The story for LAPACK/LAPACKE will be the same; LAPACK will match, LAPACKE won't.
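
The two orderings can be written down as a small sketch. It assumes the common case discussed here: a Fortran compiler that appends exactly one trailing underscore (gfortran on Linux) and a C compiler that leaves names unchanged. The function names are hypothetical and only model how the symbol strings are composed.

```c
#include <stdio.h>
#include <stddef.h>

/* API-level suffix ("MKL style"): append the suffix to the API name first,
 * then let the Fortran compiler add its trailing underscore. */
void symbol_suffix_before_mangling(const char *api, int is_fortran,
                                   const char *suffix, char *out, size_t size)
{
    snprintf(out, size, "%s%s%s", api, suffix, is_fortran ? "_" : "");
}

/* Symbol-level suffix ("OpenBLAS style"): mangle first, then append. */
void symbol_suffix_after_mangling(const char *api, int is_fortran,
                                  const char *suffix, char *out, size_t size)
{
    snprintf(out, size, "%s%s%s", api, is_fortran ? "_" : "", suffix);
}
```

Under these assumptions both orderings give `dgemm_64_` for the Fortran routine, while `cblas_dgemm` becomes `cblas_dgemm_64` with the API-level suffix and `cblas_dgemm64_` with the symbol-level suffix, as in the table.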

Current status

When building current master of this repo on Linux with gcc/gfortran, we get:

$ # build with: cmake -DBUILD_INDEX64=ON -DBUILD_SHARED_LIBS=ON -DCBLAS=ON
$ nm -gD libblas64.so | rg dgemm 
0000000000027df0 T dgemm_
0000000000098040 T dgemm_64_
$ nm -gD libcblas64.so | rg dgemm
00000000000109e0 T cblas_dgemm
0000000000022920 T cblas_dgemm_64

For NumPy/SciPy we build OpenBLAS with make INTERFACE64=1 SYMBOLSUFFIX=64_ (and distribute that shared library), which gives:

$ nm -gD libopenblas64_.so | grep dgemm    # partial output with relevant BLAS symbols:
dgemm_64_
cblas_dgemm64_

Julia does the same as NumPy/SciPy. I downloaded Julia 1.9.2 (the latest release) and it has a libopenblas64_.so bundled, it contains:

$ nm -gD lib/julia/libopenblas64_.so | rg dgemm
0000000000145970 T cblas_dgemm64_
0000000000143e70 T dgemm_64_

So as in the table higher up, the BLAS symbols match with reference BLAS with _64, while the CBLAS symbols don't.

If we'd instead use _64 as the symbol suffix and build OpenBLAS with $ make INTERFACE64=1 SYMBOLSUFFIX=_64, we'd get:

$ nm -gD libopenblas_64.so | rg dgemm
00000000000a3430 T cblas_dgemm_64
00000000000a0a40 T dgemm__64

Now the CBLAS symbol name matches, but the BLAS one doesn't (which is worse).

For completeness I also checked what R is doing; they don't have ILP64 support in the source code of their main code base as far as I can tell. They also don't distribute Linux binaries themselves, and Windows/macOS are standalone installers - so not much to worry about there.

Finally also note that for the OpenBLAS scheme:

  • the generated library name is libopenblas64_ today for the Julia/NumPy style,
  • the pkg-config file name is always openblas64.pc independent of what suffix is specified at build time.

History

Given that the issue description here only mentions Julia for the 64_ option, I think it's useful to extend that a bit:

  • The current 64_ scheme seems to have come from SunPerf BLAS
  • It is also used by SuiteSparse
  • Julia implemented it that way because of SuiteSparse (according to this comment)
  • That approach was "standardized" in OpenBLAS in 2015-16, with Standardize ILP64 SONAME and symbols suffix OpenMathLib/OpenBLAS#646 being the central issue
  • Fedora implemented it like that as well
  • NumPy and SciPy implemented support for this ILP64 following that same scheme in 2019-2020

Regarding other open source projects that considered the symbol suffix topic:

  • FlexiBLAS seems to be the only one that went with _64, in 2020 (flexiblas#12)

Impact & changes needed to adapt to _64

First let me emphasize that any decision here is much better than no decision. And that while both schemes work, the _64 MKL style one does seem a bit cleaner. That said, it seems like going that way will cause a significant amount of work, more so than staying with the more widely used OpenBLAS-style 64_. OpenBLAS, NumPy/SciPy, and Julia are used more widely and built from source in a larger number of places (SuiteSparse I don't really know), and also the distribution model of such binaries is more complicated. Updating MKL in comparison would be straightforward since being proprietary it's effectively a single set of binaries that are redistributed in some packaging systems, and it already has support for multiple symbols (e.g. it contains both dgemm_64 and dgemm_64_ already, with one being an alias of the other).

Changes that will be needed include:

OpenBLAS:

  • Adding a new build option in the OpenBLAS gmake and CMake build systems to support _64.
    • Changing the way the current INTERFACE64 and SYMBOLSUFFIX options work is probably more confusing, and they may still be needed for a while by other projects.
  • Documenting this new method as the recommended default for ILP64 builds.
  • Independent of the decision here (I think), there's work that would be quite nice to have: allowing both LP64 and ILP64 symbols in a single library.
    • It's not entirely clear if that work will be coupled directly to this topic because of the changes to add _64 symbols into LP64 libblas builds even with BUILD_INDEX64=OFF (change present in master now but not in the last release I believe, as discussed above)

NumPy/SciPy:

  • Updating the OpenBLAS builds at https://github.com/MacPython/openblas-libs (those are the ones vendored into wheels on PyPI) to use _64 (name it libopenblas_64.so, that name is not in use yet)
  • Updating the demuxing layer for _64 in SciPy (same concept as libblastrampoline in Julia and what FlexiBLAS does)
  • Build system support and updates needed to vendor libopenblas_64.so into wheels to distribute on PyPI.
    • This will include either updating or dropping npy_cblas.h, which does support suffixes and NumPy has to carry because of patchy vendor support for cblas.h in the past.

Since it's only the CBLAS symbol names that will change for the situations that actually matter in practice, I'm hopeful that rolling out this change isn't going to be too disruptive. Any hiccups are likely going to be due to the limited support for shared libraries in Python wheels - we're going to get situations where we have both libopenblas64_.so and libopenblas_64.so loaded in the same process (e.g., new NumPy version switches to _64, user imports older SciPy version, both vendor OpenBLAS as a shared library).

That assumes of course that we only need to deal with Fortran compilers that append a single underscore. If users get issues with older or more esoteric Fortran compilers, that may need more work.

Julia:

I won't hazard a detailed guess, but given that Julia binaries also vendor libopenblas64_.so and libblastrampoline is similar to SciPy's layer, it's probably similar to what I wrote above for NumPy/SciPy.

One extra impact may be due to SuiteSparse, since Julia uses that while NumPy/SciPy doesn't.

It's also not uncommon for Julia users to mix Julia and Python I believe (perhaps less common than some years ago though?). I'm not sure if that may result in extra symbol clashes; right now Julia and NumPy use the exact same scheme.


I hope the above sounds correct to everyone. Given that the work needed to adapt to this wasn't detailed out before as far as I can tell, it'd be good to hear that this is okay with everyone and that they're fine with making those changes. @martin-frbg for OpenBLAS and @ViralBShah for Julia in particular I think, WDYT?

@staticfloat

I will note the one big reason why Julia went with the alternate mangling: it's so that FORTRAN code that wants to link to these symbols can do so easily. gfortran automatically appends an underscore to all symbol names (which is why most BLAS APIs use dgemm_ as the symbol name in the first place, and why the CBLAS names do not have the trailing underscore). In order to take an older FORTRAN code and link it against a new ILP64 library, it's relatively straightforward to tell the compiler to redefine dgemm to dgemm_64 (and then the compiler adds its ending underscore, resulting in the name dgemm_64_). If instead you have a name such as dgemm_64 exported from your BLAS library, it's more difficult to force FORTRAN libraries to link against it, and it requires source code changes rather than just passing -fdefault-integer-8 -Ddgemm=dgemm_64 at the compiler command line.

For the Julia world in particular, because we have libblastrampoline that already has advanced name-remapping capabilities, whatever decision is chosen here will be fine; we are shifting more and more of our numerical ecosystem to using LBT as the BLAS/LAPACK translation layer anyway. But for other ecosystems, I do encourage that they adopt this naming convention, as it reduces the friction necessary for other users, despite its ugly appearance.

@grisuthedragon
Contributor

grisuthedragon commented Jul 19, 2023

  • FlexiBLAS seems to be the only one that went with _64, in 2020 (flexiblas#12, https://github.com/mpimd-csc/flexiblas/issues/12)

I did not implement anything in FlexiBLAS yet, since I want to see the proper solution in the reference implementation first. But I prefer the MKL style, since it works independently of the compiler's name mangling scheme and thus gives a cleaner view of the whole thing. Even though many projects rely on the Fortran API, having consistent names in the C part is necessary as well.

As soon as we have a proper standard, FlexiBLAS will implement this, but with a small difference to the stuff implemented at the moment in the master branch: Each API variant will result in a separate library, as described above.

Although the SunPerf library is usually mentioned as the first occurrence of the suffixed symbols, and to my knowledge SuiteSparse is the only project that supports it, we can safely ignore this. This library and its hardware can be seen as legacy stuff. In particular, the SunPerf BLAS approach leads to strange function names like DGGES364 or DGEQP364, which are not desirable from my point of view.

Adjusting the symbol names in Julia and NumPy/SciPy should not be a problem since they resolve the symbols at runtime and thus the symbols name could be mangled on the fly to fit the library.

IMHO the Julia/OpenBLAS way relies too much on the name mangling done by gfortran and was implemented, as @ViralBShah said, in a somewhat grotesque way.

@staticfloat

As soon as we have a proper standard, FlexiBLAS will implement this, but also provide a ILP64 library without suffixes, to support rebuilding applications with -fdefault-integer-8.

While that can be useful, I highly encourage library developers to not make this the default, as it tends to cause problems on operating systems that load libraries with RTLD_GLOBAL-like semantics by default (e.g. Linux). It means that if you're in a position where you may load two separate BLAS libraries at once (e.g. you load FlexiBLAS and MKL via import numpy or similar) you run the risk of symbol confusion that can result in segfaults. This is not a problem if you know a-priori what libraries your entire program will load, however if there is a chance that somewhere someone will dlopen() something, please ensure that all ILP64 symbols are namespaced in some way.

Adjusting the symbol names in Julia and NumPy/SciPy should not be a problem since they resolve the symbols at runtime and thus the symbols name could be mangled on the fly to fit the library.

We have spent a lot of time and energy coming up with a naming scheme that is consistent, easily transformable from existing source code, works with a variety of compilers/languages (C, FORTRAN, etc...) and protects against symbol confusion. As said before, we use LBT to translate from other naming conventions to this one, so at some level we can adapt to anything that is decided here, but I think it highly likely that all of the software that is being built in the Julia ecosystem that uses BLAS/LAPACK will continue to be built to the current naming interface, so as to be as useful as possible to other projects, whether they be written in Julia, C, or FORTRAN.

@grisuthedragon
Contributor

While that can be useful, I highly encourage library developers to not make this the default, as it tends to cause problems on operating systems that load libraries with RTLD_GLOBAL-like semantics by default (e.g. Linux). It means that if you're in a position where you may load two separate BLAS libraries at once (e.g. you load FlexiBLAS and MKL via import numpy or similar) you run the risk of symbol confusion that can result in segfaults. This is not a problem if you know a-priori what libraries your entire program will load, however if there is a chance that somewhere someone will dlopen() something, please ensure that all ILP64 symbols are namespaced in some way.

Sure, from a software development point of view that is a horrible thing. But I still have to deal with researchers and their code and there somebody says "Can we try this for larger examples" and thus the whole code gets compiled with the increased integer flag. For this reason the "dangerous" variant of the library is required. For all other cases, and proper software development, the suffixed API should be the way to go.

@staticfloat

there somebody says "Can we try this for larger examples" and thus the whole code gets compiled with the increased integer flag.

I totally understand these kinds of constraints. This is another reason why I suggest naming conventions that can be easily used within the constraints of compiler name mangling rules. In the FORTRAN example, we recompile ancient code all the time by simply adding a series of -D flags to redefine dgemm to dgemm_64, as in the example I gave above. In fact, many of our third party dependencies such as LAPACK are built with these compiler flags, defined for all BLAS and LAPACK symbols. This kind of simple renaming is not possible if we don't follow the compiler name mangling rules, and will require source code changes in order to rebuild.
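
To make the renaming idea concrete, here is a minimal C sketch of the same mechanism. It is only an illustration: `toy_iamax_64` and `legacy_caller` are hypothetical names, not real BLAS symbols, and the `#define` stands in for what a `-Dtoy_iamax=toy_iamax_64` flag would do on the command line.

```c
#include <stdint.h>

/* Hypothetical ILP64 routine (not a real BLAS symbol): returns the 1-based
   index of the entry with the largest absolute value, or 0 if n <= 0. */
int64_t toy_iamax_64(int64_t n, const double *x)
{
    if (n <= 0) {
        return 0;
    }
    int64_t best = 0;
    for (int64_t i = 1; i < n; ++i) {
        double a = x[i] < 0 ? -x[i] : x[i];
        double b = x[best] < 0 ? -x[best] : x[best];
        if (a > b) {
            best = i;
        }
    }
    return best + 1;
}

/* Same effect as passing -Dtoy_iamax=toy_iamax_64 to the compiler. */
#ifndef toy_iamax
#define toy_iamax toy_iamax_64
#endif

/* Legacy caller written against the unsuffixed name; its source is left
   unchanged and the preprocessor redirects the call to the _64 symbol. */
int64_t legacy_caller(const double *x, int64_t n)
{
    return toy_iamax(n, x);
}
```

The point of the sketch is that the caller never mentions the suffix. This only works because the suffixed name is still a valid identifier in the source language, which is the constraint described above.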

@rgommers

Thanks for the replies and context @staticfloat and @grisuthedragon.

I will note the one big reason for why Julia went with the alternate mangling; it's so that FORTRAN code that wants to link to these symbols can do so easily.

I'll note that the symbol names that end up in the binary are the same in all common cases (compiler mangling appending a single _), so I don't think that matters for the two alternatives under consideration here. They are equivalent when calling from Fortran code, and neither should require source code changes beyond the -fdefault-integer-8 -Ddgemm=dgemm_64 approach.

I did not implement anything in FlexiBLAS yet [....]

Thanks for the correction, good to know.

Adjusting the symbol names in Julia and NumPy/SciPy should not be a problem since they resolve the symbols at runtime and thus the symbols name could be mangled on the fly to fit the library.

I wish that that were true for NumPy/SciPy, but it isn't - it's all determined at build time. SciPy does have a layer which re-exports a C API with stable names, so for other Python packages there's no issue, they can use that. But for NumPy/SciPy it'll be quite a bit of work to adapt to this. Which I'm willing to do, but it's probably going to take a while before it's all done.

Sure, from a software development point of view that is a horrible thing. But I still have to deal with researchers and their code and there somebody says "Can we try this for larger examples" and thus the whole code gets compiled with the increased integer flag. For this reason the "dangerous" variant of the library is required

I think the key thing here is to distinguish between the "researcher wants to try this with limited effort" and the "how do we package BLAS and LAPACK for redistribution" use cases. For the former you may want the dangerous variant, and as an HPC cluster admin or some such role you may make it available to the users you support. But for the latter, you never want to deal with it. We should only ever see libblas_64.so in distros, not libblas64.so. So some docs which recommend what to do for packagers (Linux distros, Homebrew, etc.) would be useful. I was already planning to write those for OpenBLAS; I can contribute them in this repo too if that would be welcome.

@grisuthedragon
Contributor

I'll note that the symbol names that end up in the binary are the same in all common cases (compiler mangling appending a single _)

This is a wrong assumption. With IBM XLF (the compilers used on POWER-based HPC systems), nothing is added to the symbol names in the binary. Thus adding 64_ at the binary level will end up as DGEMM64_ and not DGEMM_64 from the Fortran API's point of view. We should not focus on the behavior of gfortran while creating an approach for the symbol names.
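
The difference can be spelled out with a small C sketch. It is a simplified model, not a description of any compiler's full mangling rules: it assumes lowercase names and that the only variation is whether a single trailing underscore is appended. Both helper functions are hypothetical and exist only for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: the linker-level symbol a compiler would emit for a
   Fortran routine name, under the simplified model described above.
   Returns 0 on success, -1 if the output buffer is too small. */
int linker_symbol(const char *fortran_name, int appends_underscore,
                  char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%s%s", fortran_name,
                     appends_underscore ? "_" : "");
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Hypothetical helper: the name Fortran source code would have to use in
   order to reach a given linker-level symbol. Returns -1 if the symbol is
   unreachable under this convention or the buffer is too small. */
int fortran_name_for_symbol(const char *symbol, int appends_underscore,
                            char *out, size_t out_size)
{
    size_t len = strlen(symbol);
    if (appends_underscore) {
        if (len == 0 || symbol[len - 1] != '_') {
            return -1; /* the compiler always appends "_", so no match */
        }
        len -= 1;
    }
    if (len + 1 > out_size) {
        return -1;
    }
    memcpy(out, symbol, len);
    out[len] = '\0';
    return 0;
}
```

Under this model, a binary-level symbol `dgemm64_` is reached as `dgemm64` from a compiler that appends an underscore, but only as `dgemm64_` from one that appends nothing. A source-level name `dgemm_64` stays `dgemm_64` in the source under both conventions, and only the emitted symbol differs.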

@ViralBShah

So long as the LAPACK build provides a way to mangle the names with whatever suffix one wants as part of the build process, different projects can take whatever approach works best. In the absence of this support in the build, all of us have to resort to crude hacks.
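
As a rough illustration of what a build-time suffix hook could look like on the C side, here is a token-pasting sketch. It is not how the LAPACK build does it. `TOY_SUFFIX`, `TOY_NAME`, and `toy_asum` are hypothetical names, and the assumption is that a build system would pass something like `-DTOY_SUFFIX=_64` to select the suffix.

```c
#include <stdint.h>

/* Hypothetical build-time knob; defaults to _64 when the build system
   does not pass -DTOY_SUFFIX=... */
#ifndef TOY_SUFFIX
#define TOY_SUFFIX _64
#endif

/* Two-level expansion so that TOY_SUFFIX is expanded before pasting. */
#define TOY_CAT_(a, b) a##b
#define TOY_CAT(a, b) TOY_CAT_(a, b)
#define TOY_NAME(base) TOY_CAT(base, TOY_SUFFIX)

/* With the default suffix this defines toy_asum_64: the sum of absolute
   values of the first n entries of x. */
double TOY_NAME(toy_asum)(int64_t n, const double *x)
{
    double sum = 0.0;
    for (int64_t i = 0; i < n; ++i) {
        sum += x[i] < 0 ? -x[i] : x[i];
    }
    return sum;
}
```

With a single macro like this, each project can pick its own suffix at build time without touching the routine bodies.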

@martin-frbg
Collaborator

We haven't seen the LAPACK version of this PR yet, and looking at how the BLAS64 one handles the symbol (re)naming in the sources by resorting to CMake copy-and-regexreplace trickery in the build directory instead of preprocessing does not give me the highest hopes. OpenBLAS already finds out how the compiler likes to mangle symbol names, guess I will have to retain at least part of its current objcopy trickery to please everybody, even if I rewrite everything to support simultaneous provision of 32 and 64bit integer interfaces. (That simultaneous presence of blas/blas_64 symbols could be a good thing, except I expect some distributors will then go ahead and hack it apart again to supply libblas and libblas64 for their alternatives system...)

Curious coincidence that this issue got numbered after the eigenvalue of the beast :)

@mkrainiuk
Contributor Author

Hi All,

The PR for LAPACKE is merged now. Please share your feedback on the changes, if you have any. If there are no concerns about the current approach, I will close this issue.

@mkrainiuk
Contributor Author

Closing as completed since the changes were merged and there is no ongoing discussion.
