Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: HAProxy ca-file manipulation broken since 17c9e92b #8222

Open
wlallemand opened this issue Nov 25, 2024 · 20 comments
Open

[Bug]: HAProxy ca-file manipulation broken since 17c9e92b #8222

wlallemand opened this issue Nov 25, 2024 · 20 comments
Assignees
Labels

Comments

@wlallemand
Copy link

Contact Details

[email protected]

Version

v5.7.4

Description

Hello,

After updating our CI with the latest WolfSSL version, I found out that 2 of our regression tests were broken. v5.7.2 is behaving correctly, but v5.7.4 is broken.

After bisecting this is the first bad commit that introduced the problem:

commit 17c9e92b7f780c7e122b2bfb2a27985a3273354a
Author: Colton Willey <[email protected]>
Date:   Wed Oct 16 16:34:57 2024 -0700

    Initial rewrite of X509 STORE to replicate openssl behavior

The behavior of the X509_STORE does not seem to match the one of OpenSSL anymore, we don't have specific Wolfssl code in this part, we are only using the OpenSSL API.

I didn't try to reproduce the problem outside HAProxy yet, but I could try if that helps.

Reproduction steps

After cloning the master of haproxy and wolfssl in /tmp:

git bisect run sh testhaproxy.sh

testhaproxy.sh:
#!/bin/sh
./autogen.sh
./configure --enable-quic --enable-haproxy --enable-debug
make
cd /tmp/haproxy/
make -j8 TARGET=linux-glibc USE_OPENSSL_WOLFSSL=1 SSL_LIB=/tmp/wolfssl/src/.libs/ SSL_INC=/tmp/wolfssl/ ADDLIB=-Wl,-rpath,/tmp/wolfssl/src/.libs/
make reg-tests -- --debug reg-tests/ssl/new_del_ssl_cafile.vtc

Relevant log output

No response

@wlallemand wlallemand added the bug label Nov 25, 2024
@wlallemand
Copy link
Author

haproxy.wolfssl.debug.txt

Here the output of the reg-test with --enable-debug on wolfssl.

@ColtonWilley
Copy link
Contributor

Hello @wlallemand

I am trying to reproduce using your provided steps but am running into a problem. I tried to google around for a solution but did not find anything, I was wondering if you knew how to resolve these errors so I can reproduce?

make reg-tests -- --debug reg-tests/ssl/new_del_ssl_cafile.vtc

########################## Preparing to run tests ##########################
vtest not found in path, please specify VTEST_PROGRAM environment variable
make: *** [Makefile:1218: reg-tests] Error 1

@wlallemand
Copy link
Author

Hello @ColtonWilley,

Please compile the VTest tool there https://github.com/wlallemand/VTest/tree/haproxy-sd_notify and set VTEST_PROGRAM when launching the reg-test.

VTEST_PROGRAM=/path/to/vtest make reg-tests -- --debug reg-tests/ssl_new_del_ssl_cafile.vtc

I'm trying to isolate the code which does not work correctly so we can have a more precise idea of what's going on.

@wlallemand
Copy link
Author

load_ca.c.txt

Don't bother with haproxy, here is a reproducer.

Compile it this way:

  • OpenSSL: gcc load_ca.c -o load_ca -lssl -lcrypto
  • WolfSSL: gcc load_ca.c -DUSE_WOLFSSL -I/opt/wolfssl/include/wolfssl -I/opt/wolfssl/include/ -o load_ca -L/opt/wolfssl/lib/ -lwolfssl -Wl,-rpath,/opt/wolfssl/lib/

Use a PEM file in parameter, it should return the number of unique X509 entries in the PEM.

With OpenSSL the number is right, but with WolfSSL 5.7.4 it's always 0.

@ColtonWilley
Copy link
Contributor

@wlallemand

First off thank you so much for creating this, as somebody not familiar with haproxy it makes everything much easier on my end. Unfortunately though I was still not able to reproduce, it yielded identical results to openssl for me. Do you have a specific certificate example that is failing for you?

@wlallemand
Copy link
Author

common.pem.txt

This PEM file from our CI is giving me the following results:

$ ./load_ca_openssl common.pem.txt
count: 2

$ ./load_ca_wolfssl common.pem.txt
count: 0

@ColtonWilley
Copy link
Contributor

@wlallemand Thank you for the example file, I am now successfully able to reproduce. I will update you as soon as I have identified the root cause for this discrepancy.

@ColtonWilley
Copy link
Contributor

@wlallemand I have a PR up here #8226 that should resolve this issue. Please let me know if this solves the haproxy test failure, or if the issue persists.

@wlallemand
Copy link
Author

Thanks for the PR, that fixed part of the problem but I'm now having a SEGV in a related part:

[Current thread is 1 (Thread 0x7178920006c0 (LWP 644467))]
(gdb) bt
#0  __memcpy_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:265
#1  0x000071789f755d7e in wolfSSL_X509_get_pubkey (x509=0x71786c04b610) at ./src/x509.c:5546
#2  0x00005b102a8de316 in cert_get_pkey_algo (crt=crt@entry=0x71786c04b610, out=out@entry=0x71786c04e130) at src/ssl_utils.c:30
#3  0x00005b102a8c4255 in show_cert_detail (cert=cert@entry=0x71786c04b610, chain=chain@entry=0x0, extra_chain=extra_chain@entry=0x0, out=out@entry=0x71786c043510) at src/ssl_ckch.c:1811
#4  0x00005b102a8c5c86 in cli_io_handler_show_cafile_detail (appctx=0x71786c03f2e0) at src/ssl_ckch.c:3331
#5  0x00005b102a9c7a06 in cli_io_handler (appctx=0x71786c03f2e0) at src/cli.c:1170
#6  0x00005b102aa1bad3 in task_process_applet (t=t@entry=0x71786c036a80, context=context@entry=0x71786c03f2e0, state=<optimized out>) at src/applet.c:907
#7  0x00005b102aa9a450 in run_tasks_from_lists (budgets=<optimized out>) at src/task.c:641
#8  0x00005b102aa9aec5 in process_runnable_tasks () at src/task.c:883
#9  0x00005b102aa14279 in run_poll_loop () at src/haproxy.c:2976
#10 0x00005b102aa14949 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3190
#11 0x000071789f29ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#12 0x000071789f329c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
(gdb) up
#1  0x000071789f755d7e in wolfSSL_X509_get_pubkey (x509=0x71786c04b610) at ./src/x509.c:5546
5546	           XMEMCPY(key->pkey.ptr, x509->pubKey.buffer, x509->pubKey.length);
(gdb) p key->pkey.ptr
$1 = 0x71786c04b4b0 "[ \202{\177q"
(gdb) p x509->pubKey.buffer
$2 = (unsigned char *) 0x0

The problem is in this function which is a simple equivalent to what openssl x509 -text does.

https://github.com/haproxy/haproxy/blob/master/src/ssl_ckch.c#L1751

I can try to isolate the code if you need!

@wlallemand
Copy link
Author

load_ca.c.txt

Here is the new reproducer, my guess is that X509_STORE_add_cert() is not incrementing the refcount so sk_X509_INFO_pop_free() frees everything and accessing the cert with cert_get_pkey_algo() can't work since the data are not here anymore.

@ColtonWilley
Copy link
Contributor

@wlallemand Again I cant thank you enough for your efforts in isolating the problem. I have already reproduced this segfault. I will let you know when I have a PR with a fix available to test.

@ColtonWilley
Copy link
Contributor

@wlallemand There is definitely an issue with the internal refcount handling, unfortunately managing this flow in addition to the hundreds of other use cases is proving a bit trickier than initially anticipated. I am on holiday tomorrow but will continue to work this problem when I return on Friday.

@wlallemand
Copy link
Author

@wlallemand Again I cant thank you enough for your efforts in isolating the problem. I have already reproduced this segfault. I will let you know when I have a PR with a fix available to test.

No problem, I understand the pain of debugging in a whole new environment :-)

@wlallemand There is definitely an issue with the internal refcount handling, unfortunately managing this flow in addition to the hundreds of other use cases is proving a bit trickier than initially anticipated. I am on holiday tomorrow but will continue to work this problem when I return on Friday.

I guess it's more difficult than anticipated if the ASAN tests were right with the current implementation! Thanks for the quick feedback!

@ColtonWilley
Copy link
Contributor

@wlallemand I have a new PR up here #8233 that should hopefully resolve your issues. Please let me know how it goes.

@wlallemand
Copy link
Author

@ColtonWilley I just tested it and the refcount issue seem to be fixed, thanks a lot! But unfortunately I still have other issues, the reg-tests are still failing, I suspect something like X509_get_serialNumber() but I have to take some time to investigate more.

@ColtonWilley
Copy link
Contributor

@wlallemand Do you have any specific failures you need me to take a look at? If so please let me know.

@wlallemand
Copy link
Author

@ColtonWilley Sorry I didn't isolate the problem yet, I'll update you soon! Thanks

@wlallemand
Copy link
Author

wlallemand commented Dec 20, 2024

Hi @ColtonWilley, sorry for the late reply, I was busy on other projects.

It's quite difficult to reproduce the exact problem outside HAProxy.

I tried again today with wolfssl master (commit 00f83fa)

I don't really get what's going on, that could still be a mess with the refcounts.

I compiled HAProxy with -fsanitize and run the following regtest:

$ make -j8 TARGET=linux-glibc USE_OPENSSL_WOLFSSL=1  SSL_LIB=/home/wla/projects/haproxy_tech/wolfssl/src/.libs/ SSL_INC=/home/wla/projects/haproxy_tech/wolfssl/ ADDLIB=-Wl,-rpath,/home/wla/projects/haproxy_tech/wolfssl/src/.libs/ CFLAGS="-fsanitize=address" LDFLAGS="-fsanitize=address"
make reg-tests -- --debug reg-tests/ssl/new_del_ssl_cafile.vtc

What I can see is some reuse-after-free:

***  h1    debug|=================================================================
***  h1    debug|==1454598==ERROR: AddressSanitizer: heap-use-after-free on address 0x52200003069d at pc 0x71eef20fb42e bp 0x71eee5bf45b0 sp 0x71eee5bf3d58
***  h1    debug|READ of size 5 at 0x52200003069d thread T5
**** dT    0.301
***  h1    debug|    #0 0x71eef20fb42d in memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors_memintrinsics.inc:115
***  h1    debug|    #1 0x71eef1d5d98c in wolfSSL_X509_get_serialNumber src/x509.c:9792
***  h1    debug|    #2 0x595eb346fa4f in ssl_sock_get_serial src/ssl_utils.c:73
***  h1    debug|    #3 0x595eb342572e in show_cert_detail src/ssl_ckch.c:1991
***  h1    debug|    #4 0x595eb342a901 in cli_io_handler_show_cafile_detail src/ssl_ckch.c:3565
***  h1    debug|    #5 0x595eb36f3990 in cli_io_handler src/cli.c:1170
***  h1    debug|    #6 0x595eb37e24e2 in task_process_applet src/applet.c:907
***  h1    debug|    #7 0x595eb394feda in run_tasks_from_lists src/task.c:641
***  h1    debug|    #8 0x595eb3951a19 in process_runnable_tasks src/task.c:883
***  h1    debug|    #9 0x595eb37cee4b in run_poll_loop src/haproxy.c:2771
***  h1    debug|    #10 0x595eb37cfec0 in run_thread_poll_loop src/haproxy.c:2985
***  h1    debug|    #11 0x71eef205ea41 in asan_thread_start ../../../../src/libsanitizer/asan/asan_interceptors.cpp:234
***  h1    debug|    #12 0x71eef189ca93 in start_thread nptl/pthread_create.c:447
***  h1    debug|    #13 0x71eef1929c3b in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
***  h1    debug|
***  h1    debug|0x52200003069d is located 1437 bytes inside of 5144-byte region [0x522000030100,0x522000031518)
***  h1    debug|freed by thread T5 here:
**** dT    0.302
***  h1    debug|    #0 0x71eef20fc4d8 in free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:52
***  h1    debug|    #1 0x71eef1c72942 in wolfSSL_Free wolfcrypt/src/memory.c:448
***  h1    debug|    #2 0x71eef1d52348 in ExternalFreeX509 src/x509.c:3330
***  h1    debug|    #3 0x71eef1d5239b in wolfSSL_X509_free src/x509.c:3344
***  h1    debug|    #4 0x71eef1d64bb7 in wolfSSL_X509_OBJECT_free src/x509.c:13991
***  h1    debug|    #5 0x71eef1d3e9c1 in wolfSSL_sk_pop_free src/ssl.c:17734
***  h1    debug|    #6 0x71eef1d6592a in wolfSSL_sk_X509_OBJECT_pop_free src/x509.c:14530
***  h1    debug|    #7 0x71eef1d6d8e0 in X509StoreFreeObjList src/x509_str.c:1169
***  h1    debug|    #8 0x71eef1d6eb58 in wolfSSL_X509_STORE_get0_objects src/x509_str.c:1834
***  h1    debug|    #9 0x595eb342a7e8 in cli_io_handler_show_cafile_detail src/ssl_ckch.c:3556
***  h1    debug|
***  h1    debug|previously allocated by thread T5 here:
***  h1    debug|    #0 0x71eef20fd9c7 in malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
***  h1    debug|    #1 0x71eef1c728fe in wolfSSL_Malloc wolfcrypt/src/memory.c:363
***  h1    debug|    #2 0x71eef1d55814 in loadX509orX509REQFromBuffer src/x509.c:5316
***  h1    debug|    #3 0x71eef1d55926 in wolfSSL_X509_load_certificate_buffer src/x509.c:5350
***  h1    debug|    #4 0x71eef1d6211b in wolfSSL_PEM_X509_X509_CRL_X509_PKEY_read_bio src/x509.c:12428
***  h1    debug|    #5 0x71eef1d62537 in wolfSSL_PEM_X509_INFO_read_bio src/x509.c:12566
***  h1    debug|    #6 0x595eb3438f86 in ssl_store_load_ca_from_buf src/ssl_ckch.c:1341
***  h1    debug|
***  h1    debug|Thread T5 created by T0 here:
**** dT    0.303
***  h1    debug|    #0 0x71eef20f51f9 in pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:245
***  h1    debug|    #1 0x595eb39a0eb5 in setup_extra_threads src/thread.c:252
***  h1    debug|
***  h1    debug|SUMMARY: AddressSanitizer: heap-use-after-free ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors_memintrinsics.inc:115 in memcpy
***  h1    debug|Shadow bytes around the buggy address:
***  h1    debug|  0x522000030400: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
***  h1    debug|  0x522000030480: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
***  h1    debug|  0x522000030500: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
***  h1    debug|  0x522000030580: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
***  h1    debug|  0x522000030600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
***  h1    debug|=>0x522000030680: fd fd fd[fd]fd fd fd fd fd fd fd fd fd fd fd fd
***  h1    debug|  0x522000030700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
***  h1    debug|  0x522000030780: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
***  h1    debug|  0x522000030800: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
***  h1    debug|  0x522000030880: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
***  h1    debug|  0x522000030900: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
***  h1    debug|Shadow byte legend (one shadow byte represents 8 application bytes):
***  h1    debug|  Addressable:           00
***  h1    debug|  Partially addressable: 01 02 03 04 05 06 07 
***  h1    debug|  Heap left redzone:       fa
***  h1    debug|  Freed heap region:       fd
***  h1    debug|  Stack left redzone:      f1
***  h1    debug|  Stack mid redzone:       f2
***  h1    debug|  Stack right redzone:     f3
***  h1    debug|  Stack after return:      f5
***  h1    debug|  Stack use after scope:   f8
***  h1    debug|  Global redzone:          f9
***  h1    debug|  Global init order:       f6
***  h1    debug|  Poisoned by user:        f7
***  h1    debug|  Container overflow:      fc
***  h1    debug|  Array cookie:            ac
***  h1    debug|  Intra object redzone:    bb
***  h1    debug|  ASan internal:           fe
***  h1    debug|  Left alloca redzone:     ca
***  h1    debug|  Right alloca redzone:    cb
***  h1    debug|==1454598==ABORTING

The trace is weird, wolfSSL_X509_STORE_get0_objects() seems to free the objects for an unknown reason, I didn't get the exact trace on my reproducer program but I still got something:

$ gcc -ggdb3 load_ca.c -fsanitize=address -DUSE_WOLFSSL -I/home/wla/projects/haproxy_tech/wolfssl/ -I/home/wla/projects/haproxy_tech/wolfssl/wolfssl/  -o load_ca -L/opt/wolfssl/lib/ -lwolfssl -Wl,-rpath,/home/wla/projects/haproxy_tech/wolfssl/src/.libs/
$. /load_ca set_cafile_interCA1.crt
count: 1
ssl_sock_get_serial:115 serial->length:0
ssl_sock_show_cafile_detail:261 CN:/=/=/=

=================================================================
==1455229==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x78b8326fd9c7 in malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
    #1 0x78b8322728fe in wolfSSL_Malloc wolfcrypt/src/memory.c:363
    #2 0x78b83230cfc7 in wolfSSL_ASN1_INTEGER_new src/ssl_asn1.c:967
    #3 0x78b83235d8c4 in wolfSSL_X509_get_serialNumber src/x509.c:9769
    #4 0x5ec75e23fb13 in ssl_sock_get_serial /home/wla/projects/perso/openssl_exp/load_ca.c:109
    #5 0x5ec75e240749 in ssl_sock_show_cafile_detail /home/wla/projects/perso/openssl_exp/load_ca.c:253
    #6 0x5ec75e240c34 in main /home/wla/projects/perso/openssl_exp/load_ca.c:314
    #7 0x78b831e2a1c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #8 0x78b831e2a28a in __libc_start_main_impl ../csu/libc-start.c:360
    #9 0x5ec75e23f7a4 in _start (/home/wla/projects/perso/openssl_exp/load_ca+0x27a4) (BuildId: d9a37f7671de172c4a8b085ff1e0fff4f3043bdc)

SUMMARY: AddressSanitizer: 48 byte(s) leaked in 1 allocation(s).

However, when commenting a function (get_certificate_count()) which does a X509_STORE_get0_objects() previously, I could make it work:

$ ./load_ca set_cafile_interCA1.crt
ssl_sock_get_serial:115 serial->length:2
ssl_sock_show_cafile_detail:261 CN:/C=FR/O=HAProxy Technologies/CN=Intermediate CA1

load_ca.c.txt
set_cafile_interCA1.crt.txt

@dgarske
Copy link
Contributor

dgarske commented Dec 23, 2024

Hi @wlallemand ,

Colton is out on holiday until after new year. I am trying to get another engineer assigned to continue this investigation.

Thanks,
David Garske, wolfSSL

@wlallemand
Copy link
Author

Hi @dgarske,

Thanks! I will also be in holidays this week and the one after. I hope I gave enough details to investigate the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants