
datetime.now(timezone.utc) segfaults with Intel oneAPI v2022.1.2 and higher since Python 3.9 #106424

Closed
haampie opened this issue Jul 4, 2023 · 19 comments
Labels
type-crash A hard crash of the interpreter, possibly with a core dump


@haampie
Contributor

haampie commented Jul 4, 2023

I've bisected this to 37fcbb6 (the first "bad" commit).

The following Python code is a minimal reproducer:

from datetime import datetime, timezone
datetime.now(timezone.utc)

When compiling Python with any of the following Intel compilers:

2023.2.0
2023.1.0
2023.0.0
2022.2.1

Python 3.9, 3.10, and 3.11 all segfault in add_datetime_timedelta / timezone_fromutc: for some reason, with -O1 optimizations (edit: and higher), the self->offset / delta pointer is NULL.

Program received signal SIGSEGV, Segmentation fault.
0x00007f303942ef0c in add_datetime_timedelta (date=0x7f30394d7000, delta=0x0, factor=1) at Modules/_datetimemodule.c:5373
5373	    int day = GET_DAY(date) + GET_TD_DAYS(delta) * factor;

Adding -fno-strict-aliasing does not prevent this.
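Because the reproducer takes down the whole interpreter, bisecting it (as was done to find 37fcbb6) is easiest with a check that runs it in a child process. The sketch below is one hedged way to do that for `git bisect run`, which treats exit status 0 as good, 125 as skip, and other codes from 1 to 127 as bad. The helper names and the `./python` default path are assumptions for illustration, not something used in this issue.

```python
"""Hypothetical `git bisect run` helper for the datetime.now(timezone.utc) crash.

Exit status follows git bisect's convention: 0 = good, 1 = bad, 125 = skip.
"""
import signal
import subprocess
import sys
from pathlib import Path

REPRODUCER = "from datetime import datetime, timezone; datetime.now(timezone.utc)"


def crashes_with_segfault(python: str, code: str = REPRODUCER, timeout: float = 120.0) -> bool:
    """Run `code` in a fresh interpreter and report whether it died from SIGSEGV."""
    proc = subprocess.run([python, "-c", code], capture_output=True, timeout=timeout)
    # On POSIX, subprocess reports "killed by signal N" as returncode == -N.
    return proc.returncode == -signal.SIGSEGV


def bisect_exit_code(python: str) -> int:
    """Map the reproducer outcome for `python` onto git bisect exit codes."""
    if not Path(python).is_file():
        return 125  # build failed or binary missing: ask git bisect to skip this commit
    return 1 if crashes_with_segfault(python) else 0


if __name__ == "__main__":
    # Assumes an in-tree build (./configure && make) that leaves ./python behind.
    target = sys.argv[1] if len(sys.argv) > 1 else "./python"
    sys.exit(bisect_exit_code(target))
```

Saved under a hypothetical name such as `check_datetime_segfault.py`, it could be driven with something like `git bisect run sh -c 'make -j8 || exit 125; python3 check_datetime_segfault.py ./python'`. It only recognizes a real SIGSEGV; an AddressSanitizer build reports the fault and exits with a non-zero status instead of being killed by the signal, so it would need a different check.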

@haampie haampie added the type-crash A hard crash of the interpreter, possibly with a core dump label Jul 4, 2023
@corona10
Member

corona10 commented Jul 4, 2023

Any version 3.9, 3.10, 3.11 causes a segfault in add_datetime_timedelta / timezone_fromutc where for some reason with -O1 optimizations the self->offset / delta pointer is a NULL.

Judging from the original report, the issue does not occur with -O0.
Would you like to report it to the Intel team first?
It could be a compiler bug.

@haampie
Contributor Author

haampie commented Jul 4, 2023

It happens for -O1, -O2, -O3, it does not happen with -O0.

So yes, will try to bisect the intel compiler versions now... 😩

@corona10
Member

corona10 commented Jul 4, 2023

It happens for -O1, -O2, -O3, it does not happen with -O0.

Ah, thank you for the clarification.

@haampie
Contributor Author

haampie commented Jul 4, 2023

Building Python 3.9.17, the example segfaults with the following oneAPI compilers:

  • 2023.2.0
  • 2023.1.0
  • 2023.0.0
  • 2022.2.1

And succeeds with these:

  • 2022.0.0
  • 2021.1.2

So it could indeed be a compiler bug, though it's unclear how to prove it.

@haampie
Contributor Author

haampie commented Jul 4, 2023

Should I leave this issue open for discoverability? Others may run into it too. I've written a summary on the Spack package manager's repo, spack/spack#38710 (comment), and we're hoping someone from Intel will confirm whether or not it's a compiler bug.

@haampie haampie changed the title datetime module segfaults on timezone with Intel oneAPI v2023.1.0 datetime.now(timezone.utc) segfaults on with Intel oneAPI v2022.1.2 and higher since Python 3.9 Jul 4, 2023
@haampie haampie changed the title datetime.now(timezone.utc) segfaults on with Intel oneAPI v2022.1.2 and higher since Python 3.9 datetime.now(timezone.utc) segfaults with Intel oneAPI v2022.1.2 and higher since Python 3.9 Jul 4, 2023
@rscohn2

rscohn2 commented Jul 4, 2023

@oleksandr-pavlyk: Are you building python with recent oneapi compilers?

@corona10
Member

corona10 commented Jul 5, 2023

@haampie cc @rscohn2 @oleksandr-pavlyk

Would you like to check if the -fno-strict-aliasing flag is enabled in the compiler options?

@haampie
Contributor Author

haampie commented Jul 5, 2023

Strict aliasing is enabled by default (just like in the gcc/clang builds):

checking whether /home/harmen/spack/lib/spack/env/oneapi/icx accepts and needs -fno-strict-aliasing... no

but when I build with -fno-strict-aliasing it still segfaults, so that's not the problem.

Also verified that defaults in clang and icx are the same: https://godbolt.org/z/jxeaW7sjj

@haampie
Contributor Author

haampie commented Jul 5, 2023

Hm, it looks like that configure check in Python is only there because some old GCC versions warned incorrectly about strict aliasing. It compiles some valid code with -Werror -Wstrict-aliasing and adds -fno-strict-aliasing only if that fails. No recent gcc / clang / icx issues that warning, so the flag is not added.

In any case, the important thing is that -fno-strict-aliasing does not fix the segfault.
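The configure check described above amounts to a two-step decision, reconstructed here from the log line ("accepts and needs") and the description in this comment. This is a hedged Python sketch, not CPython's configure.ac: the probe snippet is an assumption modeled on the kind of pointer cast old GCC releases warned about, and `compiles` is a hypothetical injectable callable so the logic can be exercised without a real compiler.

```python
"""Sketch of the decision behind configure's
"whether $CC accepts and needs -fno-strict-aliasing" check (illustrative only)."""
import os
import subprocess
import tempfile
from typing import Callable, Sequence

# Valid C that older GCC releases could wrongly warn about under -Wstrict-aliasing (assumed).
PROBE = "void f(int **x) { (void)x; }\nint main(void) { double *x = 0; f((int **)&x); return 0; }\n"
TRIVIAL = "int main(void) { return 0; }\n"


def cc_compiles(flags: Sequence[str], source: str, cc: str = "cc") -> bool:
    """Return True if `cc` compiles `source` to an object file with `flags`."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "conftest.c")
        with open(src, "w") as fh:
            fh.write(source)
        cmd = [cc, *flags, "-c", src, "-o", os.path.join(tmp, "conftest.o")]
        return subprocess.run(cmd, capture_output=True).returncode == 0


def needs_no_strict_aliasing(compiles: Callable[[Sequence[str], str], bool] = cc_compiles) -> bool:
    # Step 1 ("accepts"): if the compiler rejects the flag outright, never add it.
    if not compiles(["-fno-strict-aliasing"], TRIVIAL):
        return False
    # Step 2 ("needs"): add it only if valid code fails to build once
    # strict-aliasing warnings are promoted to errors (the old-GCC false positive).
    return not compiles(["-fstrict-aliasing", "-Werror", "-Wstrict-aliasing"], PROBE)
```

With a modern gcc, clang, or icx one would expect `needs_no_strict_aliasing()` to return False, matching the "no" in the configure log above. Keeping the compiler call injectable is only a testing convenience.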

@vstinner
Member

To me, it looks like a compiler bug: I don't see how the C code could rely on undefined behavior here. I suggest reporting the bug to the Intel compiler team.

@haampie
Contributor Author

haampie commented Jul 13, 2023

@rscohn2 do you want to create a bug report for the compiler?

@oleksandr-pavlyk
Contributor

@rscohn2 Intel Distribution for Python ships CPython compiled with GCC 11.2:

(intel_py) opavlyk@opavlyk-mobl:~$ conda list python
# packages in environment at /home/opavlyk/miniconda3/envs/intel_py:
#
# Name                    Version                   Build  Channel
intelpython               2023.1.0                      1    intel
python                    3.10.8               h2b77918_2    intel
(intel_py) opavlyk@opavlyk-mobl:~$ python
Python 3.10.8 (main, Mar 21 2023, 00:22:10) [GCC 11.2.0] :: Intel Corporation on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()

@rscohn2

rscohn2 commented Jul 13, 2023

If you provide detailed information with clear evidence, the compiler group will respond quickly. Otherwise it will sit in the queue. This is because they get a lot of bogus bug reports that are application errors.

I don't understand how everyone can be so sure it is a compiler bug. C programs can have problems with uninitialized data and stale pointers. Optimizations change the memory layout on the stack, so such bugs can be specific to particular compilers and releases.

There is another icx release coming out next week. I will try that, and if the error is still there, I will try to look at the bug.

@haampie
Copy link
Contributor Author

haampie commented Jul 14, 2023

It's already been said in this thread: the diff after which Intel Compilers produce a segfaulting Python is very small and innocent looking: 37fcbb6. It works fine on all major compilers. For a project like Python, I can only assume it uses sanitizers to detect UB and what not. If this was an application error I'm sure it would've surfaced elsewhere in the last 3 years.

I think it's more a matter of how much Intel folks care about having a working Python than it is up to me as a packager who is not familiar with Python internals to do all the detective work. I've bisected Python commits and the Intel compiler versions -- that's enough.

Lastly, what do you mean by clear evidence? As far as I know, Intel compilers are closed source, and all you can do is look at the generated assembly (and are you officially allowed to?). Isn't "it works with clang, it doesn't work with icx" sufficient, given that Intel compilers are based on LLVM?

@haampie
Copy link
Contributor Author

haampie commented Jul 17, 2023

It segfaults with the oneAPI 2023.2.0 compilers too.

@haampie
Copy link
Contributor Author

haampie commented Jul 18, 2023

With -fsanitize=address:

$ ./python 
Python 3.10.12 (main, Jul 18 2023, 17:21:14) [Clang 17.0.0 (icx 2023.2.0.20230622)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import datetime, timezone
>>> datetime.now(timezone.utc)
AddressSanitizer:DEADLYSIGNAL
=================================================================
==257767==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000018 (pc 0x7f476603ffaf bp 0x7ffe3fa18240 sp 0x7ffe3fa18160 T0)
==257767==The signal is caused by a READ memory access.
==257767==Hint: address points to the zero page.
    #0 0x7f476603ffaf in add_datetime_timedelta /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/Modules/_datetimemodule.c
    #1 0x7f4768aeabad in cfunction_vectorcall_O /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/Objects/methodobject.c:516:24
    #2 0x7f4768a3698f in _PyObject_VectorcallTstate /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/./Include/cpython/abstract.h:114:11
    #3 0x7f4768a3698f in _PyObject_CallFunctionVa /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/Objects/call.c
    #4 0x7f4768a3776a in callmethod /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/Objects/call.c:557:12
    #5 0x7f4768a3776a in _PyObject_CallMethodId /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/Objects/call.c:627:24
    #6 0x7f476604d9eb in datetime_datetime_now_impl /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/Modules/_datetimemodule.c:5098:16
    #7 0x7f476604d9eb in datetime_datetime_now /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/Modules/clinic/_datetimemodule.c.h:92:20
    #8 0x7f4768aea4b5 in cfunction_vectorcall_FASTCALL_KEYWORDS /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/Objects/methodobject.c:446:24
    #9 0x7f4768c5df43 in call_function /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/Python/ceval.c
    #10 0x7f4768c521f1 in _PyEval_EvalFrameDefault /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/Python/ceval.c:4181:23
    #11 0x7f4768c44696 in _PyEval_EvalFrame /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/./Include/internal/pycore_ceval.h:46:12
    #12 0x7f4768c44696 in _PyEval_Vector /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/Python/ceval.c:5067:24
    #13 0x7f4768c44465 in PyEval_EvalCode /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/Python/ceval.c:1134:12
    #14 0x7f4768d21f81 in run_eval_code_obj /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/Python/pythonrun.c:1291:9
    #15 0x7f4768d21f81 in run_mod /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/Python/pythonrun.c:1312:19
    #16 0x7f4768d1fc3d in PyRun_InteractiveOneObjectEx /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/Python/pythonrun.c:277:9
    #17 0x7f4768d1ebde in _PyRun_InteractiveLoopObject /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/Python/pythonrun.c:148:15
    #18 0x7f4768d1e9a2 in _PyRun_AnyFileObject /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/Python/pythonrun.c:84:15
    #19 0x7f4768d1f7fc in PyRun_AnyFileExFlags /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/Python/pythonrun.c:116:15
    #20 0x7f4768d71376 in pymain_run_stdin /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/Modules/main.c:502:15
    #21 0x7f4768d71376 in pymain_run_python /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/Modules/main.c:590:21
    #22 0x7f4768d71376 in Py_RunMain /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/Modules/main.c:666:5
    #23 0x7f4768d7202d in pymain_main /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/Modules/main.c:696:12
    #24 0x7f4768d72300 in Py_BytesMain /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/Modules/main.c:720:12
    #25 0x7f4768423a8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #26 0x7f4768423b48 in __libc_start_main csu/../csu/libc-start.c:360:3
    #27 0x41f724 in _start (/home/harmen/spack/opt/spack/linux-ubuntu23.04-zen2/oneapi-2023.2.0/python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/bin/python3.10+0x41f724)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /tmp/harmen/spack-stage/spack-stage-python-3.10.12-jexzd3vns3w6yofsu2cziuh6hxjr4apm/spack-src/Modules/_datetimemodule.c in add_datetime_timedelta
==257767==ABORTING

I can't get the other sanitizers to work with icx.
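The faulting address in the ASan report, 0x000000000018, fits the gdb backtrace showing delta=0x0: GET_TD_DAYS(delta) reads the `days` member of PyDateTime_Delta, and on a 64-bit build that member sits 24 (0x18) bytes into the struct. The ctypes sketch below mirrors the struct to compute that offset. It assumes a default x86-64 CPython build (no debug or free-threaded object header) and the field list from Include/datetime.h; the class names are made up for illustration.

```python
import ctypes
from datetime import timedelta


class PyObjectHead(ctypes.Structure):
    # PyObject_HEAD in a default 64-bit (non-debug, non-free-threaded) build.
    _fields_ = [("ob_refcnt", ctypes.c_ssize_t), ("ob_type", ctypes.c_void_p)]


class PyDateTimeDelta(ctypes.Structure):
    # Mirrors the PyDateTime_Delta struct declared in Include/datetime.h.
    _fields_ = [
        ("ob_base", PyObjectHead),
        ("hashcode", ctypes.c_ssize_t),  # Py_hash_t is a Py_ssize_t
        ("days", ctypes.c_int),
        ("seconds", ctypes.c_int),
        ("microseconds", ctypes.c_int),
    ]


# GET_TD_DAYS(delta) loads the int at (char *)delta + DAYS_OFFSET, so with
# delta == NULL the faulting address is DAYS_OFFSET itself.
DAYS_OFFSET = PyDateTimeDelta.days.offset
print(hex(DAYS_OFFSET))  # 0x18 on x86-64

# Sanity check against a live object, only when our header layout matches.
if object.__basicsize__ == ctypes.sizeof(PyObjectHead):
    td = timedelta(days=5)
    print(ctypes.c_int.from_address(id(td) + DAYS_OFFSET).value)  # 5
```

This only explains the address; it says nothing about why the pointer is NULL.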

@rscohn2

rscohn2 commented Jul 18, 2023

I spent an afternoon narrowing down the problem to the store for this line not happening:

self->offset = Py_NewRef(offset);

I added a printf to show that Py_NewRef(offset) returns its input parameter, but the value is not stored into self->offset.
I filed a ticket with the compiler that shows the asm output for the function and why I believe the store is missing. It does look like a compiler bug to me, but let's see what they say.
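A missing store like this could also be probed from Python without dereferencing the broken pointer. The sketch below is assumption-heavy: it is CPython-only, relies on the C `_datetime` accelerator's PyDateTime_TimeZone layout (PyObject_HEAD followed by `PyObject *offset` and `PyObject *name`), and the helper names are hypothetical. It reads the raw `offset` field with ctypes and compares it with the object `utcoffset()` returns. Since tp_alloc zero-fills new objects, a skipped store would be expected to leave the field NULL (read back as None), and the helper returns False before anything touches it.

```python
import ctypes
from datetime import timedelta, timezone


def raw_offset_field(tz: timezone):
    """Read the C-level `offset` pointer of a timezone without dereferencing it."""
    # object.__basicsize__ is sizeof(PyObject) for the running interpreter,
    # and `offset` is the first member after the object header.
    return ctypes.c_void_p.from_address(id(tz) + object.__basicsize__).value


def offset_was_stored(tz: timezone) -> bool:
    ptr = raw_offset_field(tz)
    if ptr is None:
        # NULL field: calling tz.utcoffset() here could crash, so stop.
        return False
    # utcoffset() hands back a new reference to the stored timedelta, so its
    # identity should match the raw pointer.
    return ptr == id(tz.utcoffset(None))


print(offset_was_stored(timezone.utc))                  # True on a healthy build
print(offset_was_stored(timezone(timedelta(hours=2))))  # True on a healthy build
```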

@haampie
Contributor Author

haampie commented Jul 21, 2023

Thanks for the detective work @rscohn2 👍

@rscohn2

rscohn2 commented Aug 5, 2023

It was a compiler bug. The test now passes when cpython is built with an internal build. The next compiler release (likely 2024.0) should have the fix. Thanks to @haampie!

6 participants