[Dachs-support] DaCHS dachs command returns illegal instruction
Markus Demleitner
msdemlei at ari.uni-heidelberg.de
Tue Nov 7 17:26:05 CET 2023
Hi Nima,
On Tue, Nov 07, 2023 at 12:37:43PM +0100, Nima Traore wrote:
> Thank you very much for this information :) Here is below the
> output of the where command in the gdb:
>
> ~$ gdb `which python3` core
[...]
> Core was generated by `/usr/bin/python3 /usr/bin/dachs --version'.
> Program terminated with signal SIGILL, Illegal instruction.
> #0 0x00007fb95b366820 in dgemm_otcopy_OPTERON_SSE3 () from /lib/x86_64-linux-gnu/libopenblas.so.0
Ha! It helps! You see, what this tells you is that the crash is in
the blas library, which is venerable numerics code. You probably
didn't install this yourself; I think it was pulled in as a
dependency of numpy.
The name of the function the crash happens in is another hint:
OPTERON_SSE3 suggests that this is code compiled for some AMD
architecture -- and that the processor that actually executes the
code doesn't understand some particular Opteron SSE3 opcode.
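To see what the (possibly virtual) CPU claims to support, you can look
at the flags line in /proc/cpuinfo, where Linux lists SSE3 under the
name "pni". This is only a minimal sketch, assuming an x86 Linux host;
has_cpu_flag is a helper name made up for illustration, and the result
does not tell you which particular opcode triggered the SIGILL:

```shell
# has_cpu_flag FLAG [FILE]: succeed if FLAG is listed in the first "flags"
# line of FILE (default: /proc/cpuinfo). Hypothetical helper for illustration.
has_cpu_flag() {
    grep -m1 '^flags' "${2:-/proc/cpuinfo}" 2>/dev/null \
        | cut -d: -f2 | tr ' ' '\n' | grep -qx -- "$1"
}

# Linux reports SSE3 as "pni" (Prescott New Instructions).
if has_cpu_flag pni; then
    echo "CPU advertises SSE3 (pni)"
else
    echo "CPU does not advertise SSE3 (pni)"
fi
```

The same helper works for other flags, for instance "hypervisor",
which the kernel sets when it believes it is running in a virtual
machine.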
Of course, the real question is: How to fix that?
Well:
$ apt info libopenblas0
[...]
Description: Optimized BLAS (linear algebra) library (meta)
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
.
On amd64, arm64, i386, ppc64el, s390x, kfreebsd-amd64 and kfreebsd-i386,
all kernels are included in the library and the one matching best your
processor is selected at runtime.
[...]
Hu! If this were true, what you're seeing shouldn't be happening;
your Opteron SSE3 function should only be attempted if you're
actually *running* on a CPU that can execute it. At this point, I
admitted defeat as far as rational problem analysis goes.
In other words: I fed
dgemm_otcopy_OPTERON_SSE3 "SIGILL"
to a web search engine. It turns out you're not the first to
experience this. Here's a bug against openblas that analyses the
problem in some detail:
https://github.com/OpenMathLib/OpenBLAS/issues/2794
I've only skimmed that report (I'm supposed to listen to a conference
talk now:-); perhaps you can study it a bit closer, but it would seem
that solutions would involve recompiling, which I'd rather avoid.
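Short of recompiling, there is one knob that may be worth a try:
OpenBLAS builds that select their kernels at runtime read the
environment variables OPENBLAS_VERBOSE (a value of 2 makes the library
report the core it picked) and OPENBLAS_CORETYPE (which forces a
particular core). A minimal sketch, assuming the Debian build honours
both; run_with_core is a made-up helper name, and Prescott is only my
guess at a conservative core, not something I have tested on your
machine:

```shell
# run_with_core CORE CMD...: run CMD with OpenBLAS asked to use the kernels
# for CORE and to report its choice. Hypothetical helper for illustration.
run_with_core() {
    core=$1
    shift
    # env limits both variables to this single command.
    env OPENBLAS_VERBOSE=2 OPENBLAS_CORETYPE="$core" "$@"
}

# Example (assumption: Prescott is a core name your OpenBLAS build knows):
# run_with_core Prescott dachs --version
```

If the command stops crashing with a forced core, that would point to
the runtime CPU detection going wrong rather than to the kernels
themselves.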
But then there's /usr/share/doc/libopenblas0/README.Debian, which
explains how to switch blas implementations. Can you have a look at
it and at the
http://wiki.debian.org/DebianScience/LinearAlgebraLibraries page
that's linked from there?
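As far as I remember, the switching is done through Debian's
alternatives system. The following is a sketch under that assumption:
it only prints the commands rather than running them, the multiarch
triplet is taken from the library path in your backtrace, and
blas_switch_commands is a made-up helper name. Please check the
alternative names against the README before relying on them:

```shell
# Multiarch triplet, as seen in /lib/x86_64-linux-gnu/libopenblas.so.0.
TRIPLET=${TRIPLET:-x86_64-linux-gnu}

# Print the commands to inspect and switch the BLAS and LAPACK alternatives.
# Nothing is changed until you run the printed commands yourself.
blas_switch_commands() {
    for lib in libblas.so.3 liblapack.so.3; do
        echo "update-alternatives --display ${lib}-${TRIPLET}"
        echo "sudo update-alternatives --config ${lib}-${TRIPLET}"
    done
}

blas_switch_commands
```

For --config to offer a choice, a second implementation has to be
installed, for instance the reference one from the libblas3 and
liblapack3 packages.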
If any of that fixes your problem, would you let us know? If not, or
if you get stuck, feel free to ask back. Alternatively, use different
virtualisation software (which I think is the root cause of your
problem), or move to Intel- or ARM-based hardware (assuming things
aren't even more broken than it seems and you already are on one), or
don't use a virtualised host at all and run on actual hardware.
Sorry I can't be more specific...
-- Markus