[Dachs-support] DaCHS dachs command returns illegal instruction
Nima Traore
nima.traore at universite-paris-saclay.fr
Thu Nov 9 16:11:23 CET 2023
Hi Markus,
Thank you very much for these suggestions :) Our server is up and running again.
We just changed the CPU type of the virtual machine (from kvm64 to SandyBridge) and did a stop/start to apply the change; dachs now works correctly again.
~$ dachs --version
Software (2.8.2) Schema (34/34)
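For anyone wanting to compare what the guest sees before and after such a CPU type change, here is a minimal sketch that lists a few instruction-set flags from /proc/cpuinfo. The helper names (cpu_flags, report) and the choice of flags are illustrative assumptions, not something from this thread; on Linux, SSE3 is listed as "pni".

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags of the first CPU found in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    # No "flags" line found (e.g. non-x86 hosts use a different key).
    return set()


def report(cpuinfo_text: str,
           wanted=("pni", "ssse3", "sse4_1", "sse4_2", "avx")) -> dict[str, bool]:
    """Map each flag name in `wanted` to whether the CPU advertises it."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        for name, present in report(cpuinfo.read_text()).items():
            print(f"{name:8s} {'yes' if present else 'no'}")
```

Running this once under each CPU type and comparing the output shows which extensions the hypervisor actually passes through to the guest.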
Thank you very much again for your help.
Best regards,
Nima
From: dachs-support-request at g-vo.org
To: "dachs-support" <dachs-support at g-vo.org>
Sent: Wednesday, 8 November 2023 12:00:01
Subject: Dachs-support Digest, Vol 32, Issue 4
Send Dachs-support mailing list submissions to
dachs-support at g-vo.org
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.g-vo.org/mailman/listinfo/dachs-support
or, via email, send a message with subject or body 'help' to
dachs-support-request at g-vo.org
You can reach the person managing the list at
dachs-support-owner at g-vo.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Dachs-support digest..."
Today's Topics:
1. Re: DaCHS dachs command returns illegal instruction
(Markus Demleitner)
----------------------------------------------------------------------
Message: 1
Date: Tue, 7 Nov 2023 09:26:05 -0700
From: Markus Demleitner <msdemlei at ari.uni-heidelberg.de>
To: dachs-support at g-vo.org
Subject: Re: [Dachs-support] DaCHS dachs command returns illegal
instruction
Message-ID: <20231107162605.fzwm4nt3tihnk2cg at victor>
Content-Type: text/plain; charset=utf-8
Hi Nima,
On Tue, Nov 07, 2023 at 12:37:43PM +0100, Nima Traore wrote:
> Thank you very much for this information :) Below is the
> output of the where command in gdb:
>
> ~$ gdb `which python3` core
[...]
> Core was generated by `/usr/bin/python3 /usr/bin/dachs --version'.
> Program terminated with signal SIGILL, Illegal instruction.
> #0 0x00007fb95b366820 in dgemm_otcopy_OPTERON_SSE3 () from /lib/x86_64-linux-gnu/libopenblas.so.0
Ha! It helps! You see, what this tells you is that the crash is in
the blas library, which is venerable numerics code. You probably
didn't install this yourself; I think it was pulled in as a
dependency of numpy.
The name of the function the crash happens in is another hint:
OPTERON_SSE3 suggests that this is code compiled for some AMD
architecture -- and that the processor that actually executes the
code doesn't understand some particular Opteron SSE3 opcode.
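As a small illustration of reading such a frame, the following sketch pulls the symbol and library out of a gdb backtrace line and splits the symbol into a routine name and an upper-case target suffix. The regular expressions and the helper name parse_frame are assumptions made for this example; the split simply relies on the suffix being upper case, as in the frame quoted above.

```python
import re

# Matches frames of the form: #N 0xADDR in SYMBOL (ARGS) from LIBRARY
FRAME_RE = re.compile(
    r"^#(?P<index>\d+)\s+(?P<address>0x[0-9a-fA-F]+)\s+in\s+(?P<symbol>\S+)"
    r"\s+\(.*\)(?:\s+from\s+(?P<library>\S+))?$"
)
# Lower-case routine name, then an upper-case target suffix.
SYMBOL_RE = re.compile(r"^(?P<routine>[a-z0-9_]+?)_(?P<target>[A-Z][A-Z0-9_]*)$")


def parse_frame(line: str):
    """Return a dict describing one gdb frame line, or None if it does not match."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    symbol = SYMBOL_RE.match(info["symbol"])
    info["routine"] = symbol.group("routine") if symbol else info["symbol"]
    info["target"] = symbol.group("target") if symbol else None
    return info


frame = ("#0 0x00007fb95b366820 in dgemm_otcopy_OPTERON_SSE3 () "
         "from /lib/x86_64-linux-gnu/libopenblas.so.0")
info = parse_frame(frame)
print(info["library"], info["routine"], info["target"])
```

The library path tells you which package to look at, and the target suffix tells you which CPU family the crashing kernel was written for.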
Of course, the real question is: How to fix that?
Well:
$ apt info libopenblas0
[...]
Description: Optimized BLAS (linear algebra) library (meta)
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
.
On amd64, arm64, i386, ppc64el, s390x, kfreebsd-amd64 and kfreebsd-i386,
all kernels are included in the library and the one matching best your
processor is selected at runtime.
[...]
Hu! If this were true, what you're seeing shouldn't be happening;
your Opteron SSE3 function should only be attempted if you're
actually *running* on a CPU that can execute it. At this point, I
admitted defeat as far as rational problem analysis goes.
In other words: I fed
dgemm_otcopy_OPTERON_SSE3 "SIGILL"
to a web search engine. It turns out you're not the first to
experience this. Here's a bug against openblas that analyses the
problem in some detail:
https://github.com/OpenMathLib/OpenBLAS/issues/2794
I've only skimmed that report (I'm supposed to listen to a conference
talk now:-); perhaps you can study it a bit closer, but it would seem
that solutions would involve recompiling, which I'd rather avoid.
But then there's /usr/share/doc/libopenblas0/README.Debian, which
explains how to switch blas implementations. Can you have a look at
it and the
http://wiki.debian.org/DebianScience/LinearAlgebraLibraries that's
linked from there?
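One more thing that might be worth trying before switching libraries: the runtime kernel selection mentioned in the package description can, as far as I understand the OpenBLAS documentation, be inspected and overridden through environment variables. The sketch below assumes that OPENBLAS_VERBOSE=2 makes the library report the core it picked and that OPENBLAS_CORETYPE forces a particular one; neither variable is mentioned in this thread, so please check them against the OpenBLAS documentation for your version. The helper names and the core name "Nehalem" are purely illustrative.

```python
import os
import subprocess
import sys


def openblas_env(coretype=None, verbose=2, base=None):
    """Build an environment for a child process that uses OpenBLAS.

    Assumption: OPENBLAS_VERBOSE and OPENBLAS_CORETYPE are honoured by the
    installed OpenBLAS build (they apply to builds with runtime kernel
    selection); verify this against the OpenBLAS documentation.
    """
    env = dict(os.environ if base is None else base)
    env["OPENBLAS_VERBOSE"] = str(verbose)
    if coretype is not None:
        env["OPENBLAS_CORETYPE"] = coretype
    return env


def run_with_openblas_env(command, coretype=None):
    """Run `command` with the environment above and capture its output."""
    return subprocess.run(
        command,
        env=openblas_env(coretype),
        capture_output=True,
        text=True,
    )
```

A possible use would be run_with_openblas_env([sys.executable, "-c", "import numpy"]) to see what the library reports on stderr, and then run_with_openblas_env(["dachs", "--version"], coretype="Nehalem") to check whether forcing a different kernel avoids the SIGILL.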
If any of that fixes your problem, would you let us know? If not or
you get stuck, feel free to ask back. Or use a different
virtualisation software (which I think is the root cause of your
problem), or move to Intel- or ARM-based hardware (assuming things
aren't even more broken than it seems and you already are on one), or
don't use a virtualised host at all and use actual hardware.
Sorry I can't be more specific...
-- Markus