[Dachs-support] DaCHS dachs command returns illegal instruction

Nima Traore nima.traore at universite-paris-saclay.fr
Thu Nov 9 16:11:23 CET 2023


Hi Markus, 

Thank you very much for these suggestions :) Our server is up and running again. 

We changed the CPU type of the virtual machine from kvm64 to SandyBridge, then did a stop/start to apply the change, and dachs works correctly again. 

~$ dachs --version 
Software (2.8.2) Schema (34/34) 
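A guest's CPU model determines which instruction-set extensions it advertises. The sketch below is one minimal way to compare what the guest advertises before and after a change like kvm64 to SandyBridge. It reads the `flags` line of `/proc/cpuinfo`, so it assumes Linux on x86. The list of extensions it checks is an illustrative choice and says nothing about which instruction OpenBLAS actually needed here.

```python
from pathlib import Path

# Extensions to report. "pni" is how Linux lists SSE3 in /proc/cpuinfo.
# This selection is illustrative only.
EXTENSIONS = ["pni", "ssse3", "sse4_1", "sse4_2", "avx", "avx2"]


def cpu_flags(cpuinfo_text: str) -> set:
    """Return the flags of the first CPU block in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Match the exact "flags" key, not e.g. "vmx flags".
        if key.strip() == "flags":
            return set(value.split())
    return set()


def report(flags: set) -> dict:
    """Map each extension of interest to whether the CPU advertises it."""
    return {ext: ext in flags for ext in EXTENSIONS}


if __name__ == "__main__":
    text = Path("/proc/cpuinfo").read_text()
    for ext, present in report(cpu_flags(text)).items():
        print(f"{ext:8} {'yes' if present else 'no'}")
```

To see what the change exposed to the guest, run the script once under each CPU model and diff the two outputs.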

Thank you very much again for your help. 

Best regards, 

Nima 


From: dachs-support-request at g-vo.org 
To: "dachs-support" <dachs-support at g-vo.org> 
Sent: Wednesday, November 8, 2023 12:00:01 
Subject: Dachs-support Digest, Vol 32, Issue 4 


Today's Topics: 

1. Re: DaCHS dachs command returns illegal instruction 
(Markus Demleitner) 


---------------------------------------------------------------------- 

Message: 1 
Date: Tue, 7 Nov 2023 09:26:05 -0700 
From: Markus Demleitner <msdemlei at ari.uni-heidelberg.de> 
To: dachs-support at g-vo.org 
Subject: Re: [Dachs-support] DaCHS dachs command returns illegal 
instruction 
Message-ID: <20231107162605.fzwm4nt3tihnk2cg at victor> 
Content-Type: text/plain; charset=utf-8 

Hi Nima, 

On Tue, Nov 07, 2023 at 12:37:43PM +0100, Nima Traore wrote: 
> Thank you very much for this information :) Here is below the 
> output of the where command in the gdb: 
> 
> ~$ gdb `which python3` core 
[...] 
> Core was generated by `/usr/bin/python3 /usr/bin/dachs --version'. 
> Program terminated with signal SIGILL, Illegal instruction. 
> #0 0x00007fb95b366820 in dgemm_otcopy_OPTERON_SSE3 () from /lib/x86_64-linux-gnu/libopenblas.so.0 

Ha! It helps! You see, what this tells you is that the crash is in 
the blas library, which is venerable numerics code. You probably 
didn't install this yourself; I think it was pulled in as a 
dependency of numpy. 

The name of the function the crash happens in is another hint: 
OPTERON_SSE3 suggests that this is code compiled for some AMD 
architecture -- and that the processor that actually executes the 
code doesn't understand some particular Opteron SSE3 opcode. 
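The crash sits in BLAS rather than in DaCHS itself, so it can probably be reproduced without DaCHS. The sketch below assumes that numpy is linked against the system OpenBLAS and passes float64 matrix products to the BLAS `dgemm` routine. The function in the backtrace is a `dgemm` helper. The sketch runs the product in a child process, so a SIGILL kills only the child. The script and its function names are hypothetical and are not part of DaCHS.

```python
import signal
import subprocess
import sys

# Child program: a float64 matrix product, which numpy normally delegates
# to the BLAS dgemm routine when it is built against a BLAS library.
CHILD_CODE = "import numpy as np; a = np.ones((256, 256)); print((a @ a)[0, 0])"


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short verdict."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, -N means the child was killed by signal N.
        try:
            return signal.Signals(-returncode).name
        except ValueError:
            return f"signal {-returncode}"
    return f"exit status {returncode}"


def probe_blas(python: str = sys.executable) -> str:
    """Run the matrix product in a separate interpreter and report the outcome."""
    proc = subprocess.run([python, "-c", CHILD_CODE], capture_output=True)
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    # A result of "SIGILL" would match what `dachs --version` ran into.
    print(probe_blas())
```

Run the script as `/usr/bin/python3 probe.py` (any file name will do) so that it tests the same interpreter, and therefore the same numpy and OpenBLAS, that `dachs` uses.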

Of course, the real question is: How to fix that? 

Well: 

$ apt info libopenblas0 
[...] 
Description: Optimized BLAS (linear algebra) library (meta) 
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version. 
. 
On amd64, arm64, i386, ppc64el, s390x, kfreebsd-amd64 and kfreebsd-i386, 
all kernels are included in the library and the one matching best your 
processor is selected at runtime. 
[...] 

Hu! If this were true, what you're seeing shouldn't be happening; 
your Opteron SSE3 function should only be attempted if you're 
actually *running* on a CPU that can execute it. At this point, I 
admitted defeat as far as rational problem analysis goes. 
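You can ask OpenBLAS directly which kernel set it chose at runtime. It exports `openblas_get_corename()`, which returns the name of the selected core. The sketch below calls it through ctypes and assumes the library loads as `libopenblas.so.0`, the name in the backtrace. One way to spot a mismatch is to compare the reported core with the guest's CPU flags. The wrapper's name is hypothetical.

```python
import ctypes
from typing import Optional


def openblas_core_name(soname: str = "libopenblas.so.0") -> Optional[str]:
    """Ask OpenBLAS which CPU kernel set it selected.

    Returns None if the library or the symbol cannot be found.
    """
    try:
        lib = ctypes.CDLL(soname)
        get_corename = lib.openblas_get_corename
    except (OSError, AttributeError):
        return None
    # openblas_get_corename() takes no arguments and returns a C string.
    get_corename.argtypes = []
    get_corename.restype = ctypes.c_char_p
    name = get_corename()
    return name.decode() if name is not None else None


if __name__ == "__main__":
    print(openblas_core_name() or "OpenBLAS not found")
```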

In other words: I fed 

dgemm_otcopy_OPTERON_SSE3 "SIGILL" 

to a web search engine. It turns out you're not the first to 
experience this. Here's a bug against openblas that analyses the 
problem in some detail: 
https://github.com/OpenMathLib/OpenBLAS/issues/2794 

I've only skimmed that report (I'm supposed to listen to a conference 
talk now:-); perhaps you can study it a bit more closely, but it would 
seem that the solutions involve recompiling, which I'd rather avoid. 
But then there's /usr/share/doc/libopenblas0/README.Debian, which 
explains how to switch BLAS implementations. Can you have a look at 
it and at the 
http://wiki.debian.org/DebianScience/LinearAlgebraLibraries page that's 
linked from there? 
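On Debian, the alternatives system chooses the BLAS implementation behind `libblas.so.3`. The sketch below checks which implementation that alternatives link currently points to. The alternative name `libblas.so.3-x86_64-linux-gnu` is an assumption for amd64, so check it against README.Debian on the actual system.

```python
import os
from typing import Optional

# Assumed alternative name for the BLAS library on Debian amd64.
BLAS_ALTERNATIVE = "libblas.so.3-x86_64-linux-gnu"


def selected_alternative(name: str = BLAS_ALTERNATIVE,
                         altdir: str = "/etc/alternatives") -> Optional[str]:
    """Return the file an alternatives symlink resolves to, or None if absent."""
    link = os.path.join(altdir, name)
    if not os.path.islink(link):
        return None
    # Follow the whole chain of symlinks to the real library file.
    return os.path.realpath(link)


if __name__ == "__main__":
    print(selected_alternative() or "alternative not found")
```

If the link resolves into an `openblas` directory, you can switch to another installed implementation by running `update-alternatives --config` on the same name as root. This only helps if numpy reaches OpenBLAS through that alternative and does not load `libopenblas.so.0` directly.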

If any of that fixes your problem, would you let us know? If not, or 
if you get stuck, feel free to ask back. Alternatively, use different 
virtualisation software (which I think is the root cause of your 
problem), or move to Intel- or ARM-based hardware (assuming you aren't 
already on one; if you are, things are even more broken than it seems), 
or don't use a virtualised host at all and use actual hardware. 

Sorry I can't be more specific... 

-- Markus 

