Geant4 GUI and AMD drivers

Hello!

I moved to a new AMD-based system and can't get the standard Qt GUI running properly.
The GUI starts and everything seems to work at first, but as soon as you try to interact with it, it freezes for around 10 seconds.
Then it moves the viewport in a very choppy way and finally crashes (black screen for a second), giving the following output:

WARNING: Viewpoint direction is very close to the up vector direction.
  Change the up vector or "/vis/viewer/set/rotationStyle freeRotation".
amdgpu: amdgpu_cs_query_fence_status failed.
amdgpu: The CS has been rejected (-125), but the context isn't robust.
amdgpu: The process will be terminated.
QObject::killTimer: Timers cannot be stopped from another thread
QObject::~QObject: Timers cannot be stopped from another thread

### CAUGHT SIGNAL: 11 ### address: 0,  signal =  SIGSEGV, value =   11, description = segmentation violation. Address not mapped to object.

Backtrace:
[PID=142522, TID=-1][ 0/51]> /usr/lib/libQt5Core.so.5(_ZNK18QThreadStorageData3getEv+0x3f) [0x7facc5aeaecf]
[PID=142522, TID=-1][ 1/51]> /usr/lib/libQt5Gui.so.5(_ZN14QOpenGLContext14currentContextEv+0x32) [0x7facc617e902]
[PID=142522, TID=-1][ 2/51]> /usr/lib/libQt5OpenGL.so.5(_ZN9QGLWidget10renderTextEdddRK7QStringRK5QFont+0xb3) [0x7facc702e753]
[PID=142522, TID=-1][ 3/51]> /home/till/src/geant4/v11.0.3/geant4-v11.0.3-install/lib/libG4OpenGL.so(_ZN16G4OpenGLQtViewer8DrawTextERK6G4Text+0x305) [0x7facc93612f5]
[PID=142522, TID=-1][ 4/51]> /home/till/src/geant4/v11.0.3/geant4-v11.0.3-install/lib/libG4OpenGL.so(_ZN26G4OpenGLStoredSceneHandler12AddPrimitiveERK6G4Text+0x2b) [0x7facc934926b]
[PID=142522, TID=-1][ 5/51]> /home/till/src/geant4/v11.0.3/geant4-v11.0.3-install/lib/libG4OpenGL.so(_ZN20G4OpenGLStoredViewer27AddPrimitiveForASingleFrameERK6G4Text+0x1c) [0x7facc9345b9c]
[PID=142522, TID=-1][ 6/51]> /home/till/src/geant4/v11.0.3/geant4-v11.0.3-install/lib/libG4OpenGL.so(_ZN20G4OpenGLStoredViewer16DrawDisplayListsEv+0x5af) [0x7facc934618f]
[PID=142522, TID=-1][ 7/51]> /home/till/src/geant4/v11.0.3/geant4-v11.0.3-install/lib/libG4OpenGL.so(_ZN22G4OpenGLStoredQtViewer11ComputeViewEv+0x854) [0x7facc9380094]
[PID=142522, TID=-1][ 8/51]> /home/till/src/geant4/v11.0.3/geant4-v11.0.3-install/lib/libG4OpenGL.so(_ZN22G4OpenGLStoredQtViewer7paintGLEv+0x64) [0x7facc9380884]
[PID=142522, TID=-1][ 9/51]> /usr/lib/libQt5OpenGL.so.5(_ZN9QGLWidget6glDrawEv+0x2c5) [0x7facc

It's Geant4 11, using Qt5 (Qt version 5.15.7) and OpenGL.

The system is:

System:
  Host: T14 Kernel: 6.0.12-arch1-1 arch: x86_64 bits: 64 compiler: gcc
    v: 12.2.0 Desktop: dwm v: 6.2 Distro: Arch Linux
Machine:
  Type: Laptop System: LENOVO product: 21CGS0TV00 v: ThinkPad T14 Gen 3
    serial: <superuser required>
  Mobo: LENOVO model: 21CGS0TV00 v: ThinkPad serial: <superuser required>
    UEFI: LENOVO v: R23ET47W (1.17 ) date: 05/27/2022
Battery:
  ID-1: BAT0 charge: 44.1 Wh (90.0%) condition: 49.0/52.5 Wh (93.3%)
    volts: 16.9 min: 15.4 model: Celxpert LNV-5B10W51866 status: discharging
CPU:
  Info: 8-core model: AMD Ryzen 7 PRO 6850U with Radeon Graphics bits: 64
    type: MT MCP arch: Zen 3+ rev: 1 cache: L1: 512 KiB L2: 4 MiB L3: 16 MiB
  Speed (MHz): avg: 1812 high: 2647 min/max: 400/4768 boost: enabled cores:
    1: 1695 2: 400 3: 2610 4: 2647 5: 2611 6: 400 7: 400 8: 2610 9: 2610 10: 400
    11: 2610 12: 1742 13: 400 14: 2633 15: 2613 16: 2613 bogomips: 86278
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Graphics:
  Device-1: AMD Rembrandt [Radeon 680M] vendor: Lenovo driver: amdgpu
    v: kernel arch: RDNA-2 bus-ID: 04:00.0 temp: 45.0 C
  Device-2: 8SSC21D67422V1SR24P23TZ Integrated RGB Camera type: USB
    driver: uvcvideo bus-ID: 5-1:2
  Display: server: X.Org v: 21.1.5 driver: X: loaded: modesetting
    dri: radeonsi gpu: amdgpu resolution: 3840x2400~60Hz
  API: OpenGL v: 4.6 Mesa 22.3.1 renderer: AMD Radeon Graphics (rembrandt
    LLVM 14.0.6 DRM 3.48 6.0.12-arch1-1) direct render: Yes

From dmesg | grep 'amdgpu' I get

[ 6650.887087] amdgpu 0000:04:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 6650.887089] amdgpu 0000:04:00.0: amdgpu:      Faulty UTCL2 client ID: CB/DB (0x0)
[ 6650.887091] amdgpu 0000:04:00.0: amdgpu:      MORE_FAULTS: 0x0
[ 6650.887093] amdgpu 0000:04:00.0: amdgpu:      WALKER_ERROR: 0x0
[ 6650.887094] amdgpu 0000:04:00.0: amdgpu:      PERMISSION_FAULTS: 0x0
[ 6650.887096] amdgpu 0000:04:00.0: amdgpu:      MAPPING_ERROR: 0x0
[ 6650.887098] amdgpu 0000:04:00.0: amdgpu:      RW: 0x0
[ 6661.094865] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=511508, emitted seq=511510
[ 6661.095382] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process sim pid 142522 thread sim:cs0 pid 142551
[ 6661.095839] amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
[ 6661.625199] amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[ 6661.625403] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
[ 6661.866775] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
[ 6661.914420] amdgpu 0000:04:00.0: amdgpu: MODE2 reset
[ 6661.924808] amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 6662.663721] amdgpu 0000:04:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 6662.676177] amdgpu 0000:04:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 6662.676179] amdgpu 0000:04:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 6662.676184] amdgpu 0000:04:00.0: amdgpu: SMU is resuming...
[ 6662.677405] amdgpu 0000:04:00.0: amdgpu: SMU is resumed successfully!
[ 6662.984350] amdgpu 0000:04:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 6662.984356] amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 6662.984359] amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 6662.984361] amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 6662.984363] amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 6662.984365] amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 6662.984366] amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 6662.984368] amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 6662.984370] amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 6662.984372] amdgpu 0000:04:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[ 6662.984374] amdgpu 0000:04:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 6662.984376] amdgpu 0000:04:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[ 6662.984378] amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[ 6662.984380] amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[ 6662.984381] amdgpu 0000:04:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
[ 6662.997365] amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow start
[ 6662.997375] amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow done
[ 6662.997419] amdgpu 0000:04:00.0: amdgpu: GPU reset(82) succeeded!
[ 6663.061089] amdgpu_cs_ioctl: 38 callbacks suppressed
[ 6663.061099] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!

after such a run.

From the initial backtrace, I suspect that this is a problem with Qt, but I don't really understand what the problem might be specifically.
The identical simulation runs on my older Intel-based system without any hiccups.
Best,
Till

Dear Till

There is a warning at the beginning:

WARNING: Viewpoint direction is very close to the up vector direction.
  Change the up vector or "/vis/viewer/set/rotationStyle freeRotation".

What happens in this case (the view tends to flip about if you continue to interact, since it can’t compute a unique viewpoint) shouldn’t cause a crash. Just try its suggestion, /vis/viewer/set/rotationStyle freeRotation, and let us know what happens.
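
As a minimal sketch of applying that suggestion from a macro file: the snippet below writes the command into a small fragment and shows, in a comment, how it might be pulled into an existing vis.mac. The file name rotation.mac is an assumption made for illustration; the command itself is the one quoted in the warning.

```shell
# Write a small macro fragment (rotation.mac is a placeholder name).
cat > rotation.mac <<'EOF'
# Let the view rotate freely instead of keeping the up vector fixed.
/vis/viewer/set/rotationStyle freeRotation
EOF

# The fragment would then be executed from vis.mac, after the viewer
# has been opened (e.g. after /vis/open OGL), with:
#   /control/execute rotation.mac
```

The line can equally be pasted directly into vis.mac; it needs an open viewer to act on, so it belongs after the /vis/open command.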

My other thought is: make sure you have the correct graphics card driver.

John

Dear John,

I added this to my vis.mac, but the same problem persists.

Regarding the driver, I tried glxgears, and it works perfectly fine.
I am not sure whether this is the best test for OpenGL, though, so if there is something else to test, I am happy to get some hints on where to start.
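
One further check that could be tried here, offered only as a sketch: print the renderer that is actually in use, then run the same application once with Mesa's software rasteriser. If the crash disappears under software rendering, that would point towards the hardware driver path rather than Geant4 or Qt. The helper name run_with_software_gl is hypothetical, the executable name ./sim is taken from the dmesg output, and glxinfo is assumed to be installed.

```shell
# Print a brief summary of the active OpenGL vendor/renderer, if glxinfo exists.
if command -v glxinfo >/dev/null 2>&1; then
    glxinfo -B || true
fi

# Hypothetical helper: run a command with Mesa forced to software rendering.
# LIBGL_ALWAYS_SOFTWARE=1 is a Mesa environment variable.
run_with_software_gl() {
    LIBGL_ALWAYS_SOFTWARE=1 "$@"
}

# Example (the application name is taken from the dmesg output above):
#   run_with_software_gl ./sim
```

Software rendering will be slow, so this is only meant as a diagnostic, not as a way of working.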

Best,
Till

Sorry Till. At a bit of a loss here. Let’s blame Qt!! John

I had a problem on a new AMD system too, but with somewhat different symptoms: the geometry was only partially shown, and the program usually failed to draw it.
Drawing it with the “OGLI” option solved the problem for me; as I understand it, it uses a different version of OpenGL.

Hi Rish

No, OGLI uses the same version of OpenGL - but it draws directly to the screen (“immediate” mode). OGLS (which is the default OGL) places stuff in a graphical database (display lists, “stored” mode) that can be loaded into the graphics card, so giving much better performance for operations such as rotation and zoom.

It’s not clear why OGLI should fix your problem, but at least it’s a decent workaround.
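
For anyone wanting to try the same workaround, a minimal sketch of a macro that opens the immediate-mode driver instead of the stored-mode default could look like this. The file name vis_ogli.mac is an assumption for illustration, and the rest of an existing vis.mac would follow the /vis/open line unchanged.

```shell
# Write a small test macro (vis_ogli.mac is a placeholder name).
cat > vis_ogli.mac <<'EOF'
# Immediate mode: draws directly to the screen, no display lists.
/vis/open OGLI
# Stored mode (what plain OGL selects by default) would instead be:
# /vis/open OGLS
/vis/drawVolume
EOF
```

Switching between the two then only requires changing the /vis/open line.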

You probably already realise that you can use OGL without Qt. Remove any Qt-related options on the CMake command line and add GEANT4_USE_OPENGL_X11=ON (see Installation Guide). You will get a non-interactive window, but you can change viewpoint with commands such as /vis/viewer/set/viewpointThetaPhi.

You can get a viewer that has some modest amount of interaction with GEANT4_USE_XM=ON; this needs the Motif libraries (e.g., OpenMotif).
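
A rough sketch of such a configuration is given below. The source and build paths are placeholders, not taken from this thread, and the viewpoint angles are arbitrary example values. The configure step only runs if the source directory is actually found; otherwise the command is just printed.

```shell
# Placeholder locations; adjust to the actual source and build directories.
G4_SRC="${G4_SRC:-$HOME/src/geant4/source}"
G4_BUILD="${G4_BUILD:-$HOME/src/geant4/build-x11}"

# Plain X11 OpenGL driver with Qt switched off.
# Append -DGEANT4_USE_XM=ON for the Motif-based viewer (needs the Motif libraries).
G4_CMAKE_OPTS="-DGEANT4_USE_QT=OFF -DGEANT4_USE_OPENGL_X11=ON"

if [ -f "$G4_SRC/CMakeLists.txt" ] && command -v cmake >/dev/null 2>&1; then
    # G4_CMAKE_OPTS is left unquoted on purpose so it splits into two options.
    cmake -S "$G4_SRC" -B "$G4_BUILD" $G4_CMAKE_OPTS
else
    echo "would run: cmake -S $G4_SRC -B $G4_BUILD $G4_CMAKE_OPTS"
fi

# Without mouse interaction, the viewpoint is changed by command
# (view.mac is a placeholder name; the angles are example values).
cat > view.mac <<'EOF'
/vis/viewer/set/viewpointThetaPhi 70 20 deg
EOF
```

After configuring, Geant4 has to be rebuilt and reinstalled, and the application rebuilt against it, before the new driver is used.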

Hope this helps
John