I'm currently running my simulations on the two systems described below and getting surprisingly different run times. The simulation loads a 180 MB GDML model with one detector location and runs 10 million electrons to compute the accumulated dose in that detector.
System 1 Hardware
Intel Xeon X650 @ 2.67GHz (2 processors), L1 cache 256KB, L2 1MB, L3 24MB
Installed 24GB RAM
2 sockets, 4 cores per socket, 1 thread per core
PassMark - CPU Mark 7266
Virtual Linux session (VM) installed on System 1, running Ubuntu 18.04.3 LTS:
1 socket, 4 cores per socket, 1 thread per core
CPU MHz: 2659.93
BogoMIPS: 5319.86
Memory: 7.8 GB
Swap: 472 MB
System 2: Stand-alone Linux Install (Ubuntu 19.10)
Hardware: i5-3470 @ 3.2GHz, L1 cache 128KB, L2 1MB, L3 6MB
8 GB memory, 2 GB swap
1 socket, 4 cores per socket, 1 thread per core
CPU MHz: 1596.44 (see the frequency check after this list)
BogoMIPS: 6385.70
RAM type: DDR3-1600
PassMark - CPU Mark 6733
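One note on the CPU MHz figures: tools like lscpu report the clock sampled at that instant, so the 1596 MHz reading on the nominally 3.2 GHz i5 may just mean the CPU was idle, or that a frequency-scaling governor is holding it down (with the older acpi-cpufreq driver, the "powersave" governor pins the CPU at its minimum frequency, which would match 1596 MHz on this part). Below is a minimal C++ sketch to check the live frequency and governor; it assumes the standard Linux cpufreq sysfs paths, which may vary by kernel and driver:

```cpp
// check_freq.cpp -- print current frequency and governor for each CPU.
// Assumes the standard Linux cpufreq sysfs layout; paths may differ
// (or be absent) depending on kernel/driver, so treat this as a sketch.
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <thread>

static std::string read_line(const std::string& path) {
    std::ifstream f(path);
    std::string line;
    if (f && std::getline(f, line)) return line;
    return "(unavailable)";
}

int main() {
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 1;  // fall back if the count can't be determined
    for (unsigned cpu = 0; cpu < n; ++cpu) {
        std::ostringstream base;
        base << "/sys/devices/system/cpu/cpu" << cpu << "/cpufreq/";
        std::cout << "cpu" << cpu
                  << "  cur_freq(kHz)=" << read_line(base.str() + "scaling_cur_freq")
                  << "  governor=" << read_line(base.str() + "scaling_governor")
                  << "\n";
    }
    return 0;
}
```

If cur_freq stays near 1.6 GHz while the simulation is actually running, that would explain at least part of the slowdown on System 2.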
The simulation uses multithreaded parallel processing and is run with 4 threads on both systems.
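For context, the threading pattern is essentially the toy sketch below (a hypothetical stand-in, not my actual transport code): the primaries are split evenly over 4 worker threads, each thread accumulates dose into a local tally, and the tallies are summed after the join, so there is no locking in the hot loop.

```cpp
// toy_mt.cpp -- sketch of the multithreaded dose-tally pattern
// (hypothetical stand-in; real electron transport is far more involved).
#include <iostream>
#include <random>
#include <thread>
#include <vector>

int main() {
    const long long n_electrons = 10'000'000;  // 10E6 primaries
    const unsigned n_threads = 4;              // as run on both systems
    std::vector<double> dose(n_threads, 0.0);  // one tally slot per thread

    auto worker = [&](unsigned id) {
        std::mt19937_64 rng(id + 1);           // independent seed per thread
        std::exponential_distribution<double> deposit(1.0);  // fake "dose deposit"
        double local = 0.0;                    // accumulate locally, no sharing
        const long long n_local = n_electrons / n_threads;
        for (long long i = 0; i < n_local; ++i)
            local += deposit(rng);             // stand-in for full transport
        dose[id] = local;                      // publish once at the end
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < n_threads; ++t)
        pool.emplace_back(worker, t);
    for (auto& th : pool)
        th.join();

    double total = 0.0;
    for (double d : dose)
        total += d;
    std::cout << "accumulated dose (arbitrary units): " << total << "\n";
}
```

Since the workload is CPU-bound and embarrassingly parallel like this, I would expect run time to track per-core performance fairly closely, which is why the numbers below surprise me.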
A run with 1E6 electrons on the VM takes about 2 days, and a run with 10E6 electrons takes over 8 days. On the stand-alone Linux box, the same runs take roughly 8 to 10 times longer to complete!
When I start a second job on the VM, both jobs crash (I assume they run out of memory, but I'm not sure). Starting two parallel jobs on the Linux box works fine.
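To test the out-of-memory theory, the quickest check I can think of is to watch /proc/meminfo while the first job runs and again right before launching the second one. A minimal sketch, just filtering the relevant lines (MemAvailable exists on any reasonably recent kernel):

```cpp
// mem_headroom.cpp -- echo the memory-headroom lines of /proc/meminfo
// before starting a second simulation job.
#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::ifstream meminfo("/proc/meminfo");
    std::string line;
    while (std::getline(meminfo, line)) {
        // keep only the totals and what is still available
        if (line.rfind("MemTotal:", 0) == 0 ||
            line.rfind("MemAvailable:", 0) == 0 ||
            line.rfind("SwapFree:", 0) == 0)
            std::cout << line << "\n";
    }
    return 0;
}
```

If MemAvailable on the 7.8 GB VM drops to near zero with one job loaded, a second copy of the 180 MB geometry plus the rest of the process footprint could plausibly trigger the crashes.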
Any idea why the virtual Linux session on System 1 runs so much more efficiently than the native Linux install on System 2?
Run performance does not seem to scale with the CPU benchmark scores. What are the critical elements to get right when configuring an optimized system with >32 threads?