NVIDIA DGX Spark


System info

```bash
uname --all
Linux spark-17ed 6.11.0-1016-nvidia #16-Ubuntu SMP PREEMPT_DYNAMIC Sun Sep 21 16:52:46 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux

g++ --version
g++ (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0

nvidia-smi
Fri Mar  6 11:39:45 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GB10                    On  |   0000000F:01:00.0 Off |                  N/A |
| N/A   52C    P0             13W /  N/A  | Not Supported          |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
```

ggml-org/nemotron-3-super-120b-GGUF

Model: https://huggingface.co/ggml-org/nemotron-3-super-120b-GGUF

  • llama-batched-bench

main: n_kv_max = 303104, n_batch = 2048, n_ubatch = 2048, flash_attn = 1, is_pp_shared = 0, is_tg_separate = 0, n_gpu_layers = 99, n_threads = 20, n_threads_batch = 20

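|    PP |     TG |    B |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |      T s |    S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
|   512 |     32 |    1 |    544 |    1.094 |   468.05 |    1.621 |    19.74 |    2.715 |   200.37 |
|   512 |     32 |    2 |   1088 |    1.463 |   700.16 |    2.437 |    26.26 |    3.900 |   279.01 |
|   512 |     32 |    4 |   2176 |    2.647 |   773.76 |    4.043 |    31.66 |    6.689 |   325.29 |
|   512 |     32 |    8 |   4352 |    5.291 |   774.14 |    6.151 |    41.62 |   11.442 |   380.37 |
|   512 |     32 |   16 |   8704 |   10.603 |   772.62 |   10.385 |    49.30 |   20.987 |   414.72 |
|   512 |     32 |   32 |  17408 |   21.231 |   771.69 |   18.235 |    56.16 |   39.466 |   441.09 |
|  4096 |     32 |    1 |   4128 |    5.340 |   767.05 |    1.616 |    19.81 |    6.956 |   593.47 |
|  4096 |     32 |    2 |   8256 |   10.673 |   767.55 |    2.454 |    26.08 |   13.127 |   628.94 |
|  4096 |     32 |    4 |  16512 |   21.348 |   767.46 |    4.072 |    31.44 |   25.420 |   649.57 |
|  4096 |     32 |    8 |  33024 |   42.714 |   767.15 |    6.277 |    40.78 |   48.991 |   674.08 |
|  4096 |     32 |   16 |  66048 |   85.385 |   767.54 |   10.596 |    48.32 |   95.981 |   688.14 |
|  4096 |     32 |   32 | 132096 |  170.819 |   767.32 |   18.619 |    55.00 |  189.437 |   697.31 |
|  8192 |     32 |    1 |   8224 |   10.690 |   766.32 |    1.619 |    19.76 |   12.310 |   668.10 |
|  8192 |     32 |    2 |  16448 |   21.382 |   766.24 |    2.467 |    25.94 |   23.850 |   689.65 |
|  8192 |     32 |    4 |  32896 |   42.782 |   765.92 |    4.098 |    31.23 |   46.881 |   701.69 |
|  8192 |     32 |    8 |  65792 |   85.582 |   765.77 |    6.368 |    40.20 |   91.951 |   715.52 |
|  8192 |     32 |   16 | 131584 |  171.066 |   766.21 |   10.774 |    47.52 |  181.840 |   723.62 |
|  8192 |     32 |   32 | 263168 |  342.140 |   766.19 |   18.969 |    53.98 |  361.109 |   728.78 |
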
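The derived columns in the table above follow from the measured times by simple arithmetic, which can be checked against any row. The sketch below is an illustration rather than llama.cpp code: the function name `derive_row` is hypothetical, and the relation N_KV = B * (PP + TG) is an assumption inferred from the reported numbers with `is_pp_shared = 0` (each of the B sequences processes its own prompt).

```python
def derive_row(pp, tg, b, t_pp, t_tg):
    """Recompute the derived columns of one batched-bench row.

    pp, tg : prompt and generated tokens per sequence
    b      : number of parallel sequences (batch size)
    t_pp   : measured prompt-processing time in seconds
    t_tg   : measured text-generation time in seconds
    """
    n_kv = b * (pp + tg)   # assumed: prompts are not shared between sequences
    s_pp = b * pp / t_pp   # prompt-processing throughput, tokens/s
    s_tg = b * tg / t_tg   # generation throughput summed over all sequences
    t = t_pp + t_tg        # total wall time in seconds
    s = n_kv / t           # overall throughput, tokens/s
    return {"N_KV": n_kv, "S_PP": s_pp, "S_TG": s_tg, "T": t, "S": s}


# Measured inputs copied from the PP=4096, TG=32, B=8 row.
row = derive_row(pp=4096, tg=32, b=8, t_pp=42.714, t_tg=6.277)
for name, value in row.items():
    print(f"{name}: {value:.2f}")
```

The recomputed values should agree with the table up to rounding of the reported times.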
  • llama-bench

| model                   |      size |   params | backend | n_ubatch | fa |             test |           t/s |
|-------------------------|-----------|----------|---------|----------|----|------------------|---------------|
| nemotron 120B.A12B Q4_K | 65.10 GiB | 120.67 B | CUDA    |     2048 |  1 |           pp2048 | 768.84 ± 0.90 |
| nemotron 120B.A12B Q4_K | 65.10 GiB | 120.67 B | CUDA    |     2048 |  1 |             tg32 |  19.94 ± 0.16 |
| nemotron 120B.A12B Q4_K | 65.10 GiB | 120.67 B | CUDA    |     2048 |  1 |  pp2048 @ d4096 | 764.51 ± 0.50 |
| nemotron 120B.A12B Q4_K | 65.10 GiB | 120.67 B | CUDA    |     2048 |  1 |    tg32 @ d4096 |  19.95 ± 0.18 |
| nemotron 120B.A12B Q4_K | 65.10 GiB | 120.67 B | CUDA    |     2048 |  1 |  pp2048 @ d8192 | 759.53 ± 0.71 |
| nemotron 120B.A12B Q4_K | 65.10 GiB | 120.67 B | CUDA    |     2048 |  1 |    tg32 @ d8192 |  19.83 ± 0.18 |
| nemotron 120B.A12B Q4_K | 65.10 GiB | 120.67 B | CUDA    |     2048 |  1 | pp2048 @ d16384 | 747.98 ± 1.58 |
| nemotron 120B.A12B Q4_K | 65.10 GiB | 120.67 B | CUDA    |     2048 |  1 |   tg32 @ d16384 |  19.84 ± 0.18 |
| nemotron 120B.A12B Q4_K | 65.10 GiB | 120.67 B | CUDA    |     2048 |  1 | pp2048 @ d32768 | 724.40 ± 2.70 |
| nemotron 120B.A12B Q4_K | 65.10 GiB | 120.67 B | CUDA    |     2048 |  1 |   tg32 @ d32768 |  19.45 ± 0.18 |
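
One way to read the llama-bench results is to express the throughput at each context depth as a fraction of the depth-0 value. The following is a minimal sketch using the mean t/s values copied from the results above; the dictionary and function names are hypothetical, and the ± spreads are ignored for simplicity.

```python
# Mean throughput in tokens/s, keyed by context depth in tokens
# (depth 0 is the row without an "@ d..." suffix).
pp2048 = {0: 768.84, 4096: 764.51, 8192: 759.53, 16384: 747.98, 32768: 724.40}
tg32 = {0: 19.94, 4096: 19.95, 8192: 19.83, 16384: 19.84, 32768: 19.45}


def relative_to_depth0(series):
    """Return each depth's throughput divided by the depth-0 throughput."""
    base = series[0]
    return {depth: tps / base for depth, tps in series.items()}


for name, series in (("pp2048", pp2048), ("tg32", tg32)):
    for depth, ratio in relative_to_depth0(series).items():
        print(f"{name} @ d{depth}: {ratio:.3f} of depth-0 throughput")
```

With these numbers, prompt processing at depth 32768 retains about 94% of its depth-0 throughput, and generation about 97.5%.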

build: 04a65daab (8268)