Llama Monitor
Stopped
Server
Monitor
Chat
Server Control
Model Preset
Context
KV Key
KV Val
Tensor Split
Batch
Slots
Port
no-mmap
ngram-spec
Start
Stop
gfx906: Q4_0, Q4_1, and Q8_0 use the optimized MMVQ path. Q4_K_M uses the generic path (slower). turbo3 KV cache = 4.6x compression.
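To put the 4.6x figure in context, here is a rough sketch of how KV cache size scales with context length. It assumes the ratio is measured against an f16 cache, which the note does not state. The model shape (layers, KV heads, head dimension) and the context length are hypothetical values chosen only for the arithmetic.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bits_per_element):
    """Approximate KV cache size in bytes for a single sequence."""
    # Factor 2 accounts for storing both keys and values at every layer and position.
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return elements * bits_per_element / 8


# Hypothetical model shape and context length (not taken from the dashboard).
N_LAYERS, N_KV_HEADS, HEAD_DIM, N_CTX = 32, 8, 128, 32768

# Assumption: the 4.6x ratio is relative to a 16-bit (f16) KV cache.
TURBO3_RATIO = 4.6

f16_bytes = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, N_CTX, 16)
turbo3_bytes = f16_bytes / TURBO3_RATIO

GIB = 2 ** 30
print(f"f16 KV cache:    {f16_bytes / GIB:.2f} GiB")
print(f"turbo3 KV cache: {turbo3_bytes / GIB:.2f} GiB")
```

Under these assumptions the f16 cache is 4 GiB and the compressed cache is just under 0.9 GiB. Real usage also depends on the Slots and Tensor Split settings, which this sketch ignores.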
Server Logs
Inference
Prompt Speed
—
Generation Speed
—
Context (KV Cache)
—
Slots
—
Server Status
—
GPUs
GPU
Temp
Load
VRAM
Power
SCLK
MCLK