Add inference_snapshot() that pulls loaded models from /api/ps and GPU
stats over SSH, then surface them as facts: model name/quant/VRAM, processor
split, TTL countdown, and a hot/idle gauge for the inference GPU. Doge
can now riff on the LLM box too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>