Question 1

Which GPU do I need to run Mistral Small 3 24B locally?

Accepted Answer

Mistral Small 3 24B has 24 billion parameters and needs about 15.9 GB of VRAM in a practical quantization (Q4_K_M). The directory lists 17 GPUs that can run the model. Cheapest entry point: AMD Radeon RX 7900 XT.

Question 2

How much VRAM does Mistral Small 3 24B need?

Accepted Answer

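In Q4_K_M quantization, Mistral Small 3 24B needs about 15.9 GB of VRAM at 8k context. That figure already includes the KV cache, with grouped-query attention (GQA) taken into account. FP16 and Q8_0 need considerably more; the table below lists about 26.1 GB for Q8_0.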
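To make the arithmetic behind such an estimate concrete, here is a minimal sketch that adds the quantized weight size to an FP16 KV cache. Several inputs are assumptions and not taken from this answer: roughly 4.85 bits per weight for Q4_K_M, 40 layers, 8 KV heads (the GQA part), and a head dimension of 128. Runtime compute buffers are left out, so real usage may be somewhat higher.

```python
from dataclasses import dataclass


@dataclass
class ModelShape:
    params: float   # total parameter count (24 billion, from the answer above)
    layers: int     # transformer blocks (assumed)
    kv_heads: int   # key/value heads under GQA (assumed)
    head_dim: int   # dimension per attention head (assumed)


def weights_gb(params: float, bits_per_weight: float) -> float:
    """Size of the quantized weights in decimal GB."""
    return params * bits_per_weight / 8 / 1e9


def kv_cache_gb(shape: ModelShape, context_tokens: int, bytes_per_value: int = 2) -> float:
    """KV cache size: K and V each hold layers * kv_heads * head_dim values per token."""
    values = 2 * shape.layers * shape.kv_heads * shape.head_dim * context_tokens
    return values * bytes_per_value / 1e9


def estimate_vram_gb(shape: ModelShape, bits_per_weight: float, context_tokens: int) -> float:
    """Weights plus KV cache; compute buffers and framework overhead are not included."""
    return weights_gb(shape.params, bits_per_weight) + kv_cache_gb(shape, context_tokens)


# Assumed architecture values; check them against the model's published config.
shape = ModelShape(params=24e9, layers=40, kv_heads=8, head_dim=128)

# 4.85 bits per weight is an approximate, assumed average for Q4_K_M.
q4 = estimate_vram_gb(shape, bits_per_weight=4.85, context_tokens=8192)
print(f"Q4_K_M at 8k context: about {q4:.1f} GB")  # prints: Q4_K_M at 8k context: about 15.9 GB
```

Under these assumptions the KV cache contributes only about 1.3 GB at 8k context, because GQA stores 8 KV heads per layer instead of one per query head. The weights dominate, which is why the choice of quantization matters far more than the context length at this scale.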

Question 3

Which quantization should I use for Mistral Small 3 24B?

Accepted Answer

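Q4_K_M is the pragmatic default: good enough for production, and it roughly halves the memory needed for the weights compared with Q8_0. Q8_0 preserves quality almost completely but needs considerably more VRAM (about 26.1 GB versus 15.9 GB in the table below). FP16 is rarely worth it, except for fine-tuning.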

GPU	Best quant	VRAM required	Tokens/s	Price	€/GB
AMD Radeon RX 7900 XT	Q4_K_M	15.9 GB	—	—	—
AMD Radeon RX 7900 XTX	Q4_K_M	15.9 GB	—	—	—
Apple Mac mini M4 Pro 64GB	Q8_0	26.1 GB	—	—	—
Apple Mac Studio M3 Ultra 192GB	Q8_0	26.1 GB	—	—	—
Apple Mac Studio M3 Ultra 96GB	Q8_0	26.1 GB	—	—	—
Apple MacBook Pro M4 Max 128GB	Q8_0	26.1 GB	—	—	—
Apple MacBook Pro M4 Max 64GB	Q8_0	26.1 GB	—	—	—
NVIDIA H100 80GB	Q8_0	26.1 GB	—	—	—
NVIDIA L40S	Q8_0	26.1 GB	—	—	—
NVIDIA GeForce RTX 3090 Ti	Q4_K_M	15.9 GB	—	—	—
NVIDIA GeForce RTX 3090	Q4_K_M	15.9 GB	—	—	—
NVIDIA GeForce RTX 4090	Q4_K_M	15.9 GB	—	—	—
NVIDIA GeForce RTX 5090	Q8_0	26.1 GB	—	—	—
NVIDIA RTX 6000 Ada Generation	Q8_0	26.1 GB	—	—	—
NVIDIA RTX A6000	Q8_0	26.1 GB	—	—	—
NVIDIA Tesla M40 24GB	Q4_K_M	15.9 GB	—	—	—
NVIDIA Tesla P40	Q4_K_M	15.9 GB	—	—	—
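The quantization column in the table above follows a simple rule: pick the highest-quality quantization whose VRAM requirement fits on the card. The sketch below illustrates that rule using the two figures quoted on this page (15.9 GB and 26.1 GB). The function name `best_quant` and the `headroom_gb` parameter are hypothetical, introduced only for this example; the card capacities in the usage lines are the published memory sizes of those GPUs.

```python
from typing import Optional

# VRAM requirements in GB as quoted on this page, ordered from best quality down.
QUANT_VRAM_GB = [
    ("Q8_0", 26.1),
    ("Q4_K_M", 15.9),
]


def best_quant(vram_gb: float, headroom_gb: float = 0.0) -> Optional[str]:
    """Return the highest-quality quantization that fits, or None if nothing fits.

    headroom_gb is a hypothetical safety margin for the desktop, drivers,
    or other processes that also occupy GPU memory.
    """
    for name, required_gb in QUANT_VRAM_GB:
        if required_gb + headroom_gb <= vram_gb:
            return name
    return None


print(best_quant(20))  # AMD Radeon RX 7900 XT (20 GB)
print(best_quant(24))  # NVIDIA GeForce RTX 4090 (24 GB)
print(best_quant(32))  # NVIDIA GeForce RTX 5090 (32 GB)
print(best_quant(12))  # a 12 GB card: nothing fits
```

Setting `headroom_gb` to one or two gigabytes is a reasonable precaution on a card that also drives a display, since the quoted requirements leave no slack for anything else.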
