schizoidman@lemm.ee to Technology@lemmy.world · English · 2 days ago
DeepSeek's distilled new R1 AI model can run on a single GPU | TechCrunch (techcrunch.com)
brucethemoose@lemmy.world · English · edited 2 days ago
Depends on the quantization.

7B is small enough to run in FP8 or a Marlin quant with SGLang/vLLM/TensorRT, so you can probably get very close to the H20 on a 3090 or 4090 (or even a 3060), assuming you know a little Docker.
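For the Docker route, a minimal sketch might look like the following, using vLLM's official `vllm/vllm-openai` image. The checkpoint name (`deepseek-ai/DeepSeek-R1-Distill-Qwen-7B`) and the exact flag values are assumptions here, not from the comment; swap in whichever distilled model and settings fit your card.

```shell
# Sketch: serve a ~7B distilled model with vLLM's OpenAI-compatible server
# on a single consumer GPU. Model ID and tuning values are placeholders.
docker run --gpus all --ipc=host -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:latest \
  --model deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
  --quantization fp8 \
  --gpu-memory-utilization 0.90 \
  --max-model-len 8192
```

On Ampere cards like the 3090/3060 (which lack native FP8 compute), vLLM falls back to weight-only schemes, which is where a Marlin-format quant comes in; on a 4090 you can use FP8 more directly.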