Root cause: _ensure_i23d_worker() reloaded from disk via from_pretrained(),
which loads the ~7GB checkpoint into CPU RAM. If Python's GC had not yet
freed the previously del'd tensors, the old and new copies coexisted in
RAM → OOM killer.
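The GC timing issue above can be demonstrated without any model weights: a minimal sketch, assuming only the stdlib, where a reference cycle keeps a del'd object alive until gc.collect() runs (the Checkpoint class is a hypothetical stand-in for the ~7GB model).

```python
import gc
import weakref

class Checkpoint:
    """Hypothetical stand-in for a large in-RAM model checkpoint."""

gc.disable()  # make collection timing deterministic for the demo

big = Checkpoint()
alive = weakref.ref(big)

# A reference cycle means refcounting alone cannot free the object.
big.self_ref = big
del big

# Still resident: this is the window where a from_pretrained() reload
# would put a second copy in RAM alongside the old one.
assert alive() is not None

gc.collect()  # explicit collection reclaims the cycle
assert alive() is None

gc.enable()
```

This is why the tex path below pairs del with an explicit gc pass before anything is reloaded.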
Fix: hybrid strategy per model type:
i23d (shape, ~7.25GB VRAM):
.to('cpu') ↔ .to('cuda') — stays in RAM, no disk IO, fast switch
tex_pipeline (texture, ~6.59GB VRAM):
del + gc + empty_cache ↔ reload from HF cache — full VRAM release
Renamed helpers:
_unload_i23d_worker() → _offload_i23d_to_cpu()
_ensure_i23d_worker() → _restore_i23d_to_gpu()
(tex helpers unchanged)
VRAM timeline per request in low_vram_mode:
shape gen: i23d on GPU (7.25GB), tex unloaded
→ _offload_i23d_to_cpu(): i23d→RAM (0GB VRAM)
→ _ensure_tex_pipeline(): tex loads (6.59GB)
texture gen: tex on GPU (6.59GB), i23d in RAM
→ _unload_tex_pipeline(): tex del'd (0GB VRAM)
next request: _restore_i23d_to_gpu(): RAM→GPU (7.25GB)
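The timeline above can be checked with simple VRAM accounting, using the sizes from this message; a sketch under the assumption that only one model's weights occupy VRAM at a time:

```python
# Sizes taken from the commit message (GB of VRAM).
I23D_GB, TEX_GB = 7.25, 6.59

vram = 0.0
timeline = []

def log(step):
    timeline.append((step, round(vram, 2)))

vram += I23D_GB; log("shape gen")      # i23d on GPU
vram -= I23D_GB; log("offload i23d")   # i23d -> CPU RAM
vram += TEX_GB;  log("texture gen")    # tex loaded from HF cache
vram -= TEX_GB;  log("unload tex")     # del + gc + empty_cache
vram += I23D_GB; log("restore i23d")   # next request: RAM -> GPU

# Peak usage never reaches the sum of both models.
assert max(v for _, v in timeline) < I23D_GB + TEX_GB
```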
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>