Hunyuan3D_2.1_Low_VRAM/gradio_app.py
Akasei 76c36e53eb fix(gradio): fix OOM killer on second request in low_vram_mode
Root cause: _ensure_i23d_worker() reloaded the model from disk via from_pretrained(),
which loads the ~7GB checkpoint into CPU RAM. If Python's GC hadn't yet freed the
previously del'd tensors, old + new copies coexisted in RAM → OOM Killer.
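The failure mode can be illustrated with a toy RAM-accounting model (sizes and
names are illustrative, not the app's code): del only drops a name, so a
reload that happens before the old weights are collected doubles peak RAM.

```python
import gc

CKPT_GB = 7.0
ram_gb = 0.0  # simulated CPU RAM held by checkpoint weights

class FakeModel:
    """Toy stand-in: constructing one 'checkpoint' costs CKPT_GB of RAM."""
    def __init__(self):
        global ram_gb
        ram_gb += CKPT_GB
    def __del__(self):
        global ram_gb
        ram_gb -= CKPT_GB

# Old flow: del drops the name, but a lingering reference (closure, cached
# attribute, unflushed GC) keeps the tensors alive while from_pretrained()
# allocates a second full copy.
model = FakeModel()
lingering_ref = model     # e.g. a closure or cached attribute
del model                 # name gone, weights still alive
model = FakeModel()       # "reload from disk"
peak = ram_gb             # 2 × CKPT_GB → OOM Killer territory
lingering_ref = None
gc.collect()              # first copy finally freed
```

With the fix, the same pipeline object is reused via .to('cpu') / .to('cuda'),
so RAM never holds more than one copy of the checkpoint.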

Fix: hybrid strategy per model type:
  i23d (shape, ~7.25GB VRAM):
    .to('cpu') ↔ .to('cuda') — stays in RAM, no disk IO, fast switch
  tex_pipeline (texture, ~6.59GB VRAM):
    del + gc + empty_cache ↔ reload from HF cache — full VRAM release

Renamed helpers:
  _unload_i23d_worker()  → _offload_i23d_to_cpu()
  _ensure_i23d_worker()  → _restore_i23d_to_gpu()
  (tex helpers unchanged)

VRAM timeline per request in low_vram_mode:
  shape gen: i23d on GPU (7.25GB), tex unloaded
  → _offload_i23d_to_cpu(): i23d→RAM (0GB VRAM)
  → _ensure_tex_pipeline(): tex loads (6.59GB)
  texture gen: tex on GPU (6.59GB), i23d in RAM
  → _unload_tex_pipeline(): tex del'd (0GB VRAM)
  next request: _restore_i23d_to_gpu(): RAM→GPU (7.25GB)
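The timeline above can be checked with a toy VRAM counter (sizes from this
commit message; the request functions are schematic, not the app's code) —
the invariant is that the two models are never co-resident on the GPU:

```python
I23D_GB, TEX_GB = 7.25, 6.59

class VramTracker:
    """Tracks simulated VRAM usage and the peak ever reached."""
    def __init__(self):
        self.used = 0.0
        self.peak = 0.0
    def alloc(self, gb):
        self.used += gb
        self.peak = max(self.peak, self.used)
    def free(self, gb):
        self.used -= gb

vram = VramTracker()

def shape_stage():
    vram.alloc(I23D_GB)   # _restore_i23d_to_gpu(): RAM → GPU
    # ... shape generation runs at 7.25GB ...
    vram.free(I23D_GB)    # _offload_i23d_to_cpu(): GPU → RAM

def texture_stage():
    vram.alloc(TEX_GB)    # _ensure_tex_pipeline(): reload from HF cache
    # ... texture generation runs at 6.59GB ...
    vram.free(TEX_GB)     # _unload_tex_pipeline(): del + gc + empty_cache

for _ in range(3):        # several requests back to back
    shape_stage()
    texture_stage()

# peak stays at max(7.25, 6.59) = 7.25 — never the 13.84GB sum
```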

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-16 22:05:08 +08:00
