Commit Graph

10 Commits

Author SHA1 Message Date
Akasei
3cd767a18d fix(gradio): prevent OOM on 16GB RAM by fully deleting models between uses
Previous hybrid strategy (i23d in CPU RAM, tex del'd) still caused OOM:
- i23d in CPU RAM: ~7GB
- tex loading from disk: ~7GB peak in RAM before GPU transfer
- Total: ~14GB > 16GB system RAM → OOM Killer

New strategy: fully delete both models between uses.
Neither model persists in CPU RAM between requests.
Peak RAM during any load: ~7GB (one model staging to GPU).

Changes:
- Replace _offload_i23d_to_cpu/_restore_i23d_to_gpu with
  _unload_i23d_worker/_ensure_i23d_worker (full del + reload)
- Add double gc.collect() + empty_cache before each load
- Skip i23d startup load in low_vram_mode (load on first request)
- Both models reload from local HF cache (~20-30s each)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-16 22:39:03 +08:00
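The full-delete strategy above can be sketched as follows. This is a minimal, torch-free illustration: `LowVRAMManager`, the factory callables, and the helper bodies are assumptions for demonstration, not the repo's actual code, and the `torch.cuda.empty_cache()` calls are shown as comments so the sketch runs anywhere.

```python
import gc


class LowVRAMManager:
    """Sketch of the full-delete strategy: neither model persists in CPU RAM
    between requests, so peak RAM during any load is one model (~7GB).

    `load_i23d`/`load_tex` stand in for from_pretrained() reloading from the
    local HF cache (~20-30s each in the real app); names are illustrative.
    """

    def __init__(self, load_i23d, load_tex):
        self._load_i23d = load_i23d
        self._load_tex = load_tex
        self.i23d_worker = None   # NOT loaded at startup in low_vram_mode
        self.tex_pipeline = None

    def _free(self):
        # Double gc.collect() encourages CPython to break reference cycles
        # before the next ~7GB checkpoint is staged into RAM.
        gc.collect()
        gc.collect()
        # torch.cuda.empty_cache() would go here in the real pipeline.

    def _unload_i23d_worker(self):
        if self.i23d_worker is not None:
            del self.i23d_worker
            self.i23d_worker = None
            self._free()

    def _ensure_i23d_worker(self):
        if self.i23d_worker is None:
            self._free()   # free BEFORE loading: only one model stages at a time
            self.i23d_worker = self._load_i23d()
        return self.i23d_worker
```

The key invariant is that `_free()` runs before every load, so the old model's tensors are gone before the new checkpoint enters RAM.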
Akasei
76c36e53eb fix(gradio): fix OOM killer on second request in low_vram_mode
Root cause: _ensure_i23d_worker() reloaded from disk via from_pretrained(),
which loads the ~7GB checkpoint into CPU RAM. If Python GC hadn't freed
previous del'd tensors yet, both old+new copies in RAM → OOM Killer.

Fix: hybrid strategy per model type:
  i23d (shape, ~7.25GB VRAM):
    .to('cpu') ↔ .to('cuda') — stays in RAM, no disk IO, fast switch
  tex_pipeline (texture, ~6.59GB VRAM):
    del + gc + empty_cache ↔ reload from HF cache — full VRAM release

Renamed helpers:
  _unload_i23d_worker()  → _offload_i23d_to_cpu()
  _ensure_i23d_worker()  → _restore_i23d_to_gpu()
  (tex helpers unchanged)

VRAM timeline per request in low_vram_mode:
  shape gen: i23d on GPU (7.25GB), tex unloaded
  → _offload_i23d_to_cpu(): i23d→RAM (0GB VRAM)
  → _ensure_tex_pipeline(): tex loads (6.59GB)
  texture gen: tex on GPU (6.59GB), i23d in RAM
  → _unload_tex_pipeline(): tex del'd (0GB VRAM)
  next request: _restore_i23d_to_gpu(): RAM→GPU (7.25GB)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-16 22:05:08 +08:00
Akasei
9bee8e1844 refactor(gradio): replace CPU offload with direct GPU unload/lazy-load
Instead of .to('cpu') / .to('cuda'), models are now fully del'd from
GPU (no CPU intermediate) and reloaded on demand:

- _unload_i23d_worker(): del + gc.collect() + empty_cache()
- _ensure_i23d_worker(): lazy reload from pretrained if None
- _unload_tex_pipeline(): del + gc.collect() + empty_cache()
- _ensure_tex_pipeline(): lazy load from tex_conf if None

generation_all() flow in low_vram_mode:
  shape gen → _unload_i23d_worker → _ensure_tex_pipeline →
  texture gen → _unload_tex_pipeline
  (shape model reloads on next _gen_shape call via _ensure_i23d_worker)

Startup: tex_pipeline NOT loaded in low_vram_mode (only tex_conf stored),
reducing startup VRAM from ~13.5GB to ~7.25GB.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-16 21:15:56 +08:00
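The `generation_all()` flow above can be sketched as a sequence over the four helpers. The helper names come from the commit message, but passing them in as callables (and their return values) is an assumption made so the sketch is self-contained and runnable.

```python
def generation_all(ensure_i23d, unload_i23d, ensure_tex, unload_tex,
                   image, low_vram_mode=True):
    """Sketch of the request flow in low_vram_mode: at most one of the two
    models occupies VRAM at any point in the sequence."""
    i23d = ensure_i23d()       # lazy reload: shape model was del'd last request
    mesh = i23d(image)         # shape gen (~7.25GB VRAM in the real pipeline)
    if low_vram_mode:
        unload_i23d()          # del + gc + empty_cache before texture load
    tex = ensure_tex()         # texture pipeline loads (~6.59GB VRAM)
    textured = tex(mesh)       # texture gen
    if low_vram_mode:
        unload_tex()           # leave VRAM empty; next request reloads shape
    return textured
```

Deferring the texture load to `ensure_tex()` is also what drops startup VRAM from ~13.5GB to ~7.25GB: at startup only the shape model (or, later, neither model) is resident.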
Akasei
5d0405dc68 feat(gradio): apply VRAM optimization and fix texture config
- generation_all(): offload i23d_worker to CPU before texture gen,
  restore after — mirrors batch_generate.py sequential strategy.
  Prevents OOM when both models peak simultaneously on RTX 3080.
- Change texture config: max_num_view 8→9, resolution 768→512.
  768 resolution OOMs (14.6GB activation); 512 is practical max for
  RTX 3080 20GB. max_views 9 gives better texture coverage.
- Only active when --low_vram_mode flag is passed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-16 21:05:14 +08:00
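The texture-config change above amounts to two values; a minimal sketch, assuming a flat dict-style config (the key names mirror the commit message, not necessarily the repo's exact config schema):

```python
# Hedged sketch of the texture config tuned for a 20GB RTX 3080 class card.
tex_config = {
    'max_num_view': 9,   # was 8: one extra view improves texture coverage
    'resolution': 512,   # was 768: 768 peaked at ~14.6GB of activations and OOM'd
}
```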
WncFht
00fa3ac012 feat: add enable_flashvdm to gradio_app.py 2025-07-13 11:44:49 +08:00
HuiwenShi
8f7b4be92e Update gradio_app.py 2025-06-16 22:13:47 +08:00
HuiwenShi
3f102487ba Update gradio_app.py 2025-06-16 22:12:54 +08:00
Zeqiang Lai
d2465f0427 Update gradio_app.py 2025-06-14 15:36:20 +08:00
Huiwenshi
dd93e7ce4e fix some 2025-06-14 14:32:20 +08:00
Huiwenshi
c88bee648e init 2025-06-13 23:53:14 +08:00