Previous hybrid strategy (i23d in CPU RAM, tex del'd) still caused OOM:
- i23d in CPU RAM: ~7GB
- tex loading from disk: ~7GB peak in RAM before GPU transfer
- Total: ~14GB peak, which together with OS and process overhead exceeds the 16GB of system RAM → OOM killer
New strategy: fully delete both models between uses.
Neither model persists in CPU RAM between requests.
Peak RAM during any load: ~7GB (one model staging to GPU).
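The peak-RAM arithmetic behind this change can be sketched as below. The ~7GB figures come from this message; the additive model is a simplifying assumption that ignores OS, Python process and CUDA context overhead, which is plausibly why ~14GB still hits the OOM killer on a 16GB machine.

```python
# Rough peak-RAM model (assumption): models parked in CPU RAM + the one model
# currently being read from disk before its GPU transfer.
def peak_ram_gb(resident_gb, staging_gb):
    """Return the estimated peak system RAM in GB."""
    return sum(resident_gb) + staging_gb


SYSTEM_RAM_GB = 16
hybrid = peak_ram_gb([7.0], 7.0)    # i23d parked in CPU RAM while tex loads
full_delete = peak_ram_gb([], 7.0)  # nothing parked; only one model staging

print(f"hybrid: ~{hybrid:.0f} GB of {SYSTEM_RAM_GB} GB")
print(f"full delete: ~{full_delete:.0f} GB of {SYSTEM_RAM_GB} GB")
```

The point of the new strategy is to drop the resident term to zero, so the peak is bounded by a single model's staging footprint.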
Changes:
- Replace _offload_i23d_to_cpu/_restore_i23d_to_gpu with
_unload_i23d_worker/_ensure_i23d_worker (full del + reload)
- Add double gc.collect() + empty_cache before each load
- Skip i23d startup load in low_vram_mode (load on first request)
- Both models reload from local HF cache (~20-30s each)
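The "double gc.collect() + empty_cache" step could look like the helper below. `free_memory` is a hypothetical name, and torch is imported optionally so the sketch also runs on a machine without PyTorch.

```python
import gc

try:
    import torch  # optional: only needed to release cached CUDA blocks
except ImportError:
    torch = None


def free_memory():
    """Hypothetical helper: collect twice, then release cached CUDA memory.

    A second gc.collect() may reclaim garbage produced while the first pass
    ran finalizers or weakref callbacks.
    """
    collected = gc.collect()
    collected += gc.collect()
    if torch is not None and torch.cuda.is_available():
        # Return cached but unused blocks from PyTorch's allocator to the driver.
        torch.cuda.empty_cache()
    return collected
```

This only frees a model once the caller has dropped every reference to it (e.g. `del self.i23d_worker` or setting it to `None`); gc.collect() cannot reclaim objects that are still reachable.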
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Instead of .to('cpu') / .to('cuda'), models are now fully del'd from
GPU (no CPU intermediate) and reloaded on demand:
- _unload_i23d_worker(): del + gc.collect() + empty_cache()
- _ensure_i23d_worker(): lazy reload from pretrained if None
- _unload_tex_pipeline(): del + gc.collect() + empty_cache()
- _ensure_tex_pipeline(): lazy load from tex_conf if None
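The unload/ensure pairs above follow one pattern, sketched here as a hypothetical `LazySlot` holder. The zero-argument `loader` stands in for a `from_pretrained` call or a build from `tex_conf`; it is an assumption, not the project's actual structure.

```python
import gc


class LazySlot:
    """Hypothetical holder for one model: load on demand, fully delete on unload."""

    def __init__(self, loader):
        self._loader = loader  # zero-arg callable that builds the model
        self.model = None
        self.loads = 0

    def ensure(self):
        # Lazy (re)load: call the loader only when nothing is resident.
        if self.model is None:
            gc.collect()
            gc.collect()
            self.model = self._loader()
            self.loads += 1
        return self.model

    def unload(self):
        # Drop this holder's reference; there is no .to("cpu") intermediate step.
        self.model = None
        gc.collect()
        # With PyTorch, torch.cuda.empty_cache() would typically follow here.


i23d = LazySlot(lambda: ["shape-model"])
tex = LazySlot(lambda: ["tex-pipeline"])
```

Setting the attribute to `None` releases the reference the same way `del` does, while keeping the "is it loaded?" check a simple `is None` test.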
generation_all() flow in low_vram_mode:
shape gen → _unload_i23d_worker → _ensure_tex_pipeline →
texture gen → _unload_tex_pipeline
(shape model reloads on next _gen_shape call via _ensure_i23d_worker)
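The low_vram_mode flow of generation_all() can be traced with the stand-in class below. The method names mirror this message; the bodies, the string placeholders for models and the `events` log are hypothetical and exist only to make the call order checkable.

```python
class Pipeline:
    """Hypothetical stand-in for the server object; records the call order."""

    def __init__(self, low_vram_mode=True):
        self.low_vram_mode = low_vram_mode
        self.i23d_worker = None
        self.tex_pipeline = None
        self.events = []

    def _ensure_i23d_worker(self):
        if self.i23d_worker is None:
            self.i23d_worker = "shape-model"  # placeholder for a pretrained load
            self.events.append("load i23d")

    def _unload_i23d_worker(self):
        self.i23d_worker = None
        self.events.append("unload i23d")

    def _ensure_tex_pipeline(self):
        if self.tex_pipeline is None:
            self.tex_pipeline = "tex-pipeline"  # placeholder built from tex_conf
            self.events.append("load tex")

    def _unload_tex_pipeline(self):
        self.tex_pipeline = None
        self.events.append("unload tex")

    def _gen_shape(self, image):
        self._ensure_i23d_worker()  # reloads the shape model if it was unloaded
        self.events.append("shape gen")
        return f"mesh({image})"

    def generation_all(self, image):
        mesh = self._gen_shape(image)
        if self.low_vram_mode:
            self._unload_i23d_worker()  # free the shape model before texturing
        self._ensure_tex_pipeline()
        self.events.append("texture gen")
        textured = f"textured({mesh})"
        if self.low_vram_mode:
            self._unload_tex_pipeline()
        return textured
```

Calling `generation_all()` twice shows that each request pays one load per model, but the two models are never resident at the same time.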
Startup: tex_pipeline NOT loaded in low_vram_mode (only tex_conf stored),
reducing startup VRAM from ~13.5GB to ~7.25GB.
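The startup decision can be sketched as follows. `Server`, `_load_shape`, `_load_tex` and the `lazy_shape` switch are hypothetical: `lazy_shape=False` reproduces the behaviour described here (only the texture pipeline is deferred), while `lazy_shape=True` models the later "skip i23d startup load in low_vram_mode" change.

```python
class Server:
    """Hypothetical startup sketch: decide which models to build eagerly."""

    def __init__(self, tex_conf, low_vram_mode=False, lazy_shape=True):
        self.low_vram_mode = low_vram_mode
        self.tex_conf = tex_conf  # always stored; it is cheap to keep
        self.i23d_worker = None
        self.tex_pipeline = None
        if not (low_vram_mode and lazy_shape):
            self.i23d_worker = self._load_shape()
        if not low_vram_mode:
            self.tex_pipeline = self._load_tex(tex_conf)

    def _load_shape(self):
        return "shape-model"  # placeholder for a from_pretrained call

    def _load_tex(self, conf):
        return ("tex-pipeline", dict(conf))  # placeholder built from tex_conf
```

Storing only `tex_conf` keeps everything needed to build the pipeline later without allocating any of its VRAM at startup.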
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- generation_all(): offload i23d_worker to CPU before texture gen and
  restore it afterwards; this mirrors the sequential strategy in batch_generate.py.
  Prevents OOM when both models would otherwise occupy VRAM at the same time on the RTX 3080.
- Change texture config: max_num_view 8→9, resolution 768→512.
  A resolution of 768 OOMs (~14.6GB of activations); 512 is the practical maximum
  for an RTX 3080 20GB. max_num_view 9 gives better texture coverage.
- Only active when --low_vram_mode flag is passed.
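The earlier CPU-offload strategy (later replaced by full deletion) amounts to the context manager below. `FakeModule` stands in for a `torch.nn.Module` so the sketch runs without PyTorch, `offloaded` is a hypothetical name, and the `TEX_CONFIG` dict shape is an assumption; only its two values come from this message.

```python
from contextlib import contextmanager


class FakeModule:
    """Stand-in for a torch.nn.Module; only tracks which device it is on."""

    def __init__(self):
        self.device = "cuda"

    def to(self, device):
        self.device = device
        return self


@contextmanager
def offloaded(module, enabled):
    """Hypothetical: park `module` on CPU for the body, then move it back.

    A no-op unless `enabled` (e.g. when --low_vram_mode is passed).
    """
    if not enabled:
        yield module
        return
    module.to("cpu")
    try:
        yield module
    finally:
        module.to("cuda")  # restore even if texture generation raises


# Texture settings from this change (key names mirror the message).
TEX_CONFIG = {"max_num_view": 9, "resolution": 512}
```

Parking the shape model on CPU keeps its full weights in system RAM for the duration of texture generation, which is the ~7GB resident cost that later led to the switch to full unload and reload.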
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>