Commit Graph

10 Commits

Author SHA1 Message Date
Akasei
3cd767a18d fix(gradio): prevent OOM on 16GB RAM by fully deleting models between uses
Previous hybrid strategy (i23d in CPU RAM, tex del'd) still caused OOM:
- i23d in CPU RAM: ~7GB
- tex loading from disk: ~7GB peak in RAM before GPU transfer
- Total: ~14GB > 16GB system RAM → OOM Killer

New strategy: fully delete both models between uses.
Neither model persists in CPU RAM between requests.
Peak RAM during any load: ~7GB (one model staging to GPU).

Changes:
- Replace _offload_i23d_to_cpu/_restore_i23d_to_gpu with
  _unload_i23d_worker/_ensure_i23d_worker (full del + reload)
- Add double gc.collect() + empty_cache before each load
- Skip i23d startup load in low_vram_mode (load on first request)
- Both models reload from local HF cache (~20-30s each)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-16 22:39:03 +08:00
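The full-delete strategy above can be sketched as follows. This is a minimal, torch-free illustration: `LowVRAMManager`, the factory callables, and the helper bodies are assumptions for demonstration, not the repo's actual code, and the `torch.cuda.empty_cache()` calls are shown as comments so the sketch runs anywhere.

```python
import gc


class LowVRAMManager:
    """Sketch of the full-delete strategy: neither model persists in CPU RAM
    between requests, so peak RAM during any load is one model (~7GB).

    `load_i23d`/`load_tex` stand in for from_pretrained() reloading from the
    local HF cache (~20-30s each in the real app); names are illustrative.
    """

    def __init__(self, load_i23d, load_tex):
        self._load_i23d = load_i23d
        self._load_tex = load_tex
        self.i23d_worker = None   # NOT loaded at startup in low_vram_mode
        self.tex_pipeline = None

    def _free(self):
        # Double gc.collect() encourages CPython to break reference cycles
        # before the next ~7GB checkpoint is staged into RAM.
        gc.collect()
        gc.collect()
        # torch.cuda.empty_cache() would go here in the real pipeline.

    def _unload_i23d_worker(self):
        if self.i23d_worker is not None:
            del self.i23d_worker
            self.i23d_worker = None
            self._free()

    def _ensure_i23d_worker(self):
        if self.i23d_worker is None:
            self._free()   # free BEFORE loading: only one model stages at a time
            self.i23d_worker = self._load_i23d()
        return self.i23d_worker
```

The key invariant is that `_free()` runs before every load, so the old model's tensors are gone before the new checkpoint enters RAM.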
Akasei
76c36e53eb fix(gradio): fix OOM killer on second request in low_vram_mode
Root cause: _ensure_i23d_worker() reloaded from disk via from_pretrained(),
which loads the ~7GB checkpoint into CPU RAM. If Python GC hadn't freed
previous del'd tensors yet, both old+new copies in RAM → OOM Killer.

Fix: hybrid strategy per model type:
  i23d (shape, ~7.25GB VRAM):
    .to('cpu') ↔ .to('cuda') — stays in RAM, no disk IO, fast switch
  tex_pipeline (texture, ~6.59GB VRAM):
    del + gc + empty_cache ↔ reload from HF cache — full VRAM release

Renamed helpers:
  _unload_i23d_worker()  → _offload_i23d_to_cpu()
  _ensure_i23d_worker()  → _restore_i23d_to_gpu()
  (tex helpers unchanged)

VRAM timeline per request in low_vram_mode:
  shape gen: i23d on GPU (7.25GB), tex unloaded
  → _offload_i23d_to_cpu(): i23d→RAM (0GB VRAM)
  → _ensure_tex_pipeline(): tex loads (6.59GB)
  texture gen: tex on GPU (6.59GB), i23d in RAM
  → _unload_tex_pipeline(): tex del'd (0GB VRAM)
  next request: _restore_i23d_to_gpu(): RAM→GPU (7.25GB)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-16 22:05:08 +08:00
Akasei
9bee8e1844 refactor(gradio): replace CPU offload with direct GPU unload/lazy-load
Instead of .to('cpu') / .to('cuda'), models are now fully del'd from
GPU (no CPU intermediate) and reloaded on demand:

- _unload_i23d_worker(): del + gc.collect() + empty_cache()
- _ensure_i23d_worker(): lazy reload from pretrained if None
- _unload_tex_pipeline(): del + gc.collect() + empty_cache()
- _ensure_tex_pipeline(): lazy load from tex_conf if None

generation_all() flow in low_vram_mode:
  shape gen → _unload_i23d_worker → _ensure_tex_pipeline →
  texture gen → _unload_tex_pipeline
  (shape model reloads on next _gen_shape call via _ensure_i23d_worker)

Startup: tex_pipeline NOT loaded in low_vram_mode (only tex_conf stored),
reducing startup VRAM from ~13.5GB to ~7.25GB.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-16 21:15:56 +08:00
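The `generation_all()` flow above can be sketched as a sequence over the four helpers. The helper names come from the commit message, but passing them in as callables (and their return values) is an assumption made so the sketch is self-contained and runnable.

```python
def generation_all(ensure_i23d, unload_i23d, ensure_tex, unload_tex,
                   image, low_vram_mode=True):
    """Sketch of the request flow in low_vram_mode: at most one of the two
    models occupies VRAM at any point in the sequence."""
    i23d = ensure_i23d()       # lazy reload: shape model was del'd last request
    mesh = i23d(image)         # shape gen (~7.25GB VRAM in the real pipeline)
    if low_vram_mode:
        unload_i23d()          # del + gc + empty_cache before texture load
    tex = ensure_tex()         # texture pipeline loads (~6.59GB VRAM)
    textured = tex(mesh)       # texture gen
    if low_vram_mode:
        unload_tex()           # leave VRAM empty; next request reloads shape
    return textured
```

Deferring the texture load to `ensure_tex()` is also what drops startup VRAM from ~13.5GB to ~7.25GB: at startup only the shape model (or, later, neither model) is resident.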
Akasei
5d0405dc68 feat(gradio): apply VRAM optimization and fix texture config
- generation_all(): offload i23d_worker to CPU before texture gen,
  restore after — mirrors batch_generate.py sequential strategy.
  Prevents OOM when both models peak simultaneously on RTX 3080.
- Change texture config: max_num_view 8→9, resolution 768→512.
  768 resolution OOMs (14.6GB activation); 512 is practical max for
  RTX 3080 20GB. max_views 9 gives better texture coverage.
- Only active when --low_vram_mode flag is passed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-16 21:05:14 +08:00
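The texture-config change above amounts to two values; a minimal sketch, assuming a flat dict-style config (the key names mirror the commit message, not necessarily the repo's exact config schema):

```python
# Hedged sketch of the texture config tuned for a 20GB RTX 3080 class card.
tex_config = {
    'max_num_view': 9,   # was 8: one extra view improves texture coverage
    'resolution': 512,   # was 768: 768 peaked at ~14.6GB of activations and OOM'd
}
```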
WncFht
00fa3ac012 feat: add enable_flashvdm to gradio_app.py 2025-07-13 11:44:49 +08:00
HuiwenShi
8f7b4be92e Update gradio_app.py 2025-06-16 22:13:47 +08:00
HuiwenShi
3f102487ba Update gradio_app.py 2025-06-16 22:12:54 +08:00
Zeqiang Lai
d2465f0427 Update gradio_app.py 2025-06-14 15:36:20 +08:00
Huiwenshi
dd93e7ce4e fix some 2025-06-14 14:32:20 +08:00
Huiwenshi
c88bee648e init 2025-06-13 23:53:14 +08:00