3 Commits

Author SHA1 Message Date
Akasei
5acd0a765b test: WebUI API end-to-end verification (chair.jpg, 227s, no OOM)
Generated via the gradio_client /generation_all endpoint (see the sketch after this list):
- Shape generation: 104s
- Face reduction: 2s
- RAM check: 9.4GB < 10.5GB threshold → full delete path
- Tex pipeline load: ~15s (from HF cache)
- Texture generation: 98s
- Post-request VRAM: 361 MiB (tex pipeline unloaded)
- Zero OOM kills
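
A minimal sketch of how such a run can be driven. The URL and the
endpoint's exact inputs are assumptions; gradio_client's view_api()
prints the real schema for /generation_all:

    from gradio_client import Client, handle_file

    client = Client("http://127.0.0.1:8080")   # hypothetical WebUI URL
    result = client.predict(
        handle_file("chair.jpg"),              # input image
        api_name="/generation_all",            # shape + texture in one call
    )
    print(result)                              # path(s) to the generated assets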

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-17 00:15:53 +08:00
Akasei
f651475ec5 test: batch generation 9/9 success with mmap+malloc_trim fixes
All 9 images processed successfully (two-phase driver sketched after this list):
- Phase 1: 9/9 shapes generated
- Phase 2: 9/9 textured GLBs generated
- Zero OOM kills, zero failures
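
A minimal sketch of the two-phase shape of such a batch. The pipeline
loaders and the unload callable are hypothetical and injected so the
sketch stays self-contained:

    from typing import Callable, Sequence

    def run_batch(images: Sequence[str],
                  load_shape: Callable, load_texture: Callable,
                  unload: Callable) -> list:
        # Phase 1: keep one shape pipeline resident for all inputs.
        shape_pipe = load_shape()
        meshes = [shape_pipe(img) for img in images]
        unload(shape_pipe)          # free RAM before the texture phase

        # Phase 2: texture every mesh, then release the pipeline again.
        tex_pipe = load_texture()
        glbs = [tex_pipe(m) for m in meshes]
        unload(tex_pipe)
        return glbs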

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-16 23:42:20 +08:00
Akasei
f192c86c60 fix(oom): use mmap=True for checkpoint loading + malloc_trim + expandable_segments
Root cause: torch.load() reads the 6.9GB .ckpt into the Python heap while
the model parameters (~7GB) also sit in CPU RAM, a ~14GB peak that, with
the OS and other processes, exhausts 16GB of system RAM → OOM Killer.

Fix 1 - mmap=True on all torch.load() calls (torch 2.7 supports this):
  With mmap, checkpoint storage is file-backed (not heap). Only the model
  parameters (also ~7GB) exist in physical RAM during loading. Peak RAM
  drops from ~14GB to ~7GB — within safe limits on 16GB machines.
  Files changed: pipelines.py, hunyuan3ddit.py, model.py (×2), flow_matching_sit.py
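
  A minimal sketch of the changed call pattern (the path and the
  loader function are placeholders; the real call sites are the files
  listed above):

      import torch
      import torch.nn as nn

      def load_checkpoint(model: nn.Module, path: str) -> None:
          # mmap=True keeps the checkpoint's tensor storage file-backed:
          # pages are faulted in on demand and the OS may evict them, so
          # the 6.9GB .ckpt never becomes a second heap-resident copy.
          state = torch.load(path, map_location="cpu", mmap=True)
          # load_state_dict copies into the model's own parameters
          # (~7GB physical); the mmapped pages can then be dropped.
          model.load_state_dict(state)
          del state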

Fix 2 - malloc_trim(0) after every gc.collect():
  Forces glibc to return freed heap pages to OS immediately, so Python's
  memory pool doesn't hoard freed model memory before the next load.
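
  A minimal sketch of the pattern (Linux/glibc only):

      import ctypes
      import gc

      _libc = ctypes.CDLL("libc.so.6")

      def collect_and_trim() -> None:
          gc.collect()
          # malloc_trim(0) tells glibc to return all releasable free
          # heap pages to the kernel instead of caching them for reuse,
          # so RSS actually drops before the next multi-GB load.
          _libc.malloc_trim(0)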

Fix 3 - PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True:
  Prevents CUDA allocator fragmentation between model switches.
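
  The flag must be set before the process makes its first CUDA
  allocation, e.g.:

      import os
      # Set before torch initializes the CUDA context.
      os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
      import torch  # noqa: E402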

Fix 4 - Adaptive threshold recalculated:
  With mmap, loading a model peaks at ~7.5GB (model params plus overhead)
  rather than ~14GB. The CPU-offload threshold is therefore lowered from
  16GB to 10.5GB, enabling the fast path on machines with that much
  headroom.
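
  A minimal sketch of such a check. The function name and the psutil
  dependency are assumptions; it matches one plausible reading of the
  RAM check logged in the test commit above (9.4GB < 10.5GB → full
  delete path):

      import psutil

      THRESHOLD_GB = 10.5   # ~7.5GB mmap load peak + working headroom

      def can_take_fast_path() -> bool:
          # Compare currently available system RAM to the threshold;
          # below it, fall back to fully deleting the idle pipeline.
          avail_gb = psutil.virtual_memory().available / 2**30
          return avail_gb >= THRESHOLD_GB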

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-16 23:18:16 +08:00