Root cause: torch.load() reads the 6.9GB .ckpt into the Python heap; with
the model parameters also materialized in CPU RAM, peak usage hits ~14GB,
exceeding 16GB system RAM → OOM Killer.
Fix 1 - mmap=True on all torch.load() calls (torch 2.7 supports this):
With mmap, checkpoint storage is file-backed (not heap). Only the model
parameters (also ~7GB) exist in physical RAM during loading. Peak RAM
drops from ~14GB to ~7GB — within safe limits on 16GB machines.
Files changed: pipelines.py, hunyuan3ddit.py, model.py (×2), flow_matching_sit.py
Fix 2 - malloc_trim(0) after every gc.collect():
Forces glibc to return freed heap pages to the OS immediately, so the
process's allocator doesn't hold on to freed model memory ahead of the
next load.
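A sketch of the collect-then-trim helper, assuming a glibc-based Linux
system (the `collect_and_trim` name is ours, not the codebase's):

```python
import ctypes
import ctypes.util
import gc
import platform


def collect_and_trim():
    # Run the Python garbage collector first so freed objects actually
    # land back on the glibc heap, then ask glibc to release trimmed
    # arena pages to the OS. malloc_trim is glibc-specific, so guard it.
    gc.collect()
    if platform.system() == "Linux":
        libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6")
        libc.malloc_trim(0)  # 0 = trim everything possible from the heap top
```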
Fix 3 - PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True:
Prevents CUDA allocator fragmentation between model switches.
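The allocator reads this variable once at startup, so it must be in the
environment before the CUDA context is created — safest is before torch is
even imported. A minimal sketch:

```python
import os

# setdefault respects an explicit user override; must run before the
# first CUDA allocation (i.e. before importing torch / calling torch.cuda).
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

# import torch  # only import torch after the env var is in place
```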
Fix 4 - Adaptive threshold recalculated:
With mmap, a model load peaks at ~7.5GB (model params) rather than ~14GB.
The CPU offload threshold is lowered from 16GB to 10.5GB, so more machines
now have enough headroom to take the fast path.
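The recalculated threshold amounts to a one-line predicate; names and the
headroom margin below are illustrative, not the actual model.py logic:

```python
PEAK_LOAD_GB = 7.5            # mmap load: only model params occupy RAM
OFFLOAD_THRESHOLD_GB = 10.5   # peak load plus ~3GB of headroom


def use_cpu_offload(total_ram_gb: float) -> bool:
    # Hypothetical helper: machines below the threshold fall back to CPU
    # offload; machines at or above it take the fast path and keep the
    # model parameters resident.
    return total_ram_gb < OFFLOAD_THRESHOLD_GB


print(use_cpu_offload(16.0))  # False -> fast path on a 16GB machine
print(use_cpu_offload(8.0))   # True  -> offload on an 8GB machine
```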
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>