Root cause: torch.load() reads the 6.9GB .ckpt into the Python heap while
the model params are also held in CPU RAM, for a ~14GB peak. That leaves
too little headroom on a 16GB machine and triggers the OOM Killer.
Fix 1 - mmap=True on all torch.load() calls (torch 2.7 supports this):
With mmap, checkpoint storage is file-backed rather than heap-allocated,
so only the model parameters (~7GB) occupy physical RAM during loading.
Peak RAM drops from ~14GB to ~7GB, within safe limits on 16GB machines.
Files changed: pipelines.py, hunyuan3ddit.py, model.py (×2), flow_matching_sit.py
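A minimal sketch of the Fix 1 loading pattern, not the code from the files
listed above. The helper name load_checkpoint_mmap and the weights_only
default are assumptions; mmap=True expects a checkpoint saved in torch's
zipfile format and a filesystem path rather than an open file object.

```python
from os import PathLike
from typing import Any, Union


def load_checkpoint_mmap(path: Union[str, PathLike], weights_only: bool = True) -> Any:
    """Load a checkpoint with file-backed (memory-mapped) tensor storage.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    import torch  # imported lazily so this module can be imported without torch

    return torch.load(
        path,
        map_location="cpu",         # keep tensors on the CPU side while loading
        mmap=True,                  # map tensor storage from the file instead of copying it to the heap
        weights_only=weights_only,  # set False only for trusted checkpoints that pickle extra objects
    )
```

The returned object (or its state-dict entry, depending on how the .ckpt
was written) is then passed to load_state_dict() as before.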
Fix 2 - malloc_trim(0) after every gc.collect():
Forces glibc to return freed heap pages to the OS immediately, so memory
freed from one model is not retained by the allocator before the next load.
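One way to pair gc.collect() with malloc_trim(0) is through ctypes, sketched
below. The helper name collect_and_trim is an assumption, and the fallback
for non-glibc platforms (where malloc_trim does not exist) is a choice made
for this sketch rather than something the commit describes.

```python
import ctypes
import ctypes.util
import gc


def collect_and_trim() -> bool:
    """Run a GC pass, then ask glibc to return freed heap pages to the OS.

    Returns True if malloc_trim was called, False if it is unavailable
    (for example on musl, macOS, or Windows).
    """
    gc.collect()

    libc_name = ctypes.util.find_library("c")
    if libc_name is None:
        return False

    try:
        libc = ctypes.CDLL(libc_name)
        trim = libc.malloc_trim  # raises AttributeError when the symbol is missing
    except (OSError, AttributeError):
        return False

    trim.argtypes = [ctypes.c_size_t]
    trim.restype = ctypes.c_int
    trim(0)  # pad = 0: release as much free memory at the top of the heap as possible
    return True
```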
Fix 3 - PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True:
Prevents CUDA allocator fragmentation between model switches.
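The allocator reads this variable when it initializes, so it has to be set
before the first CUDA allocation. A sketch that sets it from Python, assuming
it runs at the very top of the entry point before torch is imported (setting
it in the shell or a launcher script works equally well):

```python
import os

# setdefault keeps any value the user already exported in their shell.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

# import torch  # only after the variable is in place
```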
Fix 4 - Adaptive threshold recalculated:
With mmap loading, loading a model requires ~7.5GB (model params), not
14GB. CPU offload threshold lowered from 16GB to 10.5GB, enabling the
fast path on machines with less free RAM.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two root causes of CUDA OOM fixed:
1. onnxruntime-gpu CUDAExecutionProvider pre-allocated ~12GB VRAM arena
for bria-rmbg background removal, starving PyTorch models.
Fix: force CPUExecutionProvider in BackgroundRemover (rembg is
lightweight, runs fine on CPU, frees all VRAM for shape/tex).
2. Previous 'always delete' strategy was wasteful on high-RAM machines.
New adaptive strategy checks available system RAM at runtime:
- RAM >= 16GB free: offload i23d to CPU (.to('cpu')) — fast, ~1s
- RAM < 16GB free: full del + reload from disk — safe, ~20-30s
This gives near-instant (~1s) model switching on 32GB+ machines while
keeping 16GB machines safe from the OOM Killer.
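For the first fix, pinning an ONNX Runtime session to the CPU comes down to
passing an explicit providers list. The sketch below uses onnxruntime
directly; the helper name make_cpu_session and the model_path argument are
assumptions, and BackgroundRemover may construct its session differently.

```python
def make_cpu_session(model_path: str):
    """Create an ONNX Runtime session that never touches the GPU.

    Hypothetical helper for illustration only.
    """
    import onnxruntime as ort  # imported lazily so this module imports without onnxruntime

    # Listing only CPUExecutionProvider stops onnxruntime-gpu from selecting
    # CUDAExecutionProvider and reserving a VRAM arena for this session.
    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
```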
Helper functions:
- _prepare_for_tex(): adaptive offload/delete based on RAM check
- _ensure_i23d_worker(): restore from CPU (fast) or disk (slow)
- _get_available_ram_gb(): reads /proc/meminfo
- _can_offload_to_cpu(): threshold check with logging
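A rough sketch of how the RAM check and threshold helpers could fit together.
It is not the code from this commit: the function names drop the leading
underscore to mark them as illustrative, reading the MemAvailable field
(rather than MemFree) is an assumption, and the threshold is a parameter so
the 16GB value above is only a default.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def parse_mem_available_gb(meminfo_text: str) -> Optional[float]:
    """Extract MemAvailable from /proc/meminfo text and convert kB (KiB) to GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])
            return kib / (1024 ** 2)
    return None


def get_available_ram_gb(path: str = "/proc/meminfo") -> Optional[float]:
    """Return available RAM in GiB, or None if it cannot be read (e.g. non-Linux)."""
    try:
        with open(path, encoding="ascii") as fh:
            return parse_mem_available_gb(fh.read())
    except OSError:
        return None


def can_offload_to_cpu(available_gb: Optional[float], threshold_gb: float = 16.0) -> bool:
    """Decide between CPU offload (fast) and delete-and-reload (safe)."""
    ok = available_gb is not None and available_gb >= threshold_gb
    logger.info(
        "available RAM: %s GB, threshold: %.1f GB -> %s",
        available_gb,
        threshold_gb,
        "offload to CPU" if ok else "delete and reload from disk",
    )
    return ok
```

Treating an unreadable /proc/meminfo as "do not offload" keeps the slow but
safe path as the fallback.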
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace default u2net with bria-rmbg-2.0 for better quality.
BackgroundRemover now accepts model_name param (defaults to 'bria-rmbg').
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>