Commit Graph

22 Commits

Author SHA1 Message Date
Akasei
f192c86c60 fix(oom): use mmap=True for checkpoint loading + malloc_trim + expandable_segments
Root cause: torch.load() reads the 6.9GB .ckpt into the Python heap while the
model params (~7GB) also sit in CPU RAM = ~14GB peak; with OS overhead this
exhausts 16GB of system RAM → OOM Killer.

Fix 1 - mmap=True on all torch.load() calls (torch 2.7 supports this):
  With mmap, checkpoint storage is file-backed (not heap). Only the model
  parameters (also ~7GB) exist in physical RAM during loading. Peak RAM
  drops from ~14GB to ~7GB — within safe limits on 16GB machines.
  Files changed: pipelines.py, hunyuan3ddit.py, model.py (×2), flow_matching_sit.py

Fix 2 - malloc_trim(0) after every gc.collect():
  Forces glibc to return freed heap pages to OS immediately, so Python's
  memory pool doesn't hoard freed model memory before the next load.
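A sketch of the collect-then-trim helper; `gc_and_trim` is a hypothetical name, and the trim is skipped on non-glibc platforms where `libc.so.6` cannot be loaded:

```python
import ctypes
import gc


def gc_and_trim() -> int:
    """Run gc.collect(), then ask glibc to return freed heap pages to the OS.

    malloc_trim(0) is glibc-specific; on other libcs (macOS, musl, Windows)
    loading libc.so.6 raises OSError and the trim is silently skipped.
    """
    collected = gc.collect()
    try:
        ctypes.CDLL("libc.so.6").malloc_trim(0)
    except OSError:
        pass  # non-glibc platform
    return collected


gc_and_trim()
```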

Fix 3 - PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True:
  Prevents CUDA allocator fragmentation between model switches.
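The setting only takes effect if it is in place before the CUDA allocator initializes, so it is typically exported in the shell or set at the top of the entrypoint before torch is imported. A sketch:

```python
import os

# Must be set before the first CUDA allocation, i.e. before importing torch
# in a fresh process. Expandable segments let the caching allocator grow
# existing segments instead of fragmenting fixed-size ones across model
# switches.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
```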

Fix 4 - Adaptive threshold recalculated:
  With mmap loading, loading a model requires ~7.5GB (model params) not
  14GB. CPU offload threshold lowered from 16GB → 10.5GB, enabling fast
  path on machines with more headroom.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-16 23:18:16 +08:00
Akasei
6534f4ba15 fix: adaptive VRAM strategy + force rembg CPU to prevent OOM
Two root causes of CUDA OOM fixed:

1. onnxruntime-gpu CUDAExecutionProvider pre-allocated ~12GB VRAM arena
   for bria-rmbg background removal, starving PyTorch models.
   Fix: force CPUExecutionProvider in BackgroundRemover (rembg is
   lightweight, runs fine on CPU, frees all VRAM for shape/tex).
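A sketch of the provider pinning; `select_providers` is a hypothetical helper, and the commented session line mirrors the standard onnxruntime API with an illustrative model path:

```python
def select_providers(force_cpu: bool) -> list:
    """Choose onnxruntime execution providers.

    CUDAExecutionProvider pre-allocates a large VRAM arena by default, so a
    lightweight model like rembg is pinned to CPU to leave VRAM for the
    PyTorch shape/texture models.
    """
    if force_cpu:
        return ["CPUExecutionProvider"]
    return ["CUDAExecutionProvider", "CPUExecutionProvider"]


# Typical use (model path illustrative):
# import onnxruntime as ort
# session = ort.InferenceSession("bria-rmbg.onnx",
#                                providers=select_providers(force_cpu=True))
assert select_providers(True) == ["CPUExecutionProvider"]
```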

2. Previous 'always delete' strategy was wasteful on high-RAM machines.
   New adaptive strategy checks available system RAM at runtime:
   - RAM >= 16GB free: offload i23d to CPU (.to('cpu')) — fast, ~1s
   - RAM <  16GB free: full del + reload from disk — safe, ~20-30s
   This gives instant model switching on 32GB+ machines while keeping
   16GB machines safe from OOM Killer.

Helper functions:
- _prepare_for_tex(): adaptive offload/delete based on RAM check
- _ensure_i23d_worker(): restore from CPU (fast) or disk (slow)
- _get_available_ram_gb(): reads /proc/meminfo
- _can_offload_to_cpu(): threshold check with logging

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-16 22:57:32 +08:00
Akasei
474001da6b feat(rembg): switch background removal to bria-rmbg model
Replace default u2net with bria-rmbg-2.0 for better quality.
BackgroundRemover now accepts model_name param (defaults to 'bria-rmbg').

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-16 22:14:21 +08:00
HuiwenShi
c9b21668e2 Create is_watertight.py 2025-09-24 11:35:53 +08:00
HuiwenShi
5b6885dcf4 Update chamfer_distance.py 2025-09-23 14:10:26 +08:00
HuiwenShi
34746fcbc2 Create chamfer_distance.py 2025-09-23 11:46:01 +08:00
s572915912
b3dd50ba37 Update misc.py
repair
2025-08-06 01:14:49 +08:00
s572915912
d9fc4d31bf Update hunyuandit-mini-overfitting-flowmatching-dinol518-bf16-lr1e4-4096.yaml
repair
2025-08-06 01:12:13 +08:00
s572915912
f4e0307665 Update train_deepspeed.sh 2025-07-11 18:32:16 +08:00
s572915912
f0a008279e Update pipelines.py 2025-07-11 16:51:33 +08:00
s572915912
dc2ea32d76 Update hunyuandit-mini-overfitting-flowmatching-dinol518-bf16-lr1e4-4096.yaml 2025-07-11 16:47:40 +08:00
s572915912
96349ad5d0 Update train_deepspeed.sh 2025-07-11 16:43:40 +08:00
s572915912
de7996251d Update hunyuandit-mini-overfitting-flowmatching-dinol518-bf16-lr1e4-4096.yaml 2025-07-11 16:37:32 +08:00
s572915912
af935af688 Update train_deepspeed.sh 2025-07-11 16:36:46 +08:00
s572915912
f2f19d74a8 Update hunyuandit-mini-overfitting-flowmatching-dinol518-bf16-lr1e4-4096.yaml
add explain
2025-07-11 15:53:01 +08:00
s572915912
8cd92830fb Update train_deepspeed.sh
auto detect
2025-07-11 15:51:55 +08:00
s572915912
b06e6ddf37 Update pipelines.py 2025-07-11 02:29:25 +08:00
Huiwenshi
d0b85dc7d9 fix some 2025-06-26 20:08:17 +08:00
Huiwenshi
e59169a8ec update readme 2025-06-26 16:34:51 +08:00
Huiwenshi
7c92655a0d fix shape training 2025-06-26 16:03:44 +08:00
Huiwenshi
4d67e18386 update 2025-06-14 01:39:07 +08:00
Huiwenshi
c88bee648e init 2025-06-13 23:53:14 +08:00