Two root causes of CUDA OOM fixed:
1. The onnxruntime-gpu CUDAExecutionProvider pre-allocated a ~12GB VRAM arena
for bria-rmbg background removal, starving the PyTorch models.
Fix: force CPUExecutionProvider in BackgroundRemover (rembg is
lightweight and runs fine on CPU, which frees all VRAM for shape/tex).
2. The previous 'always delete' strategy was wasteful on high-RAM machines.
The new adaptive strategy checks available system RAM at runtime:
- RAM >= 16GB free: offload i23d to CPU (.to('cpu')); fast, ~1s
- RAM < 16GB free: full del + reload from disk; safe, ~20-30s
This gives near-instant (~1s) model switching on 32GB+ machines while keeping
16GB machines safe from the Linux OOM killer.
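
For reviewers, the adaptive branch described in item 2 can be sketched roughly as below. This is an illustration under assumptions, not the code in this commit: the name `prepare_for_tex`, its arguments, and the returned mode strings are hypothetical, and `worker` is assumed to be any object with a PyTorch-style `.to(device)` method.

```python
import gc

OFFLOAD_THRESHOLD_GB = 16.0  # threshold quoted in this commit message


def prepare_for_tex(worker, free_ram_gb, threshold_gb=OFFLOAD_THRESHOLD_GB):
    """Hypothetical sketch: release the VRAM held by the i23d worker.

    Returns (worker_or_None, mode), where mode is "offload" or "delete".
    The caller must rebind its own reference to the returned value,
    otherwise the "delete" path does not actually free the model.
    """
    if worker is None:
        return None, "delete"

    if free_ram_gb >= threshold_gb:
        # Enough free system RAM: keep the weights, move them off the GPU.
        worker.to("cpu")
        mode = "offload"
    else:
        # Low RAM: drop the model entirely; it is reloaded from disk later.
        worker = None
        mode = "delete"

    gc.collect()
    try:
        # Assumption: PyTorch is the framework; skipped if it is not installed.
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return cached blocks to the driver
    except ImportError:
        pass
    return worker, mode
```

The mode is returned so that the restore step can tell whether a fast `.to('cuda')` or a full reload from disk is needed.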
Helper functions:
- _prepare_for_tex(): adaptive offload/delete based on RAM check
- _ensure_i23d_worker(): restore from CPU (fast) or disk (slow)
- _get_available_ram_gb(): reads /proc/meminfo
- _can_offload_to_cpu(): threshold check with logging
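
A minimal sketch of how the RAM check might look, assuming the `MemAvailable` field of /proc/meminfo is the one used (the commit message does not say which field is read). The function names mirror the helpers above without the leading underscore and are illustrative only; returning 0.0 when the file is missing is also an assumption, chosen so that non-Linux hosts take the safe delete path.

```python
import logging

logger = logging.getLogger(__name__)


def parse_mem_available_gb(meminfo_text):
    """Return MemAvailable in GiB, or None if the field is missing."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])  # the kernel reports this value in kB (KiB)
            return kib / (1024 ** 2)
    return None


def get_available_ram_gb(path="/proc/meminfo"):
    """Read available RAM in GiB; 0.0 if the file or field is unavailable."""
    try:
        with open(path) as fh:
            value = parse_mem_available_gb(fh.read())
    except OSError:
        value = None
    return 0.0 if value is None else value


def can_offload_to_cpu(threshold_gb=16.0, path="/proc/meminfo"):
    """Threshold check with logging: True if CPU offload looks safe."""
    free_gb = get_available_ram_gb(path)
    ok = free_gb >= threshold_gb
    logger.info(
        "Available RAM: %.1f GB (threshold %.1f GB) -> %s",
        free_gb, threshold_gb, "offload to CPU" if ok else "delete and reload",
    )
    return ok
```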
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>