fix: adaptive VRAM strategy + force rembg CPU to prevent OOM
Two root causes of CUDA OOM fixed:
1. onnxruntime-gpu CUDAExecutionProvider pre-allocated ~12GB VRAM arena
for bria-rmbg background removal, starving PyTorch models.
Fix: force CPUExecutionProvider in BackgroundRemover (rembg is
lightweight, runs fine on CPU, frees all VRAM for shape/tex).
2. Previous 'always delete' strategy was wasteful on high-RAM machines.
New adaptive strategy checks available system RAM at runtime:
- RAM >= 16GB free: offload i23d to CPU (.to('cpu')) — fast, ~1s
- RAM < 16GB free: full del + reload from disk — safe, ~20-30s
This gives instant model switching on 32GB+ machines while keeping
16GB machines safe from OOM Killer.
Helper functions (sketched below):
- _prepare_for_tex(): adaptive offload/delete based on RAM check
- _ensure_i23d_worker(): restore from CPU (fast) or disk (slow)
- _get_available_ram_gb(): reads /proc/meminfo
- _can_offload_to_cpu(): threshold check with logging
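A minimal sketch of how these helpers could fit together, assuming a module-level
i23d_worker PyTorch model and a hypothetical load_i23d_from_disk() loader (the
threshold constant name is also illustrative; the actual implementation may differ):

import logging
import torch

logger = logging.getLogger(__name__)

OFFLOAD_RAM_THRESHOLD_GB = 16.0  # illustrative name for the 16GB cutoff
i23d_worker = None               # set at startup by the real model loader


def _get_available_ram_gb() -> float:
    # Parse MemAvailable from /proc/meminfo (Linux-only); the value is in kB.
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) / (1024 ** 2)
    return 0.0


def _can_offload_to_cpu() -> bool:
    # Offloading is only safe when enough RAM is free to hold the model.
    available = _get_available_ram_gb()
    logger.info("Available RAM: %.1f GB (offload threshold: %.1f GB)",
                available, OFFLOAD_RAM_THRESHOLD_GB)
    return available >= OFFLOAD_RAM_THRESHOLD_GB


def _prepare_for_tex():
    # Free VRAM before texture generation: offload if RAM allows, else delete.
    global i23d_worker
    if i23d_worker is None:
        return
    if _can_offload_to_cpu():
        i23d_worker.to("cpu")  # fast path: ~1s, weights stay in system RAM
    else:
        del i23d_worker        # safe path: full reload from disk later (~20-30s)
        i23d_worker = None
    torch.cuda.empty_cache()


def _ensure_i23d_worker():
    # Restore the shape model: from CPU (fast) or by reloading from disk (slow).
    global i23d_worker
    if i23d_worker is None:
        i23d_worker = load_i23d_from_disk()  # hypothetical loader
    i23d_worker.to("cuda")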
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@@ -13,12 +13,21 @@
 # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
 
 from PIL import Image
+import onnxruntime as ort
 from rembg import remove, new_session
 
 
 class BackgroundRemover():
     def __init__(self, model_name: str = "bria-rmbg"):
-        self.session = new_session(model_name)
+        # Force CPU-only execution for onnxruntime to prevent CUDA arena
+        # from consuming ~12GB+ VRAM that PyTorch models need.
+        # Background removal is lightweight and runs fast on CPU.
+        _orig = ort.get_device
+        ort.get_device = lambda: "CPU"
+        try:
+            self.session = new_session(model_name)
+        finally:
+            ort.get_device = _orig
 
     def __call__(self, image: Image.Image):
         output = remove(image, session=self.session, bgcolor=[255, 255, 255, 0])
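A note on the monkey-patch: depending on the installed rembg version, new_session
also accepts a providers argument that is forwarded to onnxruntime, which would pin
the session to the CPU without patching ort.get_device (version-dependent, so treat
this as an assumption):

        # Equivalent intent, if this rembg release forwards providers:
        self.session = new_session(model_name, providers=["CPUExecutionProvider"])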