Two root causes of CUDA OOM fixed:
1. onnxruntime-gpu CUDAExecutionProvider pre-allocated ~12GB VRAM arena
for bria-rmbg background removal, starving PyTorch models.
Fix: force CPUExecutionProvider in BackgroundRemover (rembg is
lightweight, runs fine on CPU, frees all VRAM for shape/tex).
2. Previous 'always delete' strategy was wasteful on high-RAM machines.
New adaptive strategy checks available system RAM at runtime:
- RAM >= 16GB free: offload i23d to CPU (.to('cpu')) — fast, ~1s
- RAM < 16GB free: full del + reload from disk — safe, ~20-30s
This gives instant model switching on 32GB+ machines while keeping
16GB machines safe from OOM Killer.
Helper functions:
- _prepare_for_tex(): adaptive offload/delete based on RAM check
- _ensure_i23d_worker(): restore from CPU (fast) or disk (slow)
- _get_available_ram_gb(): reads /proc/meminfo
- _can_offload_to_cpu(): threshold check with logging
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
35 lines
1.6 KiB
Python
35 lines
1.6 KiB
Python
# Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
|
|
# except for the third-party components listed below.
|
|
# Hunyuan 3D does not impose any additional limitations beyond what is outlined
|
|
# in the repsective licenses of these third-party components.
|
|
# Users must comply with all terms and conditions of original licenses of these third-party
|
|
# components and must ensure that the usage of the third party components adheres to
|
|
# all relevant laws and regulations.
|
|
|
|
# For avoidance of doubts, Hunyuan 3D means the large language models and
|
|
# their software and algorithms, including trained model weights, parameters (including
|
|
# optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
|
|
# fine-tuning enabling code and other elements of the foregoing made publicly available
|
|
# by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
|
|
|
|
from PIL import Image
|
|
import onnxruntime as ort
|
|
from rembg import remove, new_session
|
|
|
|
|
|
class BackgroundRemover():
|
|
def __init__(self, model_name: str = "bria-rmbg"):
|
|
# Force CPU-only execution for onnxruntime to prevent CUDA arena
|
|
# from consuming ~12GB+ VRAM that PyTorch models need.
|
|
# Background removal is lightweight and runs fast on CPU.
|
|
_orig = ort.get_device
|
|
ort.get_device = lambda: "CPU"
|
|
try:
|
|
self.session = new_session(model_name)
|
|
finally:
|
|
ort.get_device = _orig
|
|
|
|
def __call__(self, image: Image.Image):
|
|
output = remove(image, session=self.session, bgcolor=[255, 255, 255, 0])
|
|
return output
|