Previous hybrid strategy (i23d in CPU RAM, tex deleted) still caused OOM:

- i23d in CPU RAM: ~7 GB
- tex loading from disk: ~7 GB peak in RAM before GPU transfer
- Total: ~14 GB > 16 GB system RAM → OOM killer

New strategy: fully delete both models between uses. Neither model
persists in CPU RAM between requests, so peak RAM during any load is
~7 GB (one model staging to the GPU).

Changes:

- Replace _offload_i23d_to_cpu/_restore_i23d_to_gpu with
  _unload_i23d_worker/_ensure_i23d_worker (full delete + reload)
- Add double gc.collect() + empty_cache before each load
- Skip the i23d startup load in low_vram_mode (load on first request)
- Both models reload from the local HF cache (~20-30 s each)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>