RobustSplat++: Decoupling Densification, Dynamics, and Illumination for In-the-Wild 3DGS

1Sun Yat-sen University, 2FNii-Shenzhen, 3SSE, CUHKSZ
4Guangdong Key Laboratory of Information Security Technology
5Osaka University

Animation Results


Abstract

3D Gaussian Splatting (3DGS) has gained significant attention for its real-time, photo-realistic rendering in novel-view synthesis and 3D modeling. However, existing methods struggle to accurately model in-the-wild scenes affected by transient objects and illumination variations, leading to artifacts in the rendered images. We identify that the Gaussian densification process, while enhancing scene detail capture, unintentionally contributes to these artifacts by growing additional Gaussians that model transient disturbances and illumination variations. To address this, we propose RobustSplat++, a robust solution built on several critical designs. First, we introduce a delayed Gaussian growth strategy that prioritizes optimizing the static scene structure before allowing Gaussian splitting/cloning, mitigating overfitting to transient objects in early optimization. Second, we design a scale-cascaded mask bootstrapping approach that first leverages lower-resolution feature similarity supervision for reliable initial transient mask estimation, exploiting its stronger semantic consistency and robustness to noise, and then progresses to high-resolution supervision for more precise mask prediction. Third, we couple the delayed Gaussian growth strategy and mask bootstrapping with appearance modeling to handle in-the-wild scenes containing both transient distractors and illumination variations. Extensive experiments on multiple challenging datasets show that our method outperforms existing approaches, clearly demonstrating its robustness and effectiveness.
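
As a minimal illustration of the scale-cascaded mask bootstrapping idea, the sketch below estimates a per-pixel static/transient mask from feature similarity between the rendered image and the captured photo, using coarse (downscaled) supervision early in training and full-resolution supervision later. The feature backbone, similarity threshold, switch iteration, and downscale factor are illustrative placeholders rather than the exact values used in our experiments.

```python
# A minimal sketch of scale-cascaded mask bootstrapping (placeholder values).
import torch
import torch.nn.functional as F

def transient_mask_from_features(feat_render, feat_gt, downscale=1, threshold=0.5):
    """Estimate a per-pixel static/transient mask from feature similarity.

    feat_render, feat_gt: (C, H, W) feature maps of the rendered image and the
    captured photo (e.g., from a frozen pre-trained backbone).
    Returns a (1, H', W') soft mask, ~1 for static pixels and ~0 where a
    transient distractor makes the rendering and the photo disagree.
    """
    if downscale > 1:
        # Lower-resolution supervision: pool features for stronger semantic
        # consistency and robustness to pixel-level noise.
        feat_render = F.avg_pool2d(feat_render.unsqueeze(0), downscale).squeeze(0)
        feat_gt = F.avg_pool2d(feat_gt.unsqueeze(0), downscale).squeeze(0)
    sim = F.cosine_similarity(feat_render, feat_gt, dim=0, eps=1e-6)  # (H', W')
    return torch.sigmoid((sim - threshold) * 10.0).unsqueeze(0)       # soft mask

def cascaded_mask(feat_render, feat_gt, iteration, switch_iter=15_000):
    """Bootstrap: coarse supervision early for a reliable initial mask,
    then full-resolution supervision for a more precise mask."""
    downscale = 8 if iteration < switch_iter else 1
    return transient_mask_from_features(feat_render, feat_gt, downscale)
```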

Video

Key Motivation

We identify that the Gaussian densification process, while enhancing scene detail capture, unintentionally contributes to rendering artifacts by growing additional Gaussians that model transient disturbances.

Analysis of Gaussian densification in transient object fitting

As training progresses, vanilla 3DGS suffers from performance degradation and exhibits artifacts due to the increasing number of Gaussians. Disabling Gaussian densification notably improves the results, even achieving performance comparable to the recent robust method SpotLessSplats. Despite producing transient-free rendering, 3DGS w/o densification struggles to recover fine details in regions with sparse Gaussian initialization (highlighted by red arrows).
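
As a minimal illustration of how delayed Gaussian growth plugs into a standard 3DGS-style training loop, the sketch below simply gates densification behind a warm-up phase. The `densify_and_prune` hook and all iteration thresholds are placeholders rather than the exact schedule used in our experiments.

```python
# A minimal sketch of delayed Gaussian growth (placeholder API and schedule).
GROWTH_START_ITER = 10_000   # assumed warm-up before splitting/cloning is allowed
DENSIFY_INTERVAL  = 100      # assumed densification frequency afterwards
DENSIFY_END_ITER  = 25_000   # assumed iteration at which densification stops

def maybe_densify(gaussians, iteration, grad_threshold=2e-4, min_opacity=5e-3):
    """Only grow Gaussians after the static structure has stabilized.

    Early iterations optimize the existing Gaussians against the (masked)
    photometric loss; splitting/cloning is deferred so that transient pixels
    cannot recruit new Gaussians to explain them.
    """
    if iteration < GROWTH_START_ITER or iteration > DENSIFY_END_ITER:
        return  # delayed growth: no splitting/cloning during the warm-up phase
    if iteration % DENSIFY_INTERVAL == 0:
        gaussians.densify_and_prune(grad_threshold, min_opacity)
```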

Methodology

Pipeline

Overview of the proposed method. The main reconstruction pipeline employs 3DGS with Delayed Gaussian Growth, generating rendered images that are optimized with a masked reconstruction loss. The method handles two types of in-the-wild inputs: (1) images with transient distractors, where a Mask Prediction Branch predicts per-pixel masks to guide transient suppression, supervised by Scale-cascaded Mask Bootstrapping; and (2) images with both transients and illumination variations, where an Appearance Modeling Branch predicts affine coefficients from the 2D embedding, the 3D embedding, and the original Gaussian colors to modulate the Gaussian colors.
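
The sketch below illustrates the two supervision ingredients at the end of this pipeline: an affine modulation of the Gaussian colors by the appearance branch, and a photometric loss weighted by the predicted static mask. The L1 form of the loss, the tensor shapes, and the names `affine_a`/`affine_b` are illustrative assumptions rather than the exact formulation used in our experiments.

```python
# A minimal sketch of affine color modulation and the masked reconstruction loss.
import torch

def affine_modulate(gaussian_colors, affine_a, affine_b):
    """Apply affine coefficients (predicted by the appearance branch) to the
    original Gaussian colors: c' = a * c + b, broadcast channel-wise."""
    return affine_a * gaussian_colors + affine_b

def masked_reconstruction_loss(rendered, target, static_mask):
    """L1 photometric loss weighted by the predicted static mask.

    rendered, target: (3, H, W) images; static_mask: (1, H, W), ~1 for static
    pixels and ~0 for pixels flagged as transient distractors.
    Transient pixels are down-weighted so they do not drive Gaussian updates.
    """
    per_pixel = (rendered - target).abs()        # (3, H, W)
    weighted = static_mask * per_pixel           # mask broadcast over channels
    return weighted.sum() / (3.0 * static_mask.sum().clamp(min=1.0))
```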