We introduce a novel continual learning problem: how to sequentially update the weights of a personalized 2D and 3D generative face model as new batches of photos spanning different appearances, styles, poses, and lighting conditions are captured regularly.
We observe that naive sequential fine-tuning of the model leads to catastrophic forgetting of past representations of the individual's face. We then demonstrate that a simple random sampling-based experience replay method effectively mitigates this forgetting when a relatively large number of images can be stored and replayed. However, for long-term deployment with smaller storage budgets, this random sampling-based replay technique also forgets past representations.
Thus, we introduce a novel experience replay algorithm that combines random sampling with an optimization in StyleGAN's latent space so that the buffer forms an optimal convex hull. We observe that our proposed convex hull-based experience replay prevents forgetting more effectively than the random sampling baseline and the lower bound.
We show that this simple experience replay method using random sampling, which we dub ER-Rand, works well with a large buffer size.
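As a concrete illustration, the following is a minimal sketch of such a random-sampling replay buffer. The class name `RandomReplayBuffer` and its methods are hypothetical; the paper's exact buffer-update rule may differ.

```python
import random


class RandomReplayBuffer:
    """Minimal ER-Rand-style buffer sketch: keep a fixed-size,
    uniformly random subset of all images seen so far."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.images = []
        self.rng = random.Random(seed)

    def update(self, new_images):
        # Pool the stored images with the newly captured batch, then
        # retain a uniform random subset of at most `capacity` images.
        pool = self.images + list(new_images)
        if len(pool) > self.capacity:
            pool = self.rng.sample(pool, self.capacity)
        self.images = pool

    def replay(self, k):
        # Images replayed alongside the current batch during fine-tuning.
        return self.rng.sample(self.images, min(k, len(self.images)))
```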
We also introduce a new experience replay sampling algorithm, ER-Hull, that chooses how to populate the buffer based on the volume of the convex hull of the samples' corresponding vectors in the StyleGAN latent space.
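A minimal sketch of this selection step is shown below, not the paper's implementation. It assumes each candidate image has already been inverted to a latent code in StyleGAN's W space, scores a candidate subset by the volume of the simplex its codes span (used here as a tractable proxy for convex hull volume in high dimensions), and searches subsets exhaustively, which is feasible for the small buffer sizes used here (e.g., 3). The names `simplex_volume` and `er_hull_select` are illustrative.

```python
import itertools
import math

import numpy as np


def simplex_volume(latents):
    """Volume of the (k-1)-simplex spanned by k latent codes (rows of `latents`).

    Computed as sqrt(det(G)) / (k-1)!, where G is the Gram matrix of edge
    vectors from the first point; this avoids building an explicit convex
    hull in the high-dimensional W space.
    """
    edges = latents[1:] - latents[0]        # (k-1, d) edge vectors
    gram = edges @ edges.T                  # (k-1, k-1) Gram matrix
    k = latents.shape[0]
    return math.sqrt(max(np.linalg.det(gram), 0.0)) / math.factorial(k - 1)


def er_hull_select(latents, buffer_size):
    """Pick the subset of candidate latent codes spanning the largest volume."""
    n = latents.shape[0]
    if n <= buffer_size:
        return list(range(n))
    best_idx, best_vol = None, -1.0
    # Exhaustive search over subsets is cheap for small buffers (e.g., 3).
    for idx in itertools.combinations(range(n), buffer_size):
        vol = simplex_volume(latents[list(idx)])
        if vol > best_vol:
            best_idx, best_vol = list(idx), vol
    return best_idx
```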
We compare our methods to the upper bound of full finetuning across all timestamps, and the lower bound of naive sequential training on each timestamp without experience replay.
Continual Learning performance of personalized StyleGAN (MyStyle) in inverting an unseen test image (left) and synthesizing a novel appearance (right), evaluated with Average Incremental Performance (AIP) measured with LPIPS (lower is better) and ID similarity (higher is better), as well as Forgetting of both metrics (lower is better), scaled by ×10. ER-Rand and ER-Hull perform experience replay with simple random sampling and the proposed convex hull optimization in StyleGAN latent space, respectively.
Continual Learning performance of personalized EG3D (My3DGen) in reconstructing an unseen test image, evaluated with Average Incremental Performance (AIP) and Forgetting metrics measured with LPIPS (lower is better) and ID similarity (higher is better), scaled by ×10. ER-Rand and ER-Hull perform experience replay with simple random sampling and the proposed convex hull optimization in StyleGAN latent space, respectively. Buffer size is 3.
Continual Learning performance of personalized EG3D (My3DGen) in synthesizing a novel appearance, evaluated with Average Incremental Performance (AIP) and Forgetting metrics measured with LPIPS (lower is better) and ID similarity (higher is better), scaled by ×10. ER-Rand and ER-Hull perform experience replay with simple random sampling and the proposed convex hull optimization in StyleGAN latent space, respectively. Buffer size is 3.
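The exact metric formulas are not restated in these captions; the sketch below assumes the standard continual-learning formulations of Average Incremental Performance and Forgetting over a matrix `perf[t, j]` holding the metric on timestamp j's test data after training through timestamp t. The function names are illustrative, and the paper's precise definitions may differ.

```python
import numpy as np


def average_incremental_performance(perf):
    """AIP: average, over training steps t, of the mean metric across all
    timestamps seen so far (common continual-learning formulation)."""
    T = perf.shape[0]
    return float(np.mean([perf[t, : t + 1].mean() for t in range(T)]))


def forgetting(perf, higher_is_better=True):
    """Average gap between each timestamp's best performance observed before
    the final training step and its performance after the final step."""
    T = perf.shape[0]
    gaps = []
    for j in range(T - 1):
        history = perf[j : T - 1, j]   # metric on timestamp j before the last step
        final = perf[T - 1, j]
        best = history.max() if higher_is_better else history.min()
        gaps.append(best - final if higher_is_better else final - best)
    return float(np.mean(gaps)) if gaps else 0.0
```

For an ID-similarity matrix one would call `forgetting(perf, higher_is_better=True)`; for LPIPS, `higher_is_better=False`.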
Example comparison of our methods with a buffer size of 3 vs. the upper bound (full finetuning) and the lower bound (naive sequential training). Celebrity shown: IU.
Example comparison of our methods with a buffer size of 3 vs. the upper bound (full finetuning) and the lower bound (naive sequential training). Celebrity shown: Harry Styles.
Example comparison of our methods with a buffer size of 3 vs. the upper bound (full finetuning) and the lower bound (naive sequential training). Celebrity shown: Margot Robbie.
Example comparison of our methods with a buffer size of 3 vs. the upper bound (full finetuning) and the lower bound (naive sequential training). Celebrity shown: Michael B. Jordan.
Our generative model personalization is based on the My3DGen framework.