Great question. VGG here is used purely as a fixed feature extractor: both the generated images and a ground-truth (reference) batch are passed through a pretrained VGG network, and the resulting feature maps are compared. The distance between those feature maps is added as an extra term to the generator's loss, the c_loss (content) and s_loss (style) terms shown below.
This can be seen in the GitHub repo of the paper in question:
if self.content_lambda != 0. or self.style_lambda != 0.:
    # Extract VGG feature maps of the generated images once and reuse them for both losses
    vgg_generated_images = self.pass_to_vgg(generated_images)
    if self.content_lambda != 0.:
        # Content loss: distance between VGG features of the source images and of the generated images
        c_loss = self.content_lambda * self.content_loss(
            self.pass_to_vgg(source_images), vgg_generated_images)
        g_total_loss = g_total_loss + c_loss
    if self.style_lambda != 0.:
        # Style loss: distance between VGG features of the target (style) images and of the generated images
        s_loss = self.style_lambda * self.style_loss(
            self.pass_to_vgg(target_images[:vgg_generated_images.shape[0]]),
            vgg_generated_images)
        g_total_loss = g_total_loss + s_loss

d_grads = d_tape.gradient(d_total_loss, discriminator.trainable_variables)
g_grads = g_tape.gradient(g_total_loss, generator.trainable_variables)
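For intuition, a content loss is typically an L1 or L2 distance between the VGG feature maps, while a style loss usually compares Gram matrices (channel correlations) of those feature maps. Here is a minimal sketch of what content_loss and style_loss could look like; the exact distance metrics are assumptions on my part and may differ from what the repo implements:

import tensorflow as tf

def content_loss(vgg_real, vgg_generated):
    # Assumed sketch: mean absolute difference between VGG feature maps
    return tf.reduce_mean(tf.abs(vgg_real - vgg_generated))

def gram_matrix(features):
    # features: (batch, height, width, channels) -> (batch, channels, channels)
    shape = tf.shape(features)
    b, h, w, c = shape[0], shape[1], shape[2], shape[3]
    flat = tf.reshape(features, (b, h * w, c))
    gram = tf.matmul(flat, flat, transpose_a=True)
    return gram / tf.cast(h * w * c, features.dtype)

def style_loss(vgg_style, vgg_generated):
    # Assumed sketch: compare channel correlations (Gram matrices) of the feature maps
    return tf.reduce_mean(tf.abs(gram_matrix(vgg_style) - gram_matrix(vgg_generated)))

In the snippet above these terms are then scaled by content_lambda and style_lambda before being added to the generator's total loss.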
The VGG network itself is initialized and its weights are loaded at the beginning of training, as shown below:
base_model = VGG19(weights="imagenet", include_top=False, input_shape=input_shape)

# Take the output up to block4_conv3, then re-create block4_conv4 with a linear
# activation so the features are taken before the ReLU
tmp_vgg_output = base_model.get_layer("block4_conv3").output
tmp_vgg_output = Conv2D(512, (3, 3), activation='linear', padding='same',
                        name='block4_conv4')(tmp_vgg_output)
self.vgg = tf.keras.Model(inputs=base_model.input, outputs=tmp_vgg_output)

# Load the pretrained VGG19 weights (cached by Keras) by layer name, so the
# re-created block4_conv4 layer also receives its ImageNet weights
self.vgg.load_weights(os.path.expanduser(os.path.join(
    "~", ".keras", "models",
    "vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5")), by_name=True)