I'm reading a research paper on generating/synthesizing videos:
MoCoGAN: Decomposing Motion and Content for Video Generation
To evaluate the generated videos, the authors use a metric called 'Average Content Distance' (ACD). I couldn't find any material about it on Google. Can anyone please explain what Average Content Distance means?
Here is the snippet from the paper:
we first computed the average color of the generated shape in each frame. Each frame was then represented by a 3-dimensional vector. The ACD is then given by the average pairwise L2 distance of the per-frame average color vectors.
What I understood from this is as follows:
For each frame, convert RGB to grayscale (the average of the color channels). Then, for each pair of successive frames, calculate the L2 distance:
$$\frac{1}{MN} \sum_{x=1}^{M}\sum_{y=1}^{N}{(Frame_i(x,y) - Frame_{i+1}(x,y))^2}$$
This gives ACD. Have I understood it correctly?
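For comparison, a literal reading of the quoted snippet (keeping the 3-dimensional average color instead of converting to gray, and using all frame pairs instead of only successive ones) could be sketched as below. This is only an illustration, not code from the paper: the function name `average_content_distance`, the `(T, H, W, 3)` array layout, and the optional `masks` argument for selecting the generated shape are all assumptions of mine, and "pairwise" is assumed to mean every pair of frames.

```python
import numpy as np


def average_content_distance(video, masks=None):
    """Sketch of ACD as read from the quoted snippet (hypothetical helper).

    video: array of shape (T, H, W, 3), one RGB frame per time step.
    masks: optional boolean array of shape (T, H, W) marking the generated
           shape in each frame (assumption). If None, the average is taken
           over all pixels of the frame.
    """
    video = np.asarray(video, dtype=np.float64)
    if video.ndim != 4 or video.shape[-1] != 3:
        raise ValueError("expected video of shape (T, H, W, 3)")
    num_frames = video.shape[0]
    if num_frames < 2:
        raise ValueError("need at least two frames")

    if masks is None:
        # Average color over all pixels: one 3-dimensional vector per frame.
        mean_colors = video.mean(axis=(1, 2))
    else:
        masks = np.asarray(masks, dtype=bool)
        # Average color over the masked (shape) pixels only.
        mean_colors = np.stack(
            [frame[mask].mean(axis=0) for frame, mask in zip(video, masks)]
        )

    # L2 distance between the mean-color vectors of every pair of frames.
    diffs = mean_colors[:, None, :] - mean_colors[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)

    # Average over the distinct pairs (i < j).
    i, j = np.triu_indices(num_frames, k=1)
    return float(dists[i, j].mean())


# Usage: three 4x4 frames, where only the middle frame changes color.
video = np.zeros((3, 4, 4, 3))
video[1] = [3.0, 4.0, 0.0]
print(average_content_distance(video))
```

Under this reading, a video whose average color stays constant across frames gets a distance of 0, and larger color drift between frames gives a larger value.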
Also, how does this metric represent the quality of a video? How can it be used to compare the quality of different generated videos? Pointers to references are also welcome.
Thanks!
L2 distance between the successive vectors. ACD was originally introduced in the MoCoGAN paper. The paper you referred to has extended it.
I couldn't fully understand the code you referred to, but I got the idea. Thanks a lot.
– Nagabhushan S N May 27 '19 at 13:10