Data is just a bunch of measurements. What constitutes the "ground truth" and "external labels" is determined by people.
Take this data for example:
- First image is of a wild dog, then some measurements of its characteristics
- Second image is of a wild cat, then some measurements of its characteristics
- Third image is of a domesticated cat, then some measurements of its characteristics
If you want to perform supervised learning, you could choose "wild vs domesticated" as the ground truth labels. Or you could just as well choose "dog vs cat" as the ground truth labels. They are just labels.
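To make that concrete, here is a minimal sketch. The measurements (body weight and a "tameness" score), the unseen animal, and the helper `predict_1nn` are all made up for illustration. The same three rows are used to train a toy 1-nearest-neighbour classifier twice, once per choice of labels.

```python
import math

# Hypothetical measurements: (body weight in kg, tameness score from 0 to 1).
features = [
    (25.0, 0.1),  # wild dog
    (5.0, 0.2),   # wild cat
    (4.0, 0.9),   # domesticated cat
]

# Two equally valid "ground truth" labelings of the same rows.
species_labels = ["dog", "cat", "cat"]
lifestyle_labels = ["wild", "wild", "domesticated"]


def predict_1nn(train_x, train_y, query):
    """Return the label of the training row closest to `query` (Euclidean distance)."""
    nearest = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[nearest]


new_animal = (4.5, 0.8)  # hypothetical unseen animal
species_guess = predict_1nn(features, species_labels, new_animal)
lifestyle_guess = predict_1nn(features, lifestyle_labels, new_animal)
print(species_guess, lifestyle_guess)
```

Nothing about the data changed between the two predictions. Only the column I decided to call "ground truth" did.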
If you don't necessarily want to train a machine learning model, but want to find out whether there are natural groupings in the data, you might do clustering. Now, if you want to check whether the grouping that "naturally" arose corresponds to animal species, you could check it against the external label "dog vs cat". Similarly, you could choose to check it against "wild vs domesticated".
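Here is a rough sketch of what checking against an external label could look like. The cluster IDs and both label lists are invented for six hypothetical animals, and I use the plain (unadjusted) Rand index as the agreement measure. That is just one possible choice among many external validation measures.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two labelings agree.

    A pair counts as an agreement if both labelings put the two items
    together, or both labelings keep them apart.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical output of some clustering algorithm over six animals.
cluster_ids = [0, 0, 1, 1, 1, 1]

# Two candidate external labelings for the same six animals.
species = ["dog", "dog", "cat", "cat", "cat", "cat"]
lifestyle = ["wild", "domesticated", "wild", "wild", "domesticated", "domesticated"]

score_species = rand_index(cluster_ids, species)
score_lifestyle = rand_index(cluster_ids, lifestyle)
print(score_species, score_lifestyle)
```

With these made-up numbers, the clusters match species perfectly and match wild vs domesticated poorly. Whether the clustering looks "good" depends entirely on which external label I picked to compare against.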
I argue that labels are, in many contexts, made-up notions. Take the hypothetical image of the wild dog, for example, in an image classification context. There could be a mountain in the background, and in a completely different image classification problem, the "ground truth label" for the same image could be "mountain" (as opposed to, say, "ocean").