The authors mean that the achieved Precision is better than the baseline Precision we would get at random. For example, suppose we have 1000 examples, 200 positive and 800 negative. If we pick 150 points at random and label them all positive, we expect ~30 of them (150 × the 20% base rate) to actually be positive, giving a Precision of ~20% ($=\frac{\sim 30}{150}$). This CV.SE thread on "What is 'baseline' in precision recall curve" explores this further. (Likewise, our Recall-at-random depends directly on the size of the subsample we choose to label positive: if we randomly label 50% of the whole sample as positive, we will find ~50% of our positive instances, so our Recall will be ~50% too.)
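As a quick sanity check of these numbers, here is a small simulation (a sketch in NumPy; the seed and repeat count are arbitrary choices of mine) that randomly labels 150 of the 1000 points as positive and confirms Precision ≈ 20% (the base rate) and, by the same logic as the parenthetical above, Recall ≈ 15% (= 150/1000):

```python
import numpy as np

rng = np.random.default_rng(0)

n, n_pos = 1000, 200                  # 200 positives, 800 negatives
y_true = np.zeros(n, dtype=bool)
y_true[:n_pos] = True

k = 150                               # randomly label 150 points as positive
precisions, recalls = [], []
for _ in range(10_000):
    picked = rng.choice(n, size=k, replace=False)
    tp = y_true[picked].sum()         # true positives among the random picks
    precisions.append(tp / k)         # Precision = TP / (# labeled positive)
    recalls.append(tp / n_pos)        # Recall = TP / (# actual positives)

print(f"mean Precision: {np.mean(precisions):.3f}")  # ~0.200, the base rate
print(f"mean Recall:    {np.mean(recalls):.3f}")     # ~0.150, i.e. k / n
```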
For this paper, the authors report Precision because the clustering has to detect clusters with a high enough proportion of the poorly-performing subclasses for the second stage of their procedure (i.e. the model trained using these clusters as groups for grouped distributionally robust optimization (GDRO)) to substantially improve performance on each of those subclasses. If the clusters had low Precision, the GDRO would have very noisy group labels to work with.