How you perform and calculate such a statistic depends on what you want to learn from it.
My view of $R^2$ is that it compares how your model performs (in terms of squared loss) with how a naïve model performs when it just predicts the mean every time. With this in mind, there are two possibilities for calculating a subgroup $R^2$.
1. Calculate the usual $R^2=1-\left(\dfrac{\sum_{i=1}^{N_{group}}\left(y_i-\hat y_i\right)^2}{\sum_{i=1}^{N_{group}}\left(y_i-\bar y\right)^2}\right)$, limited to the $N_{group}$ points in the group, where $\bar y$ is the mean of the full sample.
2. Calculate $R^2=1-\left(\dfrac{\sum_{i=1}^{N_{group}}\left(y_i-\hat y_i\right)^2}{\sum_{i=1}^{N_{group}}\left(y_i-\bar y_{group}\right)^2}\right)$, using the mean $\bar y_{group}$ of just that particular group.
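As a small sketch of the difference, here are both statistics computed with NumPy on made-up data for a single group (the numbers are purely illustrative, not from your setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data for one group: true values, model predictions, and the
# mean of the *full* sample (assumed known; here just picked by hand).
y_group = rng.normal(loc=5.0, scale=2.0, size=100)
y_hat_group = y_group + rng.normal(scale=1.0, size=100)
y_bar_overall = 3.0

sse = np.sum((y_group - y_hat_group) ** 2)

# First statistic: benchmark against the overall mean.
r2_overall_mean = 1 - sse / np.sum((y_group - y_bar_overall) ** 2)

# Second statistic: benchmark against the group's own mean.
r2_group_mean = 1 - sse / np.sum((y_group - np.mean(y_group)) ** 2)

print(r2_overall_mean, r2_group_mean)
```

Because the group mean minimizes the within-group sum of squares, the denominator in the second statistic can only be smaller, so the second $R^2$ is never larger than the first.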
Since you know the group membership, it seems legitimate to use the second formula, which tells you how your model performs on that one group compared to how you would do if you predicted the mean of that group every time.
When you use the sklearn implementation the way you do, I believe you get this second statistic. There is a subtlety about using the in-sample vs the out-of-sample mean in the sklearn implementation, but these values are (hopefully) quite close and will give similar results.
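You can verify this directly: `sklearn.metrics.r2_score` centers on the mean of the `y_true` array it is given, so restricting its inputs to one group benchmarks against that group's own mean (data below is synthetic):

```python
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
y_group = rng.normal(5.0, 2.0, size=50)        # one group's true values
y_hat_group = y_group + rng.normal(0.0, 1.0, size=50)  # model predictions

# Second statistic, computed by hand with the group's own mean.
manual = 1 - (np.sum((y_group - y_hat_group) ** 2)
              / np.sum((y_group - y_group.mean()) ** 2))

print(np.isclose(r2_score(y_group, y_hat_group), manual))  # True
```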
Your results tell you that, while the model predicts better (in terms of squared loss) than predicting the same overall mean every time, for some groups you would be better off predicting the mean of the group than using your model's predictions.
I will venture a guess that, if you run a regression on just the group indicator variables (giving a model that predicts the group mean), you will have lower out-of-sample MSE than your existing model has. If you have many more instances of groups $C$ and $D$, which have positive "grouped" $R^2$, than of the other groups, then this might not hold, but if the groups are roughly balanced, this is my prediction. You seem to do a better job of predicting by using the group means than by using your model predictions.
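To see why a regression on only the group indicators predicts the group means, here is a small NumPy check (the group labels and values are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
groups = rng.integers(0, 4, size=200)  # labels A..D encoded as 0..3
y = np.array([0.0, 1.0, 5.0, 6.0])[groups] + rng.normal(size=200)

# Design matrix of one-hot group indicators, no other features.
X = np.eye(4)[groups]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# The fitted coefficients are exactly the per-group means of y.
group_means = np.array([y[groups == g].mean() for g in range(4)])
print(np.allclose(beta, group_means))  # True
```

So comparing your model's out-of-sample MSE to this dummies-only regression is exactly the comparison against a "predict each group's mean" baseline.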
(If you take the stance that $R^2$ measures the proportion of explained variance, then by limiting your analysis to just one group, you are cutting down the variance, so of course a smaller proportion of the variance is explained. There are caveats here, since that interpretation of $R^2$ need not apply, but this might give you an intuition for why your grouped $R^2$ values are lower than your overall $R^2$ value.)