An interesting property of AUC is that it depends only on the ordering of the predictions: it does not change under any transformation that preserves the ranks. For instance, if you divide every value by two, the AUC is the same.
library(pROC)
set.seed(2023)
N <- 1000
p <- rbeta(N, 1, 1) # true event probabilities, uniform on (0, 1)
y <- rbinom(N, 1, p) # outcomes drawn from those probabilities
pROC::roc(y, p)$auc # I get 0.8481
pROC::roc(y, p/2)$auc # Again, I get 0.8481: halving preserves the ordering
In this regard, AUC does not consider calibration: it does not penalize the model for making predictions that are detached from reality, such as events with predictions equal to $0.2$ that happen $50\%$ of the time. AUC is strictly a measure of the ability to discriminate between categories.
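A toy construction of that exact scenario (the group sizes and event rates below are numbers I made up for illustration):
# Predictions of 0.2 attached to events that occur half the time, and
# predictions of 0.8 attached to events that occur 90% of the time:
# calibration is terrible, yet the AUC is still well above 0.5.
set.seed(1)
p_hat <- rep(c(0.2, 0.8), each = 500)
y_mis <- rbinom(1000, 1, rep(c(0.5, 0.9), each = 500))
pROC::roc(y_mis, p_hat)$auc # well above 0.5 despite the miscalibration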
Log loss, however, considers both calibration and discrimination. The function penalizes predictions for category-$1$ members for being far from $1$ and predictions for category-$0$ members for being far from $0$, so it certainly covers discrimination. Calibration is harder to see from the equation, but because log loss is a strictly proper scoring rule, it can be thought of as seeking out the true conditional probabilities of class membership (which we hope are extreme so that we get good discrimination between categories, but we are not assured of that). Brier score, another strictly proper scoring rule, has an explicit decomposition into calibration and discrimination terms.
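Concretely, with outcomes $y_i \in \{0, 1\}$ and predictions $\hat{p}_i$, log loss is $-\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{p}_i + (1 - y_i)\log(1 - \hat{p}_i)\right]$ and Brier score is $\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{p}_i)^2$. A minimal sketch, reusing y and p from the first simulation (the helper functions are my own, not part of pROC): both proper scores get worse when the predictions are halved, even though the AUC did not move.
log_loss <- function(y, p) -mean(y * log(p) + (1 - y) * log(1 - p))
brier <- function(y, p) mean((y - p)^2)
log_loss(y, p) # smaller (better) for the honest predictions
log_loss(y, p/2) # larger (worse) after halving
brier(y, p)
brier(y, p/2) # likewise worse after halving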
What this result tells me is that you are harming your calibration without making much improvement to your discrimination, and I would consider this a net negative.
The reason you harm your calibration is that the weighted loss function does not penalize mistakes on the two classes equally. It is designed to give especially high probabilities of membership in the minority class, so when you test your model on data where membership in the minority class is genuinely rare, those probabilities are overestimated. Proponents of weighted loss functions consider this a feature, not a bug.
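A minimal sketch of that effect, assuming a weighted logistic regression (the simulation design, variable names, and class weights are my own choices, not from the question):
# Weighting the minority class up inflates the predicted probabilities
# well past the observed event rate, wrecking calibration.
set.seed(2023)
n <- 10000
x <- rnorm(n)
y_imb <- rbinom(n, 1, plogis(-3 + x)) # rare event, prevalence around 7%
w <- ifelse(y_imb == 1, sum(y_imb == 0) / sum(y_imb == 1), 1) # balance classes
fit_plain <- glm(y_imb ~ x, family = binomial)
# quasibinomial gives the same estimates while avoiding the warning
# about non-integer successes that binomial raises with these weights
fit_wtd <- glm(y_imb ~ x, family = quasibinomial, weights = w)
mean(y_imb) # observed event rate
mean(predict(fit_plain, type = "response")) # tracks the event rate
mean(predict(fit_wtd, type = "response")) # pushed far above it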
Your ability to discriminate between classes barely changes because the goal of the weighted loss is just to inflate the probability values so that more of the minority-class predictions are large. It is not a perfect analogy, but it is as if you divided every prediction by the largest predicted probability: you do not change the order, so the model's ability to discriminate between categories does not change, but every predicted value gets higher.
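To make the analogy concrete, reusing y and p from the first simulation:
# Dividing by the largest prediction inflates every value but preserves
# the ranks, so the AUC is identical to the original 0.8481.
pROC::roc(y, p / max(p))$auc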
Mostly, class imbalance is a non-problem for proper statistical methods, and attempts to "fix" class imbalance typically stem from using a threshold of $0.5$ and trying to force your predictions to fall on the correct side of that threshold, which seems to be how the weighted loss function is used here.
However, you do not have to use $0.5$ as a threshold. In fact, you do not have to use any threshold at all; the raw predicted probabilities can be useful on their own. This link gives a good discussion of why, with pointers to further material.