OVERFIT
You want the model to chase after extreme values, even though extreme predictions might have little credibility. The way to do this is to give the model considerable flexibility, enough to let it chase after those points, at the risk of overfitting (which you seem to be okay with). In other words, incentivize risky predictions by penalizing the model for missing the observed (extreme) values during training (as is typical of regression loss functions), and give it enough flexibility to get close to those values and so avoid the penalty.
When there are extreme observations, this enormous flexibility lets the model chase after them and fit them tightly. However, doing so drags the predictions away from the mainstream of the data, where the true values tend to be. If the features distinguish such extreme points from the bulk of the observations, then this is desirable, and your extreme predictions will be reliable: predictions far above or far below the overall mean or median will tend to occur only when the observed values really are much higher or lower, respectively, than the bulk of the data. If you lack features that distinguish such points, then these extreme predictions will not be reliable: most of the time, when you predict something especially high or especially low, the observation will be fairly mundane.
Let's look at a simulation.
library(nnet)
set.seed(2023)
N <- 100
a <- -10
b <- +10
x <- seq(a, b, (b - a)/(N - 1))        # evenly spaced predictor on [-10, 10]
Ey <- sin(x)                           # true expected value of the outcome
d <- rbinom(N, 1, 0.2)                 # indicator for the heavy-tailed mixture component
e <- (1 - d)*rnorm(N) + d*rt(N, 1.01)  # 80% standard normal, 20% very heavy-tailed t
y <- Ey + e
plot(x, y)
lines(x, Ey)
L <- nnet::nnet(                       # highly flexible single-hidden-layer network
  y ~ x,
  size = 300,
  linout = TRUE,
  maxit = 2500
)
lines(x, predict(L), col = 'red')

The simulation gives an expected value of the outcome $y$ that follows a sine wave. On top of that is an additive error drawn from a mixture of a standard normal and a heavy-tailed $t$-distribution, which produces some extreme values. When we fit a highly flexible neural network that is able to chase after those extreme values, we get a tight fit and a model that is willing to make extreme predictions. However, when we generate new data, we see that these extreme predictions do not correspond to extreme observations. In such a situation, even though the model predicts extreme values, there is little reason to anticipate that an extreme value will be observed.
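Before generating new data, here is a quick sanity check to make the "tight fit" claim concrete (a minimal sketch reusing the objects above; comparing against the raw noise is just one convenient yardstick).
mean((y - predict(L))^2)   # in-sample MSE of the overfit network
mean((y - Ey)^2)           # average squared noise actually present in this training sample
# if the network chases the extreme points as in the plot, the first number
# should be far smaller than the second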
set.seed(2024)
par(mfrow = c(2, 2))
# four fresh draws of the noise, each plotted against the fitted (overfit) curve
d1 <- rbinom(N, 1, 0.2)
e1 <- (1 - d1)*rnorm(N) + d1*rt(N, 1.01)
plot(x, Ey + e1, ylim = c(-25, 15))
lines(x, predict(L), col = 'red')
d2 <- rbinom(N, 1, 0.2)
e2 <- (1 - d2)*rnorm(N) + d2*rt(N, 1.01)
plot(x, Ey + e2, ylim = c(-25, 15))
lines(x, predict(L), col = 'red')
d3 <- rbinom(N, 1, 0.2)
e3 <- (1 - d3)*rnorm(N) + d3*rt(N, 1.01)
plot(x, Ey + e3, ylim = c(-25, 15))
lines(x, predict(L), col = 'red')
d4 <- rbinom(N, 1, 0.2)
e4 <- (1 - d4)*rnorm(N) + d4*rt(N, 1.01)
plot(x, Ey + e4, ylim = c(-25, 15))
lines(x, predict(L), col = 'red')
par(mfrow = c(1, 1))   # reset the plotting grid

In all four panels, the big spikes the fitted curve makes out toward $\pm 10$ are far away from the new observations.
Consequently, if you want the AI system to make extreme ("risky") predictions even though those predictions will not be reliable, overfitting to the training data is a path forward, and these plots show the danger of wanting the system to make such unreliable predictions.
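If you prefer a number to a picture, here is a rough sketch of the same point using the fitted model above (the 1000 replications, the five most extreme grid points, and the tolerance of 1 unit are all arbitrary choices of mine):
set.seed(2025)                             # arbitrary seed for this check
preds_train <- as.numeric(predict(L))
# the five grid points where the fitted curve strays furthest from its own median
extreme_idx <- order(abs(preds_train - median(preds_train)), decreasing = TRUE)[1:5]
hits <- replicate(1000, {
  d_new <- rbinom(N, 1, 0.2)
  e_new <- (1 - d_new)*rnorm(N) + d_new*rt(N, 1.01)
  y_new <- Ey + e_new
  # fraction of those extreme predictions that a fresh observation lands within 1 unit of
  mean(abs(y_new[extreme_idx] - preds_train[extreme_idx]) < 1)
})
mean(hits)   # should be small if the fit really does spike: the extreme predictions are almost never vindicated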
EDIT
A more illustrative simulation might be the one below.
library(nnet)
library(data.table)
set.seed(2023)
N <- 100
a <- -10
b <- +10
x <- seq(a, b, (b - a)/(N - 1))
Ey <- sin(x)
d <- rbinom(N, 1, 0.2)
# e <- (1 - d)*rnorm(N) + d*rt(N, 2.01)
e <- rt(N, 4.1)                        # heavier tails than normal, but milder than before
y <- Ey + e
par(mfrow = c(1, 2))
plot(x, y)
lines(x, Ey)
L <- nnet::nnet(                       # same highly flexible network as before
  y ~ x,
  size = 300,
  linout = TRUE,
  maxit = 2500
)
preds <- predict(L)
lines(x, preds, col = 'red')
# simulate many fresh data sets from the same process and pool them
R <- 250
sims <- list()
sims[[1]] <- data.frame(
  x = x,
  y = y
)
for (i in 1:R){
  d1 <- rbinom(N, 1, 0.2)
  # e1 <- (1 - d1)*rnorm(N) + d1*rt(N, 2.01)
  e1 <- rt(N, 4.1)
  sims[[i + 1]] <- data.frame(
    x = x,
    y = Ey + e1
  )
}
dat <- data.table::rbindlist(sims)
plot(dat)
lines(x, preds, col = 'red')
par(mfrow = c(1, 1))

On the left, where we make those two really extreme predictions beyond $\pm 5$, it looks like we're spot on. However, once we look at a ton of data from the same process (right panel), we see how silly those predictions are. Imagine telling your boss that, when $x = 7.777778$, the prediction is about $8.6$. I would imagine a (warranted, I believe) response of, "Well, n-l-i, then why do we observe $x = 7.777778$ all the time yet basically never get $y \approx 8.6$ when we do?"
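To put numbers on that complaint, here is a small sketch reusing x, preds, and dat from the code above (the index lookup and the tolerance of 1 unit are my own arbitrary choices):
i <- which.min(abs(x - 7.777778))            # the grid point the boss asks about
preds[i]                                     # the extreme prediction there (about 8.6 in the run discussed above)
y_at_i <- dat$y[abs(dat$x - x[i]) < 1e-8]    # all 251 simulated observations at that x
summary(y_at_i)                              # the bulk should sit near sin(x[i]), about 1, nowhere near 8.6
mean(abs(y_at_i - preds[i]) < 1)             # how often an observation lands within 1 unit of the prediction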
EDIT 2
The model already has an incentive to make extreme ("risky") predictions, as the loss is high when those extreme values occur yet only modest predictions are made. Giving the model flexibility lets it fit those extreme values and reduce the loss, which it wants to do, but there is a risk that you will...
...OVERFIT
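If you instead wanted to rein that flexibility back in, nnet's built-in weight decay is the obvious knob; here is a sketch on the most recent simulated x and y (the decay value of 0.1 is an arbitrary choice of mine).
set.seed(2023)
L_decay <- nnet::nnet(
  y ~ x,
  size = 300,
  linout = TRUE,
  decay = 0.1,      # weight-decay penalty: shrinks the weights and tames the wiggles
  maxit = 2500
)
plot(x, y)
lines(x, Ey)
lines(x, predict(L_decay), col = 'blue')   # should stay much closer to sin(x) and largely give up on the extreme points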