This question seems to repeat itself every now and then (see here, for example), so I'll just summarize the answers and sources that have been accumulated in the time since this question was asked.
Redefine Objective
- Ordinal Categorical Classification
A modified version of the cross-entropy, adjusted to ordinal target. It penalizes the model for predicting a wrong category that is further away from the true category more than for predicting a wrong category that is closer to the true category.
$$l(y,\hat{\ y}) = (1+\omega) * CE(y,\hat{\ y}), \text{ s.t.} $$
$$\omega = \dfrac{|class(\hat{\ y})-class(y)|}{k-1}$$
Where $k$ is the number of possible classes. The operation () the predicted class of an observation that is achieved by arg-maxing the probabilities in a muti-class prediction task.
A Keras implementation of the ordinal cross-entropy is available here. An example (taken from the link)
import ordinal_categorical_crossentropy as OCC
model = your model here
model.compile(loss=OCC.loss, optimizer='adam', metrics=['accuracy'])
Treat your problem as a regression problem
Instead of classification task, treat your task as a regression task, and then round your prediction/map them to categories using any kind of method. As Stephan Kolassa mentions, the underlying assumption of this method is that one's scores are interval scaled.
Cumulative Link Loss
Originally proposed here. It is based on the logistic regression model, but with a link function that maps the logits to the cumulative probabilities of being in a given category or lower. A set of ordered thresholds splits this space into the different classes of the problem.The projections are estimated by the model.
Under this method there two loss functions are suggested, probit based and logitbased.
A keras implementation of the two versions is available here.
Other methods I won't cover
The following table was taken from here and describe other loss functions for ordinal categorical target.

Treat ordinal categories as multi-label problem
Under this method, we convert our ordinal targets into a matrix such that every class $c$ includes all the first $c$ entries being 1, as the following suggest:
Lowest -> [1,0,0,0,0]
Low -> [1,1,0,0,0]
Medium -> [1,1,1,0,0]
High -> [1,1,1,1,0]
Highest -> [1,1,1,1,1]
Then the loss function has to be changed as this paper suggest. See this blog-post for an implementation.