One reason to prefer $k$ binary classifiers is that they can be used for multi-label classification.
Here your likelihood function reads like this:
$L=\prod_{i=1}^n \prod_{k} \prod_{j_k \in \{0,1\}} p_{k,j_k}(x_i)^{\delta_{y_{i,k},j_k}}$
where the index $i$ runs over the samples, the index $k$ runs over the labels, $j_k$ indicates the binary outcome 0 or 1, $\delta_{a,b}$ is the Kronecker delta, and $y_{i,k}\in \{0,1\}$ denotes the multi-hot encoded labels of sample $i$; see https://en.m.wikipedia.org/wiki/Multi-label_classification
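To make the formula concrete, here is a minimal NumPy sketch (the variable names `p` and `y` are illustrative, not from any particular library): since $j_k \in \{0,1\}$, the inner product over $j_k$ collapses to the familiar binary cross-entropy term per label.

```python
import numpy as np

# Illustrative sketch of the multi-label log-likelihood above.
# p[i, k] = predicted probability that label k is active for sample i,
#           i.e. p_{k,1}(x_i); the complement p_{k,0}(x_i) = 1 - p[i, k].
# y[i, k] = multi-hot encoded ground truth, y_{i,k} in {0, 1}.
p = np.array([[0.9, 0.2],
              [0.3, 0.8]])
y = np.array([[1, 0],
              [0, 1]])

# The Kronecker delta picks p[i, k] where y[i, k] == 1 and
# 1 - p[i, k] where y[i, k] == 0, so log L reduces to:
log_L = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
print(log_L)  # sum of per-sample, per-label binary cross-entropy terms
```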
For multi-class classification, only one class may be assigned, and the likelihood is
$L = \prod_{i=1}^n P(Y_i=y_i) = \prod_{i=1}^n \left( \prod_{k=1}^K P(Y_i=k)^{\delta_{k,y_i}} \right)$, compare https://en.m.wikipedia.org/wiki/Multinomial_logistic_regression.
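For comparison, a minimal sketch of the multi-class log-likelihood (again with illustrative names, assuming `P` holds per-class probabilities that sum to 1 per row): the Kronecker delta $\delta_{k,y_i}$ simply selects the predicted probability of the true class for each sample.

```python
import numpy as np

# P[i, k] = predicted probability that sample i belongs to class k;
# each row sums to 1 (class exclusivity).
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])
y = np.array([0, 2])  # true class index y_i per sample

# The exponent delta_{k, y_i} zeroes out every factor except the
# predicted probability of the true class, so:
log_L = np.sum(np.log(P[np.arange(len(y)), y]))
print(log_L)  # log(0.7) + log(0.6)
```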
Multi-label classification is (in a sense) a generalization of multi-class classification.
If you already know that the classes are exclusive, then the multi-label setting would allow you too much flexibility (the event takes place either in city A or in city B, but not both). A multi-label model might output (0.7, 0.8), while a multi-class model would have to trade off, outputting e.g. (0.3, 0.7) due to the constraint of class exclusivity and the normalization of the output.
PS: in a neural net for multi-class classification, the last activation function would typically be a softmax (which makes sure that the predicted class probabilities sum to 1 for each sample). For the setting with $k$ binary classifiers, you could simply change that to an (elementwise) sigmoid function.
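As a plain-NumPy sketch of that last point (not tied to any particular framework; the logits `z` are made up for illustration): pushing the same logits through a softmax versus an elementwise sigmoid shows the trade-off described above.

```python
import numpy as np

z = np.array([0.85, 1.39])  # illustrative logits for two classes/labels

# Multi-class head: softmax forces the probabilities to sum to 1.
softmax = np.exp(z) / np.sum(np.exp(z))
print(softmax)   # roughly [0.37, 0.63] -- the classes trade off

# Multi-label head: elementwise sigmoid, each label scored independently.
sigmoid = 1 / (1 + np.exp(-z))
print(sigmoid)   # roughly [0.70, 0.80] -- both can be high at once
```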