The dimensionality of the decision boundary matches the number of decision models you have. The reason $K-1$ models is common is that the $K^{th}$ model is redundant: the $K^{th}$ class is simply the set of samples not positively assigned by any of the previous $K-1$ models. One exception is when you want an invalid result for data that does not conform to the expected distribution of any group; in that situation you need $K$ models, and samples that fall outside all of them receive the invalid label. For simplicity I will use $m$ for the number of models below, which will typically be $K$ or $K-1$; $v$ will be the number of variables in a sample vector, and $o$ the number of observations in the data.
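To make that labelling rule concrete, here is a minimal numpy sketch; `assign_labels` and `threshold` are hypothetical names of my own, not from any particular library:

```python
import numpy as np

def assign_labels(scores, threshold=0.0):
    """scores: (m, o) array, one row of scores per model, one column per observation.

    With m = K-1 models, label m stands for the remaining K-th class;
    with m = K models, label m stands for "invalid / fits no group".
    """
    m, o = scores.shape
    labels = np.full(o, m)                  # default: no model claimed this sample
    best = scores.argmax(axis=0)            # highest-scoring model per observation
    fired = scores.max(axis=0) > threshold  # did any model positively assign it?
    labels[fired] = best[fired]
    return labels
```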
$w^Tx$ is indeed a projection, of the sample vector $x$ onto the weight vector $w$, and as it is the inner product of the two, the dimensions of the result depend on the outer dimensions of each matrix/vector. For this to work, the dimensions of $w^T$ must be $m \times v$ and the dimensions of $x$ must be $v \times o$.

If $w^T$ is a vector and $x$ is a vector, the product is a scalar, a single number. More generally:

- If $m>1, o=1$ you get an $m \times 1$ vector where each element is the score of one model for that single observation.
- If $m=1, o>1$ you get a $1 \times o$ vector where each element is the score of the single model for one observation.
- If $m>1, o>1$ you get an $m \times o$ matrix with a score for each model and each observation.
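A small numpy sketch of those shapes, with random data just to show the dimensions:

```python
import numpy as np

m, v, o = 3, 4, 5            # models, variables per sample, observations
W = np.random.randn(v, m)    # one weight column per model, so W.T is (m, v)
X = np.random.randn(v, o)    # one observation per column

scores = W.T @ X             # (m, v) @ (v, o) -> (m, o) matrix of scores
print(scores.shape)          # (3, 5)

w = W[:, :1]                 # single model  -> w.T is (1, v)
x = X[:, :1]                 # single sample -> (v, 1)
print((w.T @ x).shape)       # (1, 1): effectively a scalar
```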
If $o>1$, i.e. $x$ is a matrix, you replicate $w_0$ along the outer dimension of $x$, which is the number of observations $o$, i.e. you apply the same offset to every observation.

If $m>1$, i.e. $w$ is a matrix, you have a separate $w_0$ for each of the $m$ models, i.e. you apply each model's individual offset to its own row of scores.
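In numpy this replication happens automatically via broadcasting; a sketch continuing the shapes above:

```python
import numpy as np

m, v, o = 3, 4, 5
W = np.random.randn(v, m)
X = np.random.randn(v, o)
w0 = np.random.randn(m, 1)   # one offset per model, as a column vector

y = W.T @ X + w0             # (m, o) + (m, 1): each w0 is broadcast across all o observations
print(y.shape)               # (3, 5)
```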
$w_0$ is the offset you choose to tune false positives against false negatives, e.g. through a ROC curve.
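As a sketch of that tuning for a single model, using scikit-learn's `roc_curve` on made-up scores (the corner-distance criterion below is just one common heuristic, not the only choice):

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(1, 1, 100),    # w^T x for positive samples
                         rng.normal(-1, 1, 100)])  # w^T x for negative samples
y_true = np.concatenate([np.ones(100), np.zeros(100)])

fpr, tpr, thresholds = roc_curve(y_true, scores)
best = np.argmin(fpr**2 + (1 - tpr)**2)  # point closest to the top-left corner
w0 = -thresholds[best]                   # decide positive when w^T x + w0 > 0
```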
This is a useful resource:

http://www.cedar.buffalo.edu/~srihari/CSE574/Chap4/4.1%20DiscFns.pdf