The most straightforward technique you can try is a one-hot encoding to convert your discrete features into numeric ones. However, be aware that this increases your dimensionality, which can make it harder to get good performance. It's also not quite appropriate, since Gaussians are suited to continuous variables rather than 0/1 indicators.
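For concreteness, here is a minimal sketch of a one-hot encoding with scikit-learn (the data and column values are hypothetical; `sparse_output` assumes scikit-learn ≥ 1.2, older versions use `sparse=False`):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical discrete feature with three categories
X_discrete = np.array([["red"], ["blue"], ["red"], ["green"]])

enc = OneHotEncoder(sparse_output=False)   # dense 0/1 indicator columns
X_onehot = enc.fit_transform(X_discrete)   # shape (4, 3): one column per category
print(enc.categories_)                     # the learned category ordering
```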
If these discrete features are truly important, then I agree with @hxd1011 that you'll need to represent them separately from the continuous ones and then combine the two in the joint.
One way to do this is a "blocking" scheme, where you split your data into groups, one for every combination of discrete values. For instance, if you only have two binary variables $A,B$ and the rest of the continuous features are collected in $X$, then you can split your data into 4 groups and model a conditional distribution for each: $P(X|A=1,B=1), P(X|A=1,B=0), P(X|A=0,B=1), P(X|A=0,B=0)$. Of course, you'll still need to model the distribution of $A,B$ however you see fit. Afterwards, you can combine the pieces to form the joint:
$$
P(X,A,B) = P(X|A,B)P(A,B)
$$
This way, you can model each of the four groups with a GMM if you want, and the categorical features with some other discrete distribution. Note that this requires a sufficient number of points in each group, as in the sketch below.
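Here is a hedged sketch of the blocking scheme, assuming scikit-learn's `GaussianMixture` and an empirical frequency estimate for $P(A,B)$; the synthetic data, variable names, and the number of mixture components are illustrative assumptions, not a prescription:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
n = 2000
A = rng.integers(0, 2, size=n)             # binary discrete feature A
B = rng.integers(0, 2, size=n)             # binary discrete feature B
X = rng.normal(size=(n, 3)) + A[:, None]   # continuous features, shifted by A

gmms, priors = {}, {}
for a in (0, 1):
    for b in (0, 1):
        mask = (A == a) & (B == b)
        priors[(a, b)] = mask.mean()       # empirical P(A=a, B=b)
        # One GMM per block; needs enough points in each group to fit reliably
        gmms[(a, b)] = GaussianMixture(n_components=2).fit(X[mask])

def log_joint(x, a, b):
    """log P(x, A=a, B=b) = log P(x | A=a, B=b) + log P(A=a, B=b)."""
    return gmms[(a, b)].score_samples(x.reshape(1, -1))[0] + np.log(priors[(a, b)])

print(log_joint(X[0], A[0], B[0]))
```

Working in log space avoids underflow when the conditional densities are small; `score_samples` already returns per-sample log-likelihoods, so the product in the joint becomes a sum.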