(I might have to change this if your answers to my comment-question surprise me.)
The model, as you describe it, estimates within-unit changes in the propensity to change from a previous state, regardless of what that state was. Decomposing the effects into within and between components means that your within-unit coefficients capture how a unit's propensity to change state shifts when its IV1 and IV2 change, controlling for each unit's overall propensity to change during the panel and each unit's average "levels" of IV1 and IV2. This is pretty confusing, and your effects will be totally agnostic toward the qualitative features of each state: the model only reports (within) change in propensity to change and (between) panel-average propensity to change. Assuming this really is what you want...
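To make the setup above concrete, here is a minimal sketch of how I understand the variables to be built. The toy data and the helper names (`change_indicator`, `within_between`) are hypothetical, not taken from your code, and I am assuming the unit mean is taken over all waves shown.

```python
from statistics import mean

def change_indicator(states):
    """1 if the state differs from the previous wave, else 0.
    The first wave has no observed predecessor, so it is left as None."""
    return [None] + [int(cur != prev) for prev, cur in zip(states, states[1:])]

def within_between(values):
    """Split a unit's predictor into deviations from its own mean (within)
    and that mean itself (between)."""
    unit_mean = mean(values)
    return [v - unit_mean for v in values], unit_mean

# Hypothetical unit observed over three waves.
states = ["A", "B", "B"]
iv1 = [1.0, 3.0, 5.0]

changed = change_indicator(states)             # [None, 1, 0]
iv1_within, iv1_between = within_between(iv1)  # [-2.0, 0.0, 2.0], 3.0

# The DV ignores which state is left or entered:
# A -> B -> A and B -> A -> B produce the same indicator.
same_coding = change_indicator(["A", "B", "A"]) == change_indicator(["B", "A", "B"])
```

The last line is the "agnostic" part: two units moving in opposite directions are indistinguishable to the model.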
Re: "Is it OK to use Wave 1 with all 0s in the dependent variable for this specification?"
Assuming the 0s in Wave 1 have nothing to do with which qualitative state units are in, I think you would not want to include this wave. The model will not interpret it as a "baseline" but as equivalent to state persistence from the pre-panel wave. It will inform the coefficients in the same way that persistence from wave 1 to wave 2, or from wave 2 to wave 3, would. The within-unit coefficients are equivalent to including unit dummies in the model, or to demeaning each variable from its unit mean over the panel. Including that first wave will bias the estimates by effectively adding a zero to the calculation of the unit mean from which waves 2 and 3 deviate.
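A quick arithmetic sketch of that last point, using a made-up unit that changes state in wave 2 and persists in wave 3. This only illustrates the demeaning logic in linear terms; it is not the logit estimator itself, and the numbers are hypothetical.

```python
from statistics import mean

def demean(values):
    """Return the unit mean and each wave's deviation from it."""
    unit_mean = mean(values)
    return unit_mean, [v - unit_mean for v in values]

# Hypothetical DV: changed in wave 2, persisted in wave 3.
dv_waves_2_3 = [1, 0]

# Same unit with the artificial all-zero wave 1 prepended.
dv_with_wave_1 = [0] + dv_waves_2_3

mean_23, dev_23 = demean(dv_waves_2_3)      # mean 0.5
mean_123, dev_123 = demean(dv_with_wave_1)  # mean drops to one third

# Wave 2's deviation from the unit mean grows once the extra zero is included.
wave_2_dev_without = dev_23[0]
wave_2_dev_with = dev_123[1]
```

The extra zero pulls the unit mean down, so the same observed change in wave 2 looks like a larger departure from the unit's "typical" propensity to change.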
More generally, there is evidence that generalized linear models (like this one) carry some inherent bias in estimation, though it is often quite small. You could run this two-wave model and compare your within-unit coefficients to those of a traditional fixed-effects logistic regression panel model: see examples of packages.