Test using normal approximation
Under some assumptions, it should be possible to derive an approximate test that is a bit "softer" in that it allows a less restrictive null hypothesis. This requires specifying a model and formulating the null hypothesis within that model. I'll try this below.
Model
To quote @whuber's comment:
One way to intuit this is to suppose that underlying each treatment is a continuous score. Underlying the data is a random sample of independent score differences. However, you have discretized them into three bins where, under the null hypothesis, the $\pm 1$ bins have equal chances (say $p$) and therefore the $0$ bin has a chance of $1-p$.
Now, let $p_A$ be the probability that treatment A truly works better in an experiment, and $p_{0A}$ the probability that A works better but is given a $0$ score. Denote the corresponding probabilities for treatment B by $p_B$ and $p_{0B}$.
If experiments are independent, then
your data have a multinomial distribution with probabilities $p_{+1}= p_A-p_{0A}$, $p_{-1}= p_B-p_{0B}$ and
$p_0=p_{0A}+p_{0B}$ for the respective outcomes.
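To make the model concrete, here is a minimal R sketch that simulates data from it; the parameter values are invented purely for illustration.

```r
# Simulate n experiments from the multinomial model above.
# Illustrative (made-up) parameter values:
n <- 10000
p_A <- 0.6; p_B <- 1 - p_A    # probability that A (resp. B) truly works better
p_0A <- 0.3; p_0B <- 0.3      # probability of a true win being scored as 0
probs <- c(p_B - p_0B, p_0A + p_0B, p_A - p_0A)  # P(-1), P(0), P(+1)
rmultinom(1, size = n, prob = probs)             # counts of -1, 0, +1 outcomes
```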
This feeling of doing the "wrong" thing may stem from the fact that we don't really want to test the hypothesis $p_A-p_B = 0$, but rather the hypothesis "$p_A-p_B$ is small".
Null hypothesis
As a first step, we are interested in testing the hypothesis $p_A=p_B=1/2$, or equivalently, $p_A-p_B = 0$.
Testing $p_A=p_B$
Indeed, to maximize the power of this test, you should condition on the number of zeros and thus ignore them in your analysis, since they carry no information about $p_A-p_B$. If $p_0$ is quite large but so is $n$, then cases where there are only zeros are negligibly rare, and you virtually never run into the problem of being left without data.
It still does "feel wrong" to only consider the "-1"s and "+1"s; the example you gave in the comment makes sense:
If I have
(5,-1s), (10000,0s), and (20,1s) I don't care that I get 4x as many 1s than -1s...because the vast majority of the time it makes no difference.
If treatments A and B are equally effective, then roughly half of the 10000 zeros would hide a true win for each treatment, so you would perhaps have 5005 cases where B was better and 5020 cases where A was better, i.e. a difference of $p_A-p_B\approx 0.0015$, indeed almost negligible. It is not very satisfying to get a $p$-value close to zero for that (in R: binom.test(5, 25)).
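For reference, this is the conditional sign test as a complete call:

```r
# Exact sign test on the nonzero outcomes: 5 "-1"s among 25 nonzeros
binom.test(5, 25)   # two-sided p-value of about 0.004
```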
"Small difference hypothesis" $p_A - p_B <\delta_0$
Since only $p_{-1}$, $p_{0}$ and $p_{+1}$ are estimable from the data, we need extra assumptions about the probabilities of being given a zero score in order to retrieve $\delta := p_A-p_B$.
- The simplest option is $p_{0A} = p_{0B}=p_0/2$. Then $\delta = p_{+1}-p_{-1}$.
- Another option is $p_{0A} = p_{A}\cdot p_{0}$, $p_{0B} = p_{B}\cdot p_{0}$. Then $\delta = (p_{+1}-p_{-1})/(1-p_0)$.
Special underlying models (before discretization) may require more complicated assumptions.
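As a small illustration of how much the assumption matters, here are the resulting point estimates of $\delta$ for the example data:

```r
# Point estimates of delta under the two identifying assumptions,
# using the example data: 5 "-1"s, 10000 "0"s, 20 "+1"s
a <- 20; b <- 5; z <- 10000
n <- a + b + z
p_plus <- a / n; p_minus <- b / n; p_0 <- z / n
p_plus - p_minus                 # option 1 (p_0A = p_0B): about 0.0015
(p_plus - p_minus) / (1 - p_0)   # option 2 (p_0A = p_A * p_0): 0.6
```

Note how strongly the estimate depends on what we assume about the zero scores: under the second assumption the nonzero outcomes are a representative subsample, and the estimated difference is large.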
Approximate test for $H_0: \delta = p_A - p_B <\delta_0$, given $p_{0A}=p_{0B}$
For large $n$, we may use a normal approximation to test hypotheses about the probabilities; this should be OK when the expected numbers of "+1"s and "-1"s are at least around five. In the following, we assume $p_{0A} = p_{0B}$, which seems reasonable for cases where the original distributions of the treatment effects are supposed to be "almost equal".
Let $a$ denote the number of "+1" and $b$ denote the number of "-1" outcomes. A suitable test statistic is the estimator for $\delta$ given by $d = (a-b)/n$.
Since the covariance between $a$ and $b$ is $-np_{-1}p_{+1}$, the variance of $a-b$ is $n\left(p_{+1}(1-p_{+1})+p_{-1}(1-p_{-1})+2p_{-1}p_{+1}\right)=n\left(p_{+1}+p_{-1}-(p_{+1}-p_{-1})^2\right)$.
Plugging in the estimates $\hat p_{+1}=a/n$ and $\hat p_{-1}=b/n$ turns this into $a+b-(a-b)^2/n$. Using a normal approximation for the number of outcomes, we obtain
$$
\text{pval} = 1 - \Phi\left(\frac{a - b - n\delta_0}{\sqrt{a+b-(a-b)^2/n}}\right).
$$
For sufficiently large $p_0$, we can treat the outcomes for "-1" and "+1" as (almost) independent.
Disclaimer: The test is approximate and assumption-based.
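A minimal R sketch of this test under the stated assumptions (the helper name delta_test is made up):

```r
# Approximate one-sided test of H0: delta <= delta0, assuming p_0A = p_0B.
# a = number of "+1"s, b = number of "-1"s, n = total number of experiments.
delta_test <- function(a, b, n, delta0 = 0) {
  z <- (a - b - n * delta0) / sqrt(a + b - (a - b)^2 / n)
  1 - pnorm(z)   # approximate p-value
}

delta_test(20, 5, 10025, delta0 = 0)     # about 0.001: rejects delta = 0
delta_test(20, 5, 10025, delta0 = 0.01)  # close to 1: no evidence delta > 0.01
```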
Remarks
The normal approximation also yields confidence intervals, which may be more appropriate than a mere $p$-value in the present context. You can get an approximate confidence interval for the difference in proportions with the R function prop.test. Again assuming that the "+1"s and the "-1"s can be treated as independent, for the example (5,-1s), (10000,0s), and (20,1s) you would call
```r
prop.test(c(5, 20), c(10025, 10025))
```
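The conf.int component of the returned object holds the approximate confidence interval for the difference of the two proportions.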
This could be a case for Bayesian analysis, giving soft bounds :-)