I'm trying to use scipy to fit a $\tanh$ function to some data. The data is of the form $(x_i, y_i)$ for $i=1,\cdots,N$, where $0\leq y_i \leq 1$. I choose $x_i$ to be linearly spaced, such that $x_1=0$ and $x_N=1$. Each $y_i$ is obtained by repeating an experiment $M$ times and checking whether an event happened or not ($0$ -> did not happen, $1$ -> did happen). The $y$ values are thus point estimates of the probability $p$ of a binomial variable: $y_i=\frac{1}{M}\sum\limits_{j=1}^M X_j$. The standard deviation of $y_i$ is thus $s_i=\sqrt{y_i(1-y_i)/M}$. This is how I end up with data that is spread like $0.5\tanh(a(x-b))+0.5$, which is what I'm trying to fit.
There is more spread in the data where the values are near $0.5$, but many data points at $0$ and $1$ have a standard deviation of $s_i=0$ (that is, $X_j=0$ or $X_j=1$ for all measurement repeats $j$).
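To make the setup concrete, here is a minimal sketch that simulates data of this kind and computes $y_i$ and $s_i$ as defined above. The values `a=10`, `b=0.5`, `n_points=21`, and the seed are illustrative assumptions, not the real experiment; only $M=20$ matches the description.

```python
import numpy as np

def simulate(a=10.0, b=0.5, n_points=21, n_repeats=20, seed=0):
    """Simulate binomial proportions along a tanh-shaped probability curve.

    All default values are assumptions chosen for illustration only.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_points)            # linearly spaced, 0 to 1
    p_true = 0.5 * np.tanh(a * (x - b)) + 0.5      # underlying probability
    successes = rng.binomial(n_repeats, p_true)    # events observed in M repeats
    y = successes / n_repeats                      # point estimate of p
    s = np.sqrt(y * (1.0 - y) / n_repeats)         # plug-in standard deviation
    return x, y, s

x, y, s = simulate()
# Wherever all repeats agree (y is exactly 0 or 1), the plug-in s is exactly 0.
n_zero_sigma = int(np.sum(s == 0))
```

The plug-in formula returns exactly zero whenever every repeat gives the same outcome, which is where the trouble with weighted fitting comes from.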
I tried the method described in this answer. For that particular code, if I set one of the standard deviations to be 0, i.e.:
y_spread[3] = 0
then I get a runtime warning:
RuntimeWarning: divide by zero encountered in divide
This makes sense to me, as $s_i=0$, and you can't divide by $0$. Now, the question is, what is the correct way to handle this, statistically? A quick and dirty way could be to set the error to be something small, like 1e-6, when it is 0. This does result in a fit, but am I angering the statistics gods?
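For reference, here is a sketch of two alternatives to the `1e-6` floor, under stated assumptions. A least-squares fit weights each point by $1/s_i^2$, so a floor of `1e-6` gives those points an enormous weight and effectively pins the curve to them. The first option below replaces the plug-in estimate with a smoothed proportion $(k_i+1)/(M+2)$ (add-one smoothing) when computing $s_i$, which is never zero. The second skips $s_i$ entirely and maximizes the binomial likelihood of the success counts $k_i$. The data is synthetic, and the names `model`, `neg_log_likelihood`, the starting values, and the choice of smoothing are my assumptions, not something taken from the linked answer.

```python
import numpy as np
from scipy.optimize import curve_fit, minimize

def model(x, a, b):
    """Sigmoid-shaped probability 0.5*tanh(a*(x-b)) + 0.5."""
    return 0.5 * np.tanh(a * (x - b)) + 0.5

# Synthetic stand-in for the measured data (assumed values, not the real data).
rng = np.random.default_rng(1)
M = 20                                        # repeats per x value
x = np.linspace(0.0, 1.0, 21)
k = rng.binomial(M, model(x, 10.0, 0.5))      # number of successes at each x_i
y = k / M

# Option 1: weighted least squares with a smoothed standard deviation.
# (k+1)/(M+2) is never exactly 0 or 1, so sigma stays strictly positive.
p_smooth = (k + 1.0) / (M + 2.0)
sigma = np.sqrt(p_smooth * (1.0 - p_smooth) / M)
popt_ls, pcov_ls = curve_fit(model, x, y, p0=[5.0, 0.4],
                             sigma=sigma, absolute_sigma=True)

# Option 2: maximize the binomial likelihood directly; no sigma is needed.
def neg_log_likelihood(params):
    a, b = params
    p = np.clip(model(x, a, b), 1e-12, 1.0 - 1e-12)   # keep log() finite
    return -np.sum(k * np.log(p) + (M - k) * np.log(1.0 - p))

result = minimize(neg_log_likelihood, x0=[5.0, 0.4], method="Nelder-Mead")
a_ml, b_ml = result.x
```

`absolute_sigma=True` tells `curve_fit` to treat `sigma` as actual standard deviations rather than relative weights, which matters for the reported covariance. The likelihood version uses the counts directly, so points with all zeros or all ones contribute finite, well-defined terms.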
EDIT: added more information as requested in the comments.
20 times, which gives me a bunch of 1s and 0s (because I'm checking whether an event happened or not), the average of which is $0\leq y_i \leq 1$. The standard deviation of $y_i$ is thus $s_i=\sqrt{y_i(1-y_i)/20}$. So I choose $x_i$ to be linearly spaced. I'm using the default in scipy.optimize.curve_fit, so Levenberg-Marquardt. – sodiumnitrate Mar 30 '23 at 19:58