Let's say you've arranged your data in vector form: $\mathbf{y}=\mathbf{X}\beta + \boldsymbol{\epsilon}$, where $\mathbf{y}$ and $\mathbf{X}$ are $n$-dimensional. (I'm assuming from your notation that $\beta$ is scalar, but the formulas below are identical in the multivariate case, with $\mathbf{X}$ an $n\times p$ design matrix and $\beta$ a $p$-vector.)
In ordinary least squares:
$\hat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \mathbf{y}$
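To make this concrete, here's a minimal numpy sketch (the simulated data and variable names are my own, just for illustration) that computes $\hat{\beta}$ from the normal equations, using a linear solve rather than forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: scalar beta, X arranged as an n x 1 design matrix
n, beta_true = 200, 2.5
X = rng.normal(size=(n, 1))
y = X[:, 0] * beta_true + rng.normal(size=n)

# OLS: beta_hat = (X^T X)^{-1} X^T y, via a linear solve
# (numerically preferable to computing the inverse)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # should be close to 2.5
```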
In weighted least squares:
$\hat{\beta}_W = (\mathbf{X}^T\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^T \mathbf{W}\mathbf{y}$,
where $\mathbf{W}=\operatorname{diag}(w_1,\dots,w_n)$ is an $n\times n$ diagonal matrix. If you view $\mathbf{W}$ as a means of favoring low-noise samples, you'd want it to suppress the high-variance ones. A good choice for the diagonal elements is then $w_i \propto 1/\operatorname{Var}(\epsilon_i)$. In the extreme case, a sample $j$ with infinite noise variance (i.e., $\operatorname{Var}(\epsilon_j)=\infty$) is ignored entirely, since its weight $w_j=0$.
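Here's the weighted version under the same simulated setup, assuming the per-sample noise variances are known and taking $w_i = 1/\operatorname{Var}(\epsilon_i)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Heteroscedastic data: each sample i has its own noise variance var_i
n, beta_true = 200, 2.5
X = rng.normal(size=(n, 1))
var = rng.uniform(0.1, 5.0, size=n)              # known per-sample noise variances
y = X[:, 0] * beta_true + rng.normal(scale=np.sqrt(var))

# Weights inversely proportional to the noise variance; a sample with
# Var(eps_j) -> infinity would get w_j -> 0 and be ignored
w = 1.0 / var
W = np.diag(w)

# WLS: beta_hat_W = (X^T W X)^{-1} X^T W y
beta_hat_w = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(beta_hat_w)  # typically closer to 2.5 than the unweighted estimate
```

(Forming the full $n\times n$ matrix `W` mirrors the formula above; in practice you'd scale the rows by `w` directly to avoid the dense diagonal matrix.)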