In all honesty, Quadratic Variation for Stochastic Processes is an advanced topic, and computing it rigorously from first principles is a graduate-level probability question.
Part 1: Quadratic Variation: Informal "proof"
First, how is quadratic variation defined? For a stochastic process $X_t$, the quadratic variation over $[0,t]$, denoted $\left<X\right>_t$, is defined as (loosely speaking; I provide the rigorous definition at the end):
$$\left<X\right>_t=\lim_{n \to \infty} \left(\sum_{i=1}^{n}(X_i-X_{i-1})^2\right)$$
where $X_i:=X_{t_i}$ for a partition $0=t_0<t_1<\dots<t_n=t$ of $[0,t]$ whose mesh size goes to zero.
So in words, the quadratic variation is the limit of the sum of squared increments as the mesh size gets finer and finer. The limit is in the probability sense (see the end of this post).
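As a quick illustration, here is a minimal Python sketch of the finite-mesh sum from the definition above, applied to a simulated Brownian path (the function name, seed, and grid parameters are my own illustrative choices, not anything standard):

```python
import numpy as np

def realized_quadratic_variation(x):
    """Sum of squared increments of a discretely observed path x_0, ..., x_n.

    This is the finite-mesh approximation to <X>_t from the definition above;
    the true quadratic variation is the limit as the sampling grid is refined.
    """
    increments = np.diff(np.asarray(x, dtype=float))
    return np.sum(increments ** 2)

# Illustration on a simulated Brownian path on [0, t] with a uniform grid:
rng = np.random.default_rng(0)
t, n = 1.0, 100_000
dW = rng.normal(0.0, np.sqrt(t / n), size=n)   # Brownian increments ~ N(0, t/n)
W = np.concatenate(([0.0], np.cumsum(dW)))     # W_0 = 0
print(realized_quadratic_variation(W))         # close to t = 1.0 for a fine mesh
```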
Now, we have:
$$(X_i-X_{i-1})^2=X_i^2-2X_iX_{i-1}+X_{i-1}^2$$
This expansion holds for any process, but it is more useful to work in differential notation. Suppose $X_t$ follows a process of the form $dX_t=\mu\,dt+\sigma\,dW_t$ (with $X_0:=0$, as is usual). Writing the increment $X_i-X_{i-1}$ as $dX_i=\mu\,di+\sigma\,dW_i$, where $di$ denotes the length of the $i$-th time step, and squaring, we have:
$$(dX_i)^2=\mu^2\,di^2+\sigma^2\,dW_i^2+2\mu\sigma\, di\, dW_i$$
Notice that as $n \to \infty$, $di \to 0$, so $di^2 \to 0$ even faster. Ignoring the $di^2$ term (and the cross term $di\,dW_i$, which is of order $di^{3/2}$), we can focus on the $dW_i^2$ term. Let's compute its expectation and variance:
By stationarity of Brownian increments, $dW_i$ has the same distribution as $W(di)$, and by the Brownian scaling property $W(di)\overset{d}{=}\sqrt{di}\,W(1)$, so:
$$\mathbb{E}[dW_i^2]=\mathbb{E}[W(di)^2]=\mathbb{E}\left[\left(\sqrt{di}\,W(1)\right)^2\right]=di\,\mathbb{E}[W(1)^2]=di$$
$$\operatorname{Var}\left(dW_i^2\right)=\mathbb{E}\left[\left(\sqrt{di}\,W(1)\right)^4\right]-\mathbb{E}\left[\left(\sqrt{di}\,W(1)\right)^2\right]^2=di^2\,\mathbb{E}[W(1)^4]-di^2=3\,di^2-di^2=2\,di^2$$
(using $\mathbb{E}[W(1)^4]=3$ for a standard normal).
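A quick Monte Carlo sanity check of these two moments (the step size, seed, and sample count below are my own illustrative choices):

```python
import numpy as np

# An increment over a small time step di is N(0, di) distributed, so dW^2
# should have mean di and variance 2*di^2, matching the formulas above.
rng = np.random.default_rng(1)
di = 1e-3
dW = rng.normal(0.0, np.sqrt(di), size=1_000_000)

print(np.mean(dW ** 2), di)           # E[dW^2]   ~ di
print(np.var(dW ** 2), 2 * di ** 2)   # Var(dW^2) ~ 2*di^2
```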
As $n \to \infty$, $di^2 \to 0$ faster than $di \to 0$. Summing over the $n$ increments (with a uniform mesh $di=t/n$), the expectations add up to $t$ while the variances add up to $2n\,di^2=2t^2/n$, which converges to zero. So the sum of squared increments concentrates around $t$, and that is basically what is meant when somebody writes $dW_t^2=dt$.
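Here is a small simulation of that concentration effect (seed, path counts and mesh sizes are again my own illustrative choices): as the mesh is refined, the realized sums stay centred on $t$ while their variance shrinks like $2t^2/n$.

```python
import numpy as np

# Sum of squared Brownian increments over [0, t] for finer and finer meshes.
rng = np.random.default_rng(2)
t, n_paths = 1.0, 1_000

for n in (10, 100, 1_000, 10_000):
    di = t / n
    dW = rng.normal(0.0, np.sqrt(di), size=(n_paths, n))
    qv = np.sum(dW ** 2, axis=1)                  # one realized sum per path
    print(n, qv.mean(), qv.var(), 2 * t**2 / n)   # mean ~ t, variance ~ 2*t^2/n -> 0
```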
Notice that the above is not mathematically rigorous; it is just "hand-waving" to get an intuitive understanding. The rigorous formulation is below:
Part 2: Quadratic Variation: "Rigorous" proof
Formally, for a Wiener process $W_t$, the quadratic variation over $[0,t]$ is $\left<W\right>_t=t$, in the sense that the sum of squared increments over a partition $0=t_0<t_1<\dots<t_n=t$ converges to $t$ in probability as the mesh size goes to zero:
$\forall \epsilon > 0$:
$$\lim_{n \to \infty} \mathbb{P}\left(\left|\sum_{i=1}^{n}\left(W_{t_i}-W_{t_{i-1}}\right)^2-t\right|>\epsilon\right)=0$$
In words: the probability that the sum of squared increments is within $\epsilon$ of $t$ goes to $1$, as the mesh size gets infinitely fine.
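One can also check this convergence-in-probability statement numerically. The sketch below (with my own choices of $\epsilon$, seed, and path counts) estimates $\mathbb{P}\left(\left|\sum_i (W_{t_i}-W_{t_{i-1}})^2-t\right|>\epsilon\right)$ for finer and finer meshes; the estimates should tend to $0$.

```python
import numpy as np

# Estimate P(|sum of squared increments - t| > eps) across many simulated paths.
rng = np.random.default_rng(3)
t, eps, n_paths = 1.0, 0.05, 1_000

for n in (10, 100, 1_000, 10_000):
    dW = rng.normal(0.0, np.sqrt(t / n), size=(n_paths, n))
    qv = np.sum(dW ** 2, axis=1)
    print(n, np.mean(np.abs(qv - t) > eps))   # -> 0 as the mesh is refined
```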
The proof is pretty technical; the best one I know is in a set of lecture notes, but it stretches over 5 pages (and the notes prove convergence almost surely, which is a stronger mode of convergence than "in probability", so it implies convergence in probability).