In GRU units, I don't understand the effective difference between the update and reset gates, $z_t$ and $r_t$ respectively.

\begin{align}
z_t &= \sigma_g(W_{z} x_t + U_{z} h_{t-1} + b_z) \\
r_t &= \sigma_g(W_{r} x_t + U_{r} h_{t-1} + b_r) \\
\hat{h}_t &= \phi_h(W_{h} x_t + U_{h} (r_t \odot h_{t-1}) + b_h) \\
h_t &= z_t \odot \hat{h}_t + (1-z_t) \odot h_{t-1}
\end{align}
In the final equation, $z_t$ selects how much information to preserve from the candidate $\hat{h}_t$ versus the old state $h_{t-1}$, but the amount of $h_{t-1}$ to retain has already been selected inside $\hat{h}_t$ by $r_t$. So why is a second selection of $h_{t-1}$ made with $1-z_t$?
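
For concreteness, the four equations can be written out as a single time step in NumPy. This is only a minimal sketch: it assumes $\sigma_g$ is the logistic sigmoid and $\phi_h$ is tanh (the usual choices, but not stated above), and the names `gru_step` and `init_params`, the layer sizes, and the random initialization are hypothetical, chosen just for illustration.

```python
import numpy as np

def sigmoid(a):
    # Logistic sigmoid, assumed here for sigma_g.
    return 1.0 / (1.0 + np.exp(-a))

def init_params(rng, n_in, n_hid):
    # Hypothetical helper: small random weights and zero biases for
    # (W_z, U_z, b_z, W_r, U_r, b_r, W_h, U_h, b_h).
    params = []
    for _ in range(3):
        params.append(rng.normal(scale=0.1, size=(n_hid, n_in)))   # W
        params.append(rng.normal(scale=0.1, size=(n_hid, n_hid)))  # U
        params.append(np.zeros(n_hid))                             # b
    return tuple(params)

def gru_step(x_t, h_prev, params):
    W_z, U_z, b_z, W_r, U_r, b_r, W_h, U_h, b_h = params
    # Update gate z_t and reset gate r_t, both computed from x_t and h_{t-1}.
    z_t = sigmoid(W_z @ x_t + U_z @ h_prev + b_z)
    r_t = sigmoid(W_r @ x_t + U_r @ h_prev + b_r)
    # Candidate state: r_t scales h_{t-1} before U_h and the nonlinearity
    # (tanh is assumed here for phi_h).
    h_hat = np.tanh(W_h @ x_t + U_h @ (r_t * h_prev) + b_h)
    # New state: elementwise interpolation between candidate and old state.
    h_t = z_t * h_hat + (1.0 - z_t) * h_prev
    return h_t, z_t, r_t, h_hat

# Example usage with arbitrary sizes.
rng = np.random.default_rng(0)
params = init_params(rng, n_in=3, n_hid=4)
x_t = rng.normal(size=3)
h_prev = rng.normal(size=4)
h_t, z_t, r_t, h_hat = gru_step(x_t, h_prev, params)
```

Written this way, the two places where $h_{t-1}$ is gated are easy to see: $r_t$ acts on $h_{t-1}$ inside the nonlinearity, before the multiplication by $U_h$, while $1-z_t$ acts on the untransformed $h_{t-1}$ outside of it.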