After going over this post about reward shaping, I still find it difficult to define a shaped reward w.r.t. my specific problem. Suppose that I measure some state $s=\phi$, which is some angle, and use actions $a=\tau$ that are motor torques. For example, given the state $\phi_0$, I would like to apply a torque $\tau_0$ that brings my system to a new angle state $\phi_1$. There are two aspects that are important:
- I aim to bound $\phi$ to some range $[\phi_{min},\phi_{max}]$
- Large changes between two consecutive actions should be avoided, i.e. $||a_t-a_{t-1}||<\epsilon$ for all $t$ and some small $\epsilon>0$.
Given those, a naive reward-shaping scheme would be (a short code sketch follows the list):
- if $\phi\notin[\phi_{min},\phi_{max}]:$ set $r\leftarrow r-\phi$
- if $||a_t-a_{t-1}||\geq\epsilon:$ set $r\leftarrow r-||a_t-a_{t-1}||$
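For concreteness, here is a minimal sketch of this naive scheme in Python; the bounds `PHI_MIN`/`PHI_MAX` and the threshold `EPS` are made-up placeholder values, not part of the original problem:

```python
import numpy as np

# Hypothetical bounds and threshold; the real values depend on the system.
PHI_MIN, PHI_MAX = -0.5, 0.5   # allowed angle range [rad]
EPS = 0.05                     # allowed change between consecutive torques

def shaped_reward(r, phi, a_t, a_prev):
    """Apply the two naive penalties above to the base reward r."""
    # Penalty 1: angle left the allowed range (subtract phi itself, as stated;
    # this implicitly assumes phi >= 0, otherwise the penalty becomes a bonus).
    if not (PHI_MIN <= phi <= PHI_MAX):
        r -= phi
    # Penalty 2: consecutive actions differ too much.
    da = np.linalg.norm(np.atleast_1d(a_t) - np.atleast_1d(a_prev))
    if da >= EPS:
        r -= da
    return r
```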
The problem is that, first, $\phi$ and $||a_t-a_{t-1}||$ may very well be on different scales. If, for example, $\phi\gg||a_t-a_{t-1}||$, the second penalty would barely affect the reward, and vice versa. Furthermore, from a more "physical" point of view, $\phi$ is measured in radians, yet $||a_t-a_{t-1}||$ is not in radians at all (it is a torque difference). Every physicist will tell you that adding quantities with different units means something is wrong!
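To see the scale issue with some made-up numbers: suppose at some step $\phi = 3\ \mathrm{rad}$ (out of range) and $||a_t-a_{t-1}|| = 0.02\ \mathrm{N\,m}$. The shaped reward becomes $r \leftarrow r - 3 - 0.02$, so the torque-smoothness penalty is practically invisible next to the angle penalty.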
Therefore, my question is: how can my reward depend on drastically different phenomena that do not relate to each other?