I'm confused after reviewing several Q&As on this topic.
In "how to make a reward function in reinforcement learning", the answer states: "For the case of a continuous state space, if you want an agent to learn easily, the reward function should be continuous and differentiable."
Meanwhile, in "Is reward function needed to be continuous in deep reinforcement learning", the answer clearly states: "No, there is no requirement for reward to be drawn from any continuous function. That is because the value of Rt is produced by the environment, independently of the parameters θ that the policy gradient is with respect to."
As also discussed in many other papers, blog posts, etc., the choice of reward function plays a big role in convergence. But I'm not sure which of the above statements is more accurate, or whether both are correct but talking about different aspects.
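To make the question concrete, here is a minimal sketch (the 1-D setup and function names are my own illustration, not from either answer) of the two kinds of reward the answers seem to be contrasting: a discontinuous sparse reward and a smooth shaped one for the same goal-reaching task.

```python
GOAL = 10.0  # hypothetical goal position on a 1-D line

def sparse_reward(state: float) -> float:
    # Discontinuous: the environment only signals success near the goal.
    return 1.0 if abs(state - GOAL) < 0.5 else 0.0

def shaped_reward(state: float) -> float:
    # Continuous and differentiable: negative squared distance to the goal.
    return -(state - GOAL) ** 2

for s in (0.0, 9.8, 10.0):
    print(s, sparse_reward(s), shaped_reward(s))
```

If I understand the second answer correctly, both are valid reward functions, because the policy gradient is taken with respect to the policy parameters θ, not the reward itself; the first answer seems to be about which one makes learning *easier* in practice.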