
I am currently working on a line follower buggy and have managed to tune the PID constants manually. The buggy follows the line at a moderate speed.

I would now like to take things further and learn something new. I have read about Q-learning and would like to ask whether what I am about to implement is on the right track.

I have chosen:

  • Three states: the last three positions of the line sensors
  • Three rewards: middle position, end of track, and less wobbling (measured with a gyroscope).
  • Four actions: $K_p$, $K_i$, $K_d$, and max speed.

The computation will be done on a PC, since the robot is connected wirelessly.

  • Am I on the right track?
  • How do I give the three constants "states"? As I understand it, the actions have to be discrete (non-analog).
    • Do I create a range of numbers close to the constants I have now and let Q-learning decide which is best (see the sketch after this list)? Just trying random numbers would be inefficient.
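To make the discretization idea concrete: one common approach is exactly what you describe, building a small grid of candidate values around the manually tuned constants and letting Q-learning pick among the combinations. Here is a minimal Python sketch; the baseline gains `KP0`, `KI0`, `KD0` and the multiplier grid are hypothetical placeholders, substitute your own tuned values:

```python
import itertools

# Hypothetical baseline gains from manual tuning -- substitute your own values.
KP0, KI0, KD0 = 2.0, 0.1, 0.5

# A few multipliers around each tuned value; every gain is restricted to this grid.
SCALES = (0.5, 0.75, 1.0, 1.25, 1.5)

# Every (Kp, Ki, Kd) combination becomes one discrete action: 5**3 = 125 actions.
ACTIONS = [(KP0 * a, KI0 * b, KD0 * c)
           for a, b, c in itertools.product(SCALES, repeat=3)]

def gains_for(action_index):
    """Map a discrete action index chosen by Q-learning back to PID gains."""
    return ACTIONS[action_index]
```

Each scaled combination is one discrete action, so five multipliers per gain gives a manageable 125-entry action set; max speed could be added as a fourth dimension in the same way, at the cost of multiplying the action count.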
Marco

1 Answer


I am currently working on a very similar project; the only difference is that I am using a simulation package (MATLAB SimMechanics) in which I have modeled a mobile robot with two actuated wheels and a castor wheel. I have four sensors, so I am not using the "middle position" as a reward, but I could easily modify that.

My model takes parameters such as friction, backlash, and motor constants into account, so it should be fairly straightforward to transfer it to a real-life setup just like yours. Since my line follower is a computer simulation, I can try random values of $K_p$, $K_i$, and $K_d$, which is certainly an advantage.

I therefore suggest you start with a simulation and then use the resulting $K_p$, $K_i$, $K_d$, and max motor speed (PWM) values on your physical robot.
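Once the states and actions are discrete, the tabular Q-learning loop itself is small. A minimal Python sketch, assuming a hashable discretized state (e.g. a tuple of the last three line positions) and the 125-action gain grid from the question; the simulator, or the real buggy, supplies `reward` and `next_state` after each trial:

```python
import random
from collections import defaultdict

ALPHA = 0.1      # learning rate
GAMMA = 0.9      # discount factor
EPSILON = 0.2    # exploration rate for epsilon-greedy action selection
N_ACTIONS = 125  # e.g. the discretized gain grid from the question

# Q-table: discretized sensor state -> one estimated value per discrete action.
Q = defaultdict(lambda: [0.0] * N_ACTIONS)

def choose_action(state):
    """Epsilon-greedy choice over the discrete action set."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)  # explore: random gain combination
    values = Q[state]
    return values.index(max(values))        # exploit: best known combination

def update(state, action, reward, next_state):
    """Standard tabular Q-learning update rule."""
    best_next = max(Q[next_state])
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
```

In a simulation you can run thousands of episodes of choose-act-update cheaply, which is exactly why tuning there first and only transferring the final gains to the hardware is attractive.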

csg