
I am currently working on a line follower buggy and have managed to tune the PID constants manually. The buggy follows the line at a moderate speed.

I would now like to take things further and learn something new. I have read about Q-learning and would like to ask whether what I am about to implement is on the right track.

I have chosen:

  • Three states: the last three positions of the line sensors
  • Three rewards: middle position, end of track, and less wobbling (measured with a gyroscope).
  • Four actions: $K_p$, $K_i$, $K_d$, and max speed.

The computation will be done on a PC, since the robot is connected wirelessly.

  • Am I on the right track?
  • How do I give the three constants "states"? As I understand it, the actions have to be discrete (non-analog).
    • Do I create a range of numbers close to the constants I have now and let Q-learning decide which is best (see the sketch after this list)? Just trying random numbers would be inefficient.
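To make the discretization idea concrete: one common approach is exactly what you describe, building a small grid of candidate values around the manually tuned constants and letting Q-learning pick among the combinations. Here is a minimal Python sketch; the baseline gains `KP0`, `KI0`, `KD0` and the multiplier grid are hypothetical placeholders, substitute your own tuned values:

```python
import itertools

# Hypothetical baseline gains from manual tuning -- substitute your own values.
KP0, KI0, KD0 = 2.0, 0.1, 0.5

# A few multipliers around each tuned value; every gain is restricted to this grid.
SCALES = (0.5, 0.75, 1.0, 1.25, 1.5)

# Every (Kp, Ki, Kd) combination becomes one discrete action: 5**3 = 125 actions.
ACTIONS = [(KP0 * a, KI0 * b, KD0 * c)
           for a, b, c in itertools.product(SCALES, repeat=3)]

def gains_for(action_index):
    """Map a discrete action index chosen by Q-learning back to PID gains."""
    return ACTIONS[action_index]
```

Each scaled combination is one discrete action, so five multipliers per gain gives a manageable 125-entry action set; max speed could be added as a fourth dimension in the same way, at the cost of multiplying the action count.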
Marco

1 Answer


I am currently working on a very similar project; the only difference is that I am using a simulation package (MATLAB SimMechanics) in which I have modeled a mobile robot with two actuated wheels and a castor wheel. I have four sensors, so I am not using the "middle position" as a reward, but I could easily modify that.

My model takes parameters such as friction, backlash, and motor constants into account, so it should be fairly straightforward to transfer it to a real-life setup just like yours. Since my line follower is a computer simulation, I can try random values of $K_p$, $K_i$, and $K_d$, which is certainly an advantage.

I therefore suggest you start with a simulation and then use the resulting $K_p$, $K_i$, $K_d$, and max motor speed (PWM) values on your physical robot.
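Once the states and actions are discrete, the tabular Q-learning loop itself is small. A minimal Python sketch, assuming a hashable discretized state (e.g. a tuple of the last three line positions) and the 125-action gain grid from the question; the simulator, or the real buggy, supplies `reward` and `next_state` after each trial:

```python
import random
from collections import defaultdict

ALPHA = 0.1      # learning rate
GAMMA = 0.9      # discount factor
EPSILON = 0.2    # exploration rate for epsilon-greedy action selection
N_ACTIONS = 125  # e.g. the discretized gain grid from the question

# Q-table: discretized sensor state -> one estimated value per discrete action.
Q = defaultdict(lambda: [0.0] * N_ACTIONS)

def choose_action(state):
    """Epsilon-greedy choice over the discrete action set."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)  # explore: random gain combination
    values = Q[state]
    return values.index(max(values))        # exploit: best known combination

def update(state, action, reward, next_state):
    """Standard tabular Q-learning update rule."""
    best_next = max(Q[next_state])
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
```

In a simulation you can run thousands of episodes of choose-act-update cheaply, which is exactly why tuning there first and only transferring the final gains to the hardware is attractive.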

csg