2

Sorry if this is a basic question given the amount of content in this field, but there are tons of texts and videos explaining what reinforcement learning is, and most of them just repeat what the first group did. Unfortunately, some of them even copy scripts that, I can assure you, they do not understand cell by cell. My problem is an optimization one: opening and closing a gate under certain conditions so that it operates in the best possible way. I think my agent is the one who decides to open and close the gate. My environment is everything around that gate in which my agent exists. The state is then the situation after opening the gate, and rewards are defined based on that.

NOW my question is: "Do I need a dataset of the gate opening and closing in order to train the agent?" Or in reinforcement learning is the whole process done using random numbers? In other words, what data is used in RL to train the agent? Given that all the examples and videos are based on prepared video games available in OpenAI Gym, do you think everything for my case should be written from scratch? If I have made mistakes in my use of the field's terminology, please correct me.

Many thanks

john22
  • 127

2 Answers

3

For reinforcement learning you do not need a dataset, as the agent learns how to act through repeated trial-and-error interaction with the environment. What you do need is the ability to simulate the whole environment from any state.
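To make the trial-and-error idea concrete, here is a minimal sketch of tabular Q-learning on a toy "gate" environment written from scratch. All names, the two-state design, and the reward scheme (+1 for having the gate in the right position when a vehicle is or is not waiting, -1 otherwise) are illustrative assumptions, not a proposal for your actual system; the point is that the agent generates its own training data by interacting, with no dataset in advance.

```python
import random

# Hypothetical toy gate environment (illustrative, not your real system).
# State: 0 = no vehicle waiting, 1 = vehicle waiting.
# Action: 0 = close the gate, 1 = open the gate.
class GateEnv:
    def reset(self):
        self.vehicle_waiting = random.random() < 0.5
        return int(self.vehicle_waiting)

    def step(self, action):
        # Assumed reward: +1 if the gate matches the situation
        # (open when a vehicle waits, closed otherwise), else -1.
        reward = 1.0 if action == int(self.vehicle_waiting) else -1.0
        self.vehicle_waiting = random.random() < 0.5  # next arrival
        return int(self.vehicle_waiting), reward

# Tabular Q-learning: data comes from interaction, not from a dataset.
random.seed(0)
env = GateEnv()
q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
alpha, gamma, epsilon = 0.1, 0.9, 0.1

state = env.reset()
for _ in range(5000):
    if random.random() < epsilon:                       # explore
        action = random.choice((0, 1))
    else:                                               # exploit
        action = max((0, 1), key=lambda a: q[(state, a)])
    next_state, reward = env.step(action)
    best_next = max(q[(next_state, a)] for a in (0, 1))
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    state = next_state

# The learned greedy policy: open iff a vehicle is waiting.
policy = {s: max((0, 1), key=lambda a: q[(s, a)]) for s in (0, 1)}
print(policy)  # {0: 0, 1: 1}
```

The key design point: `step()` is the simulator, and the 5000 interaction steps replace any pre-collected dataset.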

Imitation learning, however, learns by trying to replicate what someone else (e.g. an expert) has done, so in that case a dataset must exist. The agent does not interact with the environment; instead it tries to learn to act like someone else (on the portion of the state space that was explored).
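For contrast, a minimal behavioral-cloning sketch (the simplest form of imitation learning) looks like supervised learning over recorded expert decisions. The demonstration data and the trivial majority-vote "model" below are illustrative assumptions; note there is no environment interaction anywhere.

```python
from collections import Counter, defaultdict

# Hypothetical expert demonstrations: (state, action) pairs recorded in advance.
# state 1 = vehicle waiting, 0 = none; action 1 = open, 0 = close.
demonstrations = [(1, 1), (1, 1), (0, 0), (0, 0), (1, 1), (0, 0), (1, 0)]

# Simplest possible imitation "model": majority expert action per state.
counts = defaultdict(Counter)
for state, action in demonstrations:
    counts[state][action] += 1
policy = {s: c.most_common(1)[0][0] for s, c in counts.items()}
print(policy)  # {1: 1, 0: 0}
```

The learned policy can only be as good (and as broad) as the demonstrations: states the expert never visited are simply absent from `policy`.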

user2974951
  • 7,813
  • 1
    off-policy reinforcement learning, in my opinion, is an RL algorithm with a dataset – Alberto Jan 17 '23 at 14:22
  • 1
    @AlbertoSinigaglia That seems to be the case yes, I was not aware of this. – user2974951 Jan 17 '23 at 16:31
  • I think you are mixing off policy with offline RL. Off policy RL algorithms (like DQN) don't always require a dataset. Have a look here - https://stats.stackexchange.com/questions/184657/what-is-the-difference-between-off-policy-and-on-policy-learning – desert_ranger Jan 17 '23 at 22:06
  • @desert_ranger: No, you are correct, off-policy learning does not necessarily need a dataset, since the state-value (or value) function can be learned for a policy other than the one the agent is executing. BUT off-policy learning can be used to learn from a dataset, since a dataset can be regarded as the result of some given policy, and learning the optimal policy from it can perfectly well be done by off-policy learning. – pythonic833 Jan 30 '23 at 14:56
  • Edit to my previous comment: the actions recorded in the dataset of course need to satisfy the usual conditions for off-policy learning – pythonic833 Jan 30 '23 at 15:05
3

The answer depends on the type of reinforcement learning algorithm you want to use. If you use an online RL algorithm like DQN, you need a simulator that presents you with new data from the environment (state, action, reward, new state) at every time step. You can quite easily wrap your simulator in the Gym API. The advantage of doing so is that it lets you reuse various RL algorithms and libraries, since all of them use the Gym API to interact with the environment.
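As a sketch of what "wrapping your simulator in the Gym API" means: the classic Gym interface is `reset() -> observation` and `step(action) -> (observation, reward, done, info)`. The class below follows that shape without importing `gym` (to stay self-contained); in practice you would subclass `gym.Env` and declare `action_space` / `observation_space` so off-the-shelf libraries can drive it. The gate dynamics and reward here are illustrative placeholders.

```python
import random

class GateEnv:
    """Gym-style gate environment sketch (illustrative dynamics)."""

    def __init__(self, episode_length=20):
        self.episode_length = episode_length

    def reset(self):
        self.t = 0
        self.vehicle_waiting = random.random() < 0.5
        return int(self.vehicle_waiting)  # observation

    def step(self, action):
        # Placeholder reward: gate position should match vehicle presence.
        reward = 1.0 if action == int(self.vehicle_waiting) else -1.0
        self.t += 1
        self.vehicle_waiting = random.random() < 0.5
        obs = int(self.vehicle_waiting)
        done = self.t >= self.episode_length
        return obs, reward, done, {}  # info dict left empty

# Any Gym-style training loop can now interact with it:
env = GateEnv()
obs, done, total = env.reset(), False, 0.0
while not done:
    action = random.choice((0, 1))           # placeholder random policy
    obs, reward, done, _ = env.step(action)
    total += reward
print("episode return:", total)
```

Once the interface matches, swapping the random policy for DQN from a standard library is mostly a matter of plugging the environment in.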

If you are working with offline RL, you need a complete dataset in advance, similar to the ones used in supervised learning.
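Concretely, an offline RL dataset is a fixed log of transition tuples `(state, action, reward, next_state)`, e.g. recorded from the gate's historical operation. The tiny example below (with made-up values) only shows the data layout and access pattern an offline learner works with, not a full offline RL algorithm.

```python
from collections import defaultdict

# Illustrative offline dataset: (state, action, reward, next_state) tuples
# collected in advance; the learner never queries the environment.
dataset = [
    (1, 1,  1.0, 0),
    (0, 0,  1.0, 1),
    (1, 0, -1.0, 1),
    (0, 0,  1.0, 0),
]

# A simple pass over the log: average reward per state-action pair
# (a sanity check, not a policy-learning method).
totals, counts = defaultdict(float), defaultdict(int)
for s, a, r, s_next in dataset:
    totals[(s, a)] += r
    counts[(s, a)] += 1
avg_reward = {sa: totals[sa] / counts[sa] for sa in totals}
print(avg_reward)  # {(1, 1): 1.0, (0, 0): 1.0, (1, 0): -1.0}
```

A real offline algorithm (e.g. batch Q-learning or CQL) would iterate over exactly this kind of log instead of calling a simulator.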

Finally, I doubt your agent will learn anything if your dataset is purely random, although that again depends on your problem's complexity and the type of RL algorithm you implement.