Most Popular

1500 questions
6 votes, 1 answer

What is the probability of selecting the greedy action in a 0.5-greedy selection method for the 2-armed bandit problem?

I'm new to reinforcement learning and I'm going through Sutton and Barto. Exercise 2.1 states the following: In $\varepsilon$-greedy action selection, for the case of two actions and $\varepsilon=0.5$, what is the probability that the greedy action…
Daviiid • 575 • 5 • 15
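
For orientation (a minimal sketch, not part of the question): under the usual Sutton-and-Barto convention that the exploratory draw is uniform over all actions, including the greedy one, the greedy action is selected with probability $1-\varepsilon+\varepsilon/2 = 0.75$ for two actions and $\varepsilon=0.5$. The helper below is illustrative only.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise take the greedy action."""
    if rng.random() < epsilon:
        # Exploratory step: uniform over ALL actions, the greedy one included.
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

# Empirical check for the 2-armed case with epsilon = 0.5:
# P(greedy) = (1 - eps) + eps / 2 = 0.75 under this convention.
rng = np.random.default_rng(0)
q = np.array([1.0, 0.0])              # action 0 is the greedy action
picks = [epsilon_greedy(q, 0.5, rng) for _ in range(100_000)]
print(np.mean(np.array(picks) == 0))  # ~0.75
```
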
6 votes, 1 answer

What is the effect of parallel environments in reinforcement learning?

Do parallel environments improve the agent's ability to learn, or do they not really make a difference? Specifically, I am using PPO, but I think this applies across the board to other algorithms too.
Dylan Kerler • 273 • 2 • 9
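
A rough sketch of what "parallel environments" usually means in PPO-style training (the ToyEnv class below is a hypothetical stand-in, not from the question): N independent copies of the environment are stepped in lockstep, so each policy update sees a batch of decorrelated transitions instead of a single trajectory.

```python
import numpy as np

class ToyEnv:
    """Hypothetical stand-in for a real environment."""
    def reset(self):
        self.t = 0
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        self.t += 1
        obs = np.random.randn(4).astype(np.float32)
        reward = float(action)          # dummy reward
        done = self.t >= 100
        return obs, reward, done

envs = [ToyEnv() for _ in range(8)]                   # 8 parallel copies
obs = np.stack([env.reset() for env in envs])         # batched obs, shape (8, 4)

for _ in range(16):                                   # one short rollout
    actions = np.random.randint(0, 2, size=len(envs))     # policy(obs) in real code
    steps = [env.step(a) for env, a in zip(envs, actions)]
    obs = np.stack([o for o, _, _ in steps])
    rewards = np.array([r for _, r, _ in steps])
    for i, (_, _, done) in enumerate(steps):
        if done:                                      # reset only the finished copies
            obs[i] = envs[i].reset()
```
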
6 votes, 3 answers

What exactly are partially observable environments?

I have trouble understanding the meaning of partially observable environments. Here's my doubt. According to what I understand, the state of the environment is what precisely determines the next state and reward for any particular action taken. So,…
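
A toy illustration of the distinction being asked about (the NoisyCorridor class is invented for this sketch): the hidden state fully determines the next state and reward, but the observation handed to the agent does not, because several distinct states produce the same observation.

```python
class NoisyCorridor:
    """Toy partially observable environment (hypothetical example).

    True state: the agent's integer position in a corridor of length 5.
    Observation: only whether the agent is next to a wall, so positions
    1, 2 and 3 are indistinguishable from the agent's point of view."""
    def __init__(self):
        self.pos = 2                      # hidden true state

    def observe(self):
        return int(self.pos in (0, 4))    # 1 near a wall, 0 otherwise

    def step(self, action):               # action: -1 (left) or +1 (right)
        self.pos = max(0, min(4, self.pos + action))
        reward = 1.0 if self.pos == 4 else 0.0
        return self.observe(), reward

env = NoisyCorridor()
print(env.observe())   # 0: the agent cannot tell which interior cell it is in
print(env.step(+1))    # dynamics depend on the hidden state, not the observation
```
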
6 votes, 1 answer

Reward interpolation between MDPs. Will an optimal policy on both ends stay optimal inside the interval?

Say I've got two Markov Decision Processes (MDPs): $$\mathcal{M_0} = (\mathcal{S}, \mathcal{A}, P, R_0),\quad\text{and}\quad\mathcal{M}_1 = (\mathcal{S}, \mathcal{A}, P, R_1)$$ Both have the same set of states and actions, and the transition…
Kostya • 2,515 • 10 • 24
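
One identity worth keeping in mind here (assuming the interpolation meant is $R_\lambda = (1-\lambda)R_0 + \lambda R_1$, since the excerpt is cut off): for any fixed policy $\pi$, the value function is linear in $\lambda$ because expectation is linear,
$$V^{\pi}_{\lambda}(s) = \mathbb{E}_{\pi}\!\left[\sum_{t \ge 0} \gamma^{t}\big((1-\lambda)R_0 + \lambda R_1\big)(S_t, A_t) \,\middle|\, S_0 = s\right] = (1-\lambda)\,V^{\pi}_{0}(s) + \lambda\,V^{\pi}_{1}(s).$$
Whether this linearity is enough to keep a policy that is optimal at both endpoints optimal across the whole interval is exactly what the question asks.
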
6 votes, 1 answer

If $\gamma \in (0,1)$, what is the on-policy state distribution for episodic tasks?

In Reinforcement Learning: An Introduction, section 9.2 (page 199), Sutton and Barto describe the on-policy distribution in episodic tasks, with $\gamma =1$, as being \begin{equation} \mu(s) = \frac{\eta(s)}{\sum_{k \in S}…
Felipe Costa • 103 • 5
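
For context, a sketch of one common reading rather than a quote from the book: Sutton and Barto remark that discounting can be treated as a form of soft termination, which amounts to inserting a factor of $\gamma$ into the recursion for the expected number of visits $\eta(s)$, with $h(s)$ the probability that an episode starts in $s$:
$$\eta(s) = h(s) + \gamma \sum_{\bar{s}} \eta(\bar{s}) \sum_{a} \pi(a \mid \bar{s})\, p(s \mid \bar{s}, a), \qquad \mu(s) = \frac{\eta(s)}{\sum_{s'} \eta(s')}.$$
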
6 votes, 0 answers

Are generative models actually used in practice for industrial drug design?

I just finished reading the paper MoFlow: An Invertible Flow Model for Generating Molecular Graphs. The paper, which is about generating molecular graphs with certain chemical properties, improved the SOTA at the time of writing by a bit and used a…
Adriaan • 61 • 2
6 votes, 2 answers

What does "semantic gap" mean?

I was reading DT-LET: Deep transfer learning by exploring where to transfer, and it contains the following: It should be noted direct use of labeled source domain data on a new scene of target domain would result in poor performance due to the…
Kais Hasan • 361 • 2 • 12
6 votes, 1 answer

How does AlphaZero's move encoding work?

I am a beginner in AI. I'm trying to train a multi-agent RL algorithm to play chess. One issue that I ran into was representing the action space (legal moves, or honestly just moves in general) numerically. I looked up how AlphaZero represented it,…
Akshay Ghosh • 105 • 5
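
For readers new to the encoding: the AlphaZero paper represents chess moves as an $8\times 8\times 73$ tensor (4672 entries), with 56 queen-like planes, 8 knight planes and 9 underpromotion planes per source square. The sketch below flattens that layout into a single action index; the exact plane ordering is an assumption made for illustration, since the paper does not pin one down.

```python
# 73 planes per "from" square:
#   56 queen-like moves (8 directions x up to 7 squares),
#    8 knight moves,
#    9 underpromotions (planes 64-72, omitted in this sketch).
QUEEN_DIRS  = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]
KNIGHT_DIRS = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

def queen_move_plane(direction, distance):
    """Plane index (0-55) for a queen-like move of 1-7 squares."""
    return QUEEN_DIRS.index(direction) * 7 + (distance - 1)

def knight_move_plane(direction):
    """Plane index (56-63) for one of the 8 knight jumps."""
    return 56 + KNIGHT_DIRS.index(direction)

def flat_index(from_file, from_rank, plane):
    """Flatten (source square, plane) into a single action id in [0, 4672)."""
    return (from_rank * 8 + from_file) * 73 + plane

# e2-e4: from file e (4), rank 2 (index 1), two squares "north".
print(flat_index(4, 1, queen_move_plane((0, 1), 2)))
```
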
6 votes, 2 answers

Has any schema-agnostic database engine been implemented?

Has any schema-agnostic database engine been implemented?
Leo • 111 • 6
6 votes, 2 answers

Are there RL algorithms that also try to predict the next state?

So far I've developed simple RL algorithms, like Deep Q-Learning and Double Deep Q-Learning. Also, I read a bit about A3C and policy gradients, but only superficially. If I remember correctly, all these algorithms focus on the value of the action and try…
Ram Rachum • 261 • 1 • 9
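
What the question is gesturing at is usually called model-based RL: in addition to (or instead of) a value function, the agent learns a dynamics model that predicts the next state from the current state and action. A minimal supervised sketch, with hypothetical dimensions and assuming PyTorch:

```python
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """Learned dynamics model: (state, action) -> predicted next state."""
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.n_actions = n_actions
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_actions, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),   # predicted next state
        )

    def forward(self, state, action):
        one_hot = nn.functional.one_hot(action, self.n_actions).float()
        return self.net(torch.cat([state, one_hot], dim=-1))

# One supervised update on a batch of observed transitions (s, a, s').
model = DynamicsModel(state_dim=4, n_actions=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
s  = torch.randn(32, 4)
a  = torch.randint(0, 2, (32,))
s2 = torch.randn(32, 4)
loss = nn.functional.mse_loss(model(s, a), s2)
opt.zero_grad(); loss.backward(); opt.step()
```
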
6 votes, 1 answer

How does mating take place in NEAT?

In the Evolving Neural Networks through Augmenting Topologies (NEAT) paper it says (p. 110): The entire population is then replaced by the offspring of the remaining organisms in each species. But how does it take place? Are they paired and then…
Miemels • 389 • 2 • 11
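
A simplified sketch of the crossover step described in the NEAT paper (speciation, gene re-enabling and structural mutation are omitted, and the dict-of-genes representation is just for illustration): genes are aligned by innovation number, matching genes are inherited at random from either parent, and disjoint/excess genes come from the fitter parent.

```python
import random

def crossover(parent_a, parent_b, a_is_fitter):
    """Produce a child genome from two parents, genes keyed by innovation number."""
    fitter, other = (parent_a, parent_b) if a_is_fitter else (parent_b, parent_a)
    child = {}
    for innovation, gene in fitter.items():
        if innovation in other:
            # Matching gene: pick the copy from either parent at random.
            child[innovation] = random.choice([gene, other[innovation]])
        else:
            # Disjoint or excess gene: keep the fitter parent's copy.
            child[innovation] = gene
    return child

# genes: innovation -> (in_node, out_node, weight, enabled)
a = {1: (0, 3, 0.5, True), 2: (1, 3, -0.2, True), 4: (2, 3, 0.9, True)}
b = {1: (0, 3, 0.1, True), 2: (1, 3, 0.7, False), 3: (0, 2, 0.4, True)}
print(crossover(a, b, a_is_fitter=True))
```
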
6 votes, 1 answer

How to deal with losses on different scales in multi-task learning?

Say I'm training a model for multiple tasks by trying to minimize the sum of losses $L_1 + L_2$ via gradient descent. If these losses are on different scales, the one with the larger range will dominate the optimization. I'm currently trying to fix…
SpiderRico • 990 • 9 • 18
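
One common heuristic, shown here only as an illustration of the general idea (uncertainty weighting and GradNorm are popular alternatives): divide each task loss by a running estimate of its own magnitude, so both terms enter the sum on a comparable scale. A sketch assuming PyTorch:

```python
import torch

class LossBalancer:
    """Normalize each loss by a running average of its own magnitude."""
    def __init__(self, n_tasks, momentum=0.99):
        self.avg = [None] * n_tasks
        self.momentum = momentum

    def combine(self, losses):
        total = 0.0
        for i, loss in enumerate(losses):
            value = loss.detach()
            self.avg[i] = value if self.avg[i] is None else \
                self.momentum * self.avg[i] + (1 - self.momentum) * value
            total = total + loss / (self.avg[i] + 1e-8)   # scale-free term
        return total

balancer = LossBalancer(n_tasks=2)
L1 = torch.tensor(250.0, requires_grad=True)   # large-scale loss
L2 = torch.tensor(0.03, requires_grad=True)    # small-scale loss
total = balancer.combine([L1, L2])
total.backward()
print(L1.grad, L2.grad)   # ~0.004 vs ~33: the large loss is downweighted, the small one upweighted
```
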
6 votes, 1 answer

Is it possible to use deep learning to give approximate solutions to NP-hard graph theory problems?

Is it possible to use deep learning to give approximate solutions to NP-hard graph theory problems? Take, for example, the travelling salesman problem (or the dominating set problem). Let's say I have a bunch of smaller examples, where I…
Jake B. • 181 • 1
6 votes, 1 answer

What should we do when the selection step selects a terminal state?

In Monte Carlo tree search, what should we do when the selection step selects a terminal state (i.e. a won or lost state), which is, by definition, a leaf node? Expansion and simulation are not in order, as it's game over, but does the tree…
degski • 163 • 6
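
A rough sketch of where the terminal case sits in the MCTS loop (the Node class and the game interface below are hypothetical, and this is only one common way to handle it): when selection lands on a terminal node, expansion and simulation are skipped and the exact game outcome is backpropagated instead.

```python
import math

class Node:
    def __init__(self, state):
        self.state, self.children = state, []
        self.visits, self.value = 0, 0.0

def uct(parent, child, c=1.4):
    """Standard UCT score used during selection."""
    return (child.value / (child.visits + 1e-9)
            + c * math.sqrt(math.log(parent.visits + 1) / (child.visits + 1e-9)))

def mcts_iteration(root, game):
    node, path = root, [root]
    # Selection: descend by UCT until an unexpanded node or a terminal state.
    while node.children and not game.is_terminal(node.state):
        node = max(node.children, key=lambda ch: uct(node, ch))
        path.append(node)

    if game.is_terminal(node.state):
        # Terminal leaf: the result is known exactly, so there is nothing to
        # expand or simulate; just back the true outcome up the visited path.
        result = game.outcome(node.state)
    else:
        node.children = [Node(s) for s in game.successors(node.state)]  # Expansion
        result = game.rollout(node.state)                               # Simulation

    for n in path:                                                      # Backpropagation
        n.visits += 1
        n.value += result
```
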
6 votes, 2 answers

What exactly are the differences between semantic and lexical-semantic networks?

What exactly are the differences between semantic and lexical-semantic networks?