Most Popular
6
votes
1 answer
What is the probability of selecting the greedy action in a 0.5-greedy selection method for the 2-armed bandit problem?
I'm new to reinforcement learning and I'm going through Sutton and Barto. Exercise 2.1 states the following:
In $\varepsilon$-greedy action selection, for the case of two actions and $\varepsilon=0.5$, what is the probability that the greedy action…
Daviiid
- 575
- 5
- 15
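For the exercise quoted above, the standard way to reason it out (independent of any particular implementation): the greedy action is selected either on the exploit branch, with probability $1-\varepsilon$, or on the explore branch when the uniform random draw over the two actions happens to land on it. So
$$
P(\text{greedy}) \;=\; (1-\varepsilon) + \frac{\varepsilon}{|\mathcal{A}|} \;=\; (1-0.5) + \frac{0.5}{2} \;=\; 0.75.
$$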
6
votes
1 answer
What is the effect of parallel environments in reinforcement learning?
Do parallel environments improve the agent's ability to learn, or do they not really make a difference? Specifically, I am using PPO, but I think this applies across the board to other algorithms too.
Dylan Kerler
- 273
- 2
- 9
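For the parallel-environments question above, a minimal sketch of what "parallel environments" usually means in practice: several copies of the environment stepped in lockstep, so the agent collects a batch of decorrelated transitions per step. This assumes the Gymnasium vector API; the environment id CartPole-v1 and the count of 8 are arbitrary illustrative choices.

    import gymnasium as gym

    # Eight copies of the same environment, stepped synchronously.
    envs = gym.vector.SyncVectorEnv(
        [lambda: gym.make("CartPole-v1") for _ in range(8)]
    )

    obs, info = envs.reset(seed=0)            # obs has shape (8, obs_dim)
    for _ in range(100):
        actions = envs.action_space.sample()  # one action per environment
        obs, rewards, terminated, truncated, infos = envs.step(actions)
        # A PPO-style agent would store these 8 transitions per step,
        # giving larger and less correlated rollout batches.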
6
votes
3 answers
What exactly are partially observable environments?
I have trouble understanding the meaning of partially observable environments. Here's my confusion.
As I understand it, the state of the environment is what precisely determines the next state and reward for any particular action taken. So,…
CHANDRASEKHAR HETHA HAVYA
- 63
- 1
- 5
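A toy illustration of the distinction the question above is about, using hypothetical names: the full state would let you predict the next state and reward exactly, while the observation handed to the agent deliberately hides part of it.

    import random

    class NoisyCorridor:
        """Toy environment: the true state is an exact position,
        but the agent only sees a coarse, noisy reading of it."""

        def __init__(self):
            self.position = 0  # full (Markov) state

        def step(self, action):            # action in {-1, +1}
            self.position += action
            reward = 1 if self.position == 5 else 0
            return self.observe(), reward

        def observe(self):
            # Partial observation: position rounded to the nearest even
            # number plus noise -- not enough to recover the true state.
            return 2 * (self.position // 2) + random.choice([-1, 0, 1])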
6
votes
1 answer
Reward interpolation between MDPs. Will an optimal policy on both ends stay optimal inside the interval?
Say I've got two Markov Decision Processes (MDPs):
$$\mathcal{M_0} = (\mathcal{S}, \mathcal{A}, P, R_0),\quad\text{and}\quad\mathcal{M}_1 = (\mathcal{S}, \mathcal{A}, P, R_1)$$
Both have the same set of states and actions, and the transition…
Kostya
- 2,515
- 10
- 24
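One fact that is useful for the interpolation question above (a standard observation, not a full answer): writing the interpolated reward as $R_\lambda = (1-\lambda)R_0 + \lambda R_1$, the value function of a *fixed* policy $\pi$ is linear in the reward, since the trajectory distribution depends only on $P$ and $\pi$. Hence
$$
V^{\pi}_{\lambda}(s) \;=\; (1-\lambda)\,V^{\pi}_{0}(s) + \lambda\,V^{\pi}_{1}(s), \qquad \lambda \in [0,1],
$$
and this is the quantity one compares across policies to reason about optimality inside the interval.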
6
votes
1 answer
If $\gamma \in (0,1)$, what is the on-policy state distribution for episodic tasks?
In Reinforcement Learning: An Introduction, section 9.2 (page 199), Sutton and Barto describe the on-policy distribution in episodic tasks, with $\gamma =1$, as being
\begin{equation}
\mu(s) = \frac{\eta(s)}{\sum_{k \in S}…
Felipe Costa
- 103
- 5
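For reference, the episodic definition the excerpt above starts to quote (Sutton and Barto, section 9.2): with $h(s)$ the probability of starting an episode in $s$ and $\eta(s)$ the expected number of visits to $s$ per episode,
$$
\eta(s) = h(s) + \sum_{\bar{s}} \eta(\bar{s}) \sum_{a} \pi(a \mid \bar{s})\, p(s \mid \bar{s}, a),
\qquad
\mu(s) = \frac{\eta(s)}{\sum_{k \in S} \eta(k)}.
$$
The question is how this changes when $\gamma < 1$; the book's footnote suggests treating discounting as a form of soft termination, which amounts to inserting a factor of $\gamma$ into the second term of the recursion for $\eta$.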
6
votes
0 answers
Are generative models actually used in practice for industrial drug design?
I just finished reading this paper MoFlow: An Invertible Flow Model for Generating Molecular Graphs.
The paper, which is about generating molecular graphs with certain chemical properties, improved the SOTA at the time of writing by a bit and used a…
Adriaan
- 61
- 2
6
votes
2 answers
What does "semantic gap" mean?
I was reading DT-LET: Deep transfer learning by exploring where to transfer, and it contains the following:
It should be noted direct use of labeled source domain data on a new scene of target domain would result in poor performance due to the…
Kais Hasan
- 361
- 2
- 12
6
votes
1 answer
How does the Alpha Zero's move encoding work?
I am a beginner in AI. I'm trying to train a multi-agent RL algorithm to play chess. One issue that I ran into was representing the action space (legal moves, or honestly just moves in general) numerically. I looked up how Alpha Zero represented it,…
Akshay Ghosh
- 105
- 5
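For the AlphaZero question above: the paper encodes a chess move by the square the piece moves from (8×8) and one of 73 "move type" planes, namely 56 queen-style moves (8 directions × 7 distances), 8 knight moves, and 9 underpromotions, giving 8×8×73 = 4672 possible actions. Below is a minimal sketch of one way to flatten that into a single index; the exact plane ordering here is an illustrative assumption, not the paper's layout.

    # Flatten an AlphaZero-style chess move into a single integer.
    # Assumed layout: planes 0-55 queen moves, 56-63 knight moves,
    # 64-72 underpromotions; squares indexed 0-63 (a1=0, ..., h8=63).

    QUEEN_DIRS = [(0, 1), (1, 1), (1, 0), (1, -1),
                  (0, -1), (-1, -1), (-1, 0), (-1, 1)]   # direction indices 0..7
    KNIGHT_DIRS = [(1, 2), (2, 1), (2, -1), (1, -2),
                   (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

    def encode_queen_move(from_sq: int, direction: int, distance: int) -> int:
        """direction in 0..7, distance in 1..7 -> index in 0..4671."""
        plane = direction * 7 + (distance - 1)           # planes 0..55
        return from_sq * 73 + plane

    def encode_knight_move(from_sq: int, direction: int) -> int:
        """direction in 0..7 -> index in 0..4671."""
        plane = 56 + direction                           # planes 56..63
        return from_sq * 73 + plane

    # Example: a knight on g1 (square 6) jumping one file left, two ranks up.
    idx = encode_knight_move(6, KNIGHT_DIRS.index((-1, 2)))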
6
votes
2 answers
Has any schema-agnostic database engine been implemented?
Has any schema-agnostic database engine been implemented?
Leo
- 111
- 6
6
votes
2 answers
Are there RL algorithms that also try to predict the next state?
So far I've implemented simple RL algorithms, like Deep Q-Learning and Double Deep Q-Learning. I've also read a bit about A3C and policy gradients, but only superficially.
If I remember correctly, all these algorithms focus on the value of the action and try…
Ram Rachum
- 261
- 1
- 9
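For the question above about predicting the next state: algorithms that learn a dynamics model alongside (or instead of) a value function are usually called model-based RL, e.g. Dyna-style methods, world models, or MuZero's learned latent dynamics. A minimal sketch of the core ingredient, a learned transition model, assuming PyTorch and hypothetical state/action dimensions:

    import torch
    import torch.nn as nn

    class DynamicsModel(nn.Module):
        """Predicts the next state from (state, action)."""
        def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, state_dim),
            )

        def forward(self, state, action):
            return self.net(torch.cat([state, action], dim=-1))

    model = DynamicsModel(state_dim=4, action_dim=2)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    def train_step(s, a, s_next):
        """One update on a batch of observed transitions (s, a, s')."""
        pred = model(s, a)
        loss = nn.functional.mse_loss(pred, s_next)   # fit the dynamics
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()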
6
votes
1 answer
How does mating take place in NEAT?
In the Evolving Neural Networks through Augmenting Topologies (NEAT) paper it says (p. 110):
The entire population is then replaced by the offspring of the remaining organisms in each species.
But how does it take place? Are they paired and then…
Miemels
- 389
- 2
- 11
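For the NEAT question above: mating (crossover) in the paper works gene-by-gene. Connection genes are lined up by their innovation numbers; genes that match in both parents are inherited randomly from either one, while disjoint and excess genes are taken from the fitter parent. A rough sketch, under the assumption that each genome is a dict mapping innovation number to a connection gene:

    import random

    def crossover(fitter: dict, weaker: dict) -> dict:
        """NEAT-style crossover; genomes are {innovation_number: gene}."""
        child = {}
        for innov, gene in fitter.items():
            if innov in weaker:
                # Matching gene: inherit randomly from either parent.
                child[innov] = random.choice([gene, weaker[innov]])
            else:
                # Disjoint/excess gene: inherit from the fitter parent.
                child[innov] = gene
        return child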
6
votes
1 answer
How to deal with losses on different scales in multi-task learning?
Say I'm training a model for multiple tasks by trying to minimize the sum of losses $L_1 + L_2$ via gradient descent.
If these losses are on different scales, the one whose range is greater will dominate the optimization. I'm currently trying to fix…
SpiderRico
- 990
- 9
- 18
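One commonly used fix for the scale problem in the question above is to weight each task loss by a learned homoscedastic-uncertainty term (Kendall et al., 2018), so the relative scales do not have to be tuned by hand. A minimal sketch, assuming PyTorch and that loss1 and loss2 are already computed:

    import torch

    # One learnable log-variance per task, initialised to 0 (weight 1).
    log_vars = torch.nn.Parameter(torch.zeros(2))

    def combined_loss(loss1, loss2):
        # Each task loss is scaled by exp(-log_var); the additive log_var
        # term penalises shrinking a task's weight toward zero.
        w1, w2 = torch.exp(-log_vars[0]), torch.exp(-log_vars[1])
        return w1 * loss1 + log_vars[0] + w2 * loss2 + log_vars[1]

    # log_vars must be handed to the optimiser together with the model
    # parameters, e.g. Adam(list(model.parameters()) + [log_vars]).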
6
votes
1 answer
Is it possible to use deep learning to give approximate solutions to NP-hard graph theory problems?
Is it possible to use deep learning to give approximate solutions to NP-hard graph theory problems?
Take, for example, the travelling salesman problem (or the dominating set problem). Let's say I have a bunch of smaller examples, where I…
Jake B.
- 181
- 1
6
votes
1 answer
What should we do when the selection step selects a terminal state?
In Monte Carlo tree search, what should we do when the selection step selects a terminal state (i.e. a won or lost state), which is, by definition, a leaf node? Expansion and simulation are not in order, as it's game over, but does the tree…
degski
- 163
- 6
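The usual handling of the situation in the question above: when selection lands on a terminal node, there is nothing to expand or simulate, so the node's exact game outcome is backpropagated directly (and the node remains a leaf that selection may reach again). A minimal sketch of the control flow, with the four MCTS phase functions and the node fields assumed rather than implemented here:

    def run_iteration(root, select, expand, simulate, backpropagate):
        """One MCTS iteration; the four phase functions are passed in."""
        node = select(root)                   # walk the tree with UCT
        if node.is_terminal:
            # Terminal node reached by selection: skip expansion and
            # rollout, and backpropagate the exact game outcome.
            value = node.terminal_value       # e.g. +1 win, 0 draw, -1 loss
        else:
            node = expand(node)
            value = simulate(node)            # random/heuristic rollout
        backpropagate(node, value)            # update stats up to the root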
6
votes
2 answers
What exactly are the differences between semantic and lexical-semantic networks?
What exactly are the differences between semantic and lexical-semantic networks?
idontknowwhoiamgodhelpme
- 161
- 1