Most Popular
6
votes
1 answer
What is the probability of selecting the greedy action in a 0.5-greedy selection method for the 2-armed bandit problem?
I'm new to reinforcement learning and I'm going through Sutton and Barto. Exercise 2.1 states the following:
In $\varepsilon$-greedy action selection, for the case of two actions and $\varepsilon=0.5$, what is the probability that the greedy action…
Daviiid
- 575
- 5
- 15
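For the exercise quoted above, the standard way to reason it out (independent of any particular implementation): the greedy action is selected either on the exploit branch, with probability $1-\varepsilon$, or on the explore branch when the uniform random draw over the two actions happens to land on it. So
$$
P(\text{greedy}) \;=\; (1-\varepsilon) + \frac{\varepsilon}{|\mathcal{A}|} \;=\; (1-0.5) + \frac{0.5}{2} \;=\; 0.75.
$$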
6
votes
1 answer
What is the effect of parallel environments in reinforcement learning?
Do parallel environments improve the agent's ability to learn, or do they not really make a difference? Specifically, I am using PPO, but I think this applies across the board to other algorithms too.
Dylan Kerler
- 273
- 2
- 9
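For the parallel-environments question above, a minimal sketch of what "parallel environments" usually means in practice: several copies of the environment stepped in lockstep, so the agent collects a batch of decorrelated transitions per step. This assumes the Gymnasium vector API; the environment id CartPole-v1 and the count of 8 are arbitrary illustrative choices.

    import gymnasium as gym

    # Eight copies of the same environment, stepped synchronously.
    envs = gym.vector.SyncVectorEnv(
        [lambda: gym.make("CartPole-v1") for _ in range(8)]
    )

    obs, info = envs.reset(seed=0)            # obs has shape (8, obs_dim)
    for _ in range(100):
        actions = envs.action_space.sample()  # one action per environment
        obs, rewards, terminated, truncated, infos = envs.step(actions)
        # A PPO-style agent would store these 8 transitions per step,
        # giving larger and less correlated rollout batches.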
6
votes
3 answers
What exactly are partially observable environments?
I have trouble understanding the meaning of partially observable environments. Here's my confusion.
As I understand it, the state of the environment is what precisely determines the next state and reward for any particular action taken. So,…
CHANDRASEKHAR HETHA HAVYA
- 63
- 1
- 5
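A toy illustration of the distinction the question above is about, using hypothetical names: the full state would let you predict the next state and reward exactly, while the observation handed to the agent deliberately hides part of it.

    import random

    class NoisyCorridor:
        """Toy environment: the true state is an exact position,
        but the agent only sees a coarse, noisy reading of it."""

        def __init__(self):
            self.position = 0  # full (Markov) state

        def step(self, action):            # action in {-1, +1}
            self.position += action
            reward = 1 if self.position == 5 else 0
            return self.observe(), reward

        def observe(self):
            # Partial observation: position rounded to the nearest even
            # number plus noise -- not enough to recover the true state.
            return 2 * (self.position // 2) + random.choice([-1, 0, 1])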
6
votes
1 answer
Reward interpolation between MDPs. Will an optimal policy on both ends stay optimal inside the interval?
Say I've got two Markov Decision Processes (MDPs):
$$\mathcal{M_0} = (\mathcal{S}, \mathcal{A}, P, R_0),\quad\text{and}\quad\mathcal{M}_1 = (\mathcal{S}, \mathcal{A}, P, R_1)$$
Both have the same set of states and actions, and the transition…
Kostya
- 2,515
- 10
- 24
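One fact that is useful for the interpolation question above (a standard observation, not a full answer): writing the interpolated reward as $R_\lambda = (1-\lambda)R_0 + \lambda R_1$, the value function of a *fixed* policy $\pi$ is linear in the reward, since the trajectory distribution depends only on $P$ and $\pi$. Hence
$$
V^{\pi}_{\lambda}(s) \;=\; (1-\lambda)\,V^{\pi}_{0}(s) + \lambda\,V^{\pi}_{1}(s), \qquad \lambda \in [0,1],
$$
and this is the quantity one compares across policies to reason about optimality inside the interval.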
6
votes
1 answer
If $\gamma \in (0,1)$, what is the on-policy state distribution for episodic tasks?
In Reinforcement Learning: An Introduction, section 9.2 (page 199), Sutton and Barto describe the on-policy distribution in episodic tasks, with $\gamma =1$, as being
\begin{equation}
\mu(s) = \frac{\eta(s)}{\sum_{k \in S}…
Felipe Costa
- 103
- 5
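For reference, the episodic definition the excerpt above starts to quote (Sutton and Barto, section 9.2): with $h(s)$ the probability of starting an episode in $s$ and $\eta(s)$ the expected number of visits to $s$ per episode,
$$
\eta(s) = h(s) + \sum_{\bar{s}} \eta(\bar{s}) \sum_{a} \pi(a \mid \bar{s})\, p(s \mid \bar{s}, a),
\qquad
\mu(s) = \frac{\eta(s)}{\sum_{k \in S} \eta(k)}.
$$
The question is how this changes when $\gamma < 1$; the book's footnote suggests treating discounting as a form of soft termination, which amounts to inserting a factor of $\gamma$ into the second term of the recursion for $\eta$.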
6
votes
0 answers
Are generative models actually used in practice for industrial drug design?
I just finished reading this paper MoFlow: An Invertible Flow Model for Generating Molecular Graphs.
The paper, which is about generating molecular graphs with certain chemical properties, improved the SOTA at the time of writing by a bit and used a…
Adriaan
- 61
- 2
6
votes
2 answers
What does "semantic gap" mean?
I was reading DT-LET: Deep transfer learning by exploring where to transfer, and it contains the following:
It should be noted direct use of labeled source domain data on a new scene of target domain would result in poor performance due to the…
Kais Hasan
- 361
- 2
- 12
6
votes
1 answer
How does the Alpha Zero's move encoding work?
I am a beginner in AI. I'm trying to train a multi-agent RL algorithm to play chess. One issue that I ran into was representing the action space (legal moves, or honestly just moves in general) numerically. I looked up how Alpha Zero represented it,…
Akshay Ghosh
- 105
- 5
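For the AlphaZero question above: the paper encodes a chess move by the square the piece moves from (8×8) and one of 73 "move type" planes, namely 56 queen-style moves (8 directions × 7 distances), 8 knight moves, and 9 underpromotions, giving 8×8×73 = 4672 possible actions. Below is a minimal sketch of one way to flatten that into a single index; the exact plane ordering here is an illustrative assumption, not the paper's layout.

    # Flatten an AlphaZero-style chess move into a single integer.
    # Assumed layout: planes 0-55 queen moves, 56-63 knight moves,
    # 64-72 underpromotions; squares indexed 0-63 (a1=0, ..., h8=63).

    QUEEN_DIRS = [(0, 1), (1, 1), (1, 0), (1, -1),
                  (0, -1), (-1, -1), (-1, 0), (-1, 1)]   # direction indices 0..7
    KNIGHT_DIRS = [(1, 2), (2, 1), (2, -1), (1, -2),
                   (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

    def encode_queen_move(from_sq: int, direction: int, distance: int) -> int:
        """direction in 0..7, distance in 1..7 -> index in 0..4671."""
        plane = direction * 7 + (distance - 1)           # planes 0..55
        return from_sq * 73 + plane

    def encode_knight_move(from_sq: int, direction: int) -> int:
        """direction in 0..7 -> index in 0..4671."""
        plane = 56 + direction                           # planes 56..63
        return from_sq * 73 + plane

    # Example: a knight on g1 (square 6) jumping one file left, two ranks up.
    idx = encode_knight_move(6, KNIGHT_DIRS.index((-1, 2)))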
6
votes
2 answers
Has any schema-agnostic database engine been implemented?
Has any schema-agnostic database engine been implemented?
Leo
- 111
- 6
6
votes
2 answers
Are there RL algorithms that also try to predict the next state?
So far I've implemented simple RL algorithms, like Deep Q-Learning and Double Deep Q-Learning. I've also read a bit about A3C and policy gradients, but only superficially.
If I remember correctly, all these algorithms focus on the value of the action and try…
Ram Rachum
- 261
- 1
- 9
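For the question above about predicting the next state: algorithms that learn a dynamics model alongside (or instead of) a value function are usually called model-based RL, e.g. Dyna-style methods, world models, or MuZero's learned latent dynamics. A minimal sketch of the core ingredient, a learned transition model, assuming PyTorch and hypothetical state/action dimensions:

    import torch
    import torch.nn as nn

    class DynamicsModel(nn.Module):
        """Predicts the next state from (state, action)."""
        def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, state_dim),
            )

        def forward(self, state, action):
            return self.net(torch.cat([state, action], dim=-1))

    model = DynamicsModel(state_dim=4, action_dim=2)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    def train_step(s, a, s_next):
        """One update on a batch of observed transitions (s, a, s')."""
        pred = model(s, a)
        loss = nn.functional.mse_loss(pred, s_next)   # fit the dynamics
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()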
6
votes
1 answer
How does mating take place in NEAT?
In the Evolving Neural Networks through Augmenting Topologies (NEAT) paper it says (p. 110):
The entire population is then replaced by the offspring of the remaining organisms in each species.
But how does it take place? Are they paired and then…
Miemels
- 389
- 2
- 11
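For the NEAT question above: mating (crossover) in the paper works gene-by-gene. Connection genes are lined up by their innovation numbers; genes that match in both parents are inherited randomly from either one, while disjoint and excess genes are taken from the fitter parent. A rough sketch, under the assumption that each genome is a dict mapping innovation number to a connection gene:

    import random

    def crossover(fitter: dict, weaker: dict) -> dict:
        """NEAT-style crossover; genomes are {innovation_number: gene}."""
        child = {}
        for innov, gene in fitter.items():
            if innov in weaker:
                # Matching gene: inherit randomly from either parent.
                child[innov] = random.choice([gene, weaker[innov]])
            else:
                # Disjoint/excess gene: inherit from the fitter parent.
                child[innov] = gene
        return child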
6
votes
1 answer
How to deal with losses on different scales in multi-task learning?
Say I'm training a model for multiple tasks by trying to minimize the sum of losses $L_1 + L_2$ via gradient descent.
If these losses are on different scales, the one whose range is greater will dominate the optimization. I'm currently trying to fix…
SpiderRico
- 990
- 9
- 18
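One commonly used fix for the scale problem in the question above is to weight each task loss by a learned homoscedastic-uncertainty term (Kendall et al., 2018), so the relative scales do not have to be tuned by hand. A minimal sketch, assuming PyTorch and that loss1 and loss2 are already computed:

    import torch

    # One learnable log-variance per task, initialised to 0 (weight 1).
    log_vars = torch.nn.Parameter(torch.zeros(2))

    def combined_loss(loss1, loss2):
        # Each task loss is scaled by exp(-log_var); the additive log_var
        # term penalises shrinking a task's weight toward zero.
        w1, w2 = torch.exp(-log_vars[0]), torch.exp(-log_vars[1])
        return w1 * loss1 + log_vars[0] + w2 * loss2 + log_vars[1]

    # log_vars must be handed to the optimiser together with the model
    # parameters, e.g. Adam(list(model.parameters()) + [log_vars]).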
6
votes
1 answer
Is it possible to use deep learning to give approximate solutions to NP-hard graph theory problems?
Is it possible to use deep learning to give approximate solutions to NP-hard graph theory problems?
Take, for example, the travelling salesman problem (or the dominating set problem). Let's say I have a bunch of smaller examples, where I…
Jake B.
- 181
- 1
6
votes
1 answer
What should we do when the selection step selects a terminal state?
In Monte Carlo tree search, what should we do when the selection step selects a terminal state (i.e. a won or lost state), which is, by definition, a leaf node? Expansion and simulation are not in order, as it's game over, but does the tree…
degski
- 163
- 6
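The usual handling of the situation in the question above: when selection lands on a terminal node, there is nothing to expand or simulate, so the node's exact game outcome is backpropagated directly (and the node remains a leaf that selection may reach again). A minimal sketch of the control flow, with the four MCTS phase functions and the node fields assumed rather than implemented here:

    def run_iteration(root, select, expand, simulate, backpropagate):
        """One MCTS iteration; the four phase functions are passed in."""
        node = select(root)                   # walk the tree with UCT
        if node.is_terminal:
            # Terminal node reached by selection: skip expansion and
            # rollout, and backpropagate the exact game outcome.
            value = node.terminal_value       # e.g. +1 win, 0 draw, -1 loss
        else:
            node = expand(node)
            value = simulate(node)            # random/heuristic rollout
        backpropagate(node, value)            # update stats up to the root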
6
votes
2 answers
What exactly are the differences between semantic and lexical-semantic networks?
What exactly are the differences between semantic and lexical-semantic networks?
idontknowwhoiamgodhelpme
- 161
- 1