Most Popular (1500 questions)
7 votes, 1 answer
What happens when you select actions using softmax instead of epsilon greedy in DQN?
I understand the two major branches of RL are Q-learning and policy-gradient methods.
From my understanding (correct me if I'm wrong), policy-gradient methods have exploration built in, since they select actions using a probability…
Linsu Han
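The trade-off this question asks about can be sketched in a few lines of numpy; the Q-values and hyperparameters below are hypothetical, not taken from any answer in the thread:

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon=0.1):
    # With probability epsilon take a uniformly random action,
    # otherwise take the greedy (highest-Q) action.
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def softmax_action(q_values, temperature=1.0):
    # Boltzmann exploration: sample actions with probability
    # proportional to exp(Q / temperature).
    z = np.asarray(q_values, dtype=float) / temperature
    z -= z.max()  # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(q_values), p=probs))

q = [1.0, 2.0, 0.5]
# epsilon-greedy explores uniformly over non-greedy actions;
# softmax keeps probability mass proportional to each action's Q-value.
print(epsilon_greedy(q), softmax_action(q))
```

Lowering the temperature makes softmax selection approach greedy behavior, which epsilon-greedy only does as epsilon goes to zero.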
7 votes, 1 answer
How to measure sample efficiency of a reinforcement learning algorithm?
Is there a metric for measuring the sample efficiency of a reinforcement learning algorithm? From reading research papers, I see claims that proposed models are more sample-efficient, but how does one reach this conclusion when…
rert588
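One common (though not universal) way papers operationalize sample efficiency is the number of environment steps needed to reach a fixed performance threshold. A toy sketch, with made-up learning curves:

```python
import numpy as np

def steps_to_threshold(eval_returns, threshold):
    # Index of the first evaluation at which the return reaches the
    # threshold; None if the agent never gets there.
    hits = np.asarray(eval_returns) >= threshold
    if not hits.any():
        return None
    return int(np.argmax(hits))

# Hypothetical evaluation curves, one return logged per fixed step budget.
agent_a = [0, 5, 20, 60, 90, 95]
agent_b = [0, 2, 10, 30, 55, 80]

# Agent A crosses a return of 50 earlier, so under this metric it is
# the more sample-efficient of the two.
print(steps_to_threshold(agent_a, 50))  # 3
print(steps_to_threshold(agent_b, 50))  # 4
```

Area under the learning curve, or final return after a fixed sample budget, are alternative metrics used for the same purpose.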
7 votes, 2 answers
How to resolve lexical ambiguity in natural language processing?
I'm interested in implementing a program for natural language processing (in the style of ELIZA).
Assume that I'm already storing semantic-lexical connections between words and their strengths.
What are the methods of dealing with words which have very…
kenorb
7 votes, 2 answers
Is there any difference between reward and return in reinforcement learning?
I am reading Sutton and Barto's book on reinforcement learning. I thought that reward and return were the same thing.
However, in Section 5.6 of the book, 3rd line, first paragraph, it is written:
Whereas in Chapter 2 we averaged rewards, in…
SJa
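The distinction the book draws can be shown with a tiny computation: rewards are the per-step signals, while the return G_t is the discounted sum of rewards from step t onward. The episode below is made up:

```python
# Rewards are per-step signals R_1, R_2, ...; the return G_t satisfies
# the recursion G_t = R_{t+1} + gamma * G_{t+1}.
gamma = 0.9
rewards = [1.0, 0.0, 2.0]   # a short, hypothetical episode

returns = []
G = 0.0
for r in reversed(rewards):  # work backwards through the episode
    G = r + gamma * G
    returns.append(G)
returns.reverse()

print(rewards)   # the three separate reward signals
print(returns)   # approximately [2.62, 1.8, 2.0]: each G_t bundles all later rewards
```

So "averaging rewards" and "averaging returns" are genuinely different operations, which is the contrast Section 5.6 is drawing.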
7 votes, 2 answers
Is there any good reference for double deep Q-learning?
I am new to reinforcement learning, but I already know deep Q-learning and Q-learning. Now I want to learn about double deep Q-learning.
Do you know any good references for double deep Q-learning?
I have read some articles, but some of them don't…
dato nefaridze
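For orientation while reading references on this: the only change double DQN makes to the DQN target is that the online network selects the next action while the target network evaluates it. A numpy sketch with hypothetical Q-values:

```python
import numpy as np

gamma = 0.99

# Hypothetical next-state Q-values for 3 transitions over 2 actions.
q_online_next = np.array([[1.0, 2.0], [0.5, 0.2], [3.0, 2.9]])  # online net
q_target_next = np.array([[2.2, 1.8], [0.4, 0.3], [2.5, 3.1]])  # target net
rewards = np.array([0.0, 1.0, -1.0])
done = np.array([False, False, True])  # terminal transitions bootstrap nothing

# Vanilla DQN target: the target network both selects and evaluates
# the next action, which tends to overestimate values.
dqn_y = rewards + gamma * (1 - done) * q_target_next.max(axis=1)

# Double DQN target: the online network selects the action, the
# target network evaluates it.
best = q_online_next.argmax(axis=1)
ddqn_y = rewards + gamma * (1 - done) * q_target_next[np.arange(3), best]

print(dqn_y)   # first entry uses max(2.2, 1.8) = 2.2
print(ddqn_y)  # first entry evaluates the online argmax instead: 1.8
```

The decoupling of selection from evaluation is what reduces the overestimation bias of the max operator.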
7 votes, 1 answer
What is the most abstract concept learned by a deep neural network?
It seems that deep neural networks are making improvements largely because as we add nodes and connections, they are able to put together more and more abstract concepts. We know that, starting from pixels, they start to recognize high level objects…
alwaysLearningABC
7 votes, 2 answers
Does the "lowest layer" refer to the first or last layer of the neural network?
People sometimes use "1st layer" or "2nd layer" to refer to a specific layer in a neural net. Is the layer that immediately follows the input layer called the 1st layer?
How about the lowest layer and the highest layer?
Piete3r
7 votes, 2 answers
Why didn't AlphaGo use Deep Q-Learning?
In earlier research, in 2015, Deep Q-Learning showed great performance on single-player Atari games. So why did AlphaGo's researchers use CNN + MCTS instead of Deep Q-Learning? Is that because Deep Q-Learning is somehow not suitable for Go?
malioboro
7 votes, 2 answers
How would "wisdom" be defined in AI?
For years, I have been dealing with (and teaching) Knowledge Representation and Knowledge Representation languages. I just discovered that in another community (Information Systems and the like) there is something called the "DIKW pyramid", where…
yannis
7 votes, 3 answers
What is the target Q-value in DQNs?
I understand that in DQNs, the loss is measured by taking the MSE of the outputted Q-values and the target Q-values.
What do the target Q-values represent? And how are they obtained/calculated by the DQN?
BG10
7 votes, 1 answer
Why do we update all layers simultaneously while training a neural network?
Very deep models involve the composition of several functions or layers. The gradient tells how to update each parameter, under the assumption that the other layers do not change. In practice, we update all of the layers simultaneously.
The above…
stoic-santiago
7 votes, 2 answers
Why do the TensorFlow docs discourage using softmax as the activation for the last layer?
The beginner Colab example for TensorFlow states:
Note: It is possible to bake this tf.nn.softmax in as the activation function for the last layer of the network. While this can make the model output more directly interpretable, this approach is…
galah92
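The part the docs discourage is largely numerical: computing log(softmax(x)) naively can overflow or underflow, whereas a loss that consumes raw logits can apply the log-sum-exp trick internally. A plain numpy illustration of the failure mode (not TensorFlow's actual implementation):

```python
import numpy as np

logits = np.array([1000.0, 0.0, -1000.0])  # extreme but legal logits

# Naive route: softmax first, then log. exp(1000) overflows to inf,
# so the probabilities and their logarithms come out as nan / -inf.
with np.errstate(all="ignore"):
    naive = np.log(np.exp(logits) / np.exp(logits).sum())

# Stable route: log-softmax via the log-sum-exp trick, which a loss
# taking raw logits (e.g. with from_logits=True) can use internally.
shifted = logits - logits.max()
log_softmax = shifted - np.log(np.exp(shifted).sum())

print(naive)        # nan and -inf entries: useless for a loss
print(log_softmax)  # finite: [0., -1000., -2000.]
```

Keeping the last layer linear and telling the loss the outputs are logits avoids this round trip through near-zero probabilities.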
7 votes, 2 answers
How can I handle overfitting in reinforcement learning problems?
So this is my current result (loss and score per episode) of my RL model in a simple two-player game:
I use a DQN with a CNN for the policy and target networks. I train my model using the Adam optimizer and calculate the loss using Smooth L1 loss.
In a…
malioboro
7 votes, 2 answers
Why is creating an AI that can code a hard task?
For people who have experience in the field, why is creating AI that has the ability to write programs (that are syntactically correct and useful) a hard task?
What are the barriers/problems we have to solve before we can solve this problem? If you…
Landon G
7 votes, 2 answers
Aren't all discrete convolutions (not just 2D) linear transforms?
The image above, a screenshot from this article, describes discrete 2D convolutions as linear transforms. The idea, as far as I understand it, is to represent the two-dimensional $n \times n$ input grid as a vector of length $n^2$, and the $m \times m$…
stoic-santiago
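The claim in the question can be checked directly by materializing the matrix. A small numpy sketch (the helper name conv_matrix is made up), using the "valid" sliding-window cross-correlation that deep-learning papers call convolution:

```python
import numpy as np

def conv_matrix(kernel, n):
    # Build the (n-m+1)^2 x n^2 matrix whose product with the flattened
    # n x n image equals the 'valid' sliding-window cross-correlation.
    m = kernel.shape[0]
    out = n - m + 1
    M = np.zeros((out * out, n * n))
    for i in range(out):
        for j in range(out):
            row = np.zeros((n, n))
            row[i:i + m, j:j + m] = kernel  # kernel placed at offset (i, j)
            M[i * out + j] = row.ravel()
    return M

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 4))
k = rng.standard_normal((3, 3))

# Reference: direct sliding-window computation over the 2x2 output grid.
direct = np.array([[(x[i:i + 3, j:j + 3] * k).sum() for j in range(2)]
                   for i in range(2)])

M = conv_matrix(k, 4)
print(np.allclose(M @ x.ravel(), direct.ravel()))  # True: it is a linear map
```

The same construction works for any discrete convolution dimensionality; only the index bookkeeping changes.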