Most Popular (1500 questions)
7 votes, 1 answer
What happens when you select actions using softmax instead of epsilon greedy in DQN?
I understand the two major branches of RL are Q-learning and policy-gradient methods.
From my understanding (correct me if I'm wrong), policy-gradient methods have exploration built in, since they select actions using a probability…
Linsu Han
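The trade-off this question asks about can be sketched in a few lines of numpy; the Q-values and hyperparameters below are hypothetical, not taken from any answer in the thread:

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon=0.1):
    # With probability epsilon take a uniformly random action,
    # otherwise take the greedy (highest-Q) action.
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def softmax_action(q_values, temperature=1.0):
    # Boltzmann exploration: sample actions with probability
    # proportional to exp(Q / temperature).
    z = np.asarray(q_values, dtype=float) / temperature
    z -= z.max()  # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(q_values), p=probs))

q = [1.0, 2.0, 0.5]
# epsilon-greedy explores uniformly over non-greedy actions;
# softmax keeps probability mass proportional to each action's Q-value.
print(epsilon_greedy(q), softmax_action(q))
```

Lowering the temperature makes softmax selection approach greedy behavior, which epsilon-greedy only does as epsilon goes to zero.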
7 votes, 1 answer
How to measure sample efficiency of a reinforcement learning algorithm?
Is there a metric for measuring the sample efficiency of a reinforcement learning algorithm? From reading research papers, I see claims that proposed models are more sample-efficient, but how does one reach this conclusion when…
rert588
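One common (though not universal) way papers operationalize sample efficiency is the number of environment steps needed to reach a fixed performance threshold. A toy sketch, with made-up learning curves:

```python
import numpy as np

def steps_to_threshold(eval_returns, threshold):
    # Index of the first evaluation at which the return reaches the
    # threshold; None if the agent never gets there.
    hits = np.asarray(eval_returns) >= threshold
    if not hits.any():
        return None
    return int(np.argmax(hits))

# Hypothetical evaluation curves, one return logged per fixed step budget.
agent_a = [0, 5, 20, 60, 90, 95]
agent_b = [0, 2, 10, 30, 55, 80]

# Agent A crosses a return of 50 earlier, so under this metric it is
# the more sample-efficient of the two.
print(steps_to_threshold(agent_a, 50))  # 3
print(steps_to_threshold(agent_b, 50))  # 4
```

Area under the learning curve, or final return after a fixed sample budget, are alternative metrics used for the same purpose.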
7 votes, 2 answers
How to resolve lexical ambiguity in natural language processing?
I'm interested in implementing a program for natural language processing (in the style of ELIZA).
Assume that I'm already storing semantic-lexical connections between words and their strengths.
What are the methods of dealing with words which have very…
kenorb
7 votes, 2 answers
Is there any difference between reward and return in reinforcement learning?
I am reading Sutton and Barto's book on reinforcement learning. I thought that reward and return were the same thing.
However, in Section 5.6 of the book, 3rd line, first paragraph, it is written:
Whereas in Chapter 2 we averaged rewards, in…
SJa
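The distinction the book draws can be shown with a tiny computation: rewards are the per-step signals, while the return G_t is the discounted sum of rewards from step t onward. The episode below is made up:

```python
# Rewards are per-step signals R_1, R_2, ...; the return G_t satisfies
# the recursion G_t = R_{t+1} + gamma * G_{t+1}.
gamma = 0.9
rewards = [1.0, 0.0, 2.0]   # a short, hypothetical episode

returns = []
G = 0.0
for r in reversed(rewards):  # work backwards through the episode
    G = r + gamma * G
    returns.append(G)
returns.reverse()

print(rewards)   # the three separate reward signals
print(returns)   # approximately [2.62, 1.8, 2.0]: each G_t bundles all later rewards
```

So "averaging rewards" and "averaging returns" are genuinely different operations, which is the contrast Section 5.6 is drawing.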
7 votes, 2 answers
Is there any good reference for double deep Q-learning?
I am new to reinforcement learning, but I already know deep Q-learning and Q-learning. Now I want to learn about double deep Q-learning.
Do you know any good references for double deep Q-learning?
I have read some articles, but some of them don't…
dato nefaridze
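For orientation while reading references on this: the only change double DQN makes to the DQN target is that the online network selects the next action while the target network evaluates it. A numpy sketch with hypothetical Q-values:

```python
import numpy as np

gamma = 0.99

# Hypothetical next-state Q-values for 3 transitions over 2 actions.
q_online_next = np.array([[1.0, 2.0], [0.5, 0.2], [3.0, 2.9]])  # online net
q_target_next = np.array([[2.2, 1.8], [0.4, 0.3], [2.5, 3.1]])  # target net
rewards = np.array([0.0, 1.0, -1.0])
done = np.array([False, False, True])  # terminal transitions bootstrap nothing

# Vanilla DQN target: the target network both selects and evaluates
# the next action, which tends to overestimate values.
dqn_y = rewards + gamma * (1 - done) * q_target_next.max(axis=1)

# Double DQN target: the online network selects the action, the
# target network evaluates it.
best = q_online_next.argmax(axis=1)
ddqn_y = rewards + gamma * (1 - done) * q_target_next[np.arange(3), best]

print(dqn_y)   # first entry uses max(2.2, 1.8) = 2.2
print(ddqn_y)  # first entry evaluates the online argmax instead: 1.8
```

The decoupling of selection from evaluation is what reduces the overestimation bias of the max operator.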
7 votes, 1 answer
What is the most abstract concept learned by a deep neural network?
It seems that deep neural networks are making improvements largely because as we add nodes and connections, they are able to put together more and more abstract concepts. We know that, starting from pixels, they start to recognize high level objects…
alwaysLearningABC
7 votes, 2 answers
Does the "lowest layer" refer to the first or last layer of the neural network?
People sometimes use "1st layer" or "2nd layer" to refer to a specific layer in a neural net. Is the layer that immediately follows the input layer called the 1st layer?
How about the lowest layer and the highest layer?
Piete3r
7 votes, 2 answers
Why didn't AlphaGo use Deep Q-Learning?
In earlier research, in 2015, Deep Q-Learning showed great performance on single-player Atari games. So why did AlphaGo's researchers use CNN + MCTS instead of Deep Q-Learning? Is that because Deep Q-Learning is somehow not suitable for Go?
malioboro
7 votes, 2 answers
How would "wisdom" be defined in AI?
For years, I have been dealing with (and teaching) Knowledge Representation and Knowledge Representation languages. I just discovered that in another community (Information Systems and the like) there is something called the "DIKW pyramid", where…
yannis
7 votes, 3 answers
What is the target Q-value in DQNs?
I understand that in DQNs, the loss is measured by taking the MSE of the outputted Q-values and the target Q-values.
What do the target Q-values represent? And how are they obtained/calculated by the DQN?
BG10
7 votes, 1 answer
Why do we update all layers simultaneously while training a neural network?
Very deep models involve the composition of several functions or layers. The gradient tells how to update each parameter, under the assumption that the other layers do not change. In practice, we update all of the layers simultaneously.
The above…
stoic-santiago
7 votes, 2 answers
Why do the TensorFlow docs discourage using softmax as the activation for the last layer?
The beginner Colab example for TensorFlow states:
Note: It is possible to bake this tf.nn.softmax in as the activation function for the last layer of the network. While this can make the model output more directly interpretable, this approach is…
galah92
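The part the docs discourage is largely numerical: computing log(softmax(x)) naively can overflow or underflow, whereas a loss that consumes raw logits can apply the log-sum-exp trick internally. A plain numpy illustration of the failure mode (not TensorFlow's actual implementation):

```python
import numpy as np

logits = np.array([1000.0, 0.0, -1000.0])  # extreme but legal logits

# Naive route: softmax first, then log. exp(1000) overflows to inf,
# so the probabilities and their logarithms come out as nan / -inf.
with np.errstate(all="ignore"):
    naive = np.log(np.exp(logits) / np.exp(logits).sum())

# Stable route: log-softmax via the log-sum-exp trick, which a loss
# taking raw logits (e.g. with from_logits=True) can use internally.
shifted = logits - logits.max()
log_softmax = shifted - np.log(np.exp(shifted).sum())

print(naive)        # nan and -inf entries: useless for a loss
print(log_softmax)  # finite: [0., -1000., -2000.]
```

Keeping the last layer linear and telling the loss the outputs are logits avoids this round trip through near-zero probabilities.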
7 votes, 2 answers
How can I handle overfitting in reinforcement learning problems?
So this is my current result (loss and score per episode) of my RL model in a simple two-player game:
I use a DQN with a CNN for the policy and target networks. I train my model using the Adam optimizer and calculate the loss using Smooth L1 loss.
In a…
malioboro
7 votes, 2 answers
Why is creating an AI that can code a hard task?
For people who have experience in the field, why is creating AI that has the ability to write programs (that are syntactically correct and useful) a hard task?
What are the barriers/problems we have to solve before we can solve this problem? If you…
Landon G
7 votes, 2 answers
Aren't all discrete convolutions (not just 2D) linear transforms?
The image above, a screenshot from this article, describes discrete 2D convolutions as linear transforms. The idea, as far as I understand it, is to represent the two-dimensional $n \times n$ input grid as a vector of length $n^2$, and the $m \times m$…
stoic-santiago
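The claim in the question can be checked directly by materializing the matrix. A small numpy sketch (the helper name conv_matrix is made up), using the "valid" sliding-window cross-correlation that deep-learning papers call convolution:

```python
import numpy as np

def conv_matrix(kernel, n):
    # Build the (n-m+1)^2 x n^2 matrix whose product with the flattened
    # n x n image equals the 'valid' sliding-window cross-correlation.
    m = kernel.shape[0]
    out = n - m + 1
    M = np.zeros((out * out, n * n))
    for i in range(out):
        for j in range(out):
            row = np.zeros((n, n))
            row[i:i + m, j:j + m] = kernel  # kernel placed at offset (i, j)
            M[i * out + j] = row.ravel()
    return M

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 4))
k = rng.standard_normal((3, 3))

# Reference: direct sliding-window computation over the 2x2 output grid.
direct = np.array([[(x[i:i + 3, j:j + 3] * k).sum() for j in range(2)]
                   for i in range(2)])

M = conv_matrix(k, 4)
print(np.allclose(M @ x.ravel(), direct.ravel()))  # True: it is a linear map
```

The same construction works for any discrete convolution dimensionality; only the index bookkeeping changes.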