Most Popular
1500 questions
6 votes, 2 answers
Understanding the n-step off-policy SARSA update
In Sutton & Barto's book (2nd ed.), page 149, there is equation 7.11.
I am having a hard time understanding this equation.
I would have thought that we should be moving $Q$ towards $G$, where $G$ would be corrected by importance sampling, but only…
Antoine Savine
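For reference, the off-policy n-step Sarsa update in that chapter, as we read it, is $Q_{t+n}(S_t,A_t) = Q_{t+n-1}(S_t,A_t) + \alpha\,\rho_{t+1:t+n}\,[G_{t:t+n} - Q_{t+n-1}(S_t,A_t)]$ with $\rho_{t+1:t+n} = \prod_{k=t+1}^{\min(t+n,T-1)} \pi(A_k|S_k)/b(A_k|S_k)$, so the ratio scales the whole TD error rather than correcting $G$ alone. A minimal Python sketch of one such update, assuming a tabular $Q$ stored in a dict and hypothetical callables `pi_prob` and `b_prob` for the target and behaviour action probabilities:

```python
from typing import Callable, Dict, List, Tuple

State = int
Action = int
QTable = Dict[Tuple[State, Action], float]


def n_step_off_policy_sarsa_update(
    Q: QTable,
    states: List[State],
    actions: List[Action],
    rewards: List[float],
    t: int,
    n: int,
    T: int,
    alpha: float,
    gamma: float,
    pi_prob: Callable[[State, Action], float],  # hypothetical target policy pi(a|s)
    b_prob: Callable[[State, Action], float],   # hypothetical behaviour policy b(a|s)
) -> None:
    """Update Q(S_t, A_t) in place.

    Indexing assumption: rewards[k] holds R_k (rewards[0] is unused),
    states[k] / actions[k] hold S_k / A_k, and T is the terminal time step.
    """
    # Importance-sampling ratio rho_{t+1:t+n}, truncated at T-1.
    rho = 1.0
    for k in range(t + 1, min(t + n, T - 1) + 1):
        rho *= pi_prob(states[k], actions[k]) / b_prob(states[k], actions[k])

    # n-step return G_{t:t+n}: discounted rewards R_{t+1} .. R_{min(t+n, T)} ...
    G = 0.0
    for k in range(t + 1, min(t + n, T) + 1):
        G += gamma ** (k - t - 1) * rewards[k]
    # ... plus gamma^n * Q(S_{t+n}, A_{t+n}) if the episode has not ended by t+n.
    if t + n < T:
        G += gamma ** n * Q.get((states[t + n], actions[t + n]), 0.0)

    # The ratio multiplies the whole TD error (G - Q), not just G.
    key = (states[t], actions[t])
    q_old = Q.get(key, 0.0)
    Q[key] = q_old + alpha * rho * (G - q_old)
```

Because $\rho$ multiplies $G - Q$, a zero ratio leaves $Q$ unchanged instead of pulling it towards zero, which is one way to see why the ratio is not applied to $G$ alone.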
6 votes, 1 answer
How to detect LEGO bricks by using a deep learning approach?
In my thesis, I dealt with the question of how a computer can recognize LEGO bricks. For multiple-object detection, I chose a deep learning approach. I also looked at an existing training set of LEGO brick images and tried to optimize it.
My…
melawiki
6 votes, 1 answer
Can this tic-tac-toe program be considered AI?
I coded a tic-tac-toe program, but I don't know if I can call it artificial intelligence.
Here's what I did.
There is a random player, which always makes random valid moves.
And then there is the AI player, which will receive input before every…
Pablo Carrasco Hernández
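The random opponent described in the question can be sketched in a few lines. A minimal Python version, assuming a hypothetical board stored as a list of 9 cells holding "X", "O", or None:

```python
import random
from typing import List, Optional

Board = List[Optional[str]]  # 9 cells in row-major order; None means empty


def valid_moves(board: Board) -> List[int]:
    """Indices of the empty cells."""
    return [i for i, cell in enumerate(board) if cell is None]


def random_player(board: Board, rng: random.Random) -> int:
    """Pick a uniformly random valid move (no lookahead, no learning)."""
    moves = valid_moves(board)
    if not moves:
        raise ValueError("no valid moves: the board is full")
    return rng.choice(moves)
```

Such a player uses the board only to check which moves are legal, which makes it a simple baseline to compare the other player against.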
6 votes, 1 answer
When should we use algorithms like Adam as opposed to SGD?
As far as I know, Stochastic Gradient Descent is an optimization algorithm which belongs to the category of algorithms where hyperparameters have to be defined beforehand. They are useful in many cases, but there are some cases that the…
Utku
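The practical difference is easiest to see in the update rules themselves. A minimal scalar sketch in Python, using the standard Adam update with bias-corrected moment estimates and its commonly quoted default hyperparameters (assumed here, not taken from the question):

```python
import math
from typing import Tuple


def sgd_step(w: float, g: float, lr: float) -> float:
    """Plain SGD: the step is proportional to the raw gradient g."""
    return w - lr * g


def adam_step(
    w: float, g: float, m: float, v: float, t: int,
    lr: float = 0.001, beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8,
) -> Tuple[float, float, float]:
    """One Adam step for a scalar parameter; t is the 1-based step count.

    Returns the updated (w, m, v).
    """
    m = beta1 * m + (1 - beta1) * g       # running mean of gradients
    v = beta2 * v + (1 - beta2) * g * g   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)          # correct the bias from zero initialisation
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v
```

With zero-initialised moments, Adam's first step has a magnitude close to `lr` whether the gradient is -6 or -600, while the SGD step scales with the gradient. Adam still has hyperparameters of its own; what it adapts is the effective per-parameter step size.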
6 votes, 1 answer
Why is Q2 a more or less independent estimate in Twin Delayed DDPG (TD3)?
Twin Delayed Deep Deterministic (TD3) policy gradient is inspired by both double Q-learning and double DQN. In double Q-learning, I understand that Q1 and Q2 are independent because they are trained on different samples. In double DQN, I understand…
Luke Guye
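For context, the target that both TD3 critics are regressed toward can be sketched as follows. This is a minimal Python sketch, assuming scalar states and actions and hypothetical callables for the target actor and the two target critics; the noise scale, noise clip and action bounds are illustrative defaults:

```python
import random
from typing import Callable

Critic = Callable[[float, float], float]  # Q(s, a) for scalar state/action (illustrative)
Actor = Callable[[float], float]          # deterministic policy mu(s)


def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def td3_target(
    r: float, s_next: float, done: bool,
    q1_target: Critic, q2_target: Critic, actor_target: Actor,
    rng: random.Random,
    gamma: float = 0.99, noise_std: float = 0.2, noise_clip: float = 0.5,
    a_low: float = -1.0, a_high: float = 1.0,
) -> float:
    # Target policy smoothing: clipped Gaussian noise added to the target action.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    a_next = clip(actor_target(s_next) + noise, a_low, a_high)
    # Clipped double-Q: use the smaller of the two target critics' estimates.
    q_next = min(q1_target(s_next, a_next), q2_target(s_next, a_next))
    return r + gamma * (0.0 if done else q_next)
```

Both critics are then trained toward this same value, so whatever independence Q1 and Q2 have comes from elsewhere (for example, separate parameters and initialisations) rather than from separate targets.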
6 votes, 2 answers
Reinforcement Learning with long term rewards and fixed states and actions
I have read a lot about RL algorithms that update the action-value function at each step with the reward just received. The requirement here is that a reward is obtained after each step.
I have a case where I have three steps that have to…
Jan
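A reward that only arrives after the last of a fixed number of steps still reaches the earlier steps through the return, if the intermediate steps are simply given a reward of 0. A minimal Python sketch, assuming a hypothetical three-step episode and a tabular every-visit Monte Carlo update (one option among several, not necessarily the best fit for this problem):

```python
from typing import Dict, List, Tuple


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """G_t = R_{t+1} + gamma * G_{t+1}, computed backwards.

    rewards[t] is the reward received after the action taken at step t.
    """
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def mc_update(
    Q: Dict[Tuple[int, int], float],
    states: List[int],
    actions: List[int],
    rewards: List[float],
    alpha: float,
    gamma: float,
) -> None:
    """Move each visited Q(s, a) toward the return that followed it."""
    for s, a, g in zip(states, actions, discounted_returns(rewards, gamma)):
        q = Q.get((s, a), 0.0)
        Q[(s, a)] = q + alpha * (g - q)
```

For rewards `[0, 0, 10]` and `gamma = 0.9`, the returns are roughly `[8.1, 9.0, 10.0]`, so the first step is credited even though it received no immediate reward.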
6 votes, 1 answer
Reinforcement Learning with more actions than states
I have read a lot about RL recently. As far as I understand, most RL applications have many more states than actions to choose from.
I am thinking about using RL for a problem where I have got a lot of actions to choose from, but only very…
Jan
6 votes, 4 answers
Which machine learning algorithm is used in self-driving cars?
Which deep neural network is used in Google's driverless cars to analyze the surroundings? Is this information open to the public?
kenorb
6 votes, 1 answer
Why is a constant plane of ones added to the input features of AlphaGo?
In the paper "Mastering the game of Go with deep neural networks and tree search", the input features of AlphaGo's networks contain a plane of constant ones and a plane of constant zeros, as follows:
Feature | # of planes | Description
Stone…
Yangcy
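One explanation commonly given is that the networks use zero-padded convolutions, so a plane of ones lets a filter distinguish on-board cells from the padding around the edge, while a plane of zeros looks the same as the padding. A minimal pure-Python sketch, assuming a single 3x3 all-ones filter and a small hypothetical 5x5 board instead of 19x19:

```python
from typing import List

Plane = List[List[float]]


def conv3x3_zero_pad(plane: Plane) -> Plane:
    """Sum each 3x3 neighbourhood, treating cells outside the board as 0 (zero padding)."""
    n_rows, n_cols = len(plane), len(plane[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        total += plane[rr][cc]
            out[r][c] = total
    return out


ones = [[1.0] * 5 for _ in range(5)]
zeros = [[0.0] * 5 for _ in range(5)]
edge_map = conv3x3_zero_pad(ones)   # corners give 4, edges 6, interior 9
blank_map = conv3x3_zero_pad(zeros)  # all 0: indistinguishable from padding
```

The ones plane therefore gives the first layer a way to tell corners, edges and interior apart, information the stone planes alone do not carry on an empty region of the board.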
6 votes, 1 answer
What is the relation between a policy which is the solution to an MDP and a policy like $\epsilon$-greedy?
In the context of reinforcement learning, a policy, $\pi$, is often defined as a function from the space of states, $\mathcal{S}$, to the space of actions, $\mathcal{A}$, that is, $\pi : \mathcal{S} \rightarrow \mathcal{A}$. This function is the…
nbro
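An $\epsilon$-greedy policy is stochastic: it maps a state to a distribution over actions rather than to a single action. A minimal Python sketch, assuming a tabular action-value function stored as a dict from each state to a list of action values (a hypothetical layout):

```python
import random
from typing import Dict, List


def epsilon_greedy_probs(q_values: List[float], epsilon: float) -> List[float]:
    """pi(a|s): epsilon spread uniformly over actions, the rest on the greedy action."""
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])  # ties go to the lowest index
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


def epsilon_greedy_action(q_values: List[float], epsilon: float, rng: random.Random) -> int:
    """Sample an action from the epsilon-greedy distribution."""
    probs = epsilon_greedy_probs(q_values, epsilon)
    return rng.choices(list(range(len(q_values))), weights=probs, k=1)[0]


def greedy_policy(Q: Dict[int, List[float]]) -> Dict[int, int]:
    """Deterministic pi: S -> A, i.e. the epsilon = 0 special case."""
    return {s: max(range(len(qs)), key=lambda a: qs[a]) for s, qs in Q.items()}
```

With `epsilon = 0` the distribution collapses onto the greedy action, which recovers the deterministic $\pi : \mathcal{S} \rightarrow \mathcal{A}$ form.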
6 votes, 1 answer
Can TD($\lambda$) be used with deep reinforcement learning?
TD($\lambda$) is a way to interpolate between TD(0), bootstrapping over a single step, and TD(max), bootstrapping over the entire episode length, i.e. Monte Carlo.
Reading the link above, I see that an eligibility trace is kept for each state in order…
Gulzar
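For reference, this is the tabular setting with one eligibility trace per state that the question describes. A minimal Python sketch of online TD($\lambda$) with accumulating traces for state values, assuming a hypothetical episode format of `(state, reward, next_state, done)` tuples:

```python
from typing import Dict, List, Tuple

Transition = Tuple[int, float, int, bool]  # (state, reward, next_state, done)


def td_lambda_episode(
    V: Dict[int, float],
    episode: List[Transition],
    alpha: float,
    gamma: float,
    lam: float,
) -> None:
    """Online tabular TD(lambda) with accumulating traces; updates V in place."""
    e: Dict[int, float] = {}  # eligibility trace for each state seen so far
    for s, r, s_next, done in episode:
        v_next = 0.0 if done else V.get(s_next, 0.0)
        delta = r + gamma * v_next - V.get(s, 0.0)  # one-step TD error
        e[s] = e.get(s, 0.0) + 1.0                  # accumulate the trace of s
        for state in list(e):
            V[state] = V.get(state, 0.0) + alpha * delta * e[state]
            e[state] *= gamma * lam                 # decay every trace
```

With `lam = 0` only the current state is updated (TD(0)); with `lam = 1` the final reward also reaches earlier states within the same episode. With function approximation, Sutton & Barto's semi-gradient TD($\lambda$) keeps a trace vector with one entry per weight rather than per state.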
6 votes, 2 answers
Do rollout algorithms like Monte Carlo search suggest model-based reinforcement learning?
From what I understand, the Monte Carlo Tree Search algorithm is a solution algorithm for model-free reinforcement learning (RL).
Model-free RL means the agent doesn't know the transition and reward models. Thus, for it to know which next state it will observe…
user21872
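Whatever label one attaches to it, a rollout can only be run if something can answer "what happens if I take action a in state s?". A minimal Python sketch of rollout-based action evaluation with a uniformly random rollout policy, assuming a hypothetical simulator function `step` that plays that role (a sample model of the environment):

```python
import random
from typing import Callable, List, Tuple

State = int
Action = int
# Hypothetical simulator: (state, action, rng) -> (next_state, reward, done)
Simulator = Callable[[State, Action, random.Random], Tuple[State, float, bool]]


def rollout_value(
    step: Simulator, state: State, first_action: Action, actions: List[Action],
    rng: random.Random, n_rollouts: int = 100, horizon: int = 50, gamma: float = 1.0,
) -> float:
    """Average return of taking first_action, then acting uniformly at random."""
    total = 0.0
    for _ in range(n_rollouts):
        s, action, g, discount = state, first_action, 0.0, 1.0
        for _ in range(horizon):
            s, r, done = step(s, action, rng)  # every step queries the simulator
            g += discount * r
            discount *= gamma
            if done:
                break
            action = rng.choice(actions)       # random rollout policy
        total += g
    return total / n_rollouts


def best_action(
    step: Simulator, state: State, actions: List[Action], rng: random.Random,
    n_rollouts: int = 100,
) -> Action:
    """Greedy choice over rollout estimates for each candidate first action."""
    return max(actions, key=lambda a: rollout_value(step, state, a, actions, rng, n_rollouts))
```

Nothing in this loop touches a real environment, but every simulated transition comes from `step`, which is exactly the knowledge of next states and rewards the question is asking about.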
6 votes, 3 answers
Are neural networks statistical models?
Judging by the abstract of the paper "Neural Networks and Statistical Models", it would seem that ANNs are statistical models.
In contrast, "Machine Learning is not just glorified Statistics".
I am looking for a more concise/summarized answer with focus on…
Leo Gallucci
6 votes, 2 answers
What does learning mean?
Can someone explain what the process of learning is? What does it mean to learn something?
Jay Critch
5 votes, 1 answer
Is it possible to combine two neural networks trained on different tasks into one that knows both tasks?
I'm relatively new to artificial intelligence and neural networks.
Let's say I have two different fully trained neural networks. The first one is trained on mathematical addition and the second one on mathematical multiplication. Now, I want to…
vP3nguin