Most Popular

1500 questions
7 votes, 2 answers

How are the reward functions $R(s)$, $R(s, a)$ and $R(s, a, s')$ equivalent?

In this video, the lecturer states that $R(s)$, $R(s, a)$ and $R(s, a, s')$ are equivalent representations of the reward function. Intuitively, this is the case, according to the same lecturer, because $s$ can be made to represent the state and the…
nbro (reputation 40,472; badges: 12, 105, 192)
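One direction of the equivalence can be made concrete: a reward $R(s, a, s')$ can be folded into $R(s, a)$ by taking its expectation over the next state, $R(s, a) = \sum_{s'} P(s' \mid s, a)\, R(s, a, s')$, which leaves expected returns unchanged. A minimal sketch, assuming a small tabular MDP stored in plain dictionaries (the names `P`, `expected_reward` and `r_sas` are hypothetical):

```python
from typing import Callable, Dict, Hashable, Tuple

State = Hashable
Action = Hashable

# Hypothetical transition model: P[(s, a)] maps each next state s' to P(s' | s, a).
P: Dict[Tuple[State, Action], Dict[State, float]] = {
    ("s", "a"): {"s1": 0.25, "s2": 0.75},
}

def expected_reward(
    r_sas: Callable[[State, Action, State], float],
    s: State,
    a: Action,
) -> float:
    """R(s, a) = sum over s' of P(s' | s, a) * R(s, a, s')."""
    return sum(p * r_sas(s, a, s_next) for s_next, p in P[(s, a)].items())

# Hypothetical reward that depends on the next state.
def r_sas(s: State, a: Action, s_next: State) -> float:
    return 4.0 if s_next == "s1" else 8.0

print(expected_reward(r_sas, "s", "a"))  # 0.25 * 4.0 + 0.75 * 8.0 = 7.0
```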
7 votes, 2 answers

What's the role of bounding boxes in object detection?

I'm quite new to the field of computer vision and was wondering what the purposes of having bounding boxes in object detection are. Obviously, a box shows where the detected object is, and a classifier can only classify one object per image,…
Cody Chung (reputation 173; badges: 5)
6 votes, 2 answers

Can neural networks learn to ignore an input datum?

Disclaimer: I'm not a computer science student, and most of my knowledge about ML/NN comes from YouTube, so please bear with me! Let's say we have a classification neural network that takes some input data $w, x, y, z$, and has some number of…
czz1850 (reputation 61; badges: 1)
6 votes, 2 answers

How can the importance sampling ratio be different from zero when the target policy is deterministic?

In the book Reinforcement Learning: An Introduction (2nd edition), Sutton and Barto define, on page 104 (p. 126 of the PDF), in equation (5.3), the importance sampling ratio $\rho_{t:T-1}$ as follows: $$\rho…
F.M.F. (reputation 311; badges: 3, 7)
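In Sutton and Barto's definition, the ratio is the product $\prod_{k=t}^{T-1} \pi(A_k \mid S_k) / b(A_k \mid S_k)$. A minimal sketch, assuming a tabular setting where both policies are plain Python functions returning action probabilities (all names here are hypothetical), shows what happens with a deterministic target $\pi$: each factor is either $0$ or $1 / b(A_k \mid S_k)$, so the ratio is nonzero exactly when every action in the trajectory agrees with $\pi$.

```python
from typing import Callable, Hashable, Sequence, Tuple

State = Hashable
Action = Hashable
Policy = Callable[[State, Action], float]  # returns the probability pi(a | s)

def importance_sampling_ratio(
    trajectory: Sequence[Tuple[State, Action]],
    target: Policy,
    behaviour: Policy,
) -> float:
    """rho_{t:T-1}: product of target(A_k|S_k) / behaviour(A_k|S_k) over the trajectory."""
    rho = 1.0
    for s, a in trajectory:
        b = behaviour(s, a)
        if b == 0.0:
            # Off-policy learning assumes coverage: b(a|s) > 0 for every action taken.
            raise ValueError(f"behaviour policy gives zero probability to {a!r} in {s!r}")
        rho *= target(s, a) / b
    return rho

# Hypothetical deterministic (greedy) target policy and uniform behaviour policy.
GREEDY = {"s0": "left", "s1": "right"}

def target(s: State, a: Action) -> float:
    return 1.0 if GREEDY[s] == a else 0.0

def behaviour(s: State, a: Action) -> float:
    return 0.5  # two actions, chosen uniformly at random

print(importance_sampling_ratio([("s0", "left"), ("s1", "right")], target, behaviour))  # 4.0
print(importance_sampling_ratio([("s0", "left"), ("s1", "left")], target, behaviour))   # 0.0
```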
6 votes, 1 answer

Would machine learning be suitable for finding the seed of a random number generator?

I'm new to machine learning, and AI in general (but with 20+ years of programming experience). I'm wondering whether machine learning is a good general approach to finding the seed of a random number generator. Suppose I have a list of 2000 numbers. Is there a…
Eden (reputation 163; badges: 4)
6 votes, 3 answers

How to train a logical XOR with reinforcement learning?

After reading the excellent blog post "Deep Reinforcement Learning: Pong from Pixels" and playing with the code a little, I tried something simple: using the same code to train a logical XOR gate. But no matter how I tuned the hyperparameters,…
Dimagog (reputation 119; badges: 4)
6 votes, 3 answers

What is a high-dimensional state in reinforcement learning?

In the DQN paper, it is written that the state space is high-dimensional. I am a little confused about this terminology. Suppose my state is a high-dimensional vector of length $N$, where $N$ is a huge number. Let's say I solve this task using…
Siddhant Tandon (reputation 163; badges: 1, 5)
6 votes, 1 answer

How should we choose the dimensions of the encoding layer in auto-encoders?
Neha soni (reputation 101; badges: 3)
6 votes, 2 answers

What is the difference between imitation learning and classification done by experts?

In short, imitation learning means learning from experts. Suppose I have a dataset with labels based on the actions of experts. I use a simple binary classifier to assess whether an expert action is good or bad. How is…
user781486 (reputation 201; badges: 2, 5)
6 votes, 1 answer

What does it mean to do multi-dimensional processing with tensors in tensor cores?

In some tweets about NeurIPS 2018, this video from NVIDIA appeared. At around 0:37, she says: If you think about the current computations in our deep learning systems, they are all based on Linear Algebra. Can we come up with better paradigms to do…
wrong_path (reputation 161; badges: 6)
6 votes, 5 answers

Why can't the XOR linear inseparability problem be solved with one perceptron like this?

Consider a perceptron where $w_0=1$ and $w_1=1$. Now, suppose that we use the following activation function: \begin{align} f(x)= \begin{cases} 1, & \text{if } x = 1\\ 0, & \text{otherwise} \end{cases} \end{align} The output is then summarised…
rahs (reputation 163; badges: 4)
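A quick way to check the construction in the question is to evaluate it on all four inputs. This is a minimal sketch, assuming two binary inputs $x_0, x_1$ weighted by $w_0, w_1$ and no bias term (the function names are hypothetical):

```python
def f(x: int) -> int:
    """The activation from the question: 1 if x == 1, otherwise 0."""
    return 1 if x == 1 else 0

def perceptron(x0: int, x1: int, w0: int = 1, w1: int = 1) -> int:
    # Weighted sum without a bias term, followed by the activation f.
    return f(w0 * x0 + w1 * x1)

for x0 in (0, 1):
    for x1 in (0, 1):
        print(x0, x1, perceptron(x0, x1))
# 0 0 0
# 0 1 1
# 1 0 1
# 1 1 0
```

The table matches XOR. Note that $f$ is not a monotonic threshold: it fires only on the line $w_0 x_0 + w_1 x_1 = 1$, whereas the classic linear-inseparability argument concerns a unit whose output is 1 on one whole side of a hyperplane.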
6 votes, 4 answers

Can machine learning be applied to decipher the scripts of lost ancient languages?

Can machine learning be applied to decipher the scripts of lost ancient languages (that is, languages that were used long ago but are no longer used in human societies and have been forgotten, e.g. the Avestan language)? If yes, is there…
Questioner (reputation 293; badges: 1, 10)
6 votes, 3 answers

How to deal with episode termination in the Advantage Actor-Critic algorithm?

The Advantage Actor-Critic algorithm may use the following expression to get a 1-step estimate of the advantage: $A(s_t, a_t) = r(s_t, a_t) + \gamma V(s_{t+1}) (1 - done_{t+1}) - V(s_t)$, where $done_{t+1}=1$ if $s_{t+1}$ is a terminal state (end of the…
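The quoted 1-step estimate translates directly into code. A minimal sketch over a batch of transitions in plain Python (the function name and argument layout are assumptions for illustration): the $(1 - done_{t+1})$ factor zeroes the bootstrap term, so a transition into a terminal state gets advantage $r(s_t, a_t) - V(s_t)$.

```python
from typing import List, Sequence

def one_step_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    next_values: Sequence[float],
    dones: Sequence[int],
    gamma: float = 0.99,
) -> List[float]:
    """A(s_t, a_t) = r_t + gamma * V(s_{t+1}) * (1 - done_{t+1}) - V(s_t) per transition."""
    return [
        r + gamma * v_next * (1 - d) - v
        for r, v, v_next, d in zip(rewards, values, next_values, dones)
    ]

# Two hypothetical transitions: the second one ends the episode (done = 1),
# so its bootstrap term is dropped and the advantage is r - V(s_t).
adv = one_step_advantages(
    rewards=[1.0, 1.0],
    values=[0.5, 0.5],
    next_values=[2.0, 2.0],
    dones=[0, 1],
    gamma=0.9,
)
print(adv)  # approximately [2.3, 0.5]
```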
6 votes, 1 answer

How is iterative deepening A* better than A*?

The iterative deepening A* search is an algorithm that can find the shortest path between a designated start node and any member of a set of goals. The A* algorithm evaluates nodes by combining the cost to reach the node and the cost to get from…
Huma Qaseem (reputation 189; badges: 1, 3, 12)
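To see where the memory advantage of IDA* comes from, here is a minimal sketch in Python, assuming a graph given by a `neighbours` function that yields `(node, edge_cost)` pairs and an admissible heuristic `h` (all names and the example graph are hypothetical). Each iteration is a depth-first search bounded by $f = g + h$, and only the current path is stored, so memory grows linearly with solution depth rather than with the size of A*'s open list; the price is that nodes are re-expanded across iterations.

```python
import math
from typing import Callable, Hashable, Iterable, List, Optional, Tuple

Node = Hashable

def ida_star(
    start: Node,
    is_goal: Callable[[Node], bool],
    neighbours: Callable[[Node], Iterable[Tuple[Node, float]]],
    h: Callable[[Node], float],
) -> Tuple[Optional[List[Node]], float]:
    """Return (path, cost) of a cheapest path to a goal, or (None, inf) if none exists."""
    path: List[Node] = [start]  # the only search state kept: the current path
    bound = h(start)

    def search(g: float) -> Tuple[float, bool]:
        node = path[-1]
        f = g + h(node)
        if f > bound:
            return f, False  # prune; report the f-value that exceeded the bound
        if is_goal(node):
            return g, True
        smallest = math.inf
        for nxt, cost in neighbours(node):
            if nxt in path:  # skip cycles along the current path
                continue
            path.append(nxt)
            t, found = search(g + cost)
            if found:
                return t, True
            path.pop()
            smallest = min(smallest, t)
        return smallest, False

    while True:
        t, found = search(0)
        if found:
            return list(path), t
        if t == math.inf:
            return None, math.inf
        bound = t  # next threshold: the smallest f-value that was pruned

GRAPH = {"A": [("B", 1), ("C", 4)], "B": [("C", 1), ("D", 5)], "C": [("D", 1)], "D": []}
H = {"A": 2, "B": 2, "C": 1, "D": 0}  # admissible: never exceeds the true distance to D

print(ida_star("A", lambda n: n == "D", lambda n: GRAPH[n], lambda n: H[n]))
# (['A', 'B', 'C', 'D'], 3)
```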
6 votes, 2 answers

What are the differences in scope between statistical AI and classical AI?

What are the differences in scope between statistical AI and classical AI? Real-world examples would be appreciated.
dua fatima (reputation 323; badges: 1, 3, 10)