Most Popular

1500 questions
8
votes
2 answers

How to deal with large (or NaN) neural network weights?

My weights go from being between 0 and 1 at initialization to exploding into the tens of thousands in the next iteration. By the third iteration, they become so large that only arrays of NaN values are displayed. How can I go about fixing this? Is it…
8
votes
5 answers

Could curiosity improve artificial intelligence?

While thinking about AI, this question came to mind. Could curiosity help in developing a true AI? According to this website (for testing creativity): Curiosity in this context refers to a persistent desire to learn and discover new things and…
Eka
  • 1,066
  • 8
  • 24
8
votes
6 answers

What is the difference between artificial intelligence and robots?

Vishnu JK
  • 1,072
  • 1
  • 9
  • 21
8
votes
2 answers

What does the symbol $\mathbb E$ mean in these equations?

I came across some papers that use $\mathbb E$ in equations, in particular, this paper: https://arxiv.org/pdf/1511.06581.pdf. Here are some equations from the paper that use it: $Q^\pi \left(s,a \right) = \mathbb E \left[R_t|s_t = s, a_t = a, \pi…
theonekeyg
  • 91
  • 1
  • 4
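
For readers skimming this entry: $\mathbb E$ is the standard notation for the expected value (the probability-weighted average) of a random variable. A minimal worked form, assuming a discrete random variable $X$ and discrete returns $R_t$ (the discreteness is an assumption made here only to keep the sums simple):

$$\mathbb E[X] = \sum_x x \, P(X = x)$$

$$Q^\pi \left(s,a \right) = \mathbb E \left[R_t \mid s_t = s, a_t = a, \pi \right] = \sum_r r \, P\left(R_t = r \mid s_t = s, a_t = a, \pi \right)$$

For example, for a fair six-sided die, $\mathbb E[X] = \frac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = 3.5$.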
8
votes
1 answer

An intuitive explanation of Adagrad, its purpose and its formula

It (Adagrad) adapts the learning rate to the parameters, performing smaller updates (i.e. low learning rates) for parameters associated with frequently occurring features, and larger updates (i.e. high learning rates) for parameters associated…
DaddyMike
  • 123
  • 7
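
The behaviour described in the excerpt above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `adagrad_step`, the learning rate `lr=0.1`, the stabilizer `eps`, and the toy gradients are all assumptions chosen for the example.

```python
import math

def adagrad_step(params, grads, accum, lr=0.1, eps=1e-8):
    """One Adagrad update over plain Python lists (hypothetical helper)."""
    new_params, new_accum = [], []
    for p, g, a in zip(params, grads, accum):
        a = a + g * g                     # running sum of squared gradients
        step = lr / (math.sqrt(a) + eps)  # per-parameter effective learning rate
        new_params.append(p - step * g)
        new_accum.append(a)
    return new_params, new_accum

# Toy setup: parameter 0 belongs to a frequent feature (gradient at every step),
# parameter 1 belongs to a rare feature (gradient only at the first step).
params = [0.0, 0.0]
accum = [0.0, 0.0]
for t in range(10):
    grads = [1.0, 1.0 if t == 0 else 0.0]
    params, accum = adagrad_step(params, grads, accum)

# Effective learning rate each parameter would see on its next update.
effective_lr = [0.1 / (math.sqrt(a) + 1e-8) for a in accum]
print(effective_lr)
```

The frequently updated parameter accumulates a larger sum of squared gradients, so its effective learning rate ends up smaller than that of the rarely updated one.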
8
votes
3 answers

Does this argument refuting the existence of superintelligence work?

A superintelligence is a machine that can surpass any human in all intellectual activities, and such a machine is often portrayed in science fiction as a machine that brings mankind to an end. Any machine is executed using an algorithm. By the…
wythagoras
  • 1,521
  • 12
  • 28
8
votes
3 answers

Is there any research on the application of AI for drug design?

Is there any research on the application of AI for drug design? For example, you could train a deep learning model on current compounds, substances, structures, and their products and chemical reactions from the existing dataset (basically what…
kenorb
  • 10,483
  • 3
  • 44
  • 94
8
votes
2 answers

What is the appropriate way to deal with multiple paths to the same state in MCTS?

Many games have multiple paths to the same states. What is the appropriate way to deal with this in MCTS? If the state appears once in the tree, but with multiple parents, then it seems to be difficult to define back propagation: do we only…
Jay McCarthy
  • 225
  • 1
  • 4
8
votes
1 answer

Does AlphaZero use Q-Learning?

I was reading the AlphaZero paper Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, and it seems they don't mention Q-Learning anywhere. So does AZ use Q-Learning on the results of self-play or just a Supervised…
Avetik
  • 115
  • 6
8
votes
1 answer

What is the difference between a stationary and a non-stationary policy?

In reinforcement learning, there are deterministic and non-deterministic (or stochastic) policies, but there are also stationary and non-stationary policies. What is the difference between a stationary and a non-stationary policy? How do you…
nbro
  • 40,472
  • 12
  • 105
  • 192
8
votes
1 answer

Can a single neural network handle recognizing two types of objects, or should it be split into two smaller networks?

In particular, an embedded computer (with limited resources) analyzes a live video stream from a traffic camera, trying to pick good frames that contain license plate numbers of passing cars. Once a plate is located, the frame is handed over to an OCR…
SF.
  • 464
  • 3
  • 13
8
votes
1 answer

What is the purpose of the actor in actor-critic algorithms?

For discrete action spaces, what is the purpose of the actor in actor-critic algorithms? My current understanding is that the critic estimates the future reward given an action, so why not just take the action that maximizes the estimated…
David Rein
  • 93
  • 6
8
votes
3 answers

What is the difference between a stochastic and a deterministic policy?

In reinforcement learning, there are the concepts of stochastic (or probabilistic) and deterministic policies. What is the difference between them?
nbro
  • 40,472
  • 12
  • 105
  • 192
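
As a rough illustration of the distinction asked about above: a deterministic policy maps each state to exactly one action, while a stochastic policy maps each state to a probability distribution over actions and samples from it. The states (`s0`, `s1`), actions (`left`, `right`), and probabilities below are made up for the sketch.

```python
import random

# Deterministic policy: a fixed state -> action lookup (hypothetical toy table).
ACTION_TABLE = {"s0": "left", "s1": "right"}

def deterministic_policy(state):
    return ACTION_TABLE[state]

# Stochastic policy: a state -> distribution over actions (hypothetical numbers).
ACTION_PROBS = {
    "s0": {"left": 0.8, "right": 0.2},
    "s1": {"left": 0.5, "right": 0.5},
}

def stochastic_policy(state, rng):
    actions = list(ACTION_PROBS[state])
    weights = [ACTION_PROBS[state][a] for a in actions]
    # Sample one action according to the state's action probabilities.
    return rng.choices(actions, weights=weights, k=1)[0]

rng = random.Random(0)
det_samples = {deterministic_policy("s0") for _ in range(200)}
sto_samples = {stochastic_policy("s0", rng) for _ in range(200)}
print(det_samples, sto_samples)
```

Calling the deterministic policy repeatedly in the same state always yields the same action, whereas the stochastic policy can yield different actions on different calls.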
8
votes
2 answers

How do we define the reward function for an environment?

How do you actually decide what reward value to give for each action in a given state for an environment? Is this purely experimental and down to the programmer of the environment? So, is it a heuristic approach of simply trying different reward…
Hazzaldo
  • 279
  • 2
  • 9
8
votes
1 answer

Which machine learning models are universal function approximators?

The universal approximation theorem states that a feed-forward neural network with a single hidden layer containing a finite number of neurons can approximate any continuous function (provided some assumptions on the activation function are…
nbro
  • 40,472
  • 12
  • 105
  • 192