Most Popular

1500 questions
6 votes · 1 answer

Why do feedforward neural networks require the inputs to be of a fixed size, while RNNs can process variable-size inputs?

Why does a vanilla feedforward neural network only accept a fixed input size, while RNNs are capable of taking a series of inputs with no predetermined limit on the size? Can anyone elaborate on this with an example?
Daniel
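A minimal sketch of the answer the question is after: a feedforward layer's weight matrix fixes the input length, while an RNN reuses one fixed-size cell at every time step. All shapes and names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# A feedforward layer's weight matrix pins down the input size:
# W_ff has shape (4, 8), so x must have exactly 4 features.
W_ff = rng.normal(size=(4, 8))

def feedforward(x):
    return np.tanh(x @ W_ff)  # raises unless x has length 4

# An RNN applies the same cell (fixed-size weights) at each step,
# so it can consume a sequence of any length.
W_xh = rng.normal(size=(4, 8))  # input -> hidden
W_hh = rng.normal(size=(8, 8))  # hidden -> hidden

def rnn(sequence):
    h = np.zeros(8)
    for x_t in sequence:          # one step per sequence element
        h = np.tanh(x_t @ W_xh + h @ W_hh)
    return h                      # same-shaped summary for any length

short = rng.normal(size=(3, 4))   # 3 time steps
long = rng.normal(size=(10, 4))   # 10 time steps
print(rnn(short).shape, rnn(long).shape)  # (8,) (8,)
```

The key point: the RNN's parameter shapes depend only on the per-step feature size, never on the sequence length.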
6 votes · 1 answer

How to improve the reward signal when the rewards are sparse?

In cases where the reward is delayed, this can negatively impact a model's ability to do proper credit assignment. In the case of a sparse reward, are there ways in which this can be mitigated? In a chess example, there are certain moves that you can…
tryingtolearn
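One standard remedy the question invites is potential-based reward shaping (Ng et al., 1999), which adds a dense signal without changing the optimal policy. The potential function below is an illustrative stand-in, not anything from the question.

```python
# Potential-based shaping: add F(s, s') = gamma * phi(s') - phi(s)
# to the sparse environment reward. In chess, phi might be a
# material-count heuristic; here it is a toy distance-to-goal.

GAMMA = 0.99

def phi(state):
    # illustrative potential: closer to the goal value 10 is better
    return -abs(10 - state)

def shaped_reward(state, next_state, env_reward):
    return env_reward + GAMMA * phi(next_state) - phi(state)

# Moving toward the goal yields a positive learning signal even
# while the environment reward is still 0.
print(shaped_reward(3, 4, 0.0) > 0)  # True
```

Because the shaping term telescopes along any trajectory, it leaves optimal policies unchanged while densifying the gradient of learning.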
6 votes · 1 answer

What are the state space and the state transition function in AI?

I'm studying for my AI final exam, and I'm stuck on the state space representation. I understand initial and goal states, but what I don't understand is the state space and state transition function. Can someone explain what they are with…
İsmail Uysal
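A tiny worked example of the two concepts, using an illustrative 3x3 grid world (not from the question): the state space is the set of all reachable configurations, and the transition function maps a (state, action) pair to the next state.

```python
# State space: every cell of a 3x3 grid.
# Transition function: (state, action) -> next state.

SIZE = 3
STATE_SPACE = [(r, c) for r in range(SIZE) for c in range(SIZE)]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def transition(state, action):
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if 0 <= r < SIZE and 0 <= c < SIZE:
        return (r, c)
    return state  # bumping into a wall leaves the state unchanged

initial, goal = (0, 0), (2, 2)
print(len(STATE_SPACE))             # 9 states
print(transition((0, 0), "right"))  # (0, 1)
print(transition((0, 0), "up"))     # (0, 0): blocked by the wall
```

Search then amounts to finding an action sequence whose composed transitions take `initial` to `goal`.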
6 votes · 1 answer

What are the advantages of RL with actor-critic methods over actor-only methods?

In general, what are the advantages of RL with actor-critic methods over actor-only (or policy-based) methods? This is not a comparison with the Q-learning family, but rather with methods that learn using only the actor. I think it's…
ground clown
6 votes · 1 answer

How to express a fully connected neural network succinctly using linear algebra?

I'm currently reading the paper Federated Learning with Matched Averaging (2020), where the authors claim: A basic fully connected (FC) NN can be formulated as: $\hat{y} = \sigma(xW_1)W_2$ [...] Expanding the preceding expression $\hat{y} =…
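A minimal sketch of the formulation quoted from the paper, $\hat{y} = \sigma(xW_1)W_2$: a one-hidden-layer fully connected network written as two matrix products (biases omitted, as in the excerpt). The dimensions below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))    # one input as a row vector
W1 = rng.normal(size=(4, 8))   # input -> hidden weights
W2 = rng.normal(size=(8, 2))   # hidden -> output weights

sigma = np.tanh                # any elementwise nonlinearity

# y_hat = sigma(x W1) W2, exactly the paper's expression
y_hat = sigma(x @ W1) @ W2
print(y_hat.shape)  # (1, 2)
```

Stacking more layers just interleaves more weight matrices and nonlinearities: $\sigma(\sigma(xW_1)W_2)W_3$, and so on.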
6 votes · 2 answers

How does AlphaZero's MCTS work when starting from the root node?

From the AlphaGo Zero paper, during MCTS, statistics for each new node are initialized as $N(s_L, a) = 0,\ W(s_L, a) = 0,\ Q(s_L, a) = 0,\ P(s_L, a) = p_a$. The PUCT algorithm for selecting the best child node is $a_t = \operatorname{argmax}(Q(s,a) +…$
sb3
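A hedged sketch of the PUCT child selection the question refers to, assuming the usual form $a^* = \operatorname{argmax}_a\,[\,Q(s,a) + c_{\text{puct}}\,P(s,a)\,\sqrt{\sum_b N(s,b)}\,/\,(1 + N(s,a))\,]$. The constant and statistics below are illustrative, not the paper's values.

```python
import math

C_PUCT = 1.5  # illustrative exploration constant

def select_action(children):
    # children: {action: {"N": visits, "Q": mean value, "P": prior}}
    total_n = sum(c["N"] for c in children.values())
    def puct(c):
        u = C_PUCT * c["P"] * math.sqrt(total_n) / (1 + c["N"])
        return c["Q"] + u
    return max(children, key=lambda a: puct(children[a]))

children = {
    "a1": {"N": 10, "Q": 0.30, "P": 0.4},
    "a2": {"N": 2,  "Q": 0.25, "P": 0.5},  # higher prior, fewer visits
}
print(select_action(children))  # a2: its exploration bonus dominates
```

Note what happens at a freshly expanded root: with all $N = 0$ the $\sqrt{\sum_b N}$ factor is zero and every score ties at 0, which is why implementations typically break ties by prior or add Dirichlet noise at the root.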
6 votes · 2 answers

What is the Bellman equation actually telling us?

What does the Bellman equation actually say? And are there many flavours of it? I get a little confused when I look for the Bellman equation, because people seem to say slightly different things about what it is. And I think the…
Johnny
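The "flavours" the asker notices are usually the expectation form (for a fixed policy $\pi$) versus the optimality form, each written for either $V$ or $Q$. For the state-value function, in standard MDP notation, they read:

```latex
% Bellman expectation equation (fixed policy \pi):
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma V^{\pi}(s')\bigr]

% Bellman optimality equation:
V^{*}(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma V^{*}(s')\bigr]
```

Both say the same thing: the value of a state equals the expected immediate reward plus the discounted value of the successor state, with the optimality form replacing the policy average by a max over actions.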
6 votes · 1 answer

What techniques are used to make MDP discrete state space manageable?

Generating a discretized state space for an MDP (Markov Decision Process) model seems to suffer from the curse of dimensionality. Suppose my state has a few simple features: Feeling: Happy/Neutral/Sad Feeling: Hungry/Neither/Full Food left:…
Brendan Hill
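The blow-up the question describes is just the Cartesian product of the feature values; its size multiplies with each feature. A quick sketch (the "Food left" levels are illustrative, since the excerpt is truncated):

```python
from itertools import product

feeling = ["Happy", "Neutral", "Sad"]
hunger = ["Hungry", "Neither", "Full"]
food_left = ["None", "Some", "Plenty"]

# The discrete state space is the Cartesian product of feature values.
states = list(product(feeling, hunger, food_left))
print(len(states))  # 27 = 3 * 3 * 3

# Each additional 3-valued feature multiplies the count by 3:
print(3 ** 10)  # 59049 states for just ten such features
```

This is why techniques like state aggregation, tile coding, and function approximation are used: they avoid enumerating the product space explicitly.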
6 votes · 1 answer

During neural network training, can gradients leak sensitive information if the training data is homomorphically encrypted?

Some algorithms in the literature allow recovering the input data used to train a neural network. This is done using the gradients (updates) of weights, such as in Deep Leakage from Gradients (2019) by Ligeng Zhu et al. In case the neural network is…
witdev
6 votes · 2 answers

Is there a proof to explain why XOR cannot be linearly separable?

Can someone explain to me with a proof or example why you can't linearly separate XOR (and therefore need a neural network, the context I'm looking at it in)? I understand why it's not linearly separable if you draw it graphically (e.g. here), but I…
Slowat_Kela
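A short version of the standard contradiction proof the question asks for, assuming the threshold-at-zero convention ($f > 0$ for class 1, $f < 0$ for class 0):

```latex
% Suppose a linear separator  f(x_1, x_2) = w_1 x_1 + w_2 x_2 + b
% correctly classifies XOR. The four inputs impose:
\begin{aligned}
(0,0) \mapsto 0 &: \quad b < 0 \\
(1,0) \mapsto 1 &: \quad w_1 + b > 0 \;\Rightarrow\; w_1 > -b \\
(0,1) \mapsto 1 &: \quad w_2 + b > 0 \;\Rightarrow\; w_2 > -b \\
(1,1) \mapsto 0 &: \quad w_1 + w_2 + b < 0
\end{aligned}
% But the middle two constraints give
%   w_1 + w_2 + b > (-b) + (-b) + b = -b > 0,
% contradicting the last line. Hence no such w_1, w_2, b exist.
```

This is why at least one hidden layer with a nonlinearity is needed: it maps the four points into a space where they become linearly separable.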
6 votes · 1 answer

What kind of algorithm is the Levenberg–Marquardt algorithm?

Is the Levenberg–Marquardt algorithm a type of back-propagation algorithm, or is it a different category of algorithm? Wikipedia says that it is a curve-fitting algorithm. How is a curve-fitting algorithm relevant to a neural net?
user3642
6 votes · 0 answers

$\frac{P(x_1 \mid y, s = 1) \dots P(x_n \mid y, s = 1) P(y \mid s = 1)}{P(x \mid s = 1)}$ indicates that naive Bayes learners are global learners?

I am currently studying the paper Learning and Evaluating Classifiers under Sample Selection Bias by Bianca Zadrozny. In section 3. Learning under sample selection bias, the author says the following: We can separate classifier learners into two…
The Pointer
6 votes · 1 answer

How are continuous actions sampled (or generated) from the policy network in PPO?

I am trying to understand and reproduce the Proximal Policy Optimization (PPO) algorithm in detail. One thing that I find missing in the paper introducing the algorithm is how exactly actions $a_t$ are generated given the policy network…
Daniel B.
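A hedged sketch of the mechanism commonly used with PPO for continuous actions (and consistent with reference implementations, though the paper itself leaves it implicit): the network outputs a mean $\mu(s)$, a state-independent log-std is learned, and actions are sampled as $a = \mu + \sigma \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, 1)$. All names and values below are illustrative.

```python
import math
import random

LOG_STD = -0.5  # a learned parameter in practice; fixed here

def policy_mean(state):
    # stand-in for the policy network's mean output head
    return 0.1 * sum(state)

def sample_action(state, rng):
    mu = policy_mean(state)
    std = math.exp(LOG_STD)
    eps = rng.gauss(0.0, 1.0)
    action = mu + std * eps  # reparameterized Gaussian sample
    # Gaussian log-density, needed for PPO's ratio pi_new / pi_old
    log_prob = (-0.5 * ((action - mu) / std) ** 2
                - LOG_STD - 0.5 * math.log(2 * math.pi))
    return action, log_prob

rng = random.Random(0)
a, lp = sample_action([1.0, 2.0], rng)
print(a, lp)
```

For multi-dimensional actions the same recipe is applied per dimension with a diagonal covariance, and the per-dimension log-probabilities are summed.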
6 votes · 1 answer

How to graphically represent an RNN architecture implemented in Keras?

I'm trying to create a simple blog post on RNNs that should give better insight into how they work in Keras. Let's say: model = keras.models.Sequential() model.add(keras.layers.SimpleRNN(5, return_sequences=True, input_shape=[None,…
6 votes · 1 answer

What is the cost function of a transformer?

The paper Attention Is All You Need describes the transformer architecture that has an encoder and a decoder. However, I wasn't clear on what the cost function to minimize is for such an architecture. Consider a translation task, for example, where…
user3667125