Most Popular

1500 questions
6 votes · 1 answer

Why do feedforward neural networks require the inputs to be of a fixed size, while RNNs can process variable-size inputs?

Why does a vanilla feedforward neural network only accept a fixed input size, while RNNs are capable of taking a series of inputs with no predetermined limit on the size? Can anyone elaborate on this with an example?
Daniel
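A minimal sketch of the answer the question is after: a feedforward layer's weight matrix fixes the input length, while an RNN reuses one fixed-size cell at every time step. All shapes and names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# A feedforward layer's weight matrix pins down the input size:
# W_ff has shape (4, 8), so x must have exactly 4 features.
W_ff = rng.normal(size=(4, 8))

def feedforward(x):
    return np.tanh(x @ W_ff)  # raises unless x has length 4

# An RNN applies the same cell (fixed-size weights) at each step,
# so it can consume a sequence of any length.
W_xh = rng.normal(size=(4, 8))  # input -> hidden
W_hh = rng.normal(size=(8, 8))  # hidden -> hidden

def rnn(sequence):
    h = np.zeros(8)
    for x_t in sequence:          # one step per sequence element
        h = np.tanh(x_t @ W_xh + h @ W_hh)
    return h                      # same-shaped summary for any length

short = rng.normal(size=(3, 4))   # 3 time steps
long = rng.normal(size=(10, 4))   # 10 time steps
print(rnn(short).shape, rnn(long).shape)  # (8,) (8,)
```

The key point: the RNN's parameter shapes depend only on the per-step feature size, never on the sequence length.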
6 votes · 1 answer

How to improve the reward signal when the rewards are sparse?

In cases where the reward is delayed, this can negatively impact a model's ability to do proper credit assignment. In the case of a sparse reward, are there ways in which this can be mitigated? In a chess example, there are certain moves that you can…
tryingtolearn
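One standard remedy the question invites is potential-based reward shaping (Ng et al., 1999), which adds a dense signal without changing the optimal policy. The potential function below is an illustrative stand-in, not anything from the question.

```python
# Potential-based shaping: add F(s, s') = gamma * phi(s') - phi(s)
# to the sparse environment reward. In chess, phi might be a
# material-count heuristic; here it is a toy distance-to-goal.

GAMMA = 0.99

def phi(state):
    # illustrative potential: closer to the goal value 10 is better
    return -abs(10 - state)

def shaped_reward(state, next_state, env_reward):
    return env_reward + GAMMA * phi(next_state) - phi(state)

# Moving toward the goal yields a positive learning signal even
# while the environment reward is still 0.
print(shaped_reward(3, 4, 0.0) > 0)  # True
```

Because the shaping term telescopes along any trajectory, it leaves optimal policies unchanged while densifying the gradient of learning.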
6 votes · 1 answer

What are the state space and the state transition function in AI?

I'm studying for my AI final exam, and I'm stuck on the state space representation. I understand initial and goal states, but what I don't understand is the state space and state transition function. Can someone explain what they are with…
İsmail Uysal
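A tiny worked example of the two concepts, using an illustrative 3x3 grid world (not from the question): the state space is the set of all reachable configurations, and the transition function maps a (state, action) pair to the next state.

```python
# State space: every cell of a 3x3 grid.
# Transition function: (state, action) -> next state.

SIZE = 3
STATE_SPACE = [(r, c) for r in range(SIZE) for c in range(SIZE)]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def transition(state, action):
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if 0 <= r < SIZE and 0 <= c < SIZE:
        return (r, c)
    return state  # bumping into a wall leaves the state unchanged

initial, goal = (0, 0), (2, 2)
print(len(STATE_SPACE))             # 9 states
print(transition((0, 0), "right"))  # (0, 1)
print(transition((0, 0), "up"))     # (0, 0): blocked by the wall
```

Search then amounts to finding an action sequence whose composed transitions take `initial` to `goal`.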
6 votes · 1 answer

What are the advantages of RL with actor-critic methods over actor-only methods?

In general, what are the advantages of RL with actor-critic methods over actor-only (or policy-based) methods? This is not a comparison with the Q-learning family, but rather with methods that learn using only the actor. I think it's…
ground clown
6 votes · 1 answer

How to express a fully connected neural network succinctly using linear algebra?

I'm currently reading the paper Federated Learning with Matched Averaging (2020), where the authors claim: A basic fully connected (FC) NN can be formulated as: $\hat{y} = \sigma(xW_1)W_2$ [...] Expanding the preceding expression $\hat{y} =…
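A minimal sketch of the formulation quoted from the paper, $\hat{y} = \sigma(xW_1)W_2$: a one-hidden-layer fully connected network written as two matrix products (biases omitted, as in the excerpt). The dimensions below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))    # one input as a row vector
W1 = rng.normal(size=(4, 8))   # input -> hidden weights
W2 = rng.normal(size=(8, 2))   # hidden -> output weights

sigma = np.tanh                # any elementwise nonlinearity

# y_hat = sigma(x W1) W2, exactly the paper's expression
y_hat = sigma(x @ W1) @ W2
print(y_hat.shape)  # (1, 2)
```

Stacking more layers just interleaves more weight matrices and nonlinearities: $\sigma(\sigma(xW_1)W_2)W_3$, and so on.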
6 votes · 2 answers

How does AlphaZero's MCTS work when starting from the root node?

From the AlphaGo Zero paper, during MCTS, statistics for each new node are initialized as $N(s_L, a) = 0,\ W(s_L, a) = 0,\ Q(s_L, a) = 0,\ P(s_L, a) = p_a$. The PUCT algorithm for selecting the best child node is $a_t = \operatorname{argmax}(Q(s,a) +…$
sb3
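A hedged sketch of the PUCT child selection the question refers to, assuming the usual form $a^* = \operatorname{argmax}_a\,[\,Q(s,a) + c_{\text{puct}}\,P(s,a)\,\sqrt{\sum_b N(s,b)}\,/\,(1 + N(s,a))\,]$. The constant and statistics below are illustrative, not the paper's values.

```python
import math

C_PUCT = 1.5  # illustrative exploration constant

def select_action(children):
    # children: {action: {"N": visits, "Q": mean value, "P": prior}}
    total_n = sum(c["N"] for c in children.values())
    def puct(c):
        u = C_PUCT * c["P"] * math.sqrt(total_n) / (1 + c["N"])
        return c["Q"] + u
    return max(children, key=lambda a: puct(children[a]))

children = {
    "a1": {"N": 10, "Q": 0.30, "P": 0.4},
    "a2": {"N": 2,  "Q": 0.25, "P": 0.5},  # higher prior, fewer visits
}
print(select_action(children))  # a2: its exploration bonus dominates
```

Note what happens at a freshly expanded root: with all $N = 0$ the $\sqrt{\sum_b N}$ factor is zero and every score ties at 0, which is why implementations typically break ties by prior or add Dirichlet noise at the root.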
6 votes · 2 answers

What is the Bellman equation actually telling us?

What does the Bellman equation actually say? And are there many flavours of it? I get a little confused when I look for the Bellman equation, because people seem to say slightly different things about what it is. And I think the…
Johnny
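The "flavours" the asker notices are usually the expectation form (for a fixed policy $\pi$) versus the optimality form, each written for either $V$ or $Q$. For the state-value function, in standard MDP notation, they read:

```latex
% Bellman expectation equation (fixed policy \pi):
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma V^{\pi}(s')\bigr]

% Bellman optimality equation:
V^{*}(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma V^{*}(s')\bigr]
```

Both say the same thing: the value of a state equals the expected immediate reward plus the discounted value of the successor state, with the optimality form replacing the policy average by a max over actions.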
6 votes · 1 answer

What techniques are used to make MDP discrete state space manageable?

Generating a discretized state space for an MDP (Markov Decision Process) model seems to suffer from the curse of dimensionality. Suppose my state has a few simple features: Feeling: Happy/Neutral/Sad Feeling: Hungry/Neither/Full Food left:…
Brendan Hill
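The blow-up the question describes is just the Cartesian product of the feature values; its size multiplies with each feature. A quick sketch (the "Food left" levels are illustrative, since the excerpt is truncated):

```python
from itertools import product

feeling = ["Happy", "Neutral", "Sad"]
hunger = ["Hungry", "Neither", "Full"]
food_left = ["None", "Some", "Plenty"]

# The discrete state space is the Cartesian product of feature values.
states = list(product(feeling, hunger, food_left))
print(len(states))  # 27 = 3 * 3 * 3

# Each additional 3-valued feature multiplies the count by 3:
print(3 ** 10)  # 59049 states for just ten such features
```

This is why techniques like state aggregation, tile coding, and function approximation are used: they avoid enumerating the product space explicitly.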
6 votes · 1 answer

During neural network training, can gradients leak sensitive information if the training data is homomorphically encrypted?

Some algorithms in the literature allow recovering the input data used to train a neural network. This is done using the gradients (updates) of weights, such as in Deep Leakage from Gradients (2019) by Ligeng Zhu et al. In case the neural network is…
witdev
6 votes · 2 answers

Is there a proof to explain why XOR cannot be linearly separable?

Can someone explain to me with a proof or example why you can't linearly separate XOR (and therefore need a neural network, the context I'm looking at it in)? I understand why it's not linearly separable if you draw it graphically (e.g. here), but I…
Slowat_Kela
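A short version of the standard contradiction proof the question asks for, assuming the threshold-at-zero convention ($f > 0$ for class 1, $f < 0$ for class 0):

```latex
% Suppose a linear separator  f(x_1, x_2) = w_1 x_1 + w_2 x_2 + b
% correctly classifies XOR. The four inputs impose:
\begin{aligned}
(0,0) \mapsto 0 &: \quad b < 0 \\
(1,0) \mapsto 1 &: \quad w_1 + b > 0 \;\Rightarrow\; w_1 > -b \\
(0,1) \mapsto 1 &: \quad w_2 + b > 0 \;\Rightarrow\; w_2 > -b \\
(1,1) \mapsto 0 &: \quad w_1 + w_2 + b < 0
\end{aligned}
% But the middle two constraints give
%   w_1 + w_2 + b > (-b) + (-b) + b = -b > 0,
% contradicting the last line. Hence no such w_1, w_2, b exist.
```

This is why at least one hidden layer with a nonlinearity is needed: it maps the four points into a space where they become linearly separable.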
6 votes · 1 answer

What kind of algorithm is the Levenberg–Marquardt algorithm?

Is the Levenberg–Marquardt algorithm a type of back-propagation algorithm, or is it a different category of algorithm? Wikipedia says that it is a curve-fitting algorithm. How is a curve-fitting algorithm relevant to a neural net?
user3642
6 votes · 0 answers

$\frac{P(x_1 \mid y, s = 1) \dots P(x_n \mid y, s = 1) P(y \mid s = 1)}{P(x \mid s = 1)}$ indicates that naive Bayes learners are global learners?

I am currently studying the paper Learning and Evaluating Classifiers under Sample Selection Bias by Bianca Zadrozny. In section 3. Learning under sample selection bias, the author says the following: We can separate classifier learners into two…
The Pointer
6 votes · 1 answer

How are continuous actions sampled (or generated) from the policy network in PPO?

I am trying to understand and reproduce the Proximal Policy Optimization (PPO) algorithm in detail. One thing that I find missing in the paper introducing the algorithm is how exactly actions $a_t$ are generated given the policy network…
Daniel B.
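A hedged sketch of the mechanism commonly used with PPO for continuous actions (and consistent with reference implementations, though the paper itself leaves it implicit): the network outputs a mean $\mu(s)$, a state-independent log-std is learned, and actions are sampled as $a = \mu + \sigma \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, 1)$. All names and values below are illustrative.

```python
import math
import random

LOG_STD = -0.5  # a learned parameter in practice; fixed here

def policy_mean(state):
    # stand-in for the policy network's mean output head
    return 0.1 * sum(state)

def sample_action(state, rng):
    mu = policy_mean(state)
    std = math.exp(LOG_STD)
    eps = rng.gauss(0.0, 1.0)
    action = mu + std * eps  # reparameterized Gaussian sample
    # Gaussian log-density, needed for PPO's ratio pi_new / pi_old
    log_prob = (-0.5 * ((action - mu) / std) ** 2
                - LOG_STD - 0.5 * math.log(2 * math.pi))
    return action, log_prob

rng = random.Random(0)
a, lp = sample_action([1.0, 2.0], rng)
print(a, lp)
```

For multi-dimensional actions the same recipe is applied per dimension with a diagonal covariance, and the per-dimension log-probabilities are summed.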
6 votes · 1 answer

How to graphically represent an RNN architecture implemented in Keras?

I'm trying to create a simple blog post on RNNs that should give better insight into how they work in Keras. Let's say: model = keras.models.Sequential() model.add(keras.layers.SimpleRNN(5, return_sequences=True, input_shape=[None,…
6 votes · 1 answer

What is the cost function of a transformer?

The paper Attention Is All You Need describes the transformer architecture that has an encoder and a decoder. However, I wasn't clear on what the cost function to minimize is for such an architecture. Consider a translation task, for example, where…
user3667125