Highest Voted Questions - Artificial Intelligence Stack Exchange

8

votes

1 answer

What's the advantage of log_softmax over softmax?

Previously I learned that softmax as the output layer, coupled with the log-likelihood cost function (the same as nll_loss in PyTorch), can solve the learning slowdown problem. However, while I am learning the PyTorch MNIST tutorial,…

asked Apr 30 '19 at 15:36

user1024

181
2
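
As a rough illustration of the log_softmax question above, here is a minimal sketch in plain Python (no PyTorch; the function names `softmax`, `log_softmax`, and `nll` are illustrative stand-ins, not PyTorch's API). It assumes the usual max-shift (log-sum-exp) formulation and shows one commonly cited advantage: computing the log-probabilities directly avoids taking the log of a probability that has underflowed to zero.

```python
import math

def softmax(logits):
    # Shift by the max so the largest exponent is exp(0) = 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(logits):
    # log_softmax(x)_i = x_i - logsumexp(x), using the same max shift.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def nll(log_probs, target):
    # Negative log-likelihood of the target class (illustrative helper).
    return -log_probs[target]

logits = [1000.0, 0.0, -1000.0]

probs = softmax(logits)
# probs[1] and probs[2] underflow to exactly 0.0, so math.log(probs[1])
# would raise ValueError (math domain error).

log_probs = log_softmax(logits)
# The log-probabilities stay finite: [0.0, -1000.0, -2000.0]
loss = nll(log_probs, 1)  # 1000.0
```

For moderate logits the two routes agree up to rounding; the difference only shows up when some probabilities are too small to represent.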

8

votes

1 answer

How is the policy gradient calculated in REINFORCE?

Reading Sutton and Barto, I see the following in describing policy gradients: How is the gradient calculated with respect to an action (taken at time t)? I've read implementations of the algorithm, but conceptually I'm not sure I understand how the…

asked Apr 21 '19 at 19:23

Hanzy

519
3
11

8

votes

2 answers

How can AlphaZero learn if the tree search stops and restarts before finishing a game?

I am trying to understand how AlphaZero works, but there is one point that I have trouble understanding, even after reading several different explanations. As I understand it (see for example…

asked Apr 12 '19 at 11:42

Jonathan Lindgren

183
3

8

votes

2 answers

Can DQN perform better than Double DQN?

I'm training both DQN and double DQN in the same environment, but DQN performs significantly better than double DQN. As I've seen in the double DQN paper, double DQN should perform better than DQN. Am I doing something wrong or is it possible?

asked Apr 08 '19 at 09:08

Angelo

211
2
16

8

votes

2 answers

Is reinforcement learning using shallow neural networks still deep reinforcement learning?

I often see the term deep reinforcement learning used to refer to RL algorithms that use neural networks, regardless of whether or not the networks are deep. For example, PPO is often considered a deep RL algorithm, but using a deep network is not…

asked Mar 30 '19 at 05:31

yewang

361
2
7

8

votes

1 answer

Are there existing examples of using neural networks for static code analysis?

Background Context: In the past I've heavily applied various "code quality metrics" to statically analyze code to provide an inkling of how "maintainable" it is and using things like the Maintainability Index alluded to here. However, a problem that…

asked Mar 24 '19 at 19:37

PhD

181
2

8

votes

1 answer

Which unsupervised learning technique can be used for anomaly detection in a time series?

I've started working on anomaly detection in Python. My dataset is a time series. The data is collected by sensors that record measurements on semiconductor-making machines. My dataset looks like this: ContextID Time_ms…

asked Mar 22 '19 at 10:45

some_programmer

225
1
4

8

votes

1 answer

What are the main benefits of using Bayesian networks?

I have some trouble understanding the benefits of Bayesian networks. Am I correct that the key benefit of the network is that one does not need to use the chain rule of probability in order to calculate joint distributions? So, using the chain…

asked Feb 18 '19 at 11:53

Sebastian Dine

181
1

8

votes

1 answer

Why isn't the ElliotSig activation function widely used?

The Softsign (a.k.a. ElliotSig) activation function is really simple: $$ f(x) = \frac{x}{1+|x|} $$ It is bounded $[-1,1]$, has a first derivative, it is monotonic, and it is computationally extremely simple (easy for, e.g., a GPU). Why it is not…

asked Feb 05 '19 at 11:34

Pietro

183
1
8
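
To make the Softsign formula quoted in the question above concrete, here is a minimal sketch in plain Python. The function names are illustrative, and the derivative is worked out by hand from $f(x) = \frac{x}{1+|x|}$ as $f'(x) = \frac{1}{(1+|x|)^2}$; it is not taken from the question or from any particular library.

```python
def softsign(x):
    # f(x) = x / (1 + |x|); mathematically the output lies strictly
    # between -1 and 1 and only approaches those limits.
    return x / (1.0 + abs(x))

def softsign_grad(x):
    # f'(x) = 1 / (1 + |x|)^2, which is always positive (so f is
    # monotonic) and decays polynomially rather than exponentially.
    return 1.0 / (1.0 + abs(x)) ** 2

values = [softsign(x) for x in (-3.0, 0.0, 1.0)]        # [-0.75, 0.0, 0.5]
slopes = [softsign_grad(x) for x in (-3.0, 0.0, 1.0)]   # [0.0625, 1.0, 0.25]
```

Only an absolute value, an addition, and a division are needed, which is the computational simplicity the question refers to.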

8

votes

2 answers

What is the difference between search and planning?

I'm reading the book Artificial Intelligence: A Modern Approach (by Stuart Russell and Peter Norvig). However, I don't understand the difference between search and planning. I was more confused when I saw that some search problems can be determined…

asked Jan 25 '19 at 10:34

theantomc

263
2
9

8

votes

1 answer

Suitable reward function for trading buy and sell orders

I am working to build a deep reinforcement learning agent which can place orders (i.e. limit buy and limit sell orders). The actions are {"Buy": 0 , "Do Nothing": 1, "Sell": 2}. Suppose that all the features are well suited for this task. I wanted…

asked Jan 20 '19 at 00:44

fgauth

189
1
4

8

votes

2 answers

Why are lambda returns so rarely used in policy gradients?

I've seen the Monte Carlo return $G_{t}$ being used in REINFORCE and the TD($0$) target $r_t + \gamma Q(s', a')$ in vanilla actor-critic. However, I've never seen someone use the lambda return $G^{\lambda}_{t}$ in these situations, nor in any other…

asked Jan 17 '19 at 19:27

jhinGhin

83
3
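
For readers unfamiliar with the lambda return $G^{\lambda}_{t}$ mentioned in the question above, here is a minimal sketch in plain Python. It assumes the standard backward recursion $G^{\lambda}_{t} = r_t + \gamma\,[(1-\lambda)\,V(s_{t+1}) + \lambda\,G^{\lambda}_{t+1}]$ over a single finite trajectory, and it uses state-value estimates $V$ rather than the $Q(s', a')$ from the excerpt. The function name and argument layout are hypothetical.

```python
def lambda_returns(rewards, next_values, gamma, lam):
    """Compute lambda returns for one trajectory (illustrative sketch).

    rewards[t]     : reward received after the action at step t
    next_values[t] : estimate of V(s_{t+1}); use 0.0 for a terminal state
    """
    if not rewards:
        return []
    returns = [0.0] * len(rewards)
    # Bootstrap from the value of the state after the final step.
    g = next_values[-1]
    for t in reversed(range(len(rewards))):
        # Blend the one-step bootstrap with the longer lambda return.
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g)
        returns[t] = g
    return returns

rewards = [1.0, 1.0, 1.0]
next_values = [0.5, 0.5, 0.0]  # last state is terminal

mc = lambda_returns(rewards, next_values, gamma=0.5, lam=1.0)   # Monte Carlo return
td0 = lambda_returns(rewards, next_values, gamma=0.5, lam=0.0)  # one-step TD target
```

With `lam=1.0` the recursion reduces to the Monte Carlo return, and with `lam=0.0` it reduces to the one-step target $r_t + \gamma V(s_{t+1})$, so the two quantities named in the question are the endpoints of the same family.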

7

votes

1 answer

How to recognise metaphors in texts using NLP/NLU?

What are the current NLP/NLU techniques that can extract metaphors from texts? For example: "His words cut deeper than a knife." Or a simpler form like: "Life is a journey that must be travelled no matter how bad the roads and accommodations."

asked Jan 14 '19 at 11:59

Younes Ch

73
1
4

7

votes

1 answer

Why do layered neural nets struggle with continuous data?

In this article, the writer claims that a new type of neural net is required to deal with data that is both continuous and sparsely sampled. It was my understanding that this was the entire purpose of techniques that use neural nets, to…

neural-networks

asked Dec 14 '18 at 15:08

Dylan

171
4

7

votes

1 answer

How does Hearthstone AI deal with random events?

I want to learn more about the AI used in collectible card games (CCGs) such as Hearthstone. I now know one of the main algorithms used in this kind of game, MCTS. It analyses the most promising moves, and expands the search tree based on random sampling of the…

asked Dec 11 '18 at 16:43

zen

73
2
