Most Popular

1500 questions
6 votes, 1 answer

When to use the state value function $V(s)$ and when to use the state-action value function $Q(s, a)$?

I have seen the difference between the value function $V(s)$ and $Q(s, a)$, but when should I use each one? When I coded in Matlab, I only used $Q(s, a)$ directly (as I was thinking of a tabular approach). So, when is one more beneficial than the other? I have a large…
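A minimal sketch of the practical difference, assuming a tiny hypothetical two-state MDP (the model `P`, the state and action names, and the hand-computed table `Q_STAR` are illustrative, not from the question). Acting greedily from $Q(s, a)$ needs no model, acting greedily from $V(s)$ needs a one-step lookahead through the transition model, and $V$ can be recovered from $Q$ via $V(s) = \max_a Q(s, a)$ for the greedy policy.

```python
# Hypothetical 2-state, 2-action MDP, used only for illustration.
GAMMA = 0.9
STATES = ["s0", "s1"]
ACTIONS = ["left", "right"]

# Assumed model: P[s][a] = list of (probability, next_state, reward).
P = {
    "s0": {"left": [(1.0, "s0", 0.0)], "right": [(1.0, "s1", 1.0)]},
    "s1": {"left": [(1.0, "s0", 0.0)], "right": [(1.0, "s1", 2.0)]},
}

# Optimal Q-values for this MDP, computed by hand for gamma = 0.9.
Q_STAR = {
    ("s0", "left"): 17.1, ("s0", "right"): 19.0,
    ("s1", "left"): 17.1, ("s1", "right"): 20.0,
}


def greedy_from_q(Q, s):
    """Model-free action choice: just take argmax_a Q(s, a)."""
    return max(ACTIONS, key=lambda a: Q[(s, a)])


def greedy_from_v(V, s):
    """Needs the model P: one-step lookahead E[r + gamma * V(s')]."""
    return max(
        ACTIONS,
        key=lambda a: sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a]),
    )


def v_from_q(Q):
    """State values of the greedy policy: V(s) = max_a Q(s, a)."""
    return {s: max(Q[(s, a)] for a in ACTIONS) for s in STATES}
```

Here `v_from_q(Q_STAR)` gives `{"s0": 19.0, "s1": 20.0}`, and both greedy functions pick `"right"` in each state; the difference is only in what they need to know (`greedy_from_v` cannot work without `P`).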
6 votes, 2 answers

When to use Value Iteration vs. Policy Iteration

Both value iteration and policy iteration are Generalized Policy Iteration (GPI) algorithms. However, they differ in the mechanics of their updates. Policy Iteration seeks to first find a complete value function for a policy, then derive the Q…
SeeDerekEngineer
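Both algorithms can be sketched on a tiny hypothetical MDP (the transition table below is an illustrative assumption, not from the question). Value iteration repeatedly applies the Bellman optimality backup and extracts a greedy policy at the end; policy iteration alternates a full policy evaluation with a greedy improvement step until the policy stops changing.

```python
GAMMA = 0.9
STATES = ["s0", "s1"]
ACTIONS = ["left", "right"]

# Assumed model: P[s][a] = list of (probability, next_state, reward).
P = {
    "s0": {"left": [(1.0, "s0", 0.0)], "right": [(1.0, "s1", 1.0)]},
    "s1": {"left": [(1.0, "s0", 0.0)], "right": [(1.0, "s1", 2.0)]},
}


def q_value(V, s, a):
    """One-step lookahead: expected reward plus discounted next-state value."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def greedy(V):
    """Greedy policy with respect to a value function V."""
    return {s: max(ACTIONS, key=lambda a: q_value(V, s, a)) for s in STATES}


def value_iteration(tol=1e-8):
    """Sweep the Bellman optimality backup until V stops changing."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            best = max(q_value(V, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return greedy(V), V


def policy_evaluation(policy, tol=1e-8):
    """Sweep the Bellman expectation backup for a fixed policy until convergence."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            v_new = q_value(V, s, policy[s])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def policy_iteration():
    """Alternate full evaluation and greedy improvement until the policy is stable."""
    policy = {s: ACTIONS[0] for s in STATES}
    while True:
        V = policy_evaluation(policy)
        new_policy = greedy(V)
        if new_policy == policy:
            return policy, V
        policy = new_policy
```

On this toy problem both return the policy `{"s0": "right", "s1": "right"}` with values close to 19 and 20. Policy iteration usually needs few outer iterations, but each one runs a full evaluation; value iteration does cheaper sweeps but usually more of them.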
6 votes, 3 answers

Perfect play in information incomplete games

As the title says, is there such a thing as perfect play (or at least "perfectly optimal" play) in a game with incomplete information? Or at least a proof showing why there cannot be? Naively (and seemingly obviously), the answer would be a resounding no, since…
k.c. sayz 'k.c sayz'
6 votes, 2 answers

Would a general-purpose AI need to collaborate?

Human beings are more productive in groups than individually, possibly because there is a limit to how much one human brain can improve itself in terms of computation speed and areas of expertise. By contrast, if a machine with…
user289661
6 votes, 4 answers

What does deep learning offer with respect to standard machine learning?

I've been reading a lot about DL. I can understand to an extent how it works, in theory at least, and how it's technically different from conventional ML. But what I'm looking for is more of a "conceptual" meaning. Let's say you're designing a…
6 votes, 1 answer

Is there a theoretical maximum for intelligence?

From Artificial Intelligence: A Modern Approach, Third Edition, Chapter 26: Note that the concept of ultraintelligent machines assumes that intelligence is an especially important attribute, and if you have enough of it, all problems can be solved.…
Left SE On 10_6_19
6 votes, 2 answers

Can an AI be made to maintain a train of thought?

This mostly refers to human-like or chatbot AI, but could perhaps apply to other areas too (math or something?). Basically, it occurred to me that when I'm thinking or speaking, there is a constant feedback loop, in which I am formulating which…
6 votes, 1 answer

Proof that there always exists a dominating policy in an MDP

I think that it is common knowledge that for any infinite horizon discounted MDP $(S, A, P, r, \gamma)$, there always exists a dominating policy $\pi$, i.e. a policy $\pi$ such that for all policies $\pi'$: $$V_\pi (s) \geq V_{\pi'}(s) \quad…
MMM
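The usual argument can be sketched under some assumptions not stated in the excerpt: finite $S$ and $A$, a bounded reward of the form $r(s, a)$, $\gamma \in [0, 1)$, and stationary policies $\pi'$ (history-dependent policies need more care). It goes through the Bellman operators below: (1) $T$ is a $\gamma$-contraction in the sup norm, so it has a unique fixed point $V^*$; (2) a policy $\pi^*$ that is greedy with respect to $V^*$ satisfies $T_{\pi^*} V^* = T V^* = V^*$, and since $V_{\pi^*}$ is the unique fixed point of $T_{\pi^*}$, $V_{\pi^*} = V^*$; (3) $T_{\pi'} V \le T V$ pointwise and $T$ is monotone, which gives the domination chain for every $\pi'$.

```latex
\begin{aligned}
(T_\pi V)(s) &= \sum_{a} \pi(a \mid s) \Big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \Big] \\
(T V)(s) &= \max_{a} \Big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \Big] \\
V_{\pi'} &= T_{\pi'} V_{\pi'} \le T V_{\pi'} \le T^{2} V_{\pi'} \le \cdots \le \lim_{k \to \infty} T^{k} V_{\pi'} = V^{*} = V_{\pi^*}
\end{aligned}
```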
6 votes, 1 answer

Can a game based purely on a convolutional policy network learn to play better than its opponents?

This question has come from my experiment of building a CNN-based tic-tac-toe game that I'm using as a beginner machine learning project. The game works purely on policy networks; more specifically, during training, at the end of each game, it…
Achilles
6 votes, 1 answer

How does the visual cortex share convolution weights?

TL;DR: If we buy into the idea that the visual cortex functions like a convolutional neural network, then there's a problem that makes me scratch my head: how does the brain enforce weight sharing as in a convolutional network? Okay, let me explain more. Obviously, there's no way…
Kh40tiK
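To make "weight sharing" concrete: a convolution applies the same kernel at every position, while a locally connected layer (no sharing) learns a separate kernel per output position. A minimal pure-Python 1D sketch of the two, with "valid" padding; the function names are illustrative, and, as in most deep learning libraries, the "convolution" is really a cross-correlation (the kernel is not flipped).

```python
def conv1d(x, kernel):
    """Weight sharing: the same kernel is slid over every position."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


def locally_connected1d(x, kernels):
    """No weight sharing: output position i uses its own kernel, kernels[i]."""
    k = len(kernels[0])
    if len(kernels) != len(x) - k + 1:
        raise ValueError("need exactly one kernel per output position")
    return [sum(kernels[i][j] * x[i + j] for j in range(k)) for i in range(len(kernels))]


x = [1, 2, 3, 4, 5]
print(conv1d(x, [1, 0, -1]))                    # 3 parameters
print(locally_connected1d(x, [[1, 0, -1]] * 3))  # 9 parameters, same output here
```

Both calls print `[-2, -2, -2]`: a locally connected layer only behaves like a convolution when its per-position kernels happen to be identical, which is exactly the constraint weight sharing imposes.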
6 votes, 1 answer

Are there any advantages of local attention over convolutions?

Transformer architectures, based on the self-attention mechanism, have achieved outstanding performance in a variety of applications. The main advantage of this approach is that a given token can interact with any token in the input sequence and…
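For comparison with a convolution, a minimal pure-Python sketch of single-head local (windowed) self-attention, where token $i$ attends only to tokens $j$ with $|i - j| \le w$. As a simplifying assumption, queries, keys and values are the inputs themselves (no learned projections). Unlike a convolution, whose weights over the window are fixed learned parameters, the mixing weights here are computed from the content and sum to 1.

```python
import math


def local_self_attention(x, window):
    """x: list of d-dimensional vectors; each token attends to |i - j| <= window.

    Simplification: Q = K = V = x (no learned projection matrices).
    """
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # Scaled dot-product scores between token i and its neighbours.
        scores = [sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d) for j in range(lo, hi)]
        # Numerically stable softmax over the window.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Content-dependent weighted average of the neighbouring vectors.
        out.append([sum(w * x[j][k] for w, j in zip(weights, range(lo, hi))) for k in range(d)])
    return out
```

With `window=0` each token only sees itself and the output equals the input; once the window covers the whole sequence (`window >= n - 1`), this reduces to ordinary global self-attention.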
6 votes, 1 answer

Which tasks are called downstream tasks?

The following paragraph is from page 331 of the textbook Natural Language Processing by Jacob Eisenstein. It mentions a certain type of task called downstream tasks, but it provides no further examples or details regarding these…
hanugm
6 votes, 1 answer

Why can't I reproduce the experiments in the original paper that introduced the Firefly Algorithm?

I have been trying to reproduce the experiments in the original paper, Firefly Algorithm for multimodal optimization (2010) by Xin-She Yang, but so far without success. For the time being, I'm okay if anyone just points me in the right direction. I wrote…
Jairo
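For orientation, a minimal pure-Python sketch of the firefly update as it is commonly stated for Yang's algorithm: a dimmer firefly $x_i$ moves toward a brighter one $x_j$ by $x_i \leftarrow x_i + \beta_0 e^{-\gamma r_{ij}^2}(x_j - x_i) + \alpha(\epsilon - 0.5)$, with $\epsilon$ drawn uniformly from $[0, 1)$ per dimension. Brightness is taken as lower cost (minimization) and positions are clipped to the search box. The defaults for `n`, `generations`, `beta0`, `gamma` and `alpha` are illustrative assumptions, not the paper's settings, so this is a reference point for checking an implementation, not a reproduction of the paper's experiments.

```python
import math
import random


def firefly_minimize(f, dim, bounds, n=20, generations=50,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Basic firefly algorithm sketch; returns (best_x, best_f, best-so-far history)."""
    rng = random.Random(seed)
    lo, hi = bounds
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    fs = [f(x) for x in xs]
    best_i = min(range(n), key=fs.__getitem__)
    best_x, best_f = list(xs[best_i]), fs[best_i]
    history = [best_f]
    for _ in range(generations):
        for i in range(n):
            for j in range(n):
                if fs[j] < fs[i]:  # firefly j is brighter (lower cost) than i
                    r2 = sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                    xs[i] = [
                        min(hi, max(lo, a + beta * (b - a) + alpha * (rng.random() - 0.5)))
                        for a, b in zip(xs[i], xs[j])
                    ]
                    fs[i] = f(xs[i])  # update light intensity after moving
                    if fs[i] < best_f:
                        best_x, best_f = list(xs[i]), fs[i]
        history.append(best_f)
    return best_x, best_f, history


def sphere(x):
    """Simple test function with its minimum 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_f, history = firefly_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
```

Details such as whether $\alpha$ is scaled by the search range or decays over generations vary between descriptions of the algorithm, and they can noticeably change results.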
6 votes, 2 answers

Which machine learning algorithm could I use to break up a poem by lines?

I want to create a network that predicts where the lines of a poem break. The program would receive an unbroken poem as input and would output the poem broken into lines. For example, an unbroken poem could be: And then the day came, when the risk to…
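One common way to frame this (an assumption, not something the question specifies) is as word-level sequence labeling: each word gets label 1 if a line break follows it and 0 otherwise, so any sequence tagger (for example a CRF or a recurrent or transformer tagger) can be trained on the result. A minimal pure-Python sketch of the data conversion in both directions; the example lines are made up for illustration.

```python
def to_labeled_tokens(lines):
    """Poem as a list of lines -> (word, label) pairs; label 1 = line ends after this word."""
    pairs = []
    for line in lines:
        words = line.split()
        for k, word in enumerate(words):
            pairs.append((word, 1 if k == len(words) - 1 else 0))
    return pairs


def from_labels(words, labels):
    """Inverse: rebuild lines from words and (predicted) break labels."""
    lines, current = [], []
    for word, label in zip(words, labels):
        current.append(word)
        if label == 1:
            lines.append(" ".join(current))
            current = []
    if current:  # trailing words without a predicted break
        lines.append(" ".join(current))
    return lines


poem = ["the quick brown fox", "jumps over", "the lazy dog"]  # hypothetical example
pairs = to_labeled_tokens(poem)
words = [w for w, _ in pairs]
labels = [lab for _, lab in pairs]
```

Here `labels` is `[0, 0, 0, 1, 0, 1, 0, 0, 1]`, and `from_labels(words, labels)` returns the original three lines; a model then only has to predict the label sequence from the unbroken word sequence.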
6 votes, 2 answers

What are the best hyper-parameters to tune in reinforcement learning?

Obviously, this is somewhat subjective, but what hyper-parameters typically have the most significant impact on an RL agent's ability to learn? For example, the replay buffer size, learning rate, entropy coefficient, etc. For example, in "normal"…
Dylan Kerler