Most Popular
1500 questions
11 votes, 4 answers
How can I build a self-attention model with tf.keras.layers.Attention?
I have built a simple many-to-one LSTM model as follows:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import…
Ian
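Below is a framework-free sketch of what dot-product self-attention computes. It assumes one sequence serves as query, key and value, with no learned projections. Each time step scores every step by a dot product, the scores are softmax-normalized, and the output is the weighted sum of the inputs. Our understanding is that `tf.keras.layers.Attention` with its default settings (unscaled dot-product scores, value reused as key) computes this when called as `Attention()([x, x])`, but please check the Keras documentation. The helper names below are hypothetical and not part of Keras.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_self_attention(seq):
    """Unscaled dot-product self-attention over a list of feature vectors.

    The same sequence acts as query, key and value (no learned projections),
    so each time step attends to every time step, including itself.
    Returns (outputs, attention_weights).
    """
    outputs, weights = [], []
    for q in seq:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in seq]  # q . k for every step
        w = softmax(scores)
        # Output for this step: attention-weighted sum of the value vectors.
        out = [sum(wj * v[d] for wj, v in zip(w, seq)) for d in range(len(q))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Illustrative input: 3 time steps with 2 features each.
outputs, weights = dot_product_self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

In a Keras model, the attention inputs would typically be a per-step sequence, for example LSTM outputs produced with `return_sequences=True`. That wiring is not shown here.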
11 votes, 1 answer
t-SNE Python implementation: Kullback-Leibler divergence
t-SNE, as in [1], works by progressively reducing the Kullback-Leibler (KL) divergence until a certain condition is met.
The creators of t-SNE suggest using the KL divergence as a performance criterion for the visualizations:
you can compare the…
joker
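The quantity referred to here assumes symmetric joint affinities $p_{ij}$ (input space) and $q_{ij}$ (embedding), with $p_{ii} = q_{ii} = 0$. It is $C = \mathrm{KL}(P \,\|\, Q) = \sum_{i \ne j} p_{ij} \log(p_{ij}/q_{ij})$. The minimal sketch below evaluates it for already-computed affinity matrices. Computing P and Q themselves is not shown, and the function name is hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) = sum over i != j of p_ij * log(p_ij / q_ij).

    p and q are square matrices (lists of lists) of joint affinities whose
    off-diagonal entries each sum to 1. Diagonal entries are ignored, and terms
    with p_ij == 0 contribute 0 by convention. eps guards against q_ij == 0.
    """
    total = 0.0
    n = len(p)
    for i in range(n):
        for j in range(n):
            if i != j and p[i][j] > 0:
                total += p[i][j] * math.log(p[i][j] / max(q[i][j], eps))
    return total
```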
11 votes, 3 answers
For imbalanced classification, should the validation dataset be balanced?
I am building a binary classification model for imbalanced data (e.g., 90% positive class vs. 10% negative class).
I have already balanced my training dataset to a 50/50 class split, while my holdout (validation) dataset was kept similar to the original…
thereandhere1
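The question does not say how the 50/50 training split was obtained. Assuming random undersampling, a minimal sketch that balances only the training rows might look like this. The helper is hypothetical. Whether the validation set should also be balanced is exactly what is being asked, so the sketch simply leaves it untouched.

```python
import random


def undersample_to_balance(rows, labels, seed=0):
    """Randomly drop rows from the larger classes until every class has the same count.

    Apply this to the training split only; the validation split keeps its
    original (imbalanced) class distribution.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    # Sampling without replacement, so no training row is duplicated.
    balanced = [(row, y) for y, members in by_class.items()
                for row in rng.sample(members, n_min)]
    rng.shuffle(balanced)
    return [row for row, _ in balanced], [y for _, y in balanced]
```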
11 votes, 4 answers
Working with HPC clusters
At my university, we have an HPC cluster, which I use to train classifiers and similar models. Usually, to submit a job to the cluster (e.g., a Python scikit-learn script), I need to write a Bash script that contains, among other things, a command…
Jack Twain
11 votes, 1 answer
What is the query ID ("qid") in XGBoost?
The XGBoost documentation says that, for ranking applications, we can specify query group IDs (qid) in the training dataset, as in the following snippet:
1 qid:1 101:1.2 102:0.03
0 qid:1 1:2.1 10001:300 10002:400
0 qid:2 0:1.3 1:0.3
1 qid:2 0:0.01…
Konstantin
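In that snippet, each line has the form `label qid:<query id> <feature index>:<value> ...`. Rows that share a qid belong to the same query group. As we understand XGBoost's ranking objectives, items are only ranked against other items in the same group. The small parser below groups the three complete lines of the snippet by qid. It is a hypothetical helper, not part of the XGBoost API.

```python
from collections import defaultdict


def group_by_qid(lines):
    """Parse LibSVM-style ranking lines ("label qid:N idx:val ...") and group them by query ID.

    Returns {qid: [(label, {feature_index: value}), ...]}, preserving line order.
    """
    groups = defaultdict(list)
    for line in lines:
        tokens = line.split()
        label = float(tokens[0])
        if not tokens[1].startswith("qid:"):
            raise ValueError(f"expected 'qid:<id>' as second token in: {line!r}")
        qid = int(tokens[1].split(":", 1)[1])
        features = {}
        for tok in tokens[2:]:
            idx, val = tok.split(":", 1)
            features[int(idx)] = float(val)  # sparse: absent indices are simply missing
        groups[qid].append((label, features))
    return dict(groups)


lines = [
    "1 qid:1 101:1.2 102:0.03",
    "0 qid:1 1:2.1 10001:300 10002:400",
    "0 qid:2 0:1.3 1:0.3",
]
groups = group_by_qid(lines)  # query 1 has two rows, query 2 has one
```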
11 votes, 2 answers
Why does the transformer positional encoding use both sine and cosine?
The transformer architecture uses positional encoding (explained in this answer), and I understand how it is constructed.
Why does it need to use both sine and cosine, though, instead of just one or the other?
Joff
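The original transformer paper ("Attention Is All You Need") defines the encoding as $PE_{(pos,2i)} = \sin(pos/10000^{2i/d})$ and $PE_{(pos,2i+1)} = \cos(pos/10000^{2i/d})$. The paper motivates this choice by noting that, for a fixed offset $k$, $PE_{pos+k}$ is a linear function of $PE_{pos}$. With a (sin, cos) pair at each frequency, that linear map is a 2x2 rotation. The sketch below builds the encoding and checks the rotation property numerically. It uses hypothetical helper names and assumes an even `d_model`.

```python
import math


def positional_encoding(max_len, d_model):
    """Sinusoidal positional encoding; even dims get sin, the next odd dim gets cos of the same angle."""
    assert d_model % 2 == 0, "this sketch assumes an even d_model"
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):  # i plays the role of 2i in the formula
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            pe[pos][i + 1] = math.cos(angle)
    return pe


def shift_encoding(row, k, d_model):
    """Compute PE(pos + k) from PE(pos) alone, rotating each (sin, cos) pair by angle w*k."""
    shifted = list(row)
    for i in range(0, d_model, 2):
        w = 1.0 / (10000 ** (i / d_model))
        s, c = row[i], row[i + 1]
        # sin(a + b) = sin a cos b + cos a sin b;  cos(a + b) = cos a cos b - sin a sin b
        shifted[i] = s * math.cos(w * k) + c * math.sin(w * k)
        shifted[i + 1] = c * math.cos(w * k) - s * math.sin(w * k)
    return shifted
```

With only sines, a single value $\sin(\omega\,pos)$ does not determine $\sin(\omega(pos+k))$ by a fixed linear map, because the cosine term is missing.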
11 votes, 3 answers
Build a binary classifier with only positive and unlabeled data
I have 2 datasets: one with positive instances of what I would like to detect, and one with unlabeled instances. What methods can I use?
As an example, suppose we want to detect spam email based on a few structured email characteristics.…
nassimhddd
11 votes, 4 answers
Books on Reinforcement Learning
I have been trying to understand reinforcement learning for quite some time, but I am not able to visualize how to write a reinforcement learning program that solves a grid world problem. Can you suggest some textbooks that would help…
girl101
11 votes, 1 answer
Fisher Scoring vs. Coordinate Descent for MLE in R
The base R function glm() uses Fisher scoring for MLE, while glmnet appears to use coordinate descent to solve the same equation. Coordinate descent is more time-efficient than Fisher scoring, as Fisher scoring calculates the second…
gol
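For logistic regression with the canonical logit link, Fisher scoring coincides with Newton-Raphson. Each iteration solves $\mathcal{I}(\beta)\,\Delta = U(\beta)$, where $U$ is the score (the gradient of the log-likelihood) and $\mathcal{I}$ is the Fisher information, which for this link equals the negative Hessian. The minimal pure-Python sketch below uses one predictor and made-up data to show the second-derivative matrix that each step builds and solves. It is not glm()'s actual code, which works via iteratively reweighted least squares. Coordinate descent, which updates one coefficient at a time, is not shown.

```python
import math


def fisher_scoring_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Fit P(y=1 | x) = 1 / (1 + exp(-(b0 + b1*x))) by Fisher scoring.

    Each iteration builds the score vector U and the 2x2 Fisher information
    matrix I, then solves I * delta = U in closed form.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        u0 = u1 = 0.0            # score vector
        i00 = i01 = i11 = 0.0    # Fisher information (symmetric 2x2)
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)    # variance of a Bernoulli(p) response
            u0 += y - p
            u1 += (y - p) * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        d0 = (i11 * u0 - i01 * u1) / det   # inverse of [[i00, i01], [i01, i11]] times U
        d1 = (i00 * u1 - i01 * u0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Illustrative, non-separable data.
b0, b1 = fisher_scoring_logistic([0, 1, 2, 3, 4, 5], [0, 0, 1, 0, 1, 1])
```

Building and solving the full information matrix costs more per iteration as the number of coefficients grows. This is the cost the question contrasts with coordinate-wise updates.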
11 votes, 2 answers
Cooperative Reinforcement Learning
I already have a functioning $Q(\lambda)$ implementation for a single agent working on a dynamic pricing problem with the goal of maximizing revenue. The problem that I'm working with, however, involves several different products that are…
user3704120
11 votes, 6 answers
When should I NOT scale features?
Feature scaling can be essential when using distance-, variance- or gradient-based methods (KNN, PCA, neural networks...) because, depending on the case, it can improve the quality of the results or reduce the computational effort.
In some cases…
Romain Reboulleau
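The minimal sketch below illustrates why distance-based methods are sensitive to scale, using made-up (age, income) data. Standardization statistics are fitted on the training rows only and then applied to new points. Once the income column no longer dominates the Euclidean distance, the nearest neighbour of the query changes. The helper names and data are hypothetical.

```python
import math
import statistics


def fit_scaler(train_rows):
    """Per-feature (mean, population std) computed on the training rows only."""
    cols = list(zip(*train_rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return means, stds


def transform(row, means, stds):
    """Standardize one row with previously fitted statistics."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def nearest(query, candidates):
    """Index of the candidate closest to query in Euclidean distance."""
    return min(range(len(candidates)), key=lambda i: math.dist(query, candidates[i]))


train = [(20, 30000), (30, 40000), (40, 50000), (50, 60000), (60, 70000)]  # (age, income)
means, stds = fit_scaler(train)
query, cands = (30, 50000), [(31, 52000), (60, 50500)]
print(nearest(query, cands))  # 1: raw income differences dominate
print(nearest(transform(query, means, stds), [transform(c, means, stds) for c in cands]))  # 0
```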
11 votes, 4 answers
How to avoid overfitting in random forest?
I want to avoid overfitting in a random forest. To that end, I intend to tune mtry, nodesize, maxnodes, etc. Could you please help me choose values for these parameters? I am using R.
Also, if possible, please tell me how I can use k-fold cross…
Arun
11 votes, 2 answers
What are BIO tags for creating a custom NER (named entity recognition) model?
I would like to create a custom named entity recognition (NER) model, but I am confused about what BIO tags are. Could anyone please explain the steps for creating an NER model and what the B, I, and O tags mean?
star
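In the BIO scheme, the first token of an entity is tagged `B-<type>` (beginning). Later tokens of the same entity are tagged `I-<type>` (inside), and all other tokens are tagged `O` (outside). The minimal sketch below converts token-level entity spans into BIO tags. The sentence, the PER/LOC labels, and the helper name are illustrative assumptions.

```python
def spans_to_bio(tokens, spans):
    """Convert entity spans to BIO tags.

    spans: list of (start, end, label) token indices, end exclusive.
    The first token of each entity gets B-<label>, the rest get I-<label>,
    and tokens outside every entity get O.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


tokens = ["Barack", "Obama", "visited", "New", "York", "City"]
print(spans_to_bio(tokens, [(0, 2, "PER"), (3, 6, "LOC")]))
# ['B-PER', 'I-PER', 'O', 'B-LOC', 'I-LOC', 'I-LOC']
```

The B tag is what lets two adjacent entities of the same type be told apart, since the second one restarts with B instead of continuing with I.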
11 votes, 3 answers
Is there a difference between on-line learning, incremental learning and sequential learning?
What I mean is the following: Instead of processing all the training data at once and calculating a model, we process one data point at a time and update the model directly afterwards.
I have seen the terms "on-line (or online) learning" and…
Suzana
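The update scheme described above processes one data point at a time and updates the model immediately afterwards. It can be sketched as online stochastic gradient descent for a one-feature linear model. The helper name, learning rate, and synthetic stream are assumptions for illustration, and the code does not settle which of the three terms applies.

```python
def online_sgd_linear(stream, lr=0.05):
    """Fit y ~ w*x + b, updating the parameters after every single (x, y) example."""
    w, b = 0.0, 0.0
    for x, y in stream:
        err = (w * x + b) - y   # prediction error on this one point
        w -= lr * err * x       # gradient step on the squared error 0.5 * err**2
        b -= lr * err
    return w, b


# Synthetic, noise-free stream from y = 2x + 1; points are consumed one at a time.
stream = ((x, 2 * x + 1) for _ in range(2000) for x in (0.0, 0.25, 0.5, 0.75, 1.0))
w, b = online_sgd_linear(stream)
```

Because the stream is a generator, no point is stored after its update, which is the property that distinguishes this from batch training on the full dataset.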
11 votes, 2 answers
Oversampling/undersampling: train set only, or both train and validation sets?
I am working on a dataset with a class imbalance problem. I know that one should oversample or undersample only the training set and not the test set. My question is whether to oversample the training set and then split it into training and validation sets, or…
yamini goel
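Below is a minimal sketch of the split-first ordering, assuming a stratified split and random oversampling with replacement. Random oversampling is used as a simple stand-in; SMOTE and similar methods are not shown. The validation rows are set aside before any resampling, so no duplicated training row can leak into validation. The helper name and parameters are hypothetical.

```python
import random


def split_then_oversample(rows, labels, val_fraction=0.2, seed=0):
    """Stratified train/validation split, then random oversampling of the training part only.

    Returns (train_rows, train_labels, val_rows, val_labels). The validation
    split keeps the original class ratio and shares no rows with training.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)

    # 1) Split first, per class, so validation mirrors the real class ratio.
    train_by_class, val = {}, []
    for y, members in by_class.items():
        members = members[:]
        rng.shuffle(members)
        n_val = int(len(members) * val_fraction)
        val.extend((r, y) for r in members[:n_val])
        train_by_class[y] = members[n_val:]

    # 2) Oversample smaller classes in the training split (sampling with replacement).
    n_max = max(len(m) for m in train_by_class.values())
    train = []
    for y, members in train_by_class.items():
        extra = [rng.choice(members) for _ in range(n_max - len(members))]
        train.extend((r, y) for r in members + extra)
    rng.shuffle(train)

    return ([r for r, _ in train], [y for _, y in train],
            [r for r, _ in val], [y for _, y in val])
```

Oversampling before the split would place copies of the same minority row on both sides, which tends to make validation scores look better than they are.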