Highest Voted Questions - Data Science Stack Exchange

11

votes

4 answers

How can I build a self-attention model with tf.keras.layers.Attention?

I have completed an easy many-to-one LSTM model as follows: from tensorflow.keras.models import Sequential from tensorflow.keras.layers import Dense from tensorflow.keras.layers import LSTM from tensorflow.keras.layers import…

asked Jun 22 '20 at 09:27

Ian

111
1
1
3

11

votes

1 answer

t-SNE Python implementation: Kullback-Leibler divergence

t-SNE, as in [1], works by progressively reducing the Kullback-Leibler (KL) divergence, until a certain condition is met. The creators of t-SNE suggest using KL divergence as a performance criterion for the visualizations: you can compare the…

asked Jul 17 '14 at 10:04

joker

113
1
6

11

votes

3 answers

For imbalanced classification, should the validation dataset be balanced?

I am building a binary classification model for imbalanced data (e.g., 90% Pos class vs 10% Neg Class). I already balanced my training dataset to reflect a 50/50 class split, while my holdout (training dataset) was kept similar to the original…

asked Jun 15 '20 at 18:39

thereandhere1

735
1
8
22

11

votes

4 answers

Working with HPC clusters

At my university, we have an HPC computing cluster. I use the cluster to train classifiers and so on. Usually, to send a job to the cluster (e.g. a Python scikit-learn script), I need to write a Bash script that contains (among others) a command…

asked Jul 08 '14 at 13:45

Jack Twain

719
1
5
7

11

votes

1 answer

What is query id ("qid") in XGBoost

The XGBoost documentation says that for ranking applications we can specify query group IDs (qid) in the training dataset, as in the following snippet: 1 qid:1 101:1.2 102:0.03 0 qid:1 1:2.1 10001:300 10002:400 0 qid:2 0:1.3 1:0.3 1 qid:2 0:0.01…

asked Mar 11 '20 at 17:40

Konstantin

153
1
9

11

votes

2 answers

Why does the transformer positional encoding use both sine and cosine?

The transformer architecture uses positional encoding (explained in this answer), and I get how it is constructed. I am wondering why it needs to use both sine and cosine, though, instead of just one or the other?

asked Feb 23 '20 at 12:54

Joff

243
2
6

11

votes

3 answers

Build a binary classifier with only positive and unlabeled data

I have 2 datasets, one with positive instances of what I would like to detect, and one with unlabeled instances. What methods can I use? As an example, suppose we want to detect spam email based on a few structured email characteristics.…

asked Jul 07 '14 at 09:34

nassimhddd

587
4
12

11

votes

4 answers

Books on Reinforcement Learning

I have been trying to understand reinforcement learning for quite some time, but somehow I am not able to visualize how to write a program for reinforcement learning to solve a grid world problem. Can you suggest some textbooks which would help…

asked Aug 05 '15 at 05:58

girl101

1,161
2
11
26

11

votes

1 answer

Fisher Scoring v/s Coordinate Descent for MLE in R

The base R function glm() uses Fisher scoring for MLE, while glmnet appears to use the coordinate descent method to solve the same equation. Coordinate descent is more time-efficient than Fisher scoring, as Fisher scoring calculates the second…

asked Jul 03 '14 at 17:11

gol

111
2

11

votes

2 answers

Cooperative Reinforcement Learning

I already have a functioning $Q(\lambda)$ implementation for a single agent working on a dynamic pricing problem with the goal of maximizing revenue. The problem that I'm working with, however, involves several different products that are…

asked Jul 11 '15 at 18:04

user3704120

231
1
3

11

votes

6 answers

When should I NOT scale features

Feature scaling can be crucial when using distance-, variance- or gradient-based methods (KNN, PCA, neural networks...), because depending on the case, it can improve the quality of results or reduce the computational effort. In some cases…

feature-scaling

asked Dec 05 '19 at 22:37

Romain Reboulleau

1,327
7
26

11

votes

4 answers

How to avoid overfitting in random forest?

I want to avoid overfitting in random forest. In this regard, I intend to use mtry, nodesize, maxnodes, etc. Could you please help me choose values for these parameters? I am using R. Also, if possible, please tell me how I can use k-fold cross…

asked Jul 07 '15 at 18:05

Arun

717
3
10
27

11

votes

2 answers

What are BIO tags for creating custom NER (named entity recognition)?

I would like to create custom Named Entity Recognition (NER), but I am confused about what BIO tags are. Could anyone please explain the steps for creating NER and what the B, I, O tags mean?

asked Nov 19 '19 at 12:52

star

1,471
7
19
29

11

votes

3 answers

Is there a difference between on-line learning, incremental learning and sequential learning?

What I mean is the following: Instead of processing all the training data at once and calculating a model, we process one data point at a time and update the model directly afterwards. I have seen the terms "on-line (or online) learning" and…

asked Jun 23 '15 at 14:40

Suzana

211
2
9

11

votes

2 answers

Oversampling/undersampling the train set only, or both train and validation sets?

I am working on a dataset with a class imbalance problem. Now, I know one needs to oversample or undersample only the train set and not the test set. But my issue is: whether to oversample the train set and then split it into train and validation sets or…

asked Oct 17 '19 at 08:21

yamini goel

731
3
7
14
