Most Popular

1500 questions
10
votes
5 answers

Time-series grouped cross-validation

I have data with the following structure:

created_at | customer_id | features | target
2019-01-01 | 2 | xxxxxxxx | y
2019-01-02 | 3 | xxxxxxxx | y
2019-01-03 | 3 | xxxxxxxx | y
...

That is, a session…
David Masip
  • 6,051
  • 2
  • 24
  • 61
10
votes
1 answer

How is GPT able to handle large vocabularies?

From what I understand, GPT and GPT-2 are trained to predict the $N^{th}$ word in a sentence given the previous $N-1$ words. When the vocabulary size is very large (100k+ words), how is it able to generate any meaningful prediction? Shouldn't it…
AAC
  • 509
  • 2
  • 5
  • 13
10
votes
4 answers

Can Boosted Trees predict below the minimum value of the training label?

I am using Gradient Boosted Trees (with CatBoost) for a regression task. Can GB trees predict a label that is below the minimum (or above the maximum) seen in training? For instance, if the minimum value of the label is 10, would…
Yairh
  • 119
  • 1
  • 5
10
votes
2 answers

Is this Neo4j comparison to RDBMS execution time correct?

Background: The following is from the book Graph Databases, which covers a performance test mentioned in the book Neo4j in Action: Relationships in a graph naturally form paths. Querying, or traversing, the graph involves following paths. Because of…
blunders
  • 1,932
  • 2
  • 15
  • 19
10
votes
2 answers

What is a good interpretation of this 'learning curve' plot?

I read about validation_curve and how to interpret it to tell whether there is overfitting or underfitting, but how can I interpret the plot when the data is the error, like this: The X-axis is "Nº of examples of training". The red line is the train error. Green…
Tlaloc-ES
  • 337
  • 1
  • 7
10
votes
5 answers

AttributeError: module 'tensorflow.python.keras.utils' has no attribute 'to_categorical'

I'm trying to run the code below in my Jupyter Notebook. I get: AttributeError: module 'tensorflow.python.keras.utils' has no attribute 'to_categorical' This is code from a Kaggle tutorial. I have installed Keras and TensorFlow. import numpy as np …
vojtak
  • 241
  • 1
  • 2
  • 6
10
votes
3 answers

BPE vs WordPiece Tokenization - when to use / which?

What's the general tradeoff between choosing BPE vs WordPiece Tokenization? When is one preferable to the other? Are there any differences in model performance between the two? I'm looking for a general overall answer, backed up with specific…
vgoklani
  • 238
  • 2
  • 7
10
votes
4 answers

Skewed multi-class data

I have a dataset which contains ~100,000 samples of 50 classes. I have been using SVM with an RBF kernel to train and predict new data. The problem, though, is that the dataset is skewed towards different classes. For example, Class 1 - 30 (~3% each),…
mike1886
  • 933
  • 9
  • 17
10
votes
1 answer

Is Minimax Linkage a Lance-Williams hierarchical clustering?

I found the following article on "Hierarchical Clustering With Prototypes via Minimax Linkage". It is stated in Property 6 that Minimax linkage cannot be written using Lance–Williams updates. A succinct proof using a counter-example is…
mic
  • 513
  • 5
  • 15
10
votes
2 answers

What are some key strengths of BERT over ELMO/ULMFiT?

I see the BERT family being used as a benchmark everywhere for NLP tasks. What are some key strengths of BERT over models like ELMo or ULMFiT?
Akshay
  • 101
  • 1
  • 1
  • 4
10
votes
2 answers

How to get feature importance from a keras deep learning model?

In the case of scikit-learn's models, we can get feature importance using the relevant attributes of the model. I've been working on an RNN, using LSTMs for text embedding. Is there any way to get feature importance of various features from the…
soham_dhole
  • 140
  • 1
  • 1
  • 8
10
votes
4 answers

How to impute Missing values not the usual way?

I have a dataset of 4712 records and am working on binary classification. Label 1 is 33% and Label 0 is 67%. I can't drop records because my sample is already small, and there are a few columns which have around 250-350 missing records. How do I know…
The Great
  • 2,565
  • 2
  • 20
  • 43
10
votes
2 answers

Reducing the dimensionality of word embeddings

I trained word embeddings with 300 dimensions. Now, I would like to have word embeddings with 50 dimensions: is it better to retrain the word embeddings with 50 dimensions, or can I use some dimensionality reduction method to scale the word…
Franck Dernoncourt
  • 5,690
  • 10
  • 40
  • 76
10
votes
2 answers

How are samples selected from training data in Xgboost

In Random Forest, each tree is not fed the full batch of training data, only a sample. How does this work for XGBoost? If this sampling happens as well, how does it work for this ML algorithm?
Aman Raparia
  • 257
  • 2
  • 8
10
votes
3 answers

What is the correct way to call Keras flow_from_directory() method?

In the following article there is an instruction that the dataset needs to be divided into train, validation and test folders, where the test folder should not contain the labeled subfolders. Instead it should only contain a single folder (i.e.…
Tauno
  • 799
  • 2
  • 9
  • 9