Most Popular
1500 questions
14
votes
2 answers
Class token in ViT and BERT
I'm trying to understand the architecture in the ViT paper, and noticed that it uses a CLASS token, as BERT does.
To the best of my understanding this token is used to gather knowledge of the entire class, and is then solely used to predict the class of…
Shir
- 241
- 1
- 2
- 5
14
votes
2 answers
What to do when testing data has fewer features than training data?
Let's say we are predicting the sales of a shop and my training data has two sets of features:
One about the store sales with the dates (the field "Store" is not unique)
One about the store types (the field "Store" is unique here)
So the matrix…
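The two feature sets in this excerpt share the "Store" key, which is non-unique in the sales data and unique in the store-type data, so they relate through a many-to-one join. The following is a minimal sketch of that relationship in plain Python. Every column name and value other than the "Store" field is hypothetical, as is the helper `merge_on_store`.

```python
# Hypothetical rows; only the "Store" field comes from the question.
sales = [
    {"Store": 1, "Date": "2015-01-01", "Sales": 5263},
    {"Store": 1, "Date": "2015-01-02", "Sales": 6064},
    {"Store": 2, "Date": "2015-01-01", "Sales": 8314},
]
store_types = [
    {"Store": 1, "StoreType": "a"},
    {"Store": 2, "StoreType": "c"},
]


def merge_on_store(sales_rows, type_rows):
    """Left-join each sales row with the single store-type row for its store."""
    # "Store" is unique in type_rows, so a plain dict lookup is enough.
    by_store = {row["Store"]: row for row in type_rows}
    merged = []
    for row in sales_rows:
        extra = by_store.get(row["Store"], {})  # no match: keep the sales row as-is
        merged.append({**row, **extra})
    return merged


merged = merge_on_store(sales, store_types)
```

Each sales row keeps its own date and sales figure and gains the store-level columns, so the merged table has one row per (store, date) pair.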
alvas
- 2,410
- 7
- 25
- 40
14
votes
9 answers
Is Python suitable for big data
I read in the post "Is the R language suitable for Big Data" that big data constitutes 5TB, and while it does a good job of providing information about the feasibility of working with this type of data in R, it provides very little information about…
ragingSloth
- 1,824
- 3
- 14
- 15
14
votes
3 answers
When are p-values deceptive?
What are the data conditions that we should watch out for, where p-values may not be the best way of deciding statistical significance? Are there specific problem types that fall into this category?
user179
- 143
- 1
- 4
14
votes
10 answers
How can I appropriately handle cleaning of gender data?
I’m a data science student and I’ve begun working with an open mental health dataset. As part of this, I need to clean the data so that I can perform an analysis of it.
In this dataset, the gender field is a string that could have had anything…
nick012000
- 263
- 2
- 9
14
votes
5 answers
How can you include information not present in an image for neural networks?
I am training a CNN to identify objects in images (one label per image). However, I have additional information about these images that cannot be retrieved by looking at the image itself. In more detail, I'm talking about the physical location of…
seb
- 143
- 1
- 6
14
votes
3 answers
What is Darknet and why is it needed for YOLO object detection?
What is Darknet and why is it needed for YOLO object detection? I read that it's a neural network framework written in C, but why is it needed for YOLO object detection when we have a lot of machine learning frameworks and APIs like TensorFlow, Keras, and PyTorch?
I'm…
star
- 1,471
- 7
- 19
- 29
14
votes
2 answers
What are the good parameter ranges for BERT hyperparameters while finetuning it on a very small dataset?
I need to fine-tune a BERT model (from the Hugging Face repository) on a sentence classification task. However, my dataset is really small. I have 12K sentences and only 10% of them are from positive classes. Does anyone here have any experience on…
zwlayer
- 259
- 1
- 2
- 8
14
votes
3 answers
How to automatically mount my Google Drive to Google Colab
I have recently discovered Google Colab and I am wondering if there is an option to permanently authorize Google Colab to access and mount my Google Drive.
from google.colab import drive
drive.mount('/content/drive')
Go to this URL in a browser:…
Georgi Stoyanov
- 243
- 1
- 2
- 5
14
votes
2 answers
Preprocessing for Text Classification in Transformer Models (BERT variants)
This might be silly to ask, but I am wondering if one should carry out the conventional text preprocessing steps for training one of the transformer models?
I remember that when training Word2Vec or GloVe, we needed to perform extensive text cleaning…
TwinPenguins
- 4,249
- 3
- 19
- 53
14
votes
2 answers
SHAP value analysis gives different feature importance on train and test set
Should SHAP value analysis be done on the train or test set?
What does it mean if the feature importance based on mean |SHAP value| is different between the train and test set of my LightGBM model?
I intend to use SHAP analysis to identify how each…
pbk
- 143
- 1
- 5
14
votes
3 answers
Where can I download historical market capitalization and daily turnover data for stocks?
There are plenty of sources that provide historical stock data, but they only provide the OHLC fields along with volume and adjusted close. A couple of sources I found also provide market cap data sets, but they're restricted to US stocks. Yahoo…
tejaskhot
- 4,065
- 7
- 20
- 18
14
votes
2 answers
Fast k-means like algorithm for $10^{10}$ points?
I am looking to do k-means clustering on a set of 10-dimensional points. The catch: there are $10^{10}$ points.
I am looking for just the center and size of the largest clusters (let's say 10 to 100 clusters); I don't care about what cluster each…
Alex I
- 3,152
- 1
- 21
- 27
14
votes
1 answer
Differences between gradient calculated by different reduction methods in PyTorch
I'm playing with different reduction methods provided in built-in loss functions. In particular, I would like to compare the following.
The averaged gradient obtained by performing a backward pass for each loss value calculated with reduction="none"
The…
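The second item of the comparison is cut off in this excerpt, so the sketch below assumes it is the gradient of a mean-reduced loss. It uses plain Python rather than PyTorch, with a hypothetical one-parameter model and squared-error loss, to check numerically that averaging per-sample gradients matches the gradient of the mean loss, and that a summed loss scales the gradient by the number of samples.

```python
# Hypothetical data and model: prediction = w * x, loss_i = (w * x_i - y_i) ** 2
xs = [1.0, 2.0, 3.0]
ys = [2.0, 3.0, 5.0]
w = 0.5


def per_sample_grads(w):
    """Analytic d(loss_i)/dw for each sample, i.e. one backward pass per loss value."""
    return [2.0 * x * (w * x - y) for x, y in zip(xs, ys)]


def mean_loss(w):
    """Loss averaged over the samples (the assumed mean-reduced case)."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def numeric_grad(f, w, eps=1e-6):
    """Central finite-difference estimate of df/dw."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)


avg_of_grads = sum(per_sample_grads(w)) / len(xs)  # average of per-sample gradients
grad_of_mean = numeric_grad(mean_loss, w)          # gradient of the averaged loss
grad_of_sum = sum(per_sample_grads(w))             # gradient of the summed loss

print(avg_of_grads, grad_of_mean, grad_of_sum)
```

The first two values agree up to floating-point error because differentiation is linear, and the third is larger by a factor of `len(xs)`.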
Zhuoran Liu
- 141
- 1
- 3
14
votes
5 answers
How to make LightGBM suppress output?
I have tried for a while to figure out how to "shut up" LightGBM. In particular, I would like to suppress the output of LightGBM during training (i.e. feedback on the boosting steps).
My model:
params = {
'objective': 'regression',
…
Peter
- 7,446
- 5
- 19
- 49