Most Popular

1500 questions
30
votes
2 answers

How to feed LSTM with different input array sizes?

If I want to write an LSTM network and feed it inputs of different array sizes, how is that possible? For example, I want to get voice messages or text messages in a different language and translate them. So the first input may be "hello" but the…
user3486308
  • 1,270
  • 5
  • 18
  • 28
30
votes
1 answer

What is the difference between upsampling and bi-linear upsampling in a CNN?

I am trying to understand this paper and am unsure of what bi-linear upsampling is. Can anyone explain this at a high level?
JGG
  • 513
  • 2
  • 5
  • 7
30
votes
1 answer

Should one-hot vectors be scaled with numerical attributes?

When I have a combination of categorical and numerical attributes, I usually convert the categorical attributes to one-hot vectors. My question is: do I leave those vectors as is and scale the numerical attributes through…
Suresh Kasipandy
  • 578
  • 1
  • 4
  • 8
30
votes
3 answers

Why do we convert skewed data into a normal distribution

I was going through a solution of the Housing prices competition on Kaggle (Human Analog's Kernel on House Prices: Advanced Regression Techniques) and came across this part: # Transform the skewed numeric features by taking log(feature + 1). # This…
PixelPioneer
  • 795
  • 2
  • 9
  • 10
30
votes
3 answers

How to get p-value and confidence interval in LogisticRegression with sklearn?

I am building a multinomial logistic regression with sklearn (LogisticRegression). But after it finishes, how can I get a p-value and confidence interval for my model? It appears that sklearn only provides the coefficients and intercept. Thank you a…
hminle
  • 401
  • 1
  • 4
  • 4
29
votes
7 answers

Why is the decoder not a part of BERT architecture?

I can't see how BERT makes predictions without using a decoder unit, which was a part of all models before it including transformers and standard RNNs. How are output predictions made in the BERT architecture without using a decoder? How does it do…
Hrishikesh Athalye
  • 445
  • 1
  • 5
  • 7
29
votes
10 answers

Collaborating on Jupyter Notebooks

I have prepared a Jupyter Notebook with some findings and shared it with other team members through GitHub to get their feedback in written form. It used to work like this when working together on a piece of code but does not work for Jupyter…
dzieciou
  • 697
  • 1
  • 6
  • 15
29
votes
4 answers

Cross validation Vs. Train Validate Test

I have a question about the cross-validation approach and the train-validation-test approach. I was told that I can split a dataset into 3 parts: Train: we train the model. Validation: we validate and adjust model parameters. Test: never seen before…
NaveganTeX
  • 465
  • 1
  • 4
  • 9
29
votes
2 answers

What is the difference between fit() and fit_generator() in Keras?

What is the difference between fit() and fit_generator() in Keras? When should I use fit() vs fit_generator()?
N.IT
  • 1,995
  • 4
  • 19
  • 35
29
votes
4 answers

Books about the "Science" in Data Science?

What are the books about the science and mathematics behind data science? It feels like so many "data science" books are programming tutorials and don't touch things like data generating processes and statistical inference. I can already code, what…
Anton
  • 399
  • 4
  • 5
29
votes
3 answers

How to combine categorical and continuous input features for neural network training

Suppose we have two kinds of input features, categorical and continuous. The categorical data may be represented as a one-hot code A, while the continuous data is just a vector B in N-dimensional space. It seems that simply using concat(A, B) is not a…
JunjieChen
  • 525
  • 1
  • 5
  • 8
29
votes
6 answers

Deep learning basics

I am looking for a paper detailing the very basics of deep learning, ideally something like the Andrew Ng course for deep learning. Do you know where I can find this?
Maxi
  • 433
  • 1
  • 5
  • 7
29
votes
1 answer

What is Hellinger Distance and when to use it?

I am interested in knowing what Hellinger Distance actually computes (in simple terms). I am also interested in knowing what types of problems Hellinger Distance can be used for. What are the benefits of using Hellinger Distance?
Smith Volka
  • 665
  • 2
  • 6
  • 13
29
votes
10 answers

Any Online R console?

I am looking for an online console for the language R. I would write the code, and the server would execute it and provide me with the output, similar to the Datacamp website.
Gotham
  • 291
  • 1
  • 3
  • 3
29
votes
1 answer

Word2Vec vs. Sentence2Vec vs. Doc2Vec

I recently came across the terms Word2Vec, Sentence2Vec and Doc2Vec and am kind of confused, as I am new to vector semantics. Can someone please explain the differences between these methods in simple words? What are the most suitable tasks for each…
Smith
  • 529
  • 1
  • 5
  • 14