Highest Voted Questions - Data Science Stack Exchange

20

votes

2 answers

Parameterization regression of rotation angle

Let's say I have a top-down picture of an arrow, and I want to predict the angle this arrow makes. This would be between $0$ and $360$ degrees, or between $0$ and $2\pi$. The problem is that this target is circular, $0$ and $360$ degrees are exactly…

asked Nov 21 '17 at 15:33

Jan van der Vegt

9,368
35
52
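
The circularity described in "Parameterization regression of rotation angle" above can be illustrated with a small sketch. It compares the naive difference between two angles in degrees with the distance between their (cos, sin) encodings on the unit circle. The encoding is one common workaround and is an assumption here, not something the excerpt states; the function names `encode_angle`, `decode_angle`, and `chord_distance` are hypothetical.

```python
import math


def encode_angle(degrees):
    """Map an angle in degrees to a (cos, sin) point on the unit circle."""
    radians = math.radians(degrees)
    return math.cos(radians), math.sin(radians)


def decode_angle(cos_value, sin_value):
    """Recover an angle in [0, 360) degrees from a (cos, sin) pair."""
    return math.degrees(math.atan2(sin_value, cos_value)) % 360.0


def chord_distance(a_degrees, b_degrees):
    """Euclidean distance between the unit-circle encodings of two angles."""
    ax, ay = encode_angle(a_degrees)
    bx, by = encode_angle(b_degrees)
    return math.hypot(ax - bx, ay - by)


# 359 and 1 degrees are only 2 degrees apart on the circle,
# but the raw numeric difference treats them as far apart.
raw_gap = abs(359.0 - 1.0)
encoded_gap = chord_distance(359.0, 1.0)
print(raw_gap, encoded_gap)
```

With this representation a model would predict two values instead of one, and the angle would be recovered with `decode_angle`. Whether that suits a given problem is a design choice left open here.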

20

votes

3 answers

Why are autoencoders for dimension reduction symmetrical?

I'm not an expert in autoencoders or neural networks by any means, so forgive me if this is a silly question. For the purpose of dimension reduction or visualizing clusters in high dimensional data, we can use an autoencoder to create a (lossy) 2…

asked Oct 13 '17 at 05:25

dcl

251
1
2
7

20

votes

4 answers

What is the difference between a hashing vectorizer and a tfidf vectorizer

I'm converting a corpus of text documents into word vectors for each document. I've tried this using a TfidfVectorizer and a HashingVectorizer. I understand that a HashingVectorizer does not take into consideration the IDF scores like a…

asked Aug 14 '17 at 16:42

Minu

805
2
9
18

20

votes

1 answer

Are t-sne dimensions meaningful?

Are there any meanings for the dimensions of a t-sne embedding? Like with PCA we have this sense of linearly transformed variance maximizations but for t-sne is there intuition besides just the space we define for mapping and minimization of the…

asked Mar 02 '17 at 16:46

Nitro

407
3
9

20

votes

5 answers

Do modern R and/or Python libraries make SQL obsolete?

I work in an office where SQL Server is the backbone of everything we do, from data processing to cleaning to munging. My colleague specializes in writing complex functions and stored procedures to methodically process incoming data so that it can…

asked Feb 24 '17 at 19:33

AffableAmbler

363
1
2
11

19

votes

2 answers

Multivariate linear regression in Python

I'm looking for a Python package that implements multivariate linear regression. (Terminological note: multivariate regression deals with the case where there is more than one dependent variable while multiple regression deals with the case where…

asked Oct 28 '15 at 02:21

Franck Dernoncourt

5,690
10
40
76
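
The terminological note in "Multivariate linear regression in Python" above can be made concrete with a short sketch. It uses NumPy's least-squares solver on synthetic data, which is an assumption for illustration and not the package the question asks about. The data, `true_coef`, and all variable names are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200

# Three predictors (independent variables), synthetic data.
X = rng.normal(size=(n_samples, 3))

# Hypothetical true coefficients: 3 predictors by 2 dependent variables.
true_coef = np.array([[1.0, -2.0],
                      [0.5, 0.0],
                      [3.0, 1.5]])
Y = X @ true_coef + 0.01 * rng.normal(size=(n_samples, 2))

# Prepend a column of ones so the first fitted row is the intercept.
X_design = np.column_stack([np.ones(n_samples), X])

# Multivariate regression: several dependent variables fitted at once.
coef_multivariate, *_ = np.linalg.lstsq(X_design, Y, rcond=None)

# Multiple regression: several predictors, a single dependent variable.
coef_multiple, *_ = np.linalg.lstsq(X_design, Y[:, 0], rcond=None)

print(coef_multivariate.shape)  # one column of coefficients per dependent variable
print(coef_multiple.shape)      # a single coefficient vector
```

In ordinary least squares, each column of `coef_multivariate` matches the fit obtained by regressing that dependent variable on its own, which the sketch checks implicitly by comparing the first column with `coef_multiple`.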

19

votes

3 answers

Dataframe has no column names. How to add a header?

I am using a dataset to practice for building a decision tree classifier. Here is my code: import pandas as pd tdf = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', sep =…

asked Feb 09 '19 at 18:24

user633599

333
1
2
9
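
For "Dataframe has no column names. How to add a header?" above, a minimal sketch of two ways to attach column names with pandas follows. The inline rows and the names in `column_names` are hypothetical stand-ins, not the actual contents or schema of the dataset in the question.

```python
import io

import pandas as pd

# Made-up rows standing in for a headerless CSV file.
raw = "1000025,5,1,2\n1002945,5,4,2\n"

# Hypothetical column names chosen for illustration only.
column_names = ["sample_id", "clump_thickness", "cell_size", "label"]

# Option 1: supply the names while reading. header=None tells pandas
# that the first row is data, not a header.
df = pd.read_csv(io.StringIO(raw), header=None, names=column_names)

# Option 2: read without a header, then assign the names afterwards.
df_late = pd.read_csv(io.StringIO(raw), header=None)
df_late.columns = column_names

print(df.head())
```

Without `header=None`, `read_csv` would treat the first data row as the header, so that row would be lost from the data.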

19

votes

1 answer

scikit-learn n_jobs parameter on CPU usage & memory

In most estimators in scikit-learn, there is an n_jobs parameter in fit/predict methods for creating parallel jobs using joblib. I noticed that setting it to -1 creates just 1 Python process and maxes out the cores, causing CPU usage to hit 2500%…

asked Jul 13 '18 at 10:06

user29151

19

votes

2 answers

Sample Importance (Training Weights) in Keras

How do you add more importance to some samples than others (sample weights) in Keras? I'm not looking for class_weight, which is a fix for unbalanced datasets. What I currently have is: trainingWeights which is the desired importance I want to give…

asked May 02 '18 at 18:04

wacax

3,390
4
23
45

19

votes

3 answers

Advantages of stacking LSTMs?

I'm wondering in what situations it is advantageous to stack LSTMs?

asked Aug 29 '17 at 16:48

Vadim Smolyakov

646
1
5
14

19

votes

6 answers

The data in our relational DBMS is getting big, is it the time to move to NoSQL?

We created a social network application for eLearning purposes. It's an experimental project that we are researching in our lab. It has been used in some case studies for a while and the data in our relational DBMS (SQL Server 2008) is getting…

asked May 14 '14 at 05:37

ePezhman

293
1
4

19

votes

2 answers

Could Deep Learning be used to crack encryption?

Say you have a dataset with millions of rows and the attributes Plain Text, Key, and Output Ciphertext. Could Deep Learning, theoretically, be used to find patterns in the outputs that help decipher the ciphertext? Are there any other potential…

asked Jan 31 '17 at 05:32

user28473

191
1
1
3

19

votes

4 answers

What is the difference between word-based and char-based text generation RNNs?

While reading about text generation with Recurrent Neural Networks I noticed that some examples were implemented to generate text word by word and others character by character without actually stating why. So, what is the difference between RNN…

asked Aug 01 '16 at 22:38

minerals

2,147
3
17
19

19

votes

1 answer

What is difference between one hot encoding and leave one out encoding?

I am reading a presentation and it recommends not using leave one out encoding, but it is okay with one hot encoding. I thought they both were the same. Can anyone describe what the differences between them are?

asked Mar 23 '16 at 03:25

icm

529
2
5
9

19

votes

2 answers

How does SelectKBest work?

I am looking at this tutorial: https://www.dataquest.io/mission/75/improving-your-submission At section 8, finding the best features, it shows the following code. import numpy as np from sklearn.feature_selection import SelectKBest,…

asked Mar 18 '16 at 10:34

user

1,993
6
21
38
