Most Popular

1500 questions
20
votes
2 answers

Parameterization regression of rotation angle

Let's say I have a top-down picture of an arrow, and I want to predict the angle this arrow makes. This would be between $0$ and $360$ degrees, or between $0$ and $2\pi$. The problem is that this target is circular: $0$ and $360$ degrees are exactly…
Jan van der Vegt
20
votes
3 answers

Why are autoencoders for dimension reduction symmetrical?

I'm not an expert in autoencoders or neural networks by any means, so forgive me if this is a silly question. For the purpose of dimension reduction or visualizing clusters in high dimensional data, we can use an autoencoder to create a (lossy) 2…
dcl
20
votes
4 answers

What is the difference between a hashing vectorizer and a tfidf vectorizer?

I'm converting a corpus of text documents into word vectors for each document. I've tried this using a TfidfVectorizer and a HashingVectorizer. I understand that a HashingVectorizer does not take the IDF scores into consideration like a…
Minu
20
votes
1 answer

Are t-SNE dimensions meaningful?

Do the dimensions of a t-SNE embedding have any meaning? With PCA we have this sense of linearly transformed variance maximization, but for t-SNE is there any intuition besides just the space we define for mapping and minimization of the…
Nitro
20
votes
5 answers

Do modern R and/or Python libraries make SQL obsolete?

I work in an office where SQL Server is the backbone of everything we do, from data processing to cleaning to munging. My colleague specializes in writing complex functions and stored procedures to methodically process incoming data so that it can…
AffableAmbler
19
votes
2 answers

Multivariate linear regression in Python

I'm looking for a Python package that implements multivariate linear regression. (Terminological note: multivariate regression deals with the case where there is more than one dependent variable, while multiple regression deals with the case where…
Franck Dernoncourt
19
votes
3 answers

Dataframe has no column names. How to add a header?

I am using a dataset to practice building a decision tree classifier. Here is my code: import pandas as pd tdf = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', sep =…
user633599
19
votes
1 answer

scikit-learn n_jobs parameter on CPU usage & memory

Most estimators in scikit-learn have an n_jobs parameter in their fit/predict methods for creating parallel jobs using joblib. I noticed that setting it to -1 creates just 1 Python process and maxes out the cores, causing CPU usage to hit 2500%…
user29151
19
votes
2 answers

Sample Importance (Training Weights) in Keras

How do you give some samples more importance than others (sample weights) in Keras? I'm not looking for class_weight, which is a fix for unbalanced datasets. What I currently have is: trainingWeights, which is the desired importance I want to give…
wacax
19
votes
3 answers

Advantages of stacking LSTMs?

In what situations is it advantageous to stack LSTMs?
Vadim Smolyakov
19
votes
6 answers

The data in our relational DBMS is getting big. Is it time to move to NoSQL?

We created a social network application for eLearning purposes. It's an experimental project that we are researching in our lab. It has been used in some case studies for a while and the data in our relational DBMS (SQL Server 2008) is getting…
ePezhman
19
votes
2 answers

Could Deep Learning be used to crack encryption?

Say you have a dataset with millions of rows and the attributes Plain Text, Key, and Output Ciphertext. Could Deep Learning, theoretically, be used to find patterns in the outputs that help decipher the ciphertext? Are there any other potential…
user28473
19
votes
4 answers

What is the difference between word-based and char-based text generation RNNs?

While reading about text generation with Recurrent Neural Networks I noticed that some examples were implemented to generate text word by word and others character by character without actually stating why. So, what is the difference between RNN…
minerals
19
votes
1 answer

What is the difference between one-hot encoding and leave-one-out encoding?

I am reading a presentation that recommends against leave-one-out encoding but is fine with one-hot encoding. I thought they were the same thing. Can anyone describe the differences between them?
icm
19
votes
2 answers

How does SelectKBest work?

I am looking at this tutorial: https://www.dataquest.io/mission/75/improving-your-submission. In section 8, on finding the best features, it shows the following code: import numpy as np from sklearn.feature_selection import SelectKBest,…
user