Highest Voted Questions - Data Science Stack Exchange

17 votes, 2 answers

High-dimensional data: What are useful techniques to know?

Due to various curses of dimensionality, the accuracy and speed of many common predictive techniques degrade on high-dimensional data. What are some of the most useful techniques/tricks/heuristics that help deal with high-dimensional data…

asked Jan 25 '15 at 22:52 by ASX (451 reputation)

17 votes, 4 answers

Can a neural network compute $y = x^2$?

In the spirit of the famous TensorFlow Fizz Buzz joke and the XOR problem, I started to wonder whether it's possible to design a neural network that implements the function $y = x^2$. Given some representation of a number (e.g. as a vector in binary form, so that…

asked Mar 22 '19 at 13:02 by Boris Burkov (317 reputation)
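One way to see that the answer is "yes, approximately, on a bounded interval": a single hidden layer of ReLU units can reproduce the piecewise-linear interpolant of $y = x^2$ exactly at evenly spaced knots. The sketch below sets the weights by hand instead of training them, and assumes a scalar input in $[0, 1]$ rather than the binary encoding mentioned in the question; `relu_square` and the unit count `n` are illustrative names.

```python
def relu(z: float) -> float:
    return max(0.0, z)


def relu_square(x: float, n: int = 10) -> float:
    """Approximate x**2 on [0, 1] with n hidden ReLU units (hand-set weights).

    Hidden unit k computes relu(x - k/n) for k = 0..n-1. The output weights make
    the slope equal (2k + 1)/n on [k/n, (k+1)/n], which is exactly the
    piecewise-linear interpolant of x**2 through the knots k/n.
    """
    out = (1.0 / n) * relu(x)  # unit k = 0: slope 1/n on the first segment
    for k in range(1, n):
        # every further knot adds 2/n to the slope
        out += (2.0 / n) * relu(x - k / n)
    return out


if __name__ == "__main__":
    n = 10
    grid = [i / 1000 for i in range(1001)]
    worst = max(abs(relu_square(x, n) - x * x) for x in grid)
    # Linear interpolation of x**2 with spacing h = 1/n has error at most h**2 / 4.
    print(f"max error with {n} units: {worst:.5f} (bound {1 / (4 * n * n):.5f})")
```

Doubling `n` cuts the worst-case error by a factor of four, so any fixed accuracy on a bounded interval is reachable; outside $[0, 1]$ the network simply continues linearly and no longer tracks $x^2$.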

17 votes, 1 answer

How to use Scikit-Learn Label Propagation on graph structured data?

As part of my research, I am interested in performing label propagation on a graph. I am especially interested in these two methods: Xiaojin Zhu and Zoubin Ghahramani, "Learning from labeled and unlabeled data with label propagation", Technical…

asked Feb 12 '19 at 17:15 by Thibaud Martinez (171 reputation)
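scikit-learn's `LabelPropagation` builds its similarity graph from a feature matrix through its `kernel` parameter (`'rbf'`, `'knn'`, or a callable), so it does not take an adjacency matrix directly. To show what the propagation itself does on an explicit graph, here is a from-scratch sketch of the propagate-and-clamp iteration in the spirit of Zhu and Ghahramani: each unlabelled node repeatedly takes the weighted average of its neighbours' label distributions while labelled nodes stay fixed. This is not scikit-learn's API; the adjacency-dict format and the `propagate_labels` name are assumptions for illustration.

```python
def propagate_labels(graph, seeds, max_iter=1000, tol=1e-9):
    """Propagate-and-clamp label propagation on an explicit weighted graph.

    graph: {node: {neighbour: weight}}, assumed symmetric (undirected).
    seeds: {node: class_label} for the labelled nodes.
    Returns {node: {class_label: probability}}.
    """
    classes = sorted(set(seeds.values()))
    m = len(classes)
    dist = {}
    for v in graph:
        if v in seeds:
            dist[v] = [1.0 if c == seeds[v] else 0.0 for c in classes]  # one-hot
        else:
            dist[v] = [1.0 / m] * m  # uninformative start
    for _ in range(max_iter):
        new, delta = {}, 0.0
        for v, nbrs in graph.items():
            total = sum(nbrs.values())
            if v in seeds or total == 0:
                new[v] = dist[v]  # clamp labelled nodes; leave isolated nodes unchanged
                continue
            # Weighted average of neighbours' distributions (stays normalised).
            row = [sum(w * dist[u][k] for u, w in nbrs.items()) / total for k in range(m)]
            delta = max(delta, max(abs(a - b) for a, b in zip(row, dist[v])))
            new[v] = row
        dist = new
        if delta < tol:
            break
    return {v: dict(zip(classes, p)) for v, p in dist.items()}


if __name__ == "__main__":
    # A path graph 0-1-2-3-4 with only the two ends labelled.
    path = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0, 3: 1.0},
            3: {2: 1.0, 4: 1.0}, 4: {3: 1.0}}
    for node, probs in sorted(propagate_labels(path, {0: "a", 4: "b"}).items()):
        print(node, {c: round(p, 3) for c, p in probs.items()})
```

On the path graph, nodes 1 and 3 end up at probability 0.75 for the nearer seed's class, which is the harmonic solution the iteration converges to.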

17 votes, 2 answers

How to remove rows from a dataframe that are identical to another dataframe?

I have two data frames, df1 and df2. For my analysis, I need to remove the rows from df1 whose Email value also appears in df2. For example:

>>df1
   First  Last   Email
0  Adam   Smith  email@email.com
1  John   Brown  email2@email.com
2  Joe    Max    …

asked Aug 21 '18 at 10:22 by a_a_a (837 reputation)
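A common pandas idiom for this is a boolean mask built with `Series.isin`: keep only the rows of `df1` whose `Email` does not appear in `df2`. In the sketch below the first two rows follow the question's example and the rest of the data is made up, since the question's tables are truncated in this listing; it assumes pandas is installed.

```python
import pandas as pd

# Illustrative data; only the Email column matters for the filter.
df1 = pd.DataFrame({
    "First": ["Adam", "John", "Ann"],
    "Last": ["Smith", "Brown", "Lee"],
    "Email": ["email@email.com", "email2@email.com", "ann@example.com"],
})
df2 = pd.DataFrame({"Email": ["email2@email.com", "other@example.com"]})

# True for rows of df1 whose Email is present anywhere in df2["Email"].
in_df2 = df1["Email"].isin(df2["Email"])

# Keep the complement; reset_index gives a clean 0..n-1 index afterwards.
result = df1[~in_df2].reset_index(drop=True)
print(result)
```

If rows should count as duplicates only when several columns match, `df1.merge(df2, how="left", indicator=True)` followed by keeping rows where `_merge == "left_only"` is the usual alternative.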

17 votes, 2 answers

How to store strings in CSV with new line characters?

My question is: what are ways to store strings that contain newline characters (i.e. \n) in a CSV so that each data point stays on one line? Sample data: this is a sample of the data I have: data = [ ['some text in one line', 1], ['text…

asked Jul 22 '18 at 14:31 by Bruno Lubascher (3,548 reputation)
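Two common options: let the `csv` module quote the field (with its default settings it quotes fields that contain newlines, but the record then spans several physical lines), or escape the newlines yourself so every record stays on one line, as the question asks. The sketch below takes the second route with Python's standard `csv` module; the backslash escaping scheme, the helper names, and the sample rows are assumptions, and any reader of the file has to apply the same `unescape` step.

```python
import csv
import io


def escape(text: str) -> str:
    # Escape backslashes first so the mapping is reversible,
    # then turn real newlines into the two characters "\n".
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def unescape(text: str) -> str:
    # Single left-to-right pass, so an escaped backslash followed by "n"
    # is not mistaken for an escaped newline.
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            out.append("\n" if nxt == "n" else nxt)
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def write_rows(rows) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for text, label in rows:
        writer.writerow([escape(text), label])
    return buf.getvalue()


def read_rows(csv_text: str):
    reader = csv.reader(io.StringIO(csv_text))
    return [[unescape(text), int(label)] for text, label in reader]


data = [["some text in one line", 1], ["first line\nsecond line", 0]]
csv_text = write_rows(data)
print(csv_text)
```

Every record occupies exactly one physical line of `csv_text`, and `read_rows` restores the original strings, newlines included.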

17 votes, 3 answers

One-Class discriminatory classification with imbalanced, heterogeneous Negative background?

I'm working on improving an existing supervised classifier for classifying {protein} sequences as belonging to a specific class (Neuropeptide hormone precursors) or not. There are about 1,150 known "positives", against a background of about 13…

asked Jun 11 '14 at 10:11 by GrimSqueaker (366 reputation)

17 votes, 5 answers

High model accuracy vs very low validation accuracy

I'm building a sentiment analysis program in Python using a Keras Sequential model for deep learning. My data is 20,000 tweets: 9,152 positive tweets and 10,849 negative tweets. I wrote a sequential model script to make the binary…

asked Apr 04 '18 at 09:02 by Amy.Dj (173 reputation)

17 votes, 2 answers

Updating the weights of the filters in a CNN

I am currently trying to understand the architecture of a CNN. I understand the convolution, the ReLU layer, pooling layer, and fully connected layer. However, I am still confused about the weights. In a normal neural network, each neuron has its…

asked Dec 17 '17 at 21:51 by Felix (173 reputation)
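The key point is weight sharing: one filter weight is used at every position the filter slides over, so its gradient is the sum of the contributions from all those positions, and the whole filter is then updated with an ordinary gradient step. A minimal 1-D sketch in plain Python, assuming a single filter, no bias, a squared-error loss, and made-up numbers, with a finite-difference check of the gradient:

```python
def conv1d(x, w):
    """'Valid' 1-D cross-correlation, as used in CNN layers."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


def loss(x, w, target):
    y = conv1d(x, w)
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, target))


def filter_grad(x, w, target):
    """dL/dw[j] = sum over output positions i of dL/dy[i] * x[i + j]."""
    upstream = [yi - ti for yi, ti in zip(conv1d(x, w), target)]  # dL/dy[i]
    return [sum(g * x[i + j] for i, g in enumerate(upstream)) for j in range(len(w))]


def numeric_grad(x, w, target, eps=1e-6):
    """Central finite differences, used only to check filter_grad."""
    grads = []
    for j in range(len(w)):
        up = w[:j] + [w[j] + eps] + w[j + 1:]
        down = w[:j] + [w[j] - eps] + w[j + 1:]
        grads.append((loss(x, up, target) - loss(x, down, target)) / (2 * eps))
    return grads


if __name__ == "__main__":
    x = [1.0, 2.0, -1.0, 0.5, 3.0]
    w = [0.2, -0.4, 0.1]
    target = [0.0, 1.0, -1.0]
    g = filter_grad(x, w, target)
    print("analytic:", g)
    print("numeric: ", numeric_grad(x, w, target))
    lr = 0.01
    w_new = [wj - lr * gj for wj, gj in zip(w, g)]  # one SGD step on the shared filter
    print("loss before/after:", loss(x, w, target), loss(x, w_new, target))
```

A 2-D filter works the same way, with the sum running over every spatial position (and every example in the batch).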

17 votes, 2 answers

Custom loss function with additional parameter in Keras

I'm looking for a way to create a loss function that looks like this: The function should then maximize the reward. Is this possible to achieve in Keras? Any suggestions on how this can be achieved are highly appreciated. def…

asked Nov 22 '17 at 22:24 by Nickpick (661 reputation)

17 votes, 3 answers

GANs (generative adversarial networks) possible for text as well?

Are GANs (generative adversarial networks) good only for images, or can they be used for text as well? For example, training a network to generate meaningful text from a summary. Update: quotes from the GAN inventor Ian Goodfellow: GANs have not been applied…

Tags: gan

asked Nov 18 '17 at 08:31 by Open Food Broker (381 reputation)

17 votes, 5 answers

Prediction interval around LSTM time series forecast

Is there a method to calculate the prediction interval (probability distribution) around a time series forecast from an LSTM (or other recurrent) neural network? Say, for example, I am predicting 10 samples into the future (t+1 to t+10), based on…

asked Nov 06 '17 at 12:16 by 4Oh4 (308 reputation)
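One model-agnostic option, not specific to LSTMs, is split conformal prediction: hold out a calibration set, record the forecast errors separately for each horizon t+1 to t+10, and use an empirical quantile of their absolute values as the half-width of the interval. Under an exchangeability assumption, which time series satisfy only approximately, this gives roughly the requested coverage. The sketch below assumes you already have calibration errors from your model; the function names and the simulated residuals are illustrative.

```python
import math
import random


def conformal_halfwidth(residuals, alpha=0.1):
    """Half-width of a (1 - alpha) interval from calibration residuals y - y_hat."""
    scores = sorted(abs(r) for r in residuals)
    n = len(scores)
    # Finite-sample corrected rank ceil((n + 1)(1 - alpha)), capped at n.
    rank = min(n, math.ceil((n + 1) * (1 - alpha)))
    return scores[rank - 1]


def forecast_intervals(forecasts, residuals_by_horizon, alpha=0.1):
    """One (lower, upper) pair per horizon; each horizon gets its own width."""
    out = []
    for y_hat, residuals in zip(forecasts, residuals_by_horizon):
        q = conformal_halfwidth(residuals, alpha)
        out.append((y_hat - q, y_hat + q))
    return out


if __name__ == "__main__":
    random.seed(0)
    # Simulated calibration errors whose spread grows with the horizon (illustrative only).
    residuals_by_horizon = [[random.gauss(0, 0.5 * h) for _ in range(200)] for h in range(1, 11)]
    forecasts = [10.0 + 0.1 * h for h in range(1, 11)]  # stand-in for LSTM outputs
    for h, (lo, hi) in enumerate(forecast_intervals(forecasts, residuals_by_horizon), start=1):
        print(f"t+{h}: [{lo:.2f}, {hi:.2f}]")
```

Keeping a separate residual pool per horizon lets the intervals widen naturally as the forecast reaches further ahead.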

17 votes, 3 answers

Why should we not feed LDA with TF-IDF input?

Can someone explain why we cannot feed an LDA topic model with TF-IDF input? What is wrong with this approach conceptually?

asked Aug 04 '17 at 03:56 by sariii (171 reputation)

17 votes, 1 answer

How should the bias be initialized and regularized?

I've read a couple of papers about kernel initialization, and many of them mention that they use L2 regularization of the kernel (often with $\lambda = 0.0001$). Does anybody do something other than initializing the bias with constant zero and not…

Tags: neural-network

asked Mar 30 '17 at 04:40 by Martin Thoma (18,880 reputation)

17 votes, 3 answers

Why are variables of train and test data defined using the capital letter (in Python)?

I hope this site is the most suitable place for this question... In Python, class names usually begin with a capital letter, for example class Vehicle: ... However, in the machine learning field, oftentimes train and…

asked Mar 15 '17 at 07:36 by Blaszard (911 reputation)

17 votes, 2 answers

Use liblinear on big data for semantic analysis

I use Libsvm to train on data and predict classifications for a semantic analysis problem. But it has performance issues on large-scale data, because semantic analysis is an n-dimensional problem. Last year, Liblinear was released, and it can solve…

asked May 14 '14 at 01:57 by Puffin GDI (283 reputation)
