Most Popular
1500 questions
17
votes
2 answers
High-dimensional data: What are useful techniques to know?
Due to various curses of dimensionality, the accuracy and speed of many common predictive techniques degrade on high-dimensional data. What are some of the most useful techniques/tricks/heuristics that help deal with high-dimensional data…
ASX
- 451
- 2
- 4
- 7
17
votes
4 answers
Can a neural network compute $y = x^2$?
In the spirit of the famous TensorFlow Fizz Buzz joke and the XOR problem, I started to wonder whether it is possible to design a neural network that implements the function $y = x^2$.
Given some representation of a number (e.g. as a vector in binary form, so that…
Boris Burkov
- 317
- 3
- 9
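One way to make the question concrete is a hand-built network rather than a trained one. The sketch below assumes a real-valued input restricted to [0, 1] (not the binary-vector encoding mentioned in the question) and a single hidden layer of ReLU units whose weights are set by hand so that the output linearly interpolates $x^2$ between evenly spaced knots. The names `square_relu_net` and `n_hidden` are illustrative only.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def square_relu_net(x, n_hidden=10):
    """One-hidden-layer ReLU network that interpolates x**2 on [0, 1]."""
    x = np.asarray(x, dtype=float)
    # Hidden layer: unit i computes relu(x - i / n_hidden).
    knots = np.arange(n_hidden) / n_hidden
    hidden = relu(x[..., None] - knots)
    # Output weights: the first segment has slope 1/n, and the slope
    # of the interpolant grows by 2/n at every later knot.
    out_weights = np.full(n_hidden, 2.0 / n_hidden)
    out_weights[0] = 1.0 / n_hidden
    return hidden @ out_weights

xs = np.linspace(0.0, 1.0, 1001)
max_err = np.max(np.abs(square_relu_net(xs, n_hidden=10) - xs**2))
print(max_err)
```

With n hidden units the interpolation error on [0, 1] is bounded by 1/(4n²), so the approximation can be made as tight as desired on a bounded interval, but the network is piecewise linear and does not reproduce $x^2$ exactly or outside that interval.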
17
votes
1 answer
How to use Scikit-Learn Label Propagation on graph structured data?
As part of my research, I am interested in performing label propagation on a graph. I am especially interested in those two methods:
Xiaojin Zhu and Zoubin Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical…
Thibaud Martinez
- 171
- 1
- 4
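For readers who want to see label propagation operating directly on a graph, here is a minimal NumPy sketch of the iterative scheme with clamped seed labels. It does not use scikit-learn's `LabelPropagation` class; the function name `propagate_labels`, the `-1` marker for unlabeled nodes, and the toy path graph are assumptions made for illustration, and the sketch assumes every node has at least one edge.

```python
import numpy as np

def propagate_labels(adjacency, labels, n_iter=200):
    """Iterative label propagation with clamped seed nodes.

    adjacency: (n, n) symmetric non-negative weight matrix.
    labels: length-n integer array; -1 marks an unlabeled node.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    labels = np.asarray(labels)
    seeded = labels >= 0
    classes = np.unique(labels[seeded])

    # Row-normalise the adjacency matrix into a transition matrix.
    transition = adjacency / adjacency.sum(axis=1, keepdims=True)

    # One-hot scores for seed nodes, zeros for unlabeled nodes.
    seed_scores = (labels[seeded, None] == classes[None, :]).astype(float)
    scores = np.zeros((len(labels), len(classes)))
    scores[seeded] = seed_scores

    for _ in range(n_iter):
        scores = transition @ scores   # spread scores to neighbours
        scores[seeded] = seed_scores   # clamp the known labels
    return classes[np.argmax(scores, axis=1)]

# Toy path graph 0-1-2-3-4-5 with one seed at each end.
adjacency = np.zeros((6, 6))
for i in range(5):
    adjacency[i, i + 1] = adjacency[i + 1, i] = 1.0
predicted = propagate_labels(adjacency, [0, -1, -1, -1, -1, 1])
print(predicted)
```

Each unlabeled node ends up with the label of the seed it is closer to along the path, which is the behaviour one would expect from propagation over graph edges.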
17
votes
2 answers
How to remove rows from a dataframe that are identical to another dataframe?
I have two data frames df1 and df2.
For my analysis, I need to remove rows from df1 whose Email value also appears in df2.
>>df1
First Last Email
0 Adam Smith email@email.com
1 John Brown email2@email.com
2 Joe Max …
a_a_a
- 837
- 2
- 8
- 11
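A minimal sketch of one common approach, filtering with `Series.isin`. The frames below are hypothetical: the first two rows follow the excerpt, while the third email address and the contents of `df2` are invented for illustration because the original data is truncated.

```python
import pandas as pd

# Hypothetical data modelled on the excerpt; "joe@example.com" is invented.
df1 = pd.DataFrame({
    "First": ["Adam", "John", "Joe"],
    "Last": ["Smith", "Brown", "Max"],
    "Email": ["email@email.com", "email2@email.com", "joe@example.com"],
})
# Hypothetical second frame holding the addresses to exclude.
df2 = pd.DataFrame({"Email": ["email2@email.com"]})

# Keep only the rows of df1 whose Email does not appear in df2.
filtered = df1[~df1["Email"].isin(df2["Email"])].reset_index(drop=True)
print(filtered)
```

The boolean mask leaves `df1` itself unchanged, so the result has to be assigned to a new name (or back to `df1`) to take effect.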
17
votes
2 answers
How to store strings in CSV with new line characters?
My question is: what are ways I can store strings in a CSV that contain newline characters (i.e. \n), where each data point is in one line?
Sample data
This is a sample of the data I have:
data = [
['some text in one line', 1],
['text…
Bruno Lubascher
- 3,548
- 1
- 12
- 36
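One possible way to keep every record on a single physical line is to escape the text field before writing it. The sketch below uses JSON string encoding for that purpose, which is an assumption on my part and only one of several options; the second data row is invented because the original sample is truncated.

```python
import csv
import io
import json

# Hypothetical rows; the second text contains a newline character.
data = [
    ["some text in one line", 1],
    ["text with\na line break", 2],
]

# Write: JSON-encode the text so "\n" becomes the two characters \ and n.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
for text, value in data:
    writer.writerow([json.dumps(text), value])

csv_text = buffer.getvalue()
lines = csv_text.splitlines()  # one physical line per record

# Read: decode the JSON string to recover the original newline.
reader = csv.reader(io.StringIO(csv_text))
restored = [[json.loads(text), int(value)] for text, value in reader]
print(restored == data)
```

The trade-off is that any consumer of the file has to apply the same decoding step. Writing the raw string with the `csv` module also round-trips correctly, because the field is quoted, but the record then spans more than one line.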
17
votes
3 answers
One-Class discriminatory classification with imbalanced, heterogeneous Negative background?
I'm working on improving an existing supervised classifier, for classifying {protein} sequences as belonging to a specific class (Neuropeptide hormone precursors), or not.
There are about 1,150 known "positives", against a background of about 13…
GrimSqueaker
- 366
- 2
- 5
17
votes
5 answers
High model accuracy vs very low validation accuracy
I'm building a sentiment analysis program in Python using a Keras Sequential model for deep learning.
My data is 20,000 tweets:
positive tweets: 9152 tweets
negative tweets: 10849 tweets
I wrote a sequential model script to make the binary…
Amy.Dj
- 173
- 1
- 1
- 5
17
votes
2 answers
Updating the weights of the filters in a CNN
I am currently trying to understand the architecture of a CNN. I understand the convolution, the ReLU layer, pooling layer, and fully connected layer. However, I am still confused about the weights.
In a normal neural network, each neuron has its…
Felix
- 173
- 1
- 1
- 5
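To illustrate how shared filter weights can receive a gradient, here is a small NumPy sketch for a single 1-D filter. It assumes a "valid" cross-correlation, a squared-error loss, and plain gradient descent; the input values, target, and learning rate are hypothetical, and real frameworks compute the same quantity with automatic differentiation.

```python
import numpy as np

def conv1d(x, w):
    """Valid cross-correlation: y[i] = sum_k w[k] * x[i + k]."""
    n_out = len(x) - len(w) + 1
    return np.array([np.dot(w, x[i:i + len(w)]) for i in range(n_out)])

def filter_gradient(x, grad_y, kernel_size):
    """dL/dw[k] = sum_i grad_y[i] * x[i + k].

    Because the filter is shared, every output position contributes
    to the gradient of every filter weight.
    """
    return np.array([
        np.dot(grad_y, x[k:k + len(grad_y)]) for k in range(kernel_size)
    ])

x = np.array([1.0, 2.0, 0.0, -1.0, 3.0])   # hypothetical input signal
w = np.array([0.5, -0.25, 0.1])            # hypothetical filter weights
target = np.array([1.0, 0.0, -1.0])        # hypothetical desired output

y = conv1d(x, w)
grad_y = y - target                  # gradient of 0.5 * sum((y - target)**2)
grad_w = filter_gradient(x, grad_y, len(w))
w_updated = w - 0.05 * grad_w        # one gradient-descent step
```

The filter has only three weights no matter how long the input is, and each weight's update sums contributions from all positions where the filter was applied.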
17
votes
2 answers
Custom loss function with additional parameter in Keras
I'm looking for a way to create a loss function that looks like this:
The function should then maximize for the reward. Is this possible to achieve in Keras?
Any suggestions on how this can be achieved are highly appreciated.
def…
Nickpick
- 661
- 2
- 7
- 18
17
votes
3 answers
GANs (generative adversarial networks) possible for text as well?
Are GANs (generative adversarial networks) good just for images or can they be used for text as well?
Like training a network to generate meaningful text from a summary.
Update: quotes from the GAN inventor, Ian Goodfellow.
GANs have not been applied…
Open Food Broker
- 381
- 1
- 2
- 13
17
votes
5 answers
Prediction interval around LSTM time series forecast
Is there a method to calculate the prediction interval (probability distribution) around a time series forecast from an LSTM (or other recurrent) neural network?
Say, for example, I am predicting 10 samples into the future (t+1 to t+10), based on…
4Oh4
- 308
- 1
- 2
- 7
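One generic, model-agnostic way to attach an interval to a point forecast is to use empirical quantiles of residuals from a held-out validation set. This is a swapped-in illustration, not something specific to LSTMs, and it assumes the residual distribution at each horizon stays stable over time. The function name `residual_interval` and the synthetic data are hypothetical.

```python
import numpy as np

def residual_interval(point_forecast, val_actual, val_predicted, coverage=0.9):
    """Per-horizon interval from empirical quantiles of validation residuals.

    val_actual, val_predicted: arrays of shape (n_windows, horizon).
    point_forecast: array of shape (horizon,).
    """
    residuals = np.asarray(val_actual) - np.asarray(val_predicted)
    alpha = 1.0 - coverage
    lower_q = np.quantile(residuals, alpha / 2, axis=0)
    upper_q = np.quantile(residuals, 1 - alpha / 2, axis=0)
    point_forecast = np.asarray(point_forecast, dtype=float)
    return point_forecast + lower_q, point_forecast + upper_q

# Synthetic stand-in for validation forecasts over a 10-step horizon,
# with noise that grows the further ahead the model predicts.
rng = np.random.default_rng(0)
horizon = 10
val_predicted = np.zeros((500, horizon))
val_actual = rng.normal(scale=np.linspace(0.5, 2.0, horizon), size=(500, horizon))

forecast = np.linspace(10.0, 12.0, horizon)   # hypothetical point forecast
lower, upper = residual_interval(forecast, val_actual, val_predicted)
```

Because the quantiles are taken separately for each step, the interval can widen for t+10 relative to t+1 when the validation errors grow with the horizon.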
17
votes
3 answers
Why should we not feed LDA with TF-IDF input?
Can someone explain why we cannot feed an LDA topic model with TF-IDF input? What is conceptually wrong with this approach?
sariii
- 171
- 1
- 1
- 5
17
votes
1 answer
How should the bias be initialized and regularized?
I've read a couple of papers about kernel initialization and many papers mention that they use L2 regularization of the kernel (often with $\lambda = 0.0001$).
Does anybody do something different than initializing the bias with constant zero and not…
Martin Thoma
- 18,880
- 35
- 95
- 169
17
votes
3 answers
Why are variables of train and test data defined using the capital letter (in Python)?
I hope this question is suitable for this site...
In Python, a class name usually starts with a capital letter, for example
class Vehicle:
...
However, in the machine learning field, oftentimes train and…
Blaszard
- 911
- 1
- 13
- 29
17
votes
2 answers
Use liblinear on big data for semantic analysis
I use LIBSVM to train on data and predict classes for a semantic analysis problem, but it has a performance issue on large-scale data because semantic analysis is an n-dimensional problem.
Last year, LIBLINEAR was released, and it can solve…
Puffin GDI
- 283
- 3
- 15