Most Popular
1500 questions
11
votes
2 answers
Implementing Complementary Naive Bayes in Python?
Problem
I have tried using Naive Bayes on a labeled data set of crime data but got really poor results (7% accuracy). Naive Bayes runs much faster than other algorithms I've been using so I wanted to try finding out why the score was so…
grasshopper
- 213
- 1
- 5
11
votes
4 answers
How do you create an optimized walk list given longitude and latitude coordinates?
I am working on a political campaign where dozens of volunteers will be conducting door-knocking promotions over the next few weeks. Given a list with names, addresses and long/lat coordinates, what algorithms can be used to create an optimized walk…
McGovernTheory
- 219
- 1
- 4
11
votes
3 answers
Which is faster: PostgreSQL vs MongoDB on large JSON datasets?
I have a large dataset with 9m JSON objects at ~300 bytes each. They are posts from a link aggregator: basically links (a URL, title and author id) and comments (text and author ID) + metadata.
They could very well be relational records in a table,…
blue-dino
- 383
- 2
- 3
- 11
11
votes
3 answers
What are R's memory constraints?
In reviewing “Applied Predictive Modeling” a reviewer states:
One critique I have of statistical learning (SL) pedagogy is the
absence of computation performance considerations in the evaluation of
different modeling techniques. With its…
blunders
- 1,932
- 2
- 15
- 19
11
votes
2 answers
Delete/Drop only the rows which have all values as NaN in pandas
I have a DataFrame and I need to drop the rows in which all the values are NaN.
ID Age Gender
601 21 M
501 NaN F
NaN NaN NaN
The resulting data frame should look like.
Id Age Gender
601 21 M
501 …
Harshith
- 283
- 2
- 5
- 16
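A minimal sketch of one way to approach the question above, assuming the frame shown in the excerpt (the sample values are copied from it; the variable names `df` and `cleaned` are illustrative and nothing here is taken from the posted answers). pandas' `DataFrame.dropna` accepts `how="all"`, which drops a row only when every value in it is missing:

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the question's excerpt.
df = pd.DataFrame(
    {
        "ID": [601, 501, np.nan],
        "Age": [21, np.nan, np.nan],
        "Gender": ["M", "F", np.nan],
    }
)

# how="all" drops a row only if every column in it is NaN.
# The default, how="any", would also drop the row with ID 501,
# because its Age is missing.
cleaned = df.dropna(how="all")
print(cleaned)
```

`dropna` returns a new frame by default, so `df` itself is left unchanged unless the result is assigned back.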
11
votes
3 answers
How to encode a class with 24,000 categories?
I'm currently working on a logistic regression model for genomics. One of the input fields I want to include as a covariate is genes. There are around 24,000 known genes. There are many features with this level of variability in computational…
Kermit
- 529
- 5
- 17
11
votes
5 answers
LinkedIn web scraping
I recently discovered a new R package for connecting to the LinkedIn API. Unfortunately the LinkedIn API seems pretty limited to begin with; for example, you can only get basic data on companies, and this is detached from data on individuals. I'd…
christopherlovell
- 480
- 1
- 5
- 18
11
votes
3 answers
Why does Gradient Boosting regression predict negative values when there are no negative y-values in my training set?
As I increase the number of trees in scikit-learn's GradientBoostingRegressor, I get more negative predictions, even though there are no negative values in my training or testing set. I have about 10 features, most of which are binary.
Some of the…
user2592989
- 219
- 1
- 2
- 6
11
votes
1 answer
Why could my DDQN get significantly worse after beating the game repeatedly?
I've been trying to train a DDQN to play OpenAI Gym's CartPole-v1, but found that although it starts off well and starts getting full score (500) repeatedly (at around 600 episodes in the pic below), it then seems to go off the rails and do worse…
Danny Tuppeny
- 213
- 2
- 7
11
votes
4 answers
Feature Extraction Technique - Summarizing a Sequence of Data
I am often building a model (classification or regression) where I have some predictor variables that are sequences, and I have been trying to find technique recommendations for summarizing them in the best way possible for inclusion as predictors in…
B_Miner
- 702
- 1
- 7
- 20
11
votes
2 answers
What is the difference between an Embedding Layer and an Autoencoder?
I'm reading about Embedding layers, especially applied to NLP and word2vec, and they seem nothing more than an application of Autoencoders for dimensionality reduction. Are they different? If so, what are the differences between them?
Leevo
- 6,225
- 3
- 16
- 52
11
votes
4 answers
How can Time Series Analysis be done with Categorical Variables?
Most of the time series analysis tutorials/textbooks I've read about, be they for univariate or multivariate time series data, usually deal with continuous numerical variables.
I currently have a problem at hand that deals with multivariate time…
Brian Yen
- 111
- 1
- 1
- 5
11
votes
2 answers
Amplifying a Locality Sensitive Hash
I'm trying to build a cosine locality sensitive hash so I can find candidate similar pairs of items without having to compare every possible pair. I have it basically working, but most of the pairs in my data seem to have cosine similarity in the…
Philip Pearl
- 251
- 1
- 5
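For readers unfamiliar with the term, the sketch below illustrates the standard AND/OR ("banding") amplification for random-hyperplane cosine hashing. It is an assumption about what the question refers to, not the asker's code. Function names, the band sizes (`rows=16`, `bands=20`), and the similarity values are all hypothetical. For two vectors at angle θ, a single random-hyperplane bit agrees with probability 1 - θ/π; requiring all `rows` bits in a band to match (AND) and accepting a pair if any of `bands` bands match (OR) sharpens that curve:

```python
import math


def bit_agreement_prob(cos_sim: float) -> float:
    """Probability that one random-hyperplane bit agrees for two vectors."""
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    theta = math.acos(max(-1.0, min(1.0, cos_sim)))
    return 1.0 - theta / math.pi


def candidate_prob(cos_sim: float, rows: int, bands: int) -> float:
    """AND over `rows` bits within a band, then OR over `bands` bands."""
    p = bit_agreement_prob(cos_sim)
    return 1.0 - (1.0 - p ** rows) ** bands


# Illustrative similarity values only; they are not taken from the question.
for s in (0.5, 0.8, 0.9, 0.95):
    single = bit_agreement_prob(s)
    amplified = candidate_prob(s, rows=16, bands=20)
    print(f"cos={s:.2f}  single bit={single:.3f}  amplified={amplified:.3f}")
```

Raising `rows` suppresses low-similarity pairs, while raising `bands` recovers high-similarity pairs that a single band would miss.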
11
votes
1 answer
What is a fractionally-strided convolution layer?
The paper Generating High-Quality Crowd Density Maps using Contextual Pyramid CNNs says, in Section 3.4:
Since, the aim of this work is to estimate high-resolution and
high-quality density maps, F-CNN is constructed using a set of
…
Haha TTpro
- 243
- 1
- 2
- 7
11
votes
3 answers
Inverse Relationship Between Precision and Recall
I did some searching to learn about precision and recall, and I saw some graphs that show an inverse relationship between precision and recall, so I started thinking about it to clarify the subject. I wonder whether the inverse relationship always holds. Suppose I have a…
tkarahan
- 422
- 5
- 14
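A small sketch of how the trade-off in the question above can be probed, using made-up scores and labels (the data, thresholds, and the helper name `precision_recall` are all hypothetical). It sweeps a decision threshold over a toy scored set: recall can only stay the same or fall as the threshold rises, while precision usually rises but is not guaranteed to:

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y == 1 for p, y in zip(predicted, labels))
    fp = sum(p and y == 0 for p, y in zip(predicted, labels))
    fn = sum((not p) and y == 1 for p, y in zip(predicted, labels))
    # Convention chosen here: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data, invented for illustration only.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 1, 0, 1, 0, 0]

for t in (0.2, 0.5, 0.75, 0.85, 0.92):
    p, r = precision_recall(scores, labels, t)
    print(f"threshold={t:.2f}  precision={p:.3f}  recall={r:.3f}")
```

On this toy set, moving the threshold from 0.75 to 0.85 lowers both precision and recall, so the inverse relationship is a tendency under threshold changes rather than a strict rule.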