Most Popular
1500 questions
10 votes
1 answer
Python Seaborn: how are error bars computed in barplots?
I'm using the seaborn library to generate bar plots in Python. I'm wondering what statistics are used to compute the error bars, but I can't find any reference to this in seaborn's barplot documentation.
I know the bar values are computed based on…
Michael Hooreman
10 votes
5 answers
Tool to Generate 2D Data via Mouse Clicking
Often, when I am learning new machine learning methods or experimenting with a data analysis algorithm, I need to generate a series of 2D points. Teachers also do this often when making a lesson or tutorial.
In some cases I just create a function, add…
MD004
10 votes
2 answers
Find missing object(s) in image with a priori knowledge about the missing object(s) (w.r.t base image)
Problem Statement:
I am working on developing a method, or borrowing/modifying/combining existing ones, that, given a golden image (a reference or base image with all expected objects present), can identify the missing objects and draw a bounding…
TwinPenguins
10 votes
3 answers
What is momentum in neural network?
While using "Two class neural network" in Azure ML, I encountered the "Momentum" property. The documentation, which is not clear, says:
For The momentum, type a value to apply during learning as a weight on
nodes from previous…
Sandeep Bhutani
10 votes
3 answers
Vector space model cosine tf-idf for finding similar documents
I have a corpus of over a million documents.
For a given document, I want to find similar documents using cosine similarity, as in the vector space model:
$d_1 \cdot d_2 / ( \|d_1\| \, \|d_2\| )$
All tf have been normalized using augmented frequency, to prevent a bias…
paparazzo
10 votes
2 answers
How do scientists come up with the correct Hidden Markov Model parameters and topology to use?
I understand how a Hidden Markov Model is used in genomic sequences, such as finding a gene. But I don't understand how to come up with a particular Markov model. I mean, how many states should the model have? How many possible transitions? Should…
SmallChess
10 votes
2 answers
Classification of vector sequences
My dataset is composed of vector sequences. Each vector has 50 real-valued dimensions. The number of vectors in a sequence ranges from 3-5 to 10-15. In other words, the length of a sequence is not fixed.
A fair number of the sequences (not…
Vladislavs Dovgalecs
10 votes
3 answers
How to split train/test datasets having equal classes proportion
I would like to know how I can split the following in equal numbers:
Target
0 1586
1 318
in order to have the same proportion of 0 and 1 classes in a dataset to train, if my dataset is called df and includes 10 columns, both numerical and…
user105599
10 votes
3 answers
Is there a library that would perform segmented linear regression in python?
There is a package named segmented in R. Is there a similar package in Python?
vikasreddy
10 votes
1 answer
How to binary encode multi-valued categorical variable from Pandas dataframe?
Suppose we have the following dataframe with multiple values for a certain column:
categories
0 - ["A", "B"]
1 - ["B", "C", "D"]
2 - ["B", "D"]
How can we get a table like this?
"A" "B" "C" "D"
0 - 1 1 0 0
1 - 0 1 1 1
2…
Denis L
10 votes
1 answer
Why you shouldn't upsample before cross validation
I have an imbalanced dataset and I am trying different methods to address the data imbalance. I found this article that explains the correct way to cross-validate when oversampling data using the SMOTE technique.
I have created a model using AdaBoost…
sums22
10 votes
1 answer
XGBoost custom objective for regression in R
I implemented a custom objective and metric for an XGBoost regression. In order to see if I'm doing this correctly, I started with a quadratic loss. The implementation seems to work well, but I cannot reproduce the results from a standard…
Peter
10 votes
3 answers
How to use a dataset with only one category of data
I am performing a classification task to try to detect an object. A picture of the environment is taken, candidates for this possible object are generated using vision algorithms, and once isolated, these candidates will be passed through a CNN for…
Finn Williams
10 votes
2 answers
How to use Cohen's Kappa as the evaluation metric in GridSearchCV in Scikit Learn?
I have a class imbalance in the ratio 1:15, i.e. a very low event rate. So, to select tuning parameters for GBM in scikit-learn, I want to use Kappa instead of the F1 score. My understanding is that Kappa is a better metric than the F1 score for class imbalance.
But I…
GeorgeOfTheRF
10 votes
3 answers
NASDAQ Trade Data
I am trying to find stock data to practice with. Is there a good resource for this? I found this, but it only has the current year.
I already have a way of parsing the protocol, but would like to have some more data to compare with. It doesn't have…
Marin