Most Popular

1500 questions
10 votes
1 answer

Python Seaborn: how are error bars computed in barplots?

I'm using the seaborn library to generate bar plots in Python. I'm wondering what statistics are used to compute the error bars, but I can't find any reference to this in seaborn's barplot documentation. I know the bar values are computed based on…
Michael Hooreman
10 votes
5 answers

Tool to Generate 2D Data via Mouse Clicking

Often, when I am learning new machine learning methods or experimenting with a data analysis algorithm, I need to generate a series of 2D points. Teachers also often do this when preparing a lesson or tutorial. In some cases I just create a function, add…
MD004
10 votes
2 answers

Find missing object(s) in image with a priori knowledge about the missing object(s) (w.r.t. base image)

Problem Statement: I am working on developing a method, or borrowing/modifying/combining existing ones, which, given a golden image (a reference or base image in which all expected objects are present), is able to identify the missing objects and draw a bounding…
TwinPenguins
10 votes
3 answers

What is momentum in a neural network?

While using "Two class neural network" in Azure ML, I encountered the "Momentum" property. The documentation, which is not clear, says: For the momentum, type a value to apply during learning as a weight on nodes from previous…
Sandeep Bhutani
10 votes
3 answers

Vector space model cosine tf-idf for finding similar documents

I have a corpus of over a million documents. For a given document, I want to find similar documents using cosine similarity, as in the vector space model: $d_1 \cdot d_2 / ( \|d_1\| \|d_2\| )$. All tf values have been normalized using augmented frequency, to prevent a bias…
paparazzo
10 votes
2 answers

How do scientists come up with the correct Hidden Markov Model parameters and topology to use?

I understand how a Hidden Markov Model is used in genomic sequences, such as finding a gene. But I don't understand how to come up with a particular Markov model. I mean, how many states should the model have? How many possible transitions? Should…
SmallChess
10 votes
2 answers

Classification of vector sequences

My dataset consists of vector sequences. Each vector has 50 real-valued dimensions. The number of vectors in a sequence ranges from 3-5 to 10-15. In other words, the length of a sequence is not fixed. A fair number of the sequences (not…
10 votes
3 answers

How to split train/test datasets with equal class proportions

I would like to know how I can split the following into equal numbers (Target: 0 has 1586 rows, 1 has 318 rows) in order to have the same proportion of 0 and 1 classes in the training dataset, if my dataset is called df and includes 10 columns, both numerical and…
user105599
10 votes
3 answers

Is there a library that would perform segmented linear regression in Python?

There is a package named segmented in R. Is there a similar package in Python?
vikasreddy
10 votes
1 answer

How to binary encode multi-valued categorical variable from Pandas dataframe?

Suppose we have the following dataframe with multiple values for a certain column:

    categories
    0 - ["A", "B"]
    1 - ["B", "C", "D"]
    2 - ["B", "D"]

How can we get a table like this?

        "A" "B" "C" "D"
    0 -  1   1   0   0
    1 -  0   1   1   1
    2…
Denis L
10 votes
1 answer

Why you shouldn't upsample before cross validation

I have an imbalanced dataset and I am trying different methods to address the data imbalance. I found this article that explains the correct way to cross-validate when oversampling data using the SMOTE technique. I have created a model using AdaBoost…
sums22
10 votes
1 answer

XGBoost custom objective for regression in R

I implemented a custom objective and metric for an XGBoost regression. In order to see if I'm doing this correctly, I started with a quadratic loss. The implementation seems to work well, but I cannot reproduce the results from a standard…
Peter
10 votes
3 answers

How to use a dataset with only one category of data

I am performing a classification task to try to detect an object. A picture of the environment is taken, candidates for this possible object are generated using vision algorithms, and, once isolated, these candidates will be passed through a CNN for…
Finn Williams
10 votes
2 answers

How to use Cohen's Kappa as the evaluation metric in GridSearchCV in Scikit Learn?

I have a class imbalance with a ratio of 1:15, i.e. a very low event rate. So, to select the tuning parameters of GBM in scikit-learn, I want to use Kappa instead of the F1 score. My understanding is that Kappa is a better metric than the F1 score for class imbalance. But I…
GeorgeOfTheRF
10 votes
3 answers

NASDAQ Trade Data

I am trying to find stock data to practice with. Is there a good resource for this? I found this, but it only has the current year. I already have a way of parsing the protocol, but would like to have some more data to compare with. It doesn't have…
Marin