Highest Voted Questions - Data Science Stack Exchange

10

votes

1 answer

Python Seaborn: how are error bars computed in barplots?

I'm using the seaborn library to generate bar plots in Python. I'm wondering what statistics are used to compute the error bars, but I can't find any reference to this in seaborn's barplot documentation. I know the bar values are computed based on…

asked Oct 29 '15 at 12:33

Michael Hooreman

793
2
9
21

10

votes

5 answers

Tool to Generate 2D Data via Mouse Clicking

Often when I am learning new machine learning methods or experimenting with a data analysis algorithm I need to generate a series of 2D points. Teachers also do this often when making a lesson or tutorial. In some cases I just create a function, add…

asked Oct 27 '15 at 17:16

MD004

310
1
3
10

10

votes

2 answers

Find missing object(s) in image with a priori knowledge about the missing object(s) (w.r.t base image)

Problem Statement: I am working on developing a method, or borrowing/modifying/combining existing ones, that, given a golden image (a reference or base image with all expected objects present), is able to identify the missing objects and draw a bounding…

asked Oct 30 '20 at 14:13

TwinPenguins

4,249
3
19
53

10

votes

3 answers

What is momentum in neural network?

While using "Two class neural network" in Azure ML, I encountered the "Momentum" property. The documentation, which is not clear, says: For the momentum, type a value to apply during learning as a weight on nodes from previous…

asked Oct 18 '20 at 09:25

Sandeep Bhutani

894
1
7
24
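As a rough illustration for the momentum question above, here is a minimal sketch of classical (heavy-ball) momentum in gradient descent. It assumes the Azure ML property refers to this standard technique, which the excerpt does not confirm. The function name `sgd_momentum`, the toy objective, and the hyperparameter values are all hypothetical choices made for the example.

```python
def sgd_momentum(grad, w0, lr=0.1, momentum=0.9, steps=300):
    """Minimise a 1-D function using gradient descent with classical momentum.

    `grad` is a callable returning the gradient at w. All names and default
    values here are illustrative assumptions, not taken from Azure ML.
    """
    w, velocity = w0, 0.0
    for _ in range(steps):
        # The velocity is a decaying sum of past gradients: a fraction
        # `momentum` of the previous update is carried into the current one.
        velocity = momentum * velocity - lr * grad(w)
        w += velocity
    return w


# Toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_final = sgd_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

With `momentum=0.0` the loop reduces to plain gradient descent, so the parameter can be read as the weight given to the previous update.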

10

votes

3 answers

Vector space model cosine tf-idf for finding similar documents

I have a corpus of over a million documents. For a given document, I want to find similar documents using cosine similarity, as in the vector space model: $d_1 \cdot d_2 / ( \|d_1\| \|d_2\| )$. All tf values have been normalized using augmented frequency, to prevent a bias…

asked Oct 09 '15 at 16:31

paparazzo

188
14
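The cosine formula in the question above can be sketched in plain Python as follows. This is a toy illustration, not the asker's setup: it assumes documents are already tokenized into non-empty lists of terms, uses the common augmented term frequency 0.5 + 0.5 * f / max_f, and uses idf = log(N / df). The function names and the three sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (dicts) from non-empty tokenized documents."""
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(term for c in counts for term in c)
    vectors = []
    for c in counts:
        max_f = max(c.values())
        vectors.append({
            # Augmented frequency times inverse document frequency.
            term: (0.5 + 0.5 * f / max_f) * math.log(n_docs / df[term])
            for term, f in c.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity d1 . d2 / (||d1|| ||d2||) for sparse dict vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


docs = [["cat", "sat", "mat"], ["cat", "sat", "hat"], ["dog", "barked"]]
vecs = tfidf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])    # documents sharing two terms
sim_unrelated = cosine(vecs[0], vecs[2])  # documents sharing no terms
```

At the scale of a million documents, comparing every pair this way is unlikely to be practical; the sketch only shows what the similarity score computes.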

10

votes

2 answers

How do scientists come up with the correct Hidden Markov Model parameters and topology to use?

I understand how a Hidden Markov Model is used in genomic sequences, such as finding a gene. But I don't understand how to come up with a particular Markov model. I mean, how many states should the model have? How many possible transitions? Should…

asked Oct 09 '15 at 00:02

SmallChess

3,540
2
18
30

10

votes

2 answers

Classification of vector sequences

My dataset consists of vector sequences. Each vector has 50 real-valued dimensions. The number of vectors in a sequence ranges from 3-5 to 10-15. In other words, the length of a sequence is not fixed. Some fair amount of the sequences (not…

asked Oct 07 '15 at 19:14

Vladislavs Dovgalecs

481
3
8

10

votes

3 answers

How to split train/test datasets having equal classes proportion

I would like to know how I can split the following into equal numbers:

Target
0    1586
1     318

so that the training dataset has the same proportion of 0 and 1 classes, if my dataset is called df and includes 10 columns, both numerical and…

asked Oct 11 '20 at 14:05

user105599

155
1
1
5
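For the train/test question above, one reading is that each split should keep the original 0/1 proportion (a stratified split). Below is a minimal standard-library sketch under that assumption; the function name `stratified_split`, the 20% test fraction, and the seed are hypothetical. scikit-learn's `train_test_split` offers the same behaviour through its `stratify` argument.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with class proportions preserved per split."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction from every class.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Class counts taken from the question: 1586 zeros and 318 ones.
labels = [0] * 1586 + [1] * 318
train_idx, test_idx = stratified_split(labels)
test_positive_rate = sum(labels[i] for i in test_idx) / len(test_idx)
overall_positive_rate = sum(labels) / len(labels)
```

If the goal is instead an equal number of each class, the majority class would have to be downsampled (or the minority class upsampled), which is a different operation from the split shown here.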

10

votes

3 answers

Is there a library that would perform segmented linear regression in python?

There is a package named segmented in R. Is there a similar package in Python?

asked Oct 01 '15 at 18:40

vikasreddy

253
2
6

10

votes

1 answer

How to binary encode multi-valued categorical variable from Pandas dataframe?

Suppose we have the following dataframe with multiple values for a certain column:

    categories
0 - ["A", "B"]
1 - ["B", "C", "D"]
2 - ["B", "D"]

How can we get a table like this?

    "A" "B" "C" "D"
0 -  1   1   0   0
1 -  0   1   1   1
2…

asked Sep 30 '15 at 17:41

Denis L

218
2
7
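The encoding asked for in the question above can be sketched without pandas, using the three rows from the excerpt as input. This only illustrates the idea; the variable names are hypothetical, and in practice scikit-learn's `MultiLabelBinarizer` performs this kind of multi-label encoding.

```python
# Rows of the "categories" column, copied from the question.
rows = [["A", "B"], ["B", "C", "D"], ["B", "D"]]

# One output column per distinct value, in sorted order.
columns = sorted({value for row in rows for value in row})

# For each row, emit 1 where the value is present and 0 otherwise.
encoded = [[1 if col in row else 0 for col in columns] for row in rows]

for index, bits in enumerate(encoded):
    print(index, dict(zip(columns, bits)))
```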

10

votes

1 answer

Why you shouldn't upsample before cross validation

I have an imbalanced dataset and I am trying different methods to address the data imbalance. I found this article that explains the correct way to cross-validate when oversampling data using the SMOTE technique. I have created a model using AdaBoost…

asked Sep 22 '20 at 11:40

sums22

427
5
15

10

votes

1 answer

XGBoost custom objective for regression in R

I implemented a custom objective and metric for an xgboost regression. In order to see if I'm doing this correctly, I started with a quadratic loss. The implementation seems to work well, but I cannot reproduce the results from a standard…

asked Sep 09 '20 at 12:29

Peter

7,446
5
19
49

10

votes

3 answers

How to use a dataset with only one category of data

I am performing a classification task to try to detect an object. A picture of the environment is taken, candidates for this possible object are generated using vision algorithms, and once isolated, these candidates will be passed through a CNN for…

asked Sep 07 '20 at 01:31

Finn Williams

451
1
7
17

10

votes

2 answers

How to use Cohen's Kappa as the evaluation metric in GridSearchCV in Scikit Learn?

I have a class imbalance in the ratio 1:15, i.e. a very low event rate. So, to select the tuning parameters of GBM in scikit-learn, I want to use Kappa instead of the F1 score. My understanding is that Kappa is a better metric than the F1 score for class imbalance. But I…

asked Sep 11 '15 at 03:00

GeorgeOfTheRF

2,028
5
17
20
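To make the metric in the Kappa question above concrete, here is a small from-scratch sketch of Cohen's kappa, applied to a hypothetical 1:15 sample. The function name and the sample labels are assumptions made for the example. In scikit-learn itself, `cohen_kappa_score` wrapped with `make_scorer` can be passed as the `scoring` argument of `GridSearchCV`.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Agreement expected by chance, from the marginal class frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: a single class everywhere (convention chosen here)
    return (observed - expected) / (1.0 - expected)


# Hypothetical sample with a 1:15 class ratio.
y_true = [1] + [0] * 15
always_negative = [0] * 16

accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
kappa_trivial = cohen_kappa(y_true, always_negative)
kappa_perfect = cohen_kappa(y_true, y_true)
```

A classifier that always predicts the majority class scores high accuracy on this sample while its kappa is zero, which is the property that motivates using kappa under imbalance.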

10

votes

3 answers

NASDAQ Trade Data

I am trying to find stock data to practice with. Is there a good resource for this? I found this, but it only has the current year. I already have a way of parsing the protocol, but would like to have some more data to compare with. It doesn't have…

asked Jul 19 '14 at 20:46

Marin

103
5
