Highest Voted Questions - Data Science Stack Exchange

20

votes

5 answers

Python library to implement Hidden Markov Models

What stable Python library can I use to implement Hidden Markov Models? I need it to be reasonably well documented, because I've never really used this model before. Alternatively, is there a more direct approach to performing a time-series analysis…

asked Oct 16 '15 at 06:45

neural-nut

1,783
3
17
27

20

votes

3 answers

Overfitting in Linear Regression

I'm just getting started with machine learning and I have trouble understanding how overfitting can happen in a linear regression model. Considering we use only 2 feature variables to train a model, how can a flat plane possibly be overfitted to a…

asked Aug 27 '20 at 08:52

Sachin Krishna

359
1
2
7

20

votes

5 answers

What is the difference between explainable and interpretable machine learning?

O’Rourke says that explainable ML uses a black box model and explains it afterwards, whereas interpretable ML uses models that are not black boxes. Christoph Molnar says interpretable ML refers to the degree to which a human can understand the cause…

asked Mar 24 '20 at 07:56

Funkwecker

595
1
5
13

20

votes

8 answers

Monitoring machine learning models in production

I am looking for tools that allow me to monitor machine learning models once they have gone to production. I would like to monitor: Long-term changes: changes in the distribution of the features with respect to training time, that would suggest…

asked Dec 13 '19 at 12:15

David Masip

6,051
2
24
61

20

votes

3 answers

How to create custom Activation functions in Keras / TensorFlow?

I'm using Keras and I want to add my own activation function myf to the TensorFlow backend. How do I define the new function and make it operational? So instead of the line of code: model.add(layers.Conv2D(64, (3, 3), activation='relu')) I'll write…

asked Sep 09 '19 at 07:34

Basta

201
1
2
4

20

votes

3 answers

L1 & L2 Regularization in Light GBM

This question pertains to L1 & L2 regularization parameters in Light GBM. As per official documentation: reg_alpha (float, optional (default=0.)) – L1 regularization term on weights. reg_lambda (float, optional (default=0.)) – L2 regularization term…

asked Aug 08 '19 at 17:08

Vikrant Arora

456
1
4
10

20

votes

4 answers

K-means: What are some good ways to choose an efficient set of initial centroids?

When a random initialization of centroids is used, different runs of K-means produce different total SSEs, so the initialization is crucial to the performance of the algorithm. What are some effective approaches toward solving this problem? Recent approaches are…

asked Apr 30 '15 at 13:42

ngub05

333
1
2
8
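As a rough illustration of the sensitivity this question describes, the sketch below runs a plain Lloyd-style K-means on six one-dimensional points from two different starting centroid sets. The data, the function name `kmeans_1d`, and the hand-picked starts (standing in for two random draws) are assumptions made for this example and do not come from the question.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Plain Lloyd iterations on 1-D data; returns (centroids, total SSE)."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for x in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: (x - centroids[i]) ** 2)
            clusters[nearest].append(x)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    sse = sum(min((x - c) ** 2 for c in centroids) for x in points)
    return centroids, sse


# Hypothetical data: three well-separated pairs of points.
points = [0, 1, 10, 11, 20, 21]

# Two hand-picked starts, standing in for two different random draws.
_, sse_spread = kmeans_1d(points, [0, 10, 20])    # one start per pair
_, sse_clumped = kmeans_1d(points, [0, 1, 10])    # two starts in the same pair

print(sse_spread, sse_clumped)  # 1.5 101.0
```

Both runs converge, but to different local optima with very different SSEs, which is why the starting centroids matter for this kind of algorithm.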

20

votes

2 answers

In CNN, why do we increase the number of filters in deeper Convolution layers for complex images?

I have been doing this online course Introduction to TensorFlow for AI, ML and DL. Here in one part, they were showing a CNN model for classifying humans and horses. In this model, the first Conv2D layer had 16 filters, followed by two more Conv2D…

asked Jul 12 '19 at 06:52

Sanjay

313
1
2
8

20

votes

2 answers

What are the differences between Convolutional1D, Convolutional2D, and Convolutional3D?

I've been learning about Convolutional Neural Networks. When looking at Keras examples, I came across three different convolution methods. Namely, 1D, 2D & 3D. What are the differences between these three layers? What are their use cases? Are there…

asked May 06 '19 at 06:59

Saurabh

347
1
2
9

20

votes

5 answers

Choose binary classification algorithm

I have a binary classification problem: approximately 1000 samples in the training set and 10 attributes, including binary, numeric and categorical. Which algorithm is the best choice for this type of problem? By default I'm going to start with SVM…

asked Jun 15 '14 at 14:01

IgorS

5,474
11
31
43

20

votes

4 answers

Macro- or micro-average for imbalanced class problems

The question of whether to use macro- or micro-averages when the data is imbalanced comes up all the time. Some googling shows that many bloggers tend to say that micro-average is the preferred way to go, e.g.: Micro-average is preferable if there…

asked Aug 13 '18 at 09:57

Krrr

303
1
2
6
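To make the macro versus micro distinction in this question concrete, here is a minimal sketch that computes both averages of precision by hand. The toy labels, the always-predict-the-majority classifier, and the helper names are assumptions chosen for illustration; they are not taken from the question.

```python
def per_class_counts(y_true, y_pred):
    """Return {label: (tp, fp, fn)} for every label that appears."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    counts = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        counts[label] = (tp, fp, fn)
    return counts


def macro_precision(counts):
    # Average the per-class precisions: every class weighs the same.
    per_class = [tp / (tp + fp) if tp + fp else 0.0
                 for tp, fp, _ in counts.values()]
    return sum(per_class) / len(per_class)


def micro_precision(counts):
    # Pool the counts first: every sample weighs the same.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    return tp / (tp + fp) if tp + fp else 0.0


# Hypothetical imbalanced data: nine samples of class "a", one of class "b".
y_true = ["a"] * 9 + ["b"]
y_pred = ["a"] * 10  # a classifier that always predicts the majority class

counts = per_class_counts(y_true, y_pred)
macro = macro_precision(counts)
micro = micro_precision(counts)
print(macro, micro)  # 0.45 0.9
```

In this toy case the micro-average is dominated by the majority class, while the macro-average is pulled down by the minority class that is never predicted. Which behaviour is preferable depends on whether errors should be weighted per sample or per class.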

20

votes

2 answers

How to include labels in sns heatmap

I got this matrix 120 100 80 40 20 10 5 0 120 64.21 58.20 51.20 56.37 47.00 45.61 46.86 2.16 100 62.84 57.80 50.60 51.32 39.43 39.30 42.80 0.89 80 62.62 56.20 51.20 51.61 …

asked May 16 '18 at 17:04

Srihari

777
4
12
27

20

votes

2 answers

Can the number of epochs influence overfitting?

I am using a convolutional neural network (CNN). At a specific epoch, I only save the best CNN model weights based on improved validation accuracy over previous epochs. Does increasing the number of epochs also increase overfitting for CNNs and deep…

asked Feb 07 '18 at 13:34

user121

369
1
3
9

20

votes

2 answers

Extract most informative parts of text from documents

Are there any articles or discussions about extracting the parts of a text that hold the most information about the current document? For example, I have a large corpus of documents from the same domain. There are parts of text that hold the key…

asked Dec 08 '14 at 14:51

MaticDiba

651
1
6
10

20

votes

3 answers

What is the difference between CountVectorizer token counts and TfidfTransformer with use_idf set to False?

We can use CountVectorizer to count the number of times a word occurs in a corpus: # Tokenizing text from sklearn.feature_extraction.text import CountVectorizer count_vect = CountVectorizer() X_train_counts =…

asked Dec 11 '17 at 22:51

Cybernetic

780
1
4
10
