Highest Voted Questions - Data Science Stack Exchange

13

votes

5 answers

Generate pdf from jupyter notebook without code

I have a Jupyter notebook that contains markdown, code, and outputs (graphs). I would like to generate a PDF from this notebook. I tried to hide the code using HTML code which I got from here, then I tried to download it as a PDF, but the code shows up again. But…

asked Jul 08 '20 at 09:31

GIRISH kuniyal

253
1
2
8

13

votes

8 answers

If A and B are correlated and A and C are correlated. Why is it possible for B and C to be uncorrelated?

Let's say A and B are correlated, A and C are correlated, and B and C are uncorrelated. How is it possible for B and C to be uncorrelated when they are both correlated with A?

correlation

asked May 31 '20 at 10:09

Ashley

131
1
3
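
For the correlation question above, a small constructed example may help. This is only a sketch under stated assumptions: the data are made up, B and C are chosen to be exactly uncorrelated, and A is defined as their sum. The helper `pearson` is a hypothetical hand-written function, not a library call.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Made-up data: B and C vary independently of each other.
b = [1, -1, 1, -1]
c = [1, 1, -1, -1]
# Assumption for illustration: A is driven by both B and C.
a = [bi + ci for bi, ci in zip(b, c)]

print(pearson(a, b))  # positive: A moves with B
print(pearson(a, c))  # positive: A moves with C
print(pearson(b, c))  # zero: B and C share no linear relationship
```

In this construction A inherits part of its variation from B and part from C, so it correlates with each, while B and C remain unrelated to each other.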

13

votes

2 answers

Is FPGrowth still considered "state of the art" in frequent pattern mining?

As far as I know, the development of algorithms to solve the Frequent Pattern Mining (FPM) problem has passed through a few main checkpoints. Firstly, the Apriori algorithm was proposed in 1993, by Agrawal et al., along with the…

asked Jul 12 '14 at 17:25

Rubens

4,107
5
23
42

13

votes

3 answers

What do Python's pandas/matplotlib/seaborn bring to the table that Tableau does not?

I spent the past year learning Python. As a person who thought coding was impossible to learn for those outside of the CS/IT sphere, I was obviously gobsmacked by the power of a few lines of Python code! Having arrived at an intermediate level…

asked Mar 29 '20 at 12:00

Uralan

143
1
8

13

votes

2 answers

Efficient algorithm to compute the ROC curve for a classifier consisting of an ensemble of disjoint classifiers

Suppose I have classifiers C_1 ... C_n that are disjoint in the sense that no two will return true on the same input (e.g. the nodes in a decision tree). I want to build a new classifier that is the union of some subset of these (e.g. I want to…

algorithms

asked Aug 25 '15 at 16:04

Josh Brown Kramer

233
1
4

13

votes

4 answers

How can I provide an answer to Neural Network skeptics?

After giving several talks on NNs, I always have a skeptic who wants a real measure of how good the model is. How do you know the model is truly accurate? I explain the use of test data etc. to evaluate the total error; however, there is always…

neural-network

asked Feb 28 '20 at 20:38

Shinobii

419
4
10

13

votes

1 answer

Feature selection using feature importances in random forests with scikit-learn

I have plotted the feature importances in random forests with scikit-learn. In order to improve the prediction using random forests, how can I use the plot information to remove features? I.e. how to spot whether a feature is useless or even worse…

asked Aug 04 '15 at 17:44

Franck Dernoncourt

5,690
10
40
76

13

votes

5 answers

In industry, what type of new data science algorithms does one develop?

I've seen several job descriptions for data science which include developing a novel algorithm to be a part of production environments. Can you give some input on what exactly could be meant here? Would they mean an algorithm that behaves somewhat…

asked Jan 17 '20 at 19:02

Mariah

338
1
9

13

votes

2 answers

Activation function between LSTM layers

I'm aware the LSTM cell uses both sigmoid and tanh activation functions internally, however when creating a stacked LSTM architecture does it make sense to pass their outputs through an activation function (e.g. ReLU)? So do we prefer this: model =…

asked Jan 16 '20 at 16:03

lsfischer

242
1
2
8

13

votes

8 answers

I am a programmer, how do I get into field of Data Science?

First of all, this term sounds so obscure. Anyway, I am a software programmer. One of the languages I can code in is Python. Speaking of data, I can use SQL and can do data scraping. What I have figured out so far after reading so many articles that Data…

asked Jul 24 '15 at 20:10

Volatil3

341
3
10

13

votes

3 answers

Measuring performance of different classifiers with different sample sizes

I'm currently using several different classifiers on various entities extracted from text, and using precision/recall as a summary of how well each separate classifier performs across a given dataset. I'm wondering if there's a meaningful way of…

asked Jun 28 '14 at 14:57

Dave Challis

395
2
10

13

votes

3 answers

Are ontologies and the Semantic Web dead?

Is the Semantic Web dead? Are ontologies dead? I am developing a work plan for my thesis about "A knowledge base through a set ontology for interest groups around wetlands". I have been researching and developing ontologies for it but I am still…

knowledge-base

asked May 07 '15 at 00:16

Antonio Edgar Martinez

155
1
5

13

votes

4 answers

Pandas change value of a column based another column condition

I have values in column1 and values in column2. What I want to achieve: Condition: where column2 == 2, leave it as 2 if column1 < 30, else change it to 3 if column1 > 90. Here is what I did so far; the problem is that 2 does not change to 3 where…

asked Jul 31 '19 at 10:08

Koko

213
1
2
6
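
The condition described in the pandas question above can be sketched with a boolean mask and `DataFrame.loc`. This is a minimal illustration, not the asker's code: the sample values are made up, and only the column names `column1` and `column2` come from the question.

```python
import pandas as pd

# Made-up sample data; column names follow the question's wording.
df = pd.DataFrame({
    "column1": [10, 95, 50, 120, 20],
    "column2": [2, 2, 2, 1, 2],
})

# Select rows where column2 is 2 and column1 exceeds 90.
mask = (df["column2"] == 2) & (df["column1"] > 90)

# Set column2 to 3 on those rows only; every other row keeps its value.
df.loc[mask, "column2"] = 3

print(df["column2"].tolist())
```

Assigning through `df.loc[mask, "column2"]` writes to the original frame in a single step, which avoids the chained-indexing pattern that can silently leave values unchanged.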

13

votes

6 answers

Datasets understanding best practices

I am a CS master's student in data mining. My supervisor once told me that before I run any classifier or do anything with a dataset I must fully understand the data and make sure that the data is clean and correct. My questions: What are the best…

asked Jun 24 '14 at 07:29

Jack Twain

719
1
5
7

13

votes

1 answer

How to know if a model is overfitting or underfitting by looking at graph

I just recently got my hands on TensorBoard. Can you tell me what features I should look for in the graph (accuracy and validation accuracy)? And please do enlighten me about the concept of underfitting as well.

asked Jun 05 '19 at 13:54

Nikhil.Nixel

329
1
2
10
