Most Popular

1500 questions
13
votes
5 answers

Generate PDF from Jupyter notebook without code

I have a Jupyter notebook that contains markdown, code, and outputs (graphs). I would like to generate a PDF from this notebook. I tried to hide the code using HTML code that I got from here, then I tried to download it as a PDF, but the code still shows up. But…
GIRISH kuniyal
  • 253
  • 1
  • 2
  • 8
13
votes
8 answers

If A and B are correlated and A and C are correlated, why is it possible for B and C to be uncorrelated?

Let's say A and B are correlated, A and C are correlated, and B and C are uncorrelated. How is it possible for B and C to be uncorrelated when they are both correlated with A?
Ashley
  • 131
  • 1
  • 3
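A small numeric illustration of the situation this question describes, as a sketch only: the toy arrays below are hypothetical and were chosen so that B and C are exactly uncorrelated while A is built from both of them.

```python
import numpy as np

# Hypothetical toy data: B and C are constructed to be exactly uncorrelated.
b = np.array([1.0, -1.0, 1.0, -1.0])
c = np.array([1.0, 1.0, -1.0, -1.0])
a = b + c  # A depends on both B and C


def corr(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


r_ab = corr(a, b)  # A shares B's variation, so this is clearly positive
r_ac = corr(a, c)  # A shares C's variation, so this is clearly positive
r_bc = corr(b, c)  # B and C share nothing, so this is zero up to rounding

print(f"corr(A, B) = {r_ab:.3f}")
print(f"corr(A, C) = {r_ac:.3f}")
print(f"corr(B, C) = {r_bc:.3f}")
```

Here A is correlated with each of B and C because it contains both, yet B and C carry unrelated variation, so correlation is not transitive.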
13
votes
2 answers

Is FPGrowth still considered "state of the art" in frequent pattern mining?

As far as I know, the development of algorithms to solve the Frequent Pattern Mining (FPM) problem has had some main checkpoints along the road of improvements. First, the Apriori algorithm was proposed in 1993 by Agrawal et al., along with the…
Rubens
  • 4,107
  • 5
  • 23
  • 42
13
votes
3 answers

What do Python's pandas/matplotlib/seaborn bring to the table that Tableau does not?

I spent the past year learning Python. As a person who thought coding was impossible to learn for those outside of the CS/IT sphere, I was obviously gobsmacked by the power of a few lines of Python code! Having arrived at an intermediate level…
Uralan
  • 143
  • 1
  • 8
13
votes
2 answers

Efficient algorithm to compute the ROC curve for a classifier consisting of an ensemble of disjoint classifiers

Suppose I have classifiers C_1 ... C_n that are disjoint in the sense that no two will return true on the same input (e.g. the nodes in a decision tree). I want to build a new classifier that is the union of some subset of these (e.g. I want to…
13
votes
4 answers

How can I provide an answer to Neural Network skeptics?

After giving several talks on NNs, I always have a skeptic who wants a real measure of how good the model is. How do you know the model is truly accurate? I explain the use of test data etc. to evaluate the total error; however, there is always…
Shinobii
  • 419
  • 4
  • 10
13
votes
1 answer

Feature selection using feature importances in random forests with scikit-learn

I have plotted the feature importances in random forests with scikit-learn. In order to improve the prediction using random forests, how can I use the plot information to remove features? I.e. how to spot whether a feature is useless or even worse…
Franck Dernoncourt
  • 5,690
  • 10
  • 40
  • 76
13
votes
5 answers

In industry, what type of new data science algorithms does one develop?

I've seen several job descriptions for data science which include developing a novel algorithm to be part of production environments. Can you give some input on what exactly could be meant here? Would they mean an algorithm that behaves somewhat…
Mariah
  • 338
  • 1
  • 9
13
votes
2 answers

Activation function between LSTM layers

I'm aware the LSTM cell uses both sigmoid and tanh activation functions internally; however, when creating a stacked LSTM architecture, does it make sense to pass their outputs through an activation function (e.g. ReLU)? So do we prefer this: model =…
lsfischer
  • 242
  • 1
  • 2
  • 8
13
votes
8 answers

I am a programmer, how do I get into the field of Data Science?

First of all, this term sounds so obscure. Anyway, I am a software programmer. One of the languages I can code in is Python. Speaking of data, I can use SQL and can do data scraping. What I figured out so far after reading so many articles that Data…
Volatil3
  • 341
  • 3
  • 10
13
votes
3 answers

Measuring performance of different classifiers with different sample sizes

I'm currently using several different classifiers on various entities extracted from text, and using precision/recall as a summary of how well each separate classifier performs across a given dataset. I'm wondering if there's a meaningful way of…
Dave Challis
  • 395
  • 2
  • 10
13
votes
3 answers

Are ontologies and the Semantic Web dead?

Is the Semantic Web dead? Are ontologies dead? I am developing a work plan for my thesis about "A knowledge base through a set ontology for interest groups around wetlands". I have been researching and developing ontologies for it but I am still…
13
votes
4 answers

Pandas: change value of a column based on another column's condition

I have values in column1 and values in column2. What I want to achieve: where column2 == 2, leave it as 2 if column1 < 30, else change it to 3 if column1 > 90. Here is what I did so far; the problem is that 2 does not change to 3 where…
Koko
  • 213
  • 1
  • 2
  • 6
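A minimal sketch of one way to express the condition in this question with boolean masks. The excerpt is truncated, so the reading used here (set column2 to 3 only where column2 == 2 and column1 > 90, leave everything else unchanged) is an assumption, and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "column1": [10, 95, 50, 120],
    "column2": [2, 2, 2, 1],
})

# Rows where column2 is currently 2 AND column1 exceeds 90.
mask = (df["column2"] == 2) & (df["column1"] > 90)

# Assign through .loc so the change is written to df itself, not to a copy.
df.loc[mask, "column2"] = 3

print(df)
```

Combining both conditions in a single mask and assigning through `.loc` avoids chained indexing, which can silently modify a temporary copy instead of the original frame.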
13
votes
6 answers

Datasets understanding best practices

I am a CS master's student in data mining. My supervisor once told me that before I run any classifier or do anything with a dataset I must fully understand the data and make sure that the data is clean and correct. My questions: What are the best…
Jack Twain
  • 719
  • 1
  • 5
  • 7
13
votes
1 answer

How to know if a model is overfitting or underfitting by looking at graph

I just recently got my hands on TensorBoard. Can you tell me what features I should look for in the graph (accuracy and validation accuracy)? Please also enlighten me about the concept of underfitting.
Nikhil.Nixel
  • 329
  • 1
  • 2
  • 10