Most Popular
1500 questions
18
votes
5 answers
Open source data science projects to contribute to
Contributing to open source projects is typically a good way for newbies to get some practice, and for experienced data scientists and analysts to try a new area.
Which projects do you contribute to? Please provide a short intro and a link to GitHub.
IgorS
- 5,474
- 11
- 31
- 43
18
votes
2 answers
How should ethics be applied in data science
There was a recent furore over Facebook experimenting on its users to see whether it could alter users' emotions, and now OkCupid.
Whilst I am not a professional data scientist, I read about data science ethics in Cathy O'Neil's book 'Doing Data…
EdChum
- 355
- 1
- 10
18
votes
5 answers
Anaconda R version - How to upgrade to 4.0 and later
I use R through the Anaconda Navigator, which manages all my package installations. I need to use qgraph for a project, which depends on the mnormt library, which in turn needs RStudio version >4.0
I think the solution to my problem would be to…
Saranya Prakash
- 183
- 1
- 1
- 4
18
votes
3 answers
What is the proper way to use early stopping with cross-validation?
I am not sure what the proper way is to use early stopping with cross-validation for a gradient boosting algorithm. For a simple train/valid split, we can use the validation dataset as the evaluation dataset for early stopping, and when refitting we…
Amine SOUIKI
- 181
- 1
- 4
18
votes
7 answers
Why does Keras need TensorFlow as backend?
Why does Keras need the TensorFlow engine? I am not getting correct directions on why we need Keras. We can use TensorFlow to build a neural network model, but why do most people use Keras with TensorFlow as backend?
star
- 1,471
- 7
- 19
- 29
18
votes
5 answers
Downloading a large dataset on the web directly into AWS S3
Does anyone know if it's possible to import a large dataset into Amazon S3 from a URL?
Basically, I want to avoid downloading a huge file and then reuploading it to S3 through the web portal. I just want to supply the download URL to S3 and wait…
Will Stedden
- 183
- 1
- 1
- 5
18
votes
2 answers
K-means vs. online K-means
K-means is a well-known clustering algorithm, but there is also an online variation of it (online K-means). What are the pros and cons of these approaches, and when should each be preferred?
Rubens
- 4,107
- 5
- 23
- 42
18
votes
3 answers
How to determine feature importance in a neural network?
I have a neural network to solve a time series forecasting problem. It is a sequence-to-sequence neural network and currently it is trained on samples each with ten features. The performance of the model is average and I would like to investigate…
Aesir
- 458
- 1
- 6
- 15
18
votes
3 answers
When should I use StandardScaler and when MinMaxScaler?
I have a feature vector with one-hot-encoded features and continuous features.
How can I decide which data to scale with StandardScaler and which with MinMaxScaler? I think I do not have to scale the one-hot-encoded anyway…
jochen6677
- 591
- 2
- 4
- 9
18
votes
3 answers
Feature Scaling both training and test data
It is stated that for:
Feature Normalization -
The test set must use identical scaling to the training set.
And the point is given that:
Do not scale the training and test sets using different scalers: this could lead to random skew in the…
aspiring1
- 377
- 1
- 2
- 13
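A minimal sketch of the rule quoted in the question above, that the test set must use the same scaling as the training set. It is not taken from the question or its answers: the function names `fit_standardizer` and `apply_standardizer` and the sample values are hypothetical, and standardization (subtract the mean, divide by the standard deviation) is only one possible choice of scaling.

```python
from statistics import mean, pstdev


def fit_standardizer(train_values):
    """Learn the scaling parameters (mean, std) from the training values only."""
    mu = mean(train_values)
    sigma = pstdev(train_values)
    # Guard against a constant feature, which would otherwise divide by zero.
    if sigma == 0:
        sigma = 1.0
    return mu, sigma


def apply_standardizer(values, mu, sigma):
    """Scale any values with parameters that were learned elsewhere."""
    return [(v - mu) / sigma for v in values]


# Hypothetical single-feature data, for illustration only.
train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [6.0, 7.0]

# Fit on the training set, then reuse the same parameters for the test set.
mu, sigma = fit_standardizer(train)
train_scaled = apply_standardizer(train, mu, sigma)
test_scaled = apply_standardizer(test, mu, sigma)
```

Because the test values are scaled with the training mean and standard deviation, they stay on the same axis as the training values. Fitting a second scaler on the test set would instead centre the test values on zero and hide the fact that they lie above the training range.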
18
votes
7 answers
Interactive labeling/annotating of time series data
I have a data set of time series data. I'm looking for an annotation (or labeling) tool to visualize it and to be able to interactively add labels on it, in order to get annotated data that I can use for supervised ML.
E.g. the input data is a…
mibrl12
- 283
- 1
- 2
- 5
18
votes
4 answers
XGBoost outputs tend towards the extremes
I am currently using XGBoost for risk prediction. It seems to be doing a good job at binary classification, but the probability outputs are way off, i.e., changing the value of a feature in an observation by a very small amount can…
alwayslearning
- 181
- 4
18
votes
2 answers
How to plot two columns of single DataFrame on Y axis
I have two data frames (Action, Comedy). Action contains two columns (year, rating); the rating column contains the average rating for each year. The Comedy data frame contains the same two columns with different mean values.
I merged both data…
Bilal Butt
- 291
- 1
- 2
- 4
18
votes
4 answers
One hot encoding alternatives for large categorical values
I have a data frame with a categorical variable that has over 1600 categories. Is there any way I can find alternatives so that I don't have over 1600 columns?
I found this interesting link.
But they are converting to class/object which I don't want. I…
vinaykva
- 283
- 1
- 2
- 7
18
votes
2 answers
Number and size of dense layers in a CNN
Most networks I've seen have one or two dense layers before the final softmax layer.
Is there any principled way of choosing the number and size of the dense layers?
Are two dense layers more representative than one, for the same number of…
geometrikal
- 533
- 1
- 5
- 14