Most Popular
1500 questions
14 votes, 5 answers
Advantages of pandas dataframe to regular relational database
In data science, many people seem to use pandas DataFrames as their datastore. What are the features of pandas that make it a superior datastore compared to regular relational databases like MySQL, which are used to store data in many other fields of…
Simon Boehm
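One way to see the difference in practice is to run the same aggregation both ways. A minimal sketch, assuming pandas plus the standard-library `sqlite3` module (an in-memory SQLite database stands in for MySQL); the `sales` table and its columns are hypothetical. pandas keeps the whole table in memory and exposes the operation as a method call, while the relational database stores the table and answers a SQL query.

```python
import sqlite3

import pandas as pd

# Hypothetical example data: a small sales table.
df = pd.DataFrame(
    {"region": ["north", "south", "north", "east"], "amount": [10.0, 5.0, 7.5, 3.0]}
)

# pandas: the table lives in memory; the aggregation is a method call.
pandas_totals = df.groupby("region")["amount"].sum().sort_index()

# Relational database: in-memory SQLite standing in for MySQL.
conn = sqlite3.connect(":memory:")
df.to_sql("sales", conn, index=False)
sql_totals = pd.read_sql(
    "SELECT region, SUM(amount) AS amount FROM sales GROUP BY region ORDER BY region",
    conn,
).set_index("region")["amount"]
conn.close()

print(pandas_totals)
print(sql_totals)
```

Both produce the same totals; the choice is less about the result than about where the data lives and who else needs to query it.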
14 votes, 4 answers
Detecting anomalies with neural network
I have a large multi-dimensional dataset that is generated each day.
What would be a good approach to detect any kind of 'anomaly' as compared with previous days? Is this a suitable problem that could be addressed with neural networks?
Any…
Nickpick
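A common baseline for "anomalous compared with previous days" is to learn what normal days look like and flag days the model reconstructs poorly. The sketch below uses PCA reconstruction error in NumPy; it is a swapped-in linear technique, not a neural network, but an autoencoder follows the same fit-then-score pattern (and a linear autoencoder trained with squared error spans the same subspace as PCA). The synthetic data, the number of components, and the 99th-percentile threshold are illustrative assumptions.

```python
import numpy as np


def fit_pca(history: np.ndarray, n_components: int):
    """Fit PCA on past days (rows = days, columns = features)."""
    mean = history.mean(axis=0)
    std = history.std(axis=0) + 1e-12  # avoid division by zero for constant features
    z = (history - mean) / std
    # Rows of vt are the principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return mean, std, vt[:n_components]


def reconstruction_error(x: np.ndarray, mean, std, components) -> np.ndarray:
    """Squared distance between each standardized row and its projection onto the PCA subspace."""
    z = (np.atleast_2d(x) - mean) / std
    recon = z @ components.T @ components
    return ((z - recon) ** 2).sum(axis=1)


rng = np.random.default_rng(0)
# Hypothetical history: 200 days of 10 correlated features driven by 2 latent factors.
mixing = rng.normal(size=(2, 10))
history = rng.normal(size=(200, 2)) @ mixing + 0.05 * rng.normal(size=(200, 10))

mean, std, comps = fit_pca(history, n_components=2)
threshold = np.percentile(reconstruction_error(history, mean, std, comps), 99)

normal_day = rng.normal(size=(1, 2)) @ mixing + 0.05 * rng.normal(size=(1, 10))
odd_day = normal_day.copy()
odd_day[0, 3] += 6 * std[3]  # break the usual correlation structure in one feature

err_normal = reconstruction_error(normal_day, mean, std, comps)[0]
err_odd = reconstruction_error(odd_day, mean, std, comps)[0]
print(f"threshold={threshold:.3f} normal={err_normal:.3f} odd={err_odd:.3f}")
```

Replacing `fit_pca`/`reconstruction_error` with a trained autoencoder keeps the rest of the logic (error on history, percentile threshold, score each new day) unchanged.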
14 votes, 1 answer
Forget Layer in a Recurrent Neural Network (RNN)
I'm trying to figure out the dimensions of each variable in the forget layer of an RNN, but I'm not sure if I'm on the right track. The following picture and equation are from Colah's blog post "Understanding LSTM Networks":
where:
$x_t$ is input…
user1157751
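In Colah's notation the forget gate is $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$. The shapes follow from the concatenation: if $h_{t-1}$ has length $n$ (hidden size) and $x_t$ has length $d$ (input size), then $[h_{t-1}, x_t]$ has length $n + d$, so $W_f$ must be $n \times (n + d)$, and $b_f$ and $f_t$ both have length $n$. A minimal NumPy shape check, assuming a single example with no batch dimension and hypothetical sizes $n = 4$, $d = 3$:

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def forget_gate(h_prev, x_t, W_f, b_f):
    """f_t = sigmoid(W_f @ [h_prev; x_t] + b_f) for one time step, no batch dimension."""
    concat = np.concatenate([h_prev, x_t])  # shape (n + d,)
    return sigmoid(W_f @ concat + b_f)      # shape (n,)


n, d = 4, 3  # hypothetical hidden size and input size
rng = np.random.default_rng(0)
h_prev = rng.normal(size=n)
x_t = rng.normal(size=d)
W_f = rng.normal(size=(n, n + d))
b_f = np.zeros(n)

f_t = forget_gate(h_prev, x_t, W_f, b_f)
print(W_f.shape, b_f.shape, f_t.shape)  # → (4, 7) (4,) (4,)
```

Each entry of $f_t$ lies in $(0, 1)$ and multiplies the matching entry of the cell state $C_{t-1}$ elementwise, which is why $f_t$ must have the same length $n$ as the cell state.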
14 votes, 4 answers
Do neural networks have explainability like decision trees do?
In decision trees, we can understand the output of the tree structure, and we can also visualize how the tree makes decisions. So decision trees have explainability (their output can be explained easily).
Do we have explainability in Neural…
navya
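Neural networks do not expose a readable structure the way a tree does, but model-agnostic techniques can still show which inputs a trained network relies on. One such technique is permutation importance: shuffle one feature at a time and measure how much the score drops. A minimal NumPy sketch; `predict` can be any fitted model's prediction function (for example a neural network's), and `toy_predict` below is a hypothetical stand-in for a trained model.

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each column of X is shuffled independently."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its link to y while keeping its distribution.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - np.mean(predict(X_perm) == y))
        importances[j] = np.mean(drops)
    return importances


# Hypothetical "trained model": predicts 1 when feature 0 is positive and ignores feature 1.
def toy_predict(X):
    return (X[:, 0] > 0).astype(int)


rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(int)

importances = permutation_importance(toy_predict, X, y)
print(importances)
```

scikit-learn ships a ready-made version as `sklearn.inspection.permutation_importance`, which works with any fitted estimator, including `MLPClassifier`.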
14 votes, 2 answers
How to train model to predict events 30 minutes prior, from multi-dimensional time series
Experts in my field are capable of predicting the likelihood of an event (binary spike in yellow) 30 minutes before it occurs. The sampling interval here is 1 second; this view represents a few hours' worth of data. I have circled in black where "malicious" pattern…
William D
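One way to frame "predict 30 minutes ahead" as supervised learning is to relabel every time step by whether an event occurs within the following horizon, then train a classifier on windows of past features against those labels. A minimal NumPy sketch of that data preparation, assuming a binary event array sampled once per second (so a 30-minute horizon is 1800 samples); both function names are hypothetical.

```python
import numpy as np


def lookahead_labels(events: np.ndarray, horizon: int) -> np.ndarray:
    """label[t] = 1 if any event occurs in events[t+1 : t+1+horizon], else 0."""
    events = np.asarray(events, dtype=int)
    # A cumulative sum with a leading zero gives the event count of any slice in O(1).
    csum = np.concatenate([[0], np.cumsum(events)])
    t = np.arange(len(events))
    start = np.minimum(t + 1, len(events))
    end = np.minimum(t + 1 + horizon, len(events))
    return (csum[end] - csum[start] > 0).astype(int)


def past_windows(features: np.ndarray, window: int) -> np.ndarray:
    """Stack the last `window` rows (up to and including t) for every t >= window - 1."""
    return np.stack([features[t - window + 1 : t + 1] for t in range(window - 1, len(features))])


# With 1-second samples, 30 minutes would be horizon=1800; a tiny horizon keeps the demo readable.
events = np.zeros(10, dtype=int)
events[7] = 1
labels = lookahead_labels(events, horizon=3)
print(labels)  # → [0 0 0 0 1 1 1 0 0 0]
```

The window ending at time `t` pairs with `labels[t]`, so `past_windows(X, w)` lines up with `labels[w - 1:]`. The last `horizon` labels should be dropped before training, because their future is not fully observed.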
14 votes, 3 answers
How can I make big confusion matrices easier to read?
I have recently published a dataset (link) with 369 classes. I ran a couple of experiments on them to get a feeling for how difficult the classification task is. Usually, I like it if there are confusion matrices to see the type of error being made.…
Martin Thoma
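With hundreds of classes, one alternative to reading the full matrix is to row-normalize it and list only the largest off-diagonal entries, i.e. the most frequent confusions. A minimal NumPy sketch, assuming rows are true classes and columns are predicted classes (the convention of scikit-learn's `confusion_matrix`); the function name and the tiny matrix are hypothetical.

```python
import numpy as np


def top_confusions(cm: np.ndarray, k: int = 5):
    """Return up to k (true, predicted, rate) triples with the highest off-diagonal rates."""
    cm = np.asarray(cm, dtype=float)
    row_sums = cm.sum(axis=1, keepdims=True)
    # Row-normalize: entry (i, j) becomes the fraction of true class i predicted as j.
    rates = np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)
    np.fill_diagonal(rates, 0.0)  # ignore correct predictions
    flat = np.argsort(rates, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat, rates.shape)
    return [(int(r), int(c), float(rates[r, c])) for r, c in zip(rows, cols) if rates[r, c] > 0]


cm = np.array([
    [50, 3, 0],
    [10, 40, 0],
    [0, 1, 49],
])
result = top_confusions(cm, k=2)
print(result)
```

Row normalization matters when class sizes differ, because raw counts would otherwise rank large classes first regardless of how often they are actually confused.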
14 votes, 1 answer
Heatmap on a map in Python
Mode Analytics has a nice heatmap feature, but it is not conducive to comparing maps (only one per report).
What they do allow is pulling data easily into a wrapped Python notebook. Any image created in Python can then easily be added to a…
ScottieB
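Since any Python image can be added to the report, one approach is to bin the points into a 2-D density grid per map and draw several grids side by side on a shared scale. A minimal sketch, assuming the data arrive as longitude/latitude arrays; NumPy does the binning and matplotlib (imported only when plotting) draws plain heatmaps without a basemap or tiles. Function names and the sample coordinates are hypothetical.

```python
import numpy as np


def density_grid(lon, lat, bins=50, extent=None):
    """2-D histogram of point counts; extent = (lon_min, lon_max, lat_min, lat_max)."""
    if extent is None:
        extent = (np.min(lon), np.max(lon), np.min(lat), np.max(lat))
    counts, _, _ = np.histogram2d(
        lon, lat, bins=bins, range=[[extent[0], extent[1]], [extent[2], extent[3]]]
    )
    # histogram2d puts the first coordinate (lon) on axis 0; transpose so rows are latitude.
    return counts.T, extent


def plot_side_by_side(grids, titles, extent):
    """Draw several density grids with one shared color scale so the maps are comparable."""
    import matplotlib.pyplot as plt  # imported here so density_grid works without matplotlib

    fig, axes = plt.subplots(1, len(grids), figsize=(5 * len(grids), 4), squeeze=False)
    vmax = max(float(g.max()) for g in grids)
    for ax, grid, title in zip(axes[0], grids, titles):
        ax.imshow(grid, origin="lower", extent=extent, vmin=0, vmax=vmax, cmap="hot", aspect="auto")
        ax.set_title(title)
        ax.set_xlabel("longitude")
        ax.set_ylabel("latitude")
    return fig


rng = np.random.default_rng(0)
lon = rng.normal(-122.4, 0.05, size=1000)
lat = rng.normal(37.77, 0.05, size=1000)
grid, extent = density_grid(lon, lat, bins=20)
print(grid.shape, grid.sum())  # → (20, 20) 1000.0
```

Passing the same `extent` and `bins` for every period keeps the grids aligned, and the shared `vmax` keeps their colors comparable; `fig.savefig(...)` then produces one image to drop into the report.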
14 votes, 3 answers
Replace all numeric values in a pyspark dataframe by a constant value
Consider a pyspark dataframe consisting of 'null' elements and numeric elements. In general, the numeric elements have different values. How is it possible to replace all the numeric values of the dataframe by a constant numeric value (for example…
justus
14 votes, 3 answers
How to use RBM for classification?
At the moment I'm playing with Restricted Boltzmann Machines, and since I'm at it, I would like to try to classify handwritten digits with them.
The model I created is now quite a fancy generative model, but I don't know how to go further with it.
In this…
Stefan Falk
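A common way to use an RBM for classification is not to classify with the RBM itself but to use its hidden-unit activations as features for a separate supervised classifier; scikit-learn's own RBM digits example follows this pattern. A minimal sketch, assuming scikit-learn; the hyperparameters are illustrative, not tuned.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

X, y = load_digits(return_X_y=True)
X = X / 16.0  # BernoulliRBM expects inputs in [0, 1]; these pixels range from 0 to 16

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = Pipeline(
    steps=[
        # Unsupervised step: learn 100 hidden features from the pixels (labels unused).
        ("rbm", BernoulliRBM(n_components=100, learning_rate=0.06, n_iter=20, random_state=0)),
        # Supervised step: classify digits from the RBM's hidden-unit probabilities.
        ("logistic", LogisticRegression(C=100.0, max_iter=1000)),
    ]
)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Only the logistic-regression step sees the labels; the RBM acts as a learned feature extractor. Comparing against logistic regression on the raw pixels shows whether the RBM features actually help.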
14 votes, 3 answers
How are deep-learning NNs different now (2016) from the ones I studied just 4 years ago (2012)?
Wikipedia and deeplearning4j say that deep-learning NNs (DLNNs) are NNs with more than one hidden layer.
These kinds of NNs were standard at university for me, while DLNNs are very hyped right now. Been there, done that: what's the big deal?
I heard…
Make42
14 votes, 3 answers
Why convolute if Max Pooling is just going to downsample the image anyway?
The idea of applying filters to do something like identify edges is a pretty cool idea.
For example, you can take an image of a 7. With some filters, you can end up with transformed images that emphasize different characteristics of the original…
Monica Heddneck
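Pooling alone only keeps the largest pixel in each block, so it cannot tell an edge from a uniformly bright region. Convolving first turns "is there an edge here?" into a large activation, which pooling then preserves at lower resolution. A minimal NumPy sketch, with a hand-picked vertical-edge filter standing in for a learned one and a hypothetical 8x8 toy image:

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Cross-correlation with no padding (what most deep-learning 'conv' layers compute)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool(x, size=2):
    """Non-overlapping max pooling; rows/cols that don't fill a whole block are dropped."""
    h, w = x.shape[0] // size * size, x.shape[1] // size * size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))


# Toy image: dark left half, bright right half, so there is one vertical edge.
image = np.zeros((8, 8))
image[:, 4:] = 1.0

edge_filter = np.array([[-1.0, 1.0]])  # responds to dark-to-bright transitions, left to right

pooled_raw = max_pool(image)                            # just a smaller copy of the brightness
pooled_edges = max_pool(conv2d_valid(image, edge_filter))  # marks the block containing the edge
print(pooled_raw)
print(pooled_edges)
```

In a trained network the filters are learned rather than hand-picked, and stacking conv and pool layers lets later filters respond to larger regions of the original image.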
14 votes, 5 answers
Feature importance with scikit-learn Random Forest shows very high Standard Deviation
I am using scikit-learn Random Forest Classifier and I want to plot the feature importance such as in this example.
However my result is completely different, in the sense that feature importance standard deviation is almost always bigger than…
gc5
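In scikit-learn's "Feature importances with a forest of trees" example, the error bars are the standard deviation of `feature_importances_` across the individual trees. Each tree is grown on a bootstrap sample with a random subset of features at each split, so single-tree importances vary a lot, and a standard deviation larger than the mean is not by itself a sign of a bug. A minimal sketch, assuming scikit-learn and synthetic data, that also computes the standard error of the mean, i.e. the uncertainty of the averaged forest-level importance:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# shuffle=False puts the 3 informative features in columns 0-2; the other 7 are noise.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=3, n_redundant=0, random_state=0, shuffle=False
)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

per_tree = np.array([tree.feature_importances_ for tree in forest.estimators_])  # (n_trees, n_features)
importances = forest.feature_importances_
std = per_tree.std(axis=0)                    # spread across single trees
sem = std / np.sqrt(len(forest.estimators_))  # uncertainty of the forest-level average

for i in np.argsort(importances)[::-1]:
    print(f"feature {i}: importance {importances[i]:.3f}, std {std[i]:.3f}, s.e.m. {sem[i]:.4f}")
```

Which error bar to plot depends on the question: the per-tree standard deviation describes how unstable single trees are, while the standard error describes how precisely the forest's averaged importance is estimated.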
14 votes, 2 answers
Validation loss and accuracy remain constant
I am trying to implement this paper on a set of medical images. I am doing it in Keras. The network essentially consists of 4 conv and max-pool layers followed by a fully connected layer and a softmax classifier.
As far as I know, I have followed the…
pseudomonas
14 votes, 1 answer
Recognize a grammar in a sequence of fuzzy tokens
I have text documents which contain mainly lists of Items.
Each Item is a group of several tokens of different types: FirstName, LastName, BirthDate, PhoneNumber, City, Occupation, etc.
A token is a group of words.
Items can lie on several…
OoDeLally
14 votes, 4 answers
Import csv file contents into pyspark dataframes
How can I import a .csv file into pyspark dataframes? I even tried to read the CSV file with pandas and then convert it to a Spark DataFrame using createDataFrame, but it still shows an error. Can someone guide me through this? Also, please tell me…
neha