Most Popular
1500 questions
29
votes
1 answer
NLP - why is "not" a stop word?
I am trying to remove stop words before performing topic modeling. I noticed that some negation words (not, nor, never, none, etc.) are usually considered to be stop words. For example, NLTK, spaCy and sklearn include "not" on their stop word lists.…
E.K.
29
votes
7 answers
Difference between AlphaGo's policy network and value network
I was reading a high level summary about Google's AlphaGo, and I came across the terms "policy network" and "value network". At a high level, I understand that the policy network is used to suggest moves and the value network is used to, "Reduce the…
Ryan Zotti
28
votes
2 answers
Predicting a word using Word2vec model
Given a sentence:
"When I open the ?? door it starts heating automatically"
I would like to get the list of possible words in ?? with a probability.
The basic concept used in the word2vec model is to "predict" a word given its surrounding context.
Once the…
DED
28
votes
2 answers
Removing strings after a certain character in a given text
I have a dataset like the one below. I would like to remove all characters after the character ©. How can I do that in R?
data_clean_phrase <- c("Copyright © The Society of Geomagnetism and Earth",
"© 2013 Chinese National Committee…
Hamideh
28
votes
3 answers
Data Science Project Ideas
I don't know if this is the right place to ask this question, but a community dedicated to Data Science should be the most appropriate place in my opinion.
I have just started with Data Science and Machine Learning. I am looking for long-term project…
Kevin Desai
28
votes
4 answers
What makes columnar databases suitable for data science?
What are some of the advantages of columnar data-stores which make them more suitable for data science and analytics?
Dawny33
28
votes
8 answers
Visualizing a graph with a million vertices
What is the best tool to use to visualize (draw the vertices and edges) a graph with 1,000,000 vertices? There are about 50,000 edges in the graph, and I can compute the location of individual vertices and edges.
I am thinking about writing a program…
Cici
28
votes
4 answers
Scikit-learn: Getting SGDClassifier to predict as well as a Logistic Regression
A way to train a Logistic Regression is by using stochastic gradient descent, which scikit-learn offers an interface to.
What I would like to do is take scikit-learn's SGDClassifier and have it score the same as a Logistic Regression here.…
hlin117
28
votes
5 answers
VM image for data science projects
There are numerous tools available for data science tasks, and it's cumbersome to install everything and build up a perfect system.
Is there a Linux/Mac OS image with Python, R and other open-source data science tools installed and available for…
JeanVuda
28
votes
2 answers
Keras vs. tf.keras
I'm a bit confused about choosing between Keras (keras-team/keras) and tf.keras (tensorflow/tensorflow/python/keras/) for my new research project.
There is an argument that Keras isn't owned by anyone, so people are happier to contribute to it and it'll be…
Mo-
28
votes
7 answers
Publicly available social network datasets/APIs
As an extension to our great list of publicly available datasets, I'd like to know if there is any list of publicly available social network datasets/crawling APIs. It would be very nice if, alongside a link to the dataset/API, characteristics…
Rubens
28
votes
2 answers
What is the advantage of using log softmax instead of softmax?
Are there any advantages to using log softmax over softmax? What are the reasons to choose one over the other?
rawwar
28
votes
1 answer
AdaBoost vs Gradient Boosting
How is AdaBoost different from a Gradient Boosting algorithm since both of them use a Boosting technique?
I could not figure out the actual difference between these two algorithms from a theoretical point of view.
CodeMaster GoGo
28
votes
3 answers
What is the exact definition of VC dimension?
I'm studying machine learning from Andrew Ng's Stanford lectures and just came across the theory of VC dimension. According to the lectures and what I understood, the definition of VC dimension can be given as:
If you can find a set of $n$ points,…
Kaushal28
28
votes
3 answers
What does "baseline" mean in the context of machine learning?
What does "baseline" mean in the context of machine learning and data science?
Someone wrote me:
Hint: An appropriate baseline will give an RMSE of approximately 200.
I don't get this. Does he mean that if my predictive model on the training data…
Meiiso