Highest Voted Questions - Data Science Stack Exchange

11 votes

2 answers

Counting indexes in pandas

I feel like this is a rudimentary question but I'm very new to this and just haven't been able to crack it / find the answer. Ultimately what I'm trying to do here is to count unique values on a certain column and then determine which of those…

asked Nov 08 '16 at 19:00

Mr. Hasquestions

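A minimal pandas sketch for the counting part of this question. `value_counts()` returns how often each unique value occurs in a column, and its index holds those unique values, so the result can be filtered by count. The column name `category`, the data, and the threshold of 2 are hypothetical, because the excerpt is truncated before it says what the asker wanted to determine.

```python
import pandas as pd

# Hypothetical data: the real column name and values are not in the excerpt.
df = pd.DataFrame({"category": ["a", "b", "a", "c", "a", "b"]})

# value_counts() returns a Series whose index is the unique values and whose
# values are their counts, sorted from most to least frequent.
counts = df["category"].value_counts()

# Unique values that occur at least twice (hypothetical threshold).
frequent = counts[counts >= 2].index.tolist()

print(counts.to_dict())          # {'a': 3, 'b': 2, 'c': 1}
print(frequent)                  # ['a', 'b']
print(df["category"].nunique())  # 3
```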

11 votes

1 answer

Multiple categorical values for a single feature: how to convert them to binary using Python

I have a data set of movies which has 28 columns. One of them is genres. For each row in this data set, the value for column genres is of the form "Action|Animation|Comedy|Family|Fantasy". I want to encode them using pandas.get_dummies() but since…

asked Oct 31 '16 at 12:14

aks_Nin

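Because each genres value is a single `|`-delimited string, pandas' `Series.str.get_dummies` can split and one-hot encode it in one call (its `sep` parameter defaults to `"|"`, passed explicitly here for clarity). Calling `pd.get_dummies()` on the raw column would instead treat each full string as one category. A minimal sketch, with a hypothetical three-row frame standing in for the 28-column movie data set:

```python
import pandas as pd

# Hypothetical stand-in for the movie data set; only the genres column matters here.
movies = pd.DataFrame({
    "title": ["m1", "m2", "m3"],
    "genres": ["Action|Animation|Comedy", "Comedy|Family", "Fantasy"],
})

# Split each string on "|" and build one 0/1 indicator column per distinct genre.
genre_dummies = movies["genres"].str.get_dummies(sep="|")

# Replace the multi-valued column with the indicator columns.
encoded = pd.concat([movies.drop(columns="genres"), genre_dummies], axis=1)
print(encoded)
```

A movie with several genres gets a 1 in each matching column.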

11 votes

4 answers

Feature selection and classification accuracy relation

One methodology for selecting a subset of your available features for your classifier is to rank them according to a criterion (such as information gain) and then calculate the accuracy using your classifier and a subset of the ranked…

asked Oct 24 '16 at 13:13

Pauline

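The rank-then-evaluate procedure in this question can be sketched independently of any particular classifier: sort features by their criterion score, then evaluate the top-k features for every k. The scores and the `toy_accuracy` table are made-up placeholders; in practice `evaluate` would train and cross-validate the real classifier on the selected columns.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def rank_features(scores: Dict[str, float]) -> List[str]:
    """Order feature names from highest to lowest criterion score."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)


def accuracy_by_subset_size(
    ranked: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> List[Tuple[int, float]]:
    """Evaluate the classifier on the top-k ranked features for every k."""
    return [(k, evaluate(list(ranked[:k]))) for k in range(1, len(ranked) + 1)]


# Hypothetical information-gain scores and a stand-in evaluator (not real results).
scores = {"f1": 0.40, "f2": 0.05, "f3": 0.25}
toy_accuracy = {1: 0.70, 2: 0.78, 3: 0.76}

ranked = rank_features(scores)
curve = accuracy_by_subset_size(ranked, lambda feats: toy_accuracy[len(feats)])
best_k, best_acc = max(curve, key=lambda pair: pair[1])
print(ranked, curve, best_k)
```

Accuracy need not increase monotonically with k, which is one reason to evaluate every subset size rather than assume that adding more ranked features always helps.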

11 votes

2 answers

Unable to figure out the linear embedding layer in the convolutional neural network?

I have the network architecture from the paper "Learning Fine-grained Image Similarity with Deep Ranking" and I am unable to figure out how the outputs from the three parallel networks are merged using the linear embedding layer. The only information…

asked Oct 07 '16 at 06:16

A. Sam


11 votes

2 answers

Early stopping in multi-output deep learning

When working with a neural network with more than one output, what is generally advised as the best strategy for early-stopping the training process? Given that I am currently monitoring the net validation loss (validation loss from n different…

asked Sep 13 '16 at 17:37

didgeridoo92

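One common option is to monitor a single scalar: the (optionally weighted) sum of the n per-output validation losses, similar to the total `val_loss` that Keras reports for multi-output models. Below is a framework-independent sketch of patience-based early stopping on that sum. The loss history, equal weights, and `patience=2` are hypothetical. If one output matters more than the others, monitoring that output's loss alone is an alternative.

```python
from typing import Sequence


class MultiOutputEarlyStopping:
    """Signal a stop when the weighted total validation loss stops improving."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best = float("inf")
        self.wait = 0

    def step(self, losses: Sequence[float], weights: Sequence[float]) -> bool:
        """Record one epoch's per-output losses; return True if training should stop."""
        total = sum(w * loss for w, loss in zip(weights, losses))
        if total < self.best - self.min_delta:
            self.best = total
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience


# Hypothetical validation losses for a two-output network, one row per epoch.
history = [[0.9, 0.8], [0.7, 0.6], [0.6, 0.75], [0.65, 0.7]]
stopper = MultiOutputEarlyStopping(patience=2)
stopped_at = None
for epoch, losses in enumerate(history, start=1):
    if stopper.step(losses, weights=[1.0, 1.0]):
        stopped_at = epoch
        break
print(stopped_at, round(stopper.best, 3))
```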

11 votes

2 answers

What does Negative Log Likelihood mean?

I have a data set which has continuous independent variables and a continuous dependent variable. To predict the dependent variable using the independent variables, I've run an ensemble of regression models and tried to compare them against each…

asked Sep 02 '16 at 19:36

Minu

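For regression, negative log-likelihood is only meaningful when a model outputs a predictive distribution rather than a single number. Assume each model predicts a Gaussian with mean mu and standard deviation sigma for every point. The per-point NLL is then 0.5 * ln(2 * pi * sigma^2) + (y - mu)^2 / (2 * sigma^2). Lower is better, and the measure penalizes predictions that are both wrong and overconfident. A minimal sketch under that Gaussian assumption:

```python
import math
from typing import Sequence


def gaussian_nll(y: Sequence[float], mu: Sequence[float], sigma: Sequence[float]) -> float:
    """Mean per-point negative log-likelihood of targets y under N(mu, sigma**2)."""
    total = 0.0
    for yi, mi, si in zip(y, mu, sigma):
        var = si ** 2
        total += 0.5 * math.log(2 * math.pi * var) + (yi - mi) ** 2 / (2 * var)
    return total / len(y)


# Same error, different claimed uncertainty: the overconfident prediction scores worse.
print(gaussian_nll([1.0], [0.0], [1.0]))
print(gaussian_nll([1.0], [0.0], [0.1]))
```

If sigma is fixed and equal for all points, NLL becomes the mean squared error scaled by 1/(2 * sigma^2) plus a constant, so it ranks models the same way MSE does. It adds information only when the models also predict their own uncertainty.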

11 votes

1 answer

What is the significance of model merging in Keras?

I have learned that Keras has a functionality to "merge" two models according to the following: from keras.layers import Merge left_branch = Sequential() left_branch.add(Dense(32, input_dim=784)) right_branch =…

keras

asked Aug 15 '16 at 09:23

Hendrik

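The `Merge` layer in the excerpt comes from Keras 1. Keras 2 deprecated it in favor of merge layers such as `Concatenate` (or the functional `concatenate`), used through the functional API. Merging matters because two branches, possibly fed by different inputs, are combined into one tensor that later layers treat as a single input. The NumPy sketch below only illustrates what a concatenation merge computes. The random weights stand in for two `Dense(32, input_dim=784)` branches, and nothing here calls Keras.

```python
import numpy as np

rng = np.random.default_rng(0)
x_left = rng.normal(size=(5, 784))   # batch of 5 samples for the left branch
x_right = rng.normal(size=(5, 784))  # batch of 5 samples for the right branch

# Each branch stands in for a Dense(32) layer without activation: y = x @ W + b.
w_left, b_left = rng.normal(size=(784, 32)), np.zeros(32)
w_right, b_right = rng.normal(size=(784, 32)), np.zeros(32)
left_out = x_left @ w_left + b_left
right_out = x_right @ w_right + b_right

# A "concat" merge joins the branch outputs along the feature axis,
# so the next layer sees one 64-dimensional vector per sample.
merged = np.concatenate([left_out, right_out], axis=1)
print(merged.shape)  # (5, 64)
```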

11 votes

3 answers

Can map-reduce algorithms written for MongoDB be ported to Hadoop later?

In our company, we have a MongoDB database containing a lot of unstructured data, on which we need to run map-reduce algorithms to generate reports and other analyses. We have two approaches to select from for implementing the required…

asked May 18 '14 at 12:03

Amir Ali Akbari

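MongoDB's map-reduce and Hadoop MapReduce follow the same model: map each record to key/value pairs, group by key, then reduce each group. The logic therefore usually carries over, even though the code itself does not. MongoDB's map and reduce functions are written in JavaScript, while Hadoop jobs are typically written in Java or run through Hadoop Streaming. One way to keep the logic portable is to write it as plain functions with no engine-specific calls and test them locally. The sketch below takes that approach. The `type` field and the documents are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Mapper = Callable[[Dict[str, Any]], Iterator[Tuple[str, int]]]
Reducer = Callable[[str, List[int]], int]


def run_map_reduce(records: Iterable[Dict[str, Any]],
                   mapper: Mapper, reducer: Reducer) -> Dict[str, int]:
    """Local stand-in for an engine: map, group values by key, then reduce each group."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def count_by_type(doc: Dict[str, Any]) -> Iterator[Tuple[str, int]]:
    # Emit one (key, 1) pair per document; "type" is a hypothetical field name.
    yield doc.get("type", "unknown"), 1


def sum_counts(key: str, values: List[int]) -> int:
    # Summing is associative and commutative, so the result is unchanged if an
    # engine reduces partial groups first (as MongoDB and Hadoop combiners may).
    return sum(values)


docs = [{"type": "click"}, {"type": "view"}, {"type": "click"}, {}]
report = run_map_reduce(docs, count_by_type, sum_counts)
print(report)  # {'click': 2, 'view': 1, 'unknown': 1}
```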

11 votes

1 answer

Calculate cosine similarity in Apache Spark

I have a DataFrame with the IDF of certain words computed. For example (10,[0,1,2,3,4,5],[0.413734499590671,0.4244680552337798,0.4761400657781007, 1.4004620708967006,0.37876590175292424,0.48374466516332]) and so on. Now, given a query Q, I can…

asked Aug 10 '16 at 05:43

Ganesh Krishnan

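The vector in the excerpt is printed in Spark's sparse format `(size, [indices], [values])`. Cosine similarity between two such vectors is their dot product divided by the product of their L2 norms, and only indices present in both vectors contribute to the dot product. Below is a plain-Python sketch of that computation. The query vector is hypothetical. PySpark's `SparseVector` also provides `dot` and `norm` methods, which could compute the same quantity per row, for example inside a UDF.

```python
import math
from typing import Sequence


def sparse_cosine(a_idx: Sequence[int], a_val: Sequence[float],
                  b_idx: Sequence[int], b_val: Sequence[float]) -> float:
    """Cosine similarity of two sparse vectors given as parallel index/value lists."""
    a = dict(zip(a_idx, a_val))
    b = dict(zip(b_idx, b_val))
    # Only indices present in both vectors contribute to the dot product.
    dot = sum(v * b[i] for i, v in a.items() if i in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # define similarity with an all-zero vector as 0
    return dot / (norm_a * norm_b)


# Document vector from the question (size 10); the query vector is hypothetical.
doc_idx = [0, 1, 2, 3, 4, 5]
doc_val = [0.413734499590671, 0.4244680552337798, 0.4761400657781007,
           1.4004620708967006, 0.37876590175292424, 0.48374466516332]
query_idx, query_val = [1, 3], [1.0, 1.0]
print(sparse_cosine(doc_idx, doc_val, query_idx, query_val))
```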

11 votes

3 answers

Is TensorFlow a complete Machine Learning Library?

I am new to TensorFlow and I need to understand the capabilities and shortcomings of TensorFlow before I can use it. I know that it is a deep learning framework, but apart from that which other machine learning algorithms can we use with tensor…

machine-learning

asked Jul 21 '16 at 18:40

Swaroop


11 votes

2 answers

Bookkeeping of experiment runs and results

I am a hands-on researcher and I like testing out viable solutions, so I tend to run a lot of experiments. For example, if I am calculating a similarity score between documents, I might want to try out many measures. In fact, for each measure I…

asked Oct 05 '14 at 06:25

machine-wisdom

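One lightweight approach is to append each run, with its parameters and metrics, as one JSON object per line in a log file. Runs can later be loaded, filtered, and compared. The sketch below uses only the standard library. The file location, the `measure` parameter, and the `score` values are hypothetical placeholders, not results.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Any, Dict, List


def log_run(path: Path, params: Dict[str, Any], metrics: Dict[str, float]) -> None:
    """Append one experiment run as a single JSON line."""
    record = {"timestamp": time.time(), "params": params, "metrics": metrics}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")


def load_runs(path: Path) -> List[Dict[str, Any]]:
    """Read all logged runs back as dictionaries."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical runs comparing two similarity measures.
log_path = Path(tempfile.mkdtemp()) / "runs.jsonl"
log_run(log_path, {"measure": "cosine"}, {"score": 0.61})
log_run(log_path, {"measure": "jaccard"}, {"score": 0.55})

runs = load_runs(log_path)
best = max(runs, key=lambda run: run["metrics"]["score"])
print(best["params"])
```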

11 votes

3 answers

How can I classify text considering word order, instead of just using a bag-of-words approach?

I've made a Naive Bayes classifier that uses the bag-of-words technique to classify spam posts on a message board. It works, but I think I could get much better results if my models considered word order and phrases (e.g. 'girls' and 'live'…

classification

asked Oct 02 '14 at 23:15

Yerk

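A standard way to capture some word order without leaving the bag-of-words framework is to add n-grams (here, bigrams) as extra features. A phrase such as 'live girls' then becomes its own feature, distinct from the two words appearing apart. A minimal sketch follows; the whitespace tokenizer and the sample text are simplifying assumptions. In scikit-learn, `CountVectorizer(ngram_range=(1, 2))` produces the same kind of unigram-plus-bigram features, which a Naive Bayes classifier can consume unchanged.

```python
from collections import Counter


def ngram_features(text: str, max_n: int = 2) -> Counter:
    """Count all word n-grams of length 1..max_n (whitespace tokenization)."""
    tokens = text.lower().split()
    feats: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


print(ngram_features("Watch live girls now"))
```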

11 votes

4 answers

Choosing regularization method in neural networks

When training neural networks, there are at least 4 ways to regularize the network: L1 regularization, L2 regularization, dropout, and batch normalization, plus of course other things like weight sharing and reducing the number of connections, which…

asked May 25 '16 at 05:08

Thomas Johnson

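To make the options concrete: L1 and L2 regularization add a weight penalty to the training loss, while dropout randomly zeroes activations during training. The NumPy sketch below shows those mechanics and omits batch normalization. The penalty strengths and dropout rate are arbitrary placeholders, and the sketch says nothing about which method works best for a given network. Some libraries scale the L2 term by 1/2, which only changes how the coefficient is interpreted.

```python
from typing import Sequence

import numpy as np


def regularized_loss(data_loss: float, weights: Sequence[np.ndarray],
                     l1: float = 0.0, l2: float = 0.0) -> float:
    """Data loss plus L1 (sum of |w|) and L2 (sum of w**2) weight penalties."""
    l1_term = l1 * sum(float(np.abs(w).sum()) for w in weights)
    l2_term = l2 * sum(float((w ** 2).sum()) for w in weights)
    return data_loss + l1_term + l2_term


def inverted_dropout(activations: np.ndarray, rate: float,
                     rng: np.random.Generator) -> np.ndarray:
    """Training-time dropout: zero each unit with probability `rate`, rescale the rest.

    Dividing by the keep probability keeps the expected activation unchanged,
    so the layer can be left out entirely at inference time.
    """
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob


weights = [np.array([[1.0, -2.0]])]
print(regularized_loss(0.5, weights, l1=0.1, l2=0.01))  # 0.5 + 0.1 * 3 + 0.01 * 5
rng = np.random.default_rng(0)
print(inverted_dropout(np.ones(8), rate=0.5, rng=rng))
```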

11 votes

2 answers

How much time do scikit classifiers take to classify?

I am planning to use scikit linear support vector machine (SVM) classifier for text classification on a corpus consisting of 1 million labeled documents. What I am planning to do is, when a user enters some keyword, the classifier will first…

asked Oct 01 '14 at 13:26

user3498


11 votes

7 answers

ChatGPT's Architecture - Decoder Only? Or Encoder-Decoder?

Does ChatGPT use an encoder-decoder architecture, or a decoder-only architecture? I have been coming across Medium and TowardsDataScience articles suggesting that it has an encoder-decoder architecture (see sources below): --…

asked Feb 03 '23 at 08:57

user141493

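The GPT papers describe decoder-only transformers. Their defining feature is causal (masked) self-attention: each position attends only to itself and earlier positions, with no separate encoder stack and no cross-attention. OpenAI has not published ChatGPT's full architecture, so this concerns the GPT family in general. Below is a single-head NumPy sketch of the causal mask. It is simplified: it omits the learned query/key/value projections, multiple heads, and the rest of a transformer block.

```python
from typing import Tuple

import numpy as np


def causal_self_attention(x: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Single-head self-attention over x (seq_len x d) with a causal mask.

    Simplification: queries, keys, and values are all x itself (no projections).
    """
    seq_len, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    # True above the diagonal: position i must not see positions j > i.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Row-wise softmax; the diagonal is never masked, so each row has a finite max.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x, weights


x = np.random.default_rng(0).normal(size=(4, 8))
out, attn = causal_self_attention(x)
print(np.round(attn, 3))  # upper triangle is all zeros
```

Because the first token can attend only to itself, its output equals its own input vector in this simplified form.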
