Highest Voted Questions - Data Science Stack Exchange

12

votes

2 answers

Why is training taking so long on my GPU?

Details: GPU: GTX 1080 Training: ~1.1 million images belonging to 10 classes Validation: ~150 thousand images belonging to 10 classes Time per epoch: ~10 hours I've set up CUDA, cuDNN and TensorFlow (TensorFlow GPU as well). I don't think my model is…

asked Jan 02 '18 at 10:58

Rahul

121
1
5
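As a rough sanity check on the figures quoted in the GPU training question above, the reported epoch time can be turned into a throughput number. This is only a back-of-the-envelope sketch: it assumes the ~10 hours covers the ~1.1 million training images alone (the excerpt does not say whether validation is included), and the variable names are illustrative.

```python
# Figures taken from the question excerpt (both are approximate).
train_images = 1_100_000   # ~1.1 million training images
epoch_hours = 10           # ~10 hours per epoch

# Assumption: the epoch time is spent on the training images only.
epoch_seconds = epoch_hours * 3600
images_per_second = train_images / epoch_seconds
ms_per_image = 1000 * epoch_seconds / train_images

print(f"{images_per_second:.1f} images/s, {ms_per_image:.1f} ms per image")
```

A throughput of roughly 30 images per second gives a concrete figure to compare against how fast the input pipeline can read and decode images.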

12

votes

3 answers

Initialize perceptron weights with zero

I'm new to data science, so please don't blast me. In a textbook I found: Now, the reason we don't initialize the weights to zero is that the learning rate (eta) only has an effect on the classification outcome if the weights are…

asked Dec 30 '17 at 11:56

Poiera

451
1
5
9

12

votes

1 answer

How do I implement the sigmoid function in Octave?

Given that the sigmoid hypothesis is defined as h_θ(x) = g(θ^T x), how can I implement this function in Octave, given that g = zeros(size(z))?

asked Dec 19 '17 at 16:43

Shuryu Kisuke

223
1
2
5
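For the sigmoid question above, the usual logistic definition is g(z) = 1 / (1 + e^(-z)). The question asks about Octave; the sketch below is a Python analogue rather than the asker's or any answer's code, and the function names `sigmoid`, `sigmoid_elementwise` and `hypothesis` are illustrative choices, not taken from the post.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)).

    The two branches avoid calling exp() on a large positive number,
    which would raise OverflowError.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def sigmoid_elementwise(values):
    """Apply the sigmoid to every element of a sequence."""
    return [sigmoid(v) for v in values]

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x) for two equal-length sequences."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

print(sigmoid_elementwise([-2.0, 0.0, 2.0]))
```

The element-wise helper mirrors the idea behind preallocating `g = zeros(size(z))`: the output has the same shape as the input, with the sigmoid applied to each entry.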

12

votes

2 answers

Predict task duration

I'm trying to create a regression model that predicts the duration of a task. The training data I have consists of roughly 40 thousand completed tasks with these variables: Who performed the task (~250 different people) What part (subproject) of…

asked Nov 30 '17 at 12:48

Jurgy

238
2
11

12

votes

6 answers

How to get the number of syllables in a word?

I have already gone through this post which uses nltk's cmudict for counting the number of syllables in a word: from nltk.corpus import cmudict d = cmudict.dict() def nsyl(word): return [len(list(y for y in x if y[-1].isdigit())) for x in…

nlp

asked Sep 28 '17 at 06:04

Dawny33

8,296
12
48
104
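The counting idea in the syllable question above relies on a CMUdict convention: vowel phonemes carry a trailing stress digit, so counting the phonemes that end in a digit gives the syllable count for each pronunciation. The sketch below illustrates only that idea. To stay self-contained it uses a small hand-written table (`TOY_PRONUNCIATIONS`, a hypothetical stand-in) instead of loading `nltk.corpus.cmudict`, and it does not attempt to reconstruct the truncated code in the excerpt.

```python
# Hypothetical stand-in for a CMUdict-style lookup: each word maps to a
# list of pronunciations, each pronunciation being a list of phonemes.
# Vowel phonemes end in a stress digit (0, 1 or 2).
TOY_PRONUNCIATIONS = {
    "data": [["D", "EY1", "T", "AH0"], ["D", "AE1", "T", "AH0"]],
    "science": [["S", "AY1", "AH0", "N", "S"]],
}

def syllable_counts(word, pronunciations=TOY_PRONUNCIATIONS):
    """Return one syllable count per known pronunciation, or None if unknown."""
    entries = pronunciations.get(word.lower())
    if entries is None:
        return None  # out-of-vocabulary words need a separate fallback
    return [
        sum(1 for phoneme in phones if phoneme[-1].isdigit())
        for phones in entries
    ]

print(syllable_counts("data"))
print(syllable_counts("unknownword"))
```

Returning `None` for unknown words makes the main limitation of a dictionary lookup explicit: any word missing from the table has to be handled by some other method.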

12

votes

2 answers

Tradeoffs between Storm and Hadoop (MapReduce)

Can someone kindly tell me about the trade-offs involved when choosing between Storm and MapReduce in a Hadoop cluster for data processing? Of course, aside from the obvious one, that Hadoop (processing via MapReduce in a Hadoop cluster) is a batch…

asked Jun 01 '14 at 10:25

mbbce

347
2
8

12

votes

1 answer

Can HDF5 be reliably written to and read from simultaneously by separate python processes?

I'm writing a script to record live data over time into a single HDF5 file which includes my whole dataset for this project. I'm working with Python 3.6 and decided to create a command line tool using click to gather the data. My concern is what…

asked Aug 17 '17 at 11:59

basse

297
3
8

12

votes

1 answer

What feature engineering is necessary with tree based algorithms?

I understand data hygiene, which is probably the most basic feature engineering. That is making sure all your data is properly loaded, making sure N/As are treated as a special value rather than a number between -1 and 1, and tagging your…

asked Aug 08 '17 at 15:00

William Entriken

423
1
4
10

12

votes

3 answers

Find the consecutive zeros in a DataFrame and do a conditional replacement

I have a dataset like this: Sample Dataframe import pandas as pd df = pd.DataFrame({ 'names': ['A','B','C','D','E','F','G','H','I','J','K','L'], 'col1': [0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0], 'col2': [0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0,…

asked Jul 20 '17 at 19:43

Kevin

533
2
5
12
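For the consecutive-zeros question above, one way to locate runs of zeros is run-length grouping. The sketch below uses the `col1` values shown in the excerpt and plain Python's `itertools.groupby` rather than pandas. The replacement rule is an assumption made purely for illustration (fill runs of at least `min_length` zeros with a `fill` value), since the excerpt is cut off before stating the actual condition; the helper names are hypothetical.

```python
from itertools import groupby

# 'col1' values as shown in the question excerpt.
col1 = [0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0]

def zero_runs(values):
    """Return (start_index, length) for each maximal run of zeros."""
    runs = []
    index = 0
    for value, group in groupby(values):
        length = sum(1 for _ in group)
        if value == 0:
            runs.append((index, length))
        index += length
    return runs

def replace_long_zero_runs(values, min_length, fill):
    """Assumed rule: overwrite zero runs of at least min_length with fill."""
    result = list(values)
    for start, length in zero_runs(values):
        if length >= min_length:
            result[start:start + length] = [fill] * length
    return result

print(zero_runs(col1))
print(replace_long_zero_runs(col1, min_length=3, fill=-1))
```

Separating run detection from the replacement step means that whatever condition the full question specifies only needs to change the second function.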

12

votes

3 answers

Instances vs. cores when using EC2

Working on what could often be called "medium data" projects, I've been able to parallelize my code (mostly for modeling and prediction in Python) on a single system across anywhere from 4 to 32 cores. Now I'm looking at scaling up to clusters on…

asked May 23 '14 at 19:45

Therriault

871
1
8
13

12

votes

3 answers

Relation between convolution in math and CNN

I've read an explanation of convolution and understand it to some extent. Can somebody help me understand how this operation relates to convolution in convolutional neural nets? Is the filter like the function g, which applies the weights?

asked Jun 27 '17 at 14:23

noname7619

323
2
9

12

votes

3 answers

Xgboost - How to use feature_importances_ with XGBRegressor()?

How can we get feature importances when performing regression with XGBRegressor()? Is there something like XGBClassifier().feature_importances_?

asked Jun 21 '17 at 15:33

Simone

705
1
14
23

12

votes

2 answers

What is the feature matrix in word2vec?

I'm a beginner in neural networks and am currently exploring the word2vec model. However, I'm having a tough time understanding what exactly the feature matrix is. I can understand that the first matrix is a one-hot encoding vector for a given…

asked Jun 21 '17 at 07:00

Satrajit Maitra

121
1
4

12

votes

4 answers

How to know the model has started overfitting?

I hope the following excerpts will provide an insight into what my question is going to be. These are from here. The learning then gradually slows down. Finally, at around epoch 280 the classification accuracy pretty much stops improving. Later…

asked May 22 '17 at 21:00

figs_and_nuts

833
1
5
14

12

votes

2 answers

Naming conventions for dataframes

I often find myself writing code like the following (oversimplified example): df = read_csv('customer_data_export.csv') df2 = df.query("date > '2017-01-10'") data = df_filtered.groupby('transaction_id').sum() plot_data = pivot_table(data,…

asked May 15 '17 at 23:33

Max Flander

316
1
2
7
