Most Popular

1500 questions
12
votes
2 answers

Why is training taking so long on my GPU?

Details: GPU: GTX 1080. Training: ~1.1 million images belonging to 10 classes. Validation: ~150 thousand images belonging to 10 classes. Time per epoch: ~10 hours. I've set up CUDA, cuDNN and TensorFlow (TensorFlow GPU as well). I don't think my model is…
Rahul
  • 121
  • 1
  • 5
12
votes
3 answers

Initialize perceptron weights with zero

I'm new to data science, so please don't blast me. In a textbook I found: Now, the reason we don't initialize the weights to zero is that the learning rate (eta) only has an effect on the classification outcome if the weights are…
Poiera
  • 451
  • 1
  • 5
  • 9
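The textbook claim quoted above can be checked with a minimal sketch. It assumes the classic perceptron update rule with labels in {-1, 1}; the toy data and the names `train_perceptron` and `predict` are hypothetical and not taken from the question. With zero-initialized weights, changing eta only rescales the learned weight vector, so the predictions do not change.

```python
def train_perceptron(samples, labels, eta, epochs=10):
    """Perceptron learning rule with weights and bias initialized to zero."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            y_hat = 1 if z >= 0 else -1
            update = eta * (y - y_hat)  # zero when the prediction is correct
            w = [wi + update * xi for wi, xi in zip(w, x)]
            b += update
    return w, b


def predict(w, b, x):
    """Threshold the net input at zero."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable toy data.
X = [[2, 1], [3, 4], [-1, -2], [-3, -1]]
y = [1, 1, -1, -1]

w_a, b_a = train_perceptron(X, y, eta=0.5)
w_b, b_b = train_perceptron(X, y, eta=1.0)

print(w_a, b_a)  # [1.0, 2.0] -1.0
print(w_b, b_b)  # [2.0, 4.0] -2.0
```

Doubling eta doubles every weight and the bias, which leaves the sign of the net input, and therefore every prediction, unchanged. With non-zero initial weights this proportionality no longer holds.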
12
votes
1 answer

How do I implement the sigmoid function in Octave?

Given that the sigmoid function is defined as hθ(x) = g(θ^T x), how can I implement this function in Octave, given that g = zeros(size(z))?
Shuryu Kisuke
  • 223
  • 1
  • 2
  • 5
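As an illustration of the function being asked about, here is a minimal sketch of g(z) = 1 / (1 + e^(-z)). It is written in Python rather than Octave, uses only the standard library, and the name `sigmoid` and the nested-list handling are assumptions made for this example. Like g = zeros(size(z)), the result has the same shape as the input.

```python
import math


def sigmoid(z):
    """Logistic function applied elementwise to a number or nested list."""
    if isinstance(z, (list, tuple)):
        return [sigmoid(v) for v in z]
    # Pick the algebraically equivalent form that never calls exp() on a
    # large positive argument, which would overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


print(sigmoid(0))                 # 0.5
print(sigmoid([[0, 0], [0, 0]]))  # [[0.5, 0.5], [0.5, 0.5]]
```

The two branches compute the same value; the split only avoids an overflow for very negative inputs.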
12
votes
2 answers

Predict task duration

I'm trying to create a regression model that predicts the duration of a task. The training data I have consists of roughly 40 thousand completed tasks with these variables: who performed the task (~250 different people), what part (subproject) of…
Jurgy
  • 238
  • 2
  • 11
12
votes
6 answers

How to get the number of syllables in a word?

I have already gone through this post which uses nltk's cmudict for counting the number of syllables in a word: from nltk.corpus import cmudict d = cmudict.dict() def nsyl(word): return [len(list(y for y in x if y[-1].isdigit())) for x in…
Dawny33
  • 8,296
  • 12
  • 48
  • 104
12
votes
2 answers

Tradeoffs between Storm and Hadoop (MapReduce)

Can someone kindly tell me about the trade-offs involved when choosing between Storm and MapReduce in a Hadoop cluster for data processing? Of course, aside from the obvious one, that Hadoop (processing via MapReduce in a Hadoop cluster) is a batch…
mbbce
  • 347
  • 2
  • 8
12
votes
1 answer

Can HDF5 be reliably written to and read from simultaneously by separate python processes?

I'm writing a script to record live data over time into a single HDF5 file which includes my whole dataset for this project. I'm working with Python 3.6 and decided to create a command line tool using click to gather the data. My concern is what…
basse
  • 297
  • 3
  • 8
12
votes
1 answer

What feature engineering is necessary with tree based algorithms?

I understand data hygiene, which is probably the most basic feature engineering. That is, making sure all your data is properly loaded, making sure N/As are treated as a special value rather than a number between -1 and 1, and tagging your…
William Entriken
  • 423
  • 1
  • 4
  • 10
12
votes
3 answers

Find the consecutive zeros in a DataFrame and do a conditional replacement

I have a dataset like this: Sample Dataframe import pandas as pd df = pd.DataFrame({ 'names': ['A','B','C','D','E','F','G','H','I','J','K','L'], 'col1': [0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0], 'col2': [0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0,…
Kevin
  • 533
  • 2
  • 5
  • 12
12
votes
3 answers

Instances vs. cores when using EC2

Working on what could often be called "medium data" projects, I've been able to parallelize my code (mostly for modeling and prediction in Python) on a single system across anywhere from 4 to 32 cores. Now I'm looking at scaling up to clusters on…
Therriault
  • 871
  • 1
  • 8
  • 13
12
votes
3 answers

Relation between convolution in math and CNN

I've read an explanation of convolution and understand it to some extent. Can somebody help me understand how this operation relates to convolution in convolutional neural nets? Is the filter like the function g, which applies weights?
noname7619
  • 323
  • 2
  • 9
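One way to see the relation is a small 1-D sketch, assuming "valid" mode (no padding) and plain Python lists; the function names are hypothetical. Mathematical convolution flips the kernel before sliding it over the signal, whereas the "convolution" layers in deep learning frameworks typically compute cross-correlation, which slides the kernel without flipping it. In both cases the kernel plays the role of the weighting function g.

```python
def cross_correlate(signal, kernel):
    """Slide the kernel over the signal without flipping it (CNN-style)."""
    n = len(signal) - len(kernel) + 1
    return [
        sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(n)
    ]


def convolve(signal, kernel):
    """Mathematical convolution: flip the kernel, then cross-correlate."""
    return cross_correlate(signal, kernel[::-1])


signal = [1, 2, 3, 4]
kernel = [1, 0, -1]

print(cross_correlate(signal, kernel))  # [-2, -2]
print(convolve(signal, kernel))         # [2, 2]
```

For a symmetric kernel such as [1, 2, 1] the two operations give the same result. Because a CNN learns its kernel weights, the missing flip makes no practical difference: the network simply learns the flipped filter.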
12
votes
3 answers

Xgboost - How to use feature_importances_ with XGBRegressor()?

How can we get feature importances when performing regression with XGBRegressor()? Is there something like XGBClassifier().feature_importances_?
Simone
  • 705
  • 1
  • 14
  • 23
12
votes
2 answers

What is the feature matrix in word2vec?

I'm a beginner in neural networks and am currently exploring the word2vec model. However, I'm having a tough time understanding what exactly the feature matrix is. I can understand that the first matrix is a one-hot encoding vector for a given…
Satrajit Maitra
  • 121
  • 1
  • 4
12
votes
4 answers

How to know the model has started overfitting?

I hope the following excerpts will provide an insight into what my question is going to be. These are from here. The learning then gradually slows down. Finally, at around epoch 280 the classification accuracy pretty much stops improving. Later…
figs_and_nuts
  • 833
  • 1
  • 5
  • 14
12
votes
2 answers

Naming conventions for dataframes

I often find myself writing code like the following (oversimplified example): df = read_csv('customer_data_export.csv') df2 = df.query("date > '2017-01-10'") data = df_filtered.groupby('transaction_id').sum() plot_data = pivot_table(data,…
Max Flander
  • 316
  • 1
  • 2
  • 7