Highest Voted Questions - Data Science Stack Exchange

12

votes

2 answers

Python Machine Learning/Data Science Project Structure

I'm looking for information on how a Python Machine Learning project should be organized. For ordinary Python projects there is Cookiecutter, and for R there is ProjectTemplate. This is my current folder structure, but I'm mixing Jupyter Notebooks with actual…

python

asked Feb 08 '16 at 16:58

David Gasquez

221
2
6

12

votes

2 answers

Deep Learning with Spectrograms for sound recognition

I was looking into the possibility of classifying sound (for example, sounds of animals) using spectrograms. The idea is to use a deep convolutional neural network to recognize segments in the spectrogram and output one (or many) class labels. This is…

asked Jan 29 '16 at 15:39

user667804

271
3
6

11

votes

2 answers

ggvis vs. ggplot2+Shiny; which one to choose for interactive visualization?

There is a similar question here in CrossValidated, and I have read the answers. My question is a bit different. I don't want to merely visualize my data, and indeed what I want to visualize is not easy to visualize with either package. I have two…

asked Jan 21 '16 at 14:47

Shahin

271
1
9

11

votes

1 answer

Solutions for Continuous Online Cluster Identification?

Let me show you an example of a hypothetical online clustering application: At time n, points 1,2,3,4 are allocated to the blue cluster A and points b,5,6,7 are allocated to the red cluster B. At time n+1, a new point a is introduced which is…

asked Aug 14 '14 at 19:09

Raffael

211
1
6

11

votes

4 answers

Overfitting/Underfitting with Data set size

In the graph below: x-axis => data set size; y-axis => cross-validation score; the red line is for training data; the green line is for testing data. In a tutorial that I'm referring to, the author says that the point where the red line and the green…

asked Jan 12 '16 at 09:57

tharindu_DG

315
1
3
9

11

votes

1 answer

applying word2vec on small text files

I'm totally new to word2vec, so please bear with me. I have a set of text files, each containing a set of tweets, between 1000 and 3000. I have chosen a common keyword ("kw1") and want to find semantically relevant terms for "kw1" using word2vec. For…

asked Jan 10 '16 at 10:49

samsamara

211
2
5

11

votes

3 answers

Can regression trees predict continuously?

Suppose I have a smooth function like $f(x, y) = x^2+y^2$. I have a training set $D \subsetneq \{((x, y), f(x,y)) | (x,y) \in \mathbb{R}^2\}$ and, of course, I don't know $f$ although I can evaluate $f$ wherever I want. Are regression trees capable…

asked Dec 16 '15 at 11:39

Martin Thoma

18,880
35
95
169

11

votes

3 answers

How do AI's learn to act when the problem space is too big

I learn best through experimentation and example. I'm learning about neural networks and have what I think is a pretty good understanding of classification and regression, and also of supervised and unsupervised learning, but I've stumbled upon…

asked Dec 11 '15 at 23:43

FraserOfSmeg

363
1
10

11

votes

4 answers

Clustering for mixed numeric and nominal discrete data

My data includes survey responses that are binary (numeric) and nominal/categorical. All responses are discrete and at the individual level. The data has shape (n=7219, p=105). A couple of things: I am trying to identify a clustering technique with a…

asked Nov 02 '15 at 04:12

kms

310
1
4
15

11

votes

5 answers

Covariate shift detection

Is there any standard approach for detecting covariate shift between the training and test data? This would be useful to validate the assumption that covariate shift exists in my database, which contains a few hundred images.

asked Oct 02 '15 at 09:49

Daniel Wonglee

191
1
4

11

votes

1 answer

Learning with Positive labels only

I have ~7 million rows of customer data (~500 sparse attributes). A million of them have opted in to a new service. How do I use this signal to predict which of the remaining customers are likely to adopt the service? And how do I measure the…

asked Sep 21 '20 at 20:08

Vivek Kalyanarangan

560
2
11

11

votes

2 answers

Does BERT have any advantage over GPT-3?

I have read a couple of documents that explain in detail the edge that GPT-3 (Generative Pre-trained Transformer 3) has over BERT (Bidirectional Encoder Representations from Transformers). So I am curious to know whether BERT scores better…

asked Sep 12 '20 at 04:37

Bipin

213
1
2
8

11

votes

3 answers

Statistics + Computer Science = Data Science?

I want to become a data scientist. I studied applied statistics (actuarial science), so I have a strong statistical background (regression, stochastic processes, time series, just to mention a few). But now, I am going to do a master's degree in…

asked Jul 22 '14 at 08:39

user3643160

163
6

11

votes

3 answers

Data visualization for pattern analysis (language-independent, but R preferred)

I want to plot the bytes from a disk image in order to understand a pattern in them. This is mainly an academic task, since I'm almost sure this pattern was created by a disk testing program, but I'd like to reverse-engineer it anyway. I already…

asked Jul 19 '14 at 05:27

Valmiky Arquissandas

213
1
8

11

votes

2 answers

Finding optimal threshold in multi-class classification task

In a binary classification problem, it is easy to find the optimal threshold (F1) by trying different thresholds, evaluating them, and picking the one with the highest F1. Similarly, is there a proper way to find optimal thresholds for all the…
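The binary case described in this excerpt can be sketched roughly as follows. This is only an illustration in plain Python, not code from the question: the helper names `f1_score` and `best_f1_threshold` and the toy labels and scores are all hypothetical, and the candidate thresholds are assumed to be the distinct predicted scores.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels (1 = positive); returns 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, scores, thresholds=None):
    """Try each candidate threshold and keep the one with the highest F1."""
    if thresholds is None:
        # Assumption: the distinct scores are a sufficient candidate set.
        thresholds = sorted(set(scores))
    best_t, best_f1 = None, -1.0
    for t in thresholds:
        # Predict positive when the score reaches the threshold.
        y_pred = [1 if s >= t else 0 for s in scores]
        f1 = f1_score(y_true, y_pred)
        if f1 > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical toy data: true labels and predicted positive-class scores.
y_true = [0, 0, 1, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.5]
threshold, f1 = best_f1_threshold(y_true, scores)
print(threshold, f1)
```

In practice the threshold would normally be chosen on a held-out validation set rather than on the data used to report the final score.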

classification

asked Jul 06 '20 at 21:01

saiRegrefree

146
1
4
