Most Popular
1500 questions
12
votes
2 answers
Python Machine Learning/Data Science Project Structure
I'm looking for information on how a Python machine learning project should be organized. For regular Python projects there is Cookiecutter, and for R there is ProjectTemplate.
This is my current folder structure, but I'm mixing Jupyter Notebooks with actual…
David Gasquez
- 221
- 2
- 6
12
votes
2 answers
Deep Learning with Spectrograms for sound recognition
I was looking into the possibility of classifying sounds (for example, animal sounds) using spectrograms. The idea is to use a deep convolutional neural network to recognize segments in the spectrogram and output one (or many) class labels. This is…
user667804
- 271
- 3
- 6
11
votes
2 answers
ggvis vs. ggplot2+Shiny; which one to choose for interactive visualization?
There is a similar question here on Cross Validated, and I have read the answers. My question is a bit different: I don't want to merely visualize my data, and what I want to visualize is not easy to show with either package.
I have two…
Shahin
- 271
- 1
- 9
11
votes
1 answer
Solutions for Continuous Online Cluster Identification?
Let me show you an example of a hypothetical online clustering application:
At time n, points 1,2,3,4 are allocated to the blue cluster A and points b,5,6,7 are allocated to the red cluster B.
At time n+1 a new point a is introduced which is…
Raffael
- 211
- 1
- 6
11
votes
4 answers
Overfitting/Underfitting with Data set size
In the graph below,
x-axis => Data set Size
y-axis => Cross validation Score
Red line is for Training Data
Green line is for Testing Data
In a tutorial that I'm referring to, the author says that the point where the red line and the green…
tharindu_DG
- 315
- 1
- 3
- 9
11
votes
1 answer
applying word2vec on small text files
I'm totally new to word2vec, so please bear with me. I have a set of text files, each containing between 1000 and 3000 tweets. I have chosen a common keyword ("kw1") and want to find semantically relevant terms for "kw1" using word2vec. For…
samsamara
- 211
- 2
- 5
11
votes
3 answers
Can regression trees predict continuously?
Suppose I have a smooth function like $f(x, y) = x^2+y^2$. I have a training set $D \subsetneq \{((x, y), f(x,y)) | (x,y) \in \mathbb{R}^2\}$ and, of course, I don't know $f$ although I can evaluate $f$ wherever I want.
Are regression trees capable…
Martin Thoma
- 18,880
- 35
- 95
- 169
11
votes
3 answers
How do AIs learn to act when the problem space is too big?
I learn best through experimentation and example. I'm learning about neural networks and have what I think is a pretty good understanding of classification and regression, and also of supervised and unsupervised learning, but I've stumbled upon…
FraserOfSmeg
- 363
- 1
- 10
11
votes
4 answers
Clustering for mixed numeric and nominal discrete data
My data includes survey responses that are binary (numeric) and nominal / categorical. All responses are discrete and at the individual level.
The data has shape (n=7219, p=105).
A couple of things:
I am trying to identify a clustering technique with a…
kms
- 310
- 1
- 4
- 15
11
votes
5 answers
Covariate shift detection
Is there any standard approach for detecting covariate shift between the training and test data? This would be useful to validate the assumption that covariate shift exists in my database, which contains a few hundred images.
Daniel Wonglee
- 191
- 1
- 4
11
votes
1 answer
Learning with Positive labels only
I have ~7 million rows of customer data (~500 sparse attributes).
A million of them have opted in to a new service.
How do I use this signal to predict which of the remaining customers are likely to adopt the service? And how do I measure the…
Vivek Kalyanarangan
- 560
- 2
- 11
11
votes
2 answers
Does BERT have any advantage over GPT-3?
I have read a couple of documents that explain in detail the edge that GPT-3 (Generative Pre-trained Transformer 3) has over BERT (Bidirectional Encoder Representations from Transformers). So I am curious to know whether BERT scores better…
Bipin
- 213
- 1
- 2
- 8
11
votes
3 answers
Statistics + Computer Science = Data Science?
I want to become a data scientist. I studied applied statistics (actuarial science), so I have a strong statistical background (regression, stochastic processes, time series, to mention just a few). But now I am going to do a master's degree in…
user3643160
- 163
- 6
11
votes
3 answers
Data visualization for pattern analysis (language-independent, but R preferred)
I want to plot the bytes from a disk image in order to understand a pattern in them. This is mainly an academic task, since I'm almost sure this pattern was created by a disk testing program, but I'd like to reverse-engineer it anyway.
I already…
Valmiky Arquissandas
- 213
- 1
- 8
11
votes
2 answers
Finding optimal threshold in multi-class classification task
In a binary classification problem, it is easy to find the optimal threshold (by F1) by trying different thresholds, evaluating them, and picking the one with the highest F1. Similarly, is there a proper way to find optimal thresholds for all the…
saiRegrefree
- 146
- 1
- 4
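For reference, the binary-case search described in the excerpt above (try several thresholds, score each with F1, keep the best) might look roughly like the sketch below. It is only an illustration: the function names `f1_score` and `best_f1_threshold`, the choice to use every distinct score as a candidate threshold, and the toy data are all assumptions, not something taken from the question or from any particular library.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels (1 = positive), computed from raw counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, scores, thresholds=None):
    """Return (threshold, f1) for the candidate threshold with the highest F1.

    Assumption: if no candidates are given, every distinct score is tried.
    A sample is predicted positive when its score is >= the threshold.
    Ties are resolved in favour of the lowest threshold.
    """
    if thresholds is None:
        thresholds = sorted(set(scores))
    best_t, best_f1 = None, -1.0
    for t in thresholds:
        y_pred = [1 if s >= t else 0 for s in scores]
        f1 = f1_score(y_true, y_pred)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical toy data: true labels and predicted positive-class scores.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
threshold, f1 = best_f1_threshold(y_true, scores)
print(threshold, round(f1, 3))
```

In practice the threshold should be chosen on a held-out validation split rather than on the test data, otherwise the reported F1 is optimistic.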