Most Popular

1500 questions
12
votes
2 answers

Python Machine Learning/Data Science Project Structure

I'm looking for information on how a Python machine learning project should be organized. For regular Python projects there is Cookiecutter, and for R there is ProjectTemplate. This is my current folder structure, but I'm mixing Jupyter Notebooks with actual…
David Gasquez
  • 221
  • 2
  • 6
12
votes
2 answers

Deep Learning with Spectrograms for sound recognition

I was looking into the possibility of classifying sound (for example, sounds of animals) using spectrograms. The idea is to use a deep convolutional neural network to recognize segments in the spectrogram and output one (or many) class labels. This is…
user667804
  • 271
  • 3
  • 6
11
votes
2 answers

ggvis vs. ggplot2+Shiny; which one to choose for interactive visualization?

There is a similar question here in CrossValidated, and I have read the answers. My question is a bit different. I don't want to merely visualize my data, and indeed what I want to visualize is not easy to visualize with either package. I have two…
Shahin
  • 271
  • 1
  • 9
11
votes
1 answer

Solutions for Continuous Online Cluster Identification?

Let me show you an example of a hypothetical online clustering application: At time n points 1,2,3,4 are allocated to the blue cluster A and points b,5,6,7 are allocated to the red cluster B. At time n+1 a new point a is introduced which is…
Raffael
  • 211
  • 1
  • 6
11
votes
4 answers

Overfitting/Underfitting with Data set size

In the graph below, the x-axis is the data set size and the y-axis is the cross-validation score. The red line is for training data and the green line is for testing data. In a tutorial that I'm referring to, the author says that the point where the red line and the green…
tharindu_DG
  • 315
  • 1
  • 3
  • 9
11
votes
1 answer

Applying word2vec on small text files

I'm totally new to word2vec, so please bear with me. I have a set of text files, each containing between 1000 and 3000 tweets. I have chosen a common keyword ("kw1") and want to find semantically relevant terms for "kw1" using word2vec. For…
samsamara
  • 211
  • 2
  • 5
11
votes
3 answers

Can regression trees predict continuously?

Suppose I have a smooth function like $f(x, y) = x^2+y^2$. I have a training set $D \subsetneq \{((x, y), f(x,y)) | (x,y) \in \mathbb{R}^2\}$ and, of course, I don't know $f$ although I can evaluate $f$ wherever I want. Are regression trees capable…
Martin Thoma
  • 18,880
  • 35
  • 95
  • 169
11
votes
3 answers

How do AIs learn to act when the problem space is too big?

I learn best through experimentation and example. I'm learning about neural networks and have what I think is a pretty good understanding of classification and regression, and also of supervised and unsupervised learning, but I've stumbled upon…
FraserOfSmeg
  • 363
  • 1
  • 10
11
votes
4 answers

Clustering for mixed numeric and nominal discrete data

My data includes survey responses that are binary (numeric) and nominal/categorical. All responses are discrete and at the individual level. The data has shape (n=7219, p=105). A couple of things: I am trying to identify a clustering technique with a…
kms
  • 310
  • 1
  • 4
  • 15
11
votes
5 answers

Covariate shift detection

Is there any standard approach for detecting covariate shift between the training and test data? This would be useful to validate the assumption that covariate shift exists in my database, which contains a few hundred images.
Daniel Wonglee
  • 191
  • 1
  • 4
11
votes
1 answer

Learning with Positive labels only

I have ~7 million rows of customer data (~500 sparse attributes). A million of them have opted in to a new service. How do I use this signal to predict which of the remaining customers are likely to adopt the service? And how do I measure the…
11
votes
2 answers

Does BERT have any advantage over GPT-3?

I have read a couple of documents that explain in detail the edge that GPT-3 (Generative Pre-trained Transformer 3) has over BERT (Bidirectional Encoder Representations from Transformers). So I am curious to know whether BERT scores better…
Bipin
  • 213
  • 1
  • 2
  • 8
11
votes
3 answers

Statistics + Computer Science = Data Science?

I want to become a data scientist. I studied applied statistics (actuarial science), so I have a strong statistical background (regression, stochastic processes, time series, just to mention a few). But now, I am going to do a master's degree in…
user3643160
  • 163
  • 6
11
votes
3 answers

Data visualization for pattern analysis (language-independent, but R preferred)

I want to plot the bytes from a disk image in order to understand a pattern in them. This is mainly an academic task, since I'm almost sure this pattern was created by a disk testing program, but I'd like to reverse-engineer it anyway. I already…
11
votes
2 answers

Finding optimal threshold in multi-class classification task

In a binary classification problem, it is easy to find the optimal threshold (by F1) by trying different thresholds, evaluating each one, and picking the one with the highest F1. Similarly, is there a proper way to find optimal thresholds for all the…
saiRegrefree
  • 146
  • 1
  • 4
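
The binary threshold search described in the last question above can be sketched roughly as follows. This is only a minimal illustration of the binary case, not an answer to the multi-class question: the helper names `f1_score` and `best_f1_threshold` and the toy labels and scores are assumptions made up for this example, and a score is treated as positive when it is greater than or equal to the threshold.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: define F1 as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f1_threshold(y_true, scores, thresholds=None):
    """Try each candidate threshold and keep the one with the highest F1."""
    if thresholds is None:
        # Assumption: every distinct score is a candidate threshold.
        thresholds = sorted(set(scores))
    best_t, best_f1 = None, -1.0
    for t in thresholds:
        y_pred = [1 if s >= t else 0 for s in scores]
        f1 = f1_score(y_true, y_pred)
        if f1 > best_f1:  # ties keep the lowest threshold
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical toy data, for illustration only.
y_true = [0, 0, 1, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
best_t, best_f1 = best_f1_threshold(y_true, scores)
print(best_t, round(best_f1, 3))
```

In practice the threshold would be chosen on a validation set rather than on the data used for the final evaluation, since picking it on the test data inflates the reported F1.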