Most Popular
1500 questions
10 votes, 1 answer
With regards to VC-dimension, why can you shatter 3 points with circles but not 4 points?
When using the VC-dimension to estimate the capacity of a binary classifier, you can find 3 points in $\mathbb{R}^2$ that can be shattered.
But you cannot shatter any 4 points with a circle.
This is stated in these lecture notes. Can anyone give me an…
Brian Spiering
10 votes, 1 answer
How should one deal with implicit data in recommendations?
A recommendation system keeps a log of what recommendations have been made to a particular user and whether that user accepts the recommendation. It looks like this:
user_id item_id result
1 4 1
1 7 -1
5 19 1
5 80 …
wdg
10 votes, 1 answer
How to determine the complexity of an English sentence?
I am working on an app to help people learn English as a second language. I have validated that sentences help in learning a language by providing extra context. I did that by conducting a small study in a classroom of 60 students.
I have mined…
BuildMyVocab
10 votes, 2 answers
Train object detection without annotated data/bounding boxes
From what I can see, most object detection NNs (Fast(er) R-CNN, YOLO, etc.) are trained on data that includes bounding boxes indicating where in the picture the objects are located.
Are there algorithms that simply take the full picture + label annotations,…
salient
10 votes, 1 answer
t-SNE: Why are equal data values not visually close?
I have 200 data points that have the same values on all features.
After t-SNE dimensionality reduction, they no longer look equal.
Why aren't they at the same point in the visualization, and why do they even seem to be distributed in two…
ScientiaEtVeritas
10 votes, 4 answers
Classify multivariate time series
I have a dataset composed of time series (8 points) with about 40 dimensions (so each time series is 8 by 40). The corresponding output (the possible outcomes for the categories) is either 0 or 1.
What would be the best approach to design a…
AugBar
10 votes, 1 answer
How to use TFIDF vectors with multinomial naive bayes?
Say we have used the TFIDF transform to encode documents into continuous-valued features.
How would we now use this as input to a Naive Bayes classifier?
Bernoulli naive Bayes is out because our features are no longer binary.
Seems like we can't…
dhrumeel
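For reference, a minimal sketch of feeding TF-IDF features into a multinomial naive Bayes model, assuming scikit-learn. Its `MultinomialNB` documentation notes that the model normally expects integer counts but that, in practice, fractional counts such as TF-IDF may also work. The corpus and labels below are hypothetical toy data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus and labels, for illustration only.
docs = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "project meeting notes monday",
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

# TfidfVectorizer yields non-negative real-valued features;
# MultinomialNB treats them as fractional term "counts".
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(docs, labels)

preds = model.predict(["buy pills now", "monday meeting"])
print(preds)
```

With this toy data the first query shares all its terms with class 1 and the second with class 0, so the predictions follow those classes.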
10 votes, 1 answer
What is the difference between fasttext and DANs in document classification?
I came across two interesting papers that describe promising approaches for document classification using word embeddings.
1. The fasttext algorithm
Described in the paper "Bag of Tricks for Efficient Text Classification".
(With further…
user1043144
10 votes, 6 answers
What are some of the best practices for sharing data and models with colleagues?
As a data scientist who recently joined a new team, I wanted to ask the community how they share data and models among their colleagues. Currently I have to resort to storing data on some central server or location that all of us can access (which…
asampat3090
10 votes, 2 answers
Kernel trick explanation
In support vector machines, I understand it would be computationally prohibitive to calculate a basis function at every point in the data set. However, it is possible to find this optimal solution due to the so-called kernel trick.
Other answers to…
user1717828
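To illustrate the idea behind the kernel trick, a small sketch assuming NumPy, using the degree-2 homogeneous polynomial kernel $k(x, z) = (x \cdot z)^2$ on 2-D inputs. The kernel returns the same value as an inner product in an explicit 3-D feature space without ever building that space. The feature map `phi` and the two sample points are chosen for illustration only.

```python
import numpy as np


def phi(x):
    """Explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])


def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: k(x, z) = (x . z)^2."""
    return np.dot(x, z) ** 2


x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = np.dot(phi(x), phi(z))  # inner product computed in the 3-D feature space
implicit = poly_kernel(x, z)       # same quantity computed directly in the 2-D input space
print(explicit, implicit)
```

Both values agree up to floating-point rounding (here $x \cdot z = 1$, so both are 1). For kernels such as the RBF kernel the implicit feature space is infinite-dimensional, which is where evaluating $k$ directly is what makes the computation feasible.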
10 votes, 2 answers
Forecasting non-negative sparse time-series data
I have a time-series dataset (daily frequency) representing the sales of a product to a customer over time. The sales are represented as follows:
$$[0, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 17, 0, 0, 0, 0, 9, 0, ...]$$
in which each…
Bernardo Aflalo
10 votes, 1 answer
Number of parameters for convolution layers
In this highly cited paper, the authors give the following discussion on the number of weight parameters. It is not clear to me why it has $49C^2$ parameters. I think it should be $49C$ since each of $C$ input channels shares the same filter, which has…
user297850
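As a worked count, assuming the $7 \times 7$ layer maps $C$ input channels to $C$ output channels and biases are ignored: each of the $C$ output channels has its own $7 \times 7 \times C$ filter spanning all input channels, so the weights multiply out as

$$\underbrace{7 \times 7}_{\text{kernel size}} \times \underbrace{C}_{\text{input channels}} \times \underbrace{C}_{\text{output channels}} = 49C^2$$

By the same counting, a $3 \times 3$ layer with $C$ input and $C$ output channels has $9C^2$ weights.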
10 votes, 1 answer
Is it possible to train a neural network to solve polynomial equations?
I randomly generate millions of triplets $\lbrace x_0, x_1, x_2 \rbrace$ within the range $(0,1)$, then calculate the corresponding coefficients of the polynomial $(x-x_0)(x-x_1)(x-x_2)$, which yields coefficient triplets normalized in the form of…
Feng Wang
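A sketch of the data-generation step described in this question, assuming NumPy: `np.poly` returns the coefficients of the monic polynomial with the given roots, highest degree first. The sample size and seed are placeholders, and `rng.random` draws from $[0, 1)$ rather than the open interval $(0,1)$. The normalization step mentioned in the question is not shown.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Placeholder sample size; the question mentions "millions" of triplets.
n_samples = 5

# Roots x0, x1, x2 drawn uniformly from [0, 1).
roots = rng.random((n_samples, 3))

# np.poly gives [1, a2, a1, a0] for x^3 + a2*x^2 + a1*x + a0 = (x-x0)(x-x1)(x-x2).
coeffs = np.array([np.poly(r) for r in roots])

print(coeffs.shape)  # (5, 4)
```

By Vieta's formulas, $a_2 = -(x_0 + x_1 + x_2)$ and $a_0 = -x_0 x_1 x_2$, which gives a quick sanity check on the generated pairs.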
10 votes, 2 answers
Why doesn't `max_features=n_features` make the Random Forest independent of the number of trees?
Consider the following simple classification problem (Python, scikit-learn):
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import…
Jorge Leitao
10 votes, 1 answer
Confused about how to apply KMeans to a dataset with extracted features
I am trying to use the scikit-learn KMeans clustering package in a basic way to create different clusters that I could use to identify a certain activity. For example, in my dataset below, I have different usage events (0,...,11), and each event…
Gary
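A minimal sketch of basic scikit-learn `KMeans` usage on a per-event feature matrix. The matrix here is hypothetical synthetic data (12 events, ids 0 to 11, with 3 made-up features forming two well-separated groups); real features would come from the dataset in the question. Scaling first is a common choice because KMeans relies on Euclidean distance, so a feature with a large range would otherwise dominate.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: one row per usage event, one column per feature.
rng = np.random.default_rng(seed=42)
low = rng.normal(loc=0.0, scale=0.1, size=(6, 3))   # events 0-5: small feature values
high = rng.normal(loc=5.0, scale=0.1, size=(6, 3))  # events 6-11: large feature values
X = np.vstack([low, high])

# Standardize each feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# n_clusters is a modelling choice; 2 matches the synthetic data above.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)  # cluster index assigned to each event

for event_id, label in enumerate(labels):
    print(event_id, label)
```

The cluster indices themselves are arbitrary (0 and 1 may be swapped between runs or versions); only the grouping of events is meaningful.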