Questions tagged [dimensionality-reduction]

Techniques for reducing a large number of variables or dimensions spanned by data to a smaller number of dimensions while preserving as much information about the data as possible. Prominent methods include PCA, Factor Analysis, MDS, Independent Component Analysis, Multiple Correspondence Analysis, and Isomap. The two main subclasses of techniques are feature extraction and feature selection.

1106 questions
8
votes
1 answer

Why is the curse of dimensionality also called the empty space phenomenon?

The curse of dimensionality refers to the fact that a huge number of correlated features tends to increase the complexity of the treatment that has to be applied to the data set. This is also called the empty space phenomenon. So, does anyone know…
7
votes
5 answers

When should dimensionality reduction be used?

Yesterday I asked this question in which I had 180 subjects with 500 features each. While I was sure that dimensionality reduction is a must in this case (500 features), most of the answers I got said that 500 are not too many. So, my question is: Is…
Dov
5
votes
2 answers

How to evaluate dimension reduction from n-space to d-space?

I'm performing dimension reduction on some data sets and would like to evaluate how well a particular dimension reduction algorithm has performed in terms of how much information is lost. If we are given 1000 dimensions, and we reduce them to 2, then how…
gizgok
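One common way to quantify the loss asked about here, assuming a linear method such as PCA, is the fraction of total variance retained by the first d components. The sketch below uses NumPy only; the function name `retained_variance` and the random data are illustrative assumptions, not part of the question.

```python
import numpy as np

def retained_variance(X, d):
    """Fraction of total variance kept by the top-d principal components."""
    Xc = X - X.mean(axis=0)                  # center each feature
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, in descending order
    var = s ** 2                             # proportional to per-component variance
    return float(var[:d].sum() / var.sum())

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))  # hypothetical data: 200 samples, 1000 dimensions
print(retained_variance(X, 2))
```

This measure applies only to variance-based linear projections. For nonlinear methods, a different criterion (for example, how well pairwise distances or neighborhoods are preserved) would be needed.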
4
votes
1 answer

Linear versus nonlinear dimensionality reduction techniques

I was going through a short tutorial on dimensionality reduction techniques. Some of these techniques are linear while others are non-linear. What is the distinction between them? Why are the terms 'linear' and 'non-linear' used?
Upendra01
4
votes
0 answers

Walkthrough for Locally Linear Embedding

Can someone please walk me through the final step for LLE? Specifically, computing the coordinates of the vectors $Y_i$ on the lower dimensional manifold. Disclaimer: I am aware of another post regarding LLE steps; however, it was largely…
gf.c
3
votes
1 answer

Does dimension reduction to more than 2 (or 3) dimensions make sense?

I'm using dimension reduction for data analysis (PCA, t-SNE, UMAP...). Most examples I see project data in only 2 (or 3) dimensions, but I would naively imagine that by projecting into more dimensions and visualizing those dimensions 2 by 2 on multiple…
ThomaS
3
votes
0 answers

Distance-based dimensionality reduction from 2D to 1D?

Problem: Let $C = \{c_1, c_2, c_3, ..., c_n\}$ be a set of cities, and $G = \{(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)\}$ the set of their respective geographical coordinates. What I want to do is find a set $G' = \{a_1, a_2, ..., a_n\}$ (or a…
2
votes
1 answer

Bayesian Information Criterion -- what is the base of the logarithm?

Apologies in advance for a very basic question! On Wikipedia, I see that the Bayesian Information Criterion evaluates a model using $$ \mathrm{BIC} = k\ln(n) - 2\ln(L) $$ where $k$ is the number of parameters, $n$ the sample size and $L$ the (maximised)…
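The formula quoted in this excerpt is written with $\ln$, the natural logarithm. A minimal Python sketch of that expression follows; the helper name `bic` and the input values are hypothetical, chosen only for illustration.

```python
import math

def bic(k, n, log_likelihood):
    """BIC = k*ln(n) - 2*ln(L); log_likelihood is ln(L), the maximised log-likelihood."""
    return k * math.log(n) - 2.0 * log_likelihood  # math.log is the natural logarithm

# Hypothetical values: 3 parameters, 100 observations, ln(L) = -120.5
print(bic(3, 100, -120.5))
```

Passing the log-likelihood directly, rather than $L$ itself, avoids numerical underflow when the likelihood is very small.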
2
votes
2 answers

Is there a formula for an acceptable number of dimensions given a data set size?

I understand the curse of dimensionality, and in machine learning at least, have heard that a minimum of 100-500 samples per class label is needed to effectively train an algorithm (leaving aside single shot learning techniques in development). Is…
skeller88
2
votes
0 answers

Searching intersection of elements in subsets with approximation

Task: Given a large number of transactions, each consisting of distinct elements from one large set $S$, I need to find transactions whose items overlap with more than 20% of the items in the current transaction. I need to avoid unusual complexity…
franchb
2
votes
1 answer

How to create one synthetic variable from 5 measured variables

I have 5 more or less correlated variables measuring conceptually the same thing - size. For example, height, weight, shoe size, jacket size and age. I just want to summarize the information from all variables into one and use that single (synthetic)…
user333
1
vote
1 answer

Things to consider before summing binary variables to create a total score

I would like to reduce the amount of data I have before conducting an analysis. My data consists of several sets of questions assessing comprehension of various topics (e.g. 4 questions assess comprehension of one concept, 4 questions assess…
Salada
1
vote
0 answers

Why does SNE include the word Stochastic?

I understand how SNE and t-SNE work, but I don't understand whether it is called that simply because it is a probabilistic method or because there are hidden justifications that use stochastic processes.
1
vote
0 answers

Explaining the Johnson-Lindenstrauss lemma simply

I can't get my head around the concept of the Johnson-Lindenstrauss lemma, which uses random projections, and I cannot find a simple explanation or example of how this works that a novice could understand. Could someone explain in a simple way how it…
1
vote
2 answers

How does a dimensionality-reduced variable relate to its high-dimension constituents?

If I had a feature vector, X, and applied PCA or EFA to reduce it to a single-dimension variable, should I expect that variable to have strong correlations with each of its high-dimension constituents?
LogCapy