Questions tagged [distance]

Measure of distance between distributions or variables, such as Euclidean distance between points in n-space.

Mathematically, a distance (or metric) $d$ is a function that satisfies the following properties for any three points $x, y, z$:

  1. $d(x,y) \geq 0$
  2. $d(x,y) = 0 \iff x = y$
  3. $d(x,y) = d(y,x)$
  4. $d(x,z) \leq d(x,y) + d(y,z)$

Note that certain concepts of distance in probability theory do not satisfy these properties. In particular, the KL divergence (sometimes called the KL distance) between two distributions is not symmetric, so it fails the third property above; it also fails the triangle inequality.
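The asymmetry of the KL divergence can be checked numerically. The following is a minimal sketch, assuming two hypothetical discrete distributions `p` and `q` on a shared three-point support; the function name `kl_divergence` and the example values are illustrative, not from the original text.

```python
import math

def kl_divergence(p, q):
    """KL divergence D(p || q) in nats for discrete distributions on a shared support."""
    # Terms with p_i == 0 contribute 0 by convention.
    # Assumes q_i > 0 wherever p_i > 0 (otherwise the divergence is infinite).
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example distributions (illustrative values only).
p = [0.5, 0.4, 0.1]
q = [0.3, 0.3, 0.4]

d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
print(d_pq, d_qp)  # the two directions give different values
```

Swapping the arguments changes the result, which is why the KL divergence is not a metric in the sense defined above.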

Euclidean distance, Manhattan distance and Hamming distance are all common metrics.
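As a rough illustration of these three metrics, here is a minimal sketch in plain Python. The function names and example inputs are assumptions chosen for illustration, and each function assumes its two arguments have equal length.

```python
import math

def euclidean(x, y):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def hamming(x, y):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y))

# Illustrative inputs (not from the original text).
x, y = (0, 0), (3, 4)
print(euclidean(x, y))                # 5.0
print(manhattan(x, y))                # 7
print(hamming("karolin", "kathrin"))  # 3
```

Each of these satisfies the four properties listed above; for example, swapping the arguments leaves every result unchanged.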

735 questions
5
votes
3 answers

Vector space model: cosine similarity vs Euclidean distance

I have corpora of classified text. From these I create vectors. Each vector corresponds to one document. Vector components are word weights in this document, computed as TF-IDF values. Next I build a model in which every class is represented by a…
3
votes
3 answers

What is the point of measuring statistical distance?

On pg. 378 of "Cryptography with Tamperable and Leaky Memory", Kalai et al. claim two probability distributions are $e(k)$-close if the distance between them is at most $e(k)$. What is the significance of two distributions X and Y being "close to" or…
3
votes
1 answer

Unit of the Mahalanobis distance between two individuals

I'm calculating the Mahalanobis distance between two individuals/vectors of the same pool, as described in Mahalanobis distance between individuals, using the R code from https://stats.stackexchange.com/a/48576/163146. Now I wonder the…
Ben
  • 3,443
2
votes
0 answers

Computing a "concentration of event occurrence" index

I'm looking at the concentration of an event's occurrences in a given interval of time. For example, suppose that an event occurred 4 times in an interval of length 10. I can represent this as a string where X means the event occurred whereas o…
juba
  • 265
2
votes
2 answers

Calculating distances: Using count data or discrete probabilities?

I am looking into calculating distances between vectors for some data analysis. One question I have is whether I should use actual count data or convert to discrete probabilities. For some distances, the method is clear from the underlying theory…
anna6931
  • 121
1
vote
2 answers

similarity metric for two characteristics

I am trying to develop a way to compare items. Each item has the same two properties associated with it, say, $p_{1}$ and $p_{2}$. All I am given are these two measures for each item, each of which may be given as a range, say $x_{1} \le p_{1} \le…
1
vote
1 answer

Mahalanobis distance gives counterintuitive results

I have generated 100 sample time series, each 24 items long, and each with an exponential distribution with a different scale for each of the 24 time points. This is the scale parameter per time point: My 100 time series look like this: This is…
1
vote
1 answer

Conditions on distributions which guarantees smaller average distance

Suppose there are two CDFs $F$ and $G$ whose common support is $[0,1]$, with pdfs $f$ and $g$, respectively. If distances are measured in squared differences, what kind of condition would guarantee the following…
0
votes
1 answer

About $D(F,G)=\int(F(x)-G(x))^2w(x)dx$ and $D(F,G)=\int(F(x)-G(x))^2w(x)dF(x)$

I learned that the statistical distance between two one-dimensional distributions F and G, $D_E(F,G)=\int(F(x)-G(x))^2dx$, is well known. But what about $D(F,G)=\int(F(x)-G(x))^2w(x)dx$ or $D(F,G)=\int(F(x)-G(x))^2w(x)dF(x)$? What are their names? From what aspects…
0
votes
1 answer

distance between 2 clusters

I have the formula and an example for calculating the distance between 2 clusters below. I couldn't figure out why and how {2,4,8} can become 8 and {-5,0,5} can become -5. Can someone please enlighten me?
Lobbie
  • 3
  • 1
  • 3
0
votes
1 answer

Compute Frechet distance

Suppose there are two curves P = [p1, p2, ..., pm] and Q = [q1, q2, ..., qm]. To compute the Frechet distance, we arrange them in the form of a grid, compute the distance between different points in the grid, and fill the cells. What is to be done next? How do we…
user6460588
  • 103
  • 11
0
votes
0 answers

How to determine a formula for an index of adherence?

I hope you will be patient with the inarticulate question of a non-mathematician. It's hard to get an answer when you don't even know how to ask the question. Here is the context: let's say that I have a kind of Tamagotchi that has the following needs…
Daniele D.
  • 101
  • 3
0
votes
1 answer

Highest values changes on vector

I need a statistic or metric to identify which vector has the largest changes between its values. That is, I would like to select vector b, because its adjacent values differ from each other. The values are not just binary numbers, maybe between…
0
votes
0 answers

Distance between blocks in a file

I am working on my research and it's computer related. I need to calculate how close the blocks in a file are to each other (the distance between the blocks). For example, if we have the following three blocks: >line 5 Block 1 >line 20 >line 30 Block…
Arwa
  • 151
-1
votes
1 answer

Comparing Euclidean distances

I have 3 sets of values for X, Y and Z, e.g. X = 7,8,7,8,6,9,8; Y = 8,7,7,6,7,8,8; Z = 8,8,8,7,8,9,9. I want to compare the Euclidean distance between these sets (X & Y and then X & Z) and convert it to a percentage to be able to convey the relative gap. Max…