Measures of closeness between distributions, clusterings, data sets or other objects.
Questions tagged [similarities]
511 questions
18
votes
3 answers
Quantifying similarity between two data sets
Summary: Trying to find the best method to summarize the similarity between two aligned data sets using a single value.
Details:
My question is best explained with a diagram. The graphs below show two different data sets, each with values…
Gabriel Southern
- 301
4
votes
1 answer
How to compute cosine similarity on multi-type data?
I have records (rows) in a database and I want to find similar records. I am constrained to use cosine similarity. If the variables (attributes, columns) vary in type and come in this form:
[number] [number] [boolean] [20 chars string]
how…
curious
- 1,101
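One possible way to apply cosine similarity to records of the form `[number] [number] [boolean] [string]` is to encode every field as non-negative numeric features first. The sketch below is only an illustration under assumptions that are not in the question: numbers are min-max scaled with known ranges, the boolean becomes 0/1, and the string becomes a unit-length vector of character bigram counts. The names `encode`, `char_bigrams`, `cosine`, and `num_ranges` are hypothetical.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Count overlapping character bigrams in a lowercased string."""
    text = text.lower()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def encode(record, num_ranges):
    """Map a (number, number, boolean, string) record to a sparse feature dict.

    num_ranges holds an assumed (low, high) pair per numeric field, with high > low.
    """
    a, b, flag, text = record
    features = {}
    for name, value, (low, high) in (
        ("num0", a, num_ranges[0]),
        ("num1", b, num_ranges[1]),
    ):
        features[name] = (value - low) / (high - low)  # min-max scaling
    features["flag"] = 1.0 if flag else 0.0
    grams = char_bigrams(text)
    norm = math.sqrt(sum(c * c for c in grams.values()))
    for gram, count in grams.items():
        # Scale the string block to unit length so it does not swamp the other fields.
        features["bg:" + gram] = count / norm
    return features


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

How much each field should count is a modelling decision; multiplying a block of features by a weight before calling `cosine` is one simple way to adjust it.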
4
votes
1 answer
Similarity measure between two text documents
Let the first document involve the words $\{x_1,x_2,\ldots,x_n\}$ and the second one be composed of $\{y_1,y_2,\ldots,y_m\}$ where $n$ is not necessarily equal to $m$. I have a similarity measure that works for elements of the sets; $s(x_i,y_j)$ for…
Helen
- 75
3
votes
1 answer
Similarity measure between two colors?
If one feature of my objects is color, how do I compute the similarity between two objects?
Is it correct to transform it to RGB and use Euclidean distance in 3D?
Or is there any acceptable one-dimensional ordering of colors (as in the case of a rainbow)?
sitems
- 3,929
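The two options raised in the question can be sketched as follows. This is a minimal illustration, assuming colors are `(r, g, b)` tuples with channels in 0 to 255; the function names are hypothetical. Note that Euclidean distance in RGB is not perceptually uniform, and that hue alone ignores saturation and brightness, so neither is presented here as the accepted answer.

```python
import colorsys
import math


def rgb_distance(c1, c2):
    """Euclidean distance between two (r, g, b) tuples with 0-255 channels."""
    return math.dist(c1, c2)


def hue_distance(c1, c2):
    """Circular distance between hues, in [0, 0.5].

    Hue is a one-dimensional "rainbow" ordering that wraps around,
    so the distance is measured along the shorter arc of the circle.
    """
    h1 = colorsys.rgb_to_hsv(*(ch / 255 for ch in c1))[0]
    h2 = colorsys.rgb_to_hsv(*(ch / 255 for ch in c2))[0]
    d = abs(h1 - h2)
    return min(d, 1.0 - d)
```

For example, `hue_distance((255, 0, 0), (0, 0, 255))` treats red and blue as one third of the circle apart, because the hue axis wraps around rather than running in a straight line.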
2
votes
2 answers
Similarity percentage based on attributes and their quantities
I have two sets that contain (partially overlapping) attributes in different quantities and I am looking for a method to compare the similarity between the sets based on the attributes and their quantities. I am new to this and I would appreciate…
coffee-and-code
- 121
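One candidate measure for sets whose attributes carry quantities is the weighted Jaccard (Ruzicka) similarity: the sum of per-attribute minimums divided by the sum of per-attribute maximums. The sketch below assumes each set is a dict mapping attribute name to a non-negative quantity; that representation and the function name are assumptions, not part of the question.

```python
def weighted_jaccard(a, b):
    """Weighted Jaccard similarity of two attribute-to-quantity dicts.

    Returns a value in [0, 1]; multiply by 100 for a percentage.
    Attributes missing from one set count as quantity 0.
    """
    keys = a.keys() | b.keys()
    shared = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    total = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    if total == 0:
        return 1.0  # two empty sets are treated as identical
    return shared / total
```

With all quantities equal to 1 this reduces to the ordinary Jaccard index on the attribute names.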
2
votes
0 answers
Metric for GPS routes
I have a set of drivers' routes, each represented as a sequence of (time, longitude, latitude) points. The data is received from the GPS in the driver's smartphone. The sequences have different lengths and the coordinates are significantly spread (there are some outliers…
Igor
- 171
2
votes
0 answers
Similarity/duplicate measure for collection of vectors
I have 12 vectors of the size 1x16, which are generated as a side-product of my algorithm. If any of the vectors are very similar, that could indicate that my algorithm is performing badly. Roughly half of the values at an absolute value below …
pir
- 5,056
1
vote
0 answers
How to include a nested factor in a SIMPER test in PRIMER
I'm trying to conduct a SIMPER analysis in PRIMER. I want to do the analysis using a factor that is made up of the interaction between 2 other factors, with one nested in a third.
I'm looking at the response of invertebrates after fire across…
Psk
- 11
1
vote
1 answer
String similarity measure for pitch contour similarity (Parson code)
I have a dataset with short melodies hummed by experimental subjects. Each melody consists of a variable number of notes (10 to 20 notes) and is coded by the pitch levels of the notes (automatically extracted from the audio file), and also in terms…
z8080
- 2,370
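One common baseline for comparing contours of different lengths is to reduce each melody to its Parsons code and take a normalized edit distance between the two strings. The sketch below assumes pitches are given as numbers (for example MIDI note numbers) and omits the leading `*` that conventional Parsons code uses for the first note; the function names are hypothetical and this is not taken from the answers to the question.

```python
def parsons_code(pitches):
    """Encode successive pitch changes as U (up), D (down), or R (repeat)."""
    return "".join(
        "U" if nxt > cur else "D" if nxt < cur else "R"
        for cur, nxt in zip(pitches, pitches[1:])
    )


def levenshtein(s, t):
    """Edit distance with unit cost for insertion, deletion, and substitution."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete from s
                curr[j - 1] + 1,           # insert into s
                prev[j - 1] + (cs != ct),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def contour_similarity(pitches_a, pitches_b):
    """Similarity in [0, 1]: 1 minus edit distance over the longer code length."""
    a, b = parsons_code(pitches_a), parsons_code(pitches_b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

Unit costs treat a U/D confusion the same as a U/R confusion; a weighted substitution cost would be a natural refinement if direction reversals should count more.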
1
vote
1 answer
similarity distance when weight should change
I am trying to find a similarity measure for a very specific problem.
I have a list of cities that have some characteristics, for example bilingual schools, fire stations, etc. These cities are represented as binary vectors where each element…
DroppingOff
- 537
1
vote
1 answer
Test Similarity of two models
I have two models that I've trained using different ML algorithms. I want to compare how well they match each other as a test of whether they are generalising the training set or simply memorising it. I could calculate the coefficient of…
Goods
- 173
1
vote
1 answer
Is there a measure I could apply to measure spread of predictions?
I have some data with predictors that work out the value of changes applied to an ID.
Each TYPE is a predictor for the CHANGE applied to an ID.
Below is an example of the data:
ID CHANGE TYPE VALUE STATE
A RA 1 1.14 POS
A …
DataD
- 11
1
vote
1 answer
statistical test for similarity
If you were comparing two randomly assigned groups of people for similarity at the start of a study, which statistical test would typically be best for:
Marital status
Height
Likert item statement - I like chocolate
s. stamos
- 11
1
vote
0 answers
Verify model correctness
I will try to explain my question with an example.
Assume we have a network of 3 computers. We can measure different entities in this system, such as CPU load, disk load, log messages, etc. (all the entities are numeric). Let the vector x hold all…
John Green
- 11
0
votes
1 answer
Combining similarity scores from 2 measures into one score
Assume I have items a, b, and c. I calculated the similarities between each pair using two different measures and got the following similarity score matrices:
Using measure 1, the results are:
a b c
a 1 0.5 0.1
b …
Adham Enaya
- 123