numpy - align 2 vectors with potentially missing values

Question

I have two NumPy matrices with slightly different alignment:

X

    id,  value
     1,   0.78
     2,   0.65
     3,   0.77
       ...
       ...
    98,   0.88
    99,   0.77
   100,   0.87

Y

    id,  value
     1,   0.79
     2,   0.65
     3,   0.78
       ...
       ...
    98,   0.89
   100,   0.80

Y is simply missing a particular ID. I would like to perform vector operations on X and Y (e.g. correlation, difference, etc.), which means I need to drop the corresponding row from X. How would I do that?

this looks like a dataframe. are these 2d arrays like `[[1, 0.79], [2, 0.65], ...]`? — Shrey Joshi, Jun 01 '21 at 20:39
Please in next questions provide **copy-pasteable** code to generate your data. — Gulzar, Jun 01 '21 at 22:35

Gulzar · Accepted Answer · 2021-06-08T09:21:52.113

Assuming x and y hold the same values apart from the single element missing from y, the extra element in x equals the difference between the two sums.

This solution is O(n). The np.in1d and np.intersect1d approaches in the other answers are sort-based, so they are typically O(n log n).

Data generation:

import numpy as np

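# x = np.arange(10)  # deterministic alternative, handy for debugging
x = np.random.rand(10)
y = np.r_[x[:6], x[7:]]  # y is x without the element at index 6
print(x)
np.random.shuffle(y)  # shuffle in place so the order of y cannot be relied on
print(y)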

Solution:

Note that np.isclose() is used for the floating-point comparison, since the difference of the sums will rarely match the missing value exactly.

sum_x = np.sum(x)
sum_y = np.sum(y)
diff = sum_x - sum_y
value_index = np.argwhere(np.isclose(x, diff))

print(value_index)

Delete the relevant index:

deleted = np.delete(x, value_index)
print(deleted)

out:

[0.36373441 0.5030346  0.895204   0.03352821 0.20693263 0.28651572
 0.25859596 0.97969841 0.77368822 0.80105397]
[0.97969841 0.77368822 0.28651572 0.36373441 0.5030346  0.895204
 0.03352821 0.80105397 0.20693263]
[[6]]
[0.36373441 0.5030346  0.895204   0.03352821 0.20693263 0.28651572
 0.97969841 0.77368822 0.80105397]
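
The steps above can be wrapped in a small helper. This is only a sketch under the same assumption the answer makes: y is x with exactly one element removed, in any order. The name `drop_single_missing` is hypothetical, and the checks are minimal guards rather than a full validation.

```python
import numpy as np

def drop_single_missing(x, y):
    """Return x without the one element that is absent from y.

    Assumption: y contains the same values as x, minus exactly one element.
    """
    if len(x) != len(y) + 1:
        raise ValueError("y must have exactly one element fewer than x")
    # The missing value is the difference between the two sums.
    diff = np.sum(x) - np.sum(y)
    # Locate that value in x, tolerating floating-point rounding.
    matches = np.flatnonzero(np.isclose(x, diff))
    if matches.size == 0:
        raise ValueError("no element of x matches the difference of the sums")
    # If the value occurs more than once, dropping any one copy is equivalent.
    return np.delete(x, matches[0])

x = np.array([0.5, 0.25, 0.125, 0.75, 1.0])
y = np.array([1.0, 0.5, 0.75, 0.125])  # x without 0.25, in a different order
result = drop_single_missing(x, y)
print(result)
```

Note that this approach matches on values only. It does not use the id column from the question, so it does not apply when X and Y hold different measurements for the same ids.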

score 0 · Answer 2 · edited Jun 02 '21 at 06:49

You can try this:

X = X[~numpy.isnan(X)]
Y = Y[~numpy.isnan(Y)]

Then you can perform whatever operation you want.

answered Jun 01 '21 at 20:47 by Gonzalo Zabala · edited Jun 02 '21 at 06:49 by Majid Hajibaba

This has nothing to do with the question – Gulzar Jun 01 '21 at 22:49

score 0 · Answer 3 · answered Jun 01 '21 at 20:48

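Use np.in1d on the id column (in recent NumPy versions np.in1d is deprecated in favour of np.isin, which takes the same arguments here):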

>>> X
array([[ 1.  ,  0.53],
       [ 2.  ,  0.72],
       [ 3.  ,  0.44],
       [ 4.  ,  0.35],
       [ 5.  ,  0.32],
       [ 6.  ,  0.14],
       [ 7.  ,  0.52],
       [ 8.  ,  0.4 ],
       [ 9.  ,  0.1 ],
       [10.  ,  0.1 ]])

>>> Y
array([[ 1.  ,  0.19],
       [ 2.  ,  0.96],
       [ 3.  ,  0.24],
       [ 4.  ,  0.44],
       [ 5.  ,  0.12],
       [ 6.  ,  0.91],
       [ 7.  ,  0.7 ],
       [ 8.  ,  0.54],
       [10.  ,  0.09]])

>>> X[np.in1d(X[:, 0], Y[:, 0])]
array([[ 1.  ,  0.53],
       [ 2.  ,  0.72],
       [ 3.  ,  0.44],
       [ 4.  ,  0.35],
       [ 5.  ,  0.32],
       [ 6.  ,  0.14],
       [ 7.  ,  0.52],
       [ 8.  ,  0.4 ],
       [10.  ,  0.1 ]])
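
To get from the filtered X to the vector operations the question asks about, both arrays can be filtered the same way and then compared. The following is a minimal sketch using np.isin with made-up data. It assumes the ids are unique and that both arrays are already sorted by id, so the filtered rows line up one to one.

```python
import numpy as np

# Illustrative data: column 0 is the id, column 1 is the value.
X = np.array([[1, 0.53], [2, 0.72], [3, 0.44], [4, 0.35]])
Y = np.array([[1, 0.19], [2, 0.96], [4, 0.44]])  # id 3 is missing

# Keep only the rows whose id also appears in the other array.
X_common = X[np.isin(X[:, 0], Y[:, 0])]
Y_common = Y[np.isin(Y[:, 0], X[:, 0])]

# Sanity check: the ids must line up row by row (relies on the assumptions above).
assert np.array_equal(X_common[:, 0], Y_common[:, 0])

diff = X_common[:, 1] - Y_common[:, 1]                    # element-wise difference
corr = np.corrcoef(X_common[:, 1], Y_common[:, 1])[0, 1]  # Pearson correlation
print(diff)
print(corr)
```

Filtering in both directions also covers the case where X is missing ids that Y has.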

score -1 · Answer 4 · answered Jun 01 '21 at 20:55

You can use np.intersect1d to do that with return_indices=True. Here is an example:

x_id = np.array([1, 3, 4, 5, 6, 8])
x_value = np.array([0.44, 0.58, 0.64, 0.25, 0.94, 0.11])
y_id = np.array([1, 3, 4, 6, 8, 8])
y_value = np.array([0.58, 0.56, 0.54, 0.52, 0.51, 0.53])
sharedVals, xFilteredIds, yFilteredIds = np.intersect1d(x_id, y_id, return_indices=True)
print(x_value[xFilteredIds])
print(y_value[yFilteredIds])

This prints the values of x_value and y_value whose associated IDs appear in both x_id and y_id:

[0.44 0.58 0.64 0.94 0.11]
[0.58 0.56 0.54 0.52 0.51]

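Note that the IDs do not need to be sorted, because np.intersect1d sorts its inputs internally. They do not need to be unique either unless you pass assume_unique=True. With duplicates, as with the repeated 8 in y_id, the returned index points to the first occurrence.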
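
The same call can be applied directly to two-column arrays like those in the question. This is a sketch with made-up data, assuming column 0 holds the id and column 1 the value. The rows are deliberately out of order to show that np.intersect1d returns the shared ids in sorted order together with indices into the original arrays.

```python
import numpy as np

# Illustrative data: column 0 is the id, column 1 is the value.
X = np.array([[3, 0.77], [1, 0.78], [2, 0.65], [4, 0.88]])
Y = np.array([[4, 0.89], [1, 0.79], [3, 0.78]])  # id 2 is missing

# ix and iy index into X and Y so that both selections follow the order of ids.
ids, ix, iy = np.intersect1d(X[:, 0], Y[:, 0], return_indices=True)
x_vals = X[ix, 1]
y_vals = Y[iy, 1]

print(ids)              # shared ids, sorted
print(x_vals - y_vals)  # difference of the aligned values
```

Because both value vectors are ordered by the shared ids, they can be passed straight to functions such as np.corrcoef.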
