1

I have two NumPy matrices with slightly different alignment:

X

    id,  value
     1,   0.78
     2,   0.65
     3,   0.77
       ...
       ...
    98,   0.88
    99,   0.77
   100,   0.87

Y

    id,  value
     1,   0.79
     2,   0.65
     3,   0.78
       ...
       ...
    98,   0.89
   100,   0.80

Y is simply missing a particular ID. I would like to perform vector operations on X and Y (e.g. correlation, difference, etc.), which means I need to drop the corresponding row from X. How would I do that?

user3240688

4 Answers

2

Assuming y contains exactly the same values as x except for the one missing element, the extra element in x equals the difference between the two sums.

This solution is O(n); the sort-based approaches in the other answers (np.in1d, np.intersect1d) are typically O(n log n).

Data generation:

import numpy as np

# x = np.arange(10)
x = np.random.rand(10)
y = np.r_[x[:6], x[7:]]  # exclude 6
print(x)
np.random.shuffle(y)
print(y)

Solution:

Note that np.isclose() is used for the floating-point comparison.

sum_x = np.sum(x)
sum_y = np.sum(y)
diff = sum_x - sum_y
value_index = np.argwhere(np.isclose(x, diff))

print(value_index)

Delete the element at the matching index:

deleted = np.delete(x, value_index)
print(deleted)

Output:

[0.36373441 0.5030346  0.895204   0.03352821 0.20693263 0.28651572
 0.25859596 0.97969841 0.77368822 0.80105397]
[0.97969841 0.77368822 0.28651572 0.36373441 0.5030346  0.895204
 0.03352821 0.80105397 0.20693263]
[[6]]
[0.36373441 0.5030346  0.895204   0.03352821 0.20693263 0.28651572
 0.97969841 0.77368822 0.80105397]
Gulzar
0

You can try this:

X = X[~numpy.isnan(X)]
Y = Y[~numpy.isnan(Y)]

After that, you can perform whatever operation you want.
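
Note that this only helps when the missing entries are stored as NaN. A minimal sketch, assuming hypothetical 1-D value arrays `x_vals` and `y_vals` of equal length in which the missing entry of `y_vals` is NaN (the names and numbers are made up for illustration): a single combined mask keeps the two arrays aligned, whereas filtering each array separately would leave them with different lengths.

```python
import numpy as np

# Hypothetical data: same length, missing entry in y_vals stored as NaN
x_vals = np.array([0.78, 0.65, 0.77, 0.88])
y_vals = np.array([0.79, 0.65, np.nan, 0.89])

# Keep only positions where both arrays hold a real number
valid = ~np.isnan(x_vals) & ~np.isnan(y_vals)
x_clean = x_vals[valid]
y_clean = y_vals[valid]

# Both arrays now have the same length, so element-wise operations work
diff = x_clean - y_clean
```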

Majid Hajibaba
0

Use np.in1d (np.isin in newer NumPy versions) to keep only the rows of X whose ID also appears in Y:

>>> X
array([[ 1.  ,  0.53],
       [ 2.  ,  0.72],
       [ 3.  ,  0.44],
       [ 4.  ,  0.35],
       [ 5.  ,  0.32],
       [ 6.  ,  0.14],
       [ 7.  ,  0.52],
       [ 8.  ,  0.4 ],
       [ 9.  ,  0.1 ],
       [10.  ,  0.1 ]])

>>> Y
array([[ 1.  ,  0.19],
       [ 2.  ,  0.96],
       [ 3.  ,  0.24],
       [ 4.  ,  0.44],
       [ 5.  ,  0.12],
       [ 6.  ,  0.91],
       [ 7.  ,  0.7 ],
       [ 8.  ,  0.54],
       [10.  ,  0.09]])
>>> X[np.in1d(X[:, 0], Y[:, 0])]
array([[ 1.  ,  0.53],
       [ 2.  ,  0.72],
       [ 3.  ,  0.44],
       [ 4.  ,  0.35],
       [ 5.  ,  0.32],
       [ 6.  ,  0.14],
       [ 7.  ,  0.52],
       [ 8.  ,  0.4 ],
       [10.  ,  0.1 ]])
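
Once X has been filtered this way, it lines up row for row with Y, so vector operations can be applied directly. A short sketch, assuming both arrays are sorted by ID and every ID in Y also appears in X; the small arrays are made up for illustration, and `np.isin` is used as the newer spelling of `np.in1d`:

```python
import numpy as np

# Illustrative data (not from the question): columns are id, value
X = np.array([[1, 0.78], [2, 0.65], [3, 0.77], [4, 0.87]])
Y = np.array([[1, 0.79], [2, 0.65], [4, 0.80]])  # id 3 is missing

# Keep only rows of X whose id also occurs in Y
X_aligned = X[np.isin(X[:, 0], Y[:, 0])]

# The ids now match row for row, so element-wise operations are safe
assert np.array_equal(X_aligned[:, 0], Y[:, 0])
difference = X_aligned[:, 1] - Y[:, 1]
correlation = np.corrcoef(X_aligned[:, 1], Y[:, 1])[0, 1]
```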
Corralien
-1

You can use np.intersect1d to do that with return_indices=True. Here is an example:

x_id = np.array([1, 3, 4, 5, 6, 8])
x_value = np.array([0.44, 0.58, 0.64, 0.25, 0.94, 0.11])
y_id = np.array([1, 3, 4, 6, 8, 8])
y_value = np.array([0.58, 0.56, 0.54, 0.52, 0.51, 0.53])
sharedVals, xFilteredIds, yFilteredIds = np.intersect1d(x_id, y_id, return_indices=True)
print(x_value[xFilteredIds])
print(y_value[yFilteredIds])

This prints the values of x_value and y_value whose associated IDs appear in both x_id and y_id:

[0.44 0.58 0.64 0.94 0.11]
[0.58 0.56 0.54 0.52 0.51]

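Note that the IDs do not need to be sorted or unique: np.intersect1d handles that internally, and when an ID is repeated, the index of its first occurrence is returned. Only pass assume_unique=True if both ID arrays are known to contain no duplicates.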
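
A small sketch with made-up, deliberately unsorted IDs, showing that the index arrays returned by np.intersect1d put both value arrays in the same order (sorted by shared ID):

```python
import numpy as np

# Made-up, deliberately unsorted ids with their values
x_id = np.array([8, 1, 5, 3])
x_value = np.array([0.11, 0.44, 0.25, 0.58])
y_id = np.array([3, 8, 1])
y_value = np.array([0.56, 0.51, 0.58])

shared, x_idx, y_idx = np.intersect1d(x_id, y_id, return_indices=True)

# shared is sorted; x_idx and y_idx select the matching entries in that order
x_common = x_value[x_idx]
y_common = y_value[y_idx]
```

Here x_common and y_common have the same length and refer to the same IDs position by position, so they can be subtracted or correlated directly.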

Jérôme Richard