I want to remap the values of an integer 1D tensor for each batch, so it should be pretty fast. The mapping is from the tensor's unique values to 0:(n_unique - 1).
If it were numpy, I could do something like this:
x = np.array([1,2,4,4])
rep = dict(enumerate(np.unique(x)))
rep_inv = dict(zip(rep.values(), rep.keys()))
x_map = np.vectorize(rep_inv.get)(x)
x_map
array([0, 1, 2, 2])
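(Side note: in NumPy the same mapping can be obtained in a single call via the `return_inverse` argument, which is essentially what I'm trying to replicate in TF:)

```python
import numpy as np

x = np.array([1, 2, 4, 4])
# return_inverse gives, for each element of x, its index into the
# sorted unique values -- the 0:(n_unique - 1) mapping in one call
_, x_map = np.unique(x, return_inverse=True)
print(x_map)  # [0 1 2 2]
```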
I found this solution for tensorflow, which works a single time:
x = tf.constant([1,2,4,4], dtype = tf.int64)
def get_table(x):
    x_unique, _ = tf.unique(x)
    x_mapto = tf.range(tf.shape(x_unique)[0], dtype=tf.int64)
    table = tf.lookup.StaticVocabularyTable(
        tf.lookup.KeyValueTensorInitializer(
            x_unique,
            x_mapto,
            key_dtype=tf.int64,
            value_dtype=tf.int64,
        ),
        num_oov_buckets=1,
    )
    return table
table = get_table(x)
x_map = table.lookup(x)
x_map
<tf.Tensor: shape=(4,), dtype=int64, numpy=array([0, 1, 2, 2], dtype=int64)>
But when the second batch comes, I get an error:
OP_REQUIRES failed at lookup_table_op.cc:964 : Failed precondition: Table was already initialized with different data.
Is there a way to bypass this (is it a bug?), fix it, or achieve what I need with a different approach altogether? (I'm using TF 2.4.0.)
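(For reference, one alternative I'm considering: the second output of `tf.unique` appears to give exactly this index mapping, except that unique values come out in order of first appearance rather than sorted order. A minimal sketch of what I mean:)

```python
import tensorflow as tf

# tf.unique returns the unique values and, for every element of x,
# its index into those unique values -- a 0:(n_unique - 1) mapping.
# Unlike np.unique, the unique values are in order of first
# appearance, not sorted order.
def remap(x):
    _, x_map = tf.unique(x)
    return x_map

x1 = tf.constant([1, 2, 4, 4], dtype=tf.int64)
x2 = tf.constant([7, 7, 3, 9], dtype=tf.int64)
print(remap(x1).numpy())  # [0 1 2 2]
print(remap(x2).numpy())  # [0 0 1 2]
```

Would relying on that be equivalent for the per-batch case, since each batch builds its mapping from scratch anyway?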