Element-wise string concatenation in numpy

Question

Is this a bug?

import numpy as np
a1=np.array(['a','b'])
a2=np.array(['E','F'])

In [20]: add(a1,a2)
Out[20]: NotImplemented

I am trying to do element-wise string concatenation. I thought `add()` was the way to do it in numpy, but it is clearly not working as expected.

As the name implies, numpy is for numbers. Python itself has pretty good string operations. Why not just use that? `"".join(["a", "b"])` works fine. — Keith, Mar 31 '12 at 18:29
I was looking at this http://docs.scipy.org/doc/numpy/reference/routines.char.html — Dave31415, Mar 31 '12 at 18:39
That's cool. But: "All of them are based on the string methods in the Python standard library.". So if you just use the standard library you can write code that doesn't depend on numpy. — Keith, Mar 31 '12 at 18:44
The `add` operation does not do the same thing as `join`. numpy's add can be useful for multidimensional arrays or nested lists. — gypaetus, Dec 03 '15 at 17:50
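Here is a minimal sketch of the multidimensional case mentioned in the last comment. It assumes a NumPy version where `np.char.add` (the `numpy.char` alias of `numpy.core.defchararray`) broadcasts its inputs like other element-wise operations. The names `grid` and `suffixes` are illustrative only.

```python
import numpy as np

# A 2x2 array of single-character strings.
grid = np.array([['a', 'b'],
                 ['c', 'd']])
# A 1-D array of suffixes; it broadcasts across every row of `grid`.
suffixes = np.array(['1', '2'])

# Element-wise concatenation: combined[i, j] == grid[i, j] + suffixes[j]
combined = np.char.add(grid, suffixes)
print(combined)
```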

Mike T · Accepted Answer · 2019-03-26T23:42:43.077

74

This can be done using numpy.core.defchararray.add. Here is an example:

>>> import numpy as np
>>> a1 = np.array(['a', 'b'])
>>> a2 = np.array(['E', 'F'])
>>> np.core.defchararray.add(a1, a2)
array(['aE', 'bF'], 
      dtype='<U2')

There are other useful string operations available for NumPy data types.
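Below is a short sketch of a few of those other element-wise helpers. It assumes a NumPy release that provides `np.char.upper`, `np.char.multiply`, `np.char.str_len` and `np.char.replace`. Each one applies the corresponding Python `str` method or operator to every element.

```python
import numpy as np

a1 = np.array(['a', 'b'])
a2 = np.array(['E', 'F'])
joined = np.char.add(a1, a2)                   # element-wise concatenation

upper = np.char.upper(joined)                  # str.upper on each element
repeated = np.char.multiply(a1, 3)             # s * 3 for each element
lengths = np.char.str_len(joined)              # len(s) for each element
replaced = np.char.replace(joined, 'E', 'e')   # str.replace on each element

print(upper, repeated, lengths, replaced)
```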

edited Mar 26 '19 at 23:42

answered Dec 16 '12 at 23:49

Mike T


The `add` string operation you link to gives `NotImplemented` (as in the question) for numpy 1.6.1 under Python 3.2. Do you know from which version it is implemented? – Francesco Montesano Aug 06 '13 at 08:57
@FrancescoMontesano checking with that version combination on Ubuntu 12.04.2 LTS, the example in my answer works as expected. Generally speaking, using `np.add` also raises `NotImplemented` with any version. Ensure you are using `np.core.defchararray.add`. – Mike T Aug 06 '13 at 09:49
Now I've seen the full signature of `add` in the docs (I missed that before). Anyway, it would be nice if numpy wrapped `np.core.defchararray.*` into the corresponding numeric ndarray operations. I think it's much neater and easier to remember to do `np.add`. – Francesco Montesano Aug 06 '13 at 10:11

As noted in the docstring of the module, "the preferred alias for `defchararray` is `numpy.char`", so you can just say `np.char.add`. – jdehesa Jul 31 '17 at 14:54
@MikeT: Is it possible to define a delimiter to get an output like array(['a#E', 'b#F'])? By the way, thank you for the above solution. I can do it with map('#'.join, zip(a1, a2)), but I'm curious whether it is possible with numpy. – PanwarS87 Aug 15 '17 at 14:11
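One possible way to get the delimited output with numpy alone is to nest two calls. This assumes `np.char.add` broadcasts a scalar string against an array. The `join_with` helper below is a hypothetical convenience for two or more arrays, not a NumPy function.

```python
from functools import reduce

import numpy as np

a1 = np.array(['a', 'b'])
a2 = np.array(['E', 'F'])

# The scalar '#' broadcasts to every element, giving a1[i] + '#' + a2[i].
with_delim = np.char.add(np.char.add(a1, '#'), a2)
print(with_delim)


def join_with(sep, *arrays):
    """Hypothetical helper: join one or more string arrays element-wise with sep."""
    # reduce starts from arrays[0] and appends sep + next array each step.
    return reduce(lambda acc, arr: np.char.add(np.char.add(acc, sep), arr), arrays)


print(join_with('#', a1, a2, np.array(['x', 'y'])))
```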

Saullo G. P. Castro · Answer 2 · 2016-09-23T09:59:11.037

13

You can use the chararray subclass to perform array operations with strings:

import numpy as np

a1 = np.char.array(['a', 'b'])
a2 = np.char.array(['E', 'F'])

a1 + a2
# chararray(['aE', 'bF'], dtype='<U2')

Another nice example:

b = np.array([2, 4])
a1 * b
# chararray(['aa', 'bbbb'], dtype='<U4')
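NumPy's documentation describes `chararray` as existing for backward compatibility and recommends ordinary `str`/`bytes` arrays plus the `numpy.char` functions for new code. Here is a sketch of the same two operations on plain arrays. The variable names follow the answer above, and the sketch assumes `np.char.multiply` broadcasts an integer array of repeat counts.

```python
import numpy as np

a1 = np.array(['a', 'b'])
a2 = np.array(['E', 'F'])
b = np.array([2, 4])

concat = np.char.add(a1, a2)        # same values as a1 + a2 on chararrays
repeat = np.char.multiply(a1, b)    # same values as a1 * b on chararrays
print(concat, repeat)
```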

edited Sep 23 '16 at 09:59

answered May 09 '14 at 06:08

Saullo G. P. Castro


score 7 · Answer 3 · edited May 08 '14 at 20:10


This can (and should) be done in pure Python, as numpy also uses the Python string manipulation functions internally:

>>> a1 = ['a', 'b']
>>> a2 = ['E', 'F']
>>> list(map(''.join, zip(a1, a2)))
['aE', 'bF']

edited May 08 '14 at 20:10

Saullo G. P. Castro


answered Mar 31 '12 at 18:29

Niklas B.


Ok, so the add function I was using is not at top level in numpy. Is either of those faster/better or preferred for any reason? – Dave31415 Mar 31 '12 at 18:42

This doesn't answer the question. There are times when one might want to do this in numpy, e.g. when working with large arrays of strings. The original poster gave a simple example for which one would use pure Python, but was asking for a numpy solution. – apdnu Apr 20 '13 at 21:11
@Thucydides411 From what I understood at the time of writing my answer, numpy just used the builtin Python primitives, so I didn't see what advantage that would have. I'm not sure whether that is true; it seems it is not. Maybe I misinterpreted the statement "All of them are based on the string methods in the Python standard library." in the docs. – Niklas B. May 06 '14 at 17:02
@NiklasB. Thank you, Nick. I was looking for the exact same thing. I'm just curious how I can implement the same thing using numpy. I will dig into the numpy docs. – PanwarS87 Aug 15 '17 at 14:14

score 3 · Answer 4 · answered Jun 28 '13 at 21:11

Another solution is to convert string arrays into arrays of Python objects so that `str.__add__` is called:

>>> import numpy as np
>>> a = np.array(['a', 'b', 'c', 'd'], dtype=object)
>>> a + a
array(['aa', 'bb', 'cc', 'dd'], dtype=object)

This is not that slow (less than twice as slow as adding integer arrays).
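Here is a sketch of the object-dtype approach that starts from the question's fixed-width arrays. It assumes you may want a regular `<U` array back afterwards. The only conversions used are `.astype(object)` and `.astype(str)`.

```python
import numpy as np

a1 = np.array(['a', 'b'])
a2 = np.array(['E', 'F'])

# Each element becomes a Python str, so + dispatches to str.__add__ per element.
obj_sum = a1.astype(object) + a2.astype(object)
print(obj_sum.dtype)   # object

# Convert back to a fixed-width Unicode array if later code expects one;
# NumPy picks the width from the longest element.
fixed = obj_sum.astype(str)
print(fixed.dtype)     # <U2
```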

score 2 · Answer 5 · answered Sep 18 '18 at 14:32

One more basic, elegant and fast solution:

In [11]: np.array([x1 + x2 for x1,x2 in zip(a1,a2)])
Out[11]: array(['aE', 'bF'], dtype='<U2')

For small arrays, it is the fastest of these options:

In [12]: %timeit np.array([x1 + x2 for x1,x2 in zip(a1,a2)])
3.67 µs ± 136 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [13]: %timeit np.core.defchararray.add(a1, a2)
6.27 µs ± 28.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [14]: %timeit np.char.array(a1) + np.char.array(a2)
22.1 µs ± 319 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

For larger arrays, the time difference is small:

In [15]: b1 = np.full(10000,'a')    
In [16]: b2 = np.full(10000,'b')    

In [189]: %timeit np.array([x1 + x2 for x1,x2 in zip(b1,b2)])
6.74 ms ± 66.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [188]: %timeit np.core.defchararray.add(b1, b2)
7.03 ms ± 419 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [187]: %timeit np.char.array(b1) + np.char.array(b2)
6.97 ms ± 284 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
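The figures above come from IPython's `%timeit` on one machine. The sketch below is a rough way to reproduce the comparison with the standard-library `timeit` module, and it also includes the object-dtype approach from another answer. Absolute times will differ between machines and NumPy versions, so no output is shown. The dictionary of candidates is just an illustrative structure.

```python
import timeit

import numpy as np

b1 = np.full(10000, 'a')
b2 = np.full(10000, 'b')

# Each candidate returns an array of the element-wise concatenated strings.
candidates = {
    "list comprehension": lambda: np.array([x1 + x2 for x1, x2 in zip(b1, b2)]),
    "np.char.add": lambda: np.char.add(b1, b2),
    "chararray +": lambda: np.char.array(b1) + np.char.array(b2),
    "object dtype": lambda: b1.astype(object) + b2.astype(object),
}

# Sanity check: every approach must produce the same strings.
reference = candidates["np.char.add"]().tolist()
for name, fn in candidates.items():
    assert fn().tolist() == reference, name

for name, fn in candidates.items():
    # Best of 3 repeats of 10 calls each, reported per call.
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{name:>20}: {best * 1e3:.3f} ms per call")
```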
