
I would like to convert a NumPy array of integers representing ASCII codes to the corresponding string. For example, ASCII code 97 corresponds to the character "a". I tried:

from numpy import *
a=array([97, 98, 99])
c = a.astype('string')
print c

which gives:

['9' '9' '9']

but I would like to get the string "abc".

Håkon Hægland

6 Answers

print("".join([chr(item) for item in a]))

output

abc
Ashoka Lella
  • Thanks Ashoka for the nice solution. I was too focused on trying to use a NumPy function, but this seems like an elegant solution. – Håkon Hægland Jul 19 '14 at 08:45

Another solution that does not involve leaving the NumPy world is to view the data as strings:

import numpy as np
arr = np.array([97, 98, 99], dtype=np.uint8).view('S3').squeeze()

Or, if your NumPy array does not hold 8-bit integers:

arr = np.array([97, 98, 99]).astype(np.uint8).view('S3').squeeze()

In both cases, however, you have to put the right length into the data type (e.g. 'S3' for a 3-character string).
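To avoid hard-coding the length, the dtype string can be built from the array size. The following is a minimal sketch, assuming a non-empty one-dimensional array whose values all fit in one byte; the names codes, as_bytes and text are only illustrative.

```python
import numpy as np

codes = np.array([97, 98, 99])

# Build the dtype string from the array size so the length is not hard-coded.
# astype(np.uint8) makes each code occupy exactly one byte before the view.
as_bytes = codes.astype(np.uint8).view(f"S{codes.size}")[0]

# The element is a bytes object, so it can be decoded to a regular string.
text = as_bytes.decode("ascii")
print(text)
```

Note that the 'S' dtype drops trailing null bytes, so codes ending in 0 would not survive this round trip.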

coderforlife

Create an array of bytes and decode the byte representation using the ASCII codec:

np.array([98,97,99], dtype=np.int8).tostring().decode("ascii")

Note that tostring is badly named: it actually returns bytes, which happens to be the string type in Python 2. In Python 3 you get the bytes type back, which needs to be decoded.

jtaylor
import numpy as np
np.array([97, 98, 99], dtype='b').tobytes().decode("ascii")

Output:

'abc'

See "Data type objects (dtype)" in the NumPy documentation for the 'b' type code.

tostring() is deprecated since version 1.19.0. Use tobytes() instead.
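The one-byte dtype matters here. As a rough illustration (variable names are hypothetical), an array created without an explicit dtype uses a wider integer type, so tobytes() emits several bytes per element; casting to uint8 first gives one byte per code. This assumes every value fits in a single byte.

```python
import numpy as np

codes = np.array([97, 98, 99])  # default integer dtype, wider than one byte

# Without a cast, every element contributes codes.itemsize bytes, not one.
raw = codes.tobytes()

# Casting to uint8 first yields exactly one byte per code.
compact = codes.astype(np.uint8).tobytes()
text = compact.decode("ascii")
print(len(raw), len(compact), text)
```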

ivanbgd
from numpy import array

a = array([97, 98, 99])
print("{0:c}{1:c}{2:c}".format(a[0], a[1], a[2]))

Of course, join and a list comprehension can be used here as well.
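The format call above is tied to exactly three elements. One possible way to generalize it, sketched here under the assumption that the array is one-dimensional, is to repeat the "{:c}" field once per element and unpack the values; tolist() is used so that plain Python integers are passed to format.

```python
from numpy import array

a = array([97, 98, 99])

# Repeat the character format field once per element, then fill the fields
# with the array values converted to plain Python ints.
template = "{:c}" * len(a)
text = template.format(*a.tolist())
print(text)
```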

Boris Verkhovskiy
nouseforname

Solutions that rely on Python loops or string formatting will be slow for large datasets. If you know that all of your data are ASCII, a faster approach could be to use fancy indexing:

import numpy as np
a = np.array([97, 98, 99])
np.array([chr(x) for x in range(128)])[a]  # 128 entries cover ASCII codes 0 to 127
# array(['a', 'b', 'c'], dtype='<U1')

An advantage is that it works for arbitrarily shaped arrays.
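As an illustration of the shape-preserving behaviour, here is a small sketch with a two-dimensional input. The lookup table holds one entry per ASCII code from 0 to 127; the names table, codes, chars and rows are made up for this example.

```python
import numpy as np

# Lookup table: index i holds the character with code i (ASCII codes 0 to 127).
table = np.array([chr(i) for i in range(128)])

codes = np.array([[97, 98, 99],
                  [100, 101, 102]])

# Fancy indexing returns an array of characters with the same shape as codes.
chars = table[codes]

# Join each row if one string per row is wanted.
rows = ["".join(row) for row in chars]
print(rows)
```

The result is still an array of single characters, so a join per row (or over the flattened array) is needed when an actual Python string is the goal.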

nth