34

Input:

mystr = "100110"

Desired output numpy array:

mynumpy == np.array([1, 0, 0, 1, 1, 0])

I have tried:

np.fromstring(mystr, dtype=int, sep='')

but the problem is that I can't split the string into its individual digits, so NumPy reads it as a single number. How can I convert my string to a NumPy array?

Mateen Ulhaq
Am1rr3zA

3 Answers

46

Calling `list` on the string splits it into individual characters, which `np.array` can then turn into an array.

import numpy as np

mystr = "100110"
print(np.array(list(mystr)))
# ['1' '0' '0' '1' '1' '0']

If you want integers instead of strings:

print(np.array(list(mystr), dtype=int))
# [1 0 0 1 1 0]
dragon2fly
  • It should be noted that for large inputs, **grc**'s first method using `np.fromstring('...', np.int8)` is _much_ faster. Creating a `list` from the (large) string is unnecessary. – kyrill Apr 16 '19 at 13:58
28

You could read the characters as ASCII bytes and then subtract 48 (the ASCII code of the character '0'). This should be the fastest way for large strings.

>>> np.fromstring("100110", np.int8) - 48
array([1, 0, 0, 1, 1, 0], dtype=int8)
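
Newer NumPy releases deprecate the binary mode of `fromstring` and point to `frombuffer` instead. A minimal sketch of the same idea with `np.frombuffer`, assuming the string contains only ASCII digits (so each character encodes to exactly one byte); the variable names `raw` and `digits` are just for illustration:

```python
import numpy as np

mystr = "100110"

# Assumption: ASCII-only input, so one character == one byte.
raw = np.frombuffer(mystr.encode("ascii"), dtype=np.uint8)

# Subtract the code of '0' so the bytes '0'..'9' map to the integers 0..9.
digits = raw - ord("0")
print(digits)
```

The array returned by `frombuffer` is a read-only view of the bytes, but the subtraction creates a new, writable array. Note that the result has dtype `uint8` here; call `.astype(int)` if a wider integer type is needed.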

Alternatively, you could convert the string to a list of integers first:

>>> np.array(list(map(int, "100110")))  # list() is required on Python 3, where map returns an iterator
array([1, 0, 0, 1, 1, 0])

Edit: I did some quick timing and the first method is over 100x faster than converting it to a list first.

grc
  • 5
    I would strongly recommend using `ord('0')` instead of `48`. Better explicit than implicit. – DerWeh Nov 02 '18 at 10:24
12

Adding to the answers above: NumPy now emits a deprecation warning when `fromstring` is used in binary mode:
DeprecationWarning: The binary mode of fromstring is deprecated, as it behaves surprisingly on unicode inputs. Use frombuffer instead.
A better option is `np.fromiter`. In the timings below, taken in a Jupyter notebook on the six-character example string, it ran about twice as fast as the other two approaches:

import numpy as np
mystr = "100110"

np.fromiter(mystr, dtype=int)
>> array([1, 0, 0, 1, 1, 0])

# Time comparison
%timeit np.array(list(mystr), dtype=int)
>> 3.5 µs ± 627 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit np.fromstring(mystr, np.int8) - 48
>> 3.52 µs ± 508 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit np.fromiter(mystr, dtype=int)
>> 1.75 µs ± 133 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
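
The timings above use a six-character string, so they mostly measure call overhead. To see how the approaches scale, something like the following could be run outside a notebook. This is only a sketch: the input `big`, the helper names, and the repeat count are made up for illustration, `via_frombuffer` stands in for the deprecated binary `fromstring` call and assumes ASCII-only digits, and the numbers will vary by machine and NumPy version.

```python
import timeit

import numpy as np

# Hypothetical test input: 120,000 binary digits.
big = "100110" * 20_000


def via_list(s):
    # Split into a list of one-character strings, then convert to integers.
    return np.array(list(s), dtype=int)


def via_frombuffer(s):
    # Read the ASCII bytes directly and shift by the code of '0'.
    return np.frombuffer(s.encode("ascii"), dtype=np.uint8) - ord("0")


def via_fromiter(s):
    # Iterate over the characters and convert each one to an integer.
    return np.fromiter(s, dtype=int)


reference = via_list(big)
for func in (via_list, via_frombuffer, via_fromiter):
    # Check that every approach produces the same digits before timing it.
    assert np.array_equal(func(big), reference)
    seconds = timeit.timeit(lambda: func(big), number=3) / 3
    print(f"{func.__name__}: {seconds * 1e3:.2f} ms per call")
```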
hru_d