
I have a dataframe like the one below, and I want to split the string column into multiple rows, each holding an equal-length chunk of 4 characters.

date, string
2002-06-01, 12345678
2002-06-02, 87654321

Expected Output

date, string
2002-06-01, 1234
2002-06-01, 5678
2002-06-02, 8765
2002-06-02, 4321

I have tried the example given here: Split cell into multiple rows in pandas dataframe

from itertools import chain

def chainer(s):
    return list(chain.from_iterable(s.str.split(df['string'], 4)))

lens = df['string'].str.split(df['string'], 4).map(len)
res = pd.DataFrame({'date': np.repeat(df['date'], lens), 'string': chainer(df['string'])})

But I get the error: TypeError: unhashable type: 'Series'. How can I fix this issue?

Gee

2 Answers

Explode Chunks

df.assign(
    string=[
        [x[i:i+4] for i in range(0, len(x), 4)]
        for x in df.string]
).explode('string')

         date string
0  2002-06-01   1234
0  2002-06-01   5678
1  2002-06-02   8765
1  2002-06-02   4321
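
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same chunk-then-explode idea. The helper name `split_fixed_width` is hypothetical (it is not a pandas function), and the `astype(str)` cast is an assumption for the case where the column was parsed as integers. `ignore_index=True` assumes pandas 1.1 or newer.

```python
import pandas as pd

def split_fixed_width(df, column, width):
    """Hypothetical helper: one row per `width`-character chunk of `column`."""
    # Cast to str so an integer-typed column (e.g. parsed from CSV) can be sliced.
    values = df[column].astype(str)
    # Build a list of fixed-width slices for every row.
    chunks = [[s[i:i + width] for i in range(0, len(s), width)] for s in values]
    # explode() repeats the other columns once per list element;
    # ignore_index=True replaces the repeated labels with a fresh 0..n-1 index.
    return df.assign(**{column: chunks}).explode(column, ignore_index=True)

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2002-06-01", "2002-06-02"],
    "string": ["12345678", "87654321"],
})

res = split_fixed_width(df, "string", 4)
print(res)
```

If the string length is not a multiple of the width, the last chunk is simply shorter; nothing is dropped.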
piRSquared

Here is another way:

df.assign(string = df['string'].str.findall(r'\d{4}')).explode('string')

Output:

       date string
0  6/1/2002   1234
0  6/1/2002   5678
1  6/2/2002   8765
1  6/2/2002   4321
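
One caveat worth checking on your own data: `\d{4}` only matches complete runs of four digits, so a trailing remainder shorter than four characters, or any chunk containing non-digits, is silently skipped. The sketch below compares it with a more lenient pattern, `.{1,4}`, which is my own substitution and not part of the answer above. The sample values are made up for illustration.

```python
import pandas as pd

# Made-up values whose lengths are not multiples of 4; one contains letters.
df = pd.DataFrame({
    "date": ["2002-06-01", "2002-06-02"],
    "string": ["1234567890", "ab12cd"],
})

# Strict pattern: only complete groups of exactly four digits are returned.
strict = df["string"].str.findall(r"\d{4}")

# Lenient pattern: up to four of any character (except newline),
# so the shorter tail and the non-digit chunks are kept.
lenient = df["string"].str.findall(r".{1,4}")

res = df.assign(string=lenient).explode("string")
print(strict.tolist())
print(res)
```

Which pattern is right depends on whether a short or non-numeric tail should be kept or discarded.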
rhug123