
I'm working with a fairly large DataFrame (a few million rows) and am trying to vectorise the last line of the example below.

Each row in the Data column contains a NumPy array of size 8 (1D in the example below).

The dlc column gives, for each row, the index at which to slice that row's array in Data.

Below is a minimal example:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Data':list(np.zeros(shape=(100, 8))), 'dlc':np.random.randint(8,size=100)})

df['Data'][:df['dlc']] #line to vectorise

df['Data'][:df['dlc']] causes a TypeError to occur.

    TypeError: cannot do slice indexing on RangeIndex with these indexers [0     3
1     0
2     3
3     1
4     5
     ..
95    4
96    5
97    1
98    5
99    5
Name: dlc, Length: 100, dtype: int32] of type Series

I was previously using the below line but it was extremely slow:

df.apply(lambda x: x['Data'][:x['dlc']], axis=1)
RMRiver
    AFAIK, a vectorised solution isn't available. See [this](https://stackoverflow.com/questions/55250558/pandas-slicing-column-values-based-on-another-column) question for more. You would basically need to do `df["Slice"] = [x[:y] for x,y in zip(df["Data"], df["dlc"])]` which should be faster than using `apply` – not_speshal Nov 09 '21 at 16:30
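To make the comment's suggestion concrete, here is a minimal sketch of the list-comprehension approach on the question's example frame. The `Slice` column name comes from the comment. The second half is an assumption rather than something from the thread: if every array in `Data` has the same length, a fixed-width masked array (`masked`, a name chosen here for illustration) can stand in for the ragged slices, with positions at or beyond `dlc` set to NaN.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Data": list(np.arange(800, dtype=float).reshape(100, 8)),
    "dlc": rng.integers(0, 8, size=100),
})

# Row-wise slicing without DataFrame.apply: iterate over the two
# columns in parallel and slice each array at its own dlc value.
df["Slice"] = [x[:y] for x, y in zip(df["Data"], df["dlc"])]

# Assumed alternative (not from the thread): keep a rectangular
# (n_rows, 8) array and blank out everything at or past dlc.
data = np.stack(df["Data"].tolist())             # shape (100, 8)
dlc = df["dlc"].to_numpy()
mask = np.arange(data.shape[1]) < dlc[:, None]   # True where column index < dlc
masked = np.where(mask, data, np.nan)
```

The slices have different lengths per row, so they cannot live in a regular 2D array. That is why the first approach stays a Python-level loop, and why the masked variant only helps if downstream code can work with NaN padding (for example via `np.nansum`).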
