Good morning to everyone (assuming it is morning wherever you are). I am honoured to write on this wonderful site; this is my first question, and I hope you'll find it interesting. I have a series from which I want to extract rows for another DataFrame by slicing it. Here is some context: the series represents a process variable generated by a machine in a lane. I have the times at which sections of material pass under this machine, respectively the arrival and exit times, contained in two other series.
- I'm going to use these signals in a ML project I'm trying to develop, and the algorithm is supervised, so I have the labels. I thought of putting each process variable in a separate DataFrame and eventually doing a join later, instead of having an array in a column field, since I read that pandas is faster when every column of a DataFrame holds only scalars. Is this correct, or does it make no difference when creating a DataFrame?
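To illustrate what I mean by keeping each variable in its own frame and joining later, here is a toy sketch (the variable names and values are made up, not my real data):

```python
import pandas as pd

# hypothetical per-variable frames sharing the same key
df_temp = pd.DataFrame({'temp': [1.0, 2.0]}, index=['a', 'b'])
df_speed = pd.DataFrame({'speed': [10.0, 20.0]}, index=['a', 'b'])

# join aligns on the index, so every column stays scalar
df_all = df_temp.join(df_speed)
```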
- Assuming that my previous reasoning is correct, I have the three series tarr, tend and s, similar to the ones below:
import pandas as pd
import numpy as np
TCamp = np.timedelta64(250,'ms')
idx = pd.MultiIndex.from_tuples([(464288, 55, 0),(464288, 55, 1),(464288, 55, 2),(464288, 55, 3),(464288, 55, 4), (464288, 55, 5),(464288, 55, 6),(464288, 55, 7),(464288, 55, 8),(464288, 55, 9)], names=['Colata(Par)', 'Slab', 'Slice'])
tarr = pd.Series(np.array(['2021-11-20T01:37:22.090000000','2021-11-20T01:37:14.090000000','2021-11-20T01:37:06.340000000','2021-11-20T01:36:58.340000000','2021-11-20T01:36:50.590000000','2021-11-20T01:36:42.840000000','2021-11-20T01:36:34.840000000','2021-11-20T01:36:27.090000000','2021-11-20T01:36:19.090000000','2021-11-20T01:36:11.090000000'],dtype='datetime64[ns]'),index=idx)
tend = pd.Series(np.array(['2021-11-20T01:37:29.590000000','2021-11-20T01:37:21.840000000','2021-11-20T01:37:13.840000000','2021-11-20T01:37:06.090000000','2021-11-20T01:36:58.090000000','2021-11-20T01:36:50.340000000','2021-11-20T01:36:42.590000000','2021-11-20T01:36:34.590000000','2021-11-20T01:36:26.840000000','2021-11-20T01:36:18.840000000'],dtype='datetime64[ns]'),index=idx)
s = pd.Series(np.random.uniform(1.4,1.5,315), index=[tarr.values[-1]+i*TCamp for i in range(315)])
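As a quick check on how the slicing I want behaves, a toy example (with made-up values) of label slicing on a datetime index, which includes both endpoints:

```python
import numpy as np
import pandas as pd

step = np.timedelta64(250, 'ms')
t0 = np.datetime64('2021-11-20T01:36:11.090')
toy = pd.Series(range(10), index=[t0 + i * step for i in range(10)])

# label-based slicing on a datetime index includes both endpoints
chunk = toy[t0 + 2 * step : t0 + 5 * step]
```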
I want to create, assuming it's the best option, a DataFrame whose index is taken from tarr (remembering it's the same index as tend) and whose values are the slices of s taken from tarr to tend. I wrote this code:
idx = tarr.index
# convert to NumPy arrays, hoping positional access beats label lookups
tarr = tarr.values
tend = tend.values
# build one single-row DataFrame per slice, then stack them
df = pd.concat([pd.DataFrame([s[x:y].values], index=[idx[i]]) for i,(x,y) in enumerate(zip(tarr, tend))])
Maybe it's silly, but I first converted the series to NumPy arrays, hoping this would speed up the computation.
Out[101]:
0 1 2 3 4 5 6 7 8 ... 23 24 25 26 27 28 29 30 31
(464288, 55, 0) 1.440785 1.416969 1.422908 1.442870 1.436515 1.414262 1.474375 1.426283 1.460844 ... 1.445614 1.455839 1.434067 1.428702 1.456841 1.406867 1.416680 1.498526 NaN
(464288, 55, 1) 1.424743 1.491728 1.441107 1.468172 1.428047 1.425284 1.442666 1.408087 1.492354 ... 1.449526 1.421042 1.442521 1.466031 1.429108 1.462252 1.439046 1.421225 1.486515
(464288, 55, 2) 1.475340 1.465274 1.494914 1.401652 1.453811 1.419799 1.436432 1.475635 1.412079 ... 1.476395 1.404988 1.470828 1.472004 1.494946 1.468327 1.423555 1.406835 NaN
(464288, 55, 3) 1.475291 1.431561 1.495104 1.460506 1.421245 1.411963 1.411887 1.468771 1.471905 ... 1.419006 1.476418 1.427846 1.426180 1.476835 1.499209 1.472152 1.443316 1.449456
(464288, 55, 4) 1.423420 1.429175 1.440244 1.476869 1.442356 1.459726 1.475783 1.453469 1.428708 ... 1.494100 1.409971 1.412222 1.446604 1.453895 1.472602 1.408404 1.406487 NaN
(464288, 55, 5) 1.462259 1.422866 1.428152 1.449186 1.414712 1.498510 1.495371 1.416978 1.432208 ... 1.486691 1.420766 1.409290 1.422413 1.495155 1.472826 1.407366 1.418593 NaN
(464288, 55, 6) 1.428384 1.449596 1.477857 1.407660 1.435210 1.491050 1.458693 1.492235 1.483423 ... 1.477553 1.447089 1.429405 1.433497 1.484601 1.485276 1.430502 1.464646 1.449117
(464288, 55, 7) 1.486584 1.405553 1.446462 1.468316 1.499108 1.405232 1.410002 1.450734 1.467617 ... 1.493026 1.431031 1.462516 1.479795 1.470909 1.435462 1.412458 1.488296 NaN
(464288, 55, 8) 1.430702 1.455248 1.485239 1.451106 1.452792 1.416129 1.468898 1.487687 1.454509 ... 1.423530 1.469781 1.430409 1.498169 1.467050 1.428302 1.489567 1.416920 1.405749
(464288, 55, 9) 1.498188 1.458571 1.476445 1.491516 1.479044 1.477391 1.455229 1.403565 1.483421 ... 1.445026 1.416440 1.453774 1.406295 1.499867 1.486285 1.437962 1.446287 1.404403
The point is that the series tarr and tend have about 6-7k rows (while the s series has about 70k). I measured with the debugger that processing each element of the list comprehension takes about 2 seconds, which means a very long wait for a single signal. Is there any other way to do the above operation that is faster? Should I think about writing a multithreaded model, like the map function in the map-reduce approach (even though I have no real experience in this field), or is there a way to vectorize the problem? Thank you all for the help.
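For context, one direction I tried to sketch (not sure it is correct or actually faster) is mapping the timestamps to integer positions with np.searchsorted and then slicing the underlying NumPy array directly instead of label-slicing the Series; the t0, tarr and tend below are toy values, not my real data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
TCamp = np.timedelta64(250, 'ms')
t0 = np.datetime64('2021-11-20T01:36:11.090')

# toy signal with a fixed sampling period, like my s
s = pd.Series(rng.uniform(1.4, 1.5, 315),
              index=[t0 + i * TCamp for i in range(315)])

# toy arrival/exit times (two slices instead of 6-7k)
tarr = np.array([t0, t0 + 32 * TCamp])
tend = np.array([t0 + 30 * TCamp, t0 + 64 * TCamp])

# translate timestamps into integer positions in s's index;
# side='right' on the exit time keeps the last sample,
# mimicking the inclusive label slicing s[x:y]
starts = np.searchsorted(s.index.values, tarr, side='left')
stops = np.searchsorted(s.index.values, tend, side='right')

# slice the raw NumPy array by position (no label lookups),
# then let pandas pad the ragged rows with NaN
rows = [s.values[a:b].tolist() for a, b in zip(starts, stops)]
df = pd.DataFrame(rows)
```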