
I'm pretty new to pandas but I know some basic SQL.

I would like to know the equivalent way to achieve this in pandas. My goal is to write an np.select statement later with specific actions depending on the row number (rn). The column I want to count over is ['Other Identifier'], which has known duplicates.

```sql
ROW_NUMBER() OVER (PARTITION BY Other Identifier ORDER BY Date DESC) AS rn
```

I've tried doing it like this:

```python
df['Other Identifier Ct'] = df.groupby(cols)['Other Identifier'].transform('size')
```

^ That doesn't seem to be a "rolling count"; it just sums up all the duplicates. I need each row to show which instance it is (1, 2, 3, etc.).
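For illustration, here's a minimal sketch of what I mean (the toy data is made up; 'Other Identifier' and 'Date' are the column names from my description above):

```python
import pandas as pd

# Toy data with known duplicates in 'Other Identifier'
df = pd.DataFrame({
    'Other Identifier': ['A', 'A', 'A', 'B'],
    'Date': pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-01']),
})

# transform('size') broadcasts the total group size to every row of the group
df['Other Identifier Ct'] = (
    df.groupby('Other Identifier')['Other Identifier'].transform('size')
)
# df['Other Identifier Ct'] -> [3, 3, 3, 1]: every 'A' row shows 3,
# instead of the running 1, 2, 3 I'm after
```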

I also tried:

```python
df['Other Identifier Ct'] = df.duplicated.['Other Identifier'].sum()
```

^ I ran into a syntax error with that one and can't actually run it, but I think it would also not be a rolling count. Could somebody please advise? Thanks!

  • The equivalent of your shown SQL code would be `df['rn'] = df.sort_values('Date', ascending=False).groupby('Other Identifier').cumcount() + 1` based off [this answer](https://stackoverflow.com/a/36704460/15497888) by [MaxU](https://stackoverflow.com/users/5741205/maxu) – Henry Ecker Feb 21 '22 at 04:08
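A minimal sketch of what the approach from the comment above does, assuming the column names from the question (`cumcount()` numbers rows within each group starting at 0, so adding 1 mirrors `ROW_NUMBER()`):

```python
import pandas as pd

# Sample data; 'Other Identifier' and 'Date' are the column names assumed
# from the question
df = pd.DataFrame({
    'Other Identifier': ['A', 'A', 'A', 'B'],
    'Date': pd.to_datetime(['2022-01-01', '2022-01-03', '2022-01-02', '2022-01-01']),
})

# Sort newest-first, number the rows within each 'Other Identifier' group,
# then add 1 so the numbering starts at 1 like ROW_NUMBER()
df['rn'] = (
    df.sort_values('Date', ascending=False)
      .groupby('Other Identifier')
      .cumcount() + 1
)
# df['rn'] -> [3, 1, 2, 1]: the newest 'A' row (2022-01-03) gets rn = 1
```

With `rn` in place, an `np.select` condition such as `df['rn'] == 1` can then target the most recent row for each identifier.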

0 Answers