How to get a specific number of rows based on column values in a DataFrame

Question

Suppose I load the MNIST dataset like this:

import pandas as pd
df = pd.read_csv('data/train.csv')
data = df.loc[df['label'].isin([1, 6])]

I am trying to select only those rows where the 'label' column is 1 or 6.

However, I want to get only 500 rows for each of those 'label' values.

How do I do it?

Maybe try something like `df.loc[df['label'].iloc[0:500].isin([1,6])]`... — l'L'l, Oct 21 '17 at 04:38
Do you mean the first 500 rows? Then df[df.label.isin([1,6])][0:500] will do. — skrubber, Oct 21 '17 at 04:39

score 2 · Answer 1 · answered Oct 21 '17 at 04:39

You can group them and select the number you want for each value:

data = df.loc[df['label'].isin([1,6])].groupby('label').head(500)
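
As a rough illustration of what this one-liner does, here is a minimal sketch on a toy frame. The column `pixel0`, the sample values, and `n = 2` are assumptions made up for the example; with the real data you would keep 500.

```python
import pandas as pd

# Toy stand-in for the MNIST training frame (values are made up for illustration).
df = pd.DataFrame({
    "label": [1, 6, 1, 3, 6, 1, 6, 3, 1, 6],
    "pixel0": range(10),
})

n = 2  # use 500 for the real dataset

# Keep only labels 1 and 6, then take the first n rows within each label group.
data = df.loc[df["label"].isin([1, 6])].groupby("label").head(n)

# Number of rows kept per label.
counts = data["label"].value_counts().to_dict()
print(data)
print(counts)
```

Note that `head` keeps the rows in their original order, and a label with fewer than `n` rows simply contributes all the rows it has.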

Gerges

score 0 · Answer 2 · answered Oct 21 '17 at 04:38

Use groupby first, then filter, i.e.:

ndf = df.groupby('label').head(500)
data = ndf.loc[ndf['label'].isin([1,6])]
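
For what it's worth, grouping first and filtering afterwards should select the same rows as filtering first and then grouping; the difference is that grouping first also takes the head of labels that are later discarded. A small sketch to check this on made-up data (the `pixel0` column and `n = 2` are assumptions for the example):

```python
import pandas as pd

# Toy frame with made-up values; label 3 stands in for the digits we do not want.
df = pd.DataFrame({
    "label": [1, 6, 1, 3, 6, 1, 6, 3, 1, 6],
    "pixel0": range(10),
})
n = 2  # use 500 for the real dataset

# Group first, then filter on the labels of interest.
ndf = df.groupby("label").head(n)
group_then_filter = ndf.loc[ndf["label"].isin([1, 6])]

# Filter first, then group.
filter_then_group = df.loc[df["label"].isin([1, 6])].groupby("label").head(n)

# Both orders pick the same rows here.
same = group_then_filter.equals(filter_then_group)
print(same)
```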

Bharath
