
I have a data set with 36k rows. I want to randomly select 9k rows from it using pandas. How do I accomplish this task?

2 Answers


You can use DataFrame.sample to select either 9,000 rows or 25% of the rows:

df.sample(n=9000)

Or:

df.sample(frac=0.25)
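
If the same 9,000 rows should come back on every run, sample accepts a random_state seed. A minimal sketch, assuming a stand-in frame named df with a made-up value column (the real data set is not shown in the question):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 36k-row data set
df = pd.DataFrame({"value": np.arange(36_000)})

# Draw 9,000 distinct rows; random_state fixes the seed so the draw is repeatable
sample_n = df.sample(n=9000, random_state=42)

# The same sample size expressed as a fraction of the rows
sample_frac = df.sample(frac=0.25, random_state=42)
```

By default sample draws without replacement, so no row appears twice in either result.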

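Another solution is to draw a random sample of the index with numpy.random.choice and then select with loc. The index has to be unique, and replace=False is needed because choice samples with replacement by default:

df = df.loc[np.random.choice(df.index, size=9000, replace=False)]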

If the index is not unique, sample row positions instead and select with iloc:

df = df.iloc[np.random.choice(np.arange(len(df)), size=9000, replace=False)]
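
The positional approach can also be written with NumPy's newer Generator interface. This is only a sketch: the frame, its repeating index, and the seed 0 are assumptions made for illustration, and replace=False is what keeps a row from being drawn twice.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a deliberately non-unique index (labels 0..99 repeat)
df = pd.DataFrame({"value": np.arange(36_000)}, index=np.arange(36_000) % 100)

rng = np.random.default_rng(0)  # seeded generator; the seed value is arbitrary

# Pick 9,000 distinct row positions; replace=False prevents duplicates
positions = rng.choice(len(df), size=9000, replace=False)

# iloc selects by position, so repeated index labels cause no problem
sampled = df.iloc[positions]
```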
jezrael

numpy

i = np.random.permutation(np.arange(len(df)))
idx = i[:9000]
pd.DataFrame(df.values[idx], df.index[idx], df.columns)
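
One caveat with going through df.values: a frame with mixed column types is converted to a single array first, so the rebuilt frame may not keep the original column dtypes. A hedged sketch of the same permutation idea using iloc, which keeps dtypes and column names (the id and label columns and the seed are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-dtype frame standing in for the real data
df = pd.DataFrame({"id": np.arange(36_000), "label": ["x"] * 36_000})

rng = np.random.default_rng(0)  # arbitrary seed for repeatability

# Shuffle all row positions and keep the first 9,000
idx = rng.permutation(len(df))[:9000]

# Rebuild from the raw values array, passing index and columns explicitly
via_values = pd.DataFrame(df.values[idx], df.index[idx], df.columns)

# Select the same rows by position; dtypes and column names carry over
via_iloc = df.iloc[idx]
```

Both results contain the same rows in the same order; only the iloc version is guaranteed to match the dtypes of df.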
piRSquared