
I have a multi-level column DataFrame that looks like this:

df

Solid             Liquid                Gas
pen paper pipe    water juice milk      oxygen nitrogen helium
5   2     1       4     3     1         7      8        10
5   2     1       4     3     1         7      8        10
5   2     1       4     3     1         7      8        10
4   4     7       3     2     0         6      7        9
3   7     9       4     6     5         3      3        4

I want to randomly choose 2 of the top-level columns "Solid", "Liquid", and "Gas", keeping all 3 sub-columns of each chosen one.

For example, if Solid and Gas were randomly selected, the expected result would be:

Solid             Gas
pen paper pipe    oxygen nitrogen helium
5   2     1       7      8        10
5   2     1       7      8        10
5   2     1       7      8        10
4   4     7       6      7        9
3   7     9       3      3        4

I tried this code, but it picks individual sub-columns instead of whole top-level groups:

result = df.sample(n=5, axis=1)
result

[output]

Solid    Gas
pipe     oxygen
1        7
1        7
1        7
1        7
7        6
9        3

Can anyone please help me figure this one out? Thank you :)

Kim Yejun

1 Answer


You can sample the first-level column labels and then select every column under the sampled labels:

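import pandas as pd
# Sample 2 top-level labels, then index df with them to keep all their sub-columns.
df[pd.Series(df.columns.levels[0]).sample(2)]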

Or use the random.sample function:

import random
# Pick 2 distinct top-level labels and select every sub-column under them.
df[random.sample(df.columns.levels[0].tolist(), 2)]
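
For reference, here is a minimal end-to-end sketch of the same idea, with the frame rebuilt from the values in the question. The helper name `sample_top_level` and its `seed` parameter are illustrative assumptions, not part of pandas. It reads the labels with `get_level_values(0).unique()` rather than `levels[0]`, because `levels` can keep labels that are no longer in use after slicing, and it selects with a boolean mask so the chosen groups stay in their original left-to-right order.

```python
import random

import pandas as pd

# Rebuild the example frame from the question (values copied from the post).
columns = pd.MultiIndex.from_tuples([
    ("Solid", "pen"), ("Solid", "paper"), ("Solid", "pipe"),
    ("Liquid", "water"), ("Liquid", "juice"), ("Liquid", "milk"),
    ("Gas", "oxygen"), ("Gas", "nitrogen"), ("Gas", "helium"),
])
data = [
    [5, 2, 1, 4, 3, 1, 7, 8, 10],
    [5, 2, 1, 4, 3, 1, 7, 8, 10],
    [5, 2, 1, 4, 3, 1, 7, 8, 10],
    [4, 4, 7, 3, 2, 0, 6, 7, 9],
    [3, 7, 9, 4, 6, 5, 3, 3, 4],
]
df = pd.DataFrame(data, columns=columns)


def sample_top_level(frame, n, seed=None):
    """Keep n randomly chosen top-level column groups (hypothetical helper)."""
    # Top-level label of every column, e.g. Solid, Solid, Solid, Liquid, ...
    top = frame.columns.get_level_values(0)
    # Draw n distinct labels from those actually present in the frame.
    picked = random.Random(seed).sample(top.unique().tolist(), n)
    # The boolean mask keeps all sub-columns of the picked groups, in original order.
    return frame.loc[:, top.isin(picked)]


result = sample_top_level(df, 2)
print(result)
```

Passing a fixed `seed` makes the draw repeatable, which may help when debugging. Leaving it as `None` gives a different pair on each call.
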
Allen Qin