I have a dataframe with temperature data for a certain period. From this data, I want to calculate the relative frequency of August being warmer than 20° and of January being colder than 2°. I have already extracted these two columns into a separate dataframe, counted how often each temperature value occurs, and used the normalize=True option to get the frequency of each value in percent (see code).

df_temp1[df_temp1.aug >=20]
df_temp1[df_temp1.jan <= 2]

df_temp1['aug'].value_counts()
df_temp1['jan'].value_counts()

df_temp1['aug'].value_counts(normalize=True)*100
df_temp1['jan'].value_counts(normalize=True)*100

What I haven't managed is to calculate the relative frequency for aug>=20, jan<=2, as well as aug>=20 AND jan<=2 and aug>=20 OR jan<=2. Maybe someone could help me with this problem. Thanks.

  • The text `did not get a satisfying result.` is useless to us. I don't understand what you want to do. Better to show minimal working code, with some example data in the code, along with what you get and what you expect. – furas Oct 31 '21 at 10:41
  • A common problem with beginners is that they don't read [how to ask](https://stackoverflow.com/help/how-to-ask) and then post questions which get downvoted. Stack Overflow is a place where you should show code and the FULL error message, and we try to resolve that single problem. Showing code can be more useful than describing the problem, and if you show code (with example data) then we can use it to create a solution. – furas Oct 31 '21 at 15:25
  • Also, put all information in the question, not in comments. It will be more readable there (because you can't format code in a comment) and more people will see it, so more people may help you. – furas Oct 31 '21 at 15:26
  • Welcome to stackoverflow. Here is a piece of CONSTRUCTIVE criticism. Do the following thing. Do ```print(df.head(40))``` and paste the result in your question. Don't forget to put it between ``` ``` so it becomes readable. Paste code you've tried in the same way. Your first experience with SO should be a pleasant one. It is also stated that we should be kind to newbies. – Serge de Gosson de Varennes Oct 31 '21 at 16:51

1 Answer

I would try something like this:

proportion_of_augusts_above_20 = (df_temp1['aug'] >= 20).mean()
proportion_of_januaries_below_2 = (df_temp1['jan'] <= 2).mean()

This calculates it in two steps. First, df_temp1['aug'] >= 20 creates a boolean Series, with True for months at or above 20 and False for months below it.

Then, mean() treats True and False as 1 and 0. The average of these values is the fraction of months that fulfill the criterion; multiply it by 100 to get a percentage.
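The same idea extends to the combined conditions asked about in the question. Below is a minimal sketch, assuming made-up sample values for df_temp1 (the real data is not shown in the question); the names warm_aug, cold_jan, p_both and p_either are only illustrative. Boolean Series can be combined element-wise with & (AND) and | (OR) before taking the mean.

```python
import pandas as pd

# Hypothetical sample data; the real df_temp1 is not shown in the question.
df_temp1 = pd.DataFrame({
    "aug": [21.0, 19.5, 22.3, 18.0, 20.0],
    "jan": [1.5, 3.0, 2.0, -0.5, 4.2],
})

# Step 1: boolean Series, one True/False per row (year).
warm_aug = df_temp1["aug"] >= 20   # True, False, True, False, True
cold_jan = df_temp1["jan"] <= 2    # True, False, True, True, False

# Step 2: the mean of a boolean Series is the fraction of True values.
p_aug = warm_aug.mean()                  # 3 of 5 rows
p_jan = cold_jan.mean()                  # 3 of 5 rows

# Combine conditions element-wise before averaging.
p_both = (warm_aug & cold_jan).mean()    # AND: 2 of 5 rows
p_either = (warm_aug | cold_jan).mean()  # OR: 4 of 5 rows

# Multiply by 100 to report percentages.
print(f"aug >= 20: {p_aug * 100:.1f}%")
print(f"jan <= 2: {p_jan * 100:.1f}%")
print(f"both: {p_both * 100:.1f}%")
print(f"either: {p_either * 100:.1f}%")
```

Note that the parentheses around each comparison matter if the conditions are written inline, e.g. ((df_temp1["aug"] >= 20) & (df_temp1["jan"] <= 2)).mean(), because & and | bind more tightly than >= and <= in Python.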

As an aside, I would recommend posting your data in a question, which allows people answering to check whether their solution works.

Nick ODell