I have the following dataset:
df
datetime_time price
2008-01-01 97.414384
2008-01-02 105.022374
2008-01-03 102.897246
etc...
2011-12-29 100.414384
2011-12-30 102.201384
2011-12-31 92.467231
I already wrote a function that takes the n largest prices of each year and returns the dates that appear in more than one year (for example, if the 10th of January is among the highest prices in both 2008 and 2010, it will be returned and plotted).
Code (could be optimized but it works just fine):
import pandas as pd
import plotly.express as px  # assumed: px refers to plotly.express

def find_same_days(df, price):
    # For each year, collect the month/day labels of the 100 highest prices
    lst = []
    lst.append(["2008", list(df["2008-01-01":"2008-12-31"][price].nlargest(n=100).index.strftime('%m/%d'))])
    lst.append(["2009", list(df["2009-01-01":"2009-12-31"][price].nlargest(n=100).index.strftime('%m/%d'))])
    lst.append(["2010", list(df["2010-01-01":"2010-12-31"][price].nlargest(n=100).index.strftime('%m/%d'))])
    lst.append(["2011", list(df["2011-01-01":"2011-12-31"][price].nlargest(n=100).index.strftime('%m/%d'))])
    lst.append(["2012", list(df["2012-01-01":"2012-12-31"][price].nlargest(n=100).index.strftime('%m/%d'))])
    df_merge = pd.DataFrame(lst, columns=["year", "best_change"])
    # Flatten to one row per (year, month/day) pair
    year = []
    value = []
    for i, row in df_merge.iterrows():
        for j, val in enumerate(row['best_change']):
            year.append(row['year'])
            value.append(val)
    df_new = pd.DataFrame({'year': year, 'best_change': value})
    # Keep only the month/day labels that occur in more than one year
    df_com = df_new[df_new.duplicated(subset=['best_change'], keep=False)]
    fig = px.histogram(df_com, x="best_change", color="year")
    fig.show()
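For reference, the same logic can probably be written more compactly by grouping on the year of the index instead of slicing each year by hand. This is only a sketch: it assumes df has a DatetimeIndex, the name top_days_by_year and the n parameter are made up for illustration, and it returns the table instead of plotting it.

```python
import pandas as pd

def top_days_by_year(df, price, n=100):
    """Return the month/day labels that are among the n highest prices
    in more than one year (hypothetical helper, assumes a DatetimeIndex)."""
    rows = []
    # One group per calendar year found in the index
    for year, grp in df.groupby(df.index.year):
        top = grp[price].nlargest(n)
        rows.append(pd.DataFrame({
            "year": str(year),
            "best_change": top.index.strftime("%m/%d"),
        }))
    df_new = pd.concat(rows, ignore_index=True)
    # A date is unique within a year, so a duplicated label means
    # it showed up in at least two different years
    return df_new[df_new.duplicated(subset=["best_change"], keep=False)]
```

The result has the same year and best_change columns as df_com above, so it could be passed to px.histogram in the same way.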
However, this does not take into account that a given weekday position falls on a different date each year: the first Monday of January, for example, was the 7th in 2008 and the 5th in 2009. I would like to check for this kind of correlation in my dataset.
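To make the shift concrete, here is a small check of where the first Monday of January falls in each year. It is a sketch only; first_mondays_of_january is a name made up for this post.

```python
import pandas as pd

def first_mondays_of_january(years):
    """Map each year to the date of its first Monday in January."""
    out = {}
    for year in years:
        # The first seven days of a month contain every weekday exactly once
        days = pd.date_range(f"{year}-01-01", f"{year}-01-07")
        out[year] = days[days.dayofweek == 0][0]  # dayofweek 0 is Monday
    return out

mondays = first_mondays_of_january([2008, 2009])
print(mondays[2008].date(), mondays[2009].date())  # 2008-01-07 2009-01-05
```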
I don't know if there is an easier way to do this in Python, but my goal is to add a column to my pd.DataFrame that gives, for each day, its position in the month (the first Monday of January would be 1-Mon-Jan, the second 2-Mon-Jan, etc.).
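To show the format I have in mind, here is a rough sketch of how such a label could be built from the day of the month. It assumes df has a DatetimeIndex; add_month_position and the month_position column are names made up for illustration, and I am not sure this is the idiomatic way to do it.

```python
import pandas as pd

def add_month_position(df):
    """Return a copy of df with a label such as '1-Mon-Jan' for each row
    (hypothetical helper, assumes df has a DatetimeIndex)."""
    out = df.copy()
    idx = out.index
    # n-th occurrence of the weekday in its month: days 1-7 -> 1, 8-14 -> 2, ...
    nth = (idx.day - 1) // 7 + 1
    # English names truncated to three letters, e.g. 'Monday' -> 'Mon'
    days = idx.day_name().str[:3]
    months = idx.month_name().str[:3]
    out["month_position"] = [
        f"{n}-{d}-{m}" for n, d, m in zip(nth, days, months)
    ]
    return out
```

With a column like this, the labels could be compared across years in place of the '%m/%d' strings used in find_same_days.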
I found several functions that return the date of the first Monday of a month (here for example), but I need the reverse: given a date, find its weekday position within the month. Is there an existing function that would solve this more easily?
Thanks a lot