Given a dataset df as follows:
    variable_name       date  pred_value  real_value
0          import  2022/3/31     2721.80     2736.20
1          import  2022/3/31     2721.80     2736.20
2          import  2022/3/31     2705.50     2736.20
3          import  2022/3/31     2500.00     2736.20
4          import  2022/3/31     2900.05     2736.20
5          import  2022/4/30     2795.66     2759.98
6          import  2022/4/30     2694.45     2759.98
7          import  2022/4/30     2855.36     2759.98
8          import  2022/4/30     2300.00     2759.98
9             GDP  2022/3/31        1.13        1.10
10            GDP  2022/3/31        1.13        1.10
11            GDP  2022/3/31        1.17        1.10
12            GDP  2022/3/31        0.91        1.10
13            GDP  2022/4/30        1.29        1.30
14            GDP  2022/4/30        1.29        1.30
15            GDP  2022/4/30        1.28        1.30
16            GDP  2022/4/30        1.50        1.30
Code:
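import pandas as pd
import seaborn as sns
from matplotlib.dates import MonthLocator, DateFormatter

df['date'] = pd.to_datetime(df['date'])
# Reshape to long format so pred_value and real_value share one 'value' column
dfm = df.melt(id_vars=['variable_name', 'date'])
p = sns.relplot(kind='scatter', data=dfm, x='date', y='value', row='variable_name', height=4,
                aspect=2.5, hue='variable',
                palette=['tab:blue', 'tab:red'],
                alpha=0.5,
                facet_kws={'sharey': False, 'sharex': True})
p.set_xticklabels(rotation=30)
# Commented out: p is a FacetGrid, which has no xaxis attribute
# p.xaxis.set_major_locator(MonthLocator())
# p.xaxis.set_major_formatter(DateFormatter('%Y-%m'))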
Out:
As you can see, there are three issues in the output plot that I would like to fix:
1. The x-axis labels are daily; I would like them to be monthly, like the original dates in df.
2. Each subplot should have its own x-axis labels, since with many variables in df it becomes hard to read the dates.
3. For each month's pred_value of a variable, duplicated values produce scatter points that are drawn on top of each other. Is it possible to place them side by side horizontally? For example, on 2022-03-31 GDP has two pred_values of 1.13, so I need to plot two points next to each other, a bit like the effect below:
How can I modify the code to achieve this? Thanks.
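To make the three requirements concrete, here is a minimal sketch of one possible direction, not a definitive solution. It assumes that shifting repeated pred_value points by a few days along the x-axis is acceptable; `offset_days`, `dup_rank` and `date_plot` are hypothetical names introduced only for this illustration, and the sample is a subset of the rows shown above. The tick settings are applied to each Axes of the FacetGrid rather than to the grid itself.

```python
import pandas as pd
import seaborn as sns
import matplotlib.dates as mdates

# Subset of the rows shown above, rebuilt so the sketch is self-contained
df = pd.DataFrame({
    'variable_name': ['import'] * 4 + ['GDP'] * 4,
    'date': ['2022/3/31', '2022/3/31', '2022/4/30', '2022/4/30'] * 2,
    'pred_value': [2721.80, 2721.80, 2795.66, 2694.45, 1.13, 1.13, 1.29, 1.29],
    'real_value': [2736.20, 2736.20, 2759.98, 2759.98, 1.10, 1.10, 1.30, 1.30],
})
df['date'] = pd.to_datetime(df['date'])
dfm = df.melt(id_vars=['variable_name', 'date'])

# Issue 3: number the repeats of each identical point (0 for the first, 1 for the next, ...)
keys = ['variable_name', 'date', 'variable', 'value']
dup_rank = dfm.groupby(keys).cumcount()
# Assumption: only pred_value points are spread out; real_value repeats stay stacked
dup_rank = dup_rank.where(dfm['variable'] == 'pred_value', 0)

offset_days = 2  # hypothetical spacing between repeated points, tune to taste
dfm['date_plot'] = dfm['date'] + pd.to_timedelta(dup_rank * offset_days, unit='D')

p = sns.relplot(kind='scatter', data=dfm, x='date_plot', y='value',
                row='variable_name', hue='variable', height=4, aspect=2.5,
                palette=['tab:blue', 'tab:red'], alpha=0.5,
                facet_kws={'sharey': False, 'sharex': True})

for ax in p.axes.flat:
    # Issue 1: one tick per month, placed on the last day of the month
    ax.xaxis.set_major_locator(mdates.MonthLocator(bymonthday=-1))
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
    # Issue 2: show the x tick labels on every subplot, not just the bottom one
    ax.tick_params(axis='x', labelbottom=True, labelrotation=30)
```

The shifted dates are used only for plotting, so the original `date` column is left untouched. The trade-off is that a shifted point no longer sits exactly on its true date, so `offset_days` should stay small relative to the gap between months.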
References:
How to format the y- or x-axis labels in a seaborn FacetGrid