I asked a similar question earlier, but the task has become more complicated: now I need to use a weighted average. The average is calculated the same way as an exponentially weighted mean (i.e. .ewm() with adjust=True), only over a fixed window, for example 3. The closer a point is to the NaN, the greater its weight, i.e. among the points to the left of the gap the one nearest to it gets the largest weight, and likewise for the points to the right.
Code to reproduce the dataframe:
import numpy as np
import pandas as pd

nan = np.nan
d = {'group': {0: 1,
1: 1,
2: 1,
3: 1,
4: 1,
5: 1,
6: 1,
7: 1,
8: 1,
9: 1,
10: 1,
11: 1,
12: 2,
13: 2,
14: 2,
15: 2,
16: 2,
17: 2,
18: 2,
19: 2,
20: 2,
21: 2,
22: 2,
23: 2},
'value': {0: nan,
1: 1.0,
2: 2.0,
3: 3.0,
4: 4.0,
5: nan,
6: nan,
7: 3.0,
8: 6.0,
9: 4.0,
10: 3.0,
11: nan,
12: nan,
13: nan,
14: 1.0,
15: 2.0,
16: 3.0,
17: 4.0,
18: nan,
19: nan,
20: nan,
21: 6.0,
22: 8.0,
23: 9.0}}
df = pd.DataFrame(d)
A function that returns a weighted average:
def get_avg(x, inverse=False):
    # weights decay by a factor of 0.8 per step: [1.0, 0.8, 0.64, ...]
    w = np.array([(1 - 0.2)**i for i in range(len(x))])
    if inverse:
        # reverse the weights so the last element of x gets the largest weight
        w = w[::-1]
    x_w = w * x
    return np.sum(x_w) / np.sum(w)
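For a window of 3 the weights come out as [1.0, 0.8, 0.64], which (as far as I understand) is the same scheme .ewm(alpha=0.2, adjust=True) would use, just cut off after the window:

# weights as produced inside get_avg for a window of 3
print(np.array([(1 - 0.2)**i for i in range(3)]))        # [1.   0.8  0.64]
print(np.array([(1 - 0.2)**i for i in range(3)])[::-1])  # [0.64 0.8  1.  ]  (inverse=True)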
Examples of the calculation.
For index 0 there are only values on the right, so get_avg(np.array([1, 2, 3]), inverse=True) returns 2.1475, and that becomes the value for the missing entry.
For indexes 5 and 6, the left side gives get_avg(np.array([2, 3, 4]), inverse=True) = 3.1475 and the right side gives get_avg(np.array([3, 6, 4]), inverse=False) = 4.2459; after linear interpolation between the two, the missing values are [3.5136612, 3.87978142].
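To make the interpolation step explicit, the calculation for indexes 5 and 6 can be reproduced like this (np.linspace is just one way to do the linear interpolation):

left = get_avg(np.array([2, 3, 4]), inverse=True)    # 3.1475...
right = get_avg(np.array([3, 6, 4]), inverse=False)  # 4.2459...
# the two interior points between the one-sided averages
print(np.linspace(left, right, 4)[1:-1])             # [3.5136612  3.87978142]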
This processing takes place within each group.
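Here is a rough sketch of how I picture the per-group processing (fill_group and the gap-scanning loop are my own, not necessarily the idiomatic way; one-sided gaps at the edges of a group follow the "closest point gets the largest weight" rule):

def fill_group(s, window=3):
    # fill the NaN runs of one group's 'value' series
    values = s.to_numpy(dtype=float).copy()
    isna = np.isnan(values)
    i, n = 0, len(values)
    while i < n:
        if not isna[i]:
            i += 1
            continue
        # find the end of the current run of NaNs
        j = i
        while j < n and isna[j]:
            j += 1
        left = values[max(0, i - window):i]   # up to `window` values before the gap
        right = values[j:j + window]          # up to `window` values after the gap
        if len(left) and len(right):
            # both sides available: interpolate between the two one-sided averages
            lo = get_avg(left, inverse=True)
            hi = get_avg(right, inverse=False)
            values[i:j] = np.linspace(lo, hi, j - i + 2)[1:-1]
        elif len(right):
            # only values after the gap (start of the group)
            values[i:j] = get_avg(right, inverse=False)
        else:
            # only values before the gap (end of the group)
            values[i:j] = get_avg(left, inverse=True)
        i = j
    return pd.Series(values, index=s.index)

df['value'] = df.groupby('group')['value'].transform(fill_group)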
Expected result:
nan = np.nan
d = {'group': {0: 1,
1: 1,
2: 1,
3: 1,
4: 1,
5: 1,
6: 1,
7: 1,
8: 1,
9: 1,
10: 1,
11: 1,
12: 2,
13: 2,
14: 2,
15: 2,
16: 2,
17: 2,
18: 2,
19: 2,
20: 2,
21: 2,
22: 2,
23: 2},
'value': {0: 2.1475409,
1: 1.0,
2: 2.0,
3: 3.0,
4: 4.0,
5: 3.5136612,
6: 3.87978142,
7: 3.0,
8: 6.0,
9: 4.0,
10: 3.0,
11: 4.1147540,
12: 1.852459,
13: 1.852459,
14: 1.0,
15: 2.0,
16: 3.0,
17: 4.0,
18: 4.22131148,
19: 5.29508197,
20: 6.36885246,
21: 6.0,
22: 8.0,
23: 9.0}}
df = pd.DataFrame(d)