I want the weighted mean of my dv, velocity.
In this scenario, velocity is a derived (interpolated) measure computed from repeated measurements of randomly sampled speeds within a given region. In the real data (though not in this sample), a fixed number of velocity values is always derived from a smaller number of speed measurements.
In the sample data, there are up to 10 dv values for each unique combination of id, timepoint, and direction, and those values can be derived from as few as 3 speed measurements.
As such, dv values derived from more speed samples are more accurate. This approach is standard and has been validated in my field.
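Written out, a weighted mean of this kind, assuming each velocity value $v_i$ is weighted by the number of speed measurements $n_i$ (the `n_speed` of its id x timepoint x direction cell) it was derived from, would be:

```latex
\bar{v}_w = \frac{\sum_{i} n_i \, v_i}{\sum_{i} n_i}
```

Using $n_i$ directly as the weight is an assumption for illustration; any weight proportional to $n_i$ gives the same $\bar{v}_w$.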
Sample data:
```r
> head(dat)
dv id intervention timepoint direction region
1 0.7708878 subj_1 ctrl t1 long healthy
2 0.9193373 subj_1 ctrl t1 long healthy
3 1.0000385 subj_1 ctrl t1 long healthy
4 0.6570246 subj_1 ctrl t1 long healthy
5 0.9345068 subj_1 ctrl t1 long healthy
6 0.9421999 subj_1 ctrl t1 long healthy
```
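The rows above show the layout of `dat` but not its full contents. A hypothetical stand-in with the same columns (8 id x timepoint x direction cells x 10 rows = 80 rows) can be simulated in base R; every value here is made up, and the choices that intervention is fixed per subject while region varies within a cell are assumptions read off the printed rows.

```r
# Hypothetical stand-in for `dat`: simulated values, NOT the real data.
set.seed(42)

# One row per id x timepoint x direction cell, in the same order as speed_measures
cells <- expand.grid(
  direction = c("long", "lat"),
  timepoint = c("t1", "t2"),
  id = c("subj_1", "subj_2"),
  stringsAsFactors = FALSE
)

# Ten dv rows per cell (8 cells -> 80 rows)
dat <- cells[rep(seq_len(nrow(cells)), each = 10), c("id", "timepoint", "direction")]
rownames(dat) <- NULL

# Assumptions: intervention is constant within subject, region varies within a cell
dat$intervention <- ifelse(dat$id == "subj_1", "ctrl", "trt")
dat$region <- sample(c("healthy", "damaged"), nrow(dat), replace = TRUE)
dat$dv <- runif(nrow(dat), min = 0.3, max = 1.1)

# Same column order as the head(dat) output
dat <- dat[, c("dv", "id", "intervention", "timepoint", "direction", "region")]
head(dat)
```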
and the number of speed measurements behind each id x timepoint x direction combination:
```r
speed_measures <- structure(list(n_speed = c(7, 6, 5, 4, 7, 6, 4, 9), id = c("subj_1",
"subj_1", "subj_1", "subj_1", "subj_2", "subj_2", "subj_2", "subj_2"
), timepoint = c("t1", "t1", "t2", "t2", "t1", "t1", "t2", "t2"
), direction = c("long", "lat", "long", "lat", "long", "lat",
"long", "lat")), class = "data.frame", row.names = c(NA, -8L))
```

```r
> speed_measures
n_speed id timepoint direction
1 7 subj_1 t1 long
2 6 subj_1 t1 lat
3 5 subj_1 t2 long
4 4 subj_1 t2 lat
5 7 subj_2 t1 long
6 6 subj_2 t1 lat
7 4 subj_2 t2 long
8 9 subj_2 t2 lat
```
Here, for subj_1 x t2 x lat, all dv values were derived from 4 speed measurements, whereas for subj_2 x t2 x lat they were derived from 9. When I eventually compute the estimated marginal means for t2 x lat, subj_2 should therefore have more influence on the mean than subj_1.
Joining the two tables attaches to every dv row the number of speed measurements behind its id x timepoint x direction combination. region and intervention are additional factors, but every speed measurement and derived velocity is defined at the id x timepoint x direction level.
Only 70 of the 80 rows are kept, to simulate the imbalance in the actual data.
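With weights proportional to n_speed, the t2 x lat cell gives subj_2 a share of 9/13 (about 0.69) and subj_1 a share of 4/13 (about 0.31). A base-R sketch of that arithmetic, using hypothetical per-subject mean dv values (they are not from the data), and computed as a plain weighted mean rather than a model-based marginal mean:

```r
# n_speed for the t2 x lat cell, taken from speed_measures
n_speed <- c(subj_1 = 4, subj_2 = 9)

# Hypothetical mean dv per subject in that cell (illustrative values only)
cell_dv <- c(subj_1 = 0.60, subj_2 = 0.80)

mean(cell_dv)                        # unweighted: both subjects count equally
weighted.mean(cell_dv, w = n_speed)  # (4 * 0.60 + 9 * 0.80) / 13
n_speed / sum(n_speed)               # share of the weight per subject
```

The weighted value is pulled toward subj_2 because that subject's dv values rest on more speed measurements.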
```r
library(dplyr)
dat_combined <- speed_measures |> left_join(dat, by = c("id", "timepoint", "direction")) |> slice_sample(n = 70)
```
```r
> head(dat_combined)
n_speed id timepoint direction dv intervention region
1 7 subj_1 t1 long 0.7708878 ctrl healthy
2 7 subj_1 t1 long 0.9193373 ctrl healthy
3 7 subj_1 t1 long 1.0000385 ctrl healthy
4 7 subj_1 t1 long 0.6570246 ctrl healthy
5 7 subj_1 t1 long 0.9345068 ctrl healthy
6 7 subj_1 t1 long 0.9421999 ctrl healthy
```
The ultimate goal is to estimate velocity changes for the three-way interaction of timepoint, intervention, and region, controlling for (averaging over) direction with emmeans.
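For context, emmeans averages over factors not named in the specs (direction here). Its default, `weights = "equal"`, counts each direction level equally, while `weights = "proportional"` weights levels by their frequency in the data. As far as I understand, these weights only control the averaging over direction; they do not by themselves change how much each subject counts. The arithmetic, with hypothetical predicted means and row counts for one timepoint x intervention x region cell:

```r
# Hypothetical model-predicted dv for each direction within one cell
pred <- c(long = 0.70, lat = 0.50)

# Hypothetical number of rows per direction in the data
n_rows <- c(long = 30, lat = 40)

mean(pred)                       # equal weighting over direction
weighted.mean(pred, w = n_rows)  # proportional weighting: (30 * 0.70 + 40 * 0.50) / 70
```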
I built the following mixed-effects model with lme4:
```r
library(lme4)
fit <- lmer(dv ~ intervention * timepoint * region + direction + (1 | id), data = dat_combined)
```
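`lmer()` accepts a `weights` argument of prior weights; in a Gaussian model these behave as precision weights, so a row with weight w is given residual variance sigma^2 / w. A base-R check of what such weights do, using `lm()` on hypothetical values: an intercept-only weighted fit returns exactly the weighted mean.

```r
# Hypothetical dv values and per-row weights (not from the data)
dv <- c(0.62, 0.71, 0.80, 0.55, 0.90)
w  <- c(4, 4, 9, 9, 9)

# Weighted least squares with only an intercept
fit <- lm(dv ~ 1, weights = w)

unname(coef(fit))     # intercept of the weighted fit
weighted.mean(dv, w)  # same value
```

In the mixed model, the analogous call would pass a weight column, e.g. `lmer(dv ~ intervention * timepoint * region + direction + (1 | id), data = dat_weighted, weights = num_speed_per_velocity)`. Whether precision weights are the weighting intended by the field's method is an assumption worth checking.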
This was my initial attempt at deriving weights:
```r
library(dplyr)
dat_weighted <- dat_combined |>
  group_by(id, timepoint, direction) |>
  mutate(length_dv = n(),  # number of dv rows kept in this cell
         num_speed_per_velocity = n_speed / length_dv)  # speed measurements per dv value
```
```r
> head(dat_weighted)
# A tibble: 6 × 9
# Groups:   id, timepoint, direction [5]
n_speed id timepoint direction dv intervention region length_dv num_speed_per_velocity
<dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <int> <dbl>
1 4 subj_2 t2 long 0.627 trt damaged 9 0.444
2 6 subj_1 t1 lat 0.508 ctrl damaged 9 0.667
3 6 subj_1 t1 lat 0.703 ctrl healthy 9 0.667
4 5 subj_1 t2 long 0.748 ctrl healthy 8 0.625
5 6 subj_2 t1 lat 0.326 trt damaged 9 0.667
6 9 subj_2 t2 lat 0.589 trt damaged 8 1.12
```
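One property of `num_speed_per_velocity = n_speed / length_dv`: summed over the dv rows of a cell, these weights give back n_speed, so each id x timepoint x direction cell carries a total weight equal to its number of speed measurements, however many dv rows survived subsampling. A base-R sketch with `ave()` on a hypothetical toy table; the rescaling to mean 1 at the end is an optional convention, not something the original method requires.

```r
# Toy data (hypothetical): two cells with different n_speed and different row counts
toy <- data.frame(
  id        = c("subj_1", "subj_1", "subj_1", "subj_2", "subj_2"),
  timepoint = "t2",
  direction = "lat",
  n_speed   = c(4, 4, 4, 9, 9),
  dv        = c(0.55, 0.60, 0.65, 0.80, 0.85)
)

# One factor level per id x timepoint x direction cell
cell <- interaction(toy$id, toy$timepoint, toy$direction, drop = TRUE)

# Number of dv rows in each cell (same as length(dv) per group)
toy$length_dv <- ave(toy$dv, cell, FUN = length)

# Per-row weight: speed measurements per dv value
toy$w <- toy$n_speed / toy$length_dv

# Total weight per cell equals that cell's n_speed (4 and 9 here)
tapply(toy$w, cell, sum)

# Optional rescaling so the weights average 1; a weighted mean is unchanged
toy$w_norm <- toy$w / mean(toy$w)
```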
Looking forward to your input!