
I've analyzed fish density data (log(x+1) transformed) to see what power I'll have to detect a 30% increase or decrease in density in future surveys, and for which species of interest. Using pwrss::power.t.test, I found I needed to supply something called a "non-centrality parameter" (ncp), and I suspect I'm not calculating it correctly. My results show that power can sometimes decrease even though sample size has increased. This doesn't make sense to me, so I'm assuming I've done something wrong. Can power decrease (are my calculations of the ncp correct), and are there any obvious mistakes here?

Data:

  • Season = time of year
  • assem = species
  • n = sample size
  • mean_log_den = mean log(density+1) of n sites worth of data
  • ln_up/ln_down = hypothetical increase/decrease in mean log(density+1) by 30%

My method:

I run the two tests below for n = 47, 94, 141, 188, and 235 (adding 47 sites, one season's worth of data, each time) to artificially boost the sample size (actually increasing n is limited by funding). The questions I'm asking: using n = 47 first (call it dry season 2023), what power do I have to detect a 30% change in density if I sample again in dry season 2024 (again, n = 47)? Not enough power? Then I combine the 2022 and 2023 dry season data and ask what power I would have two years from now (2024 + 2025 data). Do I have enough data to detect a change three years from now? Four years? Etc. The (ibbeam_table_log$ln_up*sqrt(ibbeam_table_log$n)) part, I found, gives me a power calculation for each of the 5 species and 2 seasons = 10 power calculations per sample size that I try (this was simply my attempt to make the process go faster).

# alpha = 0.025 for two tests
# power.t.test here is pwrss::power.t.test (it masks stats::power.t.test)
library(pwrss)

power_up_log <- power.t.test(ncp = ibbeam_table_log$ln_up * sqrt(ibbeam_table_log$n),
                             df = 46,  # df = 46 for the n = 47 run; I used df = n - 1 for the larger subsets
                             alpha = 0.025, alternative = "not equal", plot = FALSE)

power_down_log <- power.t.test(ncp = ibbeam_table_log$ln_down * sqrt(ibbeam_table_log$n),
                               df = 46, alpha = 0.025, alternative = "not equal", plot = FALSE)

# extract the power element rather than assigning the whole result object
ibbeam_table_log$pwr_plus30_log  <- power_up_log$power
ibbeam_table_log$pwr_minus30_log <- power_down_log$power

Plot (shows that one species, Gulf pipefish, actually has a drop in power even though sample size increases): [plot of power vs. sample size, by species and season]

My source (information within the plot) gives ncp = mu_2 * sqrt(n), where mu_2 is the hypothetical 30% increase or decrease in density that we hope to detect (with ≥ 80% power).
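As a sanity check, the same kind of power numbers can be reproduced in base R directly from the non-central t distribution. This is a sketch, assuming (as the question does) that the ln_up/ln_down values stand in for a standardized effect size (Cohen's d, i.e. the hypothesized change divided by the SD of log(density + 1)) and that the test is a two-sided one-sample t-test with df = n - 1:

```r
# Power of a two-sided one-sample t-test from ncp = d * sqrt(n),
# computed from the non-central t distribution (base R only).
# d is assumed to be a *standardized* effect (Cohen's d).
t_power <- function(d, n, alpha = 0.025) {
  df   <- n - 1
  ncp  <- d * sqrt(n)
  crit <- qt(1 - alpha / 2, df)
  # P(reject) = P(T' > crit) + P(T' < -crit), where T' is non-central t
  (1 - pt(crit, df, ncp)) + pt(-crit, df, ncp)
}

t_power(0.2646, 47)   # ~0.317, matching the pwr_minus30_log entry for n = 47
t_power(0.1344, 94)   # ~0.17: lower, despite the doubled sample size
```

Note that if d here is not standardized (if the raw change in mean log-density is plugged in without dividing by the SD), the ncp and hence the power values will be off by the unknown SD factor.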

Data:

    > dput(log)
structure(list(Season = c("DRY", "DRY", "DRY", "DRY", "DRY", 
"WET", "WET", "WET", "WET", "WET", "DRY", "DRY", "DRY", "DRY", 
"DRY", "WET", "WET", "WET", "WET", "WET", "DRY", "DRY", "DRY", 
"DRY", "DRY", "WET", "WET", "WET", "WET", "WET", "WET", "WET", 
"WET", "WET", "WET", "DRY", "DRY", "DRY", "DRY", "DRY", "DRY", 
"DRY", "DRY", "DRY", "DRY", "WET", "WET", "WET", "WET", "WET"
), assem = c("Far", "Goldspotted Killifish", "Gulf Pipefish", 
"Pal", "Rainwater Killifish", "Far", "Goldspotted Killifish", 
"Gulf Pipefish", "Pal", "Rainwater Killifish", "Far", "Goldspotted Killifish", 
"Gulf Pipefish", "Pal", "Rainwater Killifish", "Far", "Goldspotted Killifish", 
"Gulf Pipefish", "Pal", "Rainwater Killifish", "Far", "Goldspotted Killifish", 
"Gulf Pipefish", "Pal", "Rainwater Killifish", "Far", "Goldspotted Killifish", 
"Gulf Pipefish", "Pal", "Rainwater Killifish", "Far", "Goldspotted Killifish", 
"Gulf Pipefish", "Pal", "Rainwater Killifish", "Far", "Goldspotted Killifish", 
"Gulf Pipefish", "Pal", "Rainwater Killifish", "Far", "Goldspotted Killifish", 
"Gulf Pipefish", "Pal", "Rainwater Killifish", "Far", "Goldspotted Killifish", 
"Gulf Pipefish", "Pal", "Rainwater Killifish"), n = c(188L, 188L, 
188L, 188L, 188L, 188L, 188L, 188L, 188L, 188L, 235L, 235L, 235L, 
235L, 235L, 235L, 235L, 235L, 235L, 235L, 141L, 141L, 141L, 141L, 
141L, 141L, 141L, 141L, 141L, 141L, 47L, 47L, 47L, 47L, 47L, 
47L, 47L, 47L, 47L, 47L, 94L, 94L, 94L, 94L, 94L, 94L, 94L, 94L, 
94L, 94L), mean_log_den = c(0.471, 0.231, 0.301, 0.581, 1.366, 
0.272, 0.554, 0.112, 0.351, 1.237, 0.63, 0.237, 0.37, 0.603, 
1.409, 0.343, 0.532, 0.101, 0.353, 1.34, 0.342, 0.221, 0.216, 
0.473, 1.331, 0.223, 0.606, 0.107, 0.373, 1.25, 0.192, 0.629, 
0.142, 0.375, 0.97, 0.38, 0.25, 0.378, 0.434, 1.737, 0.276, 0.216, 
0.192, 0.404, 1.457, 0.184, 0.579, 0.117, 0.374, 1.263), ln_up = c(0.6123, 
0.3003, 0.3913, 0.7553, 1.7758, 0.3536, 0.7202, 0.1456, 0.4563, 
1.6081, 0.819, 0.3081, 0.481, 0.7839, 1.8317, 0.4459, 0.6916, 
0.1313, 0.4589, 1.742, 0.4446, 0.2873, 0.2808, 0.6149, 1.7303, 
0.2899, 0.7878, 0.1391, 0.4849, 1.625, 0.2496, 0.8177, 0.1846, 
0.4875, 1.261, 0.494, 0.325, 0.4914, 0.5642, 2.2581, 0.3588, 
0.2808, 0.2496, 0.5252, 1.8941, 0.2392, 0.7527, 0.1521, 0.4862, 
1.6419), ln_down = c(0.3297, 0.1617, 0.2107, 0.4067, 0.9562, 
0.1904, 0.3878, 0.0784, 0.2457, 0.8659, 0.441, 0.1659, 0.259, 
0.4221, 0.9863, 0.2401, 0.3724, 0.0707, 0.2471, 0.938, 0.2394, 
0.1547, 0.1512, 0.3311, 0.9317, 0.1561, 0.4242, 0.0749, 0.2611, 
0.875, 0.1344, 0.4403, 0.0994, 0.2625, 0.679, 0.266, 0.175, 0.2646, 
0.3038, 1.2159, 0.1932, 0.1512, 0.1344, 0.2828, 1.0199, 0.1288, 
0.4053, 0.0819, 0.2618, 0.8841), pwr_plus30_log = c(0.999999999, 
0.96772348, 0.998990992, 1, 1, 0.994976086, 1, 0.398045507, 0.999964469, 
1, 1, 0.992979772, 0.999999823, 1, 1, 0.999997407, 1, 0.405403699, 
0.999999009, 1, 0.998605925, 0.872729631, 0.856087956, 0.999999704, 
1, 0.878995969, 1, 0.272814915, 0.999734214, 1, 0.282095586, 
0.999330182, 0.156470172, 0.843434601, 1, 0.853594832, 0.470449384, 
0.84958515, 0.935692326, 1, 0.882990675, 0.671554282, 0.55798027, 
0.997290751, 1, 0.51850308, 0.99999964, 0.215867708, 0.991983902, 
1), pwr_minus30_log = c(0.98773456, 0.484379053, 0.735043484, 
0.999511868, 1, 0.637461744, 0.998816741, 0.120737331, 0.86537357, 
1, 0.99999631, 0.613380068, 0.956146549, 0.999986314, 1, 0.922117091, 
0.999705432, 0.122784707, 0.936517997, 1, 0.717613612, 0.336935564, 
0.322036093, 0.951012992, 1, 0.342969698, 0.997021704, 0.087815708, 
0.797042823, 1, 0.090177931, 0.756224614, 0.059082879, 0.311828281, 
0.988816152, 0.320124089, 0.141701672, 0.316794624, 0.414714267, 
0.999999998, 0.346987146, 0.21335981, 0.169775598, 0.678437659, 
1, 0.156637885, 0.94894302, 0.073661564, 0.603538515, 1), pwr2 = c(98.773456, 
48.4379053, 73.5043484, 99.9511868, 100, 63.7461744, 99.8816741, 
12.0737331, 86.537357, 100, 99.999631, 61.3380068, 95.6146549, 
99.9986314, 100, 92.2117091, 99.9705432, 12.2784707, 93.6517997, 
100, 71.7613612, 33.6935564, 32.2036093, 95.1012992, 100, 34.2969698, 
99.7021704, 8.7815708, 79.7042823, 100, 9.0177931, 75.6224614, 
5.9082879, 31.1828281, 98.8816152, 32.0124089, 14.1701672, 31.6794624, 
41.4714267, 99.9999998, 34.6987146, 21.335981, 16.9775598, 67.8437659, 
100, 15.6637885, 94.894302, 7.3661564, 60.3538515, 100)), row.names = c(NA, 
-50L), class = "data.frame")
Nate
  • A power analysis is done during study planning, so the question is rather hypothetical. – Michael M Oct 05 '23 at 21:39
  • @MichaelM, Is it possible to take the perspective of "ok, this was the density we observed in the past, now can we calculate power for future studies with the same parameters and increased n"? – Nate Oct 05 '23 at 21:43
  • Why are you interested in computing power for a t-test with 46 degrees of freedom? – dipetkov Oct 05 '23 at 22:03
  • @dipetkov, Each set of (2) points on the plot (by species and sample size) is a single power analysis result I did (the first was the dataset when n=47 sites and I thought degrees of freedom was n-1). I repeated this on subsets of the data where n=94 (df =93), n=141 (df=140), etc... Using the R code in the way I did, it spits out 10 values of power at a time (2 seasons x 5 species). I then aggregated those results and plotted them. Does this make sense? – Nate Oct 05 '23 at 22:10
  • Is a t-test the wrong test? Am I technically doing 50 tests (5 sample sizes, 5 species, 2 seasons)? – Nate Oct 05 '23 at 22:18
  • To me it's not clear what data you collect and what analysis you do. Do you want to detect a change of 30% between two consecutive seasons, wet and dry? (I also have some doubts about the transformation log(x+1) and why apply the 30% to the transformed value and not to x.) So I don't understand whether a t-test makes sense. It could be helpful to explain the details. – dipetkov Oct 05 '23 at 22:51
  • My goal is to see for which species can I detect an increase or decrease in density (30% as an example). It's likely this only works for really abundant species because 30% changes of higher numbers are easier to detect. The project has a sample size problem though. Due to funding, we can't physically sample more sites, but what if we combine data; density data from 2023 dry season + 2022 dry season into a single dataset (2 years), or 2023+2022+2021, (3 years) etc.. and ask, what power do we have to detect a 30% change in density 2, 3, 4, 5, years from now? – Nate Oct 05 '23 at 23:16
  • I want to know what power I would have to detect a change in density between 2023 dry season to 2024 dry season (that's one scenario). Is that not enough power? What if I combined 2022+2023 and asked the same question for 2024+2025 dry season density (like-with-like for seasons). Applying the 30% change on the transformed data was a guess, I thought that was the correct way to do it - I'm open to suggestions! I chose the log(density+1) transform because of the high number of zeros and large variance in each sample (catches are often "0"'s and or 100+). – Nate Oct 05 '23 at 23:24
  • There are challenges in combining data across years: the true density may be varying from year to year (and season to season). Have you considered simulating data to estimate power? Simulation works well for more complex situations and it has other benefits: setting up the simulation may help you to clarify how to do the analysis as well. – dipetkov Oct 06 '23 at 07:14

1 Answer


To answer the question in the title (besides all the problems raised in the comment section): yes, power can decrease while the sample size increases, if at the same time you reduce the effect size you want to detect by enough. That is what you did in your calculations, and it is the source of the drop in power you observed.

As a side note: another circumstance where power can decrease with an increase in sample size is when we're dealing with discrete random variables. In this case, we can have small fluctuations in power as the sample size increases. The large drop in power observed in the question does not come from that, so I don't address this "small fluctuations" problem in this answer. However, if you're interested in this problem and where it can arise, you can refer to: Chernick, M. R., & Liu, C. Y. (2002). The Saw-Toothed Behavior of Power Versus Sample Size and Software Solutions. The American Statistician, 56(2), 149–155. https://doi.org/10.1198/000313002317572835.
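To illustrate that saw-toothed behavior (which, again, is not what is happening in the question), here is a small sketch with an exact one-sided binomial test. The attainable significance level jumps around as n grows, so adding an observation can force a stricter critical value and make power dip. The p0, p1, and alpha values below are arbitrary choices for illustration:

```r
# Saw-toothed power of an exact one-sided binomial test (H0: p = 0.5 vs p > 0.5),
# illustrating Chernick & Liu (2002): power need not be monotone in n.
binom_power <- function(n, p0 = 0.5, p1 = 0.75, alpha = 0.05) {
  # smallest critical value k with P(X >= k | p0) <= alpha
  tail_probs <- 1 - pbinom(0:n - 1, n, p0)   # P(X >= k) for k = 0, ..., n
  k <- min(which(tail_probs <= alpha)) - 1
  1 - pbinom(k - 1, n, p1)                   # power: P(X >= k | p1)
}
sapply(10:20, binom_power)
# e.g. power drops between n = 11 and n = 12 because the critical value jumps from 9 to 10
```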

Now, coming back to the original question: why did you see this drop in power?

The effect size usually used for power calculations for a one-sample t-test is Cohen's d.

In your code, you used the ln_down variable as Cohen's d. For the case where you checked the power for a sample size of 47, you defined $d = 0.2646$, and for the case where you checked the power for a sample size of 94, you used $d = 0.133$. Increasing the sample size did not make up for the decreased effect size.
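The arithmetic makes this concrete: the non-centrality parameter ncp = d * sqrt(n) actually shrank between the two scenarios, because halving d while doubling n scales ncp by sqrt(2)/2 (roughly 0.71):

```r
# ncp = d * sqrt(n) for the two scenarios in the question
ncp_47 <- 0.2646 * sqrt(47)   # ~1.81
ncp_94 <- 0.1330 * sqrt(94)   # ~1.29
ncp_47 > ncp_94               # TRUE: the n = 47 scenario has the larger ncp
# in general, (d/2) * sqrt(2*n) == d * sqrt(n) * sqrt(2)/2, so ncp shrinks
```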

Using another R library (pwr), here is what you did:

library(pwr)
pwr.t.test(n=94, d = 0.133, sig.level = 0.025, 
            type =  "one.sample",
            alternative = "two.sided")

 One-sample t test power calculation 

          n = 94
          d = 0.133
  sig.level = 0.025
      power = 0.1664253
alternative = two.sided

pwr.t.test(n = 47, d = 0.2646, sig.level = 0.025, 
            type = "one.sample",
            alternative = "two.sided")

 One-sample t test power calculation 

          n = 47
          d = 0.2646
  sig.level = 0.025
      power = 0.3167946
alternative = two.sided
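By contrast, if the effect size is held fixed and only n increases, power for this test rises monotonically. A quick check using the same non-central t formula that pwr.t.test uses under the hood (base R, so no extra packages; d = 0.2646 is the n = 47 effect from above):

```r
# With d held fixed, power is monotone increasing in n for the one-sample t-test.
power_fixed_d <- function(n, d = 0.2646, alpha = 0.025) {
  df   <- n - 1
  crit <- qt(1 - alpha / 2, df)
  (1 - pt(crit, df, d * sqrt(n))) + pt(-crit, df, d * sqrt(n))
}
sapply(c(47, 94, 141, 188, 235), power_fixed_d)
# rises steadily from ~0.32 at n = 47, approaching 1 as n grows
```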

J-J-J
  • I see, thank you! Also, am I technically doing a short-hand version of 100 power tests (2 seasons, 5 species, 5 scenarios, 2 possible effects) and should lower alpha even further? Maybe to 0.0005? – Nate Oct 06 '23 at 11:37
  • My previous comment might be a separate question to post(?) – Nate Oct 06 '23 at 16:02
  • @Nate Probably. I think you should ask a question about the problem you're trying to solve, and what would be an appropriate analysis for that. The discussion in the comment section under your current question may be a good basis to formulate this other question. – J-J-J Oct 06 '23 at 16:27
  • Why would he ask again on CrossValidated when he has just asked here? – Ben Jan 29 '24 at 07:59
  • @Ben I think there are two different problems, one described in the title of the question (that could include the "small fluctuations in power" issue) and another problem described in the body of the question (that does not come from that). But still a good point. I removed my remark about asking a second question, because I'm not sure how we should proceed when the title and the body of a question do not lend themselves to similar responses. – J-J-J Jan 29 '24 at 08:07