I am analyzing data from cancer patients who underwent surgery to remove their tumors. During surgery, the surgeon examined a variable number of lymph nodes to see how many contained cancer. This is reported as the number of cancer-positive lymph nodes ($a$) and cancer-negative lymph nodes ($b$), and can be expressed as the lymph node ratio ($\frac{a}{a+b}$), that is, the proportion of examined lymph nodes that were positive for cancer in that patient. It is important to note that the total number of lymph nodes examined per patient varies substantially (from 1 to 109).
I want to test whether the lymph node ratio is associated with a molecular feature such as a gene mutation. Each patient either has or does not have the mutation. The idea is that a higher lymph node ratio indicates a poorer prognosis, and we want to see whether the mutation is associated with more aggressive tumors.
My initial thought was to use a t-test or Wilcoxon test to compare the ratio between the two groups, but based on my (limited) knowledge of the beta distribution, I think simply using the ratio throws away information about each patient's uncertainty, which is inherent in the count data. This makes me think beta regression is possibly the better approach.
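To make the concern concrete, here is a small worked example. It assumes a uniform prior on each patient's true proportion $p$, which is my own simplification and not something taken from the data; under that assumption, the posterior for a patient with $a$ positive and $b$ negative nodes is $\mathrm{Beta}(a+1,\, b+1)$. Two hypothetical patients with the same ratio of 0.5 then carry very different uncertainty:

$$
\operatorname{Var}(p \mid a, b) = \frac{(a+1)(b+1)}{(a+b+2)^2\,(a+b+3)}
$$

$$
a = b = 1:\quad \operatorname{Var} = \frac{2 \cdot 2}{4^2 \cdot 5} = 0.05, \qquad \text{SD} \approx 0.224
$$

$$
a = b = 50:\quad \operatorname{Var} = \frac{51 \cdot 51}{102^2 \cdot 103} \approx 0.0024, \qquad \text{SD} \approx 0.049
$$

A test on the raw ratios would treat these two patients as identical observations.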
I have found the betareg package in R, which performs beta regression, but it takes the outcome variable as a ratio. I can't tell how to incorporate the total number of lymph nodes into the estimation procedure, or whether this is even possible with beta regression. I am not sure whether it would be appropriate to set the weights parameter to the total number of lymph nodes, similar to how logistic regression on aggregated data is fitted with glm.
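For reference, this is a minimal sketch of the aggregated logistic regression I have in mind. The data are simulated purely for illustration, and the column names (`a`, `b`, `total`, `ratio`, `mutation`) and the effect sizes are hypothetical, not taken from my real dataset. It shows the two equivalent ways of passing grouped binomial data to `glm`: as a two-column matrix of successes and failures, or as a proportion with the totals as weights.

```r
# Hypothetical toy data: one row per patient (all values invented for illustration)
set.seed(1)
n_patients <- 40
mutation <- rbinom(n_patients, 1, 0.5)                  # 1 = mutated, 0 = not mutated
total    <- sample(1:109, n_patients, replace = TRUE)   # lymph nodes examined
p_true   <- plogis(-1.5 + 0.8 * mutation)               # assumed true proportions
a        <- rbinom(n_patients, total, p_true)           # cancer-positive nodes
b        <- total - a                                   # cancer-negative nodes
d <- data.frame(a, b, total, ratio = a / total, mutation)

# Form 1: response given as (successes, failures)
fit_counts <- glm(cbind(a, b) ~ mutation, family = binomial, data = d)

# Form 2: response given as a proportion, with the totals as weights
fit_weights <- glm(ratio ~ mutation, family = binomial,
                   weights = total, data = d)

# Both forms describe the same binomial likelihood, so the estimates
# should agree up to numerical tolerance
all.equal(coef(fit_counts), coef(fit_weights))
```

In this binomial setting the weights are the number of trials behind each proportion. What I cannot tell is whether the weights argument of betareg can be interpreted the same way.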
Can the total number of successes and failures be incorporated into beta regression, and if so how?