You have two predictors that seem to fail the proportional hazards (PH) assumption, one of which is time-varying. I'll describe other ways that might handle the PH problems better, then end with some suggestions for the approach you describe if that's still necessary.
Much can be learned from the time-dependence vignette of the R survival package, and from Frank Harrell's Regression Modeling Strategies. It's also worth getting access to the classic text by Therneau and Grambsch, which goes into detail about many applications of Cox models. In particular, these references discuss different ways to handle violations of PH.
Model continuous predictors flexibly
It sounds like you tried a single linear term for age in your model, did a PH test, and found a PH violation. If the association between age and outcome is more complicated than linear, however, then the misspecified functional form can show up as an apparent violation of PH. Specifying the functional form of the association properly might fix the PH problem on its own. A regression spline for age is a good way to let the data tell you the appropriate functional form. I find the rcs() function in the R rms package to have more useful defaults than the ns() function in the splines package.
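As a rough sketch of that idea, here is how a spline for age might be compared against a single linear term. The lung data from the survival package and the choice of 3 degrees of freedom are my assumptions for illustration, not your data. I use ns() here only because the splines package ships with R; a term like rcs(age, 4) in cph() would be the rms analogue.

```r
library(survival)
library(splines)

# Illustrative data only: lung ships with the survival package
fit_linear <- coxph(Surv(time, status) ~ age + sex, data = lung)
fit_spline <- coxph(Surv(time, status) ~ ns(age, df = 3) + sex, data = lung)

# PH tests, one row per model term plus a global test
zph_linear <- cox.zph(fit_linear)
zph_spline <- cox.zph(fit_spline)
print(zph_linear)
print(zph_spline)

# Likelihood-ratio comparison: does the spline fit better than a straight line?
print(anova(fit_linear, fit_spline))
```

If the apparent PH violation for age shrinks once the spline is in place, the original problem was probably one of functional form rather than of non-proportional hazards.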
The same approach could fix the problems with your time-varying covariate, if it's a continuous variable. There is no fundamental difference between time-fixed and time-varying covariates in terms of how the regression coefficients are estimated in a Cox model, except that with a time-varying covariate the algorithm picks out, at each event time, the covariate values that happen to be in place for each at-risk individual at that specific event time. Fitting the proper functional form for the association with outcome thus might also fix the PH issue for your time-varying covariate.
Does lack of PH matter?
With a large data set it's quite possible to have a "statistically significant" violation of PH that doesn't matter in practice. That's a judgment call, to be made based on your understanding of the subject matter. Even if PH is violated, you end up with a type of event-averaged coefficient estimate that might be adequate for some purposes.
Handling time-varying covariates
As noted above, at each event time the algorithm looks only at the covariate values in place at that specific time for all individuals still at risk. There is no consideration of past covariate history, just the present values. That might not adequately describe the association of a covariate with outcome. For example, current blood glucose levels might not be associated so strongly with cardiovascular events as are hemoglobin A1C levels, which represent time-averaged blood glucose. Think very carefully about the biology underlying the time-varying covariate, to see if using only instantaneous values makes sense. In some circumstances you might want to model both the trajectory of the time-varying covariate and the time to event; see the survival task view for suggestions about joint modeling.
Your approach
There is no problem (in principle) with building a Cox model that has time-varying covariates and then adjusting those covariates via a function of time to handle a violation of PH. Doing so requires specifying a separate time-adjusted covariate value for each individual at risk at each event time in the data set. The footnote to this answer mentions two ways to start to do that, although you might need to do some data manipulation yourself to ensure the correct format. You have to make sure that, at each event time, the algorithm can find the correct time-adjusted value of the time-varying covariate for each individual still at risk. The potential problem in practice is that you can end up with extremely large data sets, as a single individual will have one row for each event time in the entire data set at which that individual is at risk.
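To make the data expansion concrete, here is a minimal sketch using survSplit() from the survival package. The veteran data, the covariate karno (time-fixed here, as a stand-in for your covariate), and the adjustment function log(t + 20) are all assumptions for illustration. The same splitting idea applies to data that are already in start-stop form.

```r
library(survival)

# Illustrative data only: veteran ships with the survival package
# Split every subject's follow-up at every observed event time
event_times <- sort(unique(veteran$time[veteran$status == 1]))
vet_long <- survSplit(Surv(time, status) ~ ., data = veteran,
                      cut = event_times, id = "id")

# Time-adjusted covariate, evaluated at the end of each interval;
# the log(t + 20) form is an arbitrary choice for this sketch
vet_long$karno_logt <- vet_long$karno * log(vet_long$time + 20)

fit_split <- coxph(Surv(tstart, time, status) ~ trt + prior + karno + karno_logt,
                   data = vet_long)

# The expanded data set is much larger than the original
print(c(original = nrow(veteran), expanded = nrow(vet_long)))
print(fit_split)
```

The row counts printed at the end show how quickly the data grow, which is the practical concern with this approach.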
Stratification by another variable (ageGroup here) adds no additional problem; you simply set ageGroup as a multi-level categorical predictor and specify a term strata(ageGroup) in the coxph() function of the survival package (or strat(ageGroup) if you use the cph() function of the rms package). At each event time, the comparisons among covariate values are restricted to individuals within the same stratum as the individual having the event. Having a large number of strata can sometimes lead to practical problems, because some strata end up with few individuals. Thus, if a spline doesn't fix the PH problem for age, I'd recommend modeling a time-varying coefficient for age, as described in the time-dependence vignette, rather than breaking age up into multiple strata.
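A time-varying coefficient for age can be sketched with the tt() mechanism that the vignette describes. The lung data and the log(t) form of the time dependence are my assumptions for illustration; you would choose the form based on what the PH diagnostics suggest.

```r
library(survival)

# tt(age) tells coxph to evaluate the tt function at each event time,
# so the log-hazard contribution of age becomes (beta1 + beta2 * log(t)) * age
fit_tt <- coxph(Surv(time, status) ~ age + tt(age) + sex,
                data = lung,
                tt = function(x, t, ...) x * log(t))
print(fit_tt)
```

The coefficient reported for tt(age) describes how the effect of age changes with log(time), which keeps age in the model as a single continuous predictor instead of a set of strata.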
In response to comments
The hazard in a Cox model for an individual $i$ with time-varying covariate values $X_i(t)$ can be written:
$$h_i(t)= h_0(t) \exp(X_i(t) \beta) ,$$
where $h_0(t)$ is the baseline hazard and $\beta$ is the vector of regression coefficients (coefficients assumed for now to be constant in time). That is the form handled directly by the coxph() function in the R survival package via the counting-process format, with outcomes coded as Surv(startTime, stopTime, status). That form allows for time-fixed covariates too; you just code the same value of a time-fixed covariate into each data row for an individual.
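For illustration, here is a sketch of building the counting-process format with tmerge(), following the pbc example in the time-dependence vignette. The data sets and the covariate bili are stand-ins for your own, and tstart and tstop are the default interval names that tmerge() creates (playing the role of startTime and stopTime above).

```r
library(survival)

# Baseline (time-fixed) data, one row per subject
temp <- subset(pbc, id <= 312, select = c(id:sex, stage))

# Set each subject's follow-up window and the event indicator
pbc2 <- tmerge(temp, temp, id = id, death = event(time, status))

# Add a time-varying covariate from the repeated-measures data;
# each new measurement starts a new (tstart, tstop] row for that subject
pbc2 <- tmerge(pbc2, pbcseq, id = id, bili = tdc(day, bili))

# Time-fixed age is simply repeated on every row for a subject
fit_tv <- coxph(Surv(tstart, tstop, death == 2) ~ log(bili) + age, data = pbc2)
print(fit_tv)
```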
For stratification of such a model, you have two choices. You could assume that only the baseline hazards are different among strata, but the $\beta$ coefficients are shared among strata. Then the above equation, for individual $i$ in stratum $s$, becomes:
$$h_i(t | s)= h_{0,s}(t) \exp(X_i(t) \beta) ,$$
where $h_{0,s}(t)$ is the baseline hazard for stratum $s$. In the coxph() function you specify such a fit for strata defined by ageGroup by adding a term +strata(ageGroup) to the predictors. That's how stratification is usually handled. Again, there is no problem with incorporating time-varying covariates via the counting-process data format, or specifying a time-constant covariate by simply repeating the same value for each data row corresponding to an individual.
It's possible also to allow one or more $\beta$ coefficients to differ among strata. For that, you add an interaction term between the predictor of interest and the strata. For example, if you want the coefficient for cov1 to vary among age-group strata, include a predictor term +cov1*strata(ageGroup). The statements above for incorporating both time-fixed and time-varying covariates into the model still hold.
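The two stratification choices can be sketched as follows. The pbc data, the covariate bili standing in for cov1, and the age cut points are all assumptions for illustration.

```r
library(survival)

# Counting-process data with a time-varying covariate (bili)
temp <- subset(pbc, id <= 312, select = c(id:sex, stage))
pbc2 <- tmerge(temp, temp, id = id, death = event(time, status))
pbc2 <- tmerge(pbc2, pbcseq, id = id, bili = tdc(day, bili))

# Hypothetical age groups; the cut points are arbitrary
pbc2$ageGroup <- cut(pbc2$age, breaks = c(0, 45, 55, Inf))

# Choice 1: separate baseline hazards, shared coefficient
fit_shared <- coxph(Surv(tstart, tstop, death == 2) ~ log(bili) + strata(ageGroup),
                    data = pbc2)

# Choice 2: separate baseline hazards and stratum-specific coefficients
fit_separate <- coxph(Surv(tstart, tstop, death == 2) ~ log(bili) * strata(ageGroup),
                      data = pbc2)

print(fit_shared)
print(fit_separate)
```

In the second fit, the interaction coefficients estimate how the effect of the covariate in each of the other strata differs from its effect in the reference stratum.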
[...] strata() term in such a model. Is there some more specific problem that you have in mind? If so, please edit the question to elaborate. – EdM Apr 14 '23 at 09:10