For your first question,* automated variable selection is fraught with difficulties. If you use it, your results might not generalize well to new data.
Frank Harrell provides extensive guidance in his course notes and in his book Regression Modeling Strategies (RMS), particularly Chapter 4 on general modeling strategies.
If you have only 9 candidate predictor terms (counting every level beyond the first of each categorical variable and every non-linear or interaction term) and on the order of 150 events, there might be no need to reduce the number at all: with roughly 15 events per term, you probably aren't at risk of severe overfitting. Even if some covariates don't have "statistically significant" coefficients, keeping them in will improve the predictive ability of the model.
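To make the counting concrete, here is a small Python sketch of the events-per-parameter arithmetic. The term names and their degrees of freedom are hypothetical, chosen only so that they add up to 9; the only point is that a categorical variable or a spline uses more than one coefficient. A commonly cited rough guide is 10 to 20 events per candidate parameter, which is a heuristic rather than a hard threshold.

```python
def events_per_parameter(n_events, term_df):
    """Return (total candidate degrees of freedom, events per degree of freedom)."""
    total_df = sum(term_df.values())
    return total_df, n_events / total_df


# Hypothetical candidate terms; each value is the number of coefficients the term uses.
terms = {
    "cancer (binary)": 1,
    "sex (binary)": 1,
    "smoking (binary)": 1,
    "region (4 levels)": 3,      # a k-level factor needs k - 1 indicator columns
    "bmi (spline, 4 knots)": 3,  # a restricted cubic spline with k knots uses k - 1 columns
}

total_df, ratio = events_per_parameter(150, terms)
print(f"{total_df} candidate df, {ratio:.1f} events per df")  # → 9 candidate df, 16.7 events per df
```

Note that age does not appear as a covariate here because, in your model, it is the time scale.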
If you have too many predictors for the number of events, see his recommendations for data reduction, or consider penalized regression (e.g., ridge) for covariates not of primary interest.
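For the penalized option, the idea is to add a quadratic (ridge) penalty only to the coefficients you don't need to interpret, leaving the covariate of primary interest unpenalized. Below is a minimal numpy sketch, not Harrell's implementation, assuming time-fixed covariates, Breslow handling of ties, covariates standardized beforehand (a ridge penalty depends on their scale), and a penalty weight `lam` that you would choose separately, for example by cross-validation. The function name `ridge_cox` is hypothetical; in practice you would use an established package.

```python
import numpy as np


def ridge_cox(time, event, X, penalize, lam, n_iter=50, tol=1e-9):
    """Cox partial likelihood (Breslow ties) with a ridge penalty on selected coefficients.

    penalize: one entry per column of X; 1 = shrink that coefficient, 0 = leave it free.
    Maximizes  loglik(beta) - 0.5 * lam * sum(penalize * beta**2)  by Newton-Raphson.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    X = np.asarray(X, dtype=float)
    pen = np.asarray(penalize, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        w = np.exp(eta - eta.max())          # relative risks; the shift cancels in ratios
        grad = -lam * pen * beta             # gradient of the penalty term
        hess = -lam * np.diag(pen)           # Hessian of the penalty term
        for i in np.flatnonzero(event):
            at_risk = time >= time[i]        # risk set at this event time
            wr, Xr = w[at_risk], X[at_risk]
            xbar = wr @ Xr / wr.sum()        # risk-weighted covariate mean
            grad += X[i] - xbar
            hess -= (Xr.T * wr) @ Xr / wr.sum() - np.outer(xbar, xbar)
        step = np.linalg.solve(hess, grad)
        beta = beta - step                   # Newton step toward the penalized maximum
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Usage on simulated data: column 0 is the covariate of primary interest.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
t_event = rng.exponential(1.0 / np.exp(X @ np.array([0.7, 0.3, -0.3])))
t_cens = rng.exponential(2.0, size=n)
time, event = np.minimum(t_event, t_cens), t_event <= t_cens
print(ridge_cox(time, event, X, penalize=[0, 1, 1], lam=0.0))   # ordinary Cox fit
print(ridge_cox(time, event, X, penalize=[0, 1, 1], lam=50.0))  # columns 1 and 2 shrunk
```

With `lam=0` this is the ordinary Cox estimate; as `lam` grows, the penalized coefficients shrink toward zero while the unpenalized one is still estimated freely.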
One exception Harrell considers is "limited backwards step-down variable selection if parsimony is more important than accuracy" (RMS, page 97).
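For your second question, including a predictor as a time-varying covariate does NOT remove the need to evaluate whether it satisfies proportional hazards. A Cox model evaluates each covariate at its value at each event time, so a time-varying covariate only lets the covariate's value change during follow-up. Proportional hazards is a separate assumption: that the association between the covariate and the hazard of the event, its coefficient, stays constant over time. That association certainly might change.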
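One way to see what the Cox model "sees" is the counting-process (start, stop] layout that survival software typically uses for a time-varying covariate. The sketch below is hypothetical Python, assuming age as the time scale, delayed entry at `entry_age`, and a binary `cancer` indicator that switches from 0 to 1 at a diagnosis age `dx_age`. At any event age t, each person at risk contributes the covariate value from the row with start < t <= stop.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Row:
    pid: int       # person identifier
    start: float   # age at start of the interval (exclusive)
    stop: float    # age at end of the interval (inclusive)
    event: int     # 1 if disability was reached at `stop`
    cancer: int    # covariate value throughout (start, stop]


def split_at_diagnosis(pid: int, entry_age: float, exit_age: float,
                       event: int, dx_age: Optional[float]) -> List[Row]:
    """Counting-process rows for one person whose cancer indicator switches 0 -> 1 at dx_age."""
    if dx_age is None or dx_age >= exit_age:   # not diagnosed before the end of follow-up
        return [Row(pid, entry_age, exit_age, event, 0)]
    if dx_age <= entry_age:                    # already diagnosed at entry
        return [Row(pid, entry_age, exit_age, event, 1)]
    return [Row(pid, entry_age, dx_age, 0, 0),     # before diagnosis: no event in this row
            Row(pid, dx_age, exit_age, event, 1)]  # after diagnosis: carries the outcome


def covariates_at(rows: List[Row], t: float) -> Dict[int, int]:
    """Cancer values of everyone at risk at age t (rows with start < t <= stop)."""
    return {r.pid: r.cancer for r in rows if r.start < t <= r.stop}


rows = (split_at_diagnosis(1, entry_age=40, exit_age=70, event=1, dx_age=55)
        + split_at_diagnosis(2, entry_age=45, exit_age=60, event=0, dx_age=None))
for age in (50, 58, 65):
    print(age, covariates_at(rows, age))
# → 50 {1: 0, 2: 0}
# → 58 {1: 1, 2: 0}
# → 65 {1: 1}
```

The split lets the value of `cancer` change at diagnosis, but the model still estimates a single coefficient for it across all ages; that constancy is the proportional hazards assumption you still need to check.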
As I recall, your model uses patient age as the time scale and reaching some level of disability as the event. Do you think that the extra log-hazard of the event associated with having cancer is the same for a 30-year-old as for an 80-year-old? That is what a proportional hazards (PH) model would assume in this scenario. You need to evaluate that assumption and at least illustrate any substantial non-proportionality with, for example, a plot of smoothed scaled Schoenfeld residuals against time (here, age).
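In R, `survival::cox.zph()` and its `plot()` method are the usual tools for this. To show roughly what they compute, here is a minimal numpy sketch for a single binary, time-fixed covariate: fit the Cox model, form the Schoenfeld residuals (the covariate value of the person who had the event minus the risk-weighted mean over the risk set at that time), and scale them following Grambsch and Therneau as beta_hat + d * var(beta_hat) * residual, where d is the number of events. Plotted against time, the scaled residuals scatter around beta(t), so a trend suggests non-proportionality. The function names are hypothetical, the simulated data are made up, and the correlation with log time is only a rough summary, not the exact test that `cox.zph()` reports.

```python
import numpy as np


def fit_cox_1d(time, event, x, n_iter=50, tol=1e-10):
    """Newton-Raphson fit of a one-covariate Cox model; returns (beta, approx. variance)."""
    beta = 0.0
    for _ in range(n_iter):
        score, info = 0.0, 0.0
        for i in np.flatnonzero(event):
            xr = x[time >= time[i]]                     # covariate values in the risk set
            w = np.exp(beta * xr - np.max(beta * xr))   # relative risks, shifted for stability
            m1 = np.sum(w * xr) / w.sum()               # risk-weighted mean
            m2 = np.sum(w * xr ** 2) / w.sum()
            score += x[i] - m1
            info += m2 - m1 ** 2                        # risk-weighted variance
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / info                             # inverse information at the last iterate


def scaled_schoenfeld(time, event, x, beta, var_beta):
    """Event times and Grambsch-Therneau scaled Schoenfeld residuals (one covariate)."""
    ev = np.flatnonzero(event)
    resid = np.empty(len(ev))
    for k, i in enumerate(ev):
        xr = x[time >= time[i]]
        w = np.exp(beta * xr - np.max(beta * xr))
        resid[k] = x[i] - np.sum(w * xr) / w.sum()
    return time[ev], beta + len(ev) * var_beta * resid


def ph_trend(time, event, x):
    """Correlation of scaled residuals with log time; near 0 if PH looks reasonable."""
    beta, var_beta = fit_cox_1d(time, event, x)
    t, s = scaled_schoenfeld(time, event, x, beta, var_beta)
    return np.corrcoef(np.log(t), s)[0, 1]


rng = np.random.default_rng(1)
n = 500
x = rng.integers(0, 2, size=n).astype(float)
event = np.ones(n, dtype=bool)  # no censoring, to keep the sketch short
# PH holds: hazard 1 for x = 0 and 2 for x = 1 (constant log hazard ratio).
t_ph = rng.exponential(1.0 / np.where(x == 1, 2.0, 1.0))
# PH fails: x = 1 follows a Weibull with shape 0.5, so its log hazard ratio falls with time.
t_nph = np.where(x == 1, rng.weibull(0.5, size=n), rng.exponential(1.0, size=n))
print(ph_trend(t_ph, event, x), ph_trend(t_nph, event, x))
```

In a real analysis you would plot the scaled residuals against age with a smoother, as suggested above, rather than rely on a single number.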
*I recall that you have been warned about the dangers of treating both the "cancer" predictor and the disability "event" as binary. Different types and stages of cancer might have different associations with the disability you have in mind, and disability is typically graded rather than all-or-none. I'm answering on the assumption that a survival model with time-varying covariates is appropriate, but consider whether such a model is the best way to represent your data.