You could use any Machine Learning (ML) method suitable for regression (i.e., dealing with numerical outputs) rather than classification. For this, simply create a predictor vector containing the lagged values $y_{t-1}, y_{t-2}, \dots, y_{t-p}$ and treat these just as you would any other predictors.
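To illustrate, here is a minimal sketch in Python/NumPy (the function name `make_lagged` is my own) that turns a series into a lagged design matrix and target vector, which any regression learner can then consume:

```python
import numpy as np

def make_lagged(y, n_lags):
    """Build a lagged design matrix from a univariate series.

    Row t contains (y[t-1], ..., y[t-n_lags]), newest lag first;
    the corresponding target is y[t].
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack(
        [y[n_lags - k : len(y) - k] for k in range(1, n_lags + 1)]
    )
    target = y[n_lags:]
    return X, target
```

Any additional predictors you have can simply be appended as further columns of `X`.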
You would need to calculate multi-step forecasts recursively, one step at a time, so you can always feed the forecast $\hat{y}_t$ back in as a predictor to obtain $\hat{y}_{t+1}$.
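The recursive scheme can be sketched as follows (a hypothetical helper of my own; `model` is any fitted regressor with a `predict` method, trained on a lag matrix with the newest lag in the first column):

```python
import numpy as np

def recursive_forecast(model, last_values, horizon):
    """Forecast `horizon` steps ahead, feeding each prediction
    back in as a lagged predictor for the next step.

    `last_values` holds the most recent observations, newest last;
    its length must equal the number of lags the model was trained on.
    """
    history = list(last_values)
    forecasts = []
    for _ in range(horizon):
        # Predictor row: (y[t-1], ..., y[t-n_lags]), newest first,
        # matching the layout of the training design matrix.
        x = np.array(history[::-1]).reshape(1, -1)
        yhat = float(model.predict(x)[0])
        forecasts.append(yhat)
        history = history[1:] + [yhat]  # drop oldest, append forecast
    return forecasts
```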
You would need to decide on suitable lag orders by, say, using a holdout sample, since ML algorithms won't give you diagnostics like the ACF/PACF. (Or you could try to get an idea of good lags by running ACF/PACF plots on the time series itself, disregarding the predictors you have.)
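One way to do the holdout comparison, again as a hedged sketch with invented names (`select_lags` takes any `fit_predict` callable, here a plain least-squares learner for illustration):

```python
import numpy as np

def linreg(X_tr, y_tr, X_te):
    """Example learner: ordinary least squares with an intercept."""
    A = np.column_stack([X_tr, np.ones(len(X_tr))])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return np.column_stack([X_te, np.ones(len(X_te))]) @ coef

def select_lags(y, candidate_lags, holdout, fit_predict):
    """Pick a lag order by one-step-ahead squared error on the
    last `holdout` observations; any regressor can be plugged in."""
    y = np.asarray(y, dtype=float)
    best_lag, best_err = None, np.inf
    for L in candidate_lags:
        X = np.column_stack(
            [y[L - k : len(y) - k] for k in range(1, L + 1)]
        )
        t = y[L:]
        preds = fit_predict(X[:-holdout], t[:-holdout], X[-holdout:])
        err = np.mean((preds - t[-holdout:]) ** 2)
        if err < best_err:
            best_lag, best_err = L, err
    return best_lag
```

For instance, on a series that repeats with period 3, the holdout error is zero at lag 3 but not at lag 1, so lag 3 is selected.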
As for specific algorithms, I personally am a big fan of Random Forests, which are implemented in the randomForest package for R. They can deal with nonlinearities, interactions and large numbers of predictors, and they are fast. However, note that most implementations of Random Forests inherently cannot predict outside the range of the historical data. See here for a discussion - the comments point to a couple of implementations that fit regressions in the leaves, which can extrapolate; these will dampen your trend, which will usually be a good thing.
Other ML algorithms like k-nearest neighbors are similarly unable to extrapolate by construction. If you have a strongly trended series, keep this in mind. You may want to look at differencing your data first.
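Differencing and undoing it afterwards is straightforward; a minimal sketch (helper names are mine):

```python
import numpy as np

def difference(y):
    """First-difference a series, so a trended series becomes
    (roughly) stationary before handing it to an ML learner."""
    return np.diff(np.asarray(y, dtype=float))

def undifference(last_level, diff_forecasts):
    """Invert differencing: cumulatively sum forecast differences,
    anchored at the last observed level, to recover forecasts
    on the original scale."""
    return last_level + np.cumsum(diff_forecasts)
```

The learner is then trained on (lagged) differences, and its recursive forecasts are cumulated back onto the level scale.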