@civilstat's answer already explains why you should use dedicated splitting techniques for time series data.
This answer is for data that we do not think of as time series, i.e., data that may nevertheless have an unwanted correlation with time, such as detector aging/drift.
In chemometrics, variants of cross validation which do not shuffle are known (but AFAIK rarely used), e.g.
- venetian blinds cross validation: assigning case $i$ to fold $(i \bmod k) + 1$. The idea here is basically a stratified cross validation over measurement order. As with any other stratification, this is fine for fixed factors/outcome variables, but doing it for a random factor (which measurement order typically is) makes the error estimate optimistically biased.
- block-wise cross validation: using contiguous blocks for the folds (both assignment schemes are sketched below).
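For concreteness, a minimal sketch of the two fold-assignment schemes in plain NumPy (the function names are made up, not from any standard package):

```python
import numpy as np

def venetian_blinds_folds(n_samples, k):
    """Fold label for case i (counted in measurement order): (i mod k) + 1."""
    return (np.arange(n_samples) % k) + 1

def blockwise_folds(n_samples, k):
    """Fold labels from k contiguous blocks of (nearly) equal size."""
    edges = np.linspace(0, n_samples, k + 1).astype(int)
    return np.repeat(np.arange(1, k + 1), np.diff(edges))

print(venetian_blinds_folds(10, 3))  # [1 2 3 1 2 3 1 2 3 1]
print(blockwise_folds(10, 3))        # [1 1 1 2 2 2 3 3 3 3]
```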
We do have a general recommendation to shuffle (randomize) the order in which measurements are done, so one may argue that this measurement shuffling can also serve for the cross validation.
However: the reason behind the recommendation to shuffle the measurement order is that we typically need to think of detector drift, i.e. slow systematic changes of the measurements over time. In addition, we often need to think of and check the possibility of contamination leading to neighboring measurements being more similar to each other. Both can be detected with an appropriate design for the calibration samples, typically by randomizing the measurement order.
So, if the measurements didn't follow an appropriate design, even the best randomization for the cross validation cannot break correlation that is already present in the data, and the error estimate will be optimistically biased.
Also, I recommend randomizing the CV splits anyway:
- so we can properly check the CV predictions for drift effects
- IMHO k-fold CV should be repeated in order to check stability. For this we need different splits anyway (see the sketch below).
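As an illustration of both points, a sketch using scikit-learn with made-up data and a made-up model: repeated, randomized k-fold CV whose out-of-fold residuals are then checked against measurement order for drift.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict

# Toy data, assumed to be stored in measurement order (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=100)

for rep in range(3):  # repeated k-fold: one shuffled split per repetition
    cv = KFold(n_splits=5, shuffle=True, random_state=rep)
    y_pred = cross_val_predict(Ridge(), X, y, cv=cv)
    residuals = y - y_pred
    # Check stability across repetitions, and look for drift by relating
    # the residuals to measurement order (here: a simple linear trend).
    slope = np.polyfit(np.arange(len(y)), residuals, 1)[0]
    print(f"repetition {rep}: RMSE = {np.sqrt(np.mean(residuals**2)):.3f}, "
          f"residual-vs-order slope = {slope:.4f}")
```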
**Update**
> Does that [randomization] really make sense if it means that for each fold, I will have training examples that look almost identical to each of the validation examples?
I'd say it's the other way round: if randomization causes overly similar cases to end up in training and test subsets, something is wrong with your CV set-up at a higher level. And then not randomizing the order (i.e., not shuffling) is not an appropriate solution either, since you will still have neighboring examples ending up in the train and test subsets. The resulting error may be smaller, but OTOH appropriate solutions exist, so use them.
Remember: the basic requirement is to split so that train and test subsets are statistically independent. From a stats point of view, this independence must be satisfied for all random factors in your modeling.
For data with repeated/multiple measurements of the same physical sample or the same patient, this may be done by splitting into training vs testing patients (rather than measurements).
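In scikit-learn, such a patient-wise (grouped) split can be obtained with e.g. `GroupKFold`; a sketch with made-up data:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical setup: two measurements (rows of X) per patient.
X = np.arange(12).reshape(6, 2)
y = np.array([0, 0, 1, 1, 0, 1])
patient = np.array([1, 1, 2, 2, 3, 3])  # patient ID for each measurement

# GroupKFold keeps all measurements of a patient in the same fold,
# so train and test subsets never share a patient.
for train_idx, test_idx in GroupKFold(n_splits=3).split(X, y, groups=patient):
    print("train patients:", np.unique(patient[train_idx]),
          "test patients:", np.unique(patient[test_idx]))
```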
For your example case of image sequences, it may mean excluding chunks of data at the boundary between training and testing.
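One way to implement this (a hand-rolled sketch, not a standard API): use contiguous blocks as test sets and drop a few neighboring samples on each side of the test block from the training set.

```python
import numpy as np

def blockwise_split_with_gap(n_samples, k, gap):
    """Contiguous test blocks; exclude `gap` samples on either side from training."""
    indices = np.arange(n_samples)
    for test_block in np.array_split(indices, k):
        lo, hi = test_block[0], test_block[-1]
        train = indices[(indices < lo - gap) | (indices > hi + gap)]
        yield train, test_block

for train, test in blockwise_split_with_gap(20, 4, gap=2):
    print("test:", test, "train:", train)
```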
OTOH, if the similarity is due to a fixed factor (influencing factors that you want the model to use to make better predictions), you don't need statistical independence and may even stratify your train and test sets. E.g., if I know detector temperature to have a systematic effect on my measurements from which I predict some analyte concentration, I may split so that folds have approximately equal coverage of concentration as well as temperature ranges.
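A sketch of what such a stratified split could look like (hypothetical concentration/temperature data; binning both variables into strata is one possible choice, not a fixed recipe):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical data: concentration is the outcome, temperature a fixed influencing factor.
rng = np.random.default_rng(1)
concentration = rng.uniform(0, 10, size=120)
temperature = rng.uniform(20, 40, size=120)

# Bin both variables and combine the bins into one stratification label,
# so every fold covers the concentration and temperature ranges roughly equally.
conc_bin = np.digitize(concentration, np.quantile(concentration, [0.25, 0.5, 0.75]))
temp_bin = np.digitize(temperature, np.quantile(temperature, [0.5]))
strata = conc_bin * 2 + temp_bin

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(np.zeros((len(strata), 1)), strata):
    print(f"fold temperature range: {temperature[test_idx].min():.1f}"
          f"-{temperature[test_idx].max():.1f} °C")
```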
Whether such influencing factors are present and whether they are random or fixed is very application specific, so you'll need to decide this as part of your modeling and then set up your CV splitting accordingly.