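A first solution would be to detect outliers from the distances between the sequences and a representative sequence such as the medoid. You can get these distances using, for example, the disscenter function of TraMineR. With medoids.index = "first", it returns the index of the medoid, whose column in the dissimilarity matrix holds the distances to it. By default, it returns the distances to the virtual center of the set. (See the help page of that function.) Once you have these distances, you can define outliers as those sequences that lie more than, say, 2 or 2.5 times the pseudo standard deviation away from the medoid. For this pseudo standard deviation, you can use the discrepancy of the dissimilarity matrix, which the dissvar function returns.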
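A minimal sketch of this single-medoid variant follows. To keep it self-contained and short, it assumes a simpler OM setup than the example further down: indel cost 1 and a constant substitution cost of 2 (built with seqsubm). The object names bf.seq, bf.om, med, d.med and outliers.med are my own. Because the costs and the reference point differ, the number of flagged sequences will generally not match the count reported for the multiple-representative approach.

```r
library(TraMineR)
data(biofam)

## Sequence object from the biofam columns 10:25 (states coded 0 to 7)
bf.seq <- seqdef(biofam, 10:25)

## Assumption: OM with indel = 1 and constant substitution cost 2,
## simpler than the INDELSLOG costs used in the example below
bf.om <- seqdist(bf.seq, method = "OM", indel = 1,
                 sm = seqsubm(bf.seq, method = "CONSTANT", cval = 2))

## Index of the medoid, then the distance of every sequence to it
med <- disscenter(bf.om, medoids.index = "first")
d.med <- bf.om[, med]

## Discrepancy of the whole dissimilarity matrix, used as dispersion
discrep <- dissvar(bf.om)

## Flag sequences lying more than 2 times the discrepancy from the medoid
outliers.med <- which(d.med > 2 * discrep)
length(outliers.med)
```

Replacing the multiplier 2 with 2.5 gives the more conservative rule mentioned above.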
Alternatively, you could consider multiple representatives instead of a single one (see Gabadinho and Ritschard, 2013). In that case, you would retain the distance of each sequence to its closest representative. I illustrate this below using the 2000 sequences of the biofam dataset that ships with TraMineR.
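The flagging rule used in the code can be written compactly. The notation is my own: $d_{ij}$ is the OM dissimilarity between sequences $i$ and $j$ among $n$ sequences, $r_1, \dots, r_m$ are the representatives, and $c$ is the multiplier (2 in the code). The discrepancy formula assumes the unweighted pseudo-variance form that, as I understand it, dissvar computes when no weights are given:

$$ s^2 = \frac{1}{2n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij} $$

$$ i \text{ is an outlier} \iff \min_{k = 1, \dots, m} d_{i r_k} > c \, s^2 $$

With $m = 1$ and $r_1$ the medoid, this reduces to the single-medoid rule of the first solution.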
```r
library(TraMineR)
data(biofam)
biofam.lab <- c("Parent", "Left", "Married", "Left+Marr",
                "Child", "Left+Child", "Left+Marr+Child", "Divorced")
biofam.slab <- c("P", "L", "M", "LM", "C", "LC", "LMC", "D")
biofam.seq <- seqdef(biofam, 10:25, labels = biofam.lab, states = biofam.slab)

## Computing an OM dissimilarity matrix using INDELSLOG costs
costs <- seqcost(biofam.seq, method = "INDELSLOG")
biofam.om <- seqdist(biofam.seq, method = "OM", indel = costs$indel,
                     sm = costs$sm)

## Representative set using the neighborhood density criterion.
## The neighborhood radius (pradius) is set to 20% of the maximal
## distance, and the number of representatives is chosen so that 75%
## of the sequences lie in the neighborhood of a representative.
biofam.rep <- seqrep(biofam.seq, diss = biofam.om, criterion = "density",
                     coverage = .75, pradius = .2)
rep.idx <- attr(biofam.rep, "Index")  ## indexes of representatives
rep.seq <- biofam.seq[rep.idx, ]

## Distance of each sequence to its closest representative
dist.to.rep <- attr(biofam.rep, "Distances")
min.dist <- apply(dist.to.rep, 1, min, na.rm = TRUE)

## Outliers: sequences farther than 2 times the discrepancy
## from their closest representative
discrep <- dissvar(biofam.om)
q <- 2 * discrep
outliers <- which(min.dist > q)

## Plot of representatives and outliers
par(mfrow = c(1, 3))
seqiplot(rep.seq, sortv = "from.end", with.legend = FALSE,
         border = NA, main = "Representatives")
seqIplot(biofam.seq[outliers, ], sortv = "from.end", with.legend = FALSE,
         main = "Outliers")
seqlegend(biofam.seq)
```
Here, we see that there are 29 outliers (out of 2000 sequences). They include people who keep living with their parents after marrying, people who live with a child without being married, people who marry early and remain childless, and people who divorce.