2

Assume I have tracked the behavior of users visiting a website. Specifically, I track which page ("site") each user clicks to next within the website, so I have many tuples of the form (userID, site, time). Now I want to visualize whether there are any patterns or clusters in these movements. For example, most users might click through the website step by step, while another group visits site 1, then site 2, goes back to 1, then to 2 again, and then moves on to site 3.

What method can I use to classify the behavior?

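set.seed(1)  # fixed seed so the simulated data are reproducible

# Observation window, converted to seconds since the epoch
a <- as.POSIXct("2013-07-01 00:30:00")
b <- as.POSIXct("2013-07-29 00:30:00")
aI <- as.numeric(a)
bI <- as.numeric(b)

# 10000 distinct click times, sorted and converted back to POSIXct
times <- sort(sample(seq(aI, bI, by = 2), 10000))
clickTime <- as.POSIXct(times, origin = "1970-01-01")

# User IDs: 200 blocks of 50 clicks, each block drawn from 21 neighbouring IDs
id <- seq(1, 10050, 20)
userID <- 1
for (i in 1:200) {
   userID <- c(userID, sample(id[i]:id[i + 1], 50, replace = TRUE))
}
userID <- userID[1:10000]

# Four template paths through the website
movement <- list(LETTERS[1:20], c("A","B","A","B","C","D","E","F","G","H","I")
              , c("A","B","C","B","C","D","E","D","C","D","E")
              , c("C","B","C","D","E","F","G","H","I","I"))

# Give each user one template path, truncated to that user's number of clicks
# (a user with more clicks than the template has steps gets NA for the excess)
site <- character(10000)
for (i in unique(userID)) {
   p <- sample(movement, 1)[[1]]
   site[userID == i] <- p[seq_len(sum(userID == i))]
}

visits <- data.frame(userID = userID, site = site, time = clickTime)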
Nick Cox
  • 56,404
  • 8
  • 127
  • 185
Klaus
  • 451

2 Answers

2

I'm not an expert on this, but I'd suggest you look at the TraMineR package for R. It's a very nice package. The description says:

This package is a toolbox for sequence manipulation,
description, rendering and more generally the mining of
sequence data in the field of social sciences. Though it is
primarily intended for analyzing state or event sequences that
describe life courses such as family formation histories or
professional careers its features also apply to many other
kinds of categorical sequence data. It accepts many different
sequence representations as input and provides tools for
translating sequences from one format to another. It offers
several statistical functions for describing and rendering
sequences, for computing distances between sequences with
different metrics among which optimal matching, the longest
common prefix and the longest common subsequence, and simple
functions for extracting the most frequent subsequences and
identifying the most discriminating ones among them.
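
As a rough sketch of how the distance functions mentioned in that description might be used: the snippet below assumes TraMineR is installed and uses its `seqdef` and `seqdist` functions with the longest-common-subsequence metric. The five toy paths and all object names are made up for illustration and are not taken from the question's data.

```r
library(TraMineR)

# Hypothetical click paths, one string per user (one letter per site)
paths <- c(u1 = "ABCDE", u2 = "ABCDEF", u3 = "ABABC", u4 = "ABABCD", u5 = "CBCDE")

# One row per user, one column per step, padded with NA on the right
steps   <- strsplit(paths, "")
max_len <- max(lengths(steps))
wide <- t(sapply(steps, function(s) c(s, rep(NA, max_len - length(s)))))
wide <- as.data.frame(wide, stringsAsFactors = FALSE)

# Build a state sequence object; trailing NAs are dropped by default
seqs <- seqdef(wide)

# Pairwise distances based on the longest common subsequence
d <- seqdist(seqs, method = "LCS")

# Group similar paths with ordinary hierarchical clustering
groups <- cutree(hclust(as.dist(d), method = "average"), k = 2)
print(groups)
```

Whether LCS or optimal matching is the better metric for click paths is a judgment call; the point here is only that the distance matrix can be passed to any standard clustering routine.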
Wayne
  • 21,174
  • Thanks, nice package. Is there a method you would prefer to apply to the data dump in my question? – Klaus Aug 14 '13 at 19:16
  • I fiddled with it a bit. The restriction is that you have an event sequence, and the package has fewer options for working with that kind of data. I'm not sure what to suggest, specifically. – Wayne Aug 15 '13 at 13:17
  • I am looking for a function that checks, for all users, how they stepped through the factor variable "site" over time. The result should look like a tree, where each level splits the users according to which site they went to next. – Klaus Aug 15 '13 at 13:32
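
The tree described in the last comment can be approximated in base R by counting how many users share each path prefix at each depth. This is only a minimal sketch on a made-up click log with the same three columns as in the question; the names `visits`, `steps` and `tree` are assumptions for illustration.

```r
# Hypothetical click log with the columns from the question
visits <- data.frame(
  userID = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4),
  site   = c("A", "B", "C", "A", "B", "A", "A", "B", "C", "C", "B"),
  time   = as.POSIXct("2013-07-01 00:00:00", tz = "UTC") +
           c(0, 10, 20, 5, 15, 25, 1, 2, 3, 7, 8),
  stringsAsFactors = FALSE
)

# Order clicks by user and time, then collect each user's ordered path
visits <- visits[order(visits$userID, visits$time), ]
steps  <- split(visits$site, visits$userID)

# Level k of the tree: number of users per distinct prefix of length k
max_depth <- max(lengths(steps))
tree <- lapply(seq_len(max_depth), function(k) {
  reached <- steps[lengths(steps) >= k]   # users with at least k clicks
  prefixes <- vapply(reached, function(s) paste(s[1:k], collapse = ">"),
                     character(1))
  table(prefixes)
})

print(tree)
```

Each element of `tree` is one level: the first counts entry pages, the second counts two-step prefixes, and so on, so a branch that keeps most of its users from level to level is a dominant route.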
1

Now I want to visualize if there are any patterns or clusters for the movement.

Parallel coordinates are an excellent tool for such things.
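
A minimal base R sketch of that idea, assuming each step in the path is treated as one axis and each user as one line (the paths and object names here are invented for illustration). It uses `matplot` with a shared site axis rather than a dedicated parallel-coordinates function, and adds a little jitter so that identical paths do not hide each other.

```r
set.seed(1)

# Hypothetical click paths, one string per user (one letter per site)
paths <- c("ABCDE", "ABCDEF", "ABABC", "ABABCD", "CBCDE")
steps <- strsplit(paths, "")
sites <- sort(unique(unlist(steps)))

# Matrix with one row per step and one column per user; sites coded as
# integers, shorter paths padded with NA
max_len <- max(lengths(steps))
mat <- sapply(steps, function(s) {
  match(c(s, rep(NA, max_len - length(s))), sites)
})

# Small vertical jitter so overlapping paths stay visible
jittered <- mat + matrix(runif(length(mat), -0.15, 0.15), nrow = nrow(mat))

matplot(seq_len(max_len), jittered, type = "l", lty = 1,
        col = seq_len(ncol(mat)), xlab = "Step", ylab = "Site", yaxt = "n")
axis(2, at = seq_along(sites), labels = sites, las = 1)
```

With thousands of users the lines will overplot heavily, so semi-transparent colours or plotting only a sample of users would probably be needed.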

What method can I use to classify the behavior?

I get the feeling you are not entirely sure what you want. Since you have no a priori idea of what kind of behaviour (classes) you're going to get, you should probably look at clustering techniques instead to learn the structure in your data.
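
One possible way to do this, sketched under the assumption that every site is a single letter so that a user's path can be written as a string: compute pairwise edit distances with `adist` and feed them to hierarchical clustering. The six paths below are hypothetical, and edit distance with average linkage is just one reasonable choice among many.

```r
# Hypothetical click paths: three "step by step" users, three "back and forth"
paths <- c(u1 = "ABCDEFGH", u2 = "ABCDEFG", u3 = "ABCDEFGHI",
           u4 = "ABABABC",  u5 = "ABABAB",  u6 = "ABABABCD")

# Pairwise Levenshtein (edit) distances between the path strings
d <- adist(paths)

# Hierarchical clustering on the distance matrix, cut into two groups
hc <- hclust(as.dist(d), method = "average")
groups <- cutree(hc, k = 2)

# Users per cluster
print(split(names(groups), groups))
```

The number of clusters is not known in advance for real data; inspecting the dendrogram with `plot(hc)` is a common way to choose where to cut.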

Marc Claesen
  • 18,401
  • "Look at clustering techniques": yes, I have read something about that. Could you show me an example using the data dump in my question? – Klaus Aug 14 '13 at 19:14
  • I would describe the algorithm I am looking for in the following steps. 1) For every user, check how they step through the website over time (it should look like a tree). 2) Summarize whether there are some main patterns, that is, whether a number of people show the same behavior. – Klaus Aug 15 '13 at 12:14
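
The second step in that comment, summarizing the main patterns, can be sketched in base R by counting identical complete paths and tabulating the site-to-site transitions. The click log and the object names below are made up for illustration.

```r
# Hypothetical click log with the columns from the question
visits <- data.frame(
  userID = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4),
  site   = c("A", "B", "C", "A", "B", "A", "A", "B", "C", "C", "B"),
  time   = as.POSIXct("2013-07-01 00:00:00", tz = "UTC") +
           c(0, 10, 20, 5, 15, 25, 1, 2, 3, 7, 8),
  stringsAsFactors = FALSE
)

# Each user's path in time order
visits <- visits[order(visits$userID, visits$time), ]
steps  <- split(visits$site, visits$userID)

# How many users follow exactly the same complete path?
full_paths <- vapply(steps, paste, character(1), collapse = ">")
pattern_counts <- sort(table(full_paths), decreasing = TRUE)
print(pattern_counts)

# How often does each site-to-site transition occur across all users?
from <- unlist(lapply(steps, function(s) head(s, -1)))
to   <- unlist(lapply(steps, function(s) tail(s, -1)))
transitions <- table(from, to)
print(transitions)
```

Exact path counts only reveal groups whose behavior is identical; users with similar but not identical paths would need a distance-based clustering instead.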