We ran a study on the reliability of a task in which we measure how long (in milliseconds) people look at certain images. The internal reliability (consistency) at the first session was very high (Cronbach's alpha = .90), but at the second session it dropped dramatically (alpha = .30). Despite the low internal reliability at the second session, the test-retest reliability (stability) of the scores is high (alpha = .80).
We are now struggling to understand how high test-retest reliability can be achieved with low internal reliability. So far we have looked at the variance of the outcome (similar at both sessions) and the relationship between scores at session 1 and session 2 (linear).
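To make the two quantities concrete, here is a minimal sketch of how they might be computed. It is not our actual analysis code: the function names, the layout (rows are participants, columns are trials treated as items), and the simulated looking times are all assumptions for illustration, and test-retest is taken here as a Pearson correlation of per-person mean looking times.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for a participants x items array (hypothetical layout)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                                 # number of items/trials
    sum_item_var = items.var(axis=0, ddof=1).sum()     # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)          # variance of the sum score
    return k / (k - 1) * (1 - sum_item_var / total_var)


def retest_correlation(session1, session2):
    """Pearson correlation between per-person mean scores of two sessions."""
    s1 = np.asarray(session1, dtype=float).mean(axis=1)
    s2 = np.asarray(session2, dtype=float).mean(axis=1)
    return np.corrcoef(s1, s2)[0, 1]


# Simulated data only: 50 participants, 10 trials per session, looking
# times in ms built from a stable person-level score plus trial noise.
rng = np.random.default_rng(0)
n_people, n_trials = 50, 10
person_score = rng.normal(500, 100, size=(n_people, 1))
session1 = person_score + rng.normal(0, 50, size=(n_people, n_trials))
session2 = person_score + rng.normal(0, 50, size=(n_people, n_trials))

print("alpha, session 1:", round(cronbach_alpha(session1), 2))
print("alpha, session 2:", round(cronbach_alpha(session2), 2))
print("test-retest r:   ", round(retest_correlation(session1, session2), 2))
```

In this simulation both sessions share the same noise level, so it does not reproduce our pattern; it only shows which parts of the data each statistic uses (the covariance among trials within a session versus the agreement of aggregate scores across sessions).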
Any suggestions would be greatly appreciated!