There is absolutely no difference in the way these algorithms operate on grayscale and RGB images; colour SIFT algorithms simply take into account the additional information carried by colour.
These algorithms operate on typical images that are represented as matrices of (usually) integer numbers. Each number represents the intensity, or brightness, of the light reaching the imaging equipment from a particular direction. Because of the way light works, colour information is (usually) represented with more than one channel, and each additional channel also looks like a matrix. The raw form of a typical RGB photograph is three grayscale images packed together, OR, equivalently, one image with 24-bit colour depth (8 bits per colour channel).
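As a quick illustration of that layout (a minimal NumPy sketch; the pixel values are made up for the example):

```python
import numpy as np

# A tiny 2x3 grayscale image: one matrix of 8-bit intensities.
gray = np.array([[ 12,  80, 255],
                 [  0, 140,  33]], dtype=np.uint8)

# The same scene in RGB: three such matrices stacked along a third
# axis, one per colour channel (8 bits each, 24 bits in total).
rgb = np.zeros((2, 3, 3), dtype=np.uint8)
rgb[..., 0] = gray   # red channel
rgb[..., 1] = gray   # green channel
rgb[..., 2] = gray   # blue channel

print(gray.shape)  # (2, 3)    -> one channel
print(rgb.shape)   # (2, 3, 3) -> height x width x 3 channels
```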
There are, however, different ways to represent the RGB colour space. One of them is Hue-Saturation-Value (HSV), which uses one number for the colour itself (Hue), another for the strength of the colour (Saturation), and another for the brightness (Value).
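For instance, here is roughly what the conversion looks like with OpenCV's `cv2.cvtColor` (note that OpenCV stores channels in BGR order by default, and scales Hue to 0-179):

```python
import cv2
import numpy as np

# A single orange pixel, given as a 1x1 BGR image.
bgr = np.uint8([[[0, 128, 255]]])

hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
h, s, v = hsv[0, 0]
print(h, s, v)  # Hue ~15 (the colour), Saturation 255 (its strength), Value 255 (brightness)
```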
The point of colour SIFT is to diminish the effect that colour has on tracking features, but to do it in the same way across all colours (colour invariance). This can be done either by pre-processing the input image (e.g. transforming it to a different colour space) so that the effect of colour is diminished and THEN sending it to a typical SIFT, or by incorporating colour-invariant features into the standard keypoint selection process employed by SIFT (thus, cSIFT).
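A minimal sketch of the first route, assuming OpenCV >= 4.4 (where `cv2.SIFT_create` is available). To keep it short, the "colour-invariant" pre-processing is approximated here by just keeping the HSV Value channel, which is NOT a real colour-invariant model like the ones used by cSIFT, and `scene.png` is a hypothetical input file:

```python
import cv2

img = cv2.imread("scene.png")               # hypothetical input image
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
value = hsv[..., 2]                         # keep brightness only (stand-in
                                            # for a proper invariant transform)

# Run plain SIFT on the pre-processed single-channel image.
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(value, None)
print(len(keypoints), "keypoints")
```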
Besides SIFT, colour-invariant metrics are also required in image indexing, where they have been in use for longer, so you might get some additional information from searches like "colour invariant metrics in content based image retrieval".
Hope this helps.