I need to do some exploratory data analysis, and I picked multidimensional scaling (MDS) to find out whether there are trends in the data. The structure of my data looks like this:
```
 $ Generation: int 2 2 2 2 2 2 2 2 2 2 ...
 $ Panel     : chr "A" "A" "A" "A" ...
 $ Line      : int 1 1 1 1 1 1 1 1 1 1 ...
 $ Rep       : int 2 2 2 2 2 2 2 2 6 6 ...
 $ Sex       : chr "F" "F" "F" "F" ...
 $ Size      : num 1662 1720 1721 1778 1565 ...
 $ ILD12     : num 1930 1954 1947 1932 1915 ...
 $ ILD15     : num 1524 1567 1575 1539 1528 ...
 $ ILD18     : num 427 414 420 389 418 ...
 $ ILD23     : num 732 706 702 733 749 ...
 $ ILD25     : num 1380 1386 1383 1393 1391 ...
 $ ILD29     : num 1544 1584 1554 1568 1531 ...
 $ ILD37     : num 1586 1546 1575 1568 1611 ...
 $ ILD39     : num 2070 2060 2046 2061 2060 ...
 $ ILD46     : num 1515 1481 1498 1493 1532 ...
 $ ILD49     : num 1970 1973 1953 1971 1962 ...
 $ ILD57     : num 673 695 705 691 697 ...
 $ ILD58     : num 1117 1166 1172 1164 1127 ...
 $ ILD67     : num 192 194 188 196 178 ...
 $ ILD69     : num 611 644 623 642 585 ...
 $ ILD78     : num 522 552 531 545 497 ...
 $ ILD89     : num 97.5 99.2 97.9 99.9 96.9 ...
```
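Going by this structure, Generation, Line and Rep are stored as integers, so a blanket `sapply(df, is.numeric)` would treat them as measurements. Here is a minimal sketch of one way to separate the two kinds of columns before computing distances. It assumes the measurement columns are Size plus every column whose name starts with `ILD`, and it runs on a small simulated stand-in data frame with hypothetical values, not the real data. Standardizing matters here because the raw columns live on very different scales (ILD39 around 2000, ILD89 around 100).

```r
set.seed(1)
n <- 40
# Simulated stand-in with the same kinds of columns as the real data
df <- data.frame(
  Generation = rep(2L, n),
  Panel = rep(c("A", "B"), each = n / 2),
  Line = rep(1:4, times = n / 4),
  Rep = rep(c(2L, 6L), times = n / 2),
  Sex = rep(c("F", "M"), times = n / 2),
  Size = rnorm(n, 1700, 60),
  ILD12 = rnorm(n, 1940, 15),
  ILD89 = rnorm(n, 98, 1.5),
  stringsAsFactors = FALSE
)

# Design / grouping variables: treat as categorical even if stored as int
design_cols <- c("Generation", "Panel", "Line", "Rep", "Sex")
df[design_cols] <- lapply(df[design_cols], factor)

# Measurement variables (assumption: Size plus all ILD* columns)
meas_cols <- grep("^(Size|ILD)", names(df), value = TRUE)

# Standardize each column to mean 0, sd 1 so that large-valued columns
# do not dominate the Euclidean distance
X <- scale(df[meas_cols])
d <- dist(X)
```

The resulting `d` is what `cmdscale()` would take as input; the factor columns stay in `df` for labelling later.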
How should I deal with the categorical variables in my dataset when analysing it in R? I am also using ggplot2. Would I first run `cmdscale()` and then plot the resulting x and y coordinates?
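Something like this:

```r
library(ggplot2)

# df must contain the MDS coordinates in columns x and y, plus Panel
ggplot(df, aes(x = x, y = y, color = Panel)) +
  geom_point() +
  ggtitle("Metric MDS Results") +
  labs(x = "Coordinate 1", y = "Coordinate 2") +
  theme_bw()
```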
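A sketch of how the x and y columns for that plot could be produced: classical (metric) MDS via `cmdscale()` on the distances between standardized measurement columns, with Panel and Sex attached afterwards purely as labels. The data are simulated stand-ins with hypothetical values, and the choice of measurement columns (Size plus `ILD*`) is an assumption.

```r
library(ggplot2)

set.seed(2)
n <- 40
# Simulated stand-in data (hypothetical values)
df <- data.frame(
  Panel = rep(c("A", "B"), each = n / 2),
  Sex = rep(c("F", "M"), times = n / 2),
  Size = rnorm(n, 1700, 60),
  ILD12 = rnorm(n, 1940, 15),
  ILD89 = rnorm(n, 98, 1.5)
)

meas_cols <- grep("^(Size|ILD)", names(df), value = TRUE)
d <- dist(scale(df[meas_cols]))

# Classical MDS: k = 2 returns an n x 2 matrix of coordinates
coords <- cmdscale(d, k = 2)

# Categorical columns do not enter the distance; they are only plot labels
mds_df <- data.frame(x = coords[, 1], y = coords[, 2],
                     Panel = df$Panel, Sex = df$Sex)

p <- ggplot(mds_df, aes(x = x, y = y, color = Panel, shape = Sex)) +
  geom_point() +
  ggtitle("Metric MDS Results") +
  labs(x = "Coordinate 1", y = "Coordinate 2") +
  theme_bw()
print(p)
```

Classical MDS coordinates are only determined up to reflection and rotation, so the relative positions of the points carry the information, not the sign of either axis.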
Am I right to assume that the `color` aesthetic in ggplot shows the similarity with respect to the categorical variable Panel?
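In plotting code like the above, `color = Panel` only labels each point after the coordinates have been computed from the numeric columns; Panel itself does not enter the distances, so same-coloured points clustering together would suggest the panels differ in the measurements. If the categorical variables should instead contribute to the dissimilarity, one option (a different technique from Euclidean distance, named plainly here) is Gower's dissimilarity from `daisy()` in the cluster package, followed by `cmdscale()`. A minimal sketch on simulated stand-in data with hypothetical values:

```r
library(cluster)

set.seed(3)
n <- 30
# Simulated stand-in; nominal variables are stored as factors for daisy()
df <- data.frame(
  Panel = factor(rep(c("A", "B", "C"), each = n / 3)),
  Sex = factor(rep(c("F", "M"), times = n / 2)),
  Size = rnorm(n, 1700, 60),
  ILD12 = rnorm(n, 1940, 15)
)

# Gower dissimilarity: numeric columns contribute |x_i - x_j| / range,
# factors contribute 0 if levels match and 1 otherwise; the average
# over variables lies in [0, 1]
g <- daisy(df, metric = "gower")

# Classical MDS on the Gower dissimilarities
coords <- cmdscale(g, k = 2)
mds_df <- data.frame(x = coords[, 1], y = coords[, 2], Panel = df$Panel)
```

Because Panel is then part of the distance, clustering by colour in a plot of `mds_df` is partly built in and no longer an independent check.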