12

I'm looking for educational datasets from MOOCs. PSLC DataShop contains some learning interaction data, but not from MOOCs. I'm especially interested in logs tracking students' activities such as browsing the website or submitting answers.

Franck Dernoncourt
  • 7,780
  • 9
  • 39
  • 86

5 Answers

7

As far as I know, Coursera uses Backbone.js for its site, so several JSON endpoints expose a lot of data. Unfortunately, I am not familiar with Backbone, and the only links I know are ones I found through Google:

List of all courses - https://www.coursera.org/maestro/api/topic/list?full=1

Another list of courses - https://www.coursera.org/maestro/api/topic/list2

List of all universities - https://www.coursera.org/maestro/api/university/list

Information about a specific course - https://www.coursera.org/maestro/api/topic/information?topic-id=compdata

If you are familiar with those technologies, you may be able to find other endpoints with more detail, possibly including non-personal data that is useful to you.
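
For anyone who wants to poke at endpoints like the ones listed above, here is a minimal Python sketch using only the standard library. It assumes the endpoint returns a JSON array of objects and that each object has a `name` field; both are assumptions on my part, not something verified against Coursera. `parse_json_list`, `fetch_json_list`, and `field_values` are hypothetical helper names chosen for illustration.

```python
import json
from urllib.request import urlopen

# URL taken from the list above; it is undocumented and may no longer respond.
COURSE_LIST_URL = "https://www.coursera.org/maestro/api/topic/list?full=1"


def parse_json_list(payload):
    """Decode a JSON payload (str or bytes) that is expected to be an array."""
    data = json.loads(payload)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array at the top level")
    return data


def fetch_json_list(url, timeout=10):
    """Download a URL and decode its body as a JSON array."""
    with urlopen(url, timeout=timeout) as response:
        return parse_json_list(response.read())


def field_values(records, key="name"):
    """Collect one field from every object in the list (None if missing).

    The default key "name" is an assumption about the response format.
    """
    return [record.get(key) for record in records if isinstance(record, dict)]


# Example (performs a network request, so it is left commented out):
# courses = fetch_json_list(COURSE_LIST_URL)
# print(len(courses), field_values(courses)[:5])
```

Keeping the parsing separate from the download makes it easy to try the same code on a response saved to disk.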

UPDATE: Also, this user seems to have a lot of data about MOOCs (completion rates, etc.). Maybe you can contact him and ask him to share it with you.

http://moocmoocher.wordpress.com/2013/02/13/synthesising-mooc-completion-rates/

Tasos
  • 4,714
  • 3
  • 20
  • 43
7

Udacity just (August 12, 2014) released an API that makes all of their course information easily available:

See the overview here: Udacity Course Catalog API

And the documentation here: Udacity Course Catalog API Documentation

Some other APIs that have online course data:

michaelrbock
  • 171
  • 1
  • 4
4

The staff of the metadata course on Coursera posted some statistics, which I aggregated here.

Karsten W.
  • 940
  • 5
  • 15
4

How about this?

MIT and Harvard release de-identified learning data from open online courses

Data:

(screenshot of the released data)

Franck Dernoncourt
  • 7,780
  • 9
  • 39
  • 86
psychemedia
  • 266
  • 1
  • 2
  • Thanks, that's nice, but there is barely any information for each student :( (see the screenshot that I added to your answer). A few MB, compared to ~10 GB per course when they receive the database dump from edX. – Franck Dernoncourt Aug 13 '14 at 20:25
1

I'm arriving late, but:

  • Yu, J., Luo, G., Xiao, T., Zhong, Q., Wang, Y., Feng, W., ... & Tang, J. (2020, July). MOOCCube: a large-scale data repository for NLP applications in MOOCs. In Proceedings of the 58th annual meeting of the association for computational linguistics (pp. 3135-3142). http://moocdata.cn/data/MOOCCube
  • Kuzilek, J., Hlosta, M., & Zdrahal, Z. (2017). Open University Learning Analytics dataset. Scientific Data, 4, 170171. doi: 10.1038/sdata.2017.171. https://analyse.kmi.open.ac.uk/open_dataset (preview on Kaggle)
  • Choi, Y., Lee, Y., Shin, D., Cho, J., Park, S., Lee, S., Baek, J., Kim, B., & Jang, Y. (2019). EdNet: A Large-Scale Hierarchical Dataset in Education. Artificial Intelligence in Education, 12164, 69-73. https://github.com/riiid/ednet
  • (only exercises) Chen, P. J., Hsieh, M. E., & Tsai, T. Y. (2020). Junyi Online Learning Dataset: A large-scale public online learning activity dataset from elementary to senior high school students. Available from https://www.kaggle.com/junyiacademy/learning-activity-public-dataset-by-junyi-academy.

I am particularly interested in whether course content is available; it seems that Junyi contains problem titles in Taiwanese, and MOOCCube contains a lot of content information.
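
Since the question is about activity logs: as far as I remember, the Open University download includes a `studentVle.csv` file with one row per student, resource, and day, including `id_student` and `sum_click` columns. Treat the file and column names as assumptions and check the dataset documentation before relying on them. Under that assumption, a minimal sketch for totalling clicks per student (`clicks_per_student` and `load_clicks` are hypothetical helper names):

```python
import csv
from collections import defaultdict


def clicks_per_student(rows):
    """Sum click counts per student.

    `rows` is any iterable of dict-like records with "id_student" and
    "sum_click" keys (assumed column names, see the note above).
    """
    totals = defaultdict(int)
    for row in rows:
        totals[row["id_student"]] += int(row["sum_click"])
    return dict(totals)


def load_clicks(path):
    """Read a CSV file with a header row and total the clicks per student."""
    with open(path, newline="", encoding="utf-8") as handle:
        return clicks_per_student(csv.DictReader(handle))


# Example, assuming the file has been downloaded next to this script:
# totals = load_clicks("studentVle.csv")
# print(len(totals), "students with at least one logged interaction")
```

Student IDs stay as strings because `csv.DictReader` does not convert types; only the click counts are parsed as integers.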