Digital Learning Platforms and Secondary Data Analysis

[This SEERNet guest blog is written by Sarah Miller, Research Scientist for E-TRIALS at Worcester Polytechnic Institute.]

As researchers begin to understand the possibility of designing studies that use Digital Learning Platforms (DLPs) to collect data, there is an additional opportunity: secondary analysis of the data that DLPs have already captured. Compared with traditional school-based data collection procedures, DLPs are unique in their ability to accumulate data from thousands of students each year. Researchers with access to such data can apply analytic techniques to expand learning science without impacting student users (Cohen, 2017). That is why Neil and Cristina Heffernan, co-founders of ASSISTments (Heffernan and Heffernan, 2014), have made it a priority to openly share data from the platform with researchers.

ASSISTments is a free DLP through which teachers can assign math work and quickly review the resulting formative assessment data. As students complete their work, they receive support in the form of correctness feedback, hints, and explanations. They are also given the opportunity to demonstrate learning by retrying similar problems that target the same skills. When papers that use ASSISTments datasets are published out of Dr. Heffernan’s lab at WPI, the datasets are also made available in de-identified form. This practice is an important part of his lab’s commitment to the IES SEER principles of open science and the desire to allow others to replicate their work. In addition to ensuring the lab’s datasets are accessible, Dr. Heffernan has also made this procedure a requirement for other researchers who design studies within E-TRIALS (Ed-Tech Research Infrastructure to Advance Learning Science). One of E-TRIALS’ terms of use stipulates that researchers must be willing to follow the tenets of open science, including sharing the anonymized dataset that is provided to them for their research.

These datasets often focus on the results of randomized controlled trials or on all of the assignments completed in ASSISTments in a given school year. Additional datasets focus on students’ responses to certain types of problems or on information about the problems themselves. Despite being collected in the same DLP, each type of dataset contains different information, which may be used for different purposes. For example, the 50 Experiments+2022 dataset provides clickstream data from 50 randomized controlled trials run in ASSISTments and was intended for investigating personalized learning (Haim et al., 2022). In contrast, ASSIST2009-10 is a school-year-specific dataset that includes both mastery learning data and non-mastery data and has previously been used to predict student performance (e.g., Pardos et al., 2012). This dataset was one of the first that Dr. Heffernan’s lab made publicly available. Its pattern of use indicates that data from ASSISTments has become increasingly prevalent throughout the 2020s: ASSIST2009-10 was cited at least 12 times between 2012 and 2018, 24 times in 2019 and 2020, and 37 times since 2021. Many researchers, such as Abdelrahman and colleagues (2023), use multiple ASSISTments datasets to answer their research questions.

For publicly available datasets to remain accessible and traceable over time, each link hosting a dataset must remain unchanged. Of course, to be made publicly available, datasets must meet certain criteria to protect both ASSISTments users and ASSISTments itself. Both ASSISTments and Dr. Heffernan’s lab members make every effort to remove personally identifiable information (PII) from datasets. First, before ASSISTments provides researchers with access to data, individual students and classrooms are assigned a unique identifier, and personal information such as student names and email addresses is removed. Second, Dr. Heffernan’s lab members review the data to ensure students have not included any personal information in their answers. For example, when students are allowed to submit pictures of their work for certain questions, additional information, such as a map that may relate to their hometown, may be captured in the background of the picture. These types of responses are carefully reviewed and cleaned prior to sharing with other researchers. In addition, it is important for ASSISTments to ensure that platform-specific data that could be used to build a similar system is not shared with a third-party competitor. To mitigate such concerns, ASSISTments requires university oversight for certain datasets, which the lab makes available upon request rather than as a direct download.
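The first de-identification step described above (assigning unique identifiers and removing names and email addresses) might look roughly like the following minimal Python sketch. It is an illustration only, not the procedure ASSISTments actually uses: the field names `student_name`, `student_email`, and `student_id`, and the helper `deidentify`, are hypothetical, and the sketch does not cover the manual review of student answers and images.

```python
import secrets

# Hypothetical PII column names; a real export will use different fields.
PII_FIELDS = {"student_name", "student_email"}


def deidentify(rows):
    """Return copies of the rows with PII removed and an opaque student ID added.

    Assumes each row is a dict with a 'student_email' key that identifies
    the student. The email-to-ID mapping is not returned, so the output
    cannot be linked back to individuals from this function alone.
    """
    id_map = {}  # email -> random opaque identifier
    cleaned = []
    for row in rows:
        email = row["student_email"]
        if email not in id_map:
            # Random rather than derived from the email, so the ID reveals nothing.
            id_map[email] = "S" + secrets.token_hex(8)
        # Copy every non-PII field, then attach the opaque identifier.
        out = {k: v for k, v in row.items() if k not in PII_FIELDS}
        out["student_id"] = id_map[email]
        cleaned.append(out)
    return cleaned


if __name__ == "__main__":
    sample = [
        {"student_name": "A. Example", "student_email": "a@example.org", "problem": 1, "correct": 1},
        {"student_name": "B. Example", "student_email": "b@example.org", "problem": 1, "correct": 0},
    ]
    for row in deidentify(sample):
        print(row)
```

Using random identifiers instead of, say, a hash of the email address is a deliberate choice in this sketch: a hash of a guessable value can be reversed by hashing candidate emails, while a random token carries no information about the student.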

Researchers all around the world contact Dr. Heffernan’s team about datasets available upon request. Some of these researchers have connected with members of the lab through conferences, while others reach out after having read a paper that cited a certain dataset. The majority of individuals requesting datasets are early-career researchers enrolled in master’s or doctoral programs. In addition, Dr. Heffernan has created multiple pathways for researchers to receive alerts when new datasets become available.

To explore ASSISTments datasets for secondary analysis, visit the E-TRIALS website. DrawEduMath, the newest dataset, which will focus on images of written student work, will be available on Hugging Face in March 2025.

References

Abdelrahman, G., Wang, Q., & Nunes, B. (2023). Knowledge tracing: A survey. ACM Computing Surveys, 55(11), 1-37.

Cohen, A. (2017). Analysis of student activity in web-supported courses as a tool for predicting dropout. Educational Technology Research and Development, 65, 1285-1304.

Haim, A., Prihar, E., Shaw, S. T., Sales, A., & Heffernan, N. (2022, October 20-21). An expansion on data associated with ‘Exploring Common Trends in Online Educational Experiments Data [Conference presentation]. CODE@MIT 2022, Cambridge, MA, United States. https://osf.io/m2jqe/

Heffernan, N. T., & Heffernan, C. L. (2014). The ASSISTments ecosystem: Building a platform that brings scientists and teachers together for minimally invasive research on human learning and teaching. International Journal of Artificial Intelligence in Education, 24, 470-497.

Pardos, Z. A., Wang, Q. Y., & Trivedi, S. (2012). The Real World Significance of Performance Prediction. International Educational Data Mining Society.