Education is better with AI

Educational institutes strive for students to achieve the course outcomes, and students join courses fully expecting to complete them. But the fact of life is that some students pass and some don't.

The reasons behind failure are often subjective and impossible to fully untangle. What is certain is that the success rate can be improved if the students who need extra support are identified proactively and appropriate interventions are made in time to enhance their performance.

Supply Chain Canada (SCC) is Canada's largest association for supply chain management professionals. It has more than 7,500 members working in a variety of roles such as sourcing, procurement, logistics, inventory, and contract management. SCC offers hundreds of professional development courses and strives for a stellar student learning experience. But it faces the same challenge as any other educational institute – some learners succeed and some don't.

Jasmeet Singh Ghuman and Ronil Shivji are supply chain professionals. They trained with SCC Alberta to complete the Applied Machine Learning Professional Certificate program and decided to test their skills by building an AI application that can proactively identify students who will likely pass or fail a course based on their individual factors.

This prediction could be very useful for any educational institute to allocate focused assistance to learners, thereby improving course performance and increasing student satisfaction at the same time.

Problem Statement

Can we proactively identify students likely to fail a course so an educational institute can provide focused assistance for a better chance of success?

Solution

Ask any student enrollment or employment counsellor how they are able to guide students. It is because they have learned the patterns of success and failure from years of experience!

While humans are good at recognizing patterns, they also have limitations. For one, we are unable to analyze large quantities of data at a time. Second, we have a limited memory. And we forget. 

But a computer doesn't have such limitations. What if a computer could be taught to learn patterns from data?

Machine Learning (ML) is just that – algorithms that learn from past data to predict the future. In this case, we can use ML to teach a computer from past data of success/failure and the student demographics so that it can predict the success/failure of new students with similar demographics for similar courses.

ML techniques such as data analysis, data wrangling, feature selection, feature preprocessing, dataset creation, general purpose/deep learning algorithms, model review, and model deployment were used in this project. The outcome was an AI app that predicts likely success or failure when given the course and the student's demographics as input.

Data Collection

Anonymized data files from the Open University Learning Analytics Dataset (OULAD) were obtained to run the experiments. The dataset enables the analysis of student behaviour, represented by their actions. It contains information about seven courses (modules) delivered as 22 module presentations, 32,593 students, their assessment results, and logs of their interactions with the virtual learning environment, represented by daily summaries of student clicks (10,655,280 entries). The data covers courses, enrolment, student demographics, and student assessments for the years 2013 and 2014.

It served as a good sample for the experiments because most educational institutes collect similar data.

Data Preparation

Although there were seven (7) distinct data files, only the “studentInfo.csv” was used in this experiment because it contained demographic information about the students (input features) along with performance results (the target feature).
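The wrangling in this project was done in mlOS, but the same quick inspection can be sketched in plain Python with pandas. The file path below is illustrative; the column name final_result comes from the public OULAD release.

```python
import pandas as pd

# Load the OULAD student demographics and results file (path is illustrative).
students = pd.read_csv("oulad/studentInfo.csv")

print(students.shape)  # roughly 32,593 rows and 12 columns
# Share of each outcome class: Pass, Withdrawn, Fail, Distinction
print(students["final_result"].value_counts(normalize=True))
```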

Figure 1: Raw data containing ~32,000 records

The fields in this file are:

  • code_module: An identification code for the module on which the student is registered
  • code_presentation: The identification code of the presentation during which the student is registered on the module
  • id_student: A unique identification number for the student
  • gender: The student's gender
  • region: The geographic region where the student lived while taking the module presentation
  • highest_education: The student's highest education level on entry to the module presentation
  • imd_band: The Index of Multiple Deprivation band of the place where the student lived during the module presentation
  • age_band: Band of the student's age
  • num_of_prev_attempts: The number of times the student has attempted this module
  • studied_credits: The total number of credits for the modules the student is currently studying
  • disability: Indicates whether the student has declared a disability
  • final_result: The student's final result in the module (Pass, Fail, Distinction, or Withdrawn)

The data covers seven (7) code_modules, and more than 50% of the enrolment records ended with the student either failing or withdrawing. You can see why this is a problem for an educational institution!

Figure 2: Raw data visualization showing categories of final_result

Braintoy mlOS was used in these experiments to iteratively complete the steps of data wrangling, modeling, and evaluation to arrive at the best models for predicting student performance. Cause-and-effect lessons were learned in each iteration, with the goal of improving model performance in the next.

The Target Feature was final_result. It is what is to be predicted using the other columns in the data, called Input Features. Because final_result holds categorical values (Pass, Fail, Distinction, and Withdrawn), this is a classification problem. In simple words, the classification model "classifies students into categories".

First Iteration

Data Wrangling

Every experiment has to start somewhere! The first iteration simply built a baseline model using the raw data supplied. The data wasn't balanced and no feature selection was done; it was taken as-is. But irrelevant features such as id_student were ignored, because unique identifiers carry no predictive pattern.

Modeling

Classification models were built using the as-is data. About a dozen deep learning and general purpose ML algorithms were used. Although all models showed poor results at below 50% accuracy, XGBoost, Support Vector Machine, and Neural Network came out as the top performers.
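The article does not list the exact algorithms or settings used inside mlOS, so the following is only a rough sketch of how a comparable baseline could be trained and compared with scikit-learn and XGBoost. The id_student drop and one-hot encoding mirror the wrangling described above; the model parameters are assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

students = pd.read_csv("oulad/studentInfo.csv")

# Drop the unique identifier (no predictive pattern) and one-hot encode the rest.
X = pd.get_dummies(students.drop(columns=["id_student", "final_result"]), drop_first=True)
y = LabelEncoder().fit_transform(students["final_result"])  # four classes at this stage

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Train a few of the named model families and compare held-out accuracy.
models = {
    "XGBoost": XGBClassifier(eval_metric="mlogloss"),
    "Support Vector Machine": SVC(),
    "Neural Network": MLPClassifier(max_iter=500),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 3))
```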

Figure 3: A few of the classification models created in this iteration

Low performance was expected, because the data had not been cleaned yet.

Model Evaluation

The ROC curves showed that the models were learning, but not by much.

Figure 4: Multiclass ROC curves for one of the top models

The conclusion was that, at their current accuracy, the models could not be used to reliably predict pass, fail, distinction, or withdrawal. Still, this first iteration set a baseline using raw data, and it taught a valuable lesson – better data engineering builds better models!

It can only get better from here.

Second Iteration

Data Wrangling

In the second iteration, it was decided to reduce the number of classes from four (4) to two (2) to increase the fidelity of the data. The classes "Pass" and "Distinction" were clubbed as "Pass", and the classes "Fail" and "Withdrawn" were clubbed as "Fail". Now there were only two (2) classes – Pass or Fail.

Previous: Pass 12,361 (38%), Distinction 3,024 (9%), Fail 7,052 (22%), Withdrawn 10,156 (31%), Total 32,593

New: Pass 15,385 (47%), Fail 17,208 (53%), Total 32,593

Clubbing these classes also removed the class imbalance that had existed. Now there were roughly half and half records for the two classes to be predicted.
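In mlOS this relabeling is a data wrangling operation; a rough pandas equivalent of the same step (file path illustrative) would be:

```python
import pandas as pd

students = pd.read_csv("oulad/studentInfo.csv")

# Collapse four classes into two: Distinction counts as Pass, Withdrawn as Fail.
students["final_result"] = students["final_result"].replace(
    {"Distinction": "Pass", "Withdrawn": "Fail"})

# Check the new balance: roughly 15,385 Pass vs. 17,208 Fail.
print(students["final_result"].value_counts())
```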

To further improve modeling efficiency, Feature Importance was used to identify the Input Features that have the maximum influence on the Target Feature.

The following five (5) out of twelve (12) features were shortlisted:

Figure 5: Feature Importance
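The article does not say which estimator mlOS uses to compute Feature Importance, so the sketch below assumes a Random Forest as a stand-in; it ranks the one-hot encoded input features by their influence on the binary target.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

students = pd.read_csv("oulad/studentInfo.csv")
students["final_result"] = students["final_result"].replace(
    {"Distinction": "Pass", "Withdrawn": "Fail"})

X = pd.get_dummies(students.drop(columns=["id_student", "final_result"]), drop_first=True)
y = students["final_result"]

# Fit a forest and rank features by importance; the top few would be shortlisted.
rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(X, y)
print(pd.Series(rf.feature_importances_, index=X.columns)
        .sort_values(ascending=False).head(10))
```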

Modeling

Several classification models were built using the shortlisted features. Neural Network, Support Vector Machine and Random Forest came out as top performers, all at above 70% accuracy.

Figure 6: Top three classification models created in this iteration using only two classes, Pass or Fail

This was immediate proof that better data engineering gives better model performance.

Model Evaluation

Models built in the second iteration performed better than those from the first.

 

Figure 7: The ROC curve shows that the models are learning better than previous iterations

While the performance of the models had improved, there was scope to improve it further. This would be especially relevant for anomaly detection scenarios, e.g., "When will a student most certainly pass or fail?"

It appeared that there were outliers in the dataset causing the models to produce False Positives and False Negatives.

Figure 8: Confusion matrix showing incorrect prediction values in the top right corner 

Such data points can be identified and removed from the training dataset to improve the model accuracy further.

Third Iteration

Data Wrangling

To start with, in this iteration the Training:Test split was adjusted from the typical 80:20 ratio to a 50:50 ratio. This was done so that outliers could be flagged in 50% of the data rather than only 20%.

Figure 9: New dataset generated with a 50:50 split

Because sampling is random and only the sampling ratio was changed, the resulting model accuracies remained similar to the previous iteration. However, the Test Set now contained 50% of the total records, along with Original vs. Predicted values. This became a labeled dataset with enough records to identify and remove the outliers that the models were misclassifying.

  • The Original vs. Predicted data was uploaded as raw data
  • The Final_Result column was converted to binary (1, 0) so a new column could be calculated
  • A new column was created by comparing Final_Result and Predicted_Final_Result

Figure 10: Formula used to compare two columns and write the value in a new column

  • Records labeled "FN" and "FP" were deleted as they were deemed outliers
  • The Final_Result column was converted from 1 or 0 back to Pass or Fail

Figure 11: The data wrangling steps

  • This wrangled data was then used to create the Training and Test Set in the usual 80:20 ratio (the steps are sketched in code below)
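A rough pandas paraphrase of these wrangling steps follows; the file and column names (such as Predicted_Final_Result) are taken from the figures and should be treated as illustrative rather than the exact mlOS operations.

```python
import pandas as pd

# The "Original vs. Predicted" export from the 50:50 test set, re-uploaded as raw data.
preds = pd.read_csv("original_vs_predicted.csv")  # illustrative file name

# Final_Result has already been converted to binary: 1 = Pass, 0 = Fail.
actual = preds["Final_Result"]
predicted = preds["Predicted_Final_Result"]

# Rows the model got wrong (False Positives and False Negatives) are treated as outliers.
cleaned = preds[actual == predicted].copy()

# Convert the binary target back to Pass/Fail before rebuilding the 80:20 split.
cleaned["Final_Result"] = cleaned["Final_Result"].map({1: "Pass", 0: "Fail"})
```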

Modeling

Several classification models were built using the wrangled data. Support Vector Machine, Neural Network, and Random Forest again came out as top performers, all showing above 90% accuracy.

Figure 12: Top three classification models created in this iteration

Figure 13: Comparison of all models created in this iteration

Model Evaluation

The models had now achieved an accuracy that gave confidence in the predictions.

Figure 14: Model documentation showing confusion matrix, ROC, and original vs. predicted

The ROC curve (AUC value of 1.00) validates that the model is learning well. The confusion matrix shows that it was classifying the records correctly.

Model Review

Jaspreet Gill of Braintoy was the coach for the Applied Machine Learning program and the model reviewer for this project.

The review involved checking the data wrangling operations, feature preprocessing techniques, train-test datasets, and sampling, and evaluating model performance on accuracy, ROC (receiver operating characteristic), precision, recall, and the confusion matrix, as well as other efficiencies such as computation time and storage space.

Accuracy is the proportion of correct predictions made by the model. ROC is the plot of the true positive rate against the false positive rate; in simple terms, the larger the area under the curve, the better the model. Precision is the proportion of samples predicted as positive that are actually positive; it measures how trustworthy the model's positive predictions are. Recall measures the model's ability to detect positive samples; the higher the recall, the more of the actual positives are found, so a high-recall model can be trusted not to miss positive samples. The Confusion Matrix is another way to evaluate model performance. It is an N×N matrix, where N is the number of classes; with two classes in the dataset, the confusion matrix is 2×2. It is computed on the test dataset, for which the true values are known.
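These metrics can be reproduced outside mlOS with scikit-learn. A minimal sketch, assuming y_test holds the true Pass/Fail labels, y_pred the model's predicted labels, and y_score the predicted probability of "Pass":

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, precision_score,
                             recall_score, roc_auc_score)

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, pos_label="Pass"))
print("Recall   :", recall_score(y_test, y_pred, pos_label="Pass"))
print("ROC AUC  :", roc_auc_score((y_test == "Pass").astype(int), y_score))

# 2x2 confusion matrix because there are two classes.
print(confusion_matrix(y_test, y_pred, labels=["Pass", "Fail"]))
```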

The key evaluation criteria for these classification models were Accuracy, ROC, and the Confusion Matrix.

The model using the Support Vector Machine algorithm was accepted. Its observed accuracy was 92.29%, indicating that the Support Vector Machine model performs better than the Multilayer Perceptron Neural Network model, which had an accuracy of 91.63%. For the chosen model, the area under the ROC curve was 100% and the precision was 92%. Hence, this was the best model to deploy for making real-time predictions.

Model Deployment

In mlOS, every model can be deployed as an application in a few clicks, and real-time APIs can be generated. The accepted model was deployed as an application that predicts a student's likely outcome in a course based on the factors identified during the modeling process.

Figure 15: Model deployed as a prediction service

From this point onwards, inputs can be given to the application and outputs obtained in real-time.
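The article does not show the generated API itself, so the endpoint URL, payload fields, and response format below are hypothetical; the sketch only illustrates what such a real-time call could look like from a client.

```python
import requests

# Hypothetical endpoint and payload; the real mlOS API will differ.
url = "https://example.org/api/predict/student-success"
student = {
    "code_module": "AAA",
    "gender": "F",
    "highest_education": "A Level or Equivalent",
    "imd_band": "30-40%",
    "age_band": "0-35",
}

response = requests.post(url, json=student, timeout=10)
print(response.json())  # e.g. {"final_result": "Pass"}
```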

Summary

Machine learning can help improve the quality of learning. Educational institutes can match the right students to the right courses. They can offer proactive help to students who need it the most. Improving student success means improved satisfaction and increased revenue. Everyone wins!

It was also learned that while there are countless ML techniques for an endless variety of problems, the procedure for modeling is the same – it starts with understanding the problem, collecting and organizing data, building and evaluating models, and then deploying the models. The key is to use the techniques creatively in rapid iterations, with each iteration improving the results a little over the previous one.

Takeaways

  1. Modeling is about iterative problem solving
  2. Data quality determines model performance
  3. A few techniques can be used in many creative ways

Authors

Jasmeet Ghuman is a supply chain professional with 9 years of experience in Procurement, Category Management and Supply Chain Analytics. He has worked with organizations like City of Calgary, Rio Tinto and Accenture.

Ronil (Ron) Shivji is a supply chain professional with 20+ years of experience in Procurement, Contracts Management and Category Management working with companies such as Imperial Oil, ConocoPhillips, Fluid Energy Group, TransAlta, and AltaLink. He has an MBA and holds a Supply Chain Management Professional (SCMP) designation.