Autocategorizing transactions

Businesses rely on a cardinal rule – if you categorize your transactions, you’ll keep track of your finances.

To categorize and track transactions, companies in the supply chain usually develop home-grown catalogs of codes and descriptions. But because each company’s catalog is different, buyers and suppliers have to figure out how to interpret each other’s category codes. And even if everyone understood each other’s category codes, the bulk of purchase orders and invoices are just free text. Free text means that an experienced person has to invest time to classify each transaction in the right category.

It is impossible to train every person to provide the right spend classification for every transaction across extremely diverse commodity groups. Buyers and suppliers employ an army of people to match purchase orders, invoices, payments, and transactions, and hope to report on them reliably. The cost isn’t small.

Uncategorized transactions mean that less spend is attributed to the right commodities. This limits spend visibility, analysis, and forecasting. Improper spend categorization is also a source of pilferage, a problem that haunts supply chain professionals all over the world.

Abhisek Basu is a supply chain professional who trained with SCC Alberta to complete the Applied Machine Learning Professional Certificate. He tested his skills by building an Artificial Intelligence app that helps SCM professionals automatically find the correct category of spend from free text data obtained from any requisition, purchase order or invoice.

Problem Statement

Stop unaccounted spend pilferage by automatically tagging purchase line items / requisition descriptions to the correct spend categories.

Dataset

The United Nations Standard Products and Services Code (UNSPSC) is a global coding system for goods and services. The UNSPSC codeset groups commodities sharing a common use or function. Buyers and sellers use it to codify extremely diverse goods and services in common categories without referring to the buyer’s and supplier’s own home-grown catalogue codes and descriptions. Think of UNSPSC as the common language for category management in global business.

Figure 1: UNSPSC code segments
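
The hierarchy in the codeset can be illustrated with a short sketch. It assumes the common 8-digit UNSPSC layout in which each level adds two digits and the lower levels are padded with zeros, which matches the Segment, Family, and Class codes quoted later in this article (32000000, 32100000, 32101600). The function name `unspsc_levels` and the commodity code used in the example are illustrative and are not taken from the project.

```python
def unspsc_levels(code: str) -> dict:
    """Derive the parent Segment, Family, and Class codes of an 8-digit code."""
    if len(code) != 8 or not code.isdigit():
        raise ValueError(f"expected an 8-digit UNSPSC code, got {code!r}")
    return {
        "Segment": code[:2] + "000000",   # first two digits
        "Family": code[:4] + "0000",      # first four digits
        "Class": code[:6] + "00",         # first six digits
        "Commodity": code,                # all eight digits
    }


# Illustrative commodity code (an assumption, chosen to sit under Class 32101600).
levels = unspsc_levels("32101601")
for name, value in levels.items():
    print(f"{name}: {value}")
```

Because each parent code is a prefix of its children, a predicted Class also implies a Family and a Segment.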

The dataset to run the experiments was taken from the UNSPSC site (v19.0501).

Figure 2: Raw data from the UNSPSC v19.0501 categorization

Feature: Description
Segment: A numeric code for the SegmentTitle. This is the highest classification level in the UNSPSC codeset, defined as the logical aggregation of families. Of a total of 21,770 records, there were 22 unique segments.
SegmentTitle: Text description of Segment.
Family: A numeric code for the FamilyTitle. This is the second highest classification level in the UNSPSC codeset, a commonly recognized group of inter-related commodity categories. Of the 21,770 records, there were 94 unique families.
FamilyTitle: Text description of Family.
Class: A numeric code for the ClassTitle. This is the third highest classification level in the UNSPSC codeset. Of the 21,770 records, there were 313 unique classes.
ClassTitle: Text description of Class.
Commodity: A numeric code for the CommodityTitle. This is the fourth highest classification level in the UNSPSC codeset, a group of substitutable products or services. Of the 21,770 records, there were 21,770 unique commodities.
CommodityTitle: Text description of Commodity.

This can be replaced with a company’s in-house catalog as well.
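
The unique-value counts quoted for each level can be checked with a few lines of code. The sketch below is one way to do it with the Python standard library; it assumes a CSV with `Segment`, `Family`, `Class`, and `Commodity` columns, and the five sample rows are made-up illustrative codes, not an extract from the real file.

```python
import csv
import io

# Hypothetical five-row extract; the real file has 21,770 rows.
SAMPLE = """Segment,Family,Class,Commodity
10000000,10100000,10101500,10101501
10000000,10100000,10101500,10101502
10000000,10150000,10151500,10151501
32000000,32100000,32101600,32101601
32000000,32100000,32101600,32101602
"""


def unique_counts(csv_text):
    """Count the distinct codes found at each level of the hierarchy."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    levels = ["Segment", "Family", "Class", "Commodity"]
    return {level: len({row[level] for row in rows}) for level in levels}


counts = unique_counts(SAMPLE)
print(counts)
```

A level whose unique count equals the number of rows, as Commodity does here, behaves like an ID column.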

Goal

The goal of this project was to build an AI app that automatically classifies free text taken from the line items of any requisition, purchase order, or invoice into the proper UNSPSC Segment, Family, Class, and Commodity, so that spend can be categorized automatically, with no human effort.

This project used NLP (Natural Language Processing) techniques for classification modeling of text data. Braintoy mlOS was used for the experiments. Three classification models were built to get the results.

Data Engineering

The UNSPSC v19.0501 dataset in .csv format was uploaded. 

Figure 3: Raw data files uploaded in Data Engine

Target and Input Features

Commodity was taken as the Target Feature in the first attempt. However, an error was returned stating “No Model Created, tune your algorithm”. This was because every Commodity is a unique value.

The first lesson was that unique values such as IDs carry no pattern for a model to learn. Each is just a label given to one record that refers to the other features in the dataset. Class, Family, and Segment are the right Target Variables for modeling because these are the “buckets” into which any free text is to be categorized.
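
A small sketch shows why a unique-valued target cannot be learned. It uses ten hypothetical records, each labeled with its own made-up Commodity code, and holds out the last 20% as a test set. This illustrates the reasoning only; it is not how the platform produced its error message.

```python
# Ten hypothetical records, each with its own unique Commodity code as the label.
labels = [str(10101500 + i) for i in range(1, 11)]

# Hold out the last 20% as a test set, as in an 80:20 split.
cut = int(len(labels) * 0.8)
train_labels = set(labels[:cut])
test_labels = set(labels[cut:])

# Every test label is unseen in training, so no classifier could predict it.
unseen = test_labels - train_labels
print(f"{len(unseen)} of {len(test_labels)} test labels never appear in training")
```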

It also proved prudent to have at least 50 records in every Class to classify text in the UNSPSC dataset properly. Thresholds of 10, 20, and 30 records were checked too, but a minimum of 50 records per Class gave more reliable model performance for the purpose of this project. In Data Wrangling, the “Delete Rows by Frequency Less Than and Equal” algorithm was applied to remove Classes with fewer than 50 line items. More models could be built to handle Classes with fewer than 50 records, but those experiments were left for another day.
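
The frequency filter can be sketched in plain Python. The function name `drop_rare_classes` and the sample data are hypothetical. The sketch keeps a Class when it has at least `min_count` rows, which follows the "at least 50 records" wording above; the exact boundary used by the platform's algorithm is an assumption.

```python
from collections import Counter


def drop_rare_classes(rows, min_count=50, key="Class"):
    """Keep only rows whose class appears at least `min_count` times."""
    freq = Counter(row[key] for row in rows)
    return [row for row in rows if freq[row[key]] >= min_count]


# Hypothetical data: one class with 60 rows and one with 12.
rows = [{"Class": "32101600"}] * 60 + [{"Class": "10151500"}] * 12
kept = drop_rare_classes(rows)
print(len(kept))  # 60
```

Lowering `min_count` to 10, 20, or 30 reproduces the other thresholds that were tried.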

Figure 4: Data wrangling algorithms and functions are applied to the raw data

With the data wrangled, all experiments could be run and compared rapidly.

The next step was feature selection. 

In machine learning lingo, the Target Feature is what we want to predict. The Input Feature(s) are what we use to predict the Target Feature.

To predict the UNSPSC Class, a new Input Feature called “NLPDataX” was created. It was a concatenation of the text from the Commodity Title and the Family Title. The Target Feature was Class. Note that any relevant text can be concatenated into “NLPDataX” to be included in training the models.

Similarly, for the other two models, which predict the Family and the Segment, the Input Feature was kept the same, i.e. “NLPDataX”, but the Target Feature was changed to Family and Segment respectively.
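
Building the “NLPDataX” feature amounts to a string concatenation. The sketch below assumes the two titles are joined with a single space; the helper name `build_nlp_data_x` is hypothetical, and the split of each sample string into a Commodity Title and a Family Title is inferred from the sample inputs shown later in this article.

```python
def build_nlp_data_x(row):
    """Join the Commodity Title and Family Title into one text field."""
    return f"{row['CommodityTitle']} {row['FamilyTitle']}".strip()


rows = [
    {
        "CommodityTitle": "16 bit microcontroller",
        "FamilyTitle": "Printed circuits and integrated circuits and microassemblies",
    },
    {
        "CommodityTitle": "Fruit tree seeds or cuttings",
        "FamilyTitle": "Seeds and bulbs and seedlings and cuttings",
    },
]

for row in rows:
    row["NLPDataX"] = build_nlp_data_x(row)
    print(row["NLPDataX"])
```

The same input column can then be paired with Class, Family, or Segment as the target.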

Figure 5: Selecting NLPDataX as the input feature and Class as the target feature

Figure 6: Selecting NLPDataX as the input feature and Segment as the target feature

Figure 7: Selecting NLPDataX as the input feature and Family as the target feature

Feature Pre-processing

The TFIDF algorithm was used to preprocess the text in the Input Feature “NLPDataX” for all three models, one that predicted Class, the second that predicted Family, and the third that predicted Segment.

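TFIDF (term frequency-inverse document frequency) is a statistical measure of how relevant a word is to a document within a collection of documents. A word scores higher the more often it appears in that document and lower the more documents it appears in. It is widely used in automated text analysis and gives machine learning algorithms a numeric score for each word in Natural Language Processing (NLP).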
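
To make the measure concrete, here is a minimal sketch of the textbook formula, where the weight of a term is its frequency within the document multiplied by log(N / number of documents containing the term). This is an assumption about the variant: libraries and platforms often apply smoothing or normalization, and the exact formula used in the project is not stated. The two documents are the sample inputs quoted later in this article.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


docs = [
    "16 bit microcontroller Printed circuits and integrated circuits and microassemblies",
    "Fruit tree seeds or cuttings Seeds and bulbs and seedlings and cuttings",
]
vectors = tfidf(docs)

# Show the three highest-weighted terms of the first document.
print(sorted(vectors[0].items(), key=lambda kv: kv[1], reverse=True)[:3])
```

The word "and" appears in both documents, so its weight is zero, while "circuits" appears twice in the first document only and gets the highest weight there.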

Figure 8: Selection of TFIDF algorithm for data preprocessing 

Cross Validation

The dataset was split into a Training Set and a Test Set for each of the three models that predict Class, Family, and Segment as the Target variables. The split used the usual 80:20 ratio.
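
An 80:20 split can be sketched as a seeded shuffle followed by a cut. The function name `holdout_split` and the seed are hypothetical choices; whether the platform shuffles or stratifies by class is not stated, so this shows only the simplest form.

```python
import random


def holdout_split(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and return (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(100))  # stand-ins for 100 dataset rows
train, test = holdout_split(records)
print(len(train), len(test))  # 80 20
```

Fixing the seed makes the split reproducible, so different models can be compared on the same test records.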

Figure 9: Cross Validation datasets: UNSPSC for Class, UNSPSC2 for Family, and UNSPSC3 for Segment

Modeling

The cross validation datasets generated in the previous step were used to build three classification models. 

The AutoPilot feature was used to create various ML models by using various general purpose, special purpose, and deep learning algorithms. It built and arranged the models in descending order of accuracy.

Model documentation is automated. Accuracy and ROC were the two criteria used to evaluate the classification models. Accuracy gives the share of test records that the model classified correctly. The ROC curve shows how well the model separates the classes across decision thresholds.
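
The two criteria measure different things, which a small binary example can show. The sketch below computes accuracy at a fixed 0.5 threshold and AUC as the share of positive-negative pairs that the scores rank correctly. The labels and scores are made up for illustration; a multiclass model would typically be scored one class against the rest, and how the platform aggregates the result is not stated here.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(acc, auc)  # 0.75 1.0
```

Here every positive outranks every negative, so the AUC is 1.0 even though one positive falls below the threshold and the accuracy is 0.75.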

Several other model evaluation criteria were also used, elaborated in Model Governance.

Classification by Class

Model ‘v.9-v.95a’ (UNSPSC), which uses the Multilayer Perceptron Neural Network algorithm, was selected to predict the UNSPSC Class because its accuracy was the highest at 91.25% and its Area Under ROC Curve (AUC) was 1.00.

Figure 10: Model documentation showing ROC Curve for Classification by Class

The Random Forest Classifier was selected as the challenger, with an accuracy of 88.1% and an AUC of 1.00.
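
Picking a champion and a challenger from a ranked list is a simple sort. The sketch below uses the two accuracy figures reported above for the Class models; the data structure is hypothetical, and the other models that AutoPilot built are not shown.

```python
# Accuracy figures reported above for the Class models.
results = [
    {"algorithm": "Random Forest Classifier", "accuracy": 88.1},
    {"algorithm": "Multilayer Perceptron Neural Network", "accuracy": 91.25},
]

# Arrange the models in descending order of accuracy.
ranked = sorted(results, key=lambda r: r["accuracy"], reverse=True)
champion, challenger = ranked[0], ranked[1]

print("Champion:", champion["algorithm"])
print("Challenger:", challenger["algorithm"])
```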

Classification by Family

Model ‘v.3-v.979’ (UNSPSC2), which uses the Logistic Regression Classifier algorithm, was selected to predict the UNSPSC Family because its accuracy was observed to be 100% and its Area Under ROC Curve (AUC) was 1.00.

Figure 11: Model documentation showing ROC Curve for classification by Family

The Random Forest classifier was selected as the challenger, with an accuracy of 100% and an AUC of 1.00.

Classification by Segment 

Model ‘v.3-v.998’ (UNSPSC3), which uses the Logistic Regression Classifier algorithm, was selected to predict the UNSPSC Segment because its accuracy was observed to be 100% and its AUC was 1.00.

Figure 12: Model documentation showing ROC Curve for classification by Segment

The Random Forest classifier was selected as the challenger for this third model, with an accuracy of 100% and an AUC of 1.00.

The selected models were published for review.

Figure 13: Model published for the reviewer to accept or reject

Model Governance

The published models appear under ‘My Models’ in Model Governance. The reviewer can now evaluate the models and accept or reject them.

Jaspreet Gill of Braintoy was the coach for the Applied Machine Learning program and the model reviewer for this project.

The review involved checking the data wrangling operations, feature preprocessing techniques, train-test datasets, and sampling, and evaluating model performance for accuracy, ROC (receiver operating characteristic), precision, recall, and the confusion matrix, as well as checking other efficiencies such as computation time and storage space.

Accuracy is the share of all predictions that the model got right. The ROC curve plots the true positive rate against the false positive rate; in simple terms, the larger the area under the curve, the better the model. Precision is the share of samples predicted as positive that are actually positive, so it measures how far a positive prediction can be trusted. Recall is the share of actual positive samples that the model detects; the higher the recall, the fewer positive samples are missed. The Confusion Matrix is another way to evaluate the performance of the model. It is an N×N matrix, where N is the number of classes, so a dataset with 2 classes gives a 2×2 confusion matrix. It is built on the test dataset, for which the true values are known.
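
These definitions can be made concrete with a small example. The sketch below builds a confusion matrix with true labels as rows and predicted labels as columns, then reads the precision and recall of one class from it. The labels are two Family codes that appear in this article, but the six predictions are made up for illustration and are not project results.

```python
def confusion_matrix(y_true, y_pred, labels):
    """N x N matrix of counts: rows are true labels, columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def precision_recall(matrix, k):
    """Precision and recall for the class at position k."""
    tp = matrix[k][k]
    predicted = sum(row[k] for row in matrix)  # column total
    actual = sum(matrix[k])                    # row total
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


labels = ["10150000", "32100000"]
# Hypothetical test set: four records of one family, two of the other.
y_true = ["32100000"] * 4 + ["10150000"] * 2
y_pred = ["32100000", "32100000", "32100000", "10150000", "10150000", "10150000"]

matrix = confusion_matrix(y_true, y_pred, labels)
print(matrix)                       # [[2, 0], [1, 3]]
print(precision_recall(matrix, 1))  # (1.0, 0.75)
```

For family 32100000 every positive prediction is correct, so precision is 1.0, but one of its four records was missed, so recall is 0.75.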

The key evaluation criteria for these classification models were Accuracy, ROC, and the Confusion Matrix.

The best performing classification models (by Class, Family and Segment) were approved for deployment.

  • Classification by Class – Model ‘v.9-v.95a’ (UNSPSC), which uses the Multilayer Perceptron Neural Network algorithm, was accepted. Its observed accuracy was 91.25%, better than the Random Forest model’s 88.1%. The area under the ROC curve was 1.00 and the precision was 91%. Hence this was the best model to deploy for making real-time predictions.
  • Classification by Family – Model ‘v.3-v.979’ (UNSPSC2), which uses the Logistic Regression Classifier algorithm, was accepted. Its accuracy and precision were both 100%, and the area under the ROC curve was 1.00.
  • Classification by Segment – Model ‘v.3-v.998’ (UNSPSC3), which uses the Logistic Regression Classifier algorithm, was accepted for predicting UNSPSC Segments. Its accuracy and precision were both 100%, and the area under the ROC curve was 1.00.

Model Deployment

Models can be deployed to production once approved. Deployment means that the models are put in an app that then runs as a microservice and gives real-time predictions. This is done in a few clicks.

Figure 14: Models deployed in an application named “Abhisek Basu”

Dashboard

A dashboard is automatically created for every app on deployment.

Figure 15: The app created for the “Abhisek Basu” application

This app can now take any text as an input and give an output in real-time. A modeler can interact with the model and monitor it.

Figure 16: App that classifies any text by UNSPSC Class in real-time

The first model correctly predicted the Class code “32101600” when the Input text was “16 bit microcontroller Printed circuits and integrated circuits and microassemblies”. 

Figure 17: App that classifies any text by UNSPSC Family in real-time

The second model correctly predicted the Family code “32100000” when the Input text was “16 bit microcontroller Printed circuits and integrated circuits and microassemblies”.

Figure 18: App that classifies any text by UNSPSC Segment in real-time.

The third model correctly predicted the Segment code “32000000” when the Input text was “16 bit microcontroller Printed circuits and integrated circuits and microassemblies”.

Data Scoring

Data Scoring can classify thousands of records in bulk. Data can be scored in the ML Engine, Model Review, or the Dashboard.

To test Data Scoring, three input records were used to predict the UNSPSC Family. The .csv file containing them was uploaded in the Data Engine and then loaded for scoring.
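
In outline, bulk scoring reads each row of the file, passes its text to the model, and stores the prediction next to it. The sketch below shows that loop with the standard library. The column name `NLPDataX` for the scoring file, the function `score_csv`, and the keyword rule in `stub_predict` are all assumptions; the stub only stands in for a deployed model and is not the trained classifier.

```python
import csv
import io


def score_csv(csv_text, predict, text_column="NLPDataX"):
    """Return the input rows with a `Prediction` column added."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["Prediction"] = predict(row[text_column])
    return rows


def stub_predict(text):
    # Placeholder for a deployed model: a keyword rule, NOT the trained classifier.
    return "10150000" if "seeds" in text.lower() else "32100000"


SAMPLE = """NLPDataX
Fruit tree seeds or cuttings Seeds and bulbs and seedlings and cuttings
16 bit microcontroller Printed circuits and integrated circuits and microassemblies
"""

scored = score_csv(SAMPLE, stub_predict)
for row in scored:
    print(row["Prediction"], "<-", row["NLPDataX"])
```

Swapping `stub_predict` for a call to a real model endpoint keeps the rest of the loop unchanged.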

Figure 19: CSV file showing three sample records used for scoring 

Figure 20: Data scoring results for sample records

The app correctly predicted the Family codes for the given Input data. As an example, for the first record in the input data, “Fruit tree seeds or cuttings Seeds and bulbs and seedlings and cuttings”, the model predicted the Family code as “10150000”. This was checked against the UNSPSC website and confirmed to be correct.

Figure 21: Validation obtained from the UNSPSC Site

Summary

Artificial Intelligence can catalyze the digital transformation that is so much needed today in the face of the pandemic.

Supply chain professionals can use machine learning techniques to tag “Free Text” from POs and invoices to the right categories for better spend penetration. This reduces cost, operational overhead, and risk.

Takeaways

  1. Supply chains can get quick value from machine learning
  2. Past data can be used to make reliable predictions in the future
  3. Data cleansing is important for good modeling

Author

Abhisek Basu is a supply chain professional with 18 years of experience in Procurement, Transportation and Demand Planning. He has worked with companies like Bank of America, Accenture and Tata Consultancy Services.