Reliable Shipping Time Prediction

Logistics is the backbone of our global economy. Knowing the time that it takes to transport an item from one place to another maximizes operational efficiency. It gives supply chains the ability to manage inventory levels, reduce cost, and provide the flexibility to meet demand fluctuations.

E-commerce enterprises, local stores, trucking, logistics and warehousing businesses and many others benefit by reliably knowing when a delivery will happen. Customers also demand accurate times to plan their schedules. Reliably predicting shipping time is therefore a key performance indicator for a successful logistics operation.

Businesses such as Google Maps, Uber and Amazon use Artificial Intelligence to predict times instantly. Their machine learning models learn from past data such as mode of travel, traffic, weather, route information, and actual delivery time to predict arrival times down to the minute. These businesses have dramatically increased their market share by improving customer convenience and operational effectiveness at the same time.

AI is useful. But is it limited to the giants?

To dispel this myth, I took a publicly available dataset from Kaggle and used machine learning to predict shipping times for international shipments.

During this experience, I learnt the art of modeling, and not just the science of it. The recipes to predict shipping time can be used by any business, from shipping and transportation enterprises to local businesses such as restaurants that deliver take-out orders.


The Kaggle data included a file named train_2_pr.csv, which I used for my experiments. It had 5,114 records containing the following features:

  1. shipment_ID – A unique shipment ID given to each order.
  2. send_timestamp – The date/time when the order was sent.
  3. pick_up_point – The pick-up point of the order.
  4. drop_off_point – The drop-off point of the order.
  5. source_country – Country from where the goods need to be shipped.
  6. destination_country – Country where the goods need to be shipped to.
  7. freight_cost – Cost of transportation per Kg.
  8. gross_weight – Gross weight in Kg.
  9. shipment_charges – The fixed cost per shipment.
  10. shipment_mode – Whether shipped by Air or Ocean.
  11. shipping_company – The candidate company to make the shipping.
  12. selected – Whether the candidate company was selected or not.
  13. shipping_time – The time taken for the goods to reach the destination.

The target feature was shipping_time.

Checking Target Feature

Figure: shipping_time was observed between 5-57 days with a Standard Deviation of 10 days
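A quick check of the target's range and spread can be sketched as follows. This is a minimal sketch using only the Python standard library: the values in `shipping_times` are made-up placeholders rather than rows from train_2_pr.csv, and `summarize_target` is a hypothetical helper name.

```python
import statistics

def summarize_target(values):
    """Return the minimum, maximum, mean and sample standard deviation."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
    }

# Placeholder shipping times in days (illustrative only, not the Kaggle data).
shipping_times = [5.0, 5.5, 5.2, 30.0, 42.0, 57.0]

summary = summarize_target(shipping_times)
print(summary)
```

Running the same summary on the real shipping_time column is what would produce the range and standard deviation quoted in the figure.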

Note that there may also be relevant data in other files. It will be considered in subsequent experiments, as I wanted to start by building something small and then improve on it.

I moved to the wrangling and pre-processing steps.

Garbage In, Garbage Out

Data wrangling and pre-processing are important steps in preparing any data for efficient modeling. Models can’t give the intended results if the data used to train them isn’t prepared well.

It was verified that there were no missing values in the target feature.

The data was then organized by applying the following data wrangling and preprocessing steps:

shipment_ID – Removed because unique IDs are of no use in ML: they only refer to the other features of a record and are hence redundant.
send_timestamp – Removed, as raw timestamps are not useful to ML unless converted into something more meaningful, for example by treating the data as a time series or by deriving categories such as year, month, day or time of day. This can be done in subsequent experiments.
pick_up_point – Removed as it was the same for all orders and therefore made no difference to my modeling experiments.
drop_off_point – Converted from categorical to numeric format (0 or 1), as two categories recorded as text values were observed.
source_country – Removed as it was the same as pick_up_point.
destination_country – Removed as the drop_off_point already referred to it.
shipment_mode – Converted from categorical to numeric format (0 or 1), as two categories (air/ocean) recorded as text values were observed.
shipping_company – Converted to numeric format (0, 1 and 2), as three categories of shipping companies were observed.
selected – Removed because every candidate company was marked as selected, making it redundant to this experiment.
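The drop-and-encode steps in this list can be sketched in plain Python as below. This is only an illustration under stated assumptions: the two records are invented placeholders, the helper names `build_codes` and `preprocess` are hypothetical, and the codes are assigned in sorted order, which may differ from the assignment used in the original experiment.

```python
# Columns dropped in the steps listed above.
DROPPED = {"shipment_ID", "send_timestamp", "pick_up_point",
           "source_country", "destination_country", "selected"}

# Text columns converted to integer codes.
CATEGORICAL = ("drop_off_point", "shipment_mode", "shipping_company")

def build_codes(rows, column):
    """Map each distinct text value in a column to 0, 1, 2, ... in sorted order."""
    distinct = sorted({row[column] for row in rows})
    return {value: code for code, value in enumerate(distinct)}

def preprocess(rows):
    """Drop unused columns and integer-encode the categorical ones."""
    codes = {col: build_codes(rows, col) for col in CATEGORICAL}
    cleaned = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in DROPPED}
        for col in CATEGORICAL:
            kept[col] = codes[col][row[col]]
        cleaned.append(kept)
    return cleaned, codes

# Two placeholder records (all values invented for illustration).
rows = [
    {"shipment_ID": "S1", "send_timestamp": "2019-06-08 07:17", "pick_up_point": "A",
     "drop_off_point": "Y", "source_country": "GB", "destination_country": "IN",
     "freight_cost": 88.6, "gross_weight": 355.0, "shipment_charges": 0.75,
     "shipment_mode": "Air", "shipping_company": "SC3", "selected": "Y",
     "shipping_time": 5.0},
    {"shipment_ID": "S2", "send_timestamp": "2019-07-12 15:23", "pick_up_point": "A",
     "drop_off_point": "X", "source_country": "GB", "destination_country": "BD",
     "freight_cost": 85.5, "gross_weight": 105.0, "shipment_charges": 0.9,
     "shipment_mode": "Ocean", "shipping_company": "SC1", "selected": "Y",
     "shipping_time": 21.0},
]

cleaned, codes = preprocess(rows)
print(sorted(cleaned[0]))  # the 7 remaining column names
```

Integer codes follow the approach described above. For a nominal category such as shipping_company, one-hot encoding is a common alternative, because codes like 0, 1 and 2 imply an ordering that some models will try to use.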

The data frame was reduced to just 7 features: drop_off_point, freight_cost, gross_weight, shipment_charges, shipment_mode, shipping_company, and shipping_time. The Feature Importance showed that the most influential features for predicting shipping_time were shipment_mode, freight_cost, and gross_weight.

Showing Feature Importance

Figure: Feature Importance 

It was observed that the shipment_mode was the most influential feature, and that the mode and weight of the shipment mattered more than other variables such as the drop_off_point or shipping_company. Quick insights like these could be valuable for a business. As an example, because the shipping_company isn’t a significant factor in predicting shipping_time, whichever shipping_company is available can be chosen to make a delivery, at least as far as shipping time is concerned!
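The article does not say how the Feature Importance was computed, so the following is an assumption rather than the method actually used: permutation importance is one common, model-agnostic way to get such a ranking. It shuffles one feature at a time and measures how much the error grows. The toy rows, the `toy_predict` rule and the function names are all hypothetical.

```python
import random

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def permutation_importance(predict, rows, target, feature, seed=0):
    """Increase in MAE after shuffling one feature's values across rows."""
    actual = [row[target] for row in rows]
    baseline = mae(actual, [predict(row) for row in rows])
    shuffled = [row[feature] for row in rows]
    random.Random(seed).shuffle(shuffled)
    # Copy each row with the shuffled value substituted for this feature.
    permuted = [dict(row, **{feature: value}) for row, value in zip(rows, shuffled)]
    return mae(actual, [predict(row) for row in permuted]) - baseline

def toy_predict(row):
    """Stand-in model: 5 days for air (0), 40 days for ocean (1)."""
    return 40.0 if row["shipment_mode"] == 1 else 5.0

# Toy rows (invented): shipping time depends only on shipment_mode.
rows = [
    {"shipment_mode": mode, "shipping_company": company, "shipping_time": time}
    for mode, company, time in [
        (0, 0, 5.0), (0, 1, 5.0), (0, 2, 5.0), (0, 0, 5.0),
        (1, 1, 40.0), (1, 2, 40.0), (1, 0, 40.0), (1, 1, 40.0),
    ]
]

for feature in ("shipment_mode", "shipping_company"):
    score = permutation_importance(toy_predict, rows, "shipping_time", feature)
    print(feature, score)
```

Because the stand-in model ignores shipping_company, shuffling that column cannot change its predictions and the importance is exactly zero, which mirrors the kind of conclusion drawn above.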

After preparing the data, it was split in an 80:20 ratio into a training set and a test set, so that the models could be validated on data they had not seen during training.
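An 80:20 split can be sketched as below. This is a dependency-free illustration, not the mlOS implementation: `split_train_test` is a hypothetical helper, the seed is arbitrary, and the integers merely stand in for the 5,114 records.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(5114))  # stand-ins for the 5,114 records
train, test = split_train_test(rows)
print(len(train), len(test))  # 4091 1023
```

Shuffling before splitting matters when the file is ordered, for example by date or by shipment mode. Otherwise the test set may not resemble the training set.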


Supervised learning in mlOS was used to build a variety of regression models, keeping shipping_time as the target variable.

Regression Models using Supervised ML

Figure: Several versions of models using various regression algorithms were created

A Mean Absolute Error (MAE) of 3.93 days was observed.

Neural Network, Random Forest and Linear Regression achieved similar accuracy scores. Considering that the shipping_time ranged from 5 to 57 days with a standard deviation of 10 days, an error of 3.93 days seemed good.
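MAE is the average of the absolute differences between actual and predicted values. One way to judge whether a given MAE is good is to compare it with a naive baseline that always predicts the mean. The sketch below uses invented numbers, not results from the experiment, and `mean_absolute_error` is defined here rather than imported from a library.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Placeholder shipping times in days (illustrative only).
actual = [5.0, 5.5, 30.0, 50.0]
predicted = [6.0, 5.0, 34.0, 44.0]

model_mae = mean_absolute_error(actual, predicted)  # (1 + 0.5 + 4 + 6) / 4

# Naive baseline: always predict the mean of the actual values.
mean_value = sum(actual) / len(actual)
baseline_mae = mean_absolute_error(actual, [mean_value] * len(actual))

print(model_mae, baseline_mae)  # 2.875 17.375
```

A model is only adding value to the extent that its MAE is well below that of the baseline.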

Original vs. Predicted

Figure: Predicted vs. Original validation when using RandomForest

But an accuracy score alone isn’t a reliable measure of model performance. While the predictions did follow a trend, the MAE seen by itself is deceptive.

Predicted vs. Target using Linear Regression for overall model

Figure: Using Linear Regression

Predicted vs. Target using Random Forest for overall model

Figure: Random Forest shows a slightly better performance than Linear Regression.

Looking at these graphs, it was clear that the predictions carried uncertainty. While the average error was only 3.93 days, it wasn’t the same for all predicted values. Think of it as darts always landing on the board but rarely on the bullseye.

But what was wrong? 

Looking deeper into the model precision revealed the discrepancy.

The cause was the most influential variable: shipment_mode, air vs. ocean. Because shipping_time varied widely between these two modes, specific models for air and ocean could perhaps give a more reliable prediction for each.

Think of it as a student who scored a high overall GPA. But that doesn’t mean that all lessons were learnt equally well!


The MAE of 3.93 days covered both categories of shipments, air and ocean. Looking only at air shipments, whose shipping time ranged between 5 and 5.5 days, the MAE was only 0.1 days.
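Breaking the error down by shipment mode is a small calculation, sketched below. The records are invented to show the effect and are not taken from the experiment; `mae_by_group` is a hypothetical helper.

```python
from collections import defaultdict

def mae_by_group(records, group_key):
    """Return {group: MAE} for records with 'actual' and 'predicted' fields."""
    errors = defaultdict(list)
    for record in records:
        errors[record[group_key]].append(abs(record["actual"] - record["predicted"]))
    return {group: sum(errs) / len(errs) for group, errs in errors.items()}

# Placeholder predictions in days (illustrative only).
records = [
    {"shipment_mode": "Air", "actual": 5.0, "predicted": 5.25},
    {"shipment_mode": "Air", "actual": 5.5, "predicted": 5.25},
    {"shipment_mode": "Ocean", "actual": 30.0, "predicted": 38.0},
    {"shipment_mode": "Ocean", "actual": 50.0, "predicted": 42.0},
]

overall = sum(abs(r["actual"] - r["predicted"]) for r in records) / len(records)
per_mode = mae_by_group(records, "shipment_mode")

print(overall)   # 4.125
print(per_mode)  # {'Air': 0.25, 'Ocean': 8.0}
```

In this toy example the overall figure sits between the two groups and describes neither of them well, which is the same pattern the single overall MAE was hiding.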

Models that used data for air only

Figure: Model accuracies using data for air freight only

The model for air shipping was observed to be more predictable than the one for ocean freight.

Predicted vs. Target using SVM regressor for air shipping

Predicted vs. Target using Random Forest regressor for air shipping

The difference in model performance was clearly visible. You can see that the predicted vs. target results for air shipping were now tightly clustered. The darts were landing on the bullseye!

It was wise to model the two categories of shipment separately to improve the reliability of predictions for shipping_time. Compared with an MAE of only 0.1 days for air shipments, the MAE for ocean shipments was 8.03 days.
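Modeling each shipment mode separately amounts to splitting the training data by mode, fitting one model per group, and routing each new shipment to the matching model. The sketch below shows only that structure. `MeanModel` is a deliberately trivial placeholder, not one of the regressors used in the experiment, and all names and values are hypothetical.

```python
from statistics import mean

class MeanModel:
    """Placeholder regressor that predicts the mean of its training targets."""

    def fit(self, targets):
        self.value = mean(targets)
        return self

    def predict(self, row):
        return self.value

def fit_per_mode(rows, mode_key="shipment_mode", target="shipping_time"):
    """Fit one placeholder model per shipment mode."""
    targets_by_mode = {}
    for row in rows:
        targets_by_mode.setdefault(row[mode_key], []).append(row[target])
    return {mode: MeanModel().fit(targets) for mode, targets in targets_by_mode.items()}

def predict(models, row, mode_key="shipment_mode"):
    """Route a row to the model trained for its shipment mode."""
    return models[row[mode_key]].predict(row)

# Placeholder training rows (illustrative only).
rows = [
    {"shipment_mode": "Air", "shipping_time": 5.0},
    {"shipment_mode": "Air", "shipping_time": 5.5},
    {"shipment_mode": "Ocean", "shipping_time": 30.0},
    {"shipment_mode": "Ocean", "shipping_time": 50.0},
]

models = fit_per_mode(rows)
print(predict(models, {"shipment_mode": "Air"}))    # 5.25
print(predict(models, {"shipment_mode": "Ocean"}))  # 40.0
```

In practice each `MeanModel` would be replaced by a real regressor trained on that mode's rows, while the routing step stays the same.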

Ocean shipping model

Figure: Model accuracies using data for ocean freight only

There is scope for improvement! More experiments are required to model ocean shipment.


Machine Learning can help plan logistics operations by reliably estimating shipping time. Logistics managers can make better judgments when assisted by models that were trained on past data to predict the future.

In my first experiment, a model for air shipping was proven reliable. I will attempt a model for ocean shipping in the next experiment. The goal would then be to ensemble these decisions to arrive at the best logistical model.

My experiments could be a tool to assist shipment and inventory planning, but they should not be used without real data and without logistics managers applying their own insight to validate the results.

I learnt from this experience that AI is a set of models that mimic human decisions. Just as humans can refine their thinking through thought and experience, the accuracy and precision can be further improved in subsequent revisions by improving the training data.

The possibilities for business are endless.


  1. Accuracy score isn’t the only measure of a good model.
  2. A single model may not be efficient enough to handle all scenarios.
  3. Breaking a complex problem into simpler steps helps solve it.


Gurnoor Singh Bhangu

Gurnoor Singh Bhangu was an officer in the Indian Army and is now an MBA student at Schulich School of Business. He has 5 years of experience in military operations and logistics, which he drew on for this research.

He believes that AI/ML can make complex processes simpler and faster. It enhances flexibility and rapid response capability to sudden changes in an increasingly volatile world and can level the playing field for big and small businesses.