The novel coronavirus is wreaking havoc all over the world. News reports say Italy has been at the center of the outbreak, with more than 47,000 infections and 4,000 deaths as of yesterday. Health authorities there have been combing through the data for clues. One source (Bloomberg) reported that more than 99% of fatalities were people who suffered from previous medical conditions.
An analysis of the medical records of about 18% of coronavirus fatalities found that just 3 victims, or 0.8% of the total, had no previous pathology. Almost half of the victims suffered from at least three prior illnesses, and about a fourth had either one or two previous conditions. More than 75% had high blood pressure, about 35% had diabetes, and a third suffered from heart disease. The median age of the infected is 63, but those who died had an average age of 79.5. All of Italy’s victims under 40 have been males with serious existing medical conditions.
This data tells me that people of different ages and health conditions face very different levels of risk from this deadly virus.
The question I asked myself: considering my age and health, what is my personal strength to fight this virus?
This blog is about how I used research from news feeds to create a Machine Learning model that assesses and predicts my personal strength to fight the deadly virus.
I built this application, from data to dashboard, on mlOS.
Step 1 – Data Engineering
This starts with loading data, defining a dataset, and then generating a cross-validation dataset.
A master data set, COVID_19_synthetic_data.csv, was uploaded. Its columns are:
- Row #: a serial number
- Age [1 – 99]: age of the patient
- Hypertension [0 or 1]: 1 if the individual suffered from hypertension, 0 otherwise
- Heart_Disease [0 or 1]: 1 if the individual suffered from heart disease, 0 otherwise
- Diabetes [0 or 1]: 1 if the individual suffered from diabetes, 0 otherwise
- Lung_Disease [0 or 1]: 1 if the individual suffered from lung disease, 0 otherwise
- Immune_Strength [0 – 5]: a score indicating whether the individual was hospitalized for fever and breathing issues
- Risk_Score [1 – 10]: the risk percentile of an individual; Strength = 10 - Risk_Score
Fig 1: A view of the tabular data
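To make the schema concrete, here is a small sketch that fabricates a file matching the columns above. The column names and value ranges come from the post; the generation logic is my own toy assumption, not the recipe behind the actual COVID_19_synthetic_data.csv.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

# Columns mirror the schema described above.
df = pd.DataFrame({
    "Age": rng.integers(1, 100, n),
    "Hypertension": rng.integers(0, 2, n),
    "Heart_Disease": rng.integers(0, 2, n),
    "Diabetes": rng.integers(0, 2, n),
    "Lung_Disease": rng.integers(0, 2, n),
    "Immune_Strength": rng.integers(0, 6, n),
})

# A toy Risk_Score in [1, 10]: older age and more prior conditions
# raise risk; higher immune strength lowers it.
conditions = df[["Hypertension", "Heart_Disease",
                 "Diabetes", "Lung_Disease"]].sum(axis=1)
raw = df["Age"] / 99 + conditions / 4 - df["Immune_Strength"] / 5
df["Risk_Score"] = (1 + 9 * (raw - raw.min())
                    / (raw.max() - raw.min())).round().astype(int)

df.to_csv("COVID_19_synthetic_data.csv", index=False)
```

This is only meant to show the shape of the data that the rest of the walkthrough operates on.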
The next step was to define a dataset. Here, the ‘Target (Output)’ was set to “Risk_Score”, and the remaining columns from the data file were selected as ‘Features (Input)’.
Fig. 2: Selecting input and output features for model building
Feature pre-processing was the next step. This allows applying transformations such as 0-to-1 normalization, categorical-to-numeric encoding, standard scaling, and min-max normalization to prepare the data for modeling. No pre-processing steps were needed here, since the data is numeric with no missing values.
Fig. 3: The ‘Feature Pre-processing’ step of ‘Define Dataset’
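As an illustration of one of the transformations listed above, here is min-max normalization applied to an Age column using scikit-learn. The mlOS internals are not public, so this is only a sketch of the underlying idea, not what the platform actually runs.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# A toy Age column (2-D, as scikit-learn expects).
ages = np.array([[21.0], [45.0], [63.0], [99.0]])

# Min-max normalization rescales each feature to the [0, 1] range:
# scaled = (x - min) / (max - min)
scaler = MinMaxScaler()
scaled = scaler.fit_transform(ages)
```

Standard scaling (zero mean, unit variance) would use `StandardScaler` in the same way.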
Review and Save: the dataset was named ‘ds_mar_19’ and saved.
Fig. 4: Naming the dataset by clicking on Define Dataset button
Now the dataset was split into two parts: training and validation. The conventional 80/20 split was used, with the system randomly selecting 80% of the data for training and the remaining 20% for validation. The training set is used to train the machine learning model; the system then uses the validation set to test the model and calculate and display its performance metrics.
Fig. 5: Generating the Cross-Validation files for the dataset
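The 80/20 split described above is the same operation scikit-learn exposes as `train_test_split`. A minimal sketch with stand-in data (the real split happens inside mlOS):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)   # stand-in feature matrix
y = np.arange(100)                  # stand-in target

# test_size=0.2 reserves a random 20% of rows for validation;
# random_state makes the split reproducible.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

Fixing the random seed is worth doing whenever you want reported metrics to be reproducible.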
Step 2 – Machine Learning
This step uses the cross-validation datasets created in the previous step to build machine learning models.
Since a numerical value of Risk_Score (the target variable) is to be predicted, this is a Regression problem, and hence the ‘Regression’ tab is chosen.
I clicked ‘Add Base Model’ to select the desired dataset.
Fig. 6: Selecting the dataset for machine learning
Clicking ‘Select Dataset’ pops up the ‘Select Regressor’ window.
I chose the Neural Network Regressor to start this modeling experiment. The system had already suggested suitable parameters for my dataset. I clicked ‘Select Regressor’ to confirm the choice.
Fig. 7: Choosing an appropriate algorithm to build an ML model
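For readers who want to see what a neural-network regressor looks like outside the platform, here is a rough scikit-learn equivalent. The data is synthetic and the hyperparameters are my own placeholders, not the ones mlOS suggested.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (200, 5))
# A simple linear toy target standing in for Risk_Score.
y = X @ np.array([2.0, 1.0, 0.5, 0.25, 0.1])

# One hidden layer of 32 units; max_iter raised so training converges.
model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                     random_state=0)
model.fit(X, y)

mae = mean_absolute_error(y, model.predict(X))
```

Mean Absolute Error, used throughout this post, is simply the average of |predicted - actual| over the evaluation set.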
Clicking ‘Create New Model Version’ trained a model using the selected algorithm.
Fig. 8: Machine Learning Model building progress.
The system assigns a generic model name and version. Selecting the model shows its rank, error, auto-generated documentation, and a ready-to-publish button. The right-hand window shows the performance metrics: the model achieved a Mean Absolute Error of 0.24, a mediocre result.
mlOS ships with many algorithms. Rather than try them one by one, I decided to use ‘Autopilot’, which builds models using various algorithms and ranks them by their performance.
Fig 9: ‘Autopilot’ creates models using various algorithms and ranks them by their performance scores.
The Decision Tree and Random Forest Regressors came out on top (ranked #1), each with a Mean Absolute Error of 0.01. Compared with the Neural Network, the Decision Tree and Random Forest Regressors were therefore better suited to my dataset.
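The Autopilot idea, fitting several candidate algorithms and ranking them by validation error, can be sketched in a few lines of scikit-learn. The candidate list and toy data below are my own assumptions for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (500, 5))
# A nonlinear toy target; tree-based models tend to handle the
# step component better than a linear model.
y = (X[:, 0] > 0.5).astype(float) * 3 + X[:, 1] * 2

X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2,
                                          random_state=0)

candidates = {
    "DecisionTree": DecisionTreeRegressor(random_state=0),
    "RandomForest": RandomForestRegressor(n_estimators=100,
                                          random_state=0),
    "Linear": LinearRegression(),
}

# Fit each candidate and score it on the held-out validation set.
scores = {}
for name, model in candidates.items():
    model.fit(X_tr, y_tr)
    scores[name] = mean_absolute_error(y_va, model.predict(X_va))

ranked = sorted(scores, key=scores.get)  # best (lowest MAE) first
```

Scoring on the held-out set, rather than the training set, is what makes the ranking a fair comparison; a near-zero training error alone can simply mean the model memorized the data.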
The best model was now ready to be chosen and published. The model that used the RandomForestRegressor was chosen. Hitting the publish icon brings up a pop-up window for additional comments and confirmation. The pop-up also provides the option to publish the model for review.
Fig 10: The selected model is being published to a reviewer
Step 3 – Model Governance
Good governance practice means that AI models must be validated before production use. To avoid unintended consequences, the person who created the model publishes it to a peer or a third party for review.
mlOS sends an email to the selected reviewer. The reviewer receives interactive documentation to validate that the model indeed solves the problem for which it was created and achieves the desired performance and results.
This is a topic of its own and will be covered in a separate blog.
Fig. 11: The reviewer can Accept or Reject the ML model
Once the reviewer accepts or rejects a model, the system sends an email back to the modeler confirming the reviewer’s decision.
Step 4 – Deployment
It is now time to deploy the accepted model to production use.
Since a regression model was created, the user selects the Regression tab under ‘Deploy Models’. The accepted model(s) will be available under ‘Select Model & Deploy’.
To deploy the model, the user selects it and then clicks ‘Use this model’, ‘Generate Code’, and ‘Deploy’ in succession. These buttons trigger a confirmation pop-up window.
Fig 12: Deploying the model to production
Clicking the Deploy button kicks off a containerization process that bundles the necessary code and exposes a real-time API that can be called from any application in the world.
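Calling such an API from a client typically means POSTing a JSON record with the model's input features. The endpoint URL and payload shape below are placeholders, since mlOS does not document its API format in this post; only the feature names come from the dataset above.

```python
import json
from urllib import request

# Feature names follow the dataset schema; the values are an example
# patient profile.
payload = {
    "Age": 45,
    "Hypertension": 1,
    "Heart_Disease": 0,
    "Diabetes": 0,
    "Lung_Disease": 0,
    "Immune_Strength": 3,
}
body = json.dumps(payload).encode("utf-8")

# Hypothetical endpoint -- replace with the URL mlOS generates
# for your deployed model.
req = request.Request(
    "https://example.com/api/v1/predict",
    data=body,
    headers={"Content-Type": "application/json"},
)
# resp = request.urlopen(req)                 # call the live deployment
# risk = json.loads(resp.read())["Risk_Score"]  # assumed response field
```

Any language that can make an HTTP request can consume the model the same way, which is what makes the dashboard in the next step possible.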
My model is now live.
It was time to interact with it.
Step 5 – Dashboard
A real-time dashboard is automatically generated for every deployed model.
Fig. 13 Dashboard window from where the user can interact with the deployed model
Fig 14: Automatically generated dashboard of the deployed model
This completes the model building and deployment.
The following UI calls the API of the deployed model to show the results.
Fig 15: Application User Interface
I had asked myself a question before starting: considering my age and health, what is my personal strength to fight this virus?
And I got my answer!