Selecting Features by Importance

Importance of Feature Importance

Feature Importance is a technique in machine learning that assigns a score to the input features based on how useful they are at predicting a target output.

But how do I know which features to keep and which to drop?

Fortunately, techniques exist that can help a modeler with this dilemma. To understand them, we use the Iris data and the automation within Braintoy mlOS to check and see for ourselves. Often referred to as the “Hello World” for machine learning enthusiasts, the Iris dataset is balanced, with 50 instances each of the Setosa, Virginica, and Versicolor flower types.

It is a perfect choice for this machine learning experiment as we know what to expect.

It is wise to know the Feature Importance from the raw data upfront while a dataset is being created. mlOS uses supervised learning techniques to find the importance score for each feature, automatically detecting if the target variable indicates a classification problem or a regression problem.
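The idea above can be sketched with scikit-learn. This is only an illustration, not how mlOS works internally: the `detect_problem_type` heuristic, the `max_classes` threshold, and the choice of a random forest as the scoring model are all assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor


def detect_problem_type(y, max_classes=20):
    """Hypothetical heuristic: non-numeric targets, or integer-valued targets
    with few distinct values, are treated as classification."""
    y = np.asarray(y)
    if y.dtype.kind in "OUSb":  # objects, strings, bytes, booleans
        return "classification"
    n_unique = np.unique(y).size
    is_integer_valued = bool(np.all(np.mod(y, 1) == 0))
    if is_integer_valued and n_unique <= max_classes:
        return "classification"
    return "regression"


def importance_scores(X, y, feature_names, random_state=0):
    """Fit a random forest matching the detected problem type and return
    its impurity-based importances, which sum to 1."""
    kind = detect_problem_type(y)
    if kind == "classification":
        model = RandomForestClassifier(n_estimators=200, random_state=random_state)
    else:
        model = RandomForestRegressor(n_estimators=200, random_state=random_state)
    model.fit(X, y)
    return kind, dict(zip(feature_names, model.feature_importances_))


iris = load_iris()
names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
kind, scores = importance_scores(iris.data, iris.target, names)

print(f"Detected problem type: {kind}")
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>13}: {score:.3f}")
```

The exact scores depend on the model and the random seed, so they will not match the mlOS numbers digit for digit.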

Users can analyze correlations among the features for any target variable they want to predict. This makes it easy for a domain expert to design a use case in a few clicks. Data engineers can analyze features one by one or together by selecting various input features against different target features. Think of the feature set in your data as a set of hyperparameters being autotuned to get the best performance out of the dataset.
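Outside mlOS, a quick way to look at correlations among the Iris features is a Pearson correlation matrix. The sketch below uses NumPy and scikit-learn's bundled copy of the dataset; the underscore-style column names are assumed to match the ones used in this article.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Pearson correlation between every pair of feature columns.
corr = np.corrcoef(iris.data, rowvar=False)

# Print the matrix with row and column labels.
print(" " * 13 + "".join(f"{n:>14}" for n in names))
for name, row in zip(names, corr):
    print(f"{name:>13}" + "".join(f"{v:>14.2f}" for v in row))
```

Strongly correlated inputs carry overlapping information, which is one reason a subset of features can be enough.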

The results are as expected. Not all the features are equally important for predicting the species, i.e. the target variable. The importance scores sum to 1; petal_length and petal_width together account for 0.92 and are therefore good discriminators. The other two features, sepal_length and sepal_width, together account for the remaining 0.08.

But now, let’s go ahead and build a few classification models with different machine learning algorithms to see whether the Forward Feature Importance is similar to the Feature Importance that the models themselves report.

In the ML Engine, hitting AutoPilot on the dataset creates several models and reports their performance scores.

While there are a dozen that we can compare, let’s just select the Random Forest Classifier and the Support Vector Machine and check the Feature Importance.

[Figure: ROC Curve and Feature vs. Importance Score for the Random Forest and Support Vector Machine models]

The Random Forest and Support Vector Machine algorithms both produced models with a good ROC curve, that is, a high true positive rate at a low false positive rate. It is not surprising that both models also found petal_length and petal_width to be the most important features, together accounting for ~90% of the importance. Both show sepal_length and sepal_width accounting for the remaining ~10%.
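A comparison like this can be approximated with scikit-learn. A Support Vector Machine with a non-linear kernel has no built-in importance attribute, so this sketch assumes permutation importance as a common, model-agnostic score for both models; mlOS may compute it differently. The train/test split, the RBF kernel, and the normalization to a total of 1 are choices made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = load_iris()
names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, stratify=iris.target, random_state=0
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}


def normalized_permutation_importance(model):
    """Fit the model, then measure the mean drop in test accuracy when each
    feature is shuffled. Scores are rescaled to sum to 1 for comparison."""
    model.fit(X_train, y_train)
    result = permutation_importance(
        model, X_test, y_test, n_repeats=30, random_state=0
    )
    raw = result.importances_mean.clip(min=0)  # ignore negative noise
    total = raw.sum()
    return raw / total if total > 0 else raw


shares = {name: normalized_permutation_importance(m) for name, m in models.items()}
for model_name, share in shares.items():
    print(model_name)
    for feature, value in zip(names, share):
        print(f"  {feature:>13}: {value:.2f}")
```

Because petal_length and petal_width are strongly correlated, permutation importance can split the credit between them differently from run to run; the combined petal share is the more stable number to compare.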

The Feature Importance shown by these algorithms is similar to what we knew before we started modeling.

In conclusion, processing high-dimensional data is a challenge. ML algorithms lean most heavily on the features that carry the clearest signal for the target. Dimensionality reduction by feature selection can make model building more efficient and improve model performance.
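One way to check this claim on the Iris data is to compare cross-validated accuracy with all four features against the two petal features alone. This is a minimal sketch; the Random Forest settings and the 5-fold cross-validation are assumptions for this example, not the mlOS configuration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

iris = load_iris()
X_all = iris.data            # sepal_length, sepal_width, petal_length, petal_width
X_petal = iris.data[:, 2:]   # petal_length, petal_width only


def mean_cv_accuracy(X, y):
    """Mean accuracy over 5 stratified cross-validation folds."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(model, X, y, cv=5).mean()


acc_all = mean_cv_accuracy(X_all, iris.target)
acc_petal = mean_cv_accuracy(X_petal, iris.target)

print(f"All four features: {acc_all:.3f}")
print(f"Petal features only: {acc_petal:.3f}")
```

If the two accuracies are close, the sepal measurements add little, and a two-feature model is cheaper to collect data for, train, and explain.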

Three things to remember

  1. With feature selection, models are less complex and more interpretable, and the chance of overfitting is reduced.
  2. Don’t forget the cost of training and computation.
  3. Collecting and moving data is the expensive part. Why do it for a feature if the other features are adequate for an equally reliable model?