Faster Hyperparameter Tuning Using Nature-Inspired Algorithms in Python

If you are using Scikit-Learn in your machine-learning project, you can use a Python library called Sklearn Nature-Inspired Algorithms, which lets you apply nature-inspired algorithms to hyperparameter tuning. We will be using this library throughout this tutorial; install it via pip.

pip install sklearn-nature-inspired-algorithms

Let’s assume that we would like to optimize the parameters of our Random Forest Classifier.

from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(random_state=42)

Now we need to define the set of parameters to try; the usage is similar to scikit-learn's GridSearchCV. This parameter grid yields 216 different combinations (4 × 6 × 3 × 3).

param_grid = {
    'n_estimators': range(10, 80, 20),
    'max_depth': [2, 4, 6, 8, 10, 20],
    'min_samples_split': range(2, 8, 2),
    'max_features': ["auto", "sqrt", "log2"],
}
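
If you want to double-check the size of the grid, scikit-learn's ParameterGrid can enumerate it (a quick sanity-check sketch, not part of the original walkthrough):

from sklearn.model_selection import ParameterGrid

# 4 n_estimators values * 6 max_depth values * 3 min_samples_split values * 3 max_features values
print(len(ParameterGrid(param_grid)))  # 216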

Also, we need a dataset. We will artificially create one using make_classification.

from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=1000, n_features=10, class_sep=0.8, n_classes=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f'train size - {len(X_train)}\ntest size - {len(X_test)}')

Now we can use a nature-inspired algorithm for hyperparameter tuning; here we use the Bat Algorithm for optimization. The population size is 25 individuals, and the search stops early if it does not find a better solution within 10 generations. We run the optimization 5 times and use 5-fold cross-validation. To learn more about the NatureInspiredSearchCV parameters, refer to the documentation; you can also use algorithms other than the Bat Algorithm.

from sklearn_nature_inspired_algorithms.model_selection import NatureInspiredSearchCV

nia_search = NatureInspiredSearchCV(
    clf,
    param_grid,
    cv=5,
    verbose=1,
    algorithm='ba',
    population_size=25,
    max_n_gen=100,
    max_stagnating_gen=10,
    runs=5,
    scoring='f1_macro',
    random_state=42,
)

nia_search.fit(X_train, y_train)

It took ~1 minute (GridSearchCV would take ~2 minutes; the larger the parameter grid, the bigger the difference). Now you can fit your model with the best parameters found, which are stored in nia_search.best_params_.
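
If you want to inspect the search result first, you can print the best parameter set; note that best_score_ is an assumption here, expected to be exposed the same way as in scikit-learn's search classes:

# Best hyperparameters found by the Bat Algorithm search
print(nia_search.best_params_)

# Assumed attribute: the best cross-validated f1_macro score, following scikit-learn's SearchCV API
print(nia_search.best_score_)

Then refit the classifier with these parameters and evaluate it on the test set.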

from sklearn.metrics import classification_report

clf = RandomForestClassifier(**nia_search.best_params_, random_state=42)

clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)

print(classification_report(y_test, y_pred, digits=4))

Now you have successfully trained the model with the best parameters selected by the nature-inspired algorithm. Here is the full example.
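
For convenience, here is a consolidated script assembled from the snippets above (same data, grid, and search settings as in this tutorial):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn_nature_inspired_algorithms.model_selection import NatureInspiredSearchCV

# Artificial dataset
X, y = make_classification(n_samples=1000, n_features=10, class_sep=0.8, n_classes=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Parameter grid (216 combinations)
param_grid = {
    'n_estimators': range(10, 80, 20),
    'max_depth': [2, 4, 6, 8, 10, 20],
    'min_samples_split': range(2, 8, 2),
    'max_features': ["auto", "sqrt", "log2"],
}

# Hyperparameter search with the Bat Algorithm
nia_search = NatureInspiredSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    verbose=1,
    algorithm='ba',
    population_size=25,
    max_n_gen=100,
    max_stagnating_gen=10,
    runs=5,
    scoring='f1_macro',
    random_state=42,
)
nia_search.fit(X_train, y_train)

# Refit with the best parameters and evaluate on the held-out test set
clf = RandomForestClassifier(**nia_search.best_params_, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred, digits=4))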

I also did a comparison (in this notebook) between GridSearchCV and NatureInspiredSearchCV. I used a larger dataset and many more hyperparameters (1,560 combinations in total). NatureInspiredSearchCV found the same solution as GridSearchCV and was 4.5 times faster: GridSearchCV took 2h 23min 44s to find the best solution, while NatureInspiredSearchCV found it in 31min 58s.
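
For reference, an equivalent GridSearchCV baseline for the small grid used in this tutorial would look roughly like this (a minimal sketch; the notebook's larger dataset and grid are not reproduced here):

from sklearn.model_selection import GridSearchCV

# Exhaustive search over the same param_grid, with the same cross-validation and scoring
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring='f1_macro',
    verbose=1,
)
grid_search.fit(X_train, y_train)
print(grid_search.best_params_)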