What are hyperparameters?
A hyperparameter in ML is a parameter that is not learned from training data but is specified by the practitioner prior to the training process.
Model parameters and hyperparameters terms that are sometimes used equally, although they are not quite the same thing:
- Model parameters are the attributes of the training data that the ML model learns during training. Parameters are therefore intrinsic to the model. The coefficients in regression, or the weights and biases in a neural network, are examples of model parameters since they are estimated during the training phase.
- Model hyperparameters are model settings that are predefined and govern the training process. The model does not have hyperparameters. The prefix “hyper-” denotes that they are elevated characteristics that govern the learning process.
Importance of Hyperparameter Optimization
Tuning hyperparameters aids in the generalization of machine learning models. Hyperparameter optimization in Python is available through the scikit-learn machine learning library. The capacity of the algorithm to perform effectively on both training and fresh data is referred to as generalization. A model fails as a result of:
- Overfitting occurs when a model learns certain patterns in the training so well that it performs badly on the test dataset. This signifies that the model is exclusively applicable to the training and does not adapt to fresh data.
- Underfitting occurs when the model performs badly on both training and test data.
Three main methods for Hyperparameter Optimization
There are three basic approaches for determining the ideal set of hyperparameter values:
- Grid search is the process of finding a system of values for every hyperparameter, executing the system with each conceivable mixture of these variables, and selecting the system of values that yields the best results. Because the variables to be attempted are set individually by the practitioner, grid search entails guessing.
- To identify an ideal set of values, a random search includes selecting random variations of hyperparameter values from provided statistical distributions. The benefit of this search versus grid search is that it enables a greater range of values to be searched while not increasing the number of tests.
- Bayesian search is a progressive strategy that improves the next search process by using the outcomes of prior sets of hyperparameters. Bayesian search shortens optimization time, particularly for systems trained on huge amounts of data.
Application of HPO
Every machine learning system includes hyperparameters, and the most fundamental goal in automated ML is to adjust these hyperparameters automatically to maximize performance. Recent deep neural networks, in particular, rely heavily on a wide variety of hyperparameter options for the neural network’s construction, regularization, and optimization. There are various key applications for hyperparameter optimization in machine learning.
- Lower the amount of human work required to use machine learning. This is especially true in the context of AutoML.
- Increase machine learning algorithm performance; this has resulted in new state-of-the-art results for significant machine learning benchmarks.
- Increase scientific study repeatability and fairness. HPO that is automated is clearly more replicable than manual search. It allows for fair comparisons since various procedures can only be evaluated properly if they are all tuned to the same level for the issue at hand.
HPO has a lengthy history, reaching back to 1990, and it was discovered early on that different hyperparameter combinations perform better for different datasets. HPO, on the other hand, is a relatively new discovery that may be utilized to modify broad pipelines to particular application domains. It is now commonly accepted that customized hyperparameters outperform the default settings offered by most machine learning packages.
Because of the expanding use of ML in businesses, HPO is of significant economic nature and plays an increasingly important role there, whether in corporation products, as a component of ML cloud services, or as a standalone service.
Challenges of HPO
HPO confronts various obstacles that make it a difficult problem to solve in practice:
- For huge models, sophisticated ML pipelines, or enormous datasets, function evaluations might be prohibitively costly.
- Configuration space is frequently complicated and multidimensional. Additionally, it is not always evident which algorithms for hyperparameter optimization must be tuned and in what ranges.
- In most cases, we do not have access to the loss or gradient function when it comes to the hyperparameters.