Surrogate Model

A surrogate model is an engineering technique used when an outcome of interest cannot be easily measured or computed directly, so an approximate model of that outcome is used instead. Experiments and/or simulations are often needed to evaluate how the design objectives and constraint functions depend on the design variables.

However, for many practical problems, running even a single simulation can take a considerable amount of time. Routine tasks such as design exploration, sensitivity analysis, and what-if analysis therefore become infeasible, since they require hundreds to millions of simulation evaluations. One way to lessen this burden is to build approximation models that mimic the behavior of the simulation model but are much cheaper to evaluate. These models are called surrogate models, metamodels, or emulators.


Surrogate models are constructed with a data-driven, bottom-up approach. The inner workings of the simulation code do not need to be known; only its input-output behavior matters.

A surrogate is built by modeling the simulator’s response at a limited number of carefully chosen data points. This approach is also known as behavioral modeling or black-box modeling. When only a single design variable is involved, the process is known as curve fitting. Using surrogate models in place of costly experiments and simulations is most common in engineering design, but they can also be used in other scientific fields where experiments and/or function evaluations are expensive.
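The single-variable case can be sketched in a few lines. This is only an illustration: `expensive_simulation` is a hypothetical stand-in for a real simulator, and the degree-5 polynomial and the eight sample points are arbitrary choices, not something prescribed by the text above.

```python
import numpy as np

def expensive_simulation(x):
    """Hypothetical stand-in for a costly simulator with one design variable."""
    return np.sin(3.0 * x) + 0.5 * x

# Evaluate the "simulator" at a small number of chosen design points.
x_train = np.linspace(0.0, 2.0, 8)
y_train = expensive_simulation(x_train)

# Fit a degree-5 polynomial to the samples by least squares (curve fitting).
coeffs = np.polyfit(x_train, y_train, deg=5)
surrogate = np.poly1d(coeffs)

# The surrogate is now cheap to evaluate anywhere in the design range.
x_new = np.linspace(0.0, 2.0, 101)
max_abs_error = np.max(np.abs(surrogate(x_new) - expensive_simulation(x_new)))
print(f"max abs error on [0, 2]: {max_abs_error:.4f}")
```

In a real study the true function is not available on a dense grid, so the error would be estimated on a small held-out set of simulation runs instead.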


A machine learning surrogate model (sometimes called a “metamodel” or an “emulator”) is trained on empirical evidence. Its training data are obtained by probing the simulation at a number of intelligently chosen points in the design parameter space. A full simulation is run at each of these points to compute the corresponding output.

The resulting training dataset consists of matched input (design parameter) and output (measurement) pairs, from which a statistical model can be built.

Is developing a predictive model from a labeled training dataset supervised machine learning? Yes, indeed! Surrogate modeling is an application of supervised ML within the engineering design domain. Popular machine learning methods such as polynomial regression, support vector machines, Gaussian processes, and neural networks are commonly used as surrogate models to speed up product design and analysis.

As a result, engineers can use established machine learning practices to build, validate, and select surrogate models, and to guard against underfitting and overfitting.
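One such practice is k-fold cross-validation. The sketch below uses it to compare polynomial surrogates of different degrees. The `simulation` function, the noise level, and the candidate degrees are all assumptions made for illustration. A degree that is too low underfits and shows a large held-out error, while a degree that is too high may start fitting the noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulation(x):
    """Hypothetical stand-in for an expensive simulator."""
    return np.sin(3.0 * x) + 0.5 * x

# Assumed training data: 30 random design points with a little output noise.
x = rng.uniform(0.0, 2.0, size=30)
y = simulation(x) + rng.normal(0.0, 0.05, size=x.shape)

def cv_rmse(x, y, degree, n_folds=5):
    """Mean held-out RMSE over k folds for a polynomial surrogate."""
    indices = np.arange(len(x))
    errors = []
    for held_out in np.array_split(indices, n_folds):
        train = np.setdiff1d(indices, held_out)
        coeffs = np.polyfit(x[train], y[train], deg=degree)
        pred = np.polyval(coeffs, x[held_out])
        errors.append(np.sqrt(np.mean((pred - y[held_out]) ** 2)))
    return float(np.mean(errors))

# Compare candidate surrogates and keep the one with the lowest held-out error.
scores = {degree: cv_rmse(x, y, degree) for degree in (1, 3, 5, 7)}
best_degree = min(scores, key=scores.get)
for degree, score in scores.items():
    print(f"degree {degree}: cross-validated RMSE = {score:.3f}")
```

The same loop works for any surrogate family. Only the fit and predict lines change.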


  • Sampling. We start by generating the initial training data. To do this, we choose a representative set of samples from the design parameter space. This step is also known as design of experiments.

Ideally, the samples at this stage are spread evenly over the parameter space, so that the training data capture the input-output relationship in every region under investigation.

  • Output evaluations. Once the initial training samples have been chosen, we compute the matching output values by running the simulation for each sample. Pairing the training samples with their output values gives the first training dataset.
  • Build the surrogate model. In this stage, we train a machine learning model as the surrogate, using the training data from the previous step. Well-established machine learning practices for model validation and selection should guide the training process. In addition, ensemble methods such as bagging and boosting may improve the performance of the surrogate model.
  • Learning. In general, an analyst cannot know in advance how many samples are needed to build an adequate surrogate model, because this depends on the complexity of the input-output relationship being approximated. It therefore makes sense to enrich the training dataset as training proceeds. This approach is referred to as active learning.

  • Enriching the training dataset. Once a new sample has been selected, a new simulation run is carried out to compute the corresponding output value. The surrogate model is then retrained on the enriched training dataset. This procedure is repeated until the accuracy of the surrogate model is satisfactory.
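The whole loop can be sketched end to end as follows. Everything specific here is an assumption made for illustration, not part of the workflow described above: the two-variable `simulation` function, Latin hypercube sampling as the space-filling design, a small Gaussian process with fixed hyperparameters as the surrogate, and "pick the candidate with the largest predictive uncertainty" as the active learning rule.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulation(x):
    """Hypothetical stand-in for an expensive simulator with two design variables."""
    return np.sin(3.0 * x[:, 0]) + x[:, 1] ** 2

def latin_hypercube(n_samples, n_dims):
    """Space-filling samples in the unit hypercube: one point per stratum per dimension."""
    unit = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_dims))) / n_samples
    for d in range(n_dims):
        unit[:, d] = rng.permutation(unit[:, d])  # decouple the dimensions
    return unit

def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel between point sets a (n, d) and b (m, d)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_predict(x_train, y_train, x_query, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP (fixed hyperparameters)."""
    k_tt = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    mean = k_qt @ np.linalg.solve(k_tt, y_train)
    # Prior variance is 1 for this kernel; subtract the part explained by the data.
    var = 1.0 - np.sum(k_qt * np.linalg.solve(k_tt, k_qt.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))

# 1. Sampling: initial space-filling design over [0, 1] x [0, 1].
x_train = latin_hypercube(n_samples=8, n_dims=2)
# 2. Output evaluations: run the simulator at each sample.
y_train = simulation(x_train)

# Candidate pool from which new samples may be drawn.
grid = np.linspace(0.0, 1.0, 21)
candidates = np.array([(a, b) for a in grid for b in grid])

max_new_samples = 40  # simulation budget for the enrichment phase
for _ in range(max_new_samples):
    # 3. Build the surrogate on the current data and query it on the candidates.
    mean, std = gp_predict(x_train, y_train, candidates)
    if std.max() < 0.05:  # stop once the surrogate is confident everywhere
        break
    # 4. Learning: pick the candidate where the surrogate is least certain.
    x_next = candidates[np.argmax(std)]
    # 5. Enrich the dataset with one more simulation run; retrain on the next pass.
    x_train = np.vstack([x_train, x_next])
    y_train = np.append(y_train, simulation(x_next[None, :])[0])

mean, std = gp_predict(x_train, y_train, candidates)
max_abs_error = np.max(np.abs(mean - simulation(candidates)))
print(f"{len(x_train)} simulation runs, max abs error {max_abs_error:.3f}")
```

The final error check calls the true function on every candidate, which is affordable only because the stand-in is cheap. With a real simulator, the stopping rule would rely on the surrogate's own uncertainty or on a small held-out test set.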
