Skip to content

Commit

Permalink
Added history matching function to the metrics.py and used in the car…
Browse files Browse the repository at this point in the history
…diac tutorial
  • Loading branch information
marjanfamili committed Feb 3, 2025
1 parent a36f768 commit 3768af1
Show file tree
Hide file tree
Showing 2 changed files with 91 additions and 25 deletions.
36 changes: 35 additions & 1 deletion autoemulate/metrics.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,6 @@
from sklearn.metrics import r2_score
from sklearn.metrics import root_mean_squared_error


def rmse(y_true, y_pred, multioutput="uniform_average"):
"""Returns the root mean squared error.
Expand Down Expand Up @@ -38,3 +37,38 @@ def r2(y_true, y_pred, multioutput="uniform_average"):
"rmse": rmse,
"r2": r2,
}


def history_matching(obs, expectations, threshold=3.0, discrepancy=0.0, rank=1):
"""
Perform history matching to compute implausibility and identify NROY and RO points.
Parameters:
obs (tuple): Observations as (mean, variance).
expectations (tuple): Predicted (mean, variance).
threshold (float): Implausibility threshold for NROY classification.
discrepancy (float or ndarray): Discrepancy value(s).
rank (int): Rank for implausibility calculation.
Returns:
dict: Contains implausibility (I), NROY indices, and RO indices.
"""
obs_mean, obs_var = np.atleast_1d(obs[0]), np.atleast_1d(obs[1])
pred_mean, pred_var = np.atleast_1d(expectations[0]), np.atleast_1d(expectations[1])

discrepancy = np.atleast_1d(discrepancy)
n_obs = len(obs_mean)
rank = min(max(rank, 0), n_obs - 1)
# Vs represents the total variance associated with the observations, predictions, and potential discrepancies.
Vs = pred_var + discrepancy[:, np.newaxis] + obs_var[:, np.newaxis]
I = np.abs(obs_mean[:, np.newaxis] - pred_mean) / np.sqrt(Vs)
I_ranked = np.partition(I, rank, axis=0)[rank]

NROY = np.where(I_ranked <= threshold)[0]
RO = np.where(I_ranked > threshold)[0]

return {
"I": I_ranked,
"NROY": list(NROY),
"RO": list(RO)
}
80 changes: 56 additions & 24 deletions docs/tutorials/03_emulation_sensitivity.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
Expand Down Expand Up @@ -106,7 +106,7 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
Expand Down Expand Up @@ -135,7 +135,7 @@
},
{
"cell_type": "code",
"execution_count": 22,
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
Expand Down Expand Up @@ -204,7 +204,7 @@
"\n",
"Secondly we sample the parameter space usning the Latin hypercube sampling (LHS) method. \n",
"\n",
"Latin Hypercube Sampling (LHS) is a statistical method used for generating a sample of plausible values from a multidimensional distribution. It is particularly effective for uncertainty quantification, sensitivity analysis, and simulation studies where the goal is to explore the behavior of a system or a model across a wide range of input parameter values."
"Latin Hypercube Sampling (LHS) is a statistical method used for generating a sample of plausible values from a multidimensional distribution and guarantees that every sample is drawn from a different quantile of the underlying distribution. It is particularly effective for uncertainty quantification, sensitivity analysis, and simulation studies where the goal is to explore the behavior of a system or a model across a wide range of input parameter values."
]
},
{
Expand Down Expand Up @@ -234,7 +234,7 @@
},
{
"cell_type": "code",
"execution_count": 26,
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
Expand All @@ -248,10 +248,10 @@
"source": [
"### Simulate\n",
"\n",
"Here we run the simulation which involves \n",
"1- Generating the pulse functions \n",
"2- Solving the differential equations \n",
"3- Returning the pressure and flow rate in y for every set of parameters sampled (every row of sample_df data frame )\n",
"Here we run the simulation which involves: \n",
"1- Generating the pulse functions. \n",
"2- Solving the differential equations. \n",
"3- Returning the pressure and flow rate in y for every set of parameters sampled (every row of sample_df data frame).\n",
"\n",
"The full output of the simulation is quite large for fitting an emulator model. Therefore, it is standard practice to grab the most relavant information for training the emulator. Here we have chosen to train the emulator on the maximum pressure in each compartment. There are a variety of outputs that could be considered (max / min / average /mean / median / gradiant / PCA ).\n",
"\n",
Expand All @@ -264,7 +264,6 @@
"metadata": {},
"outputs": [],
"source": [
"\n",
"# Fixed parameters: Number of compartments and cycles\n",
"ncomp = 10\n",
"ncycles = 10\n",
Expand Down Expand Up @@ -320,7 +319,7 @@
"em = AutoEmulate()\n",
"parameter_names = list(parameters_range.keys())\n",
"em.setup(sample_df[parameter_names], Y, models = ['gp','cnp'])\n",
"best_model = em.compare()"
"best_model = em.compare()\n"
]
},
{
Expand All @@ -343,31 +342,63 @@
},
{
"cell_type": "code",
"execution_count": 31,
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"em.evaluate(best_model) \n",
"best_emulator = em.refit(best_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# history matching\n",
"\n",
"In this section, we perform History Matching on the predictions from the best_emulator. This allows us to see which reagions of the parameter space are plausible. The Implausibility metric is calculated using the following relation for each set of parameter:\n",
"\n",
"$I_i(\\overline{x_0}) = \\frac{|z_i - \\mathbb{E}(f_i(\\overline{x_0}))|}{\\sqrt{\\text{Var}[z_i - \\mathbb{E}(f_i(\\overline{x_0}))]}}$\n",
"\n",
"Where if implosibility ($I_i$) exceeds a threshhold value, the points will be rulled out. \n",
"The outcome of history matching are the NORY (Not Ruled Out Yet) and RO (Ruled Out) points.\n",
"\n",
"The NORY region can be used for resampling and re-training as a part of active-learning process. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from autoemulate.metrics import history_matching\n",
"\n",
"pred_mean, pred_std = best_emulator.predict(sample_df, return_std=True)\n",
"pred_var = np.square(pred_std) # Convert std to variance\n",
"expectations = (pred_mean, pred_var) \n",
"hist_match = history_matching(obs = [700,50], expectations = expectations )\n",
"\n",
"print(f'The Not Rulled Out Points are {hist_match[\"NROY\"]}')\n",
"sample_df.loc[hist_match['NROY']].head()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sensitivity Analysis\n",
"\n",
"1. Define the problem by creating a dictionary which contains the names and the boundaries of the parameters \n",
"2. \n",
"contribution of each parameter , omportance of each parameter \n",
"S1 : contribution of each paramter \n",
"S2 : interaction of two paramaters \n",
"S"
"2. contribution of each parameter , importance of each parameter \n",
" - S₁: First-order sensitivity index.\n",
" - S₂: Second-order sensitivity index.\n",
" - Sₜ: Total sensitivity index."
]
},
{
"cell_type": "code",
"execution_count": 32,
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
Expand All @@ -380,8 +411,7 @@
" 'num_vars': len(parameter_names),\n",
" 'names': parameter_names,\n",
" 'bounds': parameter_bounds\n",
"}\n",
"\n"
"}\n"
]
},
{
Expand Down Expand Up @@ -483,11 +513,13 @@
]
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": []
"source": [
"\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
Expand Down

0 comments on commit 3768af1

Please sign in to comment.