The Monte Carlo method uses repeated random sampling to simulate the effects of the variables on the measurement model. One of the main benefits of this method is that it can handle complex models, non-linear relationships, and interactions between variables.
This post shows how to estimate the measurement uncertainty in R with Monte Carlo and plot the output distribution for the determination of Total Organic Carbon (TOC) in aqueous samples. Additionally, an Excel Macro is provided for the same computation.
1. Workflow
The fundamental steps of the method are definition of the measurement model, description of the probability distributions of the input variables, random sampling and running the simulations, and finally analysis of the output distribution.
2. Example: Determination of Total Organic Carbon
R script and Excel Macros: MonteCarlo-Uncertainty
Measurement model
The analytical method includes acidification of the sample, high-temperature combustion, and NDIR detection of CO2(g) produced. Two calibrations curves are built based on appropiate standards, one for total carbon (TC) and the other for inorganic portion (IC). The measurement model is shown below and consists of the difference between the concentrations of TC and IC from the calibration curve, recovery and matrix effects factors, and method blank corrections.
y<-(((A_CT-b_CT)/m_CT)*F_RecupCT*F_MatrizCT-((A_CI-b_CI)/m_CI)*F_RecupCI*F_MatrizCI)*F_dil+(B_CT-B_CI)
Probability distributions of the input variables
The probability distributions can be obtained from previous experiments, or derived from experience or knowledge. Some common distributions are normal, triangular, and rectangular. For this example, all input variables are considered normally distributed.
for (i in 1:n){
A_CT<-rnorm(1, mean=20000, sd=1000)
b_CT<-rnorm(1, mean=8.0924, sd=0.80924)
m_CT<-rnorm(1, mean=1051.3, sd=52.565)
F_RecupCT<-rnorm(1, mean=1.111, sd=0.02222)
F_MatrizCT<-rnorm(1, mean=1.111, sd=0.05556)
A_CI<-rnorm(1, mean=14000, sd=700)
b_CI<-rnorm(1, mean=1814.5, sd=181.45)
m_CI<-rnorm(1, mean=1167, sd=116.7)
F_RecupCI<-rnorm(1, mean=1.111, sd=0.02222)
F_MatrizCI<-rnorm(1, mean=1.111, sd=0.05556)
F_dil<-rnorm(1, mean=2, sd=0.01)
B_CT<-rnorm(1, mean=0.1, sd=0.01)
B_CI<-rnorm(1, mean=0.3, sd=0.03)
y<-(((A_CT-b_CT)/m_CT)*F_RecupCT*F_MatrizCT-((A_CI-b_CI)/m_CI)*F_RecupCI*F_MatrizCI)*F_dil+(B_CT-B_CI)
COT_v<-c(COT_v,y)
}
Random sampling and simulation
The number of simulations can be set with readline
n <- readline(prompt="Set the number of simulations:")
n <- as.integer(n)
Collect all output values from the simulations and plot the results
x_vector<-sample(1:n)
COT<-data.frame(x_vector,COT_v)
ggplot(COT, aes(x = x_vector, y = COT_v)) + geom_point(size = 1, color = "darkblue", alpha = 0.2) + labs(y = "COT_v (mg/L)", x = "Run number")
hist(COT_v, ylab = "COT_v (mg/L)", col = "lightblue")
Analyze the final distribution
Finally, the obtained distribution should be checked for signs of particular curtosis or skewness. You can revise the mean and the summary of the output variable with
mean(COT_v)
summary(COT_v)
The final combined uncertainty and the coefficient of variation are
sd(COT_v)
cv=sd(COT_v)/mean(COT_v)*100
3. Final remarks
The obtained uncertainty based on the Monte Carlo method can be further compared with other estimation methods to acquire a broader understanding of the output variable distribution and the relationships between variables. For example, if the GUM, Kragten, and Monte Carlo approaches lead to similar results, it could be assumed that the measurement model is not greatly affected by input variances and that their relationships are mainly linear with the output variable. However, if the results differ, this might indicate latent variable dependencies and model sensitivity to small changes in the input variables that should be further studied.
This method provides a computational estimate of uncertainty based on previous laboratory data for non-tested conditions. In particular, for the TOC analysis shown here as an example, the Monte Carlo approach can be advantageous for rapid estimates when studying a broad concentration range and diverse matrices compared to other estimation methods.