# Application of built-in distribution of Monte Carlo simulation SciPy with Python

2022-01-30 11:52:23

#### use SciPy The probability distribution is used for scenario analysis Monte Carlo , Picture by Luka Nguyen come from Pixabay, Free for commercial use

This tutorial will demonstrate how we can Python Monte Carlo simulation model is established in . We will

• Use SciPy Built in distribution , especially . Normal distribution 、Beta The distribution and Weibull Distribution .
• by β-PERT Distribution add a new distribution subclass .
• Random numbers are extracted by Latin high cubic sampling .
• Three Monte Carlo simulation models are established .

### 0. Dependence

In addition to importing SciPy Of _ Statistics _ Outside the library , Also imported our all-weather software package pandas and numpy.

### The concept of Monte Carlo simulation The picture is from Pixabay Of Hans Braxmeier, Free for commercial use

Monte Carlo （MC） Method It's a simulation technology , It is a model of _ Output _ Variables construct probability distributions , Some of them _ Input _ The parameter is a random variable .MC Methods are sometimes referred to as _ Multi probability simulation _ technology , Because it integrates multiple random variables , The combined effect of these random variables is not easy to be described by a closed equation .

MC The method is by John - feng - Neumann and Stannis - Where's Ulan 20 century 40 Invented in the late s , They were working at the Los Alamos Laboratory . When they try to calculate neutron collisions in a deterministic way , They came across a dead end . Uwam's idea is to use random sampling , From the early " supercomputer "ENIAC Support , To get an approximate solution . Government secrecy requires the use of a code name ; feng - Neumann and a colleague from Ulan suggested using _ Monte Carlo _ The name , Ulan's uncle often went to the casino there before World War II ： He " Just go to Monte Carlo " gambling .(Ułam)

One MC Examples of simulation problems ： An enterprise wants to simulate the future success of a new product , So the following simulation model is established .

• Expected sales ,v, It's a random variable , follow β-PERT Distribution , from 3 Point estimates arrive at .
• The price of the product p Will be affected by the negotiations -- Some customers will ask for quantity discount or advance payment discount ; The planner decided to model the uncertainty of the average price the company will achieve as an average of 20、 The standard deviation is 2 Normal random variable .
• The raw material cost of the product ,m, Will be affected by the upcoming price rise of suppliers , Estimated by another normal distribution .
• The model should provide multiple results ： income 、 profits 、 The total cost .

Generally speaking , The simulation model accepts input parameters defined as random variables ; It then returns a probability distribution for each output variable , Each output variable is a function of its input parameters . The function can be a simple sum or product , It can also be the index of input , perhaps -- As we will see in an example -- A higher-order function including nested random variables . The target variable will represent a density function , Describe how the distribution of input variables affects each other .

The simulation model obtains the output distribution by sampling the input distribution .1,000 or 100,000 This iteration provides data points that describe the behavior of each input variable . The equation of the output variable converts these inputs into as many samples of the output variable as possible . Once we get this sample , We can be like any built-in SciPy Examine its pattern like distribution -- Its average 、 Tendency of magnitude and outliers .

### 2. Latin high cube sampling

NumPy Of _ Random _ The library provides a way to fill an array with uniform random numbers .SciPy All distribution objects have the function of generating random variables _rvs()_ Method . However , Monte Carlo simulation requires a large number of random number arrays , For these arrays ,_ Stratified sampling _ Method , Such as ** Latin hypercube sampling （LHS）** Is the best way .

LHS Divide the sample space into non overlapping units , The probability of each unit is equal .LHS It ensures that the random number is more evenly extracted from the full range of the sample space . It prevents random numbers from being sampled _ caking _ , And caking distorts the frequency curve .LHS Sampling will lead to more accurate simulation results . It will result in a lower standard error , And reduce the number of iterations .

To demonstrate , We use SciPy Of LatinHyperCube The sampler creates a 10 A small array of random numbers . uniform (0;1) After the random number is generated , You can rescale to fill a wider range , for example 0 To 100 Between . But for our purposes ,0 To 1 The standard random number between can meet the requirements . In the next chapter , We will use LHS Samples to generate random variables with a specific distribution ：PERT、 Normal and Weibull function .

### 3. The probability distribution of modeling the uncertainty in the simulation

#### 3.1 Used to simulate expert estimates PERT Distribution

My previous articles --" use β-PERT The distribution is modeled for expert estimates "-- It's an explanation PERT Distribution A tutorial for （Python Scenario analysis . use β-PERT The distribution models the expert estimates | Towards Data Science ）.)

If a field 、 Process or industry experts can provide a so-called... For a random process _ Three points _ It is estimated that （Three -point estimation）, Including the worst case 、 Possible and best case results , that PERT Function can connect these points , Fairly . It will convert these points into probability distributions that we can assign to random variables in the simulation model .

SciPy Of 123 A distributed directory does not include PERT function . therefore , We create it as a new subclass , Inherited from SciPy Of _rv_continuous_ Parent class .

For our example , We instantiate... With four parameters PERT Subclass , Predict the sales volume of a new product .

We chose this PERT Distribution to simulate the expected sales volume , The most likely value is 12,000 individual , The box is at the minimum 8,000 One and the largest 18,000 Between the two . Its expectations , That's the average , It will be 12,333 Sales units . It has a small positive skewness , Still similar to the normal distribution ; And a moderately negative excess kurtosis -0.62, This makes it a plate-shaped or bulky distribution , In its tail, it is easy to have fewer outliers than the normal distribution .

The first 3 Line extracts a N=10,000 Of a uniform random variable LHS sample . Then the first 4 Line array as its input parameter . Percentage points _ function ppf()_ take 10,000 A standard uniform number is interpreted as a probability , And calculate 10,000 individual PERT measurement , And fill it into the array randP in .

The histogram shows the simulated PERT The shape of the distribution . #### 3.2 Normal distribution

We apply the same method （ Remove the need to create a new distribution subclass ）, Extract... From the normal distribution 10,000 individual LHS sample , We hope it can reflect the uncertainty inherent in the sales price and unit cost of raw materials . The average selling price will be 20 euro , The standard deviation is 2 euro .  The second normal distribution simulates the unit cost of raw materials , The average value is 13 euro , The standard deviation is 1.40 euro .

#### 3.3 simulation 1-- The sum and product of random variables are modeled

We created three random variables , Represents the uncertainty of three factors , These factors will determine the probability of profit or risk of loss of new products .

• 1x PERT-- For sales v.
• 2 A normal variable -- Represents the selling price p And unit cost of raw materials m.

Let's assume that other costs , That is, those costs that have nothing to do with the supplier's price , You can use a deterministic variable o To reflect .

In order to calculate the target variables of the simulation model -- Gross profit of the product GP--, We link these random variables together , As shown below .

• v = Number ,p = Price ,m = Unit cost of raw materials ,o = Other unit costs
• GP = v * (p - m - o)

As a secondary output variable , We can simulate revenue R Simultaneous simulation GP. Generally speaking , Once we define their input random variables , We can set the simulation model with any number of output variables . Each output variable can be composed of one or more input random variables , And will be determined by its own 10000 An array of variables to describe .

• R = v * p

This establishes our simple simulation model . In order to review the return gross profit random variable GP Properties of the distributed array , We wrote a function _dist_properties()_ , It will read the array and return its moment and the selected quantity value .

For skewness and kurtosis , We make use of SciPy Of _skew() and kurtosis()_ function , We need the help of numpy Of _asscalar()_ The conversion method changes the result from an array to a flat number . We are in the dictionary _dict_moments_ Collect their values in , And add the name of the measure . The first 14 Line list understanding prints out the contents of the dictionary line by line .

And then we're in the second 17 The following line calculates the quantity value , Collect them in another dictionary _dict_quantiles_ in , And through 27 The list of rows is understood to print it .

We will these two dictionaries , That is, the moment value and quantity value are combined , Form a comprehensive dictionary , namely _dict_metrics_.  The average profit will reach 49,250 euro , Close to the median .

Skewness and kurtosis are quite moderate , It means that compared with the normal distribution , The tendency of outliers is small .

The values in the table show , If the selling price and the unit cost of raw materials tend to the worst at the same time , The new product will produce 5% The risk of . The profit will exceed 10,731 euro , The probability of 90%（10% Dimension of ）.

In the next chapter , We will go further , Build a high-order simulation model . Although the first model assembles output variables from the sum and product of simple random variables , But we will now develop one that includes _ nesting _ Simulation model of random variables .

#### 3.5 Simulation 2-- Nested distribution

If you've read my previous articles (Intro to Probability Distributions and Distribution Fitting with Python's SciPy ), You will remember that we used Weibull Distribution to solve failure Time or Component life problem . The life of a component can usually be used Weibull Distribution to model , Its shape reflects _ Failure rate _, The proportional parameter is set to the so-called _ Characteristic life _.

The first spacecraft to Mars will be equipped with electronic circuit boards , Its characteristic life may be 50,000 Working hours . In order to estimate the redundancy required in spacecraft design , We will simulate which part of all the circuits will burn down in how many hours .

Let's assume that , The reality is ,Weibull Two parameters of the distribution -- Shape and feature life -- Not sure . therefore , We will estimate these distribution parameters as our own random variables .Weibull The shape and its size can fluctuate within the boundaries set by its distribution pattern .

• For shape parameters , The engineering team estimated that it should be about an average of 1.5, The standard deviation is 0.1 Is a normal distribution .
• For characteristic life , We use... Provided by the engineering team 3 Point estimation , That is to say 45,000、50,000 and 60,000 Maximum operating hours 、 Possible and minimum values . Then we derive a PERT Distribution to reflect the range of uncertainty .

Two auxiliary functions are sampled by Latin hypercube , Separate extraction 10,000 Random variables .

• _wei_shp()_ Returns the shape parameter of normal distribution .
• _wei_charlife()_ Returns the lifetime of the feature PERT Variable array .
• We will Weibull Positional arguments （ Also known as waiting time ） Set to 0, It means that product faults will emerge immediately after production and quality testing . In the 7 In line , We put random variables together in a Weibull Output variables , Used to calculate the time before failure .Weibull Function to obtain its shape and scale parameters , Auxiliary function called . Target variable _rand_CL_ Will save a file created by 10,000 An array of analog outputs , Represents the life of the component , It's in hours .

There are three nested distribution results ： One PERT And a normal random variable as Weibull Parameters of the distribution . We can call this result flexible or random Weibull Distribution , Instead of a fixed distribution of point values .

For how to review the attributes of the results , We can choose two options .

• Analyze it as an array , Just like our previous simulation results
• application SciPy Of _rv_histogram class _,_ This kind _ Divide the output array into histograms , And turn it into a " real "SciPy A probability distribution , To do this, we can call pdf and ppf Equal distribution function .

Let's look at the histogram class . The figure shows us in blue in the array rand_CL Split life simulated in . Orange shows _rv_histogram_ Class the probability density function derived from the array .

A function _histdist_properties()_ To calculate the histdist_ Properties of probability distribution . This function is different in some ways from the function we wrote before _distribution_properties() Different ._rv_histogram_ Classes do not provide the same distribution properties in all cases . for example , The minimum and maximum values must be from ._support()_ Method .

We call _histdist_properties() function , Get the measurements in the table . The data framework includes not only the cumulative distribution function and quantity value , And there's a list of random variables . Sampling function .rvs()_ Enables us to extract some random numbers from our output distribution -- If necessary , Or thousands , But no LHS Methods stratified sampling . above , We analyzed _rv_histogram_ Continuous... From the array Distribution . As shown in the figure , The distribution curve is not exactly the same as the array data .

below , We are 10,000 Call the function... On the original array of output numbers _dist_properties()_, And explain the moments and values . • The average life of components will reach 45,647 Hours .
• Life will be right , natural , There are more than 100,000 Outliers of hours .
• Kurtosis is unique -- The distribution is skewed , There is a tendency to have a long-lived component at its tail .
• 11,076 Hours later, ,10% The components will be burned .

Mission control informed us ,11000 Hours of operation will mark a critical threshold . After completing these utilization hours , Circuits will no longer be critical . These numbers mean , In terms of circuit boards , The design of the spacecraft should take into account at least 10% Redundancy of ： By adding spare circuit boards , These boards will turn on automatically , As a replacement for those circuit boards that start to fail .

So our spacecraft is almost ready to launch . Picture by Peter H from Pixabay shooting , Free for commercial use

#### 3.6 Model 3： Fixed parameter simulation

As a final exercise , Let's check Weibull How sensitive is the model to the uncertainty of its shape and characteristic life .

We will use a " Fixed form " Of Weibull Monte Carlo simulation of distribution .

• The shape parameters will be fixed to the average 1.5.
• Characteristic life ： by PERT The average of the distribution .50,500 Hours .  We call... Again _dist_properties()_ function , And insert its indicator into a data frame , So we can fix Weibull And previously nested 、 So it is " agile "Weibull Make parallel comparisons . As expected , We see fixed Weibull The standard deviation of the variable is low , It is generated without uncertainty of shape and scale parameters . But the difference is small . obviously , flexible Weibull The uncertainty of the shape and scale parameters of the model is quite closely condensed in our fixed Weibull Around the average used in the model . Lower skewness and kurtosis indicate a reduction in the tendency to outlier and asymmetry .

Simulate with different parameter values or different distributions , May show greater differences caused by increased uncertainty .

### 4.4. Conclusion

That's the end of today's tutorial .

We have experienced three examples of Monte Carlo simulation , Yes, it is SciPy Created by the toolbox provided by the library .

• The first model -- Profit simulation -- The simple sum and product of random variables are shown .
• The second simulation --Weibull Time before failure -- The concept of nested random variables is explained , When the parameter of the probability distribution itself is a random variable .
• The third simulation made a bold assumption ： The parameters of the distribution are precisely known . If we ignore the source of uncertainty in the simulation model , We may underestimate the distribution of possible results , however , Unless the uncertainty is condensed into the average of the parameters .

Jupyter Notebooks are available from GitHub download ：h3ik0th/MonteCarloSim. use SciPy Monte Carlo simulation (github.com)

• Monte Carlo , Picture by Luka Nguyen come from Pixabay, Free for commercial use
• Monte Carlo Casino , picture source ：Hans BraxmeierfromPixabay, Free for commercial use
• factory , picture source ：Peter HfromPixabay, Free for commercial use
• All other pictures ： Provided by the author from Python Driven Monte Carlo simulation Originally published in Medium Upper Towards Data Science, There, people continue the dialogue by emphasizing and responding to the story .