Program Description: pwlCopula and VGen
J.Ch.Strelen
Copulas capture the entire dependence structure of
multivariate distributions, not only the correlations. Together
with the marginal distributions of the vector elements, they
define a multivariate distribution which can be used to generate
random vectors with this distribution. The MATLAB program
pwlCopula implements input models with this method, for random
vectors and time series. The copulas are estimated from observed
samples of random vectors. The program is fast and handles random
vectors of high dimension, for example 100. The generation
algorithm is also implemented with Java methods in VGen.
The technique is described in the paper J.Ch.Strelen, Tools for
Dependent Simulation Input with Copulas, submitted for
publication. See also www.informs-sim.org/wsc07papers/058.pdf
The MATLAB program pwlCopula calculates the copula, provides
statistics and diagrams for examining the quality of this
model, and can generate random vectors and time series. The Java
classes generate random vectors
and time series using a copula which was calculated with
pwlCopula. Java classes are easier to integrate into simulation
models than MATLAB programs.
For the copula, the MATLAB program uses either a sample of
independent vectors with dimension D or a time series whose
elements may themselves be vectors, with dimension D' (Ds in the
program). The input file is laid out as follows: the first value
is the dimension D or D', respectively, the second is the sample
size n, and then the vectors follow one after the other, all
without line feeds.
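The layout above can be read with a few lines of code. The
following Python sketch is only an illustration and not part of
pwlCopula. It assumes that the values are separated by
whitespace, which the description does not state, and the
function name parse_sample is hypothetical.

```python
def parse_sample(text):
    """Parse 'D n v11 ... v1D v21 ...' into (D, n, list of n vectors).

    Assumption: values are separated by whitespace.
    """
    tokens = text.split()
    dim, size = int(tokens[0]), int(tokens[1])
    values = [float(t) for t in tokens[2:]]
    if len(values) != dim * size:
        raise ValueError(f"expected {dim * size} values, found {len(values)}")
    # Cut the flat value list into n consecutive vectors of length D.
    vectors = [values[i * dim:(i + 1) * dim] for i in range(size)]
    return dim, size, vectors


# Example: a sample of three vectors with dimension 2.
dim, size, vectors = parse_sample("2 3 1.0 2.0 3.0 4.0 5.0 6.0")
```

The length check catches files whose header does not match the
number of stored values.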
The copula can be stored in a copula file (.cop). During the
calculation, the program needs empirical marginal distributions.
They can be stored, too (.emp). For these calculations, the
user must specify some parameters:
* K, integer, the granularity, which determines the accuracy:
the higher K, the more accurate the model. We used
values between 10 and 4000.
* n_by_K, integer, defines the sample size n = n_by_K * K.
Thus K divides n.
* The name of the file which the sample is read from.
* The window width m, only if the copula is for a time series.
It determines how accurately the dependence between
successive time series elements is modelled.
We tried m=2,3,4.
Using this copula, pwlCopula can generate random vectors or a
time series which can be stored in a file for later use in a
simulation model. For the generation, the user specifies
* The random number stream
* How many vectors are to be generated
* The kind of inverse transformation
o Method 2 generates only values which occur in the
sample
o Method 1 uses linear interpolation of the empirical
distribution function and thus also generates values in
between
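The two kinds of inverse transformation can be illustrated for a
single marginal distribution. The Python sketch below is not the
pwlCopula code. The function names are hypothetical, and the
exact index and interpolation rules are assumptions, since the
description only states that one method returns sample values
and the other interpolates linearly.

```python
def inverse_sample_value(sorted_sample, u):
    """Method 2 (assumed rule): return the order statistic selected by u."""
    n = len(sorted_sample)
    return sorted_sample[min(int(u * n), n - 1)]


def inverse_interpolated(sorted_sample, u):
    """Method 1 (assumed rule): interpolate linearly between order statistics."""
    n = len(sorted_sample)
    pos = u * (n - 1)          # fractional position in the sorted sample
    lo = int(pos)
    if lo >= n - 1:            # u == 1 or a sample with a single value
        return sorted_sample[-1]
    frac = pos - lo
    return sorted_sample[lo] + frac * (sorted_sample[lo + 1] - sorted_sample[lo])


sample = sorted([8.0, 1.0, 4.0, 2.0])          # [1.0, 2.0, 4.0, 8.0]
picked = inverse_sample_value(sample, 0.5)     # 4.0, a value from the sample
between = inverse_interpolated(sample, 0.5)    # 3.0, between 2.0 and 4.0
```

With the same uniform number 0.5, the first function returns an
observed value and the second a value between two observed ones.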
Moreover, the program can calculate statistics and plot diagrams
with the generated random vectors or the time series, and
corresponding statistics and diagrams with the given sample. The
modeller can compare them in order to judge how good the copula
model is. The statistics concern the means and the
variances in each dimension, and correlations between pairs of
dimensions. They are calculated for the original sample on one
hand, and for the generated vectors on the other. The absolute
values of the differences are taken as measures of accuracy. The
difference of two means is taken absolutely if at least one of
the absolute values of the means is less than 0.00001, and
relatively otherwise. For the variation, we compare the
coefficients of variation if both corresponding absolute values
of the means are greater than 0.00001, and the standard
deviations otherwise. For the dependence, we compare the
correlations if both corresponding standard deviations are
greater than 0.00001, and the covariances otherwise. The greatest
absolute value of these differences, the maximum statistical
deviation, is a combined measure of accuracy.
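The rules above can be sketched in Python as follows. This is an
illustration under stated assumptions, not the pwlCopula code:
the relative difference of means is taken relative to the mean
of the original sample, population (not sample) variances are
used, and a correlation is compared only if the standard
deviations of both dimensions exceed the threshold in both
samples. All names are hypothetical.

```python
from itertools import combinations
from statistics import fmean, pstdev

EPS = 0.00001  # threshold from the description


def column(vectors, d):
    return [v[d] for v in vectors]


def covariance(x, y):
    mx, my = fmean(x), fmean(y)
    return fmean([(a - mx) * (b - my) for a, b in zip(x, y)])


def max_statistical_deviation(original, generated):
    dim = len(original[0])
    deviations = []
    for d in range(dim):
        xo, xg = column(original, d), column(generated, d)
        mo, mg = fmean(xo), fmean(xg)
        so, sg = pstdev(xo), pstdev(xg)
        # Means: absolute difference near zero, relative otherwise
        # (assumption: relative to the original mean).
        if min(abs(mo), abs(mg)) < EPS:
            deviations.append(abs(mo - mg))
        else:
            deviations.append(abs(mo - mg) / abs(mo))
        # Variation: coefficients of variation, else standard deviations.
        if abs(mo) > EPS and abs(mg) > EPS:
            deviations.append(abs(so / abs(mo) - sg / abs(mg)))
        else:
            deviations.append(abs(so - sg))
    for d1, d2 in combinations(range(dim), 2):
        o1, o2 = column(original, d1), column(original, d2)
        g1, g2 = column(generated, d1), column(generated, d2)
        sds = [pstdev(o1), pstdev(o2), pstdev(g1), pstdev(g2)]
        co, cg = covariance(o1, o2), covariance(g1, g2)
        # Dependence: correlations, else covariances.
        if min(sds) > EPS:
            deviations.append(abs(co / (sds[0] * sds[1]) - cg / (sds[2] * sds[3])))
        else:
            deviations.append(abs(co - cg))
    return max(deviations)
```

A value of 0 means that all compared statistics of the two
samples coincide.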
If one replicates the generation process, say r times, the
smallest and the greatest observed maximum statistical deviation
form an (approximate) confidence interval with confidence level
1 - 0.5^(r-1).
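As a worked example, r = 5 replications give the confidence
level 1 - 0.5^4 = 0.9375. This level agrees with the standard
order-statistic interval for a median, where r independent
observations all fall on the same side of the median with
probability 2 * 0.5^r = 0.5^(r-1). The Python sketch below only
restates the formula. The function names and the deviation
values are made up for illustration.

```python
def confidence_level(r):
    """Confidence level 1 - 0.5^(r-1) for r replications."""
    if r < 2:
        raise ValueError("at least two replications are needed")
    return 1 - 0.5 ** (r - 1)


def deviation_interval(max_deviations):
    """Smallest and greatest observed deviation with the confidence level."""
    return (min(max_deviations), max(max_deviations),
            confidence_level(len(max_deviations)))


# Hypothetical maximum statistical deviations from r = 5 replications.
low, high, level = deviation_interval([0.04, 0.02, 0.05, 0.03, 0.06])
# low == 0.02, high == 0.06, level == 0.9375
```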
Scatter diagrams are for visual inspection. In each of them, the
value pairs of two different elements of the vectors are plotted
as points. The diagram gives insight into the dependence
structure of these two dimensions: There may be
regions with no points, where the corresponding value pairs
do not occur at all or only with small probability. In the other
regions, the density of points may vary, which indicates
different probabilities of occurrence.
The modeller can compare corresponding scatter diagrams of the
original sample on one hand, and of the generated vectors on the
other. If regions without points correspond, and if the visual
impression of the frequency is similar, this is a hint that the
copula model is accurate.
For time series, we also calculate correlations between two
vector elements in the same dimension, but at different times i_1
and i_2 with the lag |i_1 - i_2|. Again, these correlations are
calculated for the original sample and for the generated vectors,
and the absolute value of their difference is taken as a measure
of accuracy. In general, these differences grow with the lag, so
their maximum alone is not informative. Instead, we provide
diagrams of the differences for different lags.
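The lag comparison can be sketched for one dimension of a time
series as follows. This Python code is an illustration and not
taken from pwlCopula. It assumes the Pearson correlation of the
overlapping pairs (x_i, x_{i+lag}) as the estimator, and the
function names are hypothetical. A constant series would lead to
a division by zero and is not handled.

```python
from statistics import fmean


def lag_correlation(series, lag):
    """Pearson correlation between x_i and x_{i+lag}, lag >= 1."""
    x, y = series[:-lag], series[lag:]
    mx, my = fmean(x), fmean(y)
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    sx = fmean([(a - mx) ** 2 for a in x]) ** 0.5
    sy = fmean([(b - my) ** 2 for b in y]) ** 0.5
    return cov / (sx * sy)


def lag_differences(original, generated, max_lag):
    """Absolute correlation differences for the lags 1..max_lag."""
    return [abs(lag_correlation(original, k) - lag_correlation(generated, k))
            for k in range(1, max_lag + 1)]


# Toy series: an alternating series is perfectly negatively
# correlated with itself at lag 1.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
trend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
diffs = lag_differences(alternating, trend, 2)
```

Plotting the resulting list against the lag gives a diagram of
the kind described above.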
The Java classes only generate random vectors and time series.
They implement the same algorithms as the corresponding part of
the MATLAB program pwlCopula. They import a
copula and empirical distributions which were calculated with
pwlCopula beforehand and stored in .cop and .emp files.
They are not interactive: the parameters, including the file name
without extension, must be passed to the Java objects via method
calls. The Java generation is about 80 times faster than the
MATLAB generation.
The class VectorGenerator contains the methods for setting up
the program, buildCopula and buildEmpDistr, and for generating
vectors, gen_u_vector, gen_u_ar, and gen_z. The class
Zufalls_Zahlen provides univariate uniform random numbers. If the
copula is for random vectors, successive calls of gen_u_vector
and gen_z generate one vector with dimension D. If the copula is
for time series, successive calls of gen_u_ar and gen_z generate
one element of a time series, a vector with dimension D'. The
first generated elements of the time series are not stationary
and should be skipped.
pwlCopula and the Java programs VGen are copyright the original
author and the University of Bonn, and are published here under
the GNU General Public License (see
http://www.fsf.org/licenses/licenses.html).