@ecfairle/mccli

cli for CVD Policy Model Montecarlo simulation

Downloads in past

Stats

StarsIssuesVersionUpdatedCreatedSize
@ecfairle/mccli
1.10.46 years ago8 years agoMinified + gzip package size for @ecfairle/mccli in KB

Readme

Montecarlo CLI (Command Line Interface)

Usage

``` Usage: mc options Commands: init initialize Montecarlo files run-sims iterations run Montecarlo simulations Options: -h, --help Show help boolean ```

Getting Started

Installation

  1. Download Nodejs v6.5
  1. Download Python v3.5.2
  1. In the command line, install the montecarlo CLI by running npm install -g @ecfairle/mccli (this same command can be used to update to the latest version)

Initialization

Execute mc init (in command line within model directory) to initialize Montecarlo inputs:
  1. default number of iterations
  1. name of model executable
  1. dat files (from modfile) to be varied
  1. inp files to use
creating folder structure as follows: ``` MC └───inputs
input_data.json
``` where input_data.json contains the initial data for montecarlo simulation.

Modfile setup

  1. Copy files for simulation to corresponding montecarlo files using the naming convention {name}_mc0.dat (or {name}_mc0.inp) where name is the file name specified when chossing .dat/.inp files during mc init.
  1. Add mc.dat files to corresponding .lst files and change lines in mc0.inp file to choose the apporopriate line from the .lst file.
  1. Create files with the same format as original model files but with standard deviations instead of means. These files use a similar naming convention: {name}_sd.dat (not for .inp files)

.inp file setup.

Then create inp_distribution.txt in directory MC/inputs, which should break down the .inp file variation into sections by keyword (indicating the lines to vary), e.g.: ``` HIEFFECT,1
g=1,0.5477,0.02
MODEFFECT,1
g=1,0.4,0.02
HICOSTAHA,6
g=2, 0.0095, 0.0030, 0.0        #Myopathy
g=3, 1.17, 0.15, 0.0            #Liver panel
g=4, 7.30, 0.91, 0.0            #Doctor Visit
g=5, 1.50, 0.47, 0.0            #Stroke
g=6, 7.75, 3.00, 0.0            #Diabetes
g=7, 148.30, 37.04, 0.0         #Statin, high intensity 
MODCOSTAHA,6
g=2, 0.0095, 0.0030, 0.0        #Myopathy
g=3, 1.17, 0.15, 0.0            #Liver panel
g=4, 7.30, 0.91, 0.0            #Doctor Visit
g=5, 1.50, 0.47, 0.0            #Stroke
g=6, 7.75, 3.00, 0.0            #Diabetes
g=7, 48.67, 12.17, 0.0          #Statin, moderate intensity 
STATINQALY,5
0.000001, 0.0000005, 0.0     #Myopathy
0.0000312, 0.00001560, 0.0   #Stroke
0.0000747, 0.0000448, 0.0    #Diabetes
0.0001, 0.000248, 0.0        #Unforeseen
0.0, 0.0008, 0.0             #Pill disutility
``` The sections are further broken down by components, which each make up a part of their overall distribution. Here, sections include HIEFFECT (one component), HICOSTAHA (six components) etc.
Components
A section can consist of a single component but multiple components allows you to separate data in ways that aren't considered by the model itself.
Distributions
The program will sample from distribution dist_name (normal if omitted) with parameters param1,param2,... and sum the results from each line. The sum will replace the value on the lines in which keyword is found. Supported distributions are:
  • Normal - param1: mean, param2: standard deviation (with exception of using mean option)
  • LogNormal - param1: mean, param2: standard deviation
  • Beta - param1: alpha, param2: beta
  • Gamma - param1: shape, param2: scale
Correlated Components
To indicate that samples should be correlated, give them the same group name (can be between labels). If a component shouldn't be correlated with any other component, either exclude the group argument or give it a unique group
Upper and Lower Bounds
Lower and/or upper bounds can be included but will default to -inf, +inf respectively. To add upper bound w/o lower bound put nothing inside lowerbound commas e.g. param1,param2,,upper_bound
MEAN option
For normal distributions param1 can simply be 'MEAN', indicating the mean of the distribution should be determined by the line in the .inp file. Here, param2 must be a coefficient of variation. This option is used to simplify the case in which there are many lines with the same significance but different means (these will be assumed to be correlated and have the same coefficient of variation). ```
keyword,num_components 
[g=group_name,][dist_name,]param1,param2,...[,lower_bound][,upper_bound]  
[g=group_name,][dist_name,]param1,...   
... 
[g=group_name,]...
```

Running Simulations

Execute mc run to run the default number of simulations or mc run n to run n simulations. This creates a folder structure as follows: ``` MC ├───inputs │ input
data.json │ inpdistribution.txt │ ├───inputvariation │ │ inp.txt │ │ │ └───datfiles │ prfp0.dat │ prfp1.dat │ rsk0.dat │ rsk1.dat │ └───results
│   .run
├───breakdown
│       0712_0.frmt
│       0712_1.frmt
├───cumulative
│       0712_0.dat
│       0712_1.dat
└───summary
ageranges_1ST_MI.csv
ageranges_95PLUS_LYRS.csv
ageranges_CHD_DEATH.csv
ageranges_DISC_LYRS.csv
ageranges_DISC_NCVD$.csv
ageranges_DISC_QALY.csv
ageranges_DISC_TOT$.csv
ageranges_DIS_DEINTERV$.csv
ageranges_DIS_DHCHD$.csv
ageranges_DIS_DHINTERV$.csv
ageranges_DIS_DHSTR$.csv
ageranges_INC_CHD.csv
ageranges_INC_STROKE.csv
ageranges_NCVD_DEATH.csv
ageranges_PREV.csv
ageranges_STROKE_DEATH.csv
ageranges_TOT_DEATH.csv
ageranges_TOT_MI.csv
ageranges_TOT_STROKE.csv
``` Monte Carlo runs produce two output directories: results and input_variation.

Results

Directory results contains model outputs, including:
  1. cumulative results (copies of outfile.dat). Naming convention: {name}_{simulation #}.dat
  1. breakdown results (rearranged data from .out file). Naming convention: {name}_{simulation #}.frmt
  1. summary results (comma separated value files split up by outcome and organized by age-range and gender).

Input Variation

Directory input_variation contains varied model inputs. These can be used to verify that inputs follow the desired distributions. In particular:
  1. File inp.txt shows the ultimate value used to replace corresponding values in the .inp file (regardless if it's actually used). In addition, at the top it includes counts of the number of places in each .inp file the label is found.
  1. Directory dat_files contains copies of the modified dat files (from modfile) for each run. Naming convention: {name}_{simulation #}.dat