Specifying input data
This document explains how to create input data for Maud.
Overview
Maud inputs are structured directories, somewhat inspired by the PEtab format.
A Maud input directory must contain a toml file called config.toml, which gives
the input a name, configures how Maud will be run and tells Maud where to find
the information it needs. It must also include a file containing a kinetic
model definition, a file specifying independent prior distributions, a file
with information about the experimental setup and a file recording the results
of measurements. Finally, the input folder can optionally include extra files
specifying non-independent priors and a file specifying initial parameter
values for the MCMC sampler.
For some working examples of full inputs see here.
The configuration file
The file config.toml must contain these top-level fields:
name
    String naming the input
kinetic_model_file
    Path to a toml file defining a kinetic model
priors_file
    Path to a csv file specifying independent priors
experimental_setup_file
    Path to a toml file specifying the experimental setup
measurements_file
    Path to a csv file specifying measurements
likelihood
    Boolean representing whether to use information from measurements
The following optional fields can also be specified:
reject_non_steady
    Boolean saying whether to reject draws that enter non-steady states
ode_config
    Table of configuration options for Stan’s ode solver
cmdstanpy_config
    Table of keyword arguments to the cmdstanpy method sample
cmdstanpy_config_predict
    Table of overriding sample keyword arguments for predictions
stanc_options
    Table of valid choices for CmdStanModel argument stanc_options
cpp_options
    Table of valid choices for CmdStanModel argument cpp_options
variational_options
    Arguments for CmdStanModel.variational
user_inits_file
    Path to a csv file of initial values
dgf_mean_file
    Path to a csv file of formation energy means
dgf_covariance_file
    Path to a csv file of formation energy covariances
steady_state_threshold_abs
    Absolute threshold within which Sv = 0 is taken to hold, so that the system counts as being at steady state
steady_state_threshold_rel
    Relative threshold within which Sv = 0 is taken to hold, so that the system counts as being at steady state
drain_small_conc_corrector
    Number used to correct drains at small concentrations
Here is an example configuration file:
name = "linear"
kinetic_model_file = "kinetic_model.toml"
priors_file = "priors.csv"
measurements_file = "measurements.csv"
experimental_setup_file = "experimental_setup.toml"
likelihood = true
steady_state_threshold_abs = 1e-6
[cmdstanpy_config]
refresh = 1
iter_warmup = 200
iter_sampling = 200
chains = 4
save_warmup = true
[ode_config]
abs_tol = 1e-4
rel_tol = 1e-4
max_num_steps = 1e6
timepoint = 1e3
This file tells Maud that a file representing a kinetic model can be found at
the relative path kinetic_model.toml, and that priors, experimental setup
information and measurements can be found at priors.csv,
experimental_setup.toml and measurements.csv respectively.
The line likelihood = true tells Maud to take into account the measurements in
measurements.csv: in other words, not to run in priors-only mode.
When Maud samples with this input, it will create 4 MCMC chains, each with 200 warmup and 200 sampling iterations, all of which will be saved in the output csv files. The ODE solver will find steady states by simulating for 1000 seconds, subject to a limit of 1e6 steps and absolute and relative tolerances of 1e-4.
The kinetic model file
A Maud input should use exactly one kinetic model file, which is written in the
toml markup language and pointed to by the kinetic_model_file field of the
input’s config.toml file. This section explains how to write this kind of file.
If anything here is unclear, check the code that tells Maud what a kinetic model should look like.
name
This top level field is a string describing the kinetic model.
compartment
A table with the following obligatory fields:
id
    A string identifying the compartment. It must not contain underscore characters.
name
    A string describing the compartment
volume
    A float specifying the compartment’s volume
Here is an example compartment table:
compartment = [
{id = 'c', name = 'cytosol', volume = 1},
{id = 'e', name = 'external', volume = 1},
]
metabolite
A table with the following obligatory fields:
id
    A string identifying the metabolite. It must not contain underscore characters.
name
    A string describing the metabolite
Here is an example metabolite table:
metabolite = [
{id = "M1", name = "Metabolite number 1"},
{id = "M2", name = "Metabolite number 2"},
]
metabolite_in_compartment
A table that specifies which metabolites exist in which compartments, and whether they should be considered balanced or not. The fields in this table are as follows:
metabolite_id
    The id of an entry in the metabolite table
compartment_id
    The id of an entry in the compartment table
balanced
    A boolean indicating whether the metabolite is balanced in this compartment
For a metabolite_in_compartment to be balanced means that its concentration
does not change when the system is in a steady state. Often metabolites in the
external compartment will be unbalanced.
Here is an example metabolite_in_compartment table:
metabolite_in_compartment = [
{metabolite_id = "M1", compartment_id = "e", balanced = false},
{metabolite_id = "M1", compartment_id = "c", balanced = true},
{metabolite_id = "M2", compartment_id = "c", balanced = true},
{metabolite_id = "M2", compartment_id = "e", balanced = false},
]
enzyme
A table with the following obligatory fields:
id
    A string identifying the enzyme. It must not contain underscore characters.
name
    A string describing the enzyme
subunits
    An integer specifying how many subunits the enzyme has.
Here is an example enzyme table:
enzyme = [
{id = "r1", name = "r1ase", subunits = 1},
{id = "r2", name = "r2ase", subunits = 1},
{id = "r3", name = "r3ase", subunits = 1},
]
reaction
A table with the following obligatory fields:
id
    A string identifying the reaction. It must not contain underscore characters.
name
    A string describing the reaction
mechanism
    A string specifying the reaction’s mechanism
stoichiometry
    A mapping representing the stoichiometric coefficient for each metabolite_in_compartment that the reaction creates or destroys.
In addition the following optional fields can be specified:
water_stoichiometry
    A float indicating the reaction’s water stoichiometry
transported_charge
    A float indicating the reaction’s transported charge
Valid options for the mechanism field are:
reversible_michaelis_menten
irreversible_michaelis_menten
drain
Each key in the stoichiometry should identify an existing
metabolite_in_compartment using a metabolite id and a compartment id,
separated by an underscore.
Here is an example of an entry in a reaction table:
[[reaction]]
id = "r1"
name = "Reaction number 1"
mechanism = "reversible_michaelis_menten"
stoichiometry = { M1_e = -1, M1_c = 1}
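The underscore rule for ids is what makes stoichiometry keys unambiguous: a key such as M1_e splits into exactly one metabolite id and one compartment id. The sketch below illustrates that check on data that has already been parsed into Python dictionaries. It is an illustration under stated assumptions rather than Maud's own validation code, and the function names split_mic_id and unknown_stoichiometry_keys are hypothetical.

```python
def split_mic_id(mic_id: str) -> tuple[str, str]:
    """Split a key such as "M1_e" into (metabolite id, compartment id).

    Because ids may not contain underscores, a valid key contains exactly one.
    """
    parts = mic_id.split("_")
    if len(parts) != 2:
        raise ValueError(f"expected exactly one underscore in {mic_id!r}")
    return parts[0], parts[1]


def unknown_stoichiometry_keys(stoichiometry: dict, mics: list[dict]) -> list[str]:
    """Return stoichiometry keys that match no metabolite_in_compartment entry."""
    known = {(mic["metabolite_id"], mic["compartment_id"]) for mic in mics}
    return [key for key in stoichiometry if split_mic_id(key) not in known]


# Entries copied from the example metabolite_in_compartment table.
mics = [
    {"metabolite_id": "M1", "compartment_id": "e", "balanced": False},
    {"metabolite_id": "M1", "compartment_id": "c", "balanced": True},
    {"metabolite_id": "M2", "compartment_id": "c", "balanced": True},
    {"metabolite_id": "M2", "compartment_id": "e", "balanced": False},
]

# Stoichiometry of the example reaction r1.
r1_stoichiometry = {"M1_e": -1, "M1_c": 1}

print(unknown_stoichiometry_keys(r1_stoichiometry, mics))  # []
```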
enzyme_reaction
A table indicating which enzymes catalyse which reactions, with the following fields:
enzyme_id
    The id of an entry in the enzyme table
reaction_id
    The id of an entry in the reaction table
Here is an example enzyme_reaction table:
enzyme_reaction = [
{enzyme_id = "r1", reaction_id = "r1"},
{enzyme_id = "r2", reaction_id = "r2"},
{enzyme_id = "r3", reaction_id = "r3"},
]
allostery
An optional table with the following fields:
enzyme_id
    The id of an entry in the enzyme table
metabolite_id
    The id of an entry in the metabolite table
compartment_id
    The id of an entry in the compartment table
modification_type
    A string specifying the kind of modification
Valid options for the modification_type field are:
activation
inhibition
Here is an example of an entry in an allostery table:
[[allostery]]
enzyme_id = "r1"
metabolite_id = "M2"
compartment_id = "c"
modification_type = "activation"
competitive_inhibition
An optional table with the following fields:
enzyme_id
    The id of an entry in the enzyme table
reaction_id
    The id of an entry in the reaction table
metabolite_id
    The id of an entry in the metabolite table
compartment_id
    The id of an entry in the compartment table
Here is an example of an entry in a competitive_inhibition table:
[[competitive_inhibition]]
enzyme_id = "r2"
reaction_id = "r2"
metabolite_id = "M1"
compartment_id = "c"
phosphorylation
An optional table with the following fields:
enzyme_id
    The id of an entry in the enzyme table
modification_type
    A string specifying the kind of modification
Valid options for the modification_type field are:
activation
inhibition
Here is an example of an entry in a phosphorylation table:
[[phosphorylation]]
enzyme_id = "r1"
modification_type = "activation"
The experimental setup file
This is a file written in toml, giving qualitative information about the input’s experimental setup.
This section describes this file’s fields.
experiment
An obligatory table containing information that is specific to each of the input’s experiments, with the following fields:
id
    A string identifying the experiment, without any underscores
is_train
    A boolean indicating whether to include the experiment in the training dataset
is_test
    A boolean indicating whether to include the experiment in the test dataset
temperature
    A float specifying the experiment’s temperature.
enzyme_knockout
An optional table specifying knockouts of enzymes, with the following fields:
experiment_id
    Id of the knockout’s experiment
enzyme_id
    Id of the enzyme that was knocked out
phosphorylation_knockout
An optional table specifying knockouts of phosphorylation effects, with the following fields:
experiment_id
    Id of the knockout’s experiment
enzyme_id
    Id of the enzyme whose phosphorylation was knocked out
The measurements file
This is a csv file with the following fields:
measurement_type
    A string specifying what kind of thing was measured
target_id
    A string identifying the thing that was measured
experiment
    A string specifying the measurement’s experiment
measurement
    The measured value, as a float
error_scale
    The measurement error, as a float
Valid options for the measurement_type field are:
mic
    Concentration of a metabolite_in_compartment
enzyme
    Concentration of an enzyme
flux
    Flux of a reaction
error_scale is the standard deviation of a normal distribution for flux measurements or the scale parameter of a lognormal distribution for concentration measurements.
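The two roles of error_scale can be made concrete with a short sketch of the corresponding log densities. This is an illustration, not Maud's Stan code. In particular, it assumes that the lognormal distribution for a concentration is located at the log of the modelled value, which the text above does not state, and all function names are hypothetical.

```python
import math


def normal_logpdf(x: float, mean: float, sd: float) -> float:
    """Log density of a normal distribution with the given mean and standard deviation."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def lognormal_logpdf(x: float, log_location: float, scale: float) -> float:
    """Log density of a lognormal distribution parameterised on the log scale."""
    return normal_logpdf(math.log(x), log_location, scale) - math.log(x)


def measurement_log_likelihood(
    measurement_type: str,
    measurement: float,
    error_scale: float,
    modelled_value: float,
) -> float:
    """Log likelihood of one row of the measurements file given a modelled value."""
    if measurement_type == "flux":
        # error_scale is the standard deviation of a normal distribution.
        return normal_logpdf(measurement, modelled_value, error_scale)
    if measurement_type in ("mic", "enzyme"):
        # error_scale is the scale parameter of a lognormal distribution.
        # Assumption: the distribution is located at log(modelled_value).
        return lognormal_logpdf(measurement, math.log(modelled_value), error_scale)
    raise ValueError(f"unknown measurement_type: {measurement_type!r}")
```

One consequence of this parameterisation is that an error_scale of 0.1 on a concentration expresses roughly 10% relative error regardless of the concentration's magnitude, whereas the same value on a flux is an absolute error.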
The priors file
This is a csv file representing pre-experimental information that can be represented by independent probability distributions.
The priors table has the following fields:
parameter
    String identifying a parameter
metabolite
    String identifier of the relevant metabolite
compartment
    String identifier of the relevant compartment
enzyme
    String identifier of the relevant enzyme
reaction
    String identifier of the relevant reaction
experiment
    String identifier of the relevant experiment
modification_type
    String identifier of the relevant modification type
location
    Float specifying a location
scale
    Float specifying a scale
pct1
    First percentile of the prior distribution
pct99
    99th percentile of the prior distribution
See the id_components fields in the corresponding code file for which columns
need to be specified for each kind of prior.
Prior distributions can either be specified by a location and scale or by a 1st and 99th percentile, but not both.
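The two ways of specifying a prior carry the same information once a distribution family is fixed. The sketch below shows the conversion for a normal distribution and, on the log scale, for a lognormal one. Which family Maud uses for each parameter, and how it performs this conversion internally, are not stated here, so both are assumptions of the example. The function names are hypothetical.

```python
import math
from statistics import NormalDist

# 99th percentile of the standard normal distribution, about 2.326.
Z_99 = NormalDist().inv_cdf(0.99)


def normal_from_percentiles(pct1: float, pct99: float) -> tuple[float, float]:
    """Location and scale of the normal distribution with these 1st and 99th percentiles."""
    location = (pct1 + pct99) / 2
    scale = (pct99 - pct1) / (2 * Z_99)
    return location, scale


def lognormal_from_percentiles(pct1: float, pct99: float) -> tuple[float, float]:
    """The same calculation on the log scale, for a strictly positive parameter."""
    return normal_from_percentiles(math.log(pct1), math.log(pct99))


# A positive parameter believed to lie between 0.1 and 10 with 98% probability.
location, scale = lognormal_from_percentiles(0.1, 10.0)
print(location, scale)
```

Because the normal distribution is symmetric, the location is the midpoint of the two percentiles and the scale is their distance divided by twice the 99th-percentile z-score.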
Multivariate priors for formation energy parameters
The use of a single csv file for priors was motivated by the fact that, for most model parameters, it is safe to model the pre-experimental information as independent. For example, knowing the value of one enzyme’s \(kcat\) parameter does not significantly narrow down another enzyme’s \(kcat\) parameter. Thus in this case, and most others, specifying each parameter’s marginal prior distribution is practically equivalent to specifying the full joint distribution.
However, the available information about formation energy parameters is typically not independent. In this case the available information is mostly derived from measurements of the equilibrium constants of chemical reactions. Knowing the formation energy of one metabolite is often highly informative as to the formation energy of another metabolite which is produced or destroyed by the same measured chemical reaction. Metabolites with common chemical groups are also likely to have similar formation energies, introducing further non-independence.
In some cases this dependence is not practically important, and Maud will work well enough with independent priors in a csv file as above. For other cases, Maud allows non-independent prior information to be specified in the form of the mean vector and covariance matrix of a multivariate normal distribution. This information is specified as follows.
First, to indicate where to find the required vector and matrix, the fields
dgf_mean_file and dgf_covariance_file should be added to the top level of the
file config.toml in the input folder. For example:
name = "methionine_cycle"
kinetic_model = "methionine_cycle.toml"
priors = "priors.csv"
experiments = "experiments.csv"
dgf_mean_file = "dgf_prior_mean.csv"
dgf_covariance_file = "dgf_prior_covariance.csv"
These fields should be paths from the root of the input folder to csv files.
The dgf_mean_file should have columns called metabolite and prior_mean_dgf,
with the former consisting of ids that agree with the rest of the input folder
(in particular the kinetic model file) and the latter of non-null real
numbers. For example:
metabolite | prior_mean_dgf
5mthf | -778.2999561
adn | -190.9913035
ahcys | -330.3885785
amet | -347.1029509
atp | -2811.578332
cyst-L | -656.8334114
…
The dgf_covariance_file should be a valid covariance matrix surrounded by
metabolite ids. The first column should be called metabolite and populated
with ids that are consistent with the other inputs. Subsequent columns should
have names that match the first column. Here is (the start of) an example:
metabolite | 5mthf | adn | ahcys | amet | atp | cyst-L
5mthf | 457895.226 | 0.023993053 | 2.911539829 | 38.09225442 | 0.023892737 | 0.610913519
adn | 0.023993053 | 2.081489779 | 1.034504533 | 1.00E-10 | 0.444288943 | 0
ahcys | 2.911539829 | 1.034504533 | 16.2459485 | 4.297104388 | 0.341195482 | 13.08072127
amet | 38.09225442 | 1.00E-10 | 4.297104388 | 1000025.576 | -1.00E-10 | 2.066261457
atp | 0.023892737 | 0.444288943 | 0.341195482 | -1.00E-10 | 2.22005692 | 0
cyst-L | 0.610913519 | 0 | 13.08072127 | 2.066261457 | 0 | 16.61784088
…
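Before handing these two files to Maud it can be worth checking that they describe a usable multivariate normal distribution. The sketch below is one way to do that with only the standard library. It uses a two-metabolite excerpt (adn and atp) of the example values, assumes that column ids appear in the same order as row ids, and requires strict positive definiteness, which is a stronger condition than the text states. The function names are hypothetical.

```python
import csv
import io
import math

# Two-metabolite excerpt of the example values; real files cover every metabolite.
MEAN_CSV = """metabolite,prior_mean_dgf
adn,-190.9913035
atp,-2811.578332
"""
COV_CSV = """metabolite,adn,atp
adn,2.081489779,0.444288943
atp,0.444288943,2.22005692
"""


def read_mean(text: str) -> dict[str, float]:
    """Map each metabolite id to its prior mean formation energy."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["metabolite"]: float(row["prior_mean_dgf"]) for row in reader}


def read_covariance(text: str) -> tuple[list[str], list[list[float]]]:
    """Return (ids, matrix) after checking that row and column ids agree."""
    rows = list(csv.reader(io.StringIO(text)))
    column_ids = rows[0][1:]
    row_ids = [row[0] for row in rows[1:]]
    if row_ids != column_ids:
        raise ValueError("row ids and column ids must match, in the same order")
    return row_ids, [[float(x) for x in row[1:]] for row in rows[1:]]


def is_valid_covariance(matrix: list[list[float]], tol: float = 1e-9) -> bool:
    """Check squareness, symmetry and positive definiteness (via Cholesky)."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        return False
    if any(abs(matrix[i][j] - matrix[j][i]) > tol for i in range(n) for j in range(i)):
        return False
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    return False  # factorisation fails: not positive definite
                lower[i][j] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return True


ids, covariance = read_covariance(COV_CSV)
means = read_mean(MEAN_CSV)
print(set(means) == set(ids), is_valid_covariance(covariance))  # True True
```

For matrices of realistic size a numerical library would be the more practical choice; the pure-Python factorisation is used here only to keep the example self-contained.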
The initial parameter values file
Initial parameter values can be entered in a json file. This file should be a
valid option for the inits argument of the cmdstanpy sample method.