Specifying input data

This document explains how to create input data for Maud.

Overview

Maud inputs are structured directories, somewhat inspired by the PEtab format. A Maud input directory must contain a toml file called config.toml which gives the input a name, configures how Maud will be run and tells Maud where to find the information it needs. It must also include a file containing a kinetic model definition, a file specifying independent prior distributions, a file with information about the experimental setup and a file recording the results of measurements. Finally, the input directory can optionally include extra files specifying non-independent priors and a file specifying initial parameter values for the MCMC sampler.

For some working examples of full inputs see here.

The configuration file

The file config.toml must contain these top-level fields:

  • name String naming the input

  • kinetic_model_file Path to a toml file defining a kinetic model

  • priors_file Path to a csv file specifying independent priors

  • experimental_setup_file Path to a toml file specifying the experimental setup

  • measurements_file Path to a csv file specifying measurements

  • likelihood Boolean representing whether to use information from measurements

The following optional fields can also be specified:

  • reject_non_steady Boolean saying whether to reject draws that enter non-steady states

  • ode_config Table of configuration options for Stan’s ode solver

  • cmdstanpy_config Table of keyword arguments to the cmdstanpy method sample

  • cmdstanpy_config_predict Table of overriding sample keyword arguments for predictions

  • stanc_options Table of valid choices for CmdStanModel argument stanc_options

  • cpp_options Table of valid choices for CmdStanModel argument cpp_options

  • variational_options Arguments for CmdStanModel.variational

  • user_inits_file Path to a csv file of initial values

  • dgf_mean_file Path to a csv file of formation energy means

  • dgf_covariance_file Path to a csv file of formation energy covariances

  • steady_state_threshold_abs Absolute threshold for deciding whether Sv=0, i.e. whether the system is at steady state

  • steady_state_threshold_rel Relative threshold for deciding whether Sv=0, i.e. whether the system is at steady state

  • drain_small_conc_corrector Number used to correct drains at small concentrations

Here is an example configuration file:

name = "linear"
kinetic_model_file = "kinetic_model.toml"
priors_file = "priors.csv"
measurements_file = "measurements.csv"
experimental_setup_file = "experimental_setup.toml"
likelihood = true
steady_state_threshold_abs = 1e-6

[cmdstanpy_config]
refresh = 1
iter_warmup = 200
iter_sampling = 200
chains = 4
save_warmup = true

[ode_config]
abs_tol = 1e-4
rel_tol = 1e-4
max_num_steps = 1e6
timepoint = 1e3

This file tells Maud that a file representing a kinetic model can be found at the relative path kinetic_model.toml, and that priors, experimental setup information and measurements can be found at priors.csv, experimental_setup.toml and measurements.csv respectively.

The line likelihood = true tells Maud to take into account the measurements in measurements.csv: in other words, not to run in priors-only mode.

When Maud samples with this input, it will create 4 MCMC chains, each with 200 warmup and 200 sampling iterations, all of which will be saved in the output csv files. The ODE solver will find steady states by simulating for 1000 seconds, with a limit of 1e6 steps and absolute and relative tolerances of 1e-4.
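
As a rough illustration of the required top-level fields listed above, the following Python sketch reports which of them are missing from a parsed configuration. The function name missing_required_fields is hypothetical and not part of Maud, and Maud's own validation may work differently. The dictionary stands in for the result of parsing config.toml, for example with the standard-library tomllib module in Python 3.11 or later.

```python
# Required top-level fields of config.toml, as listed in this document.
REQUIRED_FIELDS = (
    "name",
    "kinetic_model_file",
    "priors_file",
    "experimental_setup_file",
    "measurements_file",
    "likelihood",
)


def missing_required_fields(config):
    """Return the required top-level fields that are absent from config.

    Hypothetical helper for illustration; Maud's own validation may differ.
    """
    return [field for field in REQUIRED_FIELDS if field not in config]


# Dictionary equivalent of the example configuration file above.
example_config = {
    "name": "linear",
    "kinetic_model_file": "kinetic_model.toml",
    "priors_file": "priors.csv",
    "measurements_file": "measurements.csv",
    "experimental_setup_file": "experimental_setup.toml",
    "likelihood": True,
    "steady_state_threshold_abs": 1e-6,
    "cmdstanpy_config": {"iter_warmup": 200, "iter_sampling": 200, "chains": 4},
}

print(missing_required_fields(example_config))
```

An empty list means that every required field is present; optional fields such as ode_config are simply ignored by this check.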

The kinetic model file

A Maud input should use exactly one kinetic model file, which is written in the toml markup language and pointed to by the kinetic_model_file field of the input’s config.toml file. This section explains how to write this kind of file.

If it doesn’t make sense, make sure to check the code that tells Maud what a kinetic model should look like.

name

This top level field is a string describing the kinetic model.

compartment

A table with the following obligatory fields:

  • id A string identifying the compartment without any underscore characters.

  • name A string describing the compartment

  • volume A float specifying the compartment’s volume

Here is an example compartment table:

compartment = [
  {id = 'c', name = 'cytosol', volume = 1},
  {id = 'e', name = 'external', volume = 1},
]

metabolite

A table with the following obligatory fields:

  • id A string identifying the metabolite without any underscore characters.

  • name A string describing the metabolite

Here is an example metabolite table:

metabolite = [
  {id = "M1", name = "Metabolite number 1"},
  {id = "M2", name = "Metabolite number 2"},
]

metabolite_in_compartment

A table that specifies which metabolites exist in which compartments, and whether they should be considered balanced or not. The fields in this table are as follows:

  • metabolite_id The id of an entry in the metabolite table

  • compartment_id The id of an entry in the compartment table

  • balanced A boolean

For a metabolite_in_compartment to be balanced means that its concentration does not change when the system is in a steady state. Often metabolites in the external compartment will be unbalanced.

Here is an example metabolite_in_compartment table:

metabolite_in_compartment = [
  {metabolite_id = "M1", compartment_id = "e", balanced = false},
  {metabolite_id = "M1", compartment_id = "c", balanced = true},
  {metabolite_id = "M2", compartment_id = "c", balanced = true},
  {metabolite_id = "M2", compartment_id = "e", balanced = false},
]
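
To make the idea of a balanced metabolite_in_compartment concrete, here is a minimal Python sketch that checks whether the net rate of change Sv of every balanced species is within an absolute threshold of zero. The function name, the example reactions and the flux values are assumptions made for illustration. How Maud itself applies steady_state_threshold_abs and steady_state_threshold_rel is not specified in this document.

```python
def is_steady(stoichiometry, fluxes, balanced, threshold_abs=1e-6):
    """Check that every balanced species has |sum_j S_ij * v_j| <= threshold_abs.

    stoichiometry maps reaction id -> {species id: coefficient}.
    fluxes maps reaction id -> flux value.
    balanced is the collection of species ids treated as balanced.
    """
    net_rate = {species: 0.0 for species in balanced}
    for reaction_id, coefficients in stoichiometry.items():
        for species, coefficient in coefficients.items():
            if species in net_rate:  # unbalanced species are ignored
                net_rate[species] += coefficient * fluxes[reaction_id]
    return all(abs(rate) <= threshold_abs for rate in net_rate.values())


# Hypothetical linear pathway: M1_e -> M1_c -> M2_c -> M2_e
stoichiometry = {
    "r1": {"M1_e": -1, "M1_c": 1},
    "r2": {"M1_c": -1, "M2_c": 1},
    "r3": {"M2_c": -1, "M2_e": 1},
}
balanced = ["M1_c", "M2_c"]

# Equal fluxes through the pathway leave the balanced species unchanged.
print(is_steady(stoichiometry, {"r1": 0.5, "r2": 0.5, "r3": 0.5}, balanced))
# Here M1_c accumulates at a rate of 0.1, so the state is not steady.
print(is_steady(stoichiometry, {"r1": 0.5, "r2": 0.4, "r3": 0.4}, balanced))
```

The unbalanced species M1_e and M2_e are free to change, which is why they are left out of the check.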

enzyme

A table with the following obligatory fields:

  • id A string identifying the enzyme without any underscore characters.

  • name A string describing the enzyme

  • subunits An integer specifying how many subunits the enzyme has.

Here is an example enzyme table:

enzyme = [
  {id = "r1", name = "r1ase", subunits = 1},
  {id = "r2", name = "r2ase", subunits = 1},
  {id = "r3", name = "r3ase", subunits = 1},
]

reaction

A table with the following obligatory fields:

  • id A string identifying the reaction without any underscore characters.

  • name A string describing the reaction

  • mechanism A string specifying the reaction’s mechanism

  • stoichiometry A mapping representing the stoichiometric coefficient for each metabolite_in_compartment that the reaction creates or destroys.

In addition the following optional fields can be specified:

  • water_stoichiometry A float indicating the reaction’s water stoichiometry

  • transported_charge A float indicating the reaction’s transported charge

Valid options for the mechanism field are:

  • reversible_michaelis_menten

  • irreversible_michaelis_menten

  • drain

Each key in the stoichiometry should identify an existing metabolite_in_compartment using a metabolite id and a compartment id, separated by an underscore.

Here is an example of an entry in a reaction table:

[[reaction]]
id = "r1"
name = "Reaction number 1"
mechanism = "reversible_michaelis_menten"
stoichiometry = { M1_e = -1, M1_c = 1}
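
The key convention for stoichiometry can be sketched in Python as follows. It also shows why ids must not contain underscores: the key is only unambiguous if it contains exactly one. Both function names are hypothetical and are not part of Maud.

```python
def split_stoichiometry_key(key):
    """Split a key such as "M1_e" into (metabolite_id, compartment_id).

    Relies on ids containing no underscores, so the key has exactly one.
    """
    parts = key.split("_")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected '<metabolite id>_<compartment id>', got {key!r}")
    return parts[0], parts[1]


def unknown_stoichiometry_keys(stoichiometry, metabolites_in_compartments):
    """Return keys that do not match any metabolite_in_compartment entry."""
    known = {
        (mic["metabolite_id"], mic["compartment_id"])
        for mic in metabolites_in_compartments
    }
    return [k for k in stoichiometry if split_stoichiometry_key(k) not in known]


mics = [
    {"metabolite_id": "M1", "compartment_id": "e", "balanced": False},
    {"metabolite_id": "M1", "compartment_id": "c", "balanced": True},
]
print(split_stoichiometry_key("M1_e"))
print(unknown_stoichiometry_keys({"M1_e": -1, "M1_c": 1}, mics))
```

A key such as "M_1_e" raises a ValueError in this sketch, because it is impossible to tell where the metabolite id ends and the compartment id begins.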

enzyme_reaction

A table indicating which enzymes catalyse which reactions, with the following fields:

  • enzyme_id The id of an entry in the enzyme table

  • reaction_id The id of an entry in the reaction table

Here is an example enzyme_reaction table:

enzyme_reaction = [
  {enzyme_id = "r1", reaction_id = "r1"},
  {enzyme_id = "r2", reaction_id = "r2"},
  {enzyme_id = "r3", reaction_id = "r3"},
]

allostery

An optional table with the following fields:

  • enzyme_id The id of an entry in the enzyme table

  • metabolite_id The id of an entry in the metabolite table

  • compartment_id The id of an entry in the compartment table

  • modification_type A string specifying the kind of modification

Valid options for the modification_type field are:

  • activation

  • inhibition

Here is an example of an entry in an allostery table:

[[allostery]]
enzyme_id = "r1"
metabolite_id = "M2"
compartment_id = "c"
modification_type = "activation"

competitive_inhibition

An optional table with the following fields:

  • enzyme_id The id of an entry in the enzyme table

  • reaction_id The id of an entry in the reaction table

  • metabolite_id The id of an entry in the metabolite table

  • compartment_id The id of an entry in the compartment table

Here is an example of an entry in a competitive_inhibition table:

[[competitive_inhibition]]
enzyme_id = "r2"
reaction_id = "r2"
metabolite_id = "M1"
compartment_id = "c"

phosphorylation

An optional table with the following fields:

  • enzyme_id The id of an entry in the enzyme table

  • modification_type A string specifying the kind of modification

Valid options for the modification_type field are:

  • activation

  • inhibition

Here is an example of an entry in a phosphorylation table:

[[phosphorylation]]
enzyme_id = "r1"
modification_type = "activation"

The experimental setup file

This is a file written in toml, giving qualitative information about the input’s experimental setup.

This section describes this file’s fields.

experiment

An obligatory table containing information that is specific to each of the input’s experiments, with the following fields:

  • id A string identifying the experiment, without any underscores

  • is_train A boolean indicating whether to include the experiment in the training dataset

  • is_test A boolean indicating whether to include the experiment in the test dataset

  • temperature A float specifying the experiment’s temperature.

enzyme_knockout

An optional table specifying knockouts of enzymes, with the following fields:

  • experiment_id Id of the knockout’s experiment

  • enzyme_id Id of the enzyme that was knocked out

phosphorylation_knockout

An optional table specifying knockouts of phosphorylation effects, with the following fields:

  • experiment_id Id of the knockout’s experiment

  • enzyme_id Id of the enzyme whose phosphorylation was knocked out

The measurements file

This is a csv file with the following fields:

  • measurement_type A string specifying what kind of thing was measured

  • target_id A string identifying the thing that was measured

  • experiment A string specifying the measurement’s experiment

  • measurement The measured value, as a float

  • error_scale The measurement error, as a float

Valid options for the measurement_type field are:

  • mic Concentration of a metabolite_in_compartment

  • enzyme Concentration of an enzyme

  • flux Flux of a reaction

error_scale is the standard deviation of a normal distribution for flux measurements or the scale parameter of a lognormal distribution for concentration measurements.
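
The two error models can be written out as log densities. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes that the lognormal distribution for a concentration measurement is centred on the logarithm of the value predicted by the model, which this document does not state explicitly.

```python
import math


def measurement_log_likelihood(measurement_type, measurement, predicted, error_scale):
    """Log density of one measurement given a model prediction.

    Assumption: the lognormal for concentrations is centred on log(predicted).
    """
    if measurement_type == "flux":
        # Normal distribution with standard deviation error_scale.
        z = (measurement - predicted) / error_scale
        return -math.log(error_scale) - 0.5 * math.log(2 * math.pi) - 0.5 * z**2
    if measurement_type in ("mic", "enzyme"):
        # Lognormal distribution with scale parameter error_scale.
        z = (math.log(measurement) - math.log(predicted)) / error_scale
        return (
            -math.log(measurement)
            - math.log(error_scale)
            - 0.5 * math.log(2 * math.pi)
            - 0.5 * z**2
        )
    raise ValueError(f"unknown measurement_type: {measurement_type!r}")


print(measurement_log_likelihood("flux", 1.2, 1.0, 0.1))
print(measurement_log_likelihood("mic", 1.2, 1.0, 0.1))
```

Under the lognormal model the error is multiplicative, so an error_scale of 0.1 for a concentration corresponds roughly to a 10% relative error rather than an absolute error of 0.1.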

The priors file

This is a csv file representing pre-experimental information that can be represented by independent probability distributions.

The priors table has the following fields:

  • parameter String identifying a parameter

  • metabolite String identifier

  • compartment String identifier

  • enzyme String identifier

  • reaction String identifier

  • experiment String identifier

  • modification_type String identifier

  • location Float specifying a location

  • scale Float specifying a scale

  • pct1 First percentile of the prior distribution

  • pct99 99th percentile of the prior distribution

See the id_components fields in the corresponding code file for which columns need to be specified for each kind of prior.

Prior distributions can either be specified by a location and scale or by a 1st and 99th percentile, but not both.
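
The two ways of specifying a prior carry the same information for a two-parameter distribution. As a minimal sketch, assuming a normal distribution (or a lognormal one handled on the log scale), the following Python functions recover a location and scale from a 1st and 99th percentile. The function names are hypothetical, and this document does not say which distribution Maud uses for each parameter or how it performs the conversion internally.

```python
import math
from statistics import NormalDist

# 99th percentile of the standard normal distribution, about 2.326.
Z99 = NormalDist().inv_cdf(0.99)


def normal_from_percentiles(pct1, pct99):
    """Location and scale of the normal distribution with these percentiles."""
    location = (pct1 + pct99) / 2
    scale = (pct99 - pct1) / (2 * Z99)
    return location, scale


def lognormal_from_percentiles(pct1, pct99):
    """Same calculation on the log scale, for a positive-valued parameter.

    The returned location and scale are those of the underlying normal
    distribution of the logarithm (an assumption about the parameterisation).
    """
    return normal_from_percentiles(math.log(pct1), math.log(pct99))


print(normal_from_percentiles(-10.0, 10.0))
print(lognormal_from_percentiles(0.01, 100.0))
```

Percentiles are often easier to elicit than a scale parameter: stating that a parameter is almost certainly between 0.01 and 100 is more natural than choosing a log-scale standard deviation directly.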

Multivariate priors for formation energy parameters

The use of a single csv file for priors was motivated by the fact that, for most model parameters, it is safe to model the pre-experimental information as independent. For example, knowing the value of one enzyme’s \(kcat\) parameter does not significantly narrow down another enzyme’s \(kcat\) parameter. Thus in this case, and most others, specifying each parameter’s marginal prior distribution is practically equivalent to specifying the full joint distribution.

However, the available information about formation energy parameters is typically not independent. In this case the available information is mostly derived from measurements of the equilibrium constants of chemical reactions. Knowing the formation energy of one metabolite is often highly informative as to the formation energy of another metabolite which is produced or destroyed by the same measured chemical reaction. Metabolites with common chemical groups are also likely to have similar formation energies, introducing further non-independence.

In some cases this dependence is not practically important, and Maud will work well enough with independent priors in a csv file as above. For other cases, Maud allows non-independent prior information to be specified in the form of the mean vector and covariance matrix of a multivariate normal distribution. This information is specified as follows.

First, to indicate where to find the required vector and matrix, the fields dgf_mean_file and dgf_covariance_file should be added to the top level of the file config.toml in the input folder. For example:

name = "methionine_cycle"
kinetic_model = "methionine_cycle.toml"
priors = "priors.csv"
experiments = "experiments.csv"
dgf_mean_file = "dgf_prior_mean.csv"
dgf_covariance_file = "dgf_prior_covariance.csv"

These fields should be paths from the root of the input folder to csv files. The dgf_mean_file should have columns called metabolite and prior_mean_dgf, with the former consisting of ids that agree with the rest of the input folder (in particular the kinetic model file) and the latter of non-null real numbers. For example:

metabolite  prior_mean_dgf
5mthf       -778.2999561
adn         -190.9913035
ahcys       -330.3885785
amet        -347.1029509
atp         -2811.578332
cyst-L      -656.8334114

The dgf_covariance_file should contain a valid covariance matrix whose rows and columns are labelled with metabolite ids. The first column should be called metabolite and populated with ids that are consistent with the other inputs. Subsequent columns should have names that match the first column. Here is (the start of) an example:

metabolite  5mthf        adn          ahcys        amet         atp          cyst-L
5mthf       457895.226   0.023993053  2.911539829  38.09225442  0.023892737  0.610913519
adn         0.023993053  2.081489779  1.034504533  1.00E-10     0.444288943  0
ahcys       2.911539829  1.034504533  16.2459485   4.297104388  0.341195482  13.08072127
amet        38.09225442  1.00E-10     4.297104388  1000025.576  -1.00E-10    2.066261457
atp         0.023892737  0.444288943  0.341195482  -1.00E-10    2.22005692   0
cyst-L      0.610913519  0            13.08072127  2.066261457  0            16.61784088
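
Before running Maud it can be useful to check that a covariance file has the layout described above. The Python sketch below uses only the standard library and tests three things: that the row ids match the column ids, that the variances on the diagonal are positive, and that the matrix is symmetric. The function names are hypothetical, the csv text is the top-left corner of the example matrix above, and the sketch does not check positive semi-definiteness, which a valid covariance matrix also requires.

```python
import csv
import io


def read_labelled_matrix(csv_text):
    """Parse a csv whose first column and header row both hold metabolite ids."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    column_ids = rows[0][1:]
    row_ids = [row[0] for row in rows[1:]]
    values = [[float(cell) for cell in row[1:]] for row in rows[1:]]
    return row_ids, column_ids, values


def check_covariance(row_ids, column_ids, values, tolerance=1e-9):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    if row_ids != column_ids:
        problems.append("row ids and column ids differ")
    n = len(values)
    if any(len(row) != n for row in values):
        problems.append("matrix is not square")
        return problems
    for i in range(n):
        if values[i][i] <= 0:
            problems.append(f"non-positive variance for {row_ids[i]}")
        for j in range(i + 1, n):
            if abs(values[i][j] - values[j][i]) > tolerance:
                problems.append(f"asymmetric entry for {row_ids[i]}, {row_ids[j]}")
    return problems


# Top-left corner of the example matrix above.
example = """metabolite,5mthf,adn,ahcys
5mthf,457895.226,0.023993053,2.911539829
adn,0.023993053,2.081489779,1.034504533
ahcys,2.911539829,1.034504533,16.2459485
"""
print(check_covariance(*read_labelled_matrix(example)))
```

An empty list indicates that the basic layout checks passed for the fragment shown.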

The initial parameter values file

Initial parameter values can be entered in a json file. This file should be a valid option for the inits argument of the cmdstanpy method sample.