## Abstract

To understand how ion channels and other proteins function at the molecular and cellular levels, one must decrypt their kinetic mechanisms. Sophisticated algorithms have been developed that can be used to extract kinetic parameters from a variety of experimental data types. However, formulating models that not only explain new data, but are also consistent with existing knowledge, remains a challenge. Here, we present a two-part study describing a mathematical and computational formalism that can be used to encode prior knowledge in the model using constraints. In this first part, we focus on constraints that enforce explicit linear relationships involving rate constants or other model parameters. We develop a simple, linear algebra–based transformation that can be applied to enforce many types of model properties and assumptions, such as microscopic reversibility, allosteric gating, and equality and inequality parameter relationships. This transformation converts the set of linearly interdependent model parameters into a reduced set of independent parameters, which can be passed to an automated search engine for model optimization. In the companion article, we introduce a complementary method that can be used to enforce arbitrary parameter relationships and any constraints that quantify the behavior of the model under certain conditions. The procedures described in this study can, in principle, be coupled to any of the existing methods for solving molecular kinetics for ion channels or other proteins. These concepts can be used not only to enforce existing knowledge but also to formulate and test new hypotheses.

## Introduction

Ion channels are highly adapted to perform specific functions in the cell. To give just one example, voltage-gated sodium (Nav) channels have finely tuned kinetic properties that allow neurons and other excitable cells to generate action potentials of specific shape and frequency (Bean, 2007). The properties that enable Nav and other channels to perform such complex and well-calibrated behavior are encoded in the kinetic mechanism, defined as a set of conformational and functional states, interconnected by a network of allowed state transitions that may depend on ligand concentration, membrane potential, or other physical variables (Colquhoun and Hawkes, 1995a,b). To understand how ion channels function, one must decrypt the kinetic mechanism. The same is true for all proteins, from ion channels and receptors to enzymes and molecular motors (Popescu and Auerbach, 2003; Milescu et al., 2006; Müllner et al., 2010; Syed et al., 2010).

A kinetic mechanism can be solved by fitting experimental data with a mathematical model. However, decades of ion channel research have shown that kinetic mechanisms cannot be fully captured by any single type of experiment. Instead, to update or construct a new model, one must fit a comprehensive data collection (Horn and Lange, 1983; Hawkes et al., 1990, 1992; Vandenberg and Bezanilla, 1991; Hoshi et al., 1994; Zagotta et al., 1994a,b; Schoppa and Sigworth, 1998a,b; Rothberg and Magleby, 2000; Milescu et al., 2005), ideally generated by multiple experimental paradigms (Vandenberg and Bezanilla, 1991; Akk et al., 2005; Milescu et al., 2008). For example, we know that Nav and other channels have four voltage sensors that gate with different timing and voltage sensitivity (Bezanilla, 2000; Pantazis et al., 2014). These fundamental aspects cannot be easily resolved by single-channel or whole-cell recordings alone, but they can be addressed in combination with other types of experiments, such as patch-clamp fluorometry (Chanda and Bezanilla, 2002; Zaydman et al., 2013; Pantazis et al., 2014).

Optimizing a model against multiple types of data is difficult in itself. A further complication is that some results—quantitative or qualitative—cannot be added to the data collection that is used for fitting. The number of voltage sensors, the existence of open-state block, and numerical relationships between parameters due to allosterism, etc., are examples of such results. Instead, this prior knowledge about the channel must be encoded directly into the model. In this way, the model will explain the new data but will also remain consistent with what is already known.

How do we introduce prior knowledge into a model? We present here some strategies for addressing this issue. At the most basic level, structural assumptions about the kinetic mechanism can be stated implicitly by choosing a specific set of states and connectivity, as we explain with an example from the literature (Kuo and Bean, 1994). Further, quantitative or qualitative assumptions can be introduced by defining a set of constraints that the model has to satisfy, while also explaining the new data. These model constraints can be formulated as explicit mathematical relationships between rate constants or other model parameters, or they can specify the behavior of the model under certain conditions (Fig. 1).

To implement these ideas, we developed new computational tools with a focus on parameter estimation. First, we build upon an existing method for enforcing linear constraints between rate constants (Qin et al., 1996; Colquhoun et al., 2004; Milescu et al., 2005) and extend it to cover arbitrary linear constraints between model parameters, including allosteric factors. Furthermore, we provide a new formalism for handling both equality and inequality relationships. In the companion article (see Navarro et al. in this issue), we test a method for implementing behavioral constraints, as well as arbitrary parameter relationships, by adding a penalty term to the cost function of the fitting algorithm. The theory and computational procedures described here can be coupled, in principle, to any of the existing methods for solving molecular kinetics, for ion channels or other proteins. These concepts can be used not only to enforce existing knowledge but also to formulate and test new hypotheses.

## Materials and methods

All the mathematical and computational algorithms described in this study were implemented and tested with the freely available MLab edition of the QuB program.

### Theoretical background

#### Kinetic mechanisms

Ion channel kinetic mechanisms are well described by Markov models, which reduce the continuum of molecular conformations that can be assumed by the protein to a small set of discrete states that can be detected experimentally or inferred statistically (Colquhoun and Hawkes, 1995a,b; Colquhoun and Sigworth, 1995). These states correspond to various conformations of functional and structural elements, such as resting or activated voltage sensors, bound or unbound ligands, closed or open pore, and inactivated or noninactivated channel. Direct transitions are permitted between certain states, and the frequency of these transitions is quantified by rate constants, which can be functions of ligand concentration, membrane potential, tension, or other physical variables. The topology (or structure) of a kinetic mechanism is defined by the set of states and their transition connectivity, including information on which rates are ligand dependent, voltage dependent, etc.

Here, we assume that all microscopic rate constants follow the Eyring formalism (Eq. 1; Eyring, 1935), with the implication that complexity in kinetic behavior should be explained with more elaborate state models, rather than through over-parameterized and ad hoc rate constant expressions. Accordingly, voltage-dependent rate constants are simple exponential functions of voltage:

*k*_{ij} = *k*^{0}_{ij} × e^{*k*^{1}_{ij} × *V*} (Eq. 1)

*k*^{1}_{ij} = δ_{ij} × *z*_{ij} × *F*/*RT* (Eq. 2)

where *k*_{ij} is the rate constant of the transition from state *i* to state *j*, *k*^{0}_{ij} is the pre-exponential factor, *k*^{1}_{ij} is the voltage-sensitivity factor, and *V* is the membrane potential. The *z*_{ij} is the electrical charge moving over the fraction δ_{ij} of the electric field, *F* is Faraday’s constant, *R* is the gas constant, and *T* is the absolute temperature. For voltage-insensitive rates, *k*^{1}_{ij} = 0. For ligand-dependent rates, *k*_{ij} is further multiplied by the ligand concentration [*L*]. When a model lumps several states together, some rates become macroscopic and contain a statistical factor in their expression (e.g., the transition between C_{1} and C_{2} in the model shown in Fig. 2 B).
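The Eyring-type rate expression above can be sketched in a few lines of Python. This is an illustrative helper, not part of the article or of QuB; the function name and the numerical values are ours:

```python
import math

# Physical constants (SI units)
F = 96485.33  # Faraday's constant, C/mol
R = 8.314     # gas constant, J/(mol*K)

def rate(k0, delta, z, V, T=295.0):
    """Voltage-dependent rate constant, k = k0 * exp(k1 * V),
    with k1 = delta * z * F / (R * T); V in volts."""
    k1 = delta * z * F / (R * T)  # voltage sensitivity, 1/V
    return k0 * math.exp(k1 * V)

# A voltage-insensitive rate simply has z = 0 (so k1 = 0):
assert rate(100.0, 0.5, 0.0, 0.05) == 100.0
```

A ligand-dependent rate would additionally be multiplied by the ligand concentration [*L*].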

The set of rate constants, together with the topology, completely parameterizes the kinetic mechanism.

#### Formulating the topology of the model

The first step in building a kinetic model is to identify a particular topology that defines the structural and functional elements of the channel and their connecting pathways. The topology can be used to specify the number of voltage sensors, the identity of voltage-sensitive transitions, the number of inactivated states, the presence of multiple open states, the existence of allosteric relationships, etc. The model shown in Fig. 2 B illustrates how a topology can be formulated to capture the key features of a Nav channel kinetic mechanism, as detailed in the Results. This model was based on a large body of knowledge accumulated in the field and, not surprisingly, has provided a framework flexible enough to explain voltage-clamp recordings of sodium currents in several neuronal preparations (Kuo and Bean, 1994; Raman and Bean, 2001; Taddese and Bean, 2002; Milescu et al., 2010).

In contrast, when little is known about the channel, one must take a purely data-driven approach and build a parsimonious topology that explains the data reasonably well. Of course, if one wants a realistic model, the Eyring formalism and related concepts must still be obeyed. Some kinetic properties are intuitive enough and can be easily translated into model features (Salari et al., 2016). For example, whole-cell recordings in which the current first rises and then decays require a model with one conducting and at least two nonconducting states. In general, searching for the right topology can be approached as an iterative process, where one tests a series of models of increasing complexity, adding more and more states and connections. For each tested model, one must determine whether the topology is compatible with the data. If no parameter values can be found that result in a good fit, the topology must be reformulated and the parameters reestimated. Because larger models can inherently fit better, one should take into account the number of degrees of freedom when ranking models. Thus, unless a larger model improves the fit substantially, one should give preference to a model with fewer free parameters. The search across topologies can be terminated when the fit no longer improves.
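The parsimony rule above — penalizing free parameters when ranking topologies — can be made concrete with an information criterion. The article does not prescribe a specific penalty; the Akaike information criterion below is one common, illustrative choice:

```python
def aic(neg_log_likelihood, n_free_params):
    """Akaike information criterion: lower is better. A larger model
    must improve the fit enough to offset its extra parameters."""
    return 2.0 * neg_log_likelihood + 2.0 * n_free_params

# Hypothetical scores for a small and a large topology fit to the same data:
small = aic(neg_log_likelihood=1050.0, n_free_params=6)
large = aic(neg_log_likelihood=1049.0, n_free_params=12)
assert small < large  # the marginal fit gain does not justify 6 extra parameters
```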

This model search process is not trivial and relies heavily on the experience of the investigator. The number of nonequivalent (Kienker, 1989) connectivity schemes can be prohibitively large, even for models with a relatively small state count (Bruno et al., 2005). A possible solution is to use a smart optimization algorithm that not only estimates parameters for a given topology but also searches efficiently across topologies at the same time (Gurkiewicz and Korngreen, 2007; Menon et al., 2009). Furthermore, one may be able to reduce the searched state space by using some information contained in the data. For example, statistical analysis of single-channel electrical recordings can provide reasonable estimates on the number of conductance levels (through visual inspection or amplitude histogram analysis) and the minimum number of kinetic states in each conductance level (through dwell-time histogram analysis; Colquhoun and Hawkes, 1982; Hawkes et al., 1990). Other methods can provide more direct evidence about the structural conformations and transition pathways of the channel, such as the number of voltage sensors, the number of inactivated states, or the identity of voltage- or ligand-dependent transitions (Grosman et al., 2000; Ahern et al., 2016). In principle, an automated search across model topologies can incorporate this information.

#### Parameter estimation

A computational procedure for finding the “best” parameters for a proposed model topology combines an algorithm that measures how well a given model explains the data with an optimization engine that searches the parameter space for the “best” solution (Fletcher, 2013). This optimal solution minimizes the error between the data and the prediction of the model (e.g., the sum of square errors) or maximizes a probability function (e.g., the likelihood that the experimental data were generated by the model or the Bayesian posterior probability; Horn and Lange, 1983; Hawkes et al., 1990; Qin et al., 1996, 2000a; Celentano and Hawkes, 2004; Milescu et al., 2005; Csanády, 2006; Moffatt, 2007; Calderhead et al., 2013; Stepanyuk et al., 2014; Epstein et al., 2016). Intuitively, the first approach can be described as minimizing a “cost function,” whereas the second, as maximizing a “goodness of fit.” Throughout this study, we will use the “cost function” term, but with the understanding that it could refer to either minimizing the sum of square errors or, equivalently, minimizing the negative log-likelihood. When one also searches for an appropriate topology, the value of the cost function can be used as a score to rank the model. Even though the kinetic mechanism is fully characterized by the rate constants, other quantities, such as the number of channels or the unitary conductance, may also have to be estimated from the data, as discussed under Model parameters in the Results.

## Results

Once a model topology is selected to appropriately express what is known or hypothesized about the channel, thus encoding prior knowledge, the next step is to find a set of parameters that explain the data well. However, these parameters can also contain prior knowledge. In fact, the parameter-estimation procedure itself can be designed to enforce prior knowledge, by making it generate parameter values that are in agreement with a set of model constraints. We classify these model constraints into two categories: (1) parameter constraints, discussed in this study, and (2) behavioral constraints, discussed in the companion article (Navarro et al., 2018). A parameter constraint is formulated as an explicit mathematical relationship that involves rate constants or other model parameters. An example is the scaling of one rate constant to another or restricting the range of a parameter to positive values. In contrast, a behavioral constraint quantifies the behavior of the model under certain conditions, without explicitly referring to rate constants or other model parameters. An example is enforcing the maximum open probability (*P*_{O}) reached by the channel during a specific voltage-clamp stimulation protocol.

The mathematical and computational procedures discussed here for enforcing parameter constraints are limited to linear relationships. However, nonlinear relationships can be enforced using the mechanism developed for behavioral constraints, presented in the second part of this study (Navarro et al., 2018). As illustrated in Fig. 1, linear parameter constraints that enforce an equality relationship reduce the dimensionality of the parameter space, eliminating one dimension for each relationship. In contrast, both inequality parameter constraints and behavioral constraints preserve dimensionality but reduce the size of the parameter space. To describe it intuitively, inequality parameter constraints present the optimizer with a reduced road map, whereas behavioral constraints guide the optimizer through toll-free roads only.

### Implementing prior knowledge with linear parameter constraints

In this section, we discuss the implementation of prior knowledge via linear parameter constraints. To illustrate this concept, we use the Nav channel kinetic mechanism shown in Fig. 2 A. We chose this model because it covers many of the parameter constraints that our formalism can enforce. The model was originally formulated (Kuo and Bean, 1994) with several mechanistic assumptions in mind, which are reflected in the number of states and connections, and in the mathematical relationships between various kinetic parameters (Fig. 2 B). These assumptions can be regarded and expressed as parameter constraints.

#### Model assumptions

The first assumption is that channel activation involves four identical and independent voltage sensors, and all four must be activated to open the pore. Thus, to simplify the kinetic mechanism, all closed states with the same number *n* of resting sensors are lumped into a single compound state. The result is the five-state activation pathway C_{1} … C_{5}. The frequency of activation transitions for any of the compound states C_{1} … C_{5} is equal to *n* times the frequency of the activation transition for a single sensor, where *n* is a statistical factor. The same rule applies to deactivation transitions. For example, when the channel resides in a closed state that has three resting voltage sensors (C_{2}, *n* = 3), the compound activation rate (*k*_{2,3}) is three times the activation rate of a single sensor (*k*_{4,5} or α_{m}). Thus, if *k*_{4,5} and *k*_{2,1} are the microscopic transition rates of a single sensor activating or deactivating, respectively, the assumption of identical and independent voltage sensors is expressed by the following mathematical relationships, where one rate is scaled to another by a constant factor:

*k*_{1,2} = 4 × *k*_{4,5}, *k*_{2,3} = 3 × *k*_{4,5}, *k*_{3,4} = 2 × *k*_{4,5}

*k*_{3,2} = 2 × *k*_{2,1}, *k*_{4,3} = 3 × *k*_{2,1}, *k*_{5,4} = 4 × *k*_{2,1}
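The statistical-factor rule for the C_{1} … C_{5} row can be sketched as follows. This is an illustrative Python fragment with made-up single-sensor rates, not values from the article:

```python
# Identical-and-independent-sensor assumption for the activation row:
# a state with n resting sensors activates at n * alpha_m, and a state
# with m activated sensors deactivates at m * beta_m (Fig. 2 B notation).
alpha_m, beta_m = 500.0, 50.0  # single-sensor rates, illustrative values

# forward rates C1->C2 ... C4->C5; backward rates C2->C1 ... C5->C4
forward = {("C%d" % n, "C%d" % (n + 1)): (5 - n) * alpha_m for n in range(1, 5)}
backward = {("C%d" % (n + 1), "C%d" % n): n * beta_m for n in range(1, 5)}

assert forward[("C1", "C2")] == 4 * alpha_m   # four resting sensors
assert forward[("C4", "C5")] == alpha_m       # k_{4,5}: the microscopic rate
assert backward[("C2", "C1")] == beta_m       # k_{2,1}: the microscopic rate
```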

Another assumption is that the channel can inactivate not only from the open state O_{6} but also from any of the C_{1} … C_{5} closed states into the I_{7} … I_{12} inactivated states. However, the transition into inactivation from the closed states depends on the degree of activation: as more voltage sensors are activated, the C to I transitions become faster, whereas the I to C transitions become slower. As envisioned in the original model, this property is implemented with the allosteric factors *a* and *b*. Thus, the rate of inactivation from a closed state C_{n} is equal to the rate of inactivation from the previous closed state C_{n−1}, multiplied by the *a* factor. For example, *k*_{2,8} = *a* × *k*_{1,7}, *k*_{3,9} = *a* × *k*_{2,8}, etc. The opposite is true for the return rates: *k*_{8,2} = *b*^{−1} × *k*_{7,1}, *k*_{9,3} = *b*^{−1} × *k*_{8,2}, etc. Taking *k*_{1,7} and *k*_{7,1} as the reference microscopic rates, this assumption is expressed by the following mathematical relationships, where one rate is scaled to another by an undetermined factor:

*k*_{n+1,n+7} = *a* × *k*_{n,n+6} and *k*_{n+7,n+1} = *b*^{−1} × *k*_{n+6,n}, for *n* = 1 … 4 (Eq. 5)

The voltage sensors can also activate and deactivate while the channel is inactivated, along the I_{7} … I_{11} pathway, but with different kinetics. This is also encoded by the allosteric factors *a* and *b*, resulting in another set of similar mathematical relationships:

*k*_{n+6,n+7} = *a* × *k*_{n,n+1} and *k*_{n+7,n+6} = *b*^{−1} × *k*_{n+1,n}, for *n* = 1 … 4 (Eq. 6)

The allosteric factors, i.e., the *a* and *b* quantities in Eqs. 5 and 6, are unknown and need to be determined from the data.

Finally, the last assumption is that the channel satisfies the condition of microscopic reversibility, i.e., no energy input is required for gating and opening. Under this condition, for any reaction loop in the model, the clockwise product of rates around the loop must equal the counterclockwise product (Song and Magleby, 1994; Rothberg and Magleby, 2001; Colquhoun et al., 2004). As the model in Fig. 2 A has five independent loops, the following mathematical relationships must hold true:

*k*_{n,n+1} × *k*_{n+1,n+7} × *k*_{n+7,n+6} × *k*_{n+6,n} = *k*_{n,n+6} × *k*_{n+6,n+7} × *k*_{n+7,n+1} × *k*_{n+1,n}, for *n* = 1 … 4

*k*_{5,6} × *k*_{6,12} × *k*_{12,11} × *k*_{11,5} = *k*_{5,11} × *k*_{11,12} × *k*_{12,6} × *k*_{6,5}
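The loop-balance condition can be checked numerically. Below is a minimal Python sketch (the function name and toy rate values are ours, not from the article) that compares the clockwise and counterclockwise products on a log scale, which is numerically safer than multiplying many rates directly:

```python
import math

def loop_balanced(cw_rates, ccw_rates, tol=1e-9):
    """Microscopic reversibility for one reaction loop: the product of
    clockwise rates must equal the product of counterclockwise rates."""
    log_cw = sum(math.log(k) for k in cw_rates)
    log_ccw = sum(math.log(k) for k in ccw_rates)
    return abs(log_cw - log_ccw) < tol

# Toy 4-state loop: 2*3*4*5 == 1*6*4*5 == 120, so this loop is balanced:
assert loop_balanced([2.0, 3.0, 4.0, 5.0], [1.0, 6.0, 4.0, 5.0])
# Perturbing one rate breaks the balance:
assert not loop_balanced([2.0, 3.0, 4.0, 5.0], [1.0, 1.0, 4.0, 5.0])
```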

#### Voltage- and ligand-dependent rate constants

Some of the mathematical relationships used to express parameter constraints may involve rate constants that are functions of membrane potential. Unless otherwise specified, all these mathematical relationships must be true for any membrane potential value. For example, the scaling relationship *k*_{ij} = *c* × *k*_{kl} between two voltage-dependent rates (with *k*^{0}_{ij} the pre-exponential factor and *k*^{1}_{ij} the voltage-sensitivity factor, as in Eqs. 1 and 2) can be expanded as ln *k*^{0}_{ij} + *k*^{1}_{ij} × *V* = ln *c* + ln *k*^{0}_{kl} + *k*^{1}_{kl} × *V*, which can hold at every *V* only if two simultaneous relationships are true: ln *k*^{0}_{ij} = ln *c* + ln *k*^{0}_{kl} and *k*^{1}_{ij} = *k*^{1}_{kl}, unless the relationship is meant to apply only at *V* = 0, in which case it simplifies to the first relationship alone. Similarly, a ligand-dependent rate constant *k*_{ij} that is scaled by a constant factor *c* to another ligand-dependent rate constant *k*_{kl} can be expanded as follows: *k*^{0}_{ij} × [*L*] = *c* × *k*^{0}_{kl} × [*L*], which can hold at every concentration only if the difference between the ligand concentration ([*L*]) exponents on the two sides is equal to zero. Thus, a ligand-dependent rate cannot be scaled to a ligand-independent rate, except for a single concentration value. In the case of microscopic reversibility, this condition requires that the clockwise and counterclockwise transitions around the loop involve the same number of ligand-dependent steps. When the channel binds multiple types of ligands, each type must satisfy these conditions. Models formulated without taking these precautions are, in principle, physically unrealistic.

#### Allosteric, statistical, and other multiplicative factors

Multiplicative factors can be introduced in the rate constant equation to formulate macroscopic rates and to express a variety of parameter constraints. One obvious application is to implement allosteric model behavior, as previously discussed, where the *a* and *b* factors multiply the rate constant pre-exponential term *k*^{0}_{ij}. Multiplicative factors can also be applied to the exponential term. For example, the voltage dependence of a rate can be written as *k*_{ij} = *k*^{0}_{ij} × e^{*a*_{k} × *C*_{k} × *V*}, where *a*_{k} is a multiplicative factor that stands for δ_{ij} × *z*_{ij}, and *C*_{k} is a numerical constant equal to *F*/*RT*, as in Eq. 2. This approach would make it possible to mix data collected at different temperatures, in the same way as we can already account for different voltages or ligand concentrations.

As another example, one may want to enforce a relationship between the δ_{ij} and δ_{ji} values for a given transition, such as δ_{ij} = 1 − δ_{ji}. In this case, assuming that *z*_{ij} and *z*_{ji} are known quantities, we would write the following constraint expressions: *k*^{1}_{ij} = *a*_{k1} × *C*_{k}, *k*^{1}_{ji} = −*a*_{k2} × *C*_{k}, and *a*_{k1} + *a*_{k2} = 1, where *a*_{k1} and *a*_{k2} are multiplicative factors that stand for δ_{ij} and δ_{ji}, respectively, and *C*_{k} is a numerical constant equal to z*F*/*RT*, where *z* = *z*_{ij} = *z*_{ji}.

As the multiplicative factors are logarithmically transformed, they are subject to some restrictions. Thus, a pre-exponential parameter *k*^{0}_{ij} multiplied by a factor *a*_{k} takes the form *C* × *a*_{k} × *k*^{0}_{ij}, where *C* is a positive numerical constant. Taking the logarithm converts this product into the sum ln *a*_{k} + ln *k*^{0}_{ij} + ln *C*, where ln *C* is an arbitrary constant. As explained further, a given multiplicative factor can only be used as a pre-exponential-type factor, as in Eq. 19, or as an exponential-type factor, as in Eq. 20, but not as both simultaneously.

#### Inequality constraints

So far, we have only considered parameter constraints that are formulated as mathematical equalities. However, prior knowledge may also be expressed through inequality parameter constraints. First, there is a physical requirement that all pre-exponential rate parameters *k*^{0}_{ij} be positive. Other multiplicative factors, such as the *a* and *b* allosteric factors in Eqs. 5 and 6, must also be restricted to positive values in order to keep rates positive. Both of these constraints are automatically handled by the logarithm transformation of variable, as explained further down. In contrast, the exponential-type factors and parameters (e.g., δ_{ij} × *z*_{ij}) can take either sign and need not be restricted in this way.

To give another hypothetical example, we may want the ratio between two rate constants at a certain membrane potential *V*_{0} to be smaller than a numerical constant *c*: *k*_{ij}(*V*_{0}) / *k*_{kl}(*V*_{0}) ≤ *c*. Taking the logarithm, this inequality can be converted into an equality by adding to the left side the quantity *z*^{2}, a quantity that, by definition, is nonnegative: ln *k*^{0}_{ij} − ln *k*^{0}_{kl} + (*k*^{1}_{ij} − *k*^{1}_{kl}) × *V*_{0} + *z*^{2} = ln *c*. Starting with *z* = 0, we can find a value for *z* that converts the inequality into an equality. If the relationship must be true at any membrane potential *V*, not just at *V*_{0}, then we have ln *k*^{0}_{ij} − ln *k*^{0}_{kl} + (*k*^{1}_{ij} − *k*^{1}_{kl}) × *V* + *z*^{2} = ln *c*, which must also be true for *V* = 0, and thus, we obtain two simultaneous relationships: ln *k*^{0}_{ij} − ln *k*^{0}_{kl} + *z*^{2} = ln *c* and *k*^{1}_{ij} = *k*^{1}_{kl}. If the relationship were instead a “≥” inequality, we would add the *z*^{2} to the right side, rather than subtract it.

In the jargon of optimization theory, *z* is called a “slack” variable (Fletcher, 2013). With equality constraints, one has to find a set of model parameters that satisfy a set of relationships. When inequalities are added to the model and transformed into equalities using slack variables, one has to find both a set of model parameters and a set of slack variables that together satisfy the constraint relationships. A slack variable is not a true parameter of the model, but merely a variable that is temporarily used to handle inequality constraints during the search for optimum parameters. Very importantly, the quantity added or subtracted via slack variables must take positive values, which is why we use *z*^{2} and not *z*. The reason for converting inequality relationships to equalities is to have all linear constraints handled by the same linear algebra mathematical formalism, as explained further.
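The slack-variable idea can be demonstrated in a few lines. The sketch below (our own illustration, with plain numbers rather than log-transformed rate parameters) turns the inequality *p* ≤ *c* into the equality *p* + *z*^{2} = *c*:

```python
import math

def slack_for(p, c):
    """Return a slack z such that p + z**2 == c, or None if p > c
    (i.e., the inequality p <= c cannot be satisfied)."""
    if p > c:
        return None
    return math.sqrt(c - p)  # z**2 is nonnegative by construction

z = slack_for(3.0, 7.0)
assert z is not None and abs(3.0 + z ** 2 - 7.0) < 1e-12
assert slack_for(8.0, 7.0) is None  # 8 <= 7 has no real slack solution
```

During optimization, the search engine would adjust both the model parameters and *z*, rather than solving for *z* directly as done here.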

#### Model parameters

As discussed in the previous section, the core parameters of a kinetic model are the rate constant parameters *k*^{0}_{ij} and *k*^{1}_{ij}, possibly together with multiplicative factors *a*_{k} that describe allosteric coupling or other properties (e.g., the *a* and *b* allosteric factors in the model shown in Fig. 2 B) or help parameterize the rate constants in more detail. However, other parameters *q*_{l} may also be added to the modeling framework, depending on the particular application. These external parameters are not necessarily present in any of the rate constant expressions. Instead, they may describe the data or experimental variables. For example, when fitting macroscopic currents, one may also need to estimate the number of channels in the record or the single channel conductance (Milescu et al., 2005). Thus, we define a set **K**, of size *N*_{K}, which contains all of these model parameters: **K** = {*k*^{0}_{ij}, *k*^{1}_{ij}, *a*_{k}, *q*_{l}}. All three categories, rate constant parameters (*k*^{0}_{ij} and *k*^{1}_{ij}), multiplicative factors (*a*_{k}), and “external parameters” (*q*_{l}), may be involved in the mathematical relationships that express parameter constraints, as discussed further.

#### A general equation for linear parameter constraints

All the mathematical relationships that were used to implement the assumptions made for the Nav model in Fig. 2 A have something in common: regardless of type (scaling, microscopic reversibility, etc.), each of these equality and inequality parameter constraints results in one or two equations involving the quantities ln *k*^{0}_{ij}, *k*^{1}_{ij}, *a*_{k}, and *z*^{2}, each multiplied by a constant. Although not shown in these examples, those relationships can also contain external parameters *q*_{l}. Thus, a general form that covers all these examples can be written as follows:

Σ_{ij} *C*^{0}_{ij} × ln *k*^{0}_{ij} + Σ_{ij} *C*^{1}_{ij} × *k*^{1}_{ij} + Σ_{k} *C*_{k} × *f*_{k}(*a*_{k}) + Σ_{l} *C*_{l} × *f*_{l}(*q*_{l}) = *C* (Eq. 28)

Σ_{ij} *C*^{0}_{ij} × ln *k*^{0}_{ij} + Σ_{ij} *C*^{1}_{ij} × *k*^{1}_{ij} + Σ_{k} *C*_{k} × *f*_{k}(*a*_{k}) + Σ_{l} *C*_{l} × *f*_{l}(*q*_{l}) = *C* + *C*_{z} × *z*^{2} (Eq. 29)

where *f*_{k} is an invertible function of the multiplicative factor *a*_{k} (e.g., ln *a*_{k}), and *f*_{l} is an invertible function of the external model parameter *q*_{l}. The *C*^{0}_{ij}, *C*^{1}_{ij}, *C*_{k}, *C*_{l}, *C*, and *C*_{z} quantities are numerical constants, with *C*_{z} = 1 for a “≥” inequality, and *C*_{z} = −1 for a “≤” inequality.

Specific parameter constraints (e.g., scaling one rate constant to another) can be obtained from the general equation by selecting a subset of the ln *k*^{0}_{ij}, *k*^{1}_{ij}, *f*_{k}(*a*_{k}), and *f*_{l}(*q*_{l}) terms via nonzero multiplication constants *C*^{0}_{ij}, *C*^{1}_{ij}, *C*_{k}, or *C*_{l}. As discussed throughout the article, a variety of useful constraints can be implemented using this mechanism, such as making a rate equal to a constant, scaling two rates by a constant factor, scaling two rates by a variable factor, constraining the total charge for a set of transitions, enforcing microscopic reversibility, constraining a reaction loop out of microscopic balance, restricting a model parameter to a range, expressing explicit temperature dependence, etc. Some of these constraints will require a single mathematical relationship, whereas others will require two. We note that using multiplicative factors in constraints generally makes sense when the same factor is used in multiple relationships. Otherwise, these factors can be simply calculated after the parameters are estimated.

#### Converting between model parameters and free parameters

One can easily verify that the model in Fig. 2 B was parameterized in such a way as to implicitly satisfy most assumptions: identical and independent voltage sensors, allosteric coupling of inactivation to activation, and microscopic reversibility. For example, the condition of identical and independent voltage sensors is enforced by the 4, 3, 2, 1 or 1, 2, 3, 4 statistical factors multiplying the α_{m} or β_{m} quantities, respectively. In other words, any values can be assigned to the model parameters, *a*, *b*, etc., and the assumptions will be automatically satisfied. There is only one exception: the C_{5}–O_{6}–I_{12}–I_{11}–C_{5} reaction loop is not implicitly balanced. In this case, the balance equation (i.e., α_{mo} × β_{ho} × β_{mh} × *b*^{−4} × α_{h} = β_{mo} × *a*^{4} × β_{h} × α_{mh} × α_{ho}) is not true by definition. Instead, it must be enforced by choosing an appropriate set of numerical values for all the parameters involved. In contrast, all the other loops are automatically balanced (e.g., 4α_{m} × β_{h} × *a* × β_{m} × *b*^{−1} × α_{h} = β_{m} × β_{h} × 4α_{m} × *a* × α_{h} × *b*^{−1}).

To formulate a parameterization that implicitly satisfies all parameter constraints, a commonly used strategy is to identify a subset of independent model parameters that can be estimated by the optimization engine and a subset of dependent parameters that can be simply derived from the independent ones. This is exactly how the model in Fig. 2 B was formulated. However, finding this parameterization is not trivial in some cases (Colquhoun et al., 2004). Moreover, it is not clear to us how constraints that are defined by inequality relationships would be handled by this type of parameterization. A potentially easier and certainly more flexible strategy is to define the constraints as an invertible transformation *f*_{c} between the set of interdependent model parameters **K** and a set of independent or “free” parameters **X**: **X** = *f*_{c}(**K**) and **K** = *f*_{c}^{−1}(**X**). The **K** parameters are interdependent and thus cannot take arbitrary values but only those values that satisfy the user-defined parameter constraints. In contrast, the **X** parameters are independent of each other and are “free” to take any value in the −∞ to +∞ range. We emphasize that the **X** parameters are not simply a subset of **K**, as explained in the following paragraphs. These free parameters are passed to a model-blind optimizer that can search without any constraint in the parameter space defined by **X**, where it finds a solution that best explains the data. This optimal solution can be translated from the free parameter space back into the model parameter space via the inverse transformation *f*_{c}^{−1}.

If we want to implement the linear parameter constraints defined by Eq. 28 or 29, how do we define the *f*_{c} transformation that converts the model parameters **K** into the free parameters **X** and vice versa? In preparation for this, we need to recognize that the left side of the generalized Eq. 28 or 29 is nonlinear with respect to the pre-exponential parameters *k*^{0}_{ij}, possibly to the multiplicative factors *a*_{k}, and perhaps to some external parameter *q*_{l}. However, we can make the following invertible transformations of variable: *k*^{0}_{ij} is replaced by ln *k*^{0}_{ij}, *a*_{k} by *f*_{k}(*a*_{k}), and *q*_{l} by *f*_{l}(*q*_{l}). If *a*_{k} is an allosteric factor or a similar quantity that multiplies a rate constant *k*_{ij}, then *f*_{k}(*a*_{k}) is ln(*a*_{k}), which is invertible. If *a*_{k} is a factor that multiplies a voltage-sensitivity parameter *k*^{1}_{ij}, then *f*_{k}(*a*_{k}) is *a*_{k} itself, which is obviously invertible as well. Similar logic applies to the external parameters *q*_{l}. For example, if *q*_{l} refers to the number of channels, we can also use the logarithm transformation (Milescu et al., 2005). In all cases, the logarithm has two desirable effects: it restricts the variables to positive values, and it scales the parameters to more similar values relative to each other, helping the optimization engine to find the solution.

We can rewrite the generalized Eqs. 28 and 29 with these transformations of variable, using the notation κ^{0}_{ij} = ln *k*^{0}_{ij}, κ^{1}_{ij} = *k*^{1}_{ij}, φ_{k} = *f*_{k}(*a*_{k}), and φ_{l} = *f*_{l}(*q*_{l}). Next, we define a vector **R**, of dimension *N*_{R}, with elements that correspond to κ^{0}_{ij}, κ^{1}_{ij}, φ_{k}, and φ_{l}. For a more intuitive notation, we refer to an element of **R** as *r*_{i}, when its type is unspecified, or as *r*^{0}_{ij}, *r*^{1}_{ij}, *r*_{k}, or *r*_{l}, when we emphasize its identity as a specific type of model parameter (*k*^{0}_{ij}, *k*^{1}_{ij}, *a*_{k}, or *q*_{l}, respectively). **R** has the same size as **K** (*N*_{R} = *N*_{K}). Thus, a parameter constraint is expressed as a linear relationship between the elements *r*_{i} of **R**, as follows: Σ_{i} *C*_{i} × *r*_{i} = *C* for an equality (Eq. 34), or Σ_{i} *C*_{i} × *r*_{i} = *C* + *C*_{z} × *z*^{2} for an inequality (Eq. 35), where *C*_{i} stands for one of the numerical constants *C*^{0}_{ij}, *C*^{1}_{ij}, *C*_{k}, or *C*_{l}, respectively. Then, assuming that we have *N*_{C} constraint relationships, we can write the generalized constraint from Eqs. 34 or 35 in a more compact matrix form (Qin et al., 1996; Fletcher, 2013), as follows:

**M** × **R** = **V** (Eq. 36)

where **M** is a matrix of dimension *N*_{C} × *N*_{R}, and **V** is a vector of dimension *N*_{C}. Each row of **M** corresponds to the numerical constants on the left side of the generalized Eqs. 34 or 35, whereas each element of **V** represents the right side of Eqs. 34 or 35 (Eq. 37). A constraint *c* involving the κ^{0}_{ij}, κ^{1}_{ij}, φ_{k}, and φ_{l} variables is encoded by row *c* of matrix **M** and element *c* of vector **V**. Eq. 36 encapsulates, in matrix form, all the linear parameter constraints imposed on the model, including both equality and inequality relationships.
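As a concrete illustration of the **M** × **R** = **V** form (Eq. 36), the Python sketch below builds the constraint matrices for a toy two-state model. The parameter ordering and numerical values are ours, chosen only to show the bookkeeping:

```python
import numpy as np

# Toy transformed parameter vector R = [ln k0_12, ln k0_21, k1_12, k1_21].
# Constraint 1: fix k0_12 = 100        ->   ln k0_12 = ln 100
# Constraint 2: scale k0_21 = 2*k0_12  ->  -ln k0_12 + ln k0_21 = ln 2
M = np.array([[ 1.0, 0.0, 0.0, 0.0],
              [-1.0, 1.0, 0.0, 0.0]])   # N_C = 2 constraints, N_R = 4 parameters
V = np.array([np.log(100.0), np.log(2.0)])

# Any R that satisfies both constraints solves M @ R = V,
# regardless of the (unconstrained) voltage-sensitivity entries:
R = np.array([np.log(100.0), np.log(200.0), 1.5, -0.7])
assert np.allclose(M @ R, V)
```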

Although they linearize the constraint relationships, the transformed model parameters **R** are still interdependent through those relationships. How do we remove the interdependence and convert **R** into the set of free parameters **X**? Ignoring, for now, that **V** is not a constant vector because it depends (nonlinearly) on the optional slack variables *z*, we can take advantage of the linear form of the matrix equation **M** × **R** = **V** and express **R** as a linear function of **X** (Qin et al., 1996) and vice versa:

**R** = **A** × **X** + **B**, (Eq. 38)

**X** = **A**^{−1} × **R**, (Eq. 39)

where the vector **X**, of dimension *N*_{X} = *N*_{R} − *N*_{C}, contains the independent parameters. Note that Eq. 39 is obtained from **X** = **A**^{−1} × (**R** − **B**), because **A**^{−1} multiplied by **B** is equal to a zero vector. The matrix **A**, of dimension *N*_{R} × *N*_{X}, and the vector **B**, of dimension *N*_{R}, can be determined from **M** and **V** using the singular value decomposition (Golub and Reinsch, 1970). First, **M** is decomposed as follows:

**M** = **U**_{M} × **S**_{M} × **V**_{M}^{T}, (Eq. 40)

where **U**_{M} is an orthogonal matrix of dimension *N*_{C} × *N*_{C}, **V**_{M} is an orthogonal matrix of dimension *N*_{R} × *N*_{R}, and **S**_{M} is a diagonal matrix of dimension *N*_{C} × *N*_{R} that contains the singular values of matrix **M**. Then, **A** can be extracted as a submatrix of **V**_{M}:

**A** = columns *N*_{C} + 1 through *N*_{R} of **V**_{M}, (Eq. 41)

which are the columns that span the null space of **M**. The inverse of the **A** matrix, **A**^{−1}, is similarly obtained from **V**_{M}: because **V**_{M} is orthogonal, **A**^{−1} is simply the corresponding submatrix (rows *N*_{C} + 1 through *N*_{R}) of **V**_{M} transposed (**A**^{−1} = **A**^{T}). The vector **B** can be calculated as follows:

**B** = **M**^{+} × **V**, (Eq. 42)

where **M**^{+}, of dimension *N*_{R} × *N*_{C}, is the pseudoinverse of **M** and can be calculated as follows:

**M**^{+} = **V**_{M} × **S**_{M}^{+} × **U**_{M}^{T}, (Eq. 43)

where **S**_{M}^{+} is obtained from **S**_{M} by transposing it and replacing all nonzero diagonal elements (the singular values) with their inverse. With the **A** and **B** matrices obtained as in Eqs. 41 and 42, we can now calculate **R** from **K**, and then **X** from **R**, and vice versa.
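To make the linear algebra concrete, here is a minimal numpy sketch of this construction (the function name and the two-parameter toy constraint are ours, not part of the original method): it builds **A**, **B**, and the pseudoinverse **M**^{+} from **M** and **V**, then verifies the round trip between **R** and **X**.

```python
import numpy as np

def reduction_from_constraints(M, V):
    """Build A, B, and M+ from the constraint system M @ R = V,
    so that R = A @ X + B and X = A.T @ R (Eq. 39)."""
    n_c, n_r = M.shape
    # Singular value decomposition of the constraint matrix M.
    U_M, s, VT_M = np.linalg.svd(M)
    # A = last N_X = N_R - N_C columns of V_M, which span the
    # null space of M (Eq. 41); its left inverse is simply A.T.
    A = VT_M[n_c:, :].T
    # M+ is the pseudoinverse of M; numpy computes it directly.
    M_pinv = np.linalg.pinv(M)
    # B = M+ @ V (Eq. 42).
    return A, M_pinv @ V, M_pinv

# Toy example (ours): two transformed parameters constrained by r0 + r1 = 3.
M = np.array([[1.0, 1.0]])
V = np.array([3.0])
A, B, M_pinv = reduction_from_constraints(M, V)

R = A @ np.array([0.7]) + B   # any free parameter X yields a valid R
X = A.T @ R                   # recover the free parameter (Eq. 39)
assert np.isclose(R.sum(), 3.0)   # the constraint holds
assert np.isclose(X[0], 0.7)      # the round trip is exact
```

Note that the sign of the null-space column returned by the decomposition is arbitrary, but the round trip **X** → **R** → **X** is unaffected.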

How do we deal with inequality constraints and slack variables? We found a solution that is actually quite simple, although, perhaps, not immediately obvious. First, we define a vector **Z**, of dimension *N*_{Z} equal to the number of inequality constraints. This vector contains all the slack variables, one for each inequality constraint. Then, we define another vector that concatenates **X** and **Z**, of dimension *N*_{X} + *N*_{Z}; this is the vector actually handled by the optimizer. The slack variables **Z** are arbitrary and thus independent of each other and of the free parameters **X**. Hence, the elements of **Z** can be treated as free parameters: whenever new values are proposed, **Z** is used to recalculate **V** (Eq. 37), which, in turn, is used to recalculate **B** (Eq. 42). The **A** matrix remains the same because the coefficient matrix **M** contains only constants. Thus, we can calculate the transformed model parameters **R** as follows:

**R** = **A** × **X** + **B**_{z}, (Eq. 44)

where **B**_{z} and **V**_{z} are the **B** and **V** quantities calculated for a given vector **Z**.

During optimization, the slack variables in **Z** are provided by the optimizer, together with **X**. However, **Z** must be initialized at the beginning of the optimization from a given set of transformed model parameters **R** and the appropriate relationships in Eq. 37. Let *z*_{c} be the slack variable introduced by the inequality relationship defined by row *c* of the constraint matrix **M**. Then, *z*_{c} can be calculated with the following equation:

**M**_{c} × **R** + *z*_{c}^{2} = *C*_{c}, (Eq. 45)

where **M**_{c} is a vector corresponding to row *c* of **M**. This equation has the obvious solution:

*z*_{c} = ±(*C*_{c} − **M**_{c} × **R**)^{1/2}. (Eq. 46)

Thus, we now have a complete mechanism for converting, in both directions, between the model parameters **K** and the free parameters **X** and **Z**.
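Continuing the numpy sketch, the slack-variable mechanism can be outlined as follows (the function names and the toy constraint *r*_{0} + *r*_{1} ≤ 3 are ours): slack variables are initialized from a constraint-satisfying **R**, and then used to rebuild **B**_{z} whenever new values are proposed.

```python
import numpy as np

def init_slack(M, C, R, ineq):
    """Initialize slack variables from M_c @ R + z_c^2 = C_c (Eq. 45),
    taking the positive root; clip tiny negative residues from roundoff."""
    return np.sqrt(np.maximum(C[ineq] - M[ineq] @ R, 0.0))

def params_from_free(A, M_pinv, C, X, Z, ineq):
    """R = A @ X + B_z, where B_z = M+ @ V_z and v_c = C_c - z_c^2 (Eq. 37)."""
    V = C.astype(float).copy()
    V[ineq] -= Z ** 2
    return A @ X + M_pinv @ V

# Toy constraint (ours): r0 + r1 <= 3, i.e., r0 + r1 + z^2 = 3.
M = np.array([[1.0, 1.0]])
C = np.array([3.0])
ineq = np.array([0])                 # row 0 is an inequality
A = np.linalg.svd(M)[2][1:, :].T     # null-space columns of V_M
M_pinv = np.linalg.pinv(M)

R0 = np.array([1.0, 1.0])            # satisfies r0 + r1 <= 3
Z = init_slack(M, C, R0, ineq)       # z = sqrt(3 - 2) = 1
R = params_from_free(A, M_pinv, C, A.T @ R0, Z, ineq)
assert np.allclose(R, R0)            # round trip recovers R
assert np.isclose((M @ R)[0] + Z[0] ** 2, 3.0)   # Eq. 45 holds
```

The sign ambiguity in Eq. 46 is harmless here, because only *z*_{c}^{2} enters **V**_{z}; the sketch takes the positive root.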

#### Redundant constraints

One should take care to prevent redundancy and use only the minimum number of mathematical relationships that are necessary to implement the assumptions of the model. Intuitively, a constraint relationship is redundant if its intended consequence is already enforced by other relationships. With our example model, one could use either the scaling *k*_{7,8} = *a* × *k*_{1,2}, or the scaling *k*_{7,8} = 4 × *a* × *k*_{4,5}, but not both because that would create an additional relationship between *k*_{1,2} and *k*_{4,5}. Similarly, the condition in which the algebraic sum of the voltage sensitivities around a reaction loop is equal to zero may already be enforced by some rate-scaling relationships and should not be duplicated. When in doubt, one could check the rank of the **M** matrix, which will be reduced by redundant constraints.
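The rank test mentioned above takes one line with numpy. In this hypothetical three-parameter example, the third constraint row is the sum of the first two and is therefore redundant:

```python
import numpy as np

# Two independent constraint rows over three transformed parameters.
M_ok = np.array([[1.0, -1.0,  0.0],
                 [0.0,  1.0, -1.0]])
# Adding a row equal to the sum of the first two makes M rank-deficient.
M_bad = np.vstack([M_ok, [1.0, 0.0, -1.0]])

assert np.linalg.matrix_rank(M_ok) == M_ok.shape[0]    # no redundancy
assert np.linalg.matrix_rank(M_bad) < M_bad.shape[0]   # redundant row
```

A rank lower than the number of rows signals that at least one constraint should be removed before building **A** and **B**.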

Redundant constraints may also arise from inequalities. Although tempting, using a pair of inequality relationships to enforce a range (lower and upper bound) constraint on a model parameter is not possible, unfortunately: once the slack variables are introduced, the two inequalities become two equality relationships whose coefficient rows are identical up to sign, and thus linearly dependent. However, this limitation can be overcome through the same mechanism that handles behavioral constraints (Navarro et al., 2018). Furthermore, although inequality constraints add slack variables to the overall set of free parameters, the total number of equality and inequality constraints must still be strictly smaller than the number of model parameters **K**.

#### Calculating the cost function and its gradients

In a typical scenario, the cost function *F* is an explicit function of the rate constants *k*_{ij} and of some external model parameters *q*_{l}. Typically, *F* would not depend explicitly on the multiplicative factors *a*_{k}, which are generally used to establish relationships between other parameters. Because the model parameters **K** can be obtained from the free parameters, *F* can also be written as a function of the free parameters. Thus, although the optimizer evaluates *F* for each proposed set of model parameters **K**^{*}, it actually searches for a solution in the space defined by the free parameters, and each iteration requires *F* to be calculated for each proposed set. The actual form of *F* depends on the type of application; it could be a sum of square errors, a likelihood, or a Bayesian posterior probability, or it could be a mixture of these expressions, when multiple types of data are bundled together (e.g., single-channel and whole-cell traces).

The optimizer may also require the gradients of *F* with respect to the free parameters. To obtain the gradient of *F* with respect to a free parameter *x*_{j}, we apply the chain rule of differentiation. The rate constants *k*_{ij} are functions of their transformed pre-exponential and exponential parameters, whereas each external parameter *q*_{l} is a function of φ_{l}. These transformed quantities are entries in the **R** vector and, thus, are explicit functions of the free parameters:

∂*F*/∂*x*_{j} = Σ_{ij} ∂*F*/∂*k*_{ij} × (∂*k*_{ij}/∂*r*_{m} × ∂*r*_{m}/∂*x*_{j} + ∂*k*_{ij}/∂*r*_{n} × ∂*r*_{n}/∂*x*_{j}) + Σ_{l} ∂*F*/∂*q*_{l} × ∂*q*_{l}/∂*r*_{p} × ∂*r*_{p}/∂*x*_{j}, (Eq. 51)

where *r*_{m}, *r*_{n}, and *r*_{p} are the elements of the **R** vector that correspond to the pre-exponential and exponential parameters of *k*_{ij} and to φ_{l}, respectively. The ∂*F*/∂*k*_{ij} and ∂*F*/∂*q*_{l} quantities depend on the specific application, e.g., the maximum interval likelihood (Qin et al., 1996) or the maximum point likelihood of single-channel data (Qin et al., 2000a) or the maximum likelihood of macroscopic currents (Milescu et al., 2005). The other partial derivatives can be calculated as follows:

∂*k*_{ij}/∂*r*_{m} = *k*_{ij} and ∂*k*_{ij}/∂*r*_{n} = *k*_{ij} × *V*, (Eq. 52)

where *V* is the physical variable (e.g., membrane potential) multiplied by the exponential parameter. The ∂*q*_{l}/∂*r*_{p} partial derivative depends on the specific transformation between *q*_{l} and φ_{l}, as illustrated in Eq. 53 for the logarithm and identity transformations:

∂*q*_{l}/∂*r*_{p} = *q*_{l} or ∂*q*_{l}/∂*r*_{p} = 1. (Eq. 53)

Finally, the derivative of any *r*_{i} with respect to any free parameter *x*_{j} or slack variable *z*_{c} follows from the linear map between **R** and the free parameters:

∂*r*_{i}/∂*x*_{j} = *a*_{i,j}, ∂*r*_{i}/∂*z*_{c} = *m*^{+}_{i,c} × ∂*v*_{c}/∂*z*_{c} = −2 × *z*_{c} × *m*^{+}_{i,c}, (Eq. 54)

where *a*_{i,j} and *m*^{+}_{i,c} are elements of the **A** and **M**^{+} matrices, respectively. In the last two expressions of Eq. 54, the *c* subscript is the index of the inequality constraint relationship that uses the slack variable *z*_{c} (through **V**; Eq. 45).

Using all these quantities, the overall analytical derivative of *F* with respect to a free parameter *x*_{j} becomes:

∂*F*/∂*x*_{j} = Σ_{ij} ∂*F*/∂*k*_{ij} × *k*_{ij} × (*a*_{m,j} + *V* × *a*_{n,j}) + Σ_{l} ∂*F*/∂*q*_{l} × ∂*q*_{l}/∂*r*_{p} × *a*_{p,j}, (Eq. 55)

where the indices *m*, *n*, and *p* used for the *a* (and *m*^{+}) quantities are equal to the indices in the **R** vector that correspond to the pre-exponential and exponential parameters of *k*_{ij} and to φ_{l}, respectively. An analogous expression gives ∂*F*/∂*z*_{c}, with the *a* elements replaced by the −2 × *z*_{c} × *m*^{+} elements of Eq. 54.
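As a sanity check on the chain rule, the sketch below compares the analytical gradient with a central finite difference, for logarithm-transformed parameters *r* = **A** × **x** + **b** and a toy cost *F* = Σ *k*^{2} (the cost and all numbers are ours, chosen only for illustration):

```python
import numpy as np

# Chain rule check: dF/dx_j = sum_i (dF/dk_i) * k_i * a_{i,j},
# where k_i = exp(r_i) and r = A @ x + b (log-transformed parameters).
A = np.array([[0.5, -0.2], [0.1, 0.7], [-0.3, 0.4]])
b = np.array([0.1, -0.2, 0.05])
x = np.array([0.3, -0.1])

def F(x):
    k = np.exp(A @ x + b)    # k_i = exp(r_i)
    return np.sum(k ** 2)    # toy stand-in for a real likelihood

k = np.exp(A @ x + b)
# Analytical gradient: dF/dk_i = 2 k_i, and dk_i/dr_i = k_i.
grad_analytic = A.T @ (2.0 * k * k)

# Central finite difference for comparison.
eps = 1e-6
grad_fd = np.array([(F(x + eps * e) - F(x - eps * e)) / (2 * eps)
                    for e in np.eye(2)])
assert np.allclose(grad_analytic, grad_fd, rtol=1e-5)
```

The same pattern extends to the slack variables, whose derivatives enter through the pseudoinverse elements rather than through **A**.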

#### Calculating the error of the estimates

When estimating the parameters of a model, it is important to have a measure of confidence in those estimates. The variance of a free parameter estimate is related to the curvature of the cost function with respect to that parameter. Intuitively, the variance tells us how much the calculated prediction of the model will change when the value of a free parameter is perturbed around its estimate: a sharply curved cost function indicates a well-determined parameter (small variance), whereas a nearly flat cost function indicates a poorly determined one (large variance).

One could calculate the variance of a free parameter estimate, *Var*(*x*_{j}), directly from the curvature of the cost function at the solution point. However, the quantities of interest are the model parameters **K**, and some transformation must be applied to the variance. Thus, the variance of a model parameter, *Var*(*k*_{i}), can be calculated using the following approximation (Qin et al., 2000a; Milescu et al., 2005):

*Var*(*k*_{i}) ≈ Σ_{j} (∂*k*_{i}/∂*x*_{j})^{2} × *Var*(*x*_{j}). (Eq. 56)

To obtain the partial derivatives of the model parameters (*k*_{ij}, *a*_{k}, and *q*_{l}) with respect to the free parameters, we use the chain differentiation rule. For the logarithm-transformed rate constant parameters and multiplicative factors *a*_{k}, we have the following expressions:

∂*k*_{ij}/∂*x*_{j} = *k*_{ij} × ∂*r*_{ij}/∂*x*_{j} and ∂*a*_{k}/∂*x*_{j} = *a*_{k} × ∂*r*_{k}/∂*x*_{j}. (Eq. 57)

For external parameters *q*_{l}, the expression depends on the transformation function. For the logarithm and the identity function, we have the following expressions:

∂*q*_{l}/∂*x*_{j} = *q*_{l} × ∂*r*_{l}/∂*x*_{j} and ∂*q*_{l}/∂*x*_{j} = ∂*r*_{l}/∂*x*_{j}, (Eq. 58)

respectively, where each ∂*r*/∂*x*_{j} is the derivative of the corresponding transformed parameter *r* (*r*_{ij}, *r*_{k}, or *r*_{l}) with respect to a free parameter in **X** or a slack variable in **Z**, as given by Eq. 54.
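The delta-method approximation described above can be sketched as follows for logarithm-transformed parameters (all numbers are hypothetical; in practice, *Var*(*x*_{j}) would come from the curvature of the cost function at the solution):

```python
import numpy as np

# Propagate the variance of the free parameter estimates to a
# log-transformed model parameter k_i = exp(r_i), with r = A @ x + b,
# assuming uncorrelated free parameters (toy numbers, ours).
A = np.array([[0.6, -0.8], [0.8, 0.6]])   # dr_i/dx_j = a_{i,j}
b = np.array([0.0, 0.5])
x_hat = np.array([1.0, -0.5])
var_x = np.array([0.01, 0.04])            # Var(x_j) from the optimizer

k_hat = np.exp(A @ x_hat + b)
dk_dx = k_hat[:, None] * A                # dk_i/dx_j = k_i * a_{i,j}
var_k = (dk_dx ** 2) @ var_x              # Var(k_i) ~ sum_j (dk_i/dx_j)^2 Var(x_j)

# Hand check for the first parameter: k_0 = exp(1), so
# Var(k_0) = exp(2) * (0.6^2 * 0.01 + 0.8^2 * 0.04).
assert np.isclose(var_k[0], np.exp(2) * 0.0292)
```

If the free parameter estimates are correlated, the diagonal variance vector would be replaced by the full covariance matrix in the same expression.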

## Discussion

Enforcing prior knowledge when fitting new data is not trivial, and one reason is that prior knowledge may take different forms. For example, it can be a linear mathematical relationship between the rate constants of two sequential ligand-binding transitions, or it can describe the dynamics of the channel during complex episodes of action-potential firing. The first example could be easily handled through model parameterization: the independent parameters are identified by the user and passed on to a search engine, whereas the remaining (dependent) parameters are simply derived from the first set, whenever necessary. However, a more elegant and flexible solution, in our opinion, is the method of reduction (Fletcher, 2013), first applied to kinetic modeling algorithms some 20 years ago (Qin et al., 1996, 2000a; Milescu et al., 2005). Even so, the reduction method, despite its reach, is not a universal solution for enforcing prior knowledge: powerful as it is, it is limited to constraints that can be formulated as explicit linear equality relationships between model parameters. Thus, it cannot handle inequalities, nonlinear relationships, or implicit constraints that describe a model property or behavior, such as the maximum open-state occupancy during an action potential.

In this two-part study, we proposed a comprehensive set of mathematical and computational tools that address all these limitations and greatly expand the range of prior knowledge that can be enforced. First, as described in this article, we enhanced the reduction method to handle both equality and inequality linear parameter constraints. Furthermore, we expanded the range of parameters that can be constrained to include not only rate constant parameters but also allosteric and other similar factors and external parameters that describe the data or the experiment. Any relationship between these parameters can now be enforced, as long as it is linear. Second, as described in the companion article (Navarro et al., 2018), any other types of model constraints, such as range constraints, nonlinear parameter relationships, or model properties and behavior, are handled by applying a penalty to the cost function. Together, the reduction method and the penalty method can handle virtually any type of model constraint that is likely to be encountered in the field.

The constraining methods described here and in the companion article are available in the freely distributed QuB software, as maintained by our laboratory. They are also easy to implement by interested readers: the only high-level mathematical operation involved is the singular value decomposition, which is readily available in many free linear algebra packages. As illustrated in Fig. 3, the code can be implemented as a pair of functions: one that converts a set of interdependent model parameters into a set of free parameters, and a second that performs the reverse operation. The first function is called only once, when the optimization is started, to initialize the free parameters from the model parameters. Optimization packages typically provide a user-customizable callback function that is invoked each time the search engine needs the cost function evaluated for a given set of parameters; the function that converts free parameters into model parameters can be inserted at the beginning of this callback. For interested users, a step-by-step numerical example is given in the companion article.
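As an illustration of this wiring (not the QuB implementation itself), the sketch below couples the two conversion functions to a generic optimizer, here `scipy.optimize.minimize`. The toy model, with two rates constrained by ln *k*_{0} + ln *k*_{1} = 0 (i.e., *k*_{0} × *k*_{1} = 1) and a quadratic cost, is ours:

```python
import numpy as np
from scipy.optimize import minimize

# Constraint on the log-transformed rates: ln k0 + ln k1 = 0.
M = np.array([[1.0, 1.0]])
V = np.array([0.0])
A = np.linalg.svd(M)[2][1:, :].T     # null-space column of V_M
B = np.linalg.pinv(M) @ V            # here a zero vector

def model_to_free(K):                # called once, at startup
    return A.T @ np.log(K)

def free_to_model(X):                # called at every cost evaluation
    return np.exp(A @ X + B)

def cost(X):
    # The callback converts free parameters to model parameters first.
    k = free_to_model(X)
    return (k[0] - 4.0) ** 2 + (k[1] - 0.25) ** 2

X0 = model_to_free(np.array([1.0, 1.0]))   # initialize free parameters
res = minimize(cost, X0)                   # search in free-parameter space
k = free_to_model(res.x)

assert np.isclose(k[0] * k[1], 1.0)   # constraint preserved at every step
assert res.fun < 1e-5                 # optimum k = (4, 0.25) reached
```

Note that the constraint holds for every candidate the optimizer proposes, not just at the solution, which is the main practical benefit of the reduction approach.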

## Acknowledgments

We thank the members of the Milescu laboratories for their constructive comments and suggestions. This work was supported by American Heart Association grants 13SDG16990083 to L.S. Milescu and 13SDG14570024 to M. Milescu and a training grant fellowship from the Graduate Assistance in Areas of National Need Initiative/Department of Education to M.A. Navarro.

The authors declare no competing financial interests.

Author contributions: L.S. Milescu developed the mathematical and computational algorithms and implemented them in software. All authors contributed to designing and testing the algorithms and software and to writing the manuscript.

Richard W. Aldrich served as editor.

- Submitted: 28 September 2017
- Accepted: 6 December 2017

This article is distributed under the terms of an Attribution–Noncommercial–Share Alike–No Mirror Sites license for the first six months after the publication date (see http://www.rupress.org/terms/). After six months it is available under a Creative Commons License (Attribution–Noncommercial–Share Alike 4.0 International license, as described at https://creativecommons.org/licenses/by-nc-sa/4.0/).