## Abstract

Kinetic mechanisms predict how ion channels and other proteins function at the molecular and cellular levels. Ideally, a kinetic model should explain new data but also be consistent with existing knowledge. In this two-part study, we present a mathematical and computational formalism that can be used to enforce prior knowledge into kinetic models using constraints. Here, we focus on constraints that quantify the behavior of the model under certain conditions, and on constraints that enforce arbitrary parameter relationships. The penalty-based optimization mechanism described here can be used to enforce virtually any model property or behavior, including those that cannot be easily expressed through mathematical relationships. Examples include maximum open probability, use-dependent availability, and nonlinear parameter relationships. We use a simple kinetic mechanism to test multiple sets of constraints that implement linear parameter relationships and arbitrary model properties and behaviors, and we provide numerical examples. This work complements and extends the companion article, where we show how to enforce explicit linear parameter relationships. By incorporating more knowledge into the parameter estimation procedure, it is possible to obtain more realistic and robust models with greater predictive power.

## Introduction

To understand how ion channels and other proteins function at the molecular and cellular levels, one must decrypt their kinetic mechanism, defined as a set of interconvertible structural conformations, with transitions quantified by rate constants that depend on external variables (e.g., membrane potential, ligand concentration, etc.). Modeling molecular kinetics is not trivial, but sophisticated algorithms have been developed that can extract the rate constants for a given model from a variety of experimental data types, such as single-channel or whole-cell voltage-clamp currents (Colquhoun and Hawkes, 1982; Colquhoun and Sigworth, 1995; Qin et al., 1996, 2000; Venkataramanan and Sigworth, 2002; Colquhoun et al., 2003; Milescu et al., 2005; Csanády, 2006; Stepanyuk et al., 2011, 2014), single-molecule fluorescence (Weiss, 2000; Milescu et al., 2006a,b; Liu et al., 2010), or even current-clamp recordings (Milescu et al., 2008). Automated algorithms that can identify the model itself have also been attempted (Gurkiewicz and Korngreen, 2007; Menon et al., 2009). This abundance of data and analysis algorithms is great, but it raises an important issue: how do we make sure that a model is consistent with all these data, new and old? In other words, how do we extract a model that explains new experimental data but also satisfies existing knowledge?

In the first part of this study (see Salari et al. in this issue), we discussed the general principles of enforcing prior knowledge using model constraints. We identified two main types: parameter constraints and behavioral constraints. Parameter constraints represent explicit mathematical relationships between model parameters, which include the pre-exponential and exponential rate constant factors, allosteric and other multiplicative factors, and any external variables that describe the experimental data and the recording conditions. In part one, we presented a unified mechanism that handles both equality and inequality linear parameter constraints, using relatively simple linear algebra methods that convert the interdependent parameters of the model into a reduced set of independent (“free”) parameters that can be passed to the optimization engine. Linear relationships can implement a surprisingly broad range of constraints, particularly after some model parameters are first transformed by the logarithm function. For example, one can scale one rate to another, parameterize allosteric relationships, enforce microscopic reversibility, restrict a parameter to a range of values, etc. Nevertheless, linear parameter constraints are not a universal solution.

We present here a complementary modality for enforcing prior knowledge, which can be used to enforce any type of model behavior, as well as arbitrary parameter relationships. For example, one can fit stationary single-channel data using a maximum likelihood method but simultaneously use constraints to enforce voltage dependence or other nonstationary behavior, as obtained from other types of data or from the literature. The basic idea is to calculate for each applied behavioral constraint a penalty that represents the degree by which the model deviates from that constraint. The penalty is then added to the cost function that measures the error between the data and the prediction of the model. As the optimizer minimizes the cost function in search for an optimal solution, it will generate a model that will not only fit the data but will also satisfy all the prior knowledge.

We illustrate here the two types of model constraints (linear parameter constraints and behavioral constraints) and test their respective computational procedures with a simple ion channel modeling example. First, we simulate stochastic macroscopic data in response to a typical voltage-clamp protocol. Then, we fit these data while enforcing different combinations of model constraints. The calculations are explained step by step, and detailed numerical examples are given. All computational procedures were implemented in our freely available software (Milescu, 2015).

## Materials and methods

All the mathematical and computational algorithms described in this study, as well as the simulation, data processing, and model optimization, were implemented, tested, and performed with the freely available MLab edition of the QuB program, running under the Microsoft Windows operating system.

### Model parameters

To simulate the test data, the model shown in Fig. 2 A was tweaked by hand to generate macroscopic currents resembling voltage-dependent sodium currents (Fig. 3). The simulated data were fitted in multiple runs, with different sets of constraints applied to the model (Fig. 2 B). The model parameter values (true, initial, and estimated) are listed in Table 1.

### Stochastic simulations

Ion channel macroscopic traces were simulated stochastically under the voltage-clamp paradigm, using established procedures (Milescu et al., 2005). To approximate the properties of sodium currents, the single-channel conductance was 10 pS and the reversal potential was +60 mV. Random Gaussian noise with zero average and 5-pA standard deviation was added to each trace to approximate whole-cell recording noise. To generate the activation/inactivation time course (Fig. 3 A) and activation/availability steady-state curves (Fig. 3 B), we used a typical voltage-clamp protocol: the channels were first equilibrated at −120 mV and then subjected to a 200-ms conditioning step at potentials ranging from −120 mV to +40 mV, followed by a 50-ms test step at 0 mV. The peak current from each conditioning step was extracted and converted to conductance (assuming a linear relationship), and the obtained values were used to construct the activation curve. Similarly, the peak current from the test step was extracted and used to construct the availability curve. Together, the currents evoked during the first 5 ms of the conditioning step in the −50 mV to +40 mV range (Fig. 3 A) and the activation and availability curves (Fig. 3 B) were used for model optimization.

### Model optimization

The algorithms were tested by fitting the data shown in Fig. 3. Optimization trials were run on a dual eight-core 3.3 GHz Intel Xeon processor computer, running Windows 7 (64 bit). Each optimization run took less than 20 min to complete. The model was optimized by minimizing the cost function with a modified version of the Davidon–Fletcher–Powell optimizer (dfpmin; Fletcher and Powell, 1963; Press et al., 1992). For efficiency, the cost function was coded for parallel computation. The cost function was calculated as the sum of square errors between the data and the prediction of the model, normalized to the total number of points, plus a penalty term for those optimization trials involving a behavioral constraint, as detailed in Results. The gradients of the cost function with respect to the free parameters were calculated numerically. The prediction of the model for a given set of parameters was obtained by simulating the deterministic response of the model to the same stimulation protocols as used for simulation. Then, the resulting traces were processed to extract the time course and the activation and availability curves, following the same procedure as for the simulated test data (Milescu et al., 2010; Salari et al., 2016).

## Results

### Implementing prior knowledge with model constraints

A theoretical background was given in part one of this study (Salari et al., 2018), where we briefly introduced a few prerequisite topics (kinetic mechanisms, model topology, and parameter estimation). Then, we presented a mathematical formalism and computational procedure for enforcing linear parameter constraints. Here, in part two, we continue with a mechanism for enforcing model behavior and arbitrary parameter relationships. Then, we give a step-by-step numerical example that illustrates the implementation of all types of constraints. Because we will make multiple references to the first part, the equations introduced here are numbered in continuation of those in part one. The mathematical symbols that are not explicitly defined here have been introduced in part one.

#### Behavioral constraints and arbitrary parameter relationships

A good amount of prior knowledge about the channel can be expressed as linear relationships between model parameters, resulting in constraints that can be handled with relatively straightforward linear algebra methods. However, some channel behaviors and properties cannot be easily formulated as explicit functions of model parameters or they need nonlinear functions that are not so easily tractable. For voltage-gated channels, examples of important functional behavior include the open probability (*P*_{O}), the voltage dependence of activation or inactivation, or the use-dependent availability. These properties cannot be easily formulated as functions of rate constants, except for very simple kinetic mechanisms. Furthermore, they must be prescribed in the context of a specific experiment (e.g., a voltage-clamp step protocol).

*Expanding the cost function*

Without explicit parameter relationships, we cannot solve behavioral constraints simply by converting model parameters into free parameters, as we did for linear parameter constraints. Likewise, we cannot use that formalism to solve any parameter constraint that cannot be written as a linear relationship between the transformed model parameters, as captured by the generalized linear constraint Eqs. 32 and 33 (Salari et al., 2018). Instead, the solution we propose here for handling behavioral constraints and arbitrary parameter relationships is to include them into the cost function. Thus, the cost function *F*, which is minimized by the optimizer in search for an optimal solution, can be expanded to include multiple components, one for each set of experimental data and one for each constraint:*i* and *j*. The α quantities are relative weighting factors that multiply the cost function components. Including multiple components in the cost function is known in the optimization literature as multi-objective fitting (Druckmann et al., 2007; Bandyopadhyay and Saha, 2013; Fletcher, 2013). For example, *P*_{O}). The cost function components that denote constraints should be formulated in such a way that they take a value of zero when the underlying constraint is satisfied, and a very large value (relative to the data cost components) when the constraint fails, as explained further.

*Formulating behavioral constraints and arbitrary parameter relationships*

Some behavioral constraints can be formulated as mathematical relationships involving simple properties of the channel. For example, we could constrain the maximum open probability reached during a depolarization step to take certain values or to fall within a range:*a* and *b* could stand for the δ_{ij} and *z*_{ij} parameters in Eq. 2 (Salari et al., 2018) and *C* is a constant equal to *F*/(*R* × *T*). Likewise, this equation cannot be handled with the formalism from part one because it is nonlinear.

Algebraically, any equality or inequality relationship can be converted to "=0" or "≥0," respectively. Thus, the above constraints could be rewritten as follows:*F*^{C} that correspond to the above equality and inequality relationships can be formulated as follows:

*Nonparametric behavioral constraints*

In principle, the same logic can be applied to any other model property. However, some model behaviors cannot be reduced to a single value or cannot be easily calculated theoretically. For example, many functional aspects, such as the recovery from inactivation or the use dependence, can be empirically fitted by one or two exponentials. Unfortunately, these apparent time constants cannot be directly and easily calculated from the model, which actually predicts a larger number of exponentials, equal to the state count minus one (Milescu et al., 2005; Salari et al., 2016). Likewise, the voltage-dependent activation curve can be well approximated and fitted by a Boltzmann equation with only two parameters, but calculating the half-activation and the sensitivity values directly from the model is generally not practical.

In cases like these, it is simpler to simulate the response of the channel to the same stimulation protocol as was used to obtain the experimental (or hypothesized) data. Then, a cost function component can be calculated as the sum of square differences between the simulated and the experimental data:*y*_{i} and *x*_{i} are experimental and simulated data points, respectively, and *N* is the number of data points. In the above equation, one could use the raw data directly, point by point, or one could extract some properties from the raw data and use the points on that property curve. For example, when the stimulation protocol is designed to extract the time course of a macroscopic current, one would fit the raw data directly. In contrast, when the stimulation protocol is designed to extract a behavior, such as the recovery from inactivation, one would fit the property curve. Although extracting a property curve involves additional computation, it has the substantial benefit of concentrating the information on a very specific aspect of channel behavior. For example, in a curve that represents the recovery from inactivation, every data point informs directly on the apparent time constants of inactivation. Likewise, every data point in a voltage-dependent activation curve informs directly on the two parameters of the Boltzmann equation.

Whether the cost function for these nonparametric behavioral constraints is calculated from raw data or from property curves or is based on hypothetical values, one must consider the presence of random noise and other artifacts that contaminate the experimental data. Thus, even a perfect model would not generate zero cost for the constraints, which may confuse the optimization engine. A simple solution is to reformulate the problem as an inequality:

#### A computational framework for solving behavioral constraints and arbitrary parameter relationships

We presented above some examples of behavioral constraints and arbitrary parameter relationships. In general, the problem we must solve is to find a model that not only best explains the experimental data but also satisfies a set of equality or inequality constraints. Mathematically, the problem can be formulated as the minimization of a function subject to a set of nonlinear constraints:*N*_{E} equality and *N*_{I} inequality constraints, respectively. In the case of maximum likelihood methods, instead of maximizing the log-likelihood, one can equivalently minimize its negative.

As discussed in the previous section, one possible solution to this constrained function minimization is to add the constraints to the cost function (Eq. 60). This approach is equivalent to the method of penalties (Fletcher, 2013), which reformulates the problem as an unconstrained optimization, by adding a penalty term to the cost function *h*_{i} and *g*_{j} expressions that correspond to any of the equality and inequality constraints above is straightforward. For example, the first two *P*_{O} constraints given above (Eq. 64) become:*h*_{i} and *g*_{j} with respect to a free parameter

The main advantage of the penalty method is that it can be used with any optimization algorithm that was originally designed for nonconstrained problems. The main issue, however, is the choice of the penalty parameter α. On the one hand, if α is too small, then the solution found by the optimizer will be pulled toward *n*-dimensional parameter space. Thus, although it is conceptually and computationally very simple, using the penalized cost function is not exactly a plug-and-play solution, as in the case of linear constraints.

A possible strategy is to find the solution iteratively, starting with a relatively small *α*, and increasing it until some convergence criteria are satisfied (Himmelblau, 1972). This is the approach we are taking here, as summarized in Fig. 1. Once a model topology is chosen, the workflow starts with defining the linear parameter constraints and the behavioral constraints (if any), including any other arbitrary parameter constraints. The next step is to define the cost function and the penalized cost function, according to the specific application (e.g., macroscopic fitting, single-channel maximum likelihood, etc.). Then, we choose a set of model parameters as the starting point, **K**_{0}. Mathematically, these parameters do not need to satisfy either set of constraints (parameter or behavioral), but starting as close as possible is recommended. From the initial model parameters **K**_{0}, we then calculate the initial set of free parameters, _{0}, equal to a small positive number. In practice, this value can be chosen so as to make the data and the penalty components of the overall cost function be approximately equal.

Once these quantities are defined and initialized, we start the optimization procedure, which involves two embedded loops, as shown in Fig. 1. An outer loop, indexed by *p*, handles the schedule for updating the penalty parameter α_{p} that is used to calculate the penalized cost function *k*, handles model optimization for a given α_{p}. The penalty parameter α_{p} is progressively increased at each outer loop iteration, to increase the relative weight of the penalty component in _{p} is increased. The outer loop can be run for a predefined number of iterations or can be terminated when the behavioral constraints are satisfied, if at all possible.

In principle, any type of optimization engine can be used in the inner loop. As explained, the optimizer is completely model and penalty blind. Essentially, the optimizer solves an unconstrained minimization problem, operating with a set of free parameters **R**_{k} are calculated from **K**_{k} are calculated from **R**_{k}, as outlined in Fig. 3 of the companion article (Salari et al., 2018). The model optimization in the inner loop can be run for a predefined number of iterations or can be terminated when some convergence criteria are satisfied. Typically, convergence requires that there be no substantial changes in the free parameter values and in the cost function (and its gradients be close to zero) from one iteration to the next.

### Testing the algorithm

To clarify the computational procedures described in both parts of this study, we give a step-by-step numerical example. For illustration purposes, we chose the model shown in Fig. 2 A, which is complex enough to accommodate an allosteric factor (Fig. 2 A, *a*_{1}), an external parameter representing the number of active channels in the recording (*N*_{C}), and several parameter and behavioral constraints. At the same time, the model is small enough to allow us to print the vectors and matrices used in the numerical computation. Readers who wish to implement their own code can use these examples for verification. Briefly, we tested the algorithms by fitting a stochastically simulated set of macroscopic data, generated in response to a typical voltage-clamp step protocol. We intentionally chose a relatively small dataset (the time course of activation and inactivation at different voltages and the voltage-dependent steady-state activation and inactivation curves, as shown in Fig. 3) to illustrate potential parameter identifiability issues and the effect of constraints. The data were fitted in multiple runs, with each run enforcing a different set of constraints, as outlined in Fig. 2 B. The simulation, data analysis, and fitting procedures are explained in Materials and methods.

We define the following set of model parameters **K**:*a*_{1} is the allosteric factor and *q*_{1} is the channel count. Thus, we have a total of 14 model parameters, with numerical values given in Table 1. The corresponding vector of transformed model parameters **R** is:

#### Applying linear parameter constraints

The test model has allosteric relationships that require two sets of linear parameter constraints. The first set applies to the forward transitions C_{1} to C_{2} and C_{2} to C_{3}:_{3} to C_{2} and C_{2} to C_{1} have a similar allosteric relationship, which results in another set of two equality relationships:**R**:*V* = 0. However, the second equation is not sufficient to enforce the allosteric relationships at any arbitrary voltage. To do so, we must add either one of the following two equations:

Another assumption that we made about our test model is that the O_{3} to I_{4} transition has the same voltage sensitivity as the C_{2} to O_{3} transition. This results in one mathematical relationship:

The final assumptions we made involve inequality constraints. Thus, we constrained the rate of recovery from inactivation (I_{4} to O_{3}) to have negative voltage dependence and the deactivation rates (O_{3} to C_{2} and C_{2} to C_{1}) to have a voltage sensitivity greater than −0.15 mV^{−1}:*z*_{1} and *z*_{2}, and write two equality relationships:

We summarize here all the constraint equations:

#### Linear algebra calculations

We can now formulate the constraint matrix **M** and vector **V**, as in Eq. 37 (Salari et al., 2018):

As **M** contains only constant values, it can now be decomposed with the singular value decomposition technique into three matrices, as in Eq. 40 (Salari et al., 2018):

From **V**_{M}, we can now obtain the **A** matrix, as shown in Eq. 41 (Salari et al., 2018):**A**^{−1} matrix is simply obtained by transposing **A**, and we do not show it here. To obtain the **B** vector, we must first calculate the pseudoinverse of **M**, **M**^{+}, as shown in Eq. 43 (Salari et al., 2018). First, we calculate the pseudoinverse of **S**_{M}, **S**_{M}^{+}:**S**_{M}^{+}, we can calculate **M**^{+}:**M**^{+} matrix can now be used to calculate the **B** vector, as in Eq. 42 (Salari et al., 2018). However, when the model contains inequality constraints, the **V** vector will contain elements that depend on the slack variables *z*_{1} and *z*_{2}. During optimization, the slack variables are changed freely by the parameter estimation engine. However, at the beginning of the optimization, they must be initialized by solving their corresponding constraint equation. In this case, *z*_{1} is initialized as follows:*z*_{2} is initialized as:*z*_{1} and *z*_{2} values, we can now calculate the initial **V** and **B** vectors:**X** and **Z** vectors (Eq. 44; Salari et al., 2018). **Z** contains the slack variables, which are initialized as shown above, whereas **X** is initialized from the initial set of model parameters **R**_{0}, using Eq. 39 (Salari et al., 2018). Altogether, the initial free parameter values are:**R** are calculated from the free parameters **K** are calculated from **R**.

#### Applying arbitrary parameter constraints and behavioral constraints

In addition to linear parameter constraints, we also tested a few simple but useful constraints that cannot be implemented with the linear algebra formalism (Salari et al., 2018). First, we tested an arbitrary parameter constraint that restricts the channel count *N*_{C} to a range of values. The test data were simulated with *N*_{C} = 5,000. However, to test the algorithms under more realistic conditions, we enforced a range of values away from the true value (6,000–8,000). The same strategy was used with all the behavioral constraints introduced next. The constraint and the corresponding cost function component are the following:_{1} and β_{2} are numerical factors with the following properties:

The second is a behavioral constraint that enforces the maximum open probability reached during a brief depolarization step from −120 to 0 mV, as illustrated in Fig. 2 B (run IV). With the true parameter values, the model predicts a maximum *P*_{O} of ∼0.42, but we constrained it to 0.5. The constraint equation and the corresponding cost function component are the following:*τ*_{R}, we can calculate the corresponding *t* and use that *t* = 50 ms, at −80 mV, but we constrained it to 0.8. The constraint equation and the corresponding cost function component are the following:

#### Optimizing the model

We illustrate the performance of the algorithms with six optimization runs, each implementing a different set of constraints, as described in Fig. 2 B. Together, these examples test the full range of constraints that the algorithms are designed to handle, as they are likely to occur in practical modeling applications: linear equality and inequality parameter constraints and model behavior and properties. Furthermore, we test all types of model parameters, as defined in the companion paper: rate constant parameters, multiplicative factors (*a*_{1}), and external parameters (*N*_{C}). The true parameter values, as well as the initial and the estimated values obtained in each optimization run, are given in Table 1.

In run I, we enforced only equality linear parameter constraints (Eq. 83). The cost function that was minimized by the optimizer had the following expression:*N*_{V} is the number of traces, *N*_{t} is the number of samples in each trace, *y*_{V,t} and *I*_{V,t} are the data point and the predicted current, respectively, at voltage *V* and time *t*, and *y*_{peak} is the largest negative peak current in the entire dataset. With these normalizations, all three data cost components take comparable values. We have not done it here but, in principle, one should further normalize the data to account for potentially different levels of noise, such as between the time course traces and the activation and availability curves. One possibility would be to multiply each cost component by a factor inversely proportional to its normalized variance, to ensure that less noisy datasets will be fitted more tightly by the model. This variance can be approximated through fitting each dataset with an appropriate mathematical function (e.g., a sum of exponentials for the time course data and a Boltzmann for the activation and inactivation curves).

In run II, we used the same conditions as for run I, but we added the inequality linear parameter constraints (Eqs. 90 and 91). In runs III through VI, we applied the same linear parameter constraints as in run II, but in each of these runs, we added different constraints that were implemented via the penalty mechanism: an arbitrary parameter constraint that restricts *N*_{C} to a range of values (run III) and behavioral constraints that enforce *P*_{O} (run IV), the recovered fraction *f*_{R} (run V), or both *P*_{O} and *f*_{R} simultaneously (run VI). In runs III through VI, the optimizer minimized a penalized cost function with the following expression:

The optimization results shown in Fig. 4 demonstrate the proper functioning of the algorithm with all types of constraints. To test the convergence of the optimizer, we intentionally chose starting parameters (Table 1) that generate prediction curves that deviate substantially from the data, as shown by the blue traces in Fig. 3. In all cases, the cost function virtually settled in ∼30 iterations (Fig. 4 A, left), after which most model parameters changed little (Fig. 4 B). For run I, the final parameter values are within ∼10% of the true values (Table 1), which is to be expected under these conditions (Milescu et al., 2005). For the other runs, the constraints push some of the parameters away from their true values, as intended. Although the final parameter values (Table 1) vary across the six runs, they all predict virtually identical fit curves, all represented by the red traces in Fig. 3 (A and B).

The effect of inequality linear parameter constraints can be observed by comparing runs I and II. In run I, the ^{−1}, finally converging to a slightly positive value, even though the true value is slightly negative (−0.05 mV^{−1}). The convergence to a positive value for

In runs III through VI, the cost function is replaced by a penalized cost function, which adds penalty components (Eq. 73). In all of these cases, the penalty function quickly drops by four or five orders of magnitude during the optimization (Fig. 4 A, right). In run III, where the penalty mechanism enforces a range of values for *N*_{C}, the penalty function occasionally drops to zero (Fig. 4 A, right, orange trace), whenever *N*_{C} is within the allowed range and the constraint is exactly satisfied (Fig. 5 A). Although the initial value of *N*_{C} (3,000) was outside the acceptable range, the optimizer quickly brought *N*_{C} within the range, in just a few iterations. We find it interesting that the convergence value of *N*_{C} does not lie on the edge of its allowed range (6,000), as close as possible to the convergence value found in run II (5,500). This suggests the existence of multiple solutions that predict identical fits.

In runs IV through VI, the penalty mechanism was used to enforce equality relationships for *P*_{O} and *f*_{R}. Like with *N*_{C} in run III, the initial values of *P*_{O} and *f*_{R} were quite different from their enforced values. However, a few iterations were sufficient to bring *P*_{O} or *f*_{R} close to their enforced values, as illustrated in Fig. 5 (B and C, green and magenta traces). In contrast to run III, the penalty function approaches a small value but does not reach zero (Fig. 4 A, right, green, blue, and magenta traces). Accordingly, the enforced quantities hover in a small neighborhood centered on their enforced values (Fig. 5, B and C). The size of this neighborhood depends on the numerical value of the penalty parameter α_{p}: the larger the α_{p}, the smaller the neighborhood. In principle, enforcing the penalty might require several cycles, where each cycle increases the value of α_{p}, as illustrated in Fig. 1, and tightens the constraint. However, for these relatively simple optimization examples, we initialized the penalty factor as α_{0} = 1, which enforced the constraints tightly enough in a single penalty cycle.

As expected, adding these constraints that push *P*_{O} and *f*_{R} away from their true values also results in slightly suboptimal fits in runs IV through VI, compared with runs I through III. Furthermore, these constraints expose correlations between properties of the model (*P*_{O} and *f*_{R}) and certain model parameters. Thus, *P*_{O} is inversely correlated with *N*_{C}. Without any constraint, *P*_{O} and *N*_{C} are estimated as ∼0.42 and 5,100, respectively. In contrast, when *P*_{O} is constrained (runs IV and VI), the *N*_{C} estimate is lowered to 4,000 (Fig. 5 A). Vice versa, when *N*_{C} is constrained to a larger value, the estimated parameters predict a lower *P*_{O} (Fig. 5 B). Likewise, *f*_{R} is correlated with the rate of recovery from inactivation (the I_{4} to O_{3} transition). Thus, enforcing *f*_{R} to a larger value (0.8) than the true value (0.43) results in a smaller estimate for

## Discussion

We have presented here a set of mathematical and computational tools that can be used to estimate kinetic mechanisms that explain new data but also satisfy user-defined prior knowledge. In part one of this study (Salari et al., 2018), we derived a procedure for enforcing explicit linear equality and inequality parameter relationships. Here, in part two, we introduced a procedure for enforcing arbitrary model properties and behaviors, as well as arbitrary parameter relationships. Together, these methods are capable of handling virtually all types of model constraints that are likely to arise in practical situations. To demonstrate our approach, we provided a step-by-step numerical example. Interested readers can use these examples to implement the constraint algorithms in their own software and to verify correctness. We also implemented these algorithms in the freely available QuB software, as maintained by our laboratory (Milescu, 2015).

### Compatibility with existing optimization frameworks

The procedures described here can be easily adapted into a typical optimization package. As illustrated by the workflow diagram in Fig. 1, only a few modifications would be required: adding a function that converts between free parameters and model parameters (

### Constrained fitting versus multiobjective fitting

There is a certain similarity between constrained fitting and simply including those data that underlie the constraints into a more comprehensive dataset to be fitted. The second approach is generally described as multiobjective fitting (Druckmann et al., 2007; Bandyopadhyay and Saha, 2013). Although it is not a substitute for the reduction method that is used to enforce linear parameter constraints, it could be a substitute for the penalty method. As the name implies, in this case the optimizer would need to find a solution that satisfies multiple objectives (i.e., datasets). This is conceptually equivalent to constrained fitting, but there is also one important difference: in multiobjective fitting, the optimal solution found by the search engine may actually explain poorly each and all of the individual datasets, as long as it is the best overall compromise. Moreover, to find this compromise solution, one must choose a set of weighting factors that encode how much each dataset is worth to the model, which is not trivial.

In contrast, the constraining mechanism described in this study will give the highest priority to the constraints and satisfy them exactly (the linear parameter constraints, via the reduction method) or at least very closely (all other constraints, via the penalty method). Only after the constraints are satisfied will the model adapt to explain the data (in as much as it is possible). Nevertheless, as we explained in the paper, a certain margin of error can be built into the constraints to accommodate noise and potential artifacts, but the constraints will stay tightly within this margin. Then, one advantage to the constraint approach is that one can more easily detect when a model is incompatible with the data. Furthermore, one could also detect inconsistent knowledge, as signaled by incompatible constraints.

### Model behavior: To enforce or not?

The need for enforcing explicit parameter relationships is obvious, if only to consider microscopic reversibility or the ratio of sequential activation rates. However, it may be less clear to the reader why model behavior and properties need to be enforced. Why not derive them directly from the data? After all, once model parameters are estimated, they can be used to predict any model property or behavior. The problem resides in the potential lack of model and parameter identifiability. In an ideal case, the model would be uniquely identifiable, which means that no other topology exists that can explain the data equally well (Kienker, 1989; Bruno et al., 2005). Furthermore, the data would be noise and artifact free and the model parameters would be fully identifiable, which means that the model admits a unique solution and the optimizer is able to find it from the data. If this were the case, then it would make little sense to enforce a model behavior or property except to test the sensitivity of the parameters with respect to that behavior.

In reality, however, the true model may never be known, and the working model may be just one out of many equivalent topologies. Furthermore, the parameters may not be fully identifiable, either because the model admits multiple solutions (theoretical parameter identifiability) or because the data are corrupted by noise and artifacts that flatten the cost function surface (practical parameter identifiability; Milescu et al., 2005; Raue et al., 2009; Siekmann et al., 2012; Hines et al., 2014; Middendorf and Aldrich, 2017). Thus, estimating the kinetic mechanism from limited data may result in a parameter set that is just one out of many possible solutions and potentially one with poor predictive power.

This is actually the case with our numerical example: in all runs, the estimates obtained by the optimizer are close to the true values (Table 1), except when otherwise constrained (e.g., *N*_{C} in run III). However, the estimates differ across runs, even though the fits are virtually identical between runs and follow closely the data (Fig. 3, red lines). How can different sets of parameters produce the same solution? The explanation for this apparent contradiction is that the parameters are not uniquely identifiable given the reduced dataset. Clearly, adding more constraints or enforcing other model behaviors would improve parameter identifiability and would select only those parameter solutions that are compatible with that behavior. Furthermore, it would also improve model identifiability, making it easier to discover the correct model. Short of the true model, we would at least obtain more robust models, as nature always does.

## Acknowledgments

We thank the members of the Milescu laboratories for their constructive comments and suggestions.

This work was supported by the American Heart Association (grants 13SDG16990083 to L.S. Milescu and 13SDG14570024 to M. Milescu) and the Graduate Assistance in Areas of National Need Initiative/Department of Education (training grant fellowship to M.A. Navarro).

The authors declare no competing financial interests.

Author contributions: L.S. Milescu developed the mathematical and computational algorithms and implemented them in software. All authors contributed to designing and testing the algorithms and software and writing the manuscript.

Richard W. Aldrich served as editor.

- Submitted: 28 September 2017
- Accepted: 6 December 2017

This article is distributed under the terms of an Attribution–Noncommercial–Share Alike–No Mirror Sites license for the first six months after the publication date (see http://www.rupress.org/terms/). After six months it is available under a Creative Commons License (Attribution–Noncommercial–Share Alike 4.0 International license, as described at https://creativecommons.org/licenses/by-nc-sa/4.0/).