CONTRIBUTIONS TO MATHEMATICAL BIOLOGY
JACQUES NINIO
(included in the web site http://www.lps.ens.fr/~ninio)
TOPICS DISCUSSED HERE:
- general comments on mathematical biology
- probabilistic tools in enzyme kinetics
- RNA topology
- a link between vision and molecular biology
- appendix: on models in times of consensus
SEE IN OTHER SECTIONS
- algorithmic work (in bio-informatics section)
- the cube method in small-angle X-ray scattering (in section on RNA structure)
- the correspondence problem in stereoscopic vision (in stereo vision section)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++
"The smallest prime number greater than one"
uniquely identifies "three" (---)
Charles R. Gallistel, member of the National Academy of Sciences of the USA
and Rochel Gelman, page 45
in "Numerical cognition", edited by Stanislas Dehaene, Blackwell, 1993
++++++++++++++++++++++++++++++++++++++++++++++++++++++++
GENERAL COMMENTS
------- How to be useful ------
I do not belong to the community of mathematical biologists. I often meet students or researchers trained in mathematics, physics or informatics who tell me about their desire to apply their competence to a pertinent biological problem. Invariably, I tell them that the first motivation in a research work should be the interest of the subject itself. So they have to determine first which biological question is of the greatest interest to them, then work with humility in a laboratory in which the question is addressed. Later, at one time or another, the fact that their background differs from that of the other researchers in the field could inspire them to raise questions that their colleagues did not raise, let them see features or implications in the data that their colleagues did not notice, or allow them to devise new tools. But they should not put the tool before the subject.
Mathematicians or physicists often make a deep mistake when they try to "model" biological data. Actually, the published data are heavily contaminated by the implicit or explicit models of the time. The experimenter designs an experiment, selecting what he believes are the important questions and the pertinent procedures and controls. He decides to discard incomprehensible results and to focus on the results which make sense, with which he will be able to present a coherent story acceptable to the reviewers. In effect, he is constructing a piece of reality that fits his prejudices. Therefore, the mathematicians or the physicists do not model the data. They make, instead, a formal model of the experimenter's conceptual world. More on this topic in Appendix 1 below.
Some models, it is true, are made in complete ignorance of experimental publications, which is worse. As long as their authors stand alone in defending their models, the consequences are small. But when an army of modelers comes next and explores all the consequences of the initial model, still without seeking any contact with experimental reality, this becomes damaging to science. I witnessed such a situation develop for the first time in relation to Eigen's initial papers on the origins of life. But now, vacuous "virtual reality" science is becoming a common pathology in many disciplines.
------ The biologist's understanding of mathematics ------
Of course, much of the biological work performed today is not motivated by scientific questions; it is merely technology-driven research. "Using fashionable tools to address fashionable questions" could be the generic title for most grant applications. Accordingly, this type of work produces results devoid of real conceptual or practical interest.
Most biologists have an extremely poor understanding of mathematics, or even of logic:
- Confusion between a proposition and the reciprocal (converse) proposition. In logical thinking, the statement "A implies B" is not equivalent to the reciprocal statement "B implies A". Yet in biological writings, one often reads that, since A implies B, "it is likely" that A is true whenever B is observed.
- Deductive versus inductive reasoning. While deductive reasoning is valued in Latin, Slavic, Arabic or Indian cultures, it is rejected in the English-speaking biological tradition, which puts its faith, instead, in inductive reasoning (there are a few fortunate exceptions, among which is Crick's "wobble hypothesis" article, which uses deductive reasoning: Journal of Molecular Biology, vol. 19, pp. 548-555, 1966). Consequently, a perfectly rigorous piece of work, from the point of view of deductive reasoning, may appear to a native English-speaking biological reviewer as a piece of "complete and utter nonsense which has no place in a supposedly serious scientific article", a kind of comment my articles often receive. On the other hand, many highly valued theoretical papers (from people like Thomas Jukes, for instance) use inductive reasoning, and look to me like a dog's breakfast.
In practice, a theoretical work has a cluster of starting ideas a, b, c, d ... and a cluster of relevant experimental facts u, v, w, x .... In a deductive mode, one would work out the theoretical consequences of a, b, c, d ... and reach a theoretical proposal P. Then the whole would be confronted with the set of facts u, v, w, ..., and the author would determine how to modify the theory, or how to reevaluate the facts, to seek a better agreement between facts and theory. In inductive reasoning, one takes idea a and fact u and combines them into a hybrid idea i; then one takes ideas i and b, combines them with fact v, and derives the half-baked idea j, and so on. This is what most reviewers demand.
As a consequence of this state of affairs, I have had, in a number of cases, to hide from the readers the true heuristic line of reasoning which led me to a (valid) conclusion, and to present instead some stinking inductive substitute, more palatable to the reviewer.
Happily, native English-speaking mathematicians and physicists do use deductive reasoning. Otherwise, they would not survive in their discipline.
- Failure to see the threads. Mathematicians strive for elegant proofs, which go to the heart of things and can be expressed as concisely as possible. If a mathematician works out a proof that takes twenty pages of equations, then finds a way to go straight to the conclusion in three sentences without equations, he is proud of having made a big step forward. In physics, there is the art of "back of the envelope" calculations. Top physicists may reach important conclusions by doing mathematically elementary order-of-magnitude calculations. The simplicity of these calculations is deceptive: you need to really understand what you are doing to succeed. In mathematical and physical reasoning, there are threads: you know where you start from, you know where you are heading, and you know the succession of steps that will take you there. All this is foreign to most biologists. For most biologists, doing mathematics is writing down lots of equations, combining them to derive other equations, and continuing the mixing and extraction procedures until, at some point, the desired result miraculously emerges. So they are impressed by mathematical treatments which cover pages and pages, although these treatments may be totally uninspired and, even when correct, could be replaced with much shorter derivations.
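The first point of the list above, the confusion between a proposition and its reciprocal, can be checked mechanically. A minimal Python sketch (the helper name implies is ours) that tabulates material implication for every truth assignment and shows where "A implies B" and "B implies A" part company:

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: 'p implies q' is false only when p is true and q is false."""
    return (not p) or q


# Compare "A implies B" with its reciprocal "B implies A" on every truth assignment.
rows = []
for a, b in product([False, True], repeat=2):
    rows.append((a, b, implies(a, b), implies(b, a)))

for a, b, ab, ba in rows:
    print(f"A={a!s:5} B={b!s:5}  A=>B: {ab!s:5}  B=>A: {ba!s:5}")

# The two columns differ, so the reciprocal is not equivalent to the original statement.
differing = [(a, b) for a, b, ab, ba in rows if ab != ba]
print("assignments where they differ:", differing)
```

The row A=False, B=True is the biologist's situation: B is observed, "A implies B" holds, and yet A is false.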
---- contributions to biomathematics ----
Mathematics is the discipline I liked best in my studies. My favourite branches of mathematics were geometry, analytical geometry, algebra and probability. I did not like integrals and differential equations. Although I am not at the level required for doing mathematical research, and although my mathematical competence has declined over the years, I keep some lucidity in seeing the threads of a reasoning, and am quite perspicacious at detecting mistakes. While I work in a physics laboratory, I feel that I am more of a mathematician, or an engineer, than a physicist. My understanding of chemistry is extremely poor, and this has restricted my capacity to contribute to the question of the origins of life. Occasionally, I use mathematics in my work, or "back of the envelope" calculations. But once I get the biologically pertinent results, I stop there and do not try to push the mathematics further.
While I was doing my thesis work, Piotr Slonimski submitted to me a problem in the interpretation of pictures taken by electron microscopy. It was, I think, about taking series of slices from yeast cells and reconstructing their shape, assumed to be ellipsoidal, from the slices. I found within a week an elegant solution to the problem, equating (if I remember well) two expressions of the volume, one as a Riemann and one as a Lebesgue integral. But Slonimski lost interest in the problem, and the calculations remained in the drawers.
In my thesis work, I described a method (using Fourier transforms) for calculating the theoretical X-ray scattering curve of a molecule in solution [1,2]. The credit for the mathematical part goes to Luzzati (see sections on RNA structure and on bio-informatics).
At the time of the thesis work, I also had the ambition to develop a classification of all possible RNA structures, and planned a theoretical work in four parts, entitled "Properties of nucleic acid representations". Part 1 dealt with topological properties, and was published in 1971 [3] (see "RNA topology" below). Part 2 was about orientational properties (questions such as which side of the nucleotide is "up" in a structure, what makes a UA dinucleotide different from AU, and what the different kinds of triple-stranded associations are). This is partly reflected in my thesis [2], pages 57-64, but is otherwise unpublished. Having clear notions on these topics was, however, very useful in my subsequent bio-informatics work. Part 3 was supposed to deal with the metric properties of nucleic acids (for instance, in what environment could a G.U pair be present without introducing too much distortion?). Part 4 was supposed to deal with the influence of local sequence on the structure (e.g., what are the preferred sequences in stems or loops?). In practice, the work on the energy models for predicting RNA secondary structures [4, 5] fulfills part of the goal of Part 4.
When I started working on the kinetic theory of accuracy, I developed probabilistic methods to understand the outcome of the competition between cognate and non-cognate substrates [6 - 12] (see "probabilistic tools in enzyme kinetics" below). Later, I succeeded in calculating "absolute" rates in enzyme kinetics from first passage times and pathway probabilities [13]. I consider [13] my best contribution to mathematical biology.
I used my skills in analytical geometry to produce original stereoscopic images (see [14] and the forthcoming section on stereoscopic vision) and my understanding of projective geometry to deal with the correspondence problem in stereoscopic vision [15]. Having good geometrical intuition allowed me to produce several interesting pieces of work in visual perception, but since most workers in this domain do not understand geometry, this work has had little impact. The level of geometric illiteracy in the field is, in fact, amazing. People are not even conscious of their limitations. It is just that geometry is not part of their intellectual world. This and other pathologies which are damaging to the field of visual perception will be described elsewhere.
Last, there are curious bits of mathematical insight which allowed me to solve problems in quite different domains simultaneously (see "a link between vision and molecular biology" below).
+++++++++++++++++++++++++++++++++++++++++++++++++++++++
PROBABILISTIC TOOLS IN ENZYME KINETICS
------ probabilistic derivations ------
When I started thinking seriously about the accuracy of molecular processes, the conceptual tools were not yet available. The enzymologists were describing the mechanics of enzyme activity using a steady-state treatment, in which they lined up equations saying that the concentrations of all the intermediates in an enzyme reaction did not vary with time. The set of simultaneous equations was solved and led to the expression of the concentration of the products of the reaction as ratios between two rather complicated expressions. Nothing could be grasped intuitively in this approach.
So I started thinking in terms of probabilities. I imagined a molecule of enzyme, its random encounters with molecules of substrate, the variable time during which the enzyme and the substrate remained bound, the random occurrence of an action which pushed the initial enzyme-substrate complex to another state, etc. All this was strictly equivalent, mathematically, to the steady-state picture, but I was not aware of that. In fact, chemical kinetics is intrinsically probabilistic. Macroscopic descriptions, such as those embodied in the steady-state treatments, have their justification in the underlying probabilistic view of molecular processes.
Reasoning with probabilities (instead of applying the steady-state treatments) was very productive, and allowed me to derive all my important results on the accuracy of molecular processes [6-8, 11]. The mathematics were simple and intuitive. One reason the mathematics were simple is that, in accuracy problems, one deals with the competition between two or more substrates, and what matters is the ratio of the compounds of each kind which are formed. Using probabilities, most of the complexities are removed when taking the ratios. With the steady-state treatments, on the other hand, there is a whole set of additional equations for each competing substrate, and all the equations must be solved simultaneously. Despite the elegant simplicity of the probabilistic approach, most people in enzyme kinetics remained glued to the steady-state treatments, to KM's and kcat's and other horrors. And since they were not able, with their tools, to see the implications of the schemes they were studying, they used so-so semi-empirical arguments and often reached erroneous conclusions. The situation has not improved over the last thirty years, and in particular, the International Union of Enzymology is, with respect to this topic, still in the stone age. On the question of erroneous treatments, see the forthcoming sections on the kinetic theory of accuracy, and on error and fraud in science.
The biologists with whom I discuss these matters often consider probabilistic treatments as speculative, and do not invest the time to understand them. They think it is a romantic view of enzyme mechanisms, yet it is chiefly a return to first principles. The mathematics of probabilities are subtle, not complex. One has to accept paradoxes such as the infinite sum "p = 1/2 + 1/4 + 1/8 + 1/16 + ... = 1". More strikingly, one has to accept that p can be computed, very easily, from "p = 1/2 + p/2". Do you see why? All the power of the probabilistic treatment comes from the capacity to write such simple equations, with p on both sides, because one explores a tree of events and, on some branches, there is a return to earlier conditions.
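To show why the ratios are so simple, here is a minimal Python sketch of the probabilistic view for the simplest possible competition, assuming a hypothetical scheme in which each substrate (C right, W wrong) binds the enzyme and the complex then either goes forward to product (k_cat) or dissociates (k_off). All rate constants and function names are illustrative, not taken from the articles cited. The wrong/right product ratio is each substrate's binding flux times the probability that its complex goes forward; the time the enzyme spends free is common to both and cancels. A single-molecule simulation is included as a check.

```python
import random


def completion_probability(k_cat: float, k_off: float) -> float:
    """Probability that a bound complex goes forward (k_cat) rather than dissociating (k_off)."""
    return k_cat / (k_cat + k_off)


def analytic_error_ratio(conc_c, conc_w, k_on_c, k_on_w, k_cat, k_off_c, k_off_w):
    """Wrong/right product ratio: binding flux times completion probability, per substrate.

    The time the enzyme spends free is common to both substrates and cancels in the ratio.
    """
    flux_c = conc_c * k_on_c * completion_probability(k_cat, k_off_c)
    flux_w = conc_w * k_on_w * completion_probability(k_cat, k_off_w)
    return flux_w / flux_c


def simulated_error_ratio(conc_c, conc_w, k_on_c, k_on_w, k_cat, k_off_c, k_off_w,
                          n_products=20000, seed=0):
    """Follow one enzyme molecule through random encounters until n_products are formed."""
    rng = random.Random(seed)
    # Which substrate binds at the next encounter: probability proportional to concentration x k_on.
    p_bind_c = conc_c * k_on_c / (conc_c * k_on_c + conc_w * k_on_w)
    made = {"C": 0, "W": 0}
    while made["C"] + made["W"] < n_products:
        kind = "C" if rng.random() < p_bind_c else "W"
        k_off = k_off_c if kind == "C" else k_off_w
        if rng.random() < completion_probability(k_cat, k_off):
            made[kind] += 1
        # otherwise the complex dissociates and the enzyme is free again
    return made["W"] / made["C"]


params = dict(conc_c=1.0, conc_w=1.0, k_on_c=1.0, k_on_w=1.0,
              k_cat=1.0, k_off_c=1.0, k_off_w=99.0)
print(analytic_error_ratio(**params))   # 0.02
print(simulated_error_ratio(**params))  # close to 0.02
```

Note that no simultaneous equations are solved: adding a second competing substrate only adds one more flux term.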
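The answer to the question: with probability 1/2 the event succeeds at once, and with probability 1/2 one returns to the starting condition, from which the probability of eventual success is again p. Hence p = 1/2 + p/2, so p = 1, the same value as the infinite sum. A minimal Python sketch (function names are ours) comparing the term-by-term sum with the one-line solution of the general equation p = a + b·p, where a is the probability of immediate success and b the probability of returning to the start:

```python
from fractions import Fraction


def partial_sum(n_terms: int) -> Fraction:
    """1/2 + 1/4 + ... + 1/2**n_terms, computed exactly."""
    return sum(Fraction(1, 2**k) for k in range(1, n_terms + 1))


def solve_return_equation(a: Fraction, b: Fraction) -> Fraction:
    """Solve p = a + b*p: a = probability of immediate success,
    b = probability of returning to the starting condition (requires b < 1)."""
    return a / (1 - b)


print(partial_sum(10))  # 1023/1024: approaches 1 term by term
print(solve_return_equation(Fraction(1, 2), Fraction(1, 2)))  # 1: same answer in one step
# A tree with three branches: success 1/3, failure 1/6, return to start 1/2.
print(solve_return_equation(Fraction(1, 3), Fraction(1, 2)))  # 2/3
```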
I gave most of the (probabilistic) recipes to calculate error-rates for various enzyme mechanisms in my review article in the book on accuracy by Kirkwood, Rosenberger and Galas [12]. Simple, general expressions are given for several classes of mechanisms.
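As an illustration of the flavour of such recipes (not a transcription of the expressions in [12]), here is a minimal Python sketch assuming a hypothetical scheme in which the substrate must pass successive checkpoints, each with a forward rate and a rejection rate, where rejection returns the enzyme to the free state and binding rate constants are equal for both substrates. Under these assumptions the error ratio is the concentration ratio times the product, over checkpoints, of the ratios of pass probabilities. A second checkpoint with the same discrimination then roughly squares the error, which is the qualitative effect described as kinetic amplification [7].

```python
import math


def pass_probability(k_forward: float, k_reject: float) -> float:
    """Probability of passing one checkpoint: forward step versus rejection."""
    return k_forward / (k_forward + k_reject)


def error_ratio(conc_ratio, stages_c, stages_w):
    """Wrong/right product ratio for a scheme with successive checkpoints.

    stages_c, stages_w: lists of (k_forward, k_reject) pairs, one per checkpoint, for the
    right (C) and wrong (W) substrates. Rejection returns the enzyme to the free state,
    so each product pathway is weighted by the product of its pass probabilities.
    Binding rate constants are assumed equal for both substrates.
    """
    ratio = conc_ratio
    for (kf_c, kr_c), (kf_w, kr_w) in zip(stages_c, stages_w):
        ratio *= pass_probability(kf_w, kr_w) / pass_probability(kf_c, kr_c)
    return ratio


# Hypothetical rate constants: the wrong substrate is rejected 100 times faster at each checkpoint.
right, wrong = (0.01, 1.0), (0.01, 100.0)
one_step = error_ratio(1.0, [right], [wrong])
two_steps = error_ratio(1.0, [right, right], [wrong, wrong])
print(one_step, two_steps)                       # roughly 1e-2 and 1e-4
print(math.isclose(two_steps, one_step ** 2))    # True
```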
----- the peelback problem -------
The most advanced calculations were about the "peelback problem". Early in 1974, I had in my hands an issue of the Journal of Molecular Biology, containing an article by Goodman, Gore, Muzyczka and Bessman, which contained a theoretical treatment of the interplay between the polymerization and excision functions of the DNA polymerases. At that time I was well-advanced in the mathematics of accuracy, and felt immediately that something was wrong with their treatment. They had completely overlooked the "next nucleotide effect" (which I knew, but had not yet published) according to which, when a polymerase has a proofreading function, the errors made at one position increase with the concentration of the nucleotide to be incorporated next (see forthcoming sections on accuracy, and on DNA polymerase mechanisms). Furthermore, they had derived their theoretical equation by combining several equations which were mere definitions that did not involve any physical insight.
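Why proofreading produces a next-nucleotide effect can be seen in a deliberately crude Python sketch, assuming a hypothetical scheme: a mismatched 3' end is either excised at rate k_exo, or covered by incorporation of the next nucleotide at rate k_ext times that nucleotide's concentration, after which it is assumed to be out of reach of the exonuclease. The full peelback treatments discussed below relax this last assumption; the names and values here are illustrative only.

```python
def retained_error_fraction(next_conc: float, k_ext: float, k_exo: float) -> float:
    """Probability that a misinserted nucleotide is covered by the next one before being excised.

    Hypothetical scheme: the mismatched 3' end is excised at rate k_exo, or extended at
    rate k_ext * next_conc; once extended, it is assumed to be out of reach of the exonuclease.
    """
    extension_rate = k_ext * next_conc
    return extension_rate / (extension_rate + k_exo)


def error_rate(misinsertion_freq: float, next_conc: float, k_ext: float, k_exo: float) -> float:
    """Final error frequency at one position = initial misinsertion x fraction escaping proofreading."""
    return misinsertion_freq * retained_error_fraction(next_conc, k_ext, k_exo)


# The error at a given position rises with the concentration of the NEXT nucleotide.
for conc in (1.0, 10.0, 100.0):
    print(conc, error_rate(1e-4, conc, k_ext=0.1, k_exo=10.0))
```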
I rapidly found that their equation was in fact a concealed identity (something having the same status as a² - b² = (a+b)(a-b)). So this "theoretical treatment" was a fantastic mathematical blunder. And the error had escaped the attention of four authors, two reviewers and an editor! Although the equation was an identity (therefore satisfied by any set of results, real or imaginary), the authors managed to plot experimental points which were not exactly on the curve representing their equation, some of the points being above and others below the curve. They achieved this trick by plotting, not the individual results, but averages of the results. An identity which is verified by any arbitrary set of values need not be verified by separate averages of the values. This article was, therefore, an anthology piece of bio-mathematical blunders. Yet the authors never retracted it.
I solved the problem they had addressed: deriving the error-rate for a DNA polymerase having a proofreading activity, taking into account all the forward and backward steps that the polymerase could make on the template. The calculations were performed under the simplifying condition that the incorporation and excision kinetic constants at one position were independent of the nature of the nucleotide at the preceding position. The resulting equation was given, without proof, in the 1975 Biochimie paper [7]. The same equation was derived by Galas and Branscomb, and published in their J. Mol. Biol. 1978 paper. Later, I made a simpler treatment and extended the work to study the effect of pyrophosphorolysis [10]. The peelback problem was solved under increasingly general conditions by E.G. Malygin and L.N. Yashina (1980), then by Jean Durup (1982) (see references in [12]). These treatments have remained mathematical curiosities. The enzymologists prefer to use the erroneous treatments recommended to them by the International Union of Enzymology.
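The averaging trick can be reproduced with any identity. A minimal Python sketch with invented numbers (not the data of the article in question): define a derived quantity r = a/b for each experiment, so that a = r·b holds for every experiment whatever the data; then average a, r and b separately and compare.

```python
from fractions import Fraction
from statistics import mean

# Hypothetical measurements: for each experiment, two measured quantities a and b.
a = [Fraction(2), Fraction(2), Fraction(2)]
b = [Fraction(1), Fraction(2), Fraction(4)]

# A "derived" quantity defined from the same measurements...
r = [ai / bi for ai, bi in zip(a, b)]

# ...so the "equation" a = r * b is an identity: it holds for every experiment.
assert all(ai == ri * bi for ai, ri, bi in zip(a, r, b))

# Averaged separately, the three quantities no longer satisfy it,
# so averaged "points" can fall off the curve of an identity.
print(mean(a), mean(r) * mean(b))  # 2 versus 49/18
```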
--------- The Russian school --------
In my search for the most elegant expressions, I was inspired by work from the Russian school of chemical kinetics. I had access to this work owing to a visit Ernst Malygin made to Chapeville's laboratory in Paris. I saw that the Russians had developed a powerful tool to write down at once the equations corresponding to a kinetic scheme. So they published papers in Russian with ten different equations for ten different schemes, each equation given without comment, as though it were obvious. At the same time, authors from the Western bloc were publishing papers in Biochem. J. or J. Theoret. Biol., with pages of derivations, merely to deal with a single scheme. Knowing the elegant forms of the expressions helped me to derive my own expressions in [12]. On the other hand, I did not really understand the Russian method. It was introduced by M. I. Temkin in Doklad. Akad. Nauk. Phys. Chem. (1963) 152, 782-785. In this paper, the reasoning appears to be elementary, and there seems to be mainly a great talent for rearranging algebraic expressions and making them beautifully simple. This year (2004), I was having a discussion with my colleague Bernard Derrida of the Statistical Physics Laboratory at Ecole Normale Supérieure. When I showed him a "Russian" formula (Equation 2-59 in "Non-formal enzyme kinetics" by R. Schmid and V.N. Sapunov, Verlag-Chemie, 1982), Derrida saw at once how it could have been obtained, and derived the formula on the spot, before my eyes. The trick was to use a Laplace transform of the steady-state equations. I do not know whether or not Derrida's shortcut was ultimately used by the Russian school. The answer, perhaps, is in the monograph (which I have not seen) by N.M. Emanuel and D.G. Knorre: "Chemical kinetics", Wiley, New York, 1973.
-------- First passage times in enzymology ------
Having developed the probabilistic tool for calculating error-rates (which are ratios of rates), I attempted, to make the story complete, to see whether or not this type of thinking could be extended to give direct access to "absolute" rates of reactions. Ultimately, I succeeded, and managed to derive reaction rates for enzyme mechanisms using first passage times and pathway probabilities [13]. Among other advantages, this treatment allows one to make general statements about classes of kinetic mechanisms. While the steady-state treatment is "unstable" (one needs to compute everything from scratch when one changes a detail in a reaction mechanism), my method allows one to leave many details, including topological ones, unspecified. Furthermore, my treatment allows one to incorporate, in a reaction scheme, diffusive steps which cannot be simulated with any small-sized classical subscheme. This paper should have been taken as a very significant advance in enzymology. A few authors built upon this method (Alexey K. Mazur, J. Theoret. Biol., 1990, or Janos Südi, BBA, 1997), but on the whole, it has not yet set (formal) enzymology in motion. The paper was channelled to PNAS by John Hopfield, and the reviewers made valid constructive suggestions. While the article was under review, I sent a copy to Leslie Orgel, and he immediately sent me a draft of a paper he had in his drawers, expounding the same philosophy (how a rate could be computed from times and probabilities) and applying it to a famous issue in enzyme mechanism optimization. The title of the paper was "Albery and Knowles made easy". However, it was not as advanced, mathematically, as mine, and Orgel declined the invitation to co-sign my article.
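The philosophy can be illustrated on the simplest scheme, E + S ⇌ ES → E + P, with rate constants k1, k_minus1, k2 and a substrate concentration s held constant (a minimal Python sketch; names are ours, and this does not reproduce the general results of [13]). The mean time T of a complete catalytic cycle obeys an equation with T on both sides, just like p = 1/2 + p/2: one attempt lasts 1/(k1·s) + 1/(k_minus1 + k2) on average, and with probability 1 - p, where p = k2/(k_minus1 + k2), the complex dissociates and the cycle starts over. So T = (1/(k1·s) + 1/(k_minus1 + k2))/p, and 1/T reproduces the steady-state Michaelis-Menten rate. A single-molecule simulation is added as a check.

```python
import random


def rate_from_first_passage(s, k1, k_minus1, k2):
    """Turnover rate of one enzyme as 1 / (mean time of a complete catalytic cycle).

    The mean cycle time T satisfies T = 1/(k1*s) + 1/(k_minus1 + k2) + (1 - p) * T,
    where p = k2 / (k_minus1 + k2) is the probability that the complex goes forward.
    """
    p = k2 / (k_minus1 + k2)
    mean_time_per_attempt = 1 / (k1 * s) + 1 / (k_minus1 + k2)
    return p / mean_time_per_attempt


def michaelis_menten_rate(s, k1, k_minus1, k2):
    """Classical steady-state result k2*s / (K_M + s), with K_M = (k_minus1 + k2) / k1."""
    km = (k_minus1 + k2) / k1
    return k2 * s / (km + s)


def simulated_rate(s, k1, k_minus1, k2, n_cycles=20000, seed=1):
    """Stochastic simulation of a single enzyme molecule (substrate concentration held constant)."""
    rng = random.Random(seed)
    t = 0.0
    for _ in range(n_cycles):
        while True:
            t += rng.expovariate(k1 * s)             # waiting time before binding
            t += rng.expovariate(k_minus1 + k2)      # lifetime of the complex
            if rng.random() < k2 / (k_minus1 + k2):  # complex goes forward to product
                break                                # otherwise it dissociates: try again
    return n_cycles / t


args = dict(s=2.0, k1=1.0, k_minus1=3.0, k2=1.0)
print(rate_from_first_passage(**args), michaelis_menten_rate(**args))  # both 1/3
print(simulated_rate(**args))  # close to 1/3
```

The first-passage version needs no simultaneous equations: changing a detail of the complex's internal steps only changes its mean lifetime and its forward probability.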
+++++++++++++++++++++++++++++++++++++++++++++++++++
RNA TOPOLOGY [3]
This work on RNA topology was part of an ambitious program, in Bourbaki's axiomatic style, to describe the space of possible nucleic acid structures. It did not have any impact on the field, and I do not regard it as a major work. It could be fun, for people who would like to write about the early papers on nucleic acid topology, to look at this one. One reason, perhaps, for the low impact of this article was that very soon much work was to be invested in the topology of DNA, not RNA, molecules. It is only much more recently, with the determination of the structures of complex RNA molecules, that this article might gain value. One thing I discuss in this article is what I called "the principle of contiguity", which would, in current, looser terminology, be restated as a principle of prevalence of "unknotted" secondary structures. I pointed out that structures which obey this principle do not need to be formed sequentially, while structures which do not obey the principle may fold according to different 3D configurations, depending upon the order of formation of the secondary structure segments. This is a valid point, and I do not know whether or not it has been brought to the surface again, now that it is really useful.
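In current terms, an unknotted (pseudoknot-free) secondary structure is one whose base pairs are all nested or side by side: no two pairs (i, j) and (k, l) interleave as i < k < j < l. A minimal Python sketch of that check, assuming base pairs are given as pairs of sequence positions; this is a modern paraphrase of the idea, not the formalism of [3], and the example structures are invented.

```python
from itertools import combinations


def crossing_pairs(pairs):
    """Return every couple of base pairs (i, j), (k, l) that interleave as i < k < j < l."""
    normalized = sorted((min(p), max(p)) for p in pairs)
    crossings = []
    for (i, j), (k, l) in combinations(normalized, 2):
        if i < k < j < l:
            crossings.append(((i, j), (k, l)))
    return crossings


def is_unknotted(pairs) -> bool:
    """True when every two base pairs are either nested or side by side (no pseudoknot)."""
    return not crossing_pairs(pairs)


nested_stems = [(1, 30), (2, 29), (8, 20), (9, 19)]
pseudoknot = [(1, 15), (2, 14), (10, 25), (11, 24)]
print(is_unknotted(nested_stems))  # True
print(is_unknotted(pseudoknot))    # False
```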
I give now the abstract of the article:
"Our general purpose in undertaking this study was to develop a conceptual tool which might be convenient to understand the structures of nucleic acids presenting much lower regularity than DNA fibers, and might be of some help for the conception and the construction of molecular models.
"The matter dealt with is that of topological properties, that is, the properties for which the nucleic acid is considered as a thread, regardless of the sequence of the bases, their orientation or the configuration of the sugar-phosphate backbone. The thread can be closed or open and the associations of the various regions of the molecule can result in various patterns. After study of these patterns, the two concepts of parallelism and contiguity in the association are defined with precision, and the problems entailed by circularity are briefly reviewed. Under the chapter of knots, some topological singularities peculiar to tRNA models or other nucleic acids are discussed. The distinction is made between true topological knots, and slip knots. The physical presence of such situations is discussed in relation to the properties of contiguity in the associations of the molecule."
++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A LINK BETWEEN VISION AND MOLECULAR BIOLOGY
In 1975-76, I was thinking deeply about the geometry underlying visual illusions, and saw that much of the phenomenology could be expressed in terms of relationships between the lengths of two segments a and b, of the form:
perceived length of (a+b) > perceived length of (a) + perceived length of (b),
or, when b > a,
perceived length of (b) / perceived length of (a) > length of (b) / length of (a)
or the reverse. Such relationships are typical of convex functions which take the value zero at the origin.
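Why convexity yields both relations: if f is convex with f(0) = 0, then for 0 < x < y, f(x) ≤ (x/y)·f(y), i.e. f(x)/x does not decrease. Taking x = a, y = b gives f(b)/f(a) ≥ b/a; taking y = a + b for both a and b and adding gives f(a) + f(b) ≤ f(a+b). Strict convexity makes both inequalities strict, and a concave f gives the reverse. A small numeric check in Python, with two hypothetical "perceived length" functions chosen only for illustration:

```python
def check_relations(f, a, b):
    """For 0 < a < b, test the two relations above for a candidate perceived-length function f."""
    superadditive = f(a + b) > f(a) + f(b)
    ratio_amplified = f(b) / f(a) > b / a
    return superadditive, ratio_amplified


def convex_length(x):
    return x ** 1.5  # hypothetical convex function with f(0) = 0


def concave_length(x):
    return x ** 0.5  # hypothetical concave function with f(0) = 0 ("the reverse")


print(check_relations(convex_length, 2.0, 3.0))   # (True, True)
print(check_relations(concave_length, 2.0, 3.0))  # (False, False)
```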
I was simultaneously working on kinetic amplification mechanisms, and it is apparent that the second condition expresses some amplification of the b/a ratio. Understanding that "convexity" was the most general mathematical translation of "amplification" allowed me to produce a proof of (quasi) impossibility of amplification mechanisms which would operate on a forward step in a reaction scheme. For a more precise statement, see [8].
It was fun to put the same figure on convex functions in my J. Theoret. Biol. article on illusions ([16], Figure 7) and in the Biochimie article on kinetic amplification on forward steps ([8], Fig. 1).
++++++++++++++++++++++++++++++++++++++++++++++++++++++++
APPENDIX 1: ON MODELS IN TIMES OF CONSENSUS
What follows is a summary of a talk (given in French) at a meeting on "Biologie théorique, modélisation et enseignement de la biologie" (theoretical biology, modeling and the teaching of biology), organized by Anne-Marie Leseney, Université Paris VI, November 10th, 1998.
MODELING IN THE AGE OF CONSENSUS THINKING.
Scientific progress is often brought about by the discovery of an anomaly: a very slight deviation in a phenomenon that was believed to be perfectly under control. The difficulty, for the theoretician, is to know how to recognize these essential anomalies and to distinguish them from deviations due to secondary effects. Better still, a good theoretician is able to perceive, behind the structure of classical results well explained by classical models, the possibility of alternative models that account equally well for the known results, but also predict anomalies not yet recorded.
All too often, modeling is conceived as an ornament, built to enhance after the fact a body of results accepted by all. I will give a few examples, drawn from my own experience in biology, to show that it can also be used to unveil phenomena well hidden behind appearances.
Science administrators encourage working relationships, around a common subject, between laboratories from distant disciplines. This bureaucratically programmed interdisciplinarity produces perverse effects: an artificial consensus forms among researchers from different disciplines around a minimal body of knowledge which impoverishes the phenomena and, through its normalizing effects, makes the anomalies that are the sources of future progress disappear.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++
REFERENCES
[1] Ninio, J., Luzzati, V. and Yaniv, M. (1972) Comparative small-angle X-ray scattering studies on unacylated, acylated and cross-linked Escherichia coli transfer RNA Val. 1. J. Mol. Biol. 71, 217-229.
[2] Ninio, J. (1971) Etude de la structure de l'ARN de transfert par diffusion centrale des rayons X, et de ses implications biologiques. Thèse d'Etat, Université Paris 7.
[3] Ninio, J. (1971) Properties of nucleic acid representations. 1. Topology. Biochimie 53, 485-494.
[4] Ninio, J. (1979) Prediction of pairing schemes in RNA molecules. Loop contributions and energy of wobble and non-wobble pairs. Biochimie 61, 1133-1150.
[5] Papanicolaou, C., Gouy, M. and Ninio, J. (1984) An energy model that predicts the correct folding of both the tRNA and the 5S RNA molecules. Nucleic Acids Res. 12, 31-44.
[6] Ninio, J. (1974) A semi-quantitative treatment of missense and nonsense suppression in the strA and ram ribosomal mutants of Escherichia coli. Evaluation of some molecular parameters of translation in vivo. J. Mol. Biol. 84, 297-313.
[7] Ninio, J. (1975) Kinetic amplification of enzyme discrimination. Biochimie 57, 587-595.
[8] Ninio, J. (1977) Are further kinetic amplification schemes possible? Biochimie 59, 759-760.
[9] Bernardi, F., Saghi, M., Dorizzi, M. and Ninio, J. (1979) A new approach to DNA polymerase kinetics. J. Mol. Biol. 129, 93-112.
[10] Herbomel, P. and Ninio, J. (1980) Fidélité d'une réaction de polymérisation selon la proximité de l'équilibre. Comptes-Rendus Acad. Sci. Paris, Série D, 291, 881-884.
[11] Ninio, J. (1986) Fine tuning of ribosomal accuracy. FEBS Lett. 196, 1-4.
[12] Ninio, J. (1986) Kinetic and probabilistic thinking in accuracy. In Accuracy in Molecular Processes (Kirkwood, T.B.L., Rosenberger, R. and Galas, D.J., eds) Chapman & Hall, London, pp. 291-328.
[13] Ninio, J. (1987) Alternative to the steady-state method: Derivation of reaction rates from first passage times and pathway probabilities. Proc. Nat. Acad. Sci. USA 84, 663-667.
[14] Ninio, J. (1981) Random-curve stereograms: a flexible tool for the study of binocular vision. Perception 10, 403-410.
[15] Ninio, J. (1977) The geometry of the correspondence between two retinal projections. Perception 6, 627-643.
[16] Ninio, J. (1979) An algorithm that generates a large number of geometric visual illusions. J. Theoret. Biol. 79, 167-201.