On the Entropy of Protein Families
Barton, John P. and Chakraborty, Arup K. and Cocco, Simona and Jacquin, Hugo and Monasson, Remi
JOURNAL OF STATISTICAL PHYSICS 162, 1267-1293 (2016) LPS
Abstract : Proteins are essential components of living systems, capable of
performing a huge variety of tasks at the molecular level, such as
recognition, signalling, copying, transport, etc. The protein sequences
realizing a given function may vary widely across organisms, giving
rise to a protein family. Here, we estimate the entropy of those
families based on different approaches, including Hidden Markov Models
used for protein databases and inferred statistical models reproducing
the low-order (1- and 2-point) statistics of multiple sequence alignments.
We also compute the entropic cost, that is, the loss in entropy
resulting from a constraint acting on the protein, such as the mutation
of one particular amino acid at a specific site, and relate this notion
to the escape probability of HIV. The case of lattice
proteins, for which the entropy can be computed exactly, allows us to
provide another illustration of the concept of cost, due to the
competition of different folds. The relevance of the entropy in relation
to directed evolution experiments is stressed.
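
As a rough illustration of the two quantities named in the abstract (family entropy and entropic cost), here is a minimal Python sketch. It is not the authors' method: it assumes an independent-site model that reproduces only the 1-point statistics of an alignment and ignores the 2-point statistics, so the entropy it returns is only an upper bound on the entropy of a model that also matches pairwise statistics. The function names and the toy alignment are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def site_entropies(alignment):
    """Shannon entropy (in nats) of each column of an alignment.

    `alignment` is a list of equal-length strings, one per sequence.
    """
    n_seq = len(alignment)
    length = len(alignment[0])
    entropies = []
    for i in range(length):
        # Empirical amino-acid frequencies at column i (1-point statistics).
        counts = Counter(seq[i] for seq in alignment)
        s = -sum((c / n_seq) * math.log(c / n_seq) for c in counts.values())
        entropies.append(s)
    return entropies


def independent_entropy(alignment):
    """Family entropy under the independent-site assumption:
    the sum of the column entropies."""
    return sum(site_entropies(alignment))


def entropic_cost_of_fixing(alignment, site):
    """Entropy lost when one site is constrained to a single amino acid.

    Under the independent-site assumption the other columns are unaffected,
    so the loss equals the entropy of the constrained column.
    """
    return site_entropies(alignment)[site]


# Hypothetical toy alignment: columns 0 and 3 are conserved,
# columns 1 and 2 each take two amino acids with equal frequency.
toy_alignment = ["ACDA", "ACEA", "AGDA", "AGEA"]

total_entropy = independent_entropy(toy_alignment)
cost_site_1 = entropic_cost_of_fixing(toy_alignment, 1)
cost_site_0 = entropic_cost_of_fixing(toy_alignment, 0)
```

In this sketch, constraining a conserved column costs no entropy, while constraining a variable column costs that column's full entropy. In a model with pairwise couplings the cost of a constraint would also depend on how the constrained site interacts with the others, which this simplified example does not capture.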