Theoretical framework
The current Chilam Platforms are all based on a Bayesian classifier approach that calculates the conditional posterior probabilities, P(C(t)|X(t’)), for classes of interest, C, given a set of predictors X(t’) = (X1(t’), X2(t’),…, XN(t’)). Some current examples of classes of interest are:
- Positive cases of SARS-CoV-2
- Cases of Chagas disease
- Obese individuals
- Individuals suffering from Metabolic Syndrome
- Vectors and hosts of emerging or reemerging diseases, such as Zika, Dengue and Leishmaniasis
- Sedentariness
Our datasets of predictors cover multiple areas, including socio-demographic and socio-economic data, climate data, mobility data, biodiversity point collection data, epidemiological data, genetic data, psychological data and clinical data, among others. We take a non-parametric approach, in which the data are allowed to “speak for themselves”, by converting all data types and variables, Xi, to binary variables through a suitable discretisation (“coarse graining”).
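As an illustration of coarse graining, the following minimal sketch turns a continuous variable into a set of binary indicators, one per bin. The function name `coarse_grain`, the choice of fixed cut points and the example BMI thresholds are assumptions made for this example; the platforms' actual discretisation scheme is not specified here.

```python
from bisect import bisect_right

def coarse_grain(values, cut_points):
    """Map each value to a one-hot tuple of binary indicators, one per bin.

    cut_points must be sorted in ascending order; they define
    len(cut_points) + 1 bins. A value equal to a cut point falls
    in the bin above it.
    """
    n_bins = len(cut_points) + 1
    rows = []
    for v in values:
        k = bisect_right(cut_points, v)  # index of the bin containing v
        rows.append(tuple(1 if j == k else 0 for j in range(n_bins)))
    return rows

# Hypothetical example: body-mass index split into three bins.
bmi = [18.5, 24.9, 31.0]
indicators = coarse_grain(bmi, cut_points=[25.0, 30.0])
```

Each column of `indicators` can then be treated as a separate binary predictor Xi.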
The classifiers P(C(t)|X(t’)) can be interpreted as describing the “niche” of a class of interest: a variable configuration, X, belongs to the niche of C when P(C|X) > P(C), and to its “anti-niche” when P(C|X) < P(C). Thus, X can represent those risk factors – socio-economic, behavioural etc. – that lead to a higher propensity for obesity; or those socio-demographic and socio-economic conditions under which an infected person is more likely to die from COVID-19; or those biotic and abiotic conditions that favour the presence of a disease vector.
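The niche criterion can be checked directly from counts. The sketch below estimates P(C) and P(C|X) as simple frequencies and compares them; the function name, the argument names, the “neutral” label for the case of equality and the example counts are all assumptions made for illustration.

```python
def niche_status(n_cx, n_x, n_c, n):
    """Compare P(C|X) with P(C) using raw counts.

    n_cx: records in class C that also show configuration X
    n_x:  records showing configuration X
    n_c:  records in class C
    n:    total number of records
    """
    p_c = n_c / n
    p_c_given_x = n_cx / n_x
    if p_c_given_x > p_c:
        label = "niche"
    elif p_c_given_x < p_c:
        label = "anti-niche"
    else:
        label = "neutral"  # assumed label for P(C|X) == P(C)
    return p_c, p_c_given_x, label

# Invented counts: 200 of 1000 records are in C; 40 of the 100 with X are in C.
p_c, p_c_given_x, label = niche_status(n_cx=40, n_x=100, n_c=200, n=1000)
```

In practice, a statistical test would be needed to decide whether the difference between the two frequencies is significant; that step is not shown.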
A Bayesian approach offers several key advantages. It naturally incorporates elements of human intuition in the form of Bayesian priors while, at the same time, allowing quantitative information from data to be incorporated in the form of a likelihood function; Bayes' theorem then combines the two to form a posterior probability. In this way, new information and beliefs can subsequently be incorporated in the form of new priors and likelihoods to form adjusted posteriors. It also offers a natural framework in which causality can be introduced.
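For reference, the combination of prior and likelihood described above is Bayes' theorem. It is written here for a class C and its complement, with the time arguments t and t’ omitted for brevity; the complement notation \bar{C} is an assumption of this sketch.

```latex
\[
P(C \mid X) \;=\; \frac{P(X \mid C)\, P(C)}{P(X)},
\qquad
P(X) \;=\; P(X \mid C)\, P(C) \;+\; P(X \mid \bar{C})\, P(\bar{C}).
\]
```

Here P(C) is the prior, P(X | C) the likelihood and P(C | X) the posterior.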
The classifiers P(C(t)|X(t’)) can be calculated using different statistical or machine learning models. Currently, all platforms rely on the Naive Bayes approximation, which is based on Bayes' theorem and a subsequent factorisation of the likelihoods P(X | C) into a product of per-variable terms P(Xi | C), and which is well known for both its computational simplicity and its transparency.
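A minimal sketch of the Naive Bayes calculation for a two-class problem (C versus not C) is given below. It is one common way of arranging the computation, in terms of log-odds, and is not necessarily how the platforms implement it; the function name and the example probabilities are invented for illustration. All probabilities are assumed to lie strictly between 0 and 1, so any smoothing of zero counts would have to be done beforehand.

```python
import math

def naive_bayes_posterior(prior, lik_c, lik_not_c):
    """P(C|X) under the Naive Bayes factorisation for two classes.

    prior:     P(C)
    lik_c:     [P(Xi|C)] for each observed binary predictor Xi
    lik_not_c: [P(Xi|not C)] for the same predictors, in the same order
    """
    # Start from the prior log-odds, then add one term per predictor.
    log_odds = math.log(prior / (1.0 - prior))
    for p, q in zip(lik_c, lik_not_c):
        log_odds += math.log(p / q)
    # Convert the posterior log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))

# Invented numbers: prior odds 1:4, two predictors each twice as likely under C.
posterior = naive_bayes_posterior(0.2, [0.6, 0.5], [0.3, 0.25])
```

Because each predictor contributes a separate additive term to the log-odds, the contribution of any single Xi can be read off directly, which is one sense in which the approximation is transparent.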
This Bayesian classifier-based approach is particularly suitable for modelling Complex Adaptive Systems: firstly, because it is probabilistic in nature, with statistical inference models that accommodate uncertainty in a rigorous way; secondly, because it deals in a straightforward and efficient way with the enormous multi-factoriality of such systems, where the probability of a class of interest depends on a very large set of potential risk/niche factors, which range from the micro to the macro and span a large spectrum of scales and, consequently, scientific disciplines; finally, because it makes adaptation manifest, in that the relations between C and Xi can change over time.
As well as a “niche” perspective, the Chilam platforms also offer a network-based perspective in the form of Complex Inference Networks. In this case, the nodes of the network are either classes of interest, C, or niche/risk factors, Xi. The network's links are both directed and weighted, each being associated with a statistic that measures the correlation between C and Xi. For instance, P(C | Xi) could be used as a link weight between two nodes C and Xi. In contrast to the niche-based perspective, the focus here is on a “community” in ecological terms. Such a perspective allows for a more global analysis, whereby it can be observed that a certain class has certain risk factors (links) in common with another class. So, for example, the classes of interest may be hypertension, hyperglycaemia and hypertriglyceridaemia, with obesity as a risk node that links to all three.
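The network view can be sketched with plain dictionaries. In the example below, links are assumed to run from a risk factor Xi to a class C and to carry P(C | Xi) as their weight; the direction of the links, the function names and all numerical weights are assumptions made for illustration only.

```python
from collections import defaultdict

def build_network(weights):
    """Directed, weighted links from factor Xi to class C.

    weights: {(factor, class): P(class | factor)}
    Returns {factor: {class: weight}}.
    """
    net = defaultdict(dict)
    for (factor, cls), w in weights.items():
        net[factor][cls] = w
    return dict(net)

def shared_factors(net, classes):
    """Factors that have a link to every class in `classes`."""
    wanted = set(classes)
    return sorted(f for f, links in net.items() if wanted.issubset(links))

# Invented link weights, for illustration only.
net = build_network({
    ("obesity", "hypertension"): 0.45,
    ("obesity", "hyperglycaemia"): 0.30,
    ("obesity", "hypertriglyceridaemia"): 0.35,
    ("sedentariness", "hypertension"): 0.28,
})
shared = shared_factors(
    net, ["hypertension", "hyperglycaemia", "hypertriglyceridaemia"]
)
```

With these weights, `shared` contains only obesity, mirroring the example in the text of a risk node that links to all three classes.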