Theoretical framework

The current Chilam Platforms are all based on a Bayesian classifier approach that calculates the conditional posterior probabilities, P(C(t)|X(t’)), for classes of interest, C, given a set of predictors X(t’) = (X1(t’), X2(t’),…, XN(t’)). Some current examples of classes of interest are:

Positive cases of SARS-CoV-2
Cases of Chagas disease
Obese individuals
Individuals suffering from Metabolic Syndrome
Vectors and hosts of emerging or reemerging diseases, such as Zika, Dengue and Leishmaniasis
Sedentary individuals

Our predictor datasets cover multiple areas, including socio-demographic and socio-economic data, climate data, mobility data, biodiversity point collection data, epidemiological data, genetic data, psychological data and clinical data, among others. We take a non-parametric approach, in which the data are allowed to “speak for themselves”: every data type and variable, Xi, is converted to binomial variables by a suitable discretisation (“coarse graining”).
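
As an illustration of coarse graining, the following minimal sketch turns one continuous variable into a set of 0/1 indicator variables, one per bin. The choice of quantile bins, the number of bins, the function name coarse_grain and the sample values are all assumptions made for this example; the platforms may discretise differently.

```python
from statistics import quantiles

def coarse_grain(values, n_bins=4):
    """Turn one continuous variable into n_bins binary indicator variables.

    Bin edges are empirical quantiles (an assumption of this sketch), so each
    bin holds roughly the same number of observations. Returns one dict per
    observation, mapping a bin label to 0 or 1.
    """
    edges = quantiles(values, n=n_bins)  # n_bins - 1 cut points
    rows = []
    for v in values:
        # Bin index = number of cut points lying strictly below v.
        k = sum(v > e for e in edges)
        rows.append({f"bin_{j}": int(j == k) for j in range(n_bins)})
    return rows

# Hypothetical body-mass-index values, for illustration only.
bmi = [18.5, 21.0, 23.4, 24.9, 27.3, 29.8, 31.2, 35.6]
indicators = coarse_grain(bmi, n_bins=4)
```

Each observation ends up with exactly one indicator set to 1, so every bin can be treated as a separate predictor Xi.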

The classifiers P(C(t)|X(t’)) can be interpreted as describing the “niche” of a class of interest: a variable configuration X belongs to the niche of C when P(C|X) > P(C), and to its “anti-niche” when P(C|X) < P(C). Thus, X can represent those risk factors (socio-economic, behavioural, etc.) that lead to a higher propensity for obesity; or those socio-demographic and socio-economic conditions under which an infected person is more likely to die from COVID-19; or those biotic and abiotic conditions that favour the presence of a disease vector.
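
The niche and anti-niche conditions can be checked directly from co-occurrence counts. The sketch below assumes that P(C) and P(C|X) are estimated as simple relative frequencies; the function name, the "neutral" label for the equality case and the counts are hypothetical and serve only to illustrate the comparison.

```python
def classify_configuration(n_cx, n_x, n_c, n_total):
    """Compare P(C|X) with P(C) using simple co-occurrence counts.

    n_cx: records where both C and X hold; n_x: records where X holds;
    n_c: records where C holds; n_total: all records.
    """
    p_c = n_c / n_total          # baseline probability of the class
    p_c_given_x = n_cx / n_x     # probability of the class given X
    if p_c_given_x > p_c:
        label = "niche"
    elif p_c_given_x < p_c:
        label = "anti-niche"
    else:
        label = "neutral"        # label chosen for this sketch only
    return p_c, p_c_given_x, label

# Hypothetical counts: 200 of 1000 records belong to C.
favourable = classify_configuration(n_cx=40, n_x=100, n_c=200, n_total=1000)
unfavourable = classify_configuration(n_cx=30, n_x=300, n_c=200, n_total=1000)
```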

A Bayesian approach offers several key advantages. It naturally incorporates elements of human intuition in the form of Bayesian priors while also incorporating quantitative information from data in the form of a likelihood function; Bayes' theorem then combines the two to form a posterior probability. In this way, new information and beliefs can subsequently be incorporated in the form of new priors and likelihoods to form adjusted posteriors. It also offers a natural framework in which causality can be introduced.
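
For reference, the combination of prior and likelihood described here is Bayes' theorem. In the notation of this section, with the time arguments suppressed and writing \bar{C} for the complement of C (a symbol introduced only for this illustration), it reads:

```latex
P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)},
\qquad
P(X) = P(X \mid C)\, P(C) + P(X \mid \bar{C})\, P(\bar{C})
```

Here P(C) is the prior, P(X | C) the likelihood and P(C | X) the posterior.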

The classifiers P(C(t)|X(t’)) can be calculated using different statistical or machine learning models. Currently, all platforms rely on the Naive Bayes approximation, which is based on Bayes' theorem and a subsequent factorisation of the likelihoods P(X | C) into a product of single-variable likelihoods P(Xi | C), and which is well known for both its computational simplicity and its transparency.
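
A minimal sketch of the Naive Bayes approximation for binary predictors follows. It is a generic textbook version, not the platforms' own implementation: the Laplace smoothing constant alpha, the function names and the toy data are assumptions made for this example, and both classes are assumed to be present in the training data.

```python
import math

def fit_naive_bayes(X, y, alpha=1.0):
    """Estimate P(C) and P(Xi = 1 | C) for each class from binary data.

    X: list of rows, each a list of 0/1 values; y: list of 0/1 class labels.
    alpha is a Laplace smoothing constant (an assumption of this sketch).
    """
    n_features = len(X[0])
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        # Smoothed estimate of P(Xi = 1 | class c) for each feature i.
        p_on = [(sum(r[i] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                for i in range(n_features)]
        model[c] = (prior, p_on)
    return model

def posterior(model, x):
    """Return P(C = 1 | x), factorising the likelihood over the features."""
    log_joint = {}
    for c, (prior, p_on) in model.items():
        lj = math.log(prior)
        for xi, p in zip(x, p_on):
            lj += math.log(p if xi == 1 else 1.0 - p)
        log_joint[c] = lj
    # Normalise in log space for numerical stability.
    m = max(log_joint.values())
    z = sum(math.exp(v - m) for v in log_joint.values())
    return math.exp(log_joint[1] - m) / z

# Toy data, for illustration only: two binary predictors, six records.
X = [[1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0]]
y = [1, 1, 1, 0, 0, 0]
model = fit_naive_bayes(X, y)
p_both_present = posterior(model, [1, 1])
p_both_absent = posterior(model, [0, 0])
```

Because the likelihood is a product over features, each factor P(Xi | C) can be inspected on its own, which is the source of the transparency mentioned above.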

This Bayesian classifier-based approach is particularly suitable for modelling Complex Adaptive Systems. Firstly, it is probabilistic in nature, with statistical inference models that accommodate uncertainty in a rigorous way. Secondly, it deals in a straightforward and efficient way with the enormous multi-factoriality of such systems, where the probability of a class of interest depends on a very large set of potential risk/niche factors that range from the micro to the macro and span a large spectrum of scales and, consequently, scientific disciplines. Finally, it makes adaptation manifest, since the relations between C and Xi can change over time.

As well as a “niche” perspective, the Chilam platforms also offer a network-based perspective in the form of Complex Inference Networks. In this case, the nodes of the network are either classes of interest, C, or niche/risk factors, Xi. The network's links are both directed and weighted, each being associated with a statistic that measures the correlation between C and Xi. For instance, P(C | Xi) could be used as the link weight between two nodes C and Xi. In contrast to the niche-based perspective, the focus here is on a “community” in ecological terms. Such a perspective allows for a more global analysis, in which it can be observed that a certain class has certain risk factors (links) in common with another class. For example, the classes of interest might be hypertension, hyperglycaemia and hypertriglyceridaemia, with obesity as a risk-factor node that links to all three.
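
The shared-risk-factor reading of such a network can be sketched with a plain dictionary of weighted, directed links. The link direction (from factor Xi to class C), the numerical weights, the extra "sedentary lifestyle" node and the function name are all hypothetical choices made for this illustration.

```python
# Directed links (factor Xi -> class C) with hypothetical weights P(C | Xi).
weights = {
    ("obesity", "hypertension"): 0.45,
    ("obesity", "hyperglycaemia"): 0.30,
    ("obesity", "hypertriglyceridaemia"): 0.35,
    ("sedentary lifestyle", "hypertension"): 0.33,
}

def shared_factors(weights, classes):
    """Return the factors that have a link to every class in `classes`."""
    factors_per_class = [
        {factor for (factor, cls) in weights if cls == target}
        for target in classes
    ]
    return set.intersection(*factors_per_class)

common = shared_factors(
    weights, ["hypertension", "hyperglycaemia", "hypertriglyceridaemia"]
)
```

With these toy links, obesity is the only factor connected to all three classes, mirroring the example in the text.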