## Theoretical framework

The current Chilam platforms are all based on a Bayesian classifier approach that calculates the conditional (posterior) probabilities, *P(C(t)|X(t’))*, for classes of interest, *C*, given a set of predictors *X(t’) = (X1(t’), X2(t’),…, XN(t’))*, where the time arguments allow the class and the predictors to refer to different times. Some current examples of classes of interest are:

- Positive cases of SARS-CoV-2
- Cases of Chagas disease
- Obese individuals
- Individuals suffering from Metabolic Syndrome
- Vectors and hosts of emerging or reemerging diseases, such as Zika, Dengue and Leishmaniasis
- Sedentariness

Our predictor datasets, meanwhile, cover multiple areas, including socio-demographic, socio-economic, climate, mobility, biodiversity point-collection, epidemiological, genetic, psychological and clinical data, among others. We take a non-parametric approach, wherein the data are allowed to “speak for themselves”, by converting all data types and variables, *Xi*, into binary variables through a suitable discretisation (“coarse graining”).
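As an illustration of coarse graining, the sketch below turns one continuous variable into a set of binary indicators using equal-frequency (quantile) bins. The binning rule, the helper names `quantile_edges` and `binarise`, and the `temp` variable are hypothetical; the discretisation actually used by the platforms is not specified here.

```python
from bisect import bisect_right


def quantile_edges(values, n_bins):
    """Hypothetical helper: interior cut points giving roughly equal-frequency bins."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]


def binarise(values, n_bins, name):
    """Coarse-grain one continuous variable into n_bins binary indicators.

    Each observation becomes a dict {"<name>_bin<j>": 0 or 1} in which
    exactly one indicator (the bin containing the value) is set to 1.
    """
    edges = quantile_edges(values, n_bins)
    records = []
    for v in values:
        j = bisect_right(edges, v)  # index of the bin that contains v
        records.append({f"{name}_bin{b}": int(b == j) for b in range(n_bins)})
    return records


# Hypothetical temperatures (degrees C) split into three bins
rows = binarise([12.0, 15.5, 18.2, 21.0, 24.3, 27.9], 3, "temp")
print(rows[2])  # {'temp_bin0': 0, 'temp_bin1': 1, 'temp_bin2': 0}
```

Equal-frequency bins keep every indicator populated; equal-width bins or domain thresholds (for example, a clinical cut-off) would be equally reasonable choices under this scheme.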

The classifiers *P(C(t)|X(t’))* can be interpreted as describing the “niche” of a class of interest: a variable configuration, *X*, belongs to the niche of *C* when *P(C|X) > P(C)*, and to its “anti-niche” when *P(C|X) < P(C)*. Thus, *X* can represent the risk factors (socio-economic, behavioural, etc.) that lead to a higher propensity for obesity; the socio-demographic and socio-economic conditions under which an infected person is more likely to die from COVID-19; or the biotic and abiotic conditions that favour the presence of a disease vector.
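The niche/anti-niche criterion can be checked directly by counting. The sketch below estimates *P(C|X)* and *P(C)* from binary records and compares them; the record layout, the function names and the toy `obese`/`sedentary`/`active` data are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def conditional_and_prior(records, cls, var):
    """Estimate P(C=1 | var=1) and P(C=1) by counting over binary records."""
    n = len(records)
    p_c = sum(r[cls] for r in records) / n
    with_x = [r for r in records if r[var] == 1]
    if not with_x:
        raise ValueError(f"no records with {var} = 1")
    p_c_given_x = sum(r[cls] for r in with_x) / len(with_x)
    return p_c_given_x, p_c


def niche_label(records, cls, var):
    """'niche' if P(C|X) > P(C), 'anti-niche' if P(C|X) < P(C), else 'neutral'."""
    p_c_given_x, p_c = conditional_and_prior(records, cls, var)
    if p_c_given_x > p_c:
        return "niche"
    if p_c_given_x < p_c:
        return "anti-niche"
    return "neutral"


# Hypothetical toy records: (obese, sedentary, active)
records = [
    {"obese": o, "sedentary": s, "active": a}
    for o, s, a in [(1, 1, 0), (1, 1, 0), (0, 1, 0), (0, 0, 1), (1, 0, 1), (0, 0, 1)]
]
print(niche_label(records, "obese", "sedentary"))  # niche      (2/3 > 1/2)
print(niche_label(records, "obese", "active"))     # anti-niche (1/3 < 1/2)
```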

A Bayesian approach offers several key advantages. It naturally incorporates elements of human intuition in the form of Bayesian priors while also incorporating quantitative information from data in the form of a likelihood function; Bayes’ theorem then combines the two into a posterior probability. In this way, new information and beliefs can subsequently be incorporated as new priors and likelihoods to form updated posteriors. It also offers a natural framework in which causality can be introduced.
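A minimal sketch of this prior-to-posterior cycle for a binary class: a prior *P(C)* is combined with the likelihoods of an observation, and the resulting posterior then serves as the prior for the next observation. All numbers and the function name `bayes_update` are hypothetical.

```python
def bayes_update(prior, p_x_given_c, p_x_given_not_c):
    """Posterior P(C | X) for a binary class C, via Bayes' theorem."""
    evidence = p_x_given_c * prior + p_x_given_not_c * (1.0 - prior)
    return p_x_given_c * prior / evidence


# Hypothetical numbers: an expert prior, then two pieces of evidence.
prior = 0.1                               # P(C) from intuition or earlier studies
post1 = bayes_update(prior, 0.8, 0.2)     # first observation: P(C|X1) = 4/13
# The posterior becomes the prior for the next observation; chaining like this
# assumes X2 is independent of X1 given C (and given not-C).
post2 = bayes_update(post1, 0.6, 0.3)     # second observation: P(C|X1,X2) = 8/17
print(round(post1, 4), round(post2, 4))   # 0.3077 0.4706
```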

The classifiers *P(C(t)|X(t’))* can be calculated using different statistical or machine-learning models. Currently, all platforms rely on the Naive Bayes approximation, which applies Bayes’ theorem and then factorises the likelihood *P(X|C)* into the product *P(X1|C) P(X2|C) … P(XN|C)*, under the assumption that the predictors are conditionally independent given *C*. Naive Bayes is well known for both its computational simplicity and its transparency.
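A compact sketch of a Naive Bayes classifier over binary predictors, using the factorisation *P(X|C) = P(X1|C) P(X2|C) … P(XN|C)*. The Laplace smoothing parameter `alpha`, the function names and the toy data are assumptions made for illustration, not details of the platforms’ implementation. Working in log space avoids numerical underflow when *N* is large.

```python
import math


def train_naive_bayes(records, cls, predictors, alpha=1.0):
    """Estimate P(C=1) and P(Xi=1 | C=c) from binary records.

    Add-alpha (Laplace) smoothing keeps every conditional strictly between
    0 and 1, so no single unseen combination can zero out the product.
    """
    n = len(records)
    n_c = sum(r[cls] for r in records)
    if n_c in (0, n):
        raise ValueError("both classes must be present in the training data")
    cond = {}
    for c_val, n_class in ((1, n_c), (0, n - n_c)):
        for x in predictors:
            hits = sum(1 for r in records if r[cls] == c_val and r[x] == 1)
            cond[(x, c_val)] = (hits + alpha) / (n_class + 2 * alpha)
    return n_c / n, cond


def posterior(prior, cond, observation):
    """P(C=1 | X) under the factorisation P(X|C) = prod_i P(Xi|C), in log space."""
    log_pos = math.log(prior)
    log_neg = math.log(1.0 - prior)
    for x, value in observation.items():
        p1, p0 = cond[(x, 1)], cond[(x, 0)]
        log_pos += math.log(p1 if value else 1.0 - p1)
        log_neg += math.log(p0 if value else 1.0 - p0)
    # Normalising over C=1 and C=0: P(C=1|X) = 1 / (1 + exp(log_neg - log_pos))
    return 1.0 / (1.0 + math.exp(log_neg - log_pos))


# Hypothetical toy data with two binary predictors, a and b
data = [{"C": 1, "a": 1, "b": 1}, {"C": 1, "a": 1, "b": 0},
        {"C": 0, "a": 0, "b": 1}, {"C": 0, "a": 0, "b": 0}]
prior, cond = train_naive_bayes(data, "C", ["a", "b"])
print(round(posterior(prior, cond, {"a": 1, "b": 1}), 4))  # 0.75
```

Each conditional *P(Xi|C)* is a single count ratio, which is what makes the model both cheap to train and easy to inspect term by term.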

This Bayesian classifier-based approach is particularly suitable for modelling Complex Adaptive Systems, for three reasons. First, it is probabilistic in nature, with statistical inference models that accommodate uncertainty rigorously. Second, it deals straightforwardly and efficiently with the enormous multi-factoriality of such systems, where the probability of a class of interest depends on a very large set of potential risk/niche factors that range from the micro to the macro and span a wide spectrum of scales and, consequently, of scientific disciplines. Finally, it makes adaptation manifest when the relations between *C* and the *Xi* change over time.
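One way to make such adaptation visible, sketched here under the assumption that each record carries a time-window label, is to re-estimate *P(C|Xi)* and *P(C)* separately per window: a factor for which the sign of *P(C|Xi) − P(C)* flips has moved from niche to anti-niche. The function `niche_by_window`, the `window` field and the toy data are hypothetical.

```python
from collections import defaultdict


def niche_by_window(records, cls, var, window_key="window"):
    """For each time window, return (P(C=1 | var=1), P(C=1)) from binary records."""
    counts = defaultdict(lambda: [0, 0, 0, 0])  # [n, n_c, n_x, n_cx] per window
    for r in records:
        k = counts[r[window_key]]
        k[0] += 1
        k[1] += r[cls]
        k[2] += r[var]
        k[3] += r[cls] * r[var]
    result = {}
    for w in sorted(counts):
        n, n_c, n_x, n_cx = counts[w]
        p_c_given_x = n_cx / n_x if n_x else float("nan")
        result[w] = (p_c_given_x, n_c / n)
    return result


# Hypothetical (window, C, X) triples in which X flips from niche to anti-niche
triples = ([(2020, 1, 1)] * 3 + [(2020, 0, 1)] + [(2020, 0, 0)] * 4
           + [(2021, 1, 1)] + [(2021, 0, 1)] * 3 + [(2021, 1, 0)] * 4)
records = [{"window": w, "C": c, "X": x} for w, c, x in triples]
for window, (p_cx, p_c) in niche_by_window(records, "C", "X").items():
    print(window, p_cx, p_c)
# 2020 0.75 0.375
# 2021 0.25 0.625
```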

As well as a “niche” perspective, the Chilam platforms offer a network-based perspective in the form of Complex Inference Networks. Here, the nodes of the network are either classes of interest, *C*, or niche/risk factors, *Xi*. The network’s links are both directed and weighted, each being associated with a statistic that measures the correlation between *C* and *Xi*; for instance, *P(C|Xi)* could be used as the weight of the link between nodes *Xi* and *C*. In contrast to the niche-based perspective, the focus here is on a “community” in ecological terms. This perspective allows for a more global analysis, in which one can observe that a certain class shares certain risk factors (links) with another class. For example, if the classes of interest are hypertension, hyperglycaemia and hypertriglyceridaemia, obesity may appear as a risk-factor node linked to all three.
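A minimal sketch of such a network, using *P(C|Xi)* as the weight of the directed link *Xi → C*. As an assumption for illustration, a link is kept only when *P(C|Xi) > P(C)*, i.e. when *Xi* lies in the niche of *C*; shared risk factors are then the nodes with links to every class considered. The function names are hypothetical and the probabilities are invented values, not results from the platforms.

```python
from collections import defaultdict


def build_network(weights, priors):
    """Keep directed links Xi -> C whose weight P(C | Xi) exceeds P(C).

    weights: {(risk_factor, class_name): P(C | Xi)}
    priors:  {class_name: P(C)}
    """
    return {(x, c): w for (x, c), w in weights.items() if w > priors[c]}


def shared_risk_factors(links, classes):
    """Risk-factor nodes with an outgoing link to every class in `classes`."""
    targets = defaultdict(set)
    for x, c in links:
        targets[x].add(c)
    wanted = set(classes)
    return sorted(x for x, linked in targets.items() if wanted <= linked)


# Invented illustrative values, not results from the platforms
priors = {"hypertension": 0.20, "hyperglycaemia": 0.10, "hypertriglyceridaemia": 0.15}
weights = {
    ("obesity", "hypertension"): 0.35,
    ("obesity", "hyperglycaemia"): 0.18,
    ("obesity", "hypertriglyceridaemia"): 0.30,
    ("smoking", "hypertension"): 0.25,
    ("smoking", "hyperglycaemia"): 0.08,
}
links = build_network(weights, priors)
print(shared_risk_factors(links, priors))  # ['obesity']
```

Thresholding against each class’s own prior keeps the network consistent with the niche definition; a fixed cut-off on *P(C|Xi)* would instead favour links to common classes.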