
Table 1 How malaria datasets are simulated

From: Markov chain Monte Carlo and expectation maximization approaches for estimation of haplotype frequencies for multiply infected human blood samples

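| Patient # | MOI | BIOMASS | f.BIOMASS | msp1 | msp2 | ta109 | Haplotype | Observed MOI | Observed genotype |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 5.29E+10 | 1.000 | 10 | 34 | 3 | 112 | 1 | 112 |
| 2 | 3 | 8.06E+09 | 0.100 | 24 | 23 | 5 | 112 | 1* | 111* |
|   |   | 6.48E+10 | 0.803 | 20 | 6 | 5 | 111 |   |   |
|   |   | 7.86E+09 | 0.097 | 16 | 27 | 5 | 112 |   |   |
| 3 | 2 | 5.06E+10 | 0.474 | 24 | 35 | 3 | 111 | 2 | 111 |
|   |   | 5.62E+10 | 0.526 | 1 | 34 | 4 | 111 |   |   |
| 4 | 2 | 5.52E+10 | 0.487 | 21 | 34 | 4 | 122 | 2 | 133 |
|   |   | 5.81E+10 | 0.513 | 18 | 33 | 4 | 111 |   |   |
| 5 | 3 | 3.16E+10 | 0.432 | 23 | 32 | 9 | 111 | 2* | 133* |
|   |   | 1.35E+09 | 0.018 | 21 | 28 | 7 | 112 |   |   |
|   |   | 4.03E+10 | 0.550 | 23 | 27 | 9 | 122 |   |   |
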
  1. The ‘population’ frequencies of the different MOI classes, polymorphic markers (msp1, msp2, ta109) and resistance haplotypes in the local malaria population are first defined. A number of patients are then simulated: five in this case, but more usually 100. For each patient, a MOI is first sampled according to the local ‘population’ frequencies (which depend on local transmission intensity); this MOI determines the number of malaria clones in the patient. Each clone is then simulated in three steps. First, a biomass is assigned to the clone. Second, its polymorphic markers are assigned at random according to the local true frequencies. Finally, a resistance haplotype is assigned, again sampled from the local true frequencies. This process is repeated for each clone in each patient and gives rise to the data given in black font in the table above. The genetic signal observed in each patient (last two columns) is then calculated as described in the main text. In this example, genetic signals are not detected if they constitute ≤10 % of the biomass (f.BIOMASS gives the relative biomass of each clone in a patient). What is actually observed, and available for analysis, is the information given in italics. Genotyping limits produce errors, and the erroneous data are indicated by an asterisk: they are the data available to the researcher but do not truly reflect the genetic data of the parasites in that patient.
  2. Haplotype is the resistance haplotype of each clone. It is defined at three SNPs; for each SNP: 1 = wildtype, 2 = mutant. Observed genotype is the resistance genotype observed in each patient. It is also defined at three SNPs; for each SNP: 1 = wildtype alone, 2 = mutant alone, 3 = both wildtype and mutant genetic signals observed in the blood sample.
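
The procedure in the footnotes can be sketched in Python as follows. This is an illustrative sketch, not the authors' code, and it rests on several assumptions. The footnote defers the calculation of the observed signal to the main text, so two rules are assumed here: the 10 % detection limit is applied to each clone separately, and the observed MOI is the largest number of distinct alleles seen at any one marker among the detected clones. The names `clone`, `observe`, `sample`, `simulate_patient` and `POPULATION`, the frequencies in `POPULATION`, and the biomass distribution are all hypothetical. Only `TABLE_1` is taken from the table.

```python
import random

DETECTION_LIMIT = 0.10  # signals at <= 10 % of a patient's biomass are missed
MARKERS = ("msp1", "msp2", "ta109")

def clone(biomass, msp1, msp2, ta109, haplotype):
    """Bundle one clone's simulated attributes (hypothetical helper)."""
    return {"biomass": biomass, "msp1": msp1, "msp2": msp2,
            "ta109": ta109, "haplotype": haplotype}

# Clone-level rows copied from Table 1, keyed by patient number.
TABLE_1 = {
    1: [clone(5.29e10, 10, 34, 3, "112")],
    2: [clone(8.06e09, 24, 23, 5, "112"), clone(6.48e10, 20, 6, 5, "111"),
        clone(7.86e09, 16, 27, 5, "112")],
    3: [clone(5.06e10, 24, 35, 3, "111"), clone(5.62e10, 1, 34, 4, "111")],
    4: [clone(5.52e10, 21, 34, 4, "122"), clone(5.81e10, 18, 33, 4, "111")],
    5: [clone(3.16e10, 23, 32, 9, "111"), clone(1.35e09, 21, 28, 7, "112"),
        clone(4.03e10, 23, 27, 9, "122")],
}

def relative_biomass(clones):
    """Each clone's share of the patient's total biomass (f.BIOMASS)."""
    total = sum(c["biomass"] for c in clones)
    return [c["biomass"] / total for c in clones]

def observe(clones, limit=DETECTION_LIMIT):
    """Return (observed MOI, observed genotype) under the assumed rules."""
    shares = relative_biomass(clones)
    detected = [c for c, f in zip(clones, shares) if f > limit]
    if not detected:  # assumption: nothing detectable gives an empty result
        return 0, ""
    # Assumed rule: observed MOI = most distinct alleles at any one marker.
    observed_moi = max(len({c[m] for c in detected}) for m in MARKERS)
    genotype = ""
    for snp in range(3):
        alleles = {c["haplotype"][snp] for c in detected}
        # 3 = both wildtype and mutant seen; otherwise the single allele seen.
        genotype += "3" if len(alleles) > 1 else next(iter(alleles))
    return observed_moi, genotype

def sample(freqs, rng):
    """Draw one value from a {value: frequency} mapping."""
    values = list(freqs)
    return rng.choices(values, weights=[freqs[v] for v in values])[0]

def simulate_patient(pop, rng):
    """Sample a MOI, then simulate that many clones for one patient."""
    moi = sample(pop["moi"], rng)
    return [clone(10 ** rng.uniform(9, 11),  # hypothetical biomass distribution
                  sample(pop["msp1"], rng), sample(pop["msp2"], rng),
                  sample(pop["ta109"], rng), sample(pop["haplotype"], rng))
            for _ in range(moi)]

# Hypothetical 'population' frequencies, for illustration only.
POPULATION = {
    "moi": {1: 0.5, 2: 0.3, 3: 0.2},
    "msp1": {10: 0.4, 20: 0.3, 24: 0.3},
    "msp2": {23: 0.5, 34: 0.5},
    "ta109": {3: 0.3, 4: 0.3, 5: 0.4},
    "haplotype": {"111": 0.6, "112": 0.2, "122": 0.2},
}

if __name__ == "__main__":
    for patient, clones in TABLE_1.items():
        print(patient, observe(clones))
    print(observe(simulate_patient(POPULATION, random.Random(0))))
```

Under these assumed rules, `observe` applied to the clone rows of Table 1 reproduces the last two columns for all five patients, including the asterisked values for patients 2 and 5. In patient 2, for example, the two clones carrying haplotype 112 each fall at or below the 10 % limit, so only the 111 clone contributes to the observed signal.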