Forensic Population Genetics – Letter to the Editor
Analysis of 17 STR data on 5362 southern Portuguese individuals – an update on reference database

Dear Editor,
As we all know, using allele frequencies and statistical data that are representative of the population under study is of great importance in forensic casework. To obtain this information, populations of interest are studied and their data published so that the scientific community can access and use it when necessary, and this journal is a good source of such work. Over time, however, population parameters can drift from their original state. Studies that were representative of the time at which they were conducted, rather like a photograph taken at that moment, no longer accurately reflect present conditions. Genetic variation arises through phenomena such as migration and genetic drift, both of which can affect population statistical parameters, making it imperative to keep databases as up to date as possible.
The last study on AmpFlSTR Identifiler® (Applied Biosystems) and Powerplex® 16 System (Promega Corporation) in the southern Portuguese population was published in 2006 [1]. Since these and similar kits are still the main ones used by this laboratory in both paternity testing and forensic casework, it was necessary to update our reference database.
To do so, we performed a retrospective study using the genotypic data of 5362 unrelated, Caucasian, southern Portuguese individuals involved in paternity testing casework from 2005 to 2014. All individuals gave informed consent when their blood and saliva samples were collected. Over the ten-year period, blood samples were analyzed with AmpFlSTR Identifiler® or AmpFlSTR Identifiler® Plus, and buccal swabs with Powerplex® 16 or Powerplex® 16 HS System. Laboratory analysis followed internal procedures based on the manufacturers' instructions modified to half volume, followed by capillary electrophoresis on Applied Biosystems 3130xl or 3130 Genetic Analyzers. Combined, these kits cover 17 autosomal STR loci: CSF1PO, D2S1338, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, D19S433, D21S11, FGA, TH01, TPOX, vWA, Penta D and Penta E.
All genotypic information was collected from the .fst files stored in our genetic analyzer backup systems and visually confirmed against the electropherograms obtained when they were first generated for paternity testing purposes. The information from both kits was combined to obtain the complete 17-locus genotype, which was then compared with the information in the reports. The data were then filtered to retain only the pertinent individuals (unrelated Caucasian Portuguese) and anonymized for the present study.
Allele frequencies, observed heterozygosity (Ho), expected heterozygosity (He), Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium (LD) were estimated using Arlequin v3.5.1.2 software [2]. Pairwise genetic distances (FST) between the contemporary southern Portuguese population and the other populations were also calculated with Arlequin v3.5 [2]. The populations compared were the northern Portuguese [3], the central Portuguese [4] and the 2006 southern Portuguese [1], alongside those from Spain [5], Italy [6], Greece [7], Romania [8], Morocco [9], Angola [10] and Korea [11]. From Arlequin's results, a phylogram was constructed with Molecular Evolutionary Genetics Analysis v.6.06 software [12] using the neighbour-joining method. Statistical parameters of forensic interest, namely power of discrimination (PD), power of exclusion (PE), polymorphic information content (PIC), typical paternity index (TPI) and matching probability (MP), were calculated with the PowerStats v1.2 [13] spreadsheet, modified by the authors to handle the large number of samples. Minimum allele frequencies (MAF) were calculated as 5/(2N).
Allele frequencies, along with statistical and forensic parameters, are presented in Supplementary Table I. The average level of genetic diversity (He) was 0.801, and the most variable loci were D18S51, Penta E, FGA and D2S1338, each with 16 or more alleles and a He above 0.85. TPOX proved to be the least polymorphic marker and Penta E the most. PD ranged from 0.823 (TPOX) to 0.977 (Penta E), with a combined PD of 0.999999999999999999997. PE ranged from 0.344 (TPOX) to 0.756 (Penta E), with a combined PE of 0.99999989. MAF values varied between 0.000466 and 0.000473, depending on the locus.
Four loci showed statistically significant deviations from HWE (p < 0.05). After Bonferroni correction (p < 0.0029), three loci still deviated significantly (D18S51, Penta D and TPOX). The most problematic marker was TPOX, with a p value of zero. It is the least polymorphic locus, with the highest frequency of homozygosity in our study, and its significant HWE deviation is probably caused by heterozygote deficiency. Of all the CODIS markers, TPOX shows the least variation between individuals [14], and the same is observed here. Although p values of zero indicate deviations from HWE, these occur because we are not dealing with ideal populations that obey Hardy-Weinberg principles. Similar observations have been reported in other studies [15,16], in some cases at a large number of loci [17]. In the three affected markers, Ho was lower than He; that is, the observed heterozygosity was lower than theoretically expected. In nature, phenomena such as inbreeding may cause deviations from HWE by reducing random mating and thereby increasing homozygosity.
LD was evaluated with a shuffling test for all possible pairs of loci. Twenty pairs of loci showed significant LD (p < 0.05) among the 136 pairwise comparisons. After Bonferroni correction, only two pairs remained significant: Penta E and Penta D, and D2S1338 and D19S433, both with p = 0.00000. Because the loci in each pair lie on different chromosomes, they were considered genetically unlinked. All 17 markers could therefore be treated as independent loci at the population level; that is, no meaningful LD was considered to exist among the studied loci in the southern Portuguese population.
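The arithmetic behind several of the figures quoted above can be checked with a short sketch. This is only an illustration in Python, not the PowerStats or Arlequin implementation: the function names are hypothetical, the combined-power formula is the usual product rule for independent loci, and the two PD values used are just the reported extremes (TPOX and Penta E), not the full set of 17 loci.

```python
from math import comb, prod


def minimum_allele_frequency(n_individuals):
    """Minimum allele frequency, 5/(2N), for N typed individuals."""
    return 5 / (2 * n_individuals)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def combined_complement(per_locus_values):
    """Product of (1 - value) over loci, assuming independent loci.

    The combined PD (or PE) is 1 minus this product. The complement is
    returned because 1 - 3e-21 is not distinguishable from 1.0 in
    double-precision floating point.
    """
    return prod(1 - v for v in per_locus_values)


N_INDIVIDUALS = 5362  # sample size reported in the letter
N_LOCI = 17           # autosomal STR loci covered by the two kits

maf = minimum_allele_frequency(N_INDIVIDUALS)
hwe_threshold = bonferroni_threshold(0.05, N_LOCI)
n_pairs = comb(N_LOCI, 2)  # number of locus pairs tested for LD
ld_threshold = bonferroni_threshold(0.05, n_pairs)

# Illustration only: the two extreme PD values reported (TPOX, Penta E).
two_locus_complement = combined_complement([0.823, 0.977])

print(f"MAF for N={N_INDIVIDUALS}: {maf:.6f}")
print(f"HWE Bonferroni threshold: {hwe_threshold:.4f}")
print(f"LD pairwise comparisons: {n_pairs}")
print(f"LD Bonferroni threshold: {ld_threshold:.6f}")
print(f"Combined PD of the two example loci: {1 - two_locus_complement:.6f}")
```

With N = 5362 the sketch gives the lower MAF bound of 0.000466; the upper bound of 0.000473 presumably reflects loci typed in fewer individuals, which is an assumption here since per-locus sample sizes are not given in the text. Returning the complement rather than the combined power itself is a deliberate choice: a combined PD as close to 1 as the reported value cannot be represented as a double once the subtraction from 1 is performed.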

[...] Portugal (southern, northern and central, as well as those from the 2006 southern Portuguese study), together with Spain, Italy, Greece, Romania, Morocco, Angola, and Korea are presented in Supplementary Table II and displayed in Fig. 1 in phylogram form. As expected, the population from Korea and Angola [...]

Fig. 1. Neighbor-joining phylogram for the 17 combined Identifil[...]

Supplementary Table I. Allele frequencies and statistical and forensic parameters for the 17 STR loci in 5362 [...]