Computational analysis of dimer G6PD structure to elucidate pathogenicity of G6PD variants

An inherent genetic enzyme disorder in humans, known as glucose-6-phosphate dehydrogenase (G6PD) deficiency, arises due to specific mutations. While the prevailing approach for investigating G6PD variants involves biochemical analysis, the intricate structural details remain limited, impeding a comprehensive understanding of how different G6PD variants of varying classes impact their functionality. This study 22 examined the dynamic properties of G6PD wild types and six G6PD variants from 23 different classes using molecular dynamic simulation (MDS). The wild-type and variant 24 G6PD structures unveil high fluctuations within the amino acid range of 274–515, the structural NADP+ binding site, pivotal for enzyme dimerization. Specifically, two variants, G6PDZacatecas (R257L) and G6PDDurham (K238R), demonstrate compromised structural stability at the dimer interface, attributable to the disruption of a salt bridge involving Glu 206 and Lys 407, along with the disturbance of hydrogen bonds formed by Asp 421 at the βN-βN sheets. Consequently, this impairment cascades to affect the binding affinity of crucial interactions, such as Lys 171-Glucose-6-Phosphate (G6P) and Lys 171-catalytic NADP+, leading to diminished enzyme activity. This study underscores the utility of computational in silico techniques in predicting the structural alterations and flexibility of G6PD variants. This insight holds promise for guiding future endeavors in drug development targeted at mitigating the impacts of G6PD deficiency.


Introduction
G 6PD enzyme (E.C.1.1.1.49)plays a crucial role in the pentose phosphate pathway, which protects the red blood cells by maintaining NADPH levels against oxidative damage.G6PD deficiency causes hemolysis-prone red blood cells, resulting in a wide range of medical consequences [1,2].Patients with G6PD deficiency often suffer from acute hemolytic anemia, chronic non-spherocytic hemolytic anemia, favism, and neonatal jaundice [38].Additionally, it increases the likelihood of developing diabetes mellitus, hypertension, cancer, and cataracts [3].Clinical attributes are significantly more dominant in males, as they possess only one Xchromosome.Conversely, females who are heterozygous with one normal G6PD gene and one mutated G6PD gene on the other chromosome can undergo a 50 % deficiency effect [4].World Health Organization (WHO) classified G6PD variants into four classes based on median enzyme activity and hemolysis severity.Class A G6PD variants cause Chronic NonSpherocytic Hemolytic Anemia (CNSHA) with less than 20 % median G6PD activity.Class B has median G6PD activity of less than 45 % and causes acute hemolysis, while class C indicates no hemolysis with median G6PD activity between 60 % and 150 %.Finally, class U G6PD variants have uncertain clinical significance.According to WHO data, over 230 G6PD variants have been discovered in the past 36 years.
The active human G6PD enzyme is composed of tetramer and dimer forms [5].Each monomer has a substrate-binding site and two NADP þ coenzyme binding sites.There are two domains in each monomeric structure: the beaeb domain (N terminal) and bþa domain (C terminal).The beaeb domain includes amino acids from 1 to 200 with a catalytic NADP þ binding site shown in Fig. 1(a), while the bþa domain consists of amino acids from 201 to 515 with a Glucose-6-Phosphate (G6P) substrate binding site and a structural NADP þ binding site represented by an antiparallel nine-strand sheet [6].The b þ a domains at bN (415e423) -al (424e427) form the extensive dimer relation shown in Fig. 2(a).Leu 420, Asp 421, and Leu 422 located at bN are highly conserved in all species, indicating their importance for enzyme stability and activity.Structural NADP þ is crucial for maintaining dimer structure in higher organisms [3].These ligands influence the formation of active enzymes either in the form of dimer or tetramer [7].A high concentration of G6P can disrupt dimer formation, whereas the presence of NADP þ favors tetramer formation [8].
The dimer interface is stabilized by four salt bridges connecting Glu 206 to Lys 407 and Glu 419 to Arg 427, as indicated in Fig. 2(b), with only Lys 407 being wholly conserved.
The hydrogen bond between Asp 421-Asp 421 that connects the two bN strands of the b sheets is crucial for structural stability [9].Three conserved sequences have been identified as shown in Fig. 1(b): RIDHYLGKE, a nine-residue peptide at amino acid at position 198e206 [10], conserved Lys 205, which is critical for substrate binding and catalysis of the enzyme.The next GxxGGDLA at position 38e44 connects the catalytic NADP þ binding site.The final conserved sequence is EKPxG located at amino acid 170e174, where Lys 171 is required to properly position the G6P substrate and NADP þ coenzymes during the enzymatic process [10].
Published biochemical data reported that the catalytic efficiency and protein instability contribute to the clinical phenotypes in G6PD variants [11,12].Simulations have been carried out in the past on both the monomeric and dimeric forms of G6PD [13].However, the mechanism of mutation-induced structural stability and the physiological influence of the ligands in the G6PD enzyme structure remains unclear to date.Hence, this study focuses on constructing a complete structure of the G6PD dimer mutant by using in silico site-directed mutagenesis.Then, characterizes the structural changes through molecular dynamic simulation (MDS) analysis.The GROMACS program utilizes the structural model of G6PD wild type and variants bound with G6P substrate and NADP þ cofactors [14,15].Protein backbone Root Mean Square Deviation (RMSD), Carbon Alpha Root Mean Square Fluctuation (RMSF), Radius of Gyration (Rg), Solvent Accessibility Surface Area (SASA), intermolecular hydrogen bond of enzyme, and confirmational changes derived from MDS were used to analyze structural stability and flexibility.A total of six variants which includes G6PD Zacatecas (R257L), G6PD Durham (K238R), G6PD Valladolid (R136C), G6PD Mediterranean (S188F), G6PD Aþ (N126D) and G6PD Mexico City (R227Q) were analyzed in this study.

Variant selection
An extensive search was conducted in databases such as Science Direct and Web of Science using the

Site-directed mutagenesis
The structure of the human G6PD dimer enzyme with the ligands was prepared using Autodock software using two crystal structures from the PDB database (PDB entries 2BH9 and 2BHL) in the previous study.PyMOL software performed in silico site-directed mutagenesis on a complex G6PD enzyme dimer structure [23].The resulting structures were saved in PDB format and validated using the SPDB viewer tool.The mutated dimer structures were then loaded into the SPDB viewer to confirm the location of the mutated residue [24].

Molecular dynamic simulation (MDS)
The G6PD dimer structures were simulated for 100ns using GROMACS 4.6.7 package [25].Ligands were separated from the protein and saved as pdb files.Topologies for ligands and proteins were constructed using different servers.The GRO-MOS96 53a7 force field was used to build the topologies for the proteins via the pdb2gmx program.The ligand topologies were generated using the Automated Topology Builder (ATB) server [26].The ligand pdb and its files were saved in the directory to reconstruct the protein-ligand complex.
The protein, ligands, and minimized system were combined to create a complex structure.grofile.The complex protein structure was solvated in a cubic box with the SPC water model.To neutralize electric charges for simulation complexes, water molecules were replaced by Na þ and Cl À ions within the box [27].The system's energy was minimized, and output was checked using xmgrace software.The neutralized method was equilibrated for 50,000 steps using the NVT and NPT systems [27].Following this, a production MD stage was performed for 100 ns at a constant temperature of 300 K and pressure of 1.0 atm.
QTGRACE software.The structural configuration was analyzed using PyMOL software.

Mutation site
In this study, six G6PD variants were chosen from different G6PD classes namely G6PD Zacatecas (R257L), G6PD Durham (K238R), G6PD Valladolid (R136C), G6PD Mediterranean (S188F), G6PD Aþ (N126D) and G6PD Mexico City (R227Q) as shown in Fig. 3.All the variants have a noticeable single missense mutation which is prominent in different regions of the world, such as Europe (G6PD Valladolid and G6PD Mediterranean ), Africa (G6PD Aþ ), and America (G6PD Zacatecas , G6PD Durham , and G6PD Mexico City ) [30].Analysis of the structural changes resulting from each missense mutation were conducted by calculating the average distance and number of hydrogen bonds connecting the mutation site with its adjacent residues.
As shown in Fig. 3, the G6PD Zacatecas (R257L) mutation causes a change from polar Arg to nonpolar Leu at amino acid position 257 in a helical structure close to the G6P substrate at the ai loop.The mutation of Arg to Leu led to a loss of hydrogen bonds between the side chains Fig. 4 (a).The mutation also induces loss of salt bridge interactions between Lys 257 and Glu 473 in tetrameric form.Replacement of non-polar Leu exposes a hidden hydrophobic region on the enzyme, presumably resulting in loss of secondary structure [1].Mutation in G6PD Durham (K238R) demonstrated the replacement of polar Lys to the positive charged Arg residue at amino acid 238.This mutation led to a decrease in the distance by a factor of 0.6 Å and retained only one hydrogen bond between a-bG and bK at the dimer interface domain (refer Table 1).The G6PD Durham (K238R) mutation unequivocally diminishes the affinity of structural NADP þ .This is apparent from the absence of a crucial hydrogen bond between highly conserved amino acid 238 and structural NADP þ which is clearly illustrated in Fig. 4 (b).
G6PD Valladolid (R136C) represents substitution between two polar amino acids, Arg to Cys, at amino acid 136.The mutation is located at the highly conserved bD strand near catalytic NADP þ .This mutation retained two bonds from the wild type: Trp 164 and Arg 166 at ad-bE loop, with bond lengths of 2.2 Å and 1.9 Å, respectively.In contrast, a mutation in G6PD Mediterranean (S188F) showed a change of polar Ser to non-polar Phe on ae strand at amino acid 188.It is important to note that even though the mutation is not situated in the conserved region, any alteration in polarity could potentially interfere with the formation of the enzyme's secondary structure.In the wild type, Ser 188 makes two hydrogen bonds with Ser 184 in the same ae strand.However, substituting Phe causes the loss of a hydrogen bond and disrupts the formation of the secondary structure of the enzyme.
In G6PD MexicoCity (R227Q), polar Arg at position 227 changed to polar Gln at ag-bG loop and retained the two hydrogen bonds between Arg 348 and Asp 350 with reduced distance from wild type.The substitution of Gln leads to steric hindrance, probably affecting the structural conformation [31].It has been reported by G6PD Aþ (N126D) that a negatively charged Asp residue situated at position 126 (ac strand) has been replaced with a polar Asn residue.This variant has resulted in the reduction of the distance of the ac strand, thereby allowing it to retain a hydrogen bond with Asn 122.
G6PD Zacatecas , G6PD Durham , G6PD Valladolid , and G6PD Mediterranean exhibited reduced affinity based on the presence and proximity of intermolecular interactions with adjacent residues.In contrast, G6PD MexicoCity and G6PD Aþ displayed comparable affinity.

Trajectory analysis
Molecular Dynamic Simulation was performed for dimer wild type and six G6PD variants from different classes (see Table 1).The Gromos54a7 forcefield in the GROMACS package was used to simulate the structures for 100ns.The analysis of  2.
To assess the Ca backbone divergence between the variant protein structure and the wild type G6PD enzyme, the RMSD value was employed.A higher RMSD value unequivocally indicates lower protein structural stability throughout the simulation.Wild type G6PD stabilized after 60ns with an average RMSD value of 0.35 nm.All the average RMSD values of variants showed the least deviation compared to the wild type.However, G6PD Zacatecas (R257L), G6PD Durham (K238R), and G6PD Mediterranean (S188F) were observed to have unstable fluctuations throughout the simulation, as indicated in Fig. 5(a) and b).While G6PD Valladolid (R136C), G6PD Aþ (N126D) and G6PD Mexico City (R227Q) started to stabilize after 60ns.
The RMSF values of variants Ca backbone amino acid residues were computed and compared to the wild type structure.Any RMSF readings exceeding 0.05 nm (0.5 Å) from the wild type signify a substantial elevation in amino acid flexibility [19].A greater degree of flexibility is indicated by a higher RMSF value, while a low RMSF value suggests restricted movement compared to its average position throughout the simulation [32,33].The amino acid residues at the catalytic NADP þ binding site are represented by residue 27e169, the G6P substrate binding site composed of residues from position 170e273, while the structural NADP þ binding site is from residue 274e515, as indicated in Fig. 6.The average RMSF value of variant structures shows no drastic deviation.However, wild type and variants fluctuate more at the structural NADP þ site consisting of coils and b-sheets than at catalytic NADP þ and G6P substrate sites.The impact of intra-molecular hydrogen bonds on protein structure and dynamics is significant [34].Importantly, intermolecular hydrogen bonds have the potential to decrease protein fluctuations and enhance rigidity.From the intramolecular hydrogen bonds results G6PD Aþ have higher hydrogen bond with lower RMSF fluctuation.
Determining the structural compactness of a protein is dependent on the average Rg value [35].This value is known to decrease as the compactness of the protein increases, providing clear and indisputable evidence of its structural integrity.G6PD Mexico City (R227Q) showed a slight increase in the Rg value, indicating the mutation could affect the enzyme structure's compactness, as shown in ORIGINAL ARTICLE Fig. 7.There was no significant deviation in the structural compactness among other variants, with an average Rg value ranging from 0.70 nm to 0.71 nm.
The analysis of the accessible surface area for solvent molecules is performed using SASA results.A higher SASA value indicates a more exposed region for the solvent [36].Variant G6PD Mexico City (R227Q) showed a higher SASA value than the wild type, which denotes relative expansion of the structure, as indicated in Fig. 8.The SASA was lowered due to increased hydrophobic region exposure caused by the cysteine mutation in G6PD Valladolid .Moreover, it is evident from Fig. 9 that the intermolecular hydrogen bond in this specific variant has significantly increased.This is because the frequency of hydrogen-bonded side chains in proteins is inversely related to the solvent accessibility of their donors and acceptors.Thus, intramolecular hydrogen bonding favors buried and partially buried residues [37].The SASA value and intermolecular hydrogen bond for other variants fall within the range of 432 nm 2 e436 nm 2 and 777 to 787 bonds, respectively, with no noticeable alterations.

Dimer interface
The bN strands of two monomers form the dimer interface.This interface is crucial for enzyme stability and activity since the G6PD enzyme is in dimeretetramer equilibrium [3,12].To determine the existence of two hydrogen bonds between Asp421Asp421, which connecting the two bN strands, the computation of the distance between residues was performed.Two hydrogen bonds were present between Asp421-Asp421 at 100ns in the wild   type.However, other variants are unable to retain hydrogen bonds between Asp 421 as the distance between the residues is more than 3.3 Å, as indicated in Fig. 10.
There are four salt bridges present at the dimer interface.The establishment of contacts between two subunits of a G6PD dimer is accomplished through salt bridges between highly conserved residues Glu 206eLys 407 and Glu 419eArg 427.Table 3 shows that the only variant without the two salt bridges between Glu 206 and Lys 407 is G6PD Zacatecas (R257L).The reduced affinity of structural NADP þ in this variant affects the stability of the dimer interface by inducing the loss of the salt bridge.However, the two salt bridges between Glu 419 and Arg 427 were absent in the wild type and all the variants.

Affinity of ligands
The study compared the enzyme function of G6PD wild type and variants, analyzing the impact of molecular changes.The occupancy of ligand binding pockets compared with the enzyme affinity values derived from biochemical data of K m G6P and K m catalytic NADP þ [10,39].The enzyme's affinity for G6P and catalytic NADP þ varies across each variant, with different K m values listed in Table 4. Higher K m values indicate lower protein-ligand affinity based on biochemical data [40].As per the heatmap of ligand binding pockets occupancy, it is evident that the wild type structure in chain A exhibits a lower affinity for G6P.However, it is capable of forming robust hydrogen bonds with catalytic NADP þ .Chain B shows higher G6P affinity and is able to retain Lys 171, which is crucial for oxidation and catalysis of the G6P substrate.The catalytic NADP þ has a lower affinity in chain A and a higher affinity in chain B due to a hydrogen bond with Lys 171, as shown in Fig. 11.The wild type also showed significant occupancy at the structural NADP þ binding site, which accounts for its robust structural integrity at the dimer.
G6PD Durham (K238R), G6PD Valladolid (R136C), G6PD Mexicocity , and G6PD Aþ show lower affinity towards G6P chain A and higher affinity for chain B which is similar to wild type.However, G6PD Zaca- tecas (R257L) and G6PDMediterrenean (S188F) lost affinity towards G6P on chain A and lower affinity at chain B. Results of G6PD Zacatecas (R257L) correlate with higher G6P K m value reported in biochemical data.
Moreover, G6PD Durham (K238R), G6PD Zacatecas (R257L), G6PD Valladolid (R136C), and G6PD Mexicocity (R227Q) show similar ligand affinity as wild type for catalytic NADP þ .G6PD Mediterrenean (S188F) lost its affinity for catalytic NADP þ in chain B but exhibited higher affinity in chain A. The affinity of both chains of G6PD Aþ (N126D) for catalytic NADP þ is notably lower.This correlation is consistent with the reported catalytic NADP þ K m results.G6PD Zacatecas (R257L), G6PD Durham (K238R), G6PD Valladolid (R136C) have lower affinity towards structural NADP þ .While G6PD Mediterrenean (S188F), G6PD Mexicocity (R227Q), and G6PD Aþ (N126D) lost affinity for structural NADP þ .bEeaeloop serves a vital role in G6PD catalysis with the presence of critical residues Lys 171 involved in substrate oxidation and catalysis (3).From the results reported in Fig. 11 G6PD Zacatecas (R257L) and G6PD Durham (K238R) are unable to retain G6P-Lys 171 hydrogen bond.While the catalytic NADP þ -Lys 171 bond absent in G6PD Durham (K238R), G6PD Valladolid (R136C) and G6PD Aþ (N126D).G6PD Zacatecas (R257L) and G6PD Durham (K238R) revealed loss of Lys 171 hydrogen bond either with G6P or catalytic NADP þ .Additionally, the results showed that G6PD variant structures with inadequate structural integrity at dimer interfaces exhibit high enzyme activity.The unstable dimer interface for G6PD Zacatecas (R257L), which is revealed via loss of salt bridges and Asp421-Asp421 hydrogen bond, could be the reason for unable to retain both Lys 171-G6P and Lys 171-catalytic NADP þ hydrogen bond via the bEeae loop.This could be the root cause of their diminished enzyme activity.Furthermore, the loss of hydrogen bond between highly conserved amino acid 238 with structural NADP þ and Asp421-Asp421 attributed to the reduced enzyme activity in G6PD Durham (K238R).This is supported by the evidence of loss of Lys 171-G6P and Lys 171-catalytic NADP þ hydrogen bonds.

Conclusion
In summary, a complete dimeric structure with structural NADP þ , catalytic NADP þ, and G6P is crucial to understand protein dimerization which reveal the structural stability and flexibility of the G6PD enzyme.This study stands as the pioneer in simulating a selected G6PD dimer variants with ligands and verifying the results with established biochemical data.The unstable dimer interface is shared among all variants, even though each of them demonstrates unique structural alterations as demonstrated by MD simulations.The dimer interface's integrity is crucial in determining stable G6PD enzyme activity.Loss of salt bridges and hydrogen bond eventually impact the binding affinity of conserved amino acid residues, which ultimately leads to a significant reduction in G6PD enzyme activity.In silico computational techniques can be employed to precisely forecast the stabilization of the dimer interface in chosen G6PD variants in developing new drugs.

Conflict of interest
I certify that all my affiliations with or financial involvement in, within the past 5 years and foreseeable future, any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript are completely disclosed.

Fig. 1 .
Fig. 1. a) Human native G6PD enzyme structure, showing the ligand (sticks) in the monomer.b) Location of conserved amino acids sequence (orange cartoon) in the monomer G6PD enzyme based on multiple sequence alignment.The figure was prepared using PyMOL software and adapted from PDB entries 2BHL and 2BH9.

Fig. 2 .
Fig. 2. a) Visual representation of G6PD dimer structure with dimer interface bN (415e423) -al (424e427).b) Human native G6PD enzyme's crystallographic structure shows four salt bridges forming amino acid residues (pink stick represents chain A; yellow stick represents chain B) at the dimerization site.The Cyan strand represent conserved region Leu 420, Asp 421, and Leu 422 with stick representation of Asp 421-Asp 421 hydrogen bonds at the dimer interface.The figure was prepared using PyMOL software (Adapted from PDB entries 2BHL and 2BH9).

Fig. 6 .
Fig. 6.RMS Fluctuation.The root mean square fluctuation (RMSF) of the Ca backbone of wild type G6PD at 300K at 100ns.

Table 3 .
Distance of salt bridge residues for G6PD dimer wild type and variant structures.

Table 2 .
Average trajectory analysis values for the wild type and variant structures of G6PD dimer.

Table 1 .
Comparison of the intermolecular interactions between the mutation site and neighbouring residues for the wild type and variants.