The emerging role of GATA transcription factors in development and disease

The GATA family of transcription factors consists of six proteins (GATA1-6) which are involved in a variety of physiological and pathological processes. GATA1/2/3 are required for differentiation of mesoderm- and ectoderm-derived tissues, including the haematopoietic and central nervous systems. GATA4/5/6 are implicated in the development and differentiation of endoderm- and mesoderm-derived tissues, including induction of embryonic stem cell differentiation, cardiovascular embryogenesis and guidance of epithelial cell differentiation in the adult.

The importance of GATA factors for development is illustrated by the embryonic lethality of most single GATA knockout mice. Moreover, GATA gene mutations have been described in relation to several human diseases, such as hypoparathyroidism, sensorineural deafness and renal insufficiency (HDR) syndrome, congenital heart diseases (CHDs) and cancer. GATA family members are emerging as potential biomarkers, for instance for predicting the risk of developing acute megakaryoblastic leukaemia in Down syndrome and for the detection of colorectal and breast cancer.

The origin and molecular structure of the GATA family
In vertebrates, six GATA transcription factors have been identified. Based on phylogenetic analysis and tissue expression profiles, the GATA family can be divided into two subfamilies, GATA1/2/3 and GATA4/5/6 (Ref. 1). Although in non-vertebrates GATA genes are linked together on chromosomes, in humans they are located on six distinct chromosomal regions (Table 1), indicating segregation during evolution (Ref. 2). Most GATA genes encode several transcripts and protein isoforms. GATA proteins have two zinc finger DNA binding domains, Cys-X2-Cys-X17-Cys-X2-Cys (ZNI and ZNII), which recognise the sequence (A/T)GATA(A/G).

Tissue-specific roles of GATA factors in development and disease

Haematopoietic system
GATA1/2/3 knockout mice die at the embryonic stage due to haematological abnormalities (Table 2), indicating a pivotal role of these transcription factors in haematopoietic development (Ref. 1). GATA1, the first recognised member of the GATA family, is specifically expressed during haematopoietic development of the erythroid and megakaryocytic cell lineages (Fig. 2) (Ref. 11). Loss of GATA1 in mouse embryo-derived stem cells results in a complete lack of primitive erythroid precursor production (Ref. 5). Definitive erythroid precursors, on the other hand, are produced normally, but undergo a maturation arrest at the proerythroblast stage followed by apoptosis (Ref. 12). Ablation of GATA1 in adult mice also results in a maturation arrest at the same proerythroblast stage (Ref. 13). The requirement of the different GATA1 functional domains during primitive and definitive erythropoiesis has been investigated in vivo, showing that both zinc fingers are needed to rescue GATA1 germline mutant mice (Ref. 14).
In haematopoietic stem cells (HSCs), GATA1 gene expression is suppressed, which is indispensable for the maintenance of these stem cells.
The mechanism behind this suppression is not yet fully understood. Recently, it was shown that decreased DNA methylation of the GATA1 locus leads to increased GATA2 binding and that increased GATA2 binding results in GATA1 gene transactivation. Based on these results, Takai et al. proposed a mechanism in which GATA1 hypomethylation makes the locus accessible for GATA2 binding, which enables transactivation of GATA1 gene expression to initiate erythropoiesis in megakaryo-erythroid progenitors (Ref. 15). Loss of GATA1 results in a marked increase of GATA2 expression, indicating not only that GATA2 partially compensates for GATA1 but also that GATA1 suppresses GATA2 transcription during normal erythropoiesis (Ref. 16). This suppression is mediated by the displacement of GATA2 from its upstream enhancer by increasing levels of GATA1, referred to as the 'GATA switch' (Ref. 17). The combined loss of GATA1 and GATA2 in double-knockout embryos leads to an almost complete absence of primitive erythroid cells, suggesting functional overlap between these transcription factors early in primitive erythropoiesis (Ref. 18).
Requirement of functional GATA1 for haematopoiesis is also observed in several human diseases, such as anaemia, leukaemia and thrombocytopenia (Table 3). Splice site mutations of GATA1 have been found in a family with macrocytic anaemia and in patients with Diamond-Blackfan anaemia (an anaemia characterised by a selective hypoplasia of erythroid cells), resulting in impaired production of the full-length form of the GATA1 protein (Refs 19,20).
Conditional megakaryocytic lineage-specific GATA1 knockout mice show excessive marrow megakaryocyte proliferation, whereas platelet numbers are decreased. The maturation of these hyperproliferative megakaryocytes is severely impaired and the platelets they produce are structurally and functionally abnormal (Ref. 21). Additionally, megakaryocyte-expressed genes with functional GATA1-binding sites (e.g. STAT1) are downregulated in GATA1−/− megakaryocytes (Ref. 22). Loss of GATA1 leads to overexpression of GATA2 in megakaryocytes. However, GATA1-deficient megakaryocytes still show abnormal proliferation and differentiation, indicating that these transcription factors are not functionally redundant in megakaryopoiesis (Ref. 23). In contrast to erythropoiesis, GATA2 remains expressed after the GATA switch in late megakaryopoiesis, suggesting divergent functions for the two GATA proteins (Ref. 24).
Children with trisomy 21 are at risk of developing leukaemia, in particular acute megakaryoblastic leukaemia (AMKL). Nearly all Down syndrome patients with AMKL harbour somatic mutations in the GATA1 gene.
GATA1 mutations are also detected in a specific form of X-linked hereditary thrombocytopenia and are described with and without thalassemia (Table 3 and Supplemental Table 1). Hereditary thrombocytopenia without thalassemia has been associated with GATA1 missense mutations that are located in the N-terminal zinc finger region. These mutations lead to loss or inhibition of the interaction of GATA1 with its cofactor friend of GATA 1 (FOG1) (Ref. 39). The degree to which the GATA1-FOG1 interaction is disrupted depends on the mutation, explaining the different clinical presentations (Ref. 40). The only GATA1 mutation reported in hereditary X-linked thrombocytopenia with thalassemia is the missense mutation R216Q, which is located in the DNA binding surface of the GATA1 N-terminal zinc finger and results in reduced DNA binding rather than affecting the GATA1-FOG1 interaction (Ref. 41).
In vertebrates, GATA2 is expressed in haematopoietic progenitor cells (HPCs), early erythroid cells, mast cells and megakaryocytes, closely resembling the cellular distribution of GATA1 (Fig. 2). A deficit in primitive erythropoiesis is apparent in GATA2−/− mice, since the total number of blood cells during embryonic development is markedly reduced, leading to lethality because of severe anaemia (Table 2) (Ref. 6). In GATA2+/− mice, haematopoietic defects are seen within HSCs and granulocyte-macrophage progenitor cells. Moreover, the loss of GATA2 in adult mice leads to profound abnormalities in definitive haematopoiesis, also pointing to a defect at the level of HSCs (Refs 6,42,43). The function of GATA2 in haematopoietic development has recently been reviewed by Bresnick et al. (Ref. 44), who describe GATA2 as one of the key components establishing the transcriptional program for early haematopoietic development.
Two different GATA2 alterations have been reported in patients with chronic myeloid leukaemia (CML) during blast crisis (Table 3). In contrast to the in-frame deletion Δ341-346, which leads to decreased transcriptional activation, GATA2 L359V is a gain-of-function mutation and leads to increased DNA binding. Transduction of GATA2 L359V (in vitro and in vivo) resulted in disturbed myelomonocytic differentiation and proliferation, suggesting that GATA2 mutations are involved in the acute myeloid transformation of CML (Ref. 45).
GATA2 gene mutations that predispose to myelodysplastic syndrome (MDS) and acute myeloid leukaemia (AML) have been reported (Supplemental Table 1). These mutations occur either in the absence (non-syndromic) or presence of certain syndromes, including Emberger syndrome and monoMAC syndrome.
Similar expression patterns of GATA1, GATA2 and GATA3 in human, murine and avian erythroid cells indicate a conserved role for these GATA transcription factors in vertebrate erythropoiesis (Ref. 48). Beyond its expression in erythroid lineages, GATA3 is also expressed in T lymphocytes (Ref. 49). During haematopoiesis, vertebrate GATA3 is expressed in HSCs and in developing T lymphocytes. Murine GATA3−/− embryos are predominantly affected during definitive haematopoiesis in the fetal liver. Although later than GATA2−/− mice, these embryos also appear anaemic and die in utero, probably owing to massive internal bleeding.
In T cell development, GATA3 has a pivotal role from the generation of early T lineage progenitors to CD4+ specification [as reviewed in (Ref. 52)]. During antigen presentation by specialised antigen-presenting cells, the T cell receptor (TCR) is stimulated, thereby driving differentiation of peripheral naïve CD4+ T cells towards T helper type 1 (TH1) or type 2 (TH2) cells. GATA3 expression in differentiating TH2 cells is mediated by different pathways, as reviewed by Ho et al. (Ref. 53). In the TH2 lineage, GATA3 and STAT6 account for lineage-specific expression of T cell lincRNAs. At the moment, the function of lincRNAs during T cell development and differentiation is under investigation (Ref. 54). An essential function for GATA3 beyond TH2 differentiation has also been described: GATA3 controls proliferation and maintenance of mature T cells (Ref. 55).
GATA3 dysregulation has been described in leukaemia. Together with T-cell acute lymphocytic leukemia 1 (TAL1) and RUNX1, GATA3 forms an autoregulatory loop that positively regulates the v-myb avian myeloblastosis viral oncogene (MYB), which in turn controls the gene expression program in T-cell acute lymphoblastic leukaemia (T-ALL) (Ref. 56). Furthermore, whole-genome sequencing of patients with early T-cell precursor ALL, an aggressive subtype of T-ALL, revealed GATA3 inactivating mutations (Supplemental Table 1) (Ref. 57).
In summary, GATA1/2/3 are essential regulators in the development of erythroid and megakaryocytic cell lineages and in the molecular pathogenesis of different haematopoietic diseases.

Cardiovascular system
The mesoderm gives rise to numerous organs, including the heart and genitourinary tract. GATA4/5/6 proteins are expressed in the mesodermal precursors that develop into the heart (Ref. 58).
GATA4 is one of the earliest transcription factors expressed in developing cardiac cells, already detectable in murine precardiac splanchnic mesoderm and associated endoderm (Ref. 8). GATA4−/− mice display severe defects in ventral foregut closure and heart morphogenesis, resulting in embryonic lethality at embryonic day 8.
A number of GATA6 mutations have been described in the aetiology of CHD (Table 3; Supplemental Table 1). For example, two GATA6 mutations found in patients with persistent truncus arteriosus (PTA) disrupt the transcriptional activity of the GATA6 protein on downstream genes involved in the development of the cardiac outflow tract (Ref. 71).
Thus, the GATA4/5/6 transcription factors have closely related functions during cardiovascular development, and defects lead to CHD and other heart conditions.

Gastrointestinal tract
The endoderm gives rise to the respiratory and gastrointestinal tract as well as the associated organs.
Whereas GATA4 expression is absent from the distal ileum, GATA6 is expressed throughout the entire small intestine. Conditional deletion of GATA6 in the ileum results in a decrease of crypt cell proliferation and numbers of enteroendocrine and Paneth cells, an increase in numbers of goblet-like cells in crypts and altered expression of genes specific to absorptive enterocytes. GATA4/6 factors are therefore required for proliferation, differentiation and gene expression in the small intestine (Ref. 77).
In humans, GATA4 and GATA5 are expressed in normal gastric and colon mucosa (Refs 78, 79). In gastric and colorectal cancer (CRC) these genes are frequently transcriptionally silenced by methylation (Refs 80,78). In addition, we reported that GATA4 and GATA5 exhibit tumour suppressive properties in human CRC cells in vitro (Ref. 80). The potential biomarker capacities of GATA4 are discussed below.

Liver and pancreas
In the mouse, the ventral foregut endoderm differentiates to form the parenchymal components of the liver and ventral pancreas. Although GATA4 has an essential function in embryonic liver development, the protein seems to be dispensable for adult liver function.
In addition, these patients suffered from CHD, biliary tract abnormalities, gut developmental disorders, neurocognitive abnormalities and other endocrine abnormalities (Ref. 85). In contrast to these results, Martinelli et al. reported that GATA6 is dispensable for pancreas development, although it is essential for acinar differentiation and maintenance of adult exocrine homeostasis in mice (Ref. 86). An explanation for this contradiction might be the timepoint of GATA6 inactivation, which is earlier in agenesis patients than in the mouse model used by Martinelli et al. Together, these data show the need for further research to unravel the role of GATA6 in pancreatic development.
In pancreatic cancer, GATA6 is often overexpressed, which correlates with GATA6 amplification.

Urogenital tract and kidney
GATA1 is abundantly expressed in the Sertoli cells of the testis during murine prepubertal testis development (Fig. 2). GATA1 expression decreases thereafter, and in the adult mouse testis it is found only in the Sertoli cells during different stages of spermatogenesis (Ref. 93). Surprisingly, Sertoli-specific GATA1 knockout mice show no alterations in testis development, spermatogenesis, male fertility or expression of putative testis-specific GATA1 target genes (Ref. 94). Further research is needed to clarify whether there is functional redundancy between GATA factors in the testis.
During urogenital development, GATA4 is expressed in somatic ovarian and testicular cell lineages, and is suggested to have an important regulatory role in gonadal gene expression (Fig. 2).
The GATA4 gene has also been implicated in a disorder of sex development (DSD). A GATA4 mutation, which abrogates the binding with FOG2, was discovered in a family with both CHD and 46,XY DSD (Table 3) (Ref. 100). The phenotype closely resembles that of the mouse GATA4ki/ki model (Ref. 97). The data described above indicate that GATA4, in combination with FOG2, is necessary for proper mammalian sex differentiation.
Murine GATA5 is expressed in the urogenital ridge during fetal development (Ref. 63). GATA5−/− female mice exhibit abnormalities of the genitourinary tract including malpositioning of the urogenital sinus, vagina and urethra, whereas males are unaffected (Table 2). These defects suggest that early morphogenic movements in the lower genitourinary tract are disrupted in the absence of GATA5. GATA5 and GATA6 are coexpressed in the developing urogenital ridge but do not seem to have entirely overlapping functions during development of the female genitourinary system (Ref. 9).
GATA6 is expressed during both testicular and ovarian fetal development (Fig. 2) (Ref. 63). In the developing gonads, GATA4 and GATA6 have overlapping, but distinct expression patterns, which suggest different roles for these transcription factors. In addition, it is also possible that these factors complement each other's functions because GATA4 and GATA6 are expressed in similar cell types in the testis and ovary (Refs 101, 102).
Loss of GATA6 expression has been found in ovarian cancer and has been associated with hypoacetylation of histones H3 and H4 and loss of H3K4me3 at the promoter region (Ref. 90). Downregulation of GATA6 expression results in nuclear deformation and aneuploidy of ovarian surface epithelial cells (Ref. 103). These data indicate that, in contrast to other cancers, GATA6 has a tumour suppressor role in ovarian cancer. Tumour suppressing activities are also suggested for GATA4 and GATA5, because introduction of these genes into ovarian tumour cell lines greatly inhibits cell growth and survival (Ref. 104).
During pronephros formation, human GATA3 expression is already detected in the nephric duct (Fig. 2) (Ref. 105). Subsequently, ureter tips and the collecting duct system of the metanephros are formed, which both show GATA3 expression (Ref. 106). Inactivation of the murine GATA3 locus results in a morphologically abnormal nephric duct with an aberrant elongation path, loss of the ureteric bud and a severe growth disturbance of the mesonephros, due to the disturbance of a regulatory cascade consisting of GATA3 with β-catenin as upstream regulator and Ret as downstream target (Ref. 107).
In humans, GATA3 haploinsufficiency leads to the HDR syndrome, a rare and complex disease characterised by the combination of hypoparathyroidism, sensorineural deafness and renal insufficiency (Table 3, Supplemental Table 1) (Ref. 108). The majority of the associated GATA3 mutations lead to loss of DNA binding caused by a disrupted second zinc finger (ZnF2), or to altered FOG2 interaction and/or DNA binding affinity caused by a disrupted first zinc finger (ZnF1) (Table 3). Most of the HDR probands without GATA3 mutations do not have renal abnormalities, and no GATA3 mutations are found in patients with isolated hypoparathyroidism (Ref. 109). This suggests that GATA3 mutations are highly penetrant and result in the HDR phenotype. In addition, GATA3+/− mice have small parathyroids, resulting in a failure to correct hypocalcaemia similar to that in HDR patients (Ref. 110). When GATA3 is specifically deleted in the developing inner ear, defective formation of the cochlear prosensory domain and loss of spiral ganglion neurons are observed (Ref. 111). However, the exact mechanisms leading to the HDR phenotype remain to be elucidated.

Respiratory tract
The mammalian lung develops from budding of the foregut endoderm, in which both GATA4 and GATA6 are expressed. In vitro analysis of lung development in GATA4ki/ki mice shows abnormal lobar development, revealing GATA4 as a candidate for FOG2-mediated early pulmonary development (Ref. 112). GATA6-regulated Wnt signalling controls the balance between bronchioalveolar stem cell expansion and epithelial differentiation required for both lung development and regeneration after lung injury (Ref. 113).
However, data on defects in GATA factors in lung diseases are scarce. Recently, a requirement for GATA2 in oncogenic Kras-driven lung tumorigenesis was reported. Moreover, inhibition of GATA2-regulated pathways in mice with KRAS mutant non-small cell lung cancer results in tumour regression (Ref. 114). Finally, a lung cancer susceptibility locus downstream of GATA3 was identified (Ref. 115).

Mammary gland
In GATA3/LacZ knock-in mice, GATA3 expression is observed at the earliest stages of embryonic mammary development (Fig. 2). During puberty GATA3 is expressed in the terminal-end buds, and within the adult mammary gland only in luminal epithelial cells. Targeted GATA3 deletion at different stages of embryonic mammary development resulted in loss or absence of mammary primordia and nipples (Ref. 116). Postnatal GATA3 deletion resulted in loss of mammary gland development and diminished expression of luminal differentiation markers, which indicates an important role of GATA3 in the luminal epithelium (Refs 116, 117). Loss of oestrogen receptor α (ERα) expression is observed in both GATA3 knock-out mice and FOG-2 knock-out mice (Ref. 117). Involvement of GATA3 and ERα in a positive cross-regulatory loop, which has been shown in breast cancer, may be an explanation for these phenomena (Ref. 118). Collectively, these data show that GATA3 is essential during embryonic development.
The crucial role of GATA3 in the mammary gland is further demonstrated by the observation of GATA3 mutations in ∼10% of human breast cancers. The spectrum of somatic mutations is diverse, and the mutations cluster predominantly in the vicinity of the highly conserved C-terminal second zinc finger (Table 3; Supplemental Table 1).

Central nervous system
GATA4 is expressed in the embryonic and adult CNS and acts as a negative regulator of astrocyte proliferation and growth (Fig. 2) (Ref. 134). In the adult mouse and human, GATA6 is expressed in neurons, astrocytes, choroid plexus epithelium and endothelial cells (Fig. 2) (Ref. 135).
Loss of expression of GATA4 and GATA6 occurs in glioblastoma multiforme (GBM). The promoters of both GATA4 and GATA6 were found to be methylated, and somatic mutations were also found in GATA4 (Refs 136,137). Limited evidence indicates that GATA4 regulates apoptosis-related genes in cultured GBM cell lines (Ref. 136). GATA6 was identified in a mouse astrocytoma model as a novel tumour suppressor gene. Knockdown of GATA6 expression in RasV12 or p53−/− astrocytes led to acceleration of tumourigenesis. Mutations of GATA6 occur during malignant progression of murine and human astrocytomas (Ref. 135).

Regulation of GATA genes and proteins in disease
Although mainly GATA gene mutations have been described above, chromosomal alterations as well as regulation of GATA genes and proteins on transcriptional and post-transcriptional levels can also contribute to disease development.
Recently it has been shown that combined tet methylcytosine dioxygenase 2 (TET2) and fms related tyrosine kinase 3 (FLT3) mutations regulate epigenetic silencing of GATA2 by promoter hypermethylation in human AML (Ref. 138). In clear cell renal cell carcinomas, downregulation of GATA3 expression by promoter hypermethylation results in decreased expression of TbetaRIII, a protein with tumour suppressor features, during disease progression (Ref. 139). The presence of suppressive histone (H3K27) trimethylation at GATA3, together with absence of the GATA3 protein, in anaplastic large cell lymphoma implicates an epigenetic contribution to the pathogenesis of this disease (Ref. 140). Clues about the transcriptional regulation of the GATA4 and GATA6 genes come from a SUMO-specific protease 2 (SENP2) knockout model. These mice have reduced expression of GATA4 and GATA6 and defects in the embryonic heart. In SENP2-deficient embryos, sumoylated CBX4 accumulates and occupies the promoters of GATA4 and GATA6, thereby leading to transcriptional repression (Ref. 141).
GATA4 is located at chromosome 8p, a chromosomal locus frequently deleted in multiple tumour types such as colorectal and oesophageal cancer (Refs 142,143). Alternatively, GATA4 can be downregulated via epigenetic silencing, such as hypoacetylation of histones H3 and H4 (Ref. 90) and promoter CpG island hypermethylation, which has been observed in colorectal, gastric, oesophageal, lung, ovarian and HPV-driven oropharyngeal cancer, in GBM and in diffuse large B-cell lymphoma (Refs 80,78,88,89,104,136,144,145). In contrast, GATA4 amplification has recently been described in certain gastric cancers, which indicates a more oncogenic function (Ref. 92). Further studies are needed to unravel the molecular mechanisms of GATA4-amplified in comparison with GATA4-methylated gastric cancers.
GATA5 is located at chromosome 20q13, a locus which is often amplified and methylated in multiple cancer types. No coding sequence mutations in GATA4 and GATA5 have been described so far in colorectal and breast cancer (Refs 146,147). However, promoter methylation of GATA5 might be established in order to downregulate increased gene expression imposed by amplification.
Identified post-translational modifications of GATA proteins include acetylation, phosphorylation and methylation (Fig. 1). Protein stability of GATA2 and GATA3 is regulated by phosphorylation and ubiquitylation. Phosphorylation of GATA2 and GATA3 by cyclin-dependent kinase 1 (CDK1) and CDK2, respectively, was required for F-box/WD repeat-containing protein 7 (Fbw7)-mediated ubiquitylation and degradation and contributed to precise differentiation of HSCs and T-cell lineages (Refs 148,149). How GATA acetylation influences transcriptional processes has been investigated for GATA1: the bromodomain protein Brd3 binds to acetylated GATA1 to regulate chromatin occupancy at erythroid target genes (Ref. 150). For GATA4, post-translational modifications have mainly been studied in the context of hypertrophy of the heart. Activation of GATA4 occurs in part through acetylation by the transcriptional coactivator p300. Takaya et al. identified four GATA4 lysine residues whose mutation abolished p300-induced acetylation, DNA binding and transcriptional activity (Fig. 1).

Clinical applications of GATA transcription factor alterations
The above-mentioned alterations in GATA factors might be applicable as biomarkers for early detection, diagnosis and prediction of prognosis and response to therapy.
Early detection markers. Non-invasive early diagnosis of CRC reduces mortality of this disease (Ref. 156). We have shown that GATA4 promoter methylation is highly prevalent in CRC, suggesting that methylation is an early event in colorectal carcinogenesis.
GATA4 methylation detected in faecal DNA has the potential to be used as a biomarker for improving pre-selection tests for colonoscopy (Ref. 80), especially if the clinical and analytical sensitivity and specificity can be improved by adding further biomarkers and by introducing sensitive analysis techniques such as methylation-on-beads technology (Ref. 157).
Diagnostic markers. The expression of several GATA factors can be helpful in establishing a correct diagnosis. In ovarian cancer loss of GATA4 precedes loss of GATA6 expression and can differentiate between histological subtypes. Loss of both GATA4 and GATA6 expression is found in serous, clear cell and endometrioid ovarian cancer, but their expression can be detected in mucinous carcinomas (Ref. 158).
Prognostic markers. As described above, GATA1 mutations are found in nearly all AMKL patients with Down syndrome and are already detectable in the precursor lesion, transient myeloproliferative disorder (TMD). In addition, neonates with Down syndrome who lack GATA1 mutations do not develop AMKL (Refs 159, 160). Together, the presence of GATA1 mutations in children with Down syndrome might be a prognostic marker for identifying infants at higher risk of developing AMKL (Ref. 161). Besides having clinical value in AMKL, prognostic properties of GATA transcription factors have also been described in ALL. Inherited genetic GATA3 variants have been identified in Philadelphia-like ALL (an ALL subtype with a poor prognosis) and are associated with early treatment response and a higher risk of relapse (Ref. 162).
GATA3 downregulation has been observed in ER-negative breast cancers and has been described as a strong prognostic indicator in breast cancer. Low GATA3 expression was strongly associated with aggressive disease and poor survival (Ref. 117). Conversely, breast cancers expressing GATA3- and oestrogen-regulated genes have a good prognosis, with better relapse-free and overall survival (Ref. 163). GATA3 has been considered to be a better prognostic marker for disease-free survival than commonly used variables such as ER status (Ref. 164), although conflicting data have been published. However, GATA3 expression is highly correlated with the luminal A subtype, which has a relatively favourable outcome compared with luminal B and basal-like subtypes (Ref. 165). An explanation could be the downregulation of p18INK4C transcription by GATA3, resulting in expansion of luminal progenitor cells and thereby favouring the development of luminal-type breast cancer (Ref. 166).
Recent studies indicate that GATA2 may be a useful biomarker for predicting prognosis in AML. GATA2 mutations are frequent in patients with a biallelic CEBPA mutation and are associated with a better survival (Ref. 167).
In oropharyngeal carcinomas, a methylation signature of five gene promoters, including GATA4, correlates with improved survival (Ref. 144). Finally, loss of expression of GATA4 in GBM is associated with unfavourable patient survival (Ref. 136).
Recently it has been described that low GATA6 expression in lung adenocarcinomas is linked to increased incidence of metastasis and poor outcome (Ref. 168).
Predictive markers. Whole-genome sequencing of samples from patients with ER-positive breast cancer participating in aromatase inhibitor clinical trials identified 18 significantly mutated genes, including GATA3. Mutant GATA3 correlated with suppression of proliferation upon aromatase inhibitor treatment and might therefore be a positive predictive marker for aromatase inhibitor response (Ref. 169).
Re-expression of GATA4 in GBM cells conferred sensitivity to temozolomide, a DNA alkylating agent used in GBM therapy (Ref. 136).
Recently, GATA5 methylation was described as a potential predictive marker for patients with high-risk non-muscle-invasive bladder tumours. These patients had a better survival after treatment with Bacillus Calmette-Guérin (BCG) when GATA5 was methylated (Ref. 170).
Therapeutic interventions. For regenerative medicine, the generation of functional differentiated cell types is of great therapeutic interest. Since heart disease occurs frequently and the heart has little regenerative capacity after damage, procedures are sought that can transdifferentiate fibroblasts into cardiac myocytes. A cocktail of transcription factors, including GATA4, converts cardiac non-myocytes into cardiomyocyte-like cells in vivo and alleviates cardiac injury (Refs 171,172). In mouse liver engineering experiments, GATA4 was likewise one of the essential factors that contributed to the conversion of fibroblasts into functional hepatocyte-like cells (Ref. 173). These induced cells were able to restore liver function in half of fumarylacetoacetate hydrolase-deficient mice. GATA4 is thus one of the pivotal genes that, in combination with other transcription factors, can be utilised to improve heart and liver function after damage. These promising results are the first steps towards bringing regenerative medicine to the clinic. More knowledge of the different GATA protein functions and their downstream target genes is necessary before therapeutic strategies can be developed.

Conclusions and future perspectives
An increasing number of studies are being published, describing expression and function of GATA genes during development in different species.
Causal relationships between aberrations in GATA genes and several human diseases have become apparent. Numerous mutations in the GATA genes have been described above. Many disease-associated mutations are located in and around the zinc finger regions. As those mutations are not limited to the two zinc fingers themselves, it is clear that the whole region is important for the proteins to be fully operational. Most likely, these mutations hinder the correct folding of the proteins and thereby prevent GATA proteins from binding to their relevant binding partners. The application of next-generation sequencing technologies through whole-genome, whole-exome and whole-transcriptome approaches allows for substantial advances, which are expected to reveal more disease-associated alterations within GATA genes.
A better understanding of the regulation of GATA factors on transcriptional, translational and post-translational levels will give more leads to how GATAs can be used as biomarkers. Prospective clinical trials, based on these data, are necessary to determine the translational value of GATA genes as biomarkers.