The National Project on Achievement in Twins (NatPAT) is based in the United States (US) and focuses on early reading and math development. We began recruitment into the project registry in 2017 (see Figure 1). The initial goals of the NatPAT project, including recruitment strategies and measures, have been previously reported (Hart et al., 2019). The first phase of the NatPAT project focused on enrolling twins and integrating data available on those twins from numerous sources, including the school reading and math data from the University of Oregon (UO) DIBELS Data System (DDS). The UO DDS is an online repository that contains data on approximately 1.4 million children a year who attend schools that use the Dynamic Indicators of Basic Early Literacy Skills (DIBELS) and easyCBM math assessment products (University of Oregon Center on Teaching and Learning, 2019). Data in the UO DDS include reading and math progress monitoring data from kindergarten through 8th grade, with testing typically occurring three times a year. The first phase of NatPAT also included integrating survey data collected from parents and twins as well as publicly available data on neighborhood and school contexts. A recent publication from this phase of the project focused on the role of age as a moderator of the genetic and environmental influences on reading, finding small increasing genetic influences and larger decreasing shared environmental influences on reading from ages 5 to 7 that began to plateau around age 8 (Little et al., 2025).

Figure 1. The NatPAT logo.
In response to the COVID-19 pandemic, we pivoted data collection of NatPAT to focus on short- and long-term impacts of the pandemic on children, by integrating reading data from the UO DDS, accessing a broad array of publicly available contextual data related to COVID-19 using geographical information systems technology, and sending annual surveys to parents and twins. We use a risk-resilience framework (Catts & Petscher, 2022) to understand the role of COVID-19 related impacts on children’s reading skills. Risk-resilience models focus on the balance between factors that can increase the likelihood of low reading skills (risk factors) and positive factors that can decrease the likelihood of poor reading outcomes (resilience factors). COVID-19 represents an extreme mediating factor that disrupted children’s resilience factors and impaired their ability to mitigate the relation between risk factors and reading skills.
The first publication from the COVID-19 focused portion of the project was a registered report describing the ways children went to school, what parents did to assist in their children’s education, parent–teacher interactions, schoolwork struggles, technology access, and school services provided to children, during the first full school year after the start of the COVID-19 pandemic (Johnson et al., 2024). Following this publication, a subsequent study from this project examined the causal direction between COVID-19 impacts and children’s resilience. We found that resilience explained 32% of variance in overall COVID-19 impacts, consistently buffering negative impacts from multiple domains (e.g., social connection, mental health; Du et al., 2024). Another study focused on the contribution of time spent on online learning to the degree of parent involvement in children’s education during the pandemic. We found that parents of children who spent a higher percentage of the school year in online learning were more involved in providing schoolwork help to their children but had less positive feelings towards their children’s school and teachers (Johnson et al., 2024).
In 2022 we began collecting genomic data on parents and twins within NatPAT, with the goal of examining the association of genetic variants with reading skills and related cognitive and behavioral phenotypes, as well as identifying intergenerational transmission factors. Work on this aspect of the project is ongoing, with no publications yet.
International Context
Although there has not previously been a national twin project in the US focused on child reading and math achievement, there have been projects like NatPAT across the world. To name a few, we join the Twins Early Development Study in the United Kingdom, the Netherlands Twin Register in the Netherlands, and numerous registries across Scandinavia (see Hur et al., 2019). In addition, there have been other large-scale twin registries focused on children’s educational achievement, including the Florida Twin Project, Western Reserve Reading and Math Project, Colorado Twin Project, and the Quebec Newborn Twin Study (Hur et al., 2019). Of interest, the diversity of the NatPAT families, including, to name a few dimensions, where they live, the economic capital available to them, the school instruction they receive, and their political viewpoints, has resulted in greater environmental variance than is typically seen in many of the previous twin projects named above. We are just starting to publish work from this sample, but based on our findings from the first NatPAT publication (Little et al., 2025), we believe that we will see greater environmental variance in achievement than reported previously in the literature (e.g., Little et al., 2017).
Materials and Methods
Recruiting Twins
This project represents the first national-level twin sample focused on school achievement in the US. To do this work at scale in the US, which does not have a central state-run repository for school achievement data like those other national twin projects have used (e.g., TEDS in the UK), we had to partner with a company that administers a school testing product used by many schools, UO DDS. The UO DDS agreement with participating schools regarding data reuse dictates that they cannot tell us who is in their data without parent consent, but they can tell us which schools are in their database. Therefore, we recruit families by contacting each school in the UO DDS by phone. At contact, we ask school administrative assistants to agree to send home our recruitment materials to any families of multiples in their schools. If the administrative assistant agrees (thus far, approximately 68% have), we then mail them a pack of our recruitment envelopes (which contain a cover letter explaining the study, a one-page demographic and zygosity questionnaire, and a stamped envelope addressed to project staff at Florida State University; these materials are all available at https://osf.io/6fw5c/) along with a $5 gift card as a thank you to the school contact. All recruitment materials are also available in Spanish, sent upon request.
Families that receive recruitment materials have the option to fill in the questionnaire and return it directly to us to enroll in the project and receive a $10 gift card for doing so. Twin zygosity is determined through parent rating of the similarity of the twins using a five-item zygosity questionnaire (Lykken et al., 1990). As part of their enrollment into the registry, parents give us permission to access their twins’ UO DDS data and integrate it with their demographic and zygosity data.
As of this writing, August 2025, we have recruited 1997 twin pairs into the NatPAT registry (see Table 1), who are enrolled in 1112 different schools across 50 states (see Figure 2). At the time of their recruitment, 121 twin pairs were in preschool, 682 twin pairs were in kindergarten, 652 twin pairs were in first grade, 596 twin pairs were in second grade, 505 twin pairs were in third grade, 529 twin pairs were in fourth grade, 449 twin pairs were in fifth grade, 293 twin pairs were in sixth grade, and the remainder were in various grades higher than sixth or pairs were spread between two grades. Given the multiple data types combined for the project, the effective sample size and representativeness of the available sample varies depending on the individual analysis done.
Table 1. NatPAT sample characteristics

Note: Sample size is twin pairs. Two families have unknown zygosity.

Figure 2. A map of where the NatPAT twins live.
Note: Each family’s location has been randomly offset (jittered) to protect participant privacy.
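The kind of random offset described in this note can be sketched as follows. This is only an illustrative sketch, not the procedure used to produce Figure 2: the function name `jitter_location`, the 5 km maximum offset, and the flat-earth degree conversion are all assumptions chosen for the example.

```python
import math
import random


def jitter_location(lat, lon, max_offset_km=5.0, rng=random):
    """Return (lat, lon) moved by a random offset of at most max_offset_km.

    Hypothetical helper: the maximum offset is an assumed value, not one
    reported by the project.
    """
    # Random direction; the square root spreads points uniformly over the disc.
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    distance_km = max_offset_km * math.sqrt(rng.random())

    # Approximate conversion from kilometres to degrees (adequate away from the poles).
    km_per_deg_lat = 111.32
    km_per_deg_lon = 111.32 * math.cos(math.radians(lat))

    new_lat = lat + (distance_km * math.cos(bearing)) / km_per_deg_lat
    new_lon = lon + (distance_km * math.sin(bearing)) / km_per_deg_lon
    return new_lat, new_lon


# Example: offset a point before plotting it on a public map.
plotted_lat, plotted_lon = jitter_location(30.44, -84.28)
```

Passing a seeded `random.Random` instance as `rng` makes the offsets reproducible, which is convenient when a figure has to be regenerated.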
Reading and Math Measures
We use the demographic information of the twins to pull all available data from the UO DDS, including all previously collected data, and we will update these data as the twins age and new data become available. The UO DDS contains reading data and math data, typically available three times a year per child.
The DIBELS Next is a set of seven benchmark progress monitoring reading measures given in kindergarten through eighth grade, administered individually and designed to be short (Good & Kaminski, 2011). Not all of the measures are administered in all of the grades; as children become proficient in some more foundational reading skills, some of the measures are designed to be phased out and other measures focused on more complex skills introduced (see Hart et al., 2019, for full list of the measures).
The easyCBM Math measure is a grade-level measure of math content knowledge and skills, given three times a year from kindergarten to eighth grade. The content tested by the easyCBM Math measure changes at each testing point, although generally the test items focus on numbers and operations, geometry, measurement, algebra, data analysis, and ratios. The test items are multiple choice and testing occurs online.
Contextual Data
We geocode participant home and school addresses to be able to gather data on the contexts around children’s homes and schools. Geocoding involves using geographic information system (GIS) software to map selected locations from multiple data types including census data, addresses, and global positioning system data (Goldberg et al., 2007). Using addresses with GIS software involves translating and standardizing an address entry, locating the address entry from within available reference data, then isolating and pinpointing the best potential match as a specific locus (X, Y coordinate) on a map (Zandbergen, 2008). Once geocoded, we merge family and school locations with the data harvested from publicly available databases, including existing indices of opportunity (e.g., Child Opportunity Index and Social Vulnerability Index) as well as data on home (e.g., income level by census tract) and school (e.g., Title 1 designation, school lunch expenditures) conditions. Data are available at several geographic levels including county, zip code, census tract, and census block group. This geographic range allows for both broad and more fine-grained measures of contextual conditions. Importantly, these data are common across the geographic span of the US, allowing us to create a comprehensive picture of the contexts within which children learn and develop across our geographically spread participants.
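The merge step described above, attaching area-level context to each geocoded location, can be sketched as a keyed left join. This is a minimal illustration under stated assumptions: the function `attach_context`, the field names (`tract_id`, `opportunity_score`), and the use of plain dictionaries are hypothetical and are not drawn from the project's pipeline.

```python
def attach_context(records, context_by_area, key, prefix):
    """Left-join area-level context onto geocoded records.

    records         : list of dicts, each holding an area identifier under `key`
    context_by_area : dict mapping an area identifier to a dict of context values
    prefix          : label added to each context field so that geographic
                      levels (e.g., tract vs. county) stay distinguishable
    All names here are illustrative assumptions.
    """
    # Collect every context field so unmatched records still get explicit gaps.
    all_fields = sorted({f for ctx in context_by_area.values() for f in ctx})

    merged = []
    for record in records:
        row = dict(record)  # copy so the input is left untouched
        context = context_by_area.get(record.get(key), {})
        for field in all_fields:
            # None marks context that is missing for this area.
            row[f"{prefix}_{field}"] = context.get(field)
        merged.append(row)
    return merged


# Example with made-up identifiers and values.
families = [
    {"family_id": "F001", "tract_id": "12073000100"},
    {"family_id": "F002", "tract_id": "99999999999"},
]
tract_context = {"12073000100": {"opportunity_score": 72}}
with_context = attach_context(families, tract_context, key="tract_id", prefix="tract")
```

Calling the function once per geographic level, each time with a different `key` and `prefix`, keeps county, zip code, tract, and block group measures side by side in the same record.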
Survey Data Collection
We also send surveys home to the families in our registry. In total, there will be seven waves of survey data collection, all of which are filled out by one caregiver and both twins (if they have completed third grade, typically about 8 years of age or older) unless noted. The original project included a single survey data collection, which was completed March-May 2022, focused on reading and math attitudes, behaviors, and environments (N = 671 parents, N = 1290 twins). Six survey rounds are focused on COVID-19 impacts, with an initial caregiver-only survey sent May-August 2021 (N = 833 parents), and then annual surveys sent September-November, beginning in 2022 (2022 N = 603 parents, N = 1162 twins; 2023 N = 584 parents, N = 1121 twins; 2024 N = 614 parents, N = 1115 twins), set to end after the 2026 collection. One caregiver, and twins who are old enough, can complete the survey online or on paper, with questions focused on mental health, school experiences, and attitudes and behaviors important for school (e.g., attention, interest, test anxiety).
Genomic Data Collection
In 2022 we sent all NatPAT families an interest letter to participate in a genomic data collection. Families were asked how many ‘parent’ saliva collection tubes they wanted to go with two tubes for their twins. We did not want to disenfranchise any families, so we did not dictate who a ‘parent’ could be to be included for data collection, but we did prioritize biological parents when deciding which tubes to sequence. Families who replied with interest received up to four saliva collection tubes, with collection instructions, in the mail. Thus far, we have received 646 tubes back from 176 families. All received saliva tubes have gone through DNA extraction and storage, and 437, representing 233 twins and 204 parents, have had 30x whole genome sequencing completed, in partnership with Yale University.
Data Sharing
As these projects are funded by the National Institutes of Health under the data sharing policy, all data will be shared on LDbase.org (Hart et al., 2020), and some data are already available (e.g., Du et al., 2024; see Hart et al., 2022 for full project). We use common IDs and good data management practices (e.g., double data entry, good documentation) to integrate the different data sources.
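Double data entry, mentioned above, amounts to keying the same paper forms twice and reconciling any disagreements. A minimal sketch of the comparison step follows; the function name `compare_entries` and the data layout (a dict keyed by a common ID) are assumptions for illustration rather than a description of the project's tooling.

```python
def compare_entries(first_pass, second_pass):
    """List every field where two independent entry passes disagree.

    Each pass is a dict mapping a common record ID to a dict of field values.
    Returns (record_id, field, first_value, second_value) tuples; a value of
    None means the record or field is absent from that pass.
    """
    discrepancies = []
    for record_id in sorted(set(first_pass) | set(second_pass)):
        a = first_pass.get(record_id, {})
        b = second_pass.get(record_id, {})
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                discrepancies.append((record_id, field, a.get(field), b.get(field)))
    return discrepancies


# Example with made-up survey responses.
entry_1 = {"F001": {"q1": 3, "q2": 4}, "F002": {"q1": 2, "q2": 5}}
entry_2 = {"F001": {"q1": 3, "q2": 1}, "F002": {"q1": 2, "q2": 5}}
to_review = compare_entries(entry_1, entry_2)
```

Each flagged tuple would then be checked against the original paper form before the two passes are merged into a single clean record.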
Lessons Learned and Challenges Faced
Our recruitment parameters meant that we did not know who or where families of twins were, so we had to work on ways to get to the twin parents knowing only what schools were in the UO DDS. We tried many ways to get our recruitment materials to families, including cold mailing our recruitment materials to schools, social media posts in the regions of greater density of UO DDS schools, word of mouth through recruited families to other families of multiples they know in the region, and going through district offices of larger districts to have them assist us in getting our materials sent home (e.g., getting permission to get addresses). Refining our cold calling script to be succinct was the most effective method we tried. We learned how to quickly identify ourselves in a way that made us not sound like a telemarketer.
We have also learned that, despite having child name, birthday, sex, and school name, mining the UO DDS to find the twins’ reading and math data has been very complicated, and sometimes simply not successful. We do have an adequate sample size of participants with available reading data from the UO DDS. However, when linking this dataset to participant surveys, we found a more limited overlap than anticipated. We believe that many individuals in the UO DDS database who could be matched have been, but we are currently conducting a deeper dive to verify whether additional indicators might improve the matching rate. Importantly, the challenge does not appear to be one of unreliable matches, but rather differences in timing (or potentially other factors) that limit the overlap between those with available achievement data and those who responded to surveys at a given time point.
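To illustrate why linking on name, birthday, sex, and school is fragile, the following sketch performs a simple deterministic match on normalized identifiers. It is a hypothetical example, not the matching procedure used with the UO DDS: the field names, the normalization rules, and the decision to accept only unique matches are all assumptions.

```python
import unicodedata


def normalize(text):
    """Lowercase, strip accents, and drop spaces and punctuation."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if ch.isascii() and ch.isalnum()).lower()


def match_key(record):
    """Build a comparison key from the identifiers named in the text."""
    return (
        normalize(record["first_name"]),
        normalize(record["last_name"]),
        record["birth_date"],          # assumed to be an ISO date string
        normalize(record["sex"])[:1],  # "F"/"female" both become "f"
        normalize(record["school"]),
    )


def link_records(registry, assessments):
    """Match registry children to assessment records on the normalized key.

    Returns (matched, unmatched): matched maps registry_id to assessment_id;
    unmatched lists registry IDs with zero or several candidate records.
    """
    index = {}
    for rec in assessments:
        index.setdefault(match_key(rec), []).append(rec)

    matched, unmatched = {}, []
    for child in registry:
        candidates = index.get(match_key(child), [])
        if len(candidates) == 1:  # accept unambiguous matches only
            matched[child["registry_id"]] = candidates[0]["assessment_id"]
        else:
            unmatched.append(child["registry_id"])
    return matched, unmatched
```

Exact matching of this kind tolerates differences in case, accents, and punctuation, but a nickname, a misspelled surname, or a change of school will still send a child to the unmatched list.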
A large part of our data collection also includes finding publicly available contextual data on the twins and their families, based on their home and school addresses. A limiting factor has been the decentralization of data in the US, with many data resources being available at the state level, with each state having different data storage methods and data management techniques. To address this challenge, we have adopted a dual strategy. First, we focus on large, national data indices such as the Child Opportunity Index (diversitydatakids.org), which offer standardized metrics and enable consistent comparisons across geographic regions. Second, we have invested effort in collecting state-level data, such as COVID-19 policies or policies on school funding allocation. These state-specific resources, once standardized, will allow us to incorporate fine-grained policy context into our analyses while maintaining a coherent national framework. Another challenge arises when a state changes what data are collected, and when, meaning data can be missing for reasons outside of our control. We believe our efforts to find and download these contextual data are important because so much of what we know from population-level twin registers is limited to European socialized countries with good central data collection, but these countries generally do not represent the level of geographical, socio-economic, and racial diversity that the US has.
Future Directions
We are approaching the end of our current COVID-19 impacts funding and are currently looking for new funding opportunities to keep the NatPAT project going. The twin families are very engaged, and we would like to keep our contact with them, following these twins as they continue to age. One future direction for which we are actively seeking new funding is to continue the whole genome sequencing and subsequent genomic analyses. If new funding is secured, we plan to use more engaged recruitment of our registry families into the genomics portion of the project. We have completed a community engagement process to get feedback from parents about our recruitment materials and have made changes based on their comments. These changes include a more infographic-style recruitment letter, with more information about why DNA is needed and what we can learn to help children with learning disabilities, and more content about what DNA is and how we process the saliva samples, which we have now created videos on. Once funded, we plan to use the whole genome sequencing data to confirm zygosity and to do individual-level genomics analysis (e.g., polygenic score calculation, rare variant analysis). We also plan to do intergenerational modeling (e.g., genetic nurture designs) with the full family DNA.
We are also interested in contributing to international collaborative work, sharing both our behavioral data and our genomics data. If we are not successful in receiving any new funding for NatPAT, all data collection, including UO DDS, contextual data, and surveys, will end during the summer of 2027. However, we will continue to publish papers from NatPAT for many years.
Acknowledgments
This research was supported by the Eunice Kennedy Shriver National Institute of Child Health & Human Development Grants P50HD052120 and R01HD109292. JRG is supported by P50 HD027802 and R01HG010171, the Manton Foundation, and generous gifts from Kurt and Tammy Mobley, Tim and Cindy Wollaeger, Anna Nordberg, and the Lambert Foundation. Views expressed herein are those of the authors and have neither been reviewed nor approved by the granting agencies. This research was undertaken, in part, thanks to funding from the Canada Excellence Research Chairs Program to the first author. Finally, this research was supported by the W. Russell and Eugenia Morcom Chair at Florida State University.

