"Your data is fiercely protected by security," declares 23andMe's website, which boasts "software, hardware, and physical security measures" for data protection. However, a recent security breach at the direct-to-consumer genetic testing firm has far-reaching privacy implications for those who sequence their genome and those who don't.
This breach challenges the limits of established security measures. It also underscores a broader ethical issue. In an era where genetic testing has become routine, the decision to share your DNA carries implications for your relatives, potentially denying them the agency to choose privacy.
In October 2023, 23andMe confirmed the unsettling news of a data breach, which exposed the personal details of 0.1% of their clientele, approximately 14,000 individuals.
This month, while working with a third-party auditor, the company uncovered a more extensive web of victims. Beyond the initial 14,000, 23andMe said that "a significant number of files containing profile information about other users' ancestry" were also accessed by the hackers (via TechCrunch). These "other users" were likely genetically related to victims who had signed up for the DNA Relatives feature.
The DNA Relatives feature, designed to let customers share selected data, including names, relationships, birth years, and locations, inadvertently gave the hackers access to far more private information. By exploiting this feature, they obtained the genetic information of a staggering 5.5 million people within 23andMe's database.
Adding another layer to the breach, an additional 1.4 million individuals found their family tree profiles compromised, contributing to the overall tally of affected users. This elevated the victim count to 6.9 million, nearly half of 23andMe's client base.
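The tally can be checked with a few lines of arithmetic. This is only a sketch: the total user base is not stated in the figures above, so it is inferred here from the "0.1% is approximately 14,000" statement, and the function names are made up for illustration.

```python
# Figures as reported in the article. The total user base is an inference,
# derived from "0.1% of clientele is approximately 14,000 individuals".
INITIAL_ACCOUNTS = 14_000          # accounts accessed directly
INITIAL_SHARE = 0.001              # 0.1% of all customers
DNA_RELATIVES_PROFILES = 5_500_000
FAMILY_TREE_PROFILES = 1_400_000


def implied_user_base(accounts: int, share: float) -> int:
    """Estimate the total number of customers from a count and its share."""
    return round(accounts / share)


def total_affected(*groups: int) -> int:
    """Sum the groups of affected profiles."""
    return sum(groups)


user_base = implied_user_base(INITIAL_ACCOUNTS, INITIAL_SHARE)
affected = total_affected(DNA_RELATIVES_PROFILES, FAMILY_TREE_PROFILES)

print(f"Implied user base: {user_base:,}")
print(f"Total affected:    {affected:,}")
print(f"Share affected:    {affected / user_base:.1%}")
```

Under that inferred user base of roughly 14 million, the 6.9 million affected profiles come to a little over 49%, which matches the "nearly half" description.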
Beyond the names, birth years, familial relationships, and DNA shared with relatives, the breach included sensitive health data, potentially disclosing risks for diseases such as Type 2 diabetes or celiac disease.
The magnitude of this breach serves as a stark reminder of the vulnerabilities inherent in large-scale DNA databases. It prompts critical reflections on the balance between scientific advancement and individual data protection.
The hackers employed a tactic known as credential stuffing, in which attackers take usernames and passwords leaked from other sites and try them elsewhere. It worked here because some users had recycled the same login credentials across multiple platforms, including 23andMe.com. In response, 23andMe announced on its blog that it would tighten security with mandatory two-step verification and password resets for all clients.
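A toy model shows why a reused password is enough for credential stuffing to succeed, and why a second verification step blocks it. This is a minimal sketch and not 23andMe's actual login system: the `Account` class, the `login` function, and the fixed six-digit code are all hypothetical, and real two-step verification uses one-time or time-based codes, not a stored constant.

```python
import hashlib
import hmac
import os
from typing import Optional


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a salted password hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class Account:
    """Hypothetical account record; second_factor_code stands in for a real OTP."""

    def __init__(self, password: str, second_factor_code: Optional[str] = None):
        self.salt = os.urandom(16)
        self.password_hash = hash_password(password, self.salt)
        self.second_factor_code = second_factor_code


def login(account: Account, password: str, code: Optional[str] = None) -> bool:
    """Return True only if the password and, when enabled, the second factor match."""
    candidate = hash_password(password, account.salt)
    if not hmac.compare_digest(candidate, account.password_hash):
        return False
    if account.second_factor_code is None:
        return True  # password-only account: a leaked password is sufficient
    return code is not None and hmac.compare_digest(code, account.second_factor_code)


# A password that leaked from some unrelated site and was reused here.
leaked_password = "hunter2"

password_only = Account(leaked_password)
two_step = Account(leaked_password, second_factor_code="492817")

print(login(password_only, leaked_password))       # attacker succeeds
print(login(two_step, leaked_password))            # attacker lacks the code
print(login(two_step, leaked_password, "492817"))  # legitimate user
```

The salted hash protects the stored password if the site's own database leaks, but it does nothing against a correct password typed in by an attacker. Only the second factor, or a forced password reset, closes that gap.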
DNA databases aren't new. Professional and amateur genealogists have used sites like GEDmatch and FamilyTreeDNA for years as free, user-driven resources. Law enforcement maintains its own DNA databases of convicted and arrested people, such as the Combined DNA Index System (CODIS). However, recent high-profile cases such as the identification of the Golden State Killer have confirmed that law enforcement is branching out to these third-party providers for its forensic genetic genealogy investigations. Because searching them requires no warrant, these sites have proved instrumental in solving cold cases. When wielded as a law enforcement tool, direct-to-consumer DNA databases have raised concerns about privacy infringement.
While sites like Ancestry.com and 23andMe claim not to have turned over genetic information to law enforcement agencies, open data sites like GEDmatch and FamilyTreeDNA are freely accessible to police. Qiagen, the parent company of GEDmatch, admits that 1.8 million profiles, nearly 70% of its user database, are viewable to law enforcement. This information does not include the "raw data" describing genetic code details, only kinship relations (via Science).
Investigating cures instead of criminals, scientists frequently use DNA databases for their research. For example, the Genome Aggregation Database (gnomAD) holds genetic data from over 60,000 people (via The Guardian). Since the data was made available in 2014, researchers have used this genetic pool to study the genes linked to cardiomyopathy. In 2016, research utilizing this resource identified clinically relevant markers for the disease. Because no identifying features connect the DNA data to a contributing person, this exome and genome sequencing data appears useful for research only.
However, the recent genetic data hack underscores the risks associated with widespread DNA and genealogy information. Unlike traditional genetic tests focused on single genes, new whole genome testing exposes individuals to the risk of reidentification even without conventional identifiers like names or birthdates. There's also the responsibility to protect the privacy of biological family members. The ability to identify individuals through advanced cross-referencing of unique DNA sequences, as demonstrated by Columbia University researchers, exposes a new frontier in genetic privacy concerns.
A study by researchers at Columbia University revealed a compelling statistic: individuals of European ancestry in the United States face a 60% chance of having at least a third cousin in a DNA database.
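A back-of-the-envelope calculation shows why even a small database can reach most people. The sketch below is not the Columbia researchers' model. It assumes each relative joins a database independently with the same probability, so the chance of at least one match is 1 - (1 - coverage)^relatives. The number of third cousins and the coverage values are hypothetical placeholders, not figures from the study.

```python
def chance_of_match(relatives: int, coverage: float) -> float:
    """Probability that at least one of `relatives` people is in a database
    covering a fraction `coverage` of the population, assuming independence."""
    if relatives < 0:
        raise ValueError("relatives must be non-negative")
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    return 1.0 - (1.0 - coverage) ** relatives


# Hypothetical inputs chosen only to illustrate the shape of the curve.
ASSUMED_THIRD_COUSINS = 200

for coverage in (0.001, 0.005, 0.02):
    p = chance_of_match(ASSUMED_THIRD_COUSINS, coverage)
    print(f"coverage {coverage:.1%}: chance of at least one match {p:.0%}")
```

The point of the exercise is the shape of the curve: because each person has many distant cousins, the chance of a match climbs quickly even while the database covers only a small fraction of the population.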
To demonstrate the reach of current DNA databases, these researchers illustrated what a long-range familial search can expose about a random database member. The 1000 Genomes Project is a DNA database that serves as a reservoir of common human genetic variation by utilizing voluntarily provided samples from self-reported healthy individuals.
Starting with the DNA profile of an anonymous female participant from the 1000 Genomes Project, the Columbia researchers used direct-to-consumer DNA database information to find distant relatives who shared segments of her DNA. After a day of meticulously mapping the lineage with publicly available genealogy records, the researchers followed records for these relatives back to a common great-grandparent and successfully identified the target female. This compelling evidence underscores the ripple effect of shared DNA information and how the choices of distant relatives affect the entire family tree, including those who choose not to divulge their genetic data.
The government, law enforcement, insurance companies, employers, and extremists could all use freely available DNA information to discriminate.
The bad actors responsible for 23andMe's privacy breach introduce the risk of racial targeting. On the BreachForums website, a braggart posted a sample of data they claimed to have stolen from 23andMe. The anonymous poster described the data as containing 1 million data points about Ashkenazi Jews and more on people of Chinese descent (via Wired). The hacker had the gall to ask for $1 to $10 for a random individual's identifying information.
From bad actors to bad companies, government oversight of the misuse of DNA databases remains limited. The 2008 Genetic Information Nondiscrimination Act (GINA), which protects against discrimination in health insurance and employment, falls short in protecting against life insurance companies using DNA profiles for coverage decisions. The Department of Justice's "interim" policy limiting law enforcement's use of these DNA databases lacks teeth, prompting concerns about the need for independent judicial oversight.
Calls for robust regulation to protect privacy and medical research are gaining traction. Maryland took a step in 2021, regulating law enforcement's use of consumer DNA databases and limiting searches to substantial public safety or national security threats.
"This incident really highlights the risks associated with DNA databases," said Brett Callow, a threat analyst at security firm Emsisoft (via Wired). "The fact that accounts had reportedly opted into the 'DNA Relatives' feature is particularly concerning as it could potentially result in extremely sensitive information becoming public."
The 23andMe breach starkly reminds us that guaranteed DNA privacy is unlikely. Unexpected events can rapidly lead to your genetic information being circulated without your consent: data breaches, changes in national or international law, government or law enforcement overreach.
Confirming your identity with your mother's maiden name will likely become a thing of the past as genetic testing becomes entrenched in everyday life. The 23andMe data breach exposes a desperate need for comprehensive regulation, such as the explicit statutory limitations on using familial genetic genealogy and familial searches enacted in Maryland and Montana (via EFF).
With genetic anonymity becoming a relic of the past, scientists are advocating for measures to protect patient data while allowing continued medical research. Some have suggested that direct-to-consumer providers cryptographically sign raw genotype files so that third-party services can verify where a file came from before accepting it (via Science).
This breach and the others soon to come will teach millions that once your DNA leaves your control and that of the genetic laboratory, all expectations of privacy are lost.
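The signing idea can be sketched in a few lines. A real deployment would use a public-key signature, so that third-party services could verify a file without holding the provider's secret. The version below swaps in an HMAC tag from the Python standard library as a stand-in, which means the verifier must share the key. The key, the function names, and the sample file contents are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical provider secret. With a true digital signature this would be
# a private key, and verifiers would only need the matching public key.
PROVIDER_KEY = b"hypothetical-provider-secret"


def sign_genotype_file(raw: bytes, key: bytes = PROVIDER_KEY) -> str:
    """Return a hex tag that binds the file contents to the provider's key."""
    return hmac.new(key, raw, hashlib.sha256).hexdigest()


def verify_genotype_file(raw: bytes, tag: str, key: bytes = PROVIDER_KEY) -> bool:
    """Check that the file is unchanged since the provider tagged it."""
    expected = sign_genotype_file(raw, key)
    return hmac.compare_digest(expected, tag)


# Made-up raw genotype file contents, for illustration only.
raw_file = b"rsid\tchromosome\tposition\tgenotype\nrs0000001\t1\t12345\tAG\n"
tag = sign_genotype_file(raw_file)

tampered = raw_file.replace(b"AG", b"GG")

print(verify_genotype_file(raw_file, tag))   # untouched file passes
print(verify_genotype_file(tampered, tag))   # altered file is rejected
```

A third-party service that accepted only files with a valid tag could turn away uploads that did not come from a participating provider, such as a profile lifted from a research dataset.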
Sources: 23andMe (1) (2), Bloomberg, TechCrunch, Science (1) (2) (3), Forbes, Maryland.gov, Genetics in Medicine, The Guardian, Clinical Chemistry, The Palm Beach Post, Wired (1) (2).