Date: 2025-11-05

Since January 2024, Dr. Wain-Hobson has written weekly essays for Biosafety Now discussing risky research in virology. You can read his entire series here.

Autumn sonata, Chevreuse valley, France

On reading ‘Generative design of novel bacteriophages with genome language models’ by Samuel King and colleagues at the Arc Institute in Palo Alto and Stanford University, USA, posted on the bioRxiv preprint server. This means that it has not undergone peer review.

Here we have artificial intelligence being used to generate novel viruses. Surely you mean bacteriophages? Exactly! Viruses that infect bacteria are called bacteriophages, or simply phages. The term is aptly derived from Ancient Greek, φαγεῖν (phagein) ‘to devour’. This semantic difference goes back more than a century to when we knew so much less than today.

The manuscript is very dense and, not knowing a thing about language models, I cannot appreciate the methods. However, the findings are comprehensible virus-wise, so here goes.

They’re itching to be the first to design a new virus. “Genome language models have emerged as a promising strategy for designing biological systems, but their ability to generate functional sequences at the scale of whole genomes has remained untested.” And “it would be possible to learn underlying evolutionary constraints that could allow for the generation of phage genome sequences not yet seen in nature.” Unsurprisingly, this leads to “Here, we report the first generative design of complete genomes.”

AI uses large data sets, so the programs were pretrained on large corpora of DNA sequences including over two million bacteriophage genomes. Now, this is an incredibly heterogeneous bunch, akin to grouping mammals, fish, reptiles and more. The generated sequences are distinct from those in nature, with genetic features reminiscent of natural phage genomes. This is unsurprising given the training set. Indeed, anything else would have been surprising. However, ‘reminiscent’ is a vague word.

Yet, they go one step too far, IMHO. “Together, these data demonstrate that pretrained Evo models can design biologically realistic phage genomic sequences while introducing sequence diversity beyond natural evolution.” Realistic? There are no data whatsoever that these ‘phage genome sequences’ are that. They are strings of DNA which encode look-alike phage proteins. This is what ‘reminiscent’ was all about. No more. They are science fiction until tested.

Fortunately, they get to the point in the next section. “We next hypothesized that generative genomic design… could propose novel, complete, and viable phage genomes.” Basically, they did the same thing but this time on a dataset of approximately 15 thousand Microviridae sequences. This is like concentrating on genetic material from monkeys.

They also added two other constraints plus a third to make gene synthesis easier. Plus “we applied an additional quality control constraint requiring at least seven predicted protein hits to natural ΦX174 proteins”, plus “we also developed a ‘genetic architecture’ constraint to capture preservation of global gene arrangement relative to ΦX174 that we used to remove sequences that too closely resembled ΦX174”, plus “we applied a tropism constraint requiring that generated genomes encode spike proteins with moderately high sequence identity (≥ 60%) to the ΦX174 spike protein”. Plus a few other things.
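For readers who think in code, the stacking of constraints can be pictured as a simple filter. This is only a toy sketch, not the authors' pipeline: the `Candidate` fields, the example genomes and above all the `MAX_ARCHITECTURE_SIMILARITY` cut-off are assumptions made up for illustration. Only the "at least seven hits" and "≥ 60%" thresholds come from the text quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A generated genome reduced to three hypothetical, precomputed scores."""
    name: str
    protein_hits: int               # predicted proteins matching natural ΦX174 proteins
    spike_identity: float           # identity of the spike protein to the ΦX174 spike (0 to 1)
    architecture_similarity: float  # hypothetical 0 to 1 score of gene-arrangement similarity


MIN_PROTEIN_HITS = 7                # quoted: "at least seven predicted protein hits"
MIN_SPIKE_IDENTITY = 0.60           # quoted: "moderately high sequence identity (>= 60%)"
MAX_ARCHITECTURE_SIMILARITY = 0.95  # ASSUMPTION: the text gives no number for "too closely"


def passes_constraints(c: Candidate) -> bool:
    """Return True only if the candidate clears all three filters."""
    enough_hits = c.protein_hits >= MIN_PROTEIN_HITS
    right_tropism = c.spike_identity >= MIN_SPIKE_IDENTITY
    not_a_copy = c.architecture_similarity < MAX_ARCHITECTURE_SIMILARITY
    return enough_hits and right_tropism and not_a_copy


# Invented toy candidates, one per possible outcome.
candidates = [
    Candidate("toy_A", 9, 0.82, 0.70),   # clears all three filters
    Candidate("toy_B", 5, 0.90, 0.60),   # too few ΦX174-like proteins
    Candidate("toy_C", 10, 0.45, 0.65),  # spike protein too divergent
    Candidate("toy_D", 11, 0.99, 0.99),  # too close to ΦX174 itself
]

kept = [c.name for c in candidates if passes_constraints(c)]
print(kept)
```

Whatever the real scoring looks like, the effect is the same: each filter prunes the candidates towards genomes that look like ΦX174 without being ΦX174.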

In short, they stacked the deck in favor of generating variations on a theme by ΦX174, one of the best-known phages, isolated from a Paris sewer sample. Bottom line - nature did the heavy lifting.

They closed in on a set of “302 diverse generated phage genome candidates… Most genomes encoded 11 genes in total, with 10 preserving synteny with ΦX174”, which makes sense given the tight pro-ΦX174 criteria used. Synteny refers to gene order within the genome. Remember, they’re still science fiction genomes, or genome candidates which sounds better, but not for long.

They were able to synthesize 285 phage genome candidates, the remaining 17 being too difficult to make. When they put the genome DNA into lab bacteria one by one, they recovered phage in only 16 cases, or about a 6% success rate. Of these, nine phages didn’t pick up any mutations on growth while the remaining seven did pick up a few. It is stunning that more than half of the novel phages were so good they didn’t acquire any further mutations. If it had been 1/16 I would have muttered ‘Hmm, one didn’t need any further mutation, that was good!’ So stunning.

Compared to the genomes in the training data, the 16 viable phages recovered had between 67 and 392 novel mutations, or between 1.2 and 7% across the genome. This may be considerable in the DNA phage field but to an HIV person, it’s child’s play.
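The percentages are easy to check on the back of an envelope. The short sketch below redoes the arithmetic using the counts given above. One assumption is flagged in the code: it takes the 5,386-base length of natural ΦX174 as the denominator, whereas the generated genomes will each differ somewhat in length, so the mutation percentages are approximate.

```python
# Counts taken from the text above.
SYNTHESIZED = 285        # genome candidates that could be synthesized
VIABLE = 16              # candidates that yielded phage
NO_NEW_MUTATIONS = 9     # viable phages that picked up no mutations on growth
MIN_NOVEL_MUTATIONS = 67
MAX_NOVEL_MUTATIONS = 392

# ASSUMPTION: natural ΦX174 genome length used as an approximate denominator;
# each generated genome has its own, slightly different, length.
PHIX174_GENOME_LENGTH = 5386


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


success_rate = percent(VIABLE, SYNTHESIZED)
stable_fraction = percent(NO_NEW_MUTATIONS, VIABLE)
low_divergence = percent(MIN_NOVEL_MUTATIONS, PHIX174_GENOME_LENGTH)
high_divergence = percent(MAX_NOVEL_MUTATIONS, PHIX174_GENOME_LENGTH)

print(f"viable phage: {success_rate:.1f}% of synthesized genomes")
print(f"no further mutations: {stable_fraction:.1f}% of viable phages")
print(f"novel mutations: {low_divergence:.1f}% to {high_divergence:.1f}% of the genome")
```

Under that assumption the numbers land where the manuscript puts them: roughly 6% success, a little over half needing no further mutation, and divergence of about 1.2 to 7%.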

They tested the growth characteristics of the novel phages in comparison to natural phages and found that some grew better and had some enhanced traits that any savvy researcher would recognize. They go on to show that some of these novel phages could be useful in the developing field of phage therapy – the use of phage to kill pathogenic gut bacteria.

The discussion section of the paper is very upbeat, for example the work “may expand biotechnological toolkits and transform phage therapy pipelines, enabling more adaptive and resilient antimicrobial strategies”. Let’s hope so. But they go too far, for example “…generative design of whole genomes offers unique opportunities for studying evolution.” Expressing thoughts can be treacherous and it is hoped they are referring to natural evolution. If so, then no. Evolution proceeds by a series of single mutations where every intermediate along the way must be viable. Occasionally, but rarely, complex mutations arise. This is best summarized by John Maynard Smith’s brilliant example:

WORD -> WORE -> GORE -> GONE -> GENE

Single viable mutations accumulated in a sequential manner can effect big changes. The series WORD -> GORE -> GENE involving two mutations at a time is highly unorthodox and attainable in the lab using forced mutation rates. A jump of 392 mutations in one go isn’t natural evolution. It’s akin to tunnelling through sequence space. That said, this jump tells us the number of viable solutions for ΦX174-like phages is far greater than anyone, certainly this writer, had ever contemplated.
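Maynard Smith's rule can be stated in a few lines of code: every step changes exactly one letter, and every intermediate must itself be a word. The sketch below is a toy illustration only. The set `VIABLE_WORDS` stands in for "viable genomes" and is limited to the five words of the example, and the function names are invented for this purpose.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(x != y for x, y in zip(a, b))


def is_single_step_path(path: list[str], viable: set[str]) -> bool:
    """True if every word is viable and each step changes exactly one letter."""
    all_viable = all(word in viable for word in path)
    single_steps = all(hamming(a, b) == 1 for a, b in zip(path, path[1:]))
    return all_viable and single_steps


# Toy stand-in for the set of viable sequences: just the words of the example.
VIABLE_WORDS = {"WORD", "WORE", "GORE", "GONE", "GENE"}

natural_path = ["WORD", "WORE", "GORE", "GONE", "GENE"]
double_steps = ["WORD", "GORE", "GENE"]  # two letters change at each step

print(is_single_step_path(natural_path, VIABLE_WORDS))  # one letter at a time
print(is_single_step_path(double_steps, VIABLE_WORDS))  # fails the single-step rule
print(hamming("WORD", "GENE"))                          # size of the direct jump
```

The direct jump from WORD to GENE changes all four letters at once, which is the word-game equivalent of arriving 392 mutations away without passing through the intermediates.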

It’s fascinating work which might deliver something for phage therapy, but it’s unlikely to tell us much about natural evolution.

Some of the novel phages showed enhanced properties compared to a panel of natural phages, meaning there was Gain Of Function, this time via a totally new route. Making existing viruses more transmissible or virulent has always been part and parcel of dangerous GOF research. Only here, who cares about bacteria? And if ever bacteriophage therapy is used to treat gut infections big time, perhaps by way of using an enhanced phage, then this is clearly a case where there is an upside to a GOF experiment.

The authors write that their programs can’t be used to do the same on viruses of mammals. Laudable.

But.

Once you know something can be done, it lowers the bar for others to get there. It’s no longer a question of if, just how, and that stimulates the brain. We now know experimentally that it is possible to enhance the characteristics of an existing virus - 3 tries out of 285 - while at the same time generating a majority of duds. Yet 1% is impressive and could not have been predicted.

Given the number of people in the AI space, others no doubt can write similar code. Will they stay away from mammalian or human viruses? Which brings us back to Dual Use Research of Concern (DURC). There’s always been a morbid fascination for the danger zone, dixit Terry Pratchett: “Some humans would do anything to see if it was possible to do it. If you put a large switch in some cave somewhere, with a sign on it saying ‘End-of-the-World Switch. PLEASE DO NOT TOUCH’, the paint wouldn’t even have time to dry.”

The question is how can this madness be stopped?

• In the US the Presidential Executive Order on dangerous GOF research needs to be translated into law, asap.

• Elsewhere, state and philanthropic funding agencies across the globe could add a few lines at the bottom of every research contract requiring that the researchers not undertake dangerous GOF research on microbes.

• If by chance an experiment didn’t go the way anticipated and a group found themselves going down the dangerous GOF research road, they should stop, consult with their funding agency and seek advice. Harvard have thought this through.

• Meanwhile universities the world over should require life science PhD students to take the Hippocratic Oath at the end of their thesis viva. Given the AI incursion into virology, that should be broadened out to all science, technology, engineering and mathematics (STEM) PhD students.

And if any organization or university needs help understanding what dangerous GOF research is – it is simple to define - please contact Biosafety Now.

Not surprisingly, this manuscript was noticed. An Opinion Piece in the NYT entitled ‘AI prompt that could end the world’ started with this from a pioneer in AI, Yoshua Bengio. …he was worried that an A.I. would engineer a lethal pathogen — some sort of super-coronavirus — to eliminate humanity. “I don’t think there’s anything close in terms of the scale of danger,” he said.

The article finishes with: Dr. Bengio’s pathogen is no longer a hypothetical. [emphasis in original] In September, scientists at Stanford reported they had used A.I. to design a virus for the first time. Their noble goal was to use the artificial virus to target E. coli infections, but it is easy to imagine this technology being used for other purposes.

I’ve heard many arguments about what A.I. may or may not be able to do, but the data has outpaced the debate, and it shows the following facts clearly: A.I. is highly capable. Its capabilities are accelerating. And the risks those capabilities present are real. Biological life on this planet is, in fact, vulnerable to these systems. On this threat, even OpenAI seems to agree.

In this sense, we have passed the threshold that nuclear fission passed in 1939. The point of disagreement is no longer whether A.I. could wipe us out. It could. Give it a pathogen research lab, the wrong safety guidelines and enough intelligence, and it definitely could. A destructive A.I., like a nuclear bomb, is now a concrete possibility. The question is whether anyone will be reckless enough to build one.

If nobody speaks out fast, someone will and it could be too late. Yet virologists across the globe still don’t move. Beyond madness.

Others have commented on AI’s foray into virus ‘design’, although the NYT opinion outperforms the others.

Conclusions

• A new technology is now available to heat up viruses.

• Action is needed fast and it’s not complicated. The four bullet points above are a start.

• Making novel pathogens for humans, animals and plants, or revving up existing ones must be stopped and ultimately banned. Grain and rice feed the world.

• It needs just a little thought but mostly balls.

Aside 1

Balls are in short supply.

Aside 2

There are bacteriophages with RNA genomes. Indeed, the very first complete genome sequence published was that of the RNA phage MS2 from the Belgian group of Walter Fiers back in 1976. https://www.nature.com/articles/260500a0. It came in at 3569 RNA building blocks, called bases, and encoded just three proteins.

There are viruses with smaller genomes, in the 1000-1200 range. However, the smallest pathogens, far and away, are the viroids of plants, naked RNA molecules made up of 250-350 building blocks.

Aside 3

Trawling through RNA viral “dark matter” in 10,487 metatranscriptomes, AI is being used to identify new viruses from diverse global ecosystems. Metatranscriptomes is just jargon for huge databases of raw RNA sequences. A program called LucaProt “integrates both sequence and predicted structural information, enabling the accurate detection of RdRP sequences. Using this approach, we identified 161,979 potential RNA virus species and 180 RNA virus supergroups, including many previously poorly studied groups, as well as RNA virus genomes of exceptional length (up to 47,250 nucleotides) and genomic complexity.”

Combining sequence and structural prediction data is smart because, given time, all but a few of the building blocks of many proteins can change, which limits detection by sequence alone. Genomes of 47,000 building blocks require copy error correction mechanisms beyond what we currently know. There is still so much to do and learn in virology, which is why it is the Queen of the biological sciences.

Simon Wain-Hobson is an emeritus professor at the Institut Pasteur, Paris, from which he retired in 2021. He and his colleagues were the first to sequence the genome of HIV, and Wain-Hobson has published more than 230 papers on virology and cancer.