Is current AI able to generate RNA sequences of viruses?

Question

After seeing a video about combining CRISPR with AI, and an article about someone who made two babies immune to HIV who are still healthy, I started wondering. Editing someone's genome so that the person gains a selected function without becoming prone to other diseases or complications is a very complicated thing, yet it is (almost) possible. Would it currently be possible to have an AI, for example a Generative Adversarial Network, generate the RNA of virus particles?

We show that these tools capture important structures of the data and, when applied to designing probes for protein binding microarrays (PBMs), allow us to generate new sequences whose properties are estimated to be superior to those found in the training data.

https://arxiv.org/abs/1712.06148 This research article states that the researchers successfully trained the model to generate new sequences with a specific function (binding a particular protein). Note that this predates StyleGAN, so they used a Wasserstein GAN.

While we found these techniques to generally be well-behaved, we did notice some undesirable cases during our explorations.

They concluded that even a single nucleotide error could break the protein. But now, in 2022, we have StyleGAN2-ADA (https://github.com/NVlabs/stylegan2-ada), which is far better at generation. Have we (almost) overcome that problem already? So if you train such a model on the RNA of virus particles (most viruses have a very short genome compared with a eukaryotic cell, so a supercomputer is not required for training, and there are databases online: Publicly available genome sequence database for viruses?), such as every variant of SARS-CoV-2, HIV, Ebola and so on, would it be possible to have it generate something as contagious as SARS-CoV-2 yet as deadly as HIV? Keep in mind that a single generated sequence that works would already be devastating. It doesn't need to be 100% accurate, because after all we can just experiment with it and find out whether it works as intended.

It doesn't have to be a GAN:

We present three approaches: creating synthetic DNA sequences using a generative adversarial network (GAN); a DNA-based variant of the activation maximization (“deep dream”) design method; and a joint procedure which combines these two approaches together.

The RNA sequence can be introduced into a cell, and the cell will automatically produce the components of the virus, which ultimately become the virus.

Does anyone know whether this is possible? I have searched Google, but I haven't found any article specifically describing this concept.

I hope this is on-topic; I'm not sure how the forums work. Thanks in advance.

I'm closing this question for multiple reasons, including: opinion based, speculation about dangerous experiments, doesn't restrict to a specific problem. The user name "Random user" doesn't give me much incentive to be lenient on this. — gringer, Nov 13 '22 at 01:42

M__ · Accepted Answer · 2022-11-11T20:36:51.723

Reinforcement learning is an interesting technology, but it is prone to extreme results, which are not necessarily biologically feasible at all. Google can use it to outperform, or very possibly outcompete, the world's best chess player, because all of the rules are known. In biology we simply do not know all the rules.

To reiterate, the problem with reinforcement learning in biology is not only that we do not know all the rules; in reality we know very few of them. Why does HIV cause persistent infection? Why doesn't the SARS-CoV-2 vaccine confer complete protection? Moreover, the GAN originator needs to know all of the known biological rules and apply them within the GAN, and there are a lot of rules. Does the originator know the conformational changes of Env or spike, and which amino acids are invariant? I doubt it. That information would be pivotal for generating, with AI, the sequence of a biologically feasible virus.

Even in the original Atari video game experiment*, whilst the AI outperformed humans by >1000% on some games, it equally underperformed humans considerably on >20% of the games tested. Thus, even if all the rules are known, such models can mess up a large proportion of the time.

These are simply hypotheses from an emerging technology within biology; they require biologically verifiable results, and those proofs will be difficult, i.e. time-consuming. This is not the case for video games or board games.

In the first instance, GAN output can be checked for biological validity using well-established bioinformatics analyses, the same tools we use to assess whether there are errors in the data. However, there needs to be a lot of incentive to perform this research at this point in the technological cycle. When the authors have something concrete, we might then reinvestigate the fundamental issues with their predictions.

* The old Atari video games were played by the AI, and its performance was then compared against that of an average human. The results were highly variable; moreover, humans are highly variable, which to my knowledge was never investigated, so it was never at all clear what "the average human" represented in these tests.

A GAN isn't 'trained against [biological entity]' in the technical way supervised learning is trained. A GAN 'trains' against itself via a 'win/lose' quantification. Establishing a valid 'win/lose' is a pivotal part of the GAN. In chess that's easy: you win or lose the game. In biology it is not trivial.

Note: to address the point below, yes, GANs devise their own "rules" to maximise win/loss WITHIN the rules already in place. However, a GAN needs 'ground rules'; otherwise, how could it play a game of chess if it didn't follow the rules of chess? How could it play an Atari video game if it didn't know what a "joystick" is? The problem is that without knowing the ground rules (what is biologically permitted), what is the validity of the subsequent rule set, i.e. the predicted sequence in this case?

So a virus that can't attach to the host cell receptor isn't a virus, because it will never be able to enter the cell.

Further note

The OP pointed out in the notes:

"The idea of style transfer has largely only been explored in image-based tasks, which we attribute in part to the specific nature of loss functions used for style transfer."

The author is talking about a variant of transfer learning. What the author is reminding us is that, overall, transfer learning is the technique that powered the success of computer vision. They are saying they have replicated that in their GANs.

Please note that any further points should be asked as separate questions.

https://uzman-anwar.github.io/projects/IL_Atari/report.pdf Do you mean this article about Atari? — Random user, Nov 11 '22 at 16:36
