I'm trying to separate the RNA from the protein in a PDB file of a protein/RNA complex. I want all of the RNA records, including the hetero atoms that sit between the bases, but without H2O and similar. In short, I want the RNA part of the PDB file with no gaps in the chain.

I managed to separate the RNA from the protein with Bio.PDB's Select class, but is_aa(residue) seems to treat the hetero residues as amino acids, so they don't appear in my "RNA only" file.

from Bio.PDB import PDBParser, PDBIO, Select
from Bio.PDB.Polypeptide import is_aa

class ProtSelect(Select):
    def accept_residue(self, residue):
        # Keep only residues that Biopython recognises as amino acids.
        return 1 if is_aa(residue) else 0

class RNASelect(Select):
    def accept_residue(self, residue):
        # Keep everything that is neither an amino acid nor water ("W" hetero flag).
        return 1 if not is_aa(residue) and residue.id[0] != "W" else 0

pdb = PDBParser().get_structure("2bh2", "pdb2bh2.ent")
io = PDBIO()
io.set_structure(pdb)
io.save("seqprotest.pdb", ProtSelect())
io.save("seqRNAtest.pdb", RNASelect())
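
Before tuning a selector like the ones above, it can help to see which hetero residues the file actually contains and which chain they belong to. The following is only a minimal sketch: `tally_hetatm` is a hypothetical helper, not part of Biopython, and it assumes standard fixed-width PDB records (residue name in columns 18-20, chain ID in column 22).

```python
from collections import Counter


def tally_hetatm(pdb_lines):
    """Count HETATM atoms per (chain ID, residue name), skipping water (HOH)."""
    counts = Counter()
    for line in pdb_lines:
        # Only HETATM records long enough to hold a chain ID are considered.
        if line.startswith("HETATM") and len(line) > 21:
            resname = line[17:20].strip()  # columns 18-20
            chain = line[21]               # column 22
            if resname != "HOH":
                counts[(chain, resname)] += 1
    return counts


# Possible usage with the file from the question (path assumed to exist):
# with open("pdb2bh2.ent") as handle:
#     for (chain, resname), n_atoms in sorted(tally_hetatm(handle).items()):
#         print(chain, resname, n_atoms)
```

Hetero residues listed under the RNA chain are the ones the RNA selector has to let through; ions and ligands are the ones it has to reject.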

EDIT: I managed to do it:

class RNASelect(Select):
    BASES = ('A', 'U', 'G', 'C', 'T')

    def accept_residue(self, residue):
        hetfield, resname = residue.id[0], residue.get_resname()
        # Standard nucleotides have space-padded names such as '  A'.
        is_standard = resname in ('  A', '  U', '  G', '  C', '  T')
        # Hetero residues ('H_' flag) whose name ends with a base letter.
        is_hetero_base = hetfield[0:2] == 'H_' and resname[2] in self.BASES
        # Magnesium ('H_ MG') also ends with 'G', so it is excluded explicitly.
        return 1 if hetfield != 'H_ MG' and (is_standard or is_hetero_base) else 0

It's not pretty, but it does the job. The idea is to keep every residue whose name ends with a nucleotide letter. I figured out that the names of the HETATM residues in RNA chains always end with a nucleotide letter, so I just have to check for everything whose hetero flag begins with 'H_' and whose residue name ends with A, T, C, G or U. Just exclude 'H_ MG' (magnesium) and you're good.

Raph
  • I've not done RNA vs protein, but separating protein from protein is easy in PyMOL. Whether PyMOL will separate the hetero atoms, I have no idea. – M__ Jul 13 '19 at 09:39
  • I don't remember the details now, but when I had to work with PDB files the Biopython tutorial had a long section about it. I think there was some other property or method to find out whether a residue really is an amino acid or something else. It is worth exploring. – llrs Jul 16 '19 at 15:12
  • @Raph post the solution as an answer to help others too (it can be missed if it is only in the body of the question). – llrs Jul 16 '19 at 15:12

1 Answer

Solution I found (copied from the body of the question):

class RNASelect(Select):
    BASES = ('A', 'U', 'G', 'C', 'T')

    def accept_residue(self, residue):
        hetfield, resname = residue.id[0], residue.get_resname()
        # Standard nucleotides have space-padded names such as '  A'.
        is_standard = resname in ('  A', '  U', '  G', '  C', '  T')
        # Hetero residues ('H_' flag) whose name ends with a base letter.
        is_hetero_base = hetfield[0:2] == 'H_' and resname[2] in self.BASES
        # Magnesium ('H_ MG') also ends with 'G', so it is excluded explicitly.
        return 1 if hetfield != 'H_ MG' and (is_standard or is_hetero_base) else 0

It's not pretty, but it does the job. The idea is to keep every residue whose name ends with a nucleotide letter. I figured out that the names of the HETATM residues in RNA chains always end with a nucleotide letter, so I just have to check for everything whose hetero flag begins with 'H_' and whose residue name ends with A, T, C, G or U. Just exclude 'H_ MG' (magnesium) and you're good.
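
The rule described above can also be written as a small standalone predicate, which is easier to test than a single long boolean expression. This is only a sketch under stated assumptions: `is_rna_residue` and its `excluded` parameter are hypothetical names, and the function strips whitespace on the assumption that residue names may arrive with or without space padding. It is still the same name-based heuristic, so ligands whose names happen to end with a base letter (for example ACT or NAG) would also pass unless they are added to `excluded`.

```python
NUCLEOTIDES = frozenset("ACGUT")


def is_rna_residue(hetfield, resname, excluded=("MG",)):
    """Heuristic from the answer: keep standard nucleotides and hetero
    residues whose name ends with a base letter; drop water and ions."""
    name = resname.strip()
    if not name or name in excluded:
        return False
    if hetfield == "W":  # water
        return False
    if hetfield.strip() == "":  # regular (non-hetero) residue
        return name in NUCLEOTIDES
    return hetfield.startswith("H_") and name[-1] in NUCLEOTIDES


# Possible wiring into Bio.PDB (requires Biopython):
# class RNASelect(Select):
#     def accept_residue(self, residue):
#         return is_rna_residue(residue.id[0], residue.get_resname())
```

Keeping the decision in a plain function means it can be checked against a handful of (hetero flag, residue name) pairs before it is run on a whole structure.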

Raph