I have been trying to get into visualizing proteins in Python. After some research I found a tutorial on visualizing a protein from the COVID-19 virus, so I set up Anaconda, got Jupyter Notebook working in VS Code, downloaded the necessary files from the PDB database, and made sure they were in the same directory as my notebook. But when I run nglview.show_biopython(structure) I get ValueError: I/O operation on closed file. I'm stymied. This is my first time using Jupyter Notebook, so maybe there is something I'm missing.

This is what the code looks like:

from Bio.PDB import * 
import nglview as nv

parser = PDBParser()
structure = parser.get_structure("6YYT", "6YYT.pdb")
view = nv.show_biopython(structure)

The error looks like this

Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_8776\2743687014.py in <module>
----> 1 view = nv.show_biopython(structure)

c:\Users\jerem\anaconda3\lib\site-packages\nglview\show.py in show_biopython(entity, **kwargs)
    450 '''
    451 entity = BiopythonStructure(entity)
--> 452 return NGLWidget(entity, **kwargs)
    453
    454

c:\Users\jerem\anaconda3\lib\site-packages\nglview\widget.py in __init__(self, structure, representations, parameters, **kwargs)
    243 else:
    244 if structure is not None:
--> 245 self.add_structure(structure, **kwargs)
    246
    247 if representations:

c:\Users\jerem\anaconda3\lib\site-packages\nglview\widget.py in add_structure(self, structure, **kwargs)
   1111 if not isinstance(structure, Structure):
   1112 raise ValueError(f'{structure} is not an instance of Structure')
-> 1113 self._load_data(structure, **kwargs)
   1114 self._ngl_component_ids.append(structure.id)
   1115 if self.n_components > 1:
...
--> 200 return io_str.getvalue()
    201
    202

ValueError: I/O operation on closed file

I only get this error when using nglview.show_biopython; the get_structure() function appears to read the file just fine. I can visualize other molecules just fine, but maybe that's because I was using the ASE library instead of a file. I don't know, that's why I'm here.
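For reference, the message itself is what Python raises whenever getvalue() is called on a StringIO that has already been closed. Here is a minimal sketch of that using only the standard library, with no nglview or Biopython involved. The helper name read_after_close and the sample text are made up for illustration.

```python
from io import StringIO


def read_after_close() -> str:
    """Write to an in-memory buffer, close it, then try to read it back."""
    buf = StringIO()
    buf.write("ATOM      1  N   MET A   1\n")
    buf.close()  # a closed StringIO can no longer be read
    try:
        return buf.getvalue()
    except ValueError as err:
        # Return the error message instead of crashing
        return str(err)


print(read_after_close())
```

So the traceback suggests that something closed the buffer between writing the structure and reading it back, not that the PDB file itself could not be read.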

Update: I recently found out that I can visualize the protein using nglview.show_file() instead of nglview.show_biopython(). Even though I can visualize proteins now and my problem is technically solved, I would still like to know why the show_biopython() function isn't working properly.

pippo1980
  • I tried this in Jupyter Lab and VS code but couldn't reproduce the error. What OS are you on and what are your versions of Python, Biopython and nglview? I would like to help fix this as I added the show_biopython function many moons ago. In particular it seems that the StringIO object at https://github.com/nglviewer/nglview/blob/master/nglview/adaptor.py#L193-L200 gets closed early, preventing it being read. – jgreener Dec 12 '22 at 15:45
  • @jgreener I'm on windows, the anaconda python kernel version is 3.9.13, my nglview version is 3.03 and my biopython version is 1.80. – Jeremiah Wade Dec 13 '22 at 00:04
  • I answered, or rather tried to, here: https://stackoverflow.com/questions/74737766/file-i-o-error-using-nglview-show-biopythonstructure/74782007#74782007 – pippo1980 Dec 13 '22 at 08:48

4 Answers

4

I also figured out another way to fix this problem. Going back to the tutorial I mentioned, I saw that it was made in 2021, which made me wonder whether we were using the same versions of each package. It turns out we were not. I'm not sure what version of nglview they were using, but they were using Biopython 1.79, which was the latest version in 2021, and I was using Biopython 1.80. With Biopython 1.80 I was getting the error seen above, but now that I'm using Biopython 1.79 I get this:

[image]

So I guess there is something going on with Biopython 1.80, so I'm going to stick with 1.79.
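To check which Biopython version a notebook kernel is actually using, something like the following sketch may help. It relies only on the standard library's importlib.metadata. The names AFFECTED_VERSIONS, is_affected and installed_biopython_version are made up for illustration, and flagging only 1.80 is an assumption based on what was reported in this thread.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: only the 1.80 release is flagged, because that is the only
# version reported as failing in this thread.
AFFECTED_VERSIONS = {"1.80"}


def is_affected(biopython_version: str) -> bool:
    """Return True if the version string is one reported as failing here."""
    return biopython_version.strip() in AFFECTED_VERSIONS


def installed_biopython_version():
    """Return the installed Biopython version, or None if it is not installed."""
    try:
        return version("biopython")
    except PackageNotFoundError:
        return None


found = installed_biopython_version()
if found is None:
    print("Biopython is not installed in this environment")
elif is_affected(found):
    print(f"Biopython {found}: show_biopython is reported to fail with this version")
else:
    print(f"Biopython {found}: not the version reported as failing")
```

If it reports 1.80, pinning the older release (for example with pip install biopython==1.79) is the downgrade described above.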

  • Thanks for posting back a solution. Could you kindly upvote @pippo1980's response please? I'm not an expert, by any means, but it looks to be a good response to me. Certainly useful. – M__ Dec 13 '22 at 17:36
  • yep, same happens with my error, it goes away if I use biopython 1.79 – pippo1980 Dec 14 '22 at 12:40
3

I had a similar problem with:

from Bio.PDB import * 
import nglview as nv

parser = PDBParser(QUIET = True)
structure = parser.get_structure("2ms2", "2ms2.pdb")

save_pdb = PDBIO()
save_pdb.set_structure(structure)
save_pdb.save('pdb_out.pdb')

view = nv.show_biopython(structure)
view

The error was the same as in the question:

.................site-packages/nglview/adaptor.py:201, in BiopythonStructure.get_structure_string(self)
    199 io_str = StringIO()
    200 io_pdb.save(io_str)
--> 201 return io_str.getvalue()

ValueError: I/O operation on closed file

I modified BiopythonStructure.get_structure_string(self) in site-packages/nglview/adaptor.py, replacing:

def get_structure_string(self):
        from Bio.PDB import PDBIO
        from io import StringIO
        io_pdb = PDBIO()
        io_pdb.set_structure(self._entity)
        io_str = StringIO()
        io_pdb.save(io_str)
        return io_str.getvalue()

with:

def get_structure_string(self):
        from Bio.PDB import PDBIO
        io_pdb = PDBIO()
        io_pdb.set_structure(self._entity)
        # mmap_str is the helper class defined in the same file
        mo = mmap_str()
        io_pdb.save(mo)
        return mo.read()

and added this new class mmap_str() in the same file:

import copy
import mmap

class mmap_str():

    instance = None

    def __new__(cls, *args, **kwargs):
        # singleton: always hand back the same instance
        if not isinstance(cls.instance, cls):
            cls.instance = object.__new__(cls)
        return cls.instance

    def __init__(self):
        # one byte holds the '\n' placeholder; a larger anonymous map
        # would leave zero bytes that end up in the output
        self.mm = mmap.mmap(-1, 1)
        self.a = ''
        b = '\n'
        self.mm.write(b.encode(encoding = 'utf-8'))
        self.mm.seek(0)
        #print('self.mm.read().decode() ',self.mm.read().decode(encoding = 'utf-8'))
        self.mm.seek(0)

    def write(self, string):
        if not string:
            # nothing to add, and an anonymous map cannot have length 0
            return
        # read back what is stored so far, dropping the placeholder newline
        self.a = str(copy.deepcopy(self.mm.read().decode(encoding = 'utf-8'))).lstrip('\n')
        self.mm.seek(0)
        #print('a -> ', self.a)
        # size the new map by the encoded bytes, not the character count
        data = (self.a + string).encode()
        self.mm = mmap.mmap(-1, len(data))
        #print('a :', self.a)
        #print('len self.mm ', len(self.mm))
        #print('length string : ', len(string))
        #print(data)
        self.mm.write(data)
        self.mm.seek(0)
        #print('written once ')

    def read(self):
        self.mm.seek(0)
        a = self.mm.read().decode().lstrip('\n')
        self.mm.seek(0)
        return a

    def __enter__(self):
        return self

    def __exit__(self, *args):
        # do nothing, so a with block does not close the map
        pass

If I uncomment the print statements I get this error:

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it. 

With them commented out I get a view that differs from the one I get with nglview.show_file(filename):

[image]

That's because, as can be seen in the pdb_out.pdb file written by my code, Bio.PDB.PDBParser.get_structure(name, filename) keeps only the coordinates. It doesn't retrieve the PDB header records responsible for generating the full crystallographic symmetry, or Biopython can't handle them (I'm not sure about this, please help if you know better).

I still don't understand what is going on with:

--> 201 return io_str.getvalue()

ValueError: I/O operation on closed file

Could it be something related to Jupyter's ipykernel? I hope somebody can shed more light on this. I don't know how the framework runs, but it is definitely different from a normal Python interpreter. As an example:

[image]

The same code in one of my Python virtualenvs will run forever, so could it be that ipykernel doesn't like StringIO objects, or does something strange to them?

OK, thanks to the hint in the answer below, I inspected PDBIO.py in the GitHub repo for Biopython 1.80 and compared the save method of PDBIO, def save(self, file, select=_select, write_end=True, preserve_atom_numbering=False):, with the one in Biopython 1.79.

First part: [image]

Last part: [image]

So apparently the big difference is the with fhandle: block in version 1.80.
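To see why a with block inside a save routine matters to the caller, here is a small sketch using only the standard library. The two functions are simplified stand-ins I made up for illustration; they are not Biopython's actual code, only the pattern of writing to a handle with and without a with block around it.

```python
from io import StringIO


def save_leaving_open(handle, text):
    """Stand-in for the older pattern: write and leave the caller's handle open."""
    handle.write(text)


def save_and_close(handle, text):
    """Stand-in for the newer pattern: the with block closes the handle on exit."""
    with handle:
        handle.write(text)


old_buf = StringIO()
save_leaving_open(old_buf, "ATOM\n")
print(old_buf.closed, repr(old_buf.getvalue()))  # False 'ATOM\n'

new_buf = StringIO()
save_and_close(new_buf, "ATOM\n")
print(new_buf.closed)  # True, so new_buf.getvalue() would now raise ValueError
```

The second case matches the traceback: the buffer passed in by nglview is closed by the time nglview calls getvalue() on it.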

So I found that it is enough to change adaptor.py by adding a subclass of StringIO like this:

from io import StringIO

class StringIO(StringIO):
    def __exit__(self, *args, **kwargs):
        # do nothing else here, so leaving a with block no longer closes the buffer
        print('exiting from subclassed StringIO !!!!!')

and modifying def get_structure_string(self): like this:

def get_structure_string(self):
        from Bio.PDB import PDBIO
        #from io import StringIO
        io_pdb = PDBIO()
        io_pdb.set_structure(self._entity)
        io_str = StringIO()
        io_pdb.save(io_str)
        return io_str.getvalue()

was enough to get my Biopython 1.80 working in Jupyter with nglview.

That said, I am not sure what the pitfalls are of not closing the StringIO object used for the visualization, but apparently that is what Biopython 1.79 did, and what my first approach with the mmap object did too (it never closed the mmap_str).
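The same idea can be tried outside nglview with the standard library alone. In this sketch the class name NonClosingStringIO and the function save_and_close are hypothetical: the function is only a stand-in for a save routine that wraps the handle in a with block, not Biopython's real code. As far as I can tell, the buffer then simply stays open until the caller closes it, so closing it explicitly after reading seems the tidy thing to do.

```python
from io import StringIO


class NonClosingStringIO(StringIO):
    """StringIO whose with block does not close it (hypothetical name)."""

    def __exit__(self, *exc_info):
        # Deliberately skip the close() that the inherited __exit__ would perform.
        return None


def save_and_close(handle, text):
    """Stand-in for a save routine that wraps the caller's handle in a with block."""
    with handle:
        handle.write(text)


buf = NonClosingStringIO()
save_and_close(buf, "ATOM\n")
text = buf.getvalue()  # still readable, because __exit__ did not close it
buf.close()            # the caller closes the buffer explicitly once done
print(repr(text), buf.closed)
```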

pippo1980
  • What operating system are you on? I'm wondering if it could be a Windows issue, I can also test later on my Windows machine. There was a similar error reported a while ago for iPython and StringIO: https://github.com/ipython/ipython/issues/9168. – jgreener Dec 13 '22 at 15:50
  • I am on a Debian VM (VMware on a Debian host). I don't remember the ipykernel version, but I tried downgrading it below 8 because of https://github.com/ipython/ipython/issues/9168 and got the same error nevertheless – pippo1980 Dec 13 '22 at 16:33
3

This is an issue with Biopython 1.80, in particular introduced by a commit that automatically closes IO objects sent to PDBIO.save.

I made a PR to change the behaviour which fixes this issue and should hopefully be in the next Biopython release. Until then, avoiding using Biopython 1.80 should work.
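As a rough illustration of the alternative to closing inside the function, the caller can own the handle: it opens the buffer, passes it in, reads it back, and closes it itself. The function write_records below is a hypothetical writer made up for this sketch, not the PDBIO code or the contents of the PR.

```python
from io import StringIO


def write_records(handle, records):
    """Hypothetical writer: uses the handle it is given but never closes it."""
    for record in records:
        handle.write(record + "\n")


# The caller opens the buffer, reads it while it is still open, and the
# with block closes it afterwards.
with StringIO() as buf:
    write_records(buf, ["ATOM", "END"])
    contents = buf.getvalue()

print(repr(contents), buf.closed)
```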

Thanks @jeremiah-wade and @pippo1980 for digging into this.

jgreener
  • It took you two lines to explain what took me a couple of days to figure out... My question is: what are the pitfalls, if any, of leaving unclosed IO objects around? I am really new to coding and Python, so in really plain words if you can. Thanks again – pippo1980 Dec 14 '22 at 23:58
  • And along the same lines, is there any chance of modifying class BiopythonStructure(Structure) in nglview's adaptor.py? – pippo1980 Dec 15 '22 at 00:03
  • It's explained a bit in the linked PR but you can't read from a closed StringIO object. So if the PDBIO.save function closes the object then an error will be thrown when the string is read back. In general closing an IO object is best done with a with block or .close(), rather than being done silently when the object is given as a function argument. In general the with block is better since it's quite common to forget to call .close(). Closing IO objects once they are no longer needed frees memory and lets other software use them safely. – jgreener Dec 15 '22 at 11:59
  • They could just add an if isinstance(filename, StringIO) check with close_file = 0 in PDBIO, like in 1.79, and change to close_file = 1 for the #filehandle I hope bit – pippo1980 Dec 15 '22 at 15:23
  • I think the solution could be to do it the way PDBParser handles it: from Bio.File import as_handle, https://github.com/biopython/biopython/blob/7ef56557bdca66eabd62334e61c4dc3153c997ff/Bio/File.py#L29 and in the save method of PDBIO.py change with fhandle: to with as_handle(fhandle): – pippo1980 Dec 16 '22 at 12:22
  • Or remove the if isinstance ... else ... and change with fhandle: to with as_handle(file, mode = 'w') as fhandle: – pippo1980 Dec 18 '22 at 00:36
1

I tried to understand git and ended up with this. It seems more consistent with previous habits in the Biopython project, but I can't push it.

[image]

Could anyone pass it along? [Of course it needs to be checked; my tests are OK, but I am a novice.]

pippo1980