I have a Python module that I need to port from Python 2 to Python 3. The problem is that it receives an std::string from a C++ module as part of a struct. That was readable in Python 2, where the default string type was bytes. Under Python 3, however, the string gets decoded as UTF-8 whenever I try to do anything with it.

Basically, the message deserializer is expecting a bytes-like object, but is getting a normal, albeit unreadable, string instead.

For instance, this doesn't work:

msg_raw_data = bytes(msg.raw_data, encoding='latin-1')
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 4: invalid start byte

Unfortunately, I cannot change the way the string comes into the module, but I don't need to read that string as an actual valid string - I just need to extract a bytes object from it without discarding any values. Is there a way to do that?

fwiffo
  • This is just a character decoding issue, try a different encoding such as `windows-1252`: https://stackoverflow.com/a/48067785/1399491 – Alex W Jul 26 '21 at 12:30
  • I have tried a few different encodings that I've found in various SO questions, including `windows-1252`, `ascii`, `latin-1`, `string_escape`, `unicode_escape`, `raw_unicode_escape`, but none of them have worked so far. – fwiffo Jul 26 '21 at 12:42
  • Have you tried using something like [chardet](https://pypi.org/project/chardet/) ? – Alex W Jul 26 '21 at 14:54
  • No, but the problem is that, unlike the person in the question you've linked, I don't have the luxury of choosing an encoding when opening a file. What I get is a string object directly, though the service sending it is very likely sending a `bytes` object. That's why I don't need to decode it as a string; I just need a way to extract the underlying bytes without discarding any of them (so I can't use the `errors='ignore'` parameter). – fwiffo Jul 26 '21 at 17:10

1 Answer

For lack of a better option, I had to ask the C++ team to change their Python bindings to return a bytes wrapper instead of std::string on their side.
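
For anyone wondering why the encoding argument made no difference, here is a minimal pure-Python sketch. It assumes the bindings decode the std::string as UTF-8 while building the Python str (pybind11 does this by default, but I can't confirm that is what is used here). If so, the error is raised before `bytes(...)` is ever called. The payload below is made up for illustration; only the 0xa0 at position 4 mirrors the error message from the question.

```python
# Hypothetical payload; the 0xa0 byte at index 4 mirrors the reported error.
raw = b"\x01\x02\x03\x04\xa0\xff"


def try_utf8(data: bytes):
    """Attempt the UTF-8 decode that an automatic std::string -> str
    conversion is assumed to perform; return the text or the exception."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc


# The decode fails on the raw bytes themselves, so no encoding passed
# to bytes(...) afterwards can help: there is never a str to re-encode.
err = try_utf8(raw)

# latin-1 maps every byte 0..255 to the code point of the same value,
# so it round-trips losslessly, but only if the decode step used latin-1.
round_tripped = raw.decode("latin-1").encode("latin-1")

# When the binding hands over bytes, nothing needs decoding at all.
msg_raw_data = raw
```

With the bindings returning bytes, `msg.raw_data` can be passed to the deserializer as-is.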

fwiffo