I'm using the Boost gzip example code. I am trying to compress the simple string "test" and expect the compressed string H4sIAAAAAAAACitJLS4BAAx+f9gEAAAA, as shown by an online compressor.

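#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <iostream>
#include <sstream>
#include <string>

static std::string compress(const std::string& data)
{
    namespace bio = boost::iostreams;
    std::stringstream compressed;
    std::stringstream origin(data);

    // Input filter chain: reading from it yields the gzip-compressed bytes of origin.
    bio::filtering_streambuf<bio::input> out;
    out.push(bio::gzip_compressor(bio::gzip_params(bio::gzip::best_compression)));
    out.push(origin);

    bio::copy(out, compressed);
    return compressed.str();
}

int main(int argc, char* argv[]){
    std::cout << compress("test") << std::endl;
    // prints out garbage

    return 0;
}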

However, when I print the result of the conversion, I get garbage values like +I-. ~

I know the conversion is valid because decompressing the value returns the correct string. However, I need the string in a human-readable format, i.e. H4sIAAAAAAAACitJLS4BAAx+f9gEAAAA.

How can I modify the code to output human-readable text?

Thanks

Motivation

The garbage format is not compatible with my JSON library, through which I will send the compressed text.

Tom
  • Looks like that website shows you the compressed data encoded to Base64. So encode it as well, e.g. https://stackoverflow.com/questions/7053538/how-do-i-encode-a-string-to-base64-using-only-boost – Dan Mašek Jan 07 '22 at 18:23
  • @DanMašek hehe - timing – sehe Jan 07 '22 at 18:24
  • It's not "garbage". It's binary data. The whole point of compression is to compress, so the output uses all 256 possible byte values to permit the output to be as small as possible. You can encode the data into a smaller number of byte values to make it readable, e.g. 64 values in Base64, which is what you are looking at with `H4s`... That _expands_ the data, cancelling some of the compression. See the answers [here](https://stackoverflow.com/questions/1443158/binary-data-in-json-string-something-better-than-base64) for alternatives to your JSON embedding problem. – Mark Adler Jan 07 '22 at 18:39
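
To put rough numbers on the expansion described in the comment above: padded Base64 turns every group of up to 3 bytes into 4 characters, so the encoded text is about a third larger than the binary. A minimal sketch of that arithmetic follows; the helper name `base64_encoded_size` is made up for illustration and is not part of Boost.

```cpp
#include <cstddef>

// Length of the padded Base64 encoding of n input bytes:
// every group of up to 3 bytes becomes 4 output characters.
constexpr std::size_t base64_encoded_size(std::size_t n)
{
    return 4 * ((n + 2) / 3);
}

// The 32-character string in the question corresponds to 24 gzip bytes.
static_assert(base64_encoded_size(24) == 32, "24 bytes -> 32 Base64 characters");
```

Those 24 bytes are the gzip output for the 4-byte input `test` (10-byte header, 6 bytes of deflate data, 8-byte trailer), so for very short inputs gzip plus Base64 ends up larger than the original text.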

1 Answer

The example site completely fails to mention that it also Base64-encodes the result:

base64 -d <<< 'H4sIAAAAAAAACitJLS4BAAx+f9gEAAAA' | gunzip -

Prints:

test

In short, you need to also do that:

#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <iostream>
#include <sstream>
#include <string>

#include <boost/archive/iterators/binary_from_base64.hpp>
#include <boost/archive/iterators/base64_from_binary.hpp>
#include <boost/archive/iterators/transform_width.hpp>

std::string decode64(std::string const& val)
{
    using namespace boost::archive::iterators;
    return {
        transform_width<binary_from_base64<std::string::const_iterator>, 8, 6>{
            std::begin(val)},
        {std::end(val)},
    };
}

std::string encode64(std::string const& val)
{
    using namespace boost::archive::iterators;
    std::string r{
        base64_from_binary<transform_width<std::string::const_iterator, 6, 8>>{
            std::begin(val)},
        {std::end(val)},
    };
    // Pad with '=' so that the encoded length is a multiple of 4.
    return r.append((3 - val.size() % 3) % 3, '=');
}

static std::string compress(const std::string& data)
{
    namespace bio = boost::iostreams;
    std::istringstream origin(data);

    bio::filtering_istreambuf in;
    in.push(
        bio::gzip_compressor(bio::gzip_params(bio::gzip::best_compression)));
    in.push(origin);

    std::ostringstream compressed;
    bio::copy(in, compressed);
    return compressed.str();
}

static std::string decompress(const std::string& data)
{
    namespace bio = boost::iostreams;
    std::istringstream compressed(data);

    bio::filtering_istreambuf in;
    in.push(bio::gzip_decompressor());
    in.push(compressed);

    std::ostringstream origin;
    bio::copy(in, origin);
    return origin.str();
}

int main()
{
    auto msg = encode64(compress("test"));
    std::cout << msg << std::endl;                       // Base64 text, safe to embed in JSON
    std::cout << decompress(decode64(msg)) << std::endl; // round trip back to "test"
}

Prints

H4sIAAAAAAAC/ytJLS4BAAx+f9gEAAAA
test
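
The output differs from the string in the question by a few characters (`AAAC/yt` instead of `AAAACit`). Decoding both strings shows that the difference is confined to two gzip header bytes defined in RFC 1952, XFL (offset 8) and OS (offset 9); the compressed payload is the same. Below is a minimal sketch (C++17, no Boost) for checking this. `decode_base64` and `gzip_header_fields` are hypothetical helper names, and the decoder does no real validation.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical helper: decode standard Base64 (RFC 4648 alphabet).
// Stops at the first '=' or any other character outside the alphabet.
std::vector<std::uint8_t> decode_base64(std::string_view text)
{
    static constexpr std::string_view alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    std::vector<std::uint8_t> bytes;
    std::uint32_t buffer = 0; // accumulates 6 bits per input character
    int bits = 0;             // number of not yet emitted bits in buffer

    for (char c : text) {
        const auto pos = alphabet.find(c);
        if (pos == std::string_view::npos)
            break; // padding or unexpected character
        buffer = (buffer << 6) | static_cast<std::uint32_t>(pos);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            bytes.push_back(static_cast<std::uint8_t>((buffer >> bits) & 0xFF));
        }
    }
    return bytes;
}

struct GzipHeaderFields {
    std::uint8_t xfl; // offset 8: extra flags (2 = maximum compression)
    std::uint8_t os;  // offset 9: originating OS (255 = unknown)
};

// Hypothetical helper: read XFL and OS from the fixed 10-byte gzip header.
GzipHeaderFields gzip_header_fields(const std::vector<std::uint8_t>& gz)
{
    // A gzip member starts with the magic bytes 0x1f 0x8b.
    assert(gz.size() >= 10 && gz[0] == 0x1f && gz[1] == 0x8b);
    return {gz[8], gz[9]};
}
```

For the string in the question, `gzip_header_fields(decode_base64("H4sIAAAAAAAACitJLS4BAAx+f9gEAAAA"))` gives XFL 0 and OS 10; for the output above it gives XFL 2 and OS 255. Every byte from offset 10 onward is identical, which is why both strings decompress to `test`.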
sehe