
This is a follow-up to this question: Creating Vector Polygons with rendering performance like GISCloud?

In his answer, Ragi outlines a rationale for encoding geographic information in an image format and decoding it in the browser. He observes that "Currently, to do this, you have to roll your own". He also observes that there's currently no standard for this.

Given the awesome performance being demonstrated, it seems like the community might benefit from a standard. From my understanding of the problem, it sounds like a standard way of dealing with it could be implemented. Call it B-WFS.

My question, then: what would a useful protocol for encoding vector data as images look like? Is there something making it too complex to tackle usefully, or is it just a case of "no one's done this yet"?

canisrufus
  • I'm sorry for my ignorance, maybe I didn't get the point, but wouldn't a GeoTIFF with a color table do the job? – Pablo Oct 28 '11 at 17:08
  • Sorry for my ignorance, too ;) I'm not sure what a color table is, but I don't think so. The goal isn't to pass along an image with corresponding metadata. As you mention, that's a solved problem. The goal is to pass along vector data with metadata in a more compact format than human-readable UTF-8. Given that JavaScript is ill-equipped to deal with binary data, the workaround proposed is to encode the data in an image binary, decode the image with the HTML5 Canvas, and then turn it into vector objects (see the sketch after these comments). – canisrufus Oct 28 '11 at 17:31
  • @Pablo Presuming that network I/O (rather than parsing) is the bottleneck in dealing with vectors on the web, having an established way to deal with binary-encoded vectors ought to make it easier to write better-performing web maps. – canisrufus Oct 28 '11 at 17:36
  • Interesting, now I get it... I'm beginning to work with webmaps now and I'm still learning the basics. BTW, a colortable or colormap is a table that ties a raster cell value to a class. – Pablo Oct 28 '11 at 18:30
  • I'm just learning, but doesn't "encoding vector data as an image" basically mean rasterizing the data? You should then be able to create a tile layer and serve the formerly vector data as a standard map tile. If you want the related metadata, you could make a separate request to retrieve the data for a selected point or given area on demand. The client/browser doesn't need all the metadata for all tiles all the time. Just some thoughts; I'm still trying to figure this out myself. – monkut May 23 '12 at 01:57
  • @monkut Yeah, it's different. :) Rasterizing a set of vectors is just rendering it. Voila. Raster! What I was talking about in this question is different. You should read Ragi's answer in the question I linked to; that should start to explain what I mean. If you find it's still not clear, I will take some time to pen up a real answer. – canisrufus May 24 '12 at 19:43
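A minimal sketch of the decode side of such a scheme, assuming a hypothetical tile image whose R, G and B bytes carry a little-endian uint32 vertex count followed by uint32 x/y pairs (the layout, URL and function names are illustrative, not a proposed standard):

```javascript
// Decode vector data packed into the pixels of a lossless (PNG) image.
// Hypothetical payload layout: uint32 vertex count, then uint32 x/y pairs,
// little-endian, packed into R, G and B only (alpha stays at 255 so the
// browser's premultiplied-alpha handling cannot corrupt the data).
function decodeVectorTile(url, callback) {
  var img = new Image();
  img.crossOrigin = "anonymous"; // required, or getImageData() throws on a tainted canvas
  img.onload = function () {
    var canvas = document.createElement("canvas");
    canvas.width = img.width;
    canvas.height = img.height;
    var ctx = canvas.getContext("2d");
    ctx.drawImage(img, 0, 0);

    var rgba = ctx.getImageData(0, 0, img.width, img.height).data;

    // Keep only the R, G, B bytes of each pixel as payload.
    var payload = [];
    for (var i = 0; i < rgba.length; i += 4) {
      payload.push(rgba[i], rgba[i + 1], rgba[i + 2]);
    }

    function readUint32(offset) {
      return (payload[offset] |
              (payload[offset + 1] << 8) |
              (payload[offset + 2] << 16) |
              (payload[offset + 3] << 24)) >>> 0;
    }

    var count = readUint32(0);
    var coords = [];
    for (var v = 0; v < count; v++) {
      var off = 4 + v * 8;
      coords.push([readUint32(off), readUint32(off + 4)]);
    }
    callback(coords); // plain vertex arrays, ready to turn into vector objects
  };
  img.src = url;
}
```

A lossy format such as JPEG would mangle the payload, so a scheme like this is tied to lossless formats like PNG.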

2 Answers


It turns out that this is a needless workaround. XHR2, part of the newer browser APIs available to JavaScript, allows binary data to be imported and parsed directly, without coercing anything into an image.
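For instance (a sketch only; the endpoint and record layout are made up for illustration), XHR2 lets the browser request the response as an ArrayBuffer and read it with a DataView:

```javascript
// Fetch binary vector data directly with XHR2 -- no image round-trip required.
var xhr = new XMLHttpRequest();
xhr.open("GET", "/tiles/12/654/1582.bin", true); // hypothetical binary tile endpoint
xhr.responseType = "arraybuffer";                // XHR2: hand back raw bytes

xhr.onload = function () {
  var view = new DataView(xhr.response);
  // Illustrative layout: uint32 vertex count, then float64 lon/lat pairs.
  var count = view.getUint32(0, true);           // true = little-endian
  var coords = [];
  for (var i = 0; i < count; i++) {
    var off = 4 + i * 16;
    coords.push([view.getFloat64(off, true), view.getFloat64(off + 8, true)]);
  }
  console.log("decoded", coords.length, "vertices");
};
xhr.send();
```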

canisrufus

It doesn't need to be a separate standard as such, because clause 9.4 of the WFS Implementation Specification (04-094) says:

Other output formats (including older versions of GML, non-XML, binary and vendor specific formats) are also possible as long at the appropriate values for the outputFormat attribute are advertised in the capabilities document [clause 13]. This specification recommends that a descriptive narative [sic] be included in the capabilities document for each output format listed there.

The easiest way to add binary support is simply to GZIP a JSON stream, with the decompression handled automatically by most browsers. I've not tried it, but it should require minimal work on both the server and client side, assuming both already support uncompressed JSON.
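A sketch of that approach, assuming a bare Node.js server (the sample endpoint and data are illustrative): the server gzips the GeoJSON and sets Content-Encoding, and the browser decompresses transparently, so the client-side code is untouched.

```javascript
// Serve GeoJSON with gzip transfer encoding (plain Node.js, no framework).
var http = require("http");
var zlib = require("zlib");

var featureCollection = JSON.stringify({
  type: "FeatureCollection",
  features: [] // ...features pulled from your data store...
});

http.createServer(function (req, res) {
  var acceptsGzip = /\bgzip\b/.test(req.headers["accept-encoding"] || "");
  if (!acceptsGzip) {
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(featureCollection);
    return;
  }
  zlib.gzip(featureCollection, function (err, compressed) {
    res.writeHead(200, {
      "Content-Type": "application/json",
      "Content-Encoding": "gzip"
    });
    res.end(compressed);
  });
}).listen(8080);
```

An ordinary XMLHttpRequest against this server receives the JSON already decompressed, so the existing GeoJSON parsing code works unchanged.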

MerseyViking
  • +1 for the point about the standard. Zipping is not binary encoding in the same sense. Questions about the performance implications of the two approaches, a zipped GeoJSON vs. geometries encoded in an image, are certainly worth exploring. – canisrufus Oct 31 '11 at 15:58
  • You're right, this reduces the network bottleneck, but puts greater load on the client and server. But encoding vector data in an image is, IMO, a sub-optimal approach because of the variable length of vector data. It also obfuscates the nature of the vector data. A better approach might be to have two parallel data streams, one for vector and one for raster, that could be handled by different servers and storage devices and then combined by the client. – MerseyViking Oct 31 '11 at 17:01
  • The issue of the variable length of vector data can be dealt with in basically the same way networks handle sending packets (see the length-prefixing sketch after these comments). I agree that it's sub-optimal, but we seem to be pushed into that by the fact that JS doesn't deal well with binary. I'm going to just write and implement something myself, as I have time. I'll put it here when I get something working... – canisrufus Oct 31 '11 at 17:48
  • GZIPing a JSON stream is just compressing something that is completely bloated already. You are still talking about producing unnecessarily long files on both ends. – Ragi Yaser Burhum Nov 01 '11 at 02:44
  • XHR2 notwithstanding, within reason the length of a generated file has little impact on the time it takes to generate. It tends to get swallowed up by the time it takes to query the database, generalise the features, and so on. Parsing the data on the client side is, I grant you, probably a more time-consuming process, but it does strike me as a solution to a problem that hasn't been well defined, with an assumption that binary==faster. "Premature optimization is the root of all evil." – MerseyViking Nov 01 '11 at 10:19
  • I really think Ragi defined it clearly in his answer. I'd agree that I didn't. :) It may be that the hypothesis that a binary format could be an overall faster data-transfer format is wrong. The difference could just be negligible. I did say that "performance implications... are worth exploring." Obviously I can't just define a binary format and then declare victory. We'll see! – canisrufus Nov 01 '11 at 12:45
  • @MerseyViking Without having to repeat my answer again, let me put this in perspective in terms of CPU cycles (since your assumption is about premature optimization). Accessing L1 cache = 1 CPU cycle, L2 = 14 cycles, RAM ~250 cycles, disk = 41,000,000, network (depends on bandwidth, so let's be kind) = 240,000,000. I/O, whether disk-based or network-based (our case), is orders of magnitude slower. How is shifting load from the last portion of the spectrum into the first one "premature" by any scale? – Ragi Yaser Burhum Nov 01 '11 at 15:43
  • My point was that a poor algorithm (or even just one that requires lots of processing) can make I/O costs seem fairly insignificant. If an algorithm thrashes the CPU caches, then it's only as quick as RAM access, and so on up the scale. Also, you overlooked the asynchronous nature of disk and network I/O, which can amortize the cost further still. My idea of using GZIP was off the top of my head, and it reduces the number of cycles you've highlighted as the biggest bottleneck, i.e. I/O. I agree it's not perfect, merely a suggestion. The thing that I was worried about was using images for vector data. – MerseyViking Nov 01 '11 at 16:13
  • Perhaps this discussion is getting out of scope for this site (for which I accept some blame), and I'd be happy to continue it over on SO. – MerseyViking Nov 01 '11 at 16:13
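The "handle it like packets" idea from the comments above amounts to length-prefixing each record. A minimal sketch of reading such a stream (the record layout is invented for illustration):

```javascript
// Parse a buffer of length-prefixed geometry records:
// [uint32 payload length][payload][uint32 payload length][payload]...
// Each payload here (illustratively) holds a uint32 vertex count plus float64 x/y pairs.
function parseRecords(arrayBuffer) {
  var view = new DataView(arrayBuffer);
  var geometries = [];
  var offset = 0;

  while (offset < view.byteLength) {
    var payloadLength = view.getUint32(offset, true); // true = little-endian
    offset += 4;

    var count = view.getUint32(offset, true);
    var coords = [];
    for (var i = 0; i < count; i++) {
      var off = offset + 4 + i * 16;
      coords.push([view.getFloat64(off, true), view.getFloat64(off + 8, true)]);
    }
    geometries.push(coords);

    offset += payloadLength; // skip to the next record, whatever its size
  }
  return geometries;
}
```

Because every record announces its own length, geometries of very different sizes can share one stream, and a reader can skip records it doesn't understand.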