You might want to take it further: render an image as the human eye would capture it, or even as a human being would perceive it.
There are two ways to interpret this. I'll do both.
Interpretation 1: Render an image that looks perceptually realistic.
At the end of the day, your image still needs to be displayed somewhere. Here's the key: you want to render your image in such a way that when you *display* that image on a particular display device, it will produce the same sensation the original radiometric image would have produced.
Here's how to unpack that idea.
In the real world, radiometric spectra (i.e., real distributions of light) enter your eye and stimulate approximately¹ four types of light receptors. The stimulation of those receptors produces the sensations of color we associate with images.
In rendering, we don't have arbitrary control over the spectra we produce. Fortunately, since we (usually) have only three types of cones, each of which produces only a scalar response, color vision can be reproduced using exactly three primaries. The bottom line is that you can produce any color sensation with a linear combination of just three wavelengths (up to a few colors whose weights would have to be negative, in which case you just use different primaries).
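To state that a bit more formally (the symbols below are just labels I'm introducing for the cone sensitivity curves and the incoming spectra): two spectra $\Phi_1(\lambda)$ and $\Phi_2(\lambda)$ produce the same color sensation, i.e., they are *metamers*, exactly when they stimulate each cone type equally:

$$\int \Phi_1(\lambda)\,\bar{c}_i(\lambda)\,d\lambda \;=\; \int \Phi_2(\lambda)\,\bar{c}_i(\lambda)\,d\lambda, \qquad i \in \{L, M, S\},$$

where $\bar{c}_L$, $\bar{c}_M$, $\bar{c}_S$ are the three cone sensitivity curves. Since only these three integrals matter, a weighted mix of three fixed primaries can match any reachable triple of cone responses (modulo the negative-weight caveat above).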
You don't get a choice of primaries, though. Almost all color display devices use the sRGB standard, which specifies three primaries (which, in practice, are not single wavelengths). That's fine, because it turns out it's all abstracted away and you don't have to care.
To clarify the mess that is perceptually accurate rendering, here's the algorithm:
- Render your image using correct radiometric calculations. You trace individual wavelengths of light or buckets of wavelengths. Whatever. In the end, you have an image that has a representation of the spectrum received at every point.
- At each pixel, take the spectrum you rendered and convert it to the CIE XYZ color space. This works out to integrating the product of the spectrum with the standard observer functions (see the CIE XYZ definition; a sketch in code follows this list).
- This produces three scalar values: the pixel's CIE X, Y, and Z coordinates.
- Use a matrix transform to convert this to linear RGB, and then from there use a linear/power transform to convert linear RGB to sRGB.
- Convert from floating point to uint8 and save, clamping values out of range (your monitor can't represent them).
- Send the uint8 pixels to the framebuffer.
- The display takes the sRGB colors and applies the inverse transform to recover an intensity for each of its three primaries. Each intensity scales the output of the picture element it is responsible for. The picture elements light up, producing a spectrum that is (hopefully) a metamer of the spectrum you originally rendered.
- You perceive the spectrum as you would have perceived the rendered spectrum.
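To tie the middle steps together, here's a minimal sketch in Python/NumPy. The matrix and transfer function are the standard sRGB (D65) ones; the sampled observer curves (`xbar`, `ybar`, `zbar`), the wavelength step, and any exposure/normalization scaling are assumptions you'd have to supply from your own renderer.

```python
import numpy as np

# XYZ -> linear sRGB matrix (sRGB standard, D65 white point).
XYZ_TO_LINEAR_SRGB = np.array([
    [ 3.2406, -1.5372, -0.4986],
    [-0.9689,  1.8758,  0.0415],
    [ 0.0557, -0.2040,  1.0570],
])

def spectrum_to_xyz(spectrum, xbar, ybar, zbar, dlam):
    """Integrate the pixel's spectrum against the CIE standard observer curves.

    spectrum, xbar, ybar, zbar: arrays sampled on the same wavelength grid.
    dlam: wavelength step of that grid (nm). Exposure/normalization is up to you.
    """
    X = np.sum(spectrum * xbar) * dlam
    Y = np.sum(spectrum * ybar) * dlam
    Z = np.sum(spectrum * zbar) * dlam
    return np.array([X, Y, Z])

def linear_to_srgb(c):
    """Piecewise linear/power sRGB transfer function, with clamping."""
    c = np.clip(c, 0.0, 1.0)  # out-of-range values: your monitor can't show them
    return np.where(c <= 0.0031308,
                    12.92 * c,
                    1.055 * np.power(c, 1.0 / 2.4) - 0.055)

def xyz_to_srgb_uint8(xyz):
    """Matrix transform to linear RGB, encode to sRGB, quantize to uint8."""
    linear_rgb = XYZ_TO_LINEAR_SRGB @ xyz
    return np.round(linear_to_srgb(linear_rgb) * 255.0).astype(np.uint8)
```

What you send to the framebuffer is the uint8 sRGB triple; the display then effectively applies the inverse of `linear_to_srgb` and the inverse matrix on its end.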
Interpretation 2: Attempt to simulate the data the human eye would actually receive, for visualization purposes or to compensate for LDR displays.
This one has a less useful meaning, I think. Essentially, you're trying to produce an image that tweaks the way the brain perceives it for fun/profit.
For example, there was a paper at SIGGRAPH this year where they simulated afterimages and color reduction to make images appear perceptually different. Of course, the only reason to do this at all is that the displays we're working with are all low-dynamic-range (LDR). The point is to simulate, as actual image data, the effects someone would see if exposed to a real high-dynamic-range (HDR) display.
In practice, this turns out not to work very well. Take afterimages: we see them because a very bright stimulus fatigues the photoreceptors. If you instead try to simulate the effect with a fake afterimage, it might look kind of similar, but since it's produced by a completely different mechanism, it's not very convincing.
This sort of graphics is actually underexplored in the literature, if you want to make a go at it. The paper mentioned above is an example of more or less the most state-of-the-art approach we have. I think the current consensus, though, is that it isn't really worth trying to simulate (at least at this time): at best you'd only be approximating real vision effects by substituting different mechanisms for them, and that doesn't really work.
¹ Rod + 3×cones, the usual case. Approximate because humans may have as few as zero functional light receptors, up to a conjectured maximum of seven (with the highest ever observed being five).