The infamous answer to this question is:
It depends
The 256 x 256 recommendation that is given many times is a very good starting point. In general it leads to decent render times and sufficiently frequent image updates while you're waiting for the render to complete. But as with any renderer, tuning this setting can indeed lead to speed-ups, depending on
- the final render pixel size
- the complexity of the scene
- the tracing engine used
- the GPU you're running the render on
- how many pixels are irrelevant (because they don't sample anything, as there are no objects hit by the rays)
- likely many more factors
The most important one is the final render pixel size. If you render at very high resolution, having fewer tiles to compute in total can speed up the rendering. At a resolution of 1280 x 720, you won't see much of a difference. At 8000 x 6000 you very likely will. When rendering on our TitanX rigs, we tend to go to 512 x 512, sometimes 1024 x 1024 tiles at such resolutions to cut render time. Less complex scenes also tend to cope better with bigger tiles, but again, there is no general rule here.
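To make the "fewer tiles" point concrete, here is a small sketch that only counts tiles for a given resolution and tile size. The function name `tile_count` is made up for this illustration, and the count says nothing about actual render time, which still depends on the scene and the GPU.

```python
import math


def tile_count(width, height, tile_x, tile_y):
    """Number of tiles needed to cover a width x height image.

    Partial tiles at the right and bottom edges count as full tiles,
    hence the rounding up.
    """
    return math.ceil(width / tile_x) * math.ceil(height / tile_y)


# Compare the tile sizes mentioned above at both resolutions.
for width, height in [(1280, 720), (8000, 6000)]:
    for tile in (256, 512, 1024):
        n = tile_count(width, height, tile, tile)
        print(f"{width} x {height} with {tile} x {tile} tiles: {n} tiles")
```

At 8000 x 6000, going from 256 x 256 to 512 x 512 cuts the number of tiles from 768 to 192, while at 1280 x 720 there are only 15 tiles to begin with.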
The GPU matters mainly in terms of its generation (Fermi, Maxwell, Pascal, etc.), the number of cores, and the amount of memory present. To my knowledge, Maxwell GPUs can benefit from bigger tiles; I have no personal experience with the others.
The tracing engine I only mention because of a "feature" in Windows: TDR Delay. If you Google that, you'll find heated discussions plus a Microsoft support page on how to set it. Basically, when you render, a work package (the tile) is sent to your GPU, and the system waits for the results to come back from the GPU (the resulting sampled image at a certain number of samples). If Windows does not hear back from the GPU within 2 seconds (the default TDR delay), it restarts the graphics driver and kills the rendering while at it. Especially when using Branched Path Tracing, the response time of the GPU increases a lot. That causes renders to fail. Reducing the tile size is one fix for this, setting a higher TDR delay value another.
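As a rough sketch of the second fix: the TDR delay lives in the Windows registry as the `TdrDelay` DWORD under `HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers`. The helper below only builds the corresponding `reg add` command and does not run it. The function name and the value of 10 seconds are assumptions for illustration, not a recommendation; check the Microsoft page before changing anything.

```python
# Registry location of the TDR settings on Windows.
TDR_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def tdr_delay_command(seconds):
    """Build (but do not execute) a `reg add` command that sets TdrDelay.

    `seconds` is the timeout Windows waits for the GPU before resetting
    the driver. The helper itself is hypothetical; only the key, value
    name, and `reg add` switches are standard Windows.
    """
    if not isinstance(seconds, int) or seconds <= 0:
        raise ValueError("seconds must be a positive integer")
    return [
        "reg", "add", TDR_KEY,
        "/v", "TdrDelay",      # value name
        "/t", "REG_DWORD",     # value type
        "/d", str(seconds),    # value data, in seconds
        "/f",                  # overwrite without prompting
    ]


# Example value only: print the command for a 10 second delay.
print(" ".join(tdr_delay_command(10)))
```

Running the printed command needs an elevated (administrator) prompt, and the new value only takes effect after a reboot.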
Auto Tile Size
An add-on called "Auto Tile Size" is bundled with Blender, which makes setting the tile size more convenient. When you enable it, however, you'll notice that it doesn't actually set the render tile size to 256 x 256. Instead, it chooses values close to this, rounded up or down so that the image is split into tiles of equal size. See this example:

The Render Dimensions are set to 1280 x 720 pixels, the tile size is set to 256 x 240. Why? Because that gives the GPU 5 x 3 = 15 tiles of equal size to process. If you keep the tile size at 256 x 256, the last row of tiles will have a height of 208 pixels, as 720 cannot be divided evenly by 256. This is potentially, but not necessarily, inefficient.
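The arithmetic behind this example can be sketched as follows. This is not the add-on's actual code, just one simple way to get the same numbers: for each image dimension, pick the divisor closest to the target tile size. The function names are made up for this illustration.

```python
def closest_even_tile(length, target):
    """Divisor of `length` closest to `target` (smaller one wins on a tie).

    A tile of this size splits the dimension into equally sized tiles.
    """
    divisors = [d for d in range(1, length + 1) if length % d == 0]
    return min(divisors, key=lambda d: (abs(d - target), d))


def leftover(length, tile):
    """Size of the partial tile at the edge, or 0 if the tiles fit evenly."""
    return length % tile


width, height, target = 1280, 720, 256

tile_x = closest_even_tile(width, target)   # 256, since 1280 = 5 * 256
tile_y = closest_even_tile(height, target)  # 240, since 720 = 3 * 240
print(f"tile size: {tile_x} x {tile_y}")
print(f"tiles: {width // tile_x} x {height // tile_y}")

# With a fixed 256 x 256 tile, the last row is only partially filled.
print(f"last row height at 256: {leftover(height, 256)}")
```

For 1280 x 720 this gives 256 x 240 tiles in a 5 x 3 grid, and a leftover row of 208 pixels if the tile height stays at 256.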