8

I'm wondering how large sites like StackOverflow handle their access logs. Writing to disk on every request seems a little uneconomical, but is Google Analytics reliable enough to use as your only source of information?

Stephen Ostermiller
  • 98,758
  • 18
  • 137
  • 361

3 Answers

2

Web server logs contain a lot of information that will never be available to Google Analytics. A few things I can think of:

  • Errors such as 404s.
  • Requests for media files such as pictures (including external websites linking to your images).
  • IP addresses, although an answer to another question pointed out that these can be set as a user variable.
  • Full-length referral URLs. For example, Google sends links from product search, web search, etc. Each search has keywords, but Google Analytics does not display the other variables such as &source=products.

There must be more that I just can't think of right now.
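To make the list above concrete, here is a minimal sketch of pulling 404s and image hot-link referrers out of an access log. It assumes the server writes the common "combined" log format; `LOG_PATTERN`, `summarize` and the sample lines are made up for illustration and are not taken from any particular site's setup.

```python
import re
from collections import Counter

# Assumed layout (combined log format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif")


def summarize(lines):
    """Count 404 paths and the full referrer URLs of image requests."""
    not_found = Counter()
    image_referers = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        path = match["path"]
        if match["status"] == "404":
            not_found[path] += 1
        # Ignore any query string when checking the file extension.
        if path.split("?", 1)[0].lower().endswith(IMAGE_EXTENSIONS):
            image_referers[match["referer"]] += 1
    return not_found, image_referers


# Hypothetical sample lines, using documentation-only IP ranges.
SAMPLE = [
    '203.0.113.7 - - [02/Nov/2010:19:57:00 +0000] '
    '"GET /missing.html HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
    '198.51.100.4 - - [02/Nov/2010:19:58:10 +0000] '
    '"GET /img/logo.png HTTP/1.1" 200 2048 '
    '"http://example.org/page?source=products&q=widgets" "Mozilla/5.0"',
]

not_found, image_referers = summarize(SAMPLE)
```

Note that the referrer is kept whole, query string included, which is exactly the part the last bullet says an analytics dashboard may not show.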

Then there are the error logs, which in my opinion are essential for keeping a website running smoothly. They are not something you would ignore.

Zistoloen
  • 10,036
  • 6
  • 35
  • 59
Evgeny Zislis
  • 1,412
  • 1
  • 15
  • 26
  • That's right, but how do they store their logs? A file or a database, as suggested by Lèse majesté, seems too heavy to me. –  Nov 02 '10 at 19:57
  • Actually, this kind of repetitive data compresses really well when you gzip it. So I imagine you can just store it as compressed text files and remove them after a while: just your basic rotation. – Evgeny Zislis Nov 02 '10 at 22:09
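As a rough check on the compression claim in the comment above, the sketch below gzips a batch of synthetic access-log lines and reports the size ratio. The log lines are invented for illustration, so the exact figure will differ on real traffic; `compression_ratio` is a hypothetical helper, not part of any tool mentioned here.

```python
import gzip


def compression_ratio(text: str) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    raw = text.encode("utf-8")
    return len(gzip.compress(raw)) / len(raw)


# Synthetic, deliberately repetitive log lines (an assumption, not real data).
lines = [
    f'203.0.113.{i % 50} - - [02/Nov/2010:19:{i % 60:02d}:00 +0000] '
    f'"GET /questions/{i} HTTP/1.1" 200 5120 "-" "Mozilla/5.0"\n'
    for i in range(10_000)
]

ratio = compression_ratio("".join(lines))
print(f"compressed size is {ratio:.1%} of the original")
```

In practice the compression step is usually left to whatever rotates the logs rather than done by hand.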
2

On a *nix system you could use syslog-ng to store log messages on a dedicated log server for your load-balanced cluster(s), and then use a log analysis solution like Splunk to keep tabs on things. As for what the StackExchange sites actually run, that may be a good question for StackOverflow Meta.
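From the application side, shipping messages to a dedicated log server can be sketched with Python's standard `logging.handlers.SysLogHandler`, which sends syslog datagrams over UDP by default. This is only an illustration of the idea: the `make_remote_logger` helper and the `webapp:` tag are assumptions, and the receiving side (syslog-ng or anything else that accepts syslog over the network) has to be configured separately.

```python
import logging
import logging.handlers


def make_remote_logger(host: str, port: int = 514) -> logging.Logger:
    """Return a logger that forwards records to a syslog server at host:port."""
    logger = logging.getLogger(f"access.{host}.{port}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the local root logger
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        # A (host, port) tuple means a network socket; UDP is the default.
        handler = logging.handlers.SysLogHandler(address=(host, port))
        handler.setFormatter(logging.Formatter("webapp: %(message)s"))
        logger.addHandler(handler)
    return logger
```

Each web node would then call something like `make_remote_logger("logs.example.internal").info("GET /index.html 200")`, where the hostname is a placeholder for your own log server. UDP delivery is fire-and-forget, so a busy or unreachable server silently drops messages.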

danlefree
  • 12,838
  • 4
  • 42
  • 59
0

I don't really look at these logs and end up deleting them on a monthly basis. I only look at them for troubleshooting. As far as monitoring application use goes, Google Analytics, CrazyEgg and others do a great job.

Before such services existed, these logs were very valuable. Now they are good developer tools, but I don't know of any colleagues or friends who actively archive these logs or parse them for data.
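The monthly clean-up described in this answer could be automated along these lines. This is a sketch under assumptions: the `*.log*` file pattern, the 30-day default and the `prune_old_logs` name are all illustrative, and a tool such as logrotate would normally handle this on a real server.

```python
import time
from pathlib import Path


def prune_old_logs(directory, max_age_days=30, now=None):
    """Delete log files older than max_age_days; return the removed names."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    removed = []
    for path in Path(directory).glob("*.log*"):  # matches access.log, access.log.1.gz, ...
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Run it from a scheduled job against the log directory; passing `now` explicitly makes the cutoff easy to test.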

Frank
  • 1,451
  • 10
  • 23