
I'm trying to read a log file from a GitHub URL, add some geographic info using the IP address as a lookup key, and then write some of the log info plus the geographic info to a file. I've got the reading and the writing working, but I'm not sure what library to use for looking up coordinates and such from an IP address, or how to go about that part. I found the regex module, and by the time I started to understand it, I found out it's deprecated. Here's what I've got; any help would be great.

import urllib2 
apacheLog = 'https://raw.githubusercontent.com/myAccessLog.log'

data = urllib2.urlopen(apacheLog)
for line in data:
    with open('C:\LogCopy.txt','a') as f:
        f.write(line)
Bhargav Rao
RagePwn
  • So, you are now trying to parse 'C:\LogCopy.txt'? Show what you have *tried*. – Dr. Jan-Philip Gehrcke Feb 07 '15 at 19:27
  • I'm writing to C:\LogCopy.txt from the file on github. The manipulation will happen before I write to LogCopy. I don't know what to use to break the lines up, besides some messy slicing, maybe. It looks like the file is in Common Log Format, and I think I can use %x to pull pieces out, but I don't know if that is just for use with regex or what. I'm just not sure where to start. I'm not asking for the answer, just a push in the right direction. – RagePwn Feb 07 '15 at 19:39
  • Without knowing what output you expect, it is pretty hard to give any reasonable answer. There is a re module you can use. – Padraic Cunningham Feb 07 '15 at 19:47

2 Answers

  1. The re module isn't deprecated; it is part of the standard library. Edit: see the documentation for the Python 2.7 re module.
  2. Your for loop opens and closes the file on every iteration. That's probably not a big deal, but for large files it may be faster to open the file once and write everything inside that one block. Just swap the locations of the for and with lines.

So

data = urllib2.urlopen(apacheLog)
for line in data:
    with open('C:\LogCopy.txt','a') as f: # probably need a double backslash
        f.write(line)

becomes

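data = urllib2.urlopen(apacheLog)
# Open the output file once; the raw string keeps the backslash literal.
with open(r'C:\LogCopy.txt', 'a') as f:
    # urlopen() returns a file-like object, so read() it before splitting.
    for line in data.read().splitlines():
        # splitlines() strips the line endings, so add one back.
        f.write(line + '\n')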
  3. For the geolocation part, see the similar question about Python geolocation libraries.
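To go with point 1, here is a minimal sketch of pulling fields out of a line with re, assuming the log really is in Common Log Format as the comments suggest. The pattern, the parse_line helper, and the sample line are made up for illustration and are not taken from the asker's file, so the pattern may need adjusting for the real log.

```python
import re

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict of CLF fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None


# Hypothetical sample line, only for illustration.
sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
fields = parse_line(sample)
print('%s %s %s' % (fields['host'], fields['status'], fields['request']))
```

Named groups keep the rest of the script readable: fields['host'] is the value you would hand to whatever geolocation lookup you end up using.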

Best of luck!

Edit: added the splitlines() call after reading Piotr Kempa's answer

Andy Kubiak

Well, the first part is simple. urlopen() returns a file-like object rather than a string, so read it first: use for line in data.read().split('\n'), assuming the lines end with a normal newline (they should).

Then you use the re module (import re), which is still part of the standard library in Python 2.7. You can extract the IP address with something like re.search(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", line); look up the re.search() function for details on how to use it.
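As a rough sketch of that re.search() call in use (the extract_ip helper and the sample line are hypothetical, just for illustration):

```python
import re

# Same pattern as above: four groups of 1-3 digits separated by dots.
IP_PATTERN = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")


def extract_ip(line):
    """Return the first IPv4-looking substring in line, or None."""
    match = IP_PATTERN.search(line)
    return match.group(0) if match else None


# Hypothetical log line, for illustration only.
line = '203.0.113.7 - - [07/Feb/2015:19:27:00 +0000] "GET / HTTP/1.1" 200 512'
ip = extract_ip(line)
print(ip)
```

Note that the pattern only checks the shape of the address, not that each number is in the 0-255 range, so something like 999.1.1.1 would also match. For a server-generated access log that is usually good enough.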

As for locating the IP geographically, I think that has already been asked. Try this question: What python libraries can tell me approximate location and time zone given an IP address?
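Whichever library you pick from that question, the surrounding code will probably have the same shape: pull the IP out of the line, look it up, and append the result. Here is a sketch of that shape only. The fake_lookup function and its table are invented stand-ins so the example runs without any third-party package; in practice you would swap in a function that wraps the real library.

```python
import re

IP_PATTERN = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

# Made-up lookup table standing in for a real geolocation library.
FAKE_GEO = {
    '203.0.113.7': ('Exampleland', 12.5, -45.25),
}


def fake_lookup(ip):
    """Hypothetical lookup: return (country, lat, lon) or None if unknown."""
    return FAKE_GEO.get(ip)


def enrich(line, lookup):
    """Append tab-separated geo fields for the first IP found in line."""
    line = line.rstrip('\r\n')
    match = IP_PATTERN.search(line)
    geo = lookup(match.group(0)) if match else None
    if geo is None:
        return line + '\tunknown\t\t'
    country, lat, lon = geo
    return '%s\t%s\t%s\t%s' % (line, country, lat, lon)


# Hypothetical log lines, for illustration only.
lines = [
    '203.0.113.7 - - [07/Feb/2015:19:27:00 +0000] "GET / HTTP/1.1" 200 512\n',
    '198.51.100.9 - - [07/Feb/2015:19:39:00 +0000] "GET /a HTTP/1.1" 404 0\n',
]
enriched = [enrich(line, fake_lookup) for line in lines]
```

Inside your with open(...) block you would then write enrich(line, your_lookup) + '\n' instead of the raw line. Passing the lookup in as a function keeps the parsing code independent of the library choice.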

Piotr Kempa
  • Oops, we posted two similar answers :) The part about moving the `open()` outside of the loop is a great suggestion in the other answer, you should follow it too! – Piotr Kempa Feb 07 '15 at 19:52