5

Does anyone know of a corpus of US addresses in which the parts of each address have been tagged?

Something like

<streetNumber>123</streetNumber> 
<streetDirection>E.</streetDirection>
<streetName>Main</streetName> 
<streetType>Rd.</streetType>
,
<place>Chicago</place>
<state>IL</state>
<postalCode>60647</postalCode> 

I'm not looking for a corpus of addresses like OpenAddresses. I'm also not looking for a data standard for tagging addresses like the United States Thoroughfare, Landmark, and Postal Address Data Standard.

I'm looking for a corpus of addresses, where the parts of the addresses have been tagged. Something like this.

fgregg

4 Answers

3

http://openaddresses.io/

It's an initiative to collect open (CC0) addresses across the United States and globally. You could transform these however you like.
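As a rough sketch of what "transform these" could look like: the snippet below reads OpenAddresses-style CSV rows and wraps each field in a tag. The column names (`NUMBER`, `STREET`, `CITY`, `REGION`, `POSTCODE`) are an assumption about the CSV layout, the sample row is made up, and the `street` tag is a hypothetical stand-in, because this kind of data does not split the street into direction, name, and type the way the question's example does.

```python
import csv
import io
from xml.sax.saxutils import escape

# Assumed OpenAddresses-style column names mapped to tag names.
# "street" is a hypothetical tag: the source column is not split
# into direction / name / type.
COLUMN_TO_TAG = {
    "NUMBER": "streetNumber",
    "STREET": "street",
    "CITY": "place",
    "REGION": "state",
    "POSTCODE": "postalCode",
}


def tag_row(row):
    """Wrap each non-empty field of one CSV row in its tag."""
    parts = []
    for column, tag in COLUMN_TO_TAG.items():
        value = (row.get(column) or "").strip()
        if value:
            parts.append(f"<{tag}>{escape(value)}</{tag}>")
    return " ".join(parts)


def tag_csv(text):
    """Tag every row of a CSV document given as a string."""
    return [tag_row(row) for row in csv.DictReader(io.StringIO(text))]


# Made-up sample row, for illustration only.
SAMPLE = """LON,LAT,NUMBER,STREET,UNIT,CITY,DISTRICT,REGION,POSTCODE
-87.7,41.9,123,E Main Rd,,Chicago,,IL,60647
"""

for line in tag_csv(SAMPLE):
    print(line)
```

This only gets you field-level tags. Splitting `E Main Rd` into direction, name, and type would still need a separate parsing step.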

sabas
2

Microformats do that. Offhand, look at Web Data Commons for a large repository:
http://webdatacommons.org/
More real-world examples from the microformats wiki:
http://microformats.org/wiki/dataset-examples#Real-World_Examples
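To give an idea of how tagged parts can be pulled out of microformat markup, here is a minimal sketch using only the Python standard library. It assumes flat hCard-style `adr` markup, where each part is a single element with a class such as `street-address` or `locality`. The `AdrParser` class, the `extract_adr` helper, and the sample HTML are all made up for illustration; nested markup and microformats2 prefixes are not handled.

```python
from html.parser import HTMLParser

# Classic hCard "adr" property class names.
ADR_PROPERTIES = {
    "street-address",
    "locality",
    "region",
    "postal-code",
    "country-name",
}


class AdrParser(HTMLParser):
    """Collect the text of elements whose class is an adr property."""

    def __init__(self):
        super().__init__()
        self.current = None  # adr property of the element being read
        self.parts = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        matches = [c for c in classes if c in ADR_PROPERTIES]
        self.current = matches[0] if matches else None

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        if self.current and data.strip():
            previous = self.parts.get(self.current, "")
            self.parts[self.current] = previous + data.strip()


def extract_adr(html):
    """Return a dict of adr property name -> text for flat markup."""
    parser = AdrParser()
    parser.feed(html)
    parser.close()
    return parser.parts


# Hand-written sample markup, for illustration only.
SAMPLE_HTML = (
    '<div class="adr">'
    '<span class="street-address">123 E. Main Rd.</span>, '
    '<span class="locality">Chicago</span> '
    '<span class="region">IL</span> '
    '<span class="postal-code">60647</span>'
    "</div>"
)

print(extract_adr(SAMPLE_HTML))
```

Note that `adr` markup tags the street line as one unit, so it is coarser than the number / direction / name / type breakdown in the question.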

albert
2

If you need to process no more than a few thousand addresses at a time, you can use the Google Geocoding API for free. You can even use the API without a key for small sets of addresses.

The URL looks like this:

http://maps.googleapis.com/maps/api/geocode/json?sensor=false&address=Museum für Gestaltung Zurich Switzerland

It returns JSON with structured, normalized data (follow the link above to see the response).
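A minimal sketch of how the request URL could be built and how the `address_components` in a response could be mapped to tags. It makes no network call. The `build_url` and `tag_components` helpers and the tag names are hypothetical, the sample response is hand-written rather than real API output, and the optional `key` parameter is there on the assumption that the service may require one.

```python
import urllib.parse

# Endpoint from the URL above, using https.
BASE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

# Geocoding component types mapped to hypothetical tag names.
TYPE_TO_TAG = {
    "street_number": "streetNumber",
    "route": "street",
    "locality": "place",
    "administrative_area_level_1": "state",
    "postal_code": "postalCode",
}


def build_url(address, key=None):
    """Return a percent-encoded geocoding request URL."""
    params = {"address": address}
    if key:
        params["key"] = key
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def tag_components(response):
    """Map the first result's address components to tag -> value."""
    results = response.get("results") or []
    if not results:
        return {}
    tagged = {}
    for component in results[0].get("address_components", []):
        for type_name in component.get("types", []):
            tag = TYPE_TO_TAG.get(type_name)
            if tag and tag not in tagged:
                tagged[tag] = component["short_name"]
    return tagged


# Hand-written sample in the shape of a geocoding response.
SAMPLE_RESPONSE = {
    "status": "OK",
    "results": [{
        "address_components": [
            {"long_name": "123", "short_name": "123",
             "types": ["street_number"]},
            {"long_name": "East Main Road", "short_name": "E Main Rd",
             "types": ["route"]},
            {"long_name": "Chicago", "short_name": "Chicago",
             "types": ["locality", "political"]},
            {"long_name": "Illinois", "short_name": "IL",
             "types": ["administrative_area_level_1", "political"]},
            {"long_name": "60647", "short_name": "60647",
             "types": ["postal_code"]},
        ],
    }],
}

print(build_url("Museum für Gestaltung Zurich Switzerland"))
print(tag_components(SAMPLE_RESPONSE))
```

The street comes back as a single `route` component, so direction and street type are not tagged separately.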


Downside:

The Geocoding API may only be used in conjunction with a Google map; geocoding results without displaying them on a map is prohibited. For complete details on allowed usage, consult the Maps API Terms of Service License Restrictions.

philshem
2

You can use Texas A&M's Address Parsing and Normalization service to turn any set of addresses into a tagged set of addresses.

Joe Germuska