| 1 | +# Standardize Addresses with Geoapify API |
| 2 | + |
| 3 | +This project demonstrates how to batch-geocode addresses using the [Geoapify Geocoding API](https://www.geoapify.com/geocoding-api/) and produce standardized outputs in a customizable format. |
| 4 | + |
| 5 | +The script: |
| 6 | +- Reads addresses from a text file |
| 7 | +- Geocodes each address using the Geoapify Forward Geocoding API |
| 8 | +- Saves full geocoding results to an NDJSON file |
| 9 | +- Writes a standardized address list to a CSV file using a user-defined template |
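The NDJSON output mentioned above stores one JSON object per line, which makes the file easy to append to and to stream. A minimal sketch of that idea follows; the helper name `write_ndjson` and the sample records are hypothetical and are not taken from the script itself.

```python
import io
import json


def write_ndjson(stream, records):
    """Write one compact JSON object per line (NDJSON)."""
    for record in records:
        # ensure_ascii=False keeps characters such as "É" readable in the file
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")


# Hypothetical, abbreviated records; real geocoding results carry many more fields.
sample_results = [
    {"city": "Berlin", "country_code": "de"},
    {},  # an address that returned no match
]

buffer = io.StringIO()
write_ndjson(buffer, sample_results)
ndjson_lines = buffer.getvalue().splitlines()
```

In a real run the `stream` would be a file opened for writing rather than an in-memory buffer.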
| 10 | + |
| 11 | + |
| 12 | +## Requirements |
| 13 | + |
| 14 | +- Python 3.12 or higher (the script uses `itertools.batched()`, which was added in Python 3.12) |
| 15 | +- `pip` (Python package manager) |
| 16 | + |
| 17 | +## Setup Instructions |
| 18 | + |
| 19 | +### 1. Clone the Repository |
| 20 | + |
| 21 | +```bash |
| 22 | +git clone https://github.com/geoapify/maps-api-code-samples.git |
| 23 | +cd maps-api-code-samples/python/ |
| 24 | +``` |
| 25 | + |
| 26 | +### 2. (Optional) Create a Virtual Environment |
| 27 | + |
| 28 | +```bash |
| 29 | +python -m venv env |
| 30 | +source env/bin/activate # On Windows: env\Scripts\activate |
| 31 | +``` |
| 32 | + |
| 33 | +### 3. Install Dependencies |
| 34 | + |
| 35 | +```bash |
| 36 | +pip install requests |
| 37 | +``` |
| 38 | + |
| 39 | + |
| 40 | +## Usage |
| 41 | + |
| 42 | +```bash |
| 43 | +cd address-standardization |
| 44 | + |
| 45 | +python address_standardization.py \ |
| 46 | + --api_key YOUR_API_KEY \ |
| 47 | + --input input_example.txt \ |
| 48 | + --output geocoded.ndjson \ |
| 49 | + --standardized_output standardized_addresses.csv \ |
| 50 | + --format "{street} {housenumber}, {city}, {state_code}, {postcode}, {country}" |
| 51 | +``` |
| 52 | + |
| 53 | + |
| 54 | +## Command-Line Arguments |
| 55 | + |
| 56 | +| Argument | Required | Description | |
| 57 | +|--------------------------|----------|-----------------------------------------------------------------------------| |
| 58 | +| `--api_key` | Yes | Your [Geoapify API key](https://myprojects.geoapify.com). | |
| 59 | +| `--input` | Yes | Path to the input file (one address per line). | |
| 60 | +| `--output` | Yes | Path to the output NDJSON file with full geocoding results. | |
| 61 | +| `--standardized_output` | Yes | Path to the CSV file for formatted addresses. | |
| 62 | +| `--country_code` | No | Restrict results to a specific country (for example, `us`, `de`, or `fr`). | |
| 63 | +| `--format` | Yes | Template for standardized output using placeholders (see below). | |
| 64 | + |
| 65 | + |
| 66 | +## Address Format Placeholders |
| 67 | + |
| 68 | +The `--format` option defines how each standardized address is written. You can combine any of the following placeholders: |
| 69 | + |
| 70 | +- `{name}` – Place name |
| 71 | +- `{housenumber}` – House/building number |
| 72 | +- `{street}` – Street name |
| 73 | +- `{suburb}` – Suburb or neighborhood |
| 74 | +- `{district}` – District or borough |
| 75 | +- `{postcode}` – Postal code |
| 76 | +- `{city}` – City or town |
| 77 | +- `{county}` – County or administrative division |
| 78 | +- `{county_code}` – County code (if available) |
| 79 | +- `{state}` – State or province |
| 80 | +- `{state_code}` – State code (e.g., `"CA"` for California) |
| 81 | +- `{country}` – Country name |
| 82 | +- `{country_code}` – Country code (ISO 3166-1 alpha-2) |
| 83 | + |
| 84 | +Missing fields will be replaced with an empty string. |
| 85 | + |
| 86 | +Here are some ready-to-use format examples: |
| 87 | + |
| 88 | +```bash |
| 89 | +--format "{street} {housenumber}, {city}, {state_code}, {postcode}, {country}" |
| 90 | +``` |
| 91 | +**Standardized Output:** |
| 92 | +`Main Street 12, San Francisco, CA, 94105, United States` |
| 93 | + |
| 94 | +--- |
| 95 | + |
| 96 | +```bash |
| 97 | +--format "{name}, {street} {housenumber}, {postcode} {city}, {country_code}" |
| 98 | +``` |
| 99 | +**Standardized Output:** |
| 100 | +`Googleplex, Amphitheatre Parkway 1600, 94043 Mountain View, US` |
| 101 | + |
| 102 | +--- |
| 103 | + |
| 104 | +```bash |
| 105 | +--format "{country}, {postcode}-{city}, {street} {housenumber}" |
| 106 | +``` |
| 107 | +**Standardized Output:** |
| 108 | +`Germany, 10117-Berlin, Unter den Linden 77` |
| 109 | + |
| 110 | +--- |
| 111 | + |
| 112 | +```bash |
| 113 | +--format "{housenumber} {street}, {suburb}, {city}, {state}, {country}" |
| 114 | +``` |
| 115 | +**Standardized Output:** |
| 116 | +`221B Baker Street, Marylebone, London, England, United Kingdom` |
| 117 | + |
| 118 | +--- |
| 119 | + |
| 120 | +```bash |
| 121 | +--format "{street}, {city}, {country_code}" |
| 122 | +``` |
| 123 | +**Standardized Output:** |
| 124 | +`Champs-Élysées, Paris, FR` |
| 125 | + |
| 126 | +## Example |
| 127 | + |
| 128 | +**Input (`input.txt`):** |
| 129 | +``` |
| 130 | +1600 Amphitheatre Parkway, Mountain View, CA 94043, USA |
| 131 | +Unknown Place |
| 132 | +123 Example St, Springfield |
| 133 | +Platz der Republik, Berlin, Germany |
| 134 | +``` |
| 135 | + |
| 136 | +**Run:** |
| 137 | +```bash |
| 138 | +python address_standardization.py \ |
| 139 | + --api_key YOUR_API_KEY \ |
| 140 | + --input input.txt \ |
| 141 | + --output geocoded.ndjson \ |
| 142 | + --standardized_output standardized_addresses.csv \ |
| 143 | + --format "{street} {housenumber}, {city}, {state_code}, {postcode}, {country}" |
| 144 | +``` |
| 145 | + |
| 146 | +**Output (`standardized_addresses.csv`):** |
| 147 | +``` |
| 148 | +Original Address,Standardized Address |
| 149 | +"1600 Amphitheatre Parkway, Mountain View, CA 94043, USA","Amphitheatre Parkway 1600, Mountain View, CA, 94043, United States" |
| 150 | +"Unknown Place","" |
| 151 | +"123 Example St, Springfield","Example St 123, Springfield, IL, 62704, United States" |
| 152 | +"Platz der Republik, Berlin, Germany","Platz der Republik, Berlin, BE, 10557, Germany" |
| 153 | +``` |
| 154 | + |
| 155 | +## How the Script Works |
| 156 | + |
| 157 | +The script performs **address standardization in two main steps**: |
| 158 | + |
| 159 | +### 1. **Geocode Addresses with Rate Limiting** |
| 160 | + |
| 161 | +The function `geocode_addresses()` sends requests to the **Geoapify Geocoding API** in controlled batches. |
| 162 | +To comply with the Free plan’s limit of **5 requests per second (RPS)**: |
| 163 | +- The addresses are split into chunks of 5. |
| 164 | +- Each chunk is processed in parallel using threads. |
| 165 | +- After every batch, the script pauses for 1 second before sending the next one. |
| 166 | + |
| 167 | +```python |
| 168 | +def geocode_addresses(api_key, addresses, country_code): |
| 169 | +    # Split addresses into batches |
| 170 | +    addresses = list(it.batched(addresses, REQUESTS_PER_SECOND)) |
| 171 | + |
| 172 | +    # Request results asynchronously for each address batch |
| 173 | +    tasks = [] |
| 174 | +    with ThreadPoolExecutor(max_workers=10) as executor: |
| 175 | +        for batch in addresses: |
| 176 | +            logger.info(batch) |
| 177 | +            tasks.extend([executor.submit(geocode_address, address, api_key, country_code) for address in batch]) |
| 178 | +            sleep(1) |
| 179 | +        # Wait for results |
| 180 | +        wait(tasks, return_when=ALL_COMPLETED) |
| 181 | + |
| 182 | +    return [task.result() for task in tasks] |
| 183 | +``` |
| 184 | + |
| 185 | +1. **Batching Input with `itertools.batched()`** |
| 186 | + The function uses [`itertools.batched()`](https://docs.python.org/3/library/itertools.html#itertools.batched) to split the input list into chunks of 5 addresses. |
| 187 | + This helps control request throughput so that the script doesn't exceed the Geoapify Free plan's **5 requests per second (RPS)** limit. |
| 188 | + |
| 189 | +2. **Asynchronous Execution with `concurrent.futures.ThreadPoolExecutor`** |
| 190 | + Each address within a batch is submitted to a thread pool using [`concurrent.futures.ThreadPoolExecutor`](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor). |
| 191 | + This allows geocoding multiple addresses **in parallel**, improving performance and responsiveness. |
| 192 | + |
| 193 | +3. **Rate Limiting via `sleep(1)`** |
| 194 | + After each batch, the function waits 1 second (`sleep(1)`) to prevent exceeding the allowed request rate. |
| 195 | + |
| 196 | +4. **Waiting for All Results** |
| 197 | + The function uses [`concurrent.futures.wait()`](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.wait) to block until all geocoding tasks are complete. |
| 198 | + |
| 199 | +5. **Returning Results** |
| 200 | + It collects the final results from each task using `.result()` and returns them as a list. |
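The per-address helper `geocode_address()` is called above but not shown. As a rough sketch of what the request for a single address might look like, the snippet below only builds the request URL. The helper name `build_geocode_url`, the endpoint constant, and the chosen query parameters (`text`, `format`, `limit`, `filter`, `apiKey`) are assumptions based on the public Geocoding API documentation, not code from this script.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed endpoint for forward geocoding; check the API documentation before relying on it.
GEOCODE_URL = "https://api.geoapify.com/v1/geocode/search"


def build_geocode_url(address, api_key, country_code=None):
    """Build the request URL for one address (hypothetical helper)."""
    params = {"text": address, "format": "json", "limit": 1, "apiKey": api_key}
    if country_code:
        # Restrict matches to one country, for example "de"
        params["filter"] = f"countrycode:{country_code}"
    return f"{GEOCODE_URL}?{urlencode(params)}"


url = build_geocode_url("Platz der Republik, Berlin", "YOUR_API_KEY", "de")
query = parse_qs(urlsplit(url).query)
```

The resulting URL could then be fetched with `requests.get(url)`; keeping URL construction separate makes it possible to check the parameters without network access.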
| 201 | + |
| 202 | +### 2. **Generate Standardized Addresses** |
| 203 | + |
| 204 | +Once the geocoding results are retrieved: |
| 205 | +- The function `generate_standard_addresses()` formats each address using a **user-defined template** (via the `--format` argument). |
| 206 | +- Placeholders like `{street}`, `{postcode}`, `{country}` are filled using data from the geocoding response. |
| 207 | +- If a result is missing or empty, the standardized address will be an empty string. |
| 208 | +- The output is written to a CSV file, pairing the original address with the formatted version. |
| 209 | + |
| 210 | +```python |
| 211 | +def generate_standard_addresses(output, addresses, address_format, geocode_results): |
| 212 | +    # Write csv with standardized addresses |
| 213 | +    with open(output, 'w', newline='') as f: |
| 214 | +        csv_writer = csv.writer(f) |
| 215 | +        csv_writer.writerow(["Original Address", "Standardized Address"]) |
| 216 | +        for address, result in zip(addresses, geocode_results): |
| 217 | +            # For empty geocoding result set empty string |
| 218 | +            if not result or result.get('error'): |
| 219 | +                standardized_address = '' |
| 220 | +            else: |
| 221 | +                # Fill template with values, fallback missing data to empty string |
| 222 | +                standardized_address = address_format.format_map(GeocodeResult(**result)) |
| 223 | +            csv_writer.writerow([address, standardized_address]) |
| 224 | +``` |
| 225 | + |
| 226 | +1. **Opens a CSV file** using [`csv.writer`](https://docs.python.org/3/library/csv.html#csv.writer) and writes a header row: |
| 227 | + `"Original Address", "Standardized Address"` |
| 228 | + |
| 229 | +2. **Iterates** through original addresses and geocoding results using `zip()`. |
| 230 | + |
| 231 | +3. **Handles invalid results** (missing or containing `"error"`) by outputting an empty string. |
| 232 | + |
| 233 | +4. **Formats valid results** using [`str.format_map()`](https://docs.python.org/3/library/stdtypes.html#str.format_map) and a `GeocodeResult` dict subclass that safely substitutes missing values with empty strings. |
| 234 | + |
| 235 | +5. **Writes each pair** to the output CSV. |
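The `GeocodeResult` class itself is not shown above. One common way to get the "missing value becomes an empty string" behavior is a `dict` subclass that defines `__missing__`; the sketch below assumes that approach and uses a hypothetical, abbreviated result, so the script's actual class may differ.

```python
class GeocodeResult(dict):
    """Dict that yields an empty string for any key it does not contain."""

    def __missing__(self, key):
        return ""


# Hypothetical, abbreviated geocoding result without a house number.
result = {"street": "Platz der Republik", "city": "Berlin", "country": "Germany"}
template = "{street} {housenumber}, {city}, {country}"

formatted = template.format_map(GeocodeResult(**result))
```

With this approach only the placeholder is blanked: the literal text around it stays, so `formatted` here contains a space before the first comma where the house number would have been.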
| 236 | + |
| 237 | +## Learn More |
| 238 | + |
| 239 | +- [Geoapify Geocoding API Documentation](https://apidocs.geoapify.com/docs/geocoding/) |
| 240 | + Details about available parameters, usage limits, and response formats. |
| 241 | + |
| 242 | +- [Geocoding API Playground](https://apidocs.geoapify.com/playground/geocoding/) |
| 243 | + Try out forward and reverse geocoding interactively. |
| 244 | + |
| 245 | +- [Address Standardization Overview](https://www.geoapify.com/solutions/address-lookup/) |
| 246 | + Learn what address standardization is and how to implement it effectively. |
| 247 | + |
| 248 | +- [Get your free Geoapify API key](https://myprojects.geoapify.com/) |
| 249 | + Sign up and start using the API with free daily limits. |
| 250 | + |
| 251 | +## License |
| 252 | + |
| 253 | +MIT License. See `LICENSE` file for details. |