Adjust first post
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful

This commit is contained in:
moanos [he/him] 2024-10-04 14:14:41 +02:00
parent 2ee702a151
commit 22156b404e

@ -9,133 +9,145 @@ tags: ['django', 'geocoding', 'nominatim', 'OpenStreetMap', 'osm', 'traefik', 'm
# Introduction
In the [previous post]()
In the [previous post](geocoding-with-django/) I outlined how to set up a Nominatim server that allows us to find a geolocation for any address on the planet. Now let's use our newfound power in Django. Again, all code snippets are [CC0](https://creativecommons.org/public-domain/cc0/) so make free use of them. But I'd be very happy if you tell me if you use them for something cool!
# Nominatim
## Prerequisites
Nominatim is software that uses OpenStreetMap data for geocoding. It can also do the reverse and find an address for any location on the planet. It is used for the geocoding on [OpenStreetMap](https://openstreetmap.org), so it's quite production-ready. We could use the public API (while obeying the [usage policy](https://operations.osmfoundation.org/policies/nominatim/)) but it's nicer to have our own instance, so we don't strain the resources of a donation-funded organization and so we improve user privacy.
* You have a working geocoding server or use a public one
* You have a working django app
Nominatim works by importing geodata from a [PBF](https://wiki.openstreetmap.org/wiki/PBF_Format) file into a Postgres database. This database will later be queried to provide location data. The process is described below.
If you want to do geocoding in a different environment, you will still be able to use a lot of the following examples. Just skip the Django specifics and configure the `GEOCODING_API_URL` according to your needs.
## DNS records
# Using the Geocoding API
So let's start by setting the DNS records so that the domain `geocoding.example.org` points to your server. Adjust as needed.
| Value | Type | Target |
| --- | --- | --- |
| geocoding.example.org | CNAME | server1.example.org|
First of all, let's define the geocoding API URL in our settings. This enables us to switch easily if a service is not available. Add the following to your `settings.py`:
## Docker-compose Configuration
We will use Docker Compose to run the official [Nominatim Docker image](https://hub.docker.com/r/mediagis/nominatim).
It bundles Nominatim together with its Postgres database. I usually prefer to have a central database for multiple services (it allows easier backups, for example) but for Nominatim a separate database is good for two reasons:
* import process (described later) will not slow the database for other services
* it's easier to nuke everything if things go wrong
The following environment variables will be used to configure the container
* `PBF_URL`: The URL from which to download the PBF file that contains the geodata we will import. These files can be obtained from [Geofabrik](https://download.geofabrik.de/). It is highly recommended to first download the file to a local server and then set this URL to that server, so that Geofabrik's resources are not affected if something goes wrong. Feel free to use the pre-set URL for Germany, while it works, if you want to experiment.
* `REPLICATION_URL`: Where to get updates from. For example, Geofabrik's updates for the Europe extract are available at `https://download.geofabrik.de/europe-updates/`. Other places at Geofabrik follow the pattern `https://download.geofabrik.de/$CONTINENT/$COUNTRY-updates/`.
* `POSTGRES_*`: Postgres tuning parameters. The current settings allow imports on a resource-constrained system. See the [postgres tuning docs](https://github.com/mediagis/nominatim-docker/tree/master/4.4#postgresql-tuning) for more info.
* `NOMINATIM_PASSWORD`: Database password.
* `IMPORT_STYLE`: See below
**Import Styles**
The import style determines how much "resolution" the geocoding has. The following options are available:
* `admin`: Only import administrative boundaries and places.
* `street`: Like the admin style but also adds streets.
* `address`: Import all data necessary to compute addresses down to house number level.
* `full`: Default style that also includes points of interest.
* `extratags`: Like the full style but also adds most of the OSM tags into the extratags column.
The import style has a huge impact on how long the import will take and how much space it will require. Be aware that the import times below were measured on a machine with 32 GB RAM, 4 CPUs and SSDs; these are not fixed numbers. My import of `admin` took 12 hours.
| Style | Import time | DB size | DB size after drop |
| --- | --- | --- | --- |
| admin | 4h | 215 GB | 20 GB |
| street | 22h | 440 GB | 185 GB |
| address | 36h | 545 GB | 260 GB |
Explaining *after drop* (from the [docs](https://nominatim.org/release-docs/3.3/admin/Import-and-Update/))
> About half of the data in Nominatim's database is not really used for serving the API. It is only there to allow the data to be updated from the latest changes from OSM. For many uses these dynamic updates are not really required. If you don't plan to apply updates, the dynamic part of the database can be safely dropped using the following command: `./utils/setup.php --drop`
I have not done this, so I don't have any experience with it. But it's probably a good idea if you don't need up-to-date data.
## Reverse Proxy
As with most of my projects, this runs on a server where the [mash-playbook](https://github.com/mother-of-all-self-hosting/mash-playbook) has deployed [Traefik](https://doc.traefik.io/traefik/) as the *application proxy*. I'll therefore use Traefik labels to configure the reverse proxy, but the same could be achieved with Caddy or Nginx.
## Complete configuration
```
services:
  nominatim:
    environment:
      - PBF_URL=https://cdn.hyteck.de/osm/germany-latest.osm.pbf
      - REPLICATION_URL=https://download.geofabrik.de/europe/germany-updates/
      - POSTGRES_SHARED_BUFFERS=1GB
      - POSTGRES_MAINTENANCE_WORK_MEM=1GB
      - POSTGRES_AUTOVACUUM_WORK_MEM=500MB
      - POSTGRES_EFFECTIVE_CACHE_SIZE=1GB
      - IMPORT_STYLE=admin
      - NOMINATIM_PASSWORD=VERYSECRET
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=traefik"
      - "traefik.http.routers.nominatim.rule=Host(`geocoding.example.org`)"
      - "traefik.http.routers.nominatim.service=nominatim-service"
      - "traefik.http.routers.nominatim.entrypoints=web-secure"
      - "traefik.http.routers.nominatim.tls=true"
      - "traefik.http.routers.nominatim.tls.certResolver=default"
      - "traefik.http.services.nominatim-service.loadbalancer.server.port=8080"
    container_name: nominatim
    image: mediagis/nominatim:4.4
    restart: always
    networks:
      - traefik
    volumes:
      - nominatim-data:/var/lib/postgresql/14/main
      - nominatim-flatnode:/nominatim/flatnode
    shm_size: 1gb
volumes:
  nominatim-flatnode:
  nominatim-data:
networks:
  traefik:
    name: "traefik"
    external: true
```python
# appname/settings.py
""" GEOCODING """
GEOCODING_API_URL = config.get("geocoding", "api_url", fallback="https://nominatim.hyteck.de/search") # Adjust if needed
```
## Importing
We can then add a class that interacts with the API.
```python
import logging
Now we are ready to go! Before you type `docker-compose up -d`, let me explain what it will do:
import requests
import json
from APPNAME import __version__ as app_version
from APPNAME import settings
1. Start the database
2. Download the PBF file from the given URL
3. Import the PBF file into the database. This is where you are most likely to run into errors because of resource constraints
4. Start the Nominatim server
If you are ready, let's go: `docker-compose up -d`. Monitor what Nominatim is doing with `docker logs -f nominatim` and make a cup of tea. This will take a while (probably several hours).
class GeoAPI:
    api_url = settings.GEOCODING_API_URL
    # Set User-Agent headers as required by most usage policies (and it's the nice thing to do)
    headers = {
        'User-Agent': f"APPNAME {app_version}",
        'From': 'info@example.org'
    }
## Testing
    def __init__(self, debug=False):
        self.requests = requests  # ignore why we do this for now
You can test your server by visiting the domain. Try `/?q=CITYNAME` to see an actual search result.
    def get_coordinates_from_query(self, location_string):
        result = self.requests.get(self.api_url, {"q": location_string, "format": "jsonv2"}, headers=self.headers).json()[0]
        return result["lat"], result["lon"]
Example: `https://geocoding.example.org/?q=tuebingen`
    def _get_raw_response(self, location_string):
        result = self.requests.get(self.api_url, {"q": location_string, "format": "jsonv2"}, headers=self.headers)
        return result.content
# Result
    def get_geojson_for_query(self, location_string):
        try:
            result = self.requests.get(self.api_url,
                                       {"q": location_string,
                                        "format": "jsonv2"},
                                       headers=self.headers).json()
        except Exception as e:
            logging.warning(f"Exception {e} when querying Nominatim")
            return None
        if len(result) == 0:
            logging.warning(f"Couldn't find a result for {location_string} when querying Nominatim")
            return None
        return result
```
You should now have a running Nominatim instance that you can use for geocoding 🎉. Initially I wanted to show in the same post how you'd use this server to power area search in Django, but that will be in part 2. Feel free to ping me with questions, preferably at [@moanos@gay-pirate-assassins.de](https://gay-pirate-assassins.de/@moanos)
The wrapper is a synchronous interface to our geocoding server and will wait until the server returns a response or times out. This impacts the user experience, as a page will take longer to load. But it's much easier to code, so here we are. If anyone wants to write an async interface for this, I won't stop them!
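One thing worth knowing about the synchronous approach: `requests` does not apply a timeout unless you pass one (`requests.get(..., timeout=5)`), so a hanging server can block the page indefinitely. Below is a minimal sketch of a lookup with an explicit timeout. To keep it free of dependencies it uses `urllib` from the standard library instead of `requests`; the function name `geocode`, the `opener` parameter and the 5 second default are assumptions made for illustration and are not part of the wrapper above.

```python
import json
import urllib.parse
import urllib.request


def geocode(query, api_url, timeout=5.0, opener=urllib.request.urlopen):
    """Return (lat, lon) of the first hit, or None on failure or no result.

    `opener` is a hypothetical hook so the HTTP call can be swapped out in tests.
    """
    params = urllib.parse.urlencode({"q": query, "format": "jsonv2"})
    request = urllib.request.Request(
        f"{api_url}?{params}",
        headers={"User-Agent": "APPNAME example"},  # placeholder User-Agent
    )
    try:
        # Give up after `timeout` seconds instead of waiting forever
        with opener(request, timeout=timeout) as response:
            results = json.load(response)
    except (OSError, ValueError):
        # OSError covers network errors and timeouts, ValueError covers invalid JSON
        return None
    if not results:
        return None
    return float(results[0]["lat"]), float(results[0]["lon"])
```

Calling `geocode("Berlin", "https://geocoding.example.org/search")` would then return a coordinate tuple, or `None` if the server is unreachable, too slow, or has no match.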
Oh and one last thing:
For now, let's start by adding `Location` to our `models.py`
## Legal requirements
```python
class Location(models.Model):
    place_id = models.IntegerField()
    latitude = models.FloatField()
    longitude = models.FloatField()
    name = models.CharField(max_length=2000)
Data from OpenStreetMap is licensed under the [Open Database License](https://opendatacommons.org/licenses/odbl/). The ODbL allows you to use the OSM data for any purpose you like but **attribution is required**. For showing map data, you'd usually display a small badge in the bottom left corner of the map. But geocoding also needs attribution, [as per this guideline](https://osmfoundation.org/wiki/Licence/Attribution_Guidelines#Geocoding_(search)).
    def __str__(self):
        return f"{self.name} ({self.latitude:.5}, {self.longitude:.5})"

    @staticmethod
    def get_location_from_string(location_string):
        geo_api = geo.GeoAPI()
        geojson = geo_api.get_geojson_for_query(location_string)
        if geojson is None:
            return None
        # Use the first (best-ranked) result returned by the API
        result = geojson[0]
        if "name" in result:
            name = result["name"]
        else:
            name = result["display_name"]
        location = Location.objects.create(
            place_id=result["place_id"],
            latitude=result["lat"],
            longitude=result["lon"],
            name=name,
        )
        return location
```
*Don't forget to make and run migrations after this.*
And finally we can use the API!
```python
location = Location.get_location_from_string("Berlin")
print(location)
# Berlin, Deutschland (52.51, 13.38)
```
Looking good!
# Area search
Now we have the coordinates, great! But how can we get the distance between two coordinates? Luckily we are not the first people with that question, and there is the [Haversine Formula](https://en.wikipedia.org/wiki/Haversine_formula) that we can use. It's not a perfect formula: for example, it assumes the earth is a perfect sphere, which it is not. But for most use cases of area search this should be irrelevant for the final result.
Here is my implementation
```python
from math import atan2, cos, radians, sin, sqrt


def calculate_distance_between_coordinates(position1, position2):
    """
    Calculate the distance between two points identified by coordinates
    It expects the coordinates to be a tuple (lat, lon)
    Based on https://en.wikipedia.org/wiki/Haversine_formula
    """
    earth_radius_km = 6371  # As per https://en.wikipedia.org/wiki/Earth_radius
    latitude1 = float(position1[0])
    longitude1 = float(position1[1])
    latitude2 = float(position2[0])
    longitude2 = float(position2[1])
    # Differences in latitude and longitude, converted to radians
    distance_lat = radians(latitude2 - latitude1)
    distance_long = radians(longitude2 - longitude1)
    a = (pow(sin(distance_lat / 2), 2)
         + cos(radians(latitude1)) * cos(radians(latitude2)) * pow(sin(distance_long / 2), 2))
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    distance_in_km = earth_radius_km * c
    return distance_in_km
```
And with that we have a functioning area search 🎉
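To make the last step concrete, here is a minimal sketch of how the distance calculation could be turned into an actual radius filter. It is self-contained, so it repeats the Haversine calculation under the shorter, hypothetical name `haversine_km`. The helper `locations_within_radius`, the plain `(name, lat, lon)` tuples and the approximate city coordinates are assumptions for illustration; in a Django app you would more likely loop over `Location` objects.

```python
from math import atan2, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371


def haversine_km(position1, position2):
    """Distance in km between two (lat, lon) tuples, via the Haversine formula."""
    lat1, lon1 = float(position1[0]), float(position1[1])
    lat2, lon2 = float(position2[0]), float(position2[1])
    d_lat = radians(lat2 - lat1)
    d_lon = radians(lon2 - lon1)
    a = sin(d_lat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(d_lon / 2) ** 2
    return EARTH_RADIUS_KM * 2 * atan2(sqrt(a), sqrt(1 - a))


def locations_within_radius(center, candidates, max_distance_km):
    """Return (name, distance) pairs within max_distance_km of center, nearest first."""
    hits = []
    for name, lat, lon in candidates:
        distance = haversine_km(center, (lat, lon))
        if distance <= max_distance_km:
            hits.append((name, distance))
    return sorted(hits, key=lambda hit: hit[1])


# Approximate coordinates, for illustration only
berlin = (52.52, 13.405)
places = [("Potsdam", 52.39, 13.06), ("Munich", 48.137, 11.575)]
for name, distance in locations_within_radius(berlin, places, max_distance_km=50):
    print(f"{name}: {distance:.1f} km")
```

This checks every candidate in Python, which is fine for a small table. For large datasets a database-side approach such as PostGIS would likely scale better.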