# Introduction

In the [previous post](geocoding-with-django/) I outlined how to set up a Nominatim server that allows us to find a geolocation for any address on the planet. Now let's use our newfound power in Django. Again, all code snippets are [CC0](https://creativecommons.org/public-domain/cc0/), so make free use of them. But I'd be very happy to hear if you use them for something cool!
# Nominatim

Nominatim is software that uses OpenStreetMap data for geocoding. It can also do the reverse: find an address for any location on the planet. It powers the geocoding on [OpenStreetMap](https://openstreetmap.org), so it's quite production-ready. We could use the public API (while obeying the [usage policy](https://operations.osmfoundation.org/policies/nominatim/)), but it's nicer to have our own instance, both so we don't stress the resources of a donation-funded organization and to improve user privacy.

Nominatim works by importing geodata from a [PBF](https://wiki.openstreetmap.org/wiki/PBF_Format) file into a Postgres database. This database is later queried to provide location data. The process is described below.

## Prerequisites

* You have a working geocoding server or use a public one
* You have a working Django app

If you want to do geocoding in a different environment you will still be able to use a lot of the following examples; just skip the Django specifics and configure the `GEOCODING_API_URL` according to your needs.
## DNS records

So let's start by setting the DNS records so that the domain `geocoding.example.org` points to your server. Adjust as needed.

| Value | Type | Target |
| --- | --- | --- |
| geocoding.example.org | CNAME | server1.example.org |
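For reference, in BIND zone-file syntax the same record would read roughly like this (a sketch; the TTL of 3600 is an arbitrary choice):

```
geocoding.example.org.  3600  IN  CNAME  server1.example.org.
```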
## Docker-compose Configuration

We will use Docker Compose to run the official [Nominatim Docker image](https://hub.docker.com/r/mediagis/nominatim).

It bundles Nominatim together with a Postgres database. I usually prefer to have a central database for multiple services (e.g. it allows easier backups), but for Nominatim a separate database is good for two reasons:

* the import process (described later) will not slow down the database for other services
* it's easier to nuke everything if things go wrong

The following environment variables will be used to configure the container:
* `PBF_URL`: The URL from where to download the PBF file that contains the geodata we will import. These files can be obtained from [Geofabrik](https://download.geofabrik.de/). It is highly recommended to first download the file to a local server and then set this URL to that server, so that Geofabrik's resources are not affected if something goes wrong. Feel free to use the pre-set URL for Germany (while it works) if you want to test around.
* `REPLICATION_URL`: Where to get updates from. For example, Geofabrik's updates for the Europe extract are available at `https://download.geofabrik.de/europe-updates/`. Other places at Geofabrik follow the pattern `https://download.geofabrik.de/$CONTINENT/$COUNTRY-updates/`.
* `POSTGRES_*`: Postgres tuning parameters; the current settings allow imports on a resource-constrained system. See the [postgres tuning docs](https://github.com/mediagis/nominatim-docker/tree/master/4.4#postgresql-tuning) for more info.
* `NOMINATIM_PASSWORD`: Database password.
* `IMPORT_STYLE`: See below.
**Import Styles**

The import style determines how much "resolution" the geocoding has. It has the following options:

* `admin`: Only import administrative boundaries and places.
* `street`: Like the admin style but also adds streets.
* `address`: Import all data necessary to compute addresses down to house number level.
* `full`: Default style that also includes points of interest.
* `extratags`: Like the full style but also adds most of the OSM tags into the extratags column.

The style has a huge impact on how long the import will take and how much space it will require. Be aware that the import times below were measured on a machine with 32GB RAM, 4 CPUs and SSDs; these are not fixed numbers. My import of `admin` took 12 hours.
| Style | Import time | DB size | after drop |
| --- | --- | --- | --- |
| admin | 4h | 215 GB | 20 GB |
| street | 22h | 440 GB | 185 GB |
| address | 36h | 545 GB | 260 GB |

Explaining *after drop* (from the [docs](https://nominatim.org/release-docs/3.3/admin/Import-and-Update/)):

> About half of the data in Nominatim's database is not really used for serving the API. It is only there to allow the data to be updated from the latest changes from OSM. For many uses these dynamic updates are not really required. If you don't plan to apply updates, the dynamic part of the database can be safely dropped using the following command: `./utils/setup.php --drop`

I have not done this, so I don't have any experience with it. But it's probably a good idea if you don't need up-to-date data.
## Reverse Proxy

As with most of my projects, this runs on a server where the [mash-playbook](https://github.com/mother-of-all-self-hosting/mash-playbook) has deployed [Traefik](https://doc.traefik.io/traefik/) as *Application Proxy*. I'll therefore use Traefik labels to configure the reverse proxy, but the same could be achieved with Caddy or Nginx.
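As a sketch of the Caddy alternative (hypothetical; assumes Caddy joins the same Docker network and can reach the container as `nominatim` on port 8080):

```
geocoding.example.org {
    reverse_proxy nominatim:8080
}
```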
## Complete configuration

```yaml
services:
  nominatim:
    environment:
      - PBF_URL=https://cdn.hyteck.de/osm/germany-latest.osm.pbf
      - REPLICATION_URL=https://download.geofabrik.de/europe/germany-updates/
      - POSTGRES_SHARED_BUFFERS=1GB
      - POSTGRES_MAINTENANCE_WORK_MEM=1GB
      - POSTGRES_AUTOVACUUM_WORK_MEM=500MB
      - POSTGRES_EFFECTIVE_CACHE_SIZE=1GB
      - IMPORT_STYLE=admin
      - NOMINATIM_PASSWORD=VERYSECRET
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=traefik"
      - "traefik.http.routers.nominatim.rule=Host(`geocoding.example.org`)"
      - "traefik.http.routers.nominatim.service=nominatim-service"
      - "traefik.http.routers.nominatim.entrypoints=web-secure"
      - "traefik.http.routers.nominatim.tls=true"
      - "traefik.http.routers.nominatim.tls.certResolver=default"
      - "traefik.http.services.nominatim-service.loadbalancer.server.port=8080"
    container_name: nominatim
    image: mediagis/nominatim:4.4
    restart: always
    networks:
      - traefik
    volumes:
      - nominatim-data:/var/lib/postgresql/14/main
      - nominatim-flatnode:/nominatim/flatnode
    shm_size: 1gb

volumes:
  nominatim-flatnode:
  nominatim-data:

networks:
  traefik:
    name: "traefik"
    external: true
```
## Importing

Now we are ready to go! Before you type `docker-compose up -d`, let me explain what it will do:

1. Start the database
2. Download the PBF file from the given URL
3. Import the PBF file into the database. This is the step where you are most likely to run into errors because of resource constraints
4. Start the Nominatim server

If you are ready, let's go: `docker-compose up -d`. Monitor what Nominatim is doing with `docker logs -f nominatim` and make a cup of tea. This will take a while (probably several hours).

## Testing

You can test your server by visiting the domain. Try `/?q=CITYNAME` to see an actual search result.

Example: `https://geocoding.example.org/?q=tuebingen`

# Result

You should now have a running Nominatim instance that you can use for geocoding 🎉. Initially I wanted to show in the same post how you'd use this server to power area search in Django, but that will be in part 2. Feel free to ping me with questions, preferably at [@moanos@gay-pirate-assassins.de](https://gay-pirate-assassins.de/@moanos)

# Using the Geocoding API

First of all, let's define the geocoding API URL in our settings. This enables us to switch easily if a service is not available. Add the following to your `settings.py`:

```python
# appname/settings.py

""" GEOCODING """
GEOCODING_API_URL = config.get("geocoding", "api_url", fallback="https://nominatim.hyteck.de/search")  # Adjust if needed
```

We can then add a class that interacts with the API.

```python
import json
import logging

import requests

from APPNAME import __version__ as app_version
from APPNAME import settings


class GeoAPI:
    api_url = settings.GEOCODING_API_URL
    # Set User-Agent headers as required by most usage policies (and it's the nice thing to do)
    headers = {
        'User-Agent': f"APPNAME {app_version}",
        'From': 'info@example.org'
    }

    def __init__(self, debug=False):
        self.requests = requests  # ignore why we do this for now

    def get_coordinates_from_query(self, location_string):
        result = self.requests.get(self.api_url, {"q": location_string, "format": "jsonv2"}, headers=self.headers).json()[0]
        return result["lat"], result["lon"]

    def _get_raw_response(self, location_string):
        result = self.requests.get(self.api_url, {"q": location_string, "format": "jsonv2"}, headers=self.headers)
        return result.content

    def get_geojson_for_query(self, location_string):
        try:
            result = self.requests.get(self.api_url,
                                       {"q": location_string, "format": "jsonv2"},
                                       headers=self.headers).json()
        except Exception as e:
            logging.warning(f"Exception {e} when querying Nominatim")
            return None
        if len(result) == 0:
            logging.warning(f"Couldn't find a result for {location_string} when querying Nominatim")
            return None
        return result
```

The wrapper is a synchronous interface to our geocoding server and will wait until the server returns a response or times out. This impacts the user experience, as a page will take longer to load. But it's much easier to code, so here we are. If anyone wants to write an async interface for this, I'll not stop them!
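About that odd `self.requests = requests` line: storing the module on the instance lets a test swap it for a fake, so you can exercise the parsing logic without a network connection. Here is a sketch of the idea with a simplified stand-in class (hypothetical names, not part of the code above):

```python
class FakeResponse:
    """Stands in for requests.Response in tests."""
    def __init__(self, payload):
        self._payload = payload

    def json(self):
        return self._payload


class FakeRequests:
    """Stands in for the requests module; returns canned data instead of hitting the network."""
    def __init__(self, payload):
        self._payload = payload

    def get(self, url, params=None, headers=None):
        return FakeResponse(self._payload)


class MiniGeoAPI:
    """Simplified stand-in for GeoAPI, just enough to show the injection point."""
    def __init__(self, requests_module):
        self.requests = requests_module

    def get_coordinates_from_query(self, location_string):
        result = self.requests.get("https://example.org/search",
                                   {"q": location_string, "format": "jsonv2"}).json()[0]
        return result["lat"], result["lon"]


api = MiniGeoAPI(FakeRequests([{"lat": "52.51", "lon": "13.38"}]))
print(api.get_coordinates_from_query("Berlin"))  # ('52.51', '13.38')
```

In a real test you would construct `GeoAPI()` and then overwrite `api.requests` with such a fake before calling the methods.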
Oh, and one last thing:

## Legal requirements

Data from OpenStreetMap is licensed under the [Open Database License](https://opendatacommons.org/licenses/odbl/). The ODbL allows you to use the OSM data for any purpose you like, but **attribution is required**. For showing map data, you'd usually display a small badge in the bottom left corner of the map. But geocoding also needs attribution, [as per this guideline](https://osmfoundation.org/wiki/Licence/Attribution_Guidelines#Geocoding_(search)).

For now, let's start by adding `Location` to our `models.py`:

```python
from django.db import models

from APPNAME import geo  # the module that contains the GeoAPI class


class Location(models.Model):
    place_id = models.IntegerField()
    latitude = models.FloatField()
    longitude = models.FloatField()
    name = models.CharField(max_length=2000)

    def __str__(self):
        return f"{self.name} ({self.latitude:.5}, {self.longitude:.5})"

    @staticmethod
    def get_location_from_string(location_string):
        geo_api = geo.GeoAPI()
        geojson = geo_api.get_geojson_for_query(location_string)
        if geojson is None:
            return None
        result = geojson[0]
        if "name" in result:
            name = result["name"]
        else:
            name = result["display_name"]
        location = Location.objects.create(
            place_id=result["place_id"],
            latitude=result["lat"],
            longitude=result["lon"],
            name=name,
        )
        return location
```

*Don't forget to make & run migrations after this.*

And finally we can use the API!

```python
location = Location.get_location_from_string("Berlin")
print(location)
# Berlin, Deutschland (52.51, 13.38)
```

Looking good!
# Area search

Now we have the coordinates, great! But how can we get the distance between coordinates? Luckily we are not the first people with that question, and there is the [Haversine formula](https://en.wikipedia.org/wiki/Haversine_formula) that we can use. It's not a perfect formula; for example, it assumes the Earth is perfectly round, which it is not. But for most use cases of area search this should be irrelevant for the final result.
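In math notation, with $\varphi$ for latitude, $\lambda$ for longitude and $R$ for the Earth radius, the formula reads:

```latex
a = \sin^2\left(\frac{\Delta\varphi}{2}\right)
  + \cos\varphi_1 \cos\varphi_2 \sin^2\left(\frac{\Delta\lambda}{2}\right)

c = 2 \cdot \operatorname{atan2}\left(\sqrt{a},\, \sqrt{1 - a}\right)

d = R \cdot c
```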
Here is my implementation:
```python
from math import atan2, cos, radians, sin, sqrt


def calculate_distance_between_coordinates(position1, position2):
    """
    Calculate the distance between two points identified by coordinates.
    It expects each coordinate to be a tuple (lat, lon).

    Based on https://en.wikipedia.org/wiki/Haversine_formula
    """
    earth_radius_km = 6371  # As per https://en.wikipedia.org/wiki/Earth_radius
    latitude1 = float(position1[0])
    longitude1 = float(position1[1])
    latitude2 = float(position2[0])
    longitude2 = float(position2[1])

    distance_lat = radians(latitude2 - latitude1)
    distance_long = radians(longitude2 - longitude1)

    a = pow(sin(distance_lat / 2), 2) + cos(radians(latitude1)) * cos(radians(latitude2)) * pow(sin(distance_long / 2), 2)
    c = 2 * atan2(sqrt(a), sqrt(1 - a))

    distance_in_km = earth_radius_km * c

    return distance_in_km
```
And with that we have a functioning area search 🎉
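As a concluding sketch, here is how the distance function can drive a simple radius filter (the function is repeated so the snippet is self-contained; the place list is made-up example data, not from this post):

```python
from math import atan2, cos, radians, sin, sqrt


def calculate_distance_between_coordinates(position1, position2):
    """Haversine distance in km between two (lat, lon) tuples."""
    earth_radius_km = 6371
    latitude1, longitude1 = float(position1[0]), float(position1[1])
    latitude2, longitude2 = float(position2[0]), float(position2[1])
    distance_lat = radians(latitude2 - latitude1)
    distance_long = radians(longitude2 - longitude1)
    a = (sin(distance_lat / 2) ** 2
         + cos(radians(latitude1)) * cos(radians(latitude2)) * sin(distance_long / 2) ** 2)
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    return earth_radius_km * c


def locations_within_radius(center, locations, radius_km):
    """Return (name, distance) pairs within radius_km of center, closest first."""
    hits = []
    for name, lat, lon in locations:
        distance = calculate_distance_between_coordinates(center, (lat, lon))
        if distance <= radius_km:
            hits.append((name, round(distance, 1)))
    return sorted(hits, key=lambda item: item[1])


# Made-up example data
places = [
    ("Potsdam", 52.3906, 13.0645),
    ("Hamburg", 53.5511, 9.9937),
    ("Leipzig", 51.3397, 12.3731),
]
berlin = (52.5200, 13.4050)
print(locations_within_radius(berlin, places, 200))
```

In a Django view you would feed the coordinates stored on your `Location` rows into such a filter instead of a hand-written list.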