hyteck-blog/content/post/django-geocoding/index.md
moanos 2ee702a151
2024-09-28 13:33:15 +02:00

---
title: "Where are you? - Part 2 - Geocoding with Django to empower area search"
date: 2024-09-28T14:05:10+02:00
draft: false
image: "uploads/django_geocoding2.png"
categories:
  - English
tags:
  - django
  - geocoding
  - nominatim
  - OpenStreetMap
  - osm
  - traefik
  - mash-playbook
  - docker
  - docker-compose
---

Introduction

In the previous post

Nominatim

Nominatim is software that uses OpenStreetMap data for geocoding. It can also do the reverse and find an address for any location on the planet. It handles the geocoding on OpenStreetMap itself, so it's quite production-ready. We could use the public API (while obeying the usage policy), but it's nicer to run our own instance: we don't strain the resources of a donation-funded organization, and we improve user privacy.
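To make the two directions concrete, here is a minimal sketch of how request URLs for both could be built. It assumes Nominatim's `/search` and `/reverse` endpoints with `format=jsonv2`. The base URL is a placeholder and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Placeholder instance, replace with your own domain.
BASE_URL = "https://geocoding.example.org"


def search_url(query: str, base_url: str = BASE_URL) -> str:
    """Geocoding: build a URL that turns a free-text query into coordinates."""
    params = {"q": query, "format": "jsonv2"}
    return f"{base_url}/search?{urlencode(params)}"


def reverse_url(lat: float, lon: float, base_url: str = BASE_URL) -> str:
    """Reverse geocoding: build a URL that turns coordinates into an address."""
    params = {"lat": lat, "lon": lon, "format": "jsonv2"}
    return f"{base_url}/reverse?{urlencode(params)}"
```

`urlencode` takes care of escaping, so umlauts and spaces in place names are safe to pass in.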

Nominatim works by importing geodata from a PBF file into a PostgreSQL database. This database is later queried to provide location data. The process is described below.

DNS records

So let's start by setting the DNS records so that the domain geocoding.example.org points to your server. Adjust as needed.

| Name | Type | Target |
|------|------|--------|
| geocoding.example.org | CNAME | server1.example.org |
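If you want to verify the record before continuing, a small check could look like the sketch below. It only compares the IP addresses both names resolve to. The function names are hypothetical, and the resolver is passed in as a parameter so the comparison can be tried without touching DNS.

```python
import socket


def resolve_ips(hostname: str) -> set:
    """Return all IP addresses the system resolver reports for a hostname."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def points_to_same_host(name: str, target: str, resolve=resolve_ips) -> bool:
    """True if both names share at least one IP address."""
    return bool(resolve(name) & resolve(target))
```

Calling `points_to_same_host("geocoding.example.org", "server1.example.org")` with your real names should return `True` once the record has propagated.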

Docker-compose Configuration

We will use Docker Compose to run the official Nominatim Docker image.

It bundles Nominatim together with its PostgreSQL database. I usually prefer to have a central database for multiple services (it allows easier backups, for example), but for Nominatim a separate database is good for two reasons:

  • the import process (described later) will not slow down the database for other services
  • it's easier to nuke everything if things go wrong

The following environment variables will be used to configure the container:

  • PBF_URL: The URL from which to download the PBF file that contains the geodata we will import. These files can be obtained from Geofabrik. It is highly recommended to first download the file to a server of your own and then point this URL at that server, so that Geofabrik's resources are not affected if something goes wrong. Feel free to use the pre-set URL for Germany, while it works, if you want to test around.
  • REPLICATION_URL: Where to get updates from. For example, Geofabrik's updates for the Europe extract are available at https://download.geofabrik.de/europe-updates/ Other places at Geofabrik follow the pattern https://download.geofabrik.de/$CONTINENT/$COUNTRY-updates/
  • POSTGRES_*: PostgreSQL tuning settings. The current values allow imports on a resource-constrained system. See the PostgreSQL tuning docs for more info.
  • NOMINATIM_PASSWORD: Database password.
  • IMPORT_STYLE: See below
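The URL pattern for country extracts can be written down as a tiny helper. This is only a sketch: the `-updates/` pattern comes from the description above, the `-latest.osm.pbf` file name is my assumption about Geofabrik's naming, and both function names are hypothetical.

```python
GEOFABRIK = "https://download.geofabrik.de"


def replication_url(continent: str, country: str) -> str:
    """Build the update URL following $CONTINENT/$COUNTRY-updates/."""
    return f"{GEOFABRIK}/{continent}/{country}-updates/"


def pbf_url(continent: str, country: str) -> str:
    """Build the extract URL (assumed naming: $COUNTRY-latest.osm.pbf)."""
    return f"{GEOFABRIK}/{continent}/{country}-latest.osm.pbf"
```

For Germany, `replication_url("europe", "germany")` yields `https://download.geofabrik.de/europe/germany-updates/`. Remember that the PBF file itself is better served from your own mirror.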

Import Styles

The import style determines how much "resolution" the geocoding has. The following styles are available:

  • admin: Only import administrative boundaries and places.
  • street: Like the admin style but also adds streets.
  • address: Import all data necessary to compute addresses down to house number level.
  • full: Default style that also includes points of interest.
  • extratags: Like the full style but also adds most of the OSM tags into the extratags column.

The import style has a huge impact on how long the import takes and how much space it requires. Be aware that the import times below were measured on a machine with 32 GB RAM, 4 CPUs and SSDs, so they are rough guidance and not fixed numbers. My import of admin took 12 hours.

| Style | Import time | DB size | After drop |
|-------|-------------|---------|------------|
| admin | 4h | 215 GB | 20 GB |
| street | 22h | 440 GB | 185 GB |
| address | 36h | 545 GB | 260 GB |

Explaining after drop (from the docs)

About half of the data in Nominatim's database is not really used for serving the API. It is only there to allow the data to be updated from the latest changes from OSM. For many uses these dynamic updates are not really required. If you don't plan to apply updates, the dynamic part of the database can be safely dropped using the following command: ./utils/setup.php --drop

I have not done this, so I don't have any experience with it. But it's probably a good idea if you don't need up-to-date data.

Reverse Proxy

As with most of my projects, this runs on a server where the mash-playbook has deployed Traefik as the application proxy. I'll therefore use Traefik labels to configure the reverse proxy, but the same could be achieved with Caddy or Nginx.

Complete configuration

services:
  nominatim:
    environment:
      - PBF_URL=https://cdn.hyteck.de/osm/germany-latest.osm.pbf
      - REPLICATION_URL=https://download.geofabrik.de/europe/germany-updates/
      - POSTGRES_SHARED_BUFFERS=1GB
      - POSTGRES_MAINTENANCE_WORK_MEM=1GB
      - POSTGRES_AUTOVACUUM_WORK_MEM=500MB
      - POSTGRES_EFFECTIVE_CACHE_SIZE=1GB
      - IMPORT_STYLE=admin
      - NOMINATIM_PASSWORD=VERYSECRET
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=traefik"
      - "traefik.http.routers.nominatim.rule=Host(`geocoding.example.org`)"
      - "traefik.http.routers.nominatim.service=nominatim-service"
      - "traefik.http.routers.nominatim.entrypoints=web-secure"
      - "traefik.http.routers.nominatim.tls=true"
      - "traefik.http.routers.nominatim.tls.certResolver=default"
      - "traefik.http.services.nominatim-service.loadbalancer.server.port=8080"

    container_name: nominatim
    image: mediagis/nominatim:4.4
    restart: always
    networks:
      - traefik
    volumes:
      - nominatim-data:/var/lib/postgresql/14/main
      - nominatim-flatnode:/nominatim/flatnode
    shm_size: 1gb
volumes:
  nominatim-flatnode:
  nominatim-data:

networks:
  traefik:
    name: "traefik"
    external: true

Importing

Now we are ready to go! Before you type docker-compose up -d, let me explain what it will do:

  1. Start the database
  2. Download the PBF file from the given URL
  3. Import the PBF file into the database. This is where you are most likely to run into errors because of resource constraints
  4. Start the Nominatim server

If you are ready, let's go: docker-compose up -d. Monitor what Nominatim is doing with docker logs -f nominatim and make a cup of tea. This will take a while (probably several hours).
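If you'd rather not stare at the logs, a small script could poll the server until it answers. The sketch below assumes Nominatim's `/status` endpoint, which responds with HTTP 200 once the database is usable. The function names, the one-minute interval and the attempt limit are my own choices.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def fetch_status(url: str) -> Optional[int]:
    """Return the HTTP status code, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The proxy answered, but not with a success code (e.g. 502 or 404).
        return error.code
    except OSError:
        # Covers URLError, connection errors and timeouts.
        return None


def wait_until_ready(
    url: str,
    fetch: Callable[[str], Optional[int]] = fetch_status,
    interval: float = 60.0,
    max_attempts: int = 1440,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the URL until it returns 200 or the attempts run out."""
    for attempt in range(max_attempts):
        if fetch(url) == 200:
            return True
        if attempt < max_attempts - 1:
            sleep(interval)
    return False
```

With the defaults, `wait_until_ready("https://geocoding.example.org/status")` checks once a minute for up to a day.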

Testing

You can test your server by visiting the domain. Try /?q=CITYNAME to see an actual search result.

Example: https://geocoding.example.org/?q=tuebingen
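To use such a result in code, you mostly need the coordinates of the best match. Here is a minimal sketch, assuming the response is in Nominatim's JSON format, where each result carries `lat` and `lon` as strings. The function name is hypothetical and the sample body is made up for illustration, not real server output.

```python
import json
from typing import Optional, Tuple


def first_coordinates(body: str) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) of the first search result, or None if there is none."""
    results = json.loads(body)
    if not results:
        return None
    top = results[0]
    # Nominatim encodes coordinates as strings, so convert them explicitly.
    return float(top["lat"]), float(top["lon"])


# Illustrative sample only, shortened to the fields used above.
SAMPLE_BODY = '[{"lat": "48.5236", "lon": "9.0535", "display_name": "Tuebingen"}]'
```

An empty result list is a normal answer for unknown places, so callers should handle `None`.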

Result

You should now have a running Nominatim instance that you can use for geocoding 🎉. Initially I wanted to show in the same post how you'd use this server to power area search in Django, but that will be in part 2. Feel free to ping me with questions, preferably at @moanos@gay-pirate-assassins.de

Oh and one last thing:

Data from OpenStreetMap is licensed under the Open Database License. The ODbL allows you to use the OSM data for any purpose you like, but attribution is required. For showing map data, you'd usually display a small badge in the bottom left corner of the map. But geocoding also needs attribution, as per this guideline.
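One way to keep the attribution close to the data is to read it from the response itself. This sketch assumes that Nominatim's JSON results include a `licence` field. The fallback wording is my assumption, so check it against the guideline before showing it to users. The function name is hypothetical.

```python
from typing import Dict, List

# Assumed wording, used only when a result carries no licence field.
FALLBACK_ATTRIBUTION = "Data © OpenStreetMap contributors, ODbL 1.0"


def attribution_text(results: List[Dict]) -> str:
    """Return the licence string from the first result that has one."""
    for result in results:
        licence = result.get("licence")
        if licence:
            return licence
    return FALLBACK_ATTRIBUTION
```

The returned text can then be rendered next to wherever you show the geocoding results.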