---
title: "Where are you? - Part 1 - Geocoding with Nominatim to empower area search "
date: 2024-09-28T12:05:10+02:00
draft: false
image: "uploads/django_geocoding.png"
categrories: ['English']
tags: ['django', 'geocoding', 'nominatim', 'OpenStreetMap', 'osm', 'traefik', 'mash-playbook', 'docker', 'docker-compose']
---

# Introduction

Geocoding is the process of translating a text input like `Ungewitterweg, Berlin` into a location with latitude and longitude such as `52.544022/13.147589`. Whenever you search for a location in OpenStreetMap or Google Maps, it does exactly that (and sometimes more, but we won't focus on that now).

For a pet project of mine ([notfellchen.org](https://notfellchen.org)) I wanted to do exactly that: when an animal is put up for adoption there, the user must enter a location, which is geocoded and saved with its coordinates. When another user who wants to adopt a pet in their area visits the site, they enter their location and the site searches for all animals within a given radius.

How is that done? I'll show you!

# Nominatim

Nominatim is software that uses OpenStreetMap data for geocoding. It can also do the reverse and find an address for any location on the planet. It powers the geocoding on [OpenStreetMap](https://openstreetmap.org), so it's quite production-ready. We could use the public API (while obeying the [usage policy](https://operations.osmfoundation.org/policies/nominatim/)), but it's nicer to have our own instance: we don't strain the resources of a donation-funded organization, and we improve user privacy.

Nominatim works by importing geodata from a [PBF](https://wiki.openstreetmap.org/wiki/PBF_Format) file into a PostgreSQL database. This database is later queried to provide location data. The process is described below.

## DNS records

So let's start by setting the DNS records so that the domain `geocoding.example.org` points to your server. Adjust as needed.

| Name | Type | Target |
| --- | --- | --- |
| geocoding.example.org | CNAME | server1.example.org |

## Docker-compose Configuration

We will use Docker Compose to run the [Nominatim Docker image](https://hub.docker.com/r/mediagis/nominatim). It bundles Nominatim together with its PostgreSQL database. I usually prefer a central database for multiple services (it allows easier backups, for example), but for Nominatim a separate database is a good idea for two reasons:

* the import process (described later) will not slow down the database for other services
* it's easier to nuke everything if things go wrong

The following environment variables are used to configure the container:

* `PBF_URL`: The URL from which to download the PBF file containing the geodata we will import. PBF files can be obtained from [Geofabrik](https://download.geofabrik.de/). It is highly recommended to first download the file to a server of your own and point this URL there, so that Geofabrik's resources are not affected if something goes wrong. Feel free to use the pre-set URL for Germany, while it works, if you want to experiment.
* `REPLICATION_URL`: Where to get updates from. For example, Geofabrik's updates for the Europe extract are available at `https://download.geofabrik.de/europe-updates/`. Other regions at Geofabrik follow the pattern `https://download.geofabrik.de/$CONTINENT/$COUNTRY-updates/`.
* `POSTGRES_*`: PostgreSQL tuning settings. The values used here allow imports on a resource-constrained system. See the [postgres tuning docs](https://github.com/mediagis/nominatim-docker/tree/master/4.4#postgresql-tuning) for more info.
* `NOMINATIM_PASSWORD`: Database password.
* `IMPORT_STYLE`: See below.

**Import Styles**

The import style determines how much "resolution" the geocoding has. The following options are available:

* `admin`: Only import administrative boundaries and places.
* `street`: Like the admin style but also adds streets.
* `address`: Import all data necessary to compute addresses down to house number level.
* `full`: Default style that also includes points of interest.
* `extratags`: Like the full style but also adds most of the OSM tags into the extratags column.

The import style has a huge impact on how long the import takes and how much space it requires. Be aware that the import times below were measured on a machine with 32 GB RAM, 4 CPUs and SSDs, so they are not fixed numbers. My import of `admin` took 12 hours.

| Style | Import time | DB size | DB size after drop |
| --- | --- | --- | --- |
| admin | 4h | 215 GB | 20 GB |
| street | 22h | 440 GB | 185 GB |
| address | 36h | 545 GB | 260 GB |

Explaining *after drop* (from the [docs](https://nominatim.org/release-docs/3.3/admin/Import-and-Update/)):

> About half of the data in Nominatim's database is not really used for serving the API. It is only there to allow the data to be updated from the latest changes from OSM. For many uses these dynamic updates are not really required. If you don't plan to apply updates, the dynamic part of the database can be safely dropped using the following command: `./utils/setup.php --drop`

I have not done this, so I don't have any experience with it. But it's probably a good idea if you don't need up-to-date data.

## Reverse Proxy

As with most of my projects, this one runs on a server where the [mash-playbook](https://github.com/mother-of-all-self-hosting/mash-playbook) has deployed [Traefik](https://doc.traefik.io/traefik/) as the *application proxy*. I'll therefore use Traefik labels to configure the reverse proxy, but the same could be achieved with Caddy or Nginx.

## Complete configuration

```
services:
  nominatim:
    environment:
      # Where to fetch the initial data and later updates from
      - PBF_URL=https://cdn.hyteck.de/osm/germany-latest.osm.pbf
      - REPLICATION_URL=https://download.geofabrik.de/europe/germany-updates/
      # PostgreSQL tuning for a resource-constrained system
      - POSTGRES_SHARED_BUFFERS=1GB
      - POSTGRES_MAINTENANCE_WORK_MEM=1GB
      - POSTGRES_AUTOVACUUM_WORK_MEM=500MB
      - POSTGRES_EFFECTIVE_CACHE_SIZE=1GB
      - IMPORT_STYLE=admin
      # Replace with your own database password
      - NOMINATIM_PASSWORD=VERYSECRET
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=traefik"
      - "traefik.http.routers.nominatim.rule=Host(`geocoding.example.org`)"
      - "traefik.http.routers.nominatim.service=nominatim-service"
      - "traefik.http.routers.nominatim.entrypoints=web-secure"
      - "traefik.http.routers.nominatim.tls=true"
      - "traefik.http.routers.nominatim.tls.certResolver=default"
      - "traefik.http.services.nominatim-service.loadbalancer.server.port=8080"
    container_name: nominatim
    image: mediagis/nominatim:4.4
    restart: always
    networks:
      - traefik
    volumes:
      - nominatim-data:/var/lib/postgresql/14/main
      - nominatim-flatnode:/nominatim/flatnode
    shm_size: 1gb

volumes:
  nominatim-flatnode:
  nominatim-data:

networks:
  traefik:
    name: "traefik"
    external: true
```

## Importing

Now we are ready to go! Before you type `docker-compose up -d`, let me explain what it will do:

1. Start the database
2. Download the PBF file from the given URL
3. Import the PBF file into the database. This is where you are most likely to run into errors because of resource constraints
4. Start the Nominatim server

If you are ready, let's go: `docker-compose up -d`. Monitor what Nominatim is doing with `docker logs -f nominatim` and make a cup of tea. This will take a while (probably several hours).

## Testing

You can test your server by visiting the domain. Try `/?q=CITYNAME` to see an actual search result.

Example: `https://geocoding.example.org/?q=tuebingen`

# Result

You should now have a running Nominatim instance that you can use for geocoding 🎉.

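To use the instance from code rather than from the browser, a client only needs to send an HTTP request and read the JSON answer. The following is a minimal Python sketch, not the setup used on notfellchen.org. It assumes the server is reachable at the placeholder domain `geocoding.example.org` and uses Nominatim's `/search` endpoint with the `q`, `format` and `limit` parameters. The function names and the `User-Agent` string are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder domain from this post; replace with your own instance.
BASE_URL = "https://geocoding.example.org"


def build_search_url(query, base_url=BASE_URL, limit=1):
    """Build a URL for the /search endpoint that asks for JSON output."""
    params = urlencode({"q": query, "format": "jsonv2", "limit": limit})
    return f"{base_url}/search?{params}"


def first_coordinates(results):
    """Return (lat, lon) of the first result as floats, or None if empty."""
    if not results:
        return None
    # Nominatim delivers coordinates as strings, so convert them.
    return float(results[0]["lat"]), float(results[0]["lon"])


def geocode(query, base_url=BASE_URL, timeout=10):
    """Send the query to the server and return (lat, lon) or None."""
    request = Request(
        build_search_url(query, base_url),
        headers={"User-Agent": "geocoding-sketch/0.1"},  # illustrative value
    )
    with urlopen(request, timeout=timeout) as response:
        return first_coordinates(json.load(response))
```

Calling `geocode("tuebingen")` against a running instance should return a tuple of two floats, or `None` if nothing was found. Keeping URL building and result parsing in separate functions makes both easy to test without a network connection.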
Initially I wanted to show in the same post how you'd use this server to power area search in Django, but that will follow in part 2.

Feel free to ping me with questions, preferably at [@moanos@gay-pirate-assassins.de](https://gay-pirate-assassins.de/@moanos)

Oh, and one last thing:

## Legal requirements

Data from OpenStreetMap is licensed under the [Open Database License](https://opendatacommons.org/licenses/odbl/). The ODbL allows you to use the OSM data for any purpose you like, but **attribution is required**. When showing map data, you'd usually display a small badge in a corner of the map. But geocoding also needs attribution, [as per this guideline](https://osmfoundation.org/wiki/Licence/Attribution_Guidelines#Geocoding_(search)).
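
Nominatim's JSON search results carry a `licence` field with an attribution string, which can be shown next to the search results. The sketch below is one possible way to collect these strings in Python. It assumes the application already holds the parsed JSON list of results. The helper name `collect_attributions` and the sample payload are made up for illustration, and whether this display satisfies the guideline for your site is something to check against the guideline itself.

```python
def collect_attributions(results):
    """Return the distinct licence strings found in a list of search results."""
    seen = []
    for result in results:
        licence = result.get("licence")
        # Keep each attribution string once, in order of first appearance.
        if licence and licence not in seen:
            seen.append(licence)
    return seen


# Made-up payload shaped like a Nominatim JSON response, for illustration only.
sample_results = [
    {
        "display_name": "Example Place",
        "licence": "Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright",
    },
    {
        "display_name": "Another Example Place",
        "licence": "Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright",
    },
]

attributions = collect_attributions(sample_results)
```

The resulting list can then be rendered below the search form or the result list.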