Compare commits

...

14 Commits

Author SHA1 Message Date
677ea89517 ci: Allow specifying hugo version 2025-11-09 09:32:32 +01:00
06ba0029b2 fix: update runner
Because of
hugo: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by hugo)
2025-11-09 09:09:56 +01:00
ddddaaac8d fix: debug 2025-11-09 09:01:25 +01:00
241ead5cd0 fix: label? 2025-11-09 08:55:49 +01:00
8b57f3c096 feat: add forgejo ci 2025-11-09 08:17:02 +01:00
6ff17e27f5 fix: date 2025-11-08 22:27:16 +01:00
2a42e2e2a3 feat: minor corrections 2025-11-08 22:26:24 +01:00
a6f05525e3 feat: Add images 2025-11-08 22:16:34 +01:00
c8b813084b feat: add more content 2025-11-08 21:32:12 +01:00
428dcb8727 fix: feat: locally host known script 2025-11-08 21:31:51 +01:00
517ceacf79 fix: remove deprecated setting 2025-11-07 19:16:40 +01:00
83a0754e46 feat: add shortcode to embed html
This was implemented to embed drawio exports
2025-11-07 19:15:21 +01:00
57243489c8 feat: add gpa postmortem
2025-10-19 16:17:58 +02:00
c18e2a7ecf fix: typo 2025-08-03 14:05:31 +02:00
13 changed files with 8107 additions and 2 deletions

View File

@@ -0,0 +1,60 @@
name: Deploy Hugo Site
on:
push:
branches:
- main
- forge-ci
jobs:
build-and-deploy:
runs-on: ubuntu-24.04
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
submodules: recursive
fetch-depth: 0
- name: Install Hugo - latest version
run: |
apt-get update -y
apt-get install -y wget tar jq # Install necessary components
ldd --version
# If a Hugo version is provided in the secrets, use this
# else latest will be used
if [ -n "${{ secrets.HUGO_VERSION }}" ]; then
HUGO_VERSION="${{ secrets.HUGO_VERSION }}"
echo "Using Hugo version from secret: $HUGO_VERSION"
else
# Use the GitHub API to get information about the latest release, then use jq to find out the tag name
HUGO_VERSION=$(wget -qO- https://api.github.com/repos/gohugoio/hugo/releases/latest | jq -r '.tag_name')
echo "Using latest Hugo version: $HUGO_VERSION"
fi
# Use ${HUGO_VERSION#v} to strip the v from v1.0.0
# See "Substring Removal" in https://tldp.org/LDP/abs/html/string-manipulation.html
wget -O hugo.tar.gz "https://github.com/gohugoio/hugo/releases/download/${HUGO_VERSION}/hugo_extended_${HUGO_VERSION#v}_Linux-64bit.tar.gz"
tar -xzf hugo.tar.gz hugo
mv hugo /usr/local/bin/hugo
chmod +x /usr/local/bin/hugo
ldd /usr/local/bin/hugo
hugo version
- name: Build site
run: hugo --minify
- name: Deploy to server via rsync
env:
DEPLOY_HOST: ${{ secrets.DEPLOY_HOST }}
DEPLOY_USER: ${{ secrets.DEPLOY_USER }}
DEPLOY_PATH: ${{ secrets.DEPLOY_PATH }}
run: |
apt-get install -y rsync openssh-client
mkdir -p ~/.ssh
echo "${{ secrets.SSH_PRIVATE_KEY }}" > ~/.ssh/id_ed25519
chmod 600 ~/.ssh/id_ed25519
ssh-keyscan -H "$DEPLOY_HOST" >> ~/.ssh/known_hosts
rsync -avz --delete public/ "$DEPLOY_USER@$DEPLOY_HOST:$DEPLOY_PATH"
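As a side note, the `${HUGO_VERSION#v}` handling in the workflow above is easy to get wrong: the release tag keeps its leading `v` in the URL path, but the tarball name drops it. This small Python sketch (illustrative only, not part of the workflow) mirrors what the shell does when building the download URL:

```python
def hugo_download_url(tag: str) -> str:
    """Build the download URL for a Hugo extended release tarball.

    Mirrors the shell's ${HUGO_VERSION#v}: the path segment uses the
    tag as-is (e.g. "v0.139.0"), the file name uses it without "v".
    """
    bare = tag[1:] if tag.startswith("v") else tag  # like ${tag#v}
    return (
        "https://github.com/gohugoio/hugo/releases/download/"
        f"{tag}/hugo_extended_{bare}_Linux-64bit.tar.gz"
    )

print(hugo_download_url("v0.139.0"))
```

The version string itself comes either from the `HUGO_VERSION` secret or from the `tag_name` field of the GitHub releases API, exactly as the workflow does it.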

View File

@@ -7,7 +7,6 @@ disqusShortname = ""
# Enable Google Analytics by entering your tracking code
googleAnalytics = ""
preserveTaxonomyNames = true
paginate = 5 #frontpage pagination
[privacy]
# Google Analytics privacy settings - https://gohugo.io/about/hugo-and-gdpr/index.html#googleanalytics

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,122 @@
---
title: "How to manually check hundreds of animal shelters - every 14 days"
date: 2025-11-08T12:05:10+02:00
lastmod: 2025-11-08T21:05:10+02:00
draft: false
image: "uploads/checking-shelters.png"
categories: [ 'English' ]
tags: [ 'notfellchen', 'animal shelter', 'animal welfare', 'django' ]
---
I run a website called [Notfellchen](https://notfellchen.org) that lists animals waiting for adoption. It's
currently restricted to fancy rats in Germany, and for good reason: running this website involves **checking every
shelter every two weeks manually**. You need to visit the website, check if there are new animals, contact the shelter
and add the animals to Notfellchen if the shelter allows it. This takes time. A lot.
This blog post outlines some of the things I did to streamline this and make it possible to **check every
German shelter in 2.5 hours**.
## General process
When you establish a process, want others to help you, or want to find inefficiencies, it's a good idea to
formalize it. So here is a rough BPMN diagram of the whole process.
{{< html animal-discovery.drawio.html >}}
## List of animal shelters
Focusing on the first step: We want to check the website of an animal shelter - but where do we get a list of animal
shelters from? Luckily there is an easy answer: [OpenStreetMap](https://openstreetmap.org) and I wrote a
whole [other blog post on how I imported and improved this data](https://hyteck.de/post/improve-osm-by-using-it/).
## Species-specific link
Importing this data provides us (most of the time) with a link to the shelter's website. However, rats are usually not
listed on the home page but on a subpage.
To save time, I introduced the concept of a species-specific link per organization and species.
For the Tierheim Entenhausen this might look like this:
| Species | Species specific link |
|---------|--------------------------------------------------------|
| Cat | https://tierheim-entenhausen.de/adoption/cats |
| Rats | https://tierheim-entenhausen.de/adoption/small-mammals |
As animal shelter pages look very different from each other, clicking this link provides an enormous time benefit
compared to clicking through a homepage manually.
## Org check page
I set up a special page to make checking shelters as efficient as possible. It's structured in four parts:
* **Stats**: Shows how many animal shelters were checked in the last two weeks and how many are left.
* **Not checked for the longest period**: Shows the animal shelters to check next; it is sorted by the date
  they were last checked.
* **In active communication**: An overview of the organizations where there is communication (or an attempt thereof).
  This can take multiple days or even weeks, so the internal comment field is very useful to keep track.
* **Last checked**: It sometimes happens that I set an organization to "Checked" by accident. I added this
  section to make it easier to revert that.
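The ordering and the stats described above boil down to a sort by last-checked date plus a two-week cutoff. A rough sketch (the site is Django, but the organization names, dates and variable names here are made up for illustration):

```python
from datetime import date, timedelta

# Hypothetical data: (organization, date it was last checked)
orgs = [
    ("Tierheim A", date(2025, 11, 1)),
    ("Tierheim B", date(2025, 10, 20)),
    ("Tierheim C", date(2025, 11, 7)),
]

today = date(2025, 11, 8)
check_interval = timedelta(days=14)

# Stats: how many were checked within the last two weeks, how many are left
checked = [o for o, d in orgs if today - d <= check_interval]
to_go = [o for o, d in orgs if today - d > check_interval]

# "Not checked for the longest period": oldest last-checked date first
queue = sorted(orgs, key=lambda pair: pair[1])
print(queue[0][0])  # Tierheim B
```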
![](screenshot-checking-site.png)
## Shortcuts
To make it even faster to work through the organizations, I added shortcuts for the most common functionality and
documented the browser's own shortcut to close a tab.
* `O`: Open website of the first organization
* `CTRL+W`: Close tab (Firefox, Chrome)
* `C`: Mark first organization as checked
## Results
After implementing all this, how long does it take now to check all organizations? Here are the numbers:
| Measurement                                               | Value        |
|-----------------------------------------------------------|--------------|
| Time to check one organization (avg.)                     | 12.1 s       |
| Organizations checked per minute                          | 4.96 org/min |
| Time to check all (eligible) German animal shelters (429) | 1 h 16 min   |
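The first two rows of the table are consistent with each other; at 12.1 s per organization you get roughly five organizations per minute:

```python
avg_seconds_per_org = 12.1
orgs_per_minute = 60 / avg_seconds_per_org
print(round(orgs_per_minute, 2))  # 4.96
```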
This excludes the time it takes to add animals or contact rescue organizations. One of these actions must be taken
whenever an eligible animal is found on a website. Here you can see how this interrupts the process:
![](progress.png)
And here is the breakdown of time per activity. A big caveat: I did not follow up on previous conversations
here, so the contacting number is likely an underestimate.
| Activity   | Time spent  | Percentage |
|------------|-------------|------------|
| Checking   | 54 min 44 s | 72.3%      |
| Adding     | 11 min 15 s | 14.9%      |
| Contacting | 9 min 41 s  | 12.8%      |
To me, this looks like a pretty good result. I can't say which optimization brought how much improvement, but I'd argue
they all play a role in reaching the 12 s per rescue organization checked.
To check all German animal shelters, one needs to put in about two and a half hours every two weeks. That seems
reasonable to me. Further improvements likely do not lie in the organization check page but in the contact process
and the adoption notice form.
For now, I'm happy with the results.
## Addendum: Common annoyances
Doing this over the last few months, I encountered some recurring issues that are not only annoying but also take
up a majority of the time. Here are some that stood out:
* **Broken SSL encryption** So many animal shelters do not have a functioning SSL certificate. It takes time to work
around the warnings.
* **Empty results not indicated** More often than not, animal shelters do not have rats. However, when you visit a page
  like [this](https://tierschutzliga.de/tierheime/tierparadies-oberdinger-moos/tiervermittlung/#?THM=TOM&Tierart=Kleintiere)
  it's hard to know whether there is a technical issue or simply no animals matching your search.
* **No static links** Sites where you have to click through a menu to get to the right page, but you cannot link
  directly to it.
* **No website** Seriously, there are some animal shelters that only use Instagram or Facebook to tell people about the
  animals they have. This is not only a privacy nightmare, it's also incredibly hard to find out which information is
  up-to-date. Furthermore, the posts have no data structure, so they often miss crucial information like
  the sex.
While I obviously have some grievances here, I know the organizations never have enough resources and would
usually love to have a nicer website. Just keep that in mind too.

Binary file not shown.


Binary file not shown.


View File

@@ -0,0 +1,252 @@
---
title: "Postmortem - how to completely screw up an update"
date: 2025-10-19T12:05:10+02:00
lastmod: 2025-10-19T16:00:04+02:00
draft: false
image: "uploads/postmortem.png"
categories: [ 'English' ]
tags: [ 'backup', 'postmortem', 'fediverse', 'gotosocial' ]
---
The fediverse instance [gay-pirate-assassins.de](https://gay-pirate-assassins.de) was down for a couple of days. This
postmortem will outline what went wrong and what I did to prevent things from going that wrong in the future.
# Timeline
* 2025-10-05 17:26: [Update announcement](https://gay-pirate-assassins.de/@moanos/statuses/01K6TFQ1HVPAR6AYN08XYQ7XFV)
* 2025-10-05 ~17:45: Update started
* 2025-10-05 ~18:00: Services restart
* 2025-10-05 ~18:00: GoToSocial doesn't come up
* 2025-10-12 ~10:00: Issue is found
* 2025-10-12 10:30: Issue is fixed
* 2025-10-12 10:31: GoToSocial is started, migrations start
* 2025-10-12 15:38: Migrations finished successfully
* 2025-10-12 15:38: Service available again
* 2025-10-12 18:36: [Announcement sent](https://gay-pirate-assassins.de/@moanos/statuses/01K7CMGF7S2TE39792CMADGEPJ)
All times are given in CEST.
## The beginning: An update goes wrong
I run a small fediverse server with a few users called [gay-pirate-assassins](https://gay-pirate-assassins.de/), which is powered by [GoToSocial](https://gotosocial.org/).
The (amazing) GoToSocial devs released `v0.20.0-rc1` and `v0.20.0-rc2`. The new features seemed pretty cool, I was
impatient, and the second release candidate seemed stable,
so I decided to update to `v0.20.0-rc2`. I started a backup (via borgmatic), waited for it to finish and confirmed it ran
successfully.
Then I changed the version number in the [mash](https://github.com/mother-of-all-self-hosting/mash-playbook) Ansible
playbook I use. I also pulled the newest version of the playbook and its roles because I wanted to update all services
that run on the server. I checked
the [Changelog](https://github.com/mother-of-all-self-hosting/mash-playbook/blob/main/CHANGELOG.md),
didn't see anything noteworthy and started the update. It went through, and GoToSocial started up just fine.
But the instance start page showed me 0 users, 0 posts and 0 federated instances. **Something has gone horribly wrong!**
## Migrations
It was pretty clear to me that the migrations had gone wrong.
The [GoToSocial Migration notes](https://codeberg.org/superseriousbusiness/gotosocial/releases/tag/v0.20.0-rc1)
specifically mentioned long-running migrations that could take several hours. I assumed that somehow, during the running
database migration, the service must have restarted and left the DB in a broken state. This had happened to me before.
Well, that's what backups are for, so let's pull one.
## Backups
Backups for this server are done in two ways:
* via postgres-backup: backups of the database are written to disk
* via [borgmatic](https://torsion.org/borgmatic/): backups via borg are written to backup nodes, one of them at my home
They run every night automatically, monitored by [Healthchecks](https://healthchecks.io/). I triggered a manual run
before the update so that is the one I mounted using [Vorta](https://vorta.borgbase.com/).
And then the realization.
```
mash-postgres:5432 $ ls -lh
total 2.1M
-r-------- 1 moanos root 418K Oct 05 04:03 gitea
-r-------- 1 moanos root 123K Oct 05 04:03 healthchecks
-r-------- 1 moanos root 217K Oct 05 04:03 ilmo
-r-------- 1 moanos root 370K Oct 05 04:03 notfellchen
-r-------- 1 moanos root 67K Oct 05 04:03 oxitraffic
-r-------- 1 moanos root 931 Oct 05 04:03 prometheus_postgres_exporter
-r-------- 1 moanos root 142K Oct 05 04:03 semaphore
-r-------- 1 moanos root 110K Oct 05 04:03 vaultwarden
-r-------- 1 moanos root 669K Oct 05 04:03 woodpecker_ci_server
```
Fuck. The database for gay-pirate-assassins is not there. Why?
To explain that I have to tell you how it *should* work: Services deployed by the mash-playbook are automatically wired
to the database and reverse proxy by a complex set of Ansible variables. This is great, because adding a service can
therefore be as easy as adding
```
healthchecks_enabled: true
healthchecks_hostname: health.hyteck.de
```
to the `vars.yml` file.
This will then configure the postgres database automatically, based on the `group_vars`. They look like this
```
mash_playbook_postgres_managed_databases_auto_itemized:
- |-
{{
({
'name': healthchecks_database_name,
'username': healthchecks_database_username,
'password': healthchecks_database_password,
} if healthchecks_enabled and healthchecks_database_hostname == postgres_connection_hostname and healthchecks_database_type == 'postgres' else omit)
}}
```
Note that a healthchecks database is only added to the managed databases if `healthchecks_enabled` is `True`.
This is really useful for backups because the borgmatic configuration also pulls the list
`mash_playbook_postgres_managed_databases_auto_itemized`. Therefore, you do not need to specify which databases to back
up; it just backs up all managed databases.
However, the database for gay-pirate-assassins was not managed. In the playbook it's only possible to configure a
service once; you cannot manage multiple GoToSocial instances in the same `vars.yml`. In the past, I had two instances
of GoToSocial running on the server. I therefore
followed [the how-to "Running multiple instances of the same service on the same host"](https://github.com/mother-of-all-self-hosting/mash-playbook/blob/main/docs/running-multiple-instances.md).
Basically this means that an additional `vars.yml` must be created and treated as a completely different server.
Databases must be created manually as they are not managed.
With that knowledge you can understand that when I say the database for gay-pirate-assassins was not managed,
this means it was not included in the list of databases to be backed up. The backup service thought it ran successfully
because it backed up everything it knew of.
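The failure mode condenses to a few lines: the backup job only iterates over *managed* databases, so anything configured outside the main `vars.yml` silently falls through. A schematic sketch (not the playbook's actual code; names are illustrative):

```python
# Databases wired up via the main vars.yml (managed by the playbook)
managed_databases = ["gitea", "healthchecks", "ilmo", "notfellchen",
                     "oxitraffic", "semaphore", "vaultwarden"]

# Databases that actually exist on the postgres server
all_databases = managed_databases + ["gay-pirate-assassins"]  # created manually

def backup(databases):
    """The backup job dumps every database it knows about -- and only those."""
    return {db: "dumped" for db in databases}

result = backup(managed_databases)

# The job "succeeds", yet the manually created database is never dumped
missing = [db for db in all_databases if db not in result]
print(missing)  # ['gay-pirate-assassins']
```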
So this left me with a three-month-old backup. Unacceptable.
## Investigating
So the existing database needed to be rescued. I SSHed into the server and checked the database. It looked completely
normal.
I asked the devs if they could provide me with the migrations, as they had already done in the past. However, they pointed
out that the migrations are too complex for that approach. They suggested deleting the oldest migration to force a
re-run of the migrations.
Here is where I was confused, because this was the `bun_migrations` table:
```
gay-pirate-assassins=# SELECT * FROM bun_migrations ORDER BY id DESC LIMIT 5;
id | name | group_id | migrated_at
-----+----------------+----------+-------------------------------
193 | 20250324173534 | 20 | 2025-04-23 20:00:33.955776+00
192 | 20250321131230 | 20 | 2025-04-23 19:58:06.873134+00
191 | 20250318093828 | 20 | 2025-04-23 19:57:50.540568+00
190 | 20250314120945 | 20 | 2025-04-23 19:57:30.677481+00
```
The last migration ran in April, when I updated to `v0.19.1`. Strange.
At this point I went on vacation and paused investigations, not only because the vacation was great, but also because I
was bamboozled by this state.
---
After my vacation I came back and made a manual backup of the database.
```
$ docker run -e PGPASSWORD="XXXX" -it --rm --network mash-postgres postgres pg_dump -U gay-pirate-assassins -h mash-postgres gay-pirate-assassins > manual-backup/gay-pirate-assassins-2025-10-13.sql
```
Then I deleted the last migration, as advised:
```
DELETE FROM bun_migrations WHERE id=193;
```
and restarted the server. While watching the server come up, it hit me in the face:
```
Oct 12 08:31:29 s3 mash-gpa-gotosocial[2251925]: timestamp="12/10/2025 08:31:29.905" func=bundb.sqliteConn level=INFO msg="connected to SQLITE database with address file:/opt/gotosocial/sqlite.db?_pragma=busy_timeout%281800000%29&_pragma=journal_mode%>
Oct 12 13:38:46 s3 mash-gpa-gotosocial[2304549]: timestamp="12/10/2025 13:38:46.588" func=router.(*Router).Start.func1 level=INFO msg="listening on 0.0.0.0:8080"
```
The server is **starting from a completely different database**! That explains why
* the last migration was never done
* the server showed me 0 users, 0 posts and 0 federated instances even though the postgres database had plenty of those
All of a sudden an SQLite database was configured. This happened because
of [this commit](https://github.com/mother-of-all-self-hosting/ansible-role-gotosocial/commit/df34af385f9765bda8f160f6985a47cb7204fe96)
which introduced SQLite support and set it as the default. This was not mentioned in
the [Changelog](https://github.com/mother-of-all-self-hosting/mash-playbook/blob/main/CHANGELOG.md).
So what happened is that the config changed, the server was restarted, and an empty DB was initialized. The
postgres DB never started to migrate.
## Fixing
To fix it, I did the following:
1. Configure the playbook to use postgres for GoToSocial:
```
# vars.yml
gotosocial_database_type: postgres
```
2. Run the playbook to configure GoToSocial (without starting the service)
```
just run-tags install-gotosocial
```
3. Check the configuration is correct
4. Start the service
The migrations took several hours, but after that everything looked stable again. I don't think there are any lasting
consequences. However, the server was unavailable for several days.
## Learnings
I believe the main issue here was not the config change that went unnoticed by me. While I'd ideally notice stuff
like this, the server is a hobby, and I'll continue not to check every config option that changed.
The larger issue was the backup. Having a backup would have made this easy to solve. And there are other, less lucky
failure modes where I'd be completely lost without a backup. So to make sure this doesn't happen again, I did/will do the
following:
### 1. Mainstream the config
As explained, I used a specific non-mainstream setup in the Ansible playbook because, in the past, I ran two instances
of GoToSocial on the server. After shutting down one of them, I never moved gay-pirate-assassins into the main
config. This means important parts of the configuration had to be done manually, which I botched.
So in the past week I cleaned up, and gay-pirate-assassins is now part of the main `vars.yml` and will benefit from all
relevant automations.
### 2. Checking backups
I was confident in my backups because
* they run every night very consistently. If they fail, e.g. because of a network outage, I reliably get a warning.
* I verified the backup job ran successfully prior to upgrading
The main problem was me assuming that a successful run of the backup command meant a successful backup. Everyone will
tell you that a backup that is not tested is not to be trusted. And they are right. However, doing frequent
test-restores exceeds my time and server capacity. So what I'll do instead is the following:
* mount the backup before an upgrade
* `tail` the backup file as created by postgres-backup and ensure the data is from the same day
* check media folders for the last changed image
This is not a 100% guarantee, but I'd argue it's a pretty good compromise for now. As mounting backups becomes more
frequent and therefore faster, I'll re-evaluate doing a test-restore at least semi-regularly.
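The "is the dump actually from today" check is easy to script. This sketch assumes the plain-text dumps from postgres-backup all land in one directory (the directory layout and function name are made up for illustration):

```python
import datetime
import pathlib

def stale_dumps(backup_dir: pathlib.Path, today: datetime.date) -> list[str]:
    """Return the names of dump files whose last modification is NOT from today."""
    stale = []
    for dump in sorted(backup_dir.glob("*")):
        mtime = datetime.date.fromtimestamp(dump.stat().st_mtime)
        if mtime != today:
            stale.append(dump.name)
    return stale
```

Run it right after mounting the borg archive; an empty list means every dump was written today.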
## Conclusion
I fucked up, but I was lucky that my error was recoverable and no data was lost. Next time, this will hopefully be due
not to luck but to better planning!
Any questions? Let me know!

View File

@@ -0,0 +1,22 @@
```
Oct 12 09:33:25 s3 mash-gpa-gotosocial[2304549]: timestamp="12/10/2025 09:33:25.266" func=cache.(*Caches).Start level=INFO msg="start: 0xc002476008"
Oct 12 09:33:25 s3 mash-gpa-gotosocial[2304549]: timestamp="12/10/2025 09:33:25.303" func=bundb.pgConn level=INFO msg="connected to POSTGRES database"
Oct 12 09:33:25 s3 mash-gpa-gotosocial[2304549]: timestamp="12/10/2025 09:33:25.328" func=migrations.init.110.func1 level=INFO msg="creating statuses column thread_id_new"
Oct 12 09:33:31 s3 mash-gpa-gotosocial[2304549]: timestamp="12/10/2025 09:33:31.872" func=bundb.queryHook.AfterQuery level=WARN duration=6.528757799s query="SELECT count(*) FROM \"statuses\"" msg="SLOW DATABASE QUERY"
Oct 12 09:33:31 s3 mash-gpa-gotosocial[2304549]: timestamp="12/10/2025 09:33:31.873" func=migrations.init.110.func1 level=WARN msg="rethreading 4611812 statuses, this will take a *long* time"
Oct 12 09:33:38 s3 mash-gpa-gotosocial[2304549]: timestamp="12/10/2025 09:33:38.111" func=migrations.init.110.func1 level=INFO msg="[~0.02% done; ~137 rows/s] migrating threads"
Oct 12 09:33:44 s3 mash-gpa-gotosocial[2304549]: timestamp="12/10/2025 09:33:44.618" func=migrations.init.110.func1 level=INFO msg="[~0.04% done; ~171 rows/s] migrating threads"
```
```
Oct 12 13:38:08 s3 mash-gpa-gotosocial[2304549]: timestamp="12/10/2025 13:38:08.726" func=migrations.init.110.func1 level=INFO msg="[~99.98% done; ~148 rows/s] migrating stragglers"
Oct 12 13:38:10 s3 mash-gpa-gotosocial[2304549]: timestamp="12/10/2025 13:38:10.309" func=migrations.init.110.func1 level=INFO msg="[~99.99% done; ~162 rows/s] migrating stragglers"
Oct 12 13:38:12 s3 mash-gpa-gotosocial[2304549]: timestamp="12/10/2025 13:38:12.192" func=migrations.init.110.func1 level=INFO msg="[~100.00% done; ~141 rows/s] migrating stragglers"
Oct 12 13:38:13 s3 mash-gpa-gotosocial[2304549]: timestamp="12/10/2025 13:38:13.711" func=migrations.init.110.func1 level=INFO msg="[~100.00% done; ~136 rows/s] migrating stragglers"
Oct 12 13:38:13 s3 mash-gpa-gotosocial[2304549]: timestamp="12/10/2025 13:38:13.714" func=migrations.init.110.func1 level=INFO msg="dropping temporary thread_id_new index"
```

View File

@@ -8,7 +8,7 @@ categories: ['English']
tags: ['crm', 'twenty', 'salesforce', 'django', 'self-hosting']
---
As some of you might know, I spend my day working with Salesforce, a very, very feature-rich CR that you pay big money to use.
I spend my day working with Salesforce, a very, very feature-rich CRM that you pay big money to use.
Salesforce is the opposite of OpenSource and the many features are expensive. Salesforce business model is based on this and on the lock-in effect.
If your company invested in implementing Salesforce, they'll likely pay a lot to keep it.

View File

@@ -0,0 +1,14 @@
{{ $source := index .Params 0 }}
<div class="embedded-html">
{{ with .Page.Resources.GetMatch $source | readFile }}
{{ replace . "https://viewer.diagrams.net/js/viewer-static.min.js" "/js/viewer-static.min.js"|safeHTML }}
{{ end }}
</div>
<style>
/* Reset the image width that is set in the theme.
If weird styling issues appear, check whether the theme is responsible and, if necessary, unset this here
*/
.geAdaptiveAsset {
width: unset;
}
</style>

7625
static/js/viewer-static.min.js vendored Normal file

File diff suppressed because one or more lines are too long

Binary file not shown.


Binary file not shown.
