A very basic check that tries to connect to every known instance once a day. If the connection fails or returns anything other than HTTP 200, the instance is marked as dead and no federation activities are sent to it.

This implementation is really basic: there can be false positives if an instance is temporarily down or unreachable during the check. It also rechecks all known instances every day, even if they have been down for years. Nevertheless, it should be a major improvement, and we can add more sophisticated checks later.
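The check described above is simple enough to sketch. The following is a minimal, standalone approximation of the idea (not the actual Lemmy code), assuming the reqwest and tokio crates and a hard-coded stand-in for the server's instance table:

```rust
// Minimal sketch of a daily liveness check; KNOWN_INSTANCES stands in for
// whatever the server actually stores in its instance table.
use std::time::Duration;

const KNOWN_INSTANCES: &[&str] = &["lemmy.ml", "lemm.ee"];

async fn instance_is_alive(client: &reqwest::Client, domain: &str) -> bool {
    // The instance only counts as alive if the request succeeds with HTTP 200.
    match client.get(format!("https://{domain}/")).send().await {
        Ok(resp) => resp.status() == reqwest::StatusCode::OK,
        Err(_) => false,
    }
}

#[tokio::main]
async fn main() {
    let client = reqwest::Client::builder()
        .timeout(Duration::from_secs(10))
        .build()
        .expect("failed to build HTTP client");

    loop {
        for domain in KNOWN_INSTANCES {
            let alive = instance_is_alive(&client, domain).await;
            // A real server would flip a `dead` flag in its instance table here
            // and skip sending federation activities while the flag is set.
            println!("{domain}: alive = {alive}");
        }
        // Re-check all known instances once a day.
        tokio::time::sleep(Duration::from_secs(24 * 60 * 60)).await;
    }
}
```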

Still need to fix two problems mentioned in the comments.

Originally posted by Nutomic in #3427

  • issue_tracking_bot@lemm.ee (OP, bot) · 2 years ago

    There are instances which send traffic out, but block incoming traffic, thus still tying up my federation workers.

    I think the outbound federation code is basically an engine for denial-of-servicing its peers. It makes no allowance for the fact that it is sending votes, comments, and posts over and over to the same 100 servers; it just blindly queues HTTP transactions with no concern that it is the same host it is already trying to communicate with.

    I’ve managed 1990s e-mail MTAs with almost all the same sending problems and more traffic in 1999 than I’ve seen Lemmy do this month, and you have to have awareness of your outbound queue to a particular (familiar/frequent) host. Store and forward is what Lemmy needs to manage the variety of different software (Kbin) and low-budget hardware.

    Right now the outbound design worries too much about not wasting storage, but gives no consideration to just how much overhead there is in HTTP transactions and mindlessly opens connections to the exact same server so many times in a short period. It also does not give server operators an API to monitor their queues and activity, masking background information that is essential for server capacity planning and even for spotting attacks against the content/users of the site.

    Community to Community replication is a huge amount of content, and a single HTTP transaction per message, with all the federation boilerplate and signing, is probably doomed. I do not consider the volume of messages as of July 1 to be that high; the crashing servers and maturing smartphone apps have held back a lot of the content - you get fewer replies for every comment that does not get shared.

    I encourage something drastic on the outbound queue. I would suggest biting the bullet and making a big change now. Three ideas for a new direction:

    1. Put in a SQLite database (don’t put more load on PostgreSQL) and at minimum log every new outbound item there, so the server can tell when sending to a particular host is backed up. Maybe don’t store the individual comments and posts there, only their id from the main PostgreSQL database and which instance they need to be delivered to. (A rough sketch of this appears near the end of this comment.)

    2. Make the MTA part of Lemmy a different server app and service. Queue to the other app.

    3. “Punt” and face the reality that the huge traffic potential of Community replication doesn’t go well with the boilerplate federation JSON structure (bulky overhead), a single HTTP transaction per item, and even the digital-signature overhead. Build a Community-to-Community, Lemmy-to-Lemmy replication agent that uses the front-door API to do posts, comments, and votes. This has to be the majority of the traffic and overhead. The front-end API can already load 300 comments at a time; add some new API paths for accepting bulk input. Now you have a logged-in session and don’t have to put a digital signature on each individual comment. This also allows backfill when servers are down or new. I would make it a pull agent - even a non-logged-in user can fetch comments and posts per community (and users subscribe to community) - that way you don’t need to log in to remote Lemmy servers to pick up new messages (read only). A rough sketch of such a pull agent follows below.
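As a rough illustration of idea 3, here is a minimal pull-agent sketch. It assumes the reqwest (with its json feature), tokio, and serde_json crates; the /api/v3/comment/list endpoint and its parameters are written from memory of the public Lemmy HTTP API and should be checked against the API documentation before relying on them:

```rust
// Sketch of a read-only pull agent that backfills a community's comments
// through the regular HTTP API instead of one signed federation POST per item.

async fn pull_comments(
    client: &reqwest::Client,
    remote: &str,
    community: &str,
    page: u32,
) -> Result<serde_json::Value, reqwest::Error> {
    let page = page.to_string();
    client
        .get(format!("https://{remote}/api/v3/comment/list"))
        .query(&[
            ("community_name", community),
            ("sort", "New"),
            ("limit", "50"),
            ("page", page.as_str()),
        ])
        .send()
        .await?
        .json()
        .await
}

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let client = reqwest::Client::new();
    // No login and no per-item signature: comments and posts are publicly
    // readable, so a new or recovering server could backfill a community by
    // paging through recent content.
    let body = pull_comments(&client, "lemmy.ml", "rust", 1).await?;
    let count = body["comments"].as_array().map_or(0, |c| c.len());
    println!("pulled {count} comments");
    Ok(())
}
```

Because reads are anonymous, the same loop could simply page further back to backfill a community after downtime or when a server is new.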

    Drastic, I say. If a Rust programmer is handy, I’d throw in SQLite right now and build some structures to track, per instance, what is queued outbound for delivery. Also build an API to return some JSON on the queue sizes for server operators.
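And a rough sketch of the SQLite idea from point 1 and the paragraph above: one row per queued outbound activity, so the server can see per instance how far behind delivery is, plus the JSON an operator-facing queue-size endpoint could return. It assumes the rusqlite and serde_json crates; the table, columns, and function names are invented for illustration:

```rust
use rusqlite::{params, Connection, Result};

fn open_queue_db(path: &str) -> Result<Connection> {
    let conn = Connection::open(path)?;
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbound_queue (
             id          INTEGER PRIMARY KEY,
             instance    TEXT NOT NULL,
             activity_id TEXT NOT NULL,   -- only the id; the body stays in PostgreSQL
             queued_at   INTEGER NOT NULL -- unix timestamp
         )",
        [],
    )?;
    Ok(conn)
}

fn enqueue(conn: &Connection, instance: &str, activity_id: &str) -> Result<()> {
    conn.execute(
        "INSERT INTO outbound_queue (instance, activity_id, queued_at)
         VALUES (?1, ?2, strftime('%s','now'))",
        params![instance, activity_id],
    )?;
    Ok(())
}

// Queue depth per instance: the JSON an operator-facing endpoint could return.
fn queue_sizes_json(conn: &Connection) -> Result<serde_json::Value> {
    let mut stmt =
        conn.prepare("SELECT instance, COUNT(*) FROM outbound_queue GROUP BY instance")?;
    let rows: Vec<(String, i64)> = stmt
        .query_map([], |row| Ok((row.get(0)?, row.get(1)?)))?
        .collect::<Result<_>>()?;
    let entries: Vec<serde_json::Value> = rows
        .into_iter()
        .map(|(instance, queued)| serde_json::json!({ "instance": instance, "queued": queued }))
        .collect();
    Ok(serde_json::Value::Array(entries))
}

fn main() -> Result<()> {
    let conn = open_queue_db("outbound_queue.sqlite3")?;
    enqueue(&conn, "lemmy.ml", "https://example.org/activity/1")?;
    println!("{}", queue_sizes_json(&conn)?);
    Ok(())
}
```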

    Originally posted by RocketDerp in #3427