I though I’d better make a post about recent lemmy.world federation issues, since not everyone sees the daily posts.

TL;DR:

  • Lemmy.world inbound federation is about 18 hours behind, and slowly catching up.
  • Outbound is fine: If you post or comment on something in a lemmy.world community, it will be almost instantly federated to other users, but if they reply to you then it will take 18 hours before you see it.
  • It was caused by an assumed Kbin bug in combination with a lemmy bug
  • All other instances seem fine for inbound and outbound federation

On Thursday last week Lemmy.world began to receive hundreds of thousands of “activities” (actions - comment, post, upvote, etc) from Kbin to their communities. These actions then federate from Lemmy.world out to any instance with users subscribed to those communities.

Unfortunately, this also helped uncover a big issue with how Lemmy handles inbound federation. Actions are sent one at a time, and the next can’t be sent until the last is done. This means the number of activities your server can receive depends on the latency between the sending and receiving server. In our case, the fact we are hosted in Auckland and Lemmy.world is hosted in Finland means that we can receive about 4 or 5 inbound activities from them per second.

When the Kbin issue happened, suddenly Lemmy.world had hundreds of thousands of activities to send to us in addition to the normal inflow of content. We got way behind in terms of Lemmy.world content coming to us. At the peak, we were around 28 hours behind, about 550k activities behind. We are now sitting at around 19 hours behind, and will likely remain that way until this afternoon, when it should improve some more (it’s now peak time on the other side of the world).

Here is a graph of the progress. Aussie.zone has a similar problem to us, but is closer to Europe where lemmy.world is hosted so the latency issue is less pronounced. We are one of the worst affected instances due to how it’s hard to get further away than where we are now.

graph of lemmy.nz and aussie.zone showing how many activities each are behind, over the past 3 or 4 days. Huge jump, peaking at 550k, then slowly coming down, lemmy.nz about halfway down (250k), and aussie.zone finally reaching the bottom, to be only a few hundred behind (seconds or minutes of delay instead of hours)

As you can see, with Aussie.zone being a bit closer, they were hit hard but not as hard as us, and have now managed to recover. We are still probably days away from being back in near real time federation with lemmy.world, but each day does seem to improve quite a bit.

Other than picking up the server and moving it to Europe, there doesn’t seem to be a lot we can do except wait and hope the lemmy bug is fixed before lemmy.world grows so large that users are creating content faster than we can receive it.

There have been some suggestions for short term band-aids. One is that we could set up a proxy in Europe, and filter out certain traffic. For example, we could prevent downvotes being sent to reduce the amount of traffic coming our way (since everything is sent one at a time, as per the ActivityPub protocol). I’m not super keen on doing anything that gets us out of sync with the others, though.

We could of course move hosting to Europe, but that gets a lot more expensive (since we’d have to pay for it instead of being free-loaders) and also means the site would feel more sluggish, since then that latency would be for users on every page load instead of the behind the scenes federation.

For now we are slowly getting back in sync with lemmy.world, and as long as there aren’t more Kbin bugs then we should be ok for now. Unfortunately doing things like blocking Kbin.social doesn’t seem to help, as lemmy.world still tries to send us the content from Kbin.social users. i.e. it’s rejected by our server, not prevented from being sent, so it still takes a slot.

Any questions, suggestions, or concerns, reply to this post!

  • deadbeef79000@lemmy.nz
    link
    fedilink
    arrow-up
    4
    ·
    8 months ago

    High latency to the rest of the world is our price to pay for being far enough away we survive nuclear winter.

    I can live with that!

    Hold the line until Lemmy is fixed to hassle federation annually asynchronously: it’s the only actual fix.

    • Dave@lemmy.nzOPM
      link
      fedilink
      arrow-up
      2
      ·
      8 months ago

      For the moment, holding the line is ok. We are slowly catching up. But if the fix doesn’t come soon, and lemmy.world continues to grow, there will be a time that we reach the tipping point and will never be able to keep up.

      Hopefully the fix comes before then!

      • deadbeef79000@lemmy.nz
        link
        fedilink
        arrow-up
        3
        ·
        8 months ago

        Kind of what I was meaning, in that it’s a fundamental flaw with lemmy’ federated implementation.

        It will fall over eventually, no matter how low one’s latency is.

        • Dave@lemmy.nzOPM
          link
          fedilink
          arrow-up
          3
          ·
          8 months ago

          Yeah, just unlucky we are one of the first affected. We are still slowly improving, so hopefully the bide our time approach will work.

    • Dave@lemmy.nzOPM
      link
      fedilink
      arrow-up
      2
      ·
      8 months ago

      That’s it! Apologies, I had thought I’d included the link, I must have forgotten to add it.

  • Piotr Sikora@pol.social
    link
    fedilink
    arrow-up
    2
    ·
    8 months ago

    @Dave hi! I’am working with kbin.social infrastructure, and we now fixed some issues. Can you try to re-enable federation with kbin.social?

    • Dave@lemmy.nzOPM
      link
      fedilink
      arrow-up
      1
      ·
      8 months ago

      I only had kbin.social on our block list for a day or so. when it didn’t help I removed it. We should be federating, and should have been for several days!