2023-08-19

RoundSparrow @ BT@bulletintree.com · 1 year ago

Back to Basics

All this INSERT overhead: real-time counting, real-time votes. But all it does is chew up dead tuples, constantly rewriting PostgreSQL rows to +1 every single thing on the site just to give non-cached results.
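
Here is a rough, purely hypothetical sketch of the alternative in Python (not how Lemmy counts today): keep score deltas in memory and flush them periodically, so 1,000 upvotes on one post become one row rewrite instead of 1,000. `VoteBuffer` and the `write_batch` callback are made-up names, and the actual UPDATE is left to the callback.

```python
from collections import Counter
from typing import Callable, Dict


class VoteBuffer:
    """Hypothetical buffer that batches score changes in memory.

    Rather than one UPDATE per vote (each leaving a dead tuple behind in
    PostgreSQL), deltas are summed per post and handed to write_batch in
    one call when flush() runs, e.g. every few seconds.
    """

    def __init__(self, write_batch: Callable[[Dict[int, int]], None]) -> None:
        self._pending: Counter = Counter()
        self._write_batch = write_batch

    def record_vote(self, post_id: int, delta: int) -> None:
        # Only touches memory; no database write happens here.
        self._pending[post_id] += delta

    def flush(self) -> Dict[int, int]:
        # Skip posts whose votes cancelled out; they need no write at all.
        batch = {pid: d for pid, d in self._pending.items() if d != 0}
        self._pending.clear()
        if batch:
            self._write_batch(batch)
        return batch


writes = []
buf = VoteBuffer(writes.append)
for _ in range(1000):
    buf.record_vote(42, +1)
buf.record_vote(7, +1)
buf.record_vote(7, -1)
print(buf.flush())  # {42: 1000}
```

The trade-off is that counts lag by one flush interval, and anything still buffered is lost if the process dies. This only covers the aggregate counters; the per-user vote rows still have to be stored somewhere.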

And it isn’t benefiting the SELECT side of reading that data; it’s burdening it.

The subscribed table likely mixes federated and local users. But when it comes time to list posts, having to wade through remote users’ data in the same table is overhead on every post listing. The same goes for votes, and yes, every SELECT looks at granular votes, because it wants to show the UI which items were already voted on. But that table holds a huge amount of data that has to be filtered out: votes on outdated posts, votes from users not even on this server, etc.
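
To illustrate the layout I mean, here is a hypothetical sketch with plain dicts standing in for tables. Votes are routed by the voter’s home server, and the “which items did I vote on” lookup only ever touches the local side, and only for the posts on the current page. In PostgreSQL this might be two tables or a partitioned table; `SplitVotes` and its methods are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple

VoteKey = Tuple[str, int]  # (person actor, post_id)


@dataclass
class SplitVotes:
    """Hypothetical split of vote rows by where the voter's account lives."""

    local_domain: str
    local: Dict[VoteKey, int] = field(default_factory=dict)
    remote: Dict[VoteKey, int] = field(default_factory=dict)

    def add(self, person: str, post_id: int, score: int) -> None:
        # Route by the account's home server, e.g. "alice@bulletintree.com".
        is_local = person.endswith("@" + self.local_domain)
        (self.local if is_local else self.remote)[(person, post_id)] = score

    def my_votes(self, person: str, post_ids: Iterable[int]) -> Dict[int, int]:
        # Vote flags for the UI: only the local store is consulted, and only
        # for the posts actually on the page being rendered.
        return {
            pid: self.local[(person, pid)]
            for pid in post_ids
            if (person, pid) in self.local
        }


votes = SplitVotes(local_domain="bulletintree.com")
votes.add("alice@bulletintree.com", 1, 1)
votes.add("bob@lemmy.world", 1, -1)
print(votes.my_votes("alice@bulletintree.com", [1, 2]))  # {1: 1}
```

Remote votes still count toward scores; they just never sit in the path of the per-viewer lookup.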

And there are no limits… you could block every person and make the database labor away filtering out all the people you blocked. You can block communities too. The testing code to reproduce these edge cases alone is a lot of work that isn’t being done… and it leaves a time bomb sitting there: some user who hits ‘save’ on every post or ‘block’ on every user sends queries into wild behavior.

I think some sanity has to be built in, such as organizing data around “2 weeks worth of posts”… so that at least someone who goes wild with saving posts or blocking users runs into a cut-off.
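
A minimal sketch of what a cut-off could look like, assuming a hard cap on person blocks and a fixed two-week listing window. `MAX_BLOCKED_PERSONS`, `BlockLimitReached` and both helpers are hypothetical, and 1,000 is a placeholder rather than a measured number.

```python
from datetime import datetime, timedelta, timezone
from typing import Set

# Hypothetical limits; the right numbers would need measuring.
MAX_BLOCKED_PERSONS = 1_000
LISTING_WINDOW = timedelta(weeks=2)  # "2 weeks worth of posts"


class BlockLimitReached(Exception):
    """Raised instead of letting a block list grow without bound."""


def add_person_block(blocked: Set[int], target_id: int) -> None:
    if target_id in blocked:
        return  # already blocked; nothing to do
    if len(blocked) >= MAX_BLOCKED_PERSONS:
        raise BlockLimitReached(
            f"block list is capped at {MAX_BLOCKED_PERSONS} accounts"
        )
    blocked.add(target_id)


def listing_cutoff(now: datetime) -> datetime:
    # Posts published before this instant fall outside the default listing.
    return now - LISTING_WINDOW


blocked: Set[int] = set()
for target in range(MAX_BLOCKED_PERSONS):
    add_person_block(blocked, target)
try:
    add_person_block(blocked, 999_999)
except BlockLimitReached as err:
    print(err)  # block list is capped at 1000 accounts
print(listing_cutoff(datetime(2023, 8, 19, tzinfo=timezone.utc)))
# 2023-08-05 00:00:00+00:00
```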

I think the personalization of data should pretty much be an entire post-production layer of the app. The core engine should be focused on post and comment storage and retrieval. “saved post” lists, blocking of instances, blocking of persons… let post-production deal with that.
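
Here is one hypothetical shape for that split, in Python: the core engine hands over an already-ranked page that is the same for everyone, and a separate step drops blocked persons, communities and instances in memory. `Post`, `Personalization` and `personalize` are illustrative names, not Lemmy’s.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Post:
    id: int
    creator_id: int
    community_id: int
    instance: str


@dataclass(frozen=True)
class Personalization:
    blocked_persons: FrozenSet[int] = frozenset()
    blocked_communities: FrozenSet[int] = frozenset()
    blocked_instances: FrozenSet[str] = frozenset()


def personalize(core_page: List[Post], prefs: Personalization, limit: int) -> List[Post]:
    """Post-production step: filter a page the core engine already ranked.

    The core query is identical for every viewer (so it can be cached);
    per-user blocks are applied here in memory instead of as SQL joins.
    """
    kept: List[Post] = []
    for post in core_page:
        if (
            post.creator_id in prefs.blocked_persons
            or post.community_id in prefs.blocked_communities
            or post.instance in prefs.blocked_instances
        ):
            continue
        kept.append(post)
        if len(kept) == limit:
            break
    return kept


page = [
    Post(1, creator_id=10, community_id=100, instance="bulletintree.com"),
    Post(2, creator_id=11, community_id=100, instance="lemmy.world"),
    Post(3, creator_id=12, community_id=200, instance="bulletintree.com"),
]
prefs = Personalization(
    blocked_persons=frozenset({10}),
    blocked_instances=frozenset({"lemmy.world"}),
)
print([p.id for p in personalize(page, prefs, limit=20)])  # [3]
```

Because each check is a set lookup, the cost of this step grows with the page size rather than with the length of the block list, so a huge block list costs about the same per post as a short one once the set is loaded. The catch is that the core should over-fetch a little, since filtering can leave a page short.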

There will be major world news events where people want to get in and see the latest comments, and the code will be crashing left and right because of personal block lists that a handful of users built up to 80,000 people (on a single account) with some script. Meanwhile, nobody has written a test script to see what happens with 80,000 people on a block list…
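
The test script itself isn’t hard to start. Below is a hypothetical Python sketch that produces an 80,000-row block list for one account as CSV, which could then be bulk-loaded with PostgreSQL’s `COPY ... FROM ... WITH (FORMAT csv)`. The two-column layout (blocker id, blocked id) and `block_list_fixture` are assumptions; match them to the real block table, and create the target person rows first if the table has foreign keys.

```python
import csv
import io


def block_list_fixture(blocker_id: int, n_blocked: int, first_target_id: int = 1_000_000) -> str:
    """Return CSV rows of (blocker, blocked) person ids.

    The column layout is an assumption; adjust it to the real block table
    before loading it with PostgreSQL's COPY ... WITH (FORMAT csv).
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for i in range(n_blocked):
        writer.writerow([blocker_id, first_target_id + i])
    return out.getvalue()


# The 80,000-entry case from the post, for a single account.
fixture = block_list_fixture(blocker_id=1, n_blocked=80_000)
print(fixture.count("\n"))      # 80000
print(fixture.splitlines()[0])  # 1,1000000
```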

RoundSparrow @ BT@bulletintree.com · 1 year ago

Ok… so where to begin?

1. Language choices. I think it’s a noble gesture, but it’s hard to ignore the overhead, and all the end users who accidentally hide their own posts and comments because they get confused by it.
2. All sorts except “Most Comments”, “Old”, and “Controversial” come down to recent posts. Nobody is complaining about a 3-week-old post not appearing… with one exception: featured posts. I think I have some tricks to play with featured. Can some basic sanity be added to the project by putting a limit on time? 3 days? Are most people here to browse the most recent 3 days of content? 7 days? Can all data be divided and organized around this? The exception being: single-community views?
3. Is there a limp mode? Something short of Beehaw and Lemmy.world turning off their entire front page needs to be built into the app. I think it has to be done. In emergency / limp mode, you could cut off old data, or cut off personalization.
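
A hypothetical sketch combining the time limit and limp mode ideas above: one listing function with a time window, the featured and single-community exceptions, and a limp-mode policy that shortens the window and switches personalization off. The 7-day and 1-day windows are placeholders, and every name here is invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Post:
    id: int
    community_id: int
    published: datetime
    featured: bool = False


@dataclass(frozen=True)
class ListingPolicy:
    window: timedelta  # how far back front-page listings reach
    personalize: bool  # whether per-user blocks/saves are applied


def policy_for(limp_mode: bool) -> ListingPolicy:
    # Window lengths are hypothetical placeholders (the post floats 3 or 7 days).
    if limp_mode:
        # Emergency: cut off old data and skip personalization entirely.
        return ListingPolicy(window=timedelta(days=1), personalize=False)
    return ListingPolicy(window=timedelta(days=7), personalize=True)


def listing(posts: List[Post], now: datetime, policy: ListingPolicy,
            community_id: Optional[int] = None) -> List[Post]:
    if community_id is not None:
        # Single-community view: the one exception that keeps full history.
        return [p for p in posts if p.community_id == community_id]
    cutoff = now - policy.window
    # Featured posts stay visible regardless of age.
    return [p for p in posts if p.featured or p.published >= cutoff]


now = datetime(2023, 8, 19, tzinfo=timezone.utc)
posts = [
    Post(1, 100, now - timedelta(hours=2)),
    Post(2, 100, now - timedelta(days=3)),
    Post(3, 200, now - timedelta(days=30), featured=True),
    Post(4, 200, now - timedelta(days=30)),
]
print([p.id for p in listing(posts, now, policy_for(limp_mode=False))])  # [1, 2, 3]
print([p.id for p in listing(posts, now, policy_for(limp_mode=True))])   # [1, 3]
print([p.id for p in listing(posts, now, policy_for(False), community_id=200)])  # [3, 4]
```

A caller would check `policy.personalize` and skip the block/saved filtering step entirely when it is `False`.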

I think the project has fundamentally misled people into believing servers are too busy because of too many users. I just don’t see that many users!! Everything I see is too many JOIN statements! Moving to new virgin servers worked because they start with zero data. Lemmy.world has way more data than some empty instance that is 3 weeks old. And the project leaders have failed to understand or communicate this basic issue.