Weeks ago I had my own run-in with the attitude of keeping all this secret.
It gets casually mentioned that join_collapse_limit was tried behind the scenes a month ago. Then why are there zero posts or comments in the entire Lemmy search for join_collapse_limit? I searched the entire GitHub project - no mention of join_collapse_limit. But people are ready on the spot to reveal that, in secret private communications, join_collapse_limit was tried long ago.
You know what join_collapse_limit is telling you? Too many JOINs is a performance problem! The entire Rust ORM code and its reliance on new JOINs is going to lead to other unpredictable performance problems that vary when there are 10,000 posts vs 2 million posts! And that’s exactly the history of 2023… watching the code performance swing wildly based on the size of communities being queried, etc.
What I see is that pull requests for ideas get created only after noise is made on a subject. There is a lack of openness to making mistakes in public.
For me, **the server crashes are what annoy me**, not human beings working on solutions. But for most of the people on the project, what seems to annoy them is needing to have proper tabs vs. spaces in source code, and even adding layers of SQL formatting tools in the middle of what can clearly be described as an SQL performance crisis.
Things keep getting broken: the HTML sanitization took a few hours to add to the project, but now there have been weeks of broken titles and HTML code blocks, and even URL parameters are broken on everyday links. The changes to delete behavior have orphaned comments, and that has gone on for weeks now.
Back to Basics
All this INSERT overhead, real-time counting, real-time votes. But it is only chewing up dead tuples with constant rewrites of PostgreSQL rows to +1 every single thing on the site to give non-cached results.
And it isn’t benefiting the SELECT side of reading that data, it’s burdening it.
The subscribed table is likely merged for federated and local users. But when it comes time to list posts, having to sort through remote users’ data in the same table is overhead for every post listing. Same goes for votes, and yes - every SELECT looks at granular votes - because it wants to show the UI which items were already voted on. But it’s a huge amount of data in that table to filter through: all the votes on outdated posts, votes from users not even on this server, etc.
And there are no limits… you could block every person and make the database labor away filtering out all the people you blocked. You can block a community. The testing code to reproduce these edge cases alone is a lot of work that isn’t being done… and it creates a ticking time bomb where some user who hits ‘save’ on every post or ‘block’ on every user throws queries into wild behaviors.
I think some sanity has to be considered, like “2 weeks worth of posts” being how data is organized… and then at least when someone goes wild with saving posts or blocking users, there is a cut-off.
I think the personalization of data should pretty much be an entire post-production layer of the app. The core engine should be focused on post and comment storage and retrieval. “saved post” lists, blocking of instances, blocking of persons… let post-production deal with that.
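Here is a rough sketch of that split, in Python rather than the project's Rust, just to show the shape of the idea. Every name in it (`Post`, `fetch_recent_posts`, `personalize`, the 14-day window) is made up for illustration and is not Lemmy's actual schema or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Post:
    # Hypothetical, stripped-down post record (not Lemmy's real schema).
    id: int
    creator_id: int
    community_id: int
    published: datetime


def fetch_recent_posts(all_posts, now, window=timedelta(days=14)):
    """Core engine: storage and retrieval only, bounded by a time window."""
    cutoff = now - window
    recent = [p for p in all_posts if p.published >= cutoff]
    # Newest first; no per-user logic happens at this layer.
    return sorted(recent, key=lambda p: p.published, reverse=True)


def personalize(posts, blocked_persons=frozenset(), blocked_communities=frozenset()):
    """Post-production: apply one user's block lists to an already-fetched page."""
    return [
        p for p in posts
        if p.creator_id not in blocked_persons
        and p.community_id not in blocked_communities
    ]


if __name__ == "__main__":
    now = datetime(2023, 8, 1, tzinfo=timezone.utc)
    posts = [
        Post(1, 10, 100, now - timedelta(days=1)),
        Post(2, 11, 100, now - timedelta(days=30)),  # outside the window
        Post(3, 12, 101, now - timedelta(days=2)),
    ]
    page = personalize(fetch_recent_posts(posts, now), blocked_persons={10})
    print([p.id for p in page])
```

The point of the split is that the core fetch costs the same no matter how big anyone's block list is; only the second step depends on the user.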
There will be major world news events where people want to get in and see the latest comments, and the code will be crashing left and right because of personal block lists that some handful of users built up to 80,000 people (on a single account) with some script file. Meanwhile, nobody has made a test script file to see what happens at 80,000 people on a block list…
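A test script along those lines could start out as small as the sketch below. Big caveats: it uses an in-memory SQLite database as a stand-in, not PostgreSQL, and the `post` and `person_block` tables are simplified assumptions, not Lemmy's real schema or queries. Timings from it say nothing about production; it only shows the shape of a reproducible 80,000-entry block list test.

```python
import sqlite3
import time


def build_db(n_posts=20_000, n_blocked=80_000, blocker_id=1):
    """Create an in-memory database where one account blocks n_blocked people."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE post (id INTEGER PRIMARY KEY, creator_id INTEGER NOT NULL);
        CREATE TABLE person_block (
            person_id INTEGER NOT NULL,
            target_id INTEGER NOT NULL,
            PRIMARY KEY (person_id, target_id)
        );
        """
    )
    # Post i is written by person i, so even-numbered creators end up blocked.
    db.executemany(
        "INSERT INTO post (id, creator_id) VALUES (?, ?)",
        ((i, i) for i in range(1, n_posts + 1)),
    )
    # The blocker blocks every even-numbered person id: 2, 4, ..., 2 * n_blocked.
    db.executemany(
        "INSERT INTO person_block (person_id, target_id) VALUES (?, ?)",
        ((blocker_id, 2 * k) for k in range(1, n_blocked + 1)),
    )
    db.commit()
    return db


def visible_posts(db, viewer_id, limit=20):
    """List the newest posts whose creator the viewer has not blocked."""
    rows = db.execute(
        """
        SELECT p.id FROM post AS p
        WHERE NOT EXISTS (
            SELECT 1 FROM person_block AS b
            WHERE b.person_id = ? AND b.target_id = p.creator_id
        )
        ORDER BY p.id DESC
        LIMIT ?
        """,
        (viewer_id, limit),
    ).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    db = build_db()
    start = time.perf_counter()
    page = visible_posts(db, viewer_id=1)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{len(page)} posts listed in {elapsed_ms:.2f} ms")
```

The same idea pointed at a real PostgreSQL test instance, with the project's actual listing query, is what would actually answer the question.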
Ok… so where to begin?
Language choices. I think it’s a noble gesture, but it’s hard to ignore the overhead factor and all the end users who accidentally hide their posts and comments by getting confused by it.
All sorts but “Most Comments”, “Old”, and “Controversial” come down to recent posts. Nobody is complaining about a 3 week old post not appearing… with one exception, featured. I think I have some tricks to play with featured. Can some basic sanity be added to the project by putting a limit on time? 3 days? Are most people here to browse the most recent 3 days of content? 7 days? Can all data be divided and organized around this? With the exception being: single community?
Is there a limp mode? Something short of Beehaw and Lemmy.world turning off their entire front page needs to be built into the app. I think it needs to be done. In emergency / limp mode, you could cut off old data, or cut off personalization.
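As a sketch of what a limp mode switch might look like (in Python for brevity, with `ListingPolicy`, `choose_policy`, and the 3-day figure all being hypothetical names and numbers, not anything that exists in Lemmy today):

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class ListingPolicy:
    max_age: Optional[timedelta]  # None means no cut-off on post age
    personalization: bool         # apply block lists, saved state, vote state


NORMAL = ListingPolicy(max_age=None, personalization=True)
LIMP = ListingPolicy(max_age=timedelta(days=3), personalization=False)


def choose_policy(limp_mode: bool, single_community: bool) -> ListingPolicy:
    """Pick how much work a post listing request is allowed to do."""
    if not limp_mode:
        return NORMAL
    if single_community:
        # Assumption: inside one community, old posts stay reachable,
        # but personalization is still skipped while in limp mode.
        return ListingPolicy(max_age=None, personalization=False)
    return LIMP


if __name__ == "__main__":
    print(choose_policy(limp_mode=True, single_community=False))
```

An admin flipping one flag like this is a lot less drastic than taking the whole front page offline.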
I think the project has fundamentally misinformed the population that servers are too busy because of too many users. I just don’t see that many users!! Everything I see is too many JOIN statements! Moving to brand-new servers starts with zero data; that’s why it worked. Lemmy.world has way more data than some empty instance that is 3 weeks old. And the project leaders have failed to understand or communicate this basic issue.