I fed the output of

https://lemmy-federate.com/api/community.find?input=%7B%22skip%22%3A0%2C%22take%22%3A10%7D

into `json_pp | grep name` and got:
"name" : "science_memes",
"name" : "al_gore",
"name" : "applied_paranoia",
"name" : "windowmanagers",
"name" : "hihihi",
"name" : "media_reviews",
"name" : "petits_animaux",
"name" : "twnw",
"name" : "niagaraonthelake",
"name" : "niagarafalls",
That’s it; there are no more names. Inspecting the dataset seems to show a lot of communities, but only their numbers. Is there a separate table that maps community numbers to names?
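For what it’s worth, if each record carries its numeric id next to the name, there may be no separate table at all; the map could be built from the same payload. A minimal sketch, assuming a tRPC-style envelope and an `id` field (both are guesses; only the `name` field is confirmed above):

```python
import json

# Hypothetical sample shaped like a tRPC response; "result", "data" and
# "id" are assumptions -- only the "name" field is confirmed.
payload = json.loads("""
{"result": {"data": [
  {"id": 101, "name": "science_memes"},
  {"id": 102, "name": "al_gore"}
]}}
""")

# Build the number -> name map directly from the records.
id_to_name = {rec["id"]: rec["name"] for rec in payload["result"]["data"]}
print(id_to_name[101])  # science_memes
```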
(previous discussion for reference)
That’s interesting 🤨 It shouldn’t reduce the records until the last page.
The communities-with-relationships dataset is much bigger. I tried grabbing 100 records per fetch in a loop with no sleep or throttle. Page 523 had 100 records, and page 524 came back as an empty file. I restarted with skip at 523 and got to page 531, where it died again, this time leaving a file that ended in the middle of a JSON field.
Any suggestions? I wonder if I should put a 1 or 2 second delay between pages so the server is not overloaded.
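A sketch of what that throttled loop could look like; the HTTP call itself is left abstract since the client isn’t specified, and `TAKE`, `fetch_page`, and the 2-second default delay are all placeholders:

```python
import time

TAKE = 100  # records per request (placeholder)

def fetch_all(fetch_page, delay=2.0):
    """fetch_page(skip) -> list of records; returns [] past the last page."""
    records, skip = [], 0
    while True:
        page = fetch_page(skip)
        if not page:
            break
        records.extend(page)
        skip += len(page)   # skip counts records, not pages
        time.sleep(delay)   # pause between requests so the server isn't hammered
    return records
```

Stopping as soon as a page comes back shorter than `TAKE` would also save the final empty request.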
(update) Wow, this is bulkier than I expected: 966 MB. Hope that didn’t cause any problems. I guess I won’t do that full fetch again. I don’t suppose there’s an API parameter to select records with `updatedAt` newer than a specified date?

(update 2) Is `skip` the number of pages, or of records? I treated it as pages, but it’s starting to look like it’s the number of records, which would mean I grabbed a lot of dupes. Sorry, if that’s the case!

(update 3) Shit… looks like `skip` is the number of records, which makes sense. Sorry for the waste! I’ll fix my script.
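If it helps confirm the diagnosis: with `take=100`, treating `skip` as a page index only advances the window by one record per fetch, so consecutive pages overlap by 99 records. A quick illustration (pure arithmetic, nothing API-specific):

```python
TAKE = 100

# skip misread as a page index: each window advances by only 1 record
as_pages = [(skip, skip + TAKE) for skip in range(3)]
# skip as a record offset: windows tile with no overlap
as_records = [(skip, skip + TAKE) for skip in range(0, 3 * TAKE, TAKE)]

print(as_pages)    # [(0, 100), (1, 101), (2, 102)]
print(as_records)  # [(0, 100), (100, 200), (200, 300)]
```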
Good to hear the problem was that!