Codeberg was asking about this. The linked toot by a commenter points to:

SEqlite

These are CC-BY-SA 4.0 remixes of the Stack Exchange Creative Commons Data Dumps. 100% Unendorsed by Stack Exchange, Inc.

They are minimal. They provide the data you probably care about and the data you need to comply with the original license in SQLite format.

    • ramble81@lemm.ee
      2 months ago

      And guess what, it can be done just as easily, if not more easily, on a federated instance. You don’t gain any real additional control over your data (and no, putting “covered under license X” on your posts is about as realistic as those Facebook posts saying “I don’t give anyone access to my posts”).

      I’ve said this before and I’ll say it again: realistically, the only way to control your data from AI is a DRM-type solution, which everyone fundamentally hates.

      • DaseinPickle@leminal.space
        2 months ago

        I don’t think this can be solved with any type of technology. It needs legislation. These AI companies need regulation.

    • huginn@feddit.it
      2 months ago

      Federated Stack Exchange isn’t harder for AI to eat. If anything it’s easier.