An anticapitalist tech blog. Embrace the technology that liberates us. Smash that which does not.

  • Jo Miran@lemmy.ml
    23 up, 1 down · 6 months ago

    You are polluting the data set. Do it a few times with different text sources and the scrubbers won’t know what part of your comment history is good. Replace, don’t delete.

    • ArbitraryValue@sh.itjust.works
      18 up, 1 down · edited · 6 months ago

      I’m pretty sure they’ll know that the first version of each comment is almost certainly the good one. People sometimes edit a comment to add new information or fix a typo, but they almost never replace nonsense with a good comment, rather than the other way around.

      Edit: fixed typos, also replaced excerpt from Moby Dick with this post.

      Edit 2: the comments you post here are totally available for machine learning, so I don’t see much of a point in deleting my Reddit comments as long as I’m participating in Lemmy.

      • Jo Miran@lemmy.ml
        4 up, 1 down · edited · 6 months ago

        Maybe. I edit almost every comment I make. The key is that by doing this you introduce the possibility that any edited comment is junk. It is actually easier, and safer, for them to just filter out edited comments than it is to try to sort out what’s good and what isn’t. The bottom line is that the best course of action is to avoid Reddit at all costs. If you do go there and feel compelled to comment, then coming back the next day to replace your comments a few times is better than “deleting”.

        • Blue_Morpho@lemmy.world
          10 up · 6 months ago

          They don’t need to filter out edited comments. They keep the first version. It’s good enough.

        • brygphilomena@lemmy.world
          5 up · 6 months ago

          You could easily compare old vs new and see how much has changed. If more is added, edit is good. If 80% matches, it was probably minor fixes.

          If nothing matches, then drop the edit from the data set and use the original comment, which I’m sure they still have.
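The old-vs-new comparison described above can be sketched in a few lines. This is only an illustration, not anything Reddit is known to run: the function names, the word-level matching, the label names, and the 20% "replaced" cutoff are all assumptions. Only the 80% figure comes from the comment.

```python
from difflib import SequenceMatcher


def classify_edit(original: str, edited: str,
                  minor_threshold: float = 0.8,
                  replaced_threshold: float = 0.2) -> str:
    """Label an edit by how much of the original comment survives it.

    Both thresholds are illustrative; only the 80% figure is from the thread.
    """
    old_words = original.split()
    new_words = edited.split()
    if old_words == new_words:
        return "unchanged"
    if not old_words:
        return "addition"
    # Compare word sequences rather than characters, so unrelated text
    # does not score a match just for sharing common letters.
    matcher = SequenceMatcher(None, old_words, new_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    kept = matched / len(old_words)  # fraction of original words still present
    if kept >= minor_threshold:
        # Original is mostly intact: either text was appended or it was a small fix.
        return "addition" if len(new_words) > len(old_words) else "minor"
    if kept <= replaced_threshold:
        return "replaced"
    return "rewrite"  # ambiguous middle ground


def choose_text(original: str, edited: str) -> str:
    """Keep the edit only when the original clearly survives it."""
    label = classify_edit(original, edited)
    # Assumption: ambiguous rewrites fall back to the original as well.
    return edited if label in ("unchanged", "minor", "addition") else original
```

For example, appending "Edit: fixed typos" to a comment is labeled an addition and kept, while swapping the comment for a Moby Dick excerpt is labeled replaced and the original is used instead.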

          • Blue_Morpho@lemmy.world
            1 up · 6 months ago

            They know people are messing with the data, so they aren’t going to trust anything changed a few days after it was first posted.

    • originalfrozenbanana@lemm.ee
      6 up · 6 months ago

      Not in a meaningful way. It’s easy to detect and revert a change like this. Instead of bulk changing all your comments, you should slowly change them over time.

      Even then, users don’t usually edit most of their comments. Sure, Reddit might be naive and just take the current comments, but it’s pretty trivial to reverse this kind of thing.

      It’s probably still good to do it, since it makes this process harder and more error-prone for Reddit, but I would not be under the impression that it has an impact beyond being annoying.

    • 4am@lemm.ee
      2 up · 6 months ago

      Or it’ll help train the AI to recognize when that happens and more easily parse history for the relevant stuff.

      • Car@lemmy.dbzer0.com
        3 up · 6 months ago

        It already happened last year during the Reddit exodus. The AI models either validate the data or not. This has a chance of working, which is better than doing nothing at all.

    • GBU_28@lemm.ee
      2 up, 1 down · 6 months ago

      Over a long period, sure. If they see a spike where, say, 25% of a user’s comments are changed in a day, then they’ll just use day -1.
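The spike check described above might look something like the following. It is a rough sketch under stated assumptions: the `rollback_date` name, the `(comment_id, day)` edit-log shape, and the idea of one snapshot per day are hypothetical, and only the 25% figure comes from the comment.

```python
from collections import defaultdict
from datetime import date, timedelta


def rollback_date(edits, total_comments, spike_fraction=0.25):
    """Return the snapshot date to fall back to, or None if no spike is seen.

    `edits` is an iterable of (comment_id, day) pairs for a single user,
    a hypothetical log format chosen for this example.
    """
    edited_per_day = defaultdict(set)
    for comment_id, day in edits:
        # A set counts each comment once per day, however often it was edited.
        edited_per_day[day].add(comment_id)

    for day in sorted(edited_per_day):
        share = len(edited_per_day[day]) / total_comments if total_comments else 0.0
        if share >= spike_fraction:
            # "Day -1": the last snapshot taken before the first bulk edit.
            return day - timedelta(days=1)
    return None
```

Edits spread thinly over many days never reach the threshold in this sketch, which is the "over a long period" case the comment concedes.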