This article helped define the “data engineer” role, so I’d say it belongs here!

Although some time has passed, I find it very relevant: SQL is used more than ever, graphical ETL tools that don’t output code are rare, and vendors are still trying to convince executives to entrust all their data to proprietary data warehouses.

The author, Maxime Beauchemin, also created Airflow and Superset, so they have some experience worth listening to.

  • CodeBlooded@programming.dev
    11 months ago

    Unlike data scientists — and inspired by our more mature parent, software engineering — data engineers build tools, infrastructure, frameworks, and services.

    Just a comment on this note: at my company, I started changing our job posting titles from “Data Engineer” to “Software Engineer, Data.” “Data Engineer” is such a loose title, and its definition seems to change from company to company. I found that those “data engineer” postings attracted lots of applicants who knew enough SQL to be dangerous but, programming-wise, didn’t know much beyond making a “hello world” or a calculator in a single Python script. Software “best practices” and design principles were nowhere to be found. Those applicants were more “data analytics engineers” than “developers.”

    Once the job titles changed to “software” engineering, we got the engineers we were looking for.

    • Reader9@programming.dev (OP)
      11 months ago

      That seems like a good idea, since it removes ambiguity about which skills are necessary.

      When joining a new company, I once asked a wise colleague, “Are you a data engineer or a backend engineer?” They replied, “I’m a software engineer,” and ever since I have given the same answer, for reasons similar to your post.

      I have also seen “data engineer” used at Facebook to indicate someone who writes SQL but not other programming languages, which is another potential reason not to use it as a job title, IMO.

  • neil@programming.dev
    11 months ago

    Man, SSIS really stunk. You’d end up having to write your own components anyways and had the extra layer of making them look like pricey RAD toolkit bits to satisfy empty suits. And then you’d have to write SSIS packages that wrote SSIS packages to deal with fluid schemas from multiple teams deploying all of the time.

    • Jim@programming.dev
      11 months ago

      I’ve said this before to other people, but over time, those tools eventually became what Airflow and other orchestration tools are: defining DAGs and running scripts.

      When I was using SSIS, eventually, every task was a C# or PowerShell executor instead of using the built-in functionality. So glad for Airflow and other modern tools today.
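To illustrate the "defining DAGs and running scripts" idea, here is a minimal sketch using only the Python standard library. It is not how Airflow or SSIS work internally; the step names and the `run_dag` helper are hypothetical, and each step is a plain function standing in for a script.

```python
from graphlib import TopologicalSorter


def run_dag(dag, scripts):
    """Run every step after its dependencies and return the execution order."""
    order = []
    # static_order() yields each node only after all of its predecessors.
    for name in TopologicalSorter(dag).static_order():
        scripts[name]()
        order.append(name)
    return order


log = []

# Hypothetical steps; in practice each one might shell out to a script.
scripts = {
    "extract": lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
    "load": lambda: log.append("load"),
}

# Map each step to the set of steps it depends on.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

order = run_dag(dag, scripts)
```

Real orchestrators add scheduling, retries, and logging on top, but the core shape is the same: a dependency graph plus something to execute at each node.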

      • Reader9@programming.dev (OP)
        11 months ago

        those tools eventually became what Airflow and other orchestration tools are: defining DAGs and running scripts

        Definitely. It is much more pleasant to work with better tools for the same functionality.

        Airflow got a lot of things right. For example, in Luigi a runnable “task” is a Python class that gets implicitly executed, whereas in Airflow tasks are made from functions that get called in a more straightforward, imperative manner. This makes DAGs much easier to read and write in Airflow.
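A rough sketch of the two styles, written in plain Python so it runs without either library installed. The `Task` base class and `build` runner are hypothetical stand-ins that only mimic the shape of Luigi's `requires()`/`run()` pattern (real Luigi passes data through output targets, not return values), and the function version only approximates how function-based tasks compose in Airflow.

```python
# Style 1: class-based tasks, in the spirit of Luigi.
# A runner discovers dependencies via requires() and calls run() for you.
class Task:
    def requires(self):
        return []

    def run(self, *inputs):
        raise NotImplementedError


def build(task):
    """Toy runner: resolve dependencies recursively, then run the task."""
    inputs = [build(dep) for dep in task.requires()]
    return task.run(*inputs)


class Extract(Task):
    def run(self):
        return [1, 2, 3]


class Double(Task):
    def requires(self):
        return [Extract()]

    def run(self, rows):
        return [r * 2 for r in rows]


class_result = build(Double())


# Style 2: function-based tasks, closer to how Airflow pipelines read.
# The data flow is visible directly in the call expression.
def extract():
    return [1, 2, 3]


def double(rows):
    return [r * 2 for r in rows]


function_result = double(extract())
```

Both versions compute the same thing; the difference is that in the class version you have to read `requires()` on each class to reconstruct the pipeline, while the function version shows it in one line.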