I’m a tech interested guy. I’ve touched SQL once or twice, but wasn’t able to really make sense of it. That combined with not having a practical use leaves SQL as largely a black box in my mind (though I am somewhat familiar with technical concepts in databasing).
With that, I keep seeing [pic related] as proof that Elon Musk doesn’t understand SQL.
Can someone give me a technical explanation for how one would come to that conclusion? I’d love if you could pass technical documentation for that.
As a data engineer for the past 20+ years: There is absolutely no fucking way that the US gov doesn’t use SQL. This is what shows that he’s stupid not only in SQL but in data science in general.
Regarding duplications: it’s more nuanced than the statements each side has put out. There can be duplications in certain situations. In some situations there shouldn’t be. And I don’t really see how duplications in a DB open it up to fraud.
Well we heard what the White House press secretary has to say about the fraud they found 2 days ago. They found massive amounts and she brought receipts! All of them were examples of money being spent in ways that disagree with Trump’s new policies. Like money spent on DEI initiatives and aid sent to countries in Africa to help slow the spread of HIV. That receipt was for a laughable $57,000.
Then when asked how any of it was fraud she said, well they consider that fraud because it wasn’t used to help Americans.
So the 27 year old married to a billionaire 32 years older than her is complaining that the money wasnt directly spent on her gold digging ass, and if it’s not spent directly on her, it’s fraud.
Biggest disgrace of a government that has ever existed.
Yeah, obviously ol’ boy is tripping if he thinks SQL isn’t used in the government.
Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database (thus showing the First Bro doesn’t understand how SQL works).
If it’s used as an identifier to link together rows from different tables. Also known as “joining” tables. SSN (with birthdate) is a unique identifier, and so it’s natural to choose as a primary/foreign key.
It really is baffling trying to make sense of what he is saying. It’s like the only explanation that makes any sense at all is that he has no idea what he is talking about. Even if he knew just cursory knowledge about database cardinality you wouldn’t say stuff so stupid.
Oh yeah? How about SCD? I bet all ssn are in an SCD.
It doesn’t matter without scope. Are we looking at a database of SSNs? tax records? A sign in log? The social security number database might require uniques in some way, but tax records could be the same person over multiple years. A sign in gives a unique identifier but you could be signing in every day.
It’s like saying a car VIN shows up multiple times in a database. Where? What database? Was it sold? Tickets? Registered every year?
This is nothing more than an “assume I mean immigrants or tax fraud and get mad!” inflammatory statement with no proof or reason.
There can be duplicate SSNs due to name changes of an individual, that’s the easiest answer. In general, it’s common to just add a new record in cases where a person’s information changes so you can retain the old record(s) and thus have a history for a person (look up Slowly Changing Dimensions (SCD)). That’s how the SSA is able to figure out if a person changed their gender, they just look up that information using the same SSN and see if the gender in the new application is different from the old data.
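A minimal sketch of the "append a new record instead of overwriting" idea (roughly SCD Type 2), using Python's built-in `sqlite3`. The table name, column names, and data are all made up for illustration; nothing here is claimed to match how the SSA actually stores anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person_history (
        row_id     INTEGER PRIMARY KEY,  -- surrogate key, unique per row
        ssn        TEXT NOT NULL,        -- repeats once per version of the person
        full_name  TEXT NOT NULL,
        valid_from TEXT NOT NULL,
        valid_to   TEXT                  -- NULL marks the current version
    )
""")


def change_name(conn, ssn, new_name, change_date):
    """Close the current row for this SSN, then append a new current row."""
    conn.execute(
        "UPDATE person_history SET valid_to = ? WHERE ssn = ? AND valid_to IS NULL",
        (change_date, ssn),
    )
    conn.execute(
        "INSERT INTO person_history (ssn, full_name, valid_from) VALUES (?, ?, ?)",
        (ssn, new_name, change_date),
    )


# Hypothetical data: one person, one name change.
conn.execute(
    "INSERT INTO person_history (ssn, full_name, valid_from) VALUES (?, ?, ?)",
    ("000-00-0001", "Jane Doe", "1990-01-01"),
)
change_name(conn, "000-00-0001", "Jane Smith", "2015-06-01")

# Full history: two rows share the same SSN, on purpose.
history = conn.execute(
    "SELECT full_name, valid_to FROM person_history WHERE ssn = ? ORDER BY valid_from",
    ("000-00-0001",),
).fetchall()

# Current view: exactly one row per SSN.
current = conn.execute(
    "SELECT full_name FROM person_history WHERE ssn = ? AND valid_to IS NULL",
    ("000-00-0001",),
).fetchone()[0]
```

The SSN shows up twice in the table, but a query that filters on `valid_to IS NULL` still sees one current row per person, so the "duplicate" is history rather than a second beneficiary.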
Another accusation Elon made was that payments are going to people missing SSNs. The best explanation I have for that is that various state departments have their own on-premise databases and their own structure and design that do not necessarily mirror the federal master database. There are likely some databases where the SSN field is set up to accept strings only, since in real life, your SSN on your card actually has dashes, and those dashes make the number into a string. If the SSN is stored as a string in a state database, then when it’s brought over to the federal database (assuming the federal DB is using a number field instead of text), there can be some data loss, resulting in a NULL.
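To make that failure mode concrete, here is a small Python sketch of a strict text-to-number load. It is purely illustrative: whether any real state-to-federal import behaves like this is an assumption, and both function names are hypothetical.

```python
from typing import Optional


def strict_numeric_load(raw: Optional[str]) -> Optional[int]:
    """Mimic a strict loader: text that is not a plain number becomes None (NULL)."""
    if raw is None:
        return None
    try:
        return int(raw)
    except ValueError:
        # "000-12-3456" is not a valid integer literal because of the dashes.
        return None


def normalized_load(raw: Optional[str]) -> Optional[int]:
    """Strip the dashes first, then load as a number."""
    if raw is None:
        return None
    return strict_numeric_load(raw.replace("-", ""))


dashed = "000-12-3456"
lost = strict_numeric_load(dashed)   # None: the value is silently dropped
kept = normalized_load(dashed)       # 123456: loads, but the leading zeros are gone
```

The second result hints at why a numeric column is a questionable choice for SSNs in the first place: even a successful load drops leading zeros, so a text column is usually the safer fit.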
JFC: married individuals, or divorced and name change back, would be totally fucked. Just on the very surface is his fuckery.
Hypothetically you could have a separate “previous names” table where you keep the previous names and on the main table you only keep the current name. There are a lot of ways to design a db to not unnecessarily duplicate SSNs, but without knowing the implementation it’s hard to say how wrong Musk is. But it’s obvious he doesn’t know what he’s talking about because we know that due to human error SSN-s are not unique and you can’t enforce uniqueness on SSN-s without completely fucking up the system. Complaining about it the way he did indicates that he doesn’t really understand why things are the way they are.
Another accusation Elon made was that payments are going to people missing SSNs.
A much simpler answer is that not all Americans actually have an SSN. The Amish for example have religious objections towards insurance, so they were allowed to opt out from social security and therefore don’t get an SSN.
It’s true that some Americans don’t have Social Security numbers, but those Americans can’t collect Social Security benefits unless/until they get one.
My bad, I thought it was about payments in general (including other programs) but it says social security database. Sorry.
Musk’s statement about the government not using SQL is false. I worked for FEMA for fourteen years, a decade of which was as a Reports Analyst. I wrote Oracle SQL+ code to pull data from a database and put it into spreadsheets. I know, I know. You’re shocked that Elon Musk is wrong. Please remain calm.
I work for a crown corp in Canada we have, off the top of my head, about 800 MSSQL, Oracle, MySQL/MariaDB, Postgres databases across the org (I manage our CMDB). Musk is a retard. The world runs on SQL.
He wouldn’t know this though because he’s a techbro that builds apps with MongoDB because he doesn’t understand what normalizing data is and why SQL is the best option for 99.9999999% of applications.
Fucking idiots.
As a former DOD contractor I can also confirm we built whole platforms that use Oracle (shudder) SQL
Elmo Susk surely thinks they store everything on excel.
Pandas and Pickle man….
Elon Musk is the walking talking embodiment of the Dunning-Kruger effect.
100%
What’s fascinating is that you can take pretty much ANY topic you have some knowledge about (besides scamming at scale, because there he truly is a master) and see very fast that he has no fucking clue. From engineering to video games, the guy has no idea. Sure, his entourage, paid or not, might actually be world experts on said topic, but not him. So obvious.
It’s so basic that documentation is completely unnecessary.
“De-duping” could mean multiple things, depending on what you mean by “duplicate”.
It could mean that the entire row of some table is the same. But that has nothing to do with the kind of fraud he’s talking about. Two people with the same SSN but different names wouldn’t be duplicates by that definition, so “de-duping” wouldn’t remove it.
It can also mean that a certain value shows up more than once (eg just the SSN). But that’s something you often want in database systems. A transaction log of SSN contributions would likely have that SSN repeated hundreds of times. It has nothing to do with fraud, it’s just how you record that the same account has multiple contributions.
A database system as large as the SSA has needs to deal with all kinds of variations in data (misspellings, abbreviations, moves, siblings, common names, etc). Something as simplistic as “no dupes anywhere” would break immediately.
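The two meanings of "duplicate" described above can be shown side by side. This is a sketch with Python's built-in `sqlite3` and a hypothetical `contribution` table filled with made-up rows; it is not a claim about any real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contribution (ssn TEXT, full_name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO contribution VALUES (?, ?, ?)",
    [
        ("000-00-0001", "Jane Doe", 100),
        ("000-00-0001", "Jane Doe", 100),  # exact copy of the row above
        ("000-00-0001", "Jane Doe", 250),  # same SSN, different contribution
        ("000-00-0002", "John Roe", 75),
        ("000-00-0002", "Jon Roe", 75),    # same SSN, misspelled name
    ],
)

# Meaning 1: whole-row duplicates. DISTINCT only collapses the exact copy.
distinct_rows = conn.execute(
    "SELECT DISTINCT ssn, full_name, amount FROM contribution"
).fetchall()

# Meaning 2: a key value that repeats. Both SSNs repeat, which is normal for a log.
repeated_ssns = conn.execute(
    """
    SELECT ssn, COUNT(*) AS n
    FROM contribution
    GROUP BY ssn
    HAVING COUNT(*) > 1
    ORDER BY ssn
    """
).fetchall()
```

Whole-row de-duplication takes five rows down to four and leaves the "same SSN, different name" pair untouched, while the repeated-SSN query flags every account that simply has more than one contribution.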
SSN is also not a valid unique key, there have been situations with multiple people issued the same SSN:
Yeah. And the fix for that has nothing to do with “de-duping” as a database operation either.
The main components would probably be:
- Decide on a new scheme (with more digits)
- Create a mapping from the old scheme to the new scheme. (that’s where existing duplicates would get removed)
- Let people use both during some transition period, after which the old one isn’t valid any more.
- Decide when you’re going to stop issuing old SSNs and only issue new ones to people born after some date.
There’s a lot of complication in each of those steps but none of them are particularly dependent on “de-duped” databases.
Just read the format of the US SSN on that Wikipedia page. That wasn’t a smart format to use lol. Only supports 99*999 ( +/- 100k ) people per area code. No wonder numbers are reused.
In some countries it’s birthday+sequence number encoded with gender+checksum and that has been working since the 80’s.
Before that was a different number, but it wasn’t future proof, like the US SSN, so we migrated away in the 80’s :')
In my country the only way that someone has the same number is if someone was born on the same day (±1 century), in the same city and has the same name and family name. It is extremely difficult to have duplicates in that way (exception: immigrants, because the “city code” is the same for the whole foreign country, so it’s not impossible that there are two Ananya Gupta born on the same day in the whole of India)
Oh ye, our system wouldn’t fit India as it’s limited to 500 births a day ( the sequence is 3 digits, and whether it’s even or odd describes your gender ). Your system seems fine to me and beats the US system hands down haha
Because of course the government uses SQL. It’s as stupid as saying the government doesn’t use electricity or something equally stupid. The government is myriad agencies running myriad programs on myriad hardware with myriad people. My damned computers at home are using at least 2-3 SQL databases for some of the programs I run.
SQL is damn near everywhere where data sets are found.
Haha, Air Force One likely uses SQL
AF1 probably needs a database just for its in-flight menu.
It’s entirely possible that the database is pre SQL.
He didn’t say the SSN database isn’t SQL. He said the GOVERNMENT doesn’t use SQL.
SSNs being duplicated would be entirely expected depending upon the table’s purpose. There are many forms of normalization in database tables.
I mean just think about this a little bit, if the purpose is transactions or something and each row has a SSN reference in it for some reason, you’d have a duplicate SSN per transaction row.
A tiny bit of learning SQL and you could easily see transactional totals grouped by SSN (using, get this, a group by clause). This shit is all 100% normal depending upon the normalization level of the schema. There are even – almost obviously – tradeoffs between fully normalizing data and being able to access it quickly. If I centralize the identities together and then always only put the reference id in a transactional table, every query that needs that information has to go join to it and the table can quickly become a dependency knot.
There was a “member” table for instance in an IBM WebSphere schema that used to cause all kinds of problems, because every single record was technically a “member” so everything in the whole system had to join to it to do anything useful.
had to join to it
I don’t think I get what this means. As you describe it, that reference id sounds comparable to a pointer, and so there should be a quick look up when you need to de-reference it, but that hardly seems like a “dependency knot”?
I feel like this is showing my own ignorance on the back end of databasing. Can you point me to references that explain this better?
I’m talking about a SQL join. It’s essentially combining two tables into one set of query results and there are a number of different ways to do it.
https://www.w3schools.com/sql/sql_join.asp
Some joins are fast and some can be slow. It depends on a variety of different factors. But making every query require multiple joins to produce anything of use is usually pretty disastrous in real-life scenarios. That’s why one of the basics of schema design is that you usually normalize to what’s called third normal form for transactional tables, but reporting schemas are often even less normalized because that allows you to quickly put together reporting queries that don’t immediately run the database into the ground.
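For anyone following along without SQL experience, here is a small sketch of a join plus a `GROUP BY`, again with Python's built-in `sqlite3`. The `person` and `payment` tables, their columns, and the data are hypothetical; the point is only to show the reference id being resolved at query time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        ssn       TEXT NOT NULL,
        full_name TEXT NOT NULL
    );
    CREATE TABLE payment (
        payment_id INTEGER PRIMARY KEY,
        person_id  INTEGER NOT NULL REFERENCES person(person_id),
        amount     INTEGER NOT NULL
    );
    INSERT INTO person VALUES (1, '000-00-0001', 'Jane Doe'), (2, '000-00-0002', 'John Roe');
    INSERT INTO payment (person_id, amount) VALUES (1, 100), (1, 250), (2, 75);
""")

# The payment table only stores person_id, so anything that needs the SSN or
# name has to join back to person. Here the join feeds a per-person total.
totals = conn.execute("""
    SELECT p.ssn, p.full_name, SUM(pay.amount) AS total
    FROM payment AS pay
    JOIN person AS p ON p.person_id = pay.person_id
    GROUP BY p.ssn, p.full_name
    ORDER BY p.ssn
""").fetchall()
```

One join like this is cheap. The "dependency knot" shows up when nearly every query in a system has to chain several of them through the same central table, which is one reason reporting schemas often copy a few columns (an SSN, a name) into the wide table instead.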
DB normalization and normal forms are practically a known science, but practitioners (and sometimes DBAs) often have no clue that this stuff is relatively settled and sometimes even use a completely wrong normal form for what they are doing.
https://en.m.wikipedia.org/wiki/Database_normalization
In most software (setting aside well-written open source), the schema was put together by someone who didn’t even understand what normal form they were targeting or why they would target it. So the schema for one application will often be at varying forms of normalization, and schemas across different applications almost necessarily will have different normal forms within them even if they’re properly designed.
All that said, detecting, grouping, comparing, and removing duplicates is a basic function of SQL. It’s definitely not expected that, for instance, database tables would never contain a duplicate reference to a SSN. Leon is indeed demonstrating here that he’s a complete idiot when it comes to databases. (And he goes a step further by saying the government doesn’t use SQL when it obviously does somewhere. SQL databases are so ubiquitous that just about any modern software package contains one.)
Oh, well another user pointed out that SSN’s are not unique, I think they are recycled after death or something. In any case, I do know that when the SSN system was first created it was created by people who said this is NOT MEANT to be treated as unique identifiers for our populace, and if it were it would be more comprehensive than an unsecure string of numbers that anyone can get their hands on. But lo and behold, we never created a proper solution and we ended up using SSN’s for identity purposes. Poop.
I’m pretty sure there is a federal statute that says ONLY the SSA may collect or use SSNs, as to federal agencies. I argued it once when a federal agency court tried to tell me that it couldn’t process part of my client’s case without it. I didn’t care but my client was crotchety and would only even give me the last four.
Edit. It’s a regulation:
https://www.law.cornell.edu/cfr/text/28/802.23
An agency cannot require disclosure of an SSN for any right or benefit unless a specific federal statute requires it or the agency required the disclosure prior to 1975.
In my case the agency got back to me with some federal statute that didn’t say what they said it said, and eventually they had to admit they were wrong.
It’s because the comments he made are inconsistent with common conventions in data engineering.
- It is very common not to deduplicate data and instead just append rows. The current value is the most recent and all the old ones are simply historical. That way you don’t risk losing data and you have an entire history.
  - Whilst you could do some trickery to deduplicate the data, it does create more complexity. There’s an old saying with ZFS: “Friends don’t let friends dedupe”. And it’s much the same here.
  - Compression is usually good enough. It will catch duplicated data and deal with it in a fairly efficient way, not as efficient as deduplication, but it’s probably fine and it’s definitely a lot simpler.
- Claiming the government does not use SQL
  - It’s possible they have rolled their own solution or they are using MongoDB or something, but this would be unlikely and wouldn’t really refute the initial claim.
  - I believe many other commenters noted that it probably is MySQL anyway.
Basically what he said is inconsistent with typical practices among data engineers, as anybody who has worked with larger data would recognize.
In terms of using SQL, it’s basically just a more reliable and better Excel that doesn’t come with a default GUI.
If you need to store data, it’s almost always best to throw it into a SQLite database because it keeps it structured. It’s standardised and it can be used from any programming language.
However, many people use excel because they don’t have experience with programming languages.
Get ChatGPT to help you write a PyQt GUI for a SQLite database and I think you would develop a high level understanding of how the pieces fit together.
Edit: @zalgotext made a good point.
Great explanation, but I have a tiny, tiny, minor nit-pick
Basically what he said is incoherent to anybody who has worked with larger data.
I’m being pedantic, but I disagree with your wording. As a backend dev, I work with relational databases a ton, and what Musk said wasn’t incomprehensible to me, it just sounded like something a first year engineer fresh out of college would say.
Again, the rest of your explanation is spot on, absolutely no notes, but I do think the distinction between “adult making up incomprehensible bullshit” and “adult cosplaying as a baby engineer who thinks he’s hot shit but doesn’t know anything beyond surface level stuff” is important.
Fair point, I’ve edited the answer to be clearer for future readers.
There’s an old saying with ZFS: “Friends don’t let friends dedupe”
That’s a bad example to reference. The ZFS implementation of deduplication is poorly thought out, and I say that even though I like and run ZFS on my own Linux server(s). I understand that the BTRFS implementation of dedupe works well (no first-hand experience), and the Windows one works great (first-hand experience).
I’ve had a poor experience with btrfs dedupe tbh (and a terrible experience with qgroups), however, this was years ago. Btrfs snapshots I prefer though, much easier not to have that dependence.
What distro are you using for ZFS, void?
Good to know, thanks. I haven’t worked with btrfs much yet. I have ZFS on a Debian server.
It was a great answer until the very last sentence. ChatGPT is never a reference for anything ever if you have any fraction of a brain.
I have a fraction of a brain, I think, and use ChatGPT as a guide so that I have something to start with. Even if it’s slightly off, my two brain cells can pick it out and go from there. It’s not so bad.
And you know, I get it if you don’t like AI, but let’s be honest about it at the very least.
To be honest it’s a shit solution that makes you worse by merely using it.
I mostly ask it things I don’t know, though. I’m not exporting my thinking to it.
I ask it difficult translations, how to code something I’m unfamiliar with, help with grammar, i use it as an OCR for other languages, to help me remember things I can’t directly search, etc. I have a hard time believing all use is detrimental, especially when you’re filling in the gaps of your knowledge and a best guess will do. It’s surely better than a web search for things you don’t even know how to write in a search box.
You sound like common sense and the other person sounds like they have an axe to grind.
I mostly ask it things I don’t know, though. I’m not exporting my thinking to it.
Exhibit A
Which are then obviously confirmed with a web search. Jesus, spare me the cynicism.
And I’m just going to say this as a general observation, but the user base of the fediverse is pretty sophisticated at this time to be assuming shit like this. You make this place hostile by not giving the benefit of the doubt, you know. And even then. How hard is it to not think the worst of everyone you come across online? So ridiculous and petty.
I disagree, it’s just a tool. It’s a fantastic way to template applications very quickly, particularly for those who are not already familiar with technologies and may not have the time or opportunity to play around with things otherwise.
An LLM is not a search engine and it can produce awful code. This is not production code, it’s for tinkering. As a sandbox tool, LLMs are fantastic.
On the ethical side of things, yeah openAI sucks, Qwen2.5 would be up to this task, one can run that locally.
It’s a disinformation machine which completely lacks all context. If it’s about 85% accurate to average internet denizens and 15% hallucination, then it’s an absolutely atrocious source to learn from. You’re literally lying to yourself, that is what the tool does.
Well, I’ve had a great time using LLMs to sandbox a dozen implementations and then investigate the shortcomings and advantages of different implementations.
Mistakes happen a lot but they can be managed on a small MWE with a couple of tests.
It’s how the tool is used more than any given tool being bad.
I understand your point and you’re not wrong. However, I’m not wrong either and you should take a second look at how you might use these tools in a way that makes your life easier and addresses the valid limitations you’ve described.
- It is very common not to deduplicate data and instead just append rows. The current value is the most recent and all the old ones are simply historical. That way you don’t risk losing data and you have an entire history.
Having never seen the database schema myself, my read is that the SSN is used as a primary key in one table, and many other tables likely use that as a foreign key. He probably doesn’t understand that foreign keys are used as links and should not be de-duplicated, as that breaks the key relationship in a relational database. As others have mentioned, even in the main table there are probably reused or updated SSNs that would then be multiple rows that have timestamps and/or Boolean flags for current/expired.
If this is true, then by this time we are all fucked. Like Monday someone checks their banking or retirement and it’s all gone. That’s gonna be a crazy day.
I hope they’re not using the actual SSN as the primary key. I hope it’s a big ass number that is otherwise unrelated.
It’s an insanely idiotic thing to say. Federal government IT is myriad, and done at a per-agency level. Any relational database system, of which the federal government uses plenty, uses SQL in one way or another. Elon doesn’t know what he is talking about at all, and is being an ultimate idiot about this. Even in the context of mainframe projects, which is what he would be referring to if we are giving Elon the benefit of the doubt, most COBOL shops I know have adapted to addressing internal data records using an SQL interface, although obviously in that legacy world it is insanely fractured and arcane.
Well, if someone changes their name you’d add a new record with the same SSN to hold their new name, that way it keeps the records consistent with the paperwork; old papers say their old name and reference the retired record, new papers use their new name and reference the new record.
You can use the SSN as the key to find all records associated with a person, it doesn’t have to be a single row per SSN, in fact that would make the data harder to manage and less accurate.
E.g. if someone changes their last name after getting married, it could be useful to be able to have their current and former name in the database for reference.
Another commenter pointed out a legitimate use case, but it’s not even worth thinking about that much. De-duplicated is usually a word you use in data science to talk about making sure your dataset is “hygienic” and that you aren’t duplicating data points. A database is much different because it is less about representing data, and more about storing it in a way that allows you to perform transactions at scale - retrieval, storage, modification, etc. Relational databases are analyzed in terms of data cardinality which essentially describes tradeoffs in representation between speed of retrieval (duplications good) vs storage efficiency (duplications bad).
The issue is that Elon is so vague and so off the mark that it is very hard to believe that he even has the first clue about what he is talking about. Even you are confused just by reading it. It is all a tactic to convince others that he is smarter than he is while doing extreme damage to the hardworking people that actually make this stuff possible. Have you noticed that the man has never come to a conclusion that wasn’t in his interests? This is not honest intellectualism, or discussion based on technical merit. It’s self serving propaganda.
It’s more than just SQL. Social Security Numbers can be re-used over time. It is not a unique identifier by itself.
i’ve heard conflicting reports on this, i have no idea to what degree this is true, but i would be cautious about making this statement unless you demonstrate it somehow.
As read on wikipedia ( https://en.wikipedia.org/wiki/Social_Security_number ) the format only allows +/- 100k numbers per area code ( which is also limited to 999 codes? ), so over time you are forced to reuse some codes. In total the format allows 99m unique codes, and the us currently has 334mil people sooooo :')
On June 25, 2011, the Social Security Administration changed the SSN assignment process to “SSN randomization”,[36] which did the following:
The Social Security Administration does not reuse Social Security numbers. It has issued over 450 million since the start of the program, about 5.5 million per year. It says it has enough to last several generations without reuse and without changing the number of digits. https://www.ssa.gov/history/hfaq.html
evidently they must be doing something else on the backend for this to be working, assuming there are quite literally 100M numbers, which is going to be static due to math, obviously, but they clearly can’t be reassigning numbers to 3 people on average at any given time, without some sort of external mechanism.
There are approximately 420 million numbers available for assignment.
https://www.ssa.gov/employer/randomization.html
that certainly doesnt seem like it would support several generations, possibly at our current birth rate i suppose.
DDG AI bullshit tells me that there are a billion codes. https://www.marketplace.org/2023/03/10/will-we-ever-run-out-of-social-security-numbers/ this article says it’s 1 billion
https://www.ssn-verify.com/how-many-ssns
this website also lists it as approximately 1 billion.
I think i see the change. They are mentioning the ssn is 9 numbers long, which is 1 longer than the 3-3-2 format wikipedia mentions. That does mean its around 999mil numbers, which ye allows for a few generations ( like, 1 or 2 lol )
yeah, that sounds about right, ok i think we’ve figured this one out now. lol
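For what it's worth, the arithmetic can be checked directly. This sketch assumes the commonly documented 9-digit layout (3-digit area, 2-digit group, 4-digit serial) and the exclusions as I understand them: area 000, 666 and 900-999 are never assigned, and neither are group 00 or serial 0000. Treat those rules as my assumption and check them against the SSA pages linked above.

```python
# Every 9-digit string, with no validity rules applied.
total_nine_digit = 10 ** 9

# Assumed exclusions: area 000, 666, and 900-999 are never assigned.
valid_areas = [area for area in range(1, 900) if area != 666]
# Assumed exclusions: group 00 and serial 0000 are never assigned.
valid_groups = range(1, 100)
valid_serials = range(1, 10_000)

assignable = len(valid_areas) * len(valid_groups) * len(valid_serials)

# The thread quotes "over 450 million" issued so far.
issued_so_far = 450_000_000
remaining = assignable - issued_so_far
```

Under those assumptions `assignable` comes out a bit under 889 million, which matches the "around 1 billion" ballpark, and `remaining` lands around 439 million, in the same neighborhood as the "approximately 420 million" figure quoted from the SSA.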
The US government pays lots of money to Oracle to use their database. And it’s not for Berkeley DB either. (Poor sleepy cat). Oracle provides them support for their relational databases… and those databases use… SQL.
Now if Musk tries to end the Oracle contracts, then Oracle’s lawyers will go after his lawyers and I’m a gonna get me some popcorn. (But we all know that won’t happen in any timeline… Elon gotta keep Larry happy.)
Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database
formally, changing the identity of someone would have a very explicit reason to keep a “duplicate” ssn entry, if purely for historical reasons for example. I’m sure there are a myriad of technical reasons to be doing this.
Because everyone hates IPv6?
Why not reuse SSNs that are no longer in service for whatever reason?
He gonna write everything in Pandas. Who the fuck needs to pay hundreds of millions a year to Oracle. (And I bet that’s really how much they pay Oracle)
Also, ohh boy, Oracle’s lawyers… those you don’t wanna mess with.
He is saying the US government doesn’t use structured databases.
At least 90% of all databases have a structure.
As someone explained in another comment, you often duplicate information due to rules around cardinality to gain improvements in retrieval and structure. I would be pretty worried if SSNs were being used as a widespread primary key in any set of tables - those should generally be UUIDs that can be optimized for hashing while avoiding collisions.
Even if we are being generous to Elon, we could assume that social security payments are processed on mainframes given how many have to go out and the legacy nature of the program. Most mainframe shops I know have adapted an SQL interface for records in some capacity, but who knows what he is looking at.
Government federal IT is done at a per agency basis. I would say oracle database is pretty much the most licensed piece of software the government does use outside of Redhat Linux and windows desktop.
The ignorance of Elon is truly concerning, but somehow the worst part to me is Elon calling someone a retard for pointing that out.
Ableist, racist white supremacist doing their ableist-racist-white-supremacist thing.
He called a rescuer a pedophile for trying to rescue children…
After his own offer to rescue the children was turned down
TL;DR de-duplication in that form is used to refer to a technique where you back two different pieces of data in the file system with one single piece of data on the drive, the intention being to optimize file storage size, and minimize fragmentation.
You can imagine this would be very useful when taking backups for instance, we call this a “Copy on Write” approach, since generally it works by copying the existing file to a second reference point, where you can then add an edit on top of the original file, while retaining 100% of the original file size, and both copies of the file (its more complicated than this obviously, but you get the idea)
now just to be clear, if you did implement this into a DB, which you could do fairly trivially, this would change nothing about how the DB operates, it wouldn’t remove “duplicates” it would only coalesce duplicate data into one single tree to optimize disk usage. I have no clue what elon thinks it does.
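A toy illustration of that storage-level meaning, in plain Python. `DedupStore` is a hypothetical class written for this comment, not how ZFS, Btrfs, or any database engine implements it; it only shows that logical entries survive while identical payloads are stored once.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical payloads share one stored block."""

    def __init__(self):
        self.blocks = {}  # digest -> bytes, one copy per distinct payload
        self.refs = {}    # logical name -> digest of its payload

    def put(self, name: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        # Only store the payload if this content has not been seen before.
        self.blocks.setdefault(digest, data)
        self.refs[name] = digest

    def get(self, name: str) -> bytes:
        return self.blocks[self.refs[name]]


store = DedupStore()
store.put("record_a", b"same payload")
store.put("record_b", b"same payload")       # duplicate content, new logical entry
store.put("record_c", b"different payload")

logical_entries = len(store.refs)    # every logical record is still there
stored_blocks = len(store.blocks)    # but the duplicate payload is stored once
```

All three records can still be read back individually; only the physical copies are shared. Nothing is "removed" from the point of view of whoever queries the data.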
The problem here, as a non programmer, is that i don’t understand why you would ever de-duplicate a database. Maybe there’s a reason to do it, but i genuinely cannot think of a single instance where you would want to delete one entry, and replace it with a reference to another, or what elon is implying here (remove “duplicate” entries, however that’s supposed to work)
Elon doesn’t know what “de-duplication” is, and i don’t know why you would ever want that in a DB, seems like a really good way to explode everything,
i genuinely cannot think of a single instance where you would want to delete one entry, and replace it with a reference to another
Well, there’s not always a benefit to keeping historical data. Sometimes you only want the most up-to-date information in a particular table or database, so you’d just update the row (replace). It depends on the use case of a given table.
what elon is implying here (remove “duplicate” entries, however that’s supposed to work)
Elon believes that each row in a table should be unique based on the SSN only, so a given SSN should appear only once with the person’s name and details on it. Yes, it’s an extremely dumb idea, but he’s a famously stupid person.
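To see what "unique on the SSN only" would mean in practice, here is a sketch with Python's built-in `sqlite3`. The `person` table and its data are hypothetical; the only point is how a uniqueness constraint on the SSN reacts to a name change being recorded as a second row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table where the SSN itself is the primary key, so it must be unique.
conn.execute("CREATE TABLE person (ssn TEXT PRIMARY KEY, full_name TEXT NOT NULL)")
conn.execute("INSERT INTO person VALUES (?, ?)", ("000-00-0001", "Jane Doe"))

try:
    # Recording the married name as a second row violates the constraint.
    conn.execute("INSERT INTO person VALUES (?, ?)", ("000-00-0001", "Jane Smith"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

With this design the second insert is rejected, so the only way to record the change is to overwrite the old name and lose the history, or to move the history into another table where the SSN repeats anyway.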
Well, there’s not always a benefit to keeping historical data. Sometimes you only want the most up-to-date information in a particular table or database, so you’d just update the row (replace). It depends on the use case of a given table.
in this case you would just overwrite the existing row, you wouldn’t use de-duplication because it would do the opposite of what you wanted in that case. Maybe even use historical backups or CoW to retain that kind of data.
Elon believes that each row in a table should be unique based on the SSN only, so a given SSN should appear only once with the person’s name and details on it. Yes, it’s an extremely dumb idea, but he’s a famously stupid person.
and naturally, he doesn’t know what the term “de-duplication” means. Definitionally, the actual identity of the person MUST be unique, otherwise you’re going to somehow return two rows, when you call one, which is functionally impossible given how a DB is designed.
in this case you would just overwrite the existing row, you wouldn’t use de-duplication because it would do the opposite of what you wanted in that case.
… That’s what I said, you’d just update the row, i.e. replace the existing data, i.e. overwrite what’s already there
Definitionally, the actual identity of the person MUST be unique, otherwise you’re going to somehow return two rows, when you call one, which is functionally impossible given how a DB is designed.
… I don’t think you understand how modern databases are designed
… That’s what I said, you’d just update the row, i.e. replace the existing data, i.e. overwrite what’s already there
u were talking about not keeping historical data, which is one of the proposed reasons you would have “duplicate” entries, i was just clarifying that.
… I don’t think you understand how modern databases are designed
it’s my understanding that when it comes to storing data that it shouldn’t be possible to have two independent stores of the exact same thing, in two separate places, you could have duplicate data entries, but that’s irrelevant to the discussion of de-duplication aside from data consolidation. Which i don’t imagine is an intended usecase for a DB. Considering that you literally already have one identical entry. Of course you could simply make it non identical, that goes without saying.
Also, we’re talking about the DB used for the social security database, not fucking tigerbeetle.
SSN being unique isn’t a dumb idea, it’s a very smart idea, but due to the US SSN format it’s impossible to do. Hence to implement the idea you need to change the SSN format so it is unique before then.
Also, Elon’s remark is stupid as is. I’m sure the row has a unique id, even if it’s just a rowid column.
Also, Elon’s remark is stupid as is. I’m sure the row has a unique id, even if it’s just a rowid column.
even then, i wonder if there’s some sort of “row hash function” that takes a hash of all the data in a single entry, and generates a universally unique hash of that entry, as a form of “global id”
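Something like that is easy to sketch in Python with `hashlib`. `row_hash` is a hypothetical helper, and whether any real system does this is a guess; note also that a hash is a fingerprint of the content, not a guaranteed-unique identity.

```python
import hashlib
import json


def row_hash(row: dict) -> str:
    """Fingerprint a row by hashing a canonical JSON encoding of its fields."""
    # sort_keys makes the encoding independent of the order the fields were given in.
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Made-up rows for illustration.
row_a = {"ssn": "000-00-0001", "full_name": "Jane Doe", "valid_from": "1990-01-01"}
row_a_reordered = {"valid_from": "1990-01-01", "full_name": "Jane Doe", "ssn": "000-00-0001"}
row_b = {"ssn": "000-00-0001", "full_name": "Jane Smith", "valid_from": "2015-06-01"}

same = row_hash(row_a) == row_hash(row_a_reordered)   # same content, same fingerprint
different = row_hash(row_a) != row_hash(row_b)        # same SSN, different row
```

Two rows with identical content get the same hash by design, so this is useful for spotting exact copies but cannot replace a rowid: a collision between different rows is astronomically unlikely with SHA-256, but identical rows always collide.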
How come Republicans keep saying that doggy is going to expose all the fraud in the government, and yet the biggest fraud, with 37 felonies, is president? What the actual fuck do these people think?
Because those are all false charges according to them.