A few years ago we were required to pass Microsoft's certification exams, and one of the options was to do it online. Their requirement was that the name/surname in the application form had to be identical to what's in your ID (you also had to attach scans of your ID). The problem is, we're from Russia, the form didn't allow Cyrillic letters, and there are only Cyrillic letters in a Russian ID. We had to fly all the way to St. Petersburg to pass the exams offline in a Microsoft-approved certification center, instead of just doing it online in our office. Another option was getting a travel passport (it contains a Latin transliteration), but the deadlines didn't allow us to wait for up to 1 month (the maximum time for issuing a travel passport here).
They refused to accept a romanization? If so that's doubly bad. Excusing technical deficiencies is one thing, but preventing people from working around them is indicative of a high level organizational failure.
Yes, it had to be identical. The problem with romanization of Cyrillic is that there's no commonly accepted standard, and I suspect the employees who process the applications were very unlikely to be Russian, so they would have no idea how to match our romanized variant to what's found on the ID scans. At least if they allowed Cyrillic in the submit form, they could have compared it visually as a set of pictograms, no need to know Cyrillic by heart. And it wasn't some US-only application thing we ran into by mistake; it talked about providing identification documents "of your country". But for some reason, they excluded all countries which don't use the Latin alphabet.
Countries which are getting rarer and rarer. I'm in China and we haven't yet faced such problems at the office (we too are mandated to get those certs).
But in China, there are 5 Microsoft-approved offline training centers per metro station in any city with IT employees, plus the flats are so small you can't possibly be uninterrupted during the exam during covid, so we mostly go offline.
> This particular example name [Tanaka Tarou 田中太郎] is perhaps best known as the name of an alien in an anime series (and a manga). There have also been real people with this name.
Huh? That's a bit like saying the name "John Doe" is "perhaps best known from being the name of a character in a movie." It's just the stand-in, "generic" Japanese name that's used in examples. If an alien had the name in a cartoon it was probably a joke about the alien going out of its way to appear ordinary, the way the characters in Third Rock from the Sun were named Tom, Dick, and Harry.
Anyway, this article always kind of rubbed me the wrong way. OK, maybe someone's name is one character that's not possible to represent with Unicode. What do you want me to do about it?
The only way I can think of to do that would be to not make a name field mandatory. That's certainly possible in some applications, but not in others. For example, I work in insurance and we need a name to put on the policy documents.
And considering the great number of official forms one must fill out which make the same assumptions that programmers foolishly believe, I am inclined to believe that The Artist Formerly Known as Prince or whoever already has some idea what they'll put in the fields to accommodate their unusual situation.
I think that part is pretty easy to understand, but if the author's intention is to have every single one of these issues addressed they're going about it the entirely wrong way.
I’m not sure anyone, including the author, considers addressing every one of those to be possible. But they want you to be aware of the assumptions you’re baking into your code, and maybe spend a few moments thinking about which requirements you actually need. What do you think would be a better way of going about it?
> What do you think would be a better way of going about it?
I suppose it depends on what you're trying to accomplish. This article definitely reads like it's trying to annoy developers to the point of closing the page though, even if its intentions are relatively agreeable. I'm reminded of the whole 'master vs main' debacle, where this kind of tribalism hit critical mass. There was huge potential for people to have a good discussion about acceptable practices in software development, but we gave it up in lieu of tribalism and us-vs-them politics. As a result, there was way more media blowback than necessary, which led to a false sense of urgency and a whole lot of bugs.
So, I don't know. But I do know that a good discussion doesn't start with 40 theses and a hostage situation.
Most of these issues can be solved by simply providing a "Name" field that accepts Unicode and has a reasonable length. Don't validate it in any way except to require at least one non-whitespace character. Trim any excess whitespace and run the text through a Unicode normalization algorithm. At most, filter out Unicode blocks that can't possibly apply, such as emoji and maths symbols.
Done.
That covers something like 90% of the already pretty obscure corner-cases. The remainder are likely out-of-scope, such as children that have not-yet-been-named. They can't legally sign up for things in most (all?) jurisdictions, so it's not really worth considering in most scenarios.
This is already 10,000x better than ASCII-only input fields for "Surname" and "Given Name" limited to something like 20 characters each. Where entering "Null" or "O'Neill" throws an HTTP 500 error because JSON is "so simple" that anyone can use it...
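For what it's worth, the single-field approach described above can be sketched in a few lines of Python. This is only a rough sketch under my own assumptions: the function name `clean_name`, the 200-character limit, and the choice of NFC are all made up for illustration, and the category filter is the optional "at most" step (note that it also rejects ASCII symbols like "+" and "~", which Unicode classes as math symbols).

```python
import unicodedata

MAX_NAME_LENGTH = 200  # assumed "reasonable" limit; pick your own

# Unicode general categories rejected in this sketch (the optional filter):
# So = other symbols (most emoji), Sm = math symbols, Cc = control characters.
REJECTED_CATEGORIES = {"So", "Sm", "Cc"}


def clean_name(raw: str) -> str:
    """Return a normalized name, or raise ValueError if it is unusable."""
    # NFC composes sequences such as "e" + combining acute into "é".
    name = unicodedata.normalize("NFC", raw)
    # Collapse runs of whitespace and trim both ends.
    name = " ".join(name.split())
    if not name:
        raise ValueError("name needs at least one non-whitespace character")
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError("name is too long")
    for ch in name:
        if unicodedata.category(ch) in REJECTED_CATEGORIES:
            raise ValueError(f"unsupported character: {ch!r}")
    return name
```

With this, `clean_name("  O'Neill ")`, `clean_name("Null")`, `clean_name("O")` and `clean_name("田中太郎")` all pass through untouched apart from trimming, while a whitespace-only string or a lone emoji raises `ValueError`.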
I consider breaking the fields up well worth the trade-off of people without a surname having some difficulty with it, because that situation is extremely rare, and the user experience is significantly worse if, e.g., I can't meaningfully sort names because I have no idea if you put your given name first or last, etc.
There's also really nothing about JSON that should dictate that you can't handle the name "Null" or "O'Neill." You don't even need special escaping for those.
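To illustrate the point about JSON, here is a quick round trip with Python's standard `json` module (the field names `given` and `surname` are just placeholders I picked). The string "Null" stays a string, distinct from JSON's `null`, and the apostrophe needs no escaping.

```python
import json

# Hypothetical record; the keys are illustrative only.
record = {"given": "Null", "surname": "O'Neill"}

encoded = json.dumps(record)   # serialize to a JSON string
decoded = json.loads(encoded)  # parse it back

# The name "Null" is a JSON string; a missing value would be JSON null.
name_as_string = json.loads('"Null"')
missing_value = json.loads("null")
```

Here `decoded` equals `record`, `name_as_string` is the Python string `"Null"`, and `missing_value` is `None`. If a system chokes on these names, the bug is somewhere else (string concatenation into SQL or hand-rolled serialization, most likely).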
It is meaningless to "sort" names across countries. Most languages have their own sort rules, and some languages have multiple sorts depending on the specific usage!
Sorting is almost never what you want.
Are you printing out a phone book onto dead trees? No? Then you do not need to sort names.
Just do substring search, and you've already got a UI that is 10x better than having people flip to 'A', then 'Aa' to get to 'Aaron', etc...
If you're about to say "but I need to define a sort order for a database table key" then you've made another mistake and also not read the article...
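A minimal sketch of the substring-search idea, assuming an in-memory list of names (the helpers `search_key` and `find_names` are hypothetical, and a real system would more likely push this into the database or a search index). Matching is done on a case-folded, NFKC-normalized copy, so the stored name is never altered.

```python
import unicodedata


def search_key(text: str) -> str:
    """Fold case and compatibility forms so that 'AARON' matches 'Aaron'."""
    return unicodedata.normalize("NFKC", text).casefold()


def find_names(names: list[str], query: str) -> list[str]:
    """Return every name containing the query, in the original input order."""
    needle = search_key(query)
    return [name for name in names if needle in search_key(name)]
```

For example, `find_names(["Aaron Smith", "Tanaka Tarou", "Maria Sanchez"], "taro")` returns `["Tanaka Tarou"]`, with no sort order involved at any point.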
> Most languages have their own sort rules, and some languages have multiple sorts depending on the specific usage!
That is true. However, it's also obviously true that a sort order that tries to take all of these into account, instead of using the rules of the current user's language, is obviously useless, so I don't understand why you think it's a meaningful objection. Users are surprised and annoyed by an inability to sort names.
I guess I'm not too worried about sort order being incorrect, since it already is incorrect 99% of the time anyway—most programs sort strings "1" < "12" < "2", whereas that's... wrong. It annoys me several times per week, but we've all accepted it and moved on. Apple's OSes do it right, but that's about it—and even still, if you're in a shell, it's probably back to being wrong.
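For anyone wondering what "doing it right" looks like for the "1" < "12" < "2" case, here is one common way to build a natural sort key in Python. It's only a sketch: `natural_key` is my own name for it, and it handles ASCII digit runs only, not locale-aware collation.

```python
import re


def natural_key(text: str) -> list:
    """Split into text and digit runs so that digit runs compare as numbers."""
    # With a capturing group, re.split alternates text, digits, text, ...
    # so odd positions are always digit runs and even positions are text.
    parts = re.split(r"([0-9]+)", text)
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]


plain = sorted(["2", "12", "1"])                     # lexicographic order
natural = sorted(["2", "12", "1"], key=natural_key)  # numeric-aware order
```

Here `plain` comes out as `["1", "12", "2"]` and `natural` as `["1", "2", "12"]`.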
Probably you can't, because the person couldn't even type their name in. So if you're writing an application, that's one that's outside your control, and you can't do anything about.
> the way the characters in Third Rock from the Sun were named Tom, Dick, and Harry.
Wow, I've watched that show for years and never caught that. I love the ordinary alien name trope, and my favorite incarnation is Ford Prefect from the Hitchhiker's Guide series. (While I imagine it was a great gag in the UK for the alien to believe that cars are the dominant life form on Earth, the joke was a bit lost on US readers because the model was not popular here.)
One real person with that name is the current editor of a Japanese science magazine [1] that has been published since 1931. I did some work for him about ten years ago. He is a very nice man.
> OK, maybe someone's name is one character that's not possible to represent with Unicode. What do you want me to do about it?
The example with The Artist Formerly Known as Prince is rather contrived. More realistic are the Chinese family names which are not yet supported by Unicode.
But if you can get away with it, it is a perfectly reasonable business decision to only support names which can be represented as Unicode characters! The point of the article is just that you should be aware of the assumptions you make, and the fewer assumptions you need to make, the fewer problems.
A less contrived example would be the forms which required a last name of at least two characters. This is just dumb, since "O" is a perfectly valid and even reasonably common last name. So the writers put in extra effort to outlaw certain names, for no benefit whatsoever. Presumably they thought all real names are at least two characters long. Knowing this was a faulty assumption would have saved them the time they spent on making their site less useful.
In some countries, women have different last names than men do.
E.g., in Macedonia, traditionally a person would get their father's name as their surname (so if the father's name was "Petar" and the son was Dragan, the last name would roughly translate to "Dragan of Peter"). Because there are different forms for feminine and masculine words, the son would be named "Dragan Petrovski", and the daughter (e.g., Marija) would be named "Marija Petrovska".
Translated to modern times, this means that if "Marija Ilijevska" married "Dragan Petrovski", she would then be named "Marija Petrovska". Their son's surname would again be "Petrovski" and their daughter's "Petrovska".
So basically, matching parents and their kids by their last names has to take into account the -ski or -ska form.
In old Polish there was also a special form of name for unmarried women, so for example:
Jan Kowalski has a daughter, Anna Kowalska.
When she is young, she is called Anna Kowalszczonka, Kowaliczka or Kowalówna depending on who you asked (and you can't reliably reverse these forms into the original name)
When she is grown up, she is called Anna Kowalska.
As it happens, the country was part of the Russian empire, so she's also called Анна Ковалска or Анна Ковалская, depending on who you asked.
When she marries Jan Kowal, she is called Anna Kowalowa or Anna Kowal.
When Jan Kowal dies and she marries Wawrzyniec Słowacki, she is called Anna Słowacka. But her husband is from Austria-Hungary, so he's also called Laurentius Slowacki somewhere in his documents, even though he never uses this name for anything.
Their son Stefan moves to Lithuania, but even though he's ethnically Polish, Lithuanian government requires him to have "Steponas Slovackis" printed on his documents.
In Spain, everyone has two surnames. Your first surname is your father's first surname, and your second surname is your mother's first surname. What this means is that in a family of mum, dad and kids, only the kids will share the same surname.
We had a German friend who married a Spanish man, and she insisted on changing her surname to his when they got married since that was traditional for her. But for Spanish people this was really weird, since it sounded like they were siblings. I actually haven't asked them since they had kids, but I guess their kids must have his first surname doubled (which is not unusual - María Sanchez Sanchez just means that the first surname of both parents was Sanchez).
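The traditional rule described above is simple enough to write down. This is just a sketch of the convention as stated in these comments (the `SpanishName` type, the `child_name` helper and the example people are all made up), not a statement about what any civil registry actually requires.

```python
from typing import NamedTuple


class SpanishName(NamedTuple):
    given: str
    first_surname: str
    second_surname: str


def child_name(given: str, father: SpanishName, mother: SpanishName) -> SpanishName:
    """Traditional convention: father's first surname, then mother's first surname."""
    return SpanishName(given, father.first_surname, mother.first_surname)


# Hypothetical family, for illustration only.
father = SpanishName("Juan", "García", "López")
mother = SpanishName("Ana", "Sánchez", "Ruiz")
child = child_name("María", father, mother)
```

Here `child` ends up with the surnames García and Sánchez, a pair that neither parent has, which is exactly why matching family members by an identical "last name" field fails.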
This is the case in many countries. Nordic countries do this too (...-son and ...-dottir etc.) And FWIW, in Quebec, women are not allowed to take their husband's last name when they marry. They choose for the kids.
The "names don’t change" one has a subvariety: the falsehood that last names might change, but first names don’t. Transitioning showed me firsthand how many common products make changing specifically the first name impossible.
Can we next do Falsehoods web developers believe about there being any benefit to hijacking the native scroll functionality, and use this website as an example?
Javascript-based scrolljacking meant to enforce inertial smooth scrolling. Even when it works (it's glitchy on this page and doesn't always activate for me) it makes the page painful to use.
More than anything, this reflects how many business decisions a "programmer" makes.
If a "programmer" was actually just a programmer, none of their beliefs would matter. Either it met the spec, or not.
But in reality, a lot of the job of a "programmer" is to make all those detailed decisions, and handle all the exceptional cases, that other people don't want to be bothered to think about. But those detailed decisions and edge cases need to be handled because much of the value of software is running the business worldwide at all hours with fewer humans involved, so those edge cases will come up constantly.
So all these little decisions fall on the one writing software. That's actually an incredible amount of power once you realize it.
Because the speed at which they can write good specs cannot match a competitor doing iterative back and forth from an elevator discussion. Even in large slow banks, we churn feature releases weekly now with half-finished stuff behind feature flags to see if they like it so far. And well, it works and they're happier and we can have lighter BAs, and spec mistakes cost a lot less as they are spotted faster, not after 3 months of blind work.
Oh and btw the value of software is not to handle every edge case well, but to be able to change, imho.
> I fear that part of the reason that this blog post had less impact than I hoped was that Patrick did not give examples of how each assumption can be false.
I think the reason is that many assume that these cases are niche and not something they need to worry about.
Which might be the case, or might not, depending on the application. But either way, using celebrities or historical figures is not a great way to convince anyone otherwise.
> In some countries (notably French speaking) it is convention to write a person’s surname in all caps to make it clear which part of the name is the surname.
This is a French-specific thing? I didn't know that. I really like it, it makes it easier to know which part is the name and which part is the family name.
It's not fully French specific — conventionally, transliterated Japanese family names are rendered in ALL CAPS, particularly when it's unclear whether or not the name is being presented in Japanese-style family-name-first order to western audiences who may otherwise confuse the given and family names.
Programmers need to learn from bureaucrats of the 19th century. When bookkeeping of the people truly became a thing in the 1800s, industrialised nations sent out bureaucrats to collect the names of people in villages. Unfortunately, people seldom had enough names to fill in the forms the bureaucrats had. The idea of a family name was not something bestowed to nor necessary for a lowborn. If you're the only Jack in town, why bother with any other name than Jack?
Not content with simply filling in Jack, bureaucrats would simply come up with a last name for them. Or, if they felt so inclined, ask the person to come up with one themselves. They would often choose their occupation. Americans know that a lot of modern family name spellings in the US are the result of careless bureaucrats at Ellis Island.
Back then, family names, house names, last names and surnames weren't necessarily the same thing. One might have more than one. You may be of one House, but your last name was Jackson (son of Jack). Also somewhat inconvenient for these bureaucrats with few fields on their forms. And also, stop changing last name from generation to generation.
Though, fortunately for these bureaucrats, unlike modern programmers, when the map did not fit the terrain, they could simply alter the terrain.
> If you're the only Jack in town, why bother with any other name than Jack?
In Le Guin's The Dispossessed [1] people on the satellite planet called Anarres get assigned a unique mononymous [2] name at birth. No one else alive at the time has the same name (and usually the name isn't re-used for a number of years). The protagonist is "Shevek".
So you are suggesting programmers should just assign people names conforming to some standardized schema? You might get away with that for a government website in some states, but good luck being a commercial site in competitive market and requiring customers to change their names before you will take their money.
The key takeaway is that the simplest handling of names (as an opaque Unicode string which is not assumed to be stable or unique or conform to any particular structure) is also the one which has the fewest problems.
The problems arise when you want to do more than just echo the name exactly as entered. Perhaps you want to show only the last name in some context: now you assume everyone has a last name, and you are already in trouble.
Of course you can't always get away with treating names as opaque, so this is where you need to be very careful with the assumptions you make. The approach depends on the purpose of registering the name in the first place. For example, if the purpose is to identify a person showing up to pick up a rental car, you just ask them to enter their name as it is stated on the driver's license. If you want to mail them a letter, ask for the name as it is stated on their mailbox.
There is no one-size-fits-all solution, but instead of worrying about the-artist-formerly-known-as-Prince or tribes communicating only with colored fabric, consider the use case for the name. If you have a web-shop you probably don't have to worry about orphan toddler refugees, but if you write software for hospitals you absolutely have to consider the case where the name is unknown. I'm sure Mr Artist-formerly-known-as-Prince has a passport, credit card, legal name and driver's license, all with names representable as Unicode characters. The question is which one your application needs.
Btw. I find it amusing when web startups think they need a "web scale" distributed database system in order to be scalable to a billion users, but think handling characters outside of the ASCII range is an obscure edge case which can be ignored.
Some systems even have to handle not-yet-borns. An example I have seen is that when pregnant in France, you might have to pre-register your future child for one reason or another (a common one is waiting lists for childcare, but some paperwork requires it in other situations). Ours had “À naître” [to be born] as a first name, as the forms did not accept no first name.
Realistic name from working in the NT: Someone with your name died, so you no longer have a name until the elders convene to give you one. This may take somewhere between hours and years.
However, after a period of time dictated by the hierarchical position of the one who died, you will be given the choice to retake your previous name, or maintain your new one, if you have it.
Several name changes with the differing clocks for switching may overlap. Each time, you may end up with interspersed periods of no name.
Every time a change occurs, you are not normally allowed to acknowledge that the previous name was ever attached to you. As such, your gov IDs have a numeric constant, like everyone else, but no name fields.
On an island of a thousand, I estimated roughly 10% of people at any one time had no name.
By the way, John Wyndham (author of The Day of the Triffids, mentioned in the article) also wrote a novel titled "The Kraken Wakes", which is a hilarious, messed up and excellent piece of scifi full of early 20th century flavorful goodness.
"with examples" => Follows up by not giving examples to a bunch of entries
"Confound your cultural relativism! People in my society, at least, agree on one commonly accepted standard for names.
And will your software only be dealing with people named by your society?"
"I can safely assume that this dictionary of bad words contains no people’s names in it.
This is a common mistake – many “bad words” are not bad words in other languages, and some are used in names. Moreover, not every society restricts what words may be used in a name; it’s perfectly possible that someone’s name may have been established in such a jurisdiction."
> People in my society, at least, agree on one commonly accepted standard for names.
For context, patio11, the author of the original list (without examples), emigrated to Japan and has noted that it is (was?) common for computer systems there to have no idea what to do with his English name. This included his Japanese employer, who apparently assumed all employees would have Japanese names.
> I can safely assume that this dictionary of bad words contains no people’s names in it.
I am a Spaniard (long & convoluted name) living in Japan, and unfortunately I know what patio11 is talking about. I literally have to call a place today to "fix" an application after being rejected twice because "my name was wrong"; we'll see how the spelling of my 31-character-long name goes.
I can do the bad words one: in Thai, the root word “porn” (pronounced like the English word “pawn”) can be loosely translated as “blessing” and forms the basis of many given names, surnames and place names. Any blacklist that includes the word “porn” will cut out a lot of totally normal Thai names like Siriporn or Pornrapat.
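To make the failure concrete, here is the kind of naive substring filter that causes this. It's a deliberately bad sketch (the blocklist and the function name `naive_is_allowed` are made up) showing the problem, not something to copy.

```python
# Hypothetical blocklist of the kind described above.
BLOCKED_SUBSTRINGS = ["porn"]


def naive_is_allowed(name: str) -> bool:
    """Reject any name that contains a blocked substring (the flawed approach)."""
    lowered = name.casefold()
    return not any(bad in lowered for bad in BLOCKED_SUBSTRINGS)


# Perfectly normal Thai names get rejected by the substring match.
rejected = [n for n in ["Siriporn", "Pornrapat", "Tanaka"] if not naive_is_allowed(n)]
```

Here `rejected` ends up containing both Thai names, which is the classic false positive of substring blocklists.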
> People whose names break my system are weird outliers. They should have had solid, acceptable names, like 田中太郎.
> No, your system is badly designed.
> This particular example name is perhaps best known as the name of an alien in an anime series (and a manga). There have also been real people with this name.
I feel like the author of this page missed the original joke here: The sentence is in English so the reader is primed to expect an English name, or at least something representable in ASCII/Latin characters, and then it drops something totally different to challenge the reader's expectations.
A personal name is either a Polynym (a name with multiple sortable components), a Mononym (a name with only one component), or a Pictonym (a name represented by a picture - this exists due to people like [Prince][1]).
A person can have multiple names, playing roles, such as LEGAL, MARITAL, MAIDEN, PREFERRED, SOBRIQUET, PSEUDONYM, etc. You might have business rules, such as "a person can only have one legal name at a time, but multiple pseudonyms at a time".
Some examples:
names: [
    {
        type: "POLYNYM",
        role: "LEGAL",
        given: "George",
        middle: "Herman",
        moniker: "Babe",
        surname: "Ruth",
        generation: "JUNIOR"
    },
    {
        type: "MONONYM",
        role: "SOBRIQUET",
        mononym: "The Bambino" /* mononyms can be more than one word, but only one component */
    },
    {
        type: "MONONYM",
        role: "SOBRIQUET",
        mononym: "The Sultan of Swat"
    }
]

names: [
    {
        type: "POLYNYM",
        role: "LEGAL",
        given: "Juan Pablo",
        surname: "Fernández de Calderón",
        secondarySurname: "García-Iglesias" /* hispanic people often have two surnames. it can be impolite to use the wrong one. Portuguese and Spaniards differ as to which surname is important */
    }
]
Given names, middle names, surnames can be multiple words such as `"Billy Bob" Thornton`, or `Ralph "Vaughan Williams"`.
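If it helps, the same model can be sketched as a typed structure in Python. Everything here is my own guess at a shape (the `Name` class, its field names and the `has_single_legal_name` helper are hypothetical), and the "one legal name at a time" rule is only the example business rule from above, not something that holds everywhere.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Name:
    kind: str                       # "POLYNYM", "MONONYM" or "PICTONYM"
    role: str                       # "LEGAL", "SOBRIQUET", "PSEUDONYM", ...
    given: Optional[str] = None     # may be several words, e.g. "Juan Pablo"
    middle: Optional[str] = None
    moniker: Optional[str] = None
    surname: Optional[str] = None
    secondary_surname: Optional[str] = None
    generation: Optional[str] = None
    mononym: Optional[str] = None   # one component, possibly several words


def has_single_legal_name(names: list[Name]) -> bool:
    """Example business rule: at most one name may carry the LEGAL role."""
    return sum(1 for n in names if n.role == "LEGAL") <= 1


babe_ruth = [
    Name("POLYNYM", "LEGAL", given="George", middle="Herman",
         moniker="Babe", surname="Ruth", generation="JUNIOR"),
    Name("MONONYM", "SOBRIQUET", mononym="The Bambino"),
    Name("MONONYM", "SOBRIQUET", mononym="The Sultan of Swat"),
]
```

Here `has_single_legal_name(babe_ruth)` is true; whether you want to enforce that rule at all is a separate business decision.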
You can very easily have multiple legal names, as "legal" is a national concept and you can have multiple citizenships, each one with a different legal system and thus potentially a different legal name.
There are other examples where an individual can have no name. If you do not complete your test of manhood in the cultures that have one you may not be assigned a name.
Your Malcolm X is interesting because I think he would have used the term "SLAVE-NAME" for the role instead of "BIRTH". This leads to the question of who gets to decide the category names, a topic which I am wholly unqualified to discuss!
I think about the 2010 article often as I develop. Often, for little line-of-business apps etc., I have to put first/last/middle on the UI and database because that’s what the paper form always had. I really wish that for this, as well as for other common scenarios (address and date/time entry too), there was a standard library to pull from for various languages. Need a simple user-friendly UI for names that follows all the rules? Copy this. Or a DB schema for names? Here’s the T-SQL.
#41 - if you write a system that can accommodate all 40 previous examples, you can still integrate it with the popular brand customer management system your company wants to use.
Unicode text boxes with no validation other than a (generous) length limit and Unicode validation are exactly the right approach. Arbitrary assumptions and limitations littered throughout the code are not.
Yes, and to be perfectly clear don't have a minimum length (either empty or nullable) and make it mutable, in any case where you're doing some form of soft identification (e.g. "Hi, $NAME, here is your reservation" or "I am calling from $COMPANY, may I speak with $NAME?")
Also, tell me what purpose you're using my name for, e.g. "Name as it appears on your payment card/passport/ID/SSN" helps tremendously in resolving ambiguity.
Also, consider not asking for a name at all unless there's actually a good reason.
Fortunately this is also the simplest to implement, since this is the default. So just don't go out of your way to write code to restrict these to some arbitrary subset of characters and tell me I have an "illegal name".
You probably need a length limit to protect against attacks though. Just don't set it at 31 characters.
A friend of mine went to school with the son of an exiled king. Everything was fine -- this fellow had a proper passport, visa, etc. I suppose being a prince helps a bit with nation-state level paperwork.
The problem occurred when the prince had to seek urgent care at the emergency room of a rural hospital.
They needed first name, last name. The prince didn't have a way to fill out the family name, the last name. English was not his first language, he was bleeding, and it was a late night for everyone all around.
So it became a real argument. Finally my friend leans over and says "SMITH. His last name is Smith!"
The orderly was able to fill out the form, and everyone was better off.
> What do you want me to do about it?
When you write code that accepts names, don’t bake in assumptions that violate these falsehoods.
If your company can’t insure someone whose name violates these assumptions, then the best you can do is make those assumptions clear.
If you book travel, they have something you're supposed to put in the last name field if you don't have one.
Or do, and just don't care that Elon Musk's kid won't be able to use your application
You're being obtuse. Everybody can still use the application. They just need to supply a name that is valid to the application. THIS IS REASONABLE
And considering the great number of official forms one must fill out which make the same assumptions that programmers foolishly believe, I am inclined to believe that The Artist Formerly Known as Prince or whoever already has some idea what they'll put in the fields to accommodate their unusual situation.
I think that part is pretty easy to understand, but if the author's intention is to have every single one of these issues addressed they're going about it the entirely wrong way.
I’m not sure anyone, including the author, considers addressing every one of those to be possible. But they want you to be aware of the assumptions you’re baking into your code, and maybe spend a few moments about which requirements you actually require. What do you think would be a better way of going about it?
> What do you think would be a better way of going about it?
I suppose it depends on what you're trying to accomplish. This article definitely reads like it's trying to annoy developers to the point of closing the page though, even if it's intentions are relatively agreeable. I'm reminded of the whole 'master vs main' debacle, where this kind of tribalism hit critical mass. There was huge potential for people to have a good discussion about acceptable practices in software development, but we gave it up in lieu of tribalism and us-vs-them politics. As a result, there was way more media blowback than necessary, which lead to a false sense of urgency and a whole lot of bugs.
So, I don't know. But I do know that a good discussion doesn't start with 40 theses and a hostage situation.
Most of these issues can be solved by simply providing a "Name" field that accepts Unicode and has a reasonable length. Don't validate it in any way except to require at least one non-whitespace character. Trim any excess whitespace and run the text through a Unicode normalization algorithm. At most, filter out Unicode blocks that can't possibly apply, such as emoji and maths symbols.
Done.
That covers something like 90% of the already pretty obscure corner-cases. The remainder are likely out-of-scope, such as children that have not-yet-been-named. They can't legally sign up for things in most (all?) jurisdictions, so it's not really worth considering in most scenarios.
This is already 10,000x better than ascii-only input fields for "Surname" and "Given Name" limited to something like 20 characters each. Where entering "Null" or "O'Neill" throws a HTTP/500 error because JSON is "so simple" that anyone can use it...
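A minimal sketch of the single "Name" field handling described above, in Python. The function name `clean_name`, the 200-character cap, the choice of NFC as the normalization form, and collapsing internal whitespace runs are all assumptions made for illustration; the comment only says "reasonable length", "trim" and "a Unicode normalization algorithm".

```python
import unicodedata

# Assumption: the thread only says "a reasonable length"; 200 is illustrative.
MAX_NAME_LENGTH = 200


def clean_name(raw: str) -> str:
    """Normalize a free-form name; raise ValueError if it is unusable."""
    # NFC composes sequences such as "e" + combining acute into a single "é",
    # so visually identical names compare equal afterwards.
    normalized = unicodedata.normalize("NFC", raw)
    # str.split() with no arguments splits on any Unicode whitespace, so this
    # trims both ends and collapses internal runs to a single space.
    collapsed = " ".join(normalized.split())
    if not collapsed:
        raise ValueError("name must contain at least one non-whitespace character")
    # Length is counted in code points here, which is only one possible measure.
    if len(collapsed) > MAX_NAME_LENGTH:
        raise ValueError("name is too long")
    return collapsed
```

Names like "Null" and "O'Neill" pass through untouched, since nothing here inspects the content beyond whitespace and length.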
I consider breaking the fields up well worth the trade-off of people without a surname having some difficulty with it, because that situation is extremely rare, and the user experience is significantly worse if, e.g., I can't meaningfully sort names because I have no idea if you put your given name first or last, etc.
There's also really nothing about JSON that should dictate that you can't handle the name "Null" or "O'Neill." You don't even need special escaping for those.
There are entire countries with single names.
Which ones?
Myanmar and Afghanistan are the most well known.
Did you read the article?
It is meaningless to "sort" names across countries. Most languages have their own sort rules, and some languages have multiple sorts depending on the specific usage!
Sorting is almost never what you want.
Are you printing out a phone book onto dead trees? No? Then you do not need to sort names.
Just do substring search, and you've already got a UI that is 10x better than having people flip to 'A', then 'Aa' to get to 'Aaron', etc...
If you're about to say "but I need to define a sort order for a database table key" then you've made another mistake and also not read the article...
> Most languages have their own sort rules, and some languages have multiple sorts depending on the specific usage!
That is true. However, a sort order that tries to take all of these into account, instead of using the rules of the current user's language, is obviously useless, so I don't understand why you think it's a meaningful objection. Users are surprised and annoyed by an inability to sort names.
I guess I'm not too worried about sort order being incorrect, since it already is incorrect 99% of the time anyway—most programs sort strings "1" < "12" < "2", whereas that's... wrong. It annoys me several times per week, but we've all accepted it and moved on. Apple's OSes do it right, but that's about it—and even still, if you're in a shell, it's probably back to being wrong.
Windows Explorer gets it (mostly) correct also.
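For anyone who hasn't seen the "1" < "12" < "2" problem up close, here is a rough Python sketch of it and of one common workaround, usually called natural sort. The helper name `natural_key` is made up for this example, and it says nothing about how Apple or Windows Explorer actually implement their ordering.

```python
import re


def natural_key(text: str):
    """Sort key that compares runs of digits as numbers, not as characters."""
    # re.split with a capturing group alternates text, digits, text, digits...
    # so even indexes are always text and odd indexes are always digit runs.
    parts = re.split(r"(\d+)", text)
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]


items = ["2", "12", "1"]
plain = sorted(items)                     # character-by-character comparison
natural = sorted(items, key=natural_key)  # digit runs compared numerically
```

This only addresses embedded numbers. It does nothing about language-specific collation rules, which are a separate problem.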
How do you even accept names if the name is not representable in any character encoding?
Probably you can't, because the person couldn't even type their name in. So if you're writing an application, that's one that's outside your control, and you can't do anything about.
OK, so what I'm asking you is how I am meant to handle names which are characters that are literally impossible to represent with a computer.
The artist formerly known as imwill eagerly awaits your answer.
I responded this to a different thread also, but:
Probably you can't, because the person couldn't even type their name in. So if you're writing an application, that's one that's outside your control, and you can't do anything about.
> the way the characters in Third Rock from the Sun were named Tom, Dick, and Harry.
Wow, I've watched that show for years and never caught that. I love the ordinary alien name trope, and my favorite incarnation is Ford Prefect from the Hitchhiker's Guide series. (While I imagine it was a great gag in the UK for the alien to believe that cars are the dominant life form on Earth, the joke was a bit lost on US readers because the model was not popular here.)
One real person with that name is the current editor of a Japanese science magazine [1] that has been published since 1931. I did some work for him about ten years ago. He is a very nice man.
[1] https://www.iwanami.co.jp/ad/kagaku/
> OK, maybe someone's name is one character that's not possible to represent with Unicode. What do you want me to do about it?
The example with The Artist Formerly Known as Prince is rather contrived. More realistic is the Chinese family names which are not yet supported by Unicode.
But if you can get away with it, it is a perfectly reasonable business decision to only support names which can be represented as Unicode characters! The point of the article is just that you should be aware of the assumptions you make, and the fewer assumptions you need to make, the fewer problems.
A less contrived example would be the forms which required a last name of at least two characters. This is just dumb, since "O" is a perfectly valid and even reasonably common last name. So the writers put in extra effort to outlaw certain names, for no benefit whatsoever. Presumably they thought all real names are at least two characters long. Knowing this was a faulty assumption would have saved them the time they spent on making their site less useful.
Discussed at the time:
Falsehoods Programmers Believe About Names – With Examples - https://news.ycombinator.com/item?id=18567548 - Nov 2018 (169 comments)
Here's a falsehood that's missing: people want pages that scroll in unexpected, unpredictable ways.
In some countries, women have different last names than men do.
E.g. in Macedonia, traditionally a person would get their father's name as their surname (so if the father's name was "Petar" and the son was Dragan, the last name would be roughly translated to "Dragan of Petar"). Because there are different forms for feminine and masculine words, the son would be named "Dragan Petrovski", and the daughter (e.g. Marija) would be named "Marija Petrovska".
Translated to modern times, this means that if "Marija Ilijevska" married "Dragan Petrovski", she would then be named "Marija Petrovska". Their son's surname would again be "Petrovski" and their daughter's "Petrovska".
So basically, matching parents and their kids by their last names has to take into account the -ski or -ska form.
In old Polish there was also a special form of name for unmarried women, so for example:
Jan Kowalski has a daughter, Anna Kowalska.
When she is young, she is called Anna Kowalszczonka, Kowaliczka or Kowalówna depending on who you asked (and you can't reliably reverse these forms into the original name)
When she is grown up, she is called Anna Kowalska.
As it happens, the country was part of the Russian empire, so she's also called Анна Ковалска or Анна Ковалская, depending on who you asked.
When she marries Jan Kowal, she is called Anna Kowalowa or Anna Kowal.
When Jan Kowal dies and she marries Wawrzyniec Słowacki, she is called Anna Słowacka. But her husband is from Austria-Hungary, so he's also called Laurentius Slowacki somewhere in his documents, even though he never uses this name for anything.
Their son Stefan moves to Lithuania, but even though he's ethnically Polish, Lithuanian government requires him to have "Steponas Slovackis" printed on his documents.
This feels like the opposite of KISS principle.
https://en.wikipedia.org/wiki/KISS_principle
I would recommend my daughter or son to keep their name even after marriage just to avoid the paperwork hassle.
Also in other Slavic countries, e.g. Poland (-ski/-ska), as well as in Iceland (-son/-dottir), and sometimes in Scotland (Mac-/Nic-) and Ireland (Mac-, Ni-).
https://en.wikipedia.org/wiki/Icelandic_name gives an Icelandic example of Stefán Gunnarsson with children named Harpa Stefánsdóttir and Róbert Stefánsson.
Björk's name is Björk Guðmundsdóttir, daughter of Guðmundur Gunnarsson and Hildur Rúna Hauksdóttir.
These use the genitive -s followed by either -dóttir for girls or -son for boys.
In Iceland, matching parents and their kids by their last names doesn't work that well.
In Spain, everyone has two surnames. Your first surname is your father's first surname, and your second surname is your mother's first surname. What this means is that in a family of mum, dad and kids, only the kids will share the same surname.
We had a German friend who married a Spanish man, and she insisted on changing her surname to his when they got married since that was traditional for her. But for Spanish people this was really weird, since it sounded like they were siblings. I actually haven't asked them since they had kids, but I guess their kids must have his first surname doubled (which is not unusual - María Sanchez Sanchez just means that the first surname of both parents was Sanchez).
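The rule as described above is simple enough to sketch in Python. This encodes only the traditional pattern from the comment (father's first surname, then mother's first surname); the class and function names are invented for this example, and real-world practice and law allow variations that are not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanishName:
    given: str
    first_surname: str
    second_surname: str

    def __str__(self) -> str:
        return f"{self.given} {self.first_surname} {self.second_surname}"


def child_name(given: str, father: SpanishName, mother: SpanishName) -> SpanishName:
    """Traditional rule from the comment: each parent passes on their FIRST surname."""
    return SpanishName(given, father.first_surname, mother.first_surname)


# Hypothetical parents who happen to share a first surname, as in the
# "María Sanchez Sanchez" example above. The other surnames are made up.
father = SpanishName("Juan", "Sanchez", "Lopez")
mother = SpanishName("Ana", "Sanchez", "Ruiz")
maria = child_name("María", father, mother)
```

Note that neither parent's full surname pair equals the child's, which is why matching family members on a single "last name" column fails here.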
This is the case in many countries. Nordic countries do this too (...-son and ...-dottir etc.) And FWIW, in Quebec, women are not allowed to take their husband's last name when they marry. They choose for the kids.
> Nordic countries do this too (...-son and ...-dottir etc.)
Not for a long time. Only Iceland does that now.
The “names don’t change” one has a subvariety: the falsehood that last names might change, but first names don’t. Transitioning showed me firsthand how many common products make changing specifically the first name impossible.
Can we next do Falsehoods web developers believe about there being any benefit to hijacking the native scroll functionality, and use this website as an example?
Started to read, was interesting, but then I scrolled, and the whole page seems broken. Life's too short to deal with that nonsense.
Something is terribly wrong with scrolling on that page.
For me scrolling just doesn't work at all...
Oh thank goodness, I thought it was just me. Dragging the scrollbar worked, but the scroll wheel didn't. Strangest thing I've ever experienced.
Javascript-based scrolljacking meant to enforce inertial smooth scrolling. Even when it works (it's glitchy on this page and doesn't always activate for me) it makes the page painful to use.
The arrow keys work. As does Firefox reader mode. But my mouse wheel scrolls by sixteen lines!
More than anything, this reflects how many business decisions a "programmer" makes.
If a "programmer" was actually just a programmer, none of their beliefs would matter. Either it met the spec, or not.
But in reality, a lot of the job of a "programmer" is to make all those detailed decisions, and handle all the exceptional cases, that other people don't want to be bothered to think about. But those detailed decisions and edge cases need to be handled because much of the value of software is running the business worldwide at all hours with fewer humans involved, so those edge cases will come up constantly.
So all these little decisions fall on the one writing software. That's actually an incredible amount of power once you realize it.
Because the speed at which they can write good specs cannot match a competitor doing iterative back and forth from an elevator discussion. Even in large slow banks, we churn out feature releases weekly now with half finished stuff behind feature flags to see if they like it so far. And well, it works and they're happier and we can have lighter BAs, and spec mistakes cost a lot less as they are spotted faster, not after 3 months of blind work.
Oh and btw the value of software is not to handle every edge case well, but to be able to change, imho.
> I fear that part of the reason that this blog post had less impact than I hoped was that Patrick did not give examples of how each assumption can be false.
I think the reason is that many assume that these cases are niche and not something they need to worry about.
Which might be the case, or might not, depending on the application. But either way, using celebrities or historical figures is not a great way to convince anyone otherwise.
> In some countries (notably French speaking) it is convention to write a person’s surname in all caps to make it clear which part of the name is the surname.
This is a French-specific thing? I didn't know that. I really like it, it makes it easier to know which part is the name and which part is the family name.
Yes I've seen this in some French and Belgian companies.
It's not fully French specific — conventionally, transliterated Japanese family names are rendered in ALL CAPS, particularly when it's unclear whether or not the name is being presented in Japanese-style family-name-first order to western audiences who may otherwise confuse the given and family names.
I’ve also seen it in the CIA World Factbook. I didn’t realize it was a French thing.
https://www.cia.gov/the-world-factbook/countries/japan/
Lots of info on the page, but the relevant part:
“head of government: Prime Minister Fumio KISHIDA (since 4 October 2021 )“
Edit: Another example:
https://www.cia.gov/the-world-factbook/countries/albania/
“head of government: Prime Minister Edi RAMA (since 10 September 2013); Deputy Prime Minister Senida MESI (since 13 September 2017)“
I knew this as a French-specific thing when I worked there and I used to love it. Haven't seen it anywhere else though.
Programmers need to learn from bureaucrats of the 19th century. When bookkeeping of the people truly became a thing in the 1800s, industrialised nations sent out bureaucrats to collect the names of people in villages. Unfortunately, people seldom had enough names to fill in the forms the bureaucrats had. The idea of a family name was not something bestowed on or necessary for a lowborn. If you're the only Jack in town, why bother with any other name than Jack?
Not content with simply filling in Jack, bureaucrats would simply come up with a last name for them. Or - if they felt so inclined - ask the person to come up with one themselves. They would often choose their occupation. Americans know that a lot of modern family name spellings in the US are the result of careless bureaucrats at Ellis Island.
Back then, family names, house names, last names and surnames weren't necessarily the same thing. One might have more than one. You may be of one House, but your last name was Jackson (son of Jack). Also somewhat inconvenient for these bureaucrats with few fields on their forms. And also, stop changing last name from generation to generation.
Though, fortunately for these bureaucrats, unlike modern programmers, when the map did not fit the terrain, they could simply alter the terrain.
Some airlines do this, they ask that you anglicise your name for the boarding pass.
Family names weren’t changed at Ellis Island.
See eg this from the New York Public Library: https://www.nypl.org/blog/2013/07/02/name-changes-ellis-isla...
(The mandated adoption of surnames is a pretty complex topic and of course proceeded in different ways in various times and places.)
> If you're the only Jack in town, why bother with any other name than Jack?
In Le Guin's The Dispossessed [1] people on the satellite planet called Anarres get assigned a unique mononymous [2] name at birth. No one else alive at the time has the same name (and usually the name isn't re-used for a number of years). The protagonist is "Shevek".
[1] https://en.wikipedia.org/wiki/The_Dispossessed
[2] https://en.wikipedia.org/wiki/Mononymous_person
So you are suggesting programmers should just assign people names conforming to some standardized schema? You might get away with that for a government website in some states, but good luck being a commercial site in competitive market and requiring customers to change their names before you will take their money.
The key takeaway is that the simplest handling of names (as an opaque Unicode string which is not assumed to be stable or unique or conform to any particular structure) is also the one which has the fewest problems.
The problems arise when you want to do more than just echo the name exactly as entered. Perhaps you want to show only the last name in some context - now you assume everyone has a last name, and you are already in trouble.
Of course you can't always get away with treating names as opaque, so this is where you need to be very careful with the assumptions you make. The approach depends on the purpose of registering the name in the first place. For example if the purpose is to identify a person showing up to pick up a rental car, you just ask them to enter their name as it is stated on the driver's license. If you want to mail them a letter, ask for the name as it is stated on their mailbox.
There is no one-size-fits-all solution, but instead of worrying about the-artist-formerly-known-as-Prince or tribes communicating only with colored fabric, consider the use case for the name. If you have a web-shop you probably don't have to worry about orphan toddler refugees, but if you write software for hospitals you absolutely have to consider the case where the name is unknown. I'm sure Mr Artist-formerly-known-as-Prince has a passport, credit card, legal name and driver's license, all with names representable as Unicode characters. The question is which one your application needs.
Btw. I find it amusing when web startups think they need a "web scale" distributed database system in order to be scalable to a billion users, but think handling characters outside of the ASCII range is an obscure edge case which can be ignored.
A more realistic example for 40 ("people have names"): Newborns might not have a name. Lots of systems might have to handle newborns.
Some systems even have to handle not-yet-borns. An example I have seen is that when pregnant in France, you might have to pre-register your future child for one reason or another (a common one is waiting lists for childcare, but some paperwork requires it in other situations). Ours had “À naître” [to be born] as a first name, as the forms did not accept no first name.
Realistic name from working in the NT: Someone with your name died, so you no longer have a name until the elders convene to give you one. This may take somewhere between hours and years.
However, after a period of time dictated by the hierarchical position of the one who died, you will be given the choice to retake your previous name, or maintain your new one, if you have it.
Several name changes with the differing clocks for switching may overlap. Each time, you may end up with interspersed periods of no name.
Every time a change occurs, you are not normally allowed to acknowledge that the previous name was ever attached to you. As such, your gov IDs have a numeric constant, like everyone else, but no name fields.
On an island of a thousand, I estimated roughly 10% of people at any one time had no name.
41. Surnames like "Null" don't exist: https://news.ycombinator.com/item?id=12426315
By the way, John Wyndham (author of The Day of the Triffids, mentioned in the article) also wrote a novel titled "The Kraken Wakes", which is a hilarious, messed up and excellent piece of scifi full of early 20th century flavorful goodness.
"with examples" => Follows up by not giving examples to a bunch of entries
"Confound your cultural relativism! People in my society, at least, agree on one commonly accepted standard for names. And will your software only be dealing with people named by your society?"
"I can safely assume that this dictionary of bad words contains no people’s names in it. This is a common mistake – many “bad words” are not bad words in other languages, and some are used in names. Moreover, not every society restricts what words may be used in a name; it’s perfectly possible that someone’s name may have been established in such a jurisdiction."
> People in my society, at least, agree on one commonly accepted standard for names.
For context, patio11, the author of the original list (without examples), emigrated to Japan and has noted that it is (was?) common for computer systems there to have no idea what to do with his English name. This included his Japanese employer, who apparently assumed all employees would have Japanese names.
> I can safely assume that this dictionary of bad words contains no people’s names in it.
This problem is notorious enough to have a name: https://en.wikipedia.org/wiki/Scunthorpe_problem
How I wish this article and information gets translated into Japanese…
I am a Spaniard (long & convoluted name) living in Japan, and unfortunately I know what patio11 is talking about. I literally have to call a place today to "fix" an application after being rejected twice because "my name was wrong". We'll see how the spelling of my 31-character-long name goes.
I can do the bad words one: in Thai, the root word “porn” (pronounced like the English word “pawn”) can be loosely translated as “blessing” and forms the basis of many given names, surnames and place names. Any blacklist that includes the word “porn” will cut out a lot of totally normal Thai names like Siriporn or Pornrapat.
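To make the failure concrete, here is a tiny Python sketch of the kind of naive substring blocklist that causes this. The blocklist contents and the function name are hypothetical; this shows the bug, not a recommended filter.

```python
# Hypothetical one-word blocklist, purely to reproduce the problem.
BLOCKLIST = {"porn"}


def naive_is_blocked(name: str) -> bool:
    """Reject a name if any blocklisted word appears anywhere inside it."""
    lowered = name.casefold()
    return any(word in lowered for word in BLOCKLIST)


# The Thai names from the comment above are rejected by the naive check.
rejected = [n for n in ["Siriporn", "Pornrapat", "Tanaka"] if naive_is_blocked(n)]
```

Switching to whole-word matching would let these two through, but it still rejects anyone whose entire name is a word on somebody's list, so it narrows the problem rather than removing it.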
> People whose names break my system are weird outliers. They should have had solid, acceptable names, like 田中太郎.
> No, your system is badly designed.
> This particular example name is perhaps best known as the name of an alien in an anime series (and a manga). There have also been real people with this name.
I feel like the author of this page missed the original joke here: The sentence is in English so the reader is primed to expect an English name, or at least something representable in ASCII/Latin characters, and then it drops something totally different to challenge the reader's expectations.
In Greek, first names are declined (they change form with grammatical case). Also, last names change based on gender.
A personal name is either a Polynym (a name with multiple sortable components), a Mononym (a name with only one component), or a Pictonym (a name represented by a picture - this exists due to people like [Prince][1]).
A person can have multiple names, playing roles, such as LEGAL, MARITAL, MAIDEN, PREFERRED, SOBRIQUET, PSEUDONYM, etc. You might have business rules, such as "a person can only have one legal name at a time, but multiple pseudonyms at a time".
Some examples:
or
or
or
Given names, middle names, surnames can be multiple words such as `"Billy Bob" Thornton`, or `Ralph "Vaughn Williams"`.
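One possible way to write down the model described above, as a rough Python sketch. Every class and field name here is invented for illustration (including `image_ref` and the `single_legal_name` switch); only the three name kinds, the role names, and the "one legal name at a time" business rule come from the comment.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple, Union


class Role(Enum):
    LEGAL = "LEGAL"
    MARITAL = "MARITAL"
    MAIDEN = "MAIDEN"
    PREFERRED = "PREFERRED"
    SOBRIQUET = "SOBRIQUET"
    PSEUDONYM = "PSEUDONYM"


@dataclass(frozen=True)
class Polynym:
    # Ordered components; each component may itself contain spaces.
    components: Tuple[str, ...]


@dataclass(frozen=True)
class Mononym:
    value: str


@dataclass(frozen=True)
class Pictonym:
    # Hypothetical: a reference to wherever the picture is stored.
    image_ref: str


Name = Union[Polynym, Mononym, Pictonym]


@dataclass
class Person:
    # The "only one legal name at a time" rule is opt-in, not the default.
    single_legal_name: bool = False
    # An empty list models a person who currently has no name at all.
    names: List[Tuple[Role, Name]] = field(default_factory=list)

    def add_name(self, role: Role, name: Name) -> None:
        already_legal = any(r is Role.LEGAL for r, _ in self.names)
        if self.single_legal_name and role is Role.LEGAL and already_legal:
            raise ValueError("this person already has a legal name")
        self.names.append((role, name))


actor = Person()
actor.add_name(Role.LEGAL, Polynym(("Billy Bob", "Thornton")))
```

Keeping multi-word components as single strings is what lets "Billy Bob" stay one given name instead of being split on the space.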
You can very easily have multiple legal names, as "legal" is a national concept and you can have multiple citizenships, each one with a different legal system and thus potentially a different legal name.
There are other examples where an individual can have no name. If you do not complete your test of manhood in the cultures that have one you may not be assigned a name.
Looks like GP didn't read the list.
An empty array might then suffice, but surely this outcast has a sobriquet, like "that loser without a name"
Which is exactly what I said
> You might have business rules, such as "a person can only have one legal name at a time[...]"
ya, meaning it's an option, and not the default
Does it cover "Bobby Tables"?
https://xkcd.com/327/
Your Malcolm X is interesting because I think he would have used the term "SLAVE-NAME" for the role instead of "BIRTH". This leads to the question of who gets to decide the category names - a topic which I am wholly unqualified to discuss!
I think about the 2010 article often as I develop. Often, in little line-of-business apps etc., I have to put first/last/middle on the UI and in the database because that’s what the paper form always had. I really wish that for this, as well as other common scenarios (address and date/time entry too), there was a standard library to pull from for various languages. Need a simple, user-friendly UI for names that follows all the rules? Copy this. Or a DB schema for names? Here’s the T-SQL.
#41 - if you write a system that can accommodate all 40 previous examples, you can still integrate it with the popular brand customer management system your company wants to use.
Gotcha. Un-validated unicode text boxes for names in forms from now on. After all, someone could be named like the contents of a 64MB binary blob.
Compromises are unavoidable in web development.
Oh yes, little QmFieSBUYWJsZXM=, we call him.
Qk9CQlkgVEFCTEVT
Right, I must have made some miscalculation at the registry office... please don't tell my wife
Nullable, of course.
Unicode text boxes with no validation other than a (generous) length limit and Unicode validation is exactly the right approach. Arbitrary assumptions and limitations littered throughout the code are not.
Yes, and to be perfectly clear don't have a minimum length (either empty or nullable) and make it mutable, in any case where you're doing some form of soft identification (e.g. "Hi, $NAME, here is your reservation" or "I am calling from $COMPANY, may I speak with $NAME?")
Also, tell me what purpose you're using my name for, e.g. "Name as it appears on your payment card/passport/ID/SSN" helps tremendously in resolving ambiguity.
Also, consider not asking for a name at all unless there's actually a good reason.
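For the soft-identification case, a small Python sketch of what "no minimum length, nullable" can look like in practice. The function name `greeting` and the fallback wording are made up for this example.

```python
from typing import Optional


def greeting(name: Optional[str]) -> str:
    """Build a greeting that still reads naturally when no name is on file."""
    # Treat None, "" and whitespace-only values the same way: no name given.
    if name and name.strip():
        return f"Hi, {name.strip()}, here is your reservation."
    return "Hi, here is your reservation."
```

Since the name is only read at display time, letting the user change it later costs nothing here.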
Excellent, that covers emojis as well.
Fortunately this is also the simplest to implement, since this is the default. So just don't go out of your way to write code to restrict these to some arbitrary subset of characters and tell me I have an "illegal name".
You probably need a length limit to protect against attacks though. Just don't set it at 31 characters.
Did I miss the falsehood about family name always following personal name? (Doesn't apply to Chinese, Japanese, Korean, or Hungarian names.)
Since they do mention Klingon in one of them, I kinda also want to add to yours: Bajorans also put family name first.
Funnily enough, the example about the importance of capitalisation has a capitalisation error.
It's "Van Gogh" when written without the first name. With, it becomes "Vincent van Gogh".
Basically, the "filler words" are without capital unless "naked" at the start of the noun.
So it also is "Van der Staaij" or "Kees van der Staaij".
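The capitalisation rule described above can be sketched in Python if the prefix is stored separately from the family name. The function names and the three-field split are assumptions for this example, and this covers only the convention as stated in this comment; other regions and families may follow different ones.

```python
def surname_alone(prefix: str, family: str) -> str:
    """Surname without a given name: the first prefix word gets a capital."""
    if not prefix:
        return family
    return prefix[0].upper() + prefix[1:] + " " + family


def full_name(given: str, prefix: str, family: str) -> str:
    """Given name followed by the surname: the prefix stays lower case."""
    parts = [given, prefix.lower(), family]
    # Skip empty parts so a missing prefix does not leave a double space.
    return " ".join(p for p in parts if p)
```

With these, "van" and "Gogh" give "Van Gogh" on their own and "Vincent van Gogh" with the given name, which a single stored surname string cannot do without guessing.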
Good followup on falsehoods about time https://infiniteundo.com/post/25326999628/falsehoods-program...
Lots of previous discussion when this was new:
https://news.ycombinator.com/item?id=18567548
> Cyrillic characters for Russian names
Russians don't hold a patent on Cyrillic. There are plenty of languages using Cyrillic, e.g. Mongolian or Serbian.
My full name is 36 characters long which isn't crazy but still causes a fair few issues when I try to sign up for some sites
The altered scroll on this site is rough.
How do you store Queen Elizabeth II?
A friend of mine went to school with the son of an exiled king. Everything was fine -- this fellow had a proper passport, visa, etc. I suppose being a prince helps a bit with nation-state level paperwork.
The problem occurred when the prince had to seek urgent care, the Emergency Room of a rural hospital.
They needed first name, last name. The prince didn't have a way to fill out the family name, the last name. English was not his first language, he was bleeding, and it was a late night for everyone all around.
So it became a real argument. Finally my friend leans over and says "SMITH. His last name is Smith!"
The orderly was able to fill out the form, and everyone was better off.
This is so old. Welcome, https://xkcd.com/1053/.