I agree with Microsoft/Google/KDE's order. The author's situation is extremely rare, and the situation where someone wants "10" to be before "9" is far more common. Moreover, desktops don't label this sorting "alphabetical" (E: and it would really be "lexicographic"*), they label it "by name" (an informal criterion), so technically they're not lying.
> I miss the time when computers did what you told them to, instead of trying to read your mind.
You may be looking at that time through rose-tinted glasses. I don't like when computers lie to me either, but "mind-reading" is really helpful in ways we take for granted, like autosave. Desktops can have an option to sort files truly alphabetically, but the more common case should always be the default; that's the definition of "intuitive".
I will add that I'm plenty "smart" enough to understand that "10" comes before "9" in a strictly alphabetical sense, and I still want my file managers to sort "9" before "10".
I don't want to put leading zeroes before all the single-digit numbers in my file names. (And then potentially come back later and add even more leading zeroes once the maximum number reaches three digits.)
---
I split all of my audiobooks into chapters. I use the format "Chapter 01.mp3" (or "Chapter 001.mp3" when there are > 99 chapters) because some (all?) MP3 players are too stupid to sort numbers properly and I want my audiobooks to work everywhere.
This works, but it looks kind of ugly and creates extra work—yes I have scripts to automate it, it's still an extra step—and it would be great if I could just trust that every device will understand numbers.
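For what it's worth, the padding step can be sketched in a few lines of Python. This is only an illustration: `pad_numbers` is a hypothetical helper, not the script mentioned above, and it assumes the extension (which can itself contain a digit, as in `.mp3`) should be left untouched.

```python
import os
import re

def pad_numbers(filename: str, width: int) -> str:
    """Zero-pad every run of digits in the stem to at least `width` digits."""
    stem, ext = os.path.splitext(filename)  # keep ".mp3" out of the substitution
    padded_stem = re.sub(r"\d+", lambda m: m.group().zfill(width), stem)
    return padded_stem + ext

chapters = [f"Chapter {n}.mp3" for n in (1, 2, 10, 11)]
padded = [pad_numbers(name, 2) for name in chapters]

# After padding, a plain character-by-character sort matches chapter order.
print(sorted(padded))
```

Going past 99 chapters means re-running it with `width=3` on every file, which is exactly the extra step being complained about.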
> I don't want to put leading zeroes before all the single-digit numbers in my file names.
> ... it would be great if I could just trust that every device will understand numbers.
Strings are not numbers, even if some part of their content "looks like a number."
> I will add that I'm plenty "smart" enough to understand that "10" comes before "9" in a strictly alphabetical sense, and I still want my file managers to sort "9" before "10".
Problem is, this is your preference for a specific situation. Which may not be another person's preference in the same situation nor yours in a different situation.
So what are programs to do?
Display strings in a consistent, documented, manner. Which is lexicographical ordering in all cases lacking meta-data to indicate otherwise.
> Display strings in a consistent, documented, manner.
IMO, "Treat any sequence of digits as a number for the purpose of sorting" is consistent. I'm not sure if it's documented—I've never needed to look up the documentation—but if it's not, the developers could certainly fix that.
> this is your preference for a specific situation.
Sure, but we generally make decisions based on which situations we think will be most common. I think having ten or more things (screenshots, audio samples, whatever) named "Thing 1" – "Thing 10" in a folder is extremely common. And if Thing 10 comes before 9, it's really annoying!
Let's say I have a directory of 32 numbered files. Under the author's preferred sorting method, they'll get displayed:
If I download a folder with files like this, I basically have to pause whatever I'm doing and edit the files to have leading zeroes before I can make sense of what I'm looking at.
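To make the complaint concrete, here is a small Python sketch. The file names `1` through `32` are an assumption about what such a directory might contain; the point is only what a plain string sort does with them.

```python
# Hypothetical directory: 32 files named "1" .. "32".
names = [str(n) for n in range(1, 33)]

# Python compares strings character by character (by code point),
# which is the lexicographic order under discussion.
lexicographic = sorted(names)

print(lexicographic[:14])
# ['1', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '2', '20', '21']
```

"2" only shows up after all of "10" through "19", and "9" ends up last.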
Do I understand that you want these to be sorted like this?
1
2
9
10
11
So I guess you also want things sorted like
1.1
1.2
2
9
9.9
And also
1
1.1
1.10
1.2
1.10.1
So when you're done defining whatever crazy rules you think up, how do I pause whatever and edit the filenames to get them back into lexicographical order?
You can massage lexicographical to meet your needs. I can't massage your arbitrary rules to meet my needs.
Your examples don’t need any extra rules to be sorted correctly. The basic idea is that any sequence of digits is treated for sorting as if it were a single character. On my iPhone, your examples are sorted as expected.
I would not know how an OS treats those if we do not assume mindreading vs proper lexicographic order. Why would we need to substitute precision with vagueness for something that simply taking care of proper naming would suffice?
Ah yes sorry, 1.10 comes after 1.2 because 10 is bigger than 2 (so in fact different from your example). But assuming your original list is a list of versions (which seems reasonable given the presence of multiple decimal points for some cases), then that’s the order you’d want.
If you have non-integer numbers in your filenames then it won’t give the order you want, but there isn’t going to be a rule that works for all cases.
I was with you until this point, but 1.2 is bigger than 1.10, because 1.2 is a shortened version of writing 1.20 _unless_ you explicitly want these to be version numbers or something like that. The normal expectation would be to treat numbers as, well, mathematical numbers, and not SemVer, especially if we only have one decimal point, don't you think?
As I said, the sorting rule won’t always give pleasing results, but it seems to me like a simple and reasonable modification of lexicographic ordering.
1.10, the number, is equivalent to 1.1. It is less than 1.2. You say you want numbers to sort as numbers, but you want 1.10 to be greater than 1.2.
Do you consider '1/4' to be a number? Should it come before or after '1/3'?
I'm guessing that you don't want to sort one character at a time if you encounter one of [0-9]. Instead, you want to group all consecutive [0-9] as a single sortable number. But aren't characters '.', ',', '/', '-' also part of numbers?
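The two readings of "1.10" can be put side by side in Python. This is a sketch, not anyone's actual file manager logic; `version_key` is a hypothetical helper that splits on dots and compares the pieces as integers.

```python
from decimal import Decimal

def version_key(text: str) -> tuple:
    """Read "1.10" as the version (1, 10) rather than the number 1.1."""
    return tuple(int(part) for part in text.split("."))

# As decimal numbers: 1.10 equals 1.1 and is less than 1.2.
same_number = Decimal("1.10") == Decimal("1.1")
decimal_less = Decimal("1.10") < Decimal("1.2")

# As versions: (1, 10) comes after (1, 2).
version_less = version_key("1.10") < version_key("1.2")

print(same_number, decimal_less, version_less)  # True True False
```

The same string sorts in opposite directions depending on which reading is picked, and nothing in the string itself says which one was meant.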
It doesn’t work for decimals. It also doesn’t work for pi, or most dates. That’s okay. Supporting those cases would require “reading your mind” / trying to guess what the user wants by applying opaque rules. I certainly don’t want that.
Treating consecutive digits as numbers is a simple modification (I still think it’s quite simple) that is easy to understand and supports 99% of real-world use cases.
> But assuming your original list is a list of versions (which seems reasonable given the presence of multiple decimal points for some cases), then that’s the order you’d want.
What level of assumption is here expected from the sorting-system, would it have to process ALL entries of the list to find multiple decimal-points and then assume that they are ALL versions and not numbers?
How to treat this on different locales, where the decimal point is a comma and thousands-separator is a dot. Should the locale then also be considered by that system? Also when listing the folder of a remote-system with a different locale?
What about dates, should that system attempt to sort entries with multiple date-formats (yyyy-mm-dd, dd-mm-yyyy, dd-MMM-yyyy,...)?
The topic is far more complex than this narrow example. If we expect such a system to alter its sorting based on some data format interpretation, there is a risk of misinterpretation which might make the whole list unusable...
It has nothing to do with decimal points. It just looks at any contiguous sequence of digits and treats it as a single character for the purposes of sorting. The decimal point could be any other character and the behavior would be the same.
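A minimal Python sketch of that rule, under the assumption that "digits" means decimal digits and that everything else is compared as ordinary text. `natural_key` is a name made up for this illustration; it is not the implementation any particular OS uses.

```python
import re

def natural_key(name: str) -> list:
    """Split into text and digit runs; digit runs compare as integers."""
    # With a capturing group, re.split alternates text, digits, text, ...
    # so odd positions are always digit runs and like is compared with like.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

names = ["1.10.1", "1.2", "10", "1.10", "9", "1.1", "1"]
print(sorted(names, key=natural_key))
# ['1', '1.1', '1.2', '1.10', '1.10.1', '9', '10']
```

The "." is just another text run here, so swapping it for "-" or "_" gives the same relative order.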
Decimal numbers are treated as strings and will have a completely different order, with digits after the decimal point sorted differently to whole numbers without fractions?
Or you mean every set of continuous digits within the same string are considered as individual whole number?
Depending on the decision, either lists of decimal numbers or lists of version numbers will be sorted wrong.
--> This could be covered by adjusting the logic based on the amount of decimal points.
And the logic complexity keeps increasing, up to an arbitrary point of "no, this will not be considered", resulting in an unpredictable user-experience of sorting...
I understand that you found your perfect trade-off for sorting based on longer considerations. But it will be difficult to communicate such a concept to a user.
Applying partial rules to improve sorting in one direction is not a lossless activity, it makes the UX actually worse in other scenarios as the user is first guided to assume a certain behavior, but then learns that his expectation is broken in adjacent scenarios (Which is more or less the bottom-line of that article to begin with).
In the end it'll be just "another standard" for sorting [0]
> But it will be difficult to communicate such a concept to a user.
This isn't a prerequisite, since the existing naive character sort approach is not communicated either. In fact, it's almost universally unexpected by any user who hasn't written a naive string sort. Apple doesn't do this, and I very much did not need it communicated to me why 10 was coming after 2, because that's what everyone, who's not a programmer, expects.
As a litmus test, go ask some people, who are not programmers, without loading the question beyond "here are some files, how would you expect for them to be displayed in a list?". Show the lists side by side. It should not surprise you.
We just discussed a situation where lexicographical sorting doesn’t work. Adding in a rule to treat consecutive digits as one number doesn’t significantly complicate the logic and makes sorting work for a major additional use case. It doesn’t magically fix every case but it fixes a common one with minimal downsides.
> IMO, "Treat any sequence of digits as a number for the purpose of sorting" is consistent.
Are you sure about that?
So how do you suggest handling hexadecimal numbers?
Or octal numbers?
What about binary numbers?
What about file names with portions of a date and/or time?
How is a program supposed to know any of the above?
> Let's say I have a directory of 32 numbered files.
Assuming any of the filesystems I am aware of is in use, those names are strings having one or two characters. They are not "numbered files."
Sorting dates: This is why there is an international standard of having YYYY-MM-DD hh:mm:ss in the order we have it. We got to learn this in school in the 1980s because sorting paper documents this way is more logical and makes it easier to find stuff. So way before most people got computerized.
It just happens to be the most logical way to sort for computers too, as long as humans are involved in the usage of the data.
> Sorting dates: This is why there is an international standard of having YYYY-MM-DD hh:mm:ss in the order we have it.
That would be great, but this ISO is just one of the standards, and there are still regional standards as well.
And that's still ignoring the end-user. In Europe for example, humans might create filenames with date in format dd.mm, e.g. "Report 25.01.xls"
A system attempting to sort this intelligently would likely assume this is a decimal number, as it has zero context for it.
It's just slightly worse than the lack of consistent UTC-usage of systems, with the mixed attempts to correct data to local timezone (or not) depending on application...
Okay, I'll refine the rule to "Treat any sequence of digits as a base 10 whole number for the purpose of sorting". I still think this is quite clear. (Frankly, I also think the original definition is quite clear unless you're purposefully trying to misinterpret it.)
> those names are strings having one or two characters. They are not "numbered files."
Yes they are! In this context, a number is an idea, not a data type. Strings are capable of containing numbers.
I generally agree that treating substrings that are numbers as numbers is a good default for most users in most situations.
However, for hex numbers this simply won't give good results because some of them will just happen to not contain any of the digits A to F and be treated as base-10 numbers by the heuristic while others will include these digits and be sorted differently.
(So, having a strict lexicographic mode as an alternative in file managers would be nice.)
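A small Python sketch of the hex problem. The names are hypothetical hex values without any prefix, and `natural_key` is an illustrative digit-run key, not a real file manager's code.

```python
import re

def natural_key(name: str) -> list:
    """Digit runs compare as base-10 integers, everything else as text."""
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

hex_names = ["9f", "100", "a0", "99"]  # 0x9f=159, 0x100=256, 0xa0=160, 0x99=153

by_heuristic = sorted(hex_names, key=natural_key)
by_value = sorted(hex_names, key=lambda s: int(s, 16))

print(by_heuristic)  # ['9f', '99', '100', 'a0']
print(by_value)      # ['99', '9f', 'a0', '100']
```

"99" and "100" happen to look like decimal numbers, "9f" and "a0" do not, so the heuristic interleaves them in an order that matches neither the values nor plain lexicographic order.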
Your concept appears to have coherence until you consider that numbers are not necessarily expressed in decimal notation. What about hexadecimal numbers in filenames? Should they be sorted your way?
And what about very long strings of digits in the filenames - so long that they are too long for even the longest available numerical representation? In some apps, they are converted to floating point...
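The floating-point issue is easy to reproduce, and it can be sidestepped by never converting the digits at all. The sketch below is in Python (whose integers are arbitrary precision, so `float` is used to imitate an app that goes through a 64-bit double); `digit_run_key` is a hypothetical helper.

```python
def digit_run_key(digits: str) -> tuple:
    """Order a run of decimal digits by value without converting it to a number."""
    stripped = digits.lstrip("0")
    # Fewer significant digits means a smaller value; equal lengths
    # fall back to ordinary string comparison.
    return (len(stripped), stripped)

a = "9007199254740993"  # 2**53 + 1, not representable as a 64-bit float
b = "9007199254740992"  # 2**53

collapsed = float(a) == float(b)                 # the float conversion loses the last digit
still_ordered = digit_run_key(a) > digit_run_key(b)

print(collapsed, still_ordered)  # True True
```

So the length of the digit run only breaks implementations that insist on a fixed-size numeric type.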
> "Treat any sequence of digits as a number for the purpose of sorting" is consistent.
How about decimal numbers, are they strings or still numbers?
How about version numbers with multiple dots?
How about decimal numbers of a different locale, e.g. you list the folder from a remote machine with filenames of a different locale?
The problem with such semi-consistent schemes is that they are still guess-work, they may make some cases better for some people, but other cases practically unusable because the system doesn't have sufficient information to handle all scenarios consistently.
> Strings are not numbers, even if some part of their content "looks like a number."
Irrelevant and intentionally obtuse. Filenames can't be anything but strings - there's literally no way to mark part of a filename as "this is an integer", so the idea that "strings are not numbers" is ridiculous because the only way to encode numbers (which people constantly want to encode) is as part of a string - which means that parts of filenames are numbers, because that's exactly how people use them.
> Problem is, this is your preference for a specific situation. Which may not be another person's preference in the same situation nor yours in a different situation.
> So what are programs to do?
> Display strings in a consistent, documented, manner. Which is lexicographical ordering in all cases lacking meta-data to indicate otherwise.
These do not follow from each other.
First, the assertion that "peoples' preferences are different, so we shouldn't pick an overwhelmingly common preference" is laughably false. The vast majority of computer users (which happen to not be people on HN) prefer "sort numbers by number rather than by UTF-8 value", so that's simply the correct way to sort.
Second, even regardless of the above, there's nothing preventing a "by name" sorting from being consistent and documented.
It's great if DEs build this and give it a name. It's even better if they have a different one that deals with SI prefixes too. But it's not good if "alphabetical order" means that.
This is a really important point - my file manager just says "Name" with sorting. So while it's not perfectly defined, it doesn't make the promise of saying it's alphabetical.
> I will add that I'm plenty "smart" enough to understand that "10" comes before "9" in a strictly alphabetical sense, and I still want my file managers to sort "9" before "10".
Amen.
> I split all of my audiobooks into chapters. I use the format "Chapter 01.mp3" (or "Chapter 001.mp3" when there are > 99 chapters) because some (all?) MP3 players are too stupid to sort numbers properly and I want my audiobooks to work everywhere.
Well, some car and kitchen radio manufacturers will probably never get this right. In my car (which tends not to be brand new) they even messed up UTF-8 chars, which gets me laughing every time a track has them. It's become a running gag with my wife, "Oh, listen up, it's &%=?! again".
> (all?)
Well, I kind of hate to say this, but Apple got this right with the iPods. They even regarded the metadata fields `sort-*` (e.g. sort-album), movement-name (for series) and movement-index (for part). With these fields they really group and sort my audio books as I expect it to be.
I even wrote my own software to fill these tags appropriately, so that I don't need to split my audio books. I'm pretty happy using `m4b` files - an mp4 / m4a container with chapter support, which is supported perfectly fine on my iPod Nano 7g and my Android Phone (using Audiobookshelf[1] and Voice[2]). After all these years, the iPod Nano 7g to me is the PERFECT portable audio book player with 2 exceptions: Repairability and the proprietary Apple headphone remote protocol [3].
There’s a couple of reasons I don’t use m4b files:
- A lot of my audiobooks come as mp3, and converting to m4b (which is AAC based) would mean losing quality.
- Some MP3 players (even those that support AAC) don’t support M4B.
- I want playback to stop automatically at the end of a chapter, unless I actively decide to start the next chapter. (Admittedly, some MP3 players don’t have an option for this anyway and will always start the next track. This annoys me.)
- Even with chapter metadata, I find it difficult to seek through a 10+ hour m4b file. Seeking through a 10 – 60 minute chapter is more manageable. (Of course, this doesn’t always work out; A Memory of Light has a single chapter that’s more than ten hours long. Whatever, I want to split in a way that follows the author’s structure, and Sanderson purposefully chose to write one extremely long chapter.)
I probably sound like I regularly switch between 20+ different models of MP3 player. In fact, I mostly use my computer or iPhone these days; however, I expect my audiobook collection to outlast any one piece of hardware.
Perhaps, but if you set your browser language to US English you have dates displayed as MM.DD.YYYY and there's no way to change it to either the European or the ISO (YYYY-MM-DD) format.
I'm not sure I agree. I think I could be convinced if there was a unique and universal representation for numeric values using characters.
But we have so many textual representations of numeric values that I'm assuming the "mind-reading" goodness only works for a small subset. And the subset will be somewhat intuitive for developers but unlikely to be so for non-technical people.
For example, does the order handle numbers with fractions (decimal points)? If yes, does it require at least one leading digit (zero)? Does a.12345 come before or after a.345?
Does it handle thousand separators? What about international thousand and decimal separators (e.g. Euro-style . for thousand separation and , for decimal separation).
Does it handle scientific notation?
If the answer is no to any of these questions, it's likely to lead to surprise/confusion.
It's like a feature request that initially sounds reasonable and useful but once you explore the requirements in detail you realize there are too many edge cases to be able to meet the request in a non-brittle way.
The sort rules are simple (1). Treat any consecutive sequence of digits as a number when sorting. So for example version numbers (which must be massively more common than decimals in filenames) work correctly, and 5.9 is indeed smaller than 5.10 and the latter is not identical to 5.1 .
Given that this idea goes back more than two decades, has been the default behaviour of the most used OSes for many years, with no major outcry, I think empirically we can be fairly certain that it does not routinely lead to a lot of surprises and confusion.
In considering the simplicity of the rule, I think you're using a developer's perspective here where we automatically classify numbers and have a clear mental model of the separation between value and representation.
But I'm not sure how simple it would be to explain to a non-technical user why size_5, size_10 and size_15 are in order but size_0.25, size_0.5 and size_0.75 are out-of-order.
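Here is that exact case run through a digit-run sort in Python. `natural_key` is an illustrative key function, assumed to behave like the "consecutive digits are one number" rule described in this thread.

```python
import re

def natural_key(name: str) -> list:
    """Digit runs compare as integers, everything else as text."""
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

whole = sorted(["size_15", "size_5", "size_10"], key=natural_key)
fractions = sorted(["size_0.75", "size_0.25", "size_0.5"], key=natural_key)

print(whole)      # ['size_5', 'size_10', 'size_15']
print(fractions)  # ['size_0.5', 'size_0.25', 'size_0.75']
```

The fractional part is compared as the integers 5, 25 and 75, which is why 0.5 lands before 0.25.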
> with no major outcry
I'm regularly amazed at how little non-developer/technical users complain about strange and confusing behavior.
> I'm regularly amazed at how little non-developer/technical users complain about strange and confusing behavior.
I am a highly technical user that works with a lot of people with traditional engineering degrees but little to no software experience (except as frequent users). The answer here is that they've learned that all computer software is arcane and mysterious, and so they just accept that there will be strange patterns they have to pick up on, and that's their role as a user. They don't complain about strange and confusing behavior because they treat all the behavior as strange and confusing.
What does that mean? What disciplines? I cannot believe that all junior graduates in engineering disciplines in the 2020s are not doing some programming, even if just writing macros in a CAD program.
Most of the people I work with are 35+, but even the juniors in MechE, Aero, etc. tend to have some scripting experience that doesn't necessarily translate to having a robust intuition about DBs, the relationship between frontend and backend design, etc.
> But I'm not sure how simple it would be to explain to a non-technical user why size_5, size_10 and size_15 are in order but size_0.25, size_0.5 and size_0.75 are out-of-order.
You don't have to explain it if the situation never comes up.
I'd bet 99.9% of computer users don't have any files which would trigger this edge case in a situation they would actually notice. Decimals just aren't that commonly used in this context, and even if you do have decimals the sorting will still work a lot of the time. For the remaining 0.1%, chalk it up to a bug.
I literally had to test this on my Mac just now because I never realized it was broken.
> I'm regularly amazed at how little non-developer/technical users complain about strange and confusing behavior.
Because EVERYTHING a computer does to non-developer/technical users is "strange and confusing". With few exceptions, most people have no idea why their computer does something the way it does, or how they could make it do something different even if they wanted it to. And most of the time, when they complain about it to someone knowledgeable the answer will be some variant on "that's just sort of the way it is". Imagine a world where the names sort the way the OP wants: you'd still have to explain to someone why the first group sorts "out of order" and the second group sorts "in order". And if they complained, they would almost certainly get an answer that is some variant on "that's just sort of the way it is".
And if you explain in detail about how it works, a lot of people (not all, but quite a few of the more obstreperous types who raise these as CRITICAL BUGS with solutions apparently SO SIMPLE MY DOG COULD IMPLEMENT IT) will then say "I don't know why you have to make it all so complicated, things were simpler and better in v(n-12) in 1997".
If you add an option you're making it more complicated, harder to document and less discoverable, if you don't it's "useless", if you use a heuristic it's "too magical". Eventually someone has to be unhappy.
> I'm regularly amazed at how little non-developer/technical users complain about strange and confusing behavior.
It reminds me of the recent article here titled something like "Altoids by the mouthful". We just get used to eating cat poop and we never realize it is not a good idea to eat cat poop, not that we should make it more palatable by chasing the cat poop by chewing Altoids by the mouthful.
There's a user expectation that photo20.jpg comes after photo3.jpg.
There's no user expectation around whether photo1.jpg or photo01.jpg comes first. Just like there's no user expectation around whether photo1.jpg or Photo1.jpg comes first. Users also don't have the slightest idea about what order punctuation gets sorted in.
Just sort the things that matter in the way users expect (natural sort order) and come up with something reasonably consistent for the rest.
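One way to get "something reasonably consistent for the rest" is an explicit tie-breaker. In the Python sketch below, `natural_key` is an illustrative digit-run key and the tie-breaker (shorter name first) is an arbitrary assumption, not a description of what any particular file manager does.

```python
import re

def natural_key(name: str) -> list:
    """Digit runs compare as integers, everything else as text."""
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

names = ["photo2", "photo0001", "photo01", "photo1", "photo001"]

# Leading zeros vanish in int(), so these two keys are identical.
tied = natural_key("photo1") == natural_key("photo01")

# Break ties by total length so the result does not depend on input order.
ordered = sorted(names, key=lambda n: (natural_key(n), len(n)))

print(tied)     # True
print(ordered)  # ['photo1', 'photo01', 'photo001', 'photo0001', 'photo2']
```

Any other deterministic tie-breaker would do; what matters is that the same set of names always comes out in the same order.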
> An algorithm must be unambiguously specified for all possible inputs.
And it is. It's just that some outputs may not match what the user expects. TFA's preferred algorithm (simple lexicographic sorting) matches user expectations 90% of the time. The algorithm actually in use on most OSs (simple lexicographic sorting + treat consecutive digits as combined numbers) matches expectations 99% of the time. An algorithm that matches expectations 100% of the time doesn't exist. Shouldn't we pick the 99% algorithm?
(I am admittedly making up the actual percentages, but you get the point.)
But did it show as a list or an ordered collection of folders? And the second time you opened the folder did it rearrange into a haphazard scattering with items off the edge of the window?
> I just tried it on Mac, its sorted in the order you listed. Extending it a bit, the order is:
> photo1 photo01 photo001 photo0001 photo2
What you enumerated is known as "ascending lexicographical ordering" and has nothing to do with "the shorter representation of the same number", but instead the ASCII[0] character values in each file name.
The entire idea that numbers would be treated on a character by character basis rather than as numbers is somewhat intuitive for developers and not for non-technical people.
The answer to all of those questions is no for lexicographic ordering. Lexicographic ordering leads to surprise and confusion as a result.
> It's like a feature request that initially sounds reasonable and useful but once you explore the requirements in detail you realize there are too many edge cases to be able to meet the request in a non-brittle way.
It's been on Windows and macOS for coming up on 25 years, and is in practically every modern UI. It’s reasonable.
Are filenames likely to include those representations? I feel like probably not (can you even include commas in Windows filenames?)
More to the point of the article--if you want things sorted by date, sort by date. I think most laypeople aren't looking at long CHAR1234_5678 filenames anyway, they're looking at thumbnails and dates.
The most common date format used in Europe uses period separators so can often appear in filenames. Commas are probably more rare. Things like versions are often fractional like v1.3 or v1.11 and can appear embedded in filenames.
> can you even include commas in Windows filenames?
Yes.
> Use any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following:
The following reserved characters:
Here's a different scenario: filenames with dates in them. Consider September Budget and October Budget. September is the equivalent of 9, October of 10. Which comes first for natural sorting? Remember, the file modify date may not be useful here since you may have wrapped up the September budget on October 1st while the prior edit to the October budget may have been on September 20th.
The problem is that there is no such thing as natural, and it is quite hard to determine what is more common. (Quite often "more common" is culturally dependent or, worse, context-dependent.)
Sure, but if the number only indicated the month, you would hit a problem much earlier than 100: at month 13 you would go back to 01 and overwrite the old file.
> It’s been about two thousand years since the number of months in a year has been increased.
What? What are you thinking of? The number of months in a year is always 12 or 13 in any calendar system because they start by reflecting the moon. If you mean the Christian calendar, it was fixed at 12 months to the year well over 2000 years ago. If you mean any calendar, it's probably been more like one year since the number of months in a year has been increased. 12 lunar months falls short of a solar year by about 11 days, so any given lunar calendar will generate an extra month about every three years, and there are lots of different lunar calendars.
(For example, the Chinese calendar occasionally repeats full months in order to keep the month of the year lined up with the season. Whenever this happens, there will be 13 months in the year, of which two share the same name.)
The ancient Romans claimed to have had a 10-month calendar [1], which is what I assume the reference is. Either that, or when month 6 got renamed August in honor of Emperor Augustus.
> The ancient Romans claimed to have had a 10-month calendar [1], which is what I assume the reference is.
Well, in the first place (as you note), there is no reason to believe that claim - the ancient Romans never made such a claim, but the classical Romans made that claim about the ancient Romans - but more importantly even if it were true the months would have been added many centuries prior to "about two thousand years" ago. Nothing related to additional months happened two thousand years ago.
Given that 09 and 10 refer to months, that's never going to be a problem. And if you want to differentiate the years too, you can prefix with 2025- or put them in a 2025/, 2026/ etc. folder.
>September is the equivalent of 9, October of 10. Which comes first for natural sorting? Remember, the file modify date may not be useful here since you may have wrapped up the September budget on October 1st while the prior edit to the October budget may have been on September 20th. The problem is that there is no such thing as natural
Yeah, but there is such a thing as "give a predictable and consistent way I can name the files so that they sort as I want everywhere" which (if different OSes don't try to be "smart") would have been to prefix them with the numeric date zero padded.
Budget 2025-09.ods and Budget 2025-10.ods would sort reliably.
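A quick Python check of that claim. The extra 2024 file is added here for illustration, and `natural_key` is a sketch of the digit-run rule discussed elsewhere in the thread rather than any OS's actual code.

```python
import re

def natural_key(name: str) -> list:
    """Digit runs compare as integers, everything else as text."""
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

names = ["Budget 2025-10.ods", "Budget 2024-12.ods", "Budget 2025-09.ods"]

plain = sorted(names)                      # character-by-character
natural = sorted(names, key=natural_key)   # digit runs as numbers

print(plain == natural)  # True
print(plain)
# ['Budget 2024-12.ods', 'Budget 2025-09.ods', 'Budget 2025-10.ods']
```

With fixed-width, zero-padded, year-first dates both rules agree with chronological order, so the naming convention works whichever sort the file manager uses.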
The options explode infinitely if you start trying to guess what people want in terms of semantic grouping. One user might want to see "September Budget" beside "September Sales Projections" and "September Calendar", and another might want to group it with "October Budget" and "November Budget".
If you have simple, stupid, but predictable tools, people can work around that, by picking naming conventions and even directory groupings that achieve what they want.
The worst is when you have an enforced sort that's not what you want. I think in Windows now, even if you say "Sort by name" in the Downloads directory, it insists on sub-grouping by age. I want every version of the Foobaz spec I downloaded, and no, I don't remember if all of them were in the last 3 months!
There is a simple criterion for ordering file names: treat sequences of characters as alphabetical, and sequences of digits as numbers.
It's easy to understand and predictable; it just happens to not be based on ASCII character codes, which is a legacy technology method only ever meaningful to US developers.
Yes, have you never edited the metadata? Also most filesystems these days preserve it when copied, e.g. my camera's EXFAT filesystem on an SD card gets the creation date preserved when I copy it to my PC or NAS, or between NAS & laptop later.
Agreed. What's more, the idea that people learn to put leading zeros is wrong and impractical, unless you know in advance how many digits you need. When you go from version 5.9.17 to 5.10.0 you don't go back and relabel every existing folder as 5.09.17.
Today's standard way of sorting is well defined, unambiguous, and natural. Lexicographic has its place, but user-facing interfaces ain't it.
I had a similar fun problem with a little tool for use with an ATSC TV tuner.
For context, while NTSC program selections were typically indexed by channel ("ABC here is channel 4, NBC is channel 6"), ATSC uses "subchannels" like "12.1" or "21.5". I had assumed these could be safely stored as a decimal type.
Then one of the broadcasters here introduced both "42.1" and "42.10" and it broke the key model in the underlying SQLite database I kept the channel info in.
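The collision is easy to reproduce outside SQLite. The Python sketch below uses `Decimal` as a stand-in for "a decimal type" (the original schema isn't shown, so that is an assumption), and `parse_channel` is a hypothetical helper that keeps major and minor numbers as separate integers.

```python
from decimal import Decimal

def parse_channel(label: str) -> tuple:
    """Read "42.10" as (major=42, minor=10) instead of the number 42.1."""
    major, minor = label.split(".")
    return (int(major), int(minor))

# As decimals the two subchannels are the same key.
decimal_keys = {Decimal("42.1"), Decimal("42.10")}

# As (major, minor) pairs they stay distinct and sort in tuning order.
pair_keys = {parse_channel("42.1"), parse_channel("42.10")}
ordered = sorted(parse_channel(c) for c in ["42.10", "42.2", "42.1"])

print(len(decimal_keys), len(pair_keys))  # 1 2
print(ordered)  # [(42, 1), (42, 2), (42, 10)]
```

In a database the equivalent would be a two-column key or a text column, rather than a single numeric one.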
Lexicographic order is great when you need an unambiguous criterion that will work the same in every implementation; but you only need that for automated processing, i.e. for coding.
For user-facing presentation, having 5.9.xxx before 5.10.xxx is simpler; the corner case that baffles users is having 5.1 and 5.10 before 5.2.
Some (most) systems will sort 5.9 after 5.10 though, so if the user is baffled they'll need to learn it anyway. Adding a second way to do it kinda makes things worse.
I think the only problem is that it's a surprise and mystery, particularly because "dumb" alphabetical sort has existed forever. When they "fixed this" for the 99% of regular users' cases, they should have made it a "smart natural sort" option separate from the "strict alphabetical sort" option (next to date, size, etc). Simple and obvious, rather than surprisingly different from the decades of experience that even non-technical users already have.
It's not just the one decision though; there are literally thousands, maybe tens of thousands, of these decisions in most software. You want every single one of them to have an option? You want it to support every single combination? At some point, it is ridiculous. Sometimes you just have to decide how your software is going to work and not leave every single decision to the user.
You don’t leave every decision to the user; you make good defaults, but leave the option to override to the user! And thousands isn’t scary as long as groups/tags/search work, so what’s ridiculous about empowering the user?
Increasing the number of different possible combinations of settings your software can be running with by a factor of one nonillion is not a choice I’d make if I wanted to have any confidence in its reliability and security.
That's why you write small programs. It won't take long for most programs to bloat to the level where they're dealing with nonillions of combinations, whether the user has control over those combinations or not.
How the files sort seems kinda important. It gets at the core behavior of the program. It's not something superficial like a default icon, which the user probably can change.
There's such a thing as too many options, and there's also such a thing as too few. This is one of the important ones. I'd say that macOS, Gnome, and Windows have definitely hidden or removed a lot of important options in the past decade, and despite the modern slickness mesmerizing people into thinking they're easier to use, they're actually harder to use as a result.
(I say this as a professional developer and power-user of all 3 desktops over the past 25 ish years, who also helps non-technical family and friends a few times every year. Some people will be like "oh I'm so bad at computers lol" or "oh this is a piece of junk huh" but really the UI just got dumber in the name of "ease of use", and the expert has to be called in to decipher it.)
In a file manager? Any more than the displayed thumbnails, icon size, whether folders are separated from files, whether images are separated from videos, what video types are supported, what file types are opened inline, what the click and double click behaviours are, etc?
And yeah, KDE has settings for all these, but KDE is also known for being too configurable.
I might be wrong on this, but I vaguely recall that on macOS back when you could commonly option-click to reveal advanced options, if you held option when clicking a sort it would change how it sorted from alphabetical to lexical or vice versa. I’m not a thousand percent sure of it, though, I think when I needed it I was able to set a directory preference via terminal to change how a specific directory was sorted and it was an option there. MacOS had (or has) a lot of buried options which I presume date back to its origins as a Unix as well as a convenience to its developers. A lot of the command line utilities were hacked calls to graphical settings code though, so it wasn’t very stable version to version as the UI calls changed and nobody prioritized non-UI bug fixes or breaking changes. These days CLI is nearly forgotten or assumed to be an exploit vector - see Screen Time data for example.
But the alternative would be a surprise to people who assume "by name" will order numbers, including those who are new to technology (and I think most non-technical people who sort things manually unknowingly order numbers).
We want to minimize surprises and mysteries, but computers have so much hidden complexity it's impossible to eliminate them. If users were shown a full description of how every feature on their computer worked before using it, they'd quickly start ignoring the descriptions. There should probably be a tooltip or "manual entry" for "by name" for those who are curious, and it should never be labeled "alphabetical" because it's not. But cases like the author's, where he assumes a feature works differently than most people (including the designers) assume, can't be helped.
> and the situation where someone wants "10" to be before "9" is far more common.
I guess you mean "after"? Otherwise it seems to me you're agreeing with OP.
> desktops don't label this sorting "alphabetical" (E: and it would really be "lexicographic"*), they label it "by name" (an informal criteria), so technically they're not lying.
FYI the more formal name for the "by name" order is "natural sort order".
It’s more confusing. I thought the article was correct when they said -10 coming before -9. Why? Because they were talking about the strict alphabetical sort. They are already prepending zeroes to force the comparison to be 10 vs 09. So, yes, they were talking about ascending order, but not natural ascending order, but ascii sorting order where 10 is before 9 because the comparison isn’t 9 vs 10, but 1 vs 9.
It was only clear to me because I could guess where they were going. They were complaining about natural sort vs alphabetical sort, which is a case I’ve run into many times, so I could see the argument coming.
The irony to me was that they were already altering how they named files to fit what they thought the computer wanted by prepending a zero to get a proper alphabetic sort. And even after that, some computers didn’t follow their idea of what it should be doing.
I have some beef with Microsoft: you can only change this at the computer level, not per user (see registry key below). Also, they call it natural sorting for users, but logical sorting internally. Unify your terminology!
TIL they are called "hives". Windows Registry is an interesting thing. Even casual users have to interact with it once or twice without fully understanding it.
Raymond Chen explained why a registry file is called a “hive”:
Because one of the original developers of Windows NT hated bees. So the developer who was responsible for the registry snuck in as many bee references as he could. A registry file is called a “hive”, and registry data are stored in “cells”, which is what honeycombs are made of.
I don't. I want string sorting to be string sorting. Filenames are strings.
I wouldn't mind if there was an option to tell the file manager to do this "wrangle numbers out of strings and treat them as numbers" thing--so that I could turn that option off, and others who want that behavior could turn it on.
But for this to be the default, without even a way to change it (except in Dolphin, it looks like)? That seems daft to me.
Btw, I use Trinity Desktop, and I just verified that in TDE's version of Konqueror, the sorting of filenames is the same as for ls on the command line, e.g., 'item-10.txt' comes before 'item-9.txt'. Another good reason for me not to have switched to a more "modern" desktop.
> The author's situation is extremely rare
I don't think it is. But that's really beside the point. The computer is my tool. If it doesn't do what I want or expect it to do, it's a bad tool for me. And designers of tools shouldn't be making assumptions about how I want to use it. They should be giving me ways to tune it to how I want to use it.
> "mind-reading" is really helpful in ways we take for granted, like autosave.
I don't use autosave either. I don't want the computer to assume when I want to save a file. The computer is too stupid to know that.
> with auto save systems, you flag/name a version as your canonical save point.
You mean each saved version is stored separately, like a version control system?
A system like that would be fine (in fact I use version control all the time for this kind of thing). But that's often not how auto save is implemented; the auto save just clobbers the last version you saved. That's the kind I don't use.
The file sorting isn’t something relegated to niche users because of the prevalence of tv episode file name sorting (eg S01e01) and it has necessitated the leading zeroes to make it work properly with “alphabetical sorting”.
People sorting their files in alphabetical order but who want numerical values in their files to be sorted digit by digit instead of as numbers is the rare case.
I might go further in my ideal sorting algo, which would normalize capitalization and ignore all non-alphanumeric characters, treating them all as separators.
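A rough Python sketch of that idea, under my own assumptions about the details: fold case, treat every run of non-alphanumeric characters as a separator, and compare digit runs as numbers. `loose_key` is a made-up name, and this is far simpler than a real collation algorithm.

```python
import re

def loose_key(name):
    """Case-insensitive key; punctuation and spaces only separate tokens."""
    # Tokens are either digit runs or letter runs; everything else is dropped.
    tokens = re.findall(r"\d+|[^\W\d_]+", name.casefold())
    # Tag each token so numbers and text never get compared with each other:
    # numbers sort before text at the same position.
    return [(0, int(t), "") if t.isdecimal() else (1, 0, t) for t in tokens]

names = ["Track_10.mp3", "track-9.MP3", "TRACK 2.mp3"]
print(sorted(names, key=loose_key))
```

All three names reduce to the tokens track, number, mp, 3, so the only difference left is the track number.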
What you vaguely outline has already been standardised in UTS #10. The algorithm is both based on prevailing user expectations and also has shaped them since the wide-spread adoption of implementations.
"mind-reading" is really an unfortunate term though. Every algorithm is a strict and consistent set of rules that tries to serve the needs of its users. No magic is ever involved.
It is just that some users have conflicting needs and some sets of rules are more complex than others. So I think what this really is about is 'computer reading', the needs of some users to be able to predict with ease what the computer is going to do. Some people would rather be able to predict the computer doing something that they actually don't really need, and then make up for its shortcomings, than have something they feel they cannot predict and control, but is actually closer to what they want.
This is a bit like the term magic. Any sufficiently complex algorithm may be indistinguishable from mind-reading, but it's still an algorithm. Mind-reading, like magic, depends on us being able to understand or not, which is highly subjective. But both are misleading terms.
> I agree with Microsoft/Google/KDE's order. The author's situation is extremely rare...
Even if that were a valid reason for making it the default behavior, the real issue is they don't even give you the option to have the lexically correct sort order. They just decided to give you something that's not accurate and that's all you get.
A trend which is frustratingly, increasingly common.
It's trivial to allow customization behind menus. But we rarely get that anymore. Especially for sandboxed devices like phones.
It's a giant middle finger to users who want to actually use their devices as a tool, instead of simply a portal for more sales and marketing.
I agree with everything but the definition of intuitive; sometimes, the more common situation is less intuitive. An egregious example of this is "Close ad" buttons, which are intentionally placed unintuitively to direct the user to view the ad.
Your definition of "intuitive" would imply that innovation in intuition is impossible, which is evidently not true.
I agree with you, but I also agree with the author: the heuristic used to figure out the "natural" ordering here is broken; if you're going to "guess" at how to order things, you need to be more sophisticated than just "find a suffix that looks like a number and order by it".
How is that right, when file explorer picks an arbitrary character in the middle(!) of the filename and sorts by it? Say I have a file987name.txt and list5.txt, so sorting by name ascending, a file explorer would for whatever reason decide to sort by the fifth character, so that list5 would sort lower than file987name, because 5 is lower than 9, via some twisted logic. How is that normal in any way?
Thankfully I'm using Total Commander and FastStone as an image organizer, neither of which has this bug in the sorting.
Most of the time, as a regular user, I agree with having smarter ordering. And smarter features in general, for what it's worth. Except when it doesn't work because of some corner case. In which case the "smart feature" becomes a kind of leaky abstraction: now as a user I have to figure out how the machine works, so that I can trick it into doing what I need.
Give the user an option: keep the "by name" natural ordering and make it the default by all means, but also provide a way to switch to a strict alphabetical order for power users. Same applies to other features.
It is disappointing that apps and even some Linux desktops today take the flexibility away from users in the name of usability. By all means, I like and benefit from all the smart features, and I want them and will keep them on by default, but leave me an option to do the simpler, dumber and more predictable things too, for the case when I need to fall back to it.
The author wants the "worse" sort, one based on ASCII/Unicode codepoints, without any intelligence for numbers that 99% of GUI users want.
For their purposes, they've assumed something about the implementation, to the point that a convenience feature is actually a misfeature for them. But the author here is probably a developer, or close to one, so they do not represent the needs of most people using computers.
Understanding the target audience for your product results in very different design decisions. Better is better might be great for products, but worse is better is probably better for systems that need to grow and evolve.
It's an issue of mental models. As a developer, his mental model is one of how naive software would sort items with mixed numbers in them. Most people, of course, naturally sort 10 after 9 -- their mental model doesn't contain software developer assumptions.
> The author wants the "worse" sort, one based on ASCII/Unicode codepoints, without any intelligence for numbers that 99% of GUI users want.
I want the author's opinion on how capital and lowercase letters should be sorted. Do they follow strict ASCII/Unicode codepoints, or do they normalize into actual alphabetical order and sort upper/lower within each letter?
> I want the author's opinion on how capital and lowercase letters should be sorted. Do they follow strict ASCII/Unicode codepoints, or do they normalize into actual alphabetical order and sort upper/lower within each letter?
I prefer the strict ASCII / Unicode sorting (all capitals first, then all lowercase).
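To make the two options in the question concrete, here is a small Python sketch. Python's default string comparison happens to be by code point; the second key is just one assumed way to "sort upper/lower within each letter", not a standard.

```python
words = ["banana", "Apple", "apple", "Banana", "cherry"]

# Strict code point order: every capital sorts before every lowercase letter.
print(sorted(words))
# ['Apple', 'Banana', 'apple', 'banana', 'cherry']

# Case-folded order, with the original spelling as a tie-breaker.
print(sorted(words, key=lambda w: (w.casefold(), w)))
# ['Apple', 'apple', 'Banana', 'banana', 'cherry']
```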
This feels like the right moment to mention "ch", which is considered a letter in orthodox Czech, sorted between "h" and "i". The problem is, you can't reliably distinguish between "ch"-the-letter and "ch" as just "c" and "h" combined, which are present in loan words but also some original Czech compound words.
So if you're doing it "properly", sorting strings in Czech involves understanding the etymology of every word.
Why? For example to not have diacritics in month names? Take them as examples as you can easily add them to a shell script to make it work the way you want.
I'm multi-lingual but try to separate business stuff for example (multi-lingual) from private stuff (mostly one language), so clashes between languages rarely happen.
But if it gets complicated I'll usually resort to Perl scripts to take care of pesky details. Sorting an associative array where the key is a string in unified form and the value is the multi-lingual target is rather easy in a script language which one is fluent in.
"Most people" have incoherent ideas that can't even be used. So instead a designer cherry-picks some ideas - setting the agenda - and declares that they're popular. That doesn't make them good ideas. Also, "most people" are easily influenced and will like the terrible things that they've been told to like.
>Understanding the target audience for your product results in very different design decisions
This is an excuse. Just add an option to sort both ways. It isn't hard.
There is no target audience on this planet that benefits from fewer options or fewer features. Even if you had the features under an "advanced mode" UI, that's still better software than not having the feature in the first place.
Have people forgotten the 80/20 rule? Most features will be used by only a small slice of users; that doesn't mean they're out of scope.
Sorry, I'm just kind of exhausted of software not being able to do the most obvious things because it didn't align to some perfect vision of how the user should be.
> There is no target audience on this planet that benefits from fewer options or fewer features.
I'm currently involved in UI design and, to my frustration, adding more options or features seems to send a vocal minority of the user base into a foaming-at-the-mouth violent rage. It's like any change resets the entire contents of their brain, and it's our fault we're making things so confusing for everyone...
And let's not get started on how we're wasting time adding things that they don't personally need, and therefore no one could possibly need, ever. No, clearly by adding this sorting method, we must have directly stolen development time from the feature they want, which is a personal attack directed at them and every member of their family going three generations back.
The most irritating circumstance for this is looking for files named with a hash:
3ea4f...
...
97dce...
...
126b9...
This is one of the settings I immediately turn off on Windows via the registry key mentioned in the other comments here.
> I miss the time when computers did what you told them to, instead of trying to read your mind.
These days, it's more like "trying to change your mind". I absolutely hate the "the user is wrong" authoritarian mentality that unfortunately has infected a ton of software, even open-source.
Exactly. This is even more annoying when it isn't exactly a hash, but some gibberish you cannot really make sense of, which does have a numeric section in them: like a user ID, or unix time, or who knows what else it could be, but you are trying to visually find a file abcd89764237 somewhere after abcd683426834, and it isn't evident why you cannot, until you notice that the latter has more digits in its "ID" for some reason.
It looks like GTK & KDE both suffer from this - I get this behaviour in Thunar and in Dolphin. This is the kind of thing that makes me lose sleep. It's the same on macOS too, at least in the latest version.
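Both complaints can be reproduced with a toy natural sort in Python. This is a sketch with made-up short names standing in for the truncated hashes above, and `natural_key` is a hypothetical helper, not any file manager's real code.

```python
import re

def natural_key(name):
    # Digit runs (odd positions after the split) become integers;
    # everything else stays text.
    return [int(p) if i % 2 else p
            for i, p in enumerate(re.split(r"(\d+)", name))]

hashes = ["126b9", "3ea4f", "97dce"]
ids = ["abcd683426834", "abcd89764237"]

print(sorted(hashes))                   # code point order: 1..., 3..., 9...
print(sorted(hashes, key=natural_key))  # leading digits read as 3, 97, 126
print(sorted(ids, key=natural_key))     # the shorter number sorts first
```

The hex prefix of a hash gets read as a decimal number whenever it happens to start with digits, which is why 126b9 ends up after 97dce.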
> Well, apparently all these operating systems have decided that no, users are too dumb and they cannot possibly understand what alphabetical order means. So when you ask them to sort your files alphabetically, they don’t. Instead, they decide that if some piece of the file name is a number, the real numerical value must be used.
Well, no. You don't actually ask them to sort in alphabetical order. You ask them to sort "by name", and that is up to their interpretation. And they choose the interpretation that (per their reasoning, and possibly some actual data) seems most likely to correspond to what the user wants.
Maybe future versions of those OSes will add a rule that says that if any of the number groups have leading zeros then it reverts back to actual alphabetic order. Or maybe they'll give you configurable options. (Maybe some of them already do.)
Clearly a leading zero means the number is in octal (but only if all the subsequent digits are between 0 and 7). I think that would lead to the most intuitive results.
> And they choose the interpretation that (per their reasoning, and possibly some actual data) seems most likely to correspond to what the user wants.
Yes, that makes sense, but the problem is that this interpretation changed in the last 10 (15? 20?) years. It used to be that "by name" meant "by name, in alphabetical / lexicographical order" in pretty much every file manager.
I almost always want the version-sorting that's being presented in this article, rather than an "alphabetical" sort. But on the other hand, it absolutely seems like a valid bug that this is presented as an "alphabetical" sort, rather than something like "alphabetic/numeric" or similar. In other words, a problem of labeling rather than one of sorting.
It’s not being presented as an alphabetical sort, though. The author assumed that sorting by name meant an alphabetical sort, but that’s not how it’s labeled.
Author here - I agree both with you and with the parent's comment. Having two options in the "sort by" menu - like "Name (natural)" and "Name (strict)" or something - would have solved everything.
> The problem is imposing it on the user with no warning or option to turn it off.
You can say that about every single design decision made about every product.
The gripe about this particular feature seems misplaced because almost all users will want the sort that's offered and the actual alphabetical sort is likely the desire of a more advanced user who, in fact, is offered a choice through registry editing and/or using a more advanced cli option for the occasion they might need an alternative sort.
Obviously. All I'm saying is that this particular decision ought not to have been taken from the user. Real alphabetical order is not an unreasonable thing to want.
“Real alphabetical ordering” is incredibly nonspecific. It’s underspecified even for US-ASCII, but essentially meaningless for those of us in 2025 who need to handle Unicode.
How do capital letters sort relative to lowercase letters? How do letters sort relative to digits? How do you consider code points that can correspond to different letters in different lettering systems with different ordering? How do you handle diacritics? Do you want the behaviour to be stable through Unicode normalization? Should it differ based on the character encoding? Should different representations of the same character, such as blackboard lettering or circled numbers, be sorted with other representations of the same character or grouped separately?
You can come up with answers for these questions, but there’s no unambiguously correct option. The least subjective option is sorting based on encoded byte representation (if that is even specified), but that is not “alphabetical” and would not be intuitive to most users.
You're focusing on the wrong part of the problem when you say "essentially meaningless". Yes, choices must be made about how you order your "alphabet". But the meat of the request is that sorting goes character by character. That's a clear criterion, even with Unicode involved.
And I would say the reasonable way to define character is grapheme cluster and yes you want it stable to normalization and encoding.
How capital letters/diacritics/different representations affect the order of your alphabet, and which ones are considered equivalent, is something without a clear answer. Same for whether letters or numbers come first, and where punctuation goes. But you don't need consensus on that to fix the problem in the post.
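As a small illustration of the "stable to normalization" point, here is a Python sketch using the standard `unicodedata` module. It only covers normalization; the standard library has no grapheme cluster segmentation, so that part is left out. `stable_key` is a made-up name.

```python
import unicodedata

nfc = "caf\u00e9.txt"   # é as a single code point
nfd = "cafe\u0301.txt"  # e followed by a combining acute accent
names = [nfc, "cafz.txt", nfd]

print(nfc == nfd)  # False, although both render identically

def stable_key(name):
    # Normalize first so equivalent spellings get the same key.
    return unicodedata.normalize("NFC", name)

raw = sorted(names)                     # the two spellings end up apart
stable = sorted(names, key=stable_key)  # the two spellings stay adjacent
print([ascii(n) for n in raw])
print([ascii(n) for n in stable])
```

In raw code point order cafz.txt lands between the two spellings of the same name; with the normalized key they sort next to each other.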
I thought it was pretty well-known that capital letters come before lower-case. I think it's punctuation, then numbers, then capital letters, then lower-case. At any rate, that's what textbook indices do (assuming I remember correctly).
You are starting to sound like a troll. Yes, Unicode has many representations of digits. That has nothing to do with the question of whether 2.jpg should come before or after 10.jpg.
"Numbers. A customization may be desired to allow sorting numbers in numeric order. If strings including numbers are merely sorted alphabetically, the string “A-10” comes before the string “A-2”, which is often not desired. This behavior can be customized, but it is complicated by ambiguities in recognizing numbers within strings (because they may be formatted according to different language conventions). Once each number is recognized, it can be preprocessed to convert it into a format that allows for correct numeric sorting, such as a textual version of the IEEE numeric format."
Notably, some versions of “sort” on Linux have version sort nowadays. sort -V
I actually don’t know exactly how it works internally and it is a little bit magical, but I use it all the time when looking through my files because it just sorta works in most cases. Of course a nice thing about it is easy to turn on or off.
I am surprised how many people are comfortable calling sorting numbers alphabetical sorting (including TFA).
In true alphabetical sorting, sorting numbers is undefined behaviour. Both of these sorting methods are valid extensions of alphabetical sorting, and which you prefer is just that: a preference.
So actually when he says ‘alphabetical order’, he does not, in fact, mean ‘alphabetical order’.
“There are additional complications in certain languages, where the comparison is context sensitive and depends on more than just single characters compared directly against one another,
[…]
Numbers. A customization may be desired to allow sorting numbers in numeric order. If strings including numbers are merely sorted alphabetically, the string “A-10” comes before the string “A-2”, which is often not desired. This behavior can be customized, but it is complicated by ambiguities in recognizing numbers within strings (because they may be formatted according to different language conventions). Once each number is recognized, it can be preprocessed to convert it into a format that allows for correct numeric sorting, such as a textual version of the IEEE numeric format.”*
I think those file browsers made the right choice, even given that they don’t (as in this example) always do the right thing.
I thought this was pretty well known. E.g. the macOS Foundation library even exposes NSString.localizedStandardCompare() [1] which implements the sorting algorithm used by Finder, and should be used by any well-behaved macOS application. Windows uses StrCmpLogicalW [2].
I would have assumed it worked the same as ls, so I found the article interesting. But now that I know, I think this way is better.
I can’t think of any case where I would need purely alphabetical sort. In most photo browsing apps, photos will be sorted by timestamp rather than filename. If I really needed it to sort properly in file explorer, I would try sorting on created date. And failing that I would probably just normalize the file names.
Sorting so "foo9" is before "foo10" is called natural sort. I found out about natural sort a week ago and I am thrilled that my programs now print their output in a sensible order. Give natural sort a try and see if it improves your life too :-)
# Create img_2-hn.txt through img_10-hn.txt, then list them in version order.
for i in $(seq 2 10); do
    touch "img_$i-hn.txt"
done
ls img_* | sort -V
img_2-hn.txt
img_3-hn.txt
img_4-hn.txt
img_5-hn.txt
img_6-hn.txt
img_7-hn.txt
img_8-hn.txt
img_9-hn.txt
img_10-hn.txt
And we have "sort -h" to sort the output of e.g. "du -sh *" properly.
Nah that's not just you. That is an unnatural way to sort things because that's not how numbers are ordered. I remember when Windows changed to sorting numbers by their value and, despite my programmer brain finding it strange in a way, I was super happy to have files display in an order that actually made sense.
Same here. I was surprised at everyone here who prefers the more-complicated-but-arguably-more-intuitive natural sort. Naive alphabetical sorts break some expectations, but don't produce any weird edge cases.
I wonder if there's an age divide at play here, where those of us who grew up with the naive alphabetical sort prefer it.
The mistake is software which doesn't follow a recognized standard for date/time representation in its filenames, i.e., RFC 3339, ISO 8601, or their union/intersection[1] (but preferably just ignore ISO 8601 because it's overcomplicated, and RFC 3339 is simpler and more intuitive).
In OP's examples, the filenames are YYYYMMDD_hhmmssssss, which is neither valid ISO 8601 nor valid RFC 3339, as the former doesn't accept underscores (only 'T'), and the latter doesn't accept basic format dates (YYYYMMDD), only the equivalent of extended format (YYYY-MM-DD).
And if dates in file names simply used the extended format, the problem disappears. The lexical order is the natural order.
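A quick Python check of that claim, with made-up timestamps: for fixed-width extended-format dates, plain string order, chronological order, and a "digit runs as numbers" order all agree.

```python
import re
from datetime import datetime

stamps = [
    "2025-10-02T09:15:00",
    "2025-02-11T18:30:05",
    "2024-12-31T23:59:59",
]

lexical = sorted(stamps)                                  # plain string order
chronological = sorted(stamps, key=datetime.fromisoformat)
# Rough stand-in for a natural sort: compare the digit runs as integers.
by_numbers = sorted(stamps, key=lambda s: [int(n) for n in re.findall(r"\d+", s)])

print(lexical == chronological == by_numbers)  # True
```

This only holds because every field is zero-padded to a fixed width and separated by a delimiter; a variable-length run of digits glued onto the end would break it again.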
Alternatively, file managers that treat any digits as a number should be improved to recognize when a sequence of digits is not actually a number but a date/time, and order those chronologically. This might occasionally produce a few false positives, but I'd suspect it would be a rare occurrence.
I was very surprised by it when I noticed it a year or so ago. What's interesting is that when it works, eg you have a directory with numbers from 1-10, you don't really notice it. It isn't until it bites you in the ass, eg your downloads folder with a bunch of long numeric strings, some in hex, where you want to find one and suddenly it's not where you expect.
I used a GUI software some years ago that distinguished between version sort and alphabetic sort. It would be handy to have a toggle.
I get it, but if all these major operating systems are handling this same ambiguous [0] situation in the same way, perhaps one needs to reevaluate their mental model or expectations.
Am I out of touch? No, it's the operating systems who are wrong
"I created the Alphanum Algorithm to solve this problem. The Alphanum Algorithm sorts strings containing a mix of letters and numbers. Given strings of mixed characters and numbers, it sorts the numbers in value order, while sorting the non-numbers in ASCII order. The end result is a natural sorting order."
There are many older instances of that, such as "versionsort" from various Linux tools and libraries. I think this has likely been independently recreated several times, with various subtle differences.
I felt a little bad about this snark but actually, author barely understands their own use case (says they want alphabetical order but they actually want something more) and barely understands the UI they're using (says they asked for alphabetical order but none of the file managers they used says it has any such setting) and then they go on to claim this is to satisfy dumb users:
> Well, apparently all these operating systems have decided that no, users are too dumb and they cannot possibly understand what alphabetical order means.
> I have also found a setting to fix Dolphin’s behavior, but it was very much buried into its many configuration options.
KDE wins again. It's my favorite desktop environment, because it has defaults that are friendly to noobs, but it also gets out of your way and lets you change things if you want.
The trend is for other desktop environments to be either/or. Either they are super simple and noob friendly, or they are super technical and have a steep learning curve and you get to configure everything - but only via text config. Maybe Cosmic looks like it's going the same route as KDE, where it's trying to bridge the gap.
Well, lots of interfaces don’t say “alphabetical” anymore, they say “name” or some variant, and then they can define it however they want, regardless or because of the frustration it causes users but not some other users which will now be inverted for long term-frustration averaged user experience.
To answer the question in the article, I’m pretty sure Windows Explorer (and probably File Manager before that) has sorted filenames this way for at least 30 years.
For some inexplicable reason, Plex just throws its hand up on non-ASCII characters and puts them first.
In Norway we have three extra letters, æøå, and they're at the end of the alphabet after z. But in Plex, I have Øystein Sunde[1] placed before any other in my music library.
Now in the 1990s I would forgive US software for such a thing, but it's 2025...
ls sorts filenames strictly lexicographically, comparing character by character, so e.g. "055436307" is compared as the characters "0", "5", "5", etc., and it sorts before "121134" because "0" is less than "1". If all compared characters match and one string ends, the shorter one comes first. Symbols like _ are just more characters, and their position relative to digits depends on the locale’s collation table.
Google Drive uses ICU collation with the numeric option enabled, which treats each consecutive sequence of digits inside the filename as an integer. So "055436307" is parsed as the number 55,436,307, while "121134" is parsed as 121,134. Since 121,134 < 55,436,307, "121134..." comes before "055436307..." even though lexicographic order would suggest the opposite. I think when two digit runs have the same numeric value, the shorter run comes first; if runs are equal and the string continues, then normal character comparison resumes, including any underscores or suffixes.
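The two orderings for those digit runs can be reproduced in Python. To be clear, this is a simplified stand-in and not ICU itself: plain `sorted` compares code points, and `numeric_run_key` is a hypothetical key that compares by value and then, following the guess above, by length.

```python
a, b = "055436307", "121134"

# Character by character: "0" < "1", so the longer string comes first.
print(sorted([a, b]))  # ['055436307', '121134']

def numeric_run_key(digits):
    # Compare by integer value, then by length so that the shorter of two
    # equal-valued runs comes first (an assumption, per the comment above).
    return (int(digits), len(digits))

print(sorted([a, b], key=numeric_run_key))      # ['121134', '055436307']
print(sorted(["007", "7"], key=numeric_run_key))  # ['7', '007']
```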
The leading zero isn't an issue because it will sort correctly under both systems. The issue OP is having is that he's adding random numbers after the hhmmss section. If instead he added a delimiter before the random number the files would sort correctly under both systems as well, e.g. hhmmss_num.
it's weird to me that all the people declaring that they know what the average user wants to see, don't also suggest that the computer should rename files it encounters as necessary to give the user what the user wants.
if we don't have to collate as dictated by ascii, why should we expect users to live within the bounds of file names with dotted extensions? you think users care whether something is a jpg or a png? do users want to see .MOV and .mov next to each other (not because sort, because one camera programmer did it that way for an ancient DOS filesystem, and another didn't.) (unix, btw, never required that users live with dotted extensions, that was a digital knockoff/cpm/microsoft thing that you didn't understand so all your new tools enforce it even though you never had to put your code in a .c file, that was just for your convenience as the user whose needs must be respected)
so, we have to have "computery filenames" but we should violate "computery sorting"? how incredibly close-minded of you, you have no idea or basis to know what users want to see. oh, and the solemnity with which you make these proclamations, ok, don't get me started on that.
I got used to naming files/folders with leading zeros when I want them to be sorted alphabetically (for example payslips/invoices, etc).
But I'm a tech guy, I know what "alphabetically" means in the tech world. And it probably is not what common folks mean when they think "alphabetically" outside the tech world.
Edit: in fact, if I recall correctly, the proper term for this kind of sort (the one OP wants) is alphanumeric sort.
I also got used to it, but especially when writing short scripts that generate numbered files it gets annoying to have to pad with zeroes every time, and also precommit to a specific amount of digits you want to allow (finding a compromise between adding a ridiculous amount, like 20, and using only 4 despite knowing the script might one day surpass 10⁵ files).
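For anyone who hasn't hit this, a small Python sketch of the precommitment problem. The `frame_` naming scheme, the `numbered_name` helper and the width of 4 are all invented for illustration:

```python
# Hypothetical numbering script: names are zero-padded to a fixed width.
WIDTH = 4


def numbered_name(i: int, width: int = WIDTH) -> str:
    """Build a name like 'frame_0042.png' with zero-padding."""
    return f"frame_{i:0{width}d}.png"


names = [numbered_name(i) for i in (9, 10, 9999, 10000)]
print(names)
# ['frame_0009.png', 'frame_0010.png', 'frame_9999.png', 'frame_10000.png']

# Plain string sort keeps numeric order only while every number fits the width.
print(sorted(names))
# ['frame_0009.png', 'frame_0010.png', 'frame_10000.png', 'frame_9999.png']
```

As soon as the counter outgrows the chosen width, the padded names stop sorting in numeric order and everything has to be renamed with a wider pad.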
The natural numbers are ordered. Let me use their ordering instead of having to rely on an ad-hoc lexicographic fixed-length tuple representation of decimal digits, without any padding. My position is that numbers in filenames should always be considered atomically unless explicitly instructed otherwise.
If there were no issues of backwards compatibility, I would thus advocate for changing ls. Eza (maintained fork of Exa, Rust-based ls alternative) actually does sort this way by default, much to my delight.
I think the real issue here is that two Android phones take photos with incompatible naming schemes.
I am sure that at some point someone thought the milliseconds should or should not be separated from the seconds and made that change without thinking through the consequences.
The so-called "natural" sort makes sense for version numbers and enumeration (without zero-padding) but I'm more often dealing with file names with a datetime (like in the article), a hexadecimal hash, or just a randomized string of characters that includes numbers. In those cases "natural" sort makes it harder to find the file you're looking for.
Even when files are enumerated it's pretty rare to have more than 9 parts and no zero-padding, whereas there are almost always multiple consecutive digits in the use cases for which "natural" sort is not a good fit. It just feels like a bad default, at least for a programmer's workload.
AFAICT, natural sort shouldn't ever make datetimes harder to find, unless they are formatted inconsistently, as in the author's case. Suppose one camera wrote dates as 20250928 and another as 2025-09-28. ASCIIbetical sort would do nothing to help here.
Natural sort can even improve things over ASCII sort, for instance if someone is stuck with a format like "28/9/2025" or "September 2 2025"
More fascinating for me is this discussion thread, where there's legitimate debate around the need/expectation for alphabetical sorting to match/include lexical sorting.
I'm personally in the "want lexical as part of alphabetical" - as 'photo19' should come after 'photo2' in my expectations, but the number of cases cited where this doesn't/shouldn't work is enough to justify a degree of contextual or situation awareness that most systems and interfaces simply aren't designed to cater for (file-systems vs photo-storage applications).
Convenient-to-select settings should always include:
Sort:
In Alphabetical Order
In Alphanumeric Order
In Alphabetic-Word Order
In Right-Aligned Alphabetic Order
Randomly
Sometimes
Never
By Hash
Very Fast
In the Background
In the Foreground
In the Underground
In the Cloud
Yes
With Bubbles
No Strong Opinion
Of
On YYYY-MM-DD HH-MM-SS: [SELECT] Repeat: [SELECT]
With Random Site Free Download Sort Extension: [SELECT]
Let Facebook
Emergency Backup Sort [SELECT]
Who Sort?
I have the same issue with "15 minutes before" instead of "2025-09-29 01:13:30".
(Which is wrong once the site doesn't update)
Needless to say, those are all "features" dumbing us down in the long run.
A philosophical side question: I want to opt out of this but I can't. So is this a case where my peers are limiting my intellectual development? I.e. preventing me from a) doing the time calculations in my head, b) writing my software such that it uses leading zeros?
> I miss the time when computers did what you told them to, instead of trying to read your mind.
You haven't seen anything yet.
Get ready for "Sort by AI" which will try to interpret the content of your images to sort them based on what you'll want to look at next.
Incidentally, in this case AI would have sorted them the way you want:
These look like photos straight from a phone, with filenames in the form:
IMG_YYYYMMDD_HHMMSS...
So the natural way to sort them is *chronologically*, by the timestamp embedded in the filename.
If we do that, the order becomes:
1. `IMG_20250820_055436307.jpg` — Aug 20, 05:54:36
2. `IMG_20250820_092016029_HDR.jpg` — Aug 20, 09:20:16
3. `IMG_20250820_092440966_HDR.jpg` — Aug 20, 09:24:40
4. `IMG_20250820_092832138_HDR.jpg` — Aug 20, 09:28:32
5. `IMG_20250820_095716_607.jpg` — Aug 20, 09:57:16
6. `IMG_20250820_103857_991.jpg` — Aug 20, 10:38:57
7. `IMG_20250820_103903_811.jpg` — Aug 20, 10:39:03
That order reflects the actual sequence the photos were taken.
I think the algorithm is probably incorrect. A number starting with 0 should be treated lexically, not numerically. Otherwise you have a situation where img_1_01.jpg and img_01_1.jpg do not have a complete ordering.
> Otherwise you have a situation where img_1_01.jpg and img_01_1.jpg do not have a complete ordering.
(Good) "natural sort" implementations generally have ways of handling ties like this. It's similar to the problem of case-insensitive sort over case sensitive sets.
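One common way to handle such ties, sketched in Python. This is an assumption about how a tie-break could be done, not how any particular file manager does it; `natural_key` and `total_key` are made-up names:

```python
import re


def natural_key(name: str):
    """Split into digit and non-digit chunks; digit chunks compare as ints."""
    parts = re.split(r"([0-9]+)", name)
    # re.split with one capture group alternates text, digits, text, ...
    # so the odd indices are always the digit runs.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def total_key(name: str):
    """Same key, with the raw name appended to break ties deterministically."""
    return (natural_key(name), name)


a, b = "img_1_01.jpg", "img_01_1.jpg"
print(natural_key(a) == natural_key(b))  # True: the numeric key alone ties
print(sorted([a, b], key=total_key))     # ['img_01_1.jpg', 'img_1_01.jpg']
```

With the raw name as a secondary key, two distinct names never compare equal, so the result no longer depends on the order the files were listed in.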
There are quite a few more rules for sorting that can be applied - it's not just numbers, and numbers don't always work the way you describe.
There is "Dictionary Order", "Phone book order", and a few other standards. (Dictionary order is not lexicographic order, even if the two are now commonly conflated).
A simple rule that most still know is a book titled "The Book", should be sorted under "Book, The".
They have variations on how special characters sort, how abbreviations are handled, and even have differences in numbers. For example, in phone book order, "21st Century" sorts under “Twenty-first”, not "21".
And, of course, non-English languages add all sorts of other rules.
This tends to get ignored these days, as lexical sorts are so much easier to implement, that people forget there are other, preferred options.
Unfortunately, it's not so simple, especially once you go beyond the ASCII. Dylan Beattie has this brilliant talk [1] where he points out how even the "systems" in human language involve a pile of quirks rather than any simple clean rules, and many of those rules are conflicting and the appropriate order of precedence depends on the context. Eg: the correct sorting order for the same sets of strings might even depend on the geography in which the question is asked!
If you haven't had to deal with it previously, you'd be flabbergasted at how many foot-guns there are in such a simple question as alphabetical sorting, even without involving numeric components in strings.
There's a Group Policy setting in Windows: Computer Configuration\Administrative Templates\Windows Components\File Explorer\Turn off numerical sorting in File Explorer
Group Policy has so many essential settings I hurry to change with every install. I wish Windows would expose more of them to the user in ordinary settings.
When you get a bunch of files (let's say 1000+) without leading zeroes, this is a blessing. But I get the author's frustration, the expected behavior is not there, instead, he gets magical sorting that is wrong for his use case. I'm not sure what the ux should be, and maybe the algorithm here could be smarter, but it's a trade-off.
The expected behaviour is ambiguous (and thus subjective). Older versions of Windows shipped with alpha sort. New versions ship with natural alpha sort. According to the UX designers over at Microsoft (and surely the user feedback), natural sort _is_ the expected behaviour.
I certainly agree with natural sort being the expected behaviour too.
This must be why, when I have a folder in Win11 full of files with GUIDs as names, they are never in the order I expect. Windows seems to sort them randomly but there must be some sub-sequence of numbers that it's deciding are the important ones and sorting off those. For me I'd much rather just sort left to right alphabetical.
I honestly thought Explorer was broken and have been looking into 3rd party file browsers for Windows because this has been driving me so nuts. Thank you!
There’s also the account-specific HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Policies\Explorer\ NoStrCmpLogical, which I meant to mention, but mixed them up by mistake.
I think if we (in our industry in general) had REAL agile, and not pseudo-waterfall "the designers design it, the engineers implement it, QA QA's it, and then we lay everyone off because they're no longer needed" (but, loophole alert! We did daily standups and used Jira, so it was "agile" the whole time!), then we'd have a snowball's chance in hell of actually having a reasonable solution to this. Off the top of my head, this seems like something that should be a setting in control panel. But, because everyone assumes (contra to agile) that the designers "got it right the first time", this kind of improvement can't happen.
I find the term "natural" to be inadequate here, there is nothing natural about sorting strings in this particular fashion compared to another. It should be given a more descriptive name, like "Number-aware alphabetic" or something like that as to actually give a hint about what it does.
> I miss the time when computers did what you told them to, instead of trying to read your mind.
This could be an illusion, or at least something difficult to evaluate; the operator is less likely to notice the situations when the computer successfully “reads their mind”.
Also, I guess new users (i.e. those unfamiliar with previous behavior) won’t care as much about wrong assumptions; they will only learn that one doesn’t need a leading zero.
I feel like it's not intelligence or lack thereof, it's that implementing sort with a[i] < b[i] is the simplest way to do it. Putting 9 before 10 would require some kind of windowing since otherwise you'd be comparing 9 and 1, and of course 1 is smaller.
Earlier this year I submitted a bug to VSCode about sorting Playwright tests in the alphabetical-and-numerical order that VSCode favours, after Playwright told me it was a VSCode issue.
Some people rushed to fix this as I'd done some diving into the issue and presented the relevant information and code, so now VSCode's Playwright test list uses the same sorting mechanism as the rest of VSCode.
Sadly, the underlying Playwright does not receive that order from VSCode so it still actually runs sequentially-numbered tests in strict alphabetical order. :(
Historically I'd like to add that FileNames are just sequences of bytes that come with a few restrictions.
They don't even have an encoding you can use to sort something.
Windows FileNames look like UTF-16, but they can be truncated. You can't convert them to UTF-8 and back without loss. (For that you need WTF-8)
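A small Python illustration of the round-trip problem. This uses Python's own `surrogateescape` and `surrogatepass` error handlers, which are in the same spirit as WTF-8 but are not WTF-8 itself; the filename is invented:

```python
# A byte sequence that is a legal POSIX filename but not valid UTF-8.
raw = b"photo_\xff.jpg"

# Strict decoding fails...
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ...while "surrogateescape" carries the bad byte through as a lone surrogate,
# so the original bytes can be recovered exactly.
text = raw.decode("utf-8", "surrogateescape")
round_trip = text.encode("utf-8", "surrogateescape")

# A lone UTF-16 surrogate (which a Windows filename may contain) has no
# strict UTF-8 encoding either.
lone = "\ud800"
try:
    lone.encode("utf-8")
    lone_ok = True
except UnicodeEncodeError:
    lone_ok = False

print(strict_ok, round_trip == raw, lone_ok)  # False True False
```

Any sort that first decodes names to text has to decide what to do with cases like these before it can even start comparing.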
Once you use random FileNames you'll start to notice...
Be glad you don't have to deal with non-ASCII characters: acute/grave/tilde/umlaut/diaeresis/etc. accented characters, dotted vs. dotless 'i' (Turkish), barred i (an 'i' with a sort of dash through the middle, used in some languages for a sort of schwa-like vowel), thorn, not to mention non-Roman characters. And different languages sort the same characters differently, so you can't just pay attention to their Unicode values. (@cubefox has a post here pointing to the Unicode Consortium's doc about sorting)
I thought this was going to be a deep dive into what "alphabetical" means and how that's itself not a universal term between locales, what with so many different collation preferences.
That would likely have been a more useful article for the average developer. It is extremely hard to be aware of all the ways strings of different locales can defy our intuitions.
I don't know why I was surprised to learn this but there is a standard for alphabetical order. The NISO Guidelines for Alphabetical Arrangement of Letters and Sorting of Numerals and Other Symbols: https://www.niso.org/sites/default/files/2017-08/tr03.pdf
Numbers aren't part of "the alphabet", so sorting digits within a string by the numeric value makes just as much sense (and is what most users want most of the time) as treating digits as isolated characters (what OP wants).
As an aside, this is also the reason why ISO 8601 is the best date format – it sorts the same way whether you do it alphabetically or lexicographically.
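One property behind that claim is easy to check in Python: plain string order on ISO 8601 dates matches chronological order, while a day-first format without padding does not. The sample dates are arbitrary:

```python
from datetime import date

days = [date(2025, 9, 28), date(2025, 10, 2), date(2024, 12, 31)]
chronological = sorted(days)

# ISO 8601: fixed width, most significant field first.
iso = [d.isoformat() for d in days]
print(sorted(iso) == [d.isoformat() for d in chronological])  # True

# Day-first, unpadded: string order no longer follows time.
def dmy(d: date) -> str:
    return f"{d.day}/{d.month}/{d.year}"

print(sorted(dmy(d) for d in days))
# ['2/10/2025', '28/9/2025', '31/12/2024']
print([dmy(d) for d in chronological])
# ['31/12/2024', '28/9/2025', '2/10/2025']
```

The ISO strings sort correctly because every field is zero-padded and ordered from year down to day, so the first differing character is always the most significant difference.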
I was joking. Really I would sort file names lexicographically. But the way Debian sorts version numbers is interesting and seems like a good way of handling that particular situation.
Renaming things to make them queue correctly (I usually couldn't care less about visual sorting, I use a terminal) is by far my #1 task by LoC and frequency of occurrence, and by far the most annoying. Metadata can be very helpful to obviate this issue, but it usually just leads to another problem where you now need metadata editors and readers in addition to the "user-visible" name metadata. It's frustrating.
>Well, apparently all these operating systems have decided that no, users are too dumb and they cannot possibly understand what alphabetical order means.
i really really hate this framing, and i see it far too often. no, the operating system developers did not make a value judgement about their users. they observed their users to find out what behaviour was expected, and they designed the behaviour of the system to match the behaviour that the majority of users expect.
and then you made an incorrect assumption about how the system works, and decided that your incorrect assumption means everybody else is dumb and you're the only smart person in this situation?
This is also the case with Excel. If numbers are stored in general formatted columns and you sort by A to Z, you'll get 1, 10, 11, 12, ..., 2, 20, 21 and so on
If you don't like the default natural sorting order, you can just change it in Dolphin. Settings > Configure Dolphin > View > Content Display > select anything other than "natural". You can even pick if you want case sensitivity or not.
The OS doesn't think you're too stupid to understand sorting, it relies on you being smart enough to figure out where the setting is located. In this case, four levels deep is probably too much to ask from users if they will write an entire blog post like this before finding the toggle.
> But 1 is smaller than 9, so file-10.txt should be first in alphabetical order. Everyone understands that, and soon people learn to put enough leading zeros if they want their files to stay sorted the way they like.
No. Not “everyone understands that”. Natural sort happens in real life and everyone understands that. Only those who understand ASCII — not the average user of graphical file managers — will deduce the reason for your definition of “alphabetical order”.
> Now that I know what the issue is, I can solve it by renaming the files with a consistent scheme.
Even if you are a file naming Einstein and you always zero pad your integers to exactly right length, we have this thing called the internet, where you can download other people's files.
If only we could represent sort order by some structural form of decision logic which also embeds encoding a regular .. well.. expression matching a pattern..
I've encountered a tangential problem to this with package versioning on Linux distros. Thankfully it was not too hard to write an algorithm to compare versions (thanks AI!).
This was a fun thing to realize in my early days of programming in Delphi. I guess the author will soon realize why old systems name things ticket "00001" and so on.
I agree that the base functionality of just sorting character by character can be occasionally useful. However I would really be interested in seeing why you believe this to be the correct choice for user-facing graphical file managers, as its evident problems with typical usage seem more salient compared to the edge cases as illustrated in the article.
Many commenters are positing the "clever" sort is what 99% of users want, but I really doubt it has been properly checked beyond the original PO's hunch and at most some user panel with pre-sampled data.
Most of these decisions are early default behaviors that stay there as long as users aren't clamoring for change, and TBH I can't imagine most users to have a self emerging strong opinion on how alphabetical sort should be working.
> not every single piece of software fucks up something as basic as string sorting
it is neither basic nor simple. Have you ever heard of UTF-8 and locales?
Here is an exercise for the curious reader:
Pick any UTF-8 string "a", and another one "b", so that in increasing lexicographical order "a" sorts after "a+b" ("a" concatenated by "b"). ("a" > "a+b")
I'm often yelling at the software on my computer: "STOP TRYING TO HELP ME!"
It's like having a toddler help you make a meal. It wants to be involved and recognized so badly. Meanwhile I'm starving and just want to get the food done as quickly as is possible and I'm constantly tripping over this little ball of misguided efforts.
Please. Stop trying to be smarter than me. You often can't, and when you get it wrong, you make it measurably worse. If you insist on doing this please give me the "Expert Mode" setting back so I can flatly disable ALL OF IT with one click.
Thanks, now I cannot unsee this. Thunar has this broken sort order too, and I've no idea how to make it sort file names with hash values 'properly' - by which I mean the same as `ls` which broadly speaking on my system is 0 to 9 then a-z case insensitive.
Instead I have an order of starting character that goes 1,4,5,7,9,2,3,7,8,9,4,6,1,2,.. etc etc which is utterly useless as a sort. I've always thought the sort was weird but couldn't quite figure out why (I usually sort by date descending). Another non-productive thing to figure out and fix.
Call lexicographic order "sort by name" as it's called now, and call dumb character-by-character sort "plain" or something like that. I'm not a designer, maybe there are more intuitive names, but come on. This isn't an intractable problem.
I rename all of my photos upon import using the created date, formatted as `YYYY-MM-DD kk:mm:ss`.
But it would frankly be great if most file browsers just let me sort photos based on metadata. But then I just end up in a dedicated photo browser, instead.
Our users loved when we added "natural" sort, which was a pain in the ass in the db, but ultimately no big deal.
They absolutely do not care or understand the difference between alphabetical and numerical and natural, what they care about is 10 should not come before 9 in "Item 10" vs "Item 9".
Whatever pedantic argument you have that natural is not alphabetic will lose you sales, your users do not care and want numbers to make sense in sorting.
I have the same problem on Nemo. More specifically, I had made a small app that displayed files of a directory in alphabetical order, and then when I look at it in Nemo it isn't the same order because I didn't implement their smart algorithm.
I fail to see how this is a "problem"? You implemented a sorting mechanism that was useful to your application, while Nemo implemented another which as this thread demonstrates seems to be much more useful and intuitive for the average user. This is also of course not specific to Nemo, as no 'modern' file manager on Linux sorts filenames like it's 1980 and all you are able to feasibly do is step through the bytes.
> Of course, the user who named those files probably wants file-9.txt to come before file-10.txt. But 1 is smaller than 9, so file-10.txt should be first in alphabetical order. Everyone understands that, and soon people learn to put enough leading zeros if they want their files to stay sorted the way they like. Well, apparently all these operating systems have decided that no, users are too dumb and they cannot possibly understand what alphabetical order means. So when you ask them to sort your files alphabetically, they don’t. Instead, they decide that if some piece of the file name is a number, the real numerical value must be used.
I think there are many things wrong with your assessment of the situation.
First, where does it say in these file managers that they're sorting by alphabetical order? I see that you've specified that you want the files sorted by name, but I don't see that you've specified you want them sorted by name alphabetically. And what does "alphabetical sort" even mean when you're sorting characters which are not letters? What you mean is probably "lexicographical sort".
Second, you admit yourself that users probably want natural sort. Why would you expect these products to do the thing which they know users usually don't want by default? That just seems like bad design to me. They know users usually want natural sort, and you know users usually want natural sort, so why would you expect the default behaviour to be a lexicographical sort?
Third, just like how you've learned to work around the lack of natural sort in poorly designed products of years past by adding leading zeroes, you can just add trailing zeroes to get the lexicographical ordering that you want. Why do you seem to be implying that the latter is more user-hostile than the former? It doesn't make sense to me. A decision had to be made about what sort to use and they picked the one that most people want. Isn't that what we should be expecting in a product that caters to its users?
I see in other comments you've suggested that there should be a separate option for choosing between lexicographical sort and natural sort. But in the past, when lexicographical sort was the only option, why weren't you complaining about it being user-hostile to only have one option then? Why is it only when the default is something you're personally not used to that it warrants complaint? And where do we stop, do we have separate controls for every single sortable string field to determine whether it should be sorted lexicographically or naturally? Or just the name field? Don't you think that is going to lead to interface bloat?
Another problem which annoys me to no end is that most file managers and file selection boxes put directories before files.
This makes it hard to find the file that was most recently changed, for example. Which is an action that is extremely common. (In fact, why does my file manager not have a most-recently-used shortcut?)
In Total Commander, there is a function in the options to sort strict by numerical char code. It will sort those files correctly. Unfortunately, it will also sort "10.txt" before "2.txt".
---
In all file managers, I miss an API point where one can give a userdefined sorting function for the file and folder list.
What do you mean by "Unfortunately"? This appears to be the only correct conclusion from the algorithm you selected, you can't eat the cake and have it too.
Regarding your second point, that's not really what a graphical file manager is for, I think. At this point (likely even earlier) you would be better off just writing a simple script in the scripting language of your choice. (If going for something fancy, you could also implement a FUSE based on symlinks for the original files, where the filename is prepended by a sort key. This would work for every major file manager and you could manipulate the files in mostly the same way as before.)
Paragraph 1: I speak of a sorting method which splits the filename at the boundaries between numbers and non-numbers, and sorts by the parts of the resulting tuple, the numbers naturally (10 comes after 2) and the rest by numerical char code.
Paragraph 2: I am not sure what you mean here with writing a script. The graphical file manager shall sort its file list using the sorting function I hand over to it.
"That's not really what a graphical file manager is for". Says who? Every software which has a plugin system does that, why should a file manager not?
Sorting by name (collation) is waaay trickier than simply figuring out how to parse the numbers.
The International Components for Unicode library implements the Unicode Collation Algorithm, which depends on the language code and region of the locale, and looks up the quirks for each locale in the Common Locale Data Repository.
It's a much better idea to just use the standard ICU library or platform-specific libraries (which are often built on ICU, like JavaScript's Intl.Collator), instead of trying to hot dog it by rolling your own.
>ICU provides the following services: Unicode text handling, full character properties, and character set conversions; Unicode regular expressions; full Unicode sets; character, word, and line boundaries; language-sensitive collation and searching; normalization, upper and lowercase conversion, and script transliterations; comprehensive locale data and resource bundle architecture via the Common Locale Data Repository (CLDR); multiple calendars and time zones; and rule-based formatting and parsing of dates, times, numbers, currencies, and messages.
>The Unicode collation algorithm (UCA) is an algorithm defined in Unicode Technical Report #10, which is a customizable method to produce binary keys from strings representing text in any writing system and language that can be represented with Unicode. These keys can then be efficiently compared byte by byte in order to collate or sort them according to the rules of the language, with options for ignoring case, accents, etc.[1]
>Unicode Technical Report #10 also specifies the Default Unicode Collation Element Table (DUCET). This data file specifies a default collation ordering. The DUCET is customizable for different languages,[1][2] and some such customizations can be found in the Unicode Common Locale Data Repository (CLDR).[3]
>The Common Locale Data Repository (CLDR) is a project of the Unicode Consortium to provide locale data in XML format for use in computer applications. CLDR contains locale-specific information that an operating system will typically provide to applications. CLDR is written in the Locale Data Markup Language (LDML).
>Among the types of data that CLDR includes are the following:
Translations for language names
Translations for territory and country names
Translations for currency names, including singular/plural modifications
Translations for weekday, month, era, period of day, in full and abbreviated forms
Translations for time zones and example cities (or similar) for time zones
Translations for calendar fields
Patterns for formatting/parsing dates or times of day
Exemplar sets of characters used for writing the language
Patterns for formatting/parsing numbers
Rules for language-adapted collation
Rules for spelling out numbers as words
Rules for formatting numbers in traditional numeral systems (such as Roman and Armenian numerals)
Rules for transliteration between scripts, much of it based on BGN/PCGN romanization
Tricky collation examples:
sv-SE (Swedish): å, ä, ö are separate letters at the end of the alphabet, not variants of a or o.
de-DE (German): ä, ö, ü may sort as ae, oe, ue in some contexts, or as distinct letters. ß sometimes sorts as ss.
tr-TR (Turkish): dotted i (i) and dotless ı are different letters; I sorts with ı, not with i.
es-ES (Spanish): traditionally ch and ll were treated as single letters with their own place in the alphabet.
cs-CZ (Czech): ch still counts as a unique letter, sorted after h.
da-DK / no-NO (Danish/Norwegian): ø comes after z.
is-IS (Icelandic): þ (“thorn”) is part of the alphabet, after z.
fr-FR (French): accents usually ignored in sorting, so é = e, but not always depending on collation settings.
el-GR (Modern Greek): tonos accents, final sigma ς vs. σ, etc.
nl-NL (Dutch): the digraph “ij” is often treated as a single letter, and capitalized as “IJ”. In dictionaries and phone books it often sorts as a single letter under “I”, but sometimes is listed after “X” depending on tradition.
Then you get into non-Latin languages like Chinese, Japanese, and Korean, where collation gets hairy with radicals, kana order, and stroke count.
Also different locales have different ways of representing numbers, like switching between "," and "." as separators and decimal points.
ICU supports integer only "natural" numeric collation, so anything more complicated like versions, floating point, negative numbers, hex, thousands separators, fractions, roman numerals, etc, you'd have to build on top of ICU.
ICU doesn't support incomprehensible dead languages like Latin or Ancient Greek (it does however support French ;). It does support Roman numeral formatting, but not collation, which would be pretty tricky and ambiguous.
A nuanced but common example that ICU/UCA/CLDR helps with is a menu to select the current locale: you have to translate each language's name into the current locale, and also sort them in the current locale. On top of different collations they can also have totally different spellings, like "United States of America" is "Verenigde Staten van Amerika" in Dutch. This makes it challenging for users to find their own language when the locale is set wrong! You just can't win.
Not to mention emojis! Which comes first: The chicken or the egg? The taco or the poop?
Also, the Mac Finder switches ":" and "/" for historical reasons (HFS used to use ":" as a directory separator instead of "/"), so you can create a file name like "9/11 Attack" in the Finder, which actually gets the underlying Unix filename "9:11 Attack". Don't believe me? Rename a file in the Finder to include a slash, which you know is impossible to represent as a Unix file name. Then go "ls" the directory in the shell.
The Mac Finder weirdly collates "/" after "9" because under the hood it’s really storing it as ":", which sorts before "0". But it also has other punctuation collating inconsistencies, sorting "," and ";" and others after "0" too. Definitely not ASCII order -- I'm not sure what rules it uses, but it's different than "ls".
However, while it's generally true you can't have "/" in Unix file names, NFS used to trustingly let clients rename Unix files to include a "/" in their name, which the Gator Box AppleTalk/Ethernet gateway let you do with the Mac Finder (pre OS/X), which would silently corrupt your "dump" backups on the Unix NFS server, so you would not learn about it until you tried to retrieve your files and "restore" crashed.
>Another reason that NFS sucks: Anyone remember the Gator Box? It enabled you to trick NFS into putting slashes into the names of files and directories, which seemed to work at the time, but came back to totally fuck you later when you tried to restore a dump of your file system.
>The NFS protocol itself didn't disallow slashes in file names, so the NFS server would accept them without question from any client, silently corrupting the file system without any warning. Thanks, NFS!
Heh. One of the bugs that once caused me to bang my head against the wall was caused by the Estonian language. Its alphabet has Z following S and Š. So the "foolproof" regexp to match the letters '[a-zA-Z]' was misfiring for some entries.
By the way, there seems to be a "standard" way to sort strings:
> Unicode Technical Report #10 also specifies the Default Unicode Collation Element Table (DUCET). This data file specifies a default collation ordering.
I assume this mainly aims at giving a reasonable compromise between the different dictionary and phone book sorting rules of various languages (and even locales), which should give reasonable results for most languages. I assume this also puts "Alice2" before "Alice10".
> 1.9.2 Non-Goals
>
> The Default Unicode Collation Element Table (DUCET) explicitly does not provide for the following features:
> [ ... ]
> Numeric formatting: numbers composed of a string of digits or other numerics will not necessarily sort in numerical order.
When you're dealing with unix-y Git repositories for example.
If you mean more from a user perspective, it really depends. For registry keys for example, since they're interacted with programmatically for the most part, I was expecting them to be case-sensitive. They're case-insensitive though, so that was a bit of a whiplash.
Digits and any other characters that are not A-Z and a-z should not get sorted. That's the true result of doing what you asked and not what you meant. Pedantic, but that's why we are here.
Not this keyboard not this chair, but the problem is with idiots between keyboards and chairs.
The author is not the ID10T it’s the other general users.
The author is intelligent enough to recognize that this is not alphabetical sort, but the term that they are looking for to describe the sort that they see in dolphin windows, google etc. is *lexical* sort, not alphabetical.
The engineering problem is ID10Tic not technical. How do you educate an illiterate public on what the difference between alphabetical and lexical sort is in practice?
You can’t, so you engineer around it and call lexical sort alphabetical.
This is one of the big ways that LLMs are going to change the game for UX. Your operating system is going to have some sort of 'butler', which knows all of your preferences, and the butler will go through the APIs and man files and informational dialogs of every app you use and auto-configure them.
Then if you want something to change, just ask the butler. If the app is open source and doesn't support the requested feature, the butler might even be able to code it up.
If I understand the article, the author wants magic :)
I take it to mean they want the system to know file_9.txt is less than file_10.txt.
I never saw that happen in any OS, so I do not know what he is referring to. Maybe whatever that old system was, it sorted by create time as opposed to file name.
So, the author can try and create "aisort" that will look at all file names and add leading zeros to the file numeric portion, sort, then remove the zeros added. That will probably be as slow as s***t and use gobs of memory, depending on the number of files.
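For what it's worth, the pad-sort-unpad idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names are made up, only ASCII digits count as numbers, and the padded string is used purely as a sort key, so nothing has to be "removed" afterwards.

```python
import re

def padded_key(name: str, width: int) -> str:
    # Left-pad every run of ASCII digits with zeros to a common width.
    return re.sub(r"[0-9]+", lambda m: m.group().zfill(width), name)

def sort_by_padding(names):
    # Find the longest digit run so every number is padded to the same width.
    runs = [run for name in names for run in re.findall(r"[0-9]+", name)]
    width = max((len(run) for run in runs), default=0)
    # The padded string is only a sort key; the names themselves are untouched.
    return sorted(names, key=lambda name: padded_key(name, width))

files = ["file_10.txt", "file_9.txt", "file_1.txt", "file_100.txt"]
result = sort_by_padding(files)
print(result)
```

The extra work here is one regex pass per name plus an ordinary sort, and the extra memory is one padded copy of each name while sorting.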
Author here - My surprise stems exactly from the fact that for the last few years I have exclusively managed my files via the UNIX shell, which behaves in the classical way.
When I started using Linux as my daily driver after many years of Windows (but with familiarity with UNIX systems going way back), I knew it would be like that in the terminal, but it still took some adjustment. But actually, Nemo does the same "natural sort" thing, and also sorts case-insensitively.
That's not what the author says- they said that file managers actually are somehow sorting file-9.txt before file-10.txt, and it's breaking real alphabetical ordering.
i think it's the opposite, that they _want_ file_10.txt to come before file_9.txt by default, but that file explorers fail at this. it's rare that i want true alphabetical sort, but it's convenient for cases like tfa where alphabetical sort is more predictable if i have filenames that look like <letters>_<numbers-of-same-length>.txt.
I agree with Microsoft/Google/KDE's order. The author's situation is extremely rare, and the situation where someone wants "10" to be before "9" is far more common. Moreover, desktops don't label this sorting "alphabetical" (E: and it would really be "lexicographic"*), they label it "by name" (an informal criteria), so technically they're not lying.
> I miss the time when computers did what you told them to, instead of trying to read your mind.
You may be looking at that time through rose-tinted glasses. I don't like when computers lie to me either, but "mind-reading" is really helpful in ways we take for granted, like autosave. Desktops can have an option to sort files truly alphabetically, but the more common case should always be the default; that's the definition of "intuitive".
* https://news.ycombinator.com/item?id=45404022#45405279
I will add that I'm plenty "smart" enough to understand that "10" comes before "9" in a strictly alphabetical sense, and I still want my file managers to sort "9" before "10".
I don't want to put leading zeroes before all the single-digit numbers in my file names. (And then potentially come back later and add even more leading zeroes once the maximum number reaches three digits.)
---
I split all of my audiobooks into chapters. I use the format "Chapter 01.mp3" (or "Chapter 001.mp3" when there are > 99 chapters) because some (all?) MP3 players are too stupid to sort numbers properly and I want my audiobooks to work everywhere.
This works, but it looks kind of ugly and creates extra work—yes I have scripts to automate it, it's still an extra step—and it would be great if I could just trust that every device will understand numbers.
> I don't want to put leading zeroes before all the single-digit numbers in my file names.
> ... it would be great if I could just trust that every device will understand numbers.
Strings are not numbers, even if some part of their content "looks like a number."
> I will add that I'm plenty "smart" enough to understand that "10" comes before "9" in a strictly alphabetical sense, and I still want my file managers to sort "9" before "10".
Problem is, this is your preference for a specific situation. Which may not be another person's preference in the same situation nor yours in a different situation.
So what are programs to do?
Display strings in a consistent, documented, manner. Which is lexicographical ordering in all cases lacking meta-data to indicate otherwise.
> Display strings in a consistent, documented, manner.
IMO, "Treat any sequence of digits as a number for the purpose of sorting" is consistent. I'm not sure if it's documented—I've never needed to look up the documentation—but if it's not, the developers could certainly fix that.
> this is your preference for a specific situation.
Sure, but we generally make decisions based on which situations we think will be most common. I think having ten or more things (screenshots, audio samples, whatever) named "Thing 1" – "Thing 10" in a folder is extremely common. And if Thing 10 comes before 9, it's really annoying!
Let's say I have a directory of 32 numbered files. Under the author's preferred sorting method, they'll get displayed:
If I download a folder with files like this, I basically have to pause whatever I'm doing and edit the files to have leading zeroes before I can make sense of what I'm looking at.
Do I understand that you want these to be sorted like this?
So I guess you also want things sorted like
And also
So when you're done defining whatever crazy rules you think up, how do I pause whatever and edit the filenames to get them back into lexicographical order?
You can massage lexicographical to meet your needs. I can't massage your arbitrary rules to meet my needs.
Your examples don’t need any extra rules to be sorted correctly. The basic idea is that any sequence of digits is treated for sorting as if it were a single character. On my iPhone, your examples are sorted as expected.
Would you sort
or ?
I would not know how an OS treats those if we do not assume mindreading vs proper lexicographic order. Why would we need to substitute precision with vagueness when simply taking care of proper naming would suffice?
Ah yes sorry, 1.10 comes after 1.2 because 10 is bigger than 2 (so in fact different from your example). But assuming your original list is a list of versions (which seems reasonable given the presence of multiple decimal points for some cases), then that’s the order you’d want.
If you have non-integer numbers in your filenames then it won’t give the order you want, but there isn’t going to be a rule that works for all cases.
I was with you until this point, but 1.2 is bigger than 1.10, because 1.2 is a shortened version of writing 1.20 _unless_ you explicitly want these to be version numbers or something like that. The normal expectation would be to treat numbers as, well, mathematical numbers, and not SemVer, especially if we only have one decimal point, don't you think?
As I said, the sorting rule won’t always give pleasing results, but it seems to me like a simple and reasonable modification of lexicographic ordering.
It is neither simple, nor reasonable.
1.10, the number, is equivalent to 1.1. It is less than 1.2. You say you want numbers to sort as numbers, but you want 1.10 to be greater than 1.2.
Do you consider '1/4' to be a number? Should it come before or after '1/3'?
I'm guessing that you don't want to sort one character at a time if you encounter one of [0-9]. Instead, you want to group all consecutive [0-9] as a single sortable number. But aren't characters '.', ',', '/', '-' also part of numbers?
What about numbers like ↋, 五, π, B, ⅔, or -1?
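To make the 1.10 versus 1.2 disagreement concrete, here is a small Python sketch. The two key functions are illustrative names of my own, and the "version" reading assumes dot-separated whole numbers and nothing fancier.

```python
from decimal import Decimal

def as_decimal(text: str) -> Decimal:
    # Read the whole string as one decimal number: "1.10" is the same value as "1.1".
    return Decimal(text)

def as_version(text: str) -> tuple:
    # Read each dot-separated field as its own whole number: "1.10" is (1, 10).
    return tuple(int(part) for part in text.split("."))

values = ["1.2", "1.10", "1.9"]
decimal_order = sorted(values, key=as_decimal)
version_order = sorted(values, key=as_version)
print(decimal_order)
print(version_order)
```

Both orders are internally consistent. They just answer different questions, and the strings alone don't say which question was meant.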
It doesn’t work for decimals. It also doesn’t work for pi, or most dates. That’s okay. Supporting those cases would require “reading your mind” / trying to guess what the user wants by applying opaque rules. I certainly don’t want that.
Treating consecutive digits as numbers is a simple modification (I still think it’s quite simple) that is easy to understand and supports 99% of real-world use cases.
> But assuming your original list is a list of versions (which seems reasonable given the presence of multiple decimal points for some cases), then that’s the order you’d want.
What level of assumption is here expected from the sorting-system, would it have to process ALL entries of the list to find multiple decimal-points and then assume that they are ALL versions and not numbers?
How to treat this on different locales, where the decimal point is a comma and thousands-separator is a dot. Should the locale then also be considered by that system? Also when listing the folder of a remote-system with a different locale?
What about dates, should that system attempt to sort entries with multiple date-formats (yyyy-mm-dd, dd-mm-yyyy, dd-MMM-yyyy,...)?
The topic is far more complex than this narrow example. If we expect such a system to alter its sorting based on some data format interpretation, there is a risk of misinterpretation which might make the whole list unusable...
It has nothing to do with decimal points. It just looks at any contiguous sequence of digits and treats it as a single character for the purposes of sorting. The decimal point could be any other character and the behavior would be the same.
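A minimal Python sketch of that rule, in case it helps. `natural_key` is a name I made up, and I'm assuming only ASCII digits count; it is not the algorithm any particular OS ships.

```python
import re

def natural_key(name: str):
    # Splitting on a captured group keeps the digit runs in the result;
    # they always land at the odd indexes, text at the even ones.
    parts = re.split(r"([0-9]+)", name)
    # Digit runs compare as whole numbers, everything else as plain text.
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]

names = ["Thing 10", "Thing 9", "Thing 1", "v1.10", "v1.2"]
ordered = sorted(names, key=natural_key)
print(ordered)
```

The "." in "v1.10" is just another non-digit character here, so the result would be the same with "v1-10" and "v1-2". Note that "01" and "1" produce equal keys in this sketch, so their relative order falls back to whatever order they arrived in.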
So only whole numbers are sorted as numbers then.
Decimal numbers are treated as strings and will have a completely different order, with digits after the decimal point sorted differently to whole numbers without fractions?
Or you mean every set of continuous digits within the same string are considered as individual whole number?
Depending on the decision, either lists of decimal numbers or lists of version numbers will be sorted wrong.
--> This could be covered by adjusting the logic based on the amount of decimal points.
And the logic complexity keeps increasing, up to an arbitrary point of "no, this will not be considered", resulting in an unpredictable user-experience of sorting...
>Depending on the decision, either lists of decimal numbers or lists of version numbers will be sorted wrong.
Yes. I don’t see why this is a big deal.
I didn’t suggest adjusting the logic based on the number of decimal points.
Ah ok.
I understand that you found your perfect trade-off for sorting based on longer considerations. But it will be difficult to communicate such a concept to a user.
Applying partial rules to improve sorting in one direction is not a lossless activity, it makes the UX actually worse in other scenarios as the user is first guided to assume a certain behavior, but then learns that his expectation is broken in adjacent scenarios (Which is more or less the bottom-line of that article to begin with).
In the end it'll be just "another standard" for sorting [0]
[0] https://xkcd.com/927/
> But it will be difficult to communicate such a concept to a user.
This isn't a prerequisite, since the existing naive character sort approach is not communicated either. In fact, it's almost universally unexpected by any user who hasn't written a naive string sort. Apple doesn't do this, and I very much did not need it communicated to me why 10 was coming after 2, because that's what everyone, who's not a programmer, expects.
As a litmus test, go ask some people, who are not programmers, without loading the question beyond "here are some files, how would you expect for them to be displayed in a list?". Show the lists side by side. It should not surprise you.
I consider 八 to be a whole number.
There is a rule that works for all cases. It's lexicographical sorting.
Simple. Consistent. Easy to manipulate to get what you want.
We just discussed a situation where lexicographical sorting doesn’t work. Adding in a rule to treat consecutive digits as one number doesn’t significantly complicate the logic and makes sorting work for a major additional use case. It doesn’t magically fix every case but it fixes a common one with minimal downsides.
> IMO, "Treat any sequence of digits as a number for the purpose of sorting" is consistent.
Are you sure about that?
> Let's say I have a directory of 32 numbered files.
Assuming any of the filesystems I am aware of is in use, those names are strings having one or two characters. They are not "numbered files."
Sorting dates: This is why there is an international standard of having YYYY-MM-DD hh:mm:ss in the order we have it. We got to learn this in school in the '80s because sorting paper documents would be more logical and easier to find stuff. So way before most people got computerized.
It just happens to be the most logical way to sort for computers too, as long as humans are involved in the usage of the data.
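A quick Python check of that property, with made-up dates: plain string sorting of YYYY-MM-DD matches chronological order, while a day-first format (my example for contrast) does not.

```python
from datetime import datetime

iso_names = ["2024-01-25", "2023-12-31", "2024-01-03"]
# Plain string sort versus sort by the parsed date.
iso_plain = sorted(iso_names)
iso_chronological = sorted(iso_names, key=lambda s: datetime.strptime(s, "%Y-%m-%d"))

day_first = ["25.01.2024", "31.12.2023", "03.01.2024"]
day_first_plain = sorted(day_first)
day_first_chronological = sorted(day_first, key=lambda s: datetime.strptime(s, "%d.%m.%Y"))

print(iso_plain == iso_chronological)
print(day_first_plain == day_first_chronological)
```

It works because the fields are fixed-width and ordered from most to least significant, so comparing character by character is the same as comparing year, then month, then day.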
> Sorting dates: This is why there is an international standard of having YYYY-MM-DD hh:mm:ss in the order we have it.
That would be great, but this ISO is just one of the standards, and there are still regional standards as well.
And that's still ignoring the end-user. In Europe for example, humans might create filenames with date in format dd.mm, e.g. "Report 25.01.xls"
A system attempting to sort this intelligently would likely assume this is a decimal number, as it has zero context for it.
It's just slightly worse than the lack of consistent UTC-usage of systems, with the mixed attempts to correct data to local timezone (or not) depending on application...
Okay, I'll refine the rule to "Treat any sequence of digits as a base 10 whole number for the purpose of sorting". I still think this is quite clear. (Frankly, I also think the original definition is quite clear unless you're purposefully trying to misinterpret it.)
> those names are strings having one or two characters. They are not "numbered files."
Yes they are! In this context, a number is an idea, not a data type. Strings are capable of containing numbers.
I generally agree that treating substrings that are numbers as numbers is a good default for most users in most situations.
However, for hex numbers this simply won't give good results because some of them will just happen to not contain any of the digits A to F and be treated as base-10 numbers by the heuristic while others will include these digits and be sorted differently.
(So, having a strict lexicographic mode as an alternative in file managers would be nice.)
Octal or binary numbers are going to be fine, but it'll totally and confusingly mess up hexadecimal numbers.
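Here is a small Python illustration of the hex problem. The file names and the "prefix_hexdigits" naming scheme are invented for the example, and `natural_key` is just one simple reading of "treat digit runs as base-10 numbers".

```python
import re

def natural_key(name: str):
    # Digit runs (odd indexes after the split) compare as base-10 whole numbers.
    parts = re.split(r"([0-9]+)", name)
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]

def hex_key(name: str) -> int:
    # Hypothetical naming scheme: "<prefix>_<hex digits>".
    return int(name.rsplit("_", 1)[1], 16)

names = ["dump_a0", "dump_100", "dump_9f", "dump_10"]
natural_order = sorted(names, key=natural_key)
hex_order = sorted(names, key=hex_key)

# Octal strings without leading zeros keep their order when read as base 10.
octal = ["100", "7", "17", "10"]
octal_natural = sorted(octal, key=natural_key)
octal_by_value = sorted(octal, key=lambda s: int(s, 8))

print(natural_order)
print(hex_order)
```

Names whose hex digits happen to be all 0-9 get compared as decimal numbers, while anything containing a-f is partly compared as text, so the two groups interleave in a way that matches neither numeric nor plain lexicographic order.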
I am not sure any of the points you raised change anything to the OP's point, do they?
OP was talking about changing the rule to something more intuitive, in which case it would seem natural that decimal numbers are used.
Your concept appears to have coherence until you consider that numbers are not necessarily expressed in decimal notation. What about hexadecimal numbers in filenames? Should they be sorted your way?
And what about very long strings of digits in the filenames - so long that they are too long for even the longest available numerical representation? In some apps, they are converted to floating point...
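The floating-point hazard is easy to reproduce in Python. The digit strings below are invented, and `digits_key` is just one possible way (my assumption, not any app's actual code) to compare digit runs without converting them to a fixed-size number at all.

```python
def digits_key(run: str):
    # Compare digit runs without numeric conversion: after dropping leading
    # zeros, a longer run is a bigger number, and equal-length runs compare
    # correctly as plain text.
    stripped = run.lstrip("0") or "0"
    return (len(stripped), stripped)

x = "12345678901234567890123"
y = "12345678901234567890124"

same_as_float = float(x) == float(y)   # a double cannot tell these apart
int_less = int(x) < int(y)             # Python ints are arbitrary precision
key_less = digits_key(x) < digits_key(y)

print(same_as_float, int_less, key_less)
```

In a language with only fixed-width integers, the length-then-text comparison avoids overflow entirely.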
> "Treat any sequence of digits as a number for the purpose of sorting" is consistent.
How about decimal numbers, are they strings or still numbers?
How about version numbers with multiple dots?
How about decimal numbers of a different locale, e.g. you list the folder from a remote machine with filenames of a different locale?
The problem with such semi-consistent schemes is that they are still guess-work, they may make some cases better for some people, but other cases practically unusable because the system doesn't have sufficient information to handle all scenarios consistently.
> Strings are not numbers, even if some part of their content "looks like a number."
Irrelevant and intentionally obtuse. Filenames can't be anything but strings - there's literally no way to mark part of a filename as "this is an integer", so the idea that "strings are not numbers" is ridiculous because the only way to encode numbers (which people constantly want to encode) is as part of a string - which means that parts of filenames are numbers, because that's exactly how people use them.
> Problem is, this is your preference for a specific situation. Which may not be another person's preference in the same situation nor yours in a different situation.
> So what are programs to do?
> Display strings in a consistent, documented, manner. Which is lexicographical ordering in all cases lacking meta-data to indicate otherwise.
These do not follow from each other.
First, the assertion that "peoples' preferences are different, so we shouldn't pick an overwhelmingly common preference" is laughably false. The vast majority of computer users (which happen to not be people on HN) prefer "sort numbers by number rather than by UTF-8 value", so that's simply the correct way to sort.
Second, even regardless of the above, there's nothing preventing a "by name" sorting from being consistent and documented.
Either way, this line of reasoning is just wrong.
> I will add that I'm plenty "smart" enough to understand that "10" comes before "9" in a strictly alphabetical sense
Strictly speaking 9, 1 and 0 are not in the alphabet so can't be sorted alphabetically.
And I think most "normal users" wouldn't expect that programmers generalize the alphabet like we do.
Well, that's not alphabetical order.
It's great if DEs build this and give it a name. It's even better if they have a different one that deals with SI prefixes too. But it's not good if "alphabetical order" means that.
What desktop environment called this alphabetical?
This is a really important point - my file manager just says "Name" with sorting. So while it's not perfectly defined, it doesn't make the promise of saying it's alphabetical.
I mean, nine does come before ten in alphabetical order.
> I will add that I'm plenty "smart" enough to understand that "10" comes before "9" in a strictly alphabetical sense, and I still want my file managers to sort "9" before "10".
Amen.
> I split all of my audiobooks into chapters. I use the format "Chapter 01.mp3" (or "Chapter 001.mp3" when there are > 99 chapters) because some (all?) MP3 players are too stupid to sort numbers properly and I want my audiobooks to work everywhere.
Well, some car and kitchen radio manufacturers will probably never get this right. In my car (which tends not to be brand new) they even messed up UTF-8 chars, which gets me laughing every time a track has them. It's become a running gag with my wife, "Oh, listen up, it's &%=?! again".
> (all?)
Well, I kind of hate to say this, but Apple got this right with the iPods. They even regarded the metadata fields `sort-*` (e.g. sort-album), movement-name (for series) and movement-index (for part). With these fields they really group and sort my audio books as I expect it to be.
I even wrote my own software to fill these tags appropriately, so that I don't need to split my audio books. I'm pretty happy using `m4b` files - an mp4 / m4a container with chapter support, which is supported perfectly fine on my iPod Nano 7g and my Android Phone (using Audiobookshelf[1] and Voice[2]). After all these years, the iPod Nano 7g to me is the PERFECT portable audio book player with 2 exceptions: Repairability and the proprietary Apple headphone remote protocol [3].
1: https://audiobookshelf.org
2: https://github.com/PaulWoitaschek/Voice
3: https://tinymicros.com/wiki/Apple_iPod_Remote_Protocol
There’s a couple of reasons I don’t use m4b files:
- A lot of my audiobooks come as mp3, and converting to m4b (which is AAC based) would mean losing quality.
- Some MP3 players (even those that support AAC) don’t support M4B.
- I want playback to stop automatically at the end of a chapter, unless I actively decide to start the next chapter. (Admittedly, some MP3 players don’t have an option for this anyway and will always start the next track. This annoys me.)
- Even with chapter metadata, I find it difficult to seek through a 10+ hour m4b file. Seeking through a 10 – 60 minute chapter is more manageable. (Of course, this doesn’t always work out; A Memory of Light has a single chapter that’s more than ten hours long. Whatever, I want to split in a way that follows the author’s structure, and Sanderson purposefully chose to write one extremely long chapter.)
I probably sound like I regularly switch between 20+ different models of MP3 player. In fact, I mostly use my computer or iPhone these days; however, I expect my audiobook collection to outlast any one piece of hardware.
And maybe someone else uses “American” style dates in their file names mm-dd-YYYY, can those also be put in correct order for those users?
That is just silly notation used by a minority in this world ;-)
Perhaps, but if you set your browser language to US English you have dates displayed as MM.DD.YYYY and there's no way to change it to either European or ISO (YYYY-MM-DD) format.
I'm not sure I agree. I think I could be convinced if there was a unique and universal representation for numeric values using characters.
But we have so many textual representations of numeric values that I'm assuming the "mind-reading" goodness only works for a small subset. And the subset will be somewhat intuitive for developers but unlikely to be so for non-technical people.
For example, does the order handle numbers with fractions (decimal points)? If yes, does it require at least one leading digit (zero)? Does a.12345 come before or after a.345?
Does it handle thousand separators? What about international thousand and decimal separators (e.g. Euro-style . for thousand separation and , for decimal separation).
Does it handle scientific notation?
If the answer is no to any of these questions, it's likely to lead to surprise/confusion.
It's like a feature request that initially sounds reasonable and useful but once you explore the requirements in detail you realize there are too many edge cases to be able to meet the request in a non-brittle way.
The sort rules are simple (1). Treat any consecutive sequence of digits as a number when sorting. So for example version numbers (which must be massively more common than decimals in filenames) work correctly, and 5.9 is indeed smaller than 5.10 and the latter is not identical to 5.1 .
Given that this idea goes back more than two decades, has been the default behaviour of the most used OSes for many years, with no major outcry, I think empirically we can be fairly certain that it does not routinely lead to a lot of surprises and confusion.
(1) https://en.m.wikipedia.org/wiki/Natural_sort_order
> The sort rules are simple
In considering the simplicity of the rule, I think you're using a developer's perspective here where we automatically classify numbers and have a clear mental model of the separation between value and representation.
But I'm not sure how simple it would be to explain to a non-technical user why size_5, size_10 and size_15 are in order but size_0.25, size_0.5 and size_0.75 are out-of-order.
> with no major outcry
I'm regularly amazed at how little non-developer/technical users complain about strange and confusing behavior.
> I'm regularly amazed at how little non-developer/technical users complain about strange and confusing behavior.
I am a highly technical user that works with a lot of people with traditional engineering degrees but little to no software experience (except as frequent users). The answer here is that they've learned that all computer software is arcane and mysterious, and so they just accept that there will be strange patterns they have to pick up on, and that's their role as a user. They don't complain about strange and confusing behavior because they treat all the behavior as strange and confusing.
Most of the people I work with are 35+, but even the juniors in MechE, Aero, etc. tend to have some scripting experience that doesn't necessarily translate to having a robust intuition about DBs, the relationship between frontend and backend design, etc.
> But I'm not sure how simple it would be to explain to a non-technical user why size_5, size_10 and size_15 are in order but size_0.25, size_0.5 and size_0.75 are out-of-order.
You don't have to explain it if the situation never comes up.
I'd bet 99.9% of computer users don't have any files which would trigger this edge case in a situation they would actually notice. Decimals just aren't that commonly used in this context, and even if you do have decimals the sorting will still work a lot of the time. For the remaining 0.1%, chalk it up to a bug.
I literally had to test this on my Mac just now because I never realized it was broken.
> I'm regularly amazed at how little non-developer/technical users complain about strange and confusing behavior.
Because EVERYTHING a computer does to non-developer/technical users is "strange and confusing". With few exceptions, most people have no idea why their computer does something the way it does, or how they could make it do something different even if they wanted it to. And most of the time, when they complain about it to someone knowledgeable the answer will be some variant on "that's just sort of the way it is". Imagine a world where the names are sorting the way that the OP is looking for, you're still having to explain to someone why the first group sorts "out of order" and the second group sorts "in order". And if they complained, they would almost certainly get an answer that is some variant on "that's just sort of the way it is".
And if you explain in detail about how it works, a lot of people (not all, but quite a few of the more obstreperous types who raise these as CRITICAL BUGS with solutions apparently SO SIMPLE MY DOG COULD IMPLEMENT IT) will then say "I don't know why you have to make it all so complicated, things were simpler and better in v(n-12) in 1997".
If you add an option you're making it more complicated, harder to document and less discoverable, if you don't it's "useless", if you use a heuristic it's "too magical". Eventually someone has to be unhappy.
> I'm regularly amazed at how little non-developer/technical users complain about strange and confusing behavior.
It reminds me of the recent article here titled something like "Altoids by the mouthful". We just get used to eating cat poop and we never realize it is not a good idea to eat cat poop, not that we should make it more palatable by chasing the cat poop by chewing Altoids by the mouthful.
Edit: for today's lucky ten thousand
https://news.ycombinator.com/item?id=45343449
> Treat any consecutive sequence of digits as a number when sorting.
Based on this description, I have no idea how the following would be sorted:
• photo.jpg
• photo1.jpg
• photo01.jpg
• photos.jpg
Does it matter?
There's a user expectation that photo20.jpg comes after photo3.jpg.
There's no user expectation around whether photo1.jpg or photo01.jpg comes first. Just like there's no user expectation around whether photo1.jpg or Photo1.jpg comes first. Users also don't have the slightest idea about what order punctuation gets sorted in.
Just sort the things that matter in the way users expect (natural sort order) and come up with something reasonably consistent for the rest.
> There's a user expectation that photo20.jpg comes after photo3.jpg.
my user expectation is the opposite
i get what you're saying but it's not achievable in practice, at least not consistently
It sounds like a problem with too many expectations therefore someone will be disappointed.
> Does it matter?
Yes. An algorithm must be unambiguously specified for all possible inputs.
> An algorithm must be unambiguously specified for all possible inputs.
And it is. It's just that some outputs may not match what the user expects. TFA's preferred algorithm (simple lexicographic sorting) matches user expectations 90% of the time. The algorithm actually in use on most OSs (simple lexicographic sorting + treat consecutive digits as combined numbers) matches expectations 99% of the time. An algorithm that matches expectations 100% of the time doesn't exist. Shouldn't we pick the 99% algorithm?
(I am admittedly making up the actual percentages, but you get the point.)
I just tried it on Mac, it's sorted in the order you listed. Extending it a bit, the order is:
photo1 photo01 photo001 photo0001 photo2
So the shorter representation of the same number comes first. It does make intuitive sense to me.
But did it show as a list or an ordered collection of folders? And the second time you opened the folder did it rearrange into a haphazard scattering with items off the edge of the window?
> I just tried it on Mac, it's sorted in the order you listed. Extending it a bit, the order is:
> photo1 photo01 photo001 photo0001 photo2
What you enumerated is known as "ascending lexicographical ordering" and has nothing to do with "the shorter representation of the same number", but instead the ASCII[0] character values in each file name.
0 - https://man.freebsd.org/cgi/man.cgi?query=ascii&apropos=0&se...
With ASCII lexicographic ordering, photo01 would come before photo1.
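This is easy to check in Python. The second key below is only my guess at a rule that reproduces the order reported from the Mac (compare the numeric value first, then prefer the shorter spelling); I am not claiming it is what Finder actually implements.

```python
import re

def value_then_length_key(name: str):
    # Hypothetical rule: each digit run (odd indexes after the split) compares
    # by its numeric value, with ties broken by how many characters were used.
    parts = re.split(r"([0-9]+)", name)
    return [(int(part), len(part)) if index % 2 else part
            for index, part in enumerate(parts)]

names = ["photo1", "photo01", "photo001", "photo0001", "photo2"]

# Plain code-point order, which for these ASCII names is ASCII order.
ascii_order = sorted(names)
# Order under the hypothetical value-then-length rule.
guessed_order = sorted(names, key=value_then_length_key)

print(ascii_order)
print(guessed_order)
```

Plain ASCII order puts the most zero-padded name first, which is the reverse of the listing quoted above, so that listing cannot be simple lexicographic ordering.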
1 and 0 aren't even in the alphabet so in "alphabetical order" I still wouldn't know a priori how that's sorted.
I guess?
There is a standard algorithm - CLDR collation. There are several options available but, generally speaking, it’s a standard.
The specific option for numeric sorting is “kn”.
As far as I can tell, every operating system and many other interfaces tend to use this standard algorithm.
https://www.unicode.org/reports/tr35/tr35-collation.html#CLD...
> If the answer is no to any of these questions, it's likely to lead to surprise/confusion.
Worse, if the answer is yes to any of these questions, it's also likely to lead to surprise/confusion. The only way to win is not to play.
The entire idea that numbers would be treated on a character by character basis rather than as numbers is somewhat intuitive for developers and not for non-technical people.
The answer to all of those questions is no for lexicographic ordering. Lexicographic ordering leads to surprise and confusion as a result.
> It's like a feature request that initially sounds reasonable and useful but once you explore the requirements in detail you realize there are too many edge cases to be able to meet the request in a non-brittle way.
It's been on Windows and macOS for coming up on 25 years, and is in practically every modern UI. It’s reasonable.
Are filenames likely to include those representations? I feel like probably not (can you even include commas in Windows filenames?)
More to the point of the article--if you want things sorted by date, sort by date. I think most laypeople aren't looking at long CHAR1234_5678 filenames anyway, they're looking at thumbnails and dates.
> if you want things sorted by date, sort by date
Unfortunately it doesn't work. When I copy the files, they all get new dates in whatever random order they happened to be copied in.
The most common date format used in Europe uses period separators so can often appear in filenames. Commas are probably more rare. Things like versions are often fractional like v1.3 or v1.11 and can appear embedded in filenames.
That's not fractional though.
Proper fractional, 1.11 is smaller than 1.3.
In versions, 1.11 is larger than 1.3
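A quick Python sketch of the two readings, assuming version strings made only of dot-separated integers (the helper `version_key` is hypothetical, not from any library):

```python
def version_key(version: str) -> tuple:
    # Split on dots and compare each component as an integer.
    # Assumes purely numeric components such as "1.11" or "5.10.0".
    return tuple(int(part) for part in version.split("."))

# Read as decimal fractions, 1.11 is the smaller value.
as_fraction = float("1.11") < float("1.3")

# Read as versions, 1.11 is the later release.
as_version = version_key("1.11") > version_key("1.3")

versions = ["1.3", "1.11", "1.2"]
by_fraction = sorted(versions, key=float)
by_version = sorted(versions, key=version_key)

print(as_fraction, as_version)  # True True
print(by_fraction)              # ['1.11', '1.2', '1.3']
print(by_version)               # ['1.2', '1.3', '1.11']
```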
> can you even include commas in Windows filenames?
Yes.
> Use any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following: The following reserved characters:
< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
https://learn.microsoft.com/en-us/windows/win32/fileio/namin...
Ah, the classic filenames with decimal points and scientific notation in them, so common...
Here's a different scenario: filenames with dates in them. Consider September Budget and October Budget. September is the equivalent of 9, October of 10. Which comes first for natural sorting? Remember, the file modify date may not be useful here since you may have wrapped up the September budget on October 1st while the prior edit to the October budget may have been on September 20th.
The problem is that there is no such thing as natural, and it is quite hard to determine what is more common. (Quite often, what is more common is culturally dependent or, worse, context-dependent.)
So the argument is that because this doesn't solve the challenge of ordering all possible strings by semantic meaning it should not be used?
Even though it increases the match between semantic meaning and string sorting in many important cases and is a simple and consistent rule?
the one true way: budget_09.csv, budget_10.csv
Then `budget_100.csv` comes by and now you need to rename 99 files.
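The renaming chore can at least be scripted. A minimal Python sketch, assuming the number to pad is the trailing digit run of the file stem; `repad` is a made-up helper, and it only computes the new names (a real script would pass each pair to something like `os.rename`):

```python
import os

def repad(name: str, width: int) -> str:
    """Zero-pad the trailing number of the file stem to `width` digits."""
    stem, ext = os.path.splitext(name)
    # Strip the trailing digits off the stem to find the prefix.
    prefix = stem.rstrip("0123456789")
    digits = stem[len(prefix):]
    if not digits:
        return name  # no trailing number, leave untouched
    return f"{prefix}{digits.zfill(width)}{ext}"

old_names = ["budget_09.csv", "budget_10.csv", "budget_100.csv"]
renames = {old: repad(old, 3) for old in old_names}

for old, new in renames.items():
    print(old, "->", new)
# budget_09.csv -> budget_009.csv
# budget_10.csv -> budget_010.csv
# budget_100.csv -> budget_100.csv
```

Splitting off the extension first keeps digits such as the "3" in ".mp3" from being padded by mistake.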
It’s been about two thousand years since the number of months in a year has been increased. I don’t think we’re getting 88 new ones anytime soon.
Sure, but if in this case the number only indicates the month, you have an issue well before 100: you already have one at month 13, when you go back to 01 and overwrite the old file.
Presumably there is a separate directory for every year.
And it's been about 25 years since we had to increase the number of digits for a year.
budget_97.csv, budget_98.csv, budget_99.csv, budget_2000.csv
> It’s been about two thousand years since the number of months in a year has been increased.
What? What are you thinking of? The number of months in a year is always 12 or 13 in any calendar system because they start by reflecting the moon. If you mean the Christian calendar, it was fixed at 12 months to the year well over 2000 years ago. If you mean any calendar, it's probably been more like one year since the number of months in a year has been increased. 12 lunar months falls short of a solar year by about 11 days, so any given lunar calendar will generate an extra month about every three years, and there are lots of different lunar calendars.
(For example, the Chinese calendar occasionally repeats full months in order to keep the month of the year lined up with the season. Whenever this happens, there will be 13 months in the year, of which two share the same name.)
The ancient Romans claimed to have had a 10-month calendar [1], which is what I assume the reference is. Either that, or when month 6 got renamed August in honor of Emperor Augustus
[1] https://en.wikipedia.org/wiki/Roman_calendar#Legendary_10-mo...
> The ancient Romans claimed to have had a 10-month calendar [1], which is what I assume the reference is.
Well, in the first place (as you note), there is no reason to believe that claim - the ancient Romans never made such a claim, but the classical Romans made that claim about the ancient Romans - but more importantly even if it were true the months would have been added many centuries prior to "about two thousand years" ago. Nothing related to additional months happened two thousand years ago.
Given that 09 and 10 refer to months, that's never going to be a problem. And if you want to differentiate the years too, you can prefix with 2025- or put them in a 2025/, 2026/, etc. folder.
Even better, I'd prefer to have more semantic meaning, and for budget-2025-09.csv, budget-2025-10.csv to work everywhere...
>September is the equivalent of 9, October of 10. Which comes first for natural sorting? Remember, the file modify date may not be useful here since you may have wrapped up the September budget on October 1st while the prior edit to the October budget may have been on September 20th. The problem is that there is no such thing as natural
Yeah, but there is such a thing as "give a predictable and consistent way I can name the files so that they sort as I want everywhere" which (if different OSes don't try to be "smart") would have been to prefix them with the numeric date zero padded.
Budget 2025-09.ods and Budget 2025-10.ods would sort reliably.
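As a rough illustration in Python (the `budget_name` helper and the sample dates are made up), a zero-padded year-month prefix makes plain string order coincide with chronological order:

```python
from datetime import date

def budget_name(day: date) -> str:
    # ISO-style year-month with a zero-padded month, e.g. "Budget 2025-09.ods".
    return f"Budget {day.year:04d}-{day.month:02d}.ods"

months = [date(2025, 10, 1), date(2024, 12, 1), date(2025, 9, 1)]
names = [budget_name(d) for d in months]

# Plain string sorting already gives chronological order.
in_order = sorted(names)
print(in_order)
# ['Budget 2024-12.ods', 'Budget 2025-09.ods', 'Budget 2025-10.ods']
```

Since every numeric field has a fixed width, a digit-aware sort produces the same order, which is why the convention works whichever algorithm the file manager uses.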
The options explode infinitely if you start trying to guess what people want in terms of semantic grouping. One user might want to see "September Budget" beside "September Sales Projections" and "September Calendar", and another might want to group it with "October Budget" and "November Budget".
If you have simple, stupid, but predictable tools, people can work around that, by picking naming conventions and even directory groupings that achieve what they want.
The worst is when you have an enforced sort that's not what you want. I think in Windows now, even if you say "Sort by name" in the Downloads directory, it insists on sub-grouping by age. I want every version of the Foobaz spec I downloaded, and no, I don't remember if all of them were in the last 3 months!
There is a simple criterion for ordering file names: treat sequences of characters as alphabetical, and sequences of digits as numbers.
It's easy to understand and predictable; it just happens to not be based on ASCII character codes, which is a legacy technology method only ever meaningful to US developers.
You can easily disable grouping in Windows Explorer.
Date is already in the metadata, it doesn't need to be in the filename.
Have you ever copied a file?
Yes, have you never edited the metadata? Also most filesystems these days preserve it when copied, e.g. my camera's EXFAT filesystem on an SD card gets the creation date preserved when I copy it to my PC or NAS, or between NAS & laptop later.
> Yes, have you never edited the metadata?
I don't even know what that means.
And just because some OS's copy the creation date doesn't mean all of them do. Specifically, the most popular desktop OS -- Windows -- doesn't.
(And it has nothing to do with your filesystem. It's your OS.)
>I don't even know what that means
Obviously something like:
And I'm supposed to do that manually for each of the couple hundred photos I copy...?
I'm sorry if I have a hard time taking that suggestion seriously.
>Yes, have you never edited the metadata?
Is your suggestion that people edit the metadata to get the sorting they want? madness...
Agreed. What's more, the idea that people learn to put leading zeros is wrong and impractical, unless you know in advance how many digits you need. When you go from version 5.9.17 to 5.10.0 you don't go back and relabel every existing folder as 5.09.17.
Today's standard way of sorting is well defined, unambiguous, and natural. Lexicographic has its place, but user-facing interfaces ain't it.
Had this in the Beat Saber mod manager recently. The game released 1.40.10 and my mod manager suddenly thought that game went backwards from 1.40.9
I had a similar fun problem with a little tool for use with an ATSC TV tuner.
For context, while NTSC program selections were typically indexed by channel ("ABC here is channel 4, NBC is channel 6"), ATSC uses "subchannels" like "12.1" or "21.5". I had assumed these could be safely stored as a decimal type.
Then one of the broadcasters here introduced both "42.1" and "42.10" and it broke the key model in the underlying SQLite database I kept the channel info in.
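The collision is easy to reproduce. A Python sketch, assuming labels of the form "major.minor" (the `subchannel_key` helper is hypothetical, not part of the original tool):

```python
from decimal import Decimal

def subchannel_key(label: str) -> tuple:
    # Treat "42.10" as major channel 42, minor channel 10.
    major, minor = label.split(".")
    return (int(major), int(minor))

# As decimal numbers the two labels are the same value...
same_as_decimal = Decimal("42.1") == Decimal("42.10")
# ...but as (major, minor) pairs they stay distinct.
same_as_pair = subchannel_key("42.1") == subchannel_key("42.10")

labels = ["42.10", "42.2", "42.1"]
ordered = sorted(labels, key=subchannel_key)

print(same_as_decimal, same_as_pair)  # True False
print(ordered)                        # ['42.1', '42.2', '42.10']
```

Storing the label as text, or as two integer columns, would avoid the duplicate-key problem a decimal column runs into.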
No
Just no
User interfaces that try to be clever are a PITA.
Keep it simple, and avoid the confusion with corner cases that otherwise will baffle users. Like this
Lexicographic order is great when you need an unambiguous criterion that will work the same in every implementation; but you only need that for automated processing, i.e. for coding.
For user-facing presentation, having 5.9.xxx before 5.10.xxx is simpler; the corner case that baffles users is having 5.1 and 5.10 before 5.2.
Some (most) systems will sort 5.9 after 5.10 though, so if the user is baffled they'll need to learn it anyway. Adding a second way to do it kinda makes things worse
LOL I can tell you don't have the experience of designing UI and shipping product to end users
> Keep it simple
What's simple? Good defaults make things simple, which means putting 9 before 10 in this case, for the reason explained by the parent.
I think the only problem is that it's a surprise and a mystery, particularly because "dumb" alphabetical sort has existed forever. When they "fixed this" for the 99% of regular users' cases, they should have made it a separate "smart natural sort" option alongside the "strict alphabetical sort" option (next to date, size, etc). Simple and obvious, rather than surprisingly different from the decades of experience that even non-technical users already have.
It's not just the one decision though; there are literally thousands, maybe tens of thousands, of these decisions in most software. You want every single one of them to have an option? You want it to support every single combination? At some point, it is ridiculous. Sometimes you just have to decide how your software is going to work and not leave every single decision to the user.
You don’t leave every decision to the user; you make good defaults, but leave the option to override to the user! And thousands isn’t scary as long as groups/tags/search work, so what’s ridiculous about empowering the user?
Increasing the number of different possible combinations of settings your software can be running with by a factor of one nonillion is not a choice I’d make if I wanted to have any confidence in its reliability and security.
That's why you write small programs. It won't take long for most programs to bloat to the level where they're dealing with nonillions of combinations, whether the user has control over those combinations or not.
How the files sort seems kinda important. It gets at the core behavior of the program. It's not something superficial like a default icon, which the user probably can change.
There's such thing as too many options, and there's also such thing as too few. This is one of the important ones. I'd say that macOS, Gnome, and Windows have definitely hidden or removed a lot of important options in the past decade, and despite the modern slickness mesmerizing people into thinking they're easier to use, they're actually harder to use as a result.
(I say this as a professional developer and power-user of all 3 desktops over the past 25 ish years, who also helps non-technical family and friends a few times every year. Some people will be like "oh I'm so bad at computers lol" or "oh this is a piece of junk huh" but really the UI just got dumber in the name of "ease of use", and the expert has to be called in to decipher it.)
It may be one of thousands of decisions, but it's one of a handful that are exposed in the user interface as a fundamental action.
In a file manager? Any more than the displayed thumbnails, icon size, whether folders are separated from files, whether images are separated from videos, what video types are supported, what file types are opened inline, what the click and double click behaviours are, etc?
And yeah kde has settings for all these but kde is also known for being too configurable.
I might be wrong on this, but I vaguely recall that on macOS back when you could commonly option-click to reveal advanced options, if you held option when clicking a sort it would change how it sorted from alphabetical to lexical or vice versa. I’m not a thousand percent sure of it, though, I think when I needed it I was able to set a directory preference via terminal to change how a specific directory was sorted and it was an option there. MacOS had (or has) a lot of buried options which I presume date back to its origins as a Unix as well as a convenience to its developers. A lot of the command line utilities were hacked calls to graphical settings code though, so it wasn’t very stable version to version as the UI calls changed and nobody prioritized non-UI bug fixes or breaking changes. These days CLI is nearly forgotten or assumed to be an exploit vector - see Screen Time data for example.
But the alternative would be a surprise to people who assume "by name" will order numbers, including those who are new to technology (and I think most non-technical people who sort things manually unknowingly order numbers).
We want to minimize surprises and mysteries, but computers have so much hidden complexity it's impossible to eliminate them. If users were shown a full description of how every feature on their computer worked before using it, they'd quickly start ignoring the descriptions. There should probably be a tooltip or "manual entry" for "by name" for those who are curious, and it should never be labeled "alphabetical" because it's not. But cases like the author's, where he assumes a feature works differently than most people (including the designers) assume, can't be helped.
> and the situation where someone wants "10" to be before "9" is far more common.
I guess you mean "after"? Otherwise it seems to me you're agreeing with OP.
> desktops don't label this sorting "alphabetical" (E: and it would really be "lexicographic"*), they label it "by name" (an informal criteria), so technically they're not lying.
FYI the more formal name for the "by name" order is "natural sort order".
> I guess you mean "after"? Otherwise it seems to me you're agreeing with OP.
Depends on which direction you're sorting in, no?
> Depends on which direction you're sorting in, no?
In a vacuum: yes. In this particular case: no, because we have the article's context clarifying that we're talking about ascending order.
It’s more confusing. I thought the article was correct when they said -10 comes before -9. Why? Because they were talking about the strict alphabetical sort. They are already prepending zeroes to force the comparison to be 10 vs 09. So, yes, they were talking about ascending order: not natural ascending order, but ASCII sorting order, where 10 is before 9 because the comparison isn’t 9 vs 10, but 1 vs 9.
It was only clear to me because I could guess where they were going. They were complaining about natural sort vs alphabetical sort, which is a case I’ve run into many times, so I could see the argument coming.
The irony to me was that they were already altering how they named files to fit what they thought the computer wanted by prepending a zero to get a proper alphabetic sort. And even after that, some computers didn’t follow their idea of what it should be doing.
You mean file9 before file10?
I have some beef with Microsoft: you can only change this at the computer level, not per user (see registry key below). Also, they call it natural sorting for users but logical sorting internally. Unify your terminology!
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\Explorer] "NoStrCmpLogical"=dword:00000001
To change it per user, set it in the user's hive instead of in the local machine hive (e.g. HKEY_CURRENT_USER instead of HKEY_LOCAL_MACHINE)
TIL they are called "hives". Windows Registry is an interesting thing. Even casual users have to interact with it once or twice without fully understanding it.
https://learn.microsoft.com/en-us/windows/win32/sysinfo/regi...
Raymond Chen explained why a registry file is called a “hive”:
Because one of the original developers of Windows NT hated bees. So the developer who was responsible for the registry snuck in as many bee references as he could. A registry file is called a “hive”, and registry data are stored in “cells”, which is what honeycombs are made of.
https://devblogs.microsoft.com/oldnewthing/20030808-00/?p=42...
https://devblogs.microsoft.com/oldnewthing/20030808-00/?p=42...
Thanks for fixing the link!
I mean, if you're running regedit at all you are not a casual user.
> I agree with Microsoft/Google/KDE's order.
I don't. I want string sorting to be string sorting. Filenames are strings.
I wouldn't mind if there was an option to tell the file manager to do this "wrangle numbers out of strings and treat them as numbers" thing--so that I could turn that option off, and others who want that behavior could turn it on.
But for this to be the default, without even a way to change it (except in Dolphin, it looks like)? That seems daft to me.
Btw, I use Trinity Desktop, and I just verified that in TDE's version of Konqueror, the sorting of filenames is the same as for ls on the command line, e.g., 'item-10.txt' comes before 'item-9.txt'. Another good reason for me not to have switched to a more "modern" desktop.
> The author's situation is extremely rare
I don't think it is. But that's really beside the point. The computer is my tool. If it doesn't do what I want or expect it to do, it's a bad tool for me. And designers of tools shouldn't be making assumptions about how I want to use it. They should be giving me ways to tune it to how I want to use it.
> "mind-reading" is really helpful in ways we take for granted, like autosave.
I don't use autosave either. I don't want the computer to assume when I want to save a file. The computer is too stupid to know that.
I generally agree with your points (and love TDE) but
> I don't use autosave either. I don't want the computer to assume when I want to save a file. The computer is too stupid to know that.
That’s why, with auto save systems, you flag/name a version as your canonical save point.
Rather like a video game, I’d rather have the autosaves and not need them, because I generally save the game myself, than not have them at all.
A computer can be helpful and obedient at the same time, when it’s done correctly and puts the user in control.
> with auto save systems, you flag/name a version as your canonical save point.
You mean each saved version is stored separately, like a version control system?
A system like that would be fine (in fact I use version control all the time for this kind of thing). But that's often not how auto save is implemented; the auto save just clobbers the last version you saved. That's the kind I don't use.
The file sorting isn’t something relegated to niche users because of the prevalence of tv episode file name sorting (eg S01e01) and it has necessitated the leading zeroes to make it work properly with “alphabetical sorting”.
And that would sort correctly with both methods, though, especially when each "field" is delineated (e.g. Show.S0XE0Y.Episode.Name.HEVC.1080p.mkv)
You’re saying that files with s1e10 and s1e9 would place 9 first?
Both should be supported.
Perhaps put the uncommon (true alphanumerical order) behind a nested menu or something. But the mind-reading-less option should be there.
> Both should be supported.
At least in KDE they are, and you can pick whether you want natural or alphabetical sorting (which has a case sensitive and insensitive variant).
"The author's situation is extremely rare"
People sorting their files for alphabetical order is extremely rare?
And right now I fail to see even one 'case where someone wants "10" to be before "9"'
People sorting their files in alphabetical order but who want numerical values in their files to be sorted digit by digit instead of as numbers is the rare case.
I might go further in my ideal sorting algo, which would normalize capitalization, ignore all non-alphanumeric characters, and treat them all as separators.
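One possible reading of that idea, sketched in Python. The name `loose_key`, the tokenizing regex, and the choice to sort numbers before letters at the same position are all assumptions made for the example:

```python
import re

def loose_key(name: str) -> list:
    """Hypothetical key: case-insensitive, punctuation only separates."""
    # Tokens are runs of digits or runs of letters; everything else
    # (dots, dashes, underscores, spaces) acts as a separator.
    tokens = re.findall(r"\d+|[^\W\d_]+", name.casefold())
    # Tag each token so numbers and text never get compared directly;
    # numbers sort before text at the same position.
    return [(0, int(t), "") if t.isdecimal() else (1, 0, t) for t in tokens]

names = ["Show_S01E10.mkv", "show.s01e9.mkv", "SHOW-S01E2.mkv"]

plain = sorted(names)
loose = sorted(names, key=loose_key)

print(plain)  # ['SHOW-S01E2.mkv', 'Show_S01E10.mkv', 'show.s01e9.mkv']
print(loose)  # ['SHOW-S01E2.mkv', 'show.s01e9.mkv', 'Show_S01E10.mkv']
```

Names that differ only in case or punctuation get identical keys here, so a real implementation would need some tie-break to keep the order deterministic.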
> ignore all alphanumeric characters
There's not much left to sort by then, is there?
Good catch! Fixed.
What you vaguely outline has already been standardised in UTS #10. The algorithm is both based on prevailing user expectations and also has shaped them since the wide-spread adoption of implementations.
"mind-reading" is really an unfortunate term though. Every algorithm is a strict and consistent set of rules that tries to serve the needs of its users. No magic is ever involved.
It is just that some users have conflicting needs and some sets of rules are more complex than others. So I think what this really is about is 'computer reading', the needs of some users to be able to predict with ease what the computer is going to do. Some people would rather be able to predict the computer doing something that they actually don't really need, and then make up for its shortcomings, than have something they feel they cannot predict and control, but is actually closer to what they want.
This is a bit like the term magic. Any sufficiently complex algorithm may be indistinguishable from mind-reading, but it's still an algorithm. Mind-reading, like magic, depends on us being able to understand or not, which is highly subjective. But both are misleading terms.
> I agree with Microsoft/Google/KDE's order. The author's situation is extremely rare...
Even if that were a valid reason for making it the default behavior, the real issue is they don't even give you the option to have the lexically correct sort order. They just decided to give you something that's not accurate and that's all you get.
A trend which is frustratingly, increasingly common.
It's trivial to allow customization behind menus. But we rarely get that anymore. Especially for sandboxed devices like phones.
It's a giant middle finger to users who want to actually use their devices as a tool, instead of simply a portal for more sales and marketing.
I agree with everything but the definition of intuitive; sometimes, the more common situation is less intuitive. An egregious example of this is "Close ad" buttons, which are intentionally placed unintuitively to direct the user to view the ad.
Your definition of "intuitive" would imply that innovation in intuition is impossible, which is evidently not true.
I agree with you, but I also agree with the author: the heuristic used to figure out the "natural" ordering here is broken; if you're going to "guess" at how to order things, you need to be more sophisticated than just "find a suffix that looks like a number and order by it".
>You may be looking at that time through rose-tinted glasses.
Nope, regarding what he talks about, the time was rose-tinted itself.
What is the reason to append a textual file name with a number? User Experience?
They are magic numbers. Maybe a serial ID, date stamp with more magic, revision, release, ...
Magic Number land has 10 > 9 in the above.
9 > 10 is only possible when you strip out the Magic Number and morph it into meaningless text.
At the moment I cannot think of any magic number where 9 > 10.
How is that right, when a file explorer picks an arbitrary character in the middle(!) of the filename and sorts by it? Say I have file987name.txt and list5.txt: when sorting by name ascending, a file explorer would for whatever reason decide to sort by the fifth character, so that list5 would be lower than file987name, because 5 is lower than 9, via some twisted logic. How is that normal in any way?
Thankfully I'm using Total Commander, and FastStone as an image organizer, neither of which has this bug in the sorting.
... no file explorer behaves as you describe.
That was an analogy, to illustrate how the "intelligent guessing" of sorting looks weird as soon as any other character is ignored.
PS: apparently FastStone also sorts "intelligently" :( , I didn't test it correctly the first time. Only Total Commander does sorting as expected.
Most of the time, as a regular user, I agree with having smarter ordering. And smarter features in general, for what it's worth. Except when it doesn't work because of some corner case. In which case the "smart feature" becomes a kind of leaky abstraction: now, as a user, I have to figure out how the machine works so that I can trick it into doing what I need.
Give the user an option: have both "by name" lexicographic ordering, make it default by all means, but also provide a way to switch to an alphabetical order one for power users. Same applies to other features.
It is disappointing that apps and even some Linux desktops today take flexibility away from users in the name of usability. By all means, I like and benefit from all the smart features, and I want them and will keep them on by default, but leave me an option to do the simpler, dumber and more predictable things too, for the case when I need to fall back to it.
Haven't people started calling this "natural" order or something?
This is reminding me of the whole "Worse is better" essay and debate:
https://news.ycombinator.com/item?id=27916370
The author wants the "worse" sort, one based on ASCII/Unicode codepoints, without any intelligence for numbers that 99% of GUI users want.
For their purposes, they've assumed something about the implementation, to the point that a convenience feature is actually a misfeature for them. But the author here is probably a developer, or close to one, so they do not represent the needs of most people using computers.
Understanding the target audience for your product results in very different design decisions. Better is better might be great for products, but worse is better is probably better for systems that need to grow and evolve.
It's an issue of mental models. As a developer, his mental model is one of how naive software would sort items with mixed numbers in them. Most people, of course, naturally sort 10 after 9 -- their mental model doesn't contain software developer assumptions.
> The author wants the "worse" sort, one based on ASCII/Unicode codepoints, without any intelligence for numbers that 99% of GUI users want.
I want the author's opinion on how capital and lowercase letters should be sorted. Do they follow strict ASCII/Unicode codepoints, or do they normalize into actual alphabetical order and sort upper/lower within each letter?
> I want the author's opinion on how capital and lowercase letters should be sorted. Do they follow strict ASCII/Unicode codepoints, or do they normalize into actual alphabetical order and sort upper/lower within each letter?
I prefer the strict ASCII / Unicode sorting (all capitals first, then all lowercase).
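For anyone unsure what the two options look like in practice, a small Python comparison (the sample names are made up; the second key is just one way to "sort upper/lower within each letter"):

```python
names = ["banana", "Apple", "apple", "Banana", "Zebra"]

# Strict code point order: every capital sorts before any lowercase letter.
strict = sorted(names)

# Case-insensitive order, with the code point order as a tie-break so that
# "Apple" and "apple" stay adjacent and the capital comes first.
case_insensitive = sorted(names, key=lambda s: (s.casefold(), s))

print(strict)            # ['Apple', 'Banana', 'Zebra', 'apple', 'banana']
print(case_insensitive)  # ['Apple', 'apple', 'Banana', 'banana', 'Zebra']
```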
And where do you sort the letter ä? (After a is correct in German, but I think Swedish does it differently.)
This feels like the right moment to mention "ch", which is considered a letter in orthodox Czech, sorted between "h" and "i". The problem is, you can't reliably distinguish between "ch"-the-letter and "ch" as just "c" and "h" combined, which are present in loan words but also some original Czech compound words.
So if you're doing it "properly", sorting strings in Czech involves understanding the etymology of every word.
What a headache! I'm glad that the relevant standard ČSN 97 6030 does not demand analysis of compounds or knowledge of etymology.
That's why we have all this LC_* stuff in Linux, which you can configure to your needs:
Mix in your Swedish or Swahili, maybe even the Vatican State:
> export LC_TIME=en_US.UTF-8
Why would you do this to yourself?
Why? For example, to not have diacritics in month names. Take them as examples; you can easily add them to a shell script to make it work the way you want.
How does this work if you're a multi-lingual person and you have files with names in different languages?
I'm multi-lingual but try to separate business stuff for example (multi-lingual) from private stuff (mostly one language), so clashes between languages rarely happen.
But if it gets complicated I'll usually resort to Perl scripts to take care of pesky details. Sorting an associative array where the key is a string in unified form and the value is the multi-lingual target is rather easy in a script language which one is fluent in.
The sorting order is only defined between strings of the same locale, not between strings of different locales.
You can specify the sorting order per command like
LC_COLLATE="tr_TR.utf8" ls
if it differs from your system or user locale.
An alternative is to first transliterate the strings to ASCII and then sort them (but this does not preserve the sorting order of non-latin scripts).
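A rough Python sketch of the transliterate-then-sort approach. It only strips diacritics via Unicode decomposition, which is much weaker than real transliteration, and the `ascii_fold` name is made up for the example:

```python
import unicodedata

def ascii_fold(text: str) -> str:
    # Decompose accented letters (e.g. "ä" -> "a" + combining diaeresis)
    # and drop the combining marks. Letters with no decomposition,
    # such as "ß" or "ø", pass through unchanged.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

words = ["zebra", "äpfel", "apfel", "öl", "ol"]

# Plain code point order pushes the accented words to the end.
by_code_point = sorted(words)

# Sort on the folded form, with the original string as a tie-break.
by_folded = sorted(words, key=lambda w: (ascii_fold(w), w))

print(by_code_point)  # ['apfel', 'ol', 'zebra', 'äpfel', 'öl']
print(by_folded)      # ['apfel', 'äpfel', 'ol', 'öl', 'zebra']
```

This treats "ä" like "a" everywhere, so it cannot reproduce locale-specific rules such as the Swedish placement mentioned earlier in the thread.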
Asciibetical sorting
> most people using computers
> the target audience
Which is it? Those should be different groups.
"Most people" have incoherent ideas that can't even be used. So instead a designer cherry-picks some ideas - setting the agenda - and declares that they're popular. That doesn't make them good ideas. Also, "most people" are easily influenced and will like the terrible things that they've been told to like.
>Understanding the target audience for your product results in very different design decisions
This is an excuse. Just add an option to sort both ways. It isn't hard.
There is no target audience on this planet that benefits from fewer options or fewer features. Even if you put the features under an "advanced mode" UI, that's still better software than not having the feature in the first place.
Have people forgotten the 80/20 rule? Most features will be used by only a small slice of users, that doesn't mean they're out of scope.
Sorry, I'm just kind of exhausted of software not being able to do the most obvious things because it didn't align to some perfect vision of how the user should be.
> There is no target audience on this planet that benefits from fewer options or fewer features.
I'm currently involved in UI design and, to my frustration, adding more options or features seems to send a vocal minority of the user base into a foaming-at-the-mouth violent rage. It's like any change resets the entire contents of their brain, and it's our fault we're making things so confusing for everyone...
And let's not get started on how we're wasting time adding things that they don't personally need, and therefore no one could possibly need, ever. No, clearly by adding this sorting method, we must have directly stolen development time from the feature they want, which is a personal attack directed at them and every member of their family going three generations back.
It is best to not engage with these demons.
KDE welcomes configurable complexity, Gnome deemphasises it. I am glad that broad user choice exists.
The most irritating circumstance for this is looking for files named with a hash:
This is one of the settings I immediately turn off on Windows via the registry key mentioned in the other comments here.
> I miss the time when computers did what you told them to, instead of trying to read your mind.
These days, it's more like "trying to change your mind". I absolutely hate the "the user is wrong" authoritarian mentality that unfortunately has infected a ton of software, even open-source.
Exactly. This is even more annoying when it isn't exactly a hash, but some gibberish you cannot really make sense of that has a numeric section in it: a user ID, or Unix time, or who knows what else. You are trying to visually find a file abcd89764237 somewhere after abcd683426834, and it isn't evident why you cannot, until you notice that the latter has more digits in its "ID" for some reason.
It looks like GTK & KDE both suffer from this - I get this behaviour in Thunar and in Dolphin. This is the kind of thing that makes me lose sleep. It's the same on macOS too, at least in the latest version.
> Well, apparently all these operating systems have decided that no, users are too dumb and they cannot possibly understand what alphabetical order means. So when you ask them to sort your files alphabetically, they don’t. Instead, they decide that if some piece of the file name is a number, the real numerical value must be used.
Well, no. You don't actually ask them to sort in alphabetical order. You ask them to sort "by name", and that is up to their interpretation. And they choose the interpretation that (per their reasoning, and possibly some actual data) seems most likely to correspond to what the user wants.
Maybe future versions of those OSes will add a rule that says that if any of the number groups have leading zeros then it reverts back to actual alphabetic order. Or maybe they'll give you configurable options. (Maybe some of them already do.)
Clearly a leading zero means the number is in octal (but only if all the subsequent digits are between 0 and 7). I think that would lead to the most intuitive results.
> And they choose the interpretation that (per their reasoning, and possibly some actual data) seems most likely to correspond to what the user wants.
Yes, that makes sense, but the problem is that this interpretation changed in the last 10 (15? 20?) years. It used to be that "by name" meant "by name, in alphabetical / lexicographical order" in pretty much every file manager.
Microsoft and Apple changed to natural order in 2001.
It never was "alphabetical" but rather an order determined by the numeric index into the used encoding table.
Reminds me of https://xkcd.com/1172/
I almost always want the version-sorting that's being presented in this article, rather than an "alphabetical" sort. But on the other hand, it absolutely seems like a valid bug that this is presented as an "alphabetical" sort, rather than something like "alphabetic/numeric" or similar. In other words, a problem of labeling rather than one of sorting.
It’s not being presented as an alphabetical sort, though. The author assumed that sorting by name meant an alphabetical sort, but that’s not how it’s labeled.
In fairness, sorting by name has, for many years, been an alphabetic sort. Doing a mixed alpha/numeric sort is a relatively new thing.
Natural sorting is relatively new in KDE. But in Windows since 2001.
Yeah, exactly. The behavior described is actually very useful. The problem is imposing it on the user with no warning or option to turn it off.
Author here - I agree both with you and with the parent's comment. Having two options in the "sort by" menu - like "Name (natural)" and "Name (strict)" or something - would have solved everything.
> The problem is imposing it on the user with no warning or option to turn it off.
You can say that about every single design decision made about every product.
The gripe about this particular feature seems misplaced because almost all users will want the sort that's offered and the actual alphabetical sort is likely the desire of a more advanced user who, in fact, is offered a choice through registry editing and/or using a more advanced cli option for the occasion they might need an alternative sort.
This is a sensible default.
> You can say that about every single design decision made about every product.
No, that's not true. Many aspects of my computer's UI are user-configurable.
Yes but not every single one of them
Obviously. All I'm saying is that this particular decision ought not to have been taken from the user. Real alphabetical order is not an unreasonable thing to want.
“Real alphabetical ordering” is incredibly nonspecific. It’s underspecified even for US-ASCII, but essentially meaningless for those of us in 2025 who need to handle Unicode.
How do capital letters sort relative to lowercase letters? How do letters sort relative to digits? How do you consider code points that can correspond to different letters in different lettering systems with different ordering? How do you handle diacritics? Do you want the behaviour to be stable through Unicode normalization? Should it differ based on the character encoding? Should different representations of the same character, such as blackboard lettering or circled numbers, be sorted with other representations of the same character or grouped separately?
You can come up with answers for these questions, but there’s no unambiguously correct option. The least subjective option is sorting based on encoded byte representation (if that is even specified), but that is not “alphabetical” and would not be intuitive to most users.
You're focusing on the wrong part of the problem when you say "essentially meaningless". Yes, choices must be made about how you order your "alphabet". But the meat of the request is that sorting goes character by character. That's a clear criterion, even with Unicode involved.
And I would say the reasonable way to define character is grapheme cluster and yes you want it stable to normalization and encoding.
How capital letters/diacritics/different representations affect the order of your alphabet, and which ones are considered equivalent, is something without a clear answer. Same for whether letters or numbers come first, and where punctuation goes. But you don't need consensus on that to fix the problem in the post.
I thought it was pretty well-known that capital letters come before lower-case. I think it's punctuation, then numbers, then capital letters, then lower-case. At any rate, that's what textbook indices do (assuming I remember correctly).
The issue at hand is how numbers are sorted. That has nothing to do with unicode.
Unicode has many different representations of digits, and I would dispute using the term “alphabetical” to refer to digit ordering in any case.
You are starting to sound like a troll. Yes, unicode has many representations of digits. That has nothing to do with the question of whether 2.jpg should come before or after 10.jpg.
You think users deserve to have control, but you think that control only needs to extend to the treatment of those ten characters, nothing else?
I guess your position is coherent, but it’s very silly.
Those ten characters are of disproportionately high importance (to put it mildly).
You're wrong about that. See UTS #10 § 1.4.
(I did not downvote you.)
"Numbers. A customization may be desired to allow sorting numbers in numeric order. If strings including numbers are merely sorted alphabetically, the string “A-10” comes before the string “A-2”, which is often not desired. This behavior can be customized, but it is complicated by ambiguities in recognizing numbers within strings (because they may be formatted according to different language conventions). Once each number is recognized, it can be preprocessed to convert it into a format that allows for correct numeric sorting, such as a textual version of the IEEE numeric format."
Notably, some versions of “sort” on Linux have version sort nowadays. sort -V
I actually don’t know exactly how it works internally and it is a little bit magical, but I use it all the time when looking through my files because it just sorta works in most cases. Of course a nice thing about it is easy to turn on or off.
The term for the sort in the article is called lexical, but the problem is the people are stupid.
The average user does not know the difference between lexical and alphabetic sort
I am surprised how many people are comfortable calling sorting numbers alphabetical sorting (including TFA).
In true alphabetical sorting, sorting numbers is undefined behaviour. Both of these sorting methods are valid extensions of alphabetical sorting, and which you prefer is just that: a preference.
So actually when he says ‘alphabetical order’, he does not, in fact, mean ‘alphabetical order’.
Yes. This is called ”natural order”.
I personally call it "ASCII sorting", or "UTF-8 sorting".
https://www.unicode.org/reports/tr10/#Contextual_Sensitivity:
“There are additional complications in certain languages, where the comparison is context sensitive and depends on more than just single characters compared directly against one another,
[…]
Numbers. A customization may be desired to allow sorting numbers in numeric order. If strings including numbers are merely sorted alphabetically, the string “A-10” comes before the string “A-2”, which is often not desired. This behavior can be customized, but it is complicated by ambiguities in recognizing numbers within strings (because they may be formatted according to different language conventions). Once each number is recognized, it can be preprocessed to convert it into a format that allows for correct numeric sorting, such as a textual version of the IEEE numeric format.”
I think those file browsers made the right choice, even given that they don’t (as in this example) always do the right thing.
But -10 is smaller than -2, right?
Filenames rarely have negative numbers in them, and it'd usually be ambiguous whether they were negative or dash-separated positive.
I know you jest, but this just further demonstrates why Natural Sorting is complicated and might not be the best default choice.
my_photos_at_-3c
my_photos_at_-10c
Do users want smaller numbers first, or do they want them in counting order, away from zero?
That's a hyphen, not a minus sign, silly.
I thought this was pretty well known. E.g. the macOS Foundation library even exposes NSString.localizedStandardCompare() [1] which implements the sorting algorithm used by Finder, and should be used by any well-behaved macOS application. Windows uses StrCmpLogicalW [2].
[1] https://developer.apple.com/documentation/foundation/nsstrin...:)
[2] https://learn.microsoft.com/en-us/windows/win32/api/shlwapi/...
I would have assumed it worked the same as ls, so I found the article interesting. But now that I know, I think this way is better.
I can’t think of any case where I would need purely alphabetical sort. In most photo browsing apps, photos will be sorted by timestamp rather than filename. If I really needed it to sort properly in file explorer, I would try sorting on created date. And failing that I would probably just normalize the file names.
I tried it just for kicks.
The Finder sorts these as:
Whereas `ls -l` gives me
Sorting so "foo9" is before "foo10" is called natural sort. I found out about natural sort a week ago and I am thrilled that my programs now print their output in a sensible order. Give natural sort a try and see if it improves your life too :-)
I found the magic two lines of Python to do a natural sort here, by the way: https://stackoverflow.com/questions/11150239/natural-sorting...
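For readers who don't want to follow the link, here is a minimal sketch of a natural sort key in Python. It is not necessarily the same code as the linked answer; the function name `natural_key` and the sample file names are made up for illustration, and it only handles ASCII digits.

```python
import re

def natural_key(name):
    # re.split with a capturing group alternates: text, digits, text, ...
    parts = re.split(r"([0-9]+)", name)
    # Odd positions are digit runs: compare them by value, not by character.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

names = ["foo10", "foo9", "foo1"]
print(sorted(names))                   # ['foo1', 'foo10', 'foo9']
print(sorted(names, key=natural_key))  # ['foo1', 'foo9', 'foo10']
```

Because text and digit runs always alternate in the same positions, the keys of any two names can be compared without mixing strings and integers.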
Natural sort is an option in sort(1):
And we have "sort -h" to sort the output of e.g. "du -sh *" properly.
Edit: formatting and add sort -h
Maybe it's just me but I don't miss this at all:
The only time natural sort bit me was with nonsensical names like <md5>.jpg
Nah that's not just you. That is an unnatural way to sort things because that's not how numbers are ordered. I remember when Windows changed to sorting numbers by their value and, despite my programmer brain finding it strange in a way, I was super happy to have files display in an order that actually made sense.
I think it depends on the person. That order is exactly what I expect and want.
Same here. I was surprised at everyone here who prefers the more-complicated-but-arguably-more-intuitive lexical sort. Naive alphabetical sorts break some expectations, but don't produce any weird edge cases.
I wonder if there's an age divide at play here, where those of us who grew up with the naive alphabetical sort prefer it.
You prefer looking at photos in that weirdly particular shuffled order that isn't the order they were taken in?
The mistake is software which doesn't follow a recognized standard for date/time representation in its filenames, i.e., RFC 3339, ISO 8601, or their union/intersection[1] (but preferably just ignore ISO 8601, because it's overcomplicated and RFC 3339 is simpler and more intuitive).
In OP's examples, the filenames are YYYYMMDD_hhmmssssss, which is neither valid ISO 8601 nor valid RFC 3339, as the former doesn't accept underscores (only 'T'), and the latter doesn't accept basic format dates (YYYYMMDD), only the equivalent of extended format (YYYY-MM-DD).
And if dates in file names simply used the extended format, the problem disappears. The lexical order is the natural order.
Alternatively, file managers that treat any digits as a number should be improved to recognize when a sequence of digits is not actually a number but a date/time, and order those chronologically. This might occasionally produce a few false positives, but I'd suspect it would be a rare occurrence.
[1]:https://ijmacd.github.io/rfc3339-iso8601/
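A quick way to check the "lexical order is the natural order" claim for extended-format timestamps is the sketch below. The timestamps are invented, and it assumes every value uses the same fixed-width fields and the same UTC offset. Note that the colons produced by `isoformat()` are not allowed in file names on some systems, so a real naming scheme would need a substitute separator.

```python
from datetime import datetime

# Hypothetical capture times, deliberately out of order.
taken = [
    datetime(2025, 9, 28, 14, 5, 9),
    datetime(2025, 10, 1, 8, 0, 0),
    datetime(2024, 12, 31, 23, 59, 59),
]

labels = [t.isoformat() for t in taken]                 # e.g. "2025-09-28T14:05:09"
chronological = [t.isoformat() for t in sorted(taken)]  # sorted as real datetimes

# Plain string sorting of the labels gives the chronological order.
print(sorted(labels) == chronological)  # True
```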
If I want to sort by date, I sort by the "Date" column, not the file name
I hope you don't ever copy files.
I copy files all the time? I have files in my documents folder with creation dates in the 90's that have been copied forward between many computers.
Or edit any files with historical data.
Creation Date and Modification Date are separate
If I wanted to sort by date taken I would do just that using the EXIF data on them.
More importantly, it is how computers work, and how computers have worked for many decades.
Anyone with experience expects them to work this way. Trying to be clever to cater to the inexperienced only harms both groups.
Computers have been sorting with natural sort for decades. By now, it is "how computers work".
Were you under the impression this was something new?
I was very surprised by it when I noticed it a year or so ago. What's interesting is that when it works, eg you have a directory with numbers from 1-10, you don't really notice it. It isn't until it bites you in the ass, eg your downloads folder with a bunch of long numeric strings, some in hex, where you want to find one and suddenly it's not where you expect.
I used a gui software some years ago that distinguished between version sort and alphabetic sort. It would be handy to have a toggle.
I get it, but if all these major operating systems are handling this same ambiguous [0] situation in the same way, perhaps one needs to reevaluate their mental model or expectations.
Am I out of touch? No, it's the operating systems who are wrong
0 - numbers are not part of the alphabet.
"I created the Alphanum Algorithm to solve this problem. The Alphanum Algorithm sorts strings containing a mix of letters and numbers. Given strings of mixed characters and numbers, it sorts the numbers in value order, while sorting the non-numbers in ASCII order. The end result is a natural sorting order."
https://web.archive.org/web/20210207124255/http://www.daveko...
There are many older instances of that, such as "versionsort" from various Linux tools and libraries. I think this has likely been independently recreated several times, with various subtle differences.
Numbers aren't in the alphabet. So no, you don't mean alphabetical order.
I felt a little bad about this snark but actually, author barely understands their own use case (says they want alphabetical order but they actually want something more) and barely understands the UI they're using (says they asked for alphabetical order but none of the file managers they used says it has any such setting) and then they go on to claim this is to satisfy dumb users:
> Well, apparently all these operating systems have decided that no, users are too dumb and they cannot possibly understand what alphabetical order means.
It really is world-class irony. Impressive.
There isn't the alphabet
> I have also found a setting to fix Dolphin’s behavior, but it was very much buried into its many configuration options.
KDE wins again. It's my favorite desktop environment, because it has defaults that are friendly to noobs, but it also gets out of your way and lets you change things if you want.
The trend is for other desktop environments to be either/or. Either they are super simple and noob friendly, or they are super technical and have a steep learning curve and you get to configure everything - but only via text config. Maybe Cosmic looks like it's going the same route as KDE, where it's trying to bridge the gap.
Well, lots of interfaces don’t say “alphabetical” anymore, they say “name” or some variant, and then they can define it however they want, regardless or because of the frustration it causes users but not some other users which will now be inverted for long term-frustration averaged user experience.
To answer the question in the article, I’m pretty sure Windows Explorer (and probably File Manager before that) has sorted filenames this way for at least 30 years.
I can confirm that this does not happen in Windows 98, but does happen in Windows XP.
Isnt the author confusing "alphabetical sorting" with "ASCII sorting"?
Afaik there is no universal way to handle numbers in alphabetical lists. Sometimes numbers come before letters, sometimes after, etc.
A digit is not a part of the alphabet, right?
> Isnt the author confusing "alphabetical sorting" with "ASCII sorting"?
But it's actually not ASCII sorting either! ASCII sorting would mean 'Z' comes before 'a' and I assume even the author doesn't want that!
No matter what, there are going to be hidden tricks!
> But it's actually not ASCII sorting either! ASCII sorting would mean 'Z' comes before 'a' and I assume even the author doesn't want that!
I don't know about the author, but that's exactly what many others who know about ASCII expect, including me. Digits, then uppercase, then lowercase.
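To make the "digits, then uppercase, then lowercase" order concrete: Python's default string comparison goes by code point, which matches ASCII order for ASCII input. The file names here are invented for the example.

```python
names = ["apple", "Zebra", "10.jpg", "2.jpg", "_tmp"]

# Code point order: digits < uppercase < "_" < lowercase.
print(sorted(names))                 # ['10.jpg', '2.jpg', 'Zebra', '_tmp', 'apple']

# Case-insensitive variant: "Zebra" now sorts as if it were "zebra".
print(sorted(names, key=str.lower))  # ['10.jpg', '2.jpg', '_tmp', 'apple', 'Zebra']
```

Neither variant treats "10" as a number, so "10.jpg" still lands before "2.jpg" in both.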
Plex team, are you reading this?
For some inexplicable reason, Plex just throws its hands up on non-ASCII characters and puts them first.
In Norway we have three extra letters, æøå, and they're at the end of the alphabet after z. But in Plex, I have Øystein Sunde[1] placed before any other in my music library.
Now in the 1990s I would forgive US software for such a thing, but it's 2025...
[1]: https://en.wikipedia.org/wiki/%C3%98ystein_Sunde
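As a rough illustration of why code point order is not Norwegian alphabetical order, here is a toy sort key built from an explicit alphabet. Everything in it is an assumption made for the example (the names, the `norwegian_key` helper, treating unknown characters as sorting first, precomposed characters only); real software would normally rely on a proper collation library rather than something like this.

```python
# Norwegian alphabet: æ, ø, å come after z, in that order.
NORWEGIAN = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {ch: i for i, ch in enumerate(NORWEGIAN)}

def norwegian_key(name):
    # Characters outside the alphabet (spaces, digits, ...) get rank -1.
    return [RANK.get(ch, -1) for ch in name.lower()]

names = ["Åge Aleksandersen", "Zappa", "Øystein Sunde", "Ærlig"]

# Code point order puts Å before Æ and Ø.
print(sorted(names))
# ['Zappa', 'Åge Aleksandersen', 'Ærlig', 'Øystein Sunde']

# Alphabet-based order: z, then æ, ø, å.
print(sorted(names, key=norwegian_key))
# ['Zappa', 'Ærlig', 'Øystein Sunde', 'Åge Aleksandersen']
```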
ls sorts filenames strictly lexicographically, comparing character by character, so e.g. "055436307" is compared as the characters "0", "5", "5", etc., and it sorts before "121134" because "0" is less than "1". If all compared characters match and one string ends, the shorter one comes first. Symbols like _ are just more characters, and their position relative to digits depends on the locale’s collation table.
Google Drive uses ICU collation with the numeric option enabled, which treats each consecutive sequence of digits inside the filename as an integer. So "055436307" is parsed as the number 55,436,307, while "121134" is parsed as 121,134. Since 121,134 < 55,436,307, "121134..." comes before "055436307..." even though lexicographic order would suggest the opposite. I think when two digit runs have the same numeric value, the shorter run comes first; if runs are equal and the string continues, then normal character comparison resumes, including any underscores or suffixes.
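The two behaviours described above can be compared side by side with a small sketch. This is a hand-rolled approximation of the rule as described in this comment, not ICU itself; the `numeric_key` helper, the "shorter run first" tie-break, and the file names are assumptions for illustration.

```python
import re

def numeric_key(name):
    key = []
    for i, part in enumerate(re.split(r"([0-9]+)", name)):
        if i % 2:
            # Digit run: compare by value, then by length to break ties
            # such as "7" vs "007".
            key.append((int(part), len(part)))
        else:
            # Everything else: plain character comparison.
            key.append((part,))
    return key

names = ["055436307_a.jpg", "121134_b.jpg"]
print(sorted(names))                   # ['055436307_a.jpg', '121134_b.jpg']
print(sorted(names, key=numeric_key))  # ['121134_b.jpg', '055436307_a.jpg']

# Equal values: the shorter digit run comes first.
print(sorted(["007.jpg", "7.jpg"], key=numeric_key))  # ['7.jpg', '007.jpg']
```

Including the run length in the key means two names only compare equal when they are identical, so the order stays total even with leading zeros.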
It feels like this algorithm could be improved though. If a number has leading zeros you probably don't want to sort it numerically.
That said the author's situation where it's numerical and different lengths seems likely rare enough that it probably isn't worth complicating things.
The leading zero isn't an issue because it will sort correctly under both systems. The issue OP is having is that he's adding random numbers after the hhmmss section. If instead he added a delimiter before the random number the files would sort correctly under both systems as well, e.g. hhmmss_num.
Yes that's what I said:
> where it's numerical and different lengths
it's weird to me that all the people declaring that they know what the average user wants to see, don't also suggest that the computer should rename files it encounters as necessary to give the user what the user wants.
if we don't have to collate as dictated by ascii, why should we expect users to live within the bounds of file names with dotted extensions? you think users care whether something is a jpg or a png? do users want to see .MOV and .mov next to each other (not because sort, because one camera programmer did it that way for an ancient DOS filesystem, and another didn't.) (unix, btw, never required that users live with dotted extensions, that was a digital knockoff/cpm/microsoft thing that you didn't understand so all your new tools enforce it even though you never had to put your code in a .c file, that was just for your convenience as the user whose needs must be respected)
so, we have to have "computery filenames" but we should violate "computery sorting"? how incredibly close-minded of you, you have no idea or basis to know what users want to see. oh, and the solemnity with which you make these proclamations, ok, don't get me started on that.
I got used to naming files/folders with leading zeros when I want them to be sorted alphabetically (for example payslips/invoices, etc).
But I'm a tech guy, I know what "alphabetically" means in the tech world. And it probably is not what common folks mean when they think "alphabetically" outside the tech world.
Edit: in fact, if I recall correctly, the proper term for this kind of sort (the one OP wants) is alphanumeric sort.
I also got used to it, but especially when writing short scripts that generate numbered files it gets annoying to have to pad with zeroes every time, and also precommit to a specific amount of digits you want to allow (finding a compromise between adding a ridiculous amount, like 20, and using only 4 despite knowing the script might one day surpass 10⁵ files).
The natural numbers are ordered. Let me use its ordering instead of having to rely on an ad-hoc lexicographic fixed-length tuple representation of decimal digits, without any padding. My position is that numbers in filenames should always be considered atomically unless explicitly instructed otherwise.
If there were no issues of backwards compatibility, I would thus advocate for changing ls. Eza (maintained fork of Exa, Rust-based ls alternative) actually does sort this way by default, much to my delight.
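The "precommit to a number of digits" problem is easy to reproduce. In this sketch the `padded` helper and the `frame_` naming scheme are hypothetical; it only shows what plain string sorting does once a counter outgrows its padding.

```python
def padded(i, width):
    # Zero-pad the counter to a fixed width, e.g. padded(2, 3) -> "frame_002.png".
    return f"frame_{i:0{width}d}.png"

numbers = [2, 10, 200, 1000]

three = sorted(padded(i, 3) for i in numbers)
four = sorted(padded(i, 4) for i in numbers)

# Width 3 overflows at 1000, so "frame_1000" sorts before "frame_200".
print(three)  # ['frame_002.png', 'frame_010.png', 'frame_1000.png', 'frame_200.png']
# Width 4 keeps string order and numeric order in agreement.
print(four)   # ['frame_0002.png', 'frame_0010.png', 'frame_0200.png', 'frame_1000.png']
```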
shameless plug (though I don't get a cent out of this, of course): https://blog.vslira.net/2025/03/a-neat-approach-for-sortable...
I think the real issue here is that two Android phones take photos with incompatible naming schemes.
I am sure that at some point someone thought the milliseconds should or should not be separated from the seconds and made that change without thinking through the consequences.
The so-called "natural" sort makes sense for version numbers and enumeration (without zero-padding) but I'm more often dealing with file names with a datetime (like in the article), a hexadecimal hash, or just randomized string of characters that includes numbers. In those cases "natural" sort makes it harder to find the file you're looking for.
Even when files are enumerated it's pretty rare to have more than 9 parts and no zero-padding, whereas there are almost always multiple consecutive digits in the use cases for which "natural" sort is not a good fit. It just feels like a bad default, at least for a programmer's workload.
> a hexadecimal hash
I agree with you on this point.
> a datetime
AFAICT, natural sort shouldn't ever make datetimes harder to find, unless they are formatted inconsistently, as in the author's case. Suppose one camera wrote dates as 20250928 and another as 2025-09-28. ASCIIbetical sort would do nothing to help here.
Natural sort can even improve things over ASCII sort, for instance if someone is stuck with a format like "28/9/2025" or "September 2 2025"
More fascinating for me is this discussion thread, where there's legitimate debate around the need/expectation for alphabetical sorting to match/include lexical sorting.
I'm personally in the "want lexical as part of alphabetical" - as 'photo19' should come after 'photo2' in my expectations, but the number of cases cited where this doesn't/shouldn't work is enough to justify a degree of contextual or situation awareness that most systems and interfaces simply aren't designed to cater for (file-systems vs photo-storage applications).
Convenient-to-select settings should always include:
Sort:
"The Tyranny of the Marginal User" strikes again: https://nothinghuman.substack.com/p/the-tyranny-of-the-margi...
I have the same issue with "15 minutes before" instead of "2025-09-29 01:13:30".
(Which is wrong once the site doesn't update)
Needless to say, those are all "features" dumbing us down in the long run.
A philosophical side question: I want to opt out of this but I can't. So is this a case where my peers are limiting my intellectual development? I.e. preventing me from a) doing the time calculations in my head, b) writing my software such that it uses leading zeros?
> I miss the time when computers did what you told them to, instead of trying to read your mind.
You haven't seen anything yet. Get ready for "Sort by AI" which will try to interpret the content of your images to sort them based on what you'll want to look at next.
Incidentally, in this case AI would have sorted them the way you want:
That’s why I don’t even bother with the file name for photos.
1. Sync all equipment to the same clock.
2. Sort by Date Taken, if unavailable, sort by Date Created.
Yes, sounds to me like the user really wanted to sort by time created. And got used to sorting alphabetically as a poor proxy for that.
When copying files from a device and then between systems, too often the dates get lost (shouldn't, but still...)
That only happens for the datetime metadata of the files (modified, created, access etc). The EXIF metadata will still remain the same.
I think the algorithm is probably incorrect. A number starting with 0 should be treated lexically not numerically. Otherwise you have a situation where img_1_01.jpg and img_01_1.jpg does not have a complete ordering.
That's not the issue.
The issue here is that one camera appends milliseconds to the seconds without a separator, and the other uses a separator.
So of course the ones that include milliseconds look like bigger numbers and get sorted last.
Leading zeros aren't the issue here.
> Otherwise you have a situation where img_1_01.jpg and img_01_1.jpg does not have a complete ordering.
(Good) "natural sort" implementations generally have ways of handling ties like this. It's similar to the problem of case-insensitive sort over case sensitive sets.
It wouldn't be the first time widely-used software sorted numbers by a function that does not produce a total ordering. For example, Excel: https://gregat.es/excel-numeric-order-transitivity/
Sort by the time the photo was taken in the metadata?
There are quite a few more rules for sorting that can be applied - it's not just numbers, and numbers don't always work the way you describe.
There is "Dictionary Order", "Phone book order", and a few other standards. (Dictionary order is not lexicographic order, even if the two are now commonly conflated).
A simple rule that most still know is a book titled "The Book", should be sorted under "Book, The".
They have variations on how special characters sort, how abbreviations are handled, and even have differences in numbers. For example, in phone book order, "21st Century" sorts under “Twenty-first”, not "21".
And, of course, non-English languages add all sorts of other rules.
This tends to get ignored these days, as lexical sorts are so much easier to implement, that people forget there are other, preferred options.
Unfortunately, it's not so simple, especially once you go beyond ASCII. Dylan Beattie has this brilliant talk [1] where he points out how even the "systems" in human language involve a pile of quirks rather than any simple clean rules, and many of those rules are conflicting and the appropriate order of precedence depends on the context. Eg: the correct sorting order for the same sets of strings might even depend on the geography in which the question is asked!
If you haven't had to deal with it previously, you'd be flabbergasted at how many foot-guns there are in such a simple question as alphabetical sorting, even without involving numeric components in strings.
[1] There's no such thing as plain text https://www.youtube.com/watch?v=ajfb5LSbQVM
> But nope, this is not it, because the good old ls sorts my files correctly
Did the author try "ls -v"? It would probably give the exact same order these file managers used.
ls -l does not sort, so I think the author is just very confused?
GNU ls sorts alphabetically "if none of -cftuvSUX nor --sort is specified".
There's a Group Policy setting in Windows: Computer Configuration\Administrative Templates\Windows Components\File Explorer\Turn off numerical sorting in File Explorer
Group Policy has so many essential settings I hurry to change with every install. I wish Windows would expose more of them to the user in ordinary settings.
When you get a bunch of files (let's say 1000+) without leading zeroes, this is a blessing. But I get the author's frustration, the expected behavior is not there, instead, he gets magical sorting that is wrong for his use case. I'm not sure what the ux should be, and maybe the algorithm here could be smarter, but it's a trade-off.
> the expected behavior is not there
The expected behaviour is ambiguous (and thus subjective). Older versions of Windows shipped with alpha sort. New versions ship with natural alpha sort. According to the UX designers over at Microsoft (and surely the user feedback), natural sort _is_ the expected behaviour.
I certainly agree with natural sort being the expected behaviour too.
This must be why, when I have a folder in Win11 full of files with GUIDs as names, they are never in the order I expect. Windows seems to sort them randomly but there must be some sub-sequence of numbers that it's deciding are the important ones and sorting off those. For me I'd much rather just sort left to right alphabetical.
FWIW, for Windows Explorer the numerical sort order can be disabled by setting the DWORD value
in the registry to 1.
I honestly thought Explorer was broken and have been looking into 3rd party file browsers for Windows because this has been driving me so nuts. Thank you!
There’s also the account-specific HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Policies\Explorer\NoStrCmpLogical, which I meant to mention, but mixed them up by mistake.
I think if we (in our industry in general) had REAL agile, and not pseudo-waterfall "the designers design it, the engineers implement it, QA QA's it, and then we lay everyone off because they're no longer needed" (but, loophole alert! We did daily standups and used Jira, so it was "agile" the whole time!), then we'd have a snowball's chance in hell of actually having a reasonable solution to this. Off the top of my head, this seems like something that should be a setting in control panel. But, because everyone assumes (contra to agile) that the designers "got it right the first time", this kind of improvement can't happen.
In case anyone else strongly disagrees with the author and wants to implement the Microsoft/Google/KDE/etc behavior, Google conveniently has this open sourced: https://github.com/google/closure-library/blob/b312823ec5f84...
How did I find it? Well, I wanted to implement it a while ago and I found it in Closure, a library I was already using.
Why is the author so perplexed by it?
https://en.wikipedia.org/wiki/Natural_sort_order
It's simply natural sorting. I don't see what is so controversial about it.
I find the term "natural" to be inadequate here, there is nothing natural about sorting strings in this particular fashion compared to another. It should be given a more descriptive name, like "Number-aware alphabetic" or something like that as to actually give a hint about what it does.
> I miss the time when computers did what you told them to, instead of trying to read your mind.
This could be an illusion, or at least something difficult to evaluate; the operator is less likely to notice the situations when the computer successfully “reads their mind”.
Also, I guess new users (i.e. those unfamiliar with previous behavior) won’t care as much about wrong assumptions; they will only learn that one doesn’t need a leading zero.
I feel like it's not intelligence or lack thereof, it's that implementing sort with a[i] < b[i] is the simplest way to do it. Putting 9 before 10 would require some kind of windowing since otherwise you'd be comparing 9 and 1, and of course 1 is smaller.
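To make the a[i] < b[i] point concrete, here is a minimal Python sketch of that simplest comparison. The function name `char_by_char_less` is made up for illustration; it is meant to behave like Python's built-in string `<`, which compares by code point.

```python
def char_by_char_less(a: str, b: str) -> bool:
    """Return True if a sorts before b when compared one character at a time."""
    for x, y in zip(a, b):
        if x != y:
            # The first position where the strings differ decides the result.
            return x < y
    # One string is a prefix of the other: the shorter one sorts first.
    return len(a) < len(b)


# At index 5 the comparison is "1" against "9", so file-10.txt wins.
print(char_by_char_less("file-10.txt", "file-9.txt"))
```

No lookahead is involved, which is why the "1" of "10" is compared against "9" on its own.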
Earlier this year I submitted a bug to VSCode about sorting Playwright tests in the alphabetical-and-numerical order that VSCode favours, after Playwright told me it was a VSCode issue.
Some people rushed to fix this as I'd done some diving into the issue and presented the relevant information and code, so now VSCode's Playwright test list uses the same sorting mechanism as the rest of VSCode.
Sadly, the underlying Playwright does not receive that order from VSCode so it still actually runs sequentially-numbered tests in strict alphabetical order. :(
Historically I'd like to add that FileNames are just sequences of bytes that come with a few restrictions.
They don't even have an encoding you can use to sort something. Windows FileNames look like UTF-16, but they can be truncated. You can't convert them to UTF-8 and back without loss. (For that you need WTF-8)
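A small Python sketch of the round-trip problem, assuming a file name that contains an unpaired surrogate (which Windows allows). Python's `surrogatepass` error handler is not WTF-8 itself, but it illustrates the same idea of letting lone surrogates through a UTF-8-like encoding.

```python
# An unpaired high surrogate, as can appear in a Windows file name.
lone = "\ud800"

# Strict UTF-8 refuses to encode it.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# A relaxed encoding lets it through and can be reversed without loss.
encoded = lone.encode("utf-8", "surrogatepass")
round_trip = encoded.decode("utf-8", "surrogatepass")

print(strict_ok, encoded, round_trip == lone)
```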
Once you use random FileNames you'll start to notice...
Be glad you don't have to deal with non-ASCII characters: acute/grave/tilde/umlaut/diaeresis/etc. accented characters, dotted vs. dotless 'i' (Turkish), barred i (an 'i' with a sort of dash through the middle, used in some languages for a sort of schwa-like vowel), thorn, not to mention non-Roman characters. And different languages sort the same characters differently, so you can't just pay attention to their Unicode values. (@cubefox has a post here pointing to the Unicode Consortium's doc about sorting)
I thought this was going to be a deep dive into what "alphabetical" means and how that's itself not a universal term between locales, what with so many different collation preferences.
That would likely have been a more useful article for the average developer. It is extremely hard to be aware of all the ways strings of different locales can defy our intuitions.
I don't know why I was surprised to learn this but there is a standard for alphabetical order. The NISO Guidelines for Alphabetical Arrangement of Letters and Sorting of Numerals and Other Symbols: https://www.niso.org/sites/default/files/2017-08/tr03.pdf
Numbers aren't part of "the alphabet", so sorting digits within a string by the numeric value makes just as much sense (and is what most users want most of the time) as treating digits as isolated characters (what OP wants).
As an aside, this is also the reason why ISO 8601 is the best date format – it sorts the same way whether you do it alphabetically or lexicographically.
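A quick Python check of the ISO 8601 claim, using made-up sample dates: sorting the strings as plain text gives the same order as sorting them as real dates, while a month-first format does not.

```python
from datetime import date, datetime

iso = ["2024-10-01", "2024-09-30", "2023-12-31", "2024-01-15"]
iso_by_text = sorted(iso)
iso_by_date = sorted(iso, key=date.fromisoformat)

# The same four days written month-first, for contrast.
us = ["10/01/2024", "09/30/2024", "12/31/2023", "01/15/2024"]
us_by_text = sorted(us)
us_by_date = sorted(us, key=lambda s: datetime.strptime(s, "%m/%d/%Y"))

print(iso_by_text == iso_by_date)  # text order matches date order
print(us_by_text == us_by_date)    # text order does not match date order
```

This only holds when every string uses the same fixed-width ISO form (four-digit year, zero-padded month and day).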
The correct sorting algorithm is described here: https://manpages.debian.org/stretch/dpkg-dev/deb-version.5.e...
I was joking. Really I would sort file names lexicographically. But the way Debian sorts version numbers is interesting and seems like a good way of handling that particular situation.
Renaming things to make them queue correctly (I usually couldn't care less about visual sorting, I use a terminal) is by far my #1 task by LoC and frequency of occurrence, and by far the most annoying. Metadata can be very helpful to obviate this issue, but it usually just leads to another problem where you now need metadata editors and readers in addition to the "user-visible" name metadata. It's frustrating.
If they truly want alphabetical order wouldn't it be that the 9 is "nine" and 1 is "one" and therefore nine would be before one?
Otherwise they mean lexicographical where they only look at the left most value and sort that.
You can't ask for something to be alphabetical and expect it to sort numerically.
>Well, apparently all these operating systems have decided that no, users are too dumb and they cannot possibly understand what alphabetical order means.
i really really hate this framing, and i see it far too often. no, the operating system developers did not make a value judgement about their users. they observed their users to find out what behaviour was expected, and they designed the behaviour of the system to match the behaviour that the majority of users expect.
and then you made an incorrect assumption about how the system works, and decided that your incorrect assumption means everybody else is dumb and you're the only smart person in this situation?
This is also the case with Excel. If numbers are stored in general formatted columns and you sort by A to Z, you'll get 1, 10, 11, 12, ..., 2, 20, 21 and so on
If you don't like the default natural sorting order, you can just change it in Dolphin. Settings > Configure Dolphin > View > Content Display > select anything other than "natural". You can even pick if you want case sensitivity or not.
The OS doesn't think you're too stupid to understand sorting, it relies on you being smart enough to figure out where the setting is located. In this case, four levels deep is probably too much to ask from users if they will write an entire blog post like this before finding the toggle.
> But 1 is smaller than 9, so file-10.txt should be first in alphabetical order. Everyone understands that, and soon people learn to put enough leading zeros if they want their files to stay sorted the way they like.
No. Not “everyone understands that”. Natural sort happens in real life and everyone understands that. Only those who understand ASCII — not the average user of graphical file managers — will deduce the reason for your definition of “alphabetical order”.
> Now that I know what the issue is, I can solve it by renaming the files with a consistent scheme.
Intensely ironic given the previous suggestion.
When I say Afferbeck Lauder [1] I mean "alphabetical order"
[1] https://en.m.wikipedia.org/wiki/Afferbeck_Lauder
Those are beautiful, thank you for posting them
Even if you are a file naming Einstein and you always zero pad your integers to exactly the right length, we have this thing called the internet, where you can download other people's files.
Your OCD is not my OCD.
If only we could represent sort order by some structural form of decision logic which also embeds encoding a regular .. well.. expression matching a pattern..
I've encountered a tangential problem to this with package versioning on Linux distros. Thankfully it was not too hard to write an algorithm to compare versions (thanks AI!).
This was a fun thing to realize in my early days of programming in Delphi. I guess the author will soon realize why old systems name things ticket "00001" and so on.
This got me riled up to the point where I blew a gasket and just can't..
I agree with the article.
I liked computers better when everyone hated them, and for the reasons they hated them..
I agree that the base functionality of just sorting character by character can be occasionally useful. However I would really be interested in seeing why you believe this to be the correct choice for user-facing graphical file managers, as its evident problems with typical usage seem more salient compared to the edge cases as illustrated in the article.
Many commenters are positing the "clever" sort is what 99% of the users want, but I really doubt it has been properly checked beyond the original PO's hunch and at most some user panel with pre-sampled data.
Most of these decisions are early default behaviors that stay there as long as users aren't clamoring for change, and TBH I can't imagine most users to have a self emerging strong opinion on how alphabetical sort should be working.
> not every single piece of software fucks up something as basic as string sorting
it is neither basic nor simple. Have you ever heard of UTF-8 and locales?
Here is an exercise for the curious reader: Pick any UTF-8 string "a", and another one "b", so that in increasing lexicographical order "a" sorts after "a+b" ("a" concatenated by "b"). ("a" > "a+b")
Have you tried sorting by created date instead? What is the point of relying on filename if that's not what you want.
I'm often yelling at the software on my computer: "STOP TRYING TO HELP ME!"
It's like having a toddler help you make a meal. It wants to be involved and recognized so badly. Meanwhile I'm starving and just want to get the food done as quickly as is possible and I'm constantly tripping over this little ball of misguided efforts.
Please. Stop trying to be smarter than me. You often can't, and when you get it wrong, you make it measurably worse. If you insist on doing this please give me the "Expert Mode" setting back so I can flatly disable ALL OF IT with one click.
Cursed alphabetical sorting of numbers:
8 5 4 9 1 7 6 3 2 0
Can you guess what it is?
They are sorted by their Unicode character names obviously
U+0038 DIGIT EIGHT
...
U+0030 DIGIT ZERO
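For anyone who wants to reproduce the cursed order, a short Python sketch using the standard `unicodedata` module to sort the ten ASCII digits by their Unicode character names:

```python
import unicodedata

digits = "0123456789"

# Sort each digit by its Unicode name, e.g. "DIGIT EIGHT" for "8".
cursed = sorted(digits, key=unicodedata.name)

print(" ".join(cursed))  # 8 5 4 9 1 7 6 3 2 0
```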
Thanks, now I cannot unsee this. Thunar has this broken sort order too, and I've no idea how to make it sort file names with hash values 'properly' - by which I mean the same as `ls` which broadly speaking on my system is 0 to 9 then a-z case insensitive.
Instead I have an order of starting character that goes 1,4,5,7,9,2,3,7,8,9,4,6,1,2,.. etc etc which is utterly useless as a sort. I've always thought the sort was weird but couldn't quite figure out why (I usually sort by date descending). Another non-productive thing to figure out and fix.
i would argue that when you say "alphabetical order" you mean "lexicographic order"
Por qué no los dos?
Call lexicographic order "sort by name" as it's called now, and call dumb character-by-character sort "plain" or something like that. I'm not a designer, maybe there are more intuitive names, but come on. This isn't an intractable problem.
I rename all of my photos upon import using the created date, formatted as `YYYY-MM-DD kk:mm:ss`.
But it would frankly be great if most file browsers just let me sort photos based on metadata. But then I just end up in a dedicated photo browser, instead.
Our users loved when we added "natural" sort, which was a pain in the ass in the db, but ultimately no big deal.
They absolutely do not care or understand the difference between alphabetical and numerical and natural, what they care about is 10 should not come before 9 in "Item 10" vs "Item 9".
Whatever pedantic argument you have that natural is not alphabetic will lose you sales, your users do not care and want numbers to make sense in sorting.
Dumb tools are more robust.
I have the same problem on Nemo. More specifically, I had made a small app that displayed files of a directory in alphabetical order, and then when I look at it in Nemo it isn't the same order because I didn't implement their smart algorithm.
I fail to see how this is a "problem"? You implemented a sorting mechanism that was useful to your application, while Nemo implemented another which as this thread demonstrates seems to be much more useful and intuitive for the average user. This is also of course not specific to Nemo, as no 'modern' file manager on Linux sorts filenames like it's 1980 and all you are able to feasibly do is step through the bytes.
Nemo isn't some random app a teenager made. It's the default file manager of a desktop OS. I expect it to cover more use cases.
I expect the same for other file managers on Linux. Although I must say I'm generally let down by Linux software.
> Of course, the user who named those files probably wants file-9.txt to come before file-10.txt. But 1 is smaller than 9, so file-10.txt should be first in alphabetical order. Everyone understands that, and soon people learn to put enough leading zeros if they want their files to stay sorted the way they like. Well, apparently all these operating systems have decided that no, users are too dumb and they cannot possibly understand what alphabetical order means. So when you ask them to sort your files alphabetically, they don’t. Instead, they decide that if some piece of the file name is a number, the real numerical value must be used.
I think there are many things wrong with your assessment of the situation.
First, where does it say in these file managers that they're sorting by alphabetical order? I see that you've specified that you want the files sorted by name, but I don't see that you've specified you want them sorted by name alphabetically. And what does "alphabetical sort" even mean when you're sorting characters which are not letters? What you mean is probably "lexicographical sort".
Second, you admit yourself that users probably want natural sort. Why would you expect these products to do the thing which they know users usually don't want by default? That just seems like bad design to me. They know users usually want natural sort, and you know users usually want natural sort, so why would you expect the default behaviour to be a lexicographical sort?
Third, just like how you've learned to work around the lack of natural sort in poorly designed products of years past by adding leading zeroes, you can just add trailing zeroes to get the lexicographical ordering that you want. Why do you seem to be implying that the latter is more user-hostile than the former? It doesn't make sense to me. A decision had to be made about what sort to use and they picked the one that most people want. Isn't that what we should be expecting in a product that caters to its users?
I see in other comments you've suggested that there should be a separate option for choosing between lexicographical sort and natural sort. But in the past, when lexicographical sort was the only option, why weren't you complaining about it being user-hostile to only have one option then? Why is it only when the default is something you're personally not used to that it warrants complaint? And where do we stop, do we have separate controls for every single sortable string field to determine whether it should be sorted lexicographically or naturally? Or just the name field? Don't you think that is going to lead to interface bloat?
Bro forgot that ls has an option to obtain the same sorting as the one he doesn't like.
ls has it because it solves a real user need.
Another problem which annoys me to no end is that most file managers and file selection boxes put directories before files.
This makes it hard to find the file that was most recently changed, for example. Which is an action that is extremely common. (In fact, why does my file manager not have a most-recently-used shortcut?)
In Total Commander, there is a function in the options to sort strict by numerical char code. It will sort those files correctly. Unfortunately, it will also sort "10.txt" before "2.txt".
---
In all file managers, I miss an API point where one can give a user-defined sorting function for the file and folder list.
What do you mean by "Unfortunately"? This appears to be the only correct conclusion from the algorithm you selected, you can't eat the cake and have it too.
Regarding your second point, that's not really what a graphical file manager is for, I think. At this point (likely even earlier) you would be better off just writing a simple script in the scripting language of your choice. (If going for something fancy, you could also implement a FUSE based on symlinks for the original files, where the filename is prepended by a sort key. This would work for every major file manager and you could manipulate the files in mostly the same way as before.)
Paragraph 1: I speak of a sorting method which splits the filename at the boundaries between numbers and non-numbers, and sorts by the parts of the resulting tuple, the numbers naturally (10 comes after 2) and the rest by numerical char code.
Paragraph 2: I am not sure what you mean here with writing a script. The graphical file manager shall sort its file list using the sorting function I hand over to it.
"That's not really what a graphical file manager is for". Says who? Every software which has a plugin system does that, why should a file manager not?
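The splitting scheme from "Paragraph 1" above can be sketched in a few lines of Python. This is only an illustration under assumptions: `natural_key` and `sort_listing` are hypothetical names, and `sort_listing` stands in for the kind of hook a file manager could expose; it is not any real file manager's API.

```python
import re


def natural_key(name: str) -> list:
    """Split at digit/non-digit boundaries; digit runs compare as integers."""
    # With a capturing group, re.split keeps the digit runs. They always land
    # at odd indices and the (possibly empty) text pieces at even indices, so
    # two keys never compare an int against a str at the same position.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def sort_listing(names, key=natural_key):
    """Hypothetical hook: the caller hands over its own sort key function."""
    return sorted(names, key=key)


print(sort_listing(["10.txt", "2.txt", "a.txt"]))
print(sort_listing(["10.txt", "2.txt", "a.txt"], key=None))  # plain code point order
```

The text pieces are compared by code point, which matches "the rest by numerical char code"; a locale-aware comparison would need a different key.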
Sorting by name (collation) is waaay trickier than simply figuring out how to parse the numbers.
The International Components for Unicode library implements the Unicode Collation Algorithm, which depends on the language code and region of the locale, and looks up the quirks for each locale in the Common Locale Data Repository.
It's a much better idea to just use the standard ICU library or platform specific libraries (which are often built on ICU like JavaScript's Intl.Collator), instead of trying to hot dog it by rolling your own.
International Components for Unicode
https://en.wikipedia.org/wiki/International_Components_for_U...
>ICU provides the following services: Unicode text handling, full character properties, and character set conversions; Unicode regular expressions; full Unicode sets; character, word, and line boundaries; language-sensitive collation and searching; normalization, upper and lowercase conversion, and script transliterations; comprehensive locale data and resource bundle architecture via the Common Locale Data Repository (CLDR); multiple calendars and time zones; and rule-based formatting and parsing of dates, times, numbers, currencies, and messages.
Unicode Collation Algorithm
https://en.wikipedia.org/wiki/Unicode_collation_algorithm
>The Unicode collation algorithm (UCA) is an algorithm defined in Unicode Technical Report #10, which is a customizable method to produce binary keys from strings representing text in any writing system and language that can be represented with Unicode. These keys can then be efficiently compared byte by byte in order to collate or sort them according to the rules of the language, with options for ignoring case, accents, etc.[1]
>Unicode Technical Report #10 also specifies the Default Unicode Collation Element Table (DUCET). This data file specifies a default collation ordering. The DUCET is customizable for different languages,[1][2] and some such customizations can be found in the Unicode Common Locale Data Repository (CLDR).[3]
Common Locale Data Repository
https://en.wikipedia.org/wiki/Common_Locale_Data_Repository
>The Common Locale Data Repository (CLDR) is a project of the Unicode Consortium to provide locale data in XML format for use in computer applications. CLDR contains locale-specific information that an operating system will typically provide to applications. CLDR is written in the Locale Data Markup Language (LDML).
>Among the types of data that CLDR includes are the following:
Tricky collation examples:
sv-SE (Swedish): å, ä, ö are separate letters at the end of the alphabet, not variants of a or o.
de-DE (German): ä, ö, ü may sort as ae, oe, ue in some contexts, or as distinct letters. ß sometimes sorts as ss.
tr-TR (Turkish): dotted i (i) and dotless ı are different letters; I sorts with ı, not with i.
es-ES (Spanish): traditionally ch and ll were treated as single letters with their own place in the alphabet.
cs-CZ (Czech): ch still counts as a unique letter, sorted after h.
da-DK / no-NO (Danish/Norwegian): ø comes after z.
is-IS (Icelandic): þ (“thorn”) is part of the alphabet, after z.
fr-FR (French): accents usually ignored in sorting, so é = e, but not always depending on collation settings.
el-GR (Modern Greek): tonos accents, final sigma ς vs. σ, etc.
nl-NL (Dutch): the digraph “ij” is often treated as a single letter, and capitalized as “IJ”. In dictionaries and phone books it often sorts as a single letter under “I”, but sometimes is listed after “X” depending on tradition.
Then you get into non-Latin languages like Chinese, Japanese, and Korean collation, which gets hairy with radicals, kana order, and stroke count.
Also different locales have different ways of representing numbers, like switching between "," and "." as separators and decimal points.
ICU supports integer only "natural" numeric collation, so anything more complicated like versions, floating point, negative numbers, hex, thousands separators, fractions, roman numerals, etc, you'd have to build on top of ICU.
ICU doesn't support incomprehensible dead languages like Latin or Ancient Greek (it does however support French ;). It does support Roman numeral formatting, but not collation, which would be pretty tricky and ambiguous.
https://www.youtube.com/watch?v=sKWvTlLMB-Y
A nuanced but common example that ICU/UCA/CLDR helps with is a menu to select the current locale: you have to translate each language's name into the current locale, and also sort them in the current locale. On top of different collations they can also have totally different spellings, like "United States of America" is "Verenigde Staten van Amerika" in Dutch. This makes it challenging for users to find their own language when the locale is set wrong! You just can't win.
Not to mention emojis! Which comes first: The chicken or the egg? The taco or the poop?
Also, the Mac Finder switches ":" and "/" for historical reasons (HFS used to use ":" as a directory separator instead of "/"), so you can create a file name like "9/11 Attack" in the Finder, which actually gets the underlying Unix filename "9:11 Attack". Don't believe me? Rename a file in the Finder to include a slash, which you know is impossible to represent as a Unix file name. Then go "ls" the directory in the shell.
The Mac Finder weirdly collates "/" after "9" because under the hood it’s really storing it as ":", which comes right after "9" in ASCII (a literal "/" would sort before "0"). But it also has other punctuation collating inconsistencies, sorting "," and ";" and others after "0" too. Definitely not ASCII order -- I'm not sure what rules it uses, but it's different than "ls".
However, while it's generally true you can't have "/" in Unix file names, NFS used to trustingly let clients rename Unix files to include a "/" in their name, which the Gator Box AppleTalk/Ethernet gateway let you do with the Mac Finder (pre OS/X), which would silently corrupt your "dump" backups on the Unix NFS server, so you would not learn about it until you tried to retrieve your files and "restore" crashed.
https://news.ycombinator.com/item?id=31821646
>Another reason that NFS sucks: Anyone remember the Gator Box? It enabled you to trick NFS into putting slashes into the names of files and directories, which seemed to work at the time, but came back to totally fuck you later when you tried to restore a dump of your file system.
>The NFS protocol itself didn't disallow slashes in file names, so the NFS server would accept them without question from any client, silently corrupting the file system without any warning. Thanks, NFS!
> I don’t know when this became the norm, to be honest I have not used a normal graphical file manager in a long time.
If I remember correctly, Windows 98 was sorting alphabetically. Then Windows XP started to take numbers into consideration.
Heh. One of the bugs that once caused me to bang my head against the wall was caused by the Estonian language. Its alphabet has Z following S and Š. So the "foolproof" regexp to match the letters '[a-zA-Z]' was misfiring for some entries.
By the way, there seems to be a "standard" way to sort strings:
> Unicode Technical Report #10 also specifies the Default Unicode Collation Element Table (DUCET). This data file specifies a default collation ordering.
https://en.wikipedia.org/wiki/Unicode_collation_algorithm
I assume this mainly aims at giving a reasonable compromise between the different dictionary and phone book sorting rules of various languages (and even locales), which should give reasonable results for most languages. I assume this also puts "Alice2" before "Alice10".
> I assume this also puts "Alice2" before "Alice10".
It doesn't (per https://www.unicode.org/reports/tr10/#Non-Goals):
Oh, that surprises me.
TLDR: the author found out about natural ordering [0], i.e. treating a sequence of digits as a number while sorting.
Usually preferable, except when not. Just like distinguishing between upper- and lowercase letters, and other misery.
[0] https://en.wikipedia.org/wiki/Natural_sort_order
> Usually preferable, except when not. Just like distinguishing between upper- and lowercase letters
When would it be preferable to distinguish between capital and lowercase letters?
Wiktionary does it religiously, and it always makes the entries worse. Want to know what something means in German? Well, that's on a separate page.
Do you want to look something up while using your phone? Don't be stupid; use a desktop that won't autocapitalize the first letter you type in.
When you're dealing with unix-y Git repositories for example.
If you mean more from a user perspective, it really depends. For registry keys for example, since they're interacted with programmatically for the most part, I was expecting them to be case-sensitive. They're case-insensitive though, so that was a bit of a whiplash.
Ha! I had the exact same realization on MacOS. Extremely annoying behavior.
Digits and any other characters that are not A-Z and a-z should not get sorted. That's the true result of doing what you asked and not what you meant. Pedantic, but that's why we are here.
No.
this is an ID-10T PEBKAC ERR.
Not this keyboard not this chair, but the problem is with idiots between keyboards and chairs.
The author is not the ID10T it’s the other general users.
The author is intelligent enough to recognize that this is not alphabetical sort, but the term that they are looking for to describe the sort that they see in Dolphin, Windows, Google etc. is *lexical* sort, not alphabetical.
The engineering problem is ID10Tic not technical. How do you educate an illiterate public on what the difference between alphabetical and lexical sort is in practice?
You can’t, so you engineer around it and call lexical sort alphabetical.
This is one of the big ways that LLMs are going to change the game for UX. Your operating system is going to have some sort of 'butler', which knows all of your preferences, and the butler will go through the APIs and man files and informational dialogs of every app you use and auto-configure them.
Then if you want something to change, just ask the butler. If the app is open source and doesn't support the requested feature, the butler might even be able to code it up.
If I understand the article, the author wants magic :)
I take it to mean they want the system to know file_9.txt is less than file_10.txt.
I never saw that happen in any OS, so I do not know what he is referring to. Maybe whatever that old system was, it sorted by create time as opposed to file name.
So, the author can try and create "aisort" that will look at all file names and add leading zeros to the file numeric portion, sort, then remove the zeros added. That will probably be as slow as s***t and use gobs of memory, depending on the number of files.
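For what it's worth, the pad-sort-unpad idea can be sketched in Python as a sort key, so the padded names only exist during the sort and nothing has to be stripped afterwards. The name `padded_key` and the fixed width of 20 digits are assumptions made for this sketch; numbers longer than that width would sort incorrectly.

```python
import re


def padded_key(name: str, width: int = 20) -> str:
    """Left-pad every run of digits with zeros to a fixed width (assumed: 20)."""
    return re.sub(r"\d+", lambda m: m.group().zfill(width), name)


names = ["file_1.txt", "file_10.txt", "file_12.txt", "file_2.txt",
         "file_20.txt", "file_3.txt", "file_9.txt"]

# The original names are returned unchanged; only the comparison uses padding.
ordered = sorted(names, key=padded_key)
print(ordered)
```

The cost is one regex substitution per name plus an ordinary sort.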
No, the author is saying the opposite. They expect file9.txt to be after file10.txt, but in many modern operating systems, it isn’t!
Really, I do not know how I missed that :) I read it a couple of times too and I still thought he wanted it the other way.
So my original comment kind of stands but in the opposite way.
I have never seen file_9.txt sorted before file_10.txt. I just tested it on OpenBSD and I got this, which I have always seen:
$ ls|sort
file_1.txt
file_10.txt
file_12.txt
file_2.txt
file_20.txt
file_3.txt
file_9.txt
Author here - My surprise stems exactly from the fact that for the last few years I have exclusively managed my files via the UNIX shell, which behaves in the classical way.
When I started using Linux as my daily driver after many years of Windows (but with familiarity with UNIX systems going way back), I knew it would be like that in the terminal, but it still took some adjustment. But actually, Nemo does the same "natural sort" thing, and also sorts case-insensitively.
>I take it to mean they want the system to know file_9.txt is less than file_10.txt.
The polar opposite, actually.
It’s not magic. It’s called natural sort and it doesn’t require gobs of memory. Most (all?) modern OS file managers will natural sort on file names.
That's not what the author says- they said that file managers actually are somehow sorting file-9.txt before file-10.txt, and it's breaking real alphabetical ordering.
i think it's the opposite, that they _want_ file_10.txt to come before file_9.txt by default, but that file explorers fail at this. it's rare that i want true alphabetical sort, but it's convenient for cases like tfa where alphabetical sort is more predictable if i have filenames that look like <letters>_<numbers-of-same-length>.txt.
Nope, you got it completely reversed.