The other major alternative to consider is RTF. I standardised on that about 10y ago, planning for a 30y horizon. It is a more complex format than Markdown, still text-based, but biased towards WYSIWYG presentation and editing, while Markdown is usually not WYSIWYG in the editor. Both formats suffer from a lack of standardisation, though Markdown seems to have more problems in practice - I've never had an issue caused by RTF incompatibility. Both formats are very widely supported and it can reasonably be expected that this will continue.
I prefer RTF for two main reasons:
* I can't express simple formatting such as "make this text red" in Markdown. No, I don't mean "accentuate this text and leave the decision on how it looks to someone else", I really do mean "make this text red". I do a lot of public speaking, and I want to keep to certain conventions which are easy to read fast.
* Most of the time I am writing text, not reading a version after it goes through a formatter, so I prefer to see it formatted on screen. That's really a limitation of Markdown editors, but it's almost universal so from my point of view, it counts.
I remember, a long time ago, having to try and parse text out from RTF documents and I would rather have every internal organ pecked out by sparrows than try and deal with that abomination again.
> still text-based
The point about Markdown is that you don't need a complex parser[0] to be able to interpret the files - a simple human can read a Markdown file and get the gist of what is going on. RTF has a whole mess of control strings and codes going on that get in the way of a simple visual understanding.
> I can't express simple formatting such as "make this text red" in Markdown.
You can add raw HTML to Markdown which would accomplish this (at the expense of moving away from the "simple plain text", obvs.) There's a bunch of Markdown parsers with extensions for this kind of thing though (from what I can see, they're all not entirely "simple" either which is a shame.)
RTF is extremely easy to parse if you assume it isn't using an ancient code page. This is a pretty safe assumption since almost no modern software even supports all the code pages in the RTF standard. Word is far more likely to store Arabic in default Windows-1252 with \u_____-specified code points than to use code page 708 or something.
Is it as easy as Markdown? No. But it should take about an afternoon for a halfway competent programmer to make an rtf2txt utility from scratch that correctly handles > 90% of the RTF files you're likely to encounter in practice.
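For a sense of scale, here is a rough sketch of what such an rtf2txt could start from. It is not a complete RTF reader: the function name `rtf_to_text`, the list of skipped destinations, and the Windows-1252 default are all assumptions made for illustration, and it only handles groups, `\par`/`\line`/`\tab`, `\'hh` bytes, `\uN` escapes with their fallback characters, and escaped braces.

```python
import re

# Destinations whose contents are metadata, not body text (a partial, assumed list).
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}
# Control words that map directly to a character.
SPECIAL = {"par": "\n", "line": "\n", "tab": "\t"}

TOKEN = re.compile(
    r"\\([a-z]+)(-?\d+)? ?"    # control word, optional numeric parameter, optional space
    r"|\\'([0-9a-fA-F]{2})"    # \'hh: one byte in the document code page
    r"|\\([^a-z])"             # control symbol such as \\ \{ \} \~ \*
    r"|([{}])"                 # group delimiters
    r"|[\r\n]+"                # raw line breaks carry no meaning in RTF
    r"|(.)"                    # ordinary text character
)


def rtf_to_text(rtf: str, codepage: str = "cp1252") -> str:
    out = []
    stack = []        # saved (skipping, uc) state for each open group
    skipping = False  # True while inside a destination we do not want
    uc = 1            # number of fallback characters following each \uN
    pending_skip = 0  # fallback characters still to be dropped
    for m in TOKEN.finditer(rtf):
        word, param, hexbyte, symbol, brace, char = m.groups()
        if brace == "{":
            stack.append((skipping, uc))
            pending_skip = 0
        elif brace == "}":
            if stack:
                skipping, uc = stack.pop()
            pending_skip = 0
        elif word is not None:
            if word in SKIP_DESTINATIONS:
                skipping = True
            elif word == "uc" and param is not None:
                uc = int(param)
            elif skipping:
                pass
            elif word == "u" and param is not None:
                # Signed 16-bit decimal -> UTF-16 code unit.
                out.append(chr(int(param) % 0x10000))
                pending_skip = uc
            elif word in SPECIAL:
                out.append(SPECIAL[word])
        elif symbol == "*":
            skipping = True  # \* marks an ignorable destination
        elif skipping:
            pass
        elif pending_skip and (hexbyte is not None or char is not None):
            pending_skip -= 1  # drop the fallback for the preceding \uN
        elif hexbyte is not None:
            out.append(bytes([int(hexbyte, 16)]).decode(codepage, "replace"))
        elif symbol is not None:
            if symbol in "\\{}":
                out.append(symbol)
            elif symbol == "~":
                out.append("\u00a0")
            elif symbol in "\r\n":
                out.append("\n")
        elif char is not None:
            out.append(char)
    # Join surrogate pairs that arrived as two separate \uN escapes.
    text = "".join(out)
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le", "replace")


sample = r"{\rtf1\ansi{\fonttbl{\f0 Helvetica;}}\f0 Caf\'e9 \b bold\b0\par 5 \u8364? \{ok\}}"
print(rtf_to_text(sample))
```

Anything outside that subset (tables, fields, embedded objects, non-1252 code pages) would need more work, which is presumably where the remaining 10% of files live.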
Interesting. Why not HTML? Browsers have native basic WYSIWYG editing built in, and almost every screen we look at is nowadays HTML, including the code editor.
On a `<div contenteditable=true>` element, calling `document.execCommand('bold')` will make selected text bold in WYSIWYG mode. See https://jsfiddle.net/z0umpb3x/12/ for the concept.
The main idea here is that I don't want to know any syntax like RTF, nor I want to use any tools.
> Why not HTML? Browsers have native basic WYSIWYG editing built in
Can you point me to a page in Firefox that I can use offline to WYSIWYG edit a hypertext document without needing to use the developer console to edit raw HTML?
I believe it's technically possible, but I don't know if there's a tool.
Btw, genuinely asking, what's your offline scenario? Long loong time ago, I used TiddlyWiki for a brief time period for my temp offline scenarios. Single local HTML, can edit, but not WYSIWYG.
I like this approach and I’m thinking I might agree. I’ve been starting with Markdown, but find myself going to HTML. At some point I end up wanting some specific thing that isn’t supported by Markdown so I need to leave Markdown for embedding a video or adding a tiny bit of interactivity (like show/hide), so sticking with HTML throughout is nice.
HTML would be a third alternative, yes. However historically it has been less stable in its definition, and also less stable in round-trip editing by a WYSIWYG editor - it has never been an aim for HTML to be friendly to editing at that level. In contrast there are several editors which handle RTF without disrupting it for other editors. I use TextEdit (on the Mac), but of course it is a common word processor format. RTFD is far less well supported, but I don't normally need images. I'm not familiar with any WYSIWYG editors for HTML which represent embedded images as though they were part of the same file.
> The main idea here is that I don't want to know any syntax like RTF
Nor do I. What I want is a file format which is long-term viable, and which I can edit in a WYSIWYG editor. The underlying file format being text-based is useful as a recovery mechanism if RTF ever becomes unsupported, but it's not intended that one would edit it manually.
> , nor I want to use any tools.
While I do not want to edit a file format manually.
HTML isn't "horrible" at WYSIWYG, it just isn't WYSIWYG. Your comment is like saying "JSON is horrible at for loops"—it's just a misunderstanding of the tool.
HTML WYSIWYG editors are not going to create worse tag soup than WYSIWYG editors for other formats. If you care about the precise HTML code, then WYSIWYG is the wrong tool.
Makes sense. I was thinking that browsers have native support for WYSIWYG editing and most browsers/apps are Chromium based; so how the browser handles HTML formatting should be stable for a single person scenario even across apps.
RTF isn’t a better option than MD, plaintext is. We start with the premise that we are talking about plain text, as in plain text, texts. But pretty soon we are into colours, then maybe images and videos, and then what all not.
But MD still is closest to plaintext because in most cases MD (or a similar scheme), even when not rendered, can be read easily in its plaintext formatting.
And again, my requirement includes colours - also bold, underline, some font support, left indent etc. It's not sufficient that a file format be long-lived - it also has to encode the information that I need. MD is poor at that, plain text doesn't even attempt it.
Honestly plaintext has a pretty bad record as a format, with all of the UTF-16, Latin-2, Shift JIS, and so on. I’d suggest paper, or for true archival storage, parchment and stone.
Honestly paper, parchment and stone have a pretty bad record as a format. They either degrade or even if preserved can become unreadable (e.g. Linear B) without a surviving linguistic community. Even when the text survives and is rediscovered, translations produced with century-scale gaps often lose subtext or connotation that would still have registered across a narrower gap.
I'd suggest an unbroken chain of oral transmission...
Much more slowly and gracefully than any digital medium we have concocted so far (save for core rope memory, maybe).
> even if preserved can become unreadable (e.g. Linear B) without a surviving linguistic community [...] translations produced with century-scale gaps often lose subtext
This pertains to the message and not the support; also, I'll take missing subtext over missing text any day of the week, thank you very much.
Which is why you use one of the many editors that understand RTF, not edit it directly. The only time you would need to touch the underlying format is if RTF becomes unsupported, which is unlikely.
Markdown is an attempt to codify formatting conventions that were already long in use for newsgroups and email. It’s meant to be human readable first and only incidentally convertible to actual markup. If semantics are a mess and nobody can agree on how you lay out a table, I’m kind of ok with that. Plain text is the presentation format, and if you want precise conversion, drop some raw HTML or LaTeX in there. LaTeX especially has a long history of being dropped into emails between people who needed to talk about math.
> Markdown is an attempt to codify formatting conventions that were already long in use for newsgroups and email.
Markdown was never really about codifying anything existing. Sure, it took a lot of guidance from existing conventions, but it also invented quite a bit, ignored quite a bit, and compromised quite a bit because of wanting to mostly sit atop HTML.
The most notable deviation from custom is its link syntax, which is simply bad. Where any delimiter was necessary, the longstanding custom was to delimit with angle brackets; and Markdown did allow <https://example.com/>, but its text-with-hyperlink syntax of [text](href) is highly confusing, leads to frequent errors, and makes the huge mistake of using as its delimiters characters that are valid in URLs. (In current spec terms, parentheses are in the set of URL code points <https://url.spec.whatwg.org/#url-code-points>.) This has led to all sorts of trouble. My guess is that angle brackets weren’t used because of the potential for confusion with HTML tags, though there would have been no real parser ambiguity, and autolink syntax kinda messed that argument up anyway.
Then its image syntax of ![alt text](href), that’s just plain nonsense.
I’d say that reStructuredText (from a little earlier) matched existing conventions slightly better than Markdown, in general. And it was much more sane, as a language, especially when you wanted to extend it.
> It’s meant to be human readable first and only incidentally convertible to actual markup.
If it were incidental, you wouldn’t use it—just go informal and write what suits you, it will be nicer.
No, the only reason for Markdown is so that you can write HTML with a nicer syntax. It’s not incidental at all. Sure, that syntax is intended for human-readability and -authoring, but it’s essential to Markdown that it is actual markup, poorly-defined though it be as a family of lightweight markup languages <https://en.wikipedia.org/wiki/Lightweight_markup_language>.
> Plain text is the presentation format
Do you know how many README.md files there are on GitHub that are essentially HTML soup? It’s a bit sad, really. It doesn’t invalidate the intention of what you’re saying, but I would say that Markdown is noticeably less about plain text being the presentation format than it was twenty years ago.
Can you expand on why you think reading plain markdown is hostile? For the vast majority of my Obsidian vault, I could open the markdown file in Notepad and it would be just about as readable as it is in the editor. Of course, you lose the visual effect of the styling, but it's still perfectly legible.
You don't attempt to parse it. You don't try to understand what **foo* bar* means. You just assume things next to * are bullets or emphasized and things next to # are headings.
When you want to ensure that things are rendered in a particular way, Markdown might not be the best solution—it's for content, not style. In many cases (not all!), style doesn't matter much. It's better to give each device (desktop app, ebook reader, printer) flexibility of display according to user needs and preferences (dark mode, contrast, font size, fonts for dyslexia or personal taste).
HTML gives more control - from plaintext through basic formatting to building pretty much anything. This freedom can be a slippery slope - first comes red text, then custom fonts, then some SVG showing the font, and before you know it, you're building an app.
(That's one thing I love about Hacker News - they kept it simple by not allowing any formatting in posts.)
PDFs are great when you need an exact visual presentation. And yes, I keep plenty of things in PDFs. However, it is not a format I convert to too often—usually, I prefer the flexibility to reflow text for different screen sizes. YMMV.
As for RTF - I'm not sure about its current niche. It's a Microsoft proprietary format with TeX-like syntax, so for custom applications you might need to write your own parser rather than using standard tools like XML parsers. XML feels much cleaner to work with compared to the jungle of other formats we used to live in.
That said, if it works for you, great! I'm just curious why, in this case, you prefer them to a subset of HTML.
A drawback of markup languages, including TeX, is that they intermingle content and style instructions.
> RTF ... is a Microsoft proprietary format
RTF originated with Microsoft, but it's widely supported by scores of word processors and editors. I generally use TextEdit on a Mac, which supports plain text, RTF and RTFD directly. Hence you don't write your own parser: you use the editors already available. It has been around since 1987 with a high degree of stability in the core functionality, so it's a reasonable expectation that those editors will be around long-term, but if they disappear, you can get the text out.
> It's better to give each device (desktop app, ebook reader, printer) flexibility of display
For some purposes, perhaps. Not for mine: if I set text to be red, it should be red on the screen, and red on the paper. Not "emphasised" - red.
PDF is not intended as an editable format, so not relevant here.
HTML is also worth considering. It's not quite as nice to write in as Markdown, but it's still pretty reasonable to hand write (or even read if it comes to it).
It's not sarcasm. HTML is and always has been a decent format for hand-authoring documents. It's not as nice to use as markdown, but it does support more advanced use cases like coloured text.
And if you want an "archival" format that will stand the test of time it's pretty good (so long as your text encoding is readable, HTML will be).
I agree with you, but I will say that Markdown (or at least, all the major "standards" for it) supports interleaving HTML both inline and at the block level. Unless you are writing a very bombastic document or want to avoid a rendering step, I'd argue that you'd be best served by just writing CommonMark with embedded HTML.
I'd argue that, at that point, you might as well write HTML. I find it easier to work with one format at a time rather than two intermixed and a typical Markdown document is barely more readable than its equivalent HTML document, while the latter gives you a lot more flexibility and semantic accuracy.
When I created the SSG I use for my blog, I tested a bunch of different markup formats for the article content, but in the end I settled on plain-old raw HTML. Over the years, I've needed to render so many different specific visual elements in the blog entries that if I wasn't actually using raw HTML, I'd probably be using raw HTML snippets all over the place anyway. For another medium I might choose a different markup format, but when it comes to writing content for the web, with HTML I know exactly what I'm getting every time.
I don’t quite follow the first bullet point. If you are doing public speaking, your client is rendering the slides, right? Won’t you be the one interpreting “accentuate?” (So you can make it fit whatever convention you want).
Every once in a while you encounter an opinion on the Internet that you disagree with so deeply that it's not even clear how you would start to counter it.
Depending on what one means by “Markdown” and how completely you want to support RTF, I might argue RTF is way simpler than Markdown. Certainly it’ll be more consistent.
Markdown is far better suited for human authoring; RTF is a data format, not a markup language. And you can cut a lot of corners with Markdown and it mostly won’t bite you. But if you want to do things properly, Markdown gets rather complicated, whereas RTF stays comparatively simple after the initial parsing/serialising cost, especially if you exclude newer features like XML markup.
Oh, I’ve definitely edited RTF manually, when I wanted fine control, or to strip out certain formatting but not all. But it’s definitely not intended for manual editing; and it’s hopeless if you want to go beyond ASCII, as you can’t (from memory) just use UTF-8 or UTF-16 encoding, you need to escape non-codepage characters as \uNUMBER?, where NUMBER is the signed decimal representation of a UTF-16 code unit, which is probably the most absurd escape representation I’ve ever encountered. (The character after the number, here and normally ? these many years, represents the fallback character to use if Unicode is not supported. \uc0 can disable that so it’d be just \uNUMBER. It’s cases like that that really show how RTF was designed as an internal file format for things like Word, back before they fully supported Unicode. RTF has not aged very well as a file format.)
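To make the escape concrete, here is a small sketch of the encoding direction as described above: each non-ASCII character becomes one or two `\uNUMBER?` escapes, where NUMBER is the UTF-16 code unit read as a signed 16-bit decimal. The function name `rtf_escape` and the `fallback` parameter are made up for illustration, and the sketch ignores `\uc` and leaves raw line breaks alone.

```python
def rtf_escape(text: str, fallback: str = "?") -> str:
    """Escape text for an RTF body using \\uN escapes for non-ASCII characters."""
    parts = []
    for ch in text:
        if ch in "\\{}":
            # Backslash and braces are structural in RTF and need escaping.
            parts.append("\\" + ch)
        elif ord(ch) < 128:
            parts.append(ch)
        else:
            data = ch.encode("utf-16-le")
            # One code unit for BMP characters, two (a surrogate pair) otherwise.
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                signed = unit - 0x10000 if unit >= 0x8000 else unit
                parts.append(f"\\u{signed}{fallback}")
    return "".join(parts)


print(rtf_escape("5 \u20ac"))       # euro sign, a BMP character
print(rtf_escape("\U0001F600"))     # emoji, needs a surrogate pair
```

The negative numbers that come out for anything at or above U+8000 are the part that makes the scheme look so odd.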
> Cannot read it through SSH. Need something else than your terminal.
Well, for reading you can always use pandoc with stdout output and pipe it to a pager. There are also fancier options like https://github.com/Orange-OpenSource/pandoc-terminal-writer (disclaimer: I contributed a small PR a long time ago)
The part about seeing the results immediately can be done easily, e.g. via plugins in text editors like Notepad++. You have a left pane with the markdown text and a right pane with the rendered result, updated after each keystroke.
I've tried 2 different plugins, they are a bit finicky re formatting (I do a lot of bullet points in my various todo lists and a small thing can break a whole block of them) and the output looks a bit different, but I still prefer them.
Heck, I often prefer reading markdown itself these days just for myself since the structure is already there and visible, I've learned to see those formatting characters as already sort of rendered formatting so not even looking into rendered pane that much.
Just a note that the most common Markdown flavor (Commonmark) doesn't actually support frontmatter. The author is presumably using Obsidian-flavored Markdown (which is a mixture of Commonmark, GH-flavored Markdown, and LaTeX).
For file-tagging, I would consider TMSU [0] instead of writing bespoke tools. (ideally we would just use xattrs, but the world isn't ready for that)
> Markdown flavor (Commonmark) doesn't actually support frontmatter.
That leads to mixing presentation logic (meta data, ToC) and content. When typesetting the Markdown, the ToC can be derived from headings and meta data should be isolated to avoid duplication. The following videos demonstrate some of the advantages to this approach:
I like Commonmark but I wish it would have been more opinionated. They chose to allow two ways to do everything[0].
Making * always be used for bold and _ always for italicizing is so much clearer, and some Markdown flavors (notably WhatsApp) do this. So you only have to do *haha* or _haha_, which also makes italic-bold more _*intuitive*_.
Similarly they should have gone with one style of headings, probably with #.
This frees up more visual clarity. Because you are no longer using *** for bold-italic, you can use that for lines, instead of both --- and ***.
This then further frees --- up to be used for tables.
Although I imagine there's a decent subset of people that uses the alternate style of doing headings === and the 'normal' way of doing lines ---, which would have killed adoption.
And good luck convincing people to adopt a new variant at this point. "Commonermark"? "Peasantmark"? "Rabblemark" actually sounds decent.
Edit: actually, having checked the discourse around it a bit more, Commonmark wasn't created as "one Markdown to rule them all", but rather as "Venn diagram markdown with the most overlap".
This is probably to support potential ambiguities and intraword emphasis e.g. underscore is a common pseudo-space so doesn't support intraword use but * does e.g.
is_not_italic
this*is*italic.
I recently implemented a commonmark parser for emphasis. Holy shit it's painful. I regret doing it but it became a battle I refused to surrender.
It's way harder than I expected because of the combination of the ambiguity of * and ** in multi-symbol runs which support infinite nesting even of the same type of emphasis. A given delimiter run could be many different permutations of plain text `*`, `em` and `strong` depending on context of other delimiter runs that might open and close sections, alongside other context like punctuation, intraword-ness, flanking and whether sums of runs can be factored by three!
I never expected "**" could be nested emphasis instead of bold so interpretation requires multiple passes to break down delimiter runs and match them up e.g.
***this* and that* -> *<em><em>this</em> and that </em>
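The intraword difference between `_` and `*` comes from the flanking rules alone, before any matching of runs happens. A sketch of just that first step, written from my reading of the CommonMark spec: the function name `delimiter_roles` is invented here, the punctuation test (Unicode categories P and S) is an approximation, and the matching pass with the multiple-of-three rule is deliberately left out.

```python
import unicodedata


def _is_space(ch: str) -> bool:
    # Start and end of the text count as whitespace.
    return ch == "" or ch.isspace()


def _is_punct(ch: str) -> bool:
    return ch != "" and unicodedata.category(ch)[0] in "PS"


def delimiter_roles(text: str, start: int) -> tuple:
    """Return (can_open, can_close) for the run of * or _ starting at text[start]."""
    marker = text[start]
    end = start
    while end < len(text) and text[end] == marker:
        end += 1
    before = text[start - 1] if start > 0 else ""
    after = text[end] if end < len(text) else ""
    left = not _is_space(after) and (
        not _is_punct(after) or _is_space(before) or _is_punct(before)
    )
    right = not _is_space(before) and (
        not _is_punct(before) or _is_space(after) or _is_punct(after)
    )
    if marker == "*":
        return left, right
    # Underscore: a run that is both left- and right-flanking (intraword)
    # may only open or close when punctuation sits on the relevant side.
    return (
        left and (not right or _is_punct(before)),
        right and (not left or _is_punct(after)),
    )


print(delimiter_roles("is_not_italic", 2))   # underscore between letters
print(delimiter_roles("this*is*italic", 4))  # asterisk between letters
```

An intraword `_` comes out as neither opener nor closer, while an intraword `*` can be both, which is exactly the ambiguity the matching pass then has to untangle.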
> This is probably to support potential ambiguities and intraword emphasis e.g. underscore is a common pseudo-space so doesn't support intraword use but * does e.g.
is_not_italic
this*is*italic.
That seems like a legacy spec mistake they had to adhere to. I'd expect
This is what I would have chosen too as it's natural for programmer sensibilities.
I can see it as a choice from the "plain text first" philosophy i.e. the things you typically write in plain text should not need escaping. My intuition pump is that you can copy-paste an email into .md without edits or surprising rendering.
As such, it's doomed to never satisfy everyone. Personally I never use intraword emphasis and I typically only have underscores in non-code names i.e. `this_is_normally_code`.
If you want strictness, use a linter or a pretty-printer that follows your preferred style. Adopting an opinionated parser means you can't lint or pretty-print input from those with different opinions (I do not like underscores for emphasis), and thus somewhat goes against the goals of TFA here:
> Markdown files are essentially plaintext with some extra syntax for common elements like sections, bullet points, and links. The format deliberately avoids precise control over display details like font selection. Following the rule of least power, I consider this limitation a feature.
One of my biggest ongoing frustrations has been MDX - a sort of markdown-and-JSX mixture whose spec is now in its third release and which has made very little effort to maintain compatibility with either CommonMark or itself. It is fairly strict and fairly elegant, and moving to a new version requires rewriting all previously-written documents to eliminate no-longer-supported syntax and re-training writers. Both of those things are miserable tasks; it has absolutely killed any tolerance I might have had for a stricter parser.
I'm surprised they didn't make a conversion tool to do MDX(old) -> AST -> MDX(new). The library support is there, but it doesn't look like anyone has created a tool to do it.
OP here. I'm pretty cavalier about which Markdown features I use. I employ them differently in various contexts - in plain Markdown files and on my blog, for instance.
But primarily, I treat them as plaintext files. If I needed to remove frontmatter at some point, it would be a simple script. For any feature specific to a particular Markdown flavor, preprocessing, or system - I expect it to work only as plain text elsewhere.
Also, thanks for sharing about TMSU! I was thinking about similar issues: for example, a photo can simultaneously be "from 2022," "from a conference," and "emotionally important." This doesn't work well with typical nested filesystems, where we need to decide on a single folder hierarchy rather than allowing us to filter based on need (as we can in SQL).
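As a rough idea of how small the "simple script" for removing frontmatter could be, here is a sketch. It assumes the common convention of a YAML block delimited by `---` on the very first line and closed by `---` or `...`; the function name `strip_frontmatter` is just for illustration, and other flavors may delimit frontmatter differently.

```python
def strip_frontmatter(markdown: str) -> str:
    """Return the document without a leading ----delimited frontmatter block."""
    lines = markdown.splitlines(keepends=True)
    if not lines or lines[0].rstrip() != "---":
        return markdown  # no frontmatter at the top of the file
    for i in range(1, len(lines)):
        if lines[i].rstrip() in ("---", "..."):
            return "".join(lines[i + 1:])
    return markdown  # opening delimiter never closed: leave the text untouched


doc = "---\ntitle: Notes\ntags: [a, b]\n---\n# Heading\n\nBody text.\n"
print(strip_frontmatter(doc))
```

A horizontal rule later in the file is left alone, because only a `---` on the first line is treated as the start of frontmatter.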
GitHub-flavored Markdown is so popular because you can really easily inline images. You don't have to worry about storing them, linking them correctly, and you can even paste to the Markdown field.
There is no elegant solution like this in actual Markdown.
For readmes, that's fine I guess. But I miss all those features when I'm writing proper blog posts, articles and documentation.
There are various hacky workarounds. But as soon as you start using bespoke markdown extensions, you're locked out of 95% of the markdown tooling out there. And everything feels so janky.
I'm looking forward to Typst's HTML output getting more mature. Typst is the only typesetting tool I've ever used that is both enjoyable to use and powerful enough for the kind of documents I want to write. It manages that by being a full on programming language. You can define variables and write custom functions for reused blocks. And there is an ecosystem of 3rd party typst packages. For a paper I wrote recently, my benchmarking tool spat the results into a JSON file. My document loaded that JSON data directly, and used the benchmarking results to populate charts and tables in the paper. It was crazy cool.
Does it work at all? Yeah. Does it work in my markdown editor? Probably not. Does it work in my markdown renderer? I don't know. Which version of Mermaid does it work with? Probably a different version of mermaid on every platform. Can I save my mermaid diagram to a file and link it instead of inlining the mermaid diagram inline? Who knows. Flip a coin everywhere mermaid is supported.
I tried pushing a markdown renderer to the limits once - only to find out that the markdown renderer I was using doesn't correctly implement commonmark, and my markdown file breaks with every other markdown renderer I've tried it with. To say nothing of the custom extensions I tried to use.
At this stage I'd rather keep my markdown files simple, and use something better for real documents. Something like typst.
I think they're mixing the GH web ui with the syntax. You can paste an image right into the editor and it does a really good job of inserting it right where you need to. It is really good UX that I miss when editing markdown locally. Obsidian also does a decent job, but not quite as smooth.
So I also do use gitlab quite a bit, but not as much recently. I went to compare. Gitlab actually does have a similar UX, though I'd give the Github one just a bit of an edge. It looks like the key difference is that github converts a pasted image to an html image tag, while gitlab uses markdown with the width/height brackets at the end.
Honestly, I think using an html image tag is the right way to go. I type in markdown all the time, and I have no problem making links. But markdown image syntax I have to double check each time or let the editor figure it out. HTML image tags, I find easier to remember and read than a markdown one. (But maybe that's because I learned HTML before markdown).
Also, as an old person, I will tell you that 1) I got my first personal computer in 1979 and have been trying to keep my bon mots archived ever since. I have tried a million things and have learned one key lesson: It's not really worth it.
I literally have a footlocker filled with old disk drives (remember, since 1979!) and I have never, ever gone back more than a few years, hell, more than a year.
Now that disks are big, I keep a lot of old stuff. I have, eg, screenshots dating back to 2015. Email before then. And so so much more.
I have never gone back more than a few years.
I will continue to archive because I must but, Old Person to Young People... Don't put too much effort into long term availability. It's not a good investment.
Similar perspective, but I'd offer a minor tweak. Just as before gmail people spent a lot of time "managing" their email. Gmail allowed us to stop bothering and just use search to find stuff among the now-messy volume of email. It works pretty well.
Similarly, I'd say save everything, but spend no time on organizing it, relying on search and ai/future technology to find what you want from among the mess.
Similar thought after many years of trying to be "organized". Search is what matters, make sure the tool or format of storage allows for easy searching.
> Also, as an old person, I will tell you that 1) I got my first personal computer in 1979 and have been trying to keep my bon mots archived ever since.
> I literally have a footlocker filled with old disk drives
I had a private mailing list for 15 years and had emails squirreled away across several hard drives. I archived them all to my Mac, under a directory under /, and tossed the disks. Was too broke to have another disk for backups.
Then Apple decided in an upgrade to trash everything not-Apple under /. Archives gone. No warning. Really amateur move by them. Grrr.
A triage system is essential to determining your archival strategy! We can produce information faster than we can produce information storage systems, so we need to be discerning! Random gibberish is less valuable than a screenshot you took in 2015, and that screenshot from a decade ago is probably less valuable than your tax returns or your treatise on the meaning of life! If it's worth keeping, it's worth putting some time into every few years to make sure it's copied somewhere.
You might reconsider your stance. As LLMs get increasingly more powerful at making sense of all kinds of data, these old archives can suddenly become incredibly useful.
Org was designed for being a PKM-like system. Markdown was designed for README.md. It doesn't mean that you can't use either for the other task, but Obsidian for example had to modify markdown significantly, to add support for tags, math, etc.
It's just a shame that org format works really well only in emacs.
The killer app for markdown would be a collaborative editor that displays the raw markdown and formatted markdown side-by-side and makes both sides editable. Tech people can use `#` and `*` on one side for formatting, product people can use normal text-editor buttons like "header1", "italics", etc.
I built this in college, but the code is lost. It was a week or so of hacking. I believe in you.
IIRC the trick was to get a pipeline for Markdown to HTML, render it into a WYSIWYG editor, then convert the HTML to an AST, and walk that to generate the markdown. I had to “format” both the markdown and html on each render (bidirectional round trip render) because parsing/gen wasn’t whitespace stable.
It's not collaborative, but this is what I love about Typora[0]. Click into a styled area and the styling becomes visible. Click out, and you just see the final styling.
HackMD already does this. It has a dual-pane view for raw markdown and formatted output, supports WYSIWYG editing, and allows real-time collaboration. Surprised no one mentioned it.
- [HackMD: Your Collaborative Markdown Workspace for Knowledge Sharing](https://hackmd.io/)
You can do that in IntelliJ. If there's a way to control a tab on a browser you could do that too. When I was writing my thesis, I would have `inotifywait` running on one side and when it detected the file had changed it would run the entire `pdflatex` + `bibtex` pipeline the 6 times or whatever it needed and Evince would hot-reload so I had a live preview. I'm sure a browser can do the same with some command.
Isn't the point of Markdown that you don't need to 'see what you are generating', you can just read it? I write Markdown every day, but I do it in a plaintext editor (with syntax highlighting). I have a keyboard shortcut to view a preview in my browser, but I don't see a great need to be viewing that preview all the time.
Edit: I was wondering how to enable this mode because it wasn't in my qownnotes. Here's how I found it: go to the help section, click find action, search for preview, and click on show note preview panel.
Now the caveat is that if you want to see it blitted, you have to save the file once for it to show automatically on the other side.
Maybe this can also be automated / I feel like there was some feature that did that as well, or at least it's very non trivial.
Edit 2: okay, so I just realized that qownnotes also ships with an autosave feature, which saves and thus also shows what you type in reader mode with something like a 0.5 second delay. And I think there is also a way to decrease / increase the autosave interval as well.
Dude, I didn't realize it, but qownnotes is so good!
It feels like a long term solution would be to use a markdown that is both easy to write (not RTF or XHTML), but has a defined grammar in some standard format (ex: EBNF). Most platforms/languages will have a parser and so you can whip up a "renderer" or converter trivially at any point.
The only markup I'm finding with a grammar is MediaWiki (sort of..)
MediaWiki has one of the worst syntaxes and formalisations out there.
I've been trying to render wikipedia pages on and off for more than 10 years and there is no independent parser covering the whole syntax and magic behaviour.
Markdown is already fragmented, that would just be introducing a new fragment, not a standard [insert that comic that everyone posts any time someone proposes a new standard].
The long-term solution is having whatever markdown grammar you want and converting it to a standard AST. Then anyone can create their own transformations of that AST to render whatever document they want, including a new markdown document potentially in a different grammar.
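As a rough illustration of the AST idea, here is a minimal sketch with a deliberately tiny node set (`Text`, `Emphasis`, `Paragraph`) and two independent renderers. The node names and functions are hypothetical, not taken from any existing tool; a real AST such as Pandoc's has far more node types.

```python
import html
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Text:
    value: str


@dataclass
class Emphasis:
    children: List["Node"] = field(default_factory=list)


@dataclass
class Paragraph:
    children: List["Node"] = field(default_factory=list)


Node = Union[Text, Emphasis, Paragraph]


def to_html(node):
    """Render the tree as HTML, escaping text content."""
    if isinstance(node, Text):
        return html.escape(node.value, quote=False)
    inner = "".join(to_html(child) for child in node.children)
    if isinstance(node, Emphasis):
        return f"<em>{inner}</em>"
    if isinstance(node, Paragraph):
        return f"<p>{inner}</p>"
    raise TypeError(f"unknown node: {node!r}")


def to_markdown(node):
    """Render the same tree back to one particular Markdown flavour."""
    if isinstance(node, Text):
        return node.value
    inner = "".join(to_markdown(child) for child in node.children)
    if isinstance(node, Emphasis):
        return f"*{inner}*"
    if isinstance(node, Paragraph):
        return inner + "\n"
    raise TypeError(f"unknown node: {node!r}")
```

The point of the design is that a parser for any grammar only has to produce these nodes once; every output format is then a separate, small function over the same tree.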
In my defence, the comment I was replying to mentioned "renderers" and "converters". Furthermore IMHO, any text editor is a Markdown reader. If you want it formatted as if it were "markup" then might I suggest converting to e.g. PDF using Pandoc and then using one of the many capable viewers.
I see your point, and maybe it's a matter of preference, but I really do use my text editor for reading Markdown. I wouldn't do the same for a Word doc or HTML without at least running it through a converter first.
100% agree. I've been using markdown for a few years after moving away from proprietary note taking apps, although this has led to me developing my own shorthand for many things in my notes. I have also been looking at a way to integrate a to-do list with my notes using some Python scripts.
So while my notes may rely on some personal scripts to get the most value out of them, I strongly value that they are still plain text and I can always move them into a new workflow if I need to.
Love Quarto. I write all my notes, presentations, blog posts, memos, etc in .qmd files. For non-technical stuff I use Obsidian to author (there is an extension which tells Obsidian to treat .qmd as ordinary markdown - ie ignoring the additional Quarto frontmatter and so on), then for everything else I use VS Code with the Quarto extension and just render out to the display format I need. I really appreciate that it’s built on Pandoc and it means I can just use one format and one set of tooling for everything.
I love markdown and use it for all my notes, however it really needs a native way to underline. I have been converting some older books and lectures to markdown and underline is used all the time.
Markdown is plaintext so you decide what it means. I personally write *italic* and **bold**, so I can use _underline_. Most Markdown to HTML converters would make the last example into italic, but you can customize many of them.
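To make the "you decide what it means" point concrete, here is a minimal sketch of an inline converter that follows that personal convention: `*x*` becomes italic, `**x**` bold, and `_x_` underline. It is a toy regex-based renderer written for illustration, not the extension API of any real Markdown library, and it ignores nesting and escaping rules that real parsers handle.

```python
import html
import re

# Order matters: match ** before *, so bold is not read as two italics.
# The underscore rule requires non-word characters on both sides,
# so identifiers like snake_case_name are left alone.
RULES = [
    (re.compile(r"\*\*(.+?)\*\*"), r"<strong>\1</strong>"),
    (re.compile(r"\*(.+?)\*"), r"<em>\1</em>"),
    (re.compile(r"(?<!\w)_(.+?)_(?!\w)"), r"<u>\1</u>"),
]


def render_inline(text):
    """Convert one line of text using the personal emphasis convention."""
    out = html.escape(text, quote=False)
    for pattern, replacement in RULES:
        out = pattern.sub(replacement, out)
    return out
```

With these rules, `render_inline("*italic*, **bold** and _underline_")` yields `<em>italic</em>, <strong>bold</strong> and <u>underline</u>`.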
Commonmark doesn't even mention "bold", "italic", and "underline". It just says "emphasis" and "strong emphasis". You can style it however you want.
Markdown isn’t really meant to be a universal markup format. Its primary goal is to document conventions of annotating plain text which keep the plaintext semi-consistent and readable.
So the purpose of _, * etc. is purely emphasis. If you need to represent something specific (bold, italic etc.) then that’s a job for the Markdown parser (or embedded HTML etc.). The result of the parser (HTML, etc.) will be less human readable, but actually able to specify formatting.
I agree that CommonMark could be extended, but I think the focus should be on *semantic* relevance rather than markup specification.
I love the Fountain spec for exactly this reason. I primarily began using it since it’s Markdown for screenwriting, but it has bold, underline, and italics along with the usual markdown stuff like comments etc. I find it to be by far the best way to write plaintext anything other than code. It’s also a bit more opinionated than Markdown which I highly prefer.
It might depend on what you want to do with the underline. Does it just indicate some kind of emphasis?
Could you use the convention in your documents that "_" is the underline delimiter? I know that the default is to render it as italic/emphasis but that is just a decision at rendering time. The semantics of emphasize/underline could easily overlap.
Of course if you want 3 levels of emphasis with bold, italic, and underline, then yes you need to look elsewhere.
Markdown isn't really a formatting tool. It is a way to structure text in the minimal way that a person would interpret it and a machine could render it.
>converting some older books and lectures...If anyone has a good solution I'm all ears.
I don't know if this helps you, but you said "older": in the 20th century world of typewriters--which had no italics--underlining was used as a substitute for italics. Transforming underlines to italics or going the other way was considered normal. You wouldn't use both in the same document.
There's notional underlining, which in typewritten documents is effectively the equivalent of italic, and there is typographical underlining, where "underline" means "there is a line under this element and/or text".
Both matter, and although Markdown flavours handle the notional case well, they fall down at this (and several other) typographical capabilities. Expressing text in a particular colour (or greyshade) is another example. It's possible to achieve this in practice through embedded HTML and/or CSS tags, or through augmented Markdown variants (Pandoc's Markdown can achieve some things CommonMark or DaringFireball Markdown cannot).
Ultimately though I find I need to switch to a more capable and consistent text-layout engine, usually LaTeX in my case.
Though for even quite large and modestly complex works, Markdown is either sufficient entirely or is useful in getting the work off the ground before switching to a more powerful option.
I said "typewriter", and there is only one kind of underline on a typewriter.
When converting old typewritten notes, you may find typewriter underlining, and it may represent italics. Markdown would be entirely sufficient to handle that.
The typewriter is a distinct and often intermediate writing device, standing between the markedly free-form though also variable handwriting and the much more standardised, though fairly developed, capabilities of typeset documents.
Unlike handwriting, typewriting is uniform (both in type and spacing), and markedly faster.
Unlike printing, typewriting is limited (generally a single typeface, with no variability in face, size, or styling (e.g., roman, bold, italic)), and requires further guidance to define specifically what result is desired where a typewritten work is not a document's final form.
It's worth noting that print itself differs from handwriting: when we write letters, forms and sizes vary, different writers often differ markedly in their own scripts, trained copyists may achieve a high level of standardisation, but that itself requires significant training and is achievable only by a limited number of artisans,[1] and letterforms themselves are not discrete but individually instanced each time they are created. With the advent of moveable-type printing,[2] letterforms became fixed, and with digital typesetting and computer fonts, each discrete shape or language-specific forms, say, the Roman A, Greek Α (alpha), and Cyrillic А (Azǔ/Азъ), are represented by distinct code points, but are nearly or entirely indistinguishable when rendered on-screen or in print. Further, over the history of both handwriting and typesetting, conventions have emerged for the textual representation of language, including spacing of words (versus scriptio continua), punctuation, paragraphs, page numbering, division of books into chapters, sections, parts, subsections, etc., of lists, tables, indices, (foot|end|side)notes, (parentheses), drop-caps, figure captions, cataloguing, etc., etc. All of those were inventions and conventions not inherent to language, writing, printing, document preparation, or archival and retrieval themselves. There's still considerable variation between different print language representations, e.g., many texts lack equivalents of italic, bold, or even upper/lower case letterform distinctions.
Typewriting itself occupies an interesting space, being a primary endpoint for some types of documents (correspondence, forms, and the like) and an intermediate form for others, most notably published articles and books. Given that typewriting has both capabilities and limitations which aren't present in typeset documents (whether moveable type or digital), it's not possible to draw a distinct correspondence between what a typewriter outputs and how that might be represented in a derived document. Yes, typewriters can generate underlines, but that might be represented in typeset print as italic, bold, underline, or something else entirely. In practice, editors' proofing marks were inserted (as handwritten notations) on a typed manuscript to indicate the preferred presentation, generally following the author's intent and/or the publisher's own house style conventions. See: <https://en.wikipedia.org/wiki/List_of_proofreader%27s_marks>.
________________________________
Notes:
1. An anecdote which sticks with me: among the 1001 Arabian Nights stories is one in which a character makes specific references not only to his literacy and scribal capabilities, but to the types of scripts he could produce. That is, this was a specific and valued skill of that age worth noting, even in a general-audience work.
2. As distinguished from earlier monoblock printing in which a whole work was engraved on a wood block or metal plate, typified by the early Pamphilus, seu de Amore, from which we have the word pamphlet, see: <https://www.etymonline.com/word/pamphlet>. Such monoblock prints were more like a photocopied handwritten letter, in which variations in individual letterforms are replicated, than they are standardised print obtained from moveable type or, more recently and familiarly, computer-based digital typesetting or Web documents, in which fonts are standardised and each given character is identical to all others matching that style.
Markdown is often used for, and was originally intended for, HTML generation. But that's not the only target which can be achieved, particularly with such tools as Pandoc, a document format interchange Swiss Army knife.
Relying on format-specific tags imposes stronger constraints on endpoints and/or increases complexity of your document build process.
Inline HTML is part of the standard Markdown syntax, not a complication. If your tool doesn't support HTML it doesn't support Markdown. The format can be so simple in the first place because it allows this escape hatch for anything non-trivial.
And tools like Pandoc can handle that just fine.
My point is that Markdown conversion tools, notably Pandoc, whilst they will incorporate inline HTML when generating HTML endpoints will not convert such inlined code to other endpoints, e.g., LaTeX, DocBook, OpenDocument, etc.
If you want those outputs to faithfully represent formatting, you either need to juggle multiple inline directives for each desired output format, or find some universal Markdown-based mechanism for achieving the same result.
I'd like to make clear that I'm familiar with Markdown; the fact that its original design intent was streamlining HTML generation; that inline "native" code is a feature, not a bug, but all the same a rather fraught one; and that actual practice has moved far beyond Markdown merely being used to generate HTML, least of all my own such practice.
I've discussed this situation previously on HN (ironically from the PoV of using LaTeX embeds within Markdown creating problems when attempting to generate other-than-LaTeX outputs), see: <https://news.ycombinator.com/item?id=29690056> (2021).
I wholeheartedly agree with this post. I also keep my notes in Markdown, I also have plenty of Python scripting around them, including automatic publishing of my website.
I use FSNotes today on macOS and iOS. Both apps are open source, both use well-structured .textbundle directories that separate Markdown content from JSON metadata and binary attachments. Synchronization happens through Git. It's a very powerful combination.
Ironically, I wrote a blog post some 8 years ago about this very subject. That blog post is now offline.
I appreciate the mention of FSNotes (and in turn textbundle). Somehow, despite trying tons of note taking apps and formats, I don't remember ever coming across mention of this format specifically.
My biggest beef with org mode and all of the markdown apps I've tried is the asset management problem. For me screenshots are almost as important as the text part of the note, and are usually strongly tied to a single note. I've taken to using apple notes at work just because it "solves" that well enough, but I'd really prefer to work in markdown/plain text (except for the images).
I’ve been self hosting linkding[0] and it has archiving capabilities. Saves in html not markdown but that’s basically the same thing. It’s been very useful and then I back the folder up to R2 for free. I enjoy knowing that if I find something I want to remember it won’t go away. Plus it works great for recipe sites because I don’t have to deal with ads.
Obsidian is the killer app for this. I spent a month converting around 3 years of security notes to markdown and now use obsidian to search/archive everything.
I've been doing this recently with every URL I've bookmarked over the last 15 years or so since I signed up for pinboard.in. http://spider.cloud has been really nice for crawling sites and saving the results as markdown. I plan on expanding it to transcribing youtube videos I've saved, github repos I've starred, HN posts, etc.
Ultimately I'm trying to index my "window" to the web as embedded content in a vector store. Not sure exactly what I'm going to do with it yet but I imagine it will be a component of some kind of personal agent system I can use to reference old info and help as a writing tool or as an "idea generator" of some kind. I'll likely end up not using most of it but you never know.
I've scraped about 10k markdown files which has created a ~10gb chromadb instance so far. Eventually I'll probably create separate collections based on domain, and filter down items that I care about more.
When it comes to web archiving, I've found that Markdown has some real limitations. Sure, it's great for basic text, but it struggles with things like embedded content and non-standard layouts. Try archiving a Twitter thread or an app-style webpage in Markdown, and you'll see what I mean. It just doesn't capture the full picture.
That's why I've come to prefer formats like webarchive, mhtml, or single HTML files for archiving. They're incredibly faithful to the original content - you get almost perfect rendering of the original page, complete with styling and layout. Plus, they can capture stuff behind paywalls or on logged-in pages, which is a huge plus.
The real challenge, though, isn't just about saving the content. It's about making that saved content useful. These archive formats are great for preservation, but they can quickly become a mess of unorganized files that are hard to search through or make sense of.
I think the key is finding ways to organize and interact with these archives more effectively. Things like full-text search across all your saved pages, the ability to add notes or highlights directly on the archived content, and smart tagging systems could go a long way. And it'd be really powerful if we could integrate these archives with other knowledge management tools we use.
I develop a tool called HamsterBase that seems to address a lot of these issues we've been discussing. It's a local-first app. That means all your data stays on your own device - no need to worry about your personal archives being stored on someone else's servers. There's no sign-up or registration required, which is refreshing in today's cloud-centric world.
> [Markdown] struggles with things like embedded content and non-standard layouts.
I don't share that experience. I typeset all these documents using Markdown with pandoc's div extension, transformed into XHTML, and then passed to ConTeXt:
From XHTML, the document is transformed into TeX statements, which opens a world of possibilities. In the following video, custom styling is applied to nested contents:
Alternatives for authoring PDFs include LaTeX or similar markup languages, or GUI-based tools.
For many works, Markdown is more than sufficient for producing book-length texts (I've done this numerous times myself, either authoring my own works or transcribing/modifying books for improved access/readability). Markdown's benefit is that it is extraordinarily lightweight, and removes overhead from the authoring process.
Even where one ultimately chooses to migrate from Markdown to some more capable authoring format, Markdown remains useful for creating the original rough form of the work. Complex elements (figures, formulae, tables, etc.) can be indicated and, after document conversion from, say Markdown to LaTeX, fleshed out in full.
With tools such as Pandoc (see my earlier comments on it), it's trivially possible to create multiple outputs (I usually refer to these as "endpoints") of a document. I've used Makefiles to drive this process, such that I write source in Markdown and generate partial or full HTML documents,[1] other LWMLs,[2] PDF, ePub, straight ASCII/UTF-8/Unicode text, word-processing formats, etc., as I want. The set of Markdown + Pandoc makes this trivial in ways that, say, LaTeX alone isn't entirely suited.[3]
It's of course possible to use another LWML as the source format. Markdown has its limitations, but it is the most widely known and implemented, and workarounds for its limitations are typically reasonable.
________________________________
Notes:
1. A partial HTML doc may be useful for dropping into a larger document, and doesn't require global HTML elements such as the <html>, <head>, <body> tags, or others such as <nav> or <aside> in most cases.
2. Lightweight markup languages such as bbCode, AsciiDoc, RST, MediaWiki, OrgMode, etc., etc., see: <https://en.wikipedia.org/wiki/Lightweight_markup_language>. Useful when inserting the document into systems based on these formats.
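The one-source, many-endpoints process described above can be sketched as a small Python driver instead of a Makefile. The endpoint table, the `build` directory, and the function names are assumptions for illustration; only the Pandoc options `-f`, `-t`, `-s` and `-o` are relied on, and PDF is left out because it additionally needs a PDF engine installed.

```python
import subprocess
from pathlib import Path

# Output extension -> Pandoc writer name. Extend as needed.
ENDPOINTS = {
    "html": "html",
    "epub": "epub",
    "docx": "docx",
    "txt": "plain",
}


def pandoc_command(source, ext, writer, out_dir="build"):
    """Build the argument list for converting one Markdown file to one endpoint."""
    target = Path(out_dir) / (Path(source).stem + "." + ext)
    return [
        "pandoc", str(source),
        "-f", "markdown",
        "-t", writer,
        "-s",              # standalone document rather than a fragment
        "-o", str(target),
    ]


def build_all(source, out_dir="build"):
    """Run Pandoc once per endpoint; raises if any conversion fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for ext, writer in ENDPOINTS.items():
        subprocess.run(pandoc_command(source, ext, writer, out_dir), check=True)
```

`build_all("book.md")` would then write `build/book.html`, `build/book.epub`, and so on. Dropping the `-s` flag for the HTML endpoint gives the partial document mentioned in note 1.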
I've landed on a workflow that I like a lot, and have shown to several people on my team. I use Google Drive for Desktop, which maps the G:\ drive to Google Drive. From there, I use VS Code for Markdown editing.
Google Docs now supports Markdown files, so if I need to convert the Markdown file to Word or PDF, I just open it in Docs and download it in the format I need. (Pandoc also works for this, as the author mentions). Converting HTML to Markdown can also be done in Docs: copy and paste the web page text into Google Docs, and download the file as Markdown.
For mobile, I use the DriveSync app to download my notes (Markdown) folder to my phone. Then I use Obsidian to open and edit the files.
My pain is that I couldn't find a decent md viewer for Windows: free, fast, simple, no distractions. Imagine notepad. I have to open my md files with VSCode or Notepad++ (nasty view).
EXACTLY. Open-source projects are rife with Markdown, but why? There are almost no VIEWERS for it. It's irritating as shit.
After years of looking, I finally ended up with Marked (for Mac). When you ask for a Markdown reader in any forum, you get nothing but suggestions for EDITORS, which happen to have a preview pane. But what is it "previewing," when everybody's just reading these things as plain text with the formatting codes embedded in them?
The underlying purpose of org-mode is to manage this issue (the text part). It doesn't solve it; instead it is a tool for managing the steadily increasing organizational complexity of an archive within an ever-evolving timeline. You reconfigure your archive's implicit schema? Well, now you're in a world of heavy editing. That's life. If you don't have a solid backup strategy, you are going to lose stuff. That's also life. Big binary blobs are a different, equally important problem.
Sure, keep your archive text in markdown (which one? a dumb person asks). But I'd recommend managing it with org-mode, it doesn't really care what format your text is in.
(Yeah I saw the footnote mentioning org-mode but that reads to me that org-mode's reference there is entirely about the markup flavor.)
Yeah, org-mode and by extension Emacs really help in this regard. Now that Emacs has been ported to Android I expect its usefulness to only increase.
Looking back I can't believe I considered just bookmarking a link enough to save it long-term. Sure, I lost a lot of cruft but there were some gems that in retrospect I'd have liked to still reference or look at today. Eh, hindsight is 20/20 as the saying goes.
I'm not surprised this post opens with a link to /r/DataHoarder. Hot take ... I understand the sentiment that you can't trust content on the web to be there forever, but there is also the other side of the argument which is: compulsively saving data is a waste of time and it introduces a cognitive overhead that you'd be better off without.
Idk, I think if it's worth saving it's worth saving and the only person who can determine if it's "worth it" is me.
I agree that some people have an obsession where they save data that isn't worth it, but r/DataHoarder is a great place with a lot of information on building and maintaining large data systems for hobbyists, regardless of what you actually store.
When I find a blog post/article I'm interested in, I save it to my laptop with the SingleFile extension and take quick notes as well: I write my thoughts about it in org-mode. It has a very low cognitive threshold and I can always read it back in the browser. I'm fine if not all the outbound links are still working; I'd just like to read it back sometimes.
This is almost exactly what I do too, though I also throw the link in the wayback machine so it's easier to share with others should the source go down (and to be courteous to any like-minded fellows who also wanted to see the content, but unfortunately came too late)
I'd disagree a bit there. I do something similar, saving interesting webpages, and it's really really nice being able to quickly search for something that I halfway remember a few months down the line.
I'm not saving everything, and it just gets stuffed unedited into a folder that I can search. Not too much in the cognitive overhead department.
If I read something and then remember it 5 years later because it becomes relevant, I want to be able to find it. It's not even that I will look at it, I just want to have the option if I want to.
> I understand the sentiment that you can't trust content on the web to be there forever
The thing is, people say this, and I am sure for some amount of content it's true. However, I eventually realised I have never had a single issue, when required, retrieving literally any piece of software or digital content after the fact from somewhere on the internet.
It's pretty much why I care so little about what happens to my Steam library when Gaben kicks it. If I get the urge to replay something in twenty years that I paid $3 for and it's suddenly gone, I'll just go find it elsewhere.
Once I had gotten pretty much everything I wanted from running a large-scale storage system (largely to learn the ins and outs of Linux/general storage concepts) I pretty much just gave it up. It's a lot of money to hold onto things that, at this point, I pretty much know I'll always be able to recover elsewhere. I'd rather someone else pay the electricity/drive cost for me.
There is no comparable alternative to the Internet Archive though. They've gotten involved in several lawsuits and their future is far from guaranteed. They're an incredibly important organization, but I think it's too important of a project to be limited to one organization, or even one country or region of the earth. A solar flare could destroy a lot of history.
I don't know that the economics of having multiple Internet Archive-like organizations is currently feasible (I imagine getting funding for one of them is hard enough), but even a partial offline mirror hosted someplace else would be nice. Maybe to save space they could take the oldest version of a page, the newest, and the midmost version timewise, discarding all other versions. They could also heavily compress images, video and audio to save storage space (would increase processing costs, but if willing to throw out quality, could compress quickly and still save a bunch of space. E.g. downscale all videos to 480p and use veryfast preset and CRF 28 with ffmpeg. Even 240p is a lot better than nothing. A pixelated form of history is better than no history.)
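A minimal sketch of the two space-saving ideas above: keeping only the oldest, newest, and time-wise middle snapshot of a page, and building an ffmpeg command for a fast, lossy 480p re-encode. Snapshot times are assumed to be plain numbers (e.g. Unix timestamps), the function names are hypothetical, and the choice of `libx264` is an assumption, since the comment names only the preset and CRF value.

```python
def thin_snapshots(timestamps):
    """Keep the oldest, the newest, and the snapshot nearest the midpoint in time."""
    ordered = sorted(set(timestamps))
    if len(ordered) <= 3:
        return ordered
    oldest, newest = ordered[0], ordered[-1]
    midpoint = oldest + (newest - oldest) / 2
    # Choose among the interior snapshots only, so three distinct ones survive.
    middle = min(ordered[1:-1], key=lambda t: abs(t - midpoint))
    return [oldest, middle, newest]


def downscale_command(src, dst, height=480):
    """Build an ffmpeg argument list for a quick, low-quality re-encode."""
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{height}",   # keep aspect ratio, force an even width
        "-c:v", "libx264",             # assumed encoder
        "-preset", "veryfast",
        "-crf", "28",
        dst,
    ]
```

For example, `thin_snapshots([100, 0, 40, 60, 10])` keeps `[0, 40, 100]`. The command list can be passed to `subprocess.run` as is; audio is left at ffmpeg's defaults here.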
I have the opposite, and sadly accelerating, experience. Information is removed or altered constantly, and usually I cannot find anything on the Internet Archive either. For whatever reason the Wayback Machine, for my use cases, is nearly always blank. But I look for semi-obscure publications and statements from (nation)states and organizations.
An archivist saves everything, because it's impossible to anticipate what future historians will need or want; they are working from a context you cannot access. You can winnow down a large dataset to its relevant subset, but you can't study what wasn't preserved.
I agree, but I also think that data hoarding is similar to regular hoarding. All of this content and information seems like it could be useful, invaluable even. It's a problem in a world where we have excess to sort through all that information and only focus on what's important right now.
For me, everything swirls in a lovely vortex towards org-mode.
- Literate Programming, tangle/weave
- Export to DocX, PDF, HTML
- Org-Roam
- Time Management.
Markdown is a wonderful format (I use it all the time) but it's very narrow and I don't think it's appropriate for storing general 'things we might publish'. You lose a lot of semantics just replacing html with markdown. For a general purpose markup language, I don't think we can beat XML.
I agree, if the purpose is archival (versus manually reading it with your eyeballs) then you will want a format that (1) can capture information in a somewhat self-documenting way and (2) is in a form that can be easily parsed and converted into a newer format.
Keep it in the format appropriate to the information. If just the text is important, Markdown is probably fine. If the structure is important, keep it in HTML. If the layout is important, PDF. You wouldn't store a Gutenberg bible in Markdown, would you?
(Don't answer that - there's always one asshole who would)
Mediawiki. Let's balance durability against functionality.
MW gets you a massively scalable doc store that does not need much room. Most MW instances are MySQL/MariaDB backed and the schema etc is very well described.
Keep it plain text for "notes" but a MW will be easily discoverable for quite some time from now.
The biggest problem with Markdown is the baffling lack of plain VIEWERS. Not editors with a preview pane, but straight-up viewers that render Markdown for reading.
There are very, very few. I use Marked 2, for Mac. I don't even remember if I ever found another one. It's irritating as hell, because pretty much every open-source project's read-me files are in Markdown. Why, when there is no viewer anywhere near as ubiquitous as those for PDF... despite Markdown being much simpler and better understood?
Makefile-driven development. Run "make pdf" as needed (looped in a shell one-liner if you prefer, or driven by an event watcher). A decent PDF viewer will either reload the document automatically on change or can be readily reloaded. The Suckless PDF viewer zathura is among the former; I've also used, variously, xpdf (slightly grungy these days but an old reliable) or macOS's Preview app.
This lets you work on the doc in a terminal window and have the (reasonably constantly updated) formatted output in a PDF viewer.
Short documents will render virtually instantly. I've not had long renders until documents extend to at least several chapters worth of text if not book-length, and even then it's a matter of a few seconds in most cases. Highly-formatted texts may of course take longer.
Agreed. I use QLMarkdown [0] for preview in Finder and this markdown-viewer extension [1] for in-browser preview. But a standalone, native app would be pretty nice too.
Thanks. Try Marked; it is exactly what I wanted. It's $14, but I decided to reward whoever did what apparently no one else (including me) can be bothered to do.
I have tried MarkText, which is yet another editor with a viewer but it's free.
I use Markdown Viewer, in Chrome: I'd bet there are multiple equivalents in Firefox and Safari. Well. I don't know what Safari's extension universe is like but it seems likely.
I actually use a VS Code plugin for this called Dendron. It is in the same vein as Obsidian or Notion, Markdown-based, and just runs in VSC. Very handy, since plain text works wonderfully in a git repository.
The trick with burning optical media is the disks themselves can physically fail with time. I have a huge archive of various burned media from the early 00's and a number of them have developed literal holes in the material over the years. If these holes hit data tracks, the files on those tracks are lost. If you're burning to optical media, you should probably be checking them regularly for degradation.
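One way to do that regular check is to record a checksum for every file when the disc is burned and re-verify later. The following is a minimal sketch using only Python's standard library; the function names and the idea of keeping the manifest as a dictionary (which you would store somewhere other than the disc itself) are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return the relative paths that are missing, unreadable, or changed."""
    root = Path(root)
    damaged = []
    for rel, expected in manifest.items():
        try:
            ok = sha256_of(root / rel) == expected
        except OSError:
            # Missing files and read errors both count as damage.
            ok = False
        if not ok:
            damaged.append(rel)
    return sorted(damaged)
```

Run `build_manifest` against the mounted disc right after burning, save the result (for instance with `json.dump`), and call `verify` with it on each later check. An empty list means every file still reads back bit-for-bit.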
In my experience hard drives fail, USB sticks fail, and regular hard drives fail. It has been many years since I have had any involvement, but backup tapes probably have issues as well, and the rapid production of new tapes and new formats is an issue already.
I don't have any data to evaluate what the best choice is. SSD drives?
No matter what technology is picked, at some point, to preserve the data, it needs to be migrated to whatever comes down the line.
Honestly, so far if you can afford the up-front investment for the space you need, and the ongoing power costs, a NAS with a RAID array (or similar redundancy scheme) that can tolerate more than one drive failure at a time is probably the best long term archival storage. Spinning rust disks in my experience rarely completely fail without warning so you can usually catch and replace failing media before data loss occurs. Additionally if you don't, I've found that recovering data from failed HDD is also usually "easier" and "cheaper" for most values of both compared to other media storage (admittedly with no experience with recovering tape media)
Beware for if you continue down this road you will end up sitting in class taking notes in markdown… yes I did do this… I am afraid I am beyond salvation
Can relate to that sentiment. What I'm still looking for is a simple solution that lets me use simple local files (eg plaintext/markdown; csv or single-page HTML would also be fine) as a backend for a web app (with login, obviously). Basically, I want to have something like a todo.txt that lives on my machine (in the folder that syncs to my cloud storage) but that I can also edit when I'm on my phone. Like using Google sheets as a backend but with a local file.
I just access my markdown files from Obsidian through nextcloud. When I'm on my phone I just use a simple markdown editor, when I'm on my PC I use Obsidian.
You don't have to use any plugins.
You can put your obsidian vault anywhere you like, e.g. in a folder that is synched by nextcloud.
I use a git repo for this, which works fine also on mobile.
With the AI coding tools getting better each day, I'm starting to wonder why I would spend any time researching what's out there for what I want, instead of just using an AI coding agent to put something together in 10 mins, and forget about it.
It's getting easier and faster to have AI build something that solves my exact problem. Maybe not perfect, but OK.
I'm sure it'd be super quick to build it with the help of an LLM once I know what setup I want. I actually used ChatGPT once for ideation. I'd need to look it up again, but from what I remember none of the proposed solutions were convincing.
I save everything interesting. I have a data folder with letters a-z in it.
Something interesting might be saved in HTML or PDF under data/a/ai/programming
Folders have a problem because the same thing could be saved under data/p/programming/ai
Indeed, I also realized that bookmarks are worthless in the long run. When choosing a note taking / knowledge management app, the main decision point was whether it has a Firefox extension that can capture a web page into markdown and automatically save it into my notes.
I used to use Joplin, lately switched to Obsidian. Both offer this functionality.
What drove you to switch to Obsidian? I'm considering it myself and have been playing around with Obsidian the past couple of days after about 4 years of Joplin, 2 of which with a self hosted Joplin Server.
I'm tired of basic features being missing and extensions breaking because they're no longer maintained, and basic features like linking between notes while writing a note not being built in.
Joplin worked great when I spent 8h+ daily on my laptop (computer with big screen and physical keyboard).
During my long sabbatical, I wanted to take notes on my phone, a LOT. Joplin sucks at that, clumsy, non-user friendly android client.
Tried obsidian (first on mobile) and it is superb. I had to install a couple of extensions (S3 sync, "Ink" for drawing with a pen), and it just works. It's so good, I sometimes even edit tables on my phone. With Joplin, note taking on my phone was just dumping thoughts in random formats to it and later fixing it on my desktop.
Like Joplin, Obsidian also has a Firefox extension to capture a web page into markdown.
So after a couple days of trial, I realized that all the features Joplin has, obsidian has too, with a much better (and snappier) UX on both my Linux desktop and Android. The only thing I wish is that it were Open Source. But oh well, I'm not dogmatic about that anymore.
@OP super inspiring. I'm working on a universal capture SDK, a bit like rewind.ai that would make it easy to grab information from screen and then store as Markdown etc. Have you ever wished for something like that?
My favorite is WikiCreole, with (subset of) HTML as a close second. MD is alright, but too restrictive as a general purpose format for knowledge bases and such.
> Even self-hosting isn't foolproof - your content can vanish when you forget to pay for hosting
I know what they mean - "running applications that you maintain and deploy yourself, on hardware/platforms that you don't" - but this is strange, to my eyes. If it's running on someone else's hardware (whatever it is), then it's not self-*hosted*, surely? It's self-owned, but not self-hosted?
When it comes to web archiving, I've found that Markdown has some real limitations. Sure, it's great for basic text, but it struggles with things like embedded content and non-standard layouts. Try archiving a Twitter thread or an app-style webpage in Markdown, and you'll see what I mean. It just doesn't capture the full picture.
That's why I've come to prefer formats like webarchive, mhtml, or single HTML files for archiving. They're incredibly faithful to the original content - you get almost perfect rendering of the original page, complete with styling and layout. Plus, they can capture stuff behind paywalls or on logged-in pages, which is a huge plus.
The real challenge, though, isn't just about saving the content. It's about making that saved content useful. These archive formats are great for preservation, but they can quickly become a mess of unorganized files that are hard to search through or make sense of.
I think the key is finding ways to organize and interact with these archives more effectively. Things like full-text search across all your saved pages, the ability to add notes or highlights directly on the archived content, and smart tagging systems could go a long way. And it'd be really powerful if we could integrate these archives with other knowledge management tools we use.
It's an interesting problem space, and I think there's a lot of room for innovation in how we approach personal web archiving and knowledge management.
But there's one reason I won't be using it as my main driver for markdown files: I can't open files that are not in a vault. I have markdown files everywhere on my drive. And I don't want to make the entire drive a vault (for various reasons).
Obsidian configurable as...
1) my default file handler for markdown files
2) capable of opening and saving markdown files in any location on my PC
...would be sweet. (From my research, it can't do these currently.)
AsciiDoc's fine. So is reStructuredText. In some ways they're both a lot better than Markdown, even though I think MD's surely easier to learn and use. But the one clear advantage MD has over the others is its ubiquity. If a tool works with formatted text, it almost certainly supports MD. It might also support the others, but if so, that's just a bonus.
I don’t like Markdown because I don’t want to remember a syntax. Most normal people I know have no idea what Markdown even is. The idea that I can’t see my formatting when I’m writing is annoying. What’s the point? It’s like MD is writing code and to “see” the document, you have to run it. In other words what you see is not what you get — you only see what you get when “previewing.”
> The format deliberately avoids precise control over display details like font selection. Following the rule of least power, I consider this limitation a feature. For contrast, consider PDF - a format so powerful that it can run Doom.
Just pick a more relevant format for contrast to see that this is no feature! It's not like PDF is the only alternative
Markdown is great... But you know what else is great? OPML. We need more tooling around OPML. It's not being used nearly as much as it should be for Personal Knowledge Management.
I've used or built more personal knowledge/task/project management tools than I care to list over the years, and adopted various methods along the way. I've ended up in a place where I know what I need day to day: A place to dump my ideas, plans, reflections, and tasks, along with methods of processing and accessing all this data. It's hard to compete with plain text files, a notebook, and structured daily/weekly rituals that process these notes into actionable tasks, meeting agendas, and project docs. It's not that time consuming, it's super effective, and most importantly, it's infinitely and freely customizable because instead of software, you just have checklists and processes to manually follow. You can execute GTD without touching a computer: https://gettingthingsdone.com/wp-content/uploads/2014/10/Wee...
I can get by just fine with that system, but a handful of months back I started wanting software again. Reminders, task wrangling, workflows around taking meeting notes, taking and processing transcripts of talking through ideas, automated daily and weekly checkins with summaries, project work logs, managing lists of things to talk about with people, the list goes on....
Same reasons I have always reached for software, and the same reasons I wrote my own system a few times over. But this time I had some new thoughts:
- I want this to have a chance at being my last system. For that, I must be able to read/edit the data without special software. I settled on committing to building software that interfaces with folders of Markdown files exclusively. I could use Obsidian to cover any gaps and get work done immediately–I don't need my software to do it all right away.
- I want to own as much of my recorded activity/thoughts as possible, so I can drop it into new AI models, giving them a ton of context about me and what I'm up to, and avoid getting vendor locked to OpenAI.
- I want ubiquitous access to the system, which means it's gotta be easily used from a phone.
7k LOC later and I've got a Telegram bot with a plugin architecture and a pile of plugins that implement everything I've described and more. The plugin arch means there's a defined interface and every new piece of functionality never ends up with more than 1k LOC in a file. My objective was to structure the project specifically so I could avoid the pitfalls of AI generated code as projects get large. Everything isolated with well defined integration points.
I chose Telegram because they have a great API, supporting custom keyboards for quick actions, audio input for taking voice memos that my system transcribes, and reaching out to me with reminders/requests on whatever device I'm on.
The result is thousands of messages that have translated into a nicely organized Obsidian vault. Couldn't be happier and think there's a chance I'll live with this thing for the foreseeable future–and I can always swap out the interface away from Telegram, build a proper frontend, or drop it altogether and be left with my Markdown files.
If anyone is interested I'd be happy to share what I've got. Just my private project that I'm reaping a lot of benefit from.
Wow, this actually sounds quite neat. I'm already using markdown and being able to make my notes more interactive and useful via chat-like interface with automations would be great. Especially as I want to use AI systems on top to make the accumulated knowledge as useful as possible. Please share more
I wish. If you live in any country that uses more than ASCII, then certainly not since forever. I mean, just for my language there were 7 different encodings (according to Wikipedia, possibly more) before the Unicode era. When you want to read these it's a solvable problem, but it is still extra work to deal with it. Now that we have UTF-8 as the de facto standard, it is much better, but there are still problems. Like when you use Japanese and it gets displayed as Chinese (the same characters have different glyphs depending on language).
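To make the encoding problem concrete, here is a minimal Python sketch. The sample sentence and the choice of candidate encodings are my own assumptions, not something from the comment above; the point is only that identical bytes read differently depending on which legacy encoding you guess.

```python
def decode_guesses(raw: bytes, encodings: list[str]) -> dict[str, str]:
    """Decode the same bytes under several candidate encodings."""
    return {enc: raw.decode(enc, errors="replace") for enc in encodings}


# Hypothetical sample text; any sentence with non-ASCII letters will do.
original = "Žluťoučký kůň"
raw = original.encode("cp1250")  # pretend this is an old file of unknown origin

for name, text in decode_guesses(raw, ["cp1250", "iso8859_2", "utf-8"]).items():
    # Only the encoding the file was written in gives back the original text.
    print(f"{name:10} {text!r}")
```

Without out-of-band knowledge of the encoding, a reader has to try candidates like this and judge the result by eye, which is the extra work being described.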
The other major alternative to consider is RTF. I standardised on that about 10y ago, planning for a 30y horizon. It is a more complex format than Markdown, still text-based, but biased towards WYSIWYG presentation and editing, while Markdown is usually not WYSIWYG in the editor. Both formats suffer from a lack of standardisation, though Markdown seems to have more problems in practice - I've never had an issue caused by RTF incompatibility. Both formats are very widely supported and it can reasonably be expected that this will continue.
I prefer RTF for two main reasons:
* I can't express simple formatting such as "make this text red" in Markdown. No, I don't mean "accentuate this text and leave the decision on how it looks to someone else", I really do mean "make this text red". I do a lot of public speaking, and I want to keep to certain conventions which are easy to read fast.
* Most of the time I am writing text, not reading a version after it goes through a formatter, so I prefer to see it formatted on screen. That's really a limitation on Markdown editors, but it's almost universal so for my point of view, it counts.
I remember, a long time ago, having to try and parse text out from RTF documents and I would rather have every internal organ pecked out by sparrows than try and deal with that abomination again.
> still text-based
The point about Markdown is that you don't need a complex parser[0] to be able to interpret the files - a simple human can read a Markdown file and get the gist of what is going on. RTF has a whole mess of control strings and codes going on that get in the way of a simple visual understanding.
> I can't express simple formatting such as "make this text red" in Markdown.
You can add raw HTML to Markdown which would accomplish this (at the expense of moving away from the "simple plain text", obvs.) There's a bunch of Markdown parsers with extensions for this kind of thing though (from what I can see, they're all not entirely "simple" either which is a shame.)
RTF is extremely easy to parse if you assume it isn't using an ancient code page. This is a pretty safe assumption since almost no modern software even supports all the code pages in the RTF standard. Word is far more likely to store Arabic in default Windows-1252 with \u_____-specified code points than to use code page 708 or something.
Is it as easy as Markdown? No. But it should take about an afternoon for a halfway competent programmer to make an rtf2txt utility from scratch that correctly handles > 90% of the RTF files you're likely to encounter in practice.
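For a sense of scale, a rough Python sketch of such an rtf2txt pass might look like the following. It is an illustration under stated assumptions, not a complete reader: it assumes a Windows-1252 document, skips only a handful of metadata destinations (the `SKIP_DESTINATIONS` list is my own partial pick), and ignores tables, fields and embedded objects. The function name `rtf_to_text` is made up for this example.

```python
import re

# Destinations that hold metadata rather than body text (a partial list).
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}

TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"  # control word, optional number, optional space
    r"|\\'([0-9a-fA-F]{2})"     # \'hh: one byte in the document code page
    r"|\\([^a-zA-Z])"           # control symbol such as \\ \{ \} \~ \*
    r"|([{}])"                  # group open or close
    r"|[\r\n]+"                 # raw line breaks are not text in RTF
    r"|(.)",                    # any other character is literal text
    re.DOTALL,
)


def rtf_to_text(rtf: str, codepage: str = "cp1252") -> str:
    out = []
    stack = []        # (skipping, uc) saved at each "{"
    skipping = False  # True while inside a metadata destination
    uc = 1            # fallback characters that follow each \uN
    pending = 0       # fallback characters still to discard
    for match in TOKEN.finditer(rtf):
        word, param, hexbyte, symbol, brace, char = match.groups()
        if brace == "{":
            stack.append((skipping, uc))
        elif brace == "}":
            if stack:
                skipping, uc = stack.pop()
        elif word:
            if word in SKIP_DESTINATIONS:
                skipping = True
            elif word == "uc" and param:
                uc = int(param)
            elif skipping:
                continue
            elif word == "u" and param:
                # \uN carries a signed 16-bit value; map it to a code unit.
                out.append(chr(int(param) % 0x10000))
                pending = uc
            elif word in ("par", "line"):
                out.append("\n")
            elif word == "tab":
                out.append("\t")
        elif symbol == "*":
            skipping = True  # \* marks an optional destination; drop it
        elif hexbyte or symbol or char:
            if pending:
                pending -= 1  # fallback for a preceding \uN, discard it
            elif not skipping:
                if hexbyte:
                    out.append(bytes.fromhex(hexbyte).decode(codepage, "replace"))
                elif symbol == "~":
                    out.append("\u00a0")
                elif symbol in ("\\", "{", "}"):
                    out.append(symbol)
                elif char:
                    out.append(char)
    # Join UTF-16 surrogate pairs produced by two consecutive \uN escapes.
    joined = "".join(out)
    return joined.encode("utf-16", "surrogatepass").decode("utf-16", "replace")


sample = r"{\rtf1\ansi{\fonttbl{\f0 Helvetica;}}\f0 Caf\'e9 costs \u8364?5\par Done}"
print(rtf_to_text(sample))
```

The tokenizer is one regular expression and the state is a small stack, which is roughly why the "afternoon" estimate seems plausible for the common cases; the long tail of the format is where the real cost would be.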
Interesting. Why not HTML? Browsers have native basic WYSIWYG editing built in, and almost every screen we look at is nowadays HTML, including the code editor.
On a `<div contenteditable=true>` element, calling `document.execCommand('bold')` will make selected text bold in WYSIWYG mode. See https://jsfiddle.net/z0umpb3x/12/ for the concept.
The main idea here is that I don't want to know any syntax like RTF, nor I want to use any tools.
> Why not HTML? Browsers have native basic WYSIWYG editing built in
Can you point me to a page in Firefox that I can use offline to WYSIWYG edit a hypertext document without needing to use the developer console to edit raw HTML?
You can use this bookmarklet to create new pages:
then save them as HTML files. You can use another bookmarklet to turn existing pages or files into editable pages.
You can't easily take text and make it red that way, only edit the existing content.
You can add another bookmarklet for that.
I believe it's technically possible, but I don't know if there's a tool.
Btw, genuinely asking, what's your offline scenario? Long loong time ago, I used TiddlyWiki for a brief time period for my temp offline scenarios. Single local HTML, can edit, but not WYSIWYG.
> what's your offline scenario?
Keep my data out of corporate hands without needing to run a website myself (eg, a WYSIWYG site) while not having to use some different app.
If you want to do that offline, then you shouldn't use a browser but a HTML editor. There are plenty https://en.m.wikipedia.org/wiki/List_of_HTML_editors
I like this approach and I’m thinking I might agree. I’ve been starting with Markdown, but find myself going to HTML. At some point I end up wanting some specific thing that isn’t supported by Markdown so I need to leave Markdown for embedding a video or adding a tiny bit of interactivity (like show/hide), so sticking with HTML throughout is nice.
HTML would be a third alternative, yes. However historically it has been less stable in its definition, and also less stable in round-trip editing by a WYSIWYG editor - it has never been an aim for HTML to be friendly to editing at that level. In contrast there are several editors which handle RTF without disrupting it for other editors. I use TextEdit (on the Mac), but of course it is a common word processor format. RTFD is far less well supported, but I don't normally need images. I'm not familiar with any WYSIWYG editors for HTML which represent embedded images as though they were part of the same file.
> The main idea here is that I don't want to know any syntax like RTF
Nor do I. What I want is a file format which is long-term viable, and which I can edit in a WYSIWYG editor. The underlying file format being text-based is useful as a recovery mechanism if RTF ever becomes unsupported, but it's not intended that one would edit it manually.
> , nor I want to use any tools.
While I do not want to edit a file format manually.
I'm not the parent commenter but I'm going to guess because HTML is horrible at WYSIWYG.
If your editing tool changes or if you switch editors, they will all botch your HTML anywhere you make edits.
RTF is basically "it just works," very much like "Microsoft Word Light."
HTML isn't "horrible" at WYSIWYG, it just isn't WYSIWYG. Your comment is like saying "JSON is horrible at for loops"—it's just a misunderstanding of the tool.
Your complaint seems to be about the editors, not HTML.
Unless you're willing to write your own editor, that's not really a practical distinction
HTML WYSIWYG editors are not going to create worse tag soup than WYSIWYG editors for other formats. If you care about the precise HTML code, then WYSIWYG is the wrong tool.
Makes sense. I was thinking that browsers have native support for WYSIWYG editing and most browsers/apps are Chromium based; so how the browser handles HTML formatting should be stable for a single person scenario even across apps.
RTF isn’t a better option than MD, plaintext is. We start with the premise that we are talking about plain text, as in plain text, texts. But pretty soon we are into colours, then maybe images and videos, and then what all not.
But MD still is closest to plaintext because in most cases MD (or a similar scheme), even when not rendered, can be read easily in its plaintext formatting.
And again, my requirement includes colours - also bold, underline, some font support, left indent etc. It's not sufficient that a file format be long-lived - it also has to encode the information that I need. MD is poor at that, plain text doesn't even attempt it.
Honestly plaintext has a pretty bad record as a format, with all of the UTF-16, Latin-2, Shift JIS, and so on. I’d suggest paper, or for true archival storage, parchment and stone.
That's like saying paper has a bad record as a format, with all of the punch cards, barcodes, and so on.
Honestly paper, parchment and stone have a pretty bad record as a format. They either degrade or even if preserved can become unreadable (e.g. Linear B) without a surviving linguistic community. Even when the text survives and is rediscovered, translations produced with century-scale gaps often lose subtext or connotation that would still have registered across a narrower gap.
I'd suggest an unbroken chain of oral transmission...
> They either degrade
Much more slowly and gracefully than any digital medium we have concocted so far (save for core rope memory, maybe).
> even if preserved can become unreadable (e.g. Linear B) without a surviving linguistic community [...] translations produced with century-scale gaps often lose subtext
This pertains to the message and not the support; also, I'll take missing subtext over missing text any day of the week, thank you very much.
If you’re going oral tradition, you might as well use repeatedly-compressed JPEG instead. Same quality but less space used by the storage devices.
I’d suggest stone tablets
I’m glad you found something that works for you, but it’s barely a text format from my vantage point. Parsing it is downright user hostile.
Which is why you use one of the many editors that understand RTF, not edit it directly. The only time you would need to touch the underlying format is if RTF becomes unsupported, which is unlikely.
Parsing Markdown is downright everything-hostile. Computers and humans alike.
Markdown is an attempt to codify formatting conventions that were already long in use for newsgroups and email. It’s meant to be human readable first and only incidentally convertible to actual markup. If semantics are a mess and nobody can agree on how you lay out a table, I’m kind of ok with that. Plain text is the presentation format, and if you want precise conversion, drop some raw HTML or LaTeX in there. LaTeX especially has a long history of being dropped into emails between people who needed to talk about math.
> Markdown is an attempt to codify formatting conventions that were already long in use for newsgroups and email.
Markdown was never really about codifying anything existing. Sure, it took a lot of guidance from existing conventions, but it also invented quite a bit, ignored quite a bit, and compromised quite a bit because of wanting to mostly sit atop HTML.
The most notable deviation from custom is its link syntax, which is simply bad. Where any delimiter was necessary, the longstanding custom was to delimit with angle brackets; and Markdown did allow <https://example.com/>, but its text-with-hyperlink syntax of [text](href) is highly confusing, leads to frequent errors, and makes the huge mistake of using as its delimiters characters that are valid in URLs. (In current spec terms, parentheses are in the set of URL code points <https://url.spec.whatwg.org/#url-code-points>.) This has led to all sorts of trouble. My guess is that angle brackets weren’t used because of the potential for confusion with HTML tags, though there would have been no real parser ambiguity, and autolink syntax kinda messed that argument up anyway.
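As a small illustration of the parenthesis problem, here is a Python sketch of a helper that emits an inline link whose destination is wrapped in angle brackets, a form CommonMark accepts for destinations containing parentheses or spaces. The helper name `md_link` is hypothetical, and I am only assuming CommonMark behaviour here; other Markdown flavours may treat this form differently.

```python
def md_link(text: str, url: str) -> str:
    """Build an inline Markdown link that survives parentheses in the URL."""
    # Escape characters that would end the link text early.
    label = text.replace("\\", "\\\\").replace("[", "\\[").replace("]", "\\]")
    # Angle brackets delimit the destination, so they cannot appear raw inside it.
    dest = url.replace("<", "%3C").replace(">", "%3E")
    return f"[{label}](<{dest}>)"


print(md_link("Lisp", "https://en.wikipedia.org/wiki/Lisp_(programming_language)"))
```

That the safest way to write a link is to fall back on angle brackets anyway rather supports the point that they would have been the better delimiter from the start.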
Then its image syntax of ![alt text](src), that’s just plain nonsense.
I’d say that reStructuredText (from a little earlier) matched existing conventions slightly better than Markdown, in general. And it was much more sane, as a language, especially when you wanted to extend it.
> It’s meant to be human readable first and only incidentally convertible to actual markup.
If it were incidental, you wouldn’t use it—just go informal and write what suits you, it will be nicer.
No, the only reason for Markdown is so that you can write HTML with a nicer syntax. It’s not incidental at all. Sure, that syntax is intended for human-readability and -authoring, but it’s essential to Markdown that it is actual markup, poorly-defined though it be as a family of lightweight markup languages <https://en.wikipedia.org/wiki/Lightweight_markup_language>.
> Plain text is the presentation format
Do you know how many README.md files there are on GitHub that are essentially HTML soup? It’s a bit sad, really. It doesn’t invalidate the intention of what you’re saying, but I would say that Markdown is noticeably less about plain text being the presentation format than it was twenty years ago.
Can you expand on why you think reading plain markdown is hostile? For the vast majority of my Obsidian vault, I could open the markdown file in Notepad and it would be just about as readable as it is in the editor. Of course, you lose the visual effect of the styling, but it's still perfectly legible.
You don't attempt to parse it. You don't try to understand what **foo* bar* means. You just assume things next to * are bullets or emphasized and things next to # are headings.
When you want to ensure that things are rendered in a particular way, Markdown might not be the best solution—it's for content, not style. In many cases (not all!), style doesn't matter much. It's better to give each device (desktop app, ebook reader, printer) flexibility of display according to user needs and preferences (dark mode, contrast, font size, fonts for dyslexia or personal taste).
HTML gives more control - from plaintext through basic formatting to building pretty much anything. This freedom can be a slippery slope - first comes red text, then custom fonts, then some SVG showing the font, and before you know it, you're building an app.
(That's one thing I love about Hacker News - they kept it simple by not allowing any formatting in posts.)
PDFs are great when you need an exact visual presentation. And yes, I keep plenty of things in PDFs. However, it is not a format I convert to too often—usually, I prefer the flexibility to reflow text for different screen sizes. YMMV.
As for RTF - I'm not sure about its current niche. It's a Microsoft proprietary format with TeX-like syntax, so for custom applications you might need to write your own parser rather than using standard tools like XML parsers. XML feels much cleaner to work with compared to the jungle of other formats we used to live in.
That said, if it works for you, great! I'm just curious why, in this case, you prefer them to a subset of HTML.
> Markdown ... is for content, not style
A drawback of markup languages, including TeX, is that they intermingle content and style instructions.
> RTF ... is a Microsoft proprietary format
RTF originated with Microsoft, but it's widely supported by scores of word processors and editors. I generally use TextEdit on a Mac, which supports plain text, RTF and RTFD directly. Hence you don't write your own parser: you use the editors already available. It has been around since 1987 with a high degree of stability in the core functionality, so it's a reasonable expectation that those editors will be around long-term, but if they disappear, you can get the text out.
> It's better to give each device (desktop app, ebook reader, printer) flexibility of display
For some purposes, perhaps. Not for mine: if I set text to be red, it should be red on the screen, and red on the paper. Not "emphasised" - red.
PDF is not intended as an editable format, so not relevant here.
FYI: Hacker news does allow some formatting.
https://news.ycombinator.com/formatdoc
HTML is also worth considering. It's not quite as nice to write in as Markdown, but it's still pretty reasonable to hand write (or even read if it comes to it).
Is this sarcasm? or are we coming back full circle unintentionally?
It's not sarcasm. HTML is and always has been a decent format for hand-authoring documents. It's not as nice to use as markdown, but it does support more advanced use cases like coloured text.
And if you want an "archival" format that will stand the test of time it's pretty good (so long as your text encoding is readable, HTML will be).
I agree with you, but I will say that Markdown (or at least, all the major "standards" for it) support interleaving HTML both inline and at the block level. Unless you are writing a very bombastic document or want to avoid a rendering step, I'd argue that you'd be best served by just writing CommonMark with embedded HTML.
I'd argue that, at that point, you might as well write HTML. I find it easier to work with one format at a time rather than two intermixed and a typical Markdown document is barely more readable than its equivalent HTML document, while the latter gives you a lot more flexibility and semantic accuracy.
Versus markdown, it is ungreppable with all the escape codes/entities, so you have to use special tooling for search.
Avoid character entities—store your HTML files as UTF-8 and just embed everything.
I guess that mostly works but lots of my notes have <>'s. I guess <code> covers most of that
When I created the SSG I use for my blog, I tested a bunch of different markup formats for the article content, but in the end I settled on plain-old raw HTML. Over the years, I've needed to render so many different specific visual elements in the blog entries that if I wasn't actually using raw HTML, I'd probably be using raw HTML snippets all over the place anyway. For another medium I might choose a different markup format, but when it comes to writing content for the web, with HTML I know exactly what I'm getting every time.
I don’t quite follow the first bullet point. If you are doing public speaking, your client is rendering the slides, right? Won’t you be the one interpreting “accentuate?” (So you can make it fit whatever convention you want).
Speaking, not producing slides. This is stuff I read aloud, and the formatting conventions such as red text are ones which fit that environment.
And yes, a rendering pass is a drawback.
The fact that there is a “rendering the slides” step is the problem.
Every once in a while you encounter an opinion on the Internet that you disagree with so deeply that it's not even clear how you would start to counter it.
"Use RTF instead of Markdown" is one of those.
> It is a more complex format than Markdown
Depending on what one means by “Markdown” and how completely you want to support RTF, I might argue RTF is way simpler than Markdown. Certainly it’ll be more consistent.
Markdown is far better suited for human authoring; RTF is a data format, not a markup language. And you can cut a lot of corners with Markdown and it mostly won’t bite you. But if you want to do things properly, Markdown gets rather complicated, whereas RTF stays comparatively simple after the initial parsing/serialising cost, especially if you exclude newer features like XML markup.
> Markdown is far better suited for human authoring
A lot of people are assuming that you would edit RTF manually. There's no earthly reason to do that.
Oh, I’ve definitely edited RTF manually, when I wanted fine control, or to strip out certain formatting but not all. But it’s definitely not intended for manual editing; and it’s hopeless if you want to go beyond ASCII, as you can’t (from memory) just use UTF-8 or UTF-16 encoding, you need to escape non-codepage characters as \uNUMBER?, where NUMBER is the signed decimal representation of a UTF-16 code unit, which is probably the most absurd escape representation I’ve ever encountered. (The character after the number, here and normally ? these many years, represents the fallback character to use if Unicode is not supported. \uc0 can disable that so it’d be just \uNUMBER. It’s cases like that that really show how RTF was designed as an internal file format for things like Word, back before they fully supported Unicode. RTF has not aged very well as a file format.)
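For anyone curious what that escape looks like in practice, here is a minimal Python sketch of the encoding direction, going by the description above (signed decimal UTF-16 code units, followed by a fallback character). The function name `rtf_escape` is made up, it handles only character escaping (line breaks would still need `\par`), and it assumes the default of one fallback character per escape.

```python
def rtf_escape(text: str, fallback: str = "?") -> str:
    """Escape text for the body of an RTF document using \\uN escapes."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)  # RTF syntax characters need a backslash
        elif ord(ch) < 128:
            out.append(ch)         # plain ASCII passes through unchanged
        else:
            units = ch.encode("utf-16-le")
            for i in range(0, len(units), 2):
                unit = int.from_bytes(units[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536  # the parameter is a signed 16-bit number
                out.append(f"\\u{unit}{fallback}")
    return "".join(out)


print(rtf_escape("€5 {ok} 😀"))
```

Characters outside the Basic Multilingual Plane come out as two escapes with negative numbers, one per surrogate, which gives a fair idea of why nobody wants to write this by hand.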
Works in Commonmark:
Markdown editors like Typora somewhat close this gap in functionality.
I also prefer RTF. I have tried to live with markdown's limitations several times. I simply cannot.
It is standard enough but I really, really wish static websites would support it. The fact that it won't creates a huge split in my universe.
The nextcloud Markdown editor shows a live view while you edit... It's called "Notes"
You know what has live view all the time? RTF. What’s wrong with WYSIWYG?
Hard to parse when read as text. Heavier. Cannot read it through SSH. Need something else than your terminal.
As someone who lives in terminal emulators (yes, even at home), markdown is more WYSIWYG than most other formats (I still think org-mode is better though)
> Cannot read it through SSH. Need something else than your terminal.
Well, for reading you can always use pandoc with stdout output and pipe it to a pager. There are also fancier options like https://github.com/Orange-OpenSource/pandoc-terminal-writer (disclaimer: I contributed a small PR a long time ago)
nothing wrong with it. I wasn't evangelizing, just noting that at least some developers also want dual-pane markdown editing.
The part about seeing the results immediately can be done easily, e.g. via plugins in text editors like Notepad++. You have a left pane with the markdown text and a right pane with the rendered result, updated after each keystroke.
I've tried 2 different plugins; they are a bit finicky re formatting (I do a lot of bullet points in my various todo lists and a small thing can break a whole block of them) and the output looks a bit different, but I still prefer them.
Heck, I often prefer reading markdown itself these days just for myself since the structure is already there and visible, I've learned to see those formatting characters as already sort of rendered formatting so not even looking into rendered pane that much.
* Embedded images
markdown can do embedded images with data urls. It works really well, the data gets all appended at the end of the document
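A minimal Python sketch of that pattern, assuming reference-style image definitions collected at the end of the file: the body uses a short `![alt][id]` reference and the base64 payload lives in the definition. The helper name and the `pixel` id are made up for the example, and whether a given renderer accepts `data:` URLs is something to check per tool.

```python
import base64


def image_definition(ref_id: str, data: bytes, mime: str = "image/png") -> str:
    """Return a reference definition line that embeds the image as a data URL."""
    payload = base64.b64encode(data).decode("ascii")
    return f"[{ref_id}]: data:{mime};base64,{payload}"


# Hypothetical usage: the bytes would normally come from reading an image file.
image_bytes = b"\x89PNG\r\n\x1a\n"  # just the PNG signature, as a stand-in
document = "\n".join([
    "Here is the diagram: ![diagram][pixel]",
    "",
    image_definition("pixel", image_bytes),
])
print(document)
```

Keeping the definitions at the bottom means the prose stays readable as plain text, with the unreadable blob out of the way.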
Good point. I just noticed that RTF really looks a lot like LaTeX syntax - LaTeX was clearly inspired by it.
> LaTeX was clearly inspired by it
By initial release dates, LaTeX (1984) and TeX (1978) predate RTF (1987).
Just a note that the most common Markdown flavor (Commonmark) doesn't actually support frontmatter. The author is presumably using Obsidian-flavored Markdown (which is a mixture of Commonmark, GH-flavored Markdown, and LaTeX).
For file-tagging, I would consider TMSU [0] instead of writing bespoke tools. (ideally we would just use xattrs, but the world isn't ready for that)
[0]: https://tmsu.org/
> Markdown flavor (Commonmark) doesn't actually support frontmatter.
That leads to mixing presentation logic (meta data, ToC) and content. When typesetting the Markdown, the ToC can be derived from headings and meta data should be isolated to avoid duplication. The following videos demonstrate some of the advantages to this approach:
* https://www.youtube.com/watch?v=cjQ-dle-tAE
* https://www.youtube.com/watch?v=3QpX70O5S30
See my editor's screenshots for more details:
https://keenwrite.com/screenshots.html
My FOSS editor is a cross-platform CLI and GUI application that replaces the shell scripts developed in my blog series about typesetting Markdown.
https://dave.autonoma.ca/blog/2019/05/22/typesetting-markdow...
I like Commonmark but I wish it would have been more opinionated. They chose to allow two ways to do everything[0].
Making * always be used for bold and _ always for italicizing is so much clearer, and some Markdown flavors (notably WhatsApp) do this. So you only have to do *haha* or _haha_, which also makes italic-bold more _*intuitive*_.
Similarly they should have gone with one style of headings, probably with #.
This frees up more visual clarity. Because you are no longer using *** for bold-italic, you can use that for lines, instead of both --- and ***.
This then further frees --- up to be used for tables.
Although I imagine there's a decent subset of people that uses the alternate style of doing headings === and the 'normal' way of doing lines ---, which would have killed adoption.
And good luck convincing people to adopt a new variant at this point. "Commonermark"? "Peasantmark"? "Rabblemark" actually sounds decent.
Edit: actually, having checked the discourse around it a bit more, Commonmark wasn't created as "one Markdown to rule them all", but rather as "Venn diagram markdown with the most overlap".
[0]https://commonmark.org/help/
> Making * always be used for bold
This is probably to support potential ambiguities and intraword emphasis e.g. underscore is a common pseudo-space so doesn't support intraword use but * does e.g.
I recently implemented a commonmark parser for emphasis. Holy shit it's painful. I regret doing it but it became a battle I refused to surrender. It's way harder than I expected because of the combination of the ambiguity of * and ** in multi-symbol runs which support infinite nesting even of the same type of emphasis. A given delimiter run could be many different permutations of plain text `*`, `em` and `strong` depending on context of other delimiter runs that might open and close sections alongside other context like punctuation, intraword-ness, flanking and whether sums of runs can be factored by three!
https://spec.commonmark.org/0.31.2/#emphasis-and-strong-emph...
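To give a feel for the flanking rules and the "multiple of three" rule mentioned above, here is a rough sketch of how delimiter runs might be classified. It is a simplification of the linked spec section, not the commenter's parser: it ignores backslash escapes, code spans and links, and it approximates the spec's whitespace and punctuation classes with `str.isspace()` and the Unicode P/S categories. The names `delimiter_runs` and `blocked_by_rule_of_three` are hypothetical.

```python
import re
import unicodedata


def _is_space(ch: str) -> bool:
    # Start and end of the text count as whitespace.
    return ch == "" or ch.isspace()


def _is_punct(ch: str) -> bool:
    return ch != "" and unicodedata.category(ch)[0] in "PS"


def delimiter_runs(text: str):
    """Return (run, can_open, can_close) for every run of * or _ in text."""
    runs = []
    for m in re.finditer(r"\*+|_+", text):
        before = text[m.start() - 1] if m.start() > 0 else ""
        after = text[m.end()] if m.end() < len(text) else ""
        left = not _is_space(after) and (
            not _is_punct(after) or _is_space(before) or _is_punct(before))
        right = not _is_space(before) and (
            not _is_punct(before) or _is_space(after) or _is_punct(after))
        if m.group()[0] == "*":
            can_open, can_close = left, right
        else:
            # Underscores get extra restrictions, which rules out intraword use.
            can_open = left and (not right or _is_punct(before))
            can_close = right and (not left or _is_punct(after))
        runs.append((m.group(), can_open, can_close))
    return runs


def blocked_by_rule_of_three(opener, closer) -> bool:
    """True if the spec's multiple-of-3 rule forbids matching these two runs."""
    either_both = (opener[1] and opener[2]) or (closer[1] and closer[2])
    a, b = len(opener[0]), len(closer[0])
    return either_both and (a + b) % 3 == 0 and not (a % 3 == 0 and b % 3 == 0)
```

For `snake_case` the single `_` run can neither open nor close, while in `foo*bar*` the first `*` can do both, which is exactly the asymmetry discussed above. This only covers the classification step; the matching pass over the delimiter stack is where most of the pain lives.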
I never expected "**" could be nested emphasis instead of bold so interpretation requires multiple passes to break down delimiter runs and match them up e.g.
> This is probably to support potential ambiguities and intraword emphasis e.g. underscore is a common pseudo-space so doesn't support intraword use but * does e.g.
That seems like a legacy spec mistake they had to adhere to. I'd expect to work and for _ literal usage to requireThis is what I would have chosen too as it's natural for programmer sensibilities.
I can see it as a choice from the "plain text first" philosophy i.e. the things you typically write in plain text should not need escaping. My intuition pump is that you can copy-paste an email into .md without edits or surprising rendering.
As such, it's doomed to never satisfy everyone. Personally I never use intraword emphasis and I typically only have underscores in non-code names i.e. `this_is_normally_code`.
If you want strictness, use a linter or a pretty-printer that follows your preferred style. Adopting an opinionated parser means you can't lint or pretty-print input from those with different opinions (I do not like underscores for emphasis), and thus somewhat goes against the goals of TFA here:
> Markdown files are essentially plaintext with some extra syntax for common elements like sections, bullet points, and links. The format deliberately avoids precise control over display details like font selection. Following the rule of least power, I consider this limitation a feature.
One of my biggest ongoing frustrations has been MDX - a sort of markdown-and-JSX mixture whose spec is now in its third release and which has made very little effort to maintain compatibility with either CommonMark or itself. It is fairly strict and fairly elegant, and moving to a new version requires rewriting all previously-written documents to eliminate no-longer-supported syntax and re-training writers. Both of those things are miserable tasks; it has absolutely killed any tolerance I might have had for a stricter parser.
I'm surprised they didn't make a conversion tool to do MDX(old) -> AST -> MDX(new). The library support is there, but it doesn't look like anyone has created a tool to do it.
> ... new version requires rewriting all previously-written documents to eliminate no-longer-supported syntax and re-training writers
Wonder if any of the LLMs could do that for you?
I would just like to see Obsidian adopt MDX. I feel like there is a whole class of interactivity that could be easily implemented that way.
OP here. I'm pretty cavalier about which Markdown features I use. I employ them differently in various contexts - in plain Markdown files and on my blog, for instance.
But primarily, I treat them as plaintext files. If I needed to remove frontmatter at some point, it would be a simple script. For any feature specific to a particular Markdown flavor, preprocessing, or system - I expect it to work only as plain text elsewhere.
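As a rough idea of what such a "simple script" could look like, here is a sketch that assumes YAML-style frontmatter fenced by `---` on the first line and a closing `---` or `...` line. The function name `strip_frontmatter` is illustrative only, and a file that legitimately starts with a `---` horizontal rule would confuse it.

```python
def strip_frontmatter(text: str) -> str:
    """Remove a leading frontmatter block; return other text unchanged."""
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return text  # no frontmatter at the top of the file
    for i in range(1, len(lines)):
        if lines[i].strip() in ("---", "..."):
            # Drop the block and any blank lines that directly follow it.
            return "".join(lines[i + 1:]).lstrip("\n")
    return text  # opening fence without a closing one: leave the file alone
```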
Also, thanks for sharing about TMSU! I was thinking about similar issues: for example, a photo can simultaneously be "from 2022," "from a conference," and "emotionally important." This doesn't work well with typical nested filesystems, where we need to decide on a single folder hierarchy rather than allowing us to filter based on need (as we can in SQL).
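The "filter based on need" idea can be sketched with a plain many-to-many table of paths and tags. This is only an illustration using Python's built-in `sqlite3`; the table name `file_tag` and both function names are invented here and say nothing about how TMSU stores its data.

```python
import sqlite3


def build_index(tagged: dict) -> sqlite3.Connection:
    """tagged maps a file path to an iterable of tags."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE file_tag (path TEXT, tag TEXT, PRIMARY KEY (path, tag))")
    db.executemany(
        "INSERT INTO file_tag VALUES (?, ?)",
        [(path, tag) for path, tags in tagged.items() for tag in tags])
    return db


def files_with_all(db: sqlite3.Connection, tags) -> list:
    """Return paths that carry every one of the given tags."""
    wanted = sorted(set(tags))
    marks = ",".join("?" for _ in wanted)
    rows = db.execute(
        f"SELECT path FROM file_tag WHERE tag IN ({marks}) "
        "GROUP BY path HAVING COUNT(DISTINCT tag) = ? ORDER BY path",
        (*wanted, len(wanted)))
    return [row[0] for row in rows]
```

A photo tagged `2022`, `conference` and `important` then shows up under any combination of those tags, with no single folder hierarchy to commit to.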
Re TMSU, a scan through the bug list turns up this:
https://github.com/oniony/TMSU/issues/264
Thanks for sharing tmsu, I had never seen that before.
Though I wonder what benefits it has over just plain symlinks?
The only drawback of Markdown is images.
GitHub-flavored Markdown is so popular because you can really easily inline them. You don't have to worry about storing them, linking them correctly, and you can even paste to the Markdown field.
There is no elegant solution like this in actual Markdown.
I would add tables to that. Obsidian has some nice extensions to make working with tables easier, but it always feels janky.
Markdown is also missing:
- Diagrams
- Math
- Any custom blocks - like Figures, algorithms, image boxes, etc
- Numbered chapters / sections (Eg Chapter 1, Appendix A, etc).
- Semantic references
For readmes, that's fine I guess. But I miss all those features when I'm writing proper blog posts, articles and documentation.
There are various hacky workarounds. But as soon as you start using bespoke markdown extensions, you're locked out of 95% of the markdown tooling out there. And everything feels so janky.
I'm looking forward to Typst's HTML output getting more mature. Typst is the only typesetting tool I've ever used that is both enjoyable to use and powerful enough for the kind of documents I want to write. It manages that by being a full on programming language. You can define variables and write custom functions for reused blocks. And there is an ecosystem of 3rd party typst packages. For a paper I wrote recently, my benchmarking tool spat the results into a JSON file. My document loaded that JSON data directly, and used the benchmarking results to populate charts and tables in the paper. It was crazy cool.
Mermaid charts! Supported by github and very readable as plaintext
https://github.blog/developer-skills/github/include-diagrams...
I put that in the "hacky workarounds" category.
Does it work at all? Yeah. Does it work in my markdown editor? Probably not. Does it work in my markdown renderer? I don't know. Which version of Mermaid does it work with? Probably a different version of mermaid on every platform. Can I save my mermaid diagram to a file and link it instead of inlining the mermaid diagram inline? Who knows. Flip a coin everywhere mermaid is supported.
I tried pushing a markdown renderer to the limits once - only to find out that the markdown renderer I was using doesn't correctly implement commonmark, and my markdown file breaks with every other markdown renderer I've tried it with. To say nothing of the custom extensions I tried to use.
At this stage I'd rather keep my markdown files simple, and use something better for real documents. Something like typst.
Markdown should not do diagrams/math etc.
People just started doing all kinds of extensions and for me that’s silly.
> Markdown should not do diagrams/math etc.
Fine. But in that case, markdown is the wrong tool for blogging and documentation. I want to write rich and interesting content. Markdown is anaemic.
i do believe obsidian support all those (well maybe not all)
but thats added 'on top' of basic markdown, so when you open your markdown file in some other program it looks weird
More importantly, when that specific plugin stops being maintained/updated, you’ll have Markdown files which can’t be properly presented/read.
Yeah, what people like about Markdown is basically regular text.
The formatting is interesting, but it’s not revolutionary or anything.
> The only drawback of Markdown is images.
> GitHub-flavored Markdown is so popular because you can really easy inline them.
I'm not sure what you mean. GitHub-flavored Markdown has pretty much exactly the same image syntax as every other Markdown flavor.
I think they're mixing the GH web ui with the syntax. You can paste an image right into the editor and it does a really good job of inserting it right where you need to. It is really good UX that I miss when editing markdown locally. Obsidian also does a decent job, but not quite as smooth.
Sounds like powerful lock-in for Github. How could such a project ever decamp to Gitlab or Codeberg?
So I also do use gitlab quite a bit, but not as much recently. I went to compare. Gitlab actually does have a similar UX, though I'd give the Github one just a bit of an edge. It looks like the key difference is that github converts a pasted image to an html image tag, while gitlab uses markdown with the width/height brackets at the end.
Honestly, I think using an html image tag is the right way to go. I type in markdown all the time, and I have no problem making links. But markdown image syntax I have to double check each time or let the editor figure it out. HTML image tags, I find easier to remember and read than a markdown one. (But maybe that's because I learned HTML before markdown).
Sounds like powerful UX others could also support. It's all web standards.
Also, as an old person, I will tell you that 1) I got my first personal computer in 1979 and have been trying to keep my bon mots archived ever since. I have tried a million things and have learned one key lesson: It's not really worth it.
I literally have a footlocker filled with old disk drives (remember, since 1979!) and I have never, ever gone back more than a few years, hell, more than a year.
Now that disks are big, I keep a lot of old stuff. I have, eg, screenshots dating back to 2015. Email before then. And so so much more.
I have never gone back more than a few years.
I will continue to archive because I must but, Old Person to Young People... Don't put too much effort into long term availability. It's not a good investment.
Similar perspective, but I'd offer a minor tweak. Just as before gmail people spent a lot of time "managing" their email. Gmail allowed us to stop bothering and just use search to find stuff among the now-messy volume of email. It works pretty well.
Similarly, I'd say save everything, but spend no time on organizing it, relying on search and ai/future technology to find what you want from among the mess.
Similar thought after many years of trying to be "organized". Search is what matters, make sure the tool or format of storage allows for easy searching.
I'm a big fan of general search.
> Also, as an old person, I will tell you that 1) I got my first personal computer in 1979 and have been trying to keep my bon mots archived ever since.
> I literally have a footlocker filled with old disk drives
I had a private mailing list for 15 years and had emails squirreled away across several hard drives. I archived them all to my Mac, under a directory under /, and tossed the disks. Was too broke to have another disk for backups.
Then Apple decided in an upgrade to trash everything not-Apple under /. Archives gone. No warning. Really amateur move by them. Grrr.
A triage system is essential to determining your archival strategy! We can produce information faster than we can produce information storage systems, so we need to be discerning! Random gibberish is less valuable than a screenshot you took in 2015, and that screenshot from a decade ago is probably less valuable than your tax returns or your treatise on the meaning of life! If it's worth keeping, it's worth putting some time into every few years to make sure it's copied somewhere.
You might reconsider your stance. As LLMs get increasingly powerful at making sense of all kinds of data, these old archives can suddenly become incredibly useful.
Hmmm. I see the use in this...
For me, everything swirls in an enjoyable vortex towards org-mode.
- Literate Programming, tangle/weave
- Export to DocX, PDF, HTML
- Org-Roam
- Time Management.
Several things mentioned above are day to day. I think spectacular things are often made up of collections of useful everyday things.
Org mode is great for plain text info storage. Just difficult to use if you haven't joined the church of emacs
Why? I use it in VSCode and vim all the time.
Org was designed for being a PKM-like system. Markdown was designed for README.md. It doesn't mean that you can't use either for the other task, but Obsidian for example had to modify markdown significantly, to add support for tags, math, etc.
It's just a shame that org format works really well only in emacs.
The killer app for markdown would be a collaborative editor that displays the raw markdown and formatted markdown side-by-side and makes both sides editable. Tech people can use `#` and `*` on one side for formatting, product people can use normal text-editor buttons like "header1", "italics", etc.
Someone is sort of building this on Obsidian: https://screen.garden
There’s also https://system3.md/relay
Thanks for the plug :)
yoooo this is sweet. Been looking for this for awhile.
I built this in college, but the code is lost. It was a week or so of hacking. I believe in you.
IIRC the trick was to get a pipeline for Markdown to HTML, render it into a WYSIWYG editor, then convert the HTML to an AST, and walk that to generate the markdown. I had to “format” both the markdown and html on each render (bidirectional round trip render) because parsing/gen wasn’t whitespace stable.
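The HTML-to-Markdown half of that round trip can be sketched with the standard library alone. This is a heavily reduced illustration, not a reconstruction of the lost code: it walks parser events instead of building an explicit AST, handles only headings, paragraphs, emphasis and strong, and normalises whitespace on the way, which is the same kind of "formatting" step the comment describes. The class and function names are invented.

```python
import re
from html.parser import HTMLParser


class HtmlToMarkdown(HTMLParser):
    INLINE = {"em": "*", "i": "*", "strong": "**", "b": "**"}

    def __init__(self):
        super().__init__()
        self.out = []

    @staticmethod
    def _heading_level(tag):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            return int(tag[1])
        return 0

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif self._heading_level(tag):
            self.out.append("#" * self._heading_level(tag) + " ")

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "p" or self._heading_level(tag):
            self.out.append("\n\n")  # blank line after each block

    def handle_data(self, data):
        text = re.sub(r"\s+", " ", data)  # collapse source whitespace
        if not self.out or self.out[-1].endswith("\n"):
            text = text.lstrip()  # no leading space at the start of a block
        if text:
            self.out.append(text)


def html_to_markdown(html: str) -> str:
    parser = HtmlToMarkdown()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip() + "\n"
```

Because the output is always normalised the same way, running the result back through a Markdown-to-HTML step and this function again should give the same text for this small subset, which is what makes a two-way editor feasible at all.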
It's not collaborative, but this is what I love about Typora[0]. Click into a styled area and the styling becomes visible. Click out, and you just see the final styling.
[0] https://typora.io/
Typora is really a gem in the Markdown universe.
HackMD already does this. It has a dual-pane view for raw markdown and formatted output, supports WYSIWYG editing, and allows real-time collaboration. Surprised no one mentioned it.
- [HackMD: Your Collaborative Markdown Workspace for Knowledge Sharing](https://hackmd.io/)
You can do that in IntelliJ. If there's a way to control a tab on a browser you could do that too. When I was writing my thesis, I would have `inotifywait` running on one side and when it detected the file had changed it would run the entire `pdflatex` + `bibtex` pipeline the 6 times or whatever it needed and Evince would hot-reload so I had a live preview. I'm sure a browser can do the same with some command.
Here's a comparison chart of collaboration/teamwork plugins in Obsidian.
https://system3.md/observatory/categories/collaboration-team...
Peerdraft, Relay, and Screen Garden are all based on CRDTs, and Obsidian is also currently working on native collaboration.
(disclaimer: I work on Relay)
Unfortunately, all of these are not self-hostable. I wish there was something like Nextcloud for Markdown Collaboration.
Relay now supports self hosting the collaboration server. It works with tailscale out of the box.
We still have a centralized service for authentication and authorization, but if you self host it is impossible for us to access your files.
Sounds like an obsidian plugin.
Someone is building that: https://screen.garden
Notepad++ and VSC both have plugins for this.
They’re fine.
Notion had the potential to be like this, but instead it's garbage. You can put markdown in, but you can't get it out.
You might appreciate this: https://antmicro.github.io/myst-editor/
Why does it need to be side by side? Just let each client choose WYSIWYG or raw.
When writing in text mode so that you can see what you are generating. Wouldn't really need it the other direction though.
Isn't the point of Markdown that you don't need to 'see what you are generating', you can just read it? I write Markdown every day, but I do it in a plaintext editor (with syntax highlighting). I have a keyboard shortcut to view a preview in my browser, but I don't see a great need to be viewing that preview all the time.
I think the point of Markdown is to signal that you're someone who uses things like Markdown.
Markdown is ubiquitous even in boring corporate technical writing processes nowadays. Nothing hip about it.
I think this feature is already in qownnotes.
Surprised nobody mentioned qownnotes.
Edit: I was wondering how to enable this mode because it wasn't in my qownnotes. Here's how I found it: go to the help section, click find action, search for preview and click on show note preview panel.
Now the caveat is that if you want to see it blitted, you have to save the file once to see it automatically show up on the other side. Maybe this can also be automated / I feel like there was some feature that did that as well, or at least it's very non trivial.
Edit 2: okay so I just realized that qownnotes also ships with an autosave feature which saves and thus also shows what you type in reader mode with like a 0.5 second delay. And I think there is also a way to decrease / increase the autosave interval as well
Dude, I didn't realize it, but qownnotes is so good!
This is a very common mistake wasting half your screen, it can't be no killer app
Or just a plain VIEWER.
It feels like a long term solution would be to use a markdown that is both easy to write (not RTF or XHTML), but has a defined grammar in some standard format (ex: EBNF). Most platform/languages will have a parser and so you can whip up a "renderer" or converter trivially at any point.
The only markup I'm finding with a grammar is MediaWiki (sort of..)
https://www.mediawiki.org/wiki/Markup_spec
Even Djot doesn't seem to have one. Weird..
MediaWiki has one of the worst syntaxes and formalisations out there. I've been trying to render wikipedia pages on and off for more than 10 years and there is no independent parser covering the whole syntax and magic behaviour.
https://www.mediawiki.org/wiki/Alternative_parsers
There is only parsoid, developed for the visual editor and that took pretty much a decade to build with much pain and suffering.
This is not the answer.
It also doesn't help that in practice most wikis install at least some parser extensions.
Yeah, sorry.. i didn't mean to endorse MediaWiki. It did look kinda ugly as well.. haha
What do you recommend?
It does actually seem that djot has a grammar of sorts..
https://github.com/treeman/tree-sitter-djot/blob/master/gram...
(it's designed for a tree sitter.. I'm not super clear if it's globally usable)
Markdown is already fragmented, that would just be introducing a new fragment, not a standard [insert that comic that everyone posts any time someone proposes a new standard].
The long-term solution is having whatever markdown grammar you want and converting it to a standard AST. Then anyone can create their own transformations of that AST to render whatever document they want, including a new markdown document potentially in a different grammar.
https://pandoc.org/using-the-pandoc-api.html#pandocs-archite...
https://github.com/syntax-tree/mdast
https://unifiedjs.com/
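As a small illustration of the "convert to an AST, then transform" idea, the sketch below walks Pandoc's JSON AST (as produced by `pandoc notes.md -t json`) and pulls out the top-level headings, for instance to derive a table of contents. It assumes the usual JSON layout of `{"t": ..., "c": ...}` nodes, only looks at a handful of inline types, and ignores headings nested inside other blocks. The function names are made up for this example.

```python
import json


def inline_text(inlines) -> str:
    """Flatten a list of Pandoc inline nodes to plain text."""
    parts = []
    for node in inlines:
        kind, content = node["t"], node.get("c")
        if kind == "Str":
            parts.append(content)
        elif kind in ("Space", "SoftBreak"):
            parts.append(" ")
        elif kind in ("Emph", "Strong", "Underline", "Strikeout"):
            parts.append(inline_text(content))  # content is a list of inlines
        elif kind == "Code":
            parts.append(content[1])  # content is [attr, text]
    return "".join(parts)


def headings(doc: dict):
    """Return (level, text) for each top-level Header block."""
    return [
        (block["c"][0], inline_text(block["c"][2]))  # c is [level, attr, inlines]
        for block in doc["blocks"]
        if block["t"] == "Header"
    ]


def headings_from_json(text: str):
    return headings(json.loads(text))
```

The same walk could just as well emit a different Markdown dialect, since nothing in it depends on which grammar produced the AST.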
"so you can whip up a "renderer" or converter trivially at any point"
And yet almost no one has...
To be fair, once there's a good one there's much less incentive to write something new. In this case the good converter is Pandoc: https://pandoc.org/
Thanks; but that's a converter, not a viewer.
The problem is the lack of READER applications to simply view (not edit or convert) all these Markdown documents.
In my defence, the comment I was replying to mentioned "renderers" and "converters". Furthermore IMHO, any text editor is a Markdown reader. If you want it formatted as if it were "markup" then might I suggest converting to e.g. PDF using Pandoc and then using one of the many capable viewers.
Noted, re the other comment.
But... come on. You might just as well say any text editor is a browser, because you can technically read HTML with it.
You can also technically read Word documents with a text reader.
I see your point, and maybe it's a matter of preference, but I really do use my text editor for reading Markdown. I wouldn't do the same for a Word doc or HTML, without at least running it through a convertor first.
b/c I think people start from the syntax and then think about how to parse it later?
A simple grammar probably really limits how you can design your syntax.
Here is an example of the problems:
https://roopc.net/posts/2014/markdown-cfg/
100% agree. I've been using markdown for a few years after moving away from proprietary note taking apps. Although this has led to me developing my own short hand for many things in my notes. And have been looking at a way to integrate a to-do list with my notes with some Python scripts.
So while my notes may rely on some personal scripts to get the most value out of them, I strongly value that they are still plain text and I can always move them into a new workflow if I need to.
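For the to-do integration, one possible starting point is a few lines of Python that pull open tasks out of a note. This sketch assumes GitHub-style task list items (`- [ ]` and `- [x]`), which may not match a personal shorthand; `open_tasks` is a made-up name.

```python
import re

# A list marker, a checkbox, then the task text.
TASK = re.compile(r"^\s*[-*+] \[( |x|X)\] (.+)$")


def open_tasks(markdown: str):
    """Return (line_number, text) for every unchecked task in the note."""
    tasks = []
    for number, line in enumerate(markdown.splitlines(), start=1):
        match = TASK.match(line)
        if match and match.group(1) == " ":
            tasks.append((number, match.group(2).strip()))
    return tasks
```

Since it only reads the file, the notes stay plain text and the script can be dropped at any time.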
Maybe consider Quarto? Free and open source, integrates Python scripting directly in Markdown and/or able to call from separate scripts.
Love Quarto. I write all my notes, presentations, blog posts, memos, etc in .qmd files. For non-technical stuff I use Obsidian to author (there is an extension which tells Obsidian to treat .qmd as ordinary markdown - ie ignoring the additional Quarto frontmatter and so on), then for everything else I use VS Code with the Quarto extension and just render out to the display format I need. I really appreciate that it’s built on Pandoc and it means I can just use one format and one set of tooling for everything.
Thanks for the recommendation! Sounds interesting
I love markdown and use it for all my notes, however it really needs a native way to underline. I have been converting some older books and lectures to markdown and underline is used all the time.
If anyone has a good solution I'm all ears.
Markdown is plaintext so you decide what it means. I personally write *italic* and **bold**, so I can use _underline_. Most Markdown to HTML converters would make the last example into italic, but you can customize many of them.
Commonmark doesn't even mention "bold", "italic", and "underline". It just says "emphasis" and "strong emphasis". You can style it however you want.
This kind of undercuts the advantage of a semi-universal format. Though I'd agree underscore wrappers are quite reasonable and natural.
Markdown isn’t really meant to be a universal markup format. Its primary goal is to document conventions of annotating plain text which keep the plaintext semi-consistent and readable.
So the purpose of _, * etc is purely emphasis. If you need to represent something specific (bold, italic etc) then that’s a job for the Markdown parser (or embedded HTML etc). The result of the parser (HTML, etc) will be less human readable, but actually able to specify formatting.
I agree that CommonMark could be extended, but I think the focus should be on semantic relevance rather than markup specification.
I love the Fountain spec for exactly this reason. I primarily began using it since it’s Markdown for screenwriting, but it has bold, underline, and italics along with the usual markdown stuff like comments etc. I find it to be by far the best way to write plaintext anything other than code. It’s also a bit more opinionated than Markdown which I highly prefer.
it might depend on what you want to do with the underline. Does it just indicate some kind of emphasis?
Could you use the convention in your documents that "_" is the underline delimiter? I know that the default is to render it as italic/emphasis but that is just a decision at rendering time. The semantics of emphasize/underline could easily overlap.
Of course if you want 3 levels of emphasis with bold, italic, and underline, then yes you need to look elsewhere.
Markdown isn't really a formatting tool. it is a way to structure text in the minimal way that a person would interpret it and a machine could render it.
>converting some older books and lectures...If anyone has a good solution I'm all ears.
I don't know if this helps you, but you said "older": in the 20th century world of typewriters--which had no italics--underlining was used as a substitute for italics. Transforming underlines to italics or going the other way was considered normal. You wouldn't use both in the same document.
There's notional underlining, which in typewritten documents is effectively the equivalent of italic, and there is typographical underlining, where "underline" means "there is a line under this element and/or text".
Both matter, and although Markdown flavours handle the notional case well, they fall down at this (and several other) typographical capabilities. Expressing text in a particular colour (or greyshade) is another example. It's possible to achieve this in practice through embedded HTML and/or CSS tags, or through augmented Markdown variants (Pandoc's Markdown can achieve some things CommonMark or DaringFireball Markdown cannot).
Ultimately though I find I need to switch to a more capable and consistent text-layout engine, usually LaTeX in my case.
Though for even quite large and modestly complex works, Markdown is either sufficient entirely or is useful in getting the work off the ground before switching to a more powerful option.
i said "typewriter", and there is only one kind of underline on a typewriter.
converting old typewritten notes, they may contain typewriter underlining, and it may represent italics. Markdown would be entirely sufficient to handle that.
there was no need to de-clarify my comment.
The typewriter is a distinct and often intermediate writing device standing between the markedly free-form though also variable handwriting and the much more standardised, though fairly developed, capabilities of typeset documents.
Unlike handwriting, typewriting is uniform (both in type and spacing), and markedly faster.
Unlike printing, typewriting is limited (generally a single typeface, with no variability in face, size, or styling, e.g., roman, bold, italic), and requires further guidance to define specifically what result is desired where a typewritten work is not a document's final form.
It's worth noting that print itself differs from handwriting: when we write letters, forms and sizes vary, different writers often differ markedly in their own scripts, trained copyists may achieve a high level of standardisation, but that itself requires significant training and is achievable only by a limited number of artisans,[1] and letterforms themselves are not discrete but individually instanced each time they are created. With the advent of moveable-type printing,[2] letterforms became fixed, and with digital typesetting and computer fonts, each discrete shape or language-specific forms, say, the Roman A, Greek Α (alpha), and Cyrillic А (Azǔ/Азъ), are represented by distinct code points, but are nearly or entirely indistinguishable when rendered on-screen or in print. Further, over the history of both handwriting and typesetting, conventions have emerged for the textual representation of language, including spacing of words (versus scriptio continua), punctuation, paragraphs, page numbering, division of books into chapters, sections, parts, subsections, etc., of lists, tables, indices, (foot|end|side)notes, (parenthesis), drop-caps, figure captions, cataloguing, etc., etc. All of those were inventions and conventions not inherent to language, writing, printing, document preparation, or archival and retrieval themselves. There's still considerable variation between different print language representations, e.g., many texts lack equivalents of italic, bold, or even upper/lower case letterform distinctions.
Typewriting itself occupies an interesting space, being a primary endpoint for some types of documents (correspondence, forms, and the like) and an intermediate form for others, most notably published articles and books. Given that typewriting has both capabilities and limitations which aren't present in typeset documents (whether moveable type or digital), it's not possible to draw a distinct correspondence between what a typewriter outputs and how that might be represented in a derived document. Yes, typewriters can generate underlines, but that might be represented in typeset print as italic, bold, underline, or something else entirely. In practice, editors' proofing marks were inserted (as handwritten notations) on a typed manuscript to indicate the preferred presentation, generally following the author's intent and/or the publisher's own house style conventions. See: <https://en.wikipedia.org/wiki/List_of_proofreader%27s_marks>.
________________________________
Notes:
1. An anecdote which sticks with me: among the 1001 Arabian Nights stories is one in which a character makes specific references not only to his literacy and scribal capabilities, but to the types of scripts he could produce. That is, this was a specific and valued skill of that age worth noting, even in a general-audience work.
2. As distinguished from earlier monoblock printing in which a whole work was engraved on a wood block or metal plate, typified by early Pamphilus, seu de Amore from which we have the word pamphlet, see: <https://www.etymonline.com/word/pamphlet>. Such monoblock prints were more like a photocopied handwritten letter, in which variations in individual letterforms are replicated, than they are standardised print obtained from moveable type or, more recently and familiarly, computer-based digital typesetting or Web documents, in which fonts are standardised and each given character is identical to all others matching that style.
The solution is HTML tags like <u></u>
Markdown is often, and was originally intended for, HTML generation. But that's not the only target which can be achieved, particularly with such tools as Pandoc, a document format interchange Swiss Army knife.
Relying on format-specific tags imposes stronger constraints on endpoints and/or increases complexity of your document build process.
Inline HTML is part of the standard Markdown syntax, not a complication. If your tool doesn't support HTML it doesn't support Markdown. The format can be so simple in the first place because it allows this escape hatch for anything non-trivial. And tools like Pandoc can handle that just fine.
My point is that Markdown conversion tools, notably Pandoc, whilst they will incorporate inline HTML when generating HTML endpoints will not convert such inlined code to other endpoints, e.g., LaTeX, DocBook, OpenDocument, etc.
If you want those outputs to faithfully represent formatting, you either need to juggle multiple inline directives for each desired output format, or find some universal Markdown-based mechanism for achieving the same result.
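One candidate for such a Markdown-based mechanism, at least within Pandoc, is its bracketed-span syntax: as far as I recall from the Pandoc manual, `[text]{.underline}` is treated as underline across output formats that support it. Under that assumption, a small preprocessing step could rewrite inline `<u>` tags before conversion. The sketch below is deliberately naive (it does not skip code blocks, and text containing `]` would need escaping), and `u_to_pandoc_span` is a hypothetical helper name.

```python
import re

# Non-greedy match so that two <u> spans on one line stay separate.
U_TAG = re.compile(r"<u>(.*?)</u>", re.IGNORECASE | re.DOTALL)


def u_to_pandoc_span(markdown: str) -> str:
    """Rewrite <u>text</u> as a Pandoc bracketed span with the underline class."""
    return U_TAG.sub(lambda m: f"[{m.group(1)}]{{.underline}}", markdown)
```

This keeps a single source file for HTML, LaTeX and other targets, at the cost of tying the document to Pandoc's dialect instead of to HTML.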
I'd like to make clear that I'm familiar with Markdown; the fact that its original design intent was streamlining HTML generation; that inline "native" code is a feature, not a bug, but all the same a rather fraught one; and that actual practice has moved far beyond Markdown merely being used to generate HTML, not least in my own practice.
I've discussed this situation previously on HN (ironically from the PoV of using LaTeX embeds within Markdown creating problems when attempting to generate other-than-LaTeX outputs), see: <https://news.ycombinator.com/item?id=29690056> (2021).
And I'd asked about the HTML and/or LaTeX conditional generation in a StackOverflow post about seven years ago: <https://stackoverflow.com/questions/4820502a9/pandoc-have-ei...>.
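To make the "juggle multiple inline directives" problem concrete, here is a minimal sketch. It assumes Pandoc's `raw_attribute` extension syntax (a code span followed by `{=format}`) is enabled; the helper name `red_text` is hypothetical, and the LaTeX variant assumes the `xcolor` package is loaded in the template.

```python
# Sketch: emit one raw-inline directive per output format for the same
# "make this text red" intent. A raw inline only reaches the output format
# it names, so every supported format has to be written side by side.

RAW_TEMPLATES = {
    # format name -> markup that colours the text red in that format
    "html": '<span style="color:red">{text}</span>',
    "latex": r"\textcolor{{red}}{{{text}}}",  # assumes the xcolor package
}


def red_text(text: str) -> str:
    """Return Markdown containing one raw inline per known format."""
    parts = []
    for fmt, template in RAW_TEMPLATES.items():
        markup = template.format(text=text)
        # Wrap the markup in a code span tagged with its target format.
        parts.append(f"`{markup}`{{={fmt}}}")
    return "".join(parts)


print(red_text("warning"))
```

Any endpoint without an entry in `RAW_TEMPLATES` gets nothing at all, which is exactly the constraint being described: each new output format means another directive to maintain.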
Ah, the good old Unarticulated Annotation element!
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/u
Not the best solution but you could use the HTML underline tag
I built a Markdown-to-web, drag-and-drop solution: https://lmno.lol
Here's a demo https://www.youtube.com/watch?v=SykbiVweYH8
I wholeheartedly agree with this post. I also keep my notes in Markdown, I also have plenty of Python scripting around them, including automatic publishing of my website.
I use FSNotes today on macOS and iOS. Both apps are open source, both use well-structured .textbundle directories that separate Markdown content from JSON metadata and binary attachments. Synchronization happens through Git. It's a very powerful combination.
Ironically, I wrote a blog post some 8 years ago about this very subject. That blog post is now offline.
I appreciate the mention of FSNotes (and in turn textbundle). Somehow, despite trying tons of note taking apps and formats, I don't remember ever coming across mention of this format specifically.
http://textbundle.org/
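For anyone who hasn't seen the layout, a rough sketch of writing such a bundle by hand follows. It assumes the structure described above (a Markdown file, a JSON metadata file, and an assets folder); the file names `text.md`, `info.json`, and `assets/` and the metadata values reflect my understanding of the format and should be checked against the spec. The function name `write_textbundle` is hypothetical.

```python
import json
from pathlib import Path


def write_textbundle(bundle_dir: Path, markdown: str, attachments: dict[str, bytes]) -> Path:
    """Create a .textbundle-style directory: text.md, info.json, assets/."""
    bundle_dir.mkdir(parents=True, exist_ok=True)

    # The note body stays plain Markdown, readable without any tooling.
    (bundle_dir / "text.md").write_text(markdown, encoding="utf-8")

    # Metadata lives in its own JSON file; these values are illustrative.
    info = {"version": 2, "type": "net.daringfireball.markdown", "transient": False}
    (bundle_dir / "info.json").write_text(json.dumps(info, indent=2), encoding="utf-8")

    # Binary attachments (screenshots etc.) sit next to the note they belong to.
    assets = bundle_dir / "assets"
    assets.mkdir(exist_ok=True)
    for name, data in attachments.items():
        (assets / name).write_bytes(data)
    return bundle_dir
```

Because everything is ordinary files in one directory, the bundle diffs and syncs through Git like any other folder, and the Markdown can reference attachments with relative paths such as `assets/shot.png`.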
My biggest beef with org mode and all of the markdown apps I've tried is the asset management problem. For me screenshots are almost as important as the text part of the note, and are usually strongly tied to a single note. I've taken to using apple notes at work just because it "solves" that well enough, but I'd really prefer to work in markdown/plain text (except for the images).
I’ve been self hosting linkding[0] and it has archiving capabilities. Saves in html not markdown but that’s basically the same thing. It’s been very useful and then I back the folder up to R2 for free. I enjoy knowing that if I find something I want to remember it won’t go away. Plus it works great for recipe sites because I don’t have to deal with ads.
[0]: https://github.com/sissbruecker/linkding
I love linkding. I try to add sites to it whenever I'm tempted to just keep the tab open for later because it is cool.
Obsidian is the killer app for this. I spent a month converting around 3 years of security notes to markdown and now use obsidian to search/archive everything.
I've been doing this recently with every URL I've bookmarked over the last 15 years or so since I signed up for pinboard.in. http://spider.cloud has been really nice for crawling sites and saving the results as markdown. I plan on expanding it to transcribing youtube videos I've saved, github repos I've starred, HN posts, etc.
Ultimately I'm trying to index my "window" to the web as embedded content in a vector store. Not sure exactly what I'm going to do with it yet but I imagine it will be a component of some kind of personal agent system I can use to reference old info and help as a writing tool or as an "idea generator" of some kind. I'll likely end up not using most of it but you never know.
I've scraped about 10k markdown files which has created a ~10gb chromadb instance so far. Eventually I'll probably create separate collections based on domain, and filter down items that I care about more.
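A minimal sketch of the "separate collections based on domain" idea, using only the standard library. It just computes the grouping from the source URLs; creating the actual vector-store collections is left out, and the function name `group_by_domain` is hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_domain(urls):
    """Map each hostname (without a leading 'www.') to the URLs scraped from it."""
    groups = defaultdict(list)
    for url in urls:
        host = urlparse(url).hostname or "unknown"
        if host.startswith("www."):
            host = host[4:]
        groups[host].append(url)
    return dict(groups)


bookmarks = [
    "https://www.example.com/post-1",
    "https://example.com/post-2",
    "https://news.ycombinator.com/item?id=1",
]
for domain, items in group_by_domain(bookmarks).items():
    print(domain, len(items))
```

Each key could then become one collection name, which also makes it easy to drop or re-index a single noisy domain later.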
When it comes to web archiving, I've found that Markdown has some real limitations. Sure, it's great for basic text, but it struggles with things like embedded content and non-standard layouts. Try archiving a Twitter thread or an app-style webpage in Markdown, and you'll see what I mean. It just doesn't capture the full picture.
That's why I've come to prefer formats like webarchive, mhtml, or single HTML files for archiving. They're incredibly faithful to the original content - you get almost perfect rendering of the original page, complete with styling and layout. Plus, they can capture stuff behind paywalls or on logged-in pages, which is a huge plus.
The real challenge, though, isn't just about saving the content. It's about making that saved content useful. These archive formats are great for preservation, but they can quickly become a mess of unorganized files that are hard to search through or make sense of.
I think the key is finding ways to organize and interact with these archives more effectively. Things like full-text search across all your saved pages, the ability to add notes or highlights directly on the archived content, and smart tagging systems could go a long way. And it'd be really powerful if we could integrate these archives with other knowledge management tools we use.
I develop a tool called HamsterBase that seems to address a lot of these issues we've been discussing. It's a local-first app. That means all your data stays on your own device - no need to worry about your personal archives being stored on someone else's servers. There's no sign-up or registration required, which is refreshing in today's cloud-centric world.
> [Markdown] struggles with things like embedded content and non-standard layouts.
I don't share that experience. I typeset all these documents using Markdown with pandoc's div extension, transformed into XHTML, and then passed to ConTeXt:
* https://impacts.to/downloads/lowres/impacts.pdf
* https://dave.autonoma.ca/blog/2020/04/28/typesetting-markdow...
* https://pdfhost.io/v/4FeAGGasj_SepiSolar_Highlevel_Software_...
From XHTML, the document is transformed into TeX statements, which opens a world of possibilities. In the following video, custom styling is applied to nested contents:
https://youtu.be/3QpX70O5S30?t=35
Those are all PDFs. Why, if Markdown is so great?
Not OP but I've done similar work myself.
Alternatives for authoring PDFs include LaTeX or similar markup languages, or GUI-based tools.
For many works, Markdown is more than sufficient for producing book-length texts (I've done this numerous times myself, either authoring my own works or transcribing/modifying books for improved access/readability). Markdown's benefit is that it is extraordinarily lightweight, and removes overhead from the authoring process.
Even where one ultimately chooses to migrate from Markdown to some more capable authoring format, Markdown remains useful for creating the original rough form of the work. Complex elements (figures, formulae, tables, etc.) can be indicated and, after document conversion from, say Markdown to LaTeX, fleshed out in full.
With tools such as Pandoc (see my earlier comments on it), it's trivially possible to create multiple outputs (I usually refer to these as "endpoints") of a document. I've used Makefiles to drive this process, such that I write source in Markdown and generate partial or full HTML documents,[1] other LWMLs,[2] PDF, ePub, straight ASCII/UTF-8/Unicode text, word-processing formats, etc., as I want. The set of Markdown + Pandoc makes this trivial in ways that, say, LaTeX alone isn't entirely suited.[3]
It's of course possible to use another LWML as the source format. Markdown has its limitations, but is most widely known and implemented, and limitations workarounds are typically reasonable.
________________________________
Notes:
1. A partial HTML doc may be useful for dropping into a larger document, and doesn't require global HTML elements such as the <html>, <head>, <body> tags, or others such as <nav> or <aside> in most cases.
2. Lightweight markup languages such as bbCode, AsciiDoc, RST, MediaWiki, OrgMode, etc., etc., see: <https://en.wikipedia.org/wiki/Lightweight_markup_language>. Useful when inserting the document into systems based on these formats.
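The one-source, many-endpoints workflow described above can be sketched as a small command builder (a stand-in for the Makefile rules, not the author's actual setup). The endpoint table, the `build` output directory, and the function name `pandoc_commands` are illustrative assumptions; the Pandoc options used (`--standalone`, `--to`, `--output`) are standard ones. PDF output additionally needs a PDF engine such as a LaTeX install.

```python
from pathlib import Path

# endpoint name -> (output suffix, extra pandoc arguments)
ENDPOINTS = {
    "html": (".html", ["--standalone"]),
    "pdf": (".pdf", []),
    "epub": (".epub", []),
    "docx": (".docx", []),
    "text": (".txt", ["--to", "plain"]),  # force plain text rather than Markdown
}


def pandoc_commands(source: str, out_dir: str = "build") -> dict[str, list[str]]:
    """Return one pandoc command line per endpoint for a Markdown source."""
    src = Path(source)
    commands = {}
    for name, (suffix, extra) in ENDPOINTS.items():
        target = Path(out_dir) / (src.stem + suffix)
        commands[name] = ["pandoc", str(src), *extra, "--output", str(target)]
    return commands


for cmd in pandoc_commands("book.md").values():
    print(" ".join(cmd))
```

Each list can be handed to `subprocess.run` as-is; adding a new endpoint is one more row in the table, which is roughly what a pattern rule in a Makefile buys you.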
I've landed on a workflow that I like a lot, and have shown to several people on my team. I use Google Drive for Desktop, which maps the G:\ drive to Google Drive. From there, I use VS Code for Markdown editing.
Google Docs now supports Markdown files, so if I need to convert the Markdown file to Word or PDF, I just open it in Docs and download it in the format I need. (Pandoc also works for this, as the author mentions). Converting HTML to Markdown can also be done in Docs: copy and paste the web page text into Google Docs, and download the file as Markdown.
For mobile, I use the DriveSync app to download my notes (Markdown) folder to my phone. Then I use Obsidian to open and edit the files.
But why should you need to convert it? Why does no one call out the giant problem with Markdown: the lack of READERS?
People read documents, not formats.
I adopted obsidian recently to replace notion and it’s been a refreshing change. In its basic state (no plugins), it’s just a bunch of markdown files.
Very easy to search notes and even have a dedicated folder for diary entries.
My pain is that I couldn't find a decent md viewer for Windows: free, fast, simple, no distractions. Imagine notepad. I have to open my md files with VSCode or Notepad++ (nasty view).
Try this one: Markdown Viewer. Other than the larger size that comes with being based on Electron, it's all quite good. https://github.com/c3er/mdview
Thanks. It looks pretty decent. And more important is the first one being just a reader/viewer not an editor with preview.
Browser extension as MD reader is quite nice.
- https://markdownreader.github.io
- https://github.com/simov/markdown-viewer
Those two are quite good.
I have a small wrapper for pandoc with sane defaults to create PDFs of my notes which I can print out and write on.
Some of my scribblings are useful or important enough to get added to the markdown files and printed the next time.
Easy fix, just write your md files in HTML, that way they are easy to read, and when you need to look at the markdown you can just use a converter.
That's sarcasm, right?
EXACTLY. Open-source projects are rife with Markdown, but why? There are almost no VIEWERS for it. It's irritating as shit.
After years of looking, I finally ended up with Marked (for Mac). When you ask for a Markdown reader in any forum, you get nothing but suggestions for EDITORS, which happen to have a preview pane. But what is it "previewing," when everybody's just reading these things as plain text with the formatting codes embedded in them?
The underlying purpose of org-mode is to manage this issue (the text part). It doesn't solve it, instead it is a tool for managing the steadily increasing archive organizational complexity within an ever evolving timeline. You reconfigure your archive's implicit schema well now you're in a world of heavy editing. That's life. If you don't have a solid backup strategy, you are going to lose stuff. That's also life. Big binary blobs are a different, equally important problem.
Sure, keep your archive text in markdown (which one? a dumb person asks). But I'd recommend managing it with org-mode, it doesn't really care what format your text is in.
(Yeah I saw the footnote mentioning org-mode but that reads to me that org-mode's reference there is entirely about the markup flavor.)
Yeah, org-mode and by extension Emacs really help in this regard. Now that Emacs has been ported to Android I expect its usefulness to only increase.
Looking back I can't believe I considered just bookmarking a link enough to save it long-term. Sure, I lost a lot of cruft but there were some gems that in retrospect I'd have liked to still reference or look at today. Eh, hindsight is 20/20 as the saying goes.
I'm not surprised this post opens with a link to /r/DataHoarder. Hot take ... I understand the sentiment that you can't trust content on the web to be there forever, but there is also the other side of the argument which is: compulsively saving data is a waste of time and it introduces a cognitive overhead that you'd be better off without.
> If it's worth saving
Idk, I think if it's worth saving it's worth saving and the only person who can determine if it's "worth it" is me.
I agree that some people have an obsession where they save data that isn't worth it, but r/DataHoarder is a great place with a lot of information on building and maintaining large data systems for hobbyists, regardless of what you actually store.
Actually, “future you” will know even better in many cases.
Also, some people other than yourself might turn out to be a better predictor of “future you” than “current you”.
So many people are hyper individualistic and forget just how predictable individuals can be to others!
When I find a blog post/article I'm interested in, I save it to my laptop with the SingleFile extension, take quick notes, and write my thoughts about it in org-mode. It has a very low cognitive threshold and I can always read it back in the browser. I'm fine if not all the outbound links are still working; I'd just like to read it back sometimes.
This is almost exactly what I do too, though I also throw the link in the wayback machine so it's easier to share with others should the source go down (and to be courteous to any like-minded fellows who also wanted to see the content, but unfortunately came too late)
I'd disagree a bit there. I do something similar, saving interesting webpages, and it's really really nice being able to quickly search for something that I halfway remember a few months down the line.
I'm not saving everything, and it just gets stuffed unedited into a folder that I can search. Not too much in the cognitive overhead department.
Until you run out of space and/or the storage bill is too expensive.
if you're storing text that is realistically not going to be a problem.
If I read something and then remember it 5 years later because it becomes relevant, I want to be able to find it. It's not even that I will look at it, I just want to have the option if I want to.
> I understand the sentiment that you can't trust content on the web to be there forever
The thing is, people say this, and I am sure for some amount of content it's true. However, I eventually realised I have never had a single issue retrieving, when required, literally any piece of software or digital content after the fact from somewhere on the internet.
It's pretty much why I care so little about what happens to my Steam library when Gaben kicks it. If I get the urge to replay something in twenty years that I paid $3 for and it's suddenly gone, I'll just go find it elsewhere.
Once I had gotten pretty much everything I wanted from running a large scale storage system (largely to learn the ins and outs of Linux/general storage concepts) I pretty much just gave it up. It's a lot of money to hold onto things that at this point, I pretty much know I'll always be able to recover elsewhere. I'd rather someone else pay the electricity/drive cost for me.
That’s only possible because some people are hoarders who save this data or software and put it out there..
> That’s only possible because some people are hoarders who save this data or software and put it out there..
Sure. And I appreciate it. Like I appreciate the internet archive etc.
There is no comparable alternative to the Internet Archive though. They've gotten involved in several lawsuits and their future is far from guaranteed. They're an incredibly important organization, but I think it's too important of a project to be limited to one organization, or even one country or region of the earth. A solar flare could destroy a lot of history.
I don't know that the economics of having multiple Internet Archive-like organizations is currently feasible (I imagine getting funding for one of them is hard enough), but even a partial offline mirror hosted someplace else would be nice. Maybe to save space they could take the oldest version of a page, the newest, and the midmost version timewise, discarding all other versions. They could also heavily compress images, video and audio to save storage space (would increase processing costs, but if willing to throw out quality, could compress quickly and still save a bunch of space. E.g. downscale all videos to 480p and use veryfast preset and CRF 28 with ffmpeg. Even 240p is a lot better than nothing. A pixelated form of history is better than no history.)
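As a rough sketch of the low-quality transcode suggested above (480p, `veryfast` preset, CRF 28), here is a small helper that builds the ffmpeg command line. The video codec (`libx264`) and the audio settings are my assumptions, since the comment names only the resolution, preset, and CRF; the function name is hypothetical.

```python
def ffmpeg_archive_command(src: str, dst: str, height: int = 480, crf: int = 28) -> list[str]:
    """Build an ffmpeg command that downsizes a video for cheap archival."""
    return [
        "ffmpeg",
        "-i", src,
        "-vf", f"scale=-2:{height}",  # keep aspect ratio, round width to an even number
        "-c:v", "libx264",            # assumption: H.264 via libx264
        "-preset", "veryfast",        # trade compression efficiency for speed
        "-crf", str(crf),             # higher CRF = lower quality, smaller file
        "-c:a", "aac",                # assumption: AAC audio
        "-b:a", "64k",                # assumption: low audio bitrate
        dst,
    ]


print(" ".join(ffmpeg_archive_command("talk.mp4", "talk_480p.mp4")))
```

Passing `height=240` gives the even smaller variant mentioned above; the list can go straight into `subprocess.run`.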
I have the opposite and sadly accelerating experience.
Information is removed or altered constantly and usually, I cannot find anything on the Internet Archive either. For whatever reason WayBackMachine, for my use cases, is nearly always blank.
But I look for semi-obscure publications and statements from (nation)states and organizations.
An archivist saves everything, because it's impossible to anticipate what future historians will need or want; they are working from a context you cannot access. You can winnow down a large dataset to its relevant subset, but you can't study what wasn't preserved.
I agree, but I also think that data hoarding is similar to regular hoarding. All of this content and information seems like it could be useful, invaluable even. It's a problem in a world where we have excess to sort through all that information and only focus on what's important right now.
Nobody is taking their data hoard and feeding it into a personal AI ?
Hmmm. I see the use in this...
For me, everything swirls in a lovely vortex towards org-mode.
- Literate Programming, tangle/weave
- Export to DocX, PDF, HTML
- Org-Roam
- Time Management.
The Markdownload browser extension is super useful for saving webpages as Markdown: https://addons.mozilla.org/en-US/firefox/addon/markdownload/
I wanted to like MarkDownload, but unfortunately it doesn't auto-add provenance when copy-pasting selected text blocks, at least out of the box.
Markdown is a wonderful format (I use it all the time) but it's very narrow and I don't think it's appropriate for storing general 'things we might publish'. You lose a lot of semantics just replacing html with markdown. For a general purpose markup language, I don't think we can beat XML.
I agree, if the purpose is archival (versus manually reading it with your eyeballs) then you will want a format that (1) can capture information in a somewhat self-documenting way and (2) is in a form that can be easily parsed and converted into a newer format.
Keep it in the format appropriate to the information. If just the text is important, Markdown is probably fine. If the structure is important, keep it in HTML. If the layout is important, PDF. You wouldn't store a Gutenberg bible in Markdown, would you?
(Don't answer that - there's always one asshole who would)
Mediawiki. Let's balance durability against functionality.
MW gets you a massively scalable doc store that does not need much room. Most MW instances are MySQL/MariaDB backed and the schema etc is very well described.
Keep it plain text for "notes" but a MW will be easily discoverable for quite some time from now.
https://www.mediawiki.org/wiki/Markup_spec
encouraging..
We never should have stopped using troff.
Unless it’s math in which case you are screwed.
Can we have a damn math keyboard and proper character encoding instead of doing shenanigans with latex / office equation editor ?
Why in this exact text box I cannot type a differential equation ?
Markdown is great, but not a panacea.
Tables, in particular, just suck, especially if you want to have even slight formatting inside of the cells.
Unfortunately, it’s either plain-text-readable or rich representation. Pick your poison.
The biggest problem with Markdown is the baffling lack of plain VIEWERS. Not editors with a preview pane, but straight-up viewers that render Markdown for reading.
There are very, very few. I use Marked 2, for Mac. I don't even remember if I ever found another one. It's irritating as hell, because pretty much every open-source project's read-me files are in Markdown. Why, when there is no viewer anywhere near as ubiquitous as those for PDF... despite Markdown being much simpler and better understood?
Makefile-driven development. Run "make pdf" as needed (looped in a shell one-liner if you prefer, or driven by an event watcher). A decent PDF viewer will either reload the document automatically on change or can be readily reloaded. The Suckless PDF viewer zathura is among the former, I've also used, variously, xpdf (slightly grungy these days but an old reliable) or MacOS's Viewer app.
<https://pwmt.org/projects/zathura/>
This lets you work on the doc in a terminal window and have the (reasonably constantly updated) formatted output in a PDF viewer.
Short documents will render virtually instantly. I've not had long renders until documents extend to at least several chapters worth of text if not book-length, and even then it's a matter of a few seconds in most cases. Highly-formatted texts may of course take longer.
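The "looped one-liner or event watcher" part can be sketched as a simple polling loop. This assumes a Makefile with a `pdf` target already exists, as in the comment above; the file names and the function names `needs_rebuild` and `watch` are hypothetical.

```python
import os
import subprocess
import time


def needs_rebuild(source: str, target: str) -> bool:
    """Make-style check: rebuild if the target is missing or older than the source."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(source) > os.path.getmtime(target)


def watch(source: str, target: str, interval: float = 1.0) -> None:
    """Poll forever, running `make pdf` whenever the source is newer than the PDF."""
    while True:
        if needs_rebuild(source, target):
            # check=False: a failed build should not kill the watcher.
            subprocess.run(["make", "pdf"], check=False)
        time.sleep(interval)


# Example (runs until interrupted):
# watch("notes.md", "notes.pdf")
```

Make already does the timestamp comparison itself, so the check here mostly avoids spawning a process every second; a PDF viewer that reloads on change then picks up each new build.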
Agreed. I use QLMarkdown [0] for preview in Finder and this markdown-viewer extension [1] for in-browser preview. But a standalone, native app would be pretty nice too.
[0] https://github.com/sbarex/QLMarkdown
[1] https://github.com/simov/markdown-viewer
Thanks. Try Marked; it is exactly what I wanted. It's $14, but I decided to reward whoever did what apparently no one else (including me) can be bothered to do.
I have tried MarkText, which is yet another editor with a viewer but it's free.
I use Markdown Viewer, in Chrome: I'd bet there are multiple equivalents in Firefox and Safari. Well. I don't know what Safari's extension universe is like but it seems likely.
I actually use a VS Code plugin for this called Dendron. It is in the same vein as Obsidian or Notion, Markdown-based, and just runs in VSC. Very handy, and since it's plain text it works wonderfully in a git repository.
When it comes to text (though I do include Word, PDF, text files, markdown, tex) I like to burn them to dvd.
I have one of those big dvd "catalogs" that takes 4 discs per side of a page.
Keep one at home and one at my parents' place.
I trust them more than usb-sticks. Though that may be irrational.
But the time for burning files to dvd seems almost over. It is hard/impossible to buy a computer with a dvd drive.
That is no problem for me since I have a collection of externals as well as internals, and life is good now since blank dvd media is cheap.
But again, you need a dvd reader, and in the future, that may become difficult.
The trick with burning optical media is the disks themselves can physically fail with time. I have a huge archive of various burned media from the early 00's and a number of them have developed literal holes in the material over the years. If these holes hit data tracks, the files on those tracks are lost. If you're burning to optical media, you should probably be checking them regularly for degradation.
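One way to do that regular check is a checksum manifest: hash every file when the disc is burned, keep the manifest somewhere else, and re-verify periodically. A minimal sketch with the standard library follows; the function names are hypothetical and the choice of SHA-256 is an assumption, not something from the comment above.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing, unreadable, or changed."""
    damaged = []
    for rel, expected in manifest.items():
        try:
            if sha256_of(root / rel) != expected:
                damaged.append(rel)
        except OSError:  # missing file or read error from a failing disc
            damaged.append(rel)
    return damaged
```

Run `build_manifest` on the source folder before burning, store the result (for example as JSON) off the disc, and later call `verify` with the mounted disc as `root`; anything it returns should be recopied from the second copy while that one is still readable.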
This is true. And a real problem.
In my experience, USB sticks fail and regular hard drives fail.
It has been many years since I have had any involvement, but backup tapes probably have issues as well, and the rapid turnover of new tapes and new formats is an issue already.
I don't have any data to evaluate this. Is the best choice SSD drives?
No matter what technology is picked, at some point to preserve the data it needs to be migrated to whatever comes down the line.
Honestly, so far if you can afford the up-front investment for the space you need, and the ongoing power costs, a NAS with a RAID array (or similar redundancy scheme) that can tolerate more than one drive failure at a time is probably the best long term archival storage. Spinning rust disks in my experience rarely completely fail without warning so you can usually catch and replace failing media before data loss occurs. Additionally if you don't, I've found that recovering data from failed HDD is also usually "easier" and "cheaper" for most values of both compared to other media storage (admittedly with no experience with recovering tape media)
Beware for if you continue down this road you will end up sitting in class taking notes in markdown… yes I did do this… I am afraid I am beyond salvation
Can relate to that sentiment. What I'm still looking for is a simple solution that lets me use simple local files (eg plaintext/markdown; csv or single-page HTML would also be fine) as a backend for a web app (with login, obviously). Basically, I want to have something like a todo.txt that lives on my machine (in the folder that syncs to my cloud storage) but that I can also edit when I'm on my phone. Like using Google sheets as a backend but with a local file.
I just access my markdown files from Obsidian through nextcloud. When I'm on my phone I just use a simple markdown editor, when I'm on my PC I use Obsidian.
Do you use any plug-ins for that? Obsidian tells me it only supports Obsidian Sync and iCloud out of the box.
You don't have to use any plugins. You can put your obsidian vault anywhere you like, e.g. in a folder that is synched by nextcloud. I use a git repo for this, which works fine also on mobile.
Why not have AI build this for you? :)
With the AI coding tools getting better each day, I'm starting to think why I would spend any time researching what's out there for what I want, instead of just using an AI coding agent to put something together in 10 mins, and forget about it.
It's getting easier and faster to have AI build something that solves my exact problem. Maybe not perfect, but OK.
I'm sure it'd be super quick to build it with the help of an LLM once I know what setup I want. I actually used ChatGPT once for ideation. I'd need to look it up again, but from what I remember none of the proposed solutions were convincing.
Hm. Great.
I save everything interesting. I have a data folder with letters a-z in it. Something interesting might be saved in HTML or PDF under data/a/ai/programming
Folders have a problem because the same thing could be saved under data/p/programming/ai
But it is a start. For everything else, there is recoll. https://www.recoll.org/
Indeed, I also realized that bookmarks are worthless in the long run. When choosing a note taking / knowledge management app, the main decision point was whether it has a Firefox extension that can capture a web page into Markdown and automatically save it into my notes.
I used to use Joplin, lately switched to Obsidian. Both offer this functionality.
What drove you to switch to Obsidian? I'm considering it myself and have been playing around with Obsidian the past couple of days after about 4 years of Joplin, 2 of which with a self hosted Joplin Server.
I'm tired of basic features being missing and extensions breaking because they're no longer maintained, and basic features like linking between notes while writing a note not being built in.
Joplin worked great when I spent 8h+ daily on my laptop (computer with big screen and physical keyboard).
During my long sabbatical, I wanted to take notes on my phone, a LOT. Joplin sucks at that, clumsy, non-user friendly android client.
Tried Obsidian (first on mobile) and it is superb. I had to install a couple of extensions (S3 sync, "Ink" for drawing with a pen), and it just works. It's so good, I sometimes even edit tables on my phone. With Joplin, note taking on my phone was just dumping thoughts in random formats to it and later fixing it on my desktop.
Like Joplin, Obsidian also has a Firefox extension to capture a web page into Markdown.
So after a couple of days of trial, I realized that all the features Joplin has, Obsidian has too, with a much better (and snappier) UX on both my Linux desktop and Android. The only thing I wish is that it were open source. But oh well, I'm not dogmatic about that anymore.
@OP super inspiring. I'm working on a universal capture SDK, a bit like rewind.ai that would make it easy to grab information from screen and then store as Markdown etc. Have you ever wished for something like that?
PDF/A... It was not that difficult. Don't reinvent the wheel, guys.
But I hate Markdown.
Then use any of the plethora of alternative text formatting tools (TeX, for one).
I hate that it's rampant but nobody calls out the near-total lack of READERS for it. WTF?
Imagine writing everything in HTML, but there are no browsers to render it. That's basically where Markdown has been forever.
My favorite is WikiCreole, with (subset of) HTML as a close second. MD is alright, but too restrictive as a general purpose format for knowledge bases and such.
Right. Markdown ain’t the only game in town.
I have personally started to archive pages I find interesting through a browser extension. It's HTML/CSS, not Markdown, but good enough for my needs.
My thought on custom Astro components is that they provide a flexible format that can be converted into MD, HTML, JSON and other formats.
I suggest emacs org mode or asciidoc
> Even self-hosting isn't foolproof - your content can vanish when you forget to pay for hosting
I know what they mean - "running applications that you maintain and deploy yourself, on hardware/platforms that you don't" - but this is strange, to my eyes. If it's running on someone else's hardware (whatever it is), then it's not self-*hosted*, surely? It's self-owned, but not self-hosted?
Personal web archiving and knowledge management is an interesting problem space, and I think there's a lot of room for innovation in how we approach it.
AsciiDoc is basically DocBook-Markdown, which makes it a medium-independent format.
Windows here.
I use VSCode for markdown.
Obsidian's been coming up on the radar often.
This post finally made me try it out.
I like it a lot.
But there's one reason I won't be using it as my main driver for markdown files: I can't open files that are not in a vault. I have markdown files everywhere on my drive. And I don't want to make the entire drive a vault (for various reasons).
Obsidian configurable as...
1) my default file handler for markdown files
2) capable of opening and saving markdown files in any location on my PC
...would be sweet. (From my research, it can't do these currently.)
well, I wish I could have saved all my old Flash sound design and game experiments to Markdown, and still be able to play them
Why not Asciidoc instead of Markdown?
AsciiDoc's fine. So is reStructuredText. In some ways they're both a lot better than Markdown, even though I think MD's surely easier to learn and use. But the one clear advantage MD has over the others is its ubiquity. If a tool works with formatted text, it almost certainly supports MD. It might also support the others, but if so, that's just a bonus.
I get the feeling that the next step in this evolution is Typst.
I don’t like Markdown because I don’t want to remember a syntax. Most normal people I know have no idea what Markdown even is. The idea that I can’t see my formatting when I’m writing is annoying. What’s the point? It’s like MD is writing code and to “see” the document, you have to run it. In other words what you see is not what you get — you only see what you get when “previewing.”
> The format deliberately avoids precise control over display details like font selection. Following the rule of least power, I consider this limitation a feature. For contrast, consider PDF - a format so powerful that it can run Doom.
Just pick a more relevant format for contrast and you'll see that this is no feature! It's not like PDF is the only alternative.
Markdown is great... But you know what else is great? OPML. We need more tooling around OPML. It's not being used nearly as much as it should be for Personal Knowledge Management.
I've used or built more personal knowledge/task/project management tools than I care to list over the years, and adopted various methods along the way. I've ended up in a place where I know what I need day to day: A place to dump my ideas, plans, reflections, and tasks, along with methods of processing and accessing all this data. It's hard to compete with plain text files, a notebook, and structured daily/weekly rituals that process these notes into actionable tasks, meeting agendas, and project docs. It's not that time consuming, it's super effective, and most importantly, it's infinitely and freely customizable because instead of software, you just have checklists and processes to manually follow. You can execute GTD without touching a computer: https://gettingthingsdone.com/wp-content/uploads/2014/10/Wee...
I can get by just fine with that system, but a handful of months back I started wanting software again. Reminders, task wrangling, workflows around taking meeting notes, taking and processing transcripts of talking through ideas, automated daily and weekly check-ins with summaries, project work logs, managing lists of things to talk about with people, the list goes on...
Same reasons I have always reached for software, and the same reasons I wrote my own system a few times over. But this time I had some new thoughts:
- I want this to have a chance at being my last system. For that, I must be able to read/edit the data without special software. I settled on committing to building software that interfaces with folders of Markdown files exclusively. I could use Obsidian to cover any gaps and get work done immediately–I don't need my software to do it all right away.
- I want to own as much of my recorded activity/thoughts as possible, so I can drop it into new AI models, giving them a ton of context about me and what I'm up to, and avoid getting vendor locked to OpenAI.
- I want ubiquitous access to the system, which means it's gotta be easily used from a phone.
7k LOC later and I've got a Telegram bot with a plugin architecture and a pile of plugins that implement everything I've described and more. The plugin arch means there's a defined interface, and no new piece of functionality ends up with more than 1k LOC in a file. My objective was to structure the project specifically so I could avoid the pitfalls of AI-generated code as projects get large. Everything is isolated, with well-defined integration points.
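For a flavor of what "a defined interface" can look like, here is a hypothetical sketch, not the actual code from this project. The `PluginRegistry` class, the `/todo` command, and the handler signature are all assumptions made up for illustration, and the Telegram side is left out entirely.

```python
from typing import Callable, Dict

# Assumed interface: a handler takes the argument text and returns a reply.
Handler = Callable[[str], str]


class PluginRegistry:
    """Keeps command handlers isolated behind one small interface."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def command(self, name: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler under a command name."""
        def decorator(func: Handler) -> Handler:
            if name in self._handlers:
                raise ValueError(f"duplicate command: {name}")
            self._handlers[name] = func
            return func
        return decorator

    def dispatch(self, message: str) -> str:
        """Split '/cmd rest of text' and route it to the matching plugin."""
        name, _, args = message.strip().partition(" ")
        handler = self._handlers.get(name)
        if handler is None:
            return f"unknown command: {name}"
        return handler(args)


registry = PluginRegistry()


@registry.command("/todo")  # hypothetical plugin command
def add_todo(args: str) -> str:
    # A real plugin would append this line to a Markdown file in the vault.
    return f"- [ ] {args}"
```

Each plugin only touches the registry through `command`, so a new feature can live in its own file without the dispatcher knowing anything about it.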
I chose Telegram because they have a great API, supporting custom keyboards for quick actions, audio input for taking voice memos that my system transcribes, and reaching out to me with reminders/requests on whatever device I'm on.
The result is thousands of messages that have translated into a nicely organized Obsidian vault. Couldn't be happier and think there's a chance I'll live with this thing for the foreseeable future–and I can always swap out the interface away from Telegram, build a proper frontend, or drop it altogether and be left with my Markdown files.
If anyone is interested I'd be happy to share what I've got. Just my private project that I'm reaping a lot of benefit from.
Here's a quick dump of some of my plugin commands to get a flavor of what I'm talking about: https://gist.github.com/zackham/3c2d061e6dd0127958c913329aa0...
Wow, this actually sounds quite neat. I'm already using Markdown, and being able to make my notes more interactive and useful via a chat-like interface with automations would be great. Especially as I want to use AI systems on top to make the accumulated knowledge as useful as possible. Please share more.
WTF?
text.txt
Readable in everything, since forever.
I wish. If you live in any country that uses more than ASCII, then certainly not since forever. I mean, just for my language there were 7 different encodings (according to Wikipedia, possibly more) before the Unicode era. When you want to read these it's a solvable problem, but it is still extra work to deal with. Now that we have UTF-8 as the de facto standard, it is much better, but there are still problems. Like when you use Japanese and it gets displayed as Chinese (the same characters have different glyphs depending on the language).
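The "extra work" for old files usually looks something like the sketch below: try UTF-8 first, then a list of legacy encodings. The function name `decode_legacy` and the fallback list (`cp1250`, `iso8859_2`) are assumptions picked for illustration; the right list depends on the language and on where the files came from.

```python
def decode_legacy(raw: bytes, fallbacks=("cp1250", "iso8859_2")):
    """Return (text, encoding_used), trying UTF-8 before the fallbacks."""
    for encoding in ("utf-8", *fallbacks):
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so this last resort never fails
    return raw.decode("latin-1"), "latin-1"


text, used = decode_legacy("žluťoučký kůň".encode("cp1250"))
```

The catch is that a wrong single-byte encoding often decodes without any error and just gives you garbage, so the order of the fallbacks matters and the result still needs a human look.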