internet_points 5 hours ago

How does https://github.com/microsoft/markitdown compare to pandoc?

Ah, now I see it's all in https://github.com/microsoft/markitdown/blob/main/src/markit..., which uses beautifulsoup for HTML-ish stuff, pdfminer for PDFs, and mammoth for docx. (And compared to pandoc, it only outputs Markdown and has way fewer options.)
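To make the "one library per format" structure concrete, here is a rough sketch of that kind of dispatch. It is not markitdown's actual code: the names `TinyHtmlToMarkdown`, `html_to_markdown`, `CONVERTERS` and `convert` are made up for illustration, and the HTML converter uses the standard library's `html.parser` as a stand-in for beautifulsoup so the snippet runs without extra installs.

```python
from html.parser import HTMLParser
from pathlib import Path


class TinyHtmlToMarkdown(HTMLParser):
    """Toy HTML -> Markdown converter: headings, paragraphs, bold, italics only."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            # <h2> becomes "## ", and so on.
            self.parts.append("#" * int(tag[1]) + " ")
        elif tag in ("strong", "b"):
            self.parts.append("**")
        elif tag in ("em", "i"):
            self.parts.append("*")

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3", "p"):
            # Block elements end with a blank line.
            self.parts.append("\n\n")
        elif tag in ("strong", "b"):
            self.parts.append("**")
        elif tag in ("em", "i"):
            self.parts.append("*")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_markdown(html: str) -> str:
    parser = TinyHtmlToMarkdown()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


# Hypothetical registry: one converter per file extension.
# A fuller version would add ".pdf" and ".docx" entries backed by
# pdfminer and mammoth, as described in the comment above.
CONVERTERS = {
    ".html": html_to_markdown,
    ".htm": html_to_markdown,
}


def convert(filename: str, content: str) -> str:
    """Pick a converter from the file extension and run it."""
    suffix = Path(filename).suffix.lower()
    if suffix not in CONVERTERS:
        raise ValueError(f"no converter registered for {suffix!r}")
    return CONVERTERS[suffix](content)
```

Calling `convert("page.html", "<h1>Title</h1><p>Some <b>bold</b> text</p>")` goes through the `.html` entry; an unregistered extension raises `ValueError` instead of guessing.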

On a similar note, pandoc can now run completely browser-side; there's a very bare-bones demo at https://tweag.github.io/pandoc-wasm/ (try -f html -t markdown and type in some HTML).
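For comparison outside the browser, the same `-f html -t markdown` conversion can be scripted against a locally installed pandoc. This is only a sketch: it assumes a `pandoc` binary on your PATH (not the wasm build from the demo), and the helper names `pandoc_args` and `convert_with_pandoc` are made up here.

```python
import shutil
import subprocess


def pandoc_args(source_format: str = "html", target_format: str = "markdown") -> list:
    """Build the pandoc command line; with no input file, pandoc reads stdin."""
    return ["pandoc", "-f", source_format, "-t", target_format]


def convert_with_pandoc(text: str, source_format: str = "html",
                        target_format: str = "markdown") -> str:
    """Pipe text through a local pandoc binary and return its output."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    result = subprocess.run(
        pandoc_args(source_format, target_format),
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pandoc exits non-zero
    )
    return result.stdout
```

`convert_with_pandoc("<h1>Hello</h1>")` should then give you the Markdown that pandoc produces for that heading.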

Natfan 5 hours ago

> MarkItDown operates entirely within your browser. Your files are processed directly in your browser, and no files are uploaded to our servers or stored by us. This ensures your data remains secure, and the conversion is completed safely and privately.

This is a blatant lie, as you can see by simply pressing F12, going to the Network tab, entering a URI into the WebPage section, and pressing Enter.

    POST https://markitdown.pro/api/markitdown
    -----------------------------238398091440825138514056309576
    Content-Disposition: form-data; name="url"
    
    https://wnd.sh
    -----------------------------238398091440825138514056309576--

I understand that cross-site fetching might not work in this case, but please do not blatantly lie in your FAQ page. It makes me (and others) trust you infinitely less.
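For anyone who wants to see what that captured request looks like in code, here is a minimal sketch that rebuilds it without sending anything. The endpoint and the `url` field name come from the capture above; the function name `build_form_request` is made up, the boundary is generated rather than copied, and since the response format isn't shown here the sketch stops at building the request.

```python
import uuid
from typing import Optional
from urllib.request import Request

# Endpoint taken from the Network tab capture above.
API_URL = "https://markitdown.pro/api/markitdown"


def build_form_request(page_url: str, boundary: Optional[str] = None) -> Request:
    """Build (but do not send) a multipart/form-data POST with one `url` field."""
    boundary = boundary or uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="url"\r\n'
        "\r\n"
        f"{page_url}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return Request(API_URL, data=body, headers=headers, method="POST")


req = build_form_request("https://wnd.sh", boundary="example-boundary")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

The point is simply that the page URL travels to the server in the request body, so the conversion cannot be happening purely in the browser.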

kianworkk 9 hours ago

Hi HN,

I wanted to share a free online tool I created to let everyone easily test Microsoft's new open-source project, MarkItDown. It enables rapid conversion of different file formats and web pages into clean Markdown text. It's surprisingly fast and versatile.

Give it a try here: https://markitdown.pro/

Any thoughts or suggestions are welcome!

  • klabetron 5 hours ago

    > Your files are processed directly in your browser, and no files are uploaded to our servers or stored by us.

    How does it run a Python library entirely browser side? Just curious.

    (Given the faff of setting up a Python environment, this is a great idea.)

    • Natfan 5 hours ago

      It doesn't. It seems to send a POST request to an API endpoint to do the processing (in direct contradiction to their FAQ section).

    • banditelol 2 hours ago

      Now you make me wonder if I could run this entirely inside PyScript.