segfaultbuserr 6 years ago

> It looks like archive.org got it. But the entire page is blank, because JavaScript.

The AJAXification of the web, even for trivial web pages, has defeated archive.org on multiple occasions; it's a huge threat to the preservation of our history. We cannot stop people from overusing AJAX, so we need to develop better archival tools.

  • DelightOne 6 years ago

    I'm wondering: is there a way to use the archive.org tools to download the website and later transfer it to my own database, so that it doesn't take up too much space in the archive? If that's not possible, then whoever controls the archive is alone in control of history.

  • rtkwe 6 years ago

    > we need to develop better archival tools.

    That's the only real solution. Trying to redirect the whole ecosystem onto a more convenient path is doomed to fail unless your path is actually easier or better in some way.

    • marcosdumay 6 years ago

      > unless your path is actually easier or better in some way

      Well, I would argue it is, in nearly every way.

      The largest drawback is that by avoiding pulling the content with JS you won't automatically deny service to people who are trying to avoid trackers... This site looks like a Medium wannabe, so this is probably important to them.

      • rtkwe 6 years ago

        Ultimately how do you expect the websites to fund? There are a few options: a) tracker-based ads from a network, b) contextual ads based on the content, c) a paywall, d) Patreon style, where a core of dedicated fans gets some additional content or inside info and the rest is put out for free, e) authors pay to post (for some reason?), or f) some kind of attention coin or just crypto mining in visitors' browsers. Only (a) really starts getting you money from day one. Even if you are willing to lose readers with some sort of pay gate [0], there's still the initial stage where you're not making much because you don't have an audience. Websites need some way to make money.

        [0] And just look at the HN comments for any NYT or WSJ article to see how well that goes over even with people who complain about ad trackers and privacy.

        • squiggleblaz 6 years ago

          > Ultimately how do you expect the websites to fund?

          Out of the joy of amateur and professional contribution to the community. It used to work...

          • rtkwe 6 years ago

            That's basically the Patreon/authors-pay model, and it doesn't work for anything actually trying to run a business or make money using a website. It's also limited in scale, because eventually hosting costs are going to outgrow what individuals are able to pay unless the community is very small or you're only hosting text.

            I get the nostalgia for the pre-Endless-September internet, but the cat's pretty much out of the bag on that. There are some small efforts to remake it with peer hosting and federated networks like PeerTube or Mastodon, but they'd crumble if they got as popular as the YouTube or Twitter they want to replace.

  • geofft 6 years ago

    This is an equivalent problem to the web search problem, right? I believe Google indexes webpages by visiting them in headless Chrome and dumping the resulting DOM - it seems like archive.org could do the same.

    https://searchengineland.com/google-will-ensure-googlebot-ru...

    (Also archive.today solves this somehow, no?)
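
For anyone who wants to try the "headless Chrome, dump the DOM" idea locally, here is a minimal sketch in Python. It is not how Googlebot, archive.org, or archive.today actually work (I don't know their internals); it only shells out to a locally installed Chrome/Chromium using its `--headless` and `--dump-dom` flags. The binary names in `CHROME_CANDIDATES` and the helper names `find_chrome`, `build_dump_command`, and `dump_rendered_dom` are my own assumptions for illustration.

```python
import shutil
import subprocess
import sys
from typing import List, Optional

# Assumption: one of these binary names is on PATH. Adjust for your platform.
CHROME_CANDIDATES = ("google-chrome", "chromium", "chromium-browser", "chrome")


def find_chrome() -> Optional[str]:
    """Return the path of the first Chrome/Chromium binary found, or None."""
    for name in CHROME_CANDIDATES:
        path = shutil.which(name)
        if path:
            return path
    return None


def build_dump_command(chrome: str, url: str) -> List[str]:
    """Build the argv that asks headless Chrome to print the rendered DOM."""
    return [chrome, "--headless", "--dump-dom", url]


def dump_rendered_dom(url: str, timeout: float = 60.0) -> str:
    """Load `url` in headless Chrome and return the serialized DOM as text."""
    chrome = find_chrome()
    if chrome is None:
        raise RuntimeError("no Chrome/Chromium binary found on PATH")
    result = subprocess.run(
        build_dump_command(chrome, url),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if Chrome exits non-zero
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python dump_dom.py https://example.com/ > snapshot.html
    sys.stdout.write(dump_rendered_dom(sys.argv[1]))
```

This only captures the DOM at the moment Chrome decides the page has loaded, so content fetched later (infinite scroll, click-to-expand) would still be missed, and it saves none of the images, stylesheets, or scripts the page references.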

    • userbinator 6 years ago

      Google has orders of magnitude more computing power and humans to throw at the problem.