sammysidhu 2 hours ago

Amazing work leveraging Daft for this!

EnricoShippole 8 hours ago

Given the increasingly closed-source nature of the U.S. AI ecosystem, it is more important than ever to push for more open model and dataset releases. Datamule, TeraflopAI, and Daft collaborated to release 43 billion tokens of SEC EDGAR data.

jgfriedman1999 8 hours ago

Neat! Surprised at how cheap it was.

  • jaychia 7 hours ago

    Very cool that this kind of work can now be done at this kind of price point. 24 hours for 8M filings on just 12 cores :)

    Excited for unstructured/multimodal data processing to become increasingly commoditized and abstracted away, so that more datasets like this can be built.