Ask HN: How much would it cost to replicate GPT-2's model?

4 points by sharemywin 3 months ago

Cost 1: Scrape 8 Million articles from Reddit posts.

http://files.pushshift.io/reddit/

Cost 2: Training a 1.5 B parameter transformer network.

I'm assuming it could be trained with google's TPUs. And the model would just be a bigger version of the model open ai released.

tree_of_item 3 months ago

> Their model used 256 of Google's Cloud TPU v3, though I've not seen training durations. The TPU v3 is only available individually outside of @Google (though @OpenAI likely got special dispensation) which means you'd be paying $8 * 256 = $2048 per hour.

https://twitter.com/Smerity/status/1096189352743301120

> Thanks. So then it was 32 TPUv3s, to be more precise, and sticker-price training costs would then be per Smerity 32 * 24 * 7 * 8 = $43k?

https://www.reddit.com/r/MachineLearning/comments/aqlzde/r_o...

  • solomatov 3 months ago

    It's likely even more since they needed to perform a hyperparameter tuning. Multiply it by 10 or 100 to get a more realistic estimate.