Show HN: A thin Python library to access HN data using Algolia's API
Hello community. Some time ago I was trying to create a project for my students using Hacker News data. As you might know, HN offers an official API [0], but it's based on Firebase and I felt its main use is building clients rather than querying data.
I found out that Algolia also provides an official REST API [1]. It's exactly what I needed: the ability to "search" HN by keywords, type of story (Show HN, Ask HN, etc.) and/or date.
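To give an idea of what a raw request to that API looks like, here is a minimal sketch that only builds the search URL and makes no network call. The endpoint and the `query`, `tags`, `numericFilters` and `hitsPerPage` parameters come from Algolia's HN Search API docs [1]; the helper `build_search_url` and its argument names are made up for illustration and are not part of any library.

```python
from urllib.parse import urlencode

# Date-sorted search endpoint of Algolia's HN Search API.
BASE_URL = "https://hn.algolia.com/api/v1/search_by_date"


def build_search_url(query=None, tags=None, created_after=None, hits_per_page=20):
    """Build a search URL (hypothetical helper; performs no request).

    query         -- free-text keywords
    tags          -- list of tags such as "story", "show_hn", "author_pg"
    created_after -- Unix timestamp; only items created after it match
    """
    params = {}
    if query:
        params["query"] = query
    if tags:
        # Comma-separated tags are combined with AND by the API.
        params["tags"] = ",".join(tags)
    if created_after is not None:
        params["numericFilters"] = f"created_at_i>{int(created_after)}"
    params["hitsPerPage"] = hits_per_page
    return f"{BASE_URL}?{urlencode(params)}"


# Show HN posts matching some keywords, created after 2019-01-01 UTC.
url = build_search_url(
    "thin python library",
    tags=["show_hn"],
    created_after=1546300800,
)
print(url)
```

Fetching that URL (for example with `urllib.request.urlopen`) should return a JSON document whose `hits` field holds the matching items.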
So I created a thin Python wrapper on top of Algolia's Search API: https://github.com/santiagobasulto/python-hacker-news
The library is at an early stage, but already usable. A few examples:
How to search posts from one user:
  results = search_by_date(
      author='pg',
      hits_per_page=1000)
How to search posts by type (this would find this same post):
  results = search_by_date(
      'thin python library',
      show_hn=True,
      hits_per_page=1000)
I'm working on implementing the other methods. If you have suggestions, please bring them up!
[0] https://github.com/HackerNews/API
[1] https://hn.algolia.com/api
Awesome. I imagine this being useful for things that quickly check whether something exists on HN, watch for new items, etc.
Though I think this needs clarification:
> but it's based on Firebase
The entire HN dataset is also available as a public BigQuery dataset, which enables much more intricate queries. For example, the following query means "Get all Show HNs with 5 or more points and 5 or more comments, along with the decoded submission title and all decoded top-level comments which are neither dead nor deleted" (and page):
So if you need to answer specific questions like this, which could return 500k+ rows, it's better to use BigQuery than to stress the API.