398 points by anvaka 5 months ago
This is a great idea for a project, the 'users who also posted' metric seems to have worked really well.
The site seems to fail to load the 'hot' items for the subreddits when I click on them but that's not a big deal for me. On closer inspection, it doesn't seem to be making any requests. Just says `Failed to download https://www.reddit.com/r/thinkpad/hot.json` etc
> the 'users who also posted' metric
— Hello, is this the anime channel?
— How do I patch KDE2 under FreeBSD?
The accuracy of this meme is stunning. I run an anime-related discord of 20-odd people and at least half of the people there work in tech in some way. I've seen similar things in other such communities.
I wonder if this is just a cultural artifact from the time that anime and technology were both "geeky" niche interests (to a greater extent than they are now) or if there's a deeper underlying reason...
It may be a stereotype, but to me it seems that in geek circles it is much more acceptable to admit to continuing to appreciate things often seen as "childish" elsewhere in general.
Another metric could be looking at cross posts. I’m not sure which is better.
That could be cool, but it would eliminate any subs that don't allow crossposting. That includes a few of the heavy hitters like ShowerThoughts and AskReddit.
hmmm... I don't see the error on my end. What browser do you use? Can you try in "incognito" mode? Are there any extensions that might be blocking this?
Doesn't work for me either, on Firefox Developer Edition 65.0b10 (64-bit) with no extensions enabled (disabled them all to double-check it wasn't one of them blocking it).
Works fine in Edge.
It's purely the loading of the Hot sidebar, everything else works fine. It has already helped me find a few new subs I didn't know existed, so thanks!
Have you thought about building something similar for bot identification on Twitter? I suspect that would be quite the useful feature.
Yeah I'm on FF - can confirm my issue was related to the content blocking.
I'm getting the same error on Firefox Quantum.
Hm... I'm at a loss.
https://jsbin.com/fuyijan/2/edit?js,console - this works in Chrome and in the non-private mode of Firefox Quantum (64.0.2 (64-bit)). However, when I open private browsing in Firefox Quantum, the request fails.
Does anyone know why?
That sounds like Content Blocking kicking in - that's only active in Private Browsing by default: https://support.mozilla.org/en-US/kb/content-blocking
I note that page says "By default, content blocking uses the Disconnect.me basic protection list" - and reddit.com is on that list: https://github.com/disconnectme/disconnect-tracking-protecti...
(I'm guessing reddit's "social button" is considered a tracker.)
Confirmed, it's definitely Content Blocking: I just loaded that jsbin in an FF private window, and there's a message in the console to that effect.
Thank you so much! I opened an issue here: https://github.com/disconnectme/disconnect-tracking-protecti...
>(I'm guessing reddit's "social button" is considered a tracker.)
It wouldn't surprise me. Even though /r/ has concepts like "Silver" and "Gold" to generate revenue, I think its main driver is still advertising; so, for it to behave like Facebook, Google, etc. wouldn't be that much of a stretch of the imagination. (Or maybe I'm just far too paranoid?)
It's a cool tool, but it seems very biased towards bigger subs. If you let it loose on a small sub, it will emphasize big, kinda-but-not-really related subs over tiny-but-closely-related subs.
Using Jaccard has this effect; mutual information would correct more for the independent frequency of the posts per subreddit.
It's a shame since this tool would be particularly useful for recommending small subs. I don't need it to tell me about big subs, since I already know them.
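To make the Jaccard versus mutual information point concrete, here is a minimal sketch on made-up data. It uses pointwise mutual information (PMI) as a stand-in for the "mutual information" mentioned above; that substitution, the subreddit names, the user ids, and the total population size are all assumptions for illustration, not anything taken from the tool.

```python
import math

# Hypothetical toy data: subreddit -> set of user ids who posted there.
small_sub = set(range(0, 50))                            # 50 posters
tiny_related = set(range(0, 6)) | {100, 101}             # 8 posters, 6 shared
big_sub = set(range(20, 50)) | set(range(1000, 1170))    # 200 posters, 30 shared
TOTAL_USERS = 10_000  # assumed size of the whole user population


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pmi(a, b, n):
    """log2 of the observed overlap rate over the rate expected if
    posting to a and posting to b were independent."""
    joint = len(a & b) / n
    if joint == 0:
        return float("-inf")
    return math.log2(joint / ((len(a) / n) * (len(b) / n)))


for name, other in (("tiny_related", tiny_related), ("big_sub", big_sub)):
    print(name,
          round(jaccard(small_sub, other), 3),
          round(pmi(small_sub, other, TOTAL_USERS), 2))
```

On this toy data Jaccard ranks the big sub slightly above the tiny one, while PMI reverses the order, because PMI divides by each sub's own posting frequency. Whether the same happens on the real data would need to be checked.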
This is seriously amazing man! Interesting to see how different subject-areas network themselves differently.
For example, comparing "r/permaculture" to "r/linux".
Also, looking at r/girlgamers makes me realize my privilege for being able to navigate my interest areas without such a clusterfuck of bullshit going on:
It's really sad how toxic Reddit brigading is
This is awesome! My input had exactly the results I expected.
Thanks for creating this tool, bookmarking!
Thank you! I'm very glad you liked it :)
I checked VXjunkies and found a level of weirdness I hadn't expected. Will need a few hours to browse through this while nobody is around / can be startled by sudden, random laughter...
That's a cool tool. A useful extension would be to preserve the location history as you navigate topics, so that you can go back.
Good call. I was worried that I'd "spam" the browser history and people who are coming from reddit or HN would never go back to where they came from :)
Usability improvement idea: make it easier to discover how to re-center the graph around a new subreddit.
I spent several minutes playing around with this, and I was just typing in the name of the desired subreddit because that was the only way I could figure out. Finally, after much experimenting, I realized double-clicking is the solution.
Oh, and a second, related usability idea: if I double-click, don't open the preview sidebar at the right. I can see how the sidebar is useful, but if I'm doing one action, I don't want it to have two effects. Also, I have signaled clear intent to browse the graph, so I want more screen real estate to be devoted to that.
EDIT: bonus usability idea/request: clicking on a node brings up the preview sidebar. It'd be nice if clicking on it again (not double-clicking) makes the sidebar hide again.
Anvaka, when you accept BTC or ETH let us know, we can contribute to your efforts.
Thank you, Kasian!
> The relationship is determined by a metric "users who posted to this subreddit also post to...".
I'm interested, could you share with us the entire metric you used to determine the relationship?
It is jaccard similarity https://en.wikipedia.org/wiki/Jaccard_index
Also I described it a bit more here: https://www.reddit.com/r/MachineLearning/comments/aek3yk/p_l...
Have you tried polling profiles to see how many are sharing upvotes/downvotes? It used to be a small percentage but is pretty informative.
You indicated that you used the Pushshift.io datasets, but how did you compute Jaccard Similarity on a dataset of 38M?
I didn't use pushshift, sorry. The data was collected from bigquery, stored locally into CSV files, and then I just wrote a node.js script to compute similarities.
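A rough sketch of what such a pipeline could look like, for anyone curious. This is not the author's node.js script: it is written in Python, and the CSV layout (one `author,subreddit` row per pair) and the sample rows are assumptions made for illustration. It counts shared posters per subreddit pair by walking each user's subreddit list, then applies the Jaccard formula.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations

# Hypothetical CSV export: one row per (author, subreddit) pair.
SAMPLE_CSV = """author,subreddit
alice,linux
alice,thinkpad
bob,linux
bob,thinkpad
bob,anime
carol,anime
carol,linux
dave,thinkpad
"""


def jaccard_from_rows(rows):
    """Return {(sub_a, sub_b): jaccard} for every pair sharing a poster."""
    subs_by_user = defaultdict(set)
    for row in rows:
        subs_by_user[row["author"]].add(row["subreddit"])

    size = defaultdict(int)    # subreddit -> number of distinct posters
    shared = defaultdict(int)  # (sub_a, sub_b) -> number of shared posters
    for subs in subs_by_user.values():
        for sub in subs:
            size[sub] += 1
        # Sorting keeps each pair in one canonical (alphabetical) order.
        for a, b in combinations(sorted(subs), 2):
            shared[(a, b)] += 1

    # |A ∩ B| / |A ∪ B|, with |A ∪ B| = |A| + |B| - |A ∩ B|
    return {
        (a, b): n / (size[a] + size[b] - n)
        for (a, b), n in shared.items()
    }


scores = jaccard_from_rows(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Pairs with no shared posters never appear in the result, which keeps the output sparse. A user who posts in very many subreddits still generates a quadratic number of pairs, so a real run would probably want to cap or filter those accounts.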
Did you simply collect "user has posted to X, Y, and Z subreddits", or did you look at frequency too?
The reason I asked the question is because back in 2016 I had a similar (now out of date) approach to finding related subreddits at scale using Jaccard similarity: https://minimaxir.com/2016/06/reddit-related-subreddits/
There, I only built a user edge if a given user commented on 5 distinct threads in a subreddit, since a lot of subreddit interaction was due to brigading.
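The thresholding idea can be sketched roughly as follows. This is not the code from the linked post: the tuple layout, the function name `active_memberships`, and the reading of "5 distinct threads" as "at least 5" are assumptions made for illustration.

```python
from collections import defaultdict

MIN_THREADS = 5  # threshold quoted above, interpreted here as "at least 5"


def active_memberships(comments, min_threads=MIN_THREADS):
    """comments: iterable of (author, subreddit, thread_id) tuples.

    Returns {subreddit: set of authors} keeping only authors who
    commented in at least `min_threads` distinct threads there.
    """
    threads = defaultdict(set)
    for author, subreddit, thread_id in comments:
        threads[(author, subreddit)].add(thread_id)

    members = defaultdict(set)
    for (author, subreddit), thread_ids in threads.items():
        if len(thread_ids) >= min_threads:
            members[subreddit].add(author)
    return dict(members)


# Hypothetical data: alice is a regular in linux, but only piled into
# a single anime thread; bob falls one thread short in linux.
comments = (
    [("alice", "linux", f"t{i}") for i in range(5)]
    + [("alice", "anime", "t99")] * 6
    + [("bob", "linux", f"t{i}") for i in range(4)]
)
members = active_memberships(comments)
print(members)
```

Counting distinct threads rather than raw comments is what filters out the single-thread pile-on: six comments in one thread still count as one thread.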
I didn't look into frequency. Is there a version of jaccard similarity that accounts for frequencies?
There's a weighted variant: https://en.wikipedia.org/wiki/Jaccard_index#Weighted_Jaccard...
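For reference, the weighted variant replaces set intersection and union with element-wise minimum and maximum over non-negative weights. A minimal sketch, assuming each subreddit is represented as a dict from user to post count (that representation and the sample numbers are hypothetical):

```python
def weighted_jaccard(x, y):
    """Sum of element-wise minima over sum of element-wise maxima.

    x and y map user -> non-negative weight (e.g. number of posts);
    a missing user counts as weight 0.
    """
    users = x.keys() | y.keys()
    numerator = sum(min(x.get(u, 0), y.get(u, 0)) for u in users)
    denominator = sum(max(x.get(u, 0), y.get(u, 0)) for u in users)
    return numerator / denominator if denominator else 0.0


# Hypothetical post counts per user.
linux = {"alice": 10, "bob": 1}
thinkpad = {"alice": 5, "carol": 2}
print(weighted_jaccard(linux, thinkpad))  # 5 / 13
```

With all weights equal to 1 this reduces to the plain set-based Jaccard index, so it can be swapped in without changing results for binary "has posted" data.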
Check out Graphlab Create's recommender toolkit, pretty fast for sets of that size
+1 for this recommendation, but it's called turicreate now and can be found here: https://github.com/apple/turicreate
*types in DunderMifflin*
I know what I am going to be doing for the next 30 minutes
This is a really useful tool. It works so smoothly on my mobile.
Happy to hear :)!
I've been searching for a tool like this for ages, bookmarked!
This is really nice!
Good tool. If possible, add an option to view the result data in tabular format with the number of subscribers. As it is, it's difficult to use.
Great tool! This site supports my suspicions that much of the activity on /r/The_Donald is the coordinated effort of a few individuals posting across multiple accounts. For those not familiar with this sub, it was created sometime during the 2016 election leadup and unabashedly supports Donald Trump with memes and shitposting. At one point, the entire frontpage of reddit was just posts from /r/The_Donald until reddit admins had to alter their algorithm to force the sub off.
If you look at the network graph for /r/The_Donald, it doesn't look...organic. There are 4 clearly delineated clusters of subs related to that sub. Posters to /r/The_Donald heavily post to /r/news & /r/politics, /r/TropicalWeather (?), /r/TwoXChromosomes (?) and /r/AskTheDonald (and other alt-right subs).
There's not much interaction with the rest of reddit. Posters from other subs don't also post content to /r/The_Donald.
This is unusual.
For every other sub I've looked at, there's a much more complex & dynamic graph where users post across various communities across the site. Every other major sub looks like a real network with dozens of interconnected links. Yet /r/The_Donald, with almost 700,000 subscribers, only has a strong connection to 4 clusters.
The alternate hypothesis is that people on that sub heavily use alternate accounts. This might also explain the lack of interaction with the site compared to other subs of similar size.
He manually overrode some large subs: https://old.reddit.com/r/dataisbeautiful/comments/ae88pk/int...
Thanks! That's probably it then. I guess this doesn't support my hypothesis after all.
This is great, and works flawlessly!
Thank you! I'm so happy you like it.
It’s simple and just works, don’t stop making great things.
Aww, thank you!
> don’t stop making great things.
Not going to ever stop! I have sooo many ideas - I wish I could be more efficient :).
Useful little tool! Reddit humor subs are so damn specific, it can be hard to find them all.
Is this only for tech subjects or am I using it wrong?
Edit: Somehow I missed the big searchbar at the top.
I tried "tits", that worked.
There was some cocks present anyway
Fantastic! I tested the heck out of this and found it really useful.
Already found some cool subs.
You should submit this to r/dataisbeautiful if not already done.
I was sort of expecting to be able to click through to the subreddit...
Very nice tool, thank you very much for that. This is why I love HN.
* /r/askscience is nested at the center of defaults (I think a lot of older, famous subs will end up highly connected)
* /r/relationship_advice is kind of a loner. The graph generates six distinct subreddit clusters: feminism, lgbt-issues, counseling, and misc. science fields. The last cluster is a very large, diffuse cluster of sex/porn/depression subreddits that skew towards defaults.
* /r/slatestarcodex has distinct clusters too. 1) Effective altruism and philosophy, 2) Psychiatry, 3) Rational fiction writing, 4) Liberal-tarian, IDW defaults, 5) "Classic effort post" subs like true_reddit and depth_hub.
* /r/bigboye is a tiny part of a very large network of animal gifs subreddits. /r/animalsbeingbros connects it to a bunch of high volume gif subs.
* /r/the_donald has a surprising link to /r/TwoXChromosomes [https://anvaka.github.io/sayit/?query=the_donald]
* /r/politics seems to have higher interconnection
* /r/awww is quite wholesome =) [https://anvaka.github.io/sayit/?query=Awww]
* /r/puppers has some strange nsfw links
>/r/the_donald has a surprising link to /r/TwoXChromosomes
I don't think it's surprising. Donald fans on social media tend to hate minorities and women, so I'm not surprised they would try to brigade women-oriented subs.
It got so bad that subs like /r/offmychest automatically ban people that post in many alt-right-related subreddits.
The isolatedness of /r/relationship_advice might have to do with OPs posting from throwaways?
This is great!
But on a side note, I can also waste more time on the Internets!
Is this built on top of your work on yasiv before?
It would be fair to say so. The core layout is the same with a bit more polished overlap removal and animation.
Why do I get stuck in "dead ends"? For instance, https://anvaka.github.io/sayit/?query=rtlsdr contains https://anvaka.github.io/sayit/?query=PlutoSDR but the inverse is not true -- once I'm in PlutoSDR there's only one other subreddit and the two of them are an island.
Damn, I'm wondering if this could help with marketing, in order to find out where your audience hangs out.
Ya' know this assumes one would use reddit as a reference for learning, which one should never, EVER do, don't ya?
Hi, thanks for this. Is there a guide to how you are storing the data on github pages?
Great visualization! Nice work.
Incredibly useful, thanks!
Would be nice if banned subs appeared in a different colour.
If you have spacetime, you might consider sharing this with LGBTQ and kink communities experiencing the Tumblr diaspora.
Lots of people feel uprooted from sex-positive and/or tightly-bound communities they've been part of for years, and don't know how to rediscover or rebuild the healthy networks they've lost on Tumblr. I know full-grown adult women who are struggling to find footing again in the most personal of spaces.