This is very cool work! Some years ago, I was interested in text mining. I ended up playing with latent semantic analysis using Lucene etc. But that was a largely random choice, driven by the availability of open-source software and online discussion.
However, as cool as stylistic analysis is, I'm concerned about implications for online anonymity (which I consider valuable). But maybe the risk is limited by typical text length and false positive rate. I welcome suggestions for further reading.
Very cool. I wonder why the author decided to use igraph within R instead of Python, as he was already using Python for the frequencies.
Hi - I'm the author. Glad you liked it! I'm afraid I had no particularly good reason for switching to R for the visualization other than that I'd used R for network graphs in the past so I already had code written.
I can understand that. I find it's easier to use igraph in R, especially if you're going to be doing visualizations. igraph is alright in IPython notebooks, but getting the visualizations to work is a bit of a pain, whereas it works out of the box in RStudio.
I don't even really like igraph for visualizations, though. It's great for graph algorithms like community detection, but for visualizations I'll usually jump into something interactive like visNetwork. Check out this slide, for example: http://nicolewhite.github.io/neo4j-presentations/RNeo4j/Visu...
Yeah, igraph's visualizations aren't perfect. I've explored using d3 a bit in some of my other work (e.g. http://markallenthornton.com/blog/price-of-flavor/). It's great for interactive graphs, but it starts to tax the browser pretty heavily for larger ones (though perhaps that's just my poor coding). Thanks for the visNetwork suggestion - I'll have to check that out!