Analyzing stylistic similarity amongst authors

mirimir 10 years ago

This is very cool work! Some years ago, I was interested in text mining. I ended up playing with latent semantic analysis using Lucene etc. But that was a largely random choice, driven by the availability of open-source software and online discussion.

However, as cool as stylistic analysis is, I'm concerned about its implications for online anonymity (which I consider valuable). But maybe the risk is limited by typical text length and the false-positive rate. I welcome suggestions for further reading.

nicolewhite 10 years ago

Very cool. I wonder why the author decided to use igraph within R instead of Python, as he was already using Python for the frequencies.

cronbachs_beta 10 years ago

Hi - I'm the author. Glad you liked it! I'm afraid I had no particularly good reason for switching to R for the visualization, other than that I'd used R for network graphs in the past, so I already had code written.

  nicolewhite 10 years ago
  
  I can understand that. I find it's easier to use igraph in R, especially if you're going to be doing visualizations. igraph is alright in IPython notebooks, but getting the visualizations to work is a bit of a pain, whereas they work out of the box in RStudio.
  I don't even really like igraph for visualizations, though. It's great for graph algorithms like community detection, but for visualizations I'll usually jump into something interactive like visNetwork. Check out this slide, for example: http://nicolewhite.github.io/neo4j-presentations/RNeo4j/Visu...
  
  cronbachs_beta 10 years ago
  
  Yeah, igraph's visualizations aren't perfect. I've explored using d3 a bit in some of my other work (e.g. http://markallenthornton.com/blog/price-of-flavor/). It's great for interactive graphs, but it starts to tax the browser pretty heavily for larger ones (though perhaps that's just my poor coding). Thanks for the visNetwork suggestion - I'll have to check that out!