Stylish is a two-week data science project made by Scott Contri for his Galvanize capstone. It aims to identify a writer's style by measuring his or her similarity to a famous author using natural language processing (NLP). It currently uses a Random Forest model to make its predictions, trained on a set of 46 authors from his personal collection of ebooks (in .txt format). The model runs on an associated website, also named Stylish, designed and programmed by Karen Kelly and hosted on an EC2 instance from Amazon Web Services (AWS).


The limits of my language mean the limits of my world.
--Ludwig Wittgenstein

Every day, people tell themselves stories in their heads. These stories are told in voices unique to them, with a particular cadence that depends on the terrain their mind is wandering. People use this personal style to create and express their dreams, ideas, and, of course, stories, and they present themselves through various media. Often, that medium is writing.

Everyone from Shakespeare (NLP was used to investigate suspicions of the playwright’s collaboration with others) to the Unabomber, Ted Kaczynski (whose brother turned him in after recognizing Ted’s writing style in the Unabomber’s manifesto), has a unique linguistic style. However, although almost anyone can clearly read the difference in style between Hamlet and Industrial Society and Its Future, differentiating between two writers is usually a challenge for most people.

In literature, style is loosely defined as the way an author uses words. With such a vague definition, it is obviously difficult to measure a writer’s style, which is the challenge stylometry ventures to take on. Stylometry is the applied study of linguistic style and is currently used to identify the authorship of documents.
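To make "measuring style" a little more concrete, here is a minimal sketch of a few classic stylometric measurements in plain Python. Everything in it is an assumption chosen for illustration: the short `FUNCTION_WORDS` list, the helper names, and the use of cosine similarity to compare two texts. It is not the feature set or the Random Forest model that Stylish actually uses.

```python
import math
import re
from collections import Counter

# Assumed feature set: a handful of common English function words.
# This list is illustrative only, not the features Stylish uses.
FUNCTION_WORDS = ("the", "and", "of", "to", "a", "in", "that", "it", "is", "was")


def tokenize(text):
    """Lowercase the text and return its word tokens (letters, optional apostrophe)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def function_word_profile(text):
    """Relative frequency of each function word, in FUNCTION_WORDS order."""
    words = tokenize(text)
    if not words:
        raise ValueError("text contains no words")
    counts = Counter(words)
    return [counts[w] / len(words) for w in FUNCTION_WORDS]


def summary_stats(text):
    """Average sentence length (in words) and type-token ratio."""
    words = tokenize(text)
    if not words:
        raise ValueError("text contains no words")
    # Split on sentence-ending punctuation and keep pieces that contain words.
    sentences = [s for s in re.split(r"[.!?]+", text) if tokenize(s)]
    return {
        "avg_sentence_length": len(words) / len(sentences),
        "type_token_ratio": len(set(words)) / len(words),
    }


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


sample = "The cat sat. The dog ran!"
print(summary_stats(sample)["avg_sentence_length"])  # 3.0
```

Function-word frequencies are a common choice in authorship studies because they depend less on a text's subject than on the writer's habits. Comparing two texts then comes down to calling `cosine_similarity` on their two profiles, although a score built from ten words is far too crude to tell real authors apart.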

Unfortunately, no method has yet been produced that accurately identifies different styles across a large number of documents. The number of applications for a program that could perform this task is virtually limitless. Readers could be given book recommendations based on the style of books they have previously enjoyed. Publishing could become more efficient by matching the style publishers are selling with the style authors are offering.

Perhaps one day machines could even produce interpretable stories that would fascinate and terrify us, but in the meantime I’d really just like to get a better understanding of my own writing style. Maybe you would too.


