My hack at #HackIllinois 2015.
Motivation - to help users better understand reviews when they make a search in Yelp.
Summary - Worked on analyzing reviews from the yelp dataset, performing keyword extraction, sentiment analysis and generating a word cloud for beter summarization/visualization of the reviews. The system's generic capability will allow it to be used with any dataset.
Steps -
You can download the dataset at http://www.yelp.com/dataset_challenge
Set up and run Elasticsearch.
Process the dataset and index the required data.
Helper scripts to perform various tasks like - process data, load data from dataset, keyword extraction and sentiment analysis, split data for elasticsearch bulk processing, searchapi and flask server.
For UI used bootstrap
External APIs used
Alchemy API for advanced text analysis.
Bootstrap for UI.
Flask to serve requests in a RESTful way.
ElasticSearch for indexing and searching.
NLTK for language modelling
Output is in the form of a word cloud