GSoC 2016 Final Submission

Sentiment Analysis Parser

Part I

GitHub Link

Created a working Sentiment Analysis Parser that takes any text as input and outputs its sentiment.

The initial version reported the sentiment in binary form (positive or negative).
The latest version lets the user train the model on either a binary dataset (which produces the binary output described above) or a categorical dataset (which produces categorical output similar to Facebook reactions: angry, sad, neutral, like, or love).
Both versions were run on a portion of the MEMEX gun ads (and, finally, on all ~4 million ads) to produce the results shown in Part III and to update all the Solr files with the resulting categorical sentiment.
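To make the two output schemes concrete, here is a minimal Java sketch of the label set each kind of training dataset yields. The `DatasetType` enum and its `labels()` and `accepts()` methods are hypothetical names chosen for illustration, not the parser's actual API; only the label values come from the description above.

```java
import java.util.List;

/**
 * Hypothetical sketch: which labels a trained model can output depends on
 * the dataset it was trained on. Class, enum and method names are
 * illustrative assumptions, not the parser's real API.
 */
public class SentimentLabels {

    enum DatasetType {
        // Binary dataset: the model reports positive or negative.
        BINARY(List.of("positive", "negative")),
        // Categorical dataset: Facebook-reaction-style categories.
        CATEGORICAL(List.of("angry", "sad", "neutral", "like", "love"));

        private final List<String> labels;

        DatasetType(List<String> labels) {
            this.labels = labels;
        }

        // All labels a model trained on this dataset type can output.
        List<String> labels() {
            return labels;
        }

        // True if a predicted label is a valid output for this dataset type.
        boolean accepts(String label) {
            return labels.contains(label);
        }
    }

    public static void main(String[] args) {
        for (DatasetType type : DatasetType.values()) {
            System.out.println(type + " -> " + type.labels());
        }
    }
}
```

Running `main` prints the binary label pair followed by the five categorical labels. Keeping the label set tied to the dataset type means a categorical label such as "neutral" is rejected when the model was trained on binary data.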

In short, the parser can be used by following these steps (for more information, please refer to the ReadMe on GitHub).

Part II

GitHub Link

Created a pull request in Apache OpenNLP to add the parser. This involved:

Created a pull request in Apache Tika to add the parser.

Part III

GitHub Page Link

Created a GitHub page that presents all the work done and, in particular, shows how the parser works.

Use -- Explains how to use the parser (similar to the ReadMe).
Demo -- The results of running the parser on different data using the four models (shown as graphs).
Models -- An analysis of the four datasets used to train the models (the sentiment distribution in each, shown as graphs).
Training Datasets -- The four datasets used to train the four models respectively (two binary and two categorical).
Download -- A link to the Maven Repository, where the parser can be downloaded as a jar file.
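The Models page charts how sentiment is distributed across each training dataset. A minimal Java sketch of that tally follows. It assumes each line of a dataset holds one sample with the sentiment label as the first whitespace-separated token; that file layout, the `SentimentDistribution` class and its `countLabels` and `toPercentages` methods are all hypothetical, and the sample lines in `main` are made-up toy input, not data from the actual datasets.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: count how often each sentiment label occurs in a
 * training dataset. Assumes (not confirmed by the source) one sample per
 * line, with the label as the first whitespace-separated token.
 */
public class SentimentDistribution {

    // Returns label -> number of samples, sorted alphabetically by label.
    static Map<String, Integer> countLabels(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String label = trimmed.split("\\s+", 2)[0];
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    // Converts raw counts into percentages of the whole dataset.
    static Map<String, Double> toPercentages(Map<String, Integer> counts) {
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        Map<String, Double> percentages = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            percentages.put(e.getKey(), 100.0 * e.getValue() / total);
        }
        return percentages;
    }

    public static void main(String[] args) {
        // Toy samples for illustration only.
        List<String> sample = List.of(
            "love what a great product",
            "angry this never arrived",
            "neutral the item is listed for sale",
            "love would buy again");
        Map<String, Integer> counts = countLabels(sample);
        System.out.println(counts);                // {angry=1, love=2, neutral=1}
        System.out.println(toPercentages(counts)); // {angry=25.0, love=50.0, neutral=25.0}
    }
}
```

The percentages are what a distribution graph would plot; using a `TreeMap` keeps the labels in a stable order so graphs for different datasets line up.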

Part IV

GitHub Page Link

For the remainder of the programme, I worked on analysing Human Trafficking Reviews from MEMEX, which involved: