GSoC 2016 Final Submission



Sentiment Analysis Parser
Part I
GitHub Link


    Created a working Sentiment Analysis Parser that takes any text as input and outputs its sentiment.
    • Initial version reported the sentiment in a binary fashion (positive or negative).
    • The latest version lets the user train the model on either a binary dataset, which produces the binary output described above, or a categorical dataset, which produces categorical output similar to Facebook reactions: angry, sad, neutral, like, or love.
    • Both versions were run first on a subset of the MEMEX gun ads and finally on all ~4 million ads, to produce the results (see Part III) and to update all the Solr files with the resulting categorical sentiment.

    In short, the parser can be used by following these steps (for more information, please refer to the ReadMe on GitHub):
    • Clone and compile using Maven.
    • Build the model (either binary or categorical).
    • Run the parser on chosen text.
    

Part II
GitHub Link

    Created a pull request in Apache OpenNLP to add the parser. This involved:
    • Adding all my existing work.
    • Creating evaluation tools (e.g. EvaluatorTool, CrossValidatorTool).
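
    To illustrate what cross-validation tooling of this kind does, here is a minimal Java sketch. It is not the code from the OpenNLP pull request: the class KFoldSketch and its methods testFolds and accuracy are hypothetical names, and the round-robin fold assignment and plain accuracy metric are assumptions chosen for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative k-fold helper; hypothetical, not the OpenNLP CrossValidatorTool. */
final class KFoldSketch {

    /**
     * Assigns sample indices 0..n-1 to k test folds in round-robin order,
     * so index i lands in fold (i % k). The training set for a fold is
     * every index that is not in that fold.
     */
    static List<List<Integer>> testFolds(int n, int k) {
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("need 2 <= k <= n");
        }
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(i);
        }
        return folds;
    }

    /** Fraction of predicted labels that equal the gold labels. */
    static double accuracy(List<String> gold, List<String> predicted) {
        if (gold.isEmpty() || gold.size() != predicted.size()) {
            throw new IllegalArgumentException("lists must be non-empty and equal in size");
        }
        int correct = 0;
        for (int i = 0; i < gold.size(); i++) {
            if (gold.get(i).equals(predicted.get(i))) {
                correct++;
            }
        }
        return (double) correct / gold.size();
    }

    public static void main(String[] args) {
        // Print the held-out indices of each fold for 10 samples and 5 folds.
        List<List<Integer>> folds = testFolds(10, 5);
        for (int f = 0; f < folds.size(); f++) {
            System.out.println("fold " + f + " test indices: " + folds.get(f));
        }
    }
}
```

    Each fold serves once as the held-out test set while the remaining samples train the model; averaging the per-fold accuracy gives the cross-validated score.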

    Created a pull request in Apache Tika to add the parser.


Part III
GitHub Page Link

    Created a GitHub page that illustrates the work done and shows how the parser works. It has the following sections:
    • Use -- Explains how to use the parser (similar to the ReadMe).
    • Demo -- The results of running the parser on different data using the four models (shown as graphs).
    • Models -- The analysis of the four datasets used for training the models (the distribution of sentiment in each, shown as graphs).
    • Training Datasets -- The four datasets used to train the four models respectively (two binary and two categorical).
    • Download -- A link to the Maven Repository, where the parser can be downloaded as a jar file.
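
    The sentiment distribution plotted on the Models page could be computed roughly as in the Java sketch below. This is an illustration only: the class SentimentDistribution, its methods, and the sample labels in main are invented for this example and are not taken from the actual training datasets.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative label tally; hypothetical, not part of the parser itself. */
final class SentimentDistribution {

    /** Counts how often each label occurs, keeping first-seen order. */
    static Map<String, Integer> count(List<String> labels) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    /** Converts raw counts into percentage shares of the total. */
    static Map<String, Double> percentages(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            double share = total == 0 ? 0.0 : 100.0 * e.getValue() / total;
            shares.put(e.getKey(), share);
        }
        return shares;
    }

    public static void main(String[] args) {
        // Made-up labels using the five categorical classes named in Part I.
        List<String> labels = List.of(
                "like", "love", "like", "sad", "neutral", "angry", "like", "love");
        Map<String, Integer> counts = count(labels);
        System.out.println(counts);
        System.out.println(percentages(counts));
    }
}
```

    The resulting per-label shares are the kind of values a bar or pie chart of a dataset's sentiment distribution would display.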


Part IV
GitHub Page Link

    For the remainder of the programme, I worked on analysing Human Trafficking Reviews from MEMEX, which involved:
    • Getting the data from the server.
    • Extracting the needed information from the data using Jsoup and, later, Gson.
    • Parsing the data with the Sentiment Analysis Parser.
    • Visualising the results (see the last eight graphs on the Demo page).