Content Evaluation of the Text Retrieval Conference (TREC) Polar Dynamic Domain Dataset
CSCI-599 Spring 2016
Team 22
D3 visualizations:
Request To Content Dendrogram (Most Frequent Keyword)
Request To Content Dendrogram (Most NER Extracted)
File Size Comparison (Actual Size and Size of Solr Index)
Average File Size Analysis of Common Crawl Data
Parser Hierarchy (Parser vs Count)
Parser Hierarchy (Parser vs Metadata Retrieved)
Parser Hierarchy (Parser vs Raw Text Retrieved)
Parser Hierarchy (Parser vs TTR Text Retrieved)
Language Detection (Number of Documents of Each Language)
Mixed Language Detection
Word Cloud Showing Most Relevant Words
Word Cloud Showing Language Diversity
Grobid Quantity NER
NER Maximal Joint Agreement
Measurements Extracted (Grouped by Domain)
Measurements Extracted (Grouped by MIME Type)
Measurements Extracted (Count)
Range of Measurement Units along with Mean Values
Team Members:
Harsh Fatepuria (fatepuri@usc.edu)
Warut Roadrungwasinkul (roadrung@usc.edu)
Rahul Agrawal (rahulagr@usc.edu)