Content Evaluation of the Text Retrieval Conference (TREC) Polar Dynamic Domain Dataset

CSCI-599 Spring 2016

Team 22

D3 visualizations:

  1. Request To Content Dendrogram (Most Frequent Keyword)
  2. Request To Content Dendrogram (Most NER Extracted)
  3. File Size Comparison (Actual Size and Size of Solr Index)
  4. Average File Size Analysis of Common Crawl Data
  5. Parser Hierarchy (Parser vs Count)
  6. Parser Hierarchy (Parser vs Metadata Retrieved)
  7. Parser Hierarchy (Parser vs Raw Text Retrieved)
  8. Parser Hierarchy (Parser vs TTR Text Retrieved)
  9. Language Detection (Number of Documents of Each Language)
  10. Mixed Language Detection
  11. Word Cloud Showing Most Relevant Words
  12. Word Cloud Showing Language Diversity
  13. Grobid Quantity NER
  14. NER Maximal Joint Agreement
  15. Measurements Extracted (Grouped by Domain)
  16. Measurements Extracted (Grouped by MIME Type)
  17. Measurements Extracted (Count)
  18. Range of Measurement Units along with Mean Values



Team Members: