Content Evaluation of the Text Retrieval Conference (TREC) Polar Dynamic Domain Dataset

CSCI-599 Spring 2016

Team 4

D3 visualizations:

  1. A full pie chart of the MIME types detected by Apache Tika
  2. Ratio of contributing parsers to text extraction
  3. Classification path of request to content
  4. Size ratio of Solr index to extracted text (pre)
  5. Size ratio of Solr index to extracted text (post)
  6. Content Ratio by MIME type
  7. Metadata Ratio by MIME type
  8. Language Diversity of the TREC DD Polar Dataset
  9. Text Word Cloud of the TREC DD Polar Dataset
  10. Analysis of GROBID Quantities
  11. Joint NER agreement between GROBID, OpenNLP, CoreNLP and NLTK
  12. Spectrum of extracted measurements using NER
  13. Minimum, mean, and maximum of extracted measurements using NER
    1. By domain: aaeachicago.blogspot.com
    2. By domain: archpedi.jamanetwork.com
    3. By domain: code.djangoproject.com
    4. By domain: developer.apple.com
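Visualization 13 groups NER-extracted measurements by source domain and reports their minimum, mean, and maximum. The following is a minimal sketch of that aggregation step, assuming the measurements are already available as hypothetical (domain, numeric value) pairs. The input shape and the function name summarize_by_domain are illustrative assumptions, not the team's actual pipeline. The output is a list of JSON-serializable records that a D3 chart could bind to.

```python
import json
from collections import defaultdict
from statistics import mean


def summarize_by_domain(records):
    """Group (domain, value) pairs and compute min, mean, and max per domain.

    `records` is a hypothetical iterable of (domain, numeric_value) tuples,
    e.g. measurement values extracted by NER from each crawled page.
    """
    values = defaultdict(list)
    for domain, value in records:
        values[domain].append(float(value))
    # One record per domain, sorted by domain name for stable output.
    return [
        {"domain": d, "min": min(v), "mean": mean(v), "max": max(v)}
        for d, v in sorted(values.items())
    ]


if __name__ == "__main__":
    # Illustrative sample values only; not real measurements from the dataset.
    sample = [
        ("code.djangoproject.com", 2.0),
        ("code.djangoproject.com", 4.0),
        ("developer.apple.com", 10.0),
    ]
    print(json.dumps(summarize_by_domain(sample), indent=2))
```

Emitting one flat record per domain keeps the D3 side simple: each record can map directly to one group of min/mean/max marks.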