Content Evaluation of the Text Retrieval Conference (TREC) Polar Dynamic Domain Dataset
CSCI-599 Spring 2016
Team 4
D3 visualizations:
Pie chart of MIME types detected by Apache Tika
Ratio of contributing parsers to text extraction
Classification path of request to content
Size ratio of Solr index to extracted text (pre)
Size ratio of Solr index to extracted text (post)
Content Ratio by MIME type
Metadata Ratio by MIME type
Language Diversity of the TREC DD Polar Dataset
Text Word Cloud of the TREC DD Polar Dataset
Analysis of GROBID Quantities
Joint NER agreement between GROBID, OpenNLP, CoreNLP and NLTK
Spectrum of extracted measurements using NER
Min, mean, and max of extracted measurements using NER
By domain:
aaeachicago.blogspot.com
archpedi.jamanetwork.com
code.djangoproject.com
developer.apple.com