About Us

What is Polar Data Insights?

JPL and USC, under the direction of Dr. Chris Mattmann, have collected a corpus of “deep web” polar datasets spanning many file types, including images, videos, and other scientific data published on the Web. The data was collected using Apache Nutch, Apache Tika, and Apache Solr.

Our goal is to aggregate this data into an intuitive search engine that scientists can use for polar research. The data is also analyzed and illustrated using the visualization tools Banana and D3.js, giving researchers a better understanding of how the data sets relate to one another within the polar ecosystem.

Search Engine

Providing researchers with a powerful tool to find relevant data sets and websites.

Visualizer

Illustrating data set connections and related terms to narrow searches.

Publicizer

Demonstrating the value of these polar data sets to the NSF, USC, and NASA.

Presentations

Meetings and Conferences

Date	Meeting	Location	Link
4 April 2017	Arctic Science Summit Week	Prague, Czechia	Presented in session "ARCTIC DATA AND INFORMATION SCIENCE MEETS SYSTEM SCIENCE" Details
24 July 2017	International Geoscience and Remote Sensing Symposium	Ft. Worth, USA	Presented in session "Intelligence for Big Geospatial Data" Details
16-18 September 2017	SAON - Arctic Data Committee	Montreal, Canada	Details
19-20 September 2017	Research Data Alliance	Montreal, Canada	Details
4-5 October 2017	NITR Open Knowledge Network	Washington DC, USA	Details
11-15 December 2017	Fall AGU	Washington DC, USA	Details Presentation
8 January 2018	Semantics Symposium	Washington DC, USA	Details Presentation
9-11 January 2018	ESIP Winter Meeting	Washington DC, USA	Poster Presentation Details
1-2 March 2018	1st U.S. Semantic Technologies Symposium (US2TS)	Wright State University, Dayton, Ohio, USA	Details
21-23 March 2018	Research Data Alliance	Berlin, Germany	Details
6-8 June 2018	EarthCube All Hands Meeting	Washington DC, USA	Details Poster
17-20 July 2018	ESIP Summer Meeting	Tucson, USA	Details

Insights

Banana For Solr. Search Simplified

Search multiple keywords simultaneously for thousands of relevant URLs.

Add filters for more refined results using Banana's live-updating visualizations.

Go to the Banana Dashboard.

D3.js. See for yourself.

View data sets from a variety of sources to better understand polar relationships.

View some of our visualizations.

Facetview. Experience Solr.

Filter searches using facets and easily save, share, and consume documents from the Deep Web.
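As a rough illustration of what a faceted search looks like underneath, the sketch below builds the query string for a Solr select request using Solr's standard `facet`, `facet.field`, and `fq` parameters. The field names `content_type` and `host` are assumptions made for the example; the actual schema behind the dashboard may differ.

```python
from urllib.parse import urlencode


def facet_query(text, facet_fields, filters=None, rows=10):
    """Build a Solr query string with facet counts and optional filter queries."""
    params = [("q", text), ("rows", rows), ("wt", "json"), ("facet", "true")]
    # One facet.field parameter per field we want value counts for
    params += [("facet.field", name) for name in facet_fields]
    # Each filter narrows the result set to documents matching field:"value"
    params += [("fq", f'{field}:"{value}"') for field, value in (filters or {}).items()]
    return urlencode(params)


# Hypothetical field names, chosen only for illustration
query = facet_query(
    "sea ice",
    facet_fields=["content_type", "host"],
    filters={"content_type": "application/pdf"},
)
print(query)
```

The resulting string would be appended to a Solr core's `/select` endpoint; selecting a facet value in a dashboard corresponds to adding another `fq` entry.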

Go to the Facetview Dashboard.

USC Data Science Projects

Apache Sparkler Post Processing using Machine Learning

This code connects to the Solr database created for Sparkler-crawled data and performs further data extraction, classification, filtering, and insight generation using various machine learning models.

The ML models take a keyword list from the user, extract features from the content at each URL, classify (score) the result, and update the corresponding Solr fields accordingly.
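The flow described above can be sketched roughly as follows. This is not the project's code: a simple keyword-coverage score stands in for a trained model, the field name `relevance_score` and the document id are hypothetical, and the example only builds a Solr atomic-update payload (the `{"set": value}` form) without sending it anywhere.

```python
import json
import re
from collections import Counter


def keyword_features(text, keywords):
    """Count how often each single-word keyword occurs in the page text."""
    tokens = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return {kw: tokens[kw.lower()] for kw in keywords}


def relevance_score(features):
    """Fraction of keywords seen at least once (a stand-in for a trained model)."""
    if not features:
        return 0.0
    return sum(1 for count in features.values() if count > 0) / len(features)


def solr_atomic_update(doc_id, score, field="relevance_score"):
    """Build a Solr atomic-update document that sets one field on an existing record."""
    return {"id": doc_id, field: {"set": score}}


keywords = ["arctic", "glacier", "permafrost", "sea"]
page_text = "Arctic sea ice extent and glacier mass balance, 1979-2015."

features = keyword_features(page_text, keywords)
score = relevance_score(features)
# Hypothetical document id; a real id would come from the crawl database
payload = [solr_atomic_update("http://example.org/ice", score)]
print(json.dumps(payload))
```

In a real pipeline the payload would be posted to the Solr core's update handler, and the scoring function would be replaced by whichever classifier has been trained.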

Visit

Polar Deep Insights

The Polar Deep Insights project is a generic content extraction and evaluation tool that can be used on any dataset.

It is a Dockerized pipeline consisting of content extraction, enrichment, and a rich visualization interface for exploring the spatial-conceptual-temporal TREC polar dataset and documents downloaded from the ACADIS, AMD, and NSIDC websites, which were crawled using the Sparkler web crawler. We plan to use this to gain deep insights about climate change and its impact on the Arctic region.

Visit

Domain Relevant Data Collection using Google Search API

This project uses the Google Search API to produce a list of the most frequently occurring URLs for a list of domain keywords and phrases. The code first generates phrases from the provided keywords and then uses them for searching.

After each search, the top 10 URLs (or all active, working URLs from the first page) are added to a dictionary. Once every keyword has been processed, the dictionary is sorted by frequency of occurrence.
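A minimal sketch of this counting approach is shown below. It makes several assumptions: phrases are formed as keyword pairs (the project's actual phrase scheme is not described here), and `fake_search` is a hypothetical stand-in that returns canned results instead of calling any real search API.

```python
from collections import Counter
from itertools import combinations


def make_phrases(keywords, size=2):
    """Combine keywords into search phrases (every pair; an assumed scheme)."""
    return [" ".join(combo) for combo in combinations(keywords, size)]


def rank_urls(phrases, search, top_n=10):
    """Count how often each URL appears in the top results across all phrases."""
    counts = Counter()
    for phrase in phrases:
        # set() so that a URL is counted at most once per search
        counts.update(set(search(phrase)[:top_n]))
    # Most frequent first; ties broken alphabetically for a stable order
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical canned results standing in for a real search API
FAKE_RESULTS = {
    "arctic ice": ["https://a.example/ice", "https://b.example/data"],
    "arctic glacier": ["https://a.example/ice", "https://c.example/glacier"],
    "ice glacier": ["https://a.example/ice", "https://b.example/data"],
}


def fake_search(phrase):
    return FAKE_RESULTS.get(phrase, [])


phrases = make_phrases(["arctic", "ice", "glacier"])
ranking = rank_urls(phrases, fake_search)
for url, count in ranking:
    print(count, url)
```

Passing the search function in as an argument keeps the counting logic independent of any particular API client, so the canned version can be swapped for a real one.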

Visit

Sparkler

A web crawler is a bot program that fetches resources from the web in order to build applications such as search engines and knowledge bases. Sparkler (a contraction of Spark-Crawler) is a new web crawler that makes use of recent advancements in distributed computing and information retrieval by combining Apache projects such as Spark, Kafka, Lucene/Solr, and Tika with the pf4j plugin framework.

Sparkler is an extensible, highly scalable, high-performance web crawler that is an evolution of Apache Nutch and runs on an Apache Spark cluster.

Visit

PDI Topics

LDA topic modeling for Polar Deep Insights.

Visit

Polar Domain Discovery

Domain Discovery on Polar Domain

Visit

Ocean_Observation_FacetView

This is a FacetView setup for crawled ocean observation data.

Visit

Team

Chris Mattmann

Wayne M Burke

Ruth Duerr

Siri Jodha Singh Khalsa

Simin Ahmadi Karvigh

Omid Davtalab

Thamme Gowda

Nithin Krishna Ottilingam

Karanjeet Singh

Madhav Sharan

Srinidhi Nandakumar

Prerana Teligi Harapanahalli Math

Dixita Patel

USC Data Science Partner Sites

TREC/Data Description

The goal of the Text Retrieval Conference (TREC) is to encourage research in information retrieval from large text collections by providing interesting and understudied domains of documents to crawl.

Currently, the polar domain contains the NSF-funded Advanced Cooperative Arctic Data and Information System (ACADIS), the NASA-funded Antarctic Master Directory (AMD), and the National Snow and Ice Data Center (NSIDC) Arctic Data Explorer. Our data was retrieved using these directories and submitted to TREC in 2015.

Visit TREC

Polar Hack - November 2014

Hosted by the NSF, this hackathon aimed to implement visualizations of existing polar data sets to support new discoveries and promote cross-agency collaboration between the NSF, NASA, NOAA, and other Arctic- and polar-related agencies.

Ultimately, the workshop fostered the understanding of the variability of the polar regions at different timescales, allowing the NSF to make longer-term investments in technologies and visualizations that can be adopted by the community.

Visit DataVis

IRDS

The mission of the Information Retrieval and Data Science (IRDS) group is to research and develop new methodologies and open source software to analyze, ingest, process, and manage Big Data and turn it into information.

We have expertise in data collection and contribute to the world's largest and most frequently downloaded open-source projects, working with NASA, DARPA, DHS, and NIH across a number of domains, including Earth science, planetary science, astronomy, defense, and private industry.

Visit IRDS

Credits

Dr. Chris Mattmann - Visit his website

CS401 Group (Lorraine Sposto, Jonathan Luu, Ruthvik Peddawandla, Titus Jung, Janet Kim)

CS599 Spring 2016 Class - Visit the class website

CS572 Spring 2015 Class - Visit the class website

Polar Data Insights

USC Data Science

Search

Every year, TREC solicits novel search topics with which to test and improve the latest ranking algorithms. This year, our group at USC submitted more than 30,000 files from various polar domains and created a collection of important queries for searching them.

Click here to see a list of important queries or select a search option below.
