TACC's SSI is a week-long workshop that introduces researchers, faculty, staff, students, and industrial partners to high-performance computing, data analytics, and scientific visualization. TACC's technology experts will teach attendees how to make effective use of advanced computing resources and technologies such as Stampede, Maverick, and Wrangler.
The mission of the Information Retrieval and Data Science (IRDS) Group is to research and develop new methodologies and open source software to analyze, ingest, process, and manage Big Data and to turn it into information. We contribute to the world’s largest and most often downloaded open source software projects. We apply tried and true techniques, including content detection and analysis, crawling, deduplication, similarity, named entity recognition, construction of inverted indices, query analysis, search, relevancy and ranking, interactive query analysis, and management of large data sets. We have expertise in data collection and have worked with NASA, DARPA, DHS, and NIH across a number of domains, including Earth science, planetary science, astronomy, defense, and private industry.
He is the Chief Technology & Innovation Officer in the Information and Technology Solutions Directorate (ITSD) at the Jet Propulsion Laboratory (JPL) in Pasadena, California, and an Adjunct Research Professor in the Computer Science Department within USC's Viterbi School of Engineering. At JPL, he developed the third generation of the Apache Object Oriented Data Technology (OODT) data processing and information integration system. OODT is open source data-grid middleware used across many scientific domains, such as planetary science, cancer research (go figure), and computer modeling, simulation, and visualization. For more detail on OODT, see his ICSE 2006 paper, which appeared in the Software Engineering Challenges and Achievements track, and his 2009 IEEE Space Mission Challenges for Information Technology (SMC-IT) paper, which describes the refactoring and re-architecting of the data processing framework.
Dr. Jeffrey Miller is currently an Associate Professor of Engineering Practice in the Computer Science department at the University of Southern California. Prior to that, he was an Assistant and Associate Professor in the Computer Engineering department at the University of Alaska Anchorage for six years and an Adjunct Professor in the Computer Science department at California State University, Los Angeles for five years. Dr. Miller’s research interests include vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication, ethical issues related to driverless vehicles, and Computer Science education (K12, undergraduate, and graduate). He has given talks for the National Academy of Engineers and other organizations related to ethics in driverless vehicles, and he has a passion for K12 STEM education. Jeff helps IRDS by running our Undergraduate Capstone class and has helped supervise several IRDS projects that have advanced machine learning, search, and AI.
Yao-Yi Chiang is an Associate Professor (Research) in Spatial Sciences, the Director of the Spatial Computing Laboratory at the Spatial Sciences Institute, and the Associate Director of the NSF's Integrated Media Systems Center (IMSC) at the University of Southern California (USC). Dr. Chiang received his Ph.D. in Computer Science from the University of Southern California and his bachelor’s degree in information management from the National Taiwan University. His general area of research is information integration and data mining, with a focus on spatiotemporal data and their applications. Dr. Chiang is also an expert on digital map processing and geographic information systems (GIS). His research interests further include computer vision, image processing, and the Semantic Web. Dr. Chiang develops computer algorithms and intelligent systems that discover, collect, fuse, and analyze data from heterogeneous sources to solve real-world problems. Before USC, Dr. Chiang worked as a research scientist for Geosemble Technologies and Fetch Technologies in California. Geosemble Technologies was founded on a patent on geospatial data fusion techniques, of which he was a co-inventor. Yao-Yi contributes by supervising and mentoring Directed Research projects for IRDS.
A command-line gazetteer built around the Geonames.org dataset that uses the Apache Lucene library to create a searchable gazetteer.
The Geonames.org dataset contains over 10,000,000 geographical names corresponding to over 7,500,000 unique features. Beyond names of places in various languages, data stored include latitude, longitude, elevation, population, administrative subdivision and postal codes. All coordinates use the World Geodetic System 1984 (WGS84).
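To make the idea of a searchable gazetteer concrete, here is a minimal sketch in Python. It is not the project's implementation and does not use Apache Lucene: it stands in a plain in-memory inverted index for the Lucene index. It assumes input lines in the tab-separated column layout of the Geonames main dump; the function names (`parse_record`, `build_index`, `search`) and the population-based ranking are hypothetical choices made for illustration.

```python
from collections import defaultdict

# Column order of the tab-separated Geonames main table (assumed input format).
FIELDS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude",
    "longitude", "feature_class", "feature_code", "country_code", "cc2",
    "admin1", "admin2", "admin3", "admin4", "population", "elevation",
    "dem", "timezone", "modified",
]


def parse_record(line):
    """Turn one tab-separated line into a dict with typed coordinates."""
    rec = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
    rec["latitude"] = float(rec["latitude"])
    rec["longitude"] = float(rec["longitude"])
    rec["population"] = int(rec["population"] or 0)
    return rec


def build_index(lines):
    """Build a token -> set of geonameids inverted index (a stand-in for Lucene)."""
    index = defaultdict(set)
    records = {}
    for line in lines:
        rec = parse_record(line)
        records[rec["geonameid"]] = rec
        # Index the primary name, the ASCII name, and every alternate name.
        names = [rec["name"], rec["asciiname"]] + rec["alternatenames"].split(",")
        for name in names:
            for token in name.lower().split():
                index[token].add(rec["geonameid"])
    return index, records


def search(index, records, query):
    """Return records matching every query token, most populous first."""
    tokens = query.lower().split()
    if not tokens:
        return []
    ids = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted((records[i] for i in ids),
                  key=lambda r: r["population"], reverse=True)
```

Ranking by population is only one plausible way to break ties between places that share a name; a real gazetteer would likely combine it with text relevance scores from the search library.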
A distributed, parallelized (Map Reduce) wrapper around Apache™ RAT (Release Audit Tool). RAT is used to check for proper licensing in software projects. However, RAT takes a prohibitively long time to analyze large repositories of code, since it can only run on one JVM. Furthermore, RAT isn't customizable by file type or file size and provides no incremental output. This wrapper dramatically speeds up the process by leveraging Apache™ OODT to parallelize the workflow.
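The map/reduce shape of such a wrapper can be sketched as follows. This is an illustration under stated assumptions, not the project's code: it does not call Apache RAT or Apache OODT, it uses a thread pool in one process rather than distributed workers, and the license check is a naive substring test. The names `partition`, `audit_chunk`, `audit_tree`, and `LICENSE_MARKER` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

# Illustrative marker only; a real audit tool matches many license headers.
LICENSE_MARKER = "Licensed under the Apache License"


def partition(paths, max_bytes):
    """Group files by extension, then split each group into size-bounded chunks."""
    by_ext = {}
    for p in paths:
        by_ext.setdefault(p.suffix, []).append(p)
    chunks = []
    for _ext, files in sorted(by_ext.items()):
        chunk, size = [], 0
        for f in files:
            fsize = f.stat().st_size
            # Start a new chunk when adding this file would exceed the budget.
            if chunk and size + fsize > max_bytes:
                chunks.append(chunk)
                chunk, size = [], 0
            chunk.append(f)
            size += fsize
        if chunk:
            chunks.append(chunk)
    return chunks


def audit_chunk(chunk):
    """Map step: stand-in for one audit run over a chunk of files."""
    return [(str(f), LICENSE_MARKER in f.read_text(errors="ignore"))
            for f in chunk]


def audit_tree(root, max_bytes=1_000_000, workers=4):
    """Reduce step: merge per-chunk results into a sorted list of unlicensed files."""
    paths = [p for p in Path(root).rglob("*") if p.is_file()]
    unlicensed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(audit_chunk, c)
                   for c in partition(paths, max_bytes)]
        # Results arrive as each chunk finishes, which allows incremental output.
        for done in as_completed(futures):
            unlicensed.extend(name for name, ok in done.result() if not ok)
    return sorted(unlicensed)
```

Partitioning by file type and size before auditing is what lets each chunk be handled independently and reported as soon as it completes, which addresses the single-JVM and no-incremental-output limitations described above.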
We maintain web archives of our mailing list conversations here for transparency and openness.