Medical Informatics
http://ai.bpa.arizona.edu/MedInfo/

Project Goal
To support concept-based retrieval and analysis of large-scale medical literature and records.
Grant Support
Phase I development was funded by grants from NSF, NCI, NLM, and the University of Illinois.
Building the National Cancer Information Infrastructure: Phase II project. B. Schatz, P.I., H. Chen, Co-P.I., funding from NCI, DARPA, and the University of Illinois.
Researchers/Collaborators
Hsinchun Chen, Associate Professor, Management Information Systems, University of Arizona, Tucson, AZ 85721, (520) 621-4153, hchen@bpa.arizona.edu
Bruce R. Schatz, Director of the Digital Library Research Program, University of Illinois at Urbana-Champaign, Urbana, IL 61801, (217) 244-0651, schatz@csl.ncsa.uiuc.edu
Susan M. Hubbard, Director of the International Cancer Information Center, National Cancer Institute, Bethesda, MD 20852, (301) 480-8105, su@icicb.nci.nih.gov
Research Assistants:
- Andrea Houston
- T. Dorbin Ng
- Kristin Tolle
- Robin Sewell
- Casey Zhang
- Harry Li
- Yohanes Santoso
- Elvina Hendrata

Key Technical Summary
CANCERLIT is a bibliographic database dating back to 1963. It contains approximately 1.1 million records from 200 core journals and grows at a rate of 70,000 abstracts per year. Our current testbed, the last five years of the collection (January 1992 - July 1997), contains 624,000 abstracts occupying 1.5 GB of memory.
Automatic Indexing and Concept Space generation required 5 hours of processing on the Origin2000 Supercomputer at NCSA.
The Arizona Noun Phraser requires 17 serial hours to generate noun phrases and 2 hours and 40 minutes to create the concept space.
Techniques:
- Arizona Noun Phraser
- Automatic Indexing
- Concept Space Generation
- Kohonen Self-Organizing Map (SOM) Algorithm
- UMLS Metathesaurus
- Interface Design:
  - Java-based Graphical Thesaurus
  - Java-based Visual Categorization with SSOM
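As a rough illustration of what concept space generation involves, the sketch below links terms that co-occur in the same abstracts and ranks each term's neighbors by an asymmetric weight. It is a minimal sketch under stated assumptions, not the project's implementation: the function name build_concept_space, the sample phrases, and the weighting formula (documents containing both terms divided by documents containing the source term) are all hypothetical, and the noun phrasing and automatic indexing steps are assumed to have already produced the per-document term lists.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_concept_space(docs):
    """Return {term: [(related_term, weight), ...]} from per-document term lists.

    weight(a -> b) = (# docs containing both a and b) / (# docs containing a).
    This asymmetric ratio is an illustrative assumption, not the project's formula.
    """
    doc_freq = Counter()   # number of documents each term appears in
    pair_freq = Counter()  # number of documents each ordered pair appears in
    for terms in docs:
        unique = set(terms)
        doc_freq.update(unique)
        # Count each ordered pair (a, b) once per document.
        pair_freq.update(permutations(sorted(unique), 2))

    space = defaultdict(list)
    for (a, b), n_ab in pair_freq.items():
        space[a].append((b, n_ab / doc_freq[a]))
    # Rank neighbors by descending weight, then alphabetically for stable output.
    for a in space:
        space[a].sort(key=lambda item: (-item[1], item[0]))
    return dict(space)


# Hypothetical phrases standing in for indexed abstracts.
docs = [
    ["breast cancer", "tamoxifen", "clinical trial"],
    ["breast cancer", "tamoxifen"],
    ["breast cancer", "mammography"],
    ["lung cancer", "clinical trial"],
]
space = build_concept_space(docs)
print(space["tamoxifen"])
```

The weight is deliberately asymmetric: in this toy data every abstract mentioning "tamoxifen" also mentions "breast cancer", but only two of the three "breast cancer" abstracts mention "tamoxifen", so the link is stronger in one direction than the other.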
Key Results Summary
Preliminary testing of the automatic indexing tool on a small set of CANCERLIT abstracts (7,500 abstracts; May and June 1996) demonstrated its ability to generate search terms useful to cancer researchers and different from the terms provided by MeSH and the Internet Grateful Med's version of the Metathesaurus.
Future directions include the testing of the recently developed Arizona Noun Phraser and further exploration of Java-based graphical applications.
Publications
A. Houston, H. Chen, S. Hubbard, B. Schatz, et al. Health Care Information Infrastructures: A Critical Component of the NII, submitted to the Journal of the American Society for Information Science.
A. Houston, H. Chen, B. Schatz, S. Hubbard, et al. Exploring the Use of Concept Spaces to Improve Medical Information Retrieval, International Journal of Decision Support Systems, 1998, forthcoming.
K. Tolle, H. Chen, and T. D. Ng. Improving Concept Extraction from Text Using Natural Language Processing Noun Phrasing Tools: An Experiment in Medical Information Retrieval, submitted to the Journal of the American Society for Information Science.
Demo Sites/Information
Cancer Space provides access to the CancerLit Concept Spaces and Self-Organizing Map. (http://ai.bpa.arizona.edu/CancerLit/)
Concept Search accesses the Noun Phraser, Automatic Indexing, and UMLS tools for searching the AI Group's CancerLit database. (http://ai.bpa.arizona.edu/CancerLit/cii.html)
Arizona Noun Phraser is able to isolate valid noun phrases from text to facilitate information retrieval.
- HTML, http://ai.bpa.arizona.edu/cgi-bin/ktolle/interface/nlpi
Copyright © 1990-1997, Artificial Intelligence Group, The University of Arizona.