The CMedPort was built to provide medical
and health information services to
both researchers and the general public.
It is a prototype built to test whether
the integrated techniques can
improve Internet searching and browsing
in Chinese search engines. Because
users from mainland China, Hong Kong
and Taiwan use different forms of
Chinese characters (Simplified Chinese
and Traditional Chinese), the CMedPort
provides two versions of its interface
to address users’ needs.
The CMedPort indexed more than 300,000
medical-related pages from mainland
China, Hong Kong and Taiwan, using
the spidering toolkit “SpidersRUs”
developed by the AI Lab. It also meta-searches
six major search engines from those
three regions. Upon searching, the
encoding conversion program allows
users to search all three regions
simultaneously and see the result
list in their familiar form of Chinese
characters. When the results are returned,
the CMedPort provides summarization
and categorization functions to allow
post-retrieval analysis. The Chinese
summarization function is adapted from TXTRACTOR,
an English summarizer developed
in the AI Lab. It uses cue phrases and
tf*idf to select summary sentences
from the original document. The categorization function
extracts the highest-frequency key phrases
from the titles and summaries
of the returned documents and uses
those phrases as folder topics, thus
giving an overview of these documents.
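
The sentence selection described above (cue phrases combined with tf*idf) can be illustrated with a minimal Python sketch. This is not the TXTRACTOR code: the cue-phrase list, the whitespace tokenizer, the scoring formula, and the names `summarize`, `tokenize`, `CUE_PHRASES` and `cue_bonus` are all assumptions made for illustration. Chinese text would need a word segmenter in place of the tokenizer shown here.

```python
import math
from collections import Counter

# Hypothetical cue phrases; the list used by the real system is not given.
CUE_PHRASES = ("in summary", "in conclusion", "results show")


def tokenize(sentence):
    # Placeholder tokenizer: lowercase whitespace split.
    # Chinese text would require a word segmenter instead.
    return sentence.lower().split()


def summarize(sentences, background_docs, k=2, cue_bonus=1.0):
    """Return the k highest-scoring sentences, kept in original order."""
    n_docs = len(background_docs) + 1

    # Document frequency of each term over the background collection.
    df = Counter()
    for doc in background_docs:
        df.update(set(tokenize(doc)))

    # Term frequency over the document being summarized.
    tf = Counter(t for s in sentences for t in tokenize(s))

    def score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        # Average tf*idf of the sentence's terms (assumed weighting).
        tfidf = sum(tf[t] * math.log(n_docs / (1 + df[t])) for t in tokens)
        # Flat bonus when the sentence contains any cue phrase.
        has_cue = any(cue in sentence.lower() for cue in CUE_PHRASES)
        return tfidf / len(tokens) + (cue_bonus if has_cue else 0.0)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

Returning the selected sentences in their original order, rather than in score order, keeps the extract readable as a short version of the source document.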
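
The folder-building step of the categorization function can be sketched in the same spirit. The example below is an assumption-laden illustration, not the CMedPort implementation: it treats word n-grams as candidate key phrases, measures frequency as the number of results that mention a phrase, and uses hypothetical names (`categorize`, `candidate_phrases`, `n_folders`, `stopwords`).

```python
from collections import defaultdict


def candidate_phrases(text, max_len=2):
    # Placeholder phrase extractor: word n-grams of up to max_len words.
    words = text.lower().split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    }


def categorize(results, n_folders=3, stopwords=frozenset()):
    """Group results into folders named after frequent key phrases.

    results: list of (title, summary) pairs.
    Returns a dict mapping each folder phrase to a sorted list of result indices.
    """
    phrase_docs = defaultdict(set)
    for idx, (title, summary) in enumerate(results):
        for phrase in candidate_phrases(f"{title} {summary}"):
            if phrase not in stopwords:
                phrase_docs[phrase].add(idx)

    # Rank phrases by the number of results that mention them;
    # ties are broken alphabetically so the output is deterministic.
    top = sorted(phrase_docs, key=lambda p: (-len(phrase_docs[p]), p))[:n_folders]
    return {p: sorted(phrase_docs[p]) for p in top}
```

A result may appear in more than one folder under this scheme, which suits an overview display where folders are topics rather than a strict partition.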