CBizPort
(Chinese Business Intelligence Portal)
is an Internet search portal for Chinese business information.
It searches for business information in major Chinese search
engines and business portals in mainland China, Taiwan and
Hong Kong.
CBizPort is powered by a set
of meta-search engines that integrate several high-quality
online information resources. It enables encoding conversion
between Simplified Chinese and Traditional Chinese to support
cross-regional search. Post-retrieval analysis functions,
including summarization and categorization, are also provided.
Major components of
CBizPort:
User Interface: CBizPort
provides two versions of its interface, one in Simplified Chinese
and one in Traditional Chinese. Each is designed for users of
the corresponding script. Both versions have the same
look and feel, and each uses its own character
encoding when processing queries.
Encoding Converter:
The encoding converter relies on a conversion dictionary
with 6,737 Chinese characters in each of the two encodings
(Big5 and GB2312). The dictionary includes the most commonly
used characters in the Chinese language. Encoding conversion
is performed when the portal sends queries to
search engines that use a different encoding from its own, or
when the portal collects results from those search engines.
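
The dictionary-based conversion described above can be sketched as follows. This is a minimal illustration, not the portal's actual code: the three-entry table, the function name `convert_encoding` and the sample query are assumptions made for the example, whereas the real dictionary is described as holding 6,737 characters. Characters missing from the table are passed through unchanged, and cases where one Simplified character corresponds to several Traditional characters are ignored.

```python
# Hypothetical excerpt of a Simplified -> Traditional character table.
SIMPLIFIED_TO_TRADITIONAL = {"国": "國", "际": "際", "贸": "貿"}
# Reverse table for converting collected results back.
TRADITIONAL_TO_SIMPLIFIED = {t: s for s, t in SIMPLIFIED_TO_TRADITIONAL.items()}


def convert_encoding(data: bytes, src: str, dst: str, table: dict) -> bytes:
    """Decode bytes from `src`, map each character through `table`,
    and re-encode the result in `dst`."""
    text = data.decode(src)
    # Characters without a table entry (e.g. shared characters) stay as they are.
    converted = "".join(table.get(ch, ch) for ch in text)
    return converted.encode(dst)


# A GB2312 query converted before being sent to a Big5 search engine.
query_gb = "国际贸易".encode("gb2312")
query_big5 = convert_encoding(query_gb, "gb2312", "big5", SIMPLIFIED_TO_TRADITIONAL)

# A Big5 result converted back for display in the GB2312 interface.
back_to_gb = convert_encoding(query_big5, "big5", "gb2312", TRADITIONAL_TO_SIMPLIFIED)
```

The same function serves both directions; only the table and the source and target encodings change.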
Meta Search:
The authoritative information sources selected for meta-searching
include major Chinese search engines and business-related
portals from the three regions. General search engines include
Baidu, Yahoo Hong Kong and Yam. Business-related portals
include several commercial and government Web sites.
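
A meta-search step of this kind might look like the sketch below. Everything in it is hypothetical: the `Result` and `Source` types, the stub fetchers and the `example.com` URLs stand in for real search engines and portals, and neither the de-duplication by URL nor the skipping of failed sources is described in the text above. Both are assumptions made to keep the example complete.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Result:
    title: str
    url: str
    source: str


@dataclass
class Source:
    name: str
    # Hypothetical fetcher: takes a query string, returns that source's results.
    search: Callable[[str], List[Result]]


def meta_search(query: str, sources: List[Source]) -> List[Result]:
    """Query every source and merge the results, dropping duplicate URLs."""
    merged: List[Result] = []
    seen_urls = set()
    for source in sources:
        try:
            results = source.search(query)
        except Exception:
            # Assumption: one unreachable source should not fail the whole search.
            continue
        for result in results:
            if result.url not in seen_urls:
                seen_urls.add(result.url)
                merged.append(result)
    return merged


# Stub fetchers standing in for real search engines and portals.
def general_engine(query: str) -> List[Result]:
    return [
        Result(f"{query} news", "http://example.com/a", "general"),
        Result(f"{query} guide", "http://example.com/b", "general"),
    ]


def business_portal(query: str) -> List[Result]:
    return [
        Result(f"{query} guide", "http://example.com/b", "business"),
        Result(f"{query} statistics", "http://example.com/c", "business"),
    ]


def broken_source(query: str) -> List[Result]:
    raise TimeoutError("source did not respond")


SOURCES = [
    Source("general", general_engine),
    Source("business", business_portal),
    Source("broken", broken_source),
]

results = meta_search("貿易", SOURCES)
```

In a real portal each fetcher would also convert the query into the encoding its source expects before sending it.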
Categorization: The
CBizPort categorizer organizes the documents retrieved from
the meta-searching into different categories based on the
occurrence of keywords extracted from the title and introduction
of the documents. Two Chinese business lexicons, one for
Simplified Chinese and one for Traditional Chinese, are used
to extract keywords from Web pages. Categorized documents
are put into folders labeled with the key phrases to help
users browse the results.
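
The folder-building idea can be illustrated with a short sketch. It assumes that keyword extraction is plain substring matching of lexicon phrases against the title and introduction, which is a simplification chosen for the example and not necessarily how the CBizPort categorizer works. The lexicon entries and documents are invented.

```python
from collections import defaultdict
from typing import Dict, List


def categorize(documents: List[dict], lexicon: List[str]) -> Dict[str, List[str]]:
    """Group document titles into folders keyed by the lexicon phrases
    that occur in each document's title or introduction."""
    folders = defaultdict(list)
    for doc in documents:
        # Only the title and introduction are inspected, as in the description above.
        text = doc["title"] + " " + doc["intro"]
        for phrase in lexicon:
            if phrase in text:
                folders[phrase].append(doc["title"])
    return dict(folders)


# Hypothetical Traditional Chinese business lexicon and retrieved documents.
LEXICON = ["貿易", "投資", "銀行"]
DOCUMENTS = [
    {"title": "香港貿易發展", "intro": "貿易及投資資訊"},
    {"title": "銀行服務", "intro": "企業投資方案"},
]

folders = categorize(DOCUMENTS, LEXICON)
```

A document can land in more than one folder when several lexicon phrases occur in it.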
Summarization: The CBizPort
summarizer is modified from an English summarizer called
TXTRACTOR, which uses sentence-selection heuristics to rank
text segments. These heuristics strive to reduce redundancy
of information in a query-based summary.
The summarizer can flexibly summarize Web pages using
one to five sentences.
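
To make the idea of query-based sentence selection with redundancy reduction concrete, here is a rough sketch. It is not TXTRACTOR's algorithm: the scoring (counting query-term occurrences), the character-overlap redundancy check, the `max_overlap` threshold and all function names are assumptions chosen for illustration. Only the one-to-five sentence limit comes from the description above.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Split text into sentences ending in Chinese or Western end punctuation."""
    parts = re.findall(r"[^。！？!?]+[。！？!?]?", text)
    return [p.strip() for p in parts if p.strip()]


def overlap(a: str, b: str) -> float:
    """Jaccard similarity between the character sets of two sentences."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def summarize(text: str, query_terms: List[str], n: int = 3,
              max_overlap: float = 0.6) -> List[str]:
    """Pick up to n sentences (clamped to 1..5), preferring those that mention
    the query terms and skipping near-duplicates of sentences already chosen."""
    n = max(1, min(5, n))
    sentences = split_sentences(text)
    # Rank by query-term frequency; ties keep the original document order.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: (-sum(sentences[i].count(t) for t in query_terms), i),
    )
    chosen: List[int] = []
    for i in ranked:
        if len(chosen) == n:
            break
        if all(overlap(sentences[i], sentences[j]) <= max_overlap for j in chosen):
            chosen.append(i)
    # Present the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]


PAGE = "香港貿易持續增長。香港貿易持續增長。台灣銀行推出新服務。天氣晴朗。"
summary = summarize(PAGE, ["貿易"], n=2)
```

The repeated sentence in `PAGE` is selected only once, which is the redundancy-reduction behaviour the sketch is meant to show.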