Pay-as-You-Go Information Integration

ABSTRACT
We propose to incrementally elaborate a dataspace through the stages of characterization, customization and checking. After describing each of these stages, we present a specific example from the domain of medication standards.

1. INTRODUCTION
Recently, dataspace management has been proposed as an approach that takes a holistic view of all information in an enterprise, be it structured, semi-structured or unstructured, and whether or not it is supported by an explicit information system [1, 2]. While its goal is to provide services over the entirety of the data, dataspace management takes the pragmatic view that initial limits on time and effort may only permit simple services at first, such as cataloging and keyword search. Additional services or capabilities can come later, little by little, as resources permit. Such a pay-as-you-go approach seeks a steady return on investment (ROI), rather than a long period of implementation before [...]

2. CHARACTERIZATION AND CUSTOMIZATION
We have been investigating one path to pay-as-you-go elaboration of a dataspace, involving tools to aid in incremental characterization and customization of a dataspace. We assume that the data sources in a dataspace may be incompletely – or incorrectly – documented.

The first stage is to run routines that determine a class of characteristics or traits of one or more data sources. These can be fairly simple, as in current data profiling systems, such as compiling value distributions for fields and determining keys, or more complex, such as detecting domain-specific structure in generic representations.

The second stage presents customizations or enhancements that are enabled by specific discovered characteristics. Consider, for example, a UMLS-style generic relationship structure with related(Concept1 Rel_Name Concept2) as schema. Suppose characterization discovers that in every tuple <c1, has-form, c2>, concept c1 is always a clinical drug and c2 is always a dose form (e.g., tablet, capsule, syrup). Then one possible customization is to factor out tuples of this form from related into a specialized table hasForm(ClinicalDrug, DoseForm), either in materialized [...]

The third stage is to generate a check condition that can monitor that the relevant characteristic still holds as data sources evolve, and hence that the customization is still valid. For example, if a new version of the related source contains an additional tuple <c1, has-form, c2>, where c1 is a branded drug, then we must reconsider the hasForm customization.

We imagine a framework for dataspace elaboration where new modules can be plugged in, each with characterization, customization and checking components. Call such an extension a C-C-C module.

3. A MEDICATION DATASPACE
We give a more detailed example from a dataspace we are currently exploring. The RxSafe project (led by Samaritan North Lincoln Hospital and Oregon Health & Science University) is developing a consolidated medication-list facility for rural elders in Lincoln County, Oregon. We are investigating various standards proposed for e-prescribing to add value to RxSafe, such as noting equivalences of generic and brand names, or grouping medications by drug class. Two particular sources in our dataspace are RxNorm and NDF-RT. RxNorm [3] is an effort from NLM to produce a standard nomenclature for medications and their components. NDF-RT (National Drug File – Reference Terminology) [4] contains drug-class information (among other things) developed by the US Department of Veterans Affairs.

We would like to connect information from these two sources to link brand names with drug classes. Figure 1 shows an excerpt from a table we derived from RxNorm, relating Semantic Clinical Drug (SCD) with brand name. (SCD is essentially the most complete generic name for a drug, giving ingredients, their strengths and a dose form.) There are 8,431 SCDs in this table. Figure 2 is an excerpt from a table derived from NDF-RT relating SCDs with drug class and class type. (Class types are more general categories over drug classes.) This table has 6,661 entries. These two tables are themselves the results of (manual) characterization and customization of the RxNorm and NDF-RT sources.

Figure 1: SCD-to-brand-name connection derived from RxNorm.

Figure 2: SCD-to-drug-class connection derived from NDF-RT.

It would seem that a join of these two tables would connect brand names and drug classes for us. The problem is that the connection is incomplete. About 54% of the SCDs in the RxNorm table do not appear in the NDF-RT table. Going in the other direction, 42% of the SCDs in the NDF-RT table are missing from the RxNorm table. (Most of them are in RxNorm, just not connected with any brand name.) However, the situation is probably not as severe as it sounds. Figure 3 illustrates what we think is happening – there are variations in strength and dose form.

Figure 3: Example of missing connections between [...]

A human could probably figure out the connection between brand names and drug classes on a case-by-case basis. But is there possibly a C-C-C module that might overcome this connection problem? We think so. We could start with the collection of all SCDs in RxNorm that do have drug-class information in NDF-RT. We could then run a characterization routine to see if this collection obeys any non-trivial stratification: an equivalence relation on a domain where equivalent items have the same image under a relationship. In our example, there may be a stratification based on equality of ingredient lists, ignoring strength and dose form. (Note that we might require a prior customization to pick apart an SCD in order to express this equivalence.)

If there are such stratifications, we could choose one to help connect “unconnected” brand names with drug class. That is, consider a brand name b1 that is connected in RxNorm to an SCD s1, but s1 is not assigned a drug class in NDF-RT. If s1 is equivalent to another SCD s2 that does have a drug class c2, then we can impute c2 as the class for b1. (Note by the nature of a stratification, any SCD equivalent to s1 will yield the same class c2.) The check condition for this customization is that the particular stratification [...]

4. CURRENT WORK
We are currently developing specific C-C-C modules, as well as considering high-level ways to define them and a framework for incrementally investigating and enhancing a dataspace.

5. ACKNOWLEDGMENTS
This work is supported by NSF grant IIS-0534762 and [...]

6. REFERENCES
[1] A. Y. Halevy, M. J. Franklin, D. Maier. Principles of dataspace systems. Proc. of the Twenty-Fifth ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems, [...]
[2] M. J. Franklin, A. Y. Halevy, D. Maier. From databases to dataspaces: A new abstraction for information management. SIGMOD Record 34(4), December 2005.
[3] S. Liu, W. Ma, R. Moore, V. Ganesan, S. Nelson. RxNorm: Prescription for electronic drug information exchange. IEEE IT Professional 7(5), September 2005.
[4] S. H. Brown, et al. VA National Drug File Reference Terminology: A cross-institutional content-coverage study. Proceedings from the Medinfo 2004 World Congress on Medical Informatics, San Francisco, August 2004.
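
As a supplementary illustration of the stratification idea in Section 3, the following Python sketch shows one way the three components of a C-C-C module might fit together. It is not the authors' implementation. The function names (ingredient_key, characterize, customize, check) and all data values are hypothetical, and it assumes a prior customization has already split each SCD into ingredients, strength and dose form. For brevity it does not test whether the stratification is non-trivial.

```python
from collections import defaultdict

# Hypothetical toy data; none of these rows come from RxNorm or NDF-RT.
# Each SCD is modelled as (ingredients, strength, dose_form), assuming a
# prior customization has already picked the SCD apart.
SCD_PARTS = {
    "s1": (frozenset({"ingredient_a"}), "10 mg", "tablet"),
    "s2": (frozenset({"ingredient_a"}), "20 mg", "capsule"),
    "s3": (frozenset({"ingredient_b"}), "5 mg", "tablet"),
    "s4": (frozenset({"ingredient_b"}), "5 mg", "syrup"),
    "s5": (frozenset({"ingredient_a"}), "5 mg", "tablet"),
}
SCD_TO_BRAND = {"s1": "b1", "s3": "b3"}               # stands in for the RxNorm-derived table
SCD_TO_CLASS = {"s2": "c2", "s3": "c3", "s4": "c3"}   # stands in for the NDF-RT-derived table


def ingredient_key(scd):
    """Equivalence key: the ingredient list, ignoring strength and dose form."""
    return SCD_PARTS[scd][0]


def characterize(scd_to_class, key):
    """Return {key value: drug class} if `key` induces a stratification, else None.

    `key` induces a stratification when all classified SCDs that share a key
    value also share a single drug class.
    """
    classes = defaultdict(set)
    for scd, drug_class in scd_to_class.items():
        classes[key(scd)].add(drug_class)
    if any(len(found) > 1 for found in classes.values()):
        return None
    return {k: next(iter(found)) for k, found in classes.items()}


def customize(scd_to_brand, scd_to_class, strata, key):
    """Map brand names to drug classes, imputing through the stratification."""
    brand_to_class = {}
    for scd, brand in scd_to_brand.items():
        drug_class = scd_to_class.get(scd) or strata.get(key(scd))
        if drug_class is not None:
            brand_to_class[brand] = drug_class
    return brand_to_class


def check(scd_to_class, key):
    """Check condition: True while the stratification holds for this version."""
    return characterize(scd_to_class, key) is not None


strata = characterize(SCD_TO_CLASS, ingredient_key)
brand_to_class = customize(SCD_TO_BRAND, SCD_TO_CLASS, strata, ingredient_key)
```

In this toy data, brand b1 is attached to s1, which has no drug class of its own. Because s1 shares its ingredient list with s2, the sketch imputes class c2 for b1. Re-running check on a later version of the class table would flag the customization for review if two SCDs with the same ingredients were ever assigned different classes.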