## Deamo.prof.ufu.br

**MILPRIT : Mining Interval Logic Patterns with Regular**
**Expression Constraints**
**Sandra de Amo**1

**, Arnaud Giacometti**2

**, Magno Soares Santana**1

1Faculdade de Computac¸˜ao – Universidade Federal de Uberlˆandia (UFU)

[email protected],

[email protected],

[email protected]
**Abstract. **Most methods for temporal pattern mining assume that time is pon-

tual, that is, represented by points in a straight line beginning at some initial

instant. In this paper, we consider a new temporal pattern, where time is ex-

plicitly represented by intervals. We present the algorithm MILPRIT for mining

these kind of temporal patterns, which uses variants of the classical level-wise

search algorithms. We also present some preliminary experimental results and

discuss further applications.
**Keywords. **Sequential Pattern, Interval Logic, Contraint-based Mining.

**1. Introduction**

We present a new sequential pattern where time is measured in terms of

*intervals *instead

of

*points*. These patterns, which we call

*temporal interval patterns*, aims at capturing how

events taking place in time intervals relate to each other. For instance, (1) in a medical

application, we could be interested in discovering if patients who take some medicine

X during a certain period of time, and who presented the symptom Y

*before *taking the

medicine, will present the symptom Z

*during or after *taking the medicine X, (2) in an

agricultural application, we could be interested in discovering if the use of some organic

fertilizer during a period of time has an effect on the way a plant grows

*during *and

*after*

the fertilizer application.

In [5], Allen’s Propositional Interval Logic ([3]) has been used for the first time,
to treat the problem of discovering association rules over time series. However, to ourknowledge the use of Interval Logic in temporal data mining has been restricted to

*propo-sitional temporal patterns*, that is, patterns not involving first order predicates. The needfor more expressive kind of temporal patterns arises for instance, when modelling Unix-users behaviour ([6]), as is pointed out in [7]. Our temporal pattern is defined as a set ofatomic first order formulae where time is explicitly represented by an interval variable,together with a set of interval relationships (

*before, during, overlaps, start, finish, meet*)described in terms of Allen’s First Order Interval Logic ([3]). An example of such tempo-ral pattern is

*{M ed*(

*x, penicillin*,

*i*1),

*Symp*(

*x, dizziness*,

*i*2),

*during*(

*i*2

*, i*1)

*}*, meaning
that “

*patients who take penicillin during the time interval i*1

*will feel dizzy during thisperiod, but eventually the dizziness will stop before they stop taking the medicine*”.

**Some related work**. The algorithm MILPRIT we introduce in this paper is designed to

mine first order temporal patterns, where time is represented by

*intervals*, and regular

expression contraints are used to restrict the search space. In [7], the algorithm SeqLog,

based on Inductive Logic Programming, was introduced for discovering first order se-

quential patterns where time is represented by

*points *in a straight line (and not by inter-

vals like in our case). Recently, many work on contrained based temporal mining have

adopted regular expressions as a mechanism to specify user constraints ([4, 8, 2]) in order

to not overwhelming him(her) with uninteresting patterns. In [8], the algorithm (Spirit-

Log) designed to mine first order temporal patterns was introduced. Spirit-Log extends

SeqLog (where time is pontual), by pushing regular expression constraints in the mining

process.

**2. Problem Formalization**

In this section, we introduce the main concepts related to the problem of discovering

temporal interval patterns.

**Temporal i-Patterns.**
Let us suppose three disjoints sets of symbols

*P *(predicates),

*V*
(data variables) and

*C *(data constants). Let us also assume the set of temporal predicates

*T*=

*{before, meets, overlaps, during, finishes, starts } *and two other disjoint sets of symbols

*Vt *(temporal variables) and

*Ct *(temporal constants). A

*data atom *is an expression of type

*p*(

*s*1

*, ., sn, e*) or

*t*1 =

*t*2, where each

*si *are data variables or data constants,

*e *is a temporalvariable,

*t*1 is a data variable and

*t*2 is a data constant or a data variable. A

*temporal atom*is an expression of type

*r*(

*e, f *) or

*e *=

*f *, where

*r *is one of the temporal predicates in

*T*and

*e, f *are temporal variables.

The following figure illustrates the temporal predicates:
The temporal variables are evaluated over

*intervals *[

*a, b*], where

*a *and

*b *are dates
with

*a < b*. For instance, the formula

*before(e,f) *is true if the variables

*e *and

*f *are evalu-ated as [10

*/*12

*/*2004

*, *15

*/*12

*/*2005] and [17

*/*12

*/*2004

*, *28

*/*12

*/*2005] respectively. The for-mula

*during(e,f) *is true if the variables

*e *and

*f *are evaluated as [10

*/*12

*/*2004

*, *15

*/*12

*/*2005]and [5

*/*12

*/*2004

*, *28

*/*12

*/*2005] respectively.

A

*temporal i-pattern *(temporal interval pattern) is a triple (

*K*,

*D*,

*T *) where (1)

*D *is a set of data atoms (2)

*T *is a set of temporal atoms and (3)

*K *is a special data

atom

*K*(

*x*1

*, ., xn*) whose variables

*x*1

*, ., xn *appear in

*D*. The data atom

*K *is called the

*reference query*.

**Example 1 **Let us consider the temporal i-pattern (

*Patient(x), D, T *) where

*D*
*{Med*(

*x, y, e*)

*, Symp*(

*x, z, f *)

*} *and

*T *=

*{bef ore*(

*e, f *)

*}*. Intuitively, this temporal pat-tern translates the fact that a patient

*x *takes the medicine

*y *during a period of time

*e *and
during a further period of time

*f *(after he has stopped taking the medicine

*y*) presentsa disease symptom

*z*. Note that we are always interested in analyzing the behaviour ofpatients registered in the relation

*Patient*, even though relations

*Med *and

*Symp *may con-tain data related to other patients (not only those registered in

*Patient*). For this reason,

*Patient(x) *is called

*the reference query*.

An

*instantiation *of a temporal variable

*e *is a mapping

*v *associating an interval

*v*(

*e*) = [

*a, b*] to

*e*, where

*a*,

*b *are integers (mapping dates). A

*ground temporal atom *is thetemporal atom obtained by instantiating its temporal variables.

In what follows, we suppose that the set of temporal atoms

*T *is

*consistent*, i.e.,
the temporal variables can be instantiated in a consistent way. For instance, the set oftemporal atoms

*{before*(

*e, f *),

*before*(

*f, g*),

*before*(

*g, e*)

*} *is not consistent, because thefirst two atoms imply that

*e *comes before

*g *and so, there is no way to instantiate thetemporal variables

*e, f *and

*g *in order to make the three temporal formulae simultaneouslytrue. We also suppose that the set

*T *is

*complete*, i.e., for each pair of temporal variables

*e, f *appearing in T it is possible to infer a temporal relationship between

*e *and

*f *. Forinstance, the set

*{before(e,f),meet(f,g)} *is complete because the relationship between

*e*and

*g *is completely determined (even though it does not appear explicitly in the set):the only possibility is

*before(e,g)*. On the other hand,

*{during(e,f), overlaps(f,g)} *is notcomplete because the relationship between

*f *and

*g *is not deterministically implied bythe two relationships

*during(e,f), overlaps(f,g)*: it could be

*during*(

*e, g*)

*, bef ore*(

*e, g*) or

*overlaps*(

*e, g*).

As we will see next, given a consistent and complete set of temporal atoms, its
temporal variables can be ordered in a natural way. Due to space limitations, we will givean intuitive idea of this order relation, ommiting its rigorous formalization.

First of all, we can define a total order

*≤ *over the temporal domain

*I *of intervals
[

*a, b*],

*a, b ∈ ***N **in the following way:

**Definition 1 (A total order over the intervals) **Let [

*a, b*] and [

*c, d*] be two intervals in

*I*.

We say that [

*a, b*]

*≤ *[

*c, d*] if and only if one and only one of the following conditions is

verified: (1)

*a *=

*c *and

*b *=

*d*, (2)

*b < d *or (3)

*b *=

*d *and

*a < c*.

This order over the interval structure

*I *naturally induces an ordering over the
temporal variables of a set of temporal atoms

*T *. The following example illustrates theidea: let

*T *=

*{before(e,f), during(f,g)*. Then the induced ordering over the set of temporalvariables

*{e, f, g} *is

*e <T f <T g*. Indeed, for any instantiation

*Ie *= [

*ae, be*]

*, If *=
[

*af , bf *]

*, Ig *= [

*ag, bg*] such that the predicates in

*T *are simultaneously true, the endingpoints

*be, bf , bg *necessarily satisfy

*be < bf < bg*. Then

*Ie ≤ Ib ≤ Ic*.

The induced order over the temporal variables appearing in a temporal i-pattern

*M *allows to view

*M *as a

*sequence *of data atoms. Indeed, let

*M *= (

*K*,

*D*,

*T *), where

*D*=

*{p*1(

*x*1

*, ., x*1

*, e*
*k*)

*}*. We know that

*T *is complete and consistent,
and so, its temporal variables are ordered by the natural order

*<T *discussed above. Letus suppose, without loss of generality, that

*e*1

*<T e*2

*<T . . . <T ek*. So, the set

*D *can beviewed as the

*sequence < p*1

*, ., pk >*.

**The Dataset.**
Let

**R **=

*{K, R*1

*, ., Rn} *be a temporal database schema, i.e., a database

schema such that each relational schema

*Ri *has arity

*ki ≥ *2 and sort

*(data,.,data,time)*

and

*K *has arity

*m *and sort

*(data,.,data)*. A

*temporal dataset *over

**R **is a set of temporal

relations

*{k, r*1

*, ., rn} *where each

*ri *is a set of tuples (

*a*1

*, ., an , e*) over

*R*
set of tuples over

*K*. A temporal relation

*r *is called

*coalesced *if for all (

*a*1

*, ., ak, e*)and (

*a*1

*, ., ak, f *)

*∈ r*, we have

*e ∩ f *=

*∅*, i.e., intervals associated to a same tuple aremaximal. In what follows, we assume that all temporal relations are coalesced.

**Definition 2 **Let

*DB *be a temporal database over the schema

**R **=

*{K, R*1

*, ., Rn} *and

*M*

= (

*K*(

*x*1

*, ., xm*),

*{D*1

*, .Ds}, {T*1

*, ., Tl }*) be a temporal i-pattern over

**R**. The

*support *of

*M*
with respect to

*DB*, denoted by

*sup*(

*M *), is defined by

*sup*(

*M *) =

*QM *is the

*relational calculus *query

*∃y*1

*∃y*2

*.∃yk*(

*D*1

*∧ D*2

*∧ . . . ∧ Ds ∧ T*1

*∧ . . . ∧ Tl*), and

*y*1

*, ., yk *are the data variables not appearing in the reference query

*K*(

*x*1

*, ., xm*). Theexpression

*u |*=

*QM *means that

*u *is an answer to the query

*QM *when executed over

*DB*.

Given 0

*≤ α ≤ *1, we say that the temporal i-pattern

*M *is

*frequent *with respect to

*DB*and

*α *if

*sup*(

*M *)

*≥ α*.

**Example 2 **Let us consider the temporal database schema

**R **=

*{Patient(NamePat)*,

*Med(NamePat*,

*NameMed*,

*T*),

*Symp(NamePat*,

*NameSymp*,

*T*)

*}*. Let us consider the

temporal i-pattern

*M *= (

*Patient(x), {M ed*(

*x, y, e*)

*, Symp*(

*x, z, f *)

*}, {bef ore*(

*e, f *)

*}*). Let

*DB *be the following temporal database

*P atient *=

*{*(Paul),(Charles),(John)

*}*,

*Med *=

*{*(Paul,penicillin,[1,3]), (Charles,tetracycline, [4,7]), (Mary,penicillin,[8,10])

*} *and

*Symp*

=

*{*(Paul,dizziness,[4,6]), (Charles,dizziness, [5,6]), (John,fever,[8,10])

*}*.

tional query

*QM *can be specified in the relational algebra formalism by the expressionΠ

*NamePat*(

*P atient*
*Symp*). The answer of

*QM *is

*{*(Paul)

*}*. Note that Charles
is not an answer. Indeed, Charles has taken tetracycline and was dizzy

*during *(and not

*after*) the time he was taking this medicine. John is not an answer neither, because he hasnot taken any medicine at all. So, the support of

*M *w.r.t.

*DB *is 1 . Notice that Mary is
not taken into account because she does not appear in the relation

*Patient *(that is, she isnot an answer to the reference query

*Patient(x)*).

**3. Temporal Patterns with Restrictions**

In an application where the user has an idea of some special characteristics of the patterns

he(she) is searching, he(she) can be overwhelmed with uninteresting patterns. In this

section, we propose to use regular expressions in order to reduce the search space for

temporal patterns and better satisfy user requirements. These regular expressions must be

satisfied by the temporal i-patterns, viewed as

*sequences *as discussed before.

A

*mode *over a temporal database

**R **=

*{K, R*1

*, ., Rk} *is an expression of the

form

*R*(

*u*1

*, ., us, *#), where each

*ui *is a data variable or the (new) symbol

*∗*, # is a

new symbol, and

*R *is one of the predicates

*Ri ∈ ***R**. We say that a data atom

*A *is

*in*

accord with the mode

*R*(

*u*1

*, ., us, *#) if

*A *=

*R*(

*y*1

*, ., ys, e*) and for each

*i *= 1

*, ., s*

we have: (1) If

*ui *is a variable then

*yi *=

*ui*; (2) If

*ui *=

*∗*, then

*yi *is a variable or a

constant ; (3)

*e *is a temporal variable. For instance,

*M ed*(

*x, ∗, *#) is a mode and the

atom

*M ed*(

*x, penicillin, e*) is in accord with

*M ed*(

*x, ∗, *#). On the other hand, the atom

*Med*(

*y, penicillin, e*) is not in accord with the mode

*Med*(

*x, z, *#).

Let Σ be a set of modes over

**R **=

*{K, R*1

*, ., Rn}*. In what follows, we will

consider regular languages (sets of strings) over the alphabet Σ in order to reduce the

search space size. These languages are specified by regular expressions. Given a regular

expression

*L *over the alphabet Σ, we denote by

*M*(

*L*) the set of temporal i-patterns (

*K*,

*< p*1

*, ., pn >*,

*T *) over

**R **such that there exists a string

*w*1

*, w*2

*, ., wn ∈ *L, such that

*pi *is

in accord with

*wi *for each

*i *= 1

*, ., n*.

The

*search space *associated to

*L *is the set of temporal i-patterns

*M *=
(

*K*(

*x*1

*, ., xm*),

*D*,

*T *)

*∈ M*(

*L*) such that

*D *=

*< p*1(

*t*1

*, ., t*1

*, e*
satisfies the following property: Let

*< w*1

*, ., wn > *be the string of modes correspond-ing to the sequence

*< p*1

*, . . . , pn >*. The data variables appearing in

*M *which replacesa symbol

*∗ *in some

*wi *are not in

*{x*1

*, ., xm} *and have only one occurrence in

*D*. Wedenote by

*QL *the search space specified by

*L*.

**Example 3 **Let us consider the database schema

**R**=

*{Patient*,

*Med, Symp } *of

our running example.

Let

*L *=

*M ed*(

*x, ∗, *#)

*M ed*(

*x, ∗, *#)

*∗ Symp*(

*x, ∗, *#).

pattern

*M*1 = (

*P atient*(

*x*),

*{M ed*(

*x, a, e*)

*, M ed*(

*x, b, f *),

*Symp*(

*x, diarrhea, g*)

*}*,

*{bef ore*(

*e, f *)

*, overlaps*(

*f, g*)

*}*) belongs to

*QL*. On the other hand, the pattern

*M*2 =(

*P atient*(

*x*),

*{M ed*(

*x, y, e*),

*Symp*(

*x, y, f *)

*}*,

*{bef ore*(

*e, f *)

*}*) does not belong to

*QL*.

Intuitively, the regular expression

*L *captures the temporal i-patterns corresponding to apatient

*x *taking certain medicines during successive periods of time and eventually pre-senting some symptom.

Our mining task is: Given a temporal database

*DB*, a minimal support threshold

*α*, 0

*≤ α ≤ *1, and a regular expression

*L*, find all temporal i-patterns which belong to thesearch space associated to

*L *and are frequent with respect to

*DB*.

**4. The algorithm MILPRIT**

In this section, we present the Algorithm MILPRIT (Mining Interval Logic Patterns wiht

Regular expressIons conTraints) which generalizes the idea of the SPIRIT algorithm in-

troduced in [4] in the context of classical sequence patterns. In a high level, it follows

the general Apriori strategy of Agrawal/Srikant [1], working in passes, and each pass

producing patterns more

*specificic *than those produced in the previous pass.

The classical Apriori strategy uses a

*Pruning Phase*, where all patterns more spe-
cific than a pattern

*p *not belonging to the frequent patterns produced so far (at the previouspass) are pruned. This pruning strategy relies on the Antimonotonic Property: “

*frequentpatterns cannot be more specific than not frequent patterns*”. In our setting, at the end ofpass

*k*, the algorithm discovers the frequent patterns

*Fk *which satisfy a regular expression

*L*. The constraint

*L *is pushed into the generation phase at pass

*k *+ 1 in order to produceonly patterns satisfying it. So, the number of generated patterns is in direct proportionto the restrictiveness of

*L*. In the pruning phase, however, all patterns which are morespecific than a pattern

*satisfying L *and not belonging to

*Fk *should be pruned. We notice
that in this case, the size of the set of pruned patterns is

*in inverse proportion *to the re-strictiveness of the constraint

*L*. Such tradeoffs are due to the fact that regular expressioncontraints are not anti-monotone, that is, if a pattern satisfies

*L *it may have some sub-pattern not satisfying

*L*. In order to find a suitable trade-off, we use a

*relaxation *of theconstraint

*L*, that is, a less restrictive one, the

*prefix constraint associated to L*. We call

*P ref *(

*QL*) the set of prefixes of words

*w ∈ QL*. The general structure of the algorithmMILPRIT is depicted below:

**Procedure MILPRIT(***DB***,***α***,***R***)**

*DB *is a temporal dataset,

*R *is a regular expression over the alphabet of modes Σ over the temporal database

schema

**R **=

*{K, R*1

*, ., Rn} *underlying

*DB*. Let

*M*1 be the set of data atoms

*Ri*(

*x*1

*, ., xn *).

**begin**

1.

*F *:=

*F*1 =

*{M ∈ M*1 :

*M *is frequent and

*M ∈ P ref *(

*QL*)

*}*

2.

*k *:= 2

3.

**repeat**

4.

*Ck *:= Gen(

*Fk−*1,

*L*) (

**Generation Phase**)

5.

*P *:=

*{M ∈ Ck *:

*∃N ∈ P ref *(

*QL*), such that

*N ∈ F *and

*M *is more specific than

*N }*

6.

*Ck *:=

*Ck − P *(

**Pruning Phase**)

7.

*Fk *:=

*{M ∈ Ck *: support(M)

*≤ α} *(

**Validation Phase**)

8.

*F *:=

*F ∪ Fk*

9.

*k *:=

*k *+ 1

10.

**until ***Fk *=

*∅*

11.

*F *:=

*{M ∈ F *:

*M ∈ QL }*
**The Generation Phase.**
MILPRIT works in passes, each pass

*k *producing patterns of
“

*level*”

*k*. The “level” of a temporal i-pattern

*M *is measured by means of the

*refinementvector *associated to

*M *:

**Definition 3 **The

*refinement vector *associated to

*M *= (

*K*,

*D*,

*T *) is

*v*(

*M *) = (

*n, c*) where

*n *is the size of

*D *and

*c *is the number of constants appearing in

*D*. The

*refinement level*

of

*M *, denoted by

*l*(

*M *), is defined as

*n *+

*c*. For instance, let

*M *be the temporal i-pattern

(

*P atient*(

*x*),

*{M ed*(

*x, penicillin, e*)

*, Symp*(

*x, z, f *)

*}, {bef ore*(

*e, f *)

*}*). Then

*v*(

*M *) =

(2

*, *1) and

*l*(

*M*) = 3.

The procedure Gen(

*Fk−*1,

*L*) is designed to generate temporal i-patterns whose
refinement level is

*k*, by

*specializing *the temporal i-patterns of

*Fk−*1 (whose refinementlevel is

*k − *1) in such a way that the patterns produced constitute a prefix of patternssatisfying

*L*. This is accomplished by two

*specialization operators *defined below.

**Definition 4 (Extension) **Let

*L *be a regular expression and

*M ∈ P ref *(

*QL*),

*M *= (

*K*,

*D*,

*T *). The

*extension operator ρE *executed over

*M *is defined as follows: (a) if

*v*(

*M*) =

(

*n, c*) with

*c > *0, then

*ρE*(

*M*) =

*∅*; (b) if

*v*(

*M*) = (

*n, *0) then

*ρE*(

*M*) is the set of

temporal i-patterns

*M *= (

*K, D , T *)

*∈ QL *such that

*D *=

*D ∪ {pn*+1(

*x*1

*, . . . , xl, en*+1)

*}*

where

*pn*+1

*∈ P red*,

*xi *are data variables,

*en*+1 is a temporal variable and

*T *=

*T ∪*

{r1(

*e*1

*, en*+1)

*, . . . , rn*(

*en, en*+1)

*} *where

*ri *are temporal predicates, and

*T *is complete and

consistent.

**Example 4 **Let

*L *be the regular expression

*E *=

*M ed*(

*x, ∗, *#)

*∗Symp*(

*x, ∗, *#).

The i-pattern

*M*
*{Med*(

*x, y, e*1)

*}*,

*∅*) belongs to

*Pref *(

*QL*).

Let us consider the following i-patterns:

*M*1 = (

*P atient*(

*x*),

*{Med*(

*x, y*1

*, e*1),

*Med*(

*x, y*2

*, e*2)

*}*,

*{overlaps*(

*e*1

*, e*2)

*}*),

*Med*(

*x, y*2

*, e*2),

*Symp*(

*x, z, e*3)

*}*,

*{overlaps*(

*e*1

*, e*2),

*overlaps*(

*e*2

*, e*3)

*}*).

*ρE*(

*M*), but

*M*2 is not in

*ρE*(

*M*1), because there is no explicit nor implicit relationshipbetween the temporal variables

*e*2 and

*e*3.

**Definition 5 (Instantiation) **Let

*L *be a regular expression and

*M ∈ P ref *(

*QL*),

*M *=

(

*K*(

*x*1

*, ., xn*),

*D*,

*T *). The

*instantiation operator *executed over

*M *produces the set

*ρI*(

*M*) of temporal i-patterns

*M *= (

*K, E , T *) obtained by replacing some variable

*yi*

(

*yi *=

*xj*, for all

*j *= 1

*, ., n*) appearing in some data atom

*D *=

*p*(

*., yi, .*) of

*D *by a

constant

*c *in such a way that: (a) the mode

*ws *=

*p*(

*u*1

*, . . . , ui, . . . , ul, *#) corresponding

to the data atom

*D *in the specification of

*M *according to a string

*w*1

*.ws.wk ∈ P ref *(

*L*)

contains a “

*∗*” in the position

*ui*; (b) there is no

*∗ *after the position

*i *in

*ws *such that the

data atom

*p*(

*., yi, .*) of

*D *contains a constant in that position; (c) there is no constant in

the data atoms after

*p*(

*., yi, .*) in the sequence

*D*.

**Example 5 **Let

*L *be the regular expression

*E *=

*M ed*(

*x, ∗, *#)

*∗Symp*(

*x, ∗, *#). Then

*M*2 and

*M*3

*∈ ρI*(

*M*1),

*M*4

*∈ ρI*(

*M*1),

*M*4

*∈ ρI*(

*M*2),

*M*4

*∈ ρI*(

*M*3), where:

*M*1 = (

*P atient*(

*x*)

*, {Med*(

*x, y*1

*, e*1)

*, Med*(

*x, y*2

*, e*2)

*}, {overlaps*(

*e*1

*, e*2)

*}*)

*M*2 = (

*P atient*(

*x*)

*, {Med*(

*x, penicillin, e*1)

*, Med*(

*x, y*2

*, e*2)

*}, {overlaps*(

*e*1

*, e*2)

*}*)

*M*3 = (

*P atient*(

*x*)

*, {Med*(

*x, y*1

*, e*1)

*, Med*(

*x, prozac, e*2)

*}, {overlaps*(

*e*1

*, e*2)

*}*)

*M*4 = (

*P atient*(

*x*)

*, {Med*(

*x, penicillin, e*1)

*, Med*(

*x, prozac, e*2)

*}, {overlaps*(

*e*1

*, e*2)

*}*)
The specialization operator

*ρ *defined as

*ρ*(

*M *) =

*ρI*(

*M *)

*∪ ρE*(

*M *) satisfies some
nice properties: (a) it is

*complete*, in the sense that every i-pattern

*M ∈ P ref *(

*QL*) canbe obtained by applying

*ρ *successively to some i-pattern in the refinement level 1; (b)by appropriately restricting the form of the regular expression

*L *1,

*ρ *is

*optimal*, in thesense that a pattern cannot be obtained by specializing successively two non-equivalenti-patterns2.

**5. Preliminary Experimental Results and Discussion**

MILPRIT has been implemented in C++, under Linux, and executed on a Pentium IV,

3.0 GHz, 1Gb RAM and 80Gb HD SEAGATE ST336607SW. We have developped a

synthetic data generator allowing to adjust several parameters in order to test the perfor-

mance and scalability of MILPRIT. We have performed some preliminary experiments

over synthetic data. We have generated a dataset with two tables

*R*1(

*A, B, Begin, End*),

*R*2(

*A, C, Begin, End*), each table with 3000 tuples, a reference table

*K*(

*A*) with 500

tuples. This dataset has been generated in such a way it includes at least 20 temporal

i-patterns, satisfying the regular expression

*R*2(

*x, ∗, *#)*

*R*1(

*x, ∗, *#)* with support 15%.

1i.e., by requiring that if two distinct modes appear in

*L*, associated to the same predicate

*p*, the positions containing
the symbol * are the same in boths modes.

2Two temporal i-patterns

*M *and

*M *are equivalent if

*M *is obtained from

*M *by renaming its variables.

MILPRIT has been executed over this dataset, with minimum support threshold 10%, andthe execution time was 28 minutes. It stopped at level 10, producing all the temporali-patterns introduced in the dataset.

For the time being, MILPRIT executes correctly over small datasets, containing
an average of 50 temporal i-patterns. A lot of work has to be done yet. First, we have tooptimize MILPRIT in order to evaluate its performance and scalability over large volumeof data. We are investigating buffer management techniques in order to deal with candi-date sets which don’t fit in main memory. Another issue we are investigating is how tooptimize the support counting to validate “long” candidates. Secondly, we have to testMILPRIT over real data. We intend to use the

*AMDI *database for this purpose.

*AM DI*is part of a huge project which aims at building a system supporting an indexed atlas ofdigital mammograms. This system will be available online and intend to be used by ra-diologists as well as radiology students, in order to help them in breast cancer diagnosis.

The atlas will store, for each patient, a series of digital mammograms, taken during itslifetime. Besides image data, some important temporal data will be stored, related to thepatient habits and life style: in which periods of time did she smoke, or take contraceptivedrugs, for instance. MILPRIT will be used in order to find temporal i-patterns relating theevolution of a breast cancer tumor and the patient’s habits and life style.

**References**
[1] R. Agrawal, R. Srikant.

*Mining Sequential Patterns: Generalizations and Performance Improvements. *Proc.

of the Fifth Int. Conference on Extending Database Technology, Avignon, France, March 1996.

[2] H. Albert-Lorincza and J-F. Boulicaut. Mining frequent sequential patterns under regular expressions:
a highly adaptive strategy for pushing constraints. In

*International Conference on Data MiningSDM’03*, pages 316–320, 2003.

[3] James F. Allen and George Ferguson. Actions and Events in Interval Temporal Logic. Technical Report
[4] Minos N. Garofalakis, Rajeev Rastogi, and Kyuseok Shim. SPIRIT: Sequential Pattern Mining with Regular
Expression Constraints. In

*The VLDB Journal*, pages 223–234, 1999.

[5] Frank H¨oppner. Discovery of Temporal Patterns: Learning Rules about the Qualitative Behaviour of Time
Series. In

*PKDD 2001*, pages 192–203, 2001.

[6] Nico Jacobs and Hendrik Blockeel. From Shell Logs to Shell Scripts. In

*ILP’2001*, volume 2157 of

*Lecture*
*Notes in Computer Science*, pages 80–90. Springer, 2001.

[7] Sau Dan Lee and Luc De Raedt. Constraint Based Mining of First Order Sequences in Seqlog. In

*Workshop*
*on Knowledge Discovery in Inductive Databases (KDID´02)*, 2002.

[8] Cyrille Masson and Franc¸ois Jacquenet. Mining Frequent Logical Sequences with Spirit-Log. In

*ILP’2002*,
volume 2583 of

*Lecture Notes in Computer Science*, pages 166–181. Springer, 2003.

Source: http://www.deamo.prof.ufu.br/arquivos/deamoWADM2005.pdf

SPECIAL REPORT-THE DRS SYSTEMOctober 1998 Low back pain can have many causes. It is exceedingly frequent, and is experienced at some time by up to 80% of the population. The differential diagnosis of low back pain is broad and includes systemic diseases (e.g. metastatic cancer), primary spine disease (e.g. disk herniation, degenerative arthritis), and regional diseases (e.g. aortic dissection)

J. Am. Chem. Soc. 2000, 122, 12898-12900 Validation of a Model for the Complex of HIV-1 Reverse Transcriptase with Sustiva through Computation of Resistance Profiles Robert C. Rizzo, De-Ping Wang, Julian Tirado-Rives, andWilliam L. Jorgensen* Department of Chemistry, Yale Uni V ersity New Ha V en, Connecticut 06520-8107 Re V ised Manuscript Recei V ed October 24, 2000 All retrovi