Various Types of Inexact Data
Posted: 01 April 2009 08:44 PM

I am opening this thread to discuss various kinds of data whose values are not necessarily crisply defined.

This topic is becoming more and more popular in both theory and applications (see e.g. the ICDE 2009 thread).

Among various examples of inexact data, let me refer to non-deterministic information systems (see academic projects thread, #1 and #2), as well as to Infobright’s Knowledge Grid as the source of rough data (see rough data contest thread as well as relevant posts on our academic blog).

Certainly, each of these sources of inexact data requires specific solutions for specific applications. Nevertheless, let’s try to discuss them within the same thread. I believe that this way we’ll be able to address some common high-level aspects without harming more detailed discussions of particular cases.

Best greetings,

Dominik

PS: H.S. suggested starting posts with subtopics, e.g.: “Subtopic: Non-deterministic information” or “Subtopic: Interval information”, to make this thread easier to navigate. I guess it may work out well. I’ll try to remember to do it when posting.

[ Edited: 01 April 2009 10:50 PM by Dominik Slezak]

Posted: 06 April 2009 07:24 PM   [ # 1 ]

[Subtitle: Non-deterministic information]
Background (No.1)

Hello everyone,

I am Hiroshi Sakai from Kyushu Institute of Technology, Japan. I would like to survey, over a sequence of posts, some background on non-deterministic information and our framework of Rough Non-deterministic Information Analysis (RNIA). There are several lines of work on rough sets and, as far as I know, non-deterministic information systems (NISs) were proposed by Professor Z. Pawlak, Professor E. Orlowska and Professor W. Lipski.

[1] E.Orlowska and Z.Pawlak: Representation of Nondeterministic Information, Theoretical Computer Science, Vol.29, pp.27-39, 1984.
[2] W.Lipski: On Semantic Issues Connected with Incomplete Information Data Base, ACM Trans. DBS, Vol.4, pp.269-296, 1979.

Rough set theory usually handles tables with deterministic information, i.e., deterministic information systems (DISs). In a NIS, some attribute values are given as a set of possible values instead of a single value, due to incompleteness of information. If we replace each such set with one of its elements, we obtain a DIS. We call such a DIS a derived DIS from the NIS. One of the derived DISs, denoted DIS_{real}, contains the real attribute values, but we do not know which one it is.

The purpose of our research is simple: we want to handle not only DISs but also NISs. In the attached file, examples of a DIS and a NIS are shown. In a DIS, rough set based concepts are well known. Then, what are the rough set based concepts in a NIS? We think they should be defined by summarizing the results over all derived DISs. This way of thinking is almost the same as in a Kripke model, and certainty and possibility are naturally defined.

(Certainty) If a fact A occurs in each derived DIS from a NIS, we can conclude that this fact A also occurs in DIS_{real}.
(Possibility) If a fact A occurs in some derived DISs from a NIS, we can conclude that this fact A may occur in DIS_{real}.

(to be continued)

File Attachments 
Background(No.1).pdf  (File Size: 108KB - Downloads: 290)

Posted: 07 April 2009 10:27 PM   [ # 2 ]

[Subtitle: Non-deterministic information]
Background (No.2) Incomplete Information and Non-deterministic Information

Hello everyone,

This is the second post. Here, I have to clarify that these posts are not official statements; they reflect only my own opinion.

I think that non-deterministic information is a kind of incomplete information, but the two are not the same. In several works, unknown or missing values are expressed by the * symbol. The most familiar interpretation is that * may be any value in the domain. Non-deterministic information, on the other hand, may specify a subset of the domain. This causes a difference in meaning. In the attached file, objects 1 and 3 may be inconsistent in Table 1. However, objects 1 and 3 are consistent in Table 2.

Thank you.


(to be continued)

File Attachments 
Background(No.2).pdf  (File Size: 162KB - Downloads: 298)

Posted: 09 April 2009 03:27 PM   [ # 3 ]

[Subtitle: Non-deterministic information]

Dear Hiroshi,

Thanks a lot for providing us with the background!

Let me digress a little bit and mention that incomplete data is a significant challenge for Business Intelligence tools. In databases, incomplete information is usually represented by NULL values. During SQL-based processing, we need to decide how to deal with NULLs while preparing aggregate data for reporting, visualization, et cetera. Consider SQL statement of the form select x,y,z,avg(a) from t group by x,y,z. For every group determined by columns x,y,z, there may be some rows partially filled in with NULLs that might or might not be members of the group. So, wouldn’t it be nice to have approximations of results in particular groups? Well, it’s easy to say but not so easy to do. Moreover, it’s not so obvious how to prepare the reporting or visualization tools to deal with such approximations.
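
To make the point about NULLs and groups concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table t, its two columns and its rows are made up for illustration, and only one grouping column is used instead of x,y,z. The first query shows the standard behaviour (rows with a NULL key form their own group). The second query is just one possible way to express lower and upper bounds on group sizes, not a feature of any particular database.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t (x text, a real)")
con.executemany(
    "insert into t values (?, ?)",
    [("p", 10.0), ("p", 20.0), ("q", 30.0), (None, 40.0)],
)

# Standard SQL semantics: rows with a NULL grouping key end up in a separate group.
crisp = con.execute(
    "select x, count(*), avg(a) from t group by x order by x"
).fetchall()

# Approximate view: for every known group value, count the rows that surely
# belong to it (cnt_lower) and the rows that might belong to it (cnt_upper).
bounds = con.execute(
    """
    select g.x,
           sum(case when t.x = g.x then 1 else 0 end) as cnt_lower,
           count(*)                                   as cnt_upper
    from   (select distinct x from t where x is not null) g
    join   t on t.x = g.x or t.x is null
    group by g.x
    order by g.x
    """
).fetchall()

print(crisp)
print(bounds)
```

The row with the NULL key contributes to the upper count of both p and q, which is exactly why reporting tools would need to show intervals rather than single numbers.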

Incomplete data is also a significant challenge for data mining tools. Even the simplest data mining task becomes far more complicated, and it is not even obvious how to re-define it, when some values are not crisply identified. The example in the previous post (see attachment) is perfect. The task of extracting IF-THEN rules from data seems so easy to formulate! However, once we have missing values, we’re no longer sure how to understand the rules’ accuracy, support and coverage. Actually, most widely known data mining tools based on statistical analysis, AI, et cetera, attempt to replace missing values with some estimates or most probable values prior to further calculations. Although it may work in some cases, it surely introduces an additional level of complexity and, after all, it may look quite scary and unreliable when the percentage of NULLs is significant. From this perspective, I highly value algorithms that are non-invasive, i.e. they can work directly on the data with missing values instead of requiring additional replacement procedures.

Hiroshi, I write so much about incomplete data in order to emphasize the importance of your area of research. You are actually dealing with an even more complex and more important case of data values that are neither completely crisp nor completely NULL. Formally, we can say that these are data tables that are not in first normal form. From this perspective, the goal of our cooperation may be interpreted as the design of an efficient SQL-based framework that would provide sound and meaningful results when run against somehow normalized versions of non-deterministic data.

Best greetings,

Dominik


Posted: 11 April 2009 02:35 AM   [ # 4 ]

[Subtitle: Non-deterministic information]

Dear Dominik,

I thank you for your comments.

> I highly value algorithms that are non-invasive, i.e. they can work directly on the data with missing values instead of
> requiring additional replacement procedures.

My main interest has been how to obtain rules from a table with non-deterministic information, and I may have been thinking of an idealized case of such a table. For an implementation in SQL, some constraints on the data sets may be necessary. I have worked with some data sets from the UCI Machine Learning Repository, but I am familiar with neither SQL nor large real-world data sets from companies. Therefore, I am fine with whatever constraints on the data sets you choose.

Thank you.

Regards,
Hiroshi


Posted: 11 April 2009 09:32 PM   [ # 5 ]

[Subtitle: Non-deterministic information]

Dear Hiroshi,

I do hope that there are not so many constraints necessary. The whole point is how to rewrite the non-deterministic data to let SQL work efficiently and properly.

Let me refer to the example in the data mining thread. Precisely, let me refer to reply #2, where I discussed the algorithm for searching for frequent itemsets. Let me try to modify that approach in order to search for frequent itemsets in non-deterministic systems. I assume that we should work with something like lower and upper approximations of the supports of particular itemsets and that you may wish to have some constraints for both of these approximations during the search process. I suggest that once we make the first step at the level of itemsets and their supports, we will then be able to proceed with searching for the rules, with additional constraints for their accuracy. Certainly, the case of itemsets is easier. In the case of rules, I will first need to understand how you would like to compute lower and upper approximations of accuracies (and probably also coverage).

So let’s start with itemsets:

Let me rewrite the data you used in your post above. Have a look at the attachment. You may see that this is exactly the same information but structured in a different way. It’s pretty easy to transform a non-deterministic information system to such form prior to starting computations. Please also note that I added an extra column det. It indicates whether the value of a given object (row) on a given attribute is deterministic or not. It helps in dealing with lower and upper approximations simultaneously. It could be actually added using SQL but I think it’s more efficient to construct such an additional column during transformation of original non-deterministic system into the new form, prior to starting SQL operations.
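
A minimal sketch of such a transformation, assuming that the restructured table has the columns rid, attr, val and det used in the queries of this post (the exact layout of the attachment is not reproduced here, and the two-object NIS is made up):

```python
def to_long_form(nis):
    """Return rows (rid, attr, val, det) for a NIS given as a list of dicts of sets.

    det is 1 when the object's value on the attribute is a single value
    (deterministic) and 0 when it is a set of several candidate values.
    """
    rows = []
    for rid, obj in enumerate(nis, start=1):
        for attr in sorted(obj):
            values = obj[attr]
            det = 1 if len(values) == 1 else 0
            for val in sorted(values):
                rows.append((rid, attr, val, det))
    return rows

# Hypothetical two-object NIS with attributes A and B.
nis = [
    {"A": {"a1"}, "B": {"b1", "b2"}},
    {"A": {"a1", "a2"}, "B": {"b1"}},
]
long_rows = to_long_form(nis)
for row in long_rows:
    print(row)
```

Every candidate value gets its own row, so an object with a non-deterministic value on some attribute simply occupies several rows with det = 0.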

Now, let me recall the following query from the data mining thread:

insert into F_k
   select      F_k-1.attr_1, F_k-1.val_1, ..., F_k-1.attr_k-1, F_k-1.val_k-1, F.attr, F.val,
               count(*) as supp
   from        F_k-1, F, T t_1, ..., T t_k
   where       F_k-1.attr_k-1 < F.attr and
               t_1.attr = F_k-1.attr_1 and
               t_1.val = F_k-1.val_1 and
               ...
               t_k-1.attr = F_k-1.attr_k-1 and
               t_k-1.val = F_k-1.val_k-1 and
               t_k.attr = F.attr and
               t_k.val = F.val and
               t_1.rid = t_2.rid and
               ...
               t_k-1.rid = t_k.rid
   group by    F_k-1.attr_1, F_k-1.val_1, ..., F_k-1.attr_k-1, F_k-1.val_k-1, F.attr, F.val
   having      supp >= minsup

If you apply such a query to the data in the attachment, you will obtain the set of all k-long itemsets with upper approximations of their supports greater than or equal to the threshold minsup. Now, let us imagine that we want to search for itemsets with lower and upper approximations of their supports greater than or equal to minsup_lower and minsup_upper, respectively. Then, we should additionally use the column det as follows:

insert into F_k
   select      F_k-1.attr_1, F_k-1.val_1, ..., F_k-1.attr_k-1, F_k-1.val_k-1, F.attr, F.val,
               sum(t_1.det*...*t_k.det) as supp_lower, count(*) as supp_upper
   from        F_k-1, F, T t_1, ..., T t_k
   where       F_k-1.attr_k-1 < F.attr and
               t_1.attr = F_k-1.attr_1 and
               t_1.val = F_k-1.val_1 and
               ...
               t_k-1.attr = F_k-1.attr_k-1 and
               t_k-1.val = F_k-1.val_k-1 and
               t_k.attr = F.attr and
               t_k.val = F.val and
               t_1.rid = t_2.rid and
               ...
               t_k-1.rid = t_k.rid
   group by    F_k-1.attr_1, F_k-1.val_1, ..., F_k-1.attr_k-1, F_k-1.val_k-1, F.attr, F.val
   having      supp_lower >= minsup_lower and supp_upper >= minsup_upper

By using sum(t_1.det*...*t_k.det) we indicate that we are interested in counting only those cases wherein all the values are deterministic. Certainly, there are also other ways of expressing it in SQL (e.g. using case-when functions) but let’s omit such technical details here.
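
For readers who want to try the det trick on a concrete case, here is a runnable sketch for k = 2 using Python's built-in sqlite3 module. It is a simplification of the schematic query: the candidate tables F_k-1 and F are skipped and all pairs of attribute-value conditions are generated directly from T. The four-object table and the thresholds are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table T (rid integer, attr text, val text, det integer)")
con.executemany("insert into T values (?, ?, ?, ?)", [
    (1, "A", "a1", 1), (1, "B", "b1", 1),
    (2, "A", "a1", 1), (2, "B", "b1", 0), (2, "B", "b2", 0),   # B is {b1, b2}
    (3, "A", "a1", 0), (3, "A", "a2", 0), (3, "B", "b1", 1),   # A is {a1, a2}
    (4, "A", "a2", 1), (4, "B", "b2", 1),
])

minsup_lower, minsup_upper = 1, 3

# supp_lower counts objects where both values are deterministic,
# supp_upper counts objects where both values are at least possible.
pairs = con.execute(
    """
    select t_1.attr, t_1.val, t_2.attr, t_2.val,
           sum(t_1.det * t_2.det) as supp_lower,
           count(*)               as supp_upper
    from   T t_1, T t_2
    where  t_1.attr < t_2.attr and t_1.rid = t_2.rid
    group by t_1.attr, t_1.val, t_2.attr, t_2.val
    having sum(t_1.det * t_2.det) >= ? and count(*) >= ?
    order by t_1.val, t_2.val
    """,
    (minsup_lower, minsup_upper),
).fetchall()

print(pairs)
```

Only the itemset (A=a1, B=b1) passes both thresholds: object 1 supports it for sure, while objects 2 and 3 support it only possibly.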

It’s just a quick post to illustrate that it’s possible to adapt (easily, I hope) some SQL-based techniques to deal with non-deterministic information systems as well. Hiroshi, I hope you like this approach. As I mentioned above, if you provide more details on how you want to express the approximations of accuracy (and probably also coverage) in non-deterministic information systems, then I’ll try to follow up with SQL-based rule extraction algorithms.

Many thanks and best greetings,

Dominik

[ Edited: 12 April 2009 10:48 PM by Dominik Slezak]
File Attachments 
data example.pdf  (File Size: 20KB - Downloads: 314)

Posted: 14 April 2009 01:24 AM   [ # 6 ]

[Subtitle: Non-deterministic information]
NISapriori (Apriori in NISs) in SQL

Hello Dominik,

I thank you for your comments.

At first, I did not think about an implementation in SQL, so I did not consider the data structure in SQL. However, now I think an implementation in SQL will be necessary. I thank Dominik for working out the framework in SQL.

Now, I propose criterion based rules in NISs in the attached file NISapriori, i.e., the lower system, the upper system and how to realize such systems. In this attached file, I extend rule generation from DISs to rule generation from NISs. In a NIS, the number of derived DISs increases exponentially, so we cannot simply apply a typical method to each derived DIS in turn. However, we proved a calculation method that handles the worst case and the best case of support and accuracy. In the attached file Criteria, we employ two sets Descinf([A,a]) and Descsup([A,a]) and we show how to obtain min_support, min_accuracy, max_support and max_accuracy. This calculation does not depend upon the number of derived DISs. By using this calculation, we can handle the lower and upper systems. Furthermore, the complexity will be almost the same as that of the Apriori algorithm. The details are in the following paper.

H. Sakai, R. Ishibasi and M. Nakata: Rules and Apriori Algorithm in Non-deterministic Information Systems, Transactions on Rough Sets, Springer-Verlag, Vol. 9 (LNCS Vol. 5390), pp. 328-350, 2008.
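
A small sketch of why such a calculation need not enumerate derived DISs, restricted to support only. It assumes that Descinf([A,a]) is the set of objects whose value on A is certainly a, and Descsup([A,a]) the set of objects whose value on A is possibly a; the attached Criteria file and the paper above remain the authoritative definitions, and the accuracy bounds are not covered here. The four-object NIS is hypothetical, and the brute-force function is included only to cross-check the set-based result.

```python
from itertools import product

# Hypothetical NIS: object id -> attribute -> set of candidate values.
nis = {
    1: {"A": {"a1"}, "B": {"b1"}},
    2: {"A": {"a1"}, "B": {"b1", "b2"}},
    3: {"A": {"a1", "a2"}, "B": {"b1"}},
    4: {"A": {"a2"}, "B": {"b2"}},
}

def desc_inf(nis, attr, val):
    """Objects whose value on attr is certainly val."""
    return {rid for rid, obj in nis.items() if obj[attr] == {val}}

def desc_sup(nis, attr, val):
    """Objects whose value on attr is possibly val."""
    return {rid for rid, obj in nis.items() if val in obj[attr]}

def support_bounds(nis, descriptors):
    """(min_support, max_support) of a conjunction of descriptors on distinct attributes."""
    rids = set(nis)
    lower = set.intersection(rids, *(desc_inf(nis, a, v) for a, v in descriptors))
    upper = set.intersection(rids, *(desc_sup(nis, a, v) for a, v in descriptors))
    return len(lower) / len(nis), len(upper) / len(nis)

def brute_force_bounds(nis, descriptors):
    """The same bounds obtained by enumerating every derived DIS (exponential)."""
    cells = [(rid, a, sorted(vals))
             for rid, obj in sorted(nis.items())
             for a, vals in sorted(obj.items())]
    supports = []
    for choice in product(*(vals for _, _, vals in cells)):
        dis = {}
        for (rid, a, _), v in zip(cells, choice):
            dis.setdefault(rid, {})[a] = v
        hits = sum(all(row[a] == v for a, v in descriptors) for row in dis.values())
        supports.append(hits / len(nis))
    return min(supports), max(supports)

bounds = support_bounds(nis, [("A", "a1"), ("B", "b1")])
print(bounds, brute_force_bounds(nis, [("A", "a1"), ("B", "b1")]))
```

The set-based version touches each object once per descriptor, while the brute-force version grows with the number of derived DISs.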

I attach an execution on the data of #2 by our program. For blue.csv, we first translate it into an internal form, and then we store each value in arrays for descriptors. In this program, I did not need to consider the data structure. This is an experimental system, so we need an implementation in SQL. It would be great if Dominik could translate NISapriori in C into NISapriori on ICE.

Thank you.

Regards,
Hiroshi

File Attachments 
NISapriori.zip  (File Size: 166KB - Downloads: 388)

Posted: 14 April 2009 09:12 AM   [ # 7 ]

[Subtitle: Non-deterministic information]

Dear Hiroshi,

I found that attaching zipped documents may cause some problems. Unfortunately, I wasn’t able to open your last attachment. Was it pdf? If it was, could you please attach it without zipping?

I’ll be happy to look at it! I suspect that we are close to formulating everything in a nice SQL framework that would be efficient on ICE. (Although one could obviously check it out also with other databases supporting standard SQL.)

Many thanks and best greetings,

Dominik

[ Edited: 14 April 2009 09:43 AM by Dominik Slezak]

Posted: 14 April 2009 08:51 PM   [ # 8 ]

[Subtitle: Non-deterministic information]
NISapriori (Apriori in NISs) in SQL

Hello Dominik,

I am sorry for attaching a zip file.
I attach three pdf files, again.
1. NISapriori: A new definition of rules in NISs
2. criteria: A method of calculation on Min_support, Min_accuracy, Max_support and Max_accuracy
3. Execution: An example of execution related to #2

Thank you.

Regards,
Hiroshi

File Attachments 
NISapriori.pdf  (File Size: 102KB - Downloads: 305)
criteria.pdf  (File Size: 127KB - Downloads: 314)
Execution.pdf  (File Size: 14KB - Downloads: 332)

Posted: 15 April 2009 07:48 AM   [ # 9 ]

[Subtitle: Non-deterministic information]
Dear Hiroshi,

Below is the algorithm for finding all decision minimal rules such that maxsup >= alpha and maxacc >= beta, wherein maxsup and maxacc are defined in DEF-DEF mode. The algorithm can be surely reformulated for minsup and minacc, as well as for other modes (DEF-INDEF, INDEF-DEF, INDEF-INDEF).

I use the form of table T as in data example.pdf in #5 above. I also refer to the rule extraction described for deterministic data in data mining thread.

I start by creating table C_1 of attribute-value conditions. Their support is measured in its maxsup form:

insert into C_1
   select      attr as attr_1, val as val_1, sum(det) as minsup, count(*) as maxsup
   from        T
   where       attr != DEC   // DEC denotes the decision attribute
   group by    attr, val
   having      maxsup >= alpha

I also create table D_1 with the candidate decision rules with single attribute-value conditions:

insert into D_1
   select      C_1.attr_1, C_1.val_1, t_dec.val as decval,
               count(*) as maxsup, maxsup/[min(C_1.minsup)+maxsup-sum(t_1.det)] as maxacc
   from        C_1, T t_1, T t_dec
   where       t_1.attr = C_1.attr_1 and
               t_1.val = C_1.val_1 and
               t_dec.attr = DEC and   // restrict t_dec to the decision attribute
               t_1.rid = t_dec.rid
   group by    C_1.attr_1, C_1.val_1, t_dec.val
   having      maxsup >= alpha

The SQL-based maxacc corresponds to the original formula in DEF-DEF mode. The minsup values of the rules’ conditional parts are “borrowed” from table C_1 (refer to #8 in data mining thread for explanation of the usage of min(C.minsup)).
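
The first step can be tried out with Python's built-in sqlite3 module. This is only a sketch under a few assumptions: the four-object table and the threshold alpha = 1 are made up, the decision attribute is stored under the name 'DEC' and is restricted explicitly in the join, and the aggregates are repeated instead of reusing the aliases, because SQLite does not allow an alias to be referenced inside the same select list.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table T (rid integer, attr text, val text, det integer)")
con.executemany("insert into T values (?, ?, ?, ?)", [
    (1, "A", "a1", 1), (1, "DEC", "yes", 1),
    (2, "A", "a1", 1), (2, "DEC", "no", 1),
    (3, "A", "a1", 0), (3, "A", "a2", 0), (3, "DEC", "yes", 1),  # A is {a1, a2}
    (4, "A", "a2", 1), (4, "DEC", "no", 1),
])
alpha = 1

# C_1: single attribute-value conditions with their minsup and maxsup.
con.execute("create table C_1 (attr_1 text, val_1 text, minsup integer, maxsup integer)")
con.execute(
    """
    insert into C_1
    select attr, val, sum(det), count(*)
    from   T
    where  attr != 'DEC'
    group by attr, val
    having count(*) >= ?
    """,
    (alpha,),
)

# D_1: candidate rules with one condition, with maxsup and maxacc.
d_1 = con.execute(
    """
    select c.attr_1, c.val_1, t_dec.val as decval,
           count(*) as maxsup,
           1.0 * count(*) / (min(c.minsup) + count(*) - sum(t_1.det)) as maxacc
    from   C_1 c, T t_1, T t_dec
    where  t_1.attr = c.attr_1 and t_1.val = c.val_1
      and  t_dec.attr = 'DEC' and t_1.rid = t_dec.rid
    group by c.attr_1, c.val_1, t_dec.val
    having count(*) >= ?
    order by c.attr_1, c.val_1, t_dec.val
    """,
    (alpha,),
).fetchall()

for row in d_1:
    print(row)
```

For example, the rule A=a1 => yes gets maxsup = 2 and maxacc = 2/3: in the most favourable derived DIS, object 3 takes the value a1, so three objects match the condition and two of them have the decision yes.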

The rest of the algorithm is based on iterative creation of the sets of k-long conditional patterns (table C_k) and decision rules (table D_k). At every kth step, k>1, we use the results of (k-1)th step together with C_1 and D_1.

Note that before computing every next D_k, we remove from D_k-1 the rules that do not require further analysis. Namely, all rules with maxacc >= beta are put into FINAL RESULT, which is the set of all minimal rules satisfying the predefined constraints for maxsup and maxacc. The remainder of D_k-1 is denoted as F_k-1. The stop criterion is the same as for rules in deterministic data in the data mining thread.

insert into C_k
   select      c_old.attr_1, c_old.val_1, ..., c_old.attr_k-1, c_old.val_k-1,
               c.attr_1 as attr_k, c.val_1 as val_k,
               sum(t_1.det*...*t_k.det) as minsup, count(*) as maxsup
   from        C_1 c, C_k-1 c_old, T t_1, ..., T t_k
   where       c_old.attr_k-1 < c.attr_1 and
               t_1.attr = c_old.attr_1 and
               t_1.val = c_old.val_1 and
               ...
               t_k-1.attr = c_old.attr_k-1 and
               t_k-1.val = c_old.val_k-1 and
               t_k.attr = c.attr_1 and
               t_k.val = c.val_1 and
               t_1.rid = t_2.rid and
               ...
               t_k-1.rid = t_k.rid
   group by    c_old.attr_1, c_old.val_1, ..., c_old.attr_k-1, c_old.val_k-1, c.attr_1, c.val_1
   having      maxsup >= alpha

And finally:

insert into D_k
   select      C_k.attr_1, C_k.val_1, ..., C_k.attr_k, C_k.val_k, t_dec.val as decval,
               count(*) as maxsup,
               maxsup/[min(C_k.minsup)+maxsup-sum(t_1.det*...*t_k.det)] as maxacc
   from        C_k, F_1 f, F_k-1 f_old, T t_1, ..., T t_k, T t_dec
   where       C_k.attr_1 = f_old.attr_1 and
               C_k.val_1 = f_old.val_1 and
               ...
               C_k.attr_k-1 = f_old.attr_k-1 and
               C_k.val_k-1 = f_old.val_k-1 and
               C_k.attr_k = f.attr_1 and
               C_k.val_k = f.val_1 and
               t_1.attr = C_k.attr_1 and
               t_1.val = C_k.val_1 and
               ...
               t_k.attr = C_k.attr_k and
               t_k.val = C_k.val_k and
               t_dec.attr = DEC and   // restrict t_dec to the decision attribute
               t_dec.val = f_old.decval and
               t_dec.val = f.decval and
               t_1.rid = t_dec.rid and
               ...
               t_k.rid = t_dec.rid
   group by    C_k.attr_1, C_k.val_1, ..., C_k.attr_k, C_k.val_k, t_dec.val
   having      maxsup >= alpha

Please let me know whether it sounds reasonable.

Best greetings,

Dominik

[ Edited: 15 April 2009 07:53 AM by Dominik Slezak]

Posted: 16 April 2009 03:30 AM   [ # 10 ]

[Subtitle: Non-deterministic information]
NISapriori (Apriori in NISs) in SQL

Hello Dominik,

Oh, fine!!  This is great.

I do not know SQL well, so I cannot yet judge whether it is reasonable. As I understand it, the recursive step is as follows (alpha is the threshold on support, beta is the threshold on accuracy):
(1) A set C_k is generated, where each ele_CON in C_k satisfies maxsup(ele_CON) >= alpha.
(2) If maxacc(ele_CON & ele_DEC) >= beta (ele_CON belongs to C_k, ele_DEC belongs to T_dec.val), ele_CON => ele_DEC is stored in D_k as a rule. Otherwise, ele_CON is picked up again for C_{k+1}.

I think that maxsup>=alpha should perhaps be maxacc>=beta in D_1 and D_k.

In your post, the DEF-DEF case in the upper system is coded in SQL. I look forward to the realization of the other cases (DEF-INDEF, INDEF-DEF, INDEF-INDEF) and the lower system. I will try to examine your SQL program.

Thank you.

Regards,
Hiroshi


Posted: 16 April 2009 06:28 AM   [ # 11 ]

[Subtitle: Non-deterministic information]

Hello Hiroshi,

Sure, I’ll follow up with the remaining cases (DEF-INDEF, INDEF-DEF, INDEF-INDEF) and the lower system.

With regards to the recursive step, it looks as follows:

First you use SQL to create and fill in table C_k. Every row in this table is a conjunction of k attribute=value conditions. I’ll refer to this as ele_CON, as you suggest. Surely we’re interested only in those ele_CONs that have support (its min or max case, depending on implementation) greater than or equal to alpha. This is because other ele_CONs cannot lead to decision rules with high enough support anyway.

Then you use SQL to create and fill in table D_k. Every row in this table is a conjunction of k attribute=value conditions plus the decision value. I still apply the support constraint here, i.e. D_k contains only the rows corresponding to the decision rules with sup(ele_CON & ele_DEC)>=alpha. (Again, it may be minsup or maxsup.) Accuracy is computed as one additional column in table D_k but there are no constraints on its level while creating D_k. All the rules with high enough support are stored in table D_k regardless of whether their accuracy is below beta or not.

The split between the rules with respect to their accuracy is done as the next step. (You do not need to use SQL here but of course you can if it is more convenient.)

1. The rules with accuracy >= beta are moved to FINAL RESULT
2. The rules with accuracy < beta are moved to table F_k, which is used at the next stage of recursion.

The stop criterion is when the table F_k turns out to be empty for a given k in the recursive process. Then we are sure that all irreducible rules satisfying support and accuracy constraints are in FINAL RESULT.
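
The recursion described above can also be sketched outside SQL. The following Python version is an illustration only: it works on a small made-up NIS held in memory, uses maxsup and maxacc in DEF-DEF mode computed from sets of certainly and possibly matching objects, and extends a condition only when its prefix is in F_k-1 and its last descriptor is in F_1, mirroring the joins in #9. It does not check every sub-condition for minimality, and all function and variable names are hypothetical.

```python
def inf(nis, attr, val):
    """Objects whose value on attr is certainly val."""
    return {r for r, o in nis.items() if o[attr] == {val}}

def sup(nis, attr, val):
    """Objects whose value on attr is possibly val."""
    return {r for r, o in nis.items() if val in o[attr]}

def conj(nis, descs, f):
    """Intersect f(descriptor) over all descriptors of a condition."""
    result = set(nis)
    for attr, val in descs:
        result &= f(nis, attr, val)
    return result

def max_measures(nis, cond, dec):
    """maxsup (as a count) and maxacc of the rule cond => dec."""
    inf_c, sup_c = conj(nis, cond, inf), conj(nis, cond, sup)
    sup_d = sup(nis, *dec)
    maxsup = len(sup_c & sup_d)
    denom = len(inf_c) + maxsup - len(inf_c & sup_d)
    return maxsup, (maxsup / denom if denom else 0.0)

def mine_rules(nis, dec_attr, alpha, beta):
    conds = sorted({(a, v) for o in nis.values()
                    for a, vs in o.items() if a != dec_attr for v in vs})
    decs = sorted({(dec_attr, v) for o in nis.values() for v in o[dec_attr]})
    final, f_1, k = [], set(), 1
    candidates = [((c,), d) for c in conds for d in decs]
    while candidates:
        f_k = []
        for cond, dec in candidates:
            maxsup, maxacc = max_measures(nis, cond, dec)
            if maxsup < alpha:
                continue                      # not stored in D_k at all
            if maxacc >= beta:
                final.append((cond, dec))     # moved to FINAL RESULT
            else:
                f_k.append((cond, dec))       # kept in F_k for the next step
        if k == 1:
            f_1 = set(f_k)
        # Extend rules from F_k by one descriptor on a later attribute.
        candidates = [(cond + (c,), dec) for cond, dec in f_k for c in conds
                      if c[0] > cond[-1][0] and ((c,), dec) in f_1]
        k += 1
    return final

nis = {
    1: {"A": {"a1"}, "B": {"b1"}, "D": {"yes"}},
    2: {"A": {"a1"}, "B": {"b2"}, "D": {"no"}},
    3: {"A": {"a2"}, "B": {"b1"}, "D": {"no"}},
    4: {"A": {"a1", "a2"}, "B": {"b1"}, "D": {"yes"}},
}
rules = mine_rules(nis, "D", alpha=1, beta=1.0)
for cond, dec in rules:
    print(cond, "=>", dec)
```

In this example two rules are accepted at k = 1, and the rule with the condition A=a1 and B=b1 is accepted at k = 2 because neither of its single conditions reaches beta on its own. The loop ends once no candidates are left, which covers both an empty F_k and the case where no further attribute is available for extension.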

Best greetings,

Dominik

PS: I should have added SQL commands for creating tables C_k, D_k and F_k…


Posted: 17 April 2009 10:18 PM   [ # 12 ]

[Subtitle: Non-deterministic information]
NISapriori (Apriori in NISs) in SQL

Dear Dominik,

I know C_k, D_k and F_k. Is the following OK?
C_k: a set of length k conjunctions of descriptors, which satisfy maxsup>=alpha.
D_k: a set of conjunctions (ele_CON & ele_DEC, ele_CON belongs to C_k, ele_DEC belongs to T_dec.val), which satisfy maxsup>=alpha.
F_k: a set of conjunctions (ele_CON & ele_DEC, ele_CON belongs to C_k, ele_DEC belongs to T_dec.val), which satisfy maxacc<beta.

Now, I briefly give an intuitive explanation of the lower and upper systems. Let us consider an implication T. As for the upper system, we employ maxsupp(T) and maxacc(T) as criteria, and these two values are the same in the other types of implications, i.e., DEF-INDEF, INDEF-DEF and INDEF-INDEF. To obtain the best derived DIS for T, each indefinite implication tries to support T. Namely, each indefinite implication tries to become T by a proper selection of values from the non-deterministic information. In this way, the best derived DIS for T is generated.

On the other hand, in the lower system each indefinite implication does not support T. Namely, each indefinite implication tries to become
(1) an implication whose condition is the same as in T but whose decision is different from T, or
(2) an implication whose condition is different from T, if its decision is the same as in T.
In this way, the worst derived DIS for T is also generated.

I think that the implementation of the upper system seems much easier. Therefore, the order of research will be the upper system first, then the lower system. I defined four types of implications, DEF-DEF, DEF-INDEF, INDEF-DEF and INDEF-INDEF. However, I think DEF-DEF implications may be enough for large scale data.

Thank you.

Regards,
Hiroshi


Posted: 20 April 2009 05:47 AM   [ # 13 ]

[Subtitle: Non-deterministic information]

Dear Hiroshi,

Thanks a lot for sharing interpretations of all the types of implications with us.

Let me also confirm:

H.S. - 17 April 2009 10:18 PM

C_k: a set of length k conjunctions of descriptors, which satisfy maxsup>=alpha.
D_k: a set of conjunctions (ele_CON & ele_DEC, ele_CON belongs to C_k, ele_DEC belongs to T_dec.val), which satisfy maxsup>=alpha.
F_k: a set of conjunctions (ele_CON & ele_DEC, ele_CON belongs to C_k, ele_DEC belongs to T_dec.val), which satisfy maxacc<beta.

Yes, this is how the tables C_k, D_k, F_k should be understood.

You also write:

H.S. - 17 April 2009 10:18 PM

I think that the implementation of the upper system seems much easier. Therefore, the order of research will be the upper system first, then the lower system. I defined four types of implications, DEF-DEF, DEF-INDEF, INDEF-DEF and INDEF-INDEF. However, I think DEF-DEF implications may be enough for large scale data.

I also think this is a reasonable approach. If necessary, we can always go back to the DEF-INDEF, INDEF-DEF and INDEF-INDEF cases. For now, however, focusing on the DEF-DEF case will give us more clarity. It should also be sufficient for testing the speed of execution of the SQL statements, because their syntax is quite similar in all the cases.

Best greetings,

Dominik


Posted: 30 April 2009 01:58 AM   [ # 14 ]

[Subtitle: Non-deterministic information]
Background (No.3) Consistency based Rules in NISs

Hello Dominik and everyone,

I am enjoying the application of ICE to NISapriori, which is a hot topic in this thread.

In order to define rules in NISs, we employ two strategies, i.e., the consistency based strategy and the criteria based strategy.
NISapriori is related to the criteria based strategy, but both strategies will be necessary.
In the attached pdf, the following are surveyed.
(1) A definition of an implication in a DIS and a NIS
(2) Consistency based rules in a DIS
(3) Consistency based rules in a NIS

For such rules, the problem is to find the most suitable algorithm and to realize a software tool for obtaining them.

Thank you.

Regards,
Hiroshi

File Attachments 
Background(No.3).pdf  (File Size: 161KB - Downloads: 300)

Posted: 02 May 2009 06:29 AM   [ # 15 ]

[Subtitle: Non-deterministic information]

Hello Hiroshi,

It’s good to hear from you! Please let me know whether you were able to run some of those SQL statements.

Could we treat the consistency-based rules as special cases of the rules that we analyzed before? I have always thought about them as the rules with accuracy = 1. Then, perhaps, we could use the same algorithms. On the other hand, we may indeed think about a faster, more specific procedure for searching for such special types of rules.

Please let me know your thoughts and I’ll try to come up with something.

Many thanks and best greetings,

Dominik
