

Knowledge-Bases

This chapter shows the benefits of knowledge-base specifications in the field of information systems, and introduces a language for specifying knowledge-bases and a process for implementing them.

Deductive Databases

In [Mayol] we can read:

``Deductive databases are based on the concepts of deductive rules (views) and integrity constraints. Deductive rules allow to deduce new facts (derived or view facts) from base facts explicitly stored in the database and from other derived facts. On the other hand, integrity constraints define conditions that each state of the database is required to satisfy.

Deductive databases use (first-order) logic as a base language and generalize relational data bases by overcoming the limitations of relational languages in the definition of complex views and constraints. These features together with appropriate reasoning capabilities ease the sharing of common knowledge within complex application domains, facilitating program development and reuse on the way.''

In Jude a deductive database is called a knowledge-base; it represents an explicit conceptualization, as described in [Gruber 1993]:

`` A conceptualization is an abstract, simplified view of the world that we wish to represent for some purpose. Every knowledge base, knowledge-based system, or knowledge-level agent is committed to some conceptualization, explicitly or implicitly.

An ontology [knowledge-base] is an explicit specification of a conceptualization.[..] When the knowledge of a domain is represented in a declarative formalism, the set of objects that can be represented is called the universe of discourse. This set of objects, and the describable relationships among them, are reflected in the representational vocabulary with which a knowledge-based program represents knowledge. [..] Formally, an ontology is the statement of a logical theory.''

The knowledge-base is specified using a declarative programming paradigm. A declarative statement specifies what must be done, whereas an imperative statement specifies how it should be done. Declarative specifications are often written in a high-level formalism, which permits expressing things directly related to the problem domain. Unlike low-level specifications, high-level specifications ignore details related to the management of computer resources. Declarative statements therefore tend to be more concise and readable than imperative ones, because the low-level details of the instructions actually executed are not stated explicitly. A compiler and/or a run-time engine (in our case the inference engine of a deductive database management system) translates the declarative specification into executable imperative instructions for you.
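The difference can be seen in miniature in any language. In the following Python sketch (Python, not SimpleLogic, with hypothetical person/age data), the imperative function spells out how to build the answer step by step, while the comprehension only states which people belong to the answer and leaves the iteration to the language runtime, in roughly the way a SimpleLogic query leaves the search to the inference engine.

```python
# Hypothetical base facts: (person, age) pairs.
AGES = {("massimo", 26), ("maurizio", 25), ("anna", 31)}


def adults_imperative(facts):
    """Imperative style: state *how* to build the answer, step by step."""
    result = []
    for person, age in sorted(facts):
        if age > 25:
            result.append(person)
    return result


def adults_declarative(facts):
    """Declarative style: state *what* belongs to the answer."""
    return {person for person, age in facts if age > 25}


print(adults_imperative(AGES))           # ['anna', 'massimo']
print(sorted(adults_declarative(AGES)))  # ['anna', 'massimo']
```

Both functions return the same people; only the second one leaves the order of evaluation unspecified.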

XSB Deductive Database Management System

There are many available deductive database management systems. I have selected XSB because it is distributed under the GNU license and it is actively maintained. This is a brief description of XSB, extracted from its user manual [Sagonas 2000]:

``XSB is a research-oriented Logic Programming system for Unix and Windows-based systems. In addition to providing all the functionality of Prolog, XSB contains several features not usually found in Logic Programming systems, including: (*) Evaluation according to the Well-Founded Semantics through full SLG resolution; (*) A compiled HiLog implementation; (*) A variety of indexing techniques for asserted code, along with a novel transformation technique called unification factoring that can improve program speed and indexing for compiled code; (*) A number of interfaces to other software systems, such as C, Java, Perl and Oracle[...](*) Source code availability for portability and extensibility.''

However, to date XSB lacks many facilities expected from a DBMS; for example, it cannot execute concurrent queries.

SimpleLogic

SimpleLogic is the language used in Jude to specify knowledge-bases. It was created during the design of Jude in order to make Datalog- or Prolog-like programs more readable for non-skilled users.

At the moment it lacks many useful features of a modern language. See chapter for more details.

Syntax

SimpleLogic statements are written and read from left to right, like English sentences. This helps produce knowledge-bases that are understandable also by users without technical skills. The disadvantage is that SimpleLogic has a verbose syntax, while other formalisms use a terser syntax that skilled users find easier to read.

Terms written in capitals are variable names. Lower-case terms are SimpleLogic keywords, relation names or object identifiers. The symbol ``_'' represents an anonymous variable with no particular meaning inside the statement, equivalent to the English word ``any''.

A SimpleLogic knowledge-base is composed of:

cluster: a file containing the specification of types, relation declarations, facts and rules related to a common, specific domain. Every cluster starts with a textual description.
relation-declaration: declares a relation, assigning it a name, a cardinality, argument types and an informal description. For example

: ``The age of a person at a certain date.'':
declare relation person has_age number at_date date;

is the declaration of the has_age_at_date relation, with cardinality three, argument types person, number and date, and the first line as its informal description.

object: is a concept, abstraction, or thing that can be individually identified and has meaning for the knowledge-base.
type: is a set containing elements (objects) with common properties. object is the most general set, containing all the objects of the knowledge-base. type is the set containing all the types defined in the knowledge-base.
object-identifier: is a unique key inside the knowledge-base identifying an object.
oid: abbreviation for object-identifier
atomic-oid: is an oid that cannot be decomposed.
compound-oid: is an oid that is composed of other oids.
identifier-oid: is an atomic-oid that identifies an object without adding any additional information. For example ``jude_object_1'' is an identifier-oid, but ``5'' is not.
number-oid: is an atomic-oid that identifies a real number.
date-oid: is a compound oid that identifies a date. It has the format ``date(year,month,day)''. Valid date-oids are ``date(2000,7,1)'', ``date(YEAR,MONTH,MONTH+5)'' etc.
list-oid: is a compound oid that identifies a list. It has the same form as a Prolog list. Valid list-oids are ``[1,2,3]'', ``["uno","due","tre"]'', ``[1,2 | [3]]'' and ``[UNO,UNO+1,UNO+2]''.
string-oid: is a compound oid that identifies a list of characters enclosed between quotation marks (").
variable: is a placeholder for an object-identifier inside a query or a rule. Each variable name is composed of upper-case characters (and underscores). Examples of variables are PERSON, TYPE etc.
fact: is a relation between objects, for example: ``massimo has_age 26 at_date date(2000,7,1)''
rule: is a statement that derives new facts (the rule-implication) whenever the rule-condition is satisfied. For example:

``Assert the age of a person using its birthday'':
assert PERSON has_age AGE at_date DATE
    if PERSON is_a person
    and PERSON has_birthday BIRTHDAY
    and BIRTHDAY has_year YEAR
    and DATE has_year THIS_YEAR
    and AGE is_equal_to THIS_YEAR - YEAR;

is a rule where ``PERSON has_age AGE at_date DATE'' is the implication and the part following ``if'' is the condition.

extensional-fact: synonym of base fact; a fact that is explicitly asserted.
intensional-fact: synonym of derived fact; a fact that is implicitly asserted by a rule.
query: a request such as:

: get { PERSON, AGE | PERSON has_age AGE at_date today};

where PERSON and AGE are the result parameters of the query and ``PERSON has_age AGE at_date today'' is the condition that the variables must satisfy.

query-result: is a list of value tuples satisfying the query. For the query above, the result is something like [[massimo,26],[maurizio,25]].
test: is the check of a given condition. An example of a test is: ``test if massimo has_age 26 at_date today;''
test-result: is a true or false value corresponding to the truth value of a given test.

Semantics

SimpleLogic has a first-order predicate calculus semantics.

The definitive reference for SimpleLogic semantics is the compiler itself, written in Java, together with the semantics of the XSB-Prolog code it generates.

Here the semantics is described informally through examples of translation from SimpleLogic to XSB-Prolog.

The SimpleLogic fact

: assert massimo has_age 26 at_date date(2000,07,01);

is converted to the XSB-Prolog statement

: has_age_at_date(massimo,26,date(2000,07,01)).
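The translation of a fact follows a regular pattern: the relation words (has_age, at_date) are joined into the predicate name, and the subject followed by the remaining words become its arguments. The hypothetical Python helper below reproduces this mapping for simple facts. It assumes tokens are separated by blanks and that no argument contains blanks (so string-oids with spaces are not handled); it is a sketch, not the Jude compiler, which is written in Java.

```python
def fact_to_prolog(statement: str) -> str:
    """Translate 'assert S rel1 O1 rel2 O2 ...' into 'rel1_rel2(S,O1,O2).'."""
    words = statement.strip().rstrip(";").split()
    if words and words[0] == "assert":
        words = words[1:]
    if len(words) < 3 or len(words) % 2 == 0:
        raise ValueError("expected a subject, then alternating relation words and arguments")
    subject, rest = words[0], words[1:]
    relation_name = "_".join(rest[0::2])   # has_age + at_date -> has_age_at_date
    arguments = [subject] + rest[1::2]     # massimo, 26, date(...)
    return f"{relation_name}({','.join(arguments)})."


print(fact_to_prolog("assert massimo has_age 26 at_date date(2000,07,01)"))
# has_age_at_date(massimo,26,date(2000,07,01)).
```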

The SimpleLogic relation declaration

: declare relation person has_age number at_date date;

is converted to the XSB-Prolog statements:

is_a(has_age_at_date,relation_).
has_cardinality(has_age_at_date,3).
has_argument_type(has_age_at_date,person,1).
has_argument_type(has_age_at_date,number,2).
has_argument_type(has_age_at_date,date,3).
.. other meta-information facts ..
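These meta-facts follow directly from the declaration: the argument types alternate with the relation words, the relation name joins the relation words, and each argument type is recorded with its 1-based position. A hypothetical Python helper that generates the meta-facts listed for has_age_at_date might look like this; the other meta-information facts emitted by the real compiler are not reproduced.

```python
def declaration_to_prolog(declaration: str) -> list:
    """Translate 'declare relation T1 rel1 T2 rel2 T3;' into meta-facts."""
    words = declaration.strip().rstrip(";").split()
    if words[:2] != ["declare", "relation"]:
        raise ValueError("not a relation declaration")
    words = words[2:]
    if len(words) < 3 or len(words) % 2 == 0:
        raise ValueError("expected alternating argument types and relation words")
    name = "_".join(words[1::2])   # has_age + at_date -> has_age_at_date
    arg_types = words[0::2]        # person, number, date
    facts = [f"is_a({name},relation_).",
             f"has_cardinality({name},{len(arg_types)})."]
    for position, arg_type in enumerate(arg_types, start=1):
        facts.append(f"has_argument_type({name},{arg_type},{position}).")
    return facts


for line in declaration_to_prolog("declare relation person has_age number at_date date;"):
    print(line)
```

The positions come from enumerate, so the last argument type (date) is recorded at position 3.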

The SimpleLogic rule

``Assert the age of a person using its birthday'':
assert PERSON has_age AGE at_date DATE
    if PERSON is_a person
    and PERSON has_birthday BIRTHDAY
    and BIRTHDAY has_year YEAR
    and DATE has_year THIS_YEAR
    and AGE is_equal_to THIS_YEAR - YEAR;

is converted to the XSB-Prolog statements:

is_a(rule_1,rule).
has_description(rule_1,"Assert the age of a person using its birthday").
is_a(instance_1,single_relation_instance).
.. and other meta-information facts ..

has_age_at_date(PERSON,AGE,DATE) :-
    is_a(PERSON,person),
    has_birthday(PERSON,BIRTHDAY),
    has_year(BIRTHDAY,YEAR),
    has_year(DATE,THIS_YEAR),
    is_equal_to(AGE,THIS_YEAR - YEAR).
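XSB evaluates this clause with its own inference engine. Purely to show what the clause computes, the following Python sketch evaluates the same conjunction with nested loops over a few hypothetical facts (the oids birthday_1 and date_1 are invented). DATE is passed in already bound, as in a query ``at_date today''; like the rule itself, the sketch ignores whether the birthday has already passed in the current year.

```python
# Hypothetical base facts, one set of pairs per binary relation.
is_a = {("massimo", "person"), ("date_1", "date")}
has_birthday = {("massimo", "birthday_1")}
has_year = {("birthday_1", 1974), ("date_1", 2000)}


def derive_has_age_at_date(date):
    """Each loop or test below corresponds to one conjunct of the clause body."""
    derived = set()
    for person, type_ in is_a:                     # is_a(PERSON,person)
        if type_ != "person":
            continue
        for p, birthday in has_birthday:           # has_birthday(PERSON,BIRTHDAY)
            if p != person:
                continue
            for b, year in has_year:               # has_year(BIRTHDAY,YEAR)
                if b != birthday:
                    continue
                for d, this_year in has_year:      # has_year(DATE,THIS_YEAR)
                    if d == date:
                        # is_equal_to(AGE,THIS_YEAR - YEAR)
                        derived.add((person, this_year - year, date))
    return derived


print(derive_has_age_at_date("date_1"))  # {('massimo', 26, 'date_1')}
```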

Object-Oriented Features

At the moment SimpleLogic has minimal support for the object-oriented programming style. See chapter for possible extensions of the language.

The basic.simplelogic cluster defines two important types: object and type.

Object is the type of all the objects of the knowledge-base: it is the set containing all the elements, the directory containing all the resources.

Type is the type of all the types of the knowledge-base: it is the set containing all the other sets, the directory containing all the types.

The set theory is specified using these rules:

declare relation object is_a type;
declare relation type is_type_of object;
assert TYPE is_type_of OBJECT if OBJECT is_a TYPE;
assert OBJECT is_a TYPE if TYPE is_type_of OBJECT;
declare relation type is_subtype_of type;
assert _ is_a object;
assert X is_a type if X is_subtype_of _;
assert object is_a type;
assert type is_subtype_of object;
assert type is_a type;
assert X is_subtype_of X if X is_a type;
assert X is_subtype_of Y
    if X is_subtype_of Z
    and X not_unify_with Z
    and Z is_subtype_of Y;
assert OBJECT is_a SUPER_TYPE
    if TYPE is_subtype_of SUPER_TYPE
    and OBJECT is_a TYPE;
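Since these rules are recursive, their meaning is the smallest set of is_a and is_subtype_of facts closed under them. The Python sketch below computes that set by naive bottom-up iteration until nothing new is derived. It is only an illustration: XSB evaluates the rules top-down with tabling, ``_ is_a object'' is approximated by the finite set of oids mentioned in the facts, X not_unify_with Z is modelled as plain inequality, the inverse relation is_type_of is omitted, and the types employee and person are hypothetical.

```python
def close_types(is_a, is_subtype_of):
    """Apply the basic.simplelogic type rules until a fixpoint is reached.

    is_a and is_subtype_of are sets of (x, y) pairs; new sets are returned.
    """
    is_a, sub = set(is_a), set(is_subtype_of)
    while True:
        # The finite universe of known oids stands in for "_" in "_ is_a object".
        universe = {x for pair in is_a | sub for x in pair} | {"object", "type"}
        new_is_a = {(o, "object") for o in universe}             # _ is_a object
        new_is_a |= {(x, "type") for x, _ in sub}                # X is_a type if X is_subtype_of _
        new_is_a |= {("object", "type"), ("type", "type")}       # base facts
        new_is_a |= {(o, sup) for t, sup in sub                  # inheritance rule
                     for o, t2 in is_a if t == t2}
        new_sub = {("type", "object")}                           # base fact
        new_sub |= {(x, x) for x, t in is_a if t == "type"}      # reflexivity
        new_sub |= {(x, y) for x, z in sub                       # transitivity with
                    for z2, y in sub if z == z2 and x != z}      # X not_unify_with Z
        if new_is_a <= is_a and new_sub <= sub:
            return is_a, sub
        is_a |= new_is_a
        sub |= new_sub


is_a, sub = close_types({("john", "employee")}, {("employee", "person")})
print(("john", "person") in is_a)   # True: inherited through is_subtype_of
print(("employee", "type") in is_a) # True: anything with a supertype is a type
```

The loop terminates because every derived pair uses only oids already present plus object and type, so the sets can only grow up to a finite bound.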

This naive set theory implies some paradoxes: object is_a type, type is_a object, type is_subtype_of object and type is_a type. However, these paradoxes cause no problems during normal knowledge specification. To avoid them, a more elegant (and more complex) set theory must be chosen, for example Zermelo-Fraenkel set theory.

In SimpleLogic you can use rules to classify objects. For example:

assert S is_a hexagonal_system
    if S is_a crystalline_structure
    and S has_alpha_angle right_angle
    and S has_beta_angle right_angle
    and S has_gamma_angle angle_120
    and S has_a_distance D_AB
    and S has_b_distance D_AB
    and S has_c_distance D_C
    and not D_C is_equal_to D_AB
    and S has_ions_per_cell 8;

In Jude an object can change its type dynamically when you add new facts about it or about related objects.

SimpleLogic supports multiple inheritance, so an object can have multiple types. Note that if a rule applies to a given type, it also applies to all its subtypes, unless this is explicitly prevented. This differs from typical object-oriented languages, where a subtype method overrides the supertype method with the same name.
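Because classification is done by a rule, the derived type of an object follows the facts currently known about it. The Python sketch below evaluates the condition part of the hexagonal_system rule for a hypothetical object s1. It assumes each relation has at most one value per object (stored in a dict), which SimpleLogic itself does not require, and the distances are invented numbers. Adding the has_ions_per_cell fact is enough to make s1 a hexagonal_system.

```python
def is_hexagonal_system(s, facts):
    """Condition part of the hexagonal_system rule, for object s."""
    f = facts.get(s, {})
    return (
        "crystalline_structure" in f.get("is_a", set())
        and f.get("has_alpha_angle") == "right_angle"
        and f.get("has_beta_angle") == "right_angle"
        and f.get("has_gamma_angle") == "angle_120"
        and "has_a_distance" in f
        and f.get("has_b_distance") == f["has_a_distance"]  # same D_AB
        and "has_c_distance" in f
        and f["has_c_distance"] != f["has_a_distance"]      # not D_C is_equal_to D_AB
        and f.get("has_ions_per_cell") == 8
    )


# Hypothetical facts about an object s1 (values chosen only for illustration).
facts = {"s1": {"is_a": {"crystalline_structure"},
                "has_alpha_angle": "right_angle",
                "has_beta_angle": "right_angle",
                "has_gamma_angle": "angle_120",
                "has_a_distance": 2.5, "has_b_distance": 2.5,
                "has_c_distance": 4.1}}

print(is_hexagonal_system("s1", facts))  # False: has_ions_per_cell is unknown
facts["s1"]["has_ions_per_cell"] = 8     # add one more fact about s1
print(is_hexagonal_system("s1", facts))  # True: s1 is now classified
```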

Transaction Logic

This section describes support for transaction logic in SimpleLogic that is planned but not yet implemented.

Transaction Logic is described in [Bonnery 1995]:

``An extension of predicate logic, called Transaction Logic, is proposed, which accounts in a clean and declarative fashion for the phenomenon of state changes in logic programs and databases. Transaction Logic has a natural model theory and a sound and complete proof theory, but unlike many other logics, it allows users to program transactions. This is possible because, like classical logic, Transaction Logic has a "Horn" version which has a procedural as well as a declarative semantics. In addition, the semantics leads naturally to features whose amalgamation in a single logic has proved elusive in the past. These features include both hypothetical and committed updates, dynamic constraints on transaction execution, nondeterminism, and bulk updates. Finally, Transaction Logic holds promise as a logical model of hitherto non-logical phenomena, including so-called procedural knowledge in AI, active databases, and the behavior of object-oriented databases, especially methods with side effects.''

Transaction Logic appears to be the most coherent way to add knowledge-base modification commands to SimpleLogic.

First we must define elementary actions. An elementary action corresponds to the assertion or retraction of a base fact in the knowledge-base. Suppose, for example, that we have this relation:

: declare relation person has_balance number;

A command that asserts a new fact could be:

: assert:[john has_balance 100]

As you might expect, the command that retracts the fact is:

: retract:[john has_balance 100]

If we try to retract a non-existing fact, the action fails.

Suppose we want to declare a new non-elementary command that adds money to a user's balance:
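The behaviour of the two elementary actions can be sketched in Python as follows. The class and method names are hypothetical (assert is a reserved word in Python, hence assert_fact), facts are plain tuples, and the choice that asserting an already stored fact simply succeeds is an assumption, since the text does not specify it.

```python
class KnowledgeBase:
    """Hypothetical store of base facts with the two elementary actions."""

    def __init__(self):
        self.facts = set()

    def assert_fact(self, fact):
        """assert:[fact]; assumed always to succeed (a set ignores duplicates)."""
        self.facts.add(fact)
        return True

    def retract_fact(self, fact):
        """retract:[fact]; fails (returns False) if the fact is not stored."""
        if fact not in self.facts:
            return False
        self.facts.remove(fact)
        return True


kb = KnowledgeBase()
kb.assert_fact(("has_balance", "john", 100))
print(kb.retract_fact(("has_balance", "john", 100)))  # True
print(kb.retract_fact(("has_balance", "john", 100)))  # False: no such fact
```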

declare action add_money number to person;
add_money MONEY to USER
    if ((USER has_balance OLD_BALANCE
         and retract:[USER has_balance OLD_BALANCE])
        or (OLD_BALANCE is_equal_to 0))
    and assert:[USER has_balance OLD_BALANCE + MONEY];

Suppose we want to declare a new non-elementary command that removes money from a user's balance, reusing already defined actions whenever possible:

declare action remove_money number to person;
remove_money MONEY to USER
    if USER has_balance OLD_BALANCE
    and OLD_BALANCE is_greater_or_equal_to MONEY
    and MONEY_TO_REMOVE is_equal_to 0 - MONEY
    and add_money MONEY_TO_REMOVE to USER;

An equivalent form for the rule is:

remove_money MONEY to USER
    if MONEY_TO_REMOVE is_equal_to 0 - MONEY
    and add_money MONEY_TO_REMOVE to USER
    and USER has_balance NEW_BALANCE
    and NEW_BALANCE is_greater_or_equal_to 0;

As this last example suggests, if the condition part of an action is not entirely satisfied, the actions already executed inside the condition are undone.

A transaction is a group of actions that succeeds or fails as a single unit of work. An action is executed only if all its conditions and sub-actions are satisfied; otherwise the action (transaction) is not executed at all. This paradigm simplifies the specification of an action, because we are not forced to group all conditions before the action: we can freely mix conditions and actions and specify compound actions.
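One way to obtain this all-or-nothing behaviour in an implementation is to run an action on a working copy of the base facts and commit the copy only if the whole action succeeds. The Python sketch below applies that shadow-copy technique to the balance example. It is not Transaction Logic's own proof procedure, the function names are hypothetical, and the deterministic if/else in add_money replaces the ``or'' of the SimpleLogic rule, so backtracking is not modelled.

```python
class ActionFailed(Exception):
    """Raised when a condition or an elementary action fails."""


def retract(facts, fact):
    """Elementary retract: fails if the fact is not stored."""
    if fact not in facts:
        raise ActionFailed(f"cannot retract {fact}")
    facts.remove(fact)


def balance_of(facts, user):
    """Return the amount in the user's has_balance fact, or None."""
    for relation, who, amount in facts:
        if relation == "has_balance" and who == user:
            return amount
    return None


def add_money(facts, money, user):
    old = balance_of(facts, user)
    if old is not None:
        retract(facts, ("has_balance", user, old))
    else:
        old = 0                                    # the "OLD_BALANCE is_equal_to 0" branch
    facts.add(("has_balance", user, old + money))  # elementary assert


def remove_money(facts, money, user):
    """Second form of the rule: act first, then check the new balance."""
    add_money(facts, -money, user)
    if balance_of(facts, user) < 0:
        raise ActionFailed("balance would become negative")


def run_transaction(facts, action, *args):
    """Run action on a working copy; commit it only if the action succeeds."""
    working = set(facts)
    try:
        action(working, *args)
    except ActionFailed:
        return facts, False    # discard every change made by the action
    return working, True


facts = {("has_balance", "john", 100)}
facts, ok = run_transaction(facts, remove_money, 30, "john")
print(ok, balance_of(facts, "john"))   # True 70
facts, ok = run_transaction(facts, remove_money, 500, "john")
print(ok, balance_of(facts, "john"))   # False 70
```

Because the failed call only touched the working copy, the committed balance stays at 70 even though add_money had already retracted and re-asserted a fact before the check failed.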

Knowledge-Base Design Criteria

The criteria that should guide the design of a knowledge-base are described in [Gruber 1993]:

``When we choose how to represent something in an ontology, we are making design decisions. To guide and evaluate our designs, we need objective criteria that are founded on the purpose of the resulting artifact, rather than based on a priori notions of naturalness or Truth. Here we propose a preliminary set of design criteria for ontologies whose purpose is knowledge sharing and interaction among programs based on a shared conceptualization.

1. Clarity: An ontology should effectively communicate the intended meaning of defined terms. Definitions should be objective. [..] Ontologies are often equated with taxonomic hierarchies of classes, class definitions, and the subsumption relation, but ontologies need not be limited to these forms. Ontologies are also not limited to conservative definitions, that is, definitions in the traditional logic sense that only introduce terminology and do not add any knowledge about the world (Enderton, 1972). To specify a conceptualization one needs to state axioms that do constrain the possible interpretations for the defined terms. [..] When a definition can be stated in logical axioms, it should be. Where possible, a complete definition (a predicate defined by necessary and sufficient conditions) is preferred over a partial definition (defined by only necessary or sufficient conditions). All definitions should be documented with natural language.

2. Coherence: An ontology should be coherent: that is, it should sanction inferences that are consistent with the definitions. At the least, the defining axioms should be logically consistent. Coherence should also apply to the concepts that are defined informally, such as those described in natural language documentation and examples. If a sentence that can be inferred from the axioms contradicts a definition or example given informally, then the ontology is incoherent.

3. Extendibility: An ontology should be designed to anticipate the uses of the shared vocabulary. It should offer a conceptual foundation for a range of anticipated tasks, and the representation should be crafted so that one can extend and specialize the ontology monotonically. In other words, one should be able to define new terms for special uses based on the existing vocabulary, in a way that does not require the revision of the existing definitions.

4. Minimal encoding bias: The conceptualization should be specified at the knowledge level without depending on a particular symbol-level encoding. An encoding bias results when representation choices are made purely for the convenience of notation or implementation. Encoding bias should be minimized, because knowledge-sharing agents may be implemented in different representation systems and styles of representation.

5. Minimal ontological commitment: An ontology should require the minimal ontological commitment sufficient to support the intended knowledge sharing activities. An ontology should make as few claims as possible about the world being modeled, allowing the parties committed to the ontology freedom to specialize and instantiate the ontology as needed. Since ontological commitment is based on consistent use of vocabulary, ontological commitment can be minimized by specifying the weakest theory (allowing the most models) and defining only those terms that are essential to the communication of knowledge consistent with that theory.''

Knowledge-Base Implementation

Following the clarity criterion, SimpleLogic uses an infix syntax that is understandable also by unskilled users, and it lets users attach comments (informal descriptions) to types, relations and rule declarations.

First-order logic is rather powerful, and many inference engines execute it efficiently. Obviously there are fields where other logical formalisms are better suited.

To achieve extendibility, SimpleLogic adopts an object-oriented structure. The role of class is played by type, the role of method declaration by relation declaration, and the role of method implementation by rule declaration. The object-oriented structure permits reusing already specified types as building blocks for new conceptualizations.

Knowledge-Base Construction Process

Writing a good knowledge-base is not a simple task, so to reduce the effort a construction process should be followed. I think that a good process is the object-oriented software construction process described in [Meyer 1997]:

``object-oriented software construction is the software development method which bases the architecture of any software system on modules deduced from the types of objects it manipulates (rather than the function or functions that the system is intended to ensure).''

The first phase is specification. Its purpose is to understand the problem. During this phase you must interact with customers and future users, read the documentation of other systems in the same domain, and search for useful design abstractions, applicable design patterns, existing libraries, possible use cases of the final application, etc. After this phase you have a good understanding of the problem and a list of the most important terms and relations. You must then group the discovered types and concepts into clusters, as described in [Meyer 1997]:

``A cluster is a group of related classes or, recursively, of related clusters.

[..]The cluster is also the natural unit for single-developer mastery: each cluster should be managed by one person, and one person should be able to understand all of it - whereas in a large development no one can understand all of a system or even a major subsystem.''

Clusters allow an overly complex problem to be split into a collection of manageable sub-problems, each of which a single person can grasp. In SimpleLogic a cluster is a set of types, relations and rules describing an important and distinct conceptualization. For example, Jude already defines clusters about physical units of measure, documents and projects acknowledgment, chemical elements, etc.

The software life-cycle is applied not to the entire system at once but to each cluster, in order to support concurrent engineering.

For each cluster there is again a specification phase.

The next phase is design. During design you select the important classes, reject the bad ones and find the proper relations. Read [Meyer 1997] for more hints and guidelines.

The implementation phase corresponds to the specification of SimpleLogic rules and of types not directly related to the user application domain.

The verification and validation phase consists in checking that the cluster's types perform satisfactorily.

The last phase is generalization, described in [Meyer 1997]:

``Its goal is to polish the types so as to turn them into potentially reusable software components. [..] The generalization task may involve the following activities:

abstracting: introducing a deferred class to describe the pure abstraction behind a certain class

factoring: recognizing that two classes, originally unrelated, are in fact variants of the same general notion, which can then be described by a common ancestor.

adding assertions, especially postconditions and invariant clauses, which reflect increased understanding of the semantic of the class and its features[..]

adding documentation''

The process lets you build a first version of a cluster quickly and without inhibition. Once you understand the problem better, you can refine the cluster.

Conclusion

During knowledge-base specification you must:

group real world entities into SimpleLogic types;
represent relationships between real-world entities through SimpleLogic relations between objects;
express real-world rules (sometimes called business rules) as SimpleLogic rules.

The object-oriented structure of the knowledge-base permits reusing already specified types, with their relations and rules, inside new clusters.

The knowledge-base is a formal description of the problem, specified in a declarative and understandable language. This helps domain experts collaborate with knowledge engineers during the development process, and lets users who know the application domain use the knowledge-base as a source of documentation.

There are no shifts of notation during the construction process; according to [Meyer 1997], this makes it a seamless development process. The process consists of successive refinements of the initial specification. Each phase increases understanding of the problem and improves the final solution.

The splitting of a big problem into simpler sub-problems (clusters) simplifies the development effort and permits concurrent engineering.


Massimo Zaniboni 2001-03-10