This chapter shows the benefits of knowledge-base specification in the field of information systems, introduces a language for specifying knowledge-bases, and describes a process for implementing them.
In [Mayol] we can read:
``Deductive databases are based on the concepts of deductive rules (views) and integrity constraints. Deductive rules allow to deduce new facts (derived or view facts) from base facts explicitly stored in the database and from other derived facts. On the other hand, integrity constraints define conditions that each state of the database is required to satisfy.
Deductive databases use (first-order) logic as a base language and generalize relational databases by overcoming the limitations of relational languages in the definition of complex views and constraints. These features together with appropriate reasoning capabilities ease the sharing of common knowledge within complex application domains, facilitating program development and reuse on the way.''
A deductive database in Jude is called a knowledge-base, and it represents an explicit conceptualization, as described in [Gruber 1993]:
`` A conceptualization is an abstract, simplified view of the world that we wish to represent for some purpose. Every knowledge base, knowledge-based system, or knowledge-level agent is committed to some conceptualization, explicitly or implicitly.
An ontology [knowledge-base] is an explicit specification of a conceptualization. [..] When the knowledge of a domain is represented in a declarative formalism, the set of objects that can be represented is called the universe of discourse. This set of objects, and the describable relationships among them, are reflected in the representational vocabulary with which a knowledge-based program represents knowledge. [..] Formally, an ontology is the statement of a logical theory.''
The knowledge-base is specified using a declarative programming paradigm. A declarative statement specifies what must be done, whereas an imperative statement specifies how it should be done. Declarative specifications are often written using a high-level formalism, which permits expressing things directly related to the problem domain. Unlike low-level specifications, high-level specifications ignore details related to the management of computer resources. Declarative statements therefore tend to be more concise and readable than imperative specifications, because the low-level details about the actual instructions to execute are not explicitly specified. A compiler and/or a run-time engine (in our case the inference engine of a deductive database management system) translates the declarative specifications into imperative, machine-executable instructions for you.
There are many deductive database management systems available. I have selected XSB because it is under the GNU license and it is actively maintained. This is a brief description of XSB extracted from the user manual [Sagonas 2000]:
``XSB is a research-oriented Logic Programming system for Unix and Windows-based systems. In addition to providing all the functionality of Prolog, XSB contains several features not usually found in Logic Programming systems, including: (*) Evaluation according to the Well-Founded Semantics through full SLG resolution; (*) A compiled HiLog implementation; (*) A variety of indexing techniques for asserted code, along with a novel transformation technique called unification factoring that can improve program speed and indexing for compiled code; (*) A number of interfaces to other software systems, such as C, Java, Perl and Oracle [...] (*) Source code availability for portability and extensibility.''
However, to date XSB lacks many facilities needed by a DBMS. For example, it cannot execute concurrent queries.
SimpleLogic is the language used in Jude to specify knowledge-bases. It was created during the design of Jude in order to increase the readability of Datalog or Prolog-like programs for non-expert users.
At the moment it lacks many useful features of a modern language. See chapter for more details.
SimpleLogic statements are written and read from left to right, like English sentences. This helps to produce knowledge-bases that are understandable even by users without technical skills. The disadvantage is that SimpleLogic has a verbose syntax, while other formalisms use a terser syntax that is more understandable for skilled users.
All-capital terms represent variable names. Lower-case terms are SimpleLogic keywords, relations or object identifiers. Remember that the symbol ``_'' represents an anonymous variable without particular meaning inside the statement, equivalent to the word ``any'' in English.
A SimpleLogic knowledge-base is composed of declarations, facts and rules. For example:
declare relation person has_age number at_date date;
assert PERSON has_age AGE at_date DATE
if PERSON is_a person
and PERSON has_birthday BIRTHDAY
and BIRTHDAY has_year YEAR
and DATE has_year THIS_YEAR
and AGE is_equal_to THIS_YEAR - YEAR
SimpleLogic has a first-order predicate calculus semantics.
The definitive documentation of the SimpleLogic semantics is the compiler itself, written in Java, and obviously the semantics of the generated XSB-Prolog.
Here the semantics is described informally through examples of translation from SimpleLogic to XSB-Prolog.
The SimpleLogic declaration
declare relation person has_age number at_date date;
is translated into the following XSB-Prolog meta-facts:
has_cardinality(has_age_at_date,3).
has_argument_type(has_age_at_date,person,1).
has_argument_type(has_age_at_date,number,2).
has_argument_type(has_age_at_date,date,3).
... other meta-info facts ...
The SimpleLogic rule
assert PERSON has_age AGE at_date DATE
if PERSON is_a person
and PERSON has_birthday BIRTHDAY
and BIRTHDAY has_year YEAR
and DATE has_year THIS_YEAR
and AGE is_equal_to THIS_YEAR - YEAR
is translated into meta-facts such as:
has_description(rule_1,'Assert the age of a person using its birthday').
is_a(instance_1, single_relation_instance).
... and other meta-info facts ...
and into the XSB-Prolog rule:
has_age_at_date(PERSON,AGE,DATE) :-
is_a(PERSON,person),
has_birthday(PERSON,BIRTHDAY),
has_year(BIRTHDAY,YEAR),
has_year(DATE,THIS_YEAR),
is_equal_to(AGE,THIS_YEAR - YEAR).
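As an illustration of what this XSB-Prolog rule computes, the derived relation can be mimicked in Python over in-memory base facts (all facts and identifiers below are invented for the example):

```python
# Invented base facts mirroring the example relations (illustration only).
is_a = {("alice", "person")}                        # OBJECT is_a TYPE
has_birthday = {"alice": "date_1980"}               # PERSON has_birthday BIRTHDAY
has_year = {"date_1980": 1980, "date_2004": 2004}   # DATE has_year YEAR

def has_age_at_date(person, date):
    """AGE = year(DATE) - year(birthday(PERSON)), as in the rule above."""
    if (person, "person") not in is_a:
        return None                                 # PERSON is_a person fails
    birthday = has_birthday.get(person)
    if birthday is None:
        return None                                 # no birthday fact, rule fails
    return has_year[date] - has_year[birthday]

print(has_age_at_date("alice", "date_2004"))        # 24
```

The Python function evaluates the conditions in the same order as the rule body; when any condition has no matching fact, no age is derived.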
At the moment SimpleLogic has minimal support for an object-oriented programming style. See chapter for possible extensions of the language.
The basic.simplelogic cluster defines two important types: object and type.
Object is the type of all the objects of the knowledge-base: it is the set containing all the elements, the directory containing all the resources.
Type is the type of all the other types of the knowledge-base: it is the set containing all the other sets, the directory containing all the types.
The set theory is specified using these rules:
declare relation type is_type_of object;
assert TYPE is_type_of OBJECT if OBJECT is_a TYPE;
assert OBJECT is_a TYPE if TYPE is_type_of OBJECT;
declare relation type is_subtype_of type;
assert _ is_a object;
assert X is_a type if X is_subtype_of _;
assert object is_a type;
assert type is_subtype_of object;
assert type is_a type;
assert X is_subtype_of X if X is_a type;
assert X is_subtype_of Y
if X is_subtype_of Z
and X not_unify_with Z
and Z is_subtype_of Y;
assert OBJECT is_a SUPER_TYPE
if TYPE is_subtype_of SUPER_TYPE
and OBJECT is_a TYPE ;
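The behavior of the transitivity and inheritance rules above can be sketched as a naive bottom-up fixpoint in Python (type and object names are invented; the reflexivity rule is omitted for brevity):

```python
# Invented example facts: employee < person < object in the type hierarchy.
subtype = {("employee", "person"), ("person", "object"), ("type", "object")}
is_a = {("alice", "employee")}

def fixpoint(subtype, is_a):
    """Apply the two recursive rules until no new fact can be derived."""
    subtype, is_a = set(subtype), set(is_a)
    changed = True
    while changed:
        changed = False
        # assert X is_subtype_of Y if X is_subtype_of Z and Z is_subtype_of Y
        for (x, z) in list(subtype):
            for (z2, y) in list(subtype):
                if z == z2 and (x, y) not in subtype:
                    subtype.add((x, y)); changed = True
        # assert OBJECT is_a SUPER_TYPE
        #   if TYPE is_subtype_of SUPER_TYPE and OBJECT is_a TYPE
        for (t, s) in list(subtype):
            for (o, t2) in list(is_a):
                if t == t2 and (o, s) not in is_a:
                    is_a.add((o, s)); changed = True
    return subtype, is_a

subtype, is_a = fixpoint(subtype, is_a)
print(("employee", "object") in subtype)   # True
print(("alice", "object") in is_a)         # True
```

A real deductive engine such as XSB evaluates these recursive rules far more efficiently (with tabling), but the derived facts are the same.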
In SimpleLogic you can use rules to classify objects. For example (the name of the assigned type here is illustrative):
assert S is_a hexagonal_structure
if S is_a crystalline_structure
and S has_alpha_angle right_angle
and S has_beta_angle right_angle
and S has_gamma_angle angle_120
and S has_a_distance D_AB
and S has_b_distance D_AB
and S has_c_distance D_C
and not D_C is_equal_to D_AB
and S has_ions_per_cell 8;
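The conditions of this classification rule can be mirrored by a hypothetical Python predicate (the attribute names and the numeric encoding of right_angle and angle_120 as 90 and 120 degrees are assumptions):

```python
def classify_structure(s):
    """True when s satisfies the rule's conditions on angles and distances."""
    return (s["alpha"] == 90 and s["beta"] == 90 and s["gamma"] == 120
            and s["a"] == s["b"]             # has_a_distance = has_b_distance
            and s["c"] != s["a"]             # c distance must differ
            and s["ions_per_cell"] == 8)

print(classify_structure({"alpha": 90, "beta": 90, "gamma": 120,
                          "a": 2.5, "b": 2.5, "c": 4.1,
                          "ions_per_cell": 8}))   # True
```

Unlike this hand-written check, the SimpleLogic rule is applied automatically by the inference engine to every object matching the conditions.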
SimpleLogic supports multiple inheritance, so an object can have multiple types. Note that if a rule applies to a given type, then it also applies to all its subtypes, unless this is explicitly prevented. This differs from typical object-oriented languages, where a subtype method overrides the supertype method with the same name.
This section describes SimpleLogic support for transaction logic that is only planned and not yet implemented.
Transaction Logic is described in [Bonner 1995]:
``An extension of predicate logic, called Transaction Logic, is proposed, which accounts in a clean and declarative fashion for the phenomenon of state changes in logic programs and databases. Transaction Logic has a natural model theory and a sound and complete proof theory, but unlike many other logics, it allows users to program transactions. This is possible because, like classical logic, Transaction Logic has a "Horn" version which has a procedural as well as a declarative semantics. In addition, the semantics leads naturally to features whose amalgamation in a single logic has proved elusive in the past. These features include both hypothetical and committed updates, dynamic constraints on transaction execution, nondeterminism, and bulk updates. Finally, Transaction Logic holds promise as a logical model of hitherto non-logical phenomena, including so-called procedural knowledge in AI, active databases, and the behavior of object-oriented databases, especially methods with side effects.''
Transaction Logic is the most coherent way to add knowledge-base modification commands to SimpleLogic.
First we must define elementary actions. An elementary action corresponds to the assertion or retraction of a base fact in the knowledge-base. Suppose for example we have this relation:
declare relation user has_balance number;
Suppose we want to declare a new, non-elementary command that adds money to a user's balance:
add_money MONEY to USER
if ((USER has_balance OLD_BALANCE
and retract:[USER has_balance OLD_BALANCE])
or (OLD_BALANCE is_equal_to 0))
and assert:[USER has_balance OLD_BALANCE + MONEY].
Analogously, we can declare a command that removes money, checking that the balance does not become negative:
remove_money MONEY to USER balance
if USER has_balance OLD_BALANCE
and OLD_BALANCE is_greater_or_equal_to MONEY
and MONEY_TO_REMOVE is_equal_to 0 - MONEY
and add_money MONEY_TO_REMOVE to USER;
An equivalent definition checks the balance after the update, relying on the transactional rollback:
remove_money MONEY to USER balance
if MONEY_TO_REMOVE is_equal_to 0 - MONEY
and add_money MONEY_TO_REMOVE to USER
and USER has_balance NEW_BALANCE
and NEW_BALANCE is_greater_or_equal_to 0;
A transaction is a group of actions that succeed or fail as a single unit of work. An action is executed only if all its conditions and sub-actions are satisfied; otherwise the action (transaction) is not executed. This paradigm simplifies the specification of an action, because we are not forced to group all conditions before the action: we can freely mix conditions and actions and specify compound actions.
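This all-or-nothing semantics can be sketched in Python (function and user names are assumed): updates run against a copy of the state and are committed only when every condition holds, otherwise the old state survives.

```python
# Minimal sketch of the add_money / remove_money transactions above.
def add_money(balances, user, money):
    old = balances.get(user, 0)       # OLD_BALANCE, defaulting to 0
    new_state = dict(balances)        # work on a copy of the knowledge-base
    new_state[user] = old + money     # retract old fact, assert new one
    return new_state                  # commit

def remove_money(balances, user, money):
    new_state = add_money(balances, user, -money)
    if new_state[user] < 0:           # post-condition fails ...
        return balances               # ... so the transaction rolls back
    return new_state                  # commit

state = {"bob": 10}
state = add_money(state, "bob", 5)    # bob's balance becomes 15
state = remove_money(state, "bob", 20)
print(state["bob"])                   # 15: the removal was rolled back
```

Note how remove_money checks its constraint after performing the update, just like the second SimpleLogic definition: the rollback makes the two orderings equivalent.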
These are the criteria, described in [Gruber 1993], that must guide the design of a knowledge-base:
``When we choose how to represent something in an ontology, we are making design decisions. To guide and evaluate our designs, we need objective criteria that are founded on the purpose of the resulting artifact, rather than based on a priori notions of naturalness or Truth. Here we propose a preliminary set of design criteria for ontologies whose purpose is knowledge sharing and interaction among programs based on a shared conceptualization.
1. Clarity: An ontology should effectively communicate the intended meaning of defined terms. Definitions should be objective. [..] Ontologies are often equated with taxonomic hierarchies of classes, class definitions, and the subsumption relation, but ontologies need not be limited to these forms. Ontologies are also not limited to conservative definitions, that is, definitions in the traditional logic sense that only introduce terminology and do not add any knowledge about the world (Enderton, 1972). To specify a conceptualization one needs to state axioms that do constrain the possible interpretations for the defined terms. [..] When a definition can be stated in logical axioms, it should be. Where possible, a complete definition (a predicate defined by necessary and sufficient conditions) is preferred over a partial definition (defined by only necessary or sufficient conditions). All definitions should be documented with natural language.
2. Coherence: An ontology should be coherent: that is, it should sanction inferences that are consistent with the definitions. At the least, the defining axioms should be logically consistent. Coherence should also apply to the concepts that are defined informally, such as those described in natural language documentation and examples. If a sentence that can be inferred from the axioms contradicts a definition or example given informally, then the ontology is incoherent.
3. Extendibility: An ontology should be designed to anticipate the uses of the shared vocabulary. It should offer a conceptual foundation for a range of anticipated tasks, and the representation should be crafted so that one can extend and specialize the ontology monotonically. In other words, one should be able to define new terms for special uses based on the existing vocabulary, in a way that does not require the revision of the existing definitions.
4. Minimal encoding bias: The conceptualization should be specified at the knowledge level without depending on a particular symbol-level encoding. An encoding bias results when representation choices are made purely for the convenience of notation or implementation. Encoding bias should be minimized, because knowledge-sharing agents may be implemented in different representation systems and styles of representation.
5. Minimal ontological commitment: An ontology should require the minimal ontological commitment sufficient to support the intended knowledge sharing activities. An ontology should make as few claims as possible about the world being modeled, allowing the parties committed to the ontology freedom to specialize and instantiate the ontology as needed. Since ontological commitment is based on consistent use of vocabulary, ontological commitment can be minimized by specifying the weakest theory (allowing the most models) and defining only those terms that are essential to the communication of knowledge consistent with that theory.''
According to the clarity criterion, SimpleLogic uses an infix syntax that is understandable even by unskilled users, and it permits the user to specify comments on type, relation and rule declarations.
First-order logic is rather powerful and there are many inference engines that execute it quickly. Obviously there are fields where other logical formalisms are better suited.
In order to achieve extendibility, SimpleLogic adopts an object-oriented structure. The role of the class is played by the type, the role of the method declaration by the relation declaration, and the role of the method implementation by the rule declaration. The object-oriented structure permits reusing already-specified types as building blocks for new conceptualizations.
Writing a good knowledge-base is not a simple task, so in order to reduce the effort a construction process must be followed. I think that a good process to follow is the object-oriented software construction process described in [Meyer 1997]:
``object-oriented software construction is the software development method which bases the architecture of any software system on modules deduced from the types of objects it manipulates (rather than the function or functions that the system is intended to ensure).''
The first phase is specification. The goal of specification is to understand the problem. During this phase you must interact with customers and future users, read the documentation of other systems in the same domain, and search for useful design abstractions, applicable design patterns, existing libraries, possible use-cases for the final application, etc. After this phase you have a good knowledge of the problem and a list of the most important terms and relations. You must group the types and discovered concepts into clusters, as described in [Meyer 1997]:
``A cluster is a group of related classes or, recursively, of related clusters.
[..] The cluster is also the natural unit for single-developer mastery: each cluster should be managed by one person, and one person should be able to understand all of it - whereas in a large development no one can understand all of a system or even a major subsystem.''
Clusters permit splitting a problem that is too complex into a collection of manageable sub-problems that a single person can grasp. In SimpleLogic a cluster is a set of types, relations and rules describing an important and distinct conceptualization. For example, Jude already defines clusters for physical units of measure, documents and projects acknowledgment, chemical elements, etc.
The life-cycle of the software is based not on the entire system as a whole but on each cluster, in order to support concurrent engineering.
For each cluster there is again a specification phase.
The next phase is design. During design you select the important classes, reject the bad ones and find the proper relations. Read [Meyer 1997] for more hints and guidelines.
The implementation phase corresponds to the specification of SimpleLogic rules and of the types not directly related to the user application domain.
The verification and validation phase consists in checking that the cluster's types perform satisfactorily.
The last phase is generalization, described in [Meyer 1997]:
``Its goal is to polish the types so as to turn them into potentially reusable software components. [..] The generalization task may involve the following activities:
abstracting: introducing a deferred class to describe the pure abstraction behind a certain class
factoring: recognizing that two classes, originally unrelated, are in fact variants of the same general notion, which can then be described by a common ancestor.
adding assertions, especially postconditions and invariant clauses, which reflect increased understanding of the semantics of the class and its features [..]
adding documentation''
This process permits you to build a first version of the cluster speedily, without inhibition. Once you understand the problem better, you can refine the cluster.
During knowledge-base specification, keep the following points in mind.
The knowledge-base is a formal description of the problem. It is specified using a declarative and understandable language. This helps expert users of a particular domain to collaborate with knowledge engineers during the development process, and it also permits using the knowledge-base as a source of documentation for users who are aware of the application domain.
During the construction process there are no shifts of notation. According to [Meyer 1997] this can be called a seamless development process. The process consists of successive refinements of the initial specification. Each phase increases the understanding of the problem and improves the final solution.
The splitting of a big problem into simpler sub-problems (clusters) simplifies the development effort and permits concurrent engineering.