Computing Applications

The Extensible Rule Markup Language

XRML explicates the rules implicitly embedded in Web pages, enabling software agents to process the rules automatically.

By Jae Kyu Lee and Mye M. Sohn

Posted May 1 2003


The Hypertext Markup Language (HTML) makes Web technology possible by giving users the ability to browse. But software agents cannot understand HTML files, because general-purpose natural language processing is not fully reliable or efficient. To overcome this limitation, the Extensible Markup Language (XML) makes implicitly embedded data explicit in a formal structure with mutually agreed-on semantic definitions. Many industrial-strength standard initiatives using XML are under way, including the Electronic Business XML Initiative, XML Common Business Library, Open Trading Protocol, Open Business on the Internet, Common Business Language, RosettaNet, and BizTalk. Some of them have become standards for message exchange in business-to-business (B2B) e-commerce in a variety of industries. Knowledge representation for software agent communication has also adopted the XML platform [6].

The Semantic Web community is concerned with processing the rules implicitly embedded in Web pages, which cannot be processed with XML. These implicit rules must be represented in a way that lets software agents process them while humans can still browse the original Web pages. We have therefore developed a language, the eXtensible Rule Markup Language (XRML), for this purpose; Figure 1 contrasts the topology of XRML with that of XML and HTML. Pages written in XRML should be transformable into XML and eventually into HTML; an early version of XRML was presented in [7].

In order to achieve these goals, the implicit rules embedded in Web pages must be identifiable, interchangeable with structured-format rule-based systems, and accessible by applications. Thus XRML requires three components:

Rule Identification Markup Language (RIML). The meta-knowledge expressed in RIML should be able to identify the existence of implicit rules in the hypertexts on the Web; the formal association with the explicitly represented structured rules should also be identified.
Rule Structure Markup Language (RSML). The rules in knowledge-based systems (KBSs) must be represented in a formal structure so they can be processed by inference engines. Because there is no direct way to link the implicit rules in hypertext with these structured rules, we need an intermediate representation: RSML. The implicit rules identified with RIML are transformed into the formal rule structure of RSML, and RSML rules should in turn be transformed automatically into structured rules. RSML, for its part, relies on the RIML markup in the hypertext for its generation and maintenance.
Rule Triggering Markup Language (RTML). RTML defines the conditions that trigger certain rules. RTML is embedded in KBSs, as well as in software agents (such as the forms in workflow management systems).

Figure 2 outlines an architecture we designed to apply XRML to workflow management. The RTML embedded in forms can trigger the inference engine to use the rules generated from RSML. Note that humans can read hypertext on the browser, XML statements are transformed into data in the database, and RSML statements are transformed into rules in the KBS. The inference engine in the KBS calls rules and data, returning the inference results back to the inquiring software agents (in this case, the workflow management system). A challenging issue is how to extract RSML from hypertext while maintaining consistency with RIML.

As we designed XRML, we pursued six main goals:

Expressional completeness. RSML should be completely transformable into a canonical syntax of structured rules;
Relevance linkability. The relevance links between hypertext marked up with RIML and rules in RSML syntax (called RSML rules) should be expressed completely;
Polymorphous consistency. Consistency should be maintained for knowledge expressed in various types of expressions (such as RSML rules and hypertext with RIML);
Applicative universality. The rule expressions in RSML should be able to support multiple applications embedding RTML in the domain universe;
Knowledge integratability. Structured rules collected from multiple sources, including those from RSML, should be integrated uniformly; and
Interoperability. Rules in RSML should be exchangeable and sharable among multiple commercial solutions.

KBSs and KMSs

To explain XRML’s practical value as a Semantic Web technology, consider two relevant disciplines: KBSs and knowledge management systems (KMSs). Despite their similar names, they have evolved from different research roots (see the table).

Stemming from research in artificial intelligence in the early 1980s [9], a KBS’s main goal is the automatic inference of coded knowledge. Natural language understanding is a key part of knowledge processing, but its success in real-world applications has been limited. Therefore, practical KBSs process knowledge using only structured knowledge representations (such as rules, predicate calculus, and objects) together with tailored inference engines. Because KBSs have limited commonsense reasoning capability, applications are developed for specific domains (such as diagnosis, configuration, manufacturing planning, and managerial decision making). Knowledge acquisition and maintenance have been hurdles to justifying implementation. The KBS application domain was recently expanded to include intelligent email interpretation and classification, smart advisories about products for customer service and training, online configurations, and help desks providing technical support [4]. Knowledge acquisition from Web pages is being explored [5], as is a tool that automatically generates the hypertext structure [12].

KMSs began with Web technology supporting powerful search engines covering the vast knowledge available through the Internet, as well as through intranets and extranets. The primary users of KMSs are not software agents but humans who want to comprehend Web pages with the help of interactive search. Technically speaking, any knowledge on the Web (or in any other storage structure, including databases) is within the scope of a KMS application, so the KMS application domain is more general. Historically, knowledge management research has explored the issues of knowledge sharing and reuse, mainly from managerial and motivational points of view.

Commercial-scale KBS/KMS convergence is inevitable because knowledge should be sharable by both humans and software agents [9, 10]; this convergence is precisely the goal XRML researchers pursue today. Maintaining consistency between the hypertext knowledge in a KMS and the structured rules in a KBS is a key research issue in XRML development. XRML is thus a framework for integrating KBSs and KMSs.

Generating RSML rules from hypertext can be regarded as a process of knowledge extraction; generating meta-knowledge of the relationships between hypertext and RSML rules (regarding which hypertext is related to which RSML rules and vice versa) is a process of meta-knowledge extraction. Knowledge acquisition from a variety of sources is generally very expensive, but knowledge extraction from existing hypertexts is less a social issue than a technical issue and thus can be cost-effective.


A sea of hypertext knowledge is already coded in markup language form on the Internet, so XRML applications can reuse this knowledge instead of acquiring it from scratch, which should readily justify their cost. XRML is not only the next step for KBSs and KMSs but also the direction rule markup language researchers worldwide are pursuing for the Semantic Web [1, 3].

Syntax of RIML, RSML, and RTML

Consider the syntax of RIML, RSML, and RTML; see [11] for the full XRML 0.5 syntax, along with an example application involving research fund account management.

RIML. Suppose two paragraphs in HTML describe the regulations about a research budget’s expenditures:

<HTML>
A research account can be spent only within the limit of the contract budget, according to the following restrictions.
If the budgetary source is the type-P research fund, the spendable items are limited to student’s salary and expenses for data collection.
</HTML>

The second paragraph includes an implicit rule that can be explicitly expressed as:

Rule Title: Restriction of Type-P Research Fund Expenditure
IF ((budgetary source IS type-P research fund) AND ((spendable item IS student’s salary) OR (spendable item IS expense for data collection)))
THEN expenditure IS permitted

Even though the two types of expression imply the same regulation, the relationship between them is not clear. So we need to add meta-knowledge on how the text relates to the structured rule, as outlined in the following example:

<HTML>
A research account can be spent only within the limit of the contract budget, according to the following restrictions.

<RIML>
<Rule>
<RuleTitle> Restriction of Type-P Research Fund Expenditure </RuleTitle>
If the <variable1>budgetary source</variable1> is the <value1>type-P research fund</value1>, the <variable2>spendable items</variable2> are limited to <value2>student’s salary</value2> and <value2>expenses for data collection</value2>.
</Rule>
</RIML>
</HTML>

Here, the section related to the structured rules is delineated by <RIML> … </RIML>. The rule and its title are identified by <Rule> … </Rule> and <RuleTitle> … </RuleTitle>. The tags <variable#> and <value#> identify the variables and values used in the structured rule; a variable tag and a value tag with the same number are associated, meaning that the value belongs to that variable. Removing the RIML statements from the HTML/RIML file recovers the original HTML file. This transformation becomes more complex as more RIML tags are used for arithmetic functions.
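To make the numbering convention concrete, here is a minimal Python sketch, not part of the XRML specification, that reads a copy of the RIML fragment with the standard library's `xml.etree.ElementTree`. It groups each `<valueN>` under the `<variableN>` with the same number and recovers the readable text by dropping the RIML tags and the RIML-only rule title. It assumes the fragment is well-formed XML and uses ASCII apostrophes; the names `extract_associations` and `strip_riml` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Copy of the article's RIML example (closing tags fixed, ASCII apostrophes).
RIML_DOC = """<HTML>
A research account can be spent only within the limit of the contract budget, according to the following restrictions.
<RIML>
<Rule>
<RuleTitle>Restriction of Type-P Research Fund Expenditure</RuleTitle>
If the <variable1>budgetary source</variable1> is the <value1>type-P research fund</value1>, the <variable2>spendable items</variable2> are limited to <value2>student's salary</value2> and <value2>expenses for data collection</value2>.
</Rule>
</RIML>
</HTML>"""

# Matches numbered RIML tags such as variable1 or value2.
TAG_RE = re.compile(r"(variable|value)(\d+)$")


def extract_associations(riml_xml):
    """Group each <valueN> under the <variableN> that shares its number N."""
    associations = {}
    for elem in ET.fromstring(riml_xml).iter():
        m = TAG_RE.match(elem.tag)
        if m is None:
            continue
        kind, num = m.group(1), int(m.group(2))
        entry = associations.setdefault(num, {"variable": None, "values": []})
        text = " ".join((elem.text or "").split())
        if kind == "variable":
            entry["variable"] = text
        else:
            entry["values"].append(text)
    return associations


def _text_without(elem, skip):
    """Concatenate text content, dropping whole subtrees whose tag is in skip."""
    if elem.tag in skip:
        return ""  # RIML-only content; the caller still keeps this element's tail
    parts = [elem.text or ""]
    for child in elem:
        parts.append(_text_without(child, skip))
        parts.append(child.tail or "")
    return "".join(parts)


def strip_riml(riml_xml):
    """Recover the readable text by removing all tags and the RIML-only rule title."""
    root = ET.fromstring(riml_xml)
    return " ".join(_text_without(root, {"RuleTitle"}).split())


print(extract_associations(RIML_DOC))
print(strip_riml(RIML_DOC))
```

Because the same number may appear on several `<value2>` tags, the sketch collects values into a list, which matches the example's two permitted spending items.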

These tags can be extended when we need to identify algebraic operators, functions, and tables. For instance, simple algebraic operators (such as GreaterThan and LessThan) can be added. More sophisticated and domain-specific tags make the relationships easier to comprehend but require more knowledge-editing effort, so we need to balance the benefit of sophisticated RIML against the effort needed to transform RIML statements into RSML statements. The tags may also be abbreviated (such as vr for variable and vl for value). The HTML/RIML editor must support editing both the hypertext and its meta-knowledge.
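If abbreviated tags such as vr and vl are adopted, an editor would need to expand them before further processing. The Python sketch below assumes a hypothetical alias table and attribute-free numbered tags (`<vr1>`, `</vl2>`, and so on); it rewrites them with a regular expression, which suffices for this narrow convention but is not a general XML transformer.

```python
import re

# Hypothetical alias table: "vr" abbreviates variable and "vl" abbreviates value.
TAG_ALIASES = {"vr": "variable", "vl": "value"}

# Opening or closing numbered tags without attributes, e.g. <vr1> or </vl2>.
ABBREVIATED_TAG = re.compile(r"<(/?)(vr|vl)(\d+)>")


def expand_abbreviations(riml_text):
    """Rewrite abbreviated RIML tags into their full <variableN>/<valueN> form."""
    return ABBREVIATED_TAG.sub(
        lambda m: f"<{m.group(1)}{TAG_ALIASES[m.group(2)]}{m.group(3)}>", riml_text)


print(expand_abbreviations(
    "If the <vr1>budgetary source</vr1> is the <vl1>type-P research fund</vl1>"))
# If the <variable1>budgetary source</variable1> is the <value1>type-P research fund</value1>
```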

RSML. After identifying RIML statements, we need to transform them into the intermediate representation of rules specified in RSML that can be readily associated with RIML. Note that the variables are transformed into tags in XML syntax with their values within the paired tags. Rules in this syntax can be directly matched with the data in the XML file—a big advantage of RSML.

<RSML>
  <!-- XML element names cannot contain spaces, so multiword variable names use underscores -->
  <Rule>
    <RuleTitle>Restriction of Type-P Research Fund Expenditure</RuleTitle>
    <IF>
      <AND>
        <budgetary_source>type-P research fund</budgetary_source>
        <OR>
          <spendable_item>student’s salary</spendable_item>
          <spendable_item>expense for data collection</spendable_item>
        </OR>
      </AND>
    </IF>
    <THEN>
      <Result Action="Add_Value">
        <expenditure>permitted</expenditure>
      </Result>
    </THEN>
  </Rule>
</RSML>
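Because RSML variables are XML element names, a rule can be checked directly against a record in an XML data file. The following Python sketch shows one way to do that matching; it is an illustration, not the authors' inference engine. It assumes underscore element names (`budgetary_source`, `spendable_item`), interprets `Action="Add_Value"` as asserting each child of `<Result>` as a new fact, and uses a hypothetical requisition record.

```python
import xml.etree.ElementTree as ET

# The article's RSML rule, with underscore element names (XML names cannot contain spaces).
RSML_DOC = """<RSML><Rule>
  <RuleTitle>Restriction of Type-P Research Fund Expenditure</RuleTitle>
  <IF><AND>
    <budgetary_source>type-P research fund</budgetary_source>
    <OR>
      <spendable_item>student's salary</spendable_item>
      <spendable_item>expense for data collection</spendable_item>
    </OR>
  </AND></IF>
  <THEN><Result Action="Add_Value"><expenditure>permitted</expenditure></Result></THEN>
</Rule></RSML>"""


def holds(cond, data):
    """True if an RSML condition (AND/OR tree of variable tags) matches the XML record."""
    if cond.tag == "AND":
        return all(holds(c, data) for c in cond)
    if cond.tag == "OR":
        return any(holds(c, data) for c in cond)
    # Leaf: the element name is the variable and its text the required value.
    wanted = (cond.text or "").strip()
    return any((d.text or "").strip() == wanted for d in data.findall(cond.tag))


def fire(rule, data):
    """Return the facts added by the THEN part, or {} when the IF part fails."""
    if not holds(rule.find("IF")[0], data):
        return {}
    facts = {}
    for result in rule.findall("THEN/Result"):
        if result.get("Action") == "Add_Value":  # assumed meaning: assert each child as a fact
            for fact in result:
                facts[fact.tag] = (fact.text or "").strip()
    return facts


# Hypothetical record from an XML data file.
RECORD = ET.fromstring(
    "<requisition>"
    "<budgetary_source>type-P research fund</budgetary_source>"
    "<spendable_item>student's salary</spendable_item>"
    "</requisition>")
RULE = ET.fromstring(RSML_DOC).find("Rule")
print(fire(RULE, RECORD))  # {'expenditure': 'permitted'}
```

A leaf matches if any element of the same name in the record carries the required value, so a record listing several spendable items is handled naturally.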

Note that the variables and values in RSML correspond to the words identified in RIML. Using the definitions in RIML, the RSML editor generates a crude version of the rules by assigning the key words to the corresponding variable and value slots. Later, when the rules in RSML need to be revised, the same relationship can be traced in the reverse direction to identify which paragraphs and words are associated with the rules. In this way, consistency between RIML and RSML can be maintained. A thesaurus of synonyms, together with knowledge of which associations between variables, and between variables and values, are plausible in the application domain, can make knowledge editing and maintenance easier and more accurate. Developing aids for consistency maintenance is a challenging research issue and an opportunity for ontology-based knowledge engineering.
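The RSML editor's first step, assigning RIML key words to variable and value slots, might look like the following Python sketch. It assumes the association table has already been read from the RIML `<variable#>`/`<value#>` numbering, and that the THEN conclusion is supplied separately by the knowledge engineer because it does not appear in the hypertext. The `riml` attribute is a hypothetical back-link recording which RIML tag number each condition came from, so the relationship can be traced in reverse. The output is deliberately crude: it keeps the RIML wording (spendable items, plural) for the engineer to refine.

```python
import xml.etree.ElementTree as ET


def to_tag(name):
    """XML element names cannot contain spaces, so join the words with underscores."""
    return "_".join(name.split())


def crude_rsml(title, associations, conclusion):
    """Draft an RSML rule: AND over the variables, OR over one variable's values.

    associations: {N: (variable, [values])} taken from RIML <variableN>/<valueN> tags.
    conclusion: (variable, value) for the THEN part, supplied by the knowledge engineer.
    """
    rsml = ET.Element("RSML")
    rule = ET.SubElement(rsml, "Rule")
    ET.SubElement(rule, "RuleTitle").text = title
    conjunction = ET.SubElement(ET.SubElement(rule, "IF"), "AND")
    for num in sorted(associations):
        variable, values = associations[num]
        parent = conjunction if len(values) == 1 else ET.SubElement(conjunction, "OR")
        for value in values:
            leaf = ET.SubElement(parent, to_tag(variable))
            leaf.text = value
            leaf.set("riml", str(num))  # hypothetical back-link to the RIML tag number
    result = ET.SubElement(ET.SubElement(rule, "THEN"), "Result", Action="Add_Value")
    ET.SubElement(result, to_tag(conclusion[0])).text = conclusion[1]
    return rsml


draft = crude_rsml(
    "Restriction of Type-P Research Fund Expenditure",
    {1: ("budgetary source", ["type-P research fund"]),
     2: ("spendable items", ["student's salary", "expenses for data collection"])},
    ("expenditure", "permitted"),
)
print(ET.tostring(draft, encoding="unicode"))
```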

RSML statements can be transformed into canonical rules by matching the reserved words of RSML. To improve the automated editing of RSML, knowledge engineers need to specify more meta-knowledge during the RIML stage; for instance, if the association between a variable and a value is specified in RIML, a statement with that variable and value can be generated automatically, as mentioned earlier. Thus, the total effort needed for knowledge management depends largely on how RIML and RSML are generated and maintained.
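A minimal Python sketch of this transformation treats `<AND>` and `<OR>` as the reserved connectives and every other element inside `<IF>` as a "variable IS value" test, producing the rule in the IF/THEN form shown earlier. It assumes underscore element names and reads only `Add_Value` results; it illustrates the idea and is not the XRML canonical syntax itself.

```python
import xml.etree.ElementTree as ET

# The article's RSML rule, with underscore element names (XML names cannot contain spaces).
RSML_DOC = """<RSML><Rule>
  <RuleTitle>Restriction of Type-P Research Fund Expenditure</RuleTitle>
  <IF><AND>
    <budgetary_source>type-P research fund</budgetary_source>
    <OR>
      <spendable_item>student's salary</spendable_item>
      <spendable_item>expense for data collection</spendable_item>
    </OR>
  </AND></IF>
  <THEN><Result Action="Add_Value"><expenditure>permitted</expenditure></Result></THEN>
</Rule></RSML>"""

CONNECTIVES = {"AND", "OR"}  # reserved words treated as logical connectives


def condition_text(cond):
    """Render an RSML condition as nested '(variable IS value)' clauses."""
    if cond.tag in CONNECTIVES:
        return "(" + f" {cond.tag} ".join(condition_text(c) for c in cond) + ")"
    variable = cond.tag.replace("_", " ")
    value = " ".join((cond.text or "").split())
    return f"({variable} IS {value})"


def to_canonical_rule(rsml_xml):
    """Translate the first RSML <Rule> into IF/THEN text, reading only Add_Value results."""
    rule = ET.fromstring(rsml_xml).find("Rule")
    lines = ["Rule Title: " + rule.findtext("RuleTitle").strip(),
             "IF " + condition_text(rule.find("IF")[0])]
    for result in rule.findall("THEN/Result"):
        if result.get("Action") == "Add_Value":
            for fact in result:
                value = " ".join((fact.text or "").split())
                lines.append(f"THEN {fact.tag.replace('_', ' ')} IS {value}")
    return "\n".join(lines)


print(to_canonical_rule(RSML_DOC))
```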

RTML. RTML is a language embedded in such applications as forms in workflow management and in software agents. Therefore, RTML has to define a set of standard statements about when to trigger the inference, as well as which rules to use and how to use the obtained result. RTML tags are useful for identifying the relevant tags in RIML and RSML and the data files in the XML format.

The following example of RTML syntax specifies that when a requisition drawing on type-P research funds is submitted, rule-based inference is triggered to derive the permission decision:

<RTML>
  <!-- XML element names cannot contain spaces, so multiword names use underscores -->
  <WhenTrigger>
    <AND>
      <requisition>on</requisition>
      <budgetary_source>type-P research fund</budgetary_source>
    </AND>
  </WhenTrigger>
  <Bring>
    <RuleTitle>Restriction of Type-P Research Fund Expenditure</RuleTitle>
    <DataFile>Research Fund Accounts</DataFile>
  </Bring>
  <Result>
    <expenditure>permitted</expenditure>
  </Result>
</RTML>

The tag <WhenTrigger> specifies the condition for triggering the rule; <Bring> brings the relevant rules and data to the inference engine; and <Result> returns the inference result as the value of the tag. The application program (which might be written in Java, for example) can then retrieve the returned result.
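The trigger-bring-return cycle can be sketched in Python as follows. This is a minimal illustration under several assumptions: form fields arrive as a dictionary, the `<Result>` element lists empty slots that the sketch fills from the derived facts, the `<DataFile>` element is read but not used, and `RULE_BASE` is a hypothetical table mapping rule titles to hand-coded functions that stand in for the KBS inference engine.

```python
import xml.etree.ElementTree as ET

# RTML modeled on the article's example; <expenditure/> is an empty slot to be filled.
RTML_DOC = """<RTML>
  <WhenTrigger><AND>
    <requisition>on</requisition>
    <budgetary_source>type-P research fund</budgetary_source>
  </AND></WhenTrigger>
  <Bring>
    <RuleTitle>Restriction of Type-P Research Fund Expenditure</RuleTitle>
    <DataFile>Research Fund Accounts</DataFile>
  </Bring>
  <Result><expenditure/></Result>
</RTML>"""


def triggered(cond, form):
    """Check a WhenTrigger condition (AND/OR of field=value leaves) against form fields."""
    if cond.tag == "AND":
        return all(triggered(c, form) for c in cond)
    if cond.tag == "OR":
        return any(triggered(c, form) for c in cond)
    return form.get(cond.tag) == (cond.text or "").strip()


def run_rtml(rtml_xml, form, rule_base):
    """Fire the brought rule if the trigger holds; return {result slot: derived value}."""
    rtml = ET.fromstring(rtml_xml)
    if not triggered(rtml.find("WhenTrigger")[0], form):
        return None  # trigger condition not met, so no inference is run
    title = rtml.findtext("Bring/RuleTitle").strip()
    facts = rule_base[title](form)  # stand-in for calling the inference engine
    return {slot.tag: facts.get(slot.tag) for slot in rtml.find("Result")}


TYPE_P_ITEMS = {"student's salary", "expense for data collection"}


def type_p_rule(form):
    """Hypothetical hand-coded version of the RSML rule."""
    if (form.get("budgetary_source") == "type-P research fund"
            and form.get("spendable_item") in TYPE_P_ITEMS):
        return {"expenditure": "permitted"}
    return {}


RULE_BASE = {"Restriction of Type-P Research Fund Expenditure": type_p_rule}
FORM = {"requisition": "on", "budgetary_source": "type-P research fund",
        "spendable_item": "student's salary"}
print(run_rtml(RTML_DOC, FORM, RULE_BASE))  # {'expenditure': 'permitted'}
```

Returning `None` when the trigger fails, rather than an empty result, lets the calling workflow application distinguish "no inference requested" from "inference ran but derived nothing".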

Rule Markup Language Research

The emerging research in rule markup languages covers a range of issues; for example, RuleML was described in [2]. Although most of this research focuses on representing rules, objects, cases, and functions in XML format, it does not cover the knowledge association and consistency issues found in XRML:

Business Rules Markup Language. Specifies a common rule structure to exchange rules between heterogeneous rule-based systems;
Agent-Object-Relationship Markup Language. Describes the business rules to be processed with software agents, including the business process, interaction process, sequence of events, actions, activities, and control;
Universal Rule Markup Language. Represents the input/output data of AI applications in XML for reducing conversion effort and time;
Artificial Intelligence Markup Language. Is the XML specification for the Artificial Linguistic Internet Computer Entity (ALICE) using a simple pattern-matching technique;
Case-Based Markup Language. Is an XML-based case-representation language for achieving interoperability and flexibility of case reuse; and
Relational-Functional Markup Language. Is an XML version of Relfun, a logic programming language using call-by-value expressions.

XRML can be applied to a spectrum of Web-based KBSs and make KMSs more intelligent. Figure 2 outlines an XRML application in the workflow environment; other examples include the following:

Automated form processing. Object-oriented forms equipped with RTML can trigger inquiries for the automated approval of routine and frequent tasks (such as business travel reimbursement and petty acquisitions). This function can be integrated effectively with desktop purchasing, in which the requisitioner bypasses both the approval process and the procurement department [8]. The hypertext used in forms can also be displayed to requisitioners via the Web.

Preventive auditing. Certain activities need to be audited. Audit knowledge implemented in XRML can be displayed to inquirers and used to stamp the auditors’ approval automatically. Auditors can then focus on maintaining audit knowledge instead of auditing individual transactions.

Agent-based intra- and interorganizational e-commerce. During B2B transactions and collaboration, software agents can request knowledge about products and services, as well as about contract terms and conditions. XRML is particularly useful in call centers because human agents are often not expert enough to address the inquiries. In this case, they may use the relevant KBS and share it with customers. Synchronized Web browsers for both human agents and customers would improve their communication.

The technical challenges involved in implementing XRML in real-world applications should be viewed more as an opportunity than a hurdle:

Consistency maintenance of polymorphic knowledge representations. The same data and rules may exist in a relational database, in HTML and XML files, in RSML, and in the rulebase. So when one type of knowledge or data changes, consistency must be maintained. Consistency in XRML among HTML/RIML, RSML, and XML is especially important. Meta-knowledge supports the process.
Domain-specific thesaurus. RIML can start with a syntax that is independent of any natural language. Understanding the syntax of a particular natural language (such as English or Korean) is, however, helpful for identifying the relationships among variables and values. Moreover, to support frequently used domains, including online customer support for electronic products, knowledge engineers can define the relationships among vocabulary items a priori.
Multi-URL-based inference. In the earlier research fund account example, a single Web page was used to reach a conclusion. However, an inference may require more than one Web page. Because RSML rules then need to record each page’s URL, as well as the rules found at each URL, the XRML tags also have to be extended.
Integration of rules from RSML and other sources. When inquiries by RTML require knowledge from RSML as well as from other sources, RTML should be able to identify the rules from all necessary sources. If all types of rules are transformed into a canonical structure in advance, integration problems at execution time can be avoided, whatever the rules’ original form. However, this approach increases the integration effort during knowledge maintenance.
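As a small illustration of the first challenge, consistency maintenance, the Python sketch below compares the (variable, value) pairs marked up in RIML with the conditions of an RSML rule and reports the pairs that appear on only one side. The thesaurus entries are hypothetical. They show how a domain-specific thesaurus could reconcile wording differences such as "spendable items" versus "spendable item". The sketch checks only IF conditions; a real tool would also need to cover conclusions, URLs, and the rulebase.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain thesaurus mapping wording variants onto one canonical term.
THESAURUS = {
    "spendable items": "spendable item",
    "expenses for data collection": "expense for data collection",
}

RIML_DOC = ("<RIML><Rule>If the <variable1>budgetary source</variable1> is the "
            "<value1>type-P research fund</value1>, the <variable2>spendable items</variable2> "
            "are limited to <value2>student's salary</value2> and "
            "<value2>expenses for data collection</value2>.</Rule></RIML>")

RSML_DOC = ("<RSML><Rule><IF><AND>"
            "<budgetary_source>type-P research fund</budgetary_source>"
            "<OR><spendable_item>student's salary</spendable_item>"
            "<spendable_item>expense for data collection</spendable_item></OR>"
            "</AND></IF></Rule></RSML>")


def norm(term):
    """Normalize underscores and whitespace, then map synonyms via the thesaurus."""
    term = " ".join(term.replace("_", " ").split())
    return THESAURUS.get(term, term)


def riml_pairs(riml_xml):
    """(variable, value) pairs from <variableN>/<valueN> tags sharing the number N."""
    elems = list(ET.fromstring(riml_xml).iter())
    variables = {e.tag[len("variable"):]: norm(e.text or "")
                 for e in elems if e.tag.startswith("variable")}
    return {(variables.get(e.tag[len("value"):], "?"), norm(e.text or ""))
            for e in elems if e.tag.startswith("value")}


def rsml_pairs(rsml_xml):
    """(variable, value) pairs from the leaf conditions of the first RSML rule's IF part."""
    cond = ET.fromstring(rsml_xml).find("Rule/IF")
    return {(norm(e.tag), norm(e.text or ""))
            for e in cond.iter() if e.tag not in {"IF", "AND", "OR"}}


def inconsistencies(riml_xml, rsml_xml):
    """Pairs present on only one side; both sets are empty when RIML and RSML agree."""
    a, b = riml_pairs(riml_xml), rsml_pairs(rsml_xml)
    return {"only_in_riml": a - b, "only_in_rsml": b - a}


print(inconsistencies(RIML_DOC, RSML_DOC))  # {'only_in_riml': set(), 'only_in_rsml': set()}
```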

Conclusion

During the past two years, we designed XRML version 1.0 and developed its prototype—called Form/XRML—an automated form processing system for disbursing research funds at the Korea Advanced Institute of Science and Technology in Seoul. Since XRML allows humans, as well as software agents, to use the rules implicitly embedded in Web pages, the potential for its application in knowledge management is promising. XRML can also contribute to the progress of Semantic Web platforms, making knowledge management and e-commerce more intelligent. Since many research groups and vendors are investigating these issues, we can expect to see XRML in commercial products within a few years. Meanwhile, mature XRML applications may change the way information and knowledge systems are designed and used.

Figures

Figure 1. XRML topology.

Figure 2. Illustrative XRML architecture.

Tables

Table. KBS and KMS compared.


Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

DOI: 10.1145/769800.769802

Communications of the ACM, Vol. 46 No. 5, May 2003 Issue, pages 59-64. Published May 1, 2003.
