CMS IN 1999/051
CMS Internal Note
The content of this note is intended for CMS internal use and distribution only
10 November 1999
Object Oriented Database Serialization Using Extensible Markup Language, XML
Muhammad Anzar Afaq
&
Syed Shafqat Saeed
CTC-CERN Computer Centre
CTC, Islamabad, Pakistan
Heinz Stockinger
CERN, Geneva, Switzerland
Abstract
Various R&D activities are under way within CMS to identify and implement database replication and migration methods that can be used to replicate or migrate object oriented databases across the network between CMS and its regional Centres around the world [1] [2] [3]. There are various techniques for doing this. To replicate or migrate a database, it first has to be serialized in a format suitable for transfer through the network. XML is chosen as the data format for serialized objects. Objects are read from an Objectivity database and serialized into an XML file. To facilitate distribution of the serialized database, CORBA technology is used for the exchange of databases among the various Centres. This work is done as a part of the WISDOM project currently in progress at CTC Islamabad, Pakistan.
This work is also going to be used as a part of the CRISTAL project to replicate/migrate OO databases between Central and Local Centres [4] [5].
1 Introduction
Object serialization means storing an object's state and model in a form that can be accessed serially, such as a disk file, or transferred over a network, like HTML data. The serialized data is then used to reproduce or re-create the object. Every object has two things associated with it: its state, which usually means the current values stored in the member variables of the object, and its model, which means the skeleton of the object with member variable definitions and member function definitions/implementations. To serialize an object, both its state and its model should be serialized in order to be able to de-serialize the object.
De-serialization means re-creating the object by reading its state from the serialized data. The object model plays an important role in the serialization/de-serialization process, since it gives complete information about the object. The names of the member variables associated with the object, the possible operations defined in the object and the links of the object with other objects are all defined in the object model or class definition. The Object Oriented Database created in Objectivity is persistent. Objects have two parts: 'The Object Data' and 'The Object/Data Model', the latter called the Schema. The serialization of object data is relatively simple, as Objectivity provides methods to open an object from a database and access its data and other attributes like its type and its unique Object ID (OID). The general information about the object, like OID and type, is accessible to every application, while access to the object data is only possible if its Schema is known. Knowing the schema, a serializer application can put this information in some format for the de-serializer. This serialized object can then be transported to the de-serializer application. It can be de-serialized by re-creating it in the destination database and filling in the data values from the values stored in the serialized object. This re-creation is only possible if the schema of the object is known to the de-serializer application as well. This means that for a complete serialization/de-serialization process the Schema of the Object Oriented Database has to be known to both the serializer and de-serializer applications.
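The dependence on a shared schema can be illustrated with a minimal Python sketch. It is not Objectivity code: the `SCHEMA` dictionary, the `Event` class, its field names and the dictionary-based serialized form are all hypothetical stand-ins chosen for illustration. The only point is that both `serialize` and `deserialize` consult the same schema.

```python
from dataclasses import dataclass

# Hypothetical schema shared by both ends: class name -> ordered field names.
SCHEMA = {"Event": ["event_number", "energy"]}


@dataclass
class Event:
    event_number: int
    energy: float


# Registry the de-serializer uses to re-create objects by type name.
CLASSES = {"Event": Event}


def serialize(obj, oid):
    """Write the object's state using the field list from the schema."""
    type_name = type(obj).__name__
    fields = SCHEMA[type_name]  # raises KeyError if the schema is unknown
    return {
        "oid": oid,
        "type": type_name,
        "state": {name: getattr(obj, name) for name in fields},
    }


def deserialize(record):
    """Re-create the object at the destination from its serialized state."""
    type_name = record["type"]
    fields = SCHEMA[type_name]  # the destination needs the same schema
    cls = CLASSES[type_name]
    return cls(**{name: record["state"][name] for name in fields})


record = serialize(Event(event_number=7, energy=12.5), oid="#2-4-8-1")
copy = deserialize(record)
```

Note that the OID and type name are written without consulting the schema, whereas the member values cannot be read or restored without it.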
The other issue is to make the serialization/de-serialization schema independent. That could be achieved only if there were a way for the serializer at the source database to serialize the object model (Schema) along with the object data, and if this Schema could then be made known to the Objectivity database at the destination. Unfortunately there is no such way of doing this in Objectivity yet. There are some "Dynamic Schema Binding" specifications in Objectivity 5.2, but they are still in their beta version. As discussed above, knowing the schema is only required when accessing the data of an object in order to serialize it, and when re-creating the object in the target database in order to de-serialize it. Apart from these two requirements, a schema independent application can be developed that opens any object in any Container in any Database of any Federation and writes its serialized form. Using these basic ideas we have developed a serializer/de-serializer application that can be used to replicate/migrate an Objectivity Database. The only limitation is that both ends should know the Schema.
We are using the widely accepted Extensible Markup Language, XML, as the data format for serialized objects for the following reasons:
• XML supports transfer over the network.
• Standard parsers/de-parsers are available for XML generation and interpretation.
• It is a standard data format for exchanging data among different applications.
Figure 1: UML model of the software (diagram labels recovered: DB Reader, CORBA Client)
2 Description of Software
Figure 1 shows the UML model of the software that we have developed. It consists of the following modules:
• Objectivity/DB
• DB Reader
• XML Generator
• CORBA Server & Client
2.1 Objectivity/DB
An Event database is created in Objectivity. Figure 2 shows the Object Model for the database. We are using a realistic Database Model for this test database (Event database). There are many object types and many associations among the objects. For example, the Event object is associated with Run, DetectorSet, etc.
Figure 2: UML model of Event database
2.2 DB Reader
This module is our interface to the Objectivity database (Objy/DB) discussed above. It provides services to the XML_Generator module, which produces the XML form of the object, i.e. the serialized object. This module can be interrogated about any details of the Federation, Database, Container or Object. It opens the Federation, iterates over Databases, iterates over the Containers, and then iterates over Objects in a Container. After opening an Object, DB Reader writes its information into transient STL containers, and the XML_Generator then writes down its XML form. DB Reader then moves on to the next Object. This approach is adopted to keep memory requirements low: reading a single object at a time reduces the memory requirements and allows very large databases to be accessed and serialized.
The DB Reader module consists of a class DBReader, which provides the following services:
• Provide the mechanism of iteration through the Federation, containers and objects.
• Return the information of the currently selected object.
• Provide the mechanism of checking associations between objects.
DB Reader provides the schema independence discussed above by implementing the first two of these services without direct use of the Schema. The schema is used to find the names and types of the member variables of an object and to get information about the existence of associations between objects.
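The one-object-at-a-time traversal described for DB Reader can be sketched with a Python generator. This is only an illustration under stated assumptions: the nested `FEDERATION` dictionary is a hypothetical in-memory stand-in for an Objectivity federation, and `iter_objects` and `serialize_all` are invented names, not part of the DBReader class or of the Objectivity API.

```python
# Hypothetical in-memory stand-in for a federation:
# database name -> container name -> list of (oid, type name) pairs.
FEDERATION = {
    "EventDB": {
        "Runs": [("#2-3-1-1", "Run")],
        "Events": [("#2-4-8-1", "Event"), ("#2-4-8-2", "Event")],
    },
}


def iter_objects(federation):
    """Yield one object description at a time, walking
    databases, then containers, then objects."""
    for db_name, containers in federation.items():
        for container_name, objects in containers.items():
            for oid, type_name in objects:
                yield {
                    "database": db_name,
                    "container": container_name,
                    "oid": oid,
                    "type": type_name,
                }


def serialize_all(federation, write):
    """Hand each object to `write` as soon as it has been read."""
    count = 0
    for info in iter_objects(federation):
        write(info)
        count += 1
    return count


written = []
total = serialize_all(FEDERATION, written.append)
```

Because `iter_objects` is a generator, the traversal itself holds only one object description at a time, which mirrors the memory argument made above. The `written` list is used here only to make the result inspectable; a real writer would emit each object and discard it.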
2.3 XML Generator
This module reads information object by object from DB Reader and puts it in an XML file. It uses the DOM API provided by IBM for developing XML applications; DOM is a W3C recommendation. The module consists of a class obj2xml. To serialize an object we invoke writeObjXML(), passing it the necessary information about the object read from DBReader. writeObjXML() adds the node corresponding to this object to the DOM structure according to the following generalized DTD for an Objectivity database. This DTD is an early version that can be changed in future according to the requirements of a de-serializer and of a more generic serialization approach.
]>
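As a rough illustration of adding one node per object to a DOM tree, the following Python sketch uses the standard library's `xml.dom.minidom` rather than the IBM DOM API used in this work. The element and attribute names (`database`, `object`, `member`, `link`, `id`, `ref`) and the function `write_obj_xml` are hypothetical and do not reproduce the DTD of this note. The sketch also rewrites Objectivity OIDs such as #2-4-8-1 to begin with a letter, since XML ID values cannot begin with '#'.

```python
from xml.dom.minidom import getDOMImplementation


def oid_to_xml_id(oid):
    """Turn an OID such as '#2-4-8-1' into 'L2-4-8-1'."""
    if not oid.startswith("#"):
        raise ValueError("unexpected OID format: " + oid)
    return "L" + oid[1:]


def write_obj_xml(doc, oid, type_name, members, links):
    """Append one <object> node to the document root (hypothetical layout)."""
    node = doc.createElement("object")
    node.setAttribute("id", oid_to_xml_id(oid))
    node.setAttribute("type", type_name)
    for name, value in members.items():
        member = doc.createElement("member")
        member.setAttribute("name", name)
        member.appendChild(doc.createTextNode(str(value)))
        node.appendChild(member)
    for target_oid in links:
        # Associations are stored as references to the target object's ID.
        link = doc.createElement("link")
        link.setAttribute("ref", oid_to_xml_id(target_oid))
        node.appendChild(link)
    doc.documentElement.appendChild(node)
    return node


doc = getDOMImplementation().createDocument(None, "database", None)
write_obj_xml(doc, "#2-3-1-1", "Run", {"run_number": 1}, [])
write_obj_xml(doc, "#2-4-8-1", "Event", {"event_number": 7}, ["#2-3-1-1"])
xml_text = doc.documentElement.toxml()
```

Each object is a direct child of the root and refers to related objects by ID, so any single `object` element can be processed on its own.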
For objects of type RUN & EVENT we get the following XML (showing one object per type here to save space).
We use the OID returned by Objectivity to create links between various objects. The OID returned by Objectivity is a string like #2-4-8-1. Here we have replaced the '#' character with 'L' (giving L2-4-8-1) because of a limitation of XML: the value of an ID/IDREF attribute must be an XML name, which cannot begin with '#', so a letter is used instead. The use of ID/IDREF can help in providing a presentation layer if an XSL stylesheet is added to the XML document. One can even create an ooBrowser in Internet Explorer.
The XML so generated is "VALID" and "WELL-FORMED". We use XML tools like XML-SPY and Internet Explorer 5 to check the validity of the XML file.
2.4 CORBA Server & Client
The XML data produced has to be transported to the system where it needs to be de-serialized. This can be achieved by transferring the XML file using FTP or simple network socket programming, which would certainly provide a file transfer mechanism. But as the databases involved in CMS and CMS related projects like CRISTAL are distributed in nature, it is better to provide a mechanism that supports this. A CORBA server, implemented in IONA's ORBIX, has been developed to provide access to the serialized data (i.e. the XML file generated by XML_Generator). Depending on the size of the source database, the size of the XML file can pose network transfer problems. The XML server is therefore designed to provide access to the XML file in various ways: the file can be transferred as a whole or line by line.
Figure 3: Microsoft Internet Explorer displaying the XML file generated by the XML_Generator module (note that a RUN and 10 Event objects are displayed)
The XML data at the de-serializer end is accessed by a CORBA client, also developed in IONA's ORBIX. The client can access the XML file as a whole or line by line and re-creates an XML file, which can be used by the de-serializer or some other application.
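The two access modes, whole file and line by line, can be sketched in Python without any CORBA machinery. In this sketch ordinary method calls stand in for remote invocations, and the class `XmlServer`, its methods and the sample XML text are hypothetical; they do not reproduce the IDL interface of the ORBIX server described here.

```python
class XmlServer:
    """Local stand-in for the server side; holds the serialized XML text."""

    def __init__(self, xml_text):
        # Keep line endings so the client can rebuild the file exactly.
        self._lines = xml_text.splitlines(keepends=True)

    def get_whole(self):
        """Return the complete file in a single call."""
        return "".join(self._lines)

    def line_count(self):
        return len(self._lines)

    def get_line(self, index):
        """Return one line, so a large file never travels in one message."""
        return self._lines[index]


def fetch_line_by_line(server):
    """Client side: rebuild the file from individual line requests."""
    return "".join(server.get_line(i) for i in range(server.line_count()))


source = '<database>\n  <object id="L2-4-8-1"/>\n</database>\n'
server = XmlServer(source)
rebuilt = fetch_line_by_line(server)
```

Line-by-line access costs one call per line but bounds the size of each message, which is the trade-off behind offering both modes for large files.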
3 Conclusion and Remarks
The software developed in this project can be used to serialize any Objectivity federation, or its scope can be reduced to serialize a single database, container or object within a federation. With some care, multiple federations can also be serialized. The following remarks can be made about the design of the software.
3.1 Serializable Objects
An object can itself be given the capability to serialize itself by adding a method for this purpose. This is a much easier and more powerful approach and does not need a schema for producing XML, as the object knows its own state and model. Various languages like Java support such objects, which can serialize themselves. Objectivity/DB objects can also be given a serializing capability. However, this project is not targeted at any specific database; rather, it aims at a general tool for any database, whether or not its objects are capable of serializing themselves. We are using a general approach that can serialize any type of object from any federation.
3.2 Document Type Definition, DTD, Selection
There can be many possible DTD choices for an Objectivity database. As the objects are inter-linked and contained in containers and databases in Objectivity, one can imagine a hierarchical or nested structure in which an object contains all the objects related to it, or in which there is a database object in which all container objects are nested, and each container object in turn nests all "basic" objects in it. This may be useful in a limited number of applications and may make it easier to give a complete overview of the structure of the database, but if only a few objects are taken individually from the XML it will be difficult to identify their position in the database without going into other structures present in the XML. So we are targeting objects: each object has its own identity in the XML file and can be processed individually.
Also, our approach does not put any restriction on serialization, at least within the database scope, which may be the case in the other approach discussed above. This helps in the replication/migration of even individual objects from the database, and does not limit the capability of serializing a whole container, database or federation.
3.3 Multiple Federations
There is an important issue of similar OIDs in different federations. If one opts to use this tool to serialize multiple federations together, then multiple files could be generated, each with OIDs valid within the scope of its own federation. Even the tools provided by Objectivity target a single federation at a time.
3.4 Future
At the time this note was going to publication, Objectivity released Objectivity Active Schema as part of version 5.2 of Objectivity/DB. Active Schema is an interface to the database schema of Objectivity which allows instance objects of schema classes to be accessed and modified. We are planning to investigate these tools and use them in this software to make it completely schema independent. This task will be accomplished in workpackage WPWisd5 of the WISDOM project. As described in section 2.2, we currently use a hard coded schema to find the names and types of the member variables of an object and to get information about the existence of associations between objects. In future this information will be accessed using Active Schema to make this serialization facility completely schema independent.
References
[1] RD45 project, A persistent Manager for HEP, http://wwwinfo.cern.ch/asd/rd45.
[2] MONARC Project, Model for Network Analysis at Regional Centres for LHC Experiments, http://www.cern.ch/MONARC.
[3] Heinz Stockinger, Data Replication in Distributed Database Systems, CMS NOTE-1999/046.
[4] The CRISTAL project, CMS Workflow Management, http://cmsdoc.cern.ch/Cristal.
[5] The WISDOM project, Widearea database Independent Serialization of Distributed Objects data Migration.