Implementing an XML Object Identification System on an archive data





Høgskolen i Oslo. Avdeling for journalistikk, bibliotek- og informasjonsvitenskap

Document type


Joint Master Degree in Digital Library Learning (DILL)


Although various techniques and tools existed at an early stage, the data quality problem did not receive the attention it deserves until recently. Up to the 1990s, concern with data quality was restricted to certain sectors; later, after the huge losses caused by data quality problems were exposed, a range of work on the topic appeared. A few scholars have been involved in exposing the data quality problem and in finding solutions; among the initiatives to study it systematically was the Total Data Quality Management methodology. The archiving sector is no different: in archiving or long-term preservation, preserved data is of little value unless it is accurate and authentic. This study examines how to ensure the accuracy of digital archive data, and it presents a data quality approach called object identification as a way of ensuring that archive data is accurate. Most research has focused on relational data, but with the increasing popularity and importance of XML data, there is a need for data quality tools and methodologies suited to XML. For this reason, the object identification technique in this study focuses on XML data. The research used Noark data as a case study and developed a prototype of an object identification technique. The prototype showed good results when tested on a representative sample of Noark data. The study is significant in taking the initiative to raise awareness of data quality issues in archives.
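To illustrate the general idea of object identification on XML data, the following is a minimal sketch and not the prototype developed in the study. It assumes that two XML elements refer to the same real-world object when their normalized field values are nearly identical. The element names (`record`, `title`, `creator`), the sample data, the similarity measure, and the 0.9 threshold are all hypothetical choices made for illustration; they are not taken from the Noark schema or from the thesis.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample data; element names are NOT the real Noark schema.
SAMPLE = """
<archive>
  <record id="1"><title>Annual budget report 2008</title><creator>Oslo kommune</creator></record>
  <record id="2"><title>Annual Budget Report, 2008</title><creator>Oslo Kommune</creator></record>
  <record id="3"><title>Building permit application</title><creator>Bergen kommune</creator></record>
</archive>
"""


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in (text or ""))
    return " ".join(cleaned.split())


def similarity(a, b, fields=("title", "creator")):
    """Average string similarity (0..1) of two elements over the given child fields."""
    scores = [
        SequenceMatcher(None, normalize(a.findtext(f)), normalize(b.findtext(f))).ratio()
        for f in fields
    ]
    return sum(scores) / len(scores)


def find_duplicates(xml_text, threshold=0.9):
    """Return id pairs of records whose similarity reaches the (assumed) threshold."""
    root = ET.fromstring(xml_text.strip())
    records = root.findall("record")
    return [
        (a.get("id"), b.get("id"))
        for a, b in combinations(records, 2)
        if similarity(a, b) >= threshold
    ]


if __name__ == "__main__":
    print(find_duplicates(SAMPLE))
```

On the sample above, records 1 and 2 differ only in capitalization and punctuation, so they are reported as a candidate duplicate pair, while record 3 is not matched. Comparing every pair of records is quadratic in the number of records, so a sketch like this would only suit small samples.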

