Data integration is one of the most important aspects of data warehousing. When data passes from the application-oriented operational environment to the data warehouse, possible redundancies and inconsistencies must be resolved (Calvanese et al., 2001, p.237), so that the warehouse can provide a reconciled and integrated view of the organization's information. This paper describes an approach to data integration in data warehousing that is founded on a conceptual representation of the data warehouse application domain (Calvanese et al., 2001, p.237). It proposes a technique for declaratively specifying suitable reconciliation correspondences, which are used to resolve conflicts among data from different sources. The key objective of the technique is to support the design of mediators that materialize the data in the data warehouse relations (Calvanese et al., 2001, p.237).
Abstract
Introduction
Data warehousing approaches
Industrial perspective
Architecture of data warehousing system
Problems regarding data warehousing
Wrapper/monitors
Translation
Change detection
Issues with wrapper/monitor
Miscellaneous issues
Warehouse management
Warehouse and source evolution
Inconsistent and duplicate information
Outdated information
Solution and improvement in the architecture
Update filtering
Self maintainability
Optimization of multiple views
Conclusion
References
Data Integration in Data Warehousing
Introduction
Information integration is the problem of acquiring data from a set of different sources that are available for the application of interest. The typical architecture of a data integration system is described in terms of two types of modules: wrappers and mediators (Widom, 1995, p.n.d.). A wrapper provides access to a source, extracts the relevant data, and presents that data in a specified format. A mediator merges the data extracted by the different wrappers in order to meet a specific information need of the data integration system (Widom, 1995, p.n.d.). The specification and realization of mediators is the core problem in the design of a data integration system. This problem has recently become a central issue in several contexts, including Data Warehousing, multi-database systems and information gathering (Widom, 1995, p.n.d.). The constraints that are typical of Data Warehouse applications restrict the wide range of approaches that have been proposed.
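The division of labour between wrappers and mediators can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the design proposed by the cited authors: the class names `Wrapper` and `Mediator`, the `field_map` parameter, the two sample sources and the "first source wins" merge rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


@dataclass
class Wrapper:
    """Gives access to one source and presents its data in a common format."""
    name: str
    fetch: Callable[[], Iterable[dict]]  # hypothetical accessor for the source
    field_map: Dict[str, str]            # source field name -> common field name

    def extract(self) -> List[Record]:
        rows = []
        for raw in self.fetch():
            # Keep only the relevant fields and rename them to the common format.
            row = {common: raw[src] for src, common in self.field_map.items()}
            row["source"] = self.name
            rows.append(row)
        return rows


class Mediator:
    """Merges the data extracted by several wrappers."""

    def __init__(self, wrappers: Iterable[Wrapper]):
        self.wrappers = list(wrappers)

    def integrate(self, key: str) -> List[Record]:
        merged: Dict[object, Record] = {}
        for wrapper in self.wrappers:
            for row in wrapper.extract():
                target = merged.setdefault(row[key], {})
                for field, value in row.items():
                    # Assumed rule: the first source to supply a field wins.
                    target.setdefault(field, value)
        return list(merged.values())


# Two hypothetical sources that name the same customer key differently.
crm = Wrapper(
    "crm",
    lambda: [{"cust_id": 1, "full_name": "Ada"}],
    {"cust_id": "id", "full_name": "name"},
)
billing = Wrapper(
    "billing",
    lambda: [{"id": 1, "city": "Rome"}, {"id": 2, "city": "Oslo"}],
    {"id": "id", "city": "city"},
)

result = Mediator([crm, billing]).integrate(key="id")
```

Here the wrappers hide the differences in field names, while the mediator alone decides how records about the same customer are combined. A real system would need a more careful reconciliation rule than "first source wins".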
Providing integrated access to multiple, distributed, heterogeneous databases and other information sources has become one of the leading problems in database research and industry (Widom, 1995, p.n.d.). In the research community, most approaches to the data integration problem rely on the following very general two-step process:
1. Receive a query, determine the appropriate set of information sources to answer it, and generate the appropriate sub-queries or commands for each source (Widom, 1995, p.n.d.).
2. Obtain the results from the information sources, perform the appropriate translation, filtering and merging of the information, and then return the final answer to the user or application.
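The two steps can be sketched in Python as follows. This is a simplified illustration, assuming in-memory lists stand in for remote sources; the source names, the sample rows and the helper functions `select_sources`, `sub_query` and `answer` are hypothetical and do not come from the cited work.

```python
from typing import Callable, Dict, List, Set

Row = Dict[str, object]

# Hypothetical information sources; in practice these would be remote systems.
SOURCES: Dict[str, List[Row]] = {
    "orders_db": [{"customer": "Ada", "amount": 120}, {"customer": "Bob", "amount": 40}],
    "web_shop": [{"customer": "Ada", "amount": 120}, {"customer": "Cy", "amount": 300}],
    "hr_files": [{"employee": "Dee", "salary": 5000}],
}


def select_sources(fields: Set[str]) -> List[str]:
    """Step 1a: keep the sources whose rows carry every requested field."""
    return [name for name, rows in SOURCES.items() if rows and fields.issubset(rows[0])]


def sub_query(name: str, predicate: Callable[[Row], bool]) -> List[Row]:
    """Step 1b: stand-in for the sub-query sent to a single source."""
    return [row for row in SOURCES[name] if predicate(row)]


def answer(fields: Set[str], predicate: Callable[[Row], bool]) -> List[Row]:
    # Step 1: choose the sources and query each one only now, on demand.
    partial = [row for name in select_sources(fields) for row in sub_query(name, predicate)]
    # Step 2: project onto the requested fields, merge, and drop duplicates.
    seen, final = set(), []
    for row in partial:
        projected = tuple((field, row[field]) for field in sorted(fields))
        if projected not in seen:
            seen.add(projected)
            final.append(dict(projected))
    return final


result = answer({"customer", "amount"}, lambda r: r["amount"] >= 100)
```

In this sketch no source is contacted until `answer` is called, and `hr_files` is never queried because it cannot supply the requested fields.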
The two-step process described above is referred to as the on-demand or lazy approach to data integration (Widom, 1995, p.n.d.). It is called lazy because data is extracted from the sources only when queries are made. This approach is also referred to as mediated ...