I would like to take this opportunity to thank my research supervisor, family, and friends for their support and guidance, without which this research would not have been possible.
DECLARATION
I, [type your full first names and surname here], declare that the contents of this dissertation/thesis represent my own unaided work, and that the dissertation/thesis has not previously been submitted for academic examination towards any qualification. Furthermore, it represents my own opinions and not necessarily those of the University.
Signed __________________ Date _________________
Table of Contents
ACKNOWLEDGEMENT
DECLARATION
2. BACKGROUND
Review Database Designs for Standard Compliance
Database Design Tasks
Data Model Review
Physical Database Design
Database Security Design
Architecture of Query Optimizer
Cardinality Estimation Using Histograms
Selection Queries
Select Project Join (SPJ) Queries
3. SIT: STATISTICS ON QUERY EXPRESSIONS
Statistics on Partitioned Objects
Column Statistics and Histograms
Extended Statistics
Creating Column Group
Getting Column Group
Dropping Column Group
Gathering Statistics on Column Groups
Expression Statistics
Creating Expression Statistics
Monitoring Expression Statistics
Dropping Expression Statistics
Determining Stale Statistics
User-Defined Statistics
Setting Preferences for Manual Statistics Gathering
Comparing Statistics with DBMS_STATS Functions
System Statistics
Workload Statistics
Cardinality Estimation Using SITs
An Illustrative Experiment
4. AUTOMATED SELECTION OF SITS
5. EXPERIMENTAL STUDY
6. RELATED WORK
7. CONCLUSIONS
REFERENCES
1. INTRODUCTION
Most query optimizers for relational database management systems (RDBMS) rely on a cost model to choose the best possible query execution plan for a given query. Thus, the quality of the query execution plan depends on the accuracy of the cost estimates. Cost estimates, in turn, crucially depend on cardinality estimates of the various sub-plans (intermediate results) generated during optimization. (Aboulnaga, 2005, 181)
Traditionally, query optimizers use statistics built over base tables for cardinality estimates, and assume independence while propagating these base-table statistics through query plans (see Section 2 for a detailed discussion). However, it is widely recognized that such cardinality estimates can be off by orders of magnitude. Therefore, the traditional propagation of statistics, which assumes independence between attributes, can lead the query optimizer to choose significantly lower-quality execution plans. (Gunopoulos, 2005, 137)
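The effect of the independence assumption can be illustrated with a minimal sketch. The table, column names, and helper functions below are hypothetical and chosen only for illustration; they do not reproduce any particular optimizer. The sketch compares the estimate N * sel(p1) * sel(p2) against the true cardinality when two columns are perfectly correlated.

```python
# Hypothetical table with two perfectly correlated columns: y always equals x.
rows = [{"x": i % 100, "y": i % 100} for i in range(10_000)]


def selectivity(table, predicate):
    """Fraction of rows in the table that satisfy the predicate."""
    return sum(1 for r in table if predicate(r)) / len(table)


def estimate_independent(table, pred_a, pred_b):
    """Cardinality estimate assuming the two predicates are independent."""
    return len(table) * selectivity(table, pred_a) * selectivity(table, pred_b)


def actual_cardinality(table, pred_a, pred_b):
    """True number of rows satisfying both predicates."""
    return sum(1 for r in table if pred_a(r) and pred_b(r))


def pred_x(r):
    return r["x"] == 7


def pred_y(r):
    return r["y"] == 7


estimated = estimate_independent(rows, pred_x, pred_y)
actual = actual_cardinality(rows, pred_x, pred_y)
print(f"estimated={estimated:.2f} actual={actual}")
```

In this constructed case the estimate is about 1 row while the true result holds 100 rows, a gap of two orders of magnitude from a single pair of correlated predicates.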
In this paper, we introduce the concept of SITs, which are statistics built on attributes of the result of a query expression. Thus, SITs can be used to accurately model the distribution of tuples on intermediate nodes in a query execution plan. We will show that in some cases, when optimizers have appropriate SITs available during query optimization, the resulting query plans are drastically improved, and they execute tens, and even hundreds, of times more efficiently than plans produced when only base-table statistics are used.
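As a rough sketch of the idea, the following example contrasts a statistic built on a base-table attribute with a statistic built on the same attribute over a join result. The relations R and S, their columns, and the helper names are assumptions made for illustration, and exact frequency counts stand in for the histograms a real system would use.

```python
from collections import Counter

# Hypothetical relations: R(id, a) and S(rid), where S.rid references R.id.
# S is heavily skewed toward rid = 0.
R = [{"id": i, "a": i} for i in range(100)]
S = [{"rid": 0} for _ in range(901)] + [{"rid": i} for i in range(1, 100)]


def join(r_rows, s_rows):
    """Equi-join on R.id = S.rid, returning the matching R tuple per S tuple."""
    by_id = {r["id"]: r for r in r_rows}
    return [by_id[s["rid"]] for s in s_rows if s["rid"] in by_id]


joined = join(R, S)

# Base-table statistic: distribution of R.a over R itself.
base_stat = Counter(r["a"] for r in R)
# Statistic on the query expression: distribution of R.a over R join S.
sit = Counter(r["a"] for r in joined)


def estimate_from_base(value):
    """Join cardinality times the base-table selectivity of a = value."""
    return len(joined) * base_stat[value] / len(R)


def estimate_from_sit(value):
    """Read the estimate directly from the statistic on the join result."""
    return sit[value]


print(estimate_from_base(0), estimate_from_sit(0))
```

Here the base-table statistic predicts about 10 tuples with a = 0 after the join, whereas the statistic built on the join result reports 901, which is the true count in this constructed data.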
Despite the conceptual simplicity of SITs, significant challenges need to be addressed before they can be effectively used in existing RDBMSs. First, we must show how query optimizers can be adapted to exploit SITs for choosing better execution plans. Next, we must address the problem of identifying appropriate SITs to build and maintain. The latter is a nontrivial problem since, even for moderate schema sizes, there can be too many syntactically relevant SITs.
Finally, we need to address the issue of efficiently building and maintaining SITs in a database system. (Piatetsky-Shapiro, 2000, 256)
In this paper, we take the first steps toward meeting these challenges.
While we briefly comment on the last issue, we primarily focus on the first two issues mentioned above. We explain how a traditional relational query optimizer can be modified to take advantage of ...