The Role Of Statistics In Query Optimization

by

Acknowledgement

I would like to take this opportunity to thank my research supervisor, family and friends for their support and guidance, without which this research would not have been possible.

DECLARATION

I, [type your full first names and surname here], declare that the contents of this dissertation/thesis represent my own unaided work, and that the dissertation/thesis has not previously been submitted for academic examination towards any qualification. Furthermore, it represents my own opinions and not necessarily those of the University.

Signed __________________ Date _________________

Table of Contents

ACKNOWLEDGEMENT

DECLARATION

1. INTRODUCTION

2. BACKGROUND

Review Database Designs for Standard Compliance

Database Design Tasks:

Data Model Review

Physical Database Design

Database Security Design

Architecture of Query Optimizer

Cardinality Estimation utilising Histograms

Selection Queries

Select Project Join (SPJ) Queries

3. SIT: STATISTICS ON QUERY EXPRESSIONS

Statistics on Partitioned Objects

Column Statistics and Histograms

Extended Statistics

Creating Column Group

Getting Column Group

Dropping Column Group

Gathering Statistics on Column Groups

Expression Statistics

Creating Expression Statistics

Monitoring Expression Statistics

Dropping Expression Statistics

Determining Stale Statistics

User-Defined Statistics

Setting Preferences for Manual Statistics Gathering

Comparing Statistics with DBMS_STATS Functions

System Statistics

Workload Statistics

Cardinality Estimation utilising SITs

An Illustrative Experiment

4. AUTOMATED SELECTION OF SITS

5. EXPERIMENTAL STUDY

6. RELATED WORK

7. CONCLUSIONS

REFERENCES

1. INTRODUCTION

Most query optimizers for relational database management systems (RDBMSs) rely on a cost model to choose the best possible execution plan for a given query. Thus, the quality of a query execution plan depends on the accuracy of the cost estimates. Cost estimates, in turn, depend crucially on the cardinality estimates of the various sub-plans (intermediate results) generated during optimization. (Aboulnaga, 2005, 181)
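
To illustrate why plan quality hinges on cardinality estimates, the following Python sketch compares a sequential scan with an index scan under a deliberately simplified, hypothetical cost model. The constants TABLE_PAGES, SEQ_PAGE_COST and RANDOM_PAGE_COST and all function names are assumptions made for illustration; they are not taken from any particular RDBMS. The optimizer always picks the plan with the lowest estimated cost, but the cost it actually pays depends on the true cardinality.

```python
# Hypothetical cost constants (assumptions); real optimizers use far richer models.
TABLE_PAGES = 10_000       # pages read by a full sequential scan
SEQ_PAGE_COST = 1.0        # cost of reading one page sequentially
RANDOM_PAGE_COST = 4.0     # cost of one random page fetch through the index


def seq_scan_cost(rows: float) -> float:
    """A sequential scan reads every page, independent of the row count."""
    return TABLE_PAGES * SEQ_PAGE_COST


def index_scan_cost(rows: float) -> float:
    """Assume one random page fetch per qualifying row."""
    return rows * RANDOM_PAGE_COST


PLANS = {"seq_scan": seq_scan_cost, "index_scan": index_scan_cost}


def choose_plan(estimated_rows: float) -> str:
    """Pick the plan with the lowest *estimated* cost."""
    return min(PLANS, key=lambda name: PLANS[name](estimated_rows))


def actual_cost(plan: str, true_rows: float) -> float:
    """Cost the chosen plan really incurs once the true cardinality is known."""
    return PLANS[plan](true_rows)


true_rows = 50_000
for estimate in (true_rows, 500):  # an accurate estimate, then a 100x underestimate
    plan = choose_plan(estimate)
    print(f"estimate={estimate}: chose {plan}, actual cost {actual_cost(plan, true_rows):,.0f}")
```

With an accurate estimate the sketch chooses the sequential scan (actual cost 10,000); with a 100-fold underestimate it chooses the index scan, whose actual cost (200,000) is 20 times higher.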

Traditionally, query optimizers use statistics built over base tables for cardinality estimation, and assume independence while propagating these base-table statistics through query plans (see Section 2 for a detailed discussion). However, it is widely recognized that such cardinality estimates can be off by orders of magnitude. Therefore, the traditional propagation of statistics, which assumes independence between attributes, can lead the query optimizer to choose significantly low-quality execution plans. (Gunopoulos, 2005, 137)
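
The effect of the independence assumption can be seen on a small, hypothetical table in which country is fully determined by city. The sketch below keeps per-column frequencies, as a base-table histogram would, and multiplies the selectivities of a conjunctive predicate. The table contents and function names are invented for illustration.

```python
from collections import Counter

# Hypothetical base table of (city, country) pairs; country is fully
# determined by city, so the two columns are strongly correlated.
ROWS = (
    [("Paris", "France")] * 300
    + [("Lyon", "France")] * 200
    + [("Berlin", "Germany")] * 400
    + [("Munich", "Germany")] * 100
)

# Per-column frequency statistics, kept independently of each other.
city_freq = Counter(city for city, _ in ROWS)
country_freq = Counter(country for _, country in ROWS)


def independence_estimate(city: str, country: str) -> float:
    """Estimate |city = c AND country = k| assuming the columns are independent."""
    n = len(ROWS)
    # sel(c) * sel(k) * n, rewritten as freq(c) * freq(k) / n
    return city_freq[city] * country_freq[country] / n


def true_count(city: str, country: str) -> int:
    """Exact number of rows satisfying both predicates."""
    return sum(1 for row in ROWS if row == (city, country))


for city, country in [("Paris", "France"), ("Berlin", "France")]:
    print(city, country, independence_estimate(city, country), true_count(city, country))
```

For city = 'Paris' AND country = 'France' the independence estimate is 150 rows against a true count of 300, and for the impossible combination city = 'Berlin' AND country = 'France' it predicts 200 rows where none exist.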

In this paper, we introduce the concept of SITs, which are statistics built on attributes of the result of a query expression. Thus, SITs can be used to accurately model the distribution of tuples on intermediate nodes in a query execution plan. We will show that in some cases, when optimizers have appropriate SITs available during query optimization, the resulting query plans are drastically improved, and their execution times are tens, and even hundreds, of times faster than those of plans produced when only base-table statistics are used.
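
The difference between propagating base-table statistics and using a SIT can be sketched on two hypothetical tables R(a, k) and S(k) joined on k. The base-table frequency of R.a is scaled by the join size, which implicitly assumes that the join leaves the distribution of R.a unchanged; the SIT instead records the frequency of R.a over the join result itself. For simplicity, the SIT here is an exact frequency table over a fully materialised join; in practice a SIT would be an approximate histogram, so its estimates would not be exact. Table contents and function names are assumptions for illustration.

```python
from collections import Counter

# Hypothetical tables: R(a, k) and S(k), joined on k.
R = [("x", 1)] * 50 + [("y", 2)] * 50
S = [1] * 90 + [2] * 10

# Materialise R JOIN S ON R.k = S.k (a nested loop is fine for a toy example).
join_result = [(a, k) for (a, k) in R for s_k in S if k == s_k]

# Base-table statistic: frequency of R.a over R only.
base_hist = Counter(a for a, _ in R)
# SIT: frequency of R.a over the result of the join expression.
sit_hist = Counter(a for a, _ in join_result)


def estimate_with_base_stats(value: str) -> float:
    """Propagate R's selectivity through the join, even granting an exact join size."""
    return base_hist[value] * len(join_result) / len(R)


def estimate_with_sit(value: str) -> int:
    """Read the estimate directly from the statistic built on the join result."""
    return sit_hist[value]


for value in ("x", "y"):
    print(value, estimate_with_base_stats(value), estimate_with_sit(value))
```

Even when the join size (5,000 rows) is known exactly, the base-table estimate for a = 'x' is 2,500 rows, whereas the SIT reports the true 4,500, because most matching S tuples join with the R tuples where a = 'x'.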

Despite the conceptual simplicity of SITs, significant challenges need to be addressed before they can be used effectively in existing RDBMSs. First, we must show how query optimizers can be adapted to exploit SITs when choosing execution plans. Next, we must address the problem of identifying the appropriate SITs to build and maintain. The latter is a nontrivial problem since, even for moderately sized schemas, there can be too many syntactically relevant SITs.
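
To give a sense of how quickly the number of candidate SITs grows, the sketch below counts them under a simplified, hypothetical model: n tables with m attributes each, a join graph in which every pair of tables is joinable, and one candidate SIT for every attribute of every table participating in every join expression over two or more tables. Selection predicates and multi-attribute SITs, which would increase the count further, are ignored. The enumeration is checked against the closed form m * (n * 2^(n-1) - n), which follows from the identity sum over k of k * C(n, k) = n * 2^(n-1).

```python
from itertools import combinations


def count_candidate_sits(num_tables: int, attrs_per_table: int) -> int:
    """Enumerate (attribute, join expression) pairs for a fully connected join graph.

    Hypothetical model: every subset of >= 2 tables is a valid join expression,
    and a SIT may be built on any attribute of any table in that subset.
    """
    total = 0
    for k in range(2, num_tables + 1):
        for subset in combinations(range(num_tables), k):
            total += len(subset) * attrs_per_table
    return total


def closed_form(n: int, m: int) -> int:
    """m * sum_{k=2}^{n} k * C(n, k) = m * (n * 2**(n-1) - n)."""
    return m * (n * 2 ** (n - 1) - n)


for n in (3, 5, 10, 15):
    print(n, count_candidate_sits(n, 10), closed_form(n, 10))
```

Under this model, with 10 attributes per table, 5 tables already yield 750 candidate SITs, 10 tables yield 51,100 and 15 tables about 2.5 million.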

Finally, we need to address the issue of efficiently building and maintaining SITs in a database system. (Piatetsky-Shapiro, 2000, 256)

In this paper, we take the first steps towards addressing these challenges.

While we briefly comment on the last issue, we primarily focus on the first two issues mentioned above. We explain how a traditional relational query optimizer can be modified to take advantage of ...
