1、直方图介绍
你可以为列统计直方图。这些直方图为列数据的分布情况提供准确的估算。当列数据倾斜时,直方图提供更为优化的选择性估算,为数据分布不均匀的情况提供最优的执行计划。
Oracle Database为提供2种类别的列统计信息直方图:
- Height-Balanced Histograms
- Frequency Histograms
数据库存储直方图信息*TAB_COL_STATISTICS视图(用户和DBA)。列值范围:HEIGHTBALANCED, FREQUENCY, or NONE。
2、Height-Balanced Histograms
在height-balanced histogram中,列值被划分为桶,使得每个桶包含大致相同数量的行。直方图显示端点在值范围内的位置。
考虑一个my_col值为1到100之间的列,以及一个10个桶的直方图。如果数据my_col均匀分布,则直方图看起来与图13-1类似,其中数字是端点值。例如,第七个桶具有值在60到70之间的行。
图13-1具有均匀分布的高度平衡直方图
每个桶中的行数为总行数的10%。在这个均匀分布的例子中,40%的行的值在60到100之间。
如果数据不均匀分布,则直方图可能如图13-2所示。在这种情况下,大多数行的列的值为5。只有10%的行的值在60到100之间。
图13-2具有非均匀分布的高度平衡直方图
您可以使用USER_TAB_HISTOGRAMS表格查看高度平衡的直方图,如示例13-1所示。
- BEGIN
-
DBMS_STATS
.
GATHER_table_STATS
(
-
OWNNAME
=
>
'OE'
,
-
TABNAME
=
>
'INVENTORIES'
,
-
METHOD_OPT
=
>
'FOR COLUMNS SIZE 10 quantity_on_hand'
)
; -
END
; - /
-
SELECT
COLUMN_NAME
,
NUM_DISTINCT
,
NUM_BUCKETS
,
HISTOGRAM -
FROM
USER_TAB_COL_STATISTICS -
WHERE
TABLE_NAME
=
'INVENTORIES'
AND
COLUMN_NAME
=
'QUANTITY_ON_HAND'
; - COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM
- —————————— ———— ———– —————
- QUANTITY_ON_HAND 237 10 HEIGHT BALANCED
-
SELECT
ENDPOINT_NUMBER
,
ENDPOINT_VALUE -
FROM
USER_TAB_HISTOGRAMS -
WHERE
TABLE_NAME
=
'INVENTORIES'
AND
COLUMN_NAME
=
'QUANTITY_ON_HAND' -
ORDER
BY
ENDPOINT_NUMBER
; - ENDPOINT_NUMBER ENDPOINT_VALUE
- ————— ————–
- 0 0
- 1 27
- 2 42
- 3 57
- 4 74
- 5 98
- 6 123
- 7 149
- 8 175
- 9 202
- 10 353
在示例13-1查询输出中,一行(1-10)对应于直方图中的每个桶。Oracle数据库向该直方图添加了特殊的第0个数据桶,因为第1个数据桶(27)中的值不是quantity_on_hand列的最小值。第0个桶的最小值为0 quantity_on_hand。
3、
frequency histogra
m
在
frequency histogram中,列的每个值对应于直方图的单个桶。每个桶包含此单个值的出现次数。例如,假设36行包含列的值1 warehouse_id。端点值1具有端点号36。
数据库在以下条件下自动创建频率直方图,而不是高度平衡的直方图:
- 不同值的数量小于或等于指定的直方图桶数(最多254个)。
- 每个列值重复一次。
您可以使用USER_TAB_HISTOGRAMS视图查看频率直方图,如示例13-2所示。
- BEGIN
-
DBMS_STATS
.
GATHER_TABLE_STATS
(
-
OWNNAME
=
>
'OE'
,
-
TABNAME
=
>
'INVENTORIES'
,
-
METHOD_OPT
=
>
'FOR COLUMNS SIZE 20 warehouse_id'
)
; -
END
; - /
-
SELECT
COLUMN_NAME
,
NUM_DISTINCT
,
NUM_BUCKETS
,
HISTOGRAM -
FROM
USER_TAB_COL_STATISTICS -
WHERE
TABLE_NAME
=
'INVENTORIES'
AND
COLUMN_NAME
=
'WAREHOUSE_ID'
; - COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM
- —————————— ———— ———– —————
- WAREHOUSE_ID 9 9 FREQUENCY
-
SELECT
ENDPOINT_NUMBER
,
ENDPOINT_VALUE -
FROM
USER_TAB_HISTOGRAMS -
WHERE
TABLE_NAME
=
'INVENTORIES'
AND
COLUMN_NAME
=
'WAREHOUSE_ID' -
ORDER
BY
ENDPOINT_NUMBER
; - ENDPOINT_NUMBER ENDPOINT_VALUE
- ————— ————–
- 36 1
- 213 2
- 261 3
- 370 4
- 484 5
- 692 6
- 798 7
- 984 8
- 1112 9
在例13-2中,第一个桶为warehouse_id1。该值在表中显示36次,如以下查询所证实:
oe@PROD> SELECT COUNT(*) FROM inventories WHERE warehouse_id = 1;
COUNT(*)
———-
36
5、
练习4
、直方图优化练习
统计已销户用户数量,请优化以下语句
select count(1) from ht.c_cons where status='close';
-
SQL
>
select
status
,
count
(
1
)
from
ht
.
c_cons
group
by
status
; -
STATUS
COUNT
(
1
) - ———————————————————— ———-
- close 19
- open 9519
- creating 462
-
SQL
>
create
index
ht
.
idx_c_cons_status
on
ht
.
c_cons
(
status
)
; -
SQL
>
col
owner
for
a10 -
col
table_name
for
a20 -
col
column_name
for
a20 -
col
data_type
for
a30 -
col
histogram
for
a20 -
select
owner
,
table_name
,
column_name
,
data_type
, -
column_id
,
num_distinct
,
histogram
,
NUM_NULLS
,
LAST_ANALYZED
from
-
dba_tab_columns
where
table_name
=
'C_CONS'
and
owner
=
'HT' -
order
by
column_id
;
SQL
>
SQL
>
SQL
>
SQL
>
SQL
>
2 3 4 - OWNER TABLE_NAME COLUMN_NAME DATA_TYPE COLUMN_ID NUM_DISTINCT HISTOGRAM NUM_NULLS LAST_ANALYZED
- ———- ——————– ——————– —————————— ———- ———— ——————– ———- ——————————
-
HT C_CONS CONS_NO
NUMBER
1 10000 NONE 0 20
–
AUG
–
17 -
HT C_CONS CONS_NAME
VARCHAR2
2 5057 NONE 0 20
–
AUG
–
17 -
HT C_CONS ORG_NAME
VARCHAR2
3 12 NONE 0 20
–
AUG
–
17 -
HT C_CONS BUILD_DATE
DATE
4 10000 NONE 0 20
–
AUG
–
17 -
HT C_CONS STATUS
VARCHAR2
5 3 NONE 0 20
–
AUG
–
17 -
SQL
>
exec DBMS_STATS
.
GATHER_TABLE_STATS
(
ownname
=
>
'HT'
,
tabname
=
>
'C_CONS'
,
estimate_percent
=
>
30
,
method_opt
=
>
'for columns size 50 status'
,
no_invalidate
=
>
FALSE
,
degree
=
>
4
,
cascade
=
>
TRUE
)
; -
PL
/
SQL procedure successfully completed
. -
SQL
>
col
owner
for
a10 -
SQL
>
col
table_name
for
a20 -
col
column_name
for
a20 -
col
data_type
for
a30 -
col
histogram
for
a20 -
select
owner
,
table_name
,
column_name
,
data_type
, -
column_id
,
num_distinct
,
histogram
,
NUM_NULLS
,
LAST_ANALYZED
from
-
dba_tab_columns
where
table_name
=
'C_CONS'
and
owner
=
'HT' -
order
by
column_id
;
SQL
>
SQL
>
SQL
>
SQL
>
2 3 4 - OWNER TABLE_NAME COLUMN_NAME DATA_TYPE COLUMN_ID NUM_DISTINCT HISTOGRAM NUM_NULLS LAST_ANALYZED
- ———- ——————– ——————– —————————— ———- ———— ——————– ———- ——————————
-
HT C_CONS CONS_NO
NUMBER
1 10000 NONE 0 20
–
AUG
–
17 -
HT C_CONS CONS_NAME
VARCHAR2
2 5057 NONE 0 20
–
AUG
–
17 -
HT C_CONS ORG_NAME
VARCHAR2
3 12 NONE 0 20
–
AUG
–
17 -
HT C_CONS BUILD_DATE
DATE
4 10000 NONE 0 20
–
AUG
–
17 -
HT C_CONS STATUS
VARCHAR2
5 3 FREQUENCY 0 20
–
AUG
–
17 -
SQL
>
select
count
(
1
)
from
ht
.
c_cons
where
status
=
'open'
; - Execution Plan
- ———————————————————-
-
Plan hash
value
:
2016425671 - ——————————————————————————————-
-
|
Id
|
Operation
|
Name
|
Rows
|
Bytes
|
Cost
(
%
CPU
)
|
Time
| - ——————————————————————————————-
-
|
0
|
SELECT
STATEMENT
|
|
1
|
6
|
8
(
0
)
|
00
:
00
:
01
| -
|
1
|
SORT AGGREGATE
|
|
1
|
6
|
|
| -
|
*
2
|
INDEX
FAST FULL SCAN
|
IDX_C_CONS_STATUS
|
9639
|
57834
|
8
(
0
)
|
00
:
00
:
01
| - ——————————————————————————————-
-
Predicate Information
(
identified
by
operation id
)
: - —————————————————
-
2
–
filter
(
"STATUS"
=
'open'
) - Statistics
- ———————————————————-
- 1 recursive calls
- 0 db block gets
- 28 consistent gets
- 0 physical reads
-
0 redo
size -
527 bytes sent via SQL
*
Net
to
client -
523 bytes received via SQL
*
Net
from
client -
2 SQL
*
Net roundtrips
to
/
from
client -
0 sorts
(
memory
) -
0 sorts
(
disk
) -
1
rows
processed -
SQL
>
-
SQL
>
select
count
(
1
)
from
ht
.
c_cons
where
status
=
'close'
; - Execution Plan
- ———————————————————-
-
Plan hash
value
:
2292286995 - —————————————————————————————
-
|
Id
|
Operation
|
Name
|
Rows
|
Bytes
|
Cost
(
%
CPU
)
|
Time
| - —————————————————————————————
-
|
0
|
SELECT
STATEMENT
|
|
1
|
6
|
1
(
0
)
|
00
:
00
:
01
| -
|
1
|
SORT AGGREGATE
|
|
1
|
6
|
|
| -
|
*
2
|
INDEX
RANGE SCAN
|
IDX_C_CONS_STATUS
|
24
|
144
|
1
(
0
)
|
00
:
00
:
01
| - —————————————————————————————
-
Predicate Information
(
identified
by
operation id
)
: - —————————————————
-
2
–
access
(
"STATUS"
=
'close'
) - Statistics
- ———————————————————-
- 1 recursive calls
- 0 db block gets
- 2 consistent gets
- 0 physical reads
-
0 redo
size -
526 bytes sent via SQL
*
Net
to
client -
523 bytes received via SQL
*
Net
from
client -
2 SQL
*
Net roundtrips
to
/
from
client -
0 sorts
(
memory
) -
0 sorts
(
disk
) -
1
rows
processed