Unser Newsletter rund um technische Themen,
das Unternehmen und eine Karriere bei uns.

7 Minuten Lesezeit (1436 Worte)

MySQL HeatWave - Speed up your OLAP tasks

A few months ago, Oracle published the new MySQL HeatWave feature for the Oracle Cloud Infrastructure (OCI), which promised to speed up analysis and transactional queries on your MySQL database without big effort.

By using so called "HeatWave-Engines" (or "RAPID-Engines") in addition to the original MySQL instance, customers will be able to use OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) workloads directly in one MySQL database.

To benefit from this feature no single SQL needs to be adjusted to optimize the performance.

https://dev.mysql.com/doc/heatwave/en/heatwave-introduction.html
The database system will be expanded by a HeatWave plugin which is responsible to receive the original SQL queries and proceed the statements together with a HeatWave cluster.

Beside the better performance, the customer will not feel any disadvantage.

To make the HeatWave plugin work, the tables used by the SQL query need to be enhanced by secondary engine: The HeatWave plugin. Afterwards the data needs to be loaded into this second engine:

alter table lineitem secondary_engine=rapid;

alter table lineitem secondary_load;

As soon as the secondary engine for a table is enabled, the data will be locked to avoid any DML statements.

The above listed example which loads table data into the secondary engine is simplified. If the data model is more complex, there are multiple steps which should or must be checked in advanced e.g., you need to check if enough memory is available in the HeatWave cluster to load the table data.

There are also some restrictions regarding the table definitions – at the end, your data and data model needs to be compatible with the HeatWave plugin.

If you do not want to think about loading all used tables into the secondary engine on your own, you can use provided system calls by oracle (https://dev.mysql.com/doc/heatwave/en/auto-parallel-load.html):

sys.heatwave_load (db_list,[options]);

I will go through such restrictions in a future article. For now, I will simply show you the benefit of HeatWave by using a compatible data model.

TPC-H Benchmark 

To create a comparable basis, I used the predefined queries from the TPC-H benchmarks (http://www.tpc.org/tpch/). With these data model I created multiple databases (3, 9, 18 and 36 GB). The TPC-H queries will be executed and summarized in the following examples with and without using the HeatWave engine.

To prepare the first example I will use the 3GB database called tpc_3 and the third query of the TPC-H benchmark. Therefore, I will load all database tables into the HeatWave engine:

MySQL  10.0.0.125:3306 ssl  tpch_3  SQL > SELECT NAME, LOAD_STATUS from performance_schema.rpd_tables,performance_schema.rpd_table_id where rpd_tables.ID = rpd_table_id.ID;
Empty set (0.0010 sec)
SET SQL_SAFE_UPDATES = 0;
MySQL  10.0.0.125:3306 ssl  tpch_3  SQL > CALL sys.heatwave_load(JSON_ARRAY("tpch_3"),NULL);
+------------------------------------------+
| INITIALIZING HEATWAVE AUTO PARALLEL LOAD |
+------------------------------------------+
| Version: 1.22                            |
|                                          |
| Load Mode: normal                        |
| Load Policy: disable_unsupported_columns |
| Output Mode: normal                      |
|                                          |
+------------------------------------------+
6 rows in set (0.0065 sec)

+------------------------------------------------------------------------+
| OFFLOAD ANALYSIS                                                       |
+------------------------------------------------------------------------+
| Verifying input schemas: 1                                             |
| User excluded items: 0                                                 |
|                                                                        |
| SCHEMA                       OFFLOADABLE    OFFLOADABLE     SUMMARY OF |
| NAME                              TABLES        COLUMNS     ISSUES     |
| ------                       -----------    -----------     ---------- |
| `tpch_3`                               8             61                |
|                                                                        |
| Total offloadable schemas: 1                                           |
|                                                                        |
+------------------------------------------------------------------------+
10 rows in set (0.0065 sec)

+-----------------------------------------------------------------------------------------------------------------------------+
| CAPACITY ESTIMATION                                                                                                         |
+-----------------------------------------------------------------------------------------------------------------------------+
| Default load pool for tables: TRANSACTIONAL                                                                                 |
| Default encoding for string columns: VARLEN (unless specified in the schema)                                                |
| Estimating memory footprint for 1 schema(s)                                                                                 |
|                                                                                                                             |
|                                TOTAL       ESTIMATED       ESTIMATED       TOTAL     DICTIONARY      VARLEN       ESTIMATED |
| SCHEMA                   OFFLOADABLE   HEATWAVE NODE      MYSQL NODE      STRING        ENCODED     ENCODED            LOAD |
| NAME                          TABLES       FOOTPRINT       FOOTPRINT     COLUMNS        COLUMNS     COLUMNS            TIME |
| ------                   -----------       ---------       ---------     -------     ----------     -------       --------- |
| `tpch_3`                           8        5.78 GiB       32.00 MiB          29              0          29         20.00 s |
|                                                                                                                             |
| Sufficient MySQL host memory available to load all tables.                                                                  |
| Sufficient HeatWave cluster memory available to load all tables.                                                            |
|                                                                                                                             |
+-----------------------------------------------------------------------------------------------------------------------------+
13 rows in set (0.0065 sec)
+-------------------------------------------------------------------------------+
| LOAD SUMMARY                                                                  |
+-------------------------------------------------------------------------------+
|                                                                               |
| SCHEMA                          TABLES       TABLES      COLUMNS         LOAD |
| NAME                            LOADED       FAILED       LOADED     DURATION |
| ------                          ------       ------      -------     -------- |
| `tpch_3`                             8            0           61       6.73 s |
|                                                                               |
+-------------------------------------------------------------------------------+
 

Now the whole database is loaded into the HeatWave engine.

MySQL  10.0.0.125:3306 ssl  tpch_3  SQL > SELECT NAME, LOAD_STATUS from performance_schema.rpd_tables,performance_schema.rpd_table_id where rpd_tables.ID = rpd_table_id.ID;
+-----------------+---------------------+
| NAME            | LOAD_STATUS         |
+-----------------+---------------------+
| tpch_3.orders   | AVAIL_RPDGSTABSTATE |
| tpch_3.lineitem | AVAIL_RPDGSTABSTATE |
| tpch_3.nation   | AVAIL_RPDGSTABSTATE |
| tpch_3.region   | AVAIL_RPDGSTABSTATE |
| tpch_3.customer | AVAIL_RPDGSTABSTATE |
| tpch_3.part     | AVAIL_RPDGSTABSTATE |
| tpch_3.supplier | AVAIL_RPDGSTABSTATE |
| tpch_3.partsupp | AVAIL_RPDGSTABSTATE |
+-----------------+---------------------+
 

To compare the performance with and without HeatWave engine I enabled profiling and run the third query:

MySQL  10.0.0.125:3306 ssl  SQL > use tpch_3
MySQL  10.0.0.125:3306 ssl  tpch_3  SQL > set @@use_secondary_engine=off;
SET @@profiling = 0;
SET @@profiling_history_size = 0;
SET @@profiling = 1;
SET @@profiling_history_size = 100;
SELECT
    l_orderkey,
    sum(l_extendedprice * (1 - l_discount)) as revenue,
    o_orderdate,
    o_shippriority
FROM
    customer,
    orders,
    lineitem
WHERE
    c_mktsegment = 'BUILDING'
    AND c_custkey = o_custkey
    AND l_orderkey = o_orderkey
    AND o_orderdate < date '1995-03-15'
    AND l_shipdate > date '1995-03-15'
GROUP BY
    l_orderkey,
    o_orderdate,
    o_shippriority
ORDER BY
    revenue desc,
    o_orderdate
LIMIT 20;

show profiles;
 

 → | 4 | 7.0137665

The query was finished without HeatWave engine in 7 seconds. Now let us run the same query after enabling the secondary engine:

set @@use_secondary_engine=on;
SET @@profiling = 0;
SET @@profiling_history_size = 0;
SET @@profiling = 1;
SET @@profiling_history_size = 100;
<Query 3>
show profiles;
 

 → | 15 | 0.7033645

By enabling the secondary engine, the query was finished within 0,7 seconds. By looking at the execution plan you can see a hint that RAPID as secondary engine was used:

Using where; Using temporary; Using filesort; Using secondary engine RAPID

To increase the complexity of my tests a little bit, I will now compare all TPC-H queries and run all of them on the different database sizes. This led to the following result:

The difference becomes particularly clear by looking on query 9: The database needs 114 seconds to run the query without a secondary engine and only 0,26 seconds when using RAPID.

Keep in mind that my biggest database with 38 GB is not big. All data can simply be loaded into the database buffer which already speeds up the performance without using a secondary engine.

The difference will become bigger and bigger by increasing the size of the used data.

DB Größe (in GB) Query InnoDB (in s) RAPID (in s)InnoDB/RAPID
3 3 7,12 0,059120-fache
917,4070,073238-fache
9350,980,107476-fache
9364,870,825442-fache
18365,030,197330-fache
9684,854,548150-fache
363250,120,384651-fache
9---

Just to point it out: by linearly increasing the amount of data (9 – 18 – 36 GB) the runtime of the RAPID engine is also increasing linearly (0,107 – 0,197 – 0,384 seconds) while the consumption of time used by the InnoDB engine is starting to increase exponential (50,98 – 65,03 – 250,12 seconds).

By using the HeatWave feature you will be able to increase the performance of typical OLTP/OLTP queries based on big data in comparison to the classic InnoDB engine. For sure it is a matter of database configuration (e.g., size of the database buffer) by which database size the advantage will be bigger than the effort to enable a secondary engine.

The benefit of using the HeatWave engine is obvious: if you know that OLTP/OLAP queries will run against your database on a regular basis, you can simply enable a secondary engine during that time window and speed up the performance. Afterwards, you can go ahead and disable the secondary engine again, so that all locks on the tables will be removed as a consequence.

For databases which are already running in the OCI it will not increase the costs significant, because you need the additional HeatWave resources temporary.

Unfortunately, the fact that the HeatWave engine is currently only available in the OCI, the costs/efforts for customers to put their database into the OCI initially is for sure bigger. There are several options to use the HeatWave feature for En-Premise databases (e.g., migration, replication ..).

If you are already running your DWH MySQL database in the OCI, you should give HeatWave a try.

hat noch keine Informationen über sich angegeben
 

Kommentare

Derzeit gibt es keine Kommentare. Schreibe den ersten Kommentar!
Sonntag, 05. Januar 2025

Sicherheitscode (Captcha)

×
Informiert bleiben!

Bei Updates im Blog, informieren wir per E-Mail.

Weitere Artikel in der Kategorie