MySQL HeatWave - Speed up your OLAP tasks
A few months ago, Oracle published the new MySQL HeatWave feature for the Oracle Cloud Infrastructure (OCI), which promised to speed up analysis and transactional queries on your MySQL database without big effort.
By using so called "HeatWave-Engines" (or "RAPID-Engines") in addition to the original MySQL instance, customers will be able to use OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) workloads directly in one MySQL database.
To benefit from this feature no single SQL needs to be adjusted to optimize the performance.
alter table lineitem secondary_engine=rapid;
alter table lineitem secondary_load;
As soon as the secondary engine for a table is enabled, the data will be locked to avoid any DML statements.
The above listed example which loads table data into the secondary engine is simplified. If the data model is more complex, there are multiple steps which should or must be checked in advanced e.g., you need to check if enough memory is available in the HeatWave cluster to load the table data.
There are also some restrictions regarding the table definitions – at the end, your data and data model needs to be compatible with the HeatWave plugin.
If you do not want to think about loading all used tables into the secondary engine on your own, you can use provided system calls by oracle (https://dev.mysql.com/doc/heatwave/en/auto-parallel-load.html):
sys.heatwave_load (db_list,[options]);
I will go through such restrictions in a future article. For now, I will simply show you the benefit of HeatWave by using a compatible data model.
TPC-H Benchmark
To prepare the first example I will use the 3GB database called tpc_3 and the third query of the TPC-H benchmark. Therefore, I will load all database tables into the HeatWave engine:
MySQL 10.0.0.125:3306 ssl tpch_3 SQL > SELECT NAME, LOAD_STATUS from performance_schema.rpd_tables,performance_schema.rpd_table_id where rpd_tables.ID = rpd_table_id.ID; Empty set (0.0010 sec) SET SQL_SAFE_UPDATES = 0; MySQL 10.0.0.125:3306 ssl tpch_3 SQL > CALL sys.heatwave_load(JSON_ARRAY("tpch_3"),NULL); +------------------------------------------+ | INITIALIZING HEATWAVE AUTO PARALLEL LOAD | +------------------------------------------+ | Version: 1.22 | | | | Load Mode: normal | | Load Policy: disable_unsupported_columns | | Output Mode: normal | | | +------------------------------------------+ 6 rows in set (0.0065 sec) +------------------------------------------------------------------------+ | OFFLOAD ANALYSIS | +------------------------------------------------------------------------+ | Verifying input schemas: 1 | | User excluded items: 0 | | | | SCHEMA OFFLOADABLE OFFLOADABLE SUMMARY OF | | NAME TABLES COLUMNS ISSUES | | ------ ----------- ----------- ---------- | | `tpch_3` 8 61 | | | | Total offloadable schemas: 1 | | | +------------------------------------------------------------------------+ 10 rows in set (0.0065 sec) +-----------------------------------------------------------------------------------------------------------------------------+ | CAPACITY ESTIMATION | +-----------------------------------------------------------------------------------------------------------------------------+ | Default load pool for tables: TRANSACTIONAL | | Default encoding for string columns: VARLEN (unless specified in the schema) | | Estimating memory footprint for 1 schema(s) | | | | TOTAL ESTIMATED ESTIMATED TOTAL DICTIONARY VARLEN ESTIMATED | | SCHEMA OFFLOADABLE HEATWAVE NODE MYSQL NODE STRING ENCODED ENCODED LOAD | | NAME TABLES FOOTPRINT FOOTPRINT COLUMNS COLUMNS COLUMNS TIME | | ------ ----------- --------- --------- ------- ---------- ------- --------- | | `tpch_3` 8 5.78 GiB 32.00 MiB 29 0 29 20.00 s | | | | Sufficient MySQL host memory available to load all tables. | | Sufficient HeatWave cluster memory available to load all tables. | | | +-----------------------------------------------------------------------------------------------------------------------------+ 13 rows in set (0.0065 sec) +-------------------------------------------------------------------------------+ | LOAD SUMMARY | +-------------------------------------------------------------------------------+ | | | SCHEMA TABLES TABLES COLUMNS LOAD | | NAME LOADED FAILED LOADED DURATION | | ------ ------ ------ ------- -------- | | `tpch_3` 8 0 61 6.73 s | | | +-------------------------------------------------------------------------------+
Now the whole database is loaded into the HeatWave engine.
MySQL 10.0.0.125:3306 ssl tpch_3 SQL > SELECT NAME, LOAD_STATUS from performance_schema.rpd_tables,performance_schema.rpd_table_id where rpd_tables.ID = rpd_table_id.ID; +-----------------+---------------------+ | NAME | LOAD_STATUS | +-----------------+---------------------+ | tpch_3.orders | AVAIL_RPDGSTABSTATE | | tpch_3.lineitem | AVAIL_RPDGSTABSTATE | | tpch_3.nation | AVAIL_RPDGSTABSTATE | | tpch_3.region | AVAIL_RPDGSTABSTATE | | tpch_3.customer | AVAIL_RPDGSTABSTATE | | tpch_3.part | AVAIL_RPDGSTABSTATE | | tpch_3.supplier | AVAIL_RPDGSTABSTATE | | tpch_3.partsupp | AVAIL_RPDGSTABSTATE | +-----------------+---------------------+
To compare the performance with and without HeatWave engine I enabled profiling and run the third query:
MySQL 10.0.0.125:3306 ssl SQL > use tpch_3 MySQL 10.0.0.125:3306 ssl tpch_3 SQL > set @@use_secondary_engine=off; SET @@profiling = 0; SET @@profiling_history_size = 0; SET @@profiling = 1; SET @@profiling_history_size = 100; SELECT l_orderkey, sum(l_extendedprice * (1 - l_discount)) as revenue, o_orderdate, o_shippriority FROM customer, orders, lineitem WHERE c_mktsegment = 'BUILDING' AND c_custkey = o_custkey AND l_orderkey = o_orderkey AND o_orderdate < date '1995-03-15' AND l_shipdate > date '1995-03-15' GROUP BY l_orderkey, o_orderdate, o_shippriority ORDER BY revenue desc, o_orderdate LIMIT 20; show profiles;
→ | 4 | 7.0137665
The query was finished without HeatWave engine in 7 seconds. Now let us run the same query after enabling the secondary engine:
set @@use_secondary_engine=on; SET @@profiling = 0; SET @@profiling_history_size = 0; SET @@profiling = 1; SET @@profiling_history_size = 100; <Query 3> show profiles;
→ | 15 | 0.7033645
By enabling the secondary engine, the query was finished within 0,7 seconds. By looking at the execution plan you can see a hint that RAPID as secondary engine was used:Using where; Using temporary; Using filesort; Using secondary engine RAPID
To increase the complexity of my tests a little bit, I will now compare all TPC-H queries and run all of them on the different database sizes. This led to the following result:
The difference becomes particularly clear by looking on query 9: The database needs 114 seconds to run the query without a secondary engine and only 0,26 seconds when using RAPID.
Keep in mind that my biggest database with 38 GB is not big. All data can simply be loaded into the database buffer which already speeds up the performance without using a secondary engine.
The difference will become bigger and bigger by increasing the size of the used data.
DB Größe (in GB) | Query | InnoDB (in s) | RAPID (in s) | InnoDB/RAPID |
3 | 3 | 7,12 | 0,059 | 120-fache |
9 | 17,407 | 0,073 | 238-fache | |
9 | 3 | 50,98 | 0,107 | 476-fache |
9 | 364,87 | 0,825 | 442-fache | |
18 | 3 | 65,03 | 0,197 | 330-fache |
9 | 684,85 | 4,548 | 150-fache | |
36 | 3 | 250,12 | 0,384 | 651-fache |
9 | - | - | - |
Just to point it out: by linearly increasing the amount of data (9 – 18 – 36 GB) the runtime of the RAPID engine is also increasing linearly (0,107 – 0,197 – 0,384 seconds) while the consumption of time used by the InnoDB engine is starting to increase exponential (50,98 – 65,03 – 250,12 seconds).
By using the HeatWave feature you will be able to increase the performance of typical OLTP/OLTP queries based on big data in comparison to the classic InnoDB engine. For sure it is a matter of database configuration (e.g., size of the database buffer) by which database size the advantage will be bigger than the effort to enable a secondary engine.
The benefit of using the HeatWave engine is obvious: if you know that OLTP/OLAP queries will run against your database on a regular basis, you can simply enable a secondary engine during that time window and speed up the performance. Afterwards, you can go ahead and disable the secondary engine again, so that all locks on the tables will be removed as a consequence.
For databases which are already running in the OCI it will not increase the costs significant, because you need the additional HeatWave resources temporary.
Unfortunately, the fact that the HeatWave engine is currently only available in the OCI, the costs/efforts for customers to put their database into the OCI initially is for sure bigger. There are several options to use the HeatWave feature for En-Premise databases (e.g., migration, replication ..).
If you are already running your DWH MySQL database in the OCI, you should give HeatWave a try.
Bei Updates im Blog, informieren wir per E-Mail.
Kommentare