Online Solutions with MySQL: 004. MySQL Cluster

Erik Hoekstra, from Daisycon, has pointed out a problem that affects replication in general, with a specific example involving MySQL Cluster and Replication in 5.1.

In the manual for 5.1 there's an entry about scripting the failover for MySQL Cluster Replication.

In this part of the manual they speak about fetching the needed variables, like the filename and position, and placing them into the CHANGE MASTER TO statement.

The example is here:

CHANGE MASTER TO
MASTER_LOG_FILE='@file',
MASTER_LOG_POS=@pos;

I'm now trying to do the following:
On a slave I've created two Federated tables, one pointing to the current master and one to the stand-in master, should the current master fail.

Federated table 1, let's say F1, is just a simple table on the current master with only one record.

Federated table 2, let's say F2, is the ndb_binlog_index table in the mysql database on the stand-in master.

I wrote a procedure which checks if F1 exists and, if not (current master down), selects the needed vars from F2. I would like to place these variables into a CHANGE MASTER TO statement so the slave can start and pick up at the right epoch, but it seems that the CHANGE MASTER TO statement doesn't accept variables.

DELIMITER //

CREATE PROCEDURE check_master()

BEGIN
  DECLARE master_dead BOOLEAN DEFAULT false;
  DECLARE latest BIGINT(20) DEFAULT 0;
  DECLARE fil VARCHAR(255);
  DECLARE pos BIGINT(20) DEFAULT 0;

  DECLARE pointer CURSOR FOR
  SELECT `id` FROM `replication`.`master_alive`;
  DECLARE CONTINUE HANDLER FOR
          SQLSTATE 'HY000' SET master_dead = true;

  OPEN pointer;
  IF( master_dead ) THEN

    STOP SLAVE;

    SELECT @latest:=MAX(`epoch`)
      FROM `mysql`.`ndb_apply_status`;

    SELECT @file:=SUBSTRING_INDEX(File, '/', -1), @pos:=Position
      FROM `replication`.`new_master_binlog_index`
     WHERE `epoch` > @latest
     ORDER BY `epoch` ASC LIMIT 1;

    CHANGE MASTER TO
           MASTER_HOST='192.168.0.1',
           MASTER_PORT=3306,
           MASTER_USER='replica',
           MASTER_PASSWORD='mypasswd',
           MASTER_LOG_FILE='@file',
           MASTER_LOG_POS=@pos;
    START SLAVE;
    ALTER EVENT `check_master_alive` DISABLE;
  END IF;
  CLOSE pointer;
END;
//
DELIMITER ;

CREATE EVENT check_master_alive
    ON SCHEDULE EVERY 5 SECOND
       ENABLE DO CALL check_master();

This is the error message:
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'CHANGE MASTER TO MASTER_HOST='10.0.33.104', MASTER_PORT=3306, MASTER_USER='repli' at line 15

I also tried the procedure with SELECT .. INTO but it doesn't seem to work at all.

I have submitted a bug - #28270 - and our tech support is working on it.

In the meantime, Erik has found a workaround to the problem, so he can implement the script. The workaround is based on the PREPARE/EXECUTE statement:

DELIMITER //

CREATE PROCEDURE check_master()
BEGIN
  DECLARE master_dead BOOLEAN DEFAULT false;

  -- Cursor on the Federated table that points to the current master.
  DECLARE pointer CURSOR FOR
  SELECT `ip` FROM `replication`.`master_alive`;

  -- If the Federated table cannot be read, flag the master as dead.
  DECLARE CONTINUE HANDLER FOR
          SQLSTATE 'HY000' SET master_dead = true;

  OPEN pointer;
  IF( master_dead ) THEN

    STOP SLAVE;

    -- Latest epoch applied on this slave.
    SELECT @latest:=MAX(`epoch`)
      FROM `mysql`.`ndb_apply_status`;

    -- First binlog file and position on the stand-in master after that epoch.
    SELECT @file:=SUBSTRING_INDEX(File, '/', -1), @pos:=Position
      FROM `replication`.`new_master_binlog_index`
     WHERE `epoch` > @latest
     ORDER BY `epoch` ASC LIMIT 1;

    -- Build the statement as a string, so that the values can be concatenated in.
    SET @new_master_command = CONCAT( 'CHANGE MASTER TO
                 MASTER_HOST=\'192.168.0.1\',
                 MASTER_PORT=3306,
                 MASTER_USER=\'replication_user\',
                 MASTER_PASSWORD=\'mypasswd\',
                 MASTER_LOG_FILE=\'',@file,'\',
                 MASTER_LOG_POS=',@pos,';' );

    PREPARE nmc FROM @new_master_command;
    EXECUTE nmc;
    DEALLOCATE PREPARE nmc;

    START SLAVE;

    ALTER EVENT `check_master_alive` DISABLE;
  END IF;
END;
//
DELIMITER ;

In this way, the server accepts the CHANGE MASTER command.
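
The same idea can also be driven from a client-side script instead of a stored procedure. What follows is only a minimal Python sketch under stated assumptions: the function names `pick_failover_position` and `build_change_master` are hypothetical, the rows are assumed to have been read already from the binlog index table on the stand-in master, and the resulting string would still have to be sent to the slave through whatever connector you use.

```python
import re


def pick_failover_position(binlog_index_rows, latest_applied_epoch):
    """Return (file, position) of the first row whose epoch is newer than
    the latest epoch applied on the slave, or None if there is no such row.

    Each row is assumed to be a dict with the keys 'epoch', 'File' and
    'Position', mirroring the columns used in the procedure above.
    """
    newer = [r for r in binlog_index_rows if r["epoch"] > latest_applied_epoch]
    if not newer:
        return None
    first = min(newer, key=lambda r: r["epoch"])
    # Same effect as SUBSTRING_INDEX(File, '/', -1): keep only the file name.
    file_name = first["File"].rsplit("/", 1)[-1]
    return file_name, int(first["Position"])


def build_change_master(host, port, user, password, log_file, log_pos):
    """Build the CHANGE MASTER TO statement as a plain string."""
    # Refuse file names that could break out of the quoted literal.
    if not re.fullmatch(r"[A-Za-z0-9_.-]+", log_file):
        raise ValueError("unexpected binary log file name: %r" % log_file)

    def quote(value):
        # Escape backslashes and single quotes for a MySQL string literal.
        return "'" + str(value).replace("\\", "\\\\").replace("'", "\\'") + "'"

    return (
        "CHANGE MASTER TO "
        "MASTER_HOST=%s, MASTER_PORT=%d, MASTER_USER=%s, "
        "MASTER_PASSWORD=%s, MASTER_LOG_FILE=%s, MASTER_LOG_POS=%d"
        % (quote(host), int(port), quote(user), quote(password),
           quote(log_file), int(log_pos))
    )
```

As in the PREPARE/EXECUTE workaround, the file name and position end up as literals inside the statement text, which is why the sketch validates them before concatenating.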

Thanks to Erik for this great workaround. In the meantime, we will follow the bug until it is fixed.

Q from Olivier: Are the data nodes MySQL servers too?
No, the data node is handled by a separate process, ndbd, that only manages data.

Q from Olivier: So, what is MySQL Cluster? A MySQL AB product?
Technically speaking, MySQL Cluster is a storage engine, based on a network distributed database. From a commercial point of view, we refer to MySQL Cluster as a product. We also provide APIs to access the Cluster database directly, bypassing the MySQL Server and the storage engine architecture.

Q from Ludovico: Are there load balancing mechanisms to balance sql requests between active sql server nodes?
Yes, absolutely. The SQL nodes all see the same data, and load may well be balanced between them, if necessary.
The only caveat is that transactions must be locked into the same SQL server for the duration of the transaction.
In real life, load balancing can be achieved through hardware or software load balancers, through the capabilities provided by some connectors such as JDBC, or simply by balancing at the HTTP level across the web servers, assuming that on average they generate roughly the same workload.

Q from Ludovico: What about embedded solutions, HW load balancers or round robin dns?
Any of them should work. Keep in mind the transaction caveat though. The load balancer should make its choice when the client opens the connection; the whole transaction must then run on that same connection.
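
The transaction caveat can be illustrated with a small client-side sketch. This is a hypothetical Python model, not a real connector: `RoundRobinPool`, `Session` and the node names are assumptions made for illustration. The SQL node is chosen once, when the session is opened, and every statement of that session then goes to the same node.

```python
import itertools


class Session:
    """One client connection, pinned to a single SQL node."""

    def __init__(self, node):
        self.node = node
        self.statements = []

    def execute(self, statement):
        # Every statement of this session is routed to the same node,
        # so a transaction never spans two SQL nodes.
        self.statements.append(statement)
        return self.node


class RoundRobinPool:
    """Hand out sessions to the SQL nodes in turn."""

    def __init__(self, sql_nodes):
        if not sql_nodes:
            raise ValueError("at least one SQL node is required")
        self._next_node = itertools.cycle(list(sql_nodes))

    def open_session(self):
        # The balancing decision is taken here, at connection time only.
        return Session(next(self._next_node))
```

A pool built as `RoundRobinPool(["sql1", "sql2"])` would alternate new sessions between the two nodes, while `BEGIN` ... `COMMIT` issued on one session stay on the node picked for it.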

Q from Roberto: Using the API and the native access, do we lose whole or part of the HA features ?
No, not at all. Cluster, even without the SQL nodes, is fully redundant. The HA features are at the Cluster level, below the SQL nodes.

Q from Vasilij: Does Linux support MySQL Cluster?
Absolutely, various Linux distributions and Unix variants are fully supported.

Q from Manuel: Is the management server a Single Point of Failure?
No, it's not. First of all, you may have as many as you want. Secondly, the management server needs to be running when a new node is added to the cluster, or in case of a failure in a small cluster with only two data nodes. During normal operations, it could be switched off.
Typically, the management node is a lightweight process and it can be colocated with other servers.

Q from Manuel: You said that the minimum number of nodes is 3. Can't we have only two nodes and the management server running on both nodes and so you have absolute redundancy with only 2 boxes?
No, having only 2 nodes is not a good idea, as that might cause a split-brain situation. If you implement such a configuration, MySQL Cluster will issue a warning.

Q from Olivier: Where are the data stored? On MySQL servers? On the NDBD nodes? In memory or on disk?
Data is stored in the data nodes (NDB nodes), in memory. With version 5.1, data can optionally be on disk, although indexes must still reside in memory.

Q from Manuel: May the management servers run on the same machine where the nodes or the SQL Servers run?
Yes, they can. There is no need for an extra physical server, provided the cluster has at least 3 physical servers.

Q from Colin: What is the simplest cluster config if all access is via C++ API?
Three nodes: two data nodes and one management node, which may be colocated with the application node, for example.

Q from John: All the examples of hostnames in the config file are IPv4 addresses - does cluster support IPv6?
No, not at the moment.

Q from John: If mysql nodes and data nodes are on the same boxes can they share IPs?
Sure, they can share the same address, as they listen on different ports.

Q from Manuel: What is a split-brain and how does it happen?
A split brain is a particular situation that may happen in some HA infrastructures. In an HA solution with only two servers, a network fault between the servers can leave the two nodes unable to talk to each other while both are still active and able to provide the service to their clients. This is a split-brain situation. MySQL Cluster does not allow a split brain, since it requires a third node as arbitrator.
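
A simplified way to picture the arbitrator's role is the rule below. It is only a sketch in Python, not the actual algorithm used by the data nodes: the function name `may_continue` is hypothetical, node groups are ignored, and the arbitrator is assumed to answer in favour of at most one side.

```python
def may_continue(visible_data_nodes, total_data_nodes, wins_arbitration):
    """Decide whether one side of a network partition keeps running.

    visible_data_nodes: data nodes this side can still talk to (itself included)
    total_data_nodes:   data nodes configured in the cluster
    wins_arbitration:   True if the arbitrator answered in favour of this side
    """
    if visible_data_nodes * 2 > total_data_nodes:
        # Clear majority: no arbitration needed.
        return True
    if visible_data_nodes * 2 == total_data_nodes:
        # Exactly half of the nodes: the arbitrator breaks the tie.
        return wins_arbitration
    # Minority side: stop, so that only one side keeps serving clients.
    return False
```

With two data nodes and a broken link, each side sees 1 node out of 2, so under this model only the side chosen by the arbitrator continues and the other one stops.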

Q from Danilo: Is it possible to have multiple INSERTs coming from different MySQL Servers that require auto_increment fields?
Yes, that is absolutely possible and fully supported.

Q from Olivier: Is all the data on every node? What happens, for example, if I have 4 nodes with 20GB of memory for each server?
First of all, you have to specify the number of replicas that you want. You can have only one copy of the data (NoOfReplicas=1), but in this scenario the system will not be redundant. You can have up to 4 replicas. With the classic setting of 2 replicas, you will have two copies of the same data. In the example that you have mentioned, the total memory available will be 4 (nodes) x 20 GB (memory per node) / 2 (replicas) = 40 GB.
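
The arithmetic above can be written as a small helper. This is a hedged Python sketch: the function name `usable_capacity_gb` is hypothetical, the result is a rough upper bound that ignores any per-row or index overhead, and the check that the node count is a multiple of the replica count is an assumption of the sketch.

```python
def usable_capacity_gb(data_nodes, memory_per_node_gb, replicas=2):
    """Rough total data capacity: nodes x memory per node / replicas."""
    if replicas not in (1, 2, 3, 4):
        raise ValueError("the number of replicas must be between 1 and 4")
    # Assumption: the data nodes are split evenly across the replicas.
    if data_nodes < replicas or data_nodes % replicas != 0:
        raise ValueError("data nodes must be a multiple of the replica count")
    return data_nodes * memory_per_node_gb / replicas
```

For the example in the question, `usable_capacity_gb(4, 20, replicas=2)` gives 40, while a non-redundant setup with one replica would give 80.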

Q from Danilo: How many data nodes are supported?
The current limit is 48.

Q from Domenico: Can MySQL Server and Data Node reside on the same physical server?
Yes, they can. For better scalability and performance, you should consider separate nodes though.

Q from Colin: If the data is not on disk which safety/backup policy is advised in case of total system failure?
There is a checkpointing and redo logging system that ensures that the data is safely stored on disk. You can also use the online backup provided for disaster recovery.

Q from Ludovico: What happens if a power shortage happens and all my data nodes are turned off?
In this case you have to recover from disk up to the latest checkpoint and then you can apply everything up to the latest position in the redo log. Bear in mind you should configure the whole infrastructure in a way that is suitable for high availability.

Q from Hilmar: So in the worst case scenario, when a checkpoint was not fired after an update, you lose power, then cluster recovers from loss of power and after the restart of the servers it has 'rolled back' to its previous checkpoint, is that correct?
First it recovers up to the latest checkpoint, then it applies the redo log.
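
The order of the two steps can be pictured with a toy model. This is a hypothetical Python sketch, not how the data nodes are implemented: the checkpoint is assumed to be a snapshot of key/value pairs tagged with a sequence number, and the redo log a list of changes with their own sequence numbers.

```python
def recover(checkpoint, redo_log):
    """Rebuild the data from a checkpoint plus the redo log entries after it."""
    # Step 1: restore the snapshot taken at the latest checkpoint.
    data = dict(checkpoint["data"])
    # Step 2: replay, in order, the changes logged after that checkpoint.
    for entry in redo_log:
        if entry["seq"] <= checkpoint["seq"]:
            continue  # already contained in the snapshot
        if entry["op"] == "put":
            data[entry["key"]] = entry["value"]
        elif entry["op"] == "delete":
            data.pop(entry["key"], None)
    return data
```

In this model, an update made after the last checkpoint is not lost as long as it reached the redo log on disk before the power went out.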

Q from David: Does the Transaction Coordinator ensure referential integrity?
No, it does not. Foreign keys are not supported in the current release, nor in 5.1.

Q from Guy: Is the whole database held in memory or just the tables in use?
The whole database is in-memory. That is the attribute of an In-Memory database, to achieve deterministic performance.

Q from Ludovico: Is the FK support at SQL level coming also for NDB?
Yes, as long as the SQL interface is used.

Q from Colin: Is the FK support planned for all storage engines in future versions?
We are working on this. We are still unsure about global support for all the storage engines, but this is definitely one of the possible solutions.

Q from Manuel: If you have 2 management servers on different machines and you lose network connection between them, don't you have a split-brain situation?
Not necessarily. As long as there is an odd number of arbitration-capable nodes, you don't.

Q from Jonas: Is there any estimate in time on when the 'next' MySQL version, with FK at SQL level will be available?
Not at the moment, but this is one of the hottest topics within MySQL.

Q from Manuel: Can you explain or give a URL where the use of redundant management servers is explained?
In the reference manual for version 5.0 - section 15.4.4.2 - there are examples of connection strings with multiple management nodes.
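
As a rough illustration of the format, a connection string with several management nodes is a comma-separated list of host[:port] entries. The Python helper below is a hypothetical sketch: the function name and the host names are placeholders, and 1186 is assumed as the management port.

```python
def build_connectstring(management_hosts, port=1186):
    """Join several management nodes into one connection string."""
    if not management_hosts:
        raise ValueError("at least one management host is required")
    # One host:port entry per management node, separated by commas.
    return ",".join("%s:%d" % (host, port) for host in management_hosts)
```

The resulting string, for example `mgm1.example.com:1186,mgm2.example.com:1186`, is what a node would be given as its connect string, so that it can fall back to the second management node if the first one is unreachable.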

Q from Guy: What happens if the DB grows significantly in a short period and its size therefore increases above the memory limit?
The size is fixed and preallocated at this point in time. When the available memory is full, it's full. 5.1 will be able to keep non-indexed data on disk, which will change this.

Q from Danilo: When will you plan to introduce multiple index use in mysql select?
It is in there already, to an extent. I assume you are asking about using multiple indexes per table, and again, that is to an extent already implemented. The usefulness of this depends on the storage engine though. Falcon will support this in a very effective manner.

Q from Hilmar: How do I know for sure that a certain transaction is sent to disk asap? Is it possible to trigger a checkpoint using an SQL statement?
You can set up the system to have continuous checkpoints, although this is not the answer to the problem. The checkpoint will not be related to a single commit in any case. MySQL Cluster has been designed as an in-memory database, therefore the infrastructure underneath must be able to support the architecture. Bear in mind that MySQL Cluster achieves HA through redundancy: since you can have up to 4 replicas, in the worst-case scenario you need 4 faults on 4 servers before you may lose data executed after the last position available in the REDO LOG on disk. I am not aware of any other HA architecture that may achieve more than this, whether it is in-memory or on disk.

Q from Hilmar: But what happens if you have a power failure for the whole data center?
First of all, you should have UPS systems in place that should at least allow you to shut down the server gracefully, in order to not lose any data at all. Secondly, with version 5.1 you may use geographic replication and activate the cluster in another data center.

Q from Jonas: Is MySQL Cluster Replication done on a node-by-node basis, or for the cluster as a whole?
In 5.1, it is Cluster to Cluster.

Q from Hilmar: Has MySQL Cluster already been tested on SPARC T1s for the SQL nodes? And are these a better option than a regular AMD box?
Sun hardware and Solaris are well supported; you only have to take care to use the same hardware type (SPARC, Intel or AMD) across nodes.

Online Solutions with MySQL

Thursday, 31 May 2007

On MySQL Cluster Replication

Tuesday, 29 May 2007

Q&A Webinar Part 4 - MySQL Cluster
