Monday, May 31, 2021

MySQL NDB Cluster troubleshooting

How to handle various situations

Error 233 MaxNoOfConcurrentOperations

Got temporary error 233 'Out of operation records in transaction coordinator (increase MaxNoOfConcurrentOperations)' from NDBCLUSTER

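Reference notes
  • 1 Edit the configuration on every ndb_mgmd host: /var/lib/mysql-cluster/config.ini
    [NDBD Default]
    # Raise the limit on concurrent operations
    MaxNoOfConcurrentOperations=100000
    # Raise the limit on local operations, usually about 10% higher
    MaxNoOfLocalOperations=110000
  • 2 Kill the management server and restart ndb_mgmd, one at a time
  • 3 Stop each ndbd from the mgm client; the process exits, so start it again from the shell with sudo ndbd, one node at a time
  • 4 SQL nodes generally do not need to be restarted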
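The restart order in steps 2 and 3 can be sketched as a small Python helper that only builds the list of commands and does not run them. This is a minimal sketch under assumptions: the node IDs (1 for the management node, 2 and 3 for the data nodes) are hypothetical, the management node is stopped with the mgm client's STOP command here instead of kill, and --reload is added so that ndb_mgmd re-reads the edited config.ini rather than its cached configuration. Each command has to be run on the host where that node lives.

```python
# Sketch only: builds the command sequence for a rolling restart.
# Node IDs and the config path are assumptions; check `ndb_mgm -e show`.
CONFIG = "/var/lib/mysql-cluster/config.ini"


def rolling_restart_plan(mgmd_ids, ndbd_ids, config=CONFIG):
    """Return shell commands in the order they should be run, one node at a time."""
    plan = []
    for node_id in mgmd_ids:
        # Stop one management node, then start it again with the new config.
        plan.append(f'ndb_mgm -e "{node_id} STOP"')
        plan.append(f"sudo ndb_mgmd -f {config} --reload")
    for node_id in ndbd_ids:
        # Stop one data node, start it from the shell, then check its status
        # before moving on to the next one.
        plan.append(f'ndb_mgm -e "{node_id} STOP"')
        plan.append("sudo ndbd")
        plan.append(f'ndb_mgm -e "{node_id} STATUS"')
    return plan


if __name__ == "__main__":
    for command in rolling_restart_plan(mgmd_ids=[1], ndbd_ids=[2, 3]):
        print(command)
```

Waiting until STATUS reports the data node as started before touching the next one is what keeps the cluster available during the change.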


  • Error 1297 Got temporary error 410

    Error 1297 Got temporary error 410 'REDO log files overloaded (decrease TimeBetweenLocalCheckpoints or increase NoOfFragmentLogFiles)' from NDBCLUSTER

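    Reference notes
    A local checkpoint (LCP) takes some time to complete.
    During that time the REDO log keeps growing, and if its capacity is used up this error is raised.
    1 Increase the capacity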
       REDO log size per data node = 4 x NoOfFragmentLogFiles x FragmentLogFileSize
       
    2 Make the LCP complete faster.
      MinDiskWriteSpeed
      MaxDiskWriteSpeed
      MaxDiskWriteSpeedOwnRestart
      MaxDiskWriteSpeedOtherRestart
    
    
    TimeBetweenLocalCheckpoints is already plenty short. 
    The REDO log probably needs to be increased so that it is 
    long enough to accommodate all transactions that occur over 
    the course of 2-3 LCPs. 
    
    Reducing TimeBetweenLocalCheckpoints only decreases the 
    amount of time ndbd sits idle between LCPs. With this volume of 
    inserts there should be essentially no waiting between LCPs. 
    
    The time it takes to complete an LCP is roughly the DataMemory 
    in use divided by DiskCheckpointSpeed.
    
    If you know the size and number of inserts being written to the 
    cluster per second you can predict the minimal size of the log.
    
    For example, if you have 3.5G of DataMemory used and the default 
    DiskCheckpointSpeed of 10MB/s, 
    each LCP will take about 6 minutes (360s) to complete. 
    
    The default REDO log is 4 x NoOfFragmentLogFiles x FragmentLogFileSize = 1G long. 
    
    This would be overloaded at 200 writes of 5k each per second.
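The arithmetic behind this example can be checked with a short Python sketch. It is a rough estimate under assumptions: decimal units (3.5G taken as 3500 MB), the defaults NoOfFragmentLogFiles=16 and FragmentLogFileSize=16M, and only the payload bytes counted, so the real REDO volume will be somewhat higher.

```python
# Back-of-the-envelope REDO log estimate using the figures quoted above.
# Assumptions: decimal megabytes, payload bytes only, default log settings.

def lcp_seconds(data_memory_used_mb, checkpoint_speed_mb_s):
    """Approximate LCP duration: data in memory divided by write speed."""
    return data_memory_used_mb / checkpoint_speed_mb_s


def redo_capacity_mb(no_of_fragment_log_files=16, fragment_log_file_size_mb=16):
    """REDO log size per data node: 4 x files x file size."""
    return 4 * no_of_fragment_log_files * fragment_log_file_size_mb


def redo_needed_mb(write_rate_mb_s, lcp_time_s, lcp_count):
    """REDO volume written while lcp_count LCPs run back to back."""
    return write_rate_mb_s * lcp_time_s * lcp_count


lcp = lcp_seconds(3500, 10)               # about 6 minutes
capacity = redo_capacity_mb()             # 1024 MB, the "1G" default
write_rate = 200 * 5 / 1000               # 200 writes/s x 5 kB = 1 MB/s
low = redo_needed_mb(write_rate, lcp, 2)  # log needed to span 2 LCPs
high = redo_needed_mb(write_rate, lcp, 3) # log needed to span 3 LCPs

print(f"LCP takes ~{lcp:.0f} s, REDO capacity {capacity} MB")
print(f"needed for 2-3 LCPs: {low:.0f}-{high:.0f} MB")
```

At the 3 LCP end of the rule of thumb the estimate already exceeds the default capacity, which matches the statement that this write rate overloads the log.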
    
    Solution:
    Increasing NoOfFragmentLogFiles and/or FragmentLogFileSize will 
    increase the total available length of the REDO log, 
    allowing for longer LCPs and/or higher volume of inserts/updates 
    per second. This requires more disk space.
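To turn this into a number, the sizing rule can be inverted to give the NoOfFragmentLogFiles needed for a given write rate. The following is a minimal sketch: the function name and the choice of covering 3 LCPs are assumptions for illustration, and per-record overhead in the REDO log is ignored.

```python
import math


def fragment_log_files_needed(write_rate_mb_s, lcp_time_s,
                              lcp_count=3, fragment_log_file_size_mb=16):
    """Smallest NoOfFragmentLogFiles whose REDO log spans lcp_count LCPs.

    Uses REDO size = 4 x NoOfFragmentLogFiles x FragmentLogFileSize.
    """
    needed_mb = write_rate_mb_s * lcp_time_s * lcp_count
    return math.ceil(needed_mb / (4 * fragment_log_file_size_mb))


# 1 MB/s of writes with a 350 s LCP, then the same with double the write rate.
print(fragment_log_files_needed(1.0, 350))
print(fragment_log_files_needed(2.0, 350))
```

Every additional log file costs 4 x FragmentLogFileSize of disk on each data node, so it is worth leaving some headroom above the computed value rather than sizing to the exact minimum.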
    
    Solution:
    Increasing DiskCheckpointSpeed 
    (now use MinDiskWriteSpeed / MaxDiskWriteSpeed instead) leaves the length 
    of the log the same but causes the LCP to complete faster, 
    thus reducing the length of the REDO log required to hold all 
    incoming updates for two complete LCPs. 
    
    This requires ndbd to use more disk IO bandwidth and CPU. 
    If it is dimensioned incorrectly, this risks reducing the disk bandwidth 
    available for global checkpoints and disk data operations to the point 
    where there are GCP Stop errors or a loss in performance.
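The effect of a faster checkpoint on the required REDO length can be illustrated with the same rough model. This is a sketch under assumptions: 3500 MB of DataMemory in use, 1 MB/s of writes, two LCPs to cover, and the checkpoint speeds chosen only as sample values.

```python
def redo_needed_mb(data_memory_used_mb, checkpoint_speed_mb_s,
                   write_rate_mb_s, lcp_count=2):
    """REDO volume written while lcp_count LCPs run at the given speed."""
    lcp_time_s = data_memory_used_mb / checkpoint_speed_mb_s
    return write_rate_mb_s * lcp_time_s * lcp_count


# Doubling the checkpoint write speed halves the REDO log that is needed.
for speed in (10, 20, 40):
    needed = redo_needed_mb(3500, speed, 1.0)
    print(f"{speed} MB/s checkpoint speed -> {needed:.0f} MB of REDO for 2 LCPs")
```

The model says nothing about the disk bandwidth left for global checkpoints, which is the trade-off described above and has to be checked on the actual hardware.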
    
    
