AZIC Education

Frequent Restarts — Troubleshooting Guide

Diagnose and fix ASIC miners that keep rebooting. Covers over-temperature shutdowns, PSU overload, watchdog resets, hashboard shorts, and control board failures across Antminer, Whatsminer, and Avalon.

Overview

A miner that continuously reboots is losing 100% of its potential hashrate during each restart cycle. Depending on the model, a full boot cycle takes 2–8 minutes, and if the miner restarts every 5–15 minutes, the effective uptime (and revenue) can drop to 50% or less.
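
The arithmetic behind that figure can be sketched with a small shell helper. This is only an illustration: `effective_uptime` is a hypothetical name, and it assumes the restart interval is measured from one restart to the next, so the boot time is already included in it.

```shell
# Hypothetical helper: percent of each restart cycle spent hashing.
# Assumption: interval_min is the time from one restart to the next,
# so it already includes boot_min of non-hashing boot time.
effective_uptime() {
    interval_min=$1
    boot_min=$2
    awk -v i="$interval_min" -v b="$boot_min" 'BEGIN {
        if (i <= 0 || b < 0 || b > i) { print "invalid"; exit 1 }
        printf "%.0f\n", (i - b) / i * 100
    }'
}

effective_uptime 10 5   # 5-minute boot inside a 10-minute cycle: prints 50
```

With a 5-minute boot inside a 10-minute cycle, half of every cycle is lost. Shorter cycles or slower boots push the figure lower.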

Frequent restarts are almost always a symptom of either a protection mechanism triggering (over-temperature, overcurrent, watchdog timeout) or a hardware fault that crashes the control board. Identifying which one is responsible is the key to a fast resolution.

Symptoms

  • Miner reboots every few minutes — uptime counter in the web UI never exceeds a consistent threshold
  • Periodic hashrate drops to zero on the pool dashboard, followed by recovery
  • Fans spin up, then everything powers off and restarts — the classic PSU overcurrent protection trip
  • Web UI becomes unreachable periodically — the control board is rebooting
  • LED indicators cycle through boot sequence repeatedly
  • Antminer: Red LED blinks, then green, then red again in a loop
  • Whatsminer: Status LED cycles through startup pattern, or "heartbeat" LED stops and restarts
  • Avalon: AUC disconnects and reconnects in the management software

Quick Checks

Check PSU Capacity

Verify the PSU is rated for the miner's power consumption with roughly 10% headroom:

Miner      Power Draw   Minimum PSU
S19 Pro    3,250W       3,600W
S19 XP     3,010W       3,300W
S21        3,500W       3,850W
M30S++     3,472W       3,800W
M50S       3,276W       3,600W
M60S       3,600W       4,000W
A1366      3,250W       3,600W
A1466      3,100W       3,400W

If you are using a third-party or used PSU, its actual capacity may be less than its rating. PSUs degrade over time, especially in dusty or hot environments.
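
The headroom rule can be checked with a few lines of shell. The sketch below is an illustration only: `psu_headroom_ok` is a hypothetical helper name, and it applies a strict 10% margin using the PSU's rated capacity, which (as noted above) may overstate what an aged unit can actually deliver.

```shell
# Hypothetical helper: succeeds when the PSU rating gives at least 10% headroom.
# Arguments are whole watts; integer math avoids floating point in sh.
psu_headroom_ok() {
    draw_w=$1
    rating_w=$2
    # rating >= draw * 1.10 is the same as rating * 10 >= draw * 11
    [ $((rating_w * 10)) -ge $((draw_w * 11)) ]
}

if psu_headroom_ok 3250 3600; then
    echo "OK: a 3600W PSU covers a 3250W miner with 10% headroom"
else
    echo "Undersized PSU"
fi
```

A strict 10% test is slightly more conservative than some rows of the table, which appear to be rounded: 3,010W × 1.1 = 3,311W, just above the 3,300W listed for the S19 XP.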

Check Ambient Temperature

If ambient temperature exceeds 40°C, the miner may be hitting its thermal shutdown threshold and rebooting. Check the chip temperature history — if it reaches the maximum (typically 85–95°C depending on model) right before each restart, temperature is the cause.

Check Error Logs Before Reboot

The most valuable diagnostic step: read the error logs to see what happened immediately before the restart. The last few lines before a reboot reveal the protection mechanism that triggered.

Antminer:

# SSH into the miner quickly after a reboot
ssh root@<miner-ip>

# Check kernel log for shutdown reason
dmesg | tail -n 50

# Check miner log for error before shutdown
tail -n 50 /tmp/log/bmminer.log

# Check for temperature-related shutdowns
grep -iE "temp|overheat|shutdown" /tmp/log/bmminer.log

# Check watchdog status (this path is not present on every firmware build)
cat /proc/watchdog

Whatsminer:

ssh root@<miner-ip>

# Check miner log
tail -n 100 /tmp/btminerlog

# Look for fault codes
grep -iE "fault|error|restart|reboot" /tmp/btminerlog

Or use WhatsMiner Tool → "Log" tab. Download the full log file for analysis.

Avalon: Check the AUC log through the management interface. Look for "over temperature", "over current", or "watchdog" messages immediately before each disconnect event.

Diagnostic Flowchart

Miner keeps rebooting

├─ Can you read error logs before reboot?
│  ├─ YES → What error appears last?
│  │  ├─ "Temp too high" / "overheat" → Temperature issue.
│  │  │  Go to Cause #1 below.
│  │  ├─ "Power supply fault" / PSU error → PSU issue.
│  │  │  Go to Cause #2 below.
│  │  ├─ "Watchdog" / no specific error → Software/firmware crash.
│  │  │  Go to Cause #3 below.
│  │  ├─ "Voltage error" / "chain error" → Hashboard fault.
│  │  │  Go to Cause #4 below.
│  │  └─ Kernel panic / segfault → Control board hardware issue.
│  │     Go to Cause #5 below.
│  └─ NO (reboots too fast to read logs) ↓

├─ Does the miner run longer with fewer hashboards installed?
│  ├─ YES → A hashboard is causing PSU overcurrent or
│  │        a board-level fault triggers the reboot.
│  │        Test each board individually.
│  └─ NO ↓

├─ Does the miner reboot even with NO hashboards installed?
│  ├─ YES → Control board or PSU issue (not hashboard-related).
│  │        Try a different PSU. If still rebooting,
│  │        control board fault.
│  └─ NO → Hashboard-related issue confirmed.
│          Install boards one at a time to identify the culprit.

Common Causes (Ordered by Probability)

1. Over-Temperature Shutdown and Restart

Probability: ~30%

The miner heats up under load, reaches the thermal shutdown threshold, powers off to protect the chips, cools down briefly, and then firmware initiates a restart. This cycle repeats indefinitely.

Indicators:

  • Chip temperatures reach 85–95°C before each restart
  • The miner runs longer in cooler ambient conditions
  • Fans are running at 100% speed
  • Restarts are more frequent during summer or midday

Root causes behind overheating:

  • Ambient temperature too high (above 35–40°C)
  • Dust-clogged heatsinks reducing airflow efficiency
  • Failed fan(s) reducing total airflow
  • Degraded thermal paste after 12+ months of operation
  • Recirculating airflow (hot exhaust feeding back into intake)
  • Miner positioned too close to a wall or other miners

Fix:

Immediate: Reduce Load

If possible, reduce the miner's power target or frequency to lower heat generation while you address the root cause.

Clean the Miner

Power off and blow compressed air through the heatsinks, removing all dust. Pay special attention to the intake side where dust accumulates most heavily.

Check Fans

Verify all fans are spinning at expected RPM. Replace any fans that are slow, noisy, or stopped. See Fan Error Troubleshooting.

Improve Airflow

Ensure at least 30cm (12 inches) of clearance at both intake and exhaust. If running multiple miners, stagger them to avoid hot exhaust from one feeding into the intake of another.

Reapply Thermal Paste

If the miner is over 12 months old and cleaning/fans do not resolve the issue, the thermal paste between chips and heatsinks has likely degraded. Remove the heatsinks, clean old paste, and reapply high-quality thermal paste (Arctic MX-6, Thermal Grizzly Kryonaut, or similar).

2. PSU Overload or Failure

Probability: ~25%

The PSU cannot sustain the miner's power demand. When the load exceeds the PSU's capacity, its overcurrent protection (OCP) trips and cuts power. The miner powers off completely, the PSU recovers, power returns, and the miner boots again — only to trip OCP again once hashing begins.

Indicators:

  • Restarts happen 2–5 minutes after boot (once hashing at full speed begins)
  • A "click" or "pop" sound from the PSU just before power loss
  • All LEDs go dark simultaneously (versus just the control board rebooting)
  • PSU is hot to the touch
  • Problem worsens if you increase the miner's frequency/power target

How to diagnose:

  1. Measure PSU output voltage under full load — if it dips below spec (e.g., below 11.5V for a 12V PSU), the PSU is failing
  2. Try the miner with a known-good PSU
  3. Try running with only 1 or 2 hashboards — if the miner is stable, the PSU cannot handle full load

Fix: Replace the PSU. Ensure the replacement is rated for at least 10% more than the miner's maximum power draw. For older PSUs, capacitor degradation is the most common cause of reduced capacity — capacitor replacement can restore the PSU if you have the skills, but PSU replacement is faster and more reliable.

3. Firmware Bug / Watchdog Reset

Probability: ~18%

The miner's firmware crashes or hangs, and the hardware watchdog timer resets the control board. This is a software-level issue — the hardware may be perfectly fine.

Indicators:

  • The restart interval is fairly consistent (e.g., always around 10–15 minutes)
  • Kernel log shows "watchdog" or "wdt" messages before the reset
  • The problem appeared after a firmware update
  • No temperature or power anomalies in the logs before the reset
  • The miner runs stable on a different firmware version
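
One way to judge whether the restart interval is "fairly consistent" is to note when each restart happened (for example, from the hashrate drop-outs on the pool dashboard) and compare the gaps. The sketch below assumes the times are available as epoch seconds, one per line, in ascending order. `restart_gaps` is a hypothetical helper, and how tight the spread must be to count as consistent is left to your judgment.

```shell
# Hypothetical helper: reads restart times (epoch seconds, ascending,
# one per line) on stdin and prints the min, max, and mean gap in minutes.
restart_gaps() {
    awk '
        NR > 1 {
            gap = ($1 - prev) / 60
            if (n == 0 || gap < min) min = gap
            if (n == 0 || gap > max) max = gap
            sum += gap
            n++
        }
        { prev = $1 }
        END {
            if (n == 0) { print "need at least two timestamps"; exit 1 }
            printf "min=%.1f max=%.1f mean=%.1f\n", min, max, sum / n
        }
    '
}

# Example: four restarts roughly 12 minutes apart
printf '%s\n' 1700000000 1700000720 1700001450 1700002160 | restart_gaps
```

A narrow min-to-max range points toward a watchdog or firmware cause. Widely varying gaps fit better with thermal or input-power causes, which depend on outside conditions.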

Fix:

Try a Different Firmware Version

Flash a known-stable firmware version for your model. If you recently updated, try reverting to the previous version.

Factory Reset

Perform a full factory reset to clear any corrupt configuration that might be causing a crash loop.

Try Third-Party Firmware

If stock firmware consistently crashes, try Braiins OS, VNish, or LuxOS (for supported models). Different firmware implementations may not have the same bug.

Some firmware versions have known bugs that cause watchdog resets under specific conditions (certain pool configurations, specific chip counts, etc.). Check the firmware release notes and community forums for known issues with your version.

4. Hashboard Short Causing PSU Protection Trip

Probability: ~12%

A short circuit on one hashboard can draw enough excess current to trip the PSU's overcurrent protection, shutting down all boards. This looks like a PSU issue but is actually caused by the hashboard.

Indicators:

  • Restarts happen under load (same timing as PSU overload)
  • Removing one specific board makes the problem disappear
  • The suspect board has visible damage or burns
  • PSU output voltage is fine when tested without the faulty board

How to diagnose:

  1. Remove all hashboards
  2. Install them one at a time, testing stability with each
  3. The board that causes restarts when installed is the culprit
  4. Measure resistance across the power input of the suspect board (should be 2–50 ohms; less than 0.5 ohms indicates a short)
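
The thresholds in step 4 can be written down as a small check. This is a sketch only: `classify_board_resistance` is a hypothetical name, and the "inconclusive" label for readings outside both stated ranges is an assumption, since the guide does not say how to interpret them.

```shell
# Hypothetical helper: interpret a resistance reading (ohms) taken across
# a hashboard's power input, using the thresholds quoted in step 4.
classify_board_resistance() {
    awk -v r="$1" 'BEGIN {
        r += 0   # force numeric comparison
        if (r < 0.5)                print "short"
        else if (r >= 2 && r <= 50) print "normal"
        else                        print "inconclusive"   # assumption: not covered by the guide
    }'
}

classify_board_resistance 0.3
classify_board_resistance 12
```

Take the measurement with the board disconnected from the PSU and fully discharged.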

Fix: Locate and repair the short on the hashboard. See Abnormal Chip Voltage Troubleshooting for techniques to find shorted components on a hashboard.

5. Control Board Memory or Hardware Issue

Probability: ~8%

The control board itself has a hardware fault — failing NAND flash, bad RAM, or a degraded processor — causing crashes.

Indicators:

  • Restarts happen even with no hashboards installed
  • Kernel log shows memory errors, NAND errors, or filesystem corruption messages
  • The problem persists across different firmware versions
  • The miner was subjected to a power surge or lightning strike

Fix: Replace the control board. Control board-level repair (NAND replacement, RAM replacement) is possible but requires specialized equipment and is typically not cost-effective compared to a replacement board.

6. Electrical Issues (Input Power)

Probability: ~7%

Unstable input power to the PSU — voltage sags, brownouts, or poor wiring — can cause the PSU to lose input and restart.

Indicators:

  • Other devices on the same circuit also experience issues
  • Restarts correlate with times of high electrical demand (air conditioning starting, other equipment turning on)
  • Restarts are more frequent at certain times of day
  • The miner is on a long extension cord or undersized wiring

Fix: Ensure the miner is on a dedicated circuit with appropriate amperage. Check all connections from the wall outlet to the PSU input. For mining farms, install a UPS or power conditioner to stabilize input power.

How to Read Kernel Logs via SSH

The kernel log is your best friend for diagnosing restart causes. Here is how to capture it effectively:

Connect via SSH Immediately After Boot

The log from the previous session may be lost after reboot (it is stored in RAM on most miners). You need to either:

  • Connect quickly after a reboot and read the log before the next crash
  • Set up remote syslog to capture logs on a separate machine

# Run these from your workstation right after the miner comes back up.
# Copies saved under /tmp on the miner are typically lost at the next reboot.
ssh root@<miner-ip> dmesg > saved_dmesg.txt

# Save the miner application log (Antminer path)
ssh root@<miner-ip> 'cat /tmp/log/bmminer.log' > saved_bmminer.log

Set Up Remote Syslog (Persistent Logging)

For miners that reboot too quickly to read logs, send logs to a remote syslog server:

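# On the miner (Antminer example; assumes a syslogd that reads /etc/syslog.conf)
echo "*.* @<syslog-server-ip>:514" >> /etc/syslog.conf
/etc/init.d/syslog restart

This sends all log messages over UDP to a remote server, where they persist across miner reboots. If the firmware's syslogd is the BusyBox build, the remote target is usually set with its -R <syslog-server-ip>:514 option instead.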

Interpret the Log

Look for these key patterns in the last 20–50 lines before the restart:

Log Pattern                  Meaning
Temp too high                Thermal shutdown
fan speed error              Fan failure triggered shutdown
Watchdog or wdt              Software crash / hang
Kernel panic                 Critical software or hardware error
Out of memory                RAM exhaustion
NAND read error              Flash storage failing
voltage error                Hashboard voltage fault
overcurrent                  PSU protection triggered
No error before abrupt end   PSU power loss (external)
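
The pattern table can be turned into a quick lookup for triaging saved log lines. The sketch below is illustrative: `classify_log_line` is a hypothetical helper, matching is case-insensitive, and the exact wording of real log messages varies by vendor and firmware, so the patterns may need adjusting.

```shell
# Hypothetical helper: map a log line to the meaning listed in the table.
# Matching is case-insensitive; exact wording varies by vendor and firmware.
classify_log_line() {
    line=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$line" in
        *"temp too high"*)   echo "thermal shutdown" ;;
        *"fan speed error"*) echo "fan failure triggered shutdown" ;;
        *watchdog*|*wdt*)    echo "software crash / hang" ;;
        *"kernel panic"*)    echo "critical software or hardware error" ;;
        *"out of memory"*)   echo "RAM exhaustion" ;;
        *"nand read error"*) echo "flash storage failing" ;;
        *"voltage error"*)   echo "hashboard voltage fault" ;;
        *overcurrent*)       echo "PSU protection triggered" ;;
        *)                   echo "no known pattern" ;;
    esac
}

classify_log_line "Temp too high"
classify_log_line "NAND read error"
```

The last row of the table has no text to match. A log that simply stops with no error is itself the signal of an external power loss, so a "no known pattern" result on the final line should be read with that in mind.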

When to Seek Professional Help

Consider professional assistance if:

  • The miner reboots even with no hashboards and a known-good PSU — this strongly indicates a control board failure
  • Kernel logs show persistent NAND or memory errors — the control board storage is failing
  • You suspect electrical wiring issues at the facility level — have an electrician inspect the circuit
  • The miner went through a power surge or lightning event — multiple components may be damaged simultaneously

Frequently Asked Questions

How do I tell the difference between a PSU restart and a control board restart?

A PSU restart causes a complete power loss — all LEDs go dark, fans stop, and everything powers up from scratch. A control board restart keeps the PSU running (fans may continue spinning briefly) and only the control board reboots. Watch the fans: if they stop completely during each restart, it is a PSU issue. If they briefly stutter but the PSU stays on, it is a control board issue.

My miner runs fine for hours then restarts once and runs fine again. Should I worry?

An occasional single restart (once per day or less) is often caused by a transient event — a brief power fluctuation, a pool connection timeout triggering a firmware restart, or a single temperature spike. If it is truly rare and the miner recovers and runs normally, it is not urgent. Monitor the frequency — if restarts increase over time, investigate.

Can firmware corruption cause a restart loop?

Yes. If the firmware image is partially corrupted, the miner may boot partially, encounter the corrupt section, crash, and reboot. This creates a consistent restart loop where the miner never fully starts. The fix is to reflash firmware via SD card (Antminer), USB recovery (Whatsminer), or TFTP (Avalon).

Why does my miner restart more in summer?

Higher ambient temperature causes faster thermal rise, reaching the shutdown threshold sooner. A miner that barely stays within thermal limits at 25°C ambient will thermally throttle and restart when ambient reaches 35°C+. Solutions: improve ventilation, add intake cooling, reduce power target during hot months, or switch to immersion cooling.

Is it safe to disable the watchdog timer?

No. The watchdog timer exists to recover from firmware crashes. Disabling it means that if the firmware hangs, the miner will stay hung indefinitely with zero hashrate instead of rebooting and recovering. The watchdog is a safety net, not the problem — fix the root cause of the crash instead.