watchdog


I’m building a system with a Raspberry Pi located in a very remote area, connected to the internet with an internet stick. The tests are promising so far, but the Pi freezes every now and then and I can no longer connect to it. Because I don’t want to make a 2-hour drive every time it freezes, I want to build a redundant system in which a second Pi checks the first.
In the worst case, the frozen system has to be cut from power to force a reboot. This should be done by the working Pi.

Now the question as a total noob when it comes to building electronics.

I checked out the ATXRaspi R3 but I’m not sure how to “digitally” fire off the 6sec press on that power controller to cut the power by the other pi…

What would be the easiest way to cut power by another pi? Any hints are greatly welcomed.

  • Not sure anyone is going to design this circuit for you. But one additional thing to consider: Whatever causes the first Pi to freeze might have a common failure mode to the second Pi. For example, if it’s freezing because of a power fluctuation, you might end up with two frozen Pis instead of the independent redundancy that you want. Might be worth trying to understand why that first Pi freezes first. – Brick Jun 14 at 12:33
  • How quickly do you need the pi to come back online? A simple holiday light timer could cycle the power every X hours, as long as you don’t mind waiting until the reset interval to have it back online again. – Tim Jun 14 at 17:30
  • @Jurudocs, I followed @berto’s watchdog timer tutorial and found everything good. I don’t quite understand what the watchdog is doing, but I am 90% sure that the watchdog timer method should solve your problem, and much more cleanly than my proposed hardware solution. – tlfong01 yesterday

5 Answers

2

Question

Remote Rpi’s freeze from time to time. How to wake them up?

Answer

2019jun17 Updates

So I tried the fork bomb. The system rebooted after executing the program, in about 15 seconds.

fork bomb test results

2019jun16 Updates

I found @berto’s fork bomb program a bit scary for a newbie, so I am learning Bash to find out what it does. Basically it defines a function named “:” that calls itself twice, so it keeps forking indefinitely, multiplying exponentially like rabbits, until it uses up all the resources and crashes Linux.
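To put a number on that doubling, here is a harmless sketch that only does arithmetic and never forks anything. The function name and the limit of 4096 processes are assumptions chosen for illustration, not values read from a Pi.

```python
def generations_until_limit(process_limit: int) -> int:
    """Count doubling steps until the process count reaches the limit.

    Models the fork bomb on paper only: every live copy starts two more,
    so the population roughly doubles with each generation.
    """
    processes = 1
    generations = 0
    while processes < process_limit:
        processes *= 2
        generations += 1
    return generations


if __name__ == "__main__":
    # 4096 is a hypothetical process limit, used only as an example.
    print(generations_until_limit(4096))
```

Under these assumptions the limit is reached after only a dozen doublings, which is why the system stops responding within seconds.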

fork bomb

I have also found the following interesting version of forkbomb using Unicode symbols:

💣 ( ) { 💣 | 💣 & } ; 💣

2019jun14/15 Updates

@thesnow suggests a very nice layered approach using a smart plug. I think the smart plug or smart IoT stuff is the way to go. However, I am a not-so-smart newbie in smart stuff, though I am keen to learn. So I am going to buy a smart plug, do some research, and improve my answer afterwards. For now, I have added some related learning resources in the reference section below.

I also found @berto’s suggestion of using the Rpi’s hardware watchdog timer very good. I have not played with any watchdog stuff before, so I am going to try it now. @berto’s instructions are very detailed, but still a bit hard for me, because I don’t know the commands “grep” and “dmesg” very well. So I googled and made some reading notes in the appendices below. Then I followed @berto’s suggestion and struggled a bit to complete part 1. I have not rebooted yet, because I need to take a break to digest things. Anyway, here is the screen capture.

watchdog_test_2019jun1501

I rebooted and got the following dmesg:

watchdog 3

I think I am going too fast and now need to take a break to first study more linux things, like systemd, before coming back to carry on the test on watchdog.

systemd architecture

/ to continue, …

The Answer

I have the same problem. I am building a rooftop garden with a couple of Rpi’s, each of which connects to various wireless (Bluetooth, WiFi) sensors, relays, and solenoids. There are two huge motors nearby, controlling big water tanks and lifts. The motors generate EMI and from time to time freeze nearby electronics.

My plan is to use software-switchable PSUs (Power Supply Units) to power frozen Rpi’s and other devices off and on. (Bluetooth devices freeze most often. The Bluetooth and other little devices do not have any software reset command or hardware reset pin, so powering their 5V Vcc off and on is a quick and dirty, but still safe, workaround.) In short, the Rpi’s regularly watch each other and their devices, and POR (Power On Reset) anyone that has fallen asleep.
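The "watch each other" idea can be sketched in Python as below. This is only an illustration under assumptions: the function names, the SSH port 22 probe, the address in the usage note, and the threshold of three missed checks are all hypothetical, and `power_cycle` stands in for whatever actually switches the PSU.

```python
import socket
from typing import Callable


def is_reachable(host: str, port: int = 22, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


def check_peer(probe: Callable[[], bool], failures: int, max_failures: int,
               power_cycle: Callable[[], None]) -> int:
    """Run one probe and return the updated count of consecutive failures.

    power_cycle is called only after max_failures misses in a row, so a
    single dropped connection does not reboot a healthy peer.
    """
    if probe():
        return 0
    failures += 1
    if failures >= max_failures:
        power_cycle()
        return 0
    return failures
```

A monitoring loop would call `check_peer(lambda: is_reachable("192.168.1.20"), failures, 3, power_cycle)` once a minute or so and keep the returned count for the next round.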

Of course I can also use a GPIO pin to trigger the Rpi’s on-board hardware reset pin. But I am too lazy to do the extra wiring, and too poor a hobbyist to afford professional/industrial-grade non-stop system devices such as the SwitchDoc Labs Dual WatchDog Timer (see reference below).

I modify ordinary DC-DC (12V to 5V) PSUs so that any Rpi or MCP23x17 GPIO pin can switch the PSU’s LM2596/LM2941 voltage regulator chip on and off. (The LM2941 can be used for 1A current switches, the LM2596 for a 5V 3A PSU. The on/off pin is also connected to a push button, for manual power on/off testing.)

Actually each of my 7 Rpi3B+’s is connected to a cheapy DS3231 Real Time Clock Module which has a hardware interrupt pin to reset PSU, Rpi, or other devices.

Whenever possible and practical I tie up all the devices’ reset pins together (removing some of the pull up resistors, so not to overload the GPIO pin).

Now the external DS3231 RTC wakes up everybody in the morning, and switches off lights at midnight, so everybody goes to bed.

software switchable PSU

software switch PSU

software switch

References

References related to this answer

LM2596/LM2941 Based Software Resettable PSU / Current Switches – Rpi StkEx Discussion

Rpi Hardware watchdog Discussion

SwitchDoc Labs Dual WatchDog Timer

ATXRaspi R3 – LowPowerLab US$14.95

References for proposed smart plug solution

A hackable ESP8266 inside a smart plug Want to play with ESP8266 without worrying about the hardware? – Mat 2017aug06

Reverse Engineering 101 of the Xiaomi IoT ecosystem HITCON Community 2018 – Dennis Giese

Xiaomi WiFi socket + MiHome app 21,307 views

espHome [ESP8266/ESP32]

AliExpress WiFi Smart Plug

Smart device -Wikipedia

WiFi Garage Door Opener using ESP8266 – Ray Wang 2016may13 56,335 views

Appendices

Note – The following appendices are mainly a newbie’s research or reading notes. They are very long winded, and would be pruned or removed later.

Appendix A – WatchDog Timer Reading Notes

Watchdog timer -Wikipedia

Linux WatchDog Man Page

Linux Watchdog – General Tests

Watch Dog Timer Reading notes

A watchdog timer is an electronic timer that is used to detect and recover from computer malfunctions. During normal operation, the computer regularly resets the watchdog timer to prevent it from elapsing, or “timing out”. If, due to a hardware fault or program error, the computer fails to reset the watchdog, the timer will elapse and generate a timeout signal. The timeout signal is used to initiate corrective action or actions. The corrective actions typically include placing the computer system in a safe state and restoring normal system operation.

Watchdog timers are commonly found in embedded systems and other computer-controlled equipment where humans cannot easily access the equipment or would be unable to react to faults in a timely manner. In such systems, the computer cannot depend on a human to invoke a reboot if it hangs; it must be self-reliant. For example, remote embedded systems such as space probes are not physically accessible to human operators; these could become permanently disabled if they were unable to autonomously recover from faults. A watchdog timer is usually employed in cases like these. Watchdog timers may also be used when running untrusted code in a sandbox, to limit the CPU time available to the code and thus prevent some types of denial-of-service attacks.

Architecture and operation – Watchdog restart

The act of restarting a watchdog timer, commonly referred to as “kicking” the watchdog, is typically done by writing to a watchdog control port. Alternatively, in microcontrollers that have an integrated watchdog timer, the watchdog is sometimes kicked by executing a special machine language instruction or setting a specific bit in a register. An example of this is the CLRWDT (clear watchdog timer) instruction found in the instruction set of some PIC microcontrollers.

In computers that are running operating systems, watchdog resets are usually invoked through a device driver. For example, in the Linux operating system, a user space program will kick the watchdog by interacting with the watchdog device driver, typically by writing a zero character to /dev/watchdog. The device driver, which serves to abstract the watchdog hardware from user space programs, is also used to configure the time-out period and start and stop the timer.
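A minimal Python sketch of that kick, assuming a Linux watchdog driver behind /dev/watchdog. The function names, the 5-second interval and the fixed number of kicks are illustrative choices. Opening the real device needs root and arms the timer, and only one process can hold the device open at a time, so it will not open while another daemon (such as systemd) already owns it.

```python
import os
import time


def kick_watchdog(fd: int) -> None:
    """Reset the countdown by writing one byte to the open watchdog device."""
    os.write(fd, b"\0")


def run_kicker(device: str = "/dev/watchdog", interval_s: float = 5.0,
               kicks: int = 3) -> None:
    """Open the device, kick it a fixed number of times, then close it.

    If the kicks stop without a clean close, the hardware is expected to
    reset the machine once its timeout elapses.
    """
    fd = os.open(device, os.O_WRONLY)
    try:
        for _ in range(kicks):
            kick_watchdog(fd)
            time.sleep(interval_s)
        # Assumption: the driver supports "magic close", where writing 'V'
        # just before closing disarms the timer instead of leaving it armed.
        os.write(fd, b"V")
    finally:
        os.close(fd)
```

A real daemon would loop forever rather than for a fixed count; the count is here only so the sketch terminates.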

Single-stage watchdog

Watchdog timers come in many configurations, and many allow their configurations to be altered. Microcontrollers often include an integrated, on-chip watchdog. In other computers the watchdog may reside in a nearby chip that connects directly to the CPU, or it may be located on an external expansion card in the computer’s chassis. The watchdog and CPU may share a common clock signal, as shown in the block diagram below, or they may have independent clock signals.

single stage watch dog

Fault detection

A computer system is typically designed so that its watchdog timer will be kicked only if the computer deems the system functional. The computer determines whether the system is functional by conducting one or more fault detection tests and it will kick the watchdog only if all tests have passed. In computers that are running an operating system and multiple processes, a single, simple test may be insufficient to guarantee normal operation, as it could fail to detect a subtle fault condition and therefore allow the watchdog to be kicked even though a fault condition exists.

For example, in the case of the Linux operating system, a user-space watchdog daemon may simply kick the watchdog periodically without performing any tests. As long as the daemon runs normally, the system will be protected against serious system crashes such as a kernel panic. To detect less severe faults, the daemon can be configured to perform tests that cover resource availability (e.g., sufficient memory and file handles, reasonable CPU time), evidence of expected process activity (e.g., system daemons running, specific files being present or updated), overheating, and network activity, and system-specific test scripts or programs may also be run.

Upon discovery of a failed test, the Linux watchdog daemon may attempt to perform a software-initiated restart, which can be preferable to a hardware reset as the file systems will be safely unmounted and fault information will be logged. However it is essential to have the insurance of the hardware timer as a software restart can fail under a number of fault conditions. In effect, this is a dual-stage watchdog with the software restart comprising the first stage and the hardware reset the second stage.
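The "kick only if all tests pass" rule from the notes above could look roughly like this in Python. It is a sketch, not how the Linux watchdog daemon is implemented: the two example checks and their thresholds (50 MB free, load below 8) are assumptions picked for illustration.

```python
import os
import shutil
from typing import Callable, Iterable


def disk_has_space(path: str = "/", min_free_bytes: int = 50 * 1024 * 1024) -> bool:
    """Pass if the filesystem holding path has at least min_free_bytes free."""
    return shutil.disk_usage(path).free >= min_free_bytes


def load_is_sane(max_load: float = 8.0) -> bool:
    """Pass if the 1-minute load average is below max_load."""
    return os.getloadavg()[0] < max_load


def should_kick(checks: Iterable[Callable[[], bool]]) -> bool:
    """Return True only if every check passes; a raising check counts as failed."""
    for check in checks:
        try:
            if not check():
                return False
        except Exception:
            return False
    return True
```

The caller would kick the watchdog only when `should_kick([disk_has_space, load_is_sane])` is true; when it is false, the kicks stop and the hardware timer takes over.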

Daemon – Wikipedia

In multitasking computer operating systems, a daemon is a computer program that runs as a background process, rather than being under the direct control of an interactive user. Traditionally, the process names of a daemon end with the letter d, for clarification that the process is in fact a daemon, and for differentiation between a daemon and a normal computer program. For example, syslogd is the daemon that implements the system logging facility, and sshd is a daemon that serves incoming SSH connections.

In a Unix environment, the parent process of a daemon is often, but not always, the init process.

Systems often start daemons at boot time which will respond to network requests, hardware activity, or other programs by performing some task. Daemons such as cron may also perform defined tasks at scheduled times.

Terminology

The term was coined by the programmers of MIT’s Project MAC. They took the name from Maxwell’s demon, an imaginary being from a thought experiment that constantly works in the background, sorting molecules.

The word daemon is an alternative spelling of demon.

An alternative term for daemon is service (used in Windows from Windows NT onwards, and later also in Linux).

Implementations

MS-DOS

In the Microsoft DOS environment, daemon-like programs were implemented as terminate and stay resident (TSR) software.

Windows NT

On Microsoft Windows NT systems, programs called Windows services perform the functions of daemons. They run as processes, usually do not interact with the monitor, keyboard, and mouse, and may be launched by the operating system at boot time.

Daemons have no particular bias towards good or evil, but rather serve to help define a person’s character or personality. The ancient Greeks’ concept of a “personal daemon” was similar to the modern concept of a “guardian angel”

A further characterization of the mythological symbolism is that a daemon is something which is not visible yet is always present and working its will. In the Theages, attributed to Plato, Socrates describes his own personal daemon to be something like the modern concept of a moral conscience: The favour of the gods has given me a marvelous gift, which has never left me since my childhood. It is a voice which, when it makes itself heard, deters me from what I am about to do and never urges me on.

Appendix B – Linux commands grep and dmesg reading notes

grep – Wikipedia

grep is a command-line utility for searching plain-text data sets for lines that match a regular expression. Its name comes from the ed command g/re/p (globally search a regular expression and print), which has the same effect: doing a global search with the regular expression and printing all matching lines.
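As a rough illustration of "print every line that matches a regular expression", here is a tiny grep-like function in Python. It is a sketch: Python's `re` syntax is not identical to grep's, and the second sample line in the usage note is made up.

```python
import re
from typing import Iterable, List


def grep(pattern: str, lines: Iterable[str], ignore_case: bool = False) -> List[str]:
    """Return the lines that contain a match for the regular expression."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [line for line in lines if regex.search(line)]
```

For example, `grep("watchdog", ["Broadcom BCM2835 Watchdog timer", "usb 1-1: new device"], ignore_case=True)` keeps only the first line, much like `grep -i watchdog` does on the command line.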

grep was originally developed for Unix. The first version was written by Ken Thompson in PDP-11 assembly language to analyze a large body of text.

The ed text editor (also authored by Thompson) had regular expression support but could not be used on such a large amount of text, so Thompson excerpted that code into a standalone tool.

Thompson chose the name because in ed, the command g/re/p would print all lines matching a previously defined pattern.

grep was first included in Version 4 Unix.

In the Perl programming language, grep is the name of the built-in function that finds elements in a list that satisfy a certain property. This higher-order function is typically named filter in functional programming languages.

Ports of grep (within Cygwin and GnuWin32, for example) also run under Microsoft Windows.

Usage as a verb

In December 2003, the Oxford English Dictionary Online added draft entries for “grep” as both a noun and a verb.

A common verb usage is the phrase “You can’t grep dead trees” — meaning one can more easily search through digital media, using tools such as grep, than one could with a hard copy (i.e., one made from dead trees, paper). Compare with google.

dmesg – Wikipedia

dmesg (display message or driver message) is a command on most Unix-like operating systems that prints the message buffer of the kernel. The output of this command typically contains the messages produced by the device drivers.

Booting

When initially booted, a computer system loads its kernel into memory. At this stage device drivers present in the kernel are set up to drive relevant hardware. Such drivers, as well as other elements within the kernel, may produce output (“messages”) reporting both the presence of modules and the values of any parameters adopted. (It may be possible to specify boot parameters which control the level of detail in the messages.) The booting process typically happens at a speed where individual messages scroll off the top of the screen before an operator can read/digest them. (Some keyboard keys may pause the screen output.) The dmesg command allows the review of such messages in a controlled manner after the system has started.

After booting

Even after the system has fully booted, the kernel may occasionally produce further diagnostic messages. Common examples of when this might happen are when I/O devices encounter errors, or USB devices are hot-plugged. dmesg provides a mechanism to review these messages at a later time. When first produced they will be directed to the system console: if the console is in use then these messages may be confused with or quickly overwritten by the output of user programs.

Output

The output of dmesg can amount to many complete screens. For this reason, this output is normally reviewed using standard text-manipulation tools such as more, tail, less or grep. The output is often captured in a permanent system logfile via a logging daemon, such as syslog.

Appendix C – systemd references

systemd System and Service Manager – FreeDeskTop

systemd is a suite of basic building blocks for a Linux system. It provides a system and service manager that runs as PID 1 and starts the rest of the system.

systemd provides aggressive parallelization capabilities, uses socket and D-Bus activation for starting services, offers on-demand starting of daemons, keeps track of processes using Linux control groups, maintains mount and automount points, and implements an elaborate transactional dependency-based service control logic.

systemd supports SysV and LSB init scripts and works as a replacement for sysvinit. Other parts include a logging daemon, utilities to control basic system configuration like the hostname, date, locale, maintain a list of logged-in users and running containers and virtual machines, system accounts, runtime directories and settings, and daemons to manage simple network configuration, network time synchronization, log forwarding, and name resolution.

See Lennart’s blog story for a longer introduction, and the three status updates since then. Also see the Wikipedia article. If you are wondering whether systemd is for you, please have a look at this comparison of init systems by one of the creators of systemd.

systemd – Wikipedia

The systemd software suite provides fundamental building blocks for a Linux operating system. It includes the systemd “System and Service Manager”, an init system used to bootstrap user space and manage user processes.

systemd aims to unify service configuration and behavior across Linux distributions. It replaces the UNIX System V and BSD init systems. Since 2015, the majority of Linux distributions have adopted systemd, and it is considered a de facto standard.

The name systemd adheres to the Unix convention of naming daemons by appending the letter d. It also plays on the term “System D”, which refers to a person’s ability to adapt quickly and improvise to solve problems.

Design

Lennart Poettering and Kay Sievers, the software engineers working for Red Hat who initially developed systemd, sought to surpass the efficiency of the init daemon in several ways. They wanted to improve the software framework for expressing dependencies, to allow more processing to be done concurrently or in parallel during system booting, and to reduce the computational overhead of the shell.

Poettering describes systemd development as “never finished, never complete, but tracking progress of technology”. In May 2014, Poettering further described systemd as unifying “pointless differences between distributions”, by providing the following three general functions:

A system and service manager (manages both the system, by applying various configurations, and its services)

A software platform (serves as a basis for developing other software)

The glue between applications and the kernel (provides various interfaces that expose functionalities provided by the kernel)

systemd is not just the name of the init daemon but also refers to the entire software bundle around it, which, in addition to the systemd init daemon, includes the daemons journald, logind and networkd, and many other low-level components. In January 2013, Poettering described systemd not as one program, but rather a large software suite that includes 69 individual binaries. As an integrated software suite, systemd replaces the startup sequences and runlevels controlled by the traditional init daemon, along with the shell scripts executed under its control. systemd also integrates many other services that are common on Linux systems by handling user logins, the system console, device hotplugging (see udev), scheduled execution (replacing cron), logging, hostnames and locales.

Like the init daemon, systemd is a daemon that manages other daemons, which, including systemd itself, are background processes. systemd is the first daemon to start during booting and the last daemon to terminate during shutdown. The systemd daemon serves as the root of the user space’s process tree; the first process (PID 1) has a special role on Unix systems, as it replaces the parent of a process when the original parent terminates. Therefore, the first process is particularly well suited for the purpose of monitoring daemons; systemd attempts to improve in that particular area over the traditional approach, which would usually not restart daemons automatically but only launch them once without further monitoring.

systemd executes elements of its startup sequence in parallel, which is faster than the traditional startup sequence’s sequential approach. For inter-process communication (IPC), systemd makes Unix domain sockets and D-Bus available to the running daemons. The state of systemd itself can also be preserved in a snapshot for future recall.

Core components and libraries

Following its integrated approach, systemd also provides replacements for various daemons and utilities, including the startup shell scripts, pm-utils, inetd, acpid, syslog, watchdog, cron and atd. systemd’s core components include the following:

systemd is a system and service manager for Linux operating systems.

systemctl may be used to introspect and control the state of the systemd system and service manager.

systemd-analyze may be used to determine system boot-up performance statistics and retrieve other state and tracing information from the system and service manager.

systemd tracks processes using the Linux kernel’s cgroups subsystem instead of using process identifiers (PIDs); thus, daemons cannot “escape” systemd, not even by double-forking. systemd not only uses cgroups, but also augments them with systemd-nspawn and machinectl, two utility programs that facilitate the creation and management of Linux containers. Since version 205, systemd also offers ControlGroupInterface, which is an API to the Linux kernel cgroups. The Linux kernel cgroups are adapted to support kernfs, and are being modified to support a unified hierarchy.

systemd for Administrators, Part XV [Watchdog] – Pid Eins 2012jun28

Appendix D – Fork and Fork Bomb References

Fork (system call) Wikipedia

In computing, particularly in the context of the Unix operating system and its workalikes, fork is an operation whereby a process creates a copy of itself. It is usually a system call, implemented in the kernel. Fork is the primary (and historically, only) method of process creation on Unix-like operating systems.

Overview

In multitasking operating systems, processes (running programs) need a way to create new processes, e.g. to run other programs. Fork and its variants are typically the only way of doing so in Unix-like systems. For a process to start the execution of a different program, it first forks to create a copy of itself. Then, the copy, called the “child process”, calls the exec system call to overlay itself with the other program: it ceases execution of its former program in favor of the other.

The fork operation creates a separate address space for the child. The child process has an exact copy of all the memory segments of the parent process. In modern UNIX variants that follow the virtual memory model from SunOS-4.0, copy-on-write semantics are implemented and the physical memory need not be actually copied. Instead, virtual memory pages in both processes may refer to the same pages of physical memory until one of them writes to such a page: then it is copied. This optimization is important in the common case where fork is used in conjunction with exec to execute a new program: typically, the child process performs only a small set of actions before it ceases execution of its program in favour of the program to be started, and it requires very few, if any, of its parent’s data structures.

When a process calls fork, it is deemed the parent process and the newly created process is its child. After the fork, both processes not only run the same program, but they resume execution as though both had called the system call. They can then inspect the call’s return value to determine their status, child or parent, and act accordingly.
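The return-value check described above can be tried safely with a single fork. This is a minimal sketch for Unix-like systems; the function name and the exit code 7 are arbitrary choices for the example.

```python
import os


def fork_once(child_exit_code: int = 7) -> int:
    """Fork one child, wait for it, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: fork returned 0. Exit at once, skipping any cleanup
        # handlers inherited from the parent.
        os._exit(child_exit_code)
    # Parent: fork returned the child's process ID.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(fork_once())
```

Unlike the fork bomb, this creates exactly one child and waits for it to finish.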

Communication

The child process starts off with a copy of its parent’s file descriptors. For interprocess communication, the parent process will often create a pipe or several pipes, and then after forking the processes will close the ends of the pipes that they don’t need.
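The pipe pattern above, sketched in Python for a Unix-like system: the parent creates the pipe, forks, and each side closes the end it does not need. The function name and the message text are illustrative.

```python
import os


def message_from_child(message: bytes = b"hello from child") -> bytes:
    """Fork a child that sends message through a pipe; return what the parent reads."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: keep only the write end.
        os.close(read_fd)
        os.write(write_fd, message)
        os.close(write_fd)
        os._exit(0)
    # Parent: keep only the read end, so reading reaches end-of-file
    # once the child has closed its copy of the write end.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)
```

If the parent forgot to close its own write end, the read loop would never see end-of-file, which is why the unused ends are closed right after forking.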

Fork bomb -Wikipedia

Python fork bomb

import os
while True:
    os.fork()

Appendix E – Bash Learning Notes

GNU Bash 5.0 Manual

At its base, a shell is simply a macro processor that executes commands. The term macro processor means functionality where text and symbols are expanded to create larger expressions.

A Unix shell is both a command interpreter and a programming language. As a command interpreter, the shell provides the user interface to the rich set of gnu utilities. The programming language features allow these utilities to be combined. Files containing commands can be created, and become commands themselves. These new commands have the same status as system commands in directories such as /bin, allowing users or groups to establish custom environments to automate their common tasks.

Shells may be used interactively or non-interactively. In interactive mode, they accept input typed from the keyboard. When executing non-interactively, shells execute commands read from a file.

A shell allows execution of gnu commands, both synchronously and asynchronously.

The shell waits for synchronous commands to complete before accepting more input; asynchronous commands continue to execute in parallel with the shell while it reads and executes additional commands. The redirection constructs permit fine-grained control of the input and output of those commands. Moreover, the shell allows control over the contents of commands’ environments.

Shells also provide a small set of built-in commands (builtins) implementing functionality impossible or inconvenient to obtain via separate utilities. For example, cd, break, continue, and exec cannot be implemented outside of the shell because they directly manipulate the shell itself. The history, getopts, kill, or pwd builtins, among others, could be implemented in separate utilities, but they are more convenient to use as builtin commands. All of the shell builtins are described in subsequent sections.

While executing commands is essential, most of the power (and complexity) of shells is due to their embedded programming languages. Like any high-level language, the shell provides variables, flow control constructs, quoting, and functions.

Shells offer features geared specifically for interactive use rather than to augment the programming language. These interactive features include job control, command line editing, command history and aliases.

  • Such a great answer! Thanks also for the pictures. Glad that you didn’t take them just for this question 😀 So I guess what I need is the LM2596S PSU to connect it to the GPIO as you said. I will try!!! Good that I have still my old soldering iron… – Jurudocs Jun 14 at 8:55
  • @Jurudocs Thank you for your nice words. I cut and pasted and modified my old answers for your question, so it did not take me much time. I am a PSU hobbyist, and I DIYed PSUs using LM2596 chips and inductor coils etc. But nowadays everything goes SMD and assembled modules are dirt cheap, so I have been lazy to “make” things. By the way, to mess around with the LM2596 PSU, you don’t need to test using the Rpi GPIO. You can just test by hand! 🙂 Good luck! – tlfong01 Jun 14 at 9:15
  • I noticed you mentioned reading up on Systemd. While I definitely recommend you do that because it’s a significant component to the way modern Linux systems work, fully understanding it is going to take a long time and not necessary to try out the watchdog. 🙂 – berto 2 days ago
  • @berto, I agree it might take me a very long time to understand the complicated SystemD. As Poettering says: “[systemd] never finished, never complete, but tracking progress of technology”. I remember Oliver Heaviside, saying: “Am I to refuse to eat because I do not fully understand the mechanism of digestion?” – en.wikiquote.org/wiki/Oliver_Heaviside So I will forget systemd now and come back to watchdog. Actually I need to learn Bash first, before I can understand the weird Bash script of Fork Bomb. – tlfong01 yesterday
  • The fork bomb line is pretty simple once you understand what you are looking at. It’s a function named : that calls itself recursively and puts a copy of itself in the background which also calls itself recursively. The Wikipedia page you have in your notes explains this further. – berto yesterday
  • Well, I was not aware that the symbol “:” can be a function name. In the beginning, I wrongly thought that the function has no name, a “lambda”, in other words. I guess over 90% of the visitors in this forum don’t understand what is the idea of recursion, not to mention double recursion used here. Recursion in mathematics is an algorithm that would come to an end and problem solved. In this case, there is no end. IT IS INCORRECT AND MISLEADING to call the function recursive. Function calling itself, is not recursion in full sense, or according to the rigorous mathematical definition. – tlfong01 yesterday   
  • I found your older answer (6 years ago!) here: raspberrypi.stackexchange.com/questions/3732/… Things are more complicated than I thought, so I will spend more time before defusing the bomb. 🙂 – tlfong01 yesterday
  • I checked everything OK. So I executed the fork bomb. As you expected, the system rebooted in about 15 seconds. To summarize, I followed your nice tutorials and found everything good, though I don’t quite understand what is going on. I need to spend more time to understand your commands, before I know how to set the watchdog timer. – tlfong01 yesterday

4

Cutting power is a brute force method and has risks.

The conventional solution to lock-up problems is to use a watchdog.

There is a BCM hardware watchdog. To enable it, include dtparam=watchdog=on in /boot/config.txt

In and of itself this does little, although it should restart the system if not “kicked” regularly. You can write code which opens /dev/watchdog to kick it off.

There is also a watchdog daemon which you can configure to activate the watchdog; you should be able to start with sudo systemctl enable watchdog

PS Incidentally, if you want to pursue the brute force approach – don’t bother cutting power – just pull the Reset pin (labeled RUN) low. This is equivalent to powering off then on again.
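A hedged sketch of the "pull RUN low" idea from a second Pi, written so that it does not depend on any particular GPIO library. The two callables are placeholders for whatever drives the pin on the real hardware, and the 0.2-second hold time is an assumed value, not one taken from a datasheet. The two boards would also need a common ground.

```python
import time
from typing import Callable


def pulse_reset(drive_low: Callable[[], None], release: Callable[[], None],
                hold_s: float = 0.2) -> None:
    """Hold the target's RUN line low for hold_s seconds, then let it go."""
    drive_low()
    try:
        time.sleep(hold_s)
    finally:
        # Always release, otherwise the target Pi stays held in reset.
        release()
```

On real hardware `drive_low` would set the chosen GPIO pin as an output at 0 V and `release` would let the line float high again (for example by switching the pin back to an input), since RUN is normally left pulled up.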

2

Before you go looking into additional hardware, please read up on what’s called a “watchdog timer”. The Raspberry Pi has a hardware watchdog built in that will power cycle it if the chip is not refreshed within a certain interval.

I have set up the watchdog on a Raspberry Pi 3 and a newish version of Raspbian with very little configuration. The first thing to check is that the hardware watchdog is available (I checked my system and it looks like the version of Raspbian I have installed compiles watchdog support right into the kernel; no need to load a kernel module):

pi@unicornpi:~ $ ls -al /dev/watchdog*
crw------- 1 root root  10, 130 Nov  3  2016 /dev/watchdog
crw------- 1 root root 252,   0 Nov  3  2016 /dev/watchdog0

If you see /dev/watchdog you’re all set. All you have to do is configure the watchdog facility built into Systemd.

In the file /etc/systemd/system.conf, set the following lines:

pi@unicornpi:~ $ grep Watchdog /etc/systemd/system.conf
RuntimeWatchdogSec=10
ShutdownWatchdogSec=10min

What the lines above say is:

  • set the hardware watchdog timeout to 10 seconds. Systemd refreshes the watchdog well within that window (per its documentation, at least once every half interval); if the refreshes stop and the timeout expires, the hardware resets the system
  • on shutdown, if the system takes more than 10 minutes to reboot, the hardware resets the system
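If you would rather script the change than edit the file by hand, something like the following could work. It is only a sketch: set_conf_key is a made-up helper, it assumes GNU sed (as shipped with Raspbian), and it assumes the key either already appears in the file (possibly commented out) or can simply be appended.

```shell
# set_conf_key FILE KEY VALUE
# Illustrative helper: uncomment/overwrite KEY=... in FILE, or append it.
set_conf_key() {
    file=$1
    key=$2
    value=$3
    if grep -Eq "^#?[[:space:]]*${key}=" "$file"; then
        # Replace the existing line, whether or not it is commented out.
        sed -i -E "s|^#?[[:space:]]*${key}=.*|${key}=${value}|" "$file"
    else
        # Key not present at all: add it at the end of the file.
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

Run as root, this would be set_conf_key /etc/systemd/system.conf RuntimeWatchdogSec 10 and the same again for ShutdownWatchdogSec with 10min, followed by a reboot. Keep a copy of the original file first.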

Once you have this configured and reboot, you will see something like this in the dmesg logs:

pi@orangepi:~ $ dmesg | grep -i watchdog
[    0.763148] bcm2835-wdt 3f100000.watchdog: Broadcom BCM2835 watchdog timer
[    1.997557] systemd[1]: Hardware watchdog 'Broadcom BCM2835 Watchdog timer', version 0
[    2.000728] systemd[1]: Set hardware watchdog to 10s.

If you see Set hardware watchdog to 10s you’re all set.
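For an unattended box it may be handy to check this from a script. The sketch below pulls the timeout out of log text on standard input; watchdog_timeout_from_log is a made-up name, and the message wording is taken from the log above, so other systemd versions may phrase it differently.

```shell
# watchdog_timeout_from_log
# Illustrative filter: reads kernel/systemd log lines on stdin and prints
# the timeout from the last "Set hardware watchdog to ..." message, if any.
watchdog_timeout_from_log() {
    sed -n 's/.*Set hardware watchdog to \([0-9a-z]*\)\..*/\1/p' | tail -n 1
}
```

Used as dmesg | watchdog_timeout_from_log, an empty result would suggest systemd did not arm the watchdog (or that the message format differs on your system).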

The best way I’ve found to verify that the watchdog works is to overload the system. I’ve done this with a “fork bomb”, which will completely saturate the system with garbage process forks. If you run this the Pi will become unresponsive and the watchdog should kick in. Your system should be up and running again after about a minute:

:(){ :|:& };:

Paste that into a shell and your system will be taken down. You’ve been warned.

More info on the watchdog system built into Systemd is on the author’s website.

  • Many thanks for the advice. I have heard of watchdogs for a long time but never tried one, because there was no necessity until now, building a smart rooftop garden away from home (actually 50 feet above home). Another reason I did not try is that the tutorials are not newbie friendly. When I started with the Rpi1 years ago, I found terminal commands very scary (it took me more than three hours to download a zip (tar actually) and extract it, but I did not know where to find the extracted files!) Now I find terminal commands not that scary, and sometimes very efficient, though I still love Win PowerShell terminal commands, … – tlfong01 Jun 15 at 4:03
  • And the advice at the beginning of your answer, to first read up on what a watchdog is, is very good. I did not know that “watchdog” is actually short for “watchdog TIMER”. This is important because if I know beforehand that it is a timer, I can understand things better. And as usual, I started with Wiki, which is always a good read for newbies. Now I know that a watchdog is actually some sort of hardware sitting alongside the Rpi. So even if the Rpi messes things up, the outside guy can come to the rescue (or “kick in”?). Reading Wiki let me know that “kick in” is not slang, but a technical term. – tlfong01 Jun 15 at 5:10
  • I also didn’t know what a “daemon” is. When I was a child, I read in the Bible that a daemon is a bad guy, so righteous programmers like me should not use daemons, otherwise I might go to Hell. But then Wiki told me how the MIT/UNIX guys coined the name and why it is spelled “daemon”, not demon. It also clarifies that daemons can be good and that even the righteous guy Socrates owned a daemon. Anyway, I finished reading the Wikis, and am now ready to start your tutorials, 🙂 – tlfong01 Jun 15 at 5:16
  • So I have followed your very detailed watchdog tutorial and found everything OK to the point of setting the watchdog to 10 seconds. Next step is to try a fork bomb, perhaps late this evening or tomorrow. – tlfong01 2 days ago   
  • Thank you for suggesting to call it a “watchdog timer”. I’ve made the edit 👍🏽 – berto 2 days ago

1

I have quite a few Pis. All of them except one ran flawlessly. The problem child would crash periodically and would never recover after a power outage without being power cycled again. I had it reboot itself every night via cron and that helped somewhat.

What fixed it though was taking the SD card and sensor hardware and putting them into another Pi. It has run without error ever since. Maybe you too have a hardware issue.

  • I didn’t catch your second paragraph about the hardware problem. Did you mean that the SD card and sensor caused all the trouble, and replacing them solved the problem? – tlfong01 Jun 15 at 2:44   
  • No, the Pi itself was the problem. I had a spare one, so I transferred the SD card and the sensors to the spare and used it instead of the original. No problems since. – Wildbill yesterday
  • I see. So it is always a good idea to have a spare Rpi for swap troubleshooting. Perhaps the OP should also consider this. – tlfong01 yesterday   

0

If you have wi-fi and just need to power off / power on, you could also consider using a smart plug. Amazon makes one for ~$25, you can power it on / off remotely and also set up timer routines if that’s preferable. I’ve had a few for several months and they’re quite reliable. You don’t actually need an Echo or any other dedicated device. I use my smart phone. Amazon Smart Plug

Edit: I realize this doesn’t provide a solution to the first part of the question, but if I had the prospect of a 2 hour drive if something went wrong I’d consider a layered approach.

  • I appreciate very much your suggestion of a layered approach, with a smart plug at the top layer. Actually, for some months I have been trying to DIY a smart plug based on the ESP8266 WiFi controller. However, I found the ESP8266 with NodeMCU Lua has a very steep learning curve. It took the newbie, i.e. me, over 100 hours just to blink an LED (compared to less than one hour writing an Arduino or Rpi blinky program). So I sadly gave up and have now decided to cheat by buying an ESP8266 XiaoMi smart plug and modifying it. I am going to add your suggestion to my answer soon. Many thanks again! 🙂 – tlfong01 Jun 15 at 2:17
