stack overflow problem


I am working on a project with a Pi Compute Module mounted on a custom PCB. The project has to run reliably, ideally indefinitely, while logging sensor data and writing video and audio to a USB drive when the device is triggered. All of this is handled by a multiprocess, multithreaded Python program that starts on boot. It uses around 35% of CPU when it is recording audio and video and around 15% when it is not. The system runs from a solar panel and battery.

It recently crashed after around 3.5 days of running reliably. The logging that I set up within the program did not indicate that anything was wrong, and the Python program has a mechanism that restarts any process or thread that goes down, so the problem was not the Python program itself.

The following was found in /var/log/syslog (I will edit in the full syslog if someone thinks it will be useful):

May 19 12:28:02 raspberrypi kernel: [322680.792629] INFO: task scsi_eh_0:69 blocked for more than 120 seconds.

May 19 12:28:02 raspberrypi kernel: [322680.792893] INFO: task usb-storage:71 blocked for more than 120 seconds.

May 19 12:28:02 raspberrypi kernel: [322680.793216] INFO: task python3:724 blocked for more than 120 seconds.

These are timestamped about 4 minutes after the last log entry from the Python program. The last few entries from the program indicate that it was recording audio and video, and that CPU and memory usage were 14.6% and 23.8% respectively at most a minute beforehand.
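The three kernel lines above follow a fixed pattern, so they are easy to pull out of a larger syslog. A minimal sketch, assuming the message format shown above; the function name `find_hung_tasks` and the regular expression are illustrative and not part of the original program:

```python
import re

# Matches lines such as:
#   ... kernel: [322680.792629] INFO: task scsi_eh_0:69 blocked for more than 120 seconds.
HUNG_TASK_RE = re.compile(
    r"kernel: \[\s*(?P<uptime>\d+\.\d+)\] INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Return (uptime_seconds, task_name, pid, blocked_secs) for each hung-task line."""
    found = []
    for line in lines:
        match = HUNG_TASK_RE.search(line)
        if match:
            found.append((
                float(match.group("uptime")),
                match.group("name"),
                int(match.group("pid")),
                int(match.group("secs")),
            ))
    return found


# The three lines quoted in the question.
sample = [
    "May 19 12:28:02 raspberrypi kernel: [322680.792629] INFO: task scsi_eh_0:69 blocked for more than 120 seconds.",
    "May 19 12:28:02 raspberrypi kernel: [322680.792893] INFO: task usb-storage:71 blocked for more than 120 seconds.",
    "May 19 12:28:02 raspberrypi kernel: [322680.793216] INFO: task python3:724 blocked for more than 120 seconds.",
]

for entry in find_hung_tasks(sample):
    print(entry)
```

Run against the full syslog (for example by passing an open file object instead of `sample`), this would show whether the same tasks were reported earlier in the 3.5 days.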

According to the link below [1], this error message appears when the “whole process has not been scheduled for any CPU-time for 120 seconds”. I haven’t been able to find any other useful references to this error, as most seem to refer to high-load servers.
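For context, the tasks that the kernel's hung-task check reports are ones stuck in uninterruptible sleep (state D in /proc/<pid>/stat), which usually means they are waiting on I/O. A rough sketch, assuming a Linux /proc filesystem, that lists processes currently in that state; the function names are hypothetical, and it only looks at processes, not individual threads:

```python
import os


def parse_stat(stat_text):
    """Split the start of a /proc/<pid>/stat line into (pid, command, state)."""
    # The command is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split on the first '(' and the last ')'.
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    pid = int(stat_text[:open_idx])
    comm = stat_text[open_idx + 1:close_idx]
    state = stat_text[close_idx + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """List (pid, command) for processes currently in state D."""
    blocked = []
    if not os.path.isdir(proc_root):
        return blocked
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as stat_file:
                pid, comm, state = parse_stat(stat_file.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited or the line could not be parsed
        if state == "D":
            blocked.append((pid, comm))
    return blocked


print(uninterruptible_tasks())
```

Logging the result every minute or so would show whether usb-storage or python3 starts sitting in state D before the kernel messages appear.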

Is this error in syslog a result of the Pi itself, could it be caused by my code that writes audio and video, or could it be a power issue? Any help is appreciated. This is difficult to debug since it takes so long for the error to occur.
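On the power question, the Pi firmware keeps under-voltage and throttling flags that can be read with `vcgencmd get_throttled`. A sketch that decodes that output so it could be logged alongside CPU and memory usage; the helper names are hypothetical, and the bit meanings are the ones given in the Raspberry Pi documentation for that command:

```python
import subprocess

# Bit positions reported by `vcgencmd get_throttled`.
THROTTLE_FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output):
    """Turn a string like 'throttled=0x50005' into a list of flag descriptions."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [text for bit, text in sorted(THROTTLE_FLAGS.items()) if value & (1 << bit)]


def read_throttled():
    """Query the firmware; returns None if vcgencmd is missing or fails."""
    try:
        output = subprocess.check_output(
            ["vcgencmd", "get_throttled"], universal_newlines=True, timeout=5
        )
        return decode_throttled(output)
    except (OSError, subprocess.SubprocessError, ValueError, IndexError):
        return None


# Decoding an example value; on the Pi itself, call read_throttled() instead.
print(decode_throttled("throttled=0x50005"))
```

Writing the decoded flags to the program's own log periodically would leave a record of any under-voltage events leading up to a crash.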

I originally asked this question on Stack Overflow but didn’t get any responses (for obvious reasons), so I have deleted it and tried to make it a bit more concise here. I hope this is acceptable.

[1] https://helpful.knobs-dials.com/index.php/INFO:_task_blocked_for_more_than_120_seconds.

  • I once tried multithreading on a Pi with Python 3.5.3, using pools, async processes and so on, and found it easier than I expected. However, I did not use synchronization primitives such as semaphores or critical sections; if I had, I think troubleshooting would have been mission impossible. Your “blocked for more than 120 seconds” message after a couple of days of running is of course the crux of the matter. I know multiprocessing is non-blocking, so I have nothing to share on debugging. Good luck, and have a nice weekend. – tlfong01 17 hours ago
  • But if you are using semaphores, critical sections and so on to synchronize producer and consumer processes, then a process might block when the buffer or file is empty or overflows. If your program runs smoothly but crashes after a couple of days, a buffer or stack overflow is a likely cause. When a stack overflow causes a crash you usually do not get any error message and, worse, the errors are usually intermittent, so you cannot reproduce the situation. That is why I said earlier that troubleshooting is mission impossible for me … 😦 – tlfong01 just now
