CRON DEATH – can’t lock /var/run/, otherpid may be

by admin on

On Debian systems I sometimes find that cron causes problems: not with the scheduling itself, but with the cron process.

Lately I found the error from the title in /var/log/messages, repeated countless times, usually one message per second. This means that a pid file is already locking the execution of the process. Since another cron is trying to start, something is probably wrong: maybe the previous process crashed before it could delete its pid file, or the previous process is no longer responding (even though it is still running), or the previous process is running correctly but something is trying to start another instance.

Below are the solutions I found for each case. The root cause is usually too system-dependent to be worth discussing here.

List of Solutions

CRON crashed

One possible scenario is that cron crashed (probably because of a script you wrote that made it crash or because of a syntax error in cron files). You can check and resolve this case by executing these commands:
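A minimal sketch of such a check, not necessarily the exact commands I ran. It assumes the procps version of ps, and that cron's pid file lives at /var/run/crond.pid (the usual Debian location; check the path in your own error message). The function name check_cron_pid is just an illustrative helper.

```shell
# Print "PID COMMAND" for the process recorded in a pid file.
# Prints nothing and returns non-zero if that process no longer exists.
check_cron_pid() {
    pid="$(cat "$1" 2>/dev/null)"
    # No pid file, or an empty one: nothing to check.
    [ -n "$pid" ] || return 2
    # The trailing "=" suppresses the header, so a dead pid gives no output.
    ps -p "$pid" -o pid=,comm=
}

# Usage (assumed path):
#   check_cron_pid /var/run/crond.pid
```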

This will tell you whether a process is running with the pid reported in the error (the otherpid). If no such process is running, the command should return nothing. In that case, you can delete the pid file and check whether cron starts without errors:
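As a hedged sketch of that step: the helper below (remove_stale_pidfile is a made-up name) deletes the pid file only when the recorded pid is really gone, so it cannot remove the lock of a live cron by mistake. The pid file path in the usage comment is an assumption; the init script is the same one mentioned later in this post.

```shell
# Remove a pid file only if no process with the recorded pid exists.
remove_stale_pidfile() {
    pid="$(cat "$1" 2>/dev/null)"
    if [ -n "$pid" ] && ps -p "$pid" >/dev/null 2>&1; then
        echo "pid $pid is still running; leaving $1 in place" >&2
        return 1
    fi
    rm -f "$1"
}

# Usage (as root, assumed pid file path):
#   remove_stale_pidfile /var/run/crond.pid && /etc/init.d/cron start
```

If cron starts and the error stops appearing in the log, the stale pid file was the whole problem.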

CRON is stuck

This is probably the simplest and, at least in my case, the most frequent error. I have a lot of cron jobs performing automatic operations. On a couple of systems I sometimes see the error above: one of my jobs is stuck and blocks cron, as if it can't proceed with its operations. The system automatically tries to restart cron but can't, so the error appears. Errors like these can be extremely tricky to debug: they happen 3 or 4 times a YEAR, so it may not be worth the time to find the real cause.

Anyway – the solution is extremely simple in this case: kill the offending process.
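One way to do it, as a sketch rather than a recipe: ask the process to exit with SIGTERM first, and only fall back to SIGKILL if it is still there after a grace period. The helper name kill_stuck and the 5 second default are my own choices for illustration; the pid to pass is the otherpid from the error, or the pid of the stuck job.

```shell
# Send SIGTERM, wait up to $2 seconds (default 5), then send SIGKILL
# if the process is still alive.
kill_stuck() {
    pid="$1"
    grace="${2:-5}"
    kill -TERM "$pid" 2>/dev/null || true
    # kill -0 sends no signal; it only tests whether the pid still exists.
    while [ "$grace" -gt 0 ] && kill -0 "$pid" 2>/dev/null; do
        sleep 1
        grace=$((grace - 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        kill -KILL "$pid" 2>/dev/null || true
    fi
    return 0
}

# To look for the stuck job first, list cron's children with their
# elapsed time (procps ps, assumed pid file path):
#   ps -o pid,etime,args --ppid "$(cat /var/run/crond.pid)"
# Then, as root:
#   kill_stuck <pid>
```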

Whatever is trying to start cron again should then immediately restart it correctly.

To check whether it worked, run the commands below:
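A possible version of that check, assuming the error is logged to /var/log/messages as above and that the daemon's process name is cron (true on the Debian systems I know, but verify on yours). The helper recent_lock_errors is hypothetical; it simply counts how many of the last lines of a log contain the "otherpid may be" text from the error.

```shell
# Count lines containing the lock error among the last $2 lines
# (default 50) of the log file $1.
recent_lock_errors() {
    # grep -c exits non-zero when the count is 0, hence "|| true".
    tail -n "${2:-50}" "$1" | grep -c "otherpid may be" || true
}

# Usage: run this twice, a minute or so apart. If the error has stopped,
# the count should shrink towards 0 as new log lines push the old ones out.
#   recent_lock_errors /var/log/messages
# And confirm that a cron daemon is running:
#   pgrep -x cron
```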

If you see the error from the title again, this wasn't the problem. You can try restarting the process (/etc/init.d/cron restart), but in my experience that didn't solve it.

For the last case (cron is running correctly but something keeps trying to start another instance), I don't currently know of a common cause. It seems to me that a mechanism that tries to restart cron once a second while the official one is running must be a custom one or a Linux internal one, possibly with some sort of strange bug. But I don't know for sure.

More In Depth

Above I said that the second bug is not worth solving. Well, I hate leaving processes with known bugs, but I have had to give up on similar bugs. If they are very rare, appear at random times, and cause relatively little damage, the time needed to track them down and fix them may simply be too high a price.

To be honest, in my (limited) experience I once spent at least a month (in total) hunting for a problem like the one above, never finding a real solution while experiencing only a minor annoyance. In the end, the server reached its end of life and was migrated. I never really understood what was happening, but the problem disappeared on the new server. So the month I spent on this bug was just a big waste of time, and it is something I don't want to do again.



David P

I see this problem with Debian9 for AWS Marketplace, when launched on an ec2 with an encrypted disk. Just after launching such a machine, ssh to it, and cron -l
I get the same error. After killing, cron -l is usable for a few seconds but then the problem reappears.

