?

Log in

No account? Create an account
Computers

Perl oddness


In trying to get mon (http://www.kernel.org/software/mon/) to run as a daemon on a specific box I think I have found a rather wierd race condition in the perl fork code:

2696 syslog ('info', "attempting to run as daemon");
2697 sleep 1;
2698
2699 if ($pid = fork()) {
2700 # the parent goes away all happy and stuff
2701 sleep 2;
2702 syslog ('info', "forked with PID $pid");
2703 sleep 2;
2704 exit (0);
2705 } elsif (!defined $pid) {
2706 syslog ('err', "could not fork: $!");
2707 die "could not fork: $!\n";
2708 }
2709
2710 #print "Hello cruel world";
2711 syslog ('info', "daemon process spawned");
2712 setsid();
2713 syslog ('info', "setsid");

Without the first of the sleep calls it fails to run any perl commands in the child process at all, even the (currently commented out) hello world.

With a minor amount of debugging (a waitpid() call) I determined that the child process was actually returning before the exit(0) call. My first thoughts were that the race condition was within the above code, hence the "sleep 2" calls appearing, which didn't actually work. The strace below seems to be showing something odd going on with sockets...

*** Without the sleep ***

write(6, "<30>mon[27765]: attempting to ru"..., 45) = 45
clone(Process 27767 attached
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7f50708) = 27767
[pid 27765] select(8, [6], NULL, [6], {0, 0}
[Error: Irreparable invalid markup ('<unfinished ...>') in entry. Owner must fix manually. Raw contents below.]

<lj-cut text="Does this look like a race condition in perl to anyone?">
In trying to get mon (http://www.kernel.org/software/mon/) to run as a daemon on a specific box I think I have found a rather wierd race condition in the perl fork code:

2696 syslog ('info', "attempting to run as daemon");
2697 sleep 1;
2698
2699 if ($pid = fork()) {
2700 # the parent goes away all happy and stuff
2701 sleep 2;
2702 syslog ('info', "forked with PID $pid");
2703 sleep 2;
2704 exit (0);
2705 } elsif (!defined $pid) {
2706 syslog ('err', "could not fork: $!");
2707 die "could not fork: $!\n";
2708 }
2709
2710 #print "Hello cruel world";
2711 syslog ('info', "daemon process spawned");
2712 setsid();
2713 syslog ('info', "setsid");

Without the first of the sleep calls it fails to run any perl commands in the child process at all, even the (currently commented out) hello world.

With a minor amount of debugging (a waitpid() call) I determined that the child process was actually returning before the exit(0) call. My first thoughts were that the race condition was within the above code, hence the "sleep 2" calls appearing, which didn't actually work. The strace below seems to be showing something odd going on with sockets...

*** Without the sleep ***

write(6, "<30>mon[27765]: attempting to ru"..., 45) = 45
clone(Process 27767 attached
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7f50708) = 27767
[pid 27765] select(8, [6], NULL, [6], {0, 0} <unfinished ...>
[pid 27767] getsockname(3, <unfinished ...>
[pid 27765] <... select resumed> ) = 0 (Timeout)
[pid 27767] <... getsockname resumed> {sa_family=AF_INET, sin_port=htons(38138), sin_addr=inet_addr("138.38.101.115")}, [16]) = 0
[pid 27765] write(6, "<30>mon[27765]: forked with PID "..., 39 <unfinished ...>
[pid 27767] getpeername(3, <unfinished ...>
[pid 27765] <... write resumed> ) = 39
[pid 27767] <... getpeername resumed> {sa_family=AF_INET, sin_port=htons(636), sin_addr=inet_addr("138.38.100.29")}, [16]) = 0
[pid 27767] write(3, "\25\3\1\0\22\201\300\22cL\353r\326Y.\223\246\255\262\342"..., 23 <unfinished ...>
[pid 27765] close(6 <unfinished ...>
[pid 27767] <... write resumed> ) = -1 EPIPE (Broken pipe)
[pid 27767] --- SIGPIPE (Broken pipe) @ 0 (0) ---
Process 27767 detached
<... close resumed> ) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
close(5) = 0
close(4) = 0
exit_group(0) = ?

*** With the sleep (works and continues on past what I've pasted) ***

write(6, "<30>mon[9852]: attempting to run"..., 44) = 44
clone(Process 9854 attached
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7f0e708) = 9854
[pid 9852] time(NULL) = 1154438448
[pid 9852] rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
[pid 9852] rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
[pid 9852] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 9852] nanosleep({2, 0}, <unfinished ...>
[pid 9854] getsockname(3, {sa_family=AF_INET, sin_port=htons(48156), sin_addr=inet_addr("138.38.101.115")}, [16]) = 0
[pid 9854] getpeername(3, 0xbfb1efc0, [128]) = -1 ENOTCONN (Transport endpoint is not connected)
[pid 9854] select(8, [6], NULL, [6], {0, 0}) = 0 (Timeout)
[pid 9854] write(6, "<30>mon[9854]: daemon process sp"..., 39) = 39
[pid 9854] setsid() = 9854
[pid 9854] select(8, [6], NULL, [6], {0, 0}) = 0 (Timeout)
[pid 9854] write(6, "<30>mon[9854]: setsid\n\0", 23) = 23
[pid 9854] clone(Process 9855 attached child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7f0e708) = 9855
[pid 9854] close(6) = 0
[pid 9854] close(5) = 0
<snip>
[pid 9852] time(NULL) = 1154438450
[pid 9852] select(8, [6], NULL, [6], {0, 0}) = 0 (Timeout)
[pid 9852] write(6, "<30>mon[9852]: forked with PID 9"..., 37) = 37
[pid 9852] time(NULL) = 1154438450


So the strace seems to suggest that if the socket has already gone away, it behaves fine, if not, it falls flat on its arse when slightly later it goes away...

The sleeps work, but that counts as a serious kludge to me. Do I report this to the perl lot, sit on it, or bother digging further?
</lj-cut>

Comments

Send it to p5p, see what they think.
Hmm have you tried (closing and) re-opening the log right after the fork?
fireworks

August 2010

S M T W T F S
1234567
891011121314
15161718192021
22232425262728
293031    
Powered by LiveJournal.com