freeDiameter
diff libfdcore/p_psm.c @ 1010:357c2f892d24
Implement a new counter of pending answers to be sent back to a peer.
The function fd_peer_get_load_pending is updated to retrieve this counter as well.
When a peer has answers pending, the connection is not torn down immediately
upon the DPR/DPA exchange; instead a GRACE_TIMEOUT delay (default 1 sec) is granted.
author		Sebastien Decugis <sdecugis@freediameter.net>
date		Mon, 25 Mar 2013 16:39:32 +0100
parents		5c564966a754
children	0117a7746b21
--- a/libfdcore/p_psm.c	Mon Mar 25 14:35:55 2013 +0100
+++ b/libfdcore/p_psm.c	Mon Mar 25 16:39:32 2013 +0100
@@ -44,9 +44,9 @@
 The delivery of Diameter messages must not always be unordered: order is
 important at begining and end of a connection lifetime. It means we need
 agility to switch between "ordering enforced" and "ordering not enforced to counter
-HotLB" modes of operation.
+Head of the Line Blocking" modes of operation.
 
-The connection state machine represented in RFC3588 (and rfc3588bis) is
+The connection state machine represented in RFC3588 (and RFC6733) is
 incomplete, because it lacks the SUSPECT state and the 3 DWR/DWA exchanges
 (section 5.1) when the peer recovers from this state.
 Personnally I don't see the rationale for exchanging 3 messages (why 3?)
@@ -90,12 +90,15 @@
 DPA.
 Peer A receives the DPA before the application message. The application
 message is lost.
 
-This situation is actually quite possible because DPR/DPA messages are
+This situation is actually happening easily because DPR/DPA messages are
 very short, while application messages can be quite large. Therefore,
 they require much more time to deliver.
 
 I really cannot see a way to counter this effect by using the ordering
 of the messages, except by applying a timer (state STATE_CLOSING_GRACE).
+This timer can be also useful when we detect that some messages has not
+yet received an answer on this link, to give time to the application to
+complete the exchange ongoing.
 
 However, this problem must be balanced with the fact that the message
 that is lost will be in many cases sent again as the failover mechanism
@@ -196,7 +199,7 @@
 	return 0;
 }
 
-static int leave_open_state(struct fd_peer * peer)
+static int leave_open_state(struct fd_peer * peer, int skip_failover)
 {
 	/* Remove from active peers list */
 	CHECK_POSIX( pthread_rwlock_wrlock(&fd_g_activ_peers_rw) );
@@ -207,7 +210,9 @@
 	CHECK_FCT( fd_out_stop(peer) );
 	
 	/* Failover the messages */
-	fd_peer_failover_msg(peer);
+	if (!skip_failover) {
+		fd_peer_failover_msg(peer);
+	}
 	
 	return 0;
 }
@@ -287,9 +292,12 @@
 	CHECK_POSIX( pthread_mutex_unlock(&peer->p_state_mtx) );
 	
 	if (old == STATE_OPEN) {
-		CHECK_FCT( leave_open_state(peer) );
+		CHECK_FCT( leave_open_state(peer, new_state == STATE_CLOSING_GRACE) );
 	}
-	
+	if (old == STATE_CLOSING_GRACE) {
+		fd_peer_failover_msg(peer);
+	}
+	
 	if (new_state == STATE_OPEN) {
 		CHECK_FCT( enter_open_state(peer) );
 	}
@@ -298,6 +306,9 @@
 	/* Purge event list */
 	fd_psm_events_free(peer);
 	
+	/* Reset the counter of pending anwers to send */
+	peer->p_reqin_count = 0;
+	
 	/* If the peer is not persistant, we destroy it */
 	if (peer->p_hdr.info.config.pic_flags.persist == PI_PRST_NONE) {
 		CHECK_FCT( fd_event_send(peer->p_events, FDEVP_TERMINATE, 0, NULL) );
@@ -528,6 +539,11 @@
 			TS_DIFFERENCE( &delay, &reqsent, &rcvon );
 			fd_msg_log( FD_MSG_LOG_TIMING, msg, "Answer received in %d.%06.6d sec.", delay.tv_sec, delay.tv_nsec / 1000 );
 		}
+	} else {
+		/* Mark the incoming request so that we know we have pending answers for this peer */
+		CHECK_POSIX_DO( pthread_mutex_lock(&peer->p_state_mtx), goto psm_end );
+		peer->p_reqin_count++;
+		CHECK_POSIX_DO( pthread_mutex_unlock(&peer->p_state_mtx), goto psm_end );
 	}
 	
 	if (cur_state == STATE_OPEN_NEW) {