Integration Guide: Reliable Redis CDRs

CDRs are appended to the enigma:notifications:cdr list at the end of each call.

When using Redis, ensuring that CDRs are retrieved reliably in extreme edge cases is unfortunately left to the user.

Failure Points:

There are two points where CDR notifications can be lost:

Catastrophic power failure: 

Specifically, a failure between snapshots or AOF writes. This verges on the impossible, as it would require a power failure at the hosting environment.

The CDR was never added:

  • There are known use cases where CDRs are not generated, for instance an agent hangup in Click2Dial.
  • An application bug. Please report it; CDRs can still be pulled manually and rectified after the fact.

Non-failure Points:

Network issues:

Due to the sensitivity of telephony systems, network safety is of the highest importance, so if the network is the cause then there are likely much bigger problems.

Redis uses TCP, and testing with Pumba at various packet loss levels still yielded 100% reliability (albeit with extreme performance drops).

Packet Loss   Records   Time (s)   Reliability
0%            200000    6.5        100%
5%            1923      20         100%
30%           136       20         100%
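
To put the performance drop into numbers, the rows above can be converted into throughput. This is a small sketch that assumes the rows read as 200000 records in 6.5 s, 1923 records in 20 s and 136 records in 20 s; the helper name throughput is illustrative only.

```python
# Rows from the packet loss test: (packet loss %, records delivered, elapsed seconds).
# The split of each row into columns is an assumption about the original table.
runs = [(0, 200_000, 6.5), (5, 1_923, 20.0), (30, 136, 20.0)]


def throughput(records: int, seconds: float) -> float:
    """Records delivered per second for one test run."""
    return records / seconds


baseline = throughput(runs[0][1], runs[0][2])
for loss, records, seconds in runs:
    rate = throughput(records, seconds)
    print(f"{loss:>2}% loss: {rate:10.1f} records/s ({rate / baseline:.2%} of baseline)")
```

Reliability stays at 100% in every run; only the delivery rate changes.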

Implementing Reliable Pop:

There is still a way to make retrieval more reliable in userland. Testing it is difficult, however, as you would have to replicate a power failure on a Redis node.

Pseudocode for reliable queues
// by default most integrators use this (the trailing 0 means block with no timeout)
var fullRawCdr = BLPOP enigma:notifications:cdr 0
 
/* 
 * To make this more reliable we need a two-step retrieval process.
 * BRPOPLPUSH pops from the tail of one list, pushes onto the head of another,
 * and returns the value that was popped.
 */
var fullRawCdr = BRPOPLPUSH enigma:notifications:cdr <integratorName>:processing:cdr 0

// handle cdr

// once BRPOPLPUSH has succeeded and the cdr has been handled, delete it from the processing list
LREM <integratorName>:processing:cdr 0 fullRawCdr
 
/*
 * On failure of BRPOPLPUSH (e.g. due to a network error) we can either:
 *   1) immediately drain the processing list, BEFORE carrying on with enigma:notifications:cdr.
 *      We pop from the tail of the processing list onto the head of the same list;
 *      this way we don't modify enigma:notifications:cdr and can handle repeated failures.
 *   2) consume all the lost notifications on a timed schedule
 *      (ideally with the enigma:notifications:cdr list handler halted in the meantime)
 */
while (LLEN <integratorName>:processing:cdr > 0) {
	// use a non-blocking operation, because if the list is empty we can carry on as usual
	var fullRawCdr = RPOPLPUSH <integratorName>:processing:cdr <integratorName>:processing:cdr
	if (RPOPLPUSH did not fail) {
		// handle cdr, and only then remove it, so a failed handler leaves it recoverable
		LREM <integratorName>:processing:cdr 0 fullRawCdr
	}
}
// then carry on with the blocking BRPOPLPUSH on enigma:notifications:cdr
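
The pseudocode above could be sketched in Python roughly as follows. This is a minimal sketch, not a reference implementation: it assumes a client object exposing the redis-py list methods brpoplpush, rpoplpush, llen and lrem, a hypothetical integrator name "acme", and a caller-supplied handle function. It also assumes that a CDR should be removed from the processing list only after the handler returns, so that a consumer crash leaves the CDR recoverable.

```python
from typing import Callable

SOURCE = "enigma:notifications:cdr"
PROCESSING = "acme:processing:cdr"  # hypothetical: "acme" stands in for <integratorName>


def consume_one(client, handle: Callable[[bytes], None], timeout: int = 5) -> bool:
    """Move one CDR to the processing list, handle it, then remove it.

    Returns False when the blocking pop timed out with nothing to read.
    """
    raw = client.brpoplpush(SOURCE, PROCESSING, timeout=timeout)
    if raw is None:
        return False
    handle(raw)
    # BRPOPLPUSH pushed the value onto the head of PROCESSING, and a positive
    # count makes LREM search from the head, so count=1 removes exactly that
    # entry (the pseudocode uses 0, which removes every equal value).
    client.lrem(PROCESSING, 1, raw)
    return True


def recover_pending(client, handle: Callable[[bytes], None]) -> int:
    """Handle CDRs stranded in the processing list by an earlier failure."""
    recovered = 0
    while client.llen(PROCESSING) > 0:
        # Rotate tail to head within the same list; the source list is untouched.
        raw = client.rpoplpush(PROCESSING, PROCESSING)
        if raw is None:
            break  # the list was emptied in the meantime
        handle(raw)
        client.lrem(PROCESSING, 1, raw)
        recovered += 1
    return recovered


def run_forever(client, handle: Callable[[bytes], None]) -> None:
    """Drain stranded CDRs first, then carry on with the blocking pop."""
    recover_pending(client, handle)
    while True:
        consume_one(client, handle)
```

With redis-py, client would typically be a redis.Redis instance, and run_forever would be wrapped in whatever reconnect logic the integrator already has, calling recover_pending again after each reconnect. Note that BRPOPLPUSH and RPOPLPUSH are deprecated as of Redis 6.2 in favour of BLMOVE and LMOVE, which offer the same two-step pattern.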