VoIP service provider anti-patterns

In the software business realm, the term anti-pattern refers to dysfunctional but commonplace solutions to technical and business problems. Anti-patterns occur widely enough that one can reasonably generalise about them — that’s why they’re “patterns”.

There are many acknowledged technical anti-patterns in the software engineering world, such as database-as-IPC, or, my personal favourite (hat tip to The Daily WTF), the Inner-Platform Effect. The latter will be easily recognisable to a programmer who has been asked to write an application with business rules and data objects that can be extensively customised by non-programmers; invariably, the customisability demands made upon such systems approach infinity, ensuring that, over time, working with the system comes to resemble programming (if not necessarily “coding”) in its cognitive and technical dimensions, and therefore to require a skill set that approximates that of a programmer. Not only does this fail to address the original demand of the businesspeople–reduce dependence on programmers–but now there is a poorly performing, half-baked system-within-a-system, with all of the downsides of real programming and none of the benefits. Yet it happens all the time wherever people who don’t really understand how software works are in charge. If you work with many organisations, you’ll encounter some manifestation of it over and over. That’s what makes it a prime example of an anti-pattern.

It’s hard to meaningfully identify anti-patterns in new industries or fields of commercial endeavour. The VoIP ITSP (Internet Telephony Service Provider) is a relatively recent development, all things considered. Companies in immature industries, whose business models and equilibria are still unsettled, tend to try lots of different ways to make money, as well as to package and productise what they do in different shapes and sizes. Failed experiments–even repeated failures–in new growth markets aren’t necessarily anti-patterns. A lot of praxis, industry consolidation, and market development has to happen before something can truly be deemed an anti-pattern.

I would go further than most to say that anti-patterns aren’t just ideas that have repeatedly seen failure over a significant stretch of time, but in fact are consistently bad ideas or misconceptions that, for one reason or another, retain a stubborn hold on the imagination of businesspeople and managers even in the face of accumulated common knowledge that they are bad. My preferred term for such ill-fated ventures is “worst practices”. By now, the VoIP ITSP industry has been around long enough, and I have worked with enough ITSPs, that I can safely venture to typify some of these “worst practices” from my 12 years of consulting experience growing up with this industry.

Not all of them can be identified and avoided, of course, and I certainly don’t plan to try to survey them all in one meagre article. But some of the more conspicuous ones are worth stepping through in the hope that it will help someone starved of quality advisors avoid bad technical and business decisions. I must also be candid about the limitations of my perspective; my experience is primarily (though not exclusively) with US-based small to medium ITSPs and telcos, and heavily weighted toward open source VoIP solutions, so some of what I say may be unavoidably tendentious from a strictly enterprise or profoundly foreign-market perspective. Caveat emptor.

#1: SBC metaphysics

The Session Border Controller (SBC) industry has come to have an indelible hold on the conceptual vocabulary in which VoIP-related plans are laid. I have spoken to this in a previous article on the suitability of Kamailio as an SBC, and elaborated upon the problems it poses in my Kamailio World 2019 talk in Berlin (“Kamailio as an SBC: definitive answers”), so I won’t belabour it here very much.

What bears remark here is that there are a lot of ways to engineer the core, the customer access layer, and the intra-industrial carrier interface of VoIP networks, and conventional SBCs from the big brands are only one of several possible avenues, each with their own trade-offs deserving of thoughtful consideration. Yet somehow, there is a pervasive meme out there that SBCs are the essential building block of VoIP service delivery.

SBCs consist of a SIP back-to-back user agent (B2BUA) and (typically) an integrated media relay, combined with a routing policy engine/business layer, plus some other commonplace features (server-side NAT traversal, DoS protection/security, etc.), packaged into a particular kind of appliance. Talking about VoIP networks strictly in terms of SBCs is like being fixated on Coca-Cola in a conversation that is ostensibly about beverages as a genus.

Setting aside the mountains of money so often shovelled into an open pit by buying, licencing and supporting SBCs where they are profoundly unnecessary (and, only very occasionally, by omitting them where they are in fact necessary), this is an anti-pattern because of the sheer amount of communication overhead and time wasted when people talk past each other, typically because one side lacks the imagination, expertise, or agility to tune out a certain amount of brainwashing by the SBC industry and think in more fundamental SIP architecture concepts. A lot of cognitive bandwidth is sucked up in meetings and on conference calls trying to unpack business requirements that are slathered in the vernacular of SBC features and related marketing gibberish, when all one really needs to talk about is SIP endpoints, and perhaps proxies.

By no means am I saying that SBCs are useless or inappropriate. Indeed, they are properly applicable to a variety of scenarios. Nor am I saying that the names the SBC industry has given to things are ipso facto bad; in some cases, they are a fine flag of convenience. However, a shockingly large amount of valuable engineering time is spent driving the conversation with business stakeholders to a point outside of a maddeningly confined SBC “Matrix” where one can explore options comprehensively. It’s a ubiquitous tax on getting real work done.

#2: Rejected open-source platform transplants

The acquisition of ITSPs with custom platforms built on open-source seems to be cyclical. There are periods where larger telecoms buy open-source service providers with voracious appetite, and there are periods where the discourse is all “if it’s not Broadsoft or Sonus, we don’t understand it”.

Everyone understands a customer/revenue grab; buy the subscribers, get a few decent acqui-hirees out of it, transition the subs to Sonus or Metaswitch or whatever, kill the scrappy legacy open-source platform (at which point the acqui-hirees may leave, whatever), end of story. Because small ITSPs tend to cultivate smaller customers and often to stake out local-colour or vertical-specific niches, two icebergs of some specificity lie in wait (both avoidable by seasoned management): the cost scaling and profitability implications of delivering smaller transactions and booking much lower ARPUs than the larger entity may be set up for, and the possibility of churning away not-so-sticky customers who preferred the old platform or the old crew over your cookie-cutter POTS/Key System replacement. Setting aside the more general and universal problems of any acquisition, such as integration of IT systems and billing, management of support workflow and process, etc., this is fundamentally doable.

But it’s far easier to buy customers on a mainstream big box platform that is already more consumable by the enterprise. As far as I can tell, the motives for buying an open-source ITSP with a custom platform are usually strategic; it’s a technology buy, with eyes on the intellectual property as a vehicle for saving money on big brand licencing costs, offering new products or entering new kinds of specialty markets, or just adding a down-scale switch to be able to deliver smaller transactions for smaller customers with more economies.

Strategic technology buys of open-source custom platforms are a fertile field for serious problems. The problem I see so often–so very often that it pleads for a star on the anti-pattern walk of fame–is that the acquirer is a sales-heavy organisation with the wrong kind of “corporate DNA”. A typical master agent/channel partner/managed services provider is sales-driven and very light on engineering, and can’t properly metabolise the huge core competency commitment that open-source platforms demand. It often comes as something of a shock to them that open-source is not, in fact, free, and virtually always requires an engineering-led corporate culture. The latter is a critical factor to attracting and retaining the kind of engineering talent that is needed to run, maintain, and above all else, extract value from such a platform, and sales shops don’t have that sauce. They’re often blindsided. Even if they take a write-down, they’re stuck with uneconomical legacy commitments to customers they don’t really want and can’t quite unload without churning them straight out the door.

Beyond that, I’ve seen enough FreeSWITCH or Asterisk potpourri slapped together under the heading of some kind of next-generation “cloud platform” acquired for comically large multi-million dollar sums to deduce that as often as not, the typical acquirers of these things do not really understand what they’re getting, and are susceptible to exuberant valuation voodoo. One tends to impute magic to what one does not understand, especially if it comes without overt enterprise-sized licencing and support costs. A lot of engineering effort may have gone into these platforms, but they are seldom “turn-key” as sales shops ordinarily understand that word.

Even for acquirers with some non-trivial commitment to engineering and a strong internal customer support organisation, it’s important to understand that, depending on exactly what you’re buying and its contours, you probably can’t take your support staff and just train them up on this new open-source stuff real quick, as one might with an Avaya, 3CX or Broadsoft grab. Open-source isn’t free; the costs are usually paid in operational expertise and integration effort rather than port licences.

I’m not saying that buying open-source platform companies for the platform is inherently a bad idea. But do your technical due-diligence, or hire someone who really gets open-source IP telecommunications to do it for you. More importantly: take a serious, cold-eyed and blunt look at your human resources, with a special eye to whether your organisation has the in-house engineering apparatus for the care and feeding of your new creature. It’s natural to ask these questions before investing in a big, expensive commercial platform, but for some reason the critical faculties often seem to be offline when buying a free-range, grass-fed, organic open-source voice medley.

It takes a certain kind of company to competently appropriate and fruitfully squeeze value out of an open-source platform, just as a transplanted organ can only go into a compatible body with a matching blood type. Put it in the wrong body, and the immune system will reject it outright. To the dismay and bewilderment of many sales folks moving seats and trunks through the channel, a lot of these genetic factors relate to people and culture. Your culture may not be wrong, and it may not be bad for running a hosted PBX sales machine, but the fact is, you may not have the kind of place where open source-savvy engineering talent lingers, nor the business processes, workflows and institutional memory to embrace open-source.

Even if you’re lucky to get a crack development team as part of the acquisition, plan for them putting in their notice at 23:59 on the day their earn-out contract ends or their options vest or whatever, and figure out what you’re going to do without them. There’s really no large-scale track record of FOSS developers being happily absorbed into some sort of Borg cube, un-learning cherished customs and habits, and embracing things like C# and change control. I’ve heard of too many executives flabbergasted that the acqui-hires leave; “but we pay them so well!” Well, they built a highly scalable open-source platform (right? See the caveat about ensuring you get what you think you’re buying); they’ve got options.

I’m an open-source VoIP consultant. My colleagues and I would love to take your money under the theory that we’re going to rescue you, and it happens often enough that it’s a whole anti-pattern. But we also don’t want to be a begrudgingly necessary cost centre, and we know this isn’t what you had in mind with a strategic investment in a custom platform.

#3: Cutting out the switch resale platform

This one is a more generic variation of the previous theme, and I sense an uptick in recent years. The archetypal actor is, again, the sales-heavy managed service provider or agency without much engineering in the tent. Indulging the all-American zeal for “cutting out the middleman” and the broader mercantile passion for increasing gross margins, the notion possesses them to get off Coredial, someone else’s Broadsoft partition or what have you, and buy their own Class 4/5 switch platform.

The problem is, even the most artisanally packaged enterprise switch solutions require the operator to take fairly deep technical ownership. Operationalising a switch into the business, to say nothing of the migration process or the necessary back-office integrations and process development, demands a technical core competency commitment a sales-focused shop may not understand or be prepared to make. More damningly, they may not know how to so much as try; executives without competent technical advisors don’t know how to hire or nurture next-level technicians.

The real self-styled trail-blazers in this group are excellent candidates for the disaster outlined in anti-pattern #2, as they see in an open-source platform buy a seductive opportunity to kill three birds with one stone:

  • Avoid the mega CAPEX and OPEX of big-brand commercial solutions;
  • Stop their switch platform provider relieving them of a sizable chunk of their subscriber revenue;
  • Grab intellectual property/technology capital with valuation multiplier effect on a future acquisition.

If it were so easy, everyone would do it. There’s a lot of reflexive dismissal of the value-add of perceived middlemen among this crowd. The value-add is usually invisible until it’s gone. Selling PBX and trunking isn’t the same as running the PBX and trunking, and there are a lot of sales-focused MSPs out there who would make a lot more money if they just stayed in their lane and didn’t try to run switches. I say that as someone eager to sell you a Class 4 trunking platform of your very own.

(I’d be remiss not to give an honourable mention to the small, but not wholly invisible subset of these companies who get the idea to build their own softswitch and/or SIP stack, though their efforts are mostly abortive. They are typified by a swashbuckling frontiersman type who is only emboldened by others’ dismissal of this “impractical” or “quixotic” venture, believing himself to have struck gold if it’s got the naysayers exercised. Go forth, pioneer, and blaze the path.)

#4: Mindless stampede into The Cloud

It’s not really news that a lot of open source-centric ITSPs have jumped onto the bandwagon of a build-out onto Amazon Web Services (AWS), or one of the other cloud majors. I covered some of the common misconceptions around this in some detail in my Kamailio World 2018 talk – “Kamailio in the ITSP: The Changing Winds”.

The operative fantasy here among most executives is that infrastructure can be someone else’s problem and one can fire those sysadmins, NOC techs, and gophers who are sent to the data centre at 3 AM to swap out blown power supplies. That’s an understandable aspiration, but one which does not in any way require nor specifically point to an AWS, Azure or Google cloud deployment.

Ultimately, this all stems from a deliberately engineered conflation between “cloud” (as the foregoing vendors implement and define it), and “running things on someone else’s computers”. It just so happens that Amazon and friends have captured and packaged this burning desire to run things on someone else’s computers in a way that is, from a marketing point of view, digestible to the business class at large, and have accordingly been granted something of a monopoly on the concept of farming out the infrastructure problem in general.

Running your communications platform on infrastructure cared for by a third party has been possible for a very long time in the form of leased dedicated servers and leased virtual machines. Indeed, a great deal of clustering and automatic service discovery on such a layout is made possible by modern tooling. Many providers offer measured hourly billing and straightforward APIs to automatically provision, turn up and spin down servers “elastically” in response to shifting demand throughout the course of a business day. Running an ITSP without owning or maintaining a single physical server has been possible since at least the mid-2000s.

AWS, for example, offer a particular paradigm for elastic, on-demand computing that, if used as they intend, makes heavy use of the vendor’s (e.g. Amazon’s) proprietary tooling and infrastructure helpers. It also requires extensive familiarity with the AWS Way of Doing Things, from a nuanced understanding of the limitations of various instance sizes, to their software-defined networking and security concepts, to various complementary products such as dynamic storage (EBS, S3, etc.). Moreover, AWS was built to meet the needs of web application and web service delivery; special considerations are required to run real-time, delay-sensitive, media-involved communications on that type of system.

Thus, there are two distinct but related misconceptions proffered in the exuberance over “cloud”–in any form, really–which lead to the diagnosis of an anti-pattern:

  • “Cloud” infrastructure magically runs itself and requires little or no headcount to support it;
  • No idiosyncratic knowledge is required to competently leverage an esoteric platform such as AWS.

Neither is true. What is true is that the nature of the required skill set changes, often with significant consequences; infrastructure consisting of your own server hardware can be supported by more or less entry-level IT staff with something like an ‘A+ Certification’ and a basic command of Linux, while any cloud venture, whether of the esoteric AWS-style flavour or something more generic, is going to involve DevOps-heavy concepts such as configuration management, orchestration, service discovery, etc.

Note especially the “Dev” part of “DevOps”; a lot of cloud architecture management relies on semi-programmatic tooling that draws upon skill sets higher up the technical value chain and, accordingly, the pay scale. It’s probably true that you can reduce operations headcount with cloud, but you most certainly cannot eliminate it, and what headcount you do have will probably be more costly because of the higher skill requirements.

Anyway, it does not seem that either fact is particularly well-known, if we are to judge by the number of scenarios in which folks uncritically ploughed straight into an AWS or similar deployment without much forethought. Businesspeople who bought into the marketing around the concept of shedding operations baggage and throwing it all into “the cloud” are often surprised that the cloud costs major money to manage, and requires additional “elastic” resources and services they did not plan for–together with people who know how to use them, and especially to massage them in ways that meet the needs of IP telecoms.

This doesn’t mean that running IP telecom systems in AWS or similar is impossible or ill-advised. In fact, some of our largest and most successful customers do exactly that. However, it should not be confused with hosted infrastructure; if used as intended, AWS entails a lot of learning, and it is certainly Amazon’s aim to foster dependence on their cloud tooling with a view to vendor lock-in. If you don’t use AWS or its ilk as intended, there aren’t necessarily a lot of benefits to using it, and potentially plenty of downsides.

To properly realise value from AWS or its cousins, you have to really understand how to do cloud architecture right within their paradigm, and take full advantage of the various auto-scaling and self-assembly mechanisms on offer. If you’re not doing that, there’s little rational basis for using the major cloud platforms, and depending on your service delivery architecture, there may not be much point in a full-bodied commitment to this kind of cloud approach. Either way, the devil is entirely in the details, and the decision must be carefully weighed against other cloud alternatives or more traditional infrastructure–which, as mentioned above, can still be made quite “elastic”. For many ITSPs, the latter is, in fact, from a holistic business point of view, the most sensible choice.

As a colleague pointed out to me, the biggest victims, with the most to lose from not properly grasping the costs and benefits of cloud in detail, are established companies who are already heavily invested in their own facilities. When the siren song of “cloud” and the chilly wind of FOMO (Fear of Missing Out) blow through the country club during tee time, their executives end up with huge OPEX for their colocation facilities, and more huge OPEX for their cloud build-out, which now, without a trace of irony, runs things they could be running in their existing facilities, along with an over-taxed operations team burdened with maintaining two infrastructures that demand not-especially-overlapping ways of doing things. Alternately, they might hire an additional and massively expensive DevOps team specifically for the cloud operation. The vanishing horizon of aspirational, never-completed migrations to the cloud platform is a frequent theme in such cases.

I’m not against IP telephony in the cloud, and don’t consider it an anti-pattern. The anti-pattern is blind, pollyannaish, didn’t-see-it-coming, who-knew-this-shit-is-complicated, I-thought-there’d-be-savings marriage to the most iconic cloud platforms without a diligent and qualified analysis of the true, fully burdened costs and the human capital shifts required.

#5: B2BUAs and heavyweight network elements as quick-fix band-aids

This is a narrowly technical one, but we have seen it a lot in our Kamailio consulting work.

The typical case is that of an ITSP that has built out a Kamailio-centric routing platform, but has hit a knowledge limit in what they can do with Kamailio while facing an immediate ask from the business side. In a rush to fulfil, they fall back to using the tools they know best–FreeSWITCH, Asterisk, etc.

Few seem to be aware that RTPEngine can do transcoding and call recording, and that Kamailio can rewrite ANI/Caller ID in a SIP-compliant manner, speak to SIP-over-WebSocket (WebRTC) endpoints, flexibly rate-limit SIP requests, count and limit concurrent calls, make arbitrary database queries, manipulate codecs in SDP, and even asynchronously query HTTP APIs and parse their JSON output. Yet we often see a proliferation of a miscellany of FreeSWITCH or Asterisk servers to do these kinds of things, often for simple lack of awareness that they can be done any other way.

The result is redundancy and SPOF (Single Point of Failure) concerns, Rube-Goldbergian call flows, a morass of burdensome infrastructure commitments and associated costs, and the business risk of pertinent knowledge walking out the door.
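
To make concrete just how little machinery some of these tasks require, here is a rough sketch of SIP-compliant ANI/Caller ID rewriting at the proxy with Kamailio’s uac module; the display name, number and domain are hypothetical, and supporting modules (tm, textops, siputils, rr) are assumed to be loaded:

    # Illustrative fragment: rewrite the From header (ANI/Caller ID) at the
    # proxy itself, rather than bouncing the call through a B2BUA for it.
    loadmodule "uac.so"

    # transparently restore the original From values on sequential requests
    modparam("uac", "restore_mode", "auto")

    request_route {
        if (is_method("INVITE") && !has_totag()) {
            # replace the display name and URI in the From header
            uac_replace_from("Acme Widgets", "sip:12125551234@trunk.example.net");
            t_relay();
        }
    }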

This is not to say that there is no legitimate use-case for a B2BUA in the call path of an otherwise proxy-heavy platform. For example, although interoperability is, overall, a declining problem relative to a decade ago, there certainly remain cases where a B2BUA is the best vehicle to mediate between two subtly different flavours of SIP; a B2BUA can be liberal in what it accepts, and conservative in what it emits. And of course, B2BUAs continue to provide topology hiding, which is helpful to certain business models, to security, or to both.

Still, one should consider, in such a case, whether to deploy a lightweight, signalling-only B2BUA without an attached media gateway apparatus, as opposed to a full rig suitable for PBX or application server duty. More than once or twice, I’ve walked into a deployment with an otherwise powerful and logic-rich Kamailio load balancer spreading calls across 15 Asterisk or FreeSWITCH servers for the sole purpose of doing something like ANI/Caller ID manipulation. This is irrational and wasteful, but is sufficiently widespread to earn mention as an anti-pattern.
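
Even where topology hiding is the genuine requirement, it doesn’t necessarily take a B2BUA these days; Kamailio’s topoh module can mask topology at the proxy layer. A minimal sketch, with an illustrative key and mask address:

    # Illustrative fragment: topology hiding at the proxy with the 'topoh'
    # module, which encodes Via/Record-Route/Contact details into opaque
    # values before messages leave the proxy.
    loadmodule "topoh.so"

    # secret used to encode and decode the masked headers
    modparam("topoh", "mask_key", "some-long-random-secret")

    # an address not otherwise used by the proxy, shown in masked headers
    modparam("topoh", "mask_ip", "10.1.1.10")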

#6: Back-end development done by front-end developers

Real-time communication systems have exacting timing and performance requirements that take real back-end programming experience and expertise to meet.

I don’t mean to make systems programming sound like rocket science; it’s not. However, it is a different problem space than front-end application development or dealing with HTTP workloads, and it requires a deep understanding of parallelism and concurrency, among other topics. The average web developer, which is what the term “developer” has come to mean in the eyes of many businesspeople nowadays, does not have the expertise to build services for high-performance call processing–at least, not without some help. All developers have their specialties.

The rise of NodeJS and isomorphic front/back-end JavaScript has done much to muddy the waters by giving currency to the idea that JavaScript web developers can write back-end services. One back-end service is not the same as another. Yet somehow, the idea has become widespread that “developer” means “JavaScript developer” and that “development” is fungible. The ludicrous and facile meme of “full-stack developer” bandied about in this context is still more misleading; the “full stack” of a web application is not the “full stack” of the rest of networked computing.

This problem pre-dates server-side JavaScript, though. In the mid-late 2000s, I was involved in rescue efforts that seemed to have been made necessary by a conversation like this:

A: “We need to build a PHP front-end for our contact centre product.”
B: “Okay, I’ve hired some PHP developers and they’ve built the front-end.”
A: “Oh. Now we have to build the actual call processing logic, I guess.”
B: “We need to hire developers for that.”
A: “But we’ve already hired developers.”
B: “Yes, they’re PHP developers.”
A: “Right, developers, so let’s have them develop the back-end.”
B: “…”

A few synchronous, blocking and poorly-performing, database-bound PHP-AGI scripts later, and the telephony backend was born.

While it seems doubtful, for ecosystem reasons alone, that anyone would non-ironically hire PHP developers in 2019, things aren’t too different a decade later. It’s just that now, a commonly preferred way to shoot oneself in the foot is with a blunderbuss that says something like, “drive SIP call routing with a single-threaded Node API service. Everything’s asynchronous, right?” The choice of technology is hardly the point here, and I’m not knocking Node–it’s perfectly good for what it is. There are simply a lot of considerations that go into a technical decision like that, not the least of which is how the SIP call-processing element can consume such an API without compromising its throughput, how much work Node can really do in one thread, etc. Replace “Node” with “Java servlets” or “Python Flask web services” or whatever and the same basic idea applies; high-level web technology is not systems programming.
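
For what it’s worth, the consumption side of that problem is quite tractable at the SIP layer; Kamailio, to give one example, can suspend a transaction while an external routing API is queried asynchronously, along these general lines (the endpoint URL and response format are hypothetical, and ancillary modules are omitted for brevity):

    # Illustrative fragment: querying an external routing API without tying
    # up SIP worker processes, using the 'http_async_client' module.
    loadmodule "http_async_client.so"
    modparam("http_async_client", "workers", 2)

    request_route {
        if (is_method("INVITE") && !has_totag()) {
            # suspends the transaction; resumes in route[ROUTING_REPLY]
            # when the HTTP response (or a timeout) arrives
            http_async_query("http://api.example.com/route?dn=$rU", "ROUTING_REPLY");
        }
    }

    route[ROUTING_REPLY] {
        if ($http_ok && $http_rs == 200) {
            # assume the API returns a bare SIP URI in the response body
            $ru = $http_rb;
            t_relay();
        } else {
            t_reply("503", "Routing Service Unavailable");
        }
    }

None of this makes the single-threaded API service itself any more scalable, of course; it merely keeps the SIP element from blocking on it.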

A blithe indifference to the pernicious consequences of combining the folk traditions of the latest web development fashion cycle with the rather ironclad requirements of real-time multimedia communications is sufficiently widespread (made more so by the stereotypical categories into which the concept of “software development” has fallen in the popular imagination) to warrant identification as an anti-pattern.

A decade ago, the response to naive implementations was in itself an anti-pattern: throw more hardware at the problem. This begat a kind of Jevons paradox, a vicious cycle of rewarding bad software engineering with more resources for it to consume. Today, that tends to be addressed with “horizontal scaling”; if you just throw five more m4.xlarge instances at it, inefficiency is no big deal–until you glance at your cloud provider bill.

We–and they–will happily take your money. 🙂


Thanks to Ryan Delgrosso for his valuable feedback and suggestions on drafts of this article.

Kamailio and SIP training: notes from the field

Being one of the leading companies involved in Kamailio and open-source SIP infrastructure implementation for VoIP service providers in North America, we run our Kamailio and SIP fundamentals training curriculum a fair bit. It’s a distinctly secondary line of business for us, but since 2011, we’ve done it somewhere around 15 times by now, mostly here in the USA and occasionally internationally. That’s enough repetitions and customer feedback cycles for us to draw some conclusions and generalise about what we ourselves have learned.

It’s usually a two or three-day affair. The first day consists of SIP fundamentals training, which we consider prerequisite to anything else (if you know a bit about what’s required to configure Kamailio effectively, you’ll agree). There are customers who want that portion only, and depending on how in-depth they want to go, that can last two days. Otherwise, the second day is usually focused on Kamailio, and, when there is a third day, it’s usually filled with hands-on “lab” activities and applied exploration of things the customer is specifically interested in.

We generally tweak the structure to emphasise what the customer most hopes to get out of it, and this varies a lot. Some customers are impatiently laser-focused on applied use of Kamailio to solve very immediate needs. Some customers are focused more on shoring up a general understanding of SIP among their support staff to improve troubleshooting outcomes. Some customers’ primary goal is to get people with a fairly “standard” IT background more conversant with the exotic vocabulary and history of IP telephony and real-time communications. Some customers are highly technical developer types and are looking to reach into more rarefied knowledge of some APIs, or really niche aspects of SIP standards and protocol formalities. We have experience catering to that entire spectrum.

All told, teaching SIP and Kamailio is not so different to teaching most other niche software systems, tools, or frameworks. Most lessons we’ve learned seem to apply to all technical training of a 2-3 day introductory format (as opposed to an intensive and more long-term course). I’ll share a few:

Training is not a conference talk

There’s a point of view out there that slide content should be minimal, to serve only as speaking prompts, akin to a speaker’s private note-cards, or to hold illustrations. Certainly, walls of text aren’t useful, and nobody likes speakers who just read their lengthy slides verbatim. Pedagogically useful slides are by nature somewhat abstemious.

However, we’ve yet to have a customer who did not ask to keep our slide deck for their own reference. That alone means that the slides have to have some standalone informational value, and can’t be too minimal.

Some hipster slide deck with five slides of faux-“Zen” rhetorical questions, or vacuous treacle, will be of zero value to anyone. In situations where there is a declared intention to use the slides as reference material, they have to strike a balance between walls of text on the one hand, and an utter paucity of content on the other. They’re a document of sorts.

More generally:

I have observed that many people, when asked to teach a training course or a seminar of some kind, go to drink from a common well of “public speaking skills” they may have deployed in other contexts, such as presentations to management, conference talks, etc. The skills for conference talks seem to be an especially common departure point, where the focus is on keeping the audience engaged.

While all public speaking is a performance that makes demands upon one’s artistry, and there is no question that the challenge of keeping the audience engaged falls into your purview as a trainer, training is not the same as giving a conference talk.

For one, two days is not thirty minutes. More importantly, one’s purpose in being there is specifically to convey non-trivial information as a specialist, and the audience carries a greater responsibility to absorb it. It’s not a sales pitch. You’re not marketing your specialty. The business objectives of a half-hour conference talk given to a general audience are entirely different. It’s worthwhile to ponder that when wrestling with the temptation to pilfer the “performance art” of one and channel it into the other.

Your audience are mostly there because their boss said they have to be, so you don’t have to get them “amped” about the subject. You’re not a clown, and the primary purpose of your visit is not entertainment.

I’m not saying you should aim to be boring—no, by all means add earnestness, humour, wit and charisma to your presentation if you can, and good trainers do. However, if you feel like you have to make it “dynamite” enough for a bunch of ADHD hamsters who will move on to a different room/booth/track in 20 seconds if you don’t keep them on the edge of their seats, stop yourself. You’re optimising for the wrong problem. This is training; it’s their time and their dime.

Have a clear idea of the objective

Having a clear idea of an objective and mindfully allowing it to guide you is not the same as merely stating an objective or marketing an objective. Lots of folk do the latter without a dime of sincere thought capital invested in the former.

You’ve seen it in the facile syllabi of sundry curricula before:

By the end of the VoIP Bushido Expert Seminar 3XL, the student will have mastered the skills of real-world SIP aikido and H.323 jujitsu. The VoIP Bushido Expert Seminar 3XL will empower the student for maximum success in a fast-paced, ever-changing Ameriglobal VoIP marketplace that demands advanced expertise.

Yeah, okay. If you’re running anything remotely describable as a “seminar”, there is exactly 0.0% chance that anyone will come out of it with a mastery of anything. Either you’re teaching something utterly trivial and obvious, or you are abusing the concept of “mastery” in a way that is deeply fraudulent. Your marketing department might say everyone’s doing it and it’s not meant literally, but consider the disservice you are rendering unto the use of language and the meaning of words.

But assuming we’ve been relieved of the notion that anyone stands to “master” anything from a seminar, or for that matter a 2 to 3-day training course, the entirely legitimate question arises: what, exactly, do you mean to accomplish?

Most people in pedagogy will agree, I think, that the primary goal of any “introductory” endeavour should be to leave everyone in the room with a greater level of knowledge and skills than you found them. In the case of the Kamailio side of our training, I like to abuse the cliché “teaching a man to fish”. Going from zero to a “distinguished and commercially viable skill set” with a system like Kamailio takes years. My intent is that everyone leave our training sessions:

  • With a better grasp of the ontologies surrounding Kamailio, and more especially a sense of the Kamailio idioms for various general concepts in SIP proxy behaviour and SIP routing:
    • Transactions;
    • Dialogs;
    • Initial vs. sequential requests and “loose routing”;
    • Hop-by-hop messages (CANCEL, 100 Trying, negative ACK) vs. end-to-end messages;
  • A clear high-level sense of where Kamailio is typically used in building large-scale SIP service provider architectures (e.g. registrar, load balancer, redirect server to add routing intelligence, and the rest);
  • Some familiarity and comfort level with the names of Kamailio concepts and the ideas to which they refer, e.g.:
    • Core functions;
    • Modules;
      • Essential modules needed for almost any useful configuration; modules which are “good as core” (e.g. TM);
      • Ancillary modules to provide specific functionality (e.g. JANSSON);
    • Pseudo-variables;
    • Transformations;
  • A clear sense of where to find documentation and how it’s laid out, and some intuition of where to look for certain kinds of things;
  • Some visual and reflexive familiarity with the appearance and anatomy of the Kamailio configuration file:
    • Core configuration directives;
    • Module parameters;
    • Subroutines (in essence, SIP event callbacks):
      • Request routes;
      • Reply routes (onreply_route);
      • Failure routes (failure_route);
      • Branch routes;
    • Specialised event routes (callbacks/event handlers exposed by modules);
    • Concepts analogous to general-purpose programming languages and runtimes:
      • String transformations (kind of like string methods in OO languages);
      • Variables:
        • Ephemeral/scratch-pad variables ($var(…));
        • Transaction-persistent variables ($avp(…)/$xavp(…));
        • Dialog-persistent variables ($dlg_var(…)).

This is not “mastery” of anything, including these very concepts. But the goal of the training is to expose these ideas and vocabulary to the audience so that they recognise them and can use them in the future to develop their knowledge toward their goals.
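
To give a flavour of the “anatomy” item above, here is a heavily pruned, purely illustrative skeleton of the kind of configuration file we walk through (not a working production configuration):

    #!KAMAILIO

    # core configuration directives
    listen=udp:10.0.0.10:5060
    children=8

    # modules and module parameters
    loadmodule "tm.so"
    loadmodule "sl.so"
    loadmodule "pv.so"
    loadmodule "xlog.so"
    modparam("tm", "fr_timer", 30000)

    # request route: the entry point for incoming SIP requests
    request_route {
        $var(source) = $si;          # ephemeral scratch-pad variable
        $avp(realm) = "customers";   # transaction-persistent variable
        route(RELAY);                # invoke a named sub-route
    }

    route[RELAY] {
        if (!t_relay()) {
            sl_reply_error();
        }
    }

    # reply route: fires on replies to relayed requests
    onreply_route {
        xlog("L_INFO", "received reply $rs from $si\n");
    }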

The “leave them better off than you found them” bit will have different results for different people and groups in our SIP and Kamailio training. People with some development background may go from having a loose-fitting acquaintance with these things already to a more buttoned-down one, allowing them to be more focused and efficient in building further knowledge and experimenting, or at least asking more focused questions of us or on mailing lists, leading to better and more useful answers. For others, it will simply mean putting these words on a mental map where they did not exist before, so that references to them in the future “ring a bell”, an improvement over total bewilderment. There is a notable difference in the nature of the leaps we can expect from developer sorts versus operations types.

That’s a realistic assessment of what we can hope to achieve, though hardly a guarantee. That’s what I tell management when they want to learn more. Some nod approvingly and appreciate my candour, while others, accustomed to viewing programming as a fancy form of typing, bristle at the notion that we can’t get the staff “trained up” so that they can just, you know, code up the product real quick. Regardless, honesty and specificity are the best policies.

Group dynamics

Technical proficiency among our audiences follows a fairly typical Pareto distribution. Due to our prevailing flat-fee structure for most engagements, management will send five to ten people. One or two of them will have had deeper Kamailio and SIP experience internally and will extract a lot of specific information from the training, while the rest are there to soak up “exposure” so that the activities of the other one or two will not be a completely opaque mystery to them.

Just about every group will have That One Guy. (I don’t mean that disparagingly; he’s just That One Guy for lack of a better name coming to mind. And he doesn’t have to be male.)

He’ll already have come into contact with 40-70% of your material in some fashion, and is often keen to demonstrate that with pep and vigour. He’ll ask a lot of questions and generate a lot of tangents. The psychological motive is rarely to ingratiate himself to the trainer, who, after all, will pack up and leave soon, but the motives will vary, from genuine intellectual curiosity and affability at one end, to a more ulterior plot to position himself as the “go-to guy” for this subject matter in front of his colleagues. The latter is more common in large organisations, where ownership of projects, and the budgets and clout that come with them, is a contentious topic in the sizzling (or slowly marinating) “office politics” inevitable in any group of nontrivial size.

As in any other consulting project, so it goes in training: every active, invested participant has explicit and covert objectives—well, “covert” implies something nefarious in a way I don’t intend, so perhaps “tacit objectives” is better. Either way, a few walks around the consulting block will lead to the insight that identifying all of these—as best as one can—with sensitivity and perceptiveness is a very important “soft skill” and a key part of the value proposition to management stakeholders. Naturally, it’s a tacit one. This holds true in training as well.

Anyway, trainers and teachers seem divided on whether That One Guy is friend or foe from the point of view of maintaining structure. There are some trainers who believe that such people “hijack” the agenda and do a disservice to the rest of the group as well as the trainer’s efforts to bring things to a common denominator that everyone can access. And it’s true enough that I’ve had some training sessions with small groups, in earlier iterations of doing this, that seemed to turn into a conversation between me and That One Guy. It’s important to remember that the goal is to leave everyone better off than you found them.

But not everybody in a group of more than about 3-4 is going to get something out of the training, and one must accept that. Some attendees may have the potential, but instead zone into their laptops, fighting fires and paying half-attention. It’s their time and their dime, and you don’t need to slow the bus down for their benefit. If they tumble out, they tumble out.

I personally think that That One Guy is an asset. All interactions, even the overwhelmingly lopsided dynamic of lecturing to a group, are two-way. It’s still a conversation with the audience, whose temperature and tempo one must gauge. As long as That One Guy’s role is properly managed, he provides much-needed anchoring and telemetry for how to proceed, helps to generate good energy and convection around the topic, and, often, provides a window into the tacit objectives in the group.

Frequent breaks and atmosphere

There are some managers who would say taking a 10-15 minute break every 1 to 1.5 hours is too frequent. And to them I say: people simply cannot be bombarded with detailed information for two or three hours, even if it’s quite riveting. They will zone out. I would say it’s best practice to insist on it over any objections of management.

A related idea:

Darkened rooms are great for reading slides, but even better for inducing sleep. Windowless fluorescently-lit rooms and depressing dungeons seem to have a similar effect. Bright, sunny conference rooms with picturesque views of trees and park benches serve to grimly remind everyone that they could be having fun outside, but alas, are hearing about Via branch parameter GUIDs from some propeller-head instead. The effects of this on the mood of people in an hour-long meeting are different than on the mood of people stuck with you for two full days of training.

Adjust the implied sympathy of your approach accordingly, but ideally, find a venue that represents a happy medium.

Love letter to Vue

In the frustratingly fast-paced, ever-shifty and profoundly fashion-driven JavaScript web development ecosystem, it’s not easy to find something that one can even stand to use, to say nothing of love. And if you do find something, its obsolescence will be triumphantly announced on Hacker News in about three weeks.

MVC/MVVM frameworks in particular are a source of frustration. There’s AngularJS (often known as Angular 1.x), which, despite being fundamentally meritorious (indeed, I got started with modern JS web frameworks on it), is clearly subject to a strong effort at obsolescence.

AngularJS is also notorious for being highly opinionated about how your entire application should be structured, forcing many competent developers into stifling vocabularies of design patterns — things like “factories”, “services”, “providers” — that are neither wanted nor needed. I understand that this is sometimes viewed as a selling point because it imposes discipline and more homogenous, shared vocabularies on front-end teams with an entry-level skill set, but it is incredibly stifling and bureaucratic to people who know what they’re doing.

Angular 2.x (itself now obsolete!) went completely off the rails with the boilerplate, complexity, build tooling, and Byzantine structure required just to get started. I understand what the Angular people are trying to do, catering to the sensibilities of large enterprise projects. However, in the course of doing it, I fear they’ve lost their minds. “Make it more Enterprise™” is a common trap in the “evolution” of libraries and tools. Angular 2/4 is a completely over-engineered trainwreck.

It was with this realisation that I went looking for something new on which to standardise our ambitious internal portal project, which throws off reusable components that are cycled into the new CSRP UI. I considered React, Riot.js, and one or two others.

In that research, in late 2016 I stumbled upon Vue. It was an incredible breath of fresh air. I don’t mean to be melodramatic, but it’s not an exaggeration to say that it lifted me from an overwhelming depression about the future of front-end development in this company and got me coding enthusiastically again. They say you can’t look for love; it just happens. It doesn’t happen very often in the mess that is the JavaScript web ecosystem. In this case, it did.

I’m a back-end developer, systems person and telecoms nerd by trade; if you’ve got me loving UI development, you’ve achieved the certifiably impossible.

Reasons why Vue is amazing, in my eyes:

Not too much, not too little!

For an experienced software engineer, it is refreshing to be left in charge of one’s own architectural decisions about the application as a whole without being forced into contrived vernaculars and unwanted ontologies. Vue liberates one to architect the application as one chooses, providing only the UI bits and not a whole “pattern”.

At the same time — and this is critical — there’s plenty “in the box” with Vue. One doesn’t have to cobble together all the necessary pieces of a typical web application from a dizzying array of third-party components, as with React. It’s got an official, batteries-included router (vue-router) and a central state store (Vuex). The declarative templates are feature-complete and rich, requiring no additional plumbing, at least for typical CRUD business application use-cases. Vue really gets this intricate balance right in ways that have innate appeal to an experienced developer. It has everything you need to build such applications, including optional features that foresee considerable complexities and nuances of large-scale projects, but not a smidgeon more than truly necessary to do its job.

Vue achieves elusive simplicity in a realm where simplicity is seldom found, except in such forms as to require the passengers to build the plane themselves, right there on the tarmac.

Component-centric design

As in React, everything in Vue is a component. That’s it. You simply build self-contained components and compose them. You’re not bombarded with half a dozen different kinds of abstractions and related esoterica. You don’t have to master exotic vocabulary like “transclusion” and the fine points of scopes. Vue has alternatives for all of this functionality, of course, but they are much more succinct.

There are of course a few other constructs, such as filters and custom directives, to which you may need to resort. However, fundamentally, components are the only important first-class citizens of Vue.

Added business bonus: while the declarative template syntax, in essence identical to AngularJS’s, allows one to have meaningful Vue conversations with developers with an AngularJS background, the component-orientated focus of Vue allows one to have equally meaningful conversations with React developers.

If you hate declarative template logic and have an insatiable twitch for JSX and custom render functions, Vue has got you covered. And more fundamentally, Vue is also based on the idea of passively reactive data plumbing, so you don’t have to litter your code with imperative watchers.

So, although I don’t know React nearly as well, I believe Vue accommodates the habits of mind of both Angular and React developers.

Amazing documentation

Vue documentation is the gold standard of documentation, in my opinion. I’ve never read such clear, complete and easy-to-understand documentation for anything in my life.

Characteristically, it strikes a great balance between giving adequate conceptual background on the one hand, for those who want to learn more, and instant gratification and quick examples on the other, for those with an applied, hands-on motive.

I don’t know how they got the documentation so right, but they did. Their claim that one can get started developing non-trivial things in Vue in about a day isn’t frivolous; I got started developing non-trivial things in Vue after about a day (although I was coming off an AngularJS background). Obviously, it takes time to learn to use anything fluently and come into better ways of doing things, but the learning curve is really short. I attribute that largely to the excellent documentation.

ES6/2015-compliant

You can use ES6 constructs freely in Vue. The only major restriction I’ve run into is that Vue can’t observe changes inside ES6 keyed Collections. There are also a few places where traditional functions, rather than arrow functions, are required to preserve the appropriate scope of this, as for example in watcher callbacks.

However, arrow functions, native Promises, destructuring, async/await (ES2017), and other modern goodies are good to go, and our Vue projects use them everywhere without a care. The Vuex store docs actually recommend the use of stage-3 spread syntax. That’s pretty modern!

Scales up and down, to large and small tasks alike

You can build a complex application architecture in Vue, making full use of Flux/Redux-type state-keeping patterns using the Vuex store. Or you can just attach a single Vue component into a single DOM element for a niche purpose, much as you’d do with jQuery. Although you certainly can build your Vue project with a Webpack-driven monstrosity, you can also inject it for that kind of niche purpose via a single <script> tag.

If you’re stuck with maintenance on a legacy web application, or possibly even a pre-SPA-era one, and want to inject some modern new widgetry, this sort of thing can be a godsend.

Vue can be as minimalistic or as expansive as you like in that sense, and that flexibility is one of my favourite things about it. You don’t have to load half a billion dependencies, rewrite half the code base, or structure complicated scaffoldings to use it for small tasks.

Plays well with others

A familiar problem to those with an AngularJS background is trying to get other DOM-impacting JS libraries to work within it and not make manipulations invisible to its dirty checking / digest cycle. I’ve never really had this problem with Vue (though I don’t doubt there are some edge cases). I’ve injected a little jQuery here and there for some effects and the like, and, thanks to the clean reactivity model behind Vue, the changes pick up just fine.

More generally, Vue doesn’t force you to use a portfolio of native componentry for things unrelated to its core mission. It’s common to use a module like axios for XHR/AJAX/REST operations, and Vue plays ball. That’s because it operates on plain old JavaScript properties and doesn’t introduce a large out-of-band wrapper superstructure to effect its data binding and reactivity.

Clear and distinct project vision

Thanks to the author’s prudent leadership, Vue does not appear to suffer from existential confusion about what it is or wants to be. It has a clearly articulated philosophy of what it does and doesn’t aim to provide, and sticks to it. I hope that never changes.

Stability

The Vue that I am using at the time of this writing, in summer 2018, is the same Vue I picked up in autumn of 2016. With the way things work in the JavaScript web ecosystem, that alone says a lot about the project discipline and leadership.

From the perspective of a company whose core business is not web UI development, this is really, really important. Few things are safe investments in the JavaScript web ecosystem, often here today, completely rebuilt and backward-incompatible 2.0 tomorrow — no, I literally mean tomorrow. That’s how the Valley web hipsters of Hacker News do things. Not just the hipsters, actually: how long did it take for Angular 2 to be “obsoleted” by Angular 4 again?

As always, Vue manages to strike an elegant balance between staying current and keeping pace on the one hand, and providing a solid and credible technology platform on the other. Combined with Vue’s other aforementioned virtues, and more especially its strongly centred philosophical identity, this stability and seeming longevity puts JavaScript web development within reach of non-web-economy stakeholders like us.

We love Vue so much that we support Evan on Patreon, earning me a small mention in the GitHub repository’s backers list. We hope to increase our contribution in the near future to reflect the incredible utility and satisfaction, both business-level and psychological, that we’ve got out of Vue.

Kamailio as an SBC: five years on

In early 2013, more than five years ago, I wrote an article: “Kamailio as an SBC (Session Border Controller)”. It was in response to the often-asked question in the Kamailio and open source-focused VoIP consulting arena about whether Kamailio is an SBC, or can be made to serve as an SBC.

Far from having been put to bed, the question rages on; we get it now more than ever, and certainly even more than at the time the article was written. In that original article, the essential thesis was: “Kind of, depending on what exactly you mean by ‘SBC'”.

That answer continues to be broadly accurate, in our view. However, five years of additional industry experience, observation of broad trends in our corner of the SIP platform-building universe, and Kamailio project evolution have certainly shifted the contours somewhat.

Let’s revisit the topic with an updated 2018 view.

Is it the right question?

In the original article, I made the point that there are two different understandings of “SBC” floating around out there. One is highly nuanced and product-specific, generally held by large telco types who are highly specialised on mainstream commercial SBC platforms. The other view, which enjoys much wider currency, is that of a carrier and/or customer endpoint-facing SIP element that performs traffic aggregation and routing in a rather generic sense. I argued that Kamailio is suitable to the latter, but falls rather short of what qualified specialists mean relative to the former.

Having had half a decade to ponder this, I’ve come to increasingly see it as an ontological problem. The marketing departments of major SBC vendors, starting from Acme Packet, have successfully convinced IP telecoms practitioners, in the enterprise market at least, that this thing called an “SBC” is the basic building block of a “carrier-grade” SIP service delivery platform. It’s a Swiss Army Knife routing box, a reassuring “voice firewall” for helpless Class 5 platforms exposed to the brutal storms and harsh, cold winds of the public Internet, a solution to the problem of juggling multiple signalling IPs, an adaptation layer for non-interoperable behaviour, a place where the vicissitudes of NAT can be sheathed off, and everything in between.

But more importantly, it’s just how it’s done. SBC is the word we use to describe the sort of thing that this eclectic grab-bag o’ SIP gateway is. Rare is a marketing triumph so total that it reshapes our mental categories and how we think about things at an almost metaphysical level, regardless of the objectively available options in underlying technology. It’s on par with the crystallisation of Kleenex® as a moniker for “paper tissue” here in America; only one kind of “paper tissue” is now allowed to exist. It pretty much has to come in a rectangular box with a plastic aperture on the top, regardless of brand. It’s risen to the level of conventional wisdom. All (non-bath) paper tissue putatively comes in such boxes. All tissue is Kleenex®.

I view this as a serious problem. Considering the wealth of concepts that exist in the market space of SIP platform-building, it’s rather grim that our answers about Kamailio in the carrier space so often have to be framed in terms of the SBC question.

A great many generic industrial uses of SBCs amount to little more than “SIP gateway + RTP relay + server-side NAT traversal intelligence in a box”. That’s a more accessible and open-minded framing of the desiderata. Talking to telco-heads about this stuff easily leads to the impression that this vocabulary is Ancient Greek, lost in long-forgotten manuscripts. It’s just all subsumed under “SBC”. I’m not trying to lead an anti-corporate revolt here, but from an engineering perspective, it’s really time to reclaim and re-democratise some of this language, lest it give way to trafficking exclusively in the trademarked proper nouns of product names.

Some may say that’s rather tendentious, considering we’re an open-source VoIP platform consultancy and product vendor. Maybe so. Far be it from me to say that SBCs don’t have their place; they certainly do. But I think everyone would be well-served if people requesting a “Kamailio SBC replacement” took a step back and asked:

1. What do I actually need?

2. What is the SIP vernacular for that, from the point of view of someone conversant with SIP standards and market realities alike?

3. Does #2 actually compute to an “SBC” from Oracle, Genband, Sansay, Metaswitch, etc?

4. Do I actually need it to behave that way? Why?

5. Is it reasonable or desirable for Kamailio to behave that way?

6. Is there a compelling alternative that has different formal technical properties but substantially the same effect to the bottom line?

We’ve built our entire business on competent exploration of those kinds of issues.

By way of final addendum to the philosophical portion of this article, I will note that the colonisation of The Cloud™ by the service provider world has been productive inasmuch as it has increasingly exposed some of this notion, that Real VoIP Networks are built out of Big SBC Appliances, to be somewhat intellectually fraudulent. (Shameless plug: I gave a whole talk on ITSPs moving to the cloud at Kamailio World 2018 in Berlin recently.) Major SBC vendors, who lobbied so hard to establish this reflexive social convention of the Big SBC Appliance as the universal Lego block of service provider networks, are — without a trace of irony — putting out virtual machine images of their platforms in an effort to remain cloud-relevant.

If you can virtualise a Sonus SBC—it’s just SIP software!—who knows what else you can virtualise instead?

RTP relay

So much, then, for fighting the power.

My original article feebly pointed to rtpproxy as an RTP relay solution and implied that it cannot compete with the horsepower of ASIC-assisted RTP forwarding in proprietary boxes.

A few years ago now, SIPwise released RTPEngine, which most certainly can. RTPEngine uses kernel-mode packet forwarding, making use of the Linux kernel netfilter APIs, to achieve RTP relay at close to wire speed and in a manner which bypasses userspace I/O contention. It’s got a raft of other features, from SRTP/crypto support to WebRTC-friendly ICE, not to mention recent innovations (admittedly in user-space) in call recording and transcoding.

RTPEngine has been shown to be able to handle over 10,000 concurrent bidirectional RTP sessions on commodity hardware, and additional RTPEngine instances can be trivially brought into the fold from Kamailio’s side without reloads or restarts, so if that call volume isn’t enough for you, just stack more gateways. It even supports replicating RTP session state to a redundant mate in real time using Redis.
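To give a flavour of how little ceremony is involved, here is a minimal sketch, with hypothetical addresses, pointing Kamailio’s rtpengine module at two RTPEngine instances in a single set; Kamailio will distribute sessions across them and route around instances that stop responding on their control sockets:

loadmodule "rtpengine.so"

# Two RTPEngine instances in set 0 (hypothetical addresses); more can be
# appended to spread RTP load across additional gateways.
modparam("rtpengine", "rtpengine_sock", "udp:10.0.0.10:2223 udp:10.0.0.11:2223")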

Considering it’s under open-source licence, it’s hard to argue with the unit economics or port density of that, more especially on modern multi-core commodity hardware. It’s hard to see the necessity of dense ASIC-assisted RTP forwarding for cost scaling in 2018.

Topology hiding

If topology hiding for commercial reasons is your overriding concern, so that your customers can’t discover your vendors and vice versa, you’ve probably got it in your head that you need a B2BUA (back-to-back user agent).

It’s true; Kamailio is a SIP proxy, and that’s not going to change. Logical call leg A goes in, logical call leg A comes out, largely unadulterated.

Nevertheless, a great deal of work has gone into the topoh and topos modules, which take two different approaches to hiding the network addresses on the other side of Kamailio. Both approaches make use of Kamailio’s ability to remain in the signalling path, both in the context of a SIP transaction and the more persistent SIP dialog. Both comply with the fundamental dictum that a SIP proxy shall not alter the essential state-defining characteristics of a SIP message as constructed by the respective UAs (User Agents) in a manner that shall be known to those UAs.
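To give a sense of proportion, enabling topos is nearly declarative; a minimal sketch, assuming a MySQL backing store with hypothetical credentials:

loadmodule "topos.so"

# Store the concealed topology details in a database; no route script
# function calls are required, as the module hooks message processing itself.
modparam("topos", "storage", "db")
modparam("topos", "db_url", "mysql://kamailio:secret@localhost/kamailio")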

By the very nature of the complex state management and sleight of hand that these modules perform, there are likely always going to be edge cases where they don’t work as expected.

For those cases, I continue to recommend a high-performance signalling-only B2BUA inserted in series into the call path. Although the community edition of SEMS (SIP Express Media Server) suffers from some neglect, I still wholeheartedly recommend its SBC module on the basis of sheer performance.

Registration

In the original article, I made the claim that many registrars don’t properly support the SIP Path extension. Experience suggests the number of these has dwindled, and Path is a very reasonable way to handle a scenario such as relaying registrations to specific hosted PBX instances inside a private network.
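As a minimal sketch of the Path approach, with a hypothetical upstream registrar address: the edge proxy records the real source of the registration in a Path header before relaying, and the registrar routes subsequent requests back through it.

# On the edge proxy, for REGISTER requests bound for an interior registrar
# (requires the path module):
if(is_method("REGISTER")) {
   # Add a Path header containing this proxy and the real received source
   add_path_received();

   $du = "sip:10.0.0.20:5060";   # hypothetical interior registrar / hosted PBX
   t_relay();
   exit;
}

On the Kamailio-as-registrar side, honouring Path is a matter of modparam("registrar", "use_path", 1).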

As has been the case for most of the project’s history, Kamailio continues to perform admirably as an actual registrar. If your architecture allows for the concentration of large amounts of registrations in a centralised registrar, this is the best bet.

Truly originating registrations is the province of the UAC module, and some additional management handles have been added to make the process more controllable in real time. Nevertheless, Kamailio cannot reasonably re-originate registrations on a one-to-one basis.

In large-scale platforms, there is significant demand out there for a registration “middlebox” which can absorb high-frequency re-registrations and parlay them into lower-frequency re-registrations upstream. This requirement arises in large measure due to the irony that many expensive enterprise platforms cannot cope with high message volume nearly as well as open-source alternatives, and fall over easily in the face of the registration onslaughts from modern NAT’d customer environments.

OpenSIPS have taken the lead here with the mid-registrar module, which caters to this very need. It is possible to implement something like this manually with Kamailio, but it would take a great deal of state-keeping on the part of the route script programmer. A module to accommodate this niche may be forthcoming in the future.

Edit: As is so often the case, it turns out that perceived limitations are more a failure of the author’s knowledge than technology. While it is true that Kamailio does not have a module specifically named and geared toward the “registrar middlebox” role, Daniel-Constantin Mierla, the chief developer of Kamailio, helpfully pointed out to me in this post on the VoiceOps mailing list that Kamailio’s UAC module has existing functionality that can be used toward the same end. Additionally, one of the virtues of open-source is that enhanced functionality can be added in a reasonable time.
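For reference, the UAC module’s remote registration facility that Daniel pointed out is driven by a database table of upstream credentials rather than by route script; a minimal sketch, with hypothetical values:

loadmodule "uac.so"

# Registrations to originate live in the uacreg table; the module
# registers upstream on a timer and refreshes the bindings itself.
modparam("uac", "reg_db_url", "mysql://kamailio:secret@localhost/kamailio")
modparam("uac", "reg_timer_interval", 60)
modparam("uac", "reg_contact_addr", "70.1.2.1:5060")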

Replication and sharing state

One of the most exciting developments in Kamailio in recent years has been the introduction of DMQ, a SIP-transported distributed replication system with sensible node discovery and failover features.

Prior to DMQ, most Kamailio redundancy strategies involved reliance on shared database backing, which is an inefficient bottleneck and a significant I/O burden. DMQ presents us with the possibility of using in-memory storage backing for things like the registrar and cutting the database bottleneck out. Dialogs can also be replicated with DMQ, as can generic hash tables (frequently used as a distributed data store), and a number of other things. DMQ’s dynamic character is also very complementary to cloud architecture. Watch this space.
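A minimal sketch of what DMQ-based replication looks like, with hypothetical node addresses; each node names its own DMQ socket and a peer from which to learn the rest of the topology:

loadmodule "dmq.so"
loadmodule "dmq_usrloc.so"

# This node's own DMQ address (must correspond to a listen socket)
modparam("dmq", "server_address", "sip:10.0.0.1:5090")

# A peer node used for initial discovery of the other DMQ participants
modparam("dmq", "notification_address", "sip:10.0.0.2:5090")

# Replicate registrar (usrloc) state to the other nodes
modparam("dmq_usrloc", "enable", 1)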

SIP over TCP and TLS transports has seen significantly increased uptake in recent years, and session state replication for those remains a sore topic for sophisticated connoisseurs. TCP sessions cannot be transferred from one Kamailio host to another in the event of a failure, as that would take significant operating system network stack involvement. This matters because we live in a NAT’d world where new TCP connections cannot be opened to customers on demand, and existing ones must instead be reused (conforming to the general thesis of SIP Outbound). That level of network stack coupling is achievable by the big SBC brands and remains a selling point. Worst-case exposure can be mitigated by low re-registration intervals, but that puts pressure on the throughput side of the registrar, and is subject to the minimum re-registration interval of many devices, which remains 60 seconds. Getting databases out of the registration persistence business nevertheless helps with that.

IMS support

Kamailio has sophisticated IMS support, led by NG Voice GmbH and more especially Carsten Bock. Carsten has given many insightful presentations on using Kamailio with IMS and VoLTE over the years.

If you are interested in this topic, you should also take a look at OpenIMSCore.

There has been a great deal of interest in Kamailio from mobile operators and MVNOs.

SIP Outbound

For some years now, Kamailio has supported SIP Outbound (RFC 5626). Use of it in the wild remains very limited, but when you have Outbound-ready clients, you’ll have an Outbound-ready server, a long time in the making.

SIP over WebSocket / WebRTC support

Kamailio is an excellent candidate for a SIP WebRTC gateway, with its extensive WebSocket support and RTPEngine for ICE and DTLS-SRTP.
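A minimal sketch of the signalling side, assuming the xhttp and websocket modules and a hypothetical public address; incoming HTTP requests are upgraded to SIP-over-WebSocket connections:

loadmodule "xhttp.so"
loadmodule "websocket.so"

listen=tcp:70.1.2.1:8080

# Often needed so that HTTP requests without Content-Length are accepted
tcp_accept_no_cl=yes

event_route[xhttp:request] {
   if(ws_handle_handshake())
      exit;

   xhttp_reply("404", "Not Found", "", "");
}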

Bottom line

Kamailio has its limits, and there are absolutely cases where a mainstream commercial SBC would be an appropriate choice. For instance, due to its architecture, Kamailio cannot accommodate listening on a large number of network interfaces, so if you are in the business of forwarding signalling across a large number of customer VLANs/VRFs, you may need an SBC.

More pointedly, Kamailio is a SIP proxy at heart. No amount of clever topology hiding modules can change the very real sense in which the UAs on both sides of Kamailio are interoperating with each other rather than with Kamailio per se. We are a far cry from the mid-2000s, and most mainstream SIP endpoints are shockingly interoperable these days. Nevertheless, if SIP interoperability is your particular woe for any number of niche reasons, remember the rule of “garbage in, garbage out”. At the very least, you would need to pair Kamailio with a B2BUA that can be generous in what it accepts and conservative in what it emits.

Those of you who work with WebRTC know that it presents a rich and fertile field for interoperability woes as well. We tend to take the view that this is part of a larger conversation about whether WebRTC is ready for prime time.

The other key point about Kamailio is that it’s configured in an imperative manner, driven by a programmatic route script. In this respect, it’s not like a finished product with a static feature set that simply needs to be enabled or disabled via declarative configuration files. I often compare Kamailio to an SDK with a SIP proxy core, and I think that comparison is meritorious. You will need some software engineering expertise to use Kamailio in a non-trivial way. You must meticulously script your SIP outcomes at a rather low level.

Nevertheless, we come full-circle to the idea that Kamailio presents a compelling, full-featured and viable SBC alternative, if you insist on that terminology. You’ll have to approach the question in a technically SIP-savvy and open-minded way. Moreover, while Kamailio has always catered well to the needs of large-scale service providers, it has evolved even more technical capabilities in recent years to facilitate that.

Service provider work is the only kind of work we do, and the reason we’re in business is that quite a lot of providers are quietly subverting conventional wisdom and thinking outside the traditional SBC. Don’t take my word for it; their numbers speak for themselves.

Server-side NAT traversal with Kamailio: the definitive guide

If you are a retail-type SIP service provider — that is, you sell SIP service to SMB end-users rather than wholesale customers or enterprises — and your product does not include bundled private line or VPN connectivity to your customers, the vast majority of your customer endpoints will be NAT’d.

If you’re using Kamailio as a customer-facing “SBC lite” to front-end your service delivery platform, this article is for you.

There’s a lot of confusion around best practices for NAT handling with Kamailio, given (1) that there are multiple approaches to handling NAT in the industry and also (2) that Kamailio idioms and conventions for this have evolved over time. I hope this article helps to address these issues comprehensively and puts lingering questions to rest.

Before we delve into that, let’s lay down some important background…

Why is NAT such a problem with SIP?

There are a few reasons:

First — for VoIP telephony purposes, at least — SIP primarily provides a channel in which to have a conversation about the establishment of RTP flows on dynamically allocated ports. This puts it in league with other protocols such as FTP, which also do not multiplex data and “metadata” over the same connection, and instead create ephemeral connections on unpredictable dynamic ports. This is different to eminently “NATtable” protocols like HTTP, where all data is simply sent back down the same client-initiated connection.

Second, VoIP by nature requires persistent state and reachability. Clients not only have to make outbound calls, but also to receive inbound calls, possibly after the “connection” (construed broadly) has been inactive for quite some time. This differs from a more or less ephemeral interaction like that of HTTP (though this claim ignores complications of the modern web such as long polling and WebSockets).

Third, most SIP in the world still runs over UDP, which, in its character as a “connection-less” transport that provides “fire and forget” datagram delivery, is less “NATtable” than TCP. Although UDP is connection-less, NAT routers must identify and associate UDP messages into stateful “flows” as part and parcel of the “connection tracking” that makes NAT work. However, on average, their memory for idle UDP flows is shorter and less reliable than for TCP connections — in some cases, egregiously worse, no more than a minute or so. That calls for more vigorous keepalive methods. Combined with the grim reality of increasing message size and the resulting UDP fragmentation, it’s also an excellent argument for using TCP at the customer access edge of your SIP network, but to be sure, that’s a decision that comes with its own trade-offs, and in any case TCP is not a panacea for all SIP NAT problems.

Finally: despite the nominally “logical” character of SIP URIs, SIP endpoints have come to put network and transport-layer reachability information (read: IP addresses and ports) directly into SIP messaging. No clean and universal logical-to-NLRI translation layer exists, such as DNS or ARP. A SIP endpoint literally tells the other end what IP address and port to reach it on, and default endpoint behaviour on the other side is to follow that literally. That’s a problem if that SIP endpoint’s awareness is limited to its own network interfaces (more on that in the next section).

SIP wasn’t designed for NAT. Search RFC 3261 for the word “NAT”; you’ll find nothing, because it presumes end-to-end reachability that today’s IPv4 Internet does not provide.

Client vs. Server-side NAT traversal and ALGs

Broadly speaking, there are two philosophies on NAT traversal: client-side NAT traversal and server-side NAT traversal.

Client-side NAT traversal takes the view that clients are responsible for identifying their WAN NLRI themselves and making correct outward representations about it. This is the view taken by the WebRTC and ICE scene. This is also the central idea of STUN and some firewalls’ SIP ALGs (Application Layer Gateways).

Server-side NAT traversal takes the opposite view; the client needs to know nothing, and it’s up to the SIP server to discover the client’s WAN addressing characteristics and how to reach it. In broad terms, this means the server must tendentiously disbelieve the addresses and ports that appear in the NAT’d endpoint’s SIP packets, encapsulated SDP body, etc., and must instead look to the source address and port of the packets as they actually arrive.

Server-side NAT traversal is the vantage point of major SBC vendors, and is also the most universal solution because it does not require any special accommodation by the client. Server-side is what this article is all about.

One last note on the dichotomy: client-side and server-side approaches don’t play well together much of the time. Most server-side implementations detect NAT’d clients by identifying disparities between the addresses/ports represented in SIP packets and the actual source IP and port, and take appropriate countermeasures. While it is theoretically unproblematic to give an “effectively public” (that is to say, non-NAT’d) endpoint NAT treatment anyway, this is only true if every part of the client message containing addressing is appropriately mangled at every step.

ALGs (Application Layer Gateways), a type of client-side traversal solution embedded in the NAT router itself, are especially notorious for foiling this by substituting in correct public IP/port information. However, in my experience, and that of our service provider customers, they only correct some parts of the SIP message and not others (e.g. they will fix the Via but not the Contact address, and perhaps not touch the SDP at all, and even if they do, they don’t open the right RTP ports). This way lies madness, and that’s why we hate ALGs so much, but the same caveats can sometimes apply to STUN-based approaches.

“Great, the ALG fixed all the problems!” said no one, ever. Not that I know of, anyway. Some NAT gateways allow one to disable the SIP ALG, and if you are using a server-side NAT traversal approach, you should do this. However, other consumer-grade and SMB NAT gateways do not allow you to do this, and dealing with them can be nigh impossible. The best solution is to replace the NAT gateway with a better one. If that’s not possible, sometimes they can be bypassed by using a non-standard SIP port (not 5060) on either the client or the server side, or both. However, some of them actually fingerprint the message as SIP based on its content, regardless of source or destination port. Those are pretty much intractable.

In short, if you’re going to do server-side NAT traversal, make every effort to turn off any client-side NAT traversal measures, including STUN and ALGs. The “stupidity” of the client about its wider area networking is not a bug in this scenario, but a feature.

NAT and RTP

A server-side NAT traversal strategy typically requires solutions for RTP, not just SIP.

Even if you get SIP back to the right place across a NAT’d connection, that doesn’t solve two-way media. The NAT’d endpoint will send media from the port declared in its SDP stanza (assuming symmetric RTP, which is pretty much universal), but this will be remapped to a different source port by the NAT gateway.

This requires a more intelligent form of media handling, commonly referred to as “RTP latching” and by various other terms. This is where the RTP counterparty listens for at least one RTP frame arriving at the destination port it advertised, and harvests the source IP and port from that packet and uses that for the return RTP path.

If you have a publicly reachable RTP endpoint on the other side of Kamailio which can behave that way, such as Asterisk (with the nat=yes option, or whatever it is now), you don’t need an intermediate RTP relay. However, not all endpoints will do that. For example, if you are in the “minutes” business and have wholesale carriers behind Kamailio, their gateways will most likely not be configured for this behaviour, more as a matter of policy than technology.

There are other scenarios where intermediate RTP relay may not be necessary. For example, if you are providing SIP trunking to NAT’d PBXs, rather than hosted PBX to phones (Class 4 rather than Class 5 service, in the parlance of the North American Bell system), you may be able to get away with DNAT-forwarding a range of RTP ports on the NAT gateway into a single LAN endpoint. This works because the LAN destination is single and static. A number of our customers use this strategy to great effect. Another reason you may need an intermediate RTP relay is simply to bridge topology; if your ultimate media destinations are on a private network, as for example in my network diagram below, you’ll need to forward RTP between them.

These are important issues to consider because if your entire customer base is NAT’d, being in the RTP path will greatly change the hardware and bandwidth economics of your business. Nevertheless, assuming you’ve determined that you do need to handle RTP for your customers, convention has settled around Sipwise’s RTPEngine. RTPEngine is an extremely versatile RTP relay which performs forwarding in kernel space, achieving close to wire speed. Installation and setup of RTPEngine is outside the scope of this tutorial, but the documentation on the linked GitHub page is sufficient.

As with all other RTP relays supported by Kamailio, RTPEngine is an external process controlled by Kamailio via a UDP control socket. When Kamailio receives an SDP offer or answer, it forwards it to RTPEngine via the rtpengine control module, and RTPEngine opens a pair of RTP/RTCP ports to receive traffic from the given endpoint. The same happens in the other direction, upon handling the SDP offer/answer of the other party. These new endpoints are then substituted into the SDP prior to relay, with the result that RTPEngine is now involved in both streams.

What is to be done?

To provide server-side NAT traversal, then, the following things must be done within the overall logic of Kamailio route script.

  1. Ensure that transactional replies return to real source port – When an endpoint sends a request to your SIP server, normal behaviour is to forward replies to the transport protocol, address and port indicated in the topmost Via header of the request. In a NAT’d setting, this needs to be ignored and the reply must instead be returned to the real outside source address and port of the request. This is provided for by the rport parameter, as elaborated upon in RFC 3581. The trouble is, not all NAT’d endpoints include the ;rport parameter in their Via. Fortunately, there is a core Kamailio function, force_rport(), which tells Kamailio to treat the request as if ;rport were present.
  2. Stay in the messaging path for the entire dialog life cycle – If Kamailio is providing far-end NAT traversal functionality for a call, it must continue to do so for the entire life cycle of the call, not just the initial INVITE transaction. To tell the endpoints to shunt their in-dialog requests through Kamailio, a Record-Route header must be added; this is accomplished by calling record_route() (rr module) for initial INVITE requests.
  3. Fix Contact URI to be NAT-safe – This applies to requests and replies, and to INVITE and REGISTER transactions alike. This will be discussed further below.
  4. Engage RTPEngine – (if necessary)

It’s really as simple as that.

We will discuss how to achieve these things below, but first…

Testing topology

For purposes of example in this article, I will be using my home Polycom VVX 411, on LAN subnet 172.30.105.0/24. It talks to a Kamailio server, 70.1.2.1, which also acts as a registrar, and front-ends an elastic group of media servers which are located on a private subnet, 192.168.2.0/24. This also means that the Kamailio server bridges SIP (and as we shall see, RTP, by way of RTPEngine) between two different network interfaces. This is perhaps more complex than the topology needs to be by way of example, but also illuminates a fuller range of possibilities.

A diagram may help:

[Diagram: NAT traversal testing topology: NAT’d phone on 172.30.105.0/24, Kamailio at 70.1.2.1, bridging to media servers on 192.168.2.0/24]

The nathelper module

The nathelper module is Kamailio’s one-stop shop for NAT traversal-related functionality. Its parameters and functions encapsulate three main functional areas:

  • Manipulation of SIP message attributes to add outside-network awareness;
  • Detection of NAT’d endpoints;
  • Keepalive pinging of NAT’d endpoints.

There is a subtle link between this module and the registrar module, in that the received_avp parameter is shared among them—if you choose to take that approach to dealing with registrations.

The nat_uac_test() function performs a user-defined combination of tests to decide if an endpoint is NAT’d. The argument is a bitmask; if you’re not familiar with the concept from software engineering, it means that a combination of flags can be specified by adding them together. For example, to apply both flag 1 and flag 2, use an argument of “3”.

Here is a REGISTER request from my NAT’d endpoint:

2018/05/07 06:53:26.402531 47.39.154.156:5060 -> 192.168.2.220:5060
REGISTER sip:sip.evaristesys.com SIP/2.0
Via: SIP/2.0/UDP 172.30.105.251:5060;branch=z9hG4bKffe427d2756F1643
From: "alex-balashov" <sip:alex-balashov@sip.evaristesys.com>;tag=B84E1216-803F7CD7
To: <sip:alex-balashov@sip.evaristesys.com>
CSeq: 3561 REGISTER
Call-ID: 4ae7899d1cc396640e440df7c72662d3
Contact: <sip:alex-balashov@172.30.105.251:5060>;methods="INVITE, ACK, BYE, CANCEL, OPTIONS, INFO, MESSAGE, SUBSCRIBE, NOTIFY, PRACK, UPDATE, REFER"
User-Agent: PolycomVVX-VVX_411-UA/5.6.0.17325
Accept-Language: en
Authorization: [omitted]
Max-Forwards: 70
Expires: 300
Content-Length: 0

The Via header specifies where responses to this transaction should be sent. It can be clearly seen that although the Via header contains a private IP of 172.30.105.251:5060, the actual source of the request is 47.39.154.156:5060 (and, it should be noted, the fact that the internal port 5060 maps to an external port of 5060 is merely a coincidence from how this particular NAT gateway works; it is more typical for it to be mapped to an arbitrary and different external port). Therefore, in this case, test flags 2 and 16 to nat_uac_test() would detect this anomaly.

There is some debate as to whether the various tests for RFC 1918/RFC 6598 (private) addresses have merit. It’s tempting to think that one can reveal NAT straightforwardly by checking for private addresses, e.g. 192.168.0.0/16, 172.16.0.0/12, 10.0.0.0/8, in the Via or Contact headers. However, to return to the network diagram above, Kamailio is multihomed on a private as well as a public network. Although symmetric SIP signalling can be taken for granted from almost any SIP endpoint nowadays, it is nevertheless poor form to give NAT treatment to an endpoint that is directly routable. Give some thought to whether the central theme of your NAT detection approach should be in looking for private addresses, or looking for discrepancies between the represented address/port and the actual source address/port. I personally favour the latter approach.
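To make the discrepancy-based approach concrete, here is the composition of the bitmask used throughout the examples below:

# Flag 2:  actual source IP differs from the address in the topmost Via
# Flag 16: actual source port differs from the port in the topmost Via
# 2 + 16 = 18, hence the recurring nat_uac_test("18") idiom
if(nat_uac_test("18"))
   force_rport();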

The “old books” of nathelper vs. the new

Traditional OpenSER-era and early Kamailio folklore prescribes the use of fix_nated_contact() and fix_nated_register() functions. One can still find these in a lot of books and documentation:

fix_nated_contact() rewrites the domain portion of the Contact URI to contain the source IP and port of the request or reply.

fix_nated_register() is intended for REGISTER requests, so is only relevant if you are using Kamailio as a registrar or forwarding registrations onward (i.e. using Path). It takes a more delicate approach, storing the real source IP and port in the received_avp, where it can be retrieved by registrar lookups and set as the destination set, Kamailio’s term for the next-hop forwarding destination (overriding request URI domain and port).

fix_nated_register() is generally unproblematic, though it does require a shared AVP with the registrar module. From a semantic point of view, however, fix_nated_contact() is deeply problematic, in that it modifies the Contact URI, and therefore causes requests incoming to the NAT’d client to be constructed with a Request URI that is not equivalent to the Contact URI the client populated there. RFC 3261 says thou shalt not do that.

The nathelper module offers better idioms for this mangling nowadays: handle_ruri_alias() and set_contact_alias()/add_contact_alias(). Using these functions, this:

Contact: <sip:alex-balashov@172.30.105.251:5060>

is turned into:

Contact: <sip:alex-balashov@172.30.105.251:5060;alias=47.39.154.156~5060~1>

and stored (if REGISTER) or forwarded (anything else). When handle_ruri_alias() is called, the ;alias parameter is stripped off, and its contents populated into the destination URI. The beautiful thing about handle_ruri_alias() is that if the ;alias parameter is not present, it silently returns without any errors. This simplifies the code by removing the need for explicit checking for this parameter.

For the sake of simplicity and minimum intrusiveness, I strongly recommend using these functions in place of the old fix_*() functions.

Implementation

Near the top of the main request_route, you’ll probably want to have a global subroutine that checks for NAT. At this point, the logic will not be specialised based on the request method or whether the request contains an encapsulated SDP body. Critically, ensure that this happens prior to any authentication/AAA checks, as 401/407 challenges, along with all other replies, need to be routed to the correct place based on force_rport():

   
   if(nat_uac_test("18")) {
      # Ensure replies are returned to the real source address and port
      force_rport();

      # Stash the real source in an ;alias parameter on the Contact URI
      if(is_method("INVITE|REGISTER|SUBSCRIBE"))
         set_contact_alias();
   }

Later, in the loose_route() section that deals with handling re-invites and other in-dialog requests, you’ll need to engage RTPEngine and handle any present ;alias in the Request URI:

   if(has_totag()) {
      if(loose_route()) {
         # Re-engage RTPEngine for in-dialog SDP offers/answers from NAT'd parties
         if(is_method("INVITE|UPDATE") && sdp_content() && nat_uac_test("18"))
             rtpengine_manage("replace-origin replace-session-connection ICE=remove");

         ...

         # Strip any ;alias parameter and point the destination URI at the real source
         handle_ruri_alias();

         t_on_reply("MAIN_REPLY");

         if(!t_relay())
            sl_reply_error();

         exit;
      }
   }

Initial INVITE handling is similar:

request_route {
   ...

   if(has_totag()) {
      ...
   }

   ...

   t_check_trans();

   if(is_method("INVITE")) {
      # Engage RTPEngine on the initial SDP offer from a NAT'd caller
      if(nat_uac_test("18") && sdp_content())
         rtpengine_manage("replace-origin replace-session-connection ICE=remove");

      t_on_reply("MAIN_REPLY");

      if(!t_relay())
         sl_reply_error();

      exit;
   }
}

To accommodate the case that requests are inbound to the NAT’d endpoint or the case that NAT’d endpoints are calling each other directly, an onreply_route will need to be armed for any transaction involving a NAT’d party. Its logic should be similar:

onreply_route[MAIN_REPLY] {
   if(nat_uac_test("18")) {
      force_rport();
      set_contact_alias();

      # Engage RTPEngine for SDP carried in replies (e.g. the 200 OK answer)
      if(sdp_content())
         rtpengine_manage("replace-origin replace-session-connection ICE=remove");
   }
}

For serial forking across to multiple potential gateways, it is strongly recommended that you put initial invocations to RTPEngine into a branch_route(), so that RTPEngine can receive the most up-to-date branch data and potentially make branch-level decisions.
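Schematically, with the branch route name assumed, that looks like this:

# Before relaying the initial INVITE:
t_on_branch("NAT_BRANCH");

branch_route[NAT_BRANCH] {
   # Executed once per branch, with that branch's final request state
   if(sdp_content() && nat_uac_test("18"))
      rtpengine_manage("replace-origin replace-session-connection ICE=remove");
}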

Registration requests are already handled by the general NAT detection stanza above. However, registration _lookups_ require an additional nuance:

route[REGISTRAR_LOOKUP] {
   ...

   if(!lookup("location")) {
      sl_send_reply("404", "Not Found");
      exit;
   }

   # If the stored contact carries an ;alias parameter, route to the real
   # outside source; a no-op otherwise
   handle_ruri_alias();

   if(!t_relay())
      sl_reply_error();

   exit;
}

That’s really it!

What about NAT’d servers?

In cloud and VPS environments, it is getting quite common to have a private IP address natively homed on the host with an external public IP provided via 1-to-1 NAT.

Kamailio’s core listen directive has a parameter to assist with just this:

listen=udp:192.168.2.119:5060 advertise 70.1.2.1:5060

This will ensure that the Via and Record-Route headers reference the public IP address rather than the private one. It has no impact on RTP.

Topology bridging with RTPEngine + NAT

The discerning observer will note that the foregoing invocations of rtpengine_manage() did not address a key requirement of the network topology outlined in the diagram, the need to bridge two disparate network topologies.

This requires two different RTPEngine forwarding interfaces, one of which has a public IP via 1-to-1 NAT. The latter would seem to require something like an advertise directive, but for RTP. Fortunately, RTPEngine has such an option, applied with the ! delimiter:

OPTIONS="-i internal/192.168.2.220 -i external/192.168.2.119!70.1.2.1

The direction attribute to rtpengine_offer() (or, equivalently, the initial call to rtpengine_manage()) allows one to specify the ingress and egress interfaces respectively:

rtpengine_manage("replace-origin replace-session-connection ICE=remove direction=internal direction=external");

Subsequent calls to rtpengine_manage(), including calls in onreply_route, will appropriately take into account this state and reverse the interface order for the return stream as needed.

Keepalives and timeouts

The most common challenge with NAT’d SIP endpoints is that they need to remain reachable in a persistent way; they can receive inbound calls or other messages at any moment in the future.

Recall that NAT gateways add mappings for connections or connection-like flows that they detect (in the case of UDP, remember that for NAT purposes UDP isn’t truly “connection-less”), e.g. from 192.168.0.102:5060 to $WAN_IP:43928. For the time that the latter mapping exists, any UDP sent to $WAN_IP:43928 will be returned to 192.168.0.102:5060.

The problem is that this mapping is removed after relatively short periods of inactivity. In principle this is a good thing; you wouldn’t want your NAT gateway’s memory filled up with ephemeral “connections” that have long ceased to be relevant. However, while, in our experience, most timeouts for UDP flows are in the range of a few minutes, there are some routers whose “memory” for UDP flows can be exceptionally poor — one minute or less. The same thing holds true for TCP, but UDP tends to be affected more egregiously.

When the connection tracking “mapping” goes away, the NAT gateway drops incoming packets to the old $WAN_IP:43928 destination on the floor. Consider this example:

[Screenshot: packet capture showing SIP requests toward a stale NAT mapping going unanswered]

In this test topology, 10.150.21.6 is a Freeswitch PBX on a private network (10.150.21.0/24) that receives registrations relayed from Kamailio (with help from the Path header). Kamailio is multi-homed on a private (10.150.20.2) and public (209.51.167.66) interface, the latter of which is presented to outside phones.

A registration which occurred about 15 minutes prior had established a contact binding of 47.39.154.156:5060 for my AOR (Address of Record). However, as no activity had occurred in this flow for that long, the NAT router “forgot” about it, and you can see that efforts to reach the phone go nowhere. An ICMP type 3 (port unreachable) message (not shown) is sent back to Kamailio and that’s the end of it.

So, to keep NAT “pinholes” — as they’re often called — open, some means of generating frequent activity on the mapped flow is required.

The easiest solution, and the lowest-hanging fruit, is to lower the re-registration interval of every NAT’d device to something like 60 or 120 seconds; this will generate a bidirectional message exchange (REGISTER, 401 challenge, 200 OK) which will “renew” the pinhole. This is effective in many cases. But there are two problems:

  1. Interval can’t be too low – Many devices or SIP registrars will not support a re-registration interval of less than 60 seconds, and believe it or not, that’s not low enough for some of the most egregious violators among the NAT gateways out there.
  2. Performance issues for the service provider – In a sympathetic moment, consider things from your SIP service provider’s perspective: tens of thousands (or more) of devices are banging on an SBC or an edge proxy — and with registrations no less, which are rather expensive operations that typically have some kind of database involvement for both authentication and persistent storage. That can greatly change the operational economics. So, as a matter of policy, allowing or encouraging such low re-registration intervals may not be desirable.

Enter the “keepalive”, a message sent by either server or client that garners some kind of response from the other party. Keepalives are an improvement over registrations in that they are not resource-intensive, since they invite only a superficial response from a SIP stack.

There are two types of keepalives commonly used in the SIP world: (1) a basic CRLF (carriage return line feed) message, short and sweet, and (2) a SIP OPTIONS request. While OPTIONS ostensibly has a different formal purpose, to query a SIP party for its capabilities, it’s frequently employed as a keepalive or dead peer detection (DPD) message.

Many end-user devices can send these keepalives, and if your end-user device environment is sufficiently homogeneous and you exert high provisioning control over it, you may wish to configure it that way and simply have Kamailio respond to them. In the case of OPTIONS pings, you will want to configure Kamailio to respond with an affirmative 200 OK:

    if(is_method("OPTIONS")) {
        options_reply();
        exit;
    }

That goes in the initial request-handling section, toward the bottom of the main request route.

Pro-tip: Most end-user devices will send an OPTIONS message with a Request URI that has a user part, i.e.

OPTIONS sip:test@server.ip:5060 SIP/2.0

There is a valid debate to be had as to whether this is appropriate, since, strictly speaking, it implies that the OPTIONS message is destined for a particular “resource” (e.g. Address of Record / other user) on that server, rather than the server itself. Nevertheless, this is how a lot of OPTIONS messages are constructed. The Kamailio siputils module, which provides the options_reply() function, takes a fundamentalist position in this debate: it declines to reply to OPTIONS requests whose Request URI contains a user part, which rules out many real-world keepalives.

A slightly unorthodox but effective workaround, since keepalive applications of the OPTIONS message seldom care about the actual content of the response:

    if(is_method("OPTIONS")) {
       sl_send_reply("200", "OK");
       exit;
    }

You may find more profit in server-initiated keepalive pinging, however. The Kamailio nathelper module provides extensive options for that as well. Start with the NAT pinging section.
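A minimal sketch of the relevant nathelper parameters; the SIP URI is hypothetical, and the branch flag must also be set on NAT’d contacts before save() for sipping_bflag to take effect:

# Ping NAT'd registered endpoints every 30 seconds
modparam("nathelper", "natping_interval", 30)

# Confine pinging to contacts flagged as NAT'd at registration time
modparam("nathelper", "ping_nated_only", 1)

# Send SIP OPTIONS pings rather than bare 4-byte UDP keepalives
modparam("nathelper", "sipping_bflag", 7)
modparam("nathelper", "sipping_from", "sip:pinger@sip.evaristesys.com")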

UDP fragmentation

The tendency over time is for the median size of SIP messages to creep up: SDP stanzas get bigger as more codecs are on offer, new SIP headers and attributes enter into use, etc.

When the payload size of a UDP message gets to within a small margin of the MTU (typically 1500 bytes), it gets fragmented. UDP does not provide transport-level reassembly as TCP does. Because only the first fragment will contain the UDP header, it takes considerable cleverness to reassemble the message. Kamailio’s SIP stack can, of course, do this, as can many others in the mainstream FOSS world. However, many user agents cannot.

More damningly, there’s virtually a zero-percent chance that a NAT gateway will handle UDP fragmentation correctly. So, as a rule of thumb, it is eminently safe to assume that a NAT’d endpoint will not receive a fragmented SIP message.

Strategies for dealing with this phenomenon are detailed in a separate post all about UDP fragmentation on this blog, but the short answer is: use TCP. It’s what RFC 3261 says to do.

What about SIP Outbound?

RFC 5626, known as “SIP Outbound”, is the latest opus of the IETF’s copious intellectual output on these topics. As is true of many such complicated ventures, Kamailio has long supported it, but few SIP UAs in the wild do.

In brief, SIP Outbound proposes the establishment of multiple concurrent connection flows by the client for redundancy. A basic tenet of this arrangement is that all responsibility for establishing connections through NAT, as well as all maintenance and upkeep of the same, lies with the client. There are a lot of other details involved, mainly to do with the registrar only using one of the “flows” at a time to reach a client with multiple registrations, so that multiple registrations established for redundancy do not lead to multiple forked INVITEs to the client. Some new parameters are involved in this new layer of bureaucracy for the registrar: instance-id and reg-id.

A full exposition of how it all works is certainly beyond the scope of this article, but RFC 5626 is captivating bedtime reading. However, until and unless widespread UA support for it appears, this author cannot be moved to say, “Use SIP Outbound, it’ll solve your NAT traversal problems!”

Tuning Kamailio for high throughput and performance

Unrivaled SIP message processing throughput is one of the central claims to fame made by Kamailio. When it comes to call setups per second (“CPS”) or SIP messages per second, there’s nothing faster than the OpenSER technology stack. Naturally, we pass that benefit on in the value proposition of CSRP, our Kamailio-based “Class 4” routing, rating and accounting platform.

It’s an important differentiator from many traditional softswitches, SBCs and B2BUAs, some of which are known to fall over with as little as 100 CPS of traffic, and many others to top out at a few hundred CPS spread over the entire installation. It shapes the horizontal scalability and port density of those platforms, and, accordingly, the unit economics for actors in the business: per-port licencing costs, server dimensioning, and ultimately, gross margins in a world where PSTN termination costs are spiraling down rapidly and short-duration traffic–love it, hate it–plays a conspicuous role in the ITSP industry prospectus.

That’s why it’s worthwhile to take the time to understand how Kamailio does what it does, and what that means for you as an implementor (or a prospective CSRP customer? :-).

Kamailio concurrency architecture

Kamailio does not use threads as such. Instead, when it boots, it fork()s some child processes that are specialised into SIP packet receiver roles. These are bona fide independent processes, and although they may be colloquially referred to as “threads”, they’re not POSIX threads, and, critically, don’t use POSIX threads’ locking and synchronisation mechanisms. Kamailio child processes communicate amongst themselves (interprocess communication, or “IPC”) using System V shared memory. We’re going to call these “receiver processes” for the remainder of the article, since that’s what Kamailio itself calls them.

The number of receiver processes to spawn is governed by the children= core configuration directive. This value is multiplied by the number of listening interfaces and transports. For example, in the output below, I have my children set to 8, but because I am listening on two network interfaces (209.51.167.66 and 10.150.20.2), there are eight processes for each interface. If I enabled SIP over TCP as well as UDP, the number would be 32. But a more typical installation would simply have 8:

[root@allegro-1 ~]# kamctl ps
Process:: ID=0 PID=22937 Type=attendant
Process:: ID=1 PID=22938 Type=udp receiver child=0 sock=209.51.167.66:5060
Process:: ID=2 PID=22939 Type=udp receiver child=1 sock=209.51.167.66:5060
Process:: ID=3 PID=22940 Type=udp receiver child=2 sock=209.51.167.66:5060
Process:: ID=4 PID=22941 Type=udp receiver child=3 sock=209.51.167.66:5060
Process:: ID=5 PID=22942 Type=udp receiver child=4 sock=209.51.167.66:5060
Process:: ID=6 PID=22943 Type=udp receiver child=5 sock=209.51.167.66:5060
Process:: ID=7 PID=22944 Type=udp receiver child=6 sock=209.51.167.66:5060
Process:: ID=8 PID=22945 Type=udp receiver child=7 sock=209.51.167.66:5060
Process:: ID=9 PID=22946 Type=udp receiver child=0 sock=10.150.20.2:5060
Process:: ID=10 PID=22947 Type=udp receiver child=1 sock=10.150.20.2:5060
Process:: ID=11 PID=22948 Type=udp receiver child=2 sock=10.150.20.2:5060
Process:: ID=12 PID=22949 Type=udp receiver child=3 sock=10.150.20.2:5060
Process:: ID=13 PID=22950 Type=udp receiver child=4 sock=10.150.20.2:5060
Process:: ID=14 PID=22951 Type=udp receiver child=5 sock=10.150.20.2:5060
Process:: ID=15 PID=22952 Type=udp receiver child=6 sock=10.150.20.2:5060
Process:: ID=16 PID=22953 Type=udp receiver child=7 sock=10.150.20.2:5060
Process:: ID=17 PID=22954 Type=slow timer
Process:: ID=18 PID=22955 Type=timer
Process:: ID=19 PID=22956 Type=MI FIFO

(There are some other child processes besides receivers, but these are ancillary — they do not perform Kamailio’s core function of SIP message processing. More about other processes later.)

You can think of these receiver processes as something like “traffic lanes” for SIP packets; as many “lanes” as there are, that’s how many SIP messages can be crammed onto the “highway” at the same time:

[Diagram: Kamailio UDP receiver processes as parallel “traffic lanes” for incoming SIP packets]

This is more or less the standard static “thread pool” design. For low-latency, high-volume workloads, it’s probably the fastest available option. Because the size of the worker pool does not change, the overhead of starting and stopping threads constantly is avoided. What applies to static thread pool management in general also applies here.

Of course, synchronisation, the mutual exclusion locks (“mutexes”) which ensure that multiple threads do not access and modify the same data at the same time in conflicting ways, is the bane of multiprocess programming, whatever form the processes take. The parallelism benefit of multiple threads is undermined when they all spend a lot of time blocking, waiting on mutex locks held by other threads to open before their execution can continue. Think of a multi-lane road where every car is constantly changing lanes; there’s a lot of waiting, acknowledgment and coordination that has to happen, inevitably leading to a slow-down or jam. The ideal design is “shared-nothing”, where every car stays in its own lane always–that is, where every thread can operate more or less self-sufficiently without complicated sharing (and therefore, locking) with other threads.

The design of Kamailio is what you might call “share as little as possible”; while certain data structures and other constructs (AVPs/XAVPs, SIP transactions, dialog state, htable, etc.) are unavoidably global (otherwise they wouldn’t be very useful), residing in the shared memory IPC space accessed by all receiver processes, much of what every receiver process requires to operate on a SIP message is proprietary to that process. For instance, every child process receives its own connection handle to databases and key-value stores (e.g. MySQL, Redis), removing the need for common (and contended) connection pooling. In addition to the shared memory pool used by all processes, every child process gets a small “scratch area” of memory where ephemeral, short-term data (such as $var(…) config variables) as well as persistent process-proprietary data lives. (This is called “package memory” in Kamailio, and is set with the -M command line argument upon invocation, as opposed to -m, which sets the size of the shared memory pool.)

Of course, actual results will depend on which Kamailio features you utilise, and how much you utilise them. Nearly all useful applications of Kamailio involve transaction statefulness, so you can expect, at a minimum, for transactions to be shared. If, for example, your processing is database-driven, you can expect receiver processes to operate more independently than if your processing is heavily tied up in shared memory constructs like htable or pipelimit.

Furthermore, in contrast to the architecture found in many classically multithreaded programs with this “thread pool” design, there is no “distributor” thread that apportions incoming packets. Instead, every child process calls recvfrom() (or accept() or whatever) on the same socket address. The operating system kernel itself distributes the incoming packets to listening child processes in a semi-random fashion that, in statistically large quantities, is substantially similar to “round-robin”. It’s a simple, straightforward approach that leverages the kernel’s own packet queueing and eliminates the complexity of a supervisory process to marshal data.

How many children to have?

Naturally, discussions of performance and throughput all sooner or later turn to:

What “children” value should I use for best performance?

It’s a hotly debated topic, and probably one of the more common FAQs on the Kamailio users’ mailing list. The stock configuration ships with a value of 8, which leads many people to ask: why so low? At first glance, it might stand to reason that on a busy system, the more child processes, the better. However, that’s not accurate.

The reason the answer is complicated is because it depends on your hardware and, more importantly, on Kamailio’s workload.

Before we go forward, let’s define a term: “available hardware threads”. For our purposes, this is the number of CPU “appearances” in /proc/cpuinfo. This takes into account the “logical” cores created by hyper-threading.

For instance, I have a dual-core laptop with four logical “CPUs”:

$ nproc
4

sasha@saurus:~$ cat /proc/cpuinfo | grep 'core id' | sort -u
core id		: 0
core id		: 1

In this case, our number of available hardware threads is 4.

In principle, the number of child processes that can usefully execute in parallel is equal to the number of available hardware threads (in the /proc/cpuinfo sense). Given a purely static Kamailio configuration on an 8-HW thread system, 8 receiver processes will have 8 different “CPU” affinities and peg out the processors with as many packets as the hardware can usefully handle. Such a configuration can handle tens of thousands of messages per second, and the limits you will eventually run into are more likely to do with userspace I/O contention or NIC frames-per-interrupt or hardware buffer type issues than with Kamailio itself.

Once you increase the number of receiver processes beyond that, the surplus processes will be fighting over the same number of hardware threads, and you’ll be more harmed by the downside of that userspace scheduling contention and the limited amount of shared memory locking that does exist in Kamailio than you’ll benefit from the upside of more processes.

However, most useful applications of Kamailio don’t involve a hard-coded config file, but rather external I/O interactions with outside systems: databases, key-value stores, web services, embedded programs, and the like. Waiting on an outside I/O call, such as a SQL query to MySQL, to return is a synchronous (or blocking) process; while the receiver thread waits for the database to respond, it sits there doing nothing. It’s tied up and cannot process any more SIP messages. It’s safe to say that the end-to-end processing latency for any given SIP message is determined by the cumulative I/O wait involved in the processing. Such operations are referred to as I/O-bound operations. Most of what a typical Kamailio deployment does is somehow I/O-bound.

This is where you have to take a discount from the aforementioned idealised maximum throughput of Kamailio, and it’s usually a rather steep one. The question is rarely: “How many SIP messages per second can Kamailio handle?” The right question is: “How many SIP messages per second can Kamailio handle with your configuration script and external I/O dependencies?” It stands to reason that given a fixed-size receiver thread pool, one should aim to keep external I/O wait to a minimum.

Still, when a receiver process spends a lot of time waiting on external I/O, it’s just sleeping until notified by the kernel that new data has arrived on its socket descriptor or what have you. That sleeping creates an opening for additional processes to do useful work in that time. If you have a lot of external I/O wait, it’s safe to increase the number of receiver threads to values like 32 or 64. If most of what your worker processes do is wait on a morbidly obese Java servlet on another server, you can afford to have more of them waiting.

This is why a typical Linux system has hundreds of background processes running, even though there are only 2, 4 or 8 hardware threads available. Most of the time, those processes aren’t doing anything. They’re just sitting around waiting for external stimuli of some sort. If they were all pegging out the CPU, you’d have a problem.

How many receiver processes can there be? All other things being equal, the answer is “not too many”. They’re not designed to be run in the hundreds or thousands. Each child process is fairly heavyweight, carrying, at a minimum, an allocation of a few megabytes of package memory, and toting its own connection handle to services such as databases, RTP proxies, etc. Since Kamailio child processes do have to share quite a few things, there’s shared memory mutexes over those data structures. I don’t have any numbers, but the fast mutex design is clearly not intended to support a very large number of processes. I suppose it’s a testament to CSRP’s relatively efficient call processing loop that, despite being very database-bound, we’ve found in our own testing signs of diminishing returns after increasing children much beyond the number of available hardware threads.

Academically speaking, the easiest way to know that you need more child processes is to monitor the kernel’s packet receive queue using netstat (or ss, since netstat is deprecated in RHEL >= 7, in keeping with general developments in systemd land):

[root@allegro-1 ~]# ss -4 -n -l | grep 5060
udp    UNCONN     0      0      10.150.20.2:5060                  *:*
udp    UNCONN     0      0      209.51.167.66:5060                  *:*

The third column is the RecvQ column. Under normal conditions, its value should be 0, perhaps ephemerally spiking to a few hundred or thousand entries here and there. If the receive queue size is continuously > 0, bursting stubbornly high, or, worst of all, increasing monotonically, this tells you that incoming SIP messages are not being consumed by the receiver processes fast enough. These receive queues can be tuned to some extent, but that ultimately won’t solve your problem. You just need more processes to suckle on the packet teat.

More fine-grained results can be obtained with sipp scenario testing. Run calls through your Kamailio proxy and ramp the call setup rate up until the UAC starts reporting retransmissions. This gives insight into a different dimension of the problem than the packet queue: is your proxy taking too long to respond? In both cases, however, the available options are either to decrease the I/O wait to free up the receiver processes to process more messages, or to add more receiver processes.

However, once you go down the road of adding receiver processes, you need to ask yourself: are these processes doing a lot of waiting, or are they always busy? If your request/message processing in configuration script has relatively little end-to-end I/O delay, all you’re going to do is overbook your CPU, driving up your load average and slowing down the system as a whole. In that case, you’ve simply run into the limits of your hardware or your software architecture. There’s no simple fix for that.

That’s why, when the question about the ideal number of receiver processes is asked, the answers given are often avoidant and noncommittal.

What about asynchronous processing?

Over the last few years, Kamailio has evolved a lot of asynchronous processing features. Daniel-Constantin Mierla has some useful examples and information from Kamailio World 2014.

The basic idea behind asynchronous processing, in Kamailio terms, is that, in addition to the core receiver processes, an additional pool of processes is spawned to which high-latency blocking operations can be delegated. Transactions are suspended and enqueued to these outside processes, and they’ll get to them … whenever they can get to them–that’s the “asynchronous” part. This keeps the main SIP receiver processes free to process further messages instead of being blocked on expensive I/O operations, as the heavy lifting is left up to the dedicated async task worker processes.

Asynchronous processing can be very useful in certain situations. If you know your request routing is going to be expensive, you can send back an immediate, stateless 100 Trying and push the processing tasks out to the async task workers.
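
To illustrate, a minimal sketch (the route name is hypothetical; it presupposes the tm, sl and async modules are loaded, and makes no claim about the ideal worker count):

loadmodule "async.so"

# core parameter: dedicated async task worker processes
async_workers=4

request_route {
    if (is_method("INVITE")) {
        # answer statelessly right away so the UAC quiets down...
        sl_send_reply("100", "Trying");

        # ...and hand the expensive routing work to a task worker
        if (!async_task_route("HEAVY_ROUTING"))
            sl_send_reply("500", "Server Error");
        exit;
    }
}

route[HEAVY_ROUTING] {
    # expensive database dips, HTTP queries, etc. happen here,
    # outside the SIP receiver processes
    t_relay();
}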

However, a note of caution. Asynchronous processing of all kinds is often held to be a panacea, driven by popular Silicon Valley fashions and async-everything design patterns in the world of Node.js. There’s a lot of cargo cult thinking and exuberance around asynchronous processing.

As a guiding principle, remember that asynchrony is not magic, and it cannot achieve that which is otherwise thermodynamically impossible given the number of transistors on your chip. In many cases, asynchronous programming is almost a kind of syntactic sugar, a different semantic vantage point on the same operations, which ultimately have to be executed in some way on the same system with the same resources. The fact that the responsibility for I/O multiplexing is pushed to an external, opaque actor doesn’t change that.

Asynchronous processing also imposes its own overheads: in the case of Kamailio, there’s a complexity to suspending a TM transaction and reanimating it in a different process that should be weighed. (I cannot say how much complexity, and, as with everything else in this article, have made no effort to measure it or describe it with the rigour of the scientific method. But it’s there.)

In the commonplace case of database-driven workloads, asynchronous processing does little more than push the pain point to a different place. To drive the point home, let’s take an example from our very own CSRP product:

In CSRP, we write CDR events to our PostgreSQL database in an asynchronous way, since these operations are quite expensive and can potentially set off a database trigger cascade for call rating, lengthening the transaction. We don’t really care if CDRs are written with a slight delay; it’s far more important that this accounting not block SIP receiver processes.

However, many CSRP customers choose to run their PostgreSQL database on the same host as the Kamailio proxy. If the database is busy writing CDRs and is pegging out the storage controller with write ops, it’s going to make everything less responsive, asynchronous or not, including the read-only queries required for call processing. Even if the database is situated on a different host, our call processing is highly database-dependent, so overwhelming the database has deleterious consequences regardless.

This can engender a nasty positive feedback loop:

  • Adding more asynchronous task workers won’t help; they’ll just further overwhelm storage with an additional firehose of CDR events.
  • The asynchronous task queue will stack up until calling SIP endpoints start CANCELing calls due to high post-dial delay (PDD).
  • Adding more receive workers won’t help; if the system as a whole is experiencing high I/O wait, adding more workers to take on more SIP messages just means more queries and yet more load.

A fashionable design pattern can’t fix that; you just need more hardware, or a different approach (in terms of I/O, algorithms, storage demand, etc.) to call processing.

The point is: before shifting the load to another part of the system so as to get more traffic through the front door, consider the impact globally and holistically. Maybe you can get Kamailio to slurp up more packets, but that doesn’t mean you should. How well do your external inputs scale to the task?

Asynchronous tasks can be very handy for certain kinds of applications, most notably where some sort of activity needs to be time-delayed into the future (e.g. push notifications). We love our asynchronous CDR accounting, since it’s heavy, and yet there’s no need for that to be real-time or responsive by SIP standards. However, for maximising call throughput in an I/O-bound workload such as ours, in which storage and database demand is more or less a linear function of requests per second, it’s far less clear. Our own testing suggests that asynchronous processing yields marginal benefits at best, and that we might be better off keeping ourselves honest and putting our efforts into further lowering our processing latency in the normal, synchronous execution context.

Conclusion

  • There’s no straightforward, generic answer to the question of how to reap maximum throughput from Kamailio and/or how many receiver worker processes to use. It requires deep consideration of the nature of the workload and the execution environment, and, most likely, empirical testing — doubly so for bespoke and/or nonstandard applications.
  • A reasonable guideline for most generic, commonplace Kamailio workloads is to set children equal to the number of available hardware threads. The prevalence of servers with quad-core processors + HyperThreading probably explains why the stock config ships with a setting of 8.
  • Asynchronous features are convenient and can, to an extent, be used to increase raw throughput, but rapidly encounter diminishing returns when the result is a drastic increase in base I/O load on either the local host or a dependency to which the workload is heavily I/O-bound.

Many thanks to my colleagues Fred Posner, Matt Jordan and Kevin Fleming for reading drafts of this article.

SIP UDP fragmentation and Kamailio – the SIP header diet

Failed calls due to fragmentation of large UDP SIP messages are a frequent support issue for us as the provider of a Kamailio-based, SIP proxy-centred call processing platform.

Anecdotally, the problem seems to be getting more common, as SIP becomes more complex, offering more extensions and capabilities that are represented in some way in the messaging, and more voice codecs are on offer, such as Opus and G.722. Meanwhile, the capacity of many SIP user agents to reassemble fragmented UDP does not seem to have increased much over the last few years. Even where a SIP user agent can handle UDP reassembly, UDP fragments are dropped by braindead routers and NAT gateways. Setting the Don’t Fragment (DF) IP header bit doesn’t help. It’s safe to say that fragmented UDP messages can’t be handled reliably out there in the world.

Kamailio itself handles fragmented incoming UDP messages just fine, but that doesn’t mean we can pass the full message to the destination, where it will be fragmented again and, most likely, disappear into the ether. As you might surmise, this poses a unique challenge for us, since at the heart of CSRP’s call processing core is Kamailio, not a Back-to-Back User Agent (B2BUA), as might be the case in a more traditional SBC or softswitch. An inline B2BUA that endogenously originates a new logical B-leg can easily overcome this problem by liberally accepting a wealth of SIP message bloat on the A-leg while emitting headers conservatively on the B-leg.

SIP proxies such as Kamailio participate within one logical call leg, and are, for the most part, bound by standards to pass along SIP messaging as received, without adulteration. This poses problems for interoperability, in the sense of “garbage in, garbage out”. Customers choose our platform for its lightweight design and high-throughput characteristics, and that’s the trade-off they make.

Nevertheless, the question comes up often enough: can we use Kamailio to remove any SIP headers from obese INVITE messages to tuck them back under the fragmentation threshold of ~1480 bytes implied by the ubiquitous Maximum Transmission Unit (MTU) of 1500 bytes? If so, which ones are safe to remove? What are the implications of doing so?

First, RFC 3261 scripture is quite clear on the true solution to this problem:

   If a request is within 200 bytes of the path MTU, or if it is larger
   than 1300 bytes and the path MTU is unknown, the request MUST be sent
   using an RFC 2914 [43] congestion controlled transport protocol, such
   as TCP. If this causes a change in the transport protocol from the
   one indicated in the top Via, the value in the top Via MUST be
   changed.  This prevents fragmentation of messages over UDP and
   provides congestion control for larger messages.  However,
   implementations MUST be able to handle messages up to the maximum
   datagram packet size.  For UDP, this size is 65,535 bytes, including
   IP and UDP headers.

In other words, if your UDP message is > 1300 bytes or within 200 bytes of the MTU, send it over a TCP channel.
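
Kamailio can, in fact, approximate this fallback with two core parameters (a sketch; it presupposes a working TCP listener and a far end that will accept the TCP connection):

# fall back to TCP for oversized outgoing UDP messages
udp_mtu=1300            # size threshold in bytes, per RFC 3261's guidance
udp_mtu_try_proto=TCP   # protocol to attempt when the threshold is exceeded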

RFC 3261 mandates that all compliant endpoints support TCP, but practically, many don’t, or don’t have it enabled, or it isn’t reachable. While SIP-TCP support is becoming more commonplace and viable for a variety of reasons, driven by improvements in hardware, memory and bandwidth as well as phenomena like WebRTC, it’s neither ubiquitous nor ubiquitously enabled. Then there are intervening factors like NAT. Consequently, it doesn’t represent a viable solution for most operators.

Having ruled out TCP, attention usually turns to SIP headers which are perceived to be nonessential, such as the commonplace:

Date: 17 Dec 2015 12:20:17 GMT
Allow: ACK, BYE, CANCEL, INFO, INVITE, MESSAGE, NOTIFY, OPTIONS, PRACK, REFER, UPDATE
User-Agent: SIPpety SIP BigSIP Rev v0.12 r2015092pl14

As well as generous SDP offers:

a=rtpmap:0 PCMU/8000
a=rtpmap:9 G722/16000
a=rtpmap:8 PCMA/8000
a=rtpmap:18 G729/8000
a=fmtp:18 annexb=no
a=rtpmap:101 telephone-event/8000

“Can we just strip some of this stuff out?”

As it turns out, the answer depends on whether you are pure of heart. Will you give in to sinful temptation, or are you one of the faithful few who walks with God?

Formally speaking, proxies are not supposed to modify anything in flight. Section 16.6 (“Request Forwarding”) of RFC 3261 says:

    The proxy starts with a copy of the received request.  The copy
    MUST initially contain all of the header fields from the
    received request.  Fields not detailed in the processing
    described below MUST NOT be removed.  The copy SHOULD maintain
    the ordering of the header fields as in the received request.
    The proxy MUST NOT reorder field values with a common field
    name (See Section 7.3.1).  The proxy MUST NOT add to, modify,
    or remove the message body.

So, no, you’re not supposed to tinker with the message headers in any way.

For those susceptible to the Sirens’ song, Kamailio has quite a few technical capabilities that go above and beyond what a proxy is formally allowed to do. However, I would urge strong caution. The dominant rule here should be: “Just because you can, doesn’t mean you should.” This is not generally heeded by sales-minded managers and executives. But you, as the hapless engineer, have to deal with the technical fallout.

Do keep in mind that all of these SIP headers (setting aside obvious unfiltered custom headers beginning with X-) are standardised, and they serve some purpose. However, some purposes are more important than others, and in Plain Old Call setup, not all of them are necessarily important. Some message tampering by proxies is relatively inconsequential, in that removing some headers seems to have no material impact on the UAC or UAS (or, indeed, the planet).

Still, I advise against a cavalier or presumptuous attitude. Axing data from SIP requests and replies that the proxy has a duty to pass through unmodified should be a last-resort option, taken only after you have exhausted all possible configuration changes to the UA(s) which could lead to a reduction in message size. These include:

  • Disable unused or unnecessary codecs so they are not offered in the SDP stanza.
  • Enable compact headers, where possible.
  • Switch to TCP.
  • Increase MTU if signalling is taking place in a controlled LAN environment where you control the path MTU entirely.

But if you must travel down the dark road:

Most header fields should not be removed. Many “MUST NOT” (in the IETF RFC sense of the term) be removed. That said, here are the ones that can be removed relatively safely, in this author’s opinion (a configuration sketch follows the list):

  1. User-Agent / Server – This is a strictly informational field, and nobody is the wiser if it falls in the forest.
  2. Codecs in encapsulated SDP body – As long as the two endpoints can still agree on a common codec, what the far end doesn’t know can’t hurt it, and pruning codecs can result in substantial payload economies. Be very sure that all endpoints are going to have at least one codec they can agree on, and that the resulting agreement would yield desirable results in all scenarios.
  3. Allow – This header’s value enumerates every SIP method supported by the endpoint, and I have not seen any adverse impacts from removing it in “POTS” INVITE call flows. RFC 3261 itself suggests it can be read as somewhat redundant in non-OPTIONS flows. From Section 20.5:
       The Allow header field lists the set of methods 
       supported by the UA generating the message.
    
       All methods, including ACK and CANCEL, understood 
       by the UA MUST be included in the list of methods 
       in the Allow header field, when present.  The 
       absence of an Allow header field MUST NOT be interpreted 
       to mean that the UA sending the message supports no 
       methods.   Rather, it implies that the UA is not providing
       any information on what methods it supports.
    
       Supplying an Allow header field in responses to methods 
       other than OPTIONS reduces the number of messages needed.
  4. Date – Honestly, nobody cares what time the UA thinks it is. Axe it.
  5. Timestamp – Same as #4, unless you have an environment which actually uses it for round-trip estimates.
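
To make this concrete, here is a minimal sketch of shedding items 1, 3, 4 and 5, plus some codec pruning per item 2, using the textops and sdpops modules (the route name and codec choices are illustrative; weigh every removal against your own traffic):

loadmodule "textops.so"
loadmodule "sdpops.so"

route[HEADER_DIET] {
    # strictly informational headers (items 1, 3, 4 and 5)
    remove_hf("User-Agent");
    remove_hf("Allow");
    remove_hf("Date");
    remove_hf("Timestamp");

    # item 2: prune codecs -- only if you are certain that every
    # endpoint pair can still agree on something desirable
    if (has_body("application/sdp"))
        sdp_remove_codecs_by_name("G729,G722");
}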

I would be very wary of removing Supported, as you simply cannot be certain that something Supported by one side is not Required by another. I would likewise avoid tinkering with Session-Timer-related headers (RFC 4028) for the same reason.

Under no circumstances should other well-known headers be touched.

This is a bandage that will buy some time, but it does not address the fundamental issue. The best practical solutions to this problem remain:

  • Reduce message size through configuration options on the UAs.
  • Use of a reliable transport layer that handles reassembly (aka TCP).
  • Inline B2BUA (yes, CSRP begrudgingly provides this as an option).

Why Evariste’s customers choose CSRP to solve the SIP Class 4 carrier interface problem

At Evariste Systems, our vision for our Kamailio-based CSRP project germinated from a perceived gap in Class 4 switch platform solutions for the small to midsize ITSP market.

Alex Balashov, founder of Evariste Systems and member-in-council of the Kamailio project management board.

We started out in the mid-2000s as a FOSS technology-focused consultancy serving the VoIP service provider space, specialising in high-performance call routing, accounting and rating solutions built on top of the OpenSER/Kamailio technology stack.

Rather quickly, we realised that we were constantly being asked to build subsets of the same sort of thing: a Kamailio-based trunk routing platform for hosted PBX operators, wholesale SIP origination & termination providers, and VoIP application service providers. These implementations had a common denominator of requirements:

  • Least cost routing or other dynamic routing to and from PSTN connection providers;
  • Call detail record (CDR) accounting and/or mediation;
  • High-volume call processing engine.

We had to ask ourselves: why did so many ITSPs want to build a Kamailio-based Class 4 switch? After all, by the time we entered this arena, there were plenty of canonical solutions to these problems in the enterprise segment: well-established elements like Nextone (Genband) and Acme Packet (Oracle) for call processing, and redirect-based LCR folk traditions from the likes of Global Converge.

The picture that emerged from our discovery:

  • Most obviously, small and midsize ITSPs were priced out of the enterprise Class 4 solution space.
  • Many small to medium ITSPs relied on the trunking/Class 4 side of their Class 5 feature platforms, e.g. multitenant hosted PBX platforms, for the carrier interface. However, this component was distinctly lacking in technical and business-layer features. In those kinds of platforms, the carrier interface portion was a relatively colourless afterthought — just another check box in a very long feature matrix.
  • The carrier interface of Class 5 platforms was not built with high-volume, high-throughput applications in mind, as call processing in these systems has to be funneled through an application-heavy engine: good for applications and user experience, not for high-volume call processing.
  • The conventional combination of “dumb” network element (e.g. SBC) + “intelligent” outboard routing server + [often] mediation solution was too complex, too expensive, and demanded too many moving parts. By the time this zoo was built out, there were three vendors to pay and three infrastructure silos to operationalise and keep highly available. It was simply too burdensome.
  • The Tier 1 supply chain that feeds the VoIP DID & termination industry was getting better and better at delivering small transactions with shrinking ramps and commits, squeezing distributive resale and arbitrage plays out.
  • Accordingly, as small to medium ITSPs had to look elsewhere for differentiation strategies, there was a widespread ask for programmable solutions and universal integration paths, such as APIs and direct database access.
  • Enterprise equipment offered these, but usually through a highly bureaucratic interface such as SOAP, or a complex SDK. The more technically minded in the industry demanded a purer, simpler, and more flexible way to externally drive their platform from their in-house OSS/BSS elixirs, customer portals and so on.
  • The performance of enterprise equipment in high-volume wholesale scenarios was easily exaggerated and often did not meet expectations, especially relative to the licencing costs.

Open-source engines were just that — engines. They offered no silver bullet because, while they offered good core technology, they lacked any of the vertical-specific business features.

So you want to build a FreeSWITCH-based platform? Okay, you’ve installed FreeSWITCH. Now what? While it can be wrestled relatively easily into a PBX or static call routing role, a Class 4 switch it does not make. Kamailio/OpenSER does even less out of the box; with its domain-specific, low-level route script, it can almost be thought of as a kind of SDK with a SIP proxy core. These technologies are toolboxes, not finished products. All of which is to say: you can’t just download an open-source trunking box real quick — at least, not if you want it to do the kinds of things listed in the CSRP features matrix.

And thus it became clear that there was a gap in the market for an inexpensive Class 4 interface which:

  • Runs on commodity hardware.
  • Provides straightforward, familiar integration paths (e.g. APIs and direct database access).
  • Performs well under high loads, in many cases better than big brand commercial equipment.
  • Provides an expansive business layer with wide applicability to the global VoIP ITSP, carrier, call centre and application farm workloads.
  • Combines call processing, accounting and mediation functions into one chassis.
  • Solves commonplace technical problems in the delivery of SIP trunking as well as or better than big-brand SBCs, e.g. far-end NAT traversal for off-net gateways.

Configuring a Customer Billing Group in the CSRP web management interface.

We broke ground on our reference implementation for CSRP in February of 2010, choosing to build the platform on top of Kamailio with a strong focus on performance.

Although Kamailio, as a SIP proxy, was a decidedly unconventional core call processing technology in an industry accustomed to opaque B2BUA (Back-to-Back User Agent) network elements, it has paid handsome dividends in performance and reliability. Despite the technical trade-offs that come with forsaking the traditional B2BUA formula, we feel our decision is vindicated by the thousands of calls per second effortlessly set up by our highest-volume customers. At that level, it’s a challenge to find suppliers whose Sonuses and Acme Packets won’t fall over from the loads CSRP easily copes with!

In terms of trajectory, we took the slow, organic, customer-driven development route, preferring high market validation to an explosive–if spectacular–front-loaded marketing blitzkrieg. This means that the feature set of CSRP today is closely coupled to concrete customer demand in this space, because we have taken the time to truly understand our market, and it means that our technology has seen extensive battle testing in the field, in the precise applications for which it was intended. We don’t have a lot of dark, bit-rotted feature corners kept for the sake of another coveted check box; if something is there, it’s there for a reason.

This conservative and involved business strategy puts us in a unique position of credibility from which to comment objectively and soberly on the suitability of CSRP to your application. We know what our customers want, and we provide a vendor service relationship commensurate with the insight we have painstakingly cultivated into the Class 4 arena.

CSRP is not a piece of a larger puzzle for us; it’s the endgame. We are passionate about the platform we have built, and dedicated to making it better every day.

If you would like to learn more about CSRP, please do not hesitate to reach out!