RTP stands for Real-time Transport Protocol. Its purpose is to carry voice data between a caller and a callee. The problem is that to call someone using RTP, you need to know their IP address and port. This makes RTP quite inconvenient when used alone, since the parties have no way to find one another. This is why people invented SIP.
SIP (Session Initiation Protocol) resembles HTTP in syntax: it is human-readable text. Its purpose is to help the caller find the IP address and port of the callee. It also helps negotiate the media types and formats. For example, suppose you have a PC at home and you want to call your friend from Romania using Free World Dialup (FWD), which uses the SIP protocol:
Your SIP client sends an INVITE packet, carrying the caller's IP address and port for RTP, to the FWD server, and from there FWD forwards the call to the intended destination. The callee accepts the call and sends back their own IP address and port for RTP.
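To make the exchange above concrete, here is a minimal sketch of how a receiver might pull the announced RTP address and port out of the SDP body carried by an INVITE. The function name parse_sdp_rtp_endpoint and the sample body EXAMPLE_SDP are made up for illustration; only the SDP "c=" (connection) and "m=" (media) line layout is standard.

```python
def parse_sdp_rtp_endpoint(sdp):
    """Return (ip, port) for the first audio stream in an SDP body, or None."""
    session_ip = None   # from a c= line before any m= line
    audio_ip = None     # from a c= line inside the audio section
    audio_port = None
    section = "session"
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            if audio_port is not None:
                break  # the first audio section is complete
            fields = line[2:].split()
            if len(fields) >= 2 and fields[0] == "audio":
                # m=audio <port> <proto> <formats...>
                audio_port = int(fields[1].split("/")[0])
                section = "audio"
            else:
                section = "other"
        elif line.startswith("c="):
            # c=<nettype> <addrtype> <address>
            parts = line[2:].split()
            if len(parts) == 3:
                addr = parts[2].split("/")[0]
                if section == "session":
                    session_ip = addr
                elif section == "audio":
                    audio_ip = addr
    if audio_port is None:
        return None
    ip = audio_ip or session_ip
    return (ip, audio_port) if ip else None


# Hypothetical SDP body, as a client on a home LAN might send it.
EXAMPLE_SDP = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 192.168.1.10",
    "s=-",
    "c=IN IP4 192.168.1.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 8",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
]) + "\r\n"

endpoint = parse_sdp_rtp_endpoint(EXAMPLE_SDP)
```

Note that the address in this sample is a private LAN address, which is exactly the kind of value the other side cannot reach directly.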
The problem with SIP and NAT is not actually a SIP problem, but an RTP problem. SIP announces the RTP address and port, but if the client is behind NAT, it announces the client's local RTP address and port, which can be different from the address and port the NAT allocates externally.
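The mismatch can be illustrated with a small check a server might run: compare the address announced in SDP with the source address the packet actually arrived from. This is only a heuristic sketch; the function name announced_differs_from_source and the sample addresses are assumptions, not part of any SIP stack.

```python
import ipaddress


def announced_differs_from_source(sdp_ip, packet_source_ip):
    """True when the SDP address is not the address the packet came from.

    A difference suggests that something (typically a NAT) rewrote the
    source address on the way to the server.
    """
    announced = ipaddress.ip_address(sdp_ip)
    source = ipaddress.ip_address(packet_source_ip)
    return announced != source


# The client announces its LAN address; the server sees the NAT's address.
behind_nat = announced_differs_from_source("192.168.1.10", "203.0.113.5")
# A client with a public address announces the same address the server sees.
direct = announced_differs_from_source("203.0.113.5", "203.0.113.5")
```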
Even though many SIP implementations and carriers rely on the assumption that NAT will always try to allocate the same port, that assumption is false. In a production environment, you can't tell grandma that she can't talk with her grandson because some router has allocated a different port number.
If you are a carrier, the solution is simple because you proxy all the data anyway: use a SIP Session Border Controller (SBC). The SBC normally sits in front of the carrier's internal SIP network, solving the NAT traversal problem and protecting the SIP network.
In this case, NAT traversal relies on two tricks.
The first trick is to keep the hole in the NAT open from the SIP client to the server. This is normally done by having every SIP client send a two-byte packet at intervals shorter than 30 seconds. Some routers remove apparently unused NAT mappings after 30 seconds; GNU/Linux typically does this after three minutes.
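A keep-alive sender along these lines could be sketched as follows. The payload (a bare CRLF), the 20-second default interval, and the function name send_keepalives are assumptions chosen for illustration; the text above only requires a two-byte packet sent more often than every 30 seconds.

```python
import socket
import time

# Assumption: a bare CRLF as the two-byte payload.
KEEPALIVE_PAYLOAD = b"\r\n"


def send_keepalives(sock, server_addr, interval_s=20.0, count=None):
    """Send the keep-alive payload to server_addr every interval_s seconds.

    Runs forever when count is None; otherwise sends exactly count packets.
    Returns the number of packets sent.
    """
    sent = 0
    while count is None or sent < count:
        sock.sendto(KEEPALIVE_PAYLOAD, server_addr)
        sent += 1
        if count is None or sent < count:
            time.sleep(interval_s)  # must stay below the NAT mapping timeout
    return sent


# Local demonstration: a stand-in "server" socket on the loopback interface.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server.settimeout(2.0)

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sent = send_keepalives(client, server.getsockname(), interval_s=0.01, count=3)
received = [server.recvfrom(16)[0] for _ in range(sent)]

client.close()
server.close()
```

In a real client the keep-alives must leave from the same local socket that carries the SIP traffic, since the NAT mapping is tied to that source port.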
The second trick, which we've used in our project YATE, is to learn the client's RTP IP and port from the first packet that arrives on the server's local RTP IP and port, and to use them instead of the RTP IP and port declared in SDP. This trick solves the NAT traversal problem, no matter how many NATs the client is traversing. However, the main disadvantage is that, in some cases, the client will not receive early media (since at that point it has sent no voice packets) and it will not hear the ringing.
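The idea of learning the RTP endpoint from the first arriving packet can be sketched as below. This is not YATE's code; the class RtpLatch and the helper receive_and_latch are hypothetical names, and the demonstration runs entirely on the loopback interface.

```python
import socket


class RtpLatch:
    """Tracks where to send RTP: the SDP address until a packet arrives."""

    def __init__(self, sdp_addr):
        self.sdp_addr = sdp_addr    # (ip, port) declared in SDP
        self.learned_addr = None    # (ip, port) seen on the wire

    def on_packet(self, source_addr):
        # Latch onto the source of the first packet only.
        if self.learned_addr is None:
            self.learned_addr = source_addr

    @property
    def send_addr(self):
        # Prefer the observed address; fall back to the SDP declaration.
        return self.learned_addr or self.sdp_addr


def receive_and_latch(sock, latch, bufsize=2048):
    """Read one datagram and let the latch record where it came from."""
    data, source = sock.recvfrom(bufsize)
    latch.on_packet(source)
    return data


# Local demonstration: the "client" claims a LAN address in SDP,
# but its packets really arrive from a different address and port.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server.settimeout(2.0)

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.bind(("127.0.0.1", 0))

latch = RtpLatch(sdp_addr=("192.168.1.10", 49170))
before = latch.send_addr

client.sendto(b"fake-rtp", server.getsockname())
payload = receive_and_latch(server, latch)
after = latch.send_addr
client_addr = client.getsockname()

client.close()
server.close()
```

Until the first packet arrives, send_addr still points at the unreachable SDP address, which is why early media can be lost with this approach.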
If you are not a carrier and you are trying to make a peer-to-peer call with both sides behind NAT, you must use an external SIP proxy or gateway to pass the SIP signaling between the two points, hoping that the NATs will open the proper ports to one another for the RTP connection. However, there is no ultimate solution for that. Two proposed solutions are STUN and ICE, but every solution that currently exists can get in your way sometimes. Skype has found a very simple and nice solution for this problem: it uses Skype clients that are not behind NAT to proxy all the data for clients that are behind NAT.
This solution, technically speaking, is very good. However, there are some moral and political reasons not to use Skype. One of them is that if you are a Skype client outside the NAT, you don't know whose data is passing through your computer. Another is that it uses your bandwidth; after all, someone has to pay, one way or another, for the Internet bandwidth needed to proxy the voice stream.
My personal hope is that in the near future, most SIP implementations will use the two tricks used by YATE. Skype will probably be around for a long time for home users, but enterprises seem to be moving slowly to VoIP providers. With a lot of work and a little bit of luck, those providers will become at least as reliable as PSTN providers, since the technology is better.