Friday, January 18, 2008


Abstract — This paper describes the design of a system that
can give information of vehicle position everytime there’s a
request for it. The information of vehicle position is gained from GPS and it is transmitted using Short Message Services. The system is designed using VHDL on Altera MAX+plus II software, and it is implemented on Altera UP1X demoboard based on FPGA chip, which is Altera FLEX 10K EPF10K70RC240-4.

A. Background
Technology grows rapidly that causes every people to act fast. As one of human needs, information plays a greater role. People need to get fast and up -to- date information.
The need to get such information is get ting more important. For example is the need of knowing the position of vehicle is important for its owner .
In this paper we develop such system that can give information of vehicle position. This system helps the owner of the vehicle to know where his vehicle is. The system also helps tracking the vehicle when the owner is not driving it.
The objective of this project is to achieve a design of such system that can give information of the vehicle position every time there’s a request for it. The designed system has to be able to work properly on Altera UP1X demoboard based on FPGA, the Altera FLEX 10K EPF10K70RC240 -4.
C.Problem Boundary
This system is designed to be able to communicate in two direction between the vehicle and the owner. If the owner wants to know his vehicle position, he can easily send a signal to that vehicle. Then, the information of its position will be transmitted to him . The vehicle position is gained from the Global Positioning System while the data is transmitted using Short Message Services.
The system is designed using VHDL (Very High Speed
Integrated Circuit Hardware Description Language) on Altera MAX+plus II software and it is implemented on Altera UP1X demoboard based on FPGA ( Field Programmable Logic Array) chip, w hich is the Altera FLEX 10K EPF10K70RC240-4. The testing of the system is limited until
in -circuit level.
The designed system is a bidirectional communication system between the owner and its vehicle
The owner wants to know the vehicle position. Therefore he sends a signal to that vehicle. Then, the information of the vehicle position that is gained from GPS will be transmitted to him . The data is transmitted using SMS.
Part of the system that will be designed in this project is in
the vehicle side (Part 1 in Figure 1).
As already been described in C hapter II, the designed system is an interface between the main operator, which is the owner, and his vehicle. The two parties could communicate in two directions in order to know the vehicle position.
Figure.2. The input, output, and the main modules of the system
A. Inputs and Output of the System
There are two dif ferent kinds of inputs for this system. The first one is the input gained from GPS which is the sentence based on NMEA 0183 standard. The other one is the input received from cellular phone. But there is only one output for this system which is AT Command for sending the SMS.
B. Data Processing Mechanism
The signal sent by the main operator is the SMS containing a request of the vehicle position. Then the signal received later is the SMS containing the information of the vehicle position. The needed vehicle position are the latitude and longitude data gained from GPS.
C. System Module
As we can see from Figure 2, this system is divided into three main parts, that are the SMS receiver, the GPS receiver and data processing, and the SMS transmitter modules.
 SMS Receiver Module
This module handles the new incoming SMS signal. The SMS device used here is Siemens C35 cellular phone. Baud rate is set at 19200 bps, the data is 8N1 format, and flow control is none [1] . The input for this module is a sequence of data
that will be appeared whenever there is a new SMS coming. But there are certain AT commands that must be set first to make those data possible to appear
[2].The GPS used in this project is GARMIN type
35LP. The format of the transmitted data from that
GPS is as follows
[3] . $PGRMF,df1,df2,df3,df4,df5,df6,df7,df8,df9,df10,df11
There are two conditions of receiving the GPS data, that is when the GPS is still searching for the first position, and when the GPS has determined the first position.
The first condition is identified by the data in fie ld
11 (df11) that contains ‘0’. There are also no data in the 1st until 9th field and 12th until 15th , while the 10th field contains ‘A’ character which means that the mode is automatic.
For the second condition, the latitude data is in the
6th field in format ddmm.mmmm, while the latitude hemisphere data (North or South) is in the 7th field.
On the other hand, the longitude data is in the 8th field in format dddmm.mmmm and the longitude hemisphere data (West or East) is in the 9th field.
The output for this module is the information of the vehicle position which are the latitude and longitude data. After this process finished, the system is ready to transmit the SMS.
 SMS Transmitter Module
This module handles the SMS transmission containing the vehicle position information. The setting of this module is the same as in the SMS receiver module.
The input for this module is the information of vehicle position gained from the GPS receiver and data processing module.
The outputs for this module are AT command and
PDU codes that are used to send an SMS.
The format of SMS content expected is as follows.
*dd:mm; *ddd:mm
The ‘*’ character can be the ‘+’ character that indicates the North Latitude or West Longitude, or the ‘– ‘ character that indicates the South Latitude or the
East Longitude.


The designed system is first verified using the timing simulation on Altera MAX+plus II software.
The ser_hp signal is the serial data sent to the cellular phone. The respons from the cellular phone is the serial data in_hp input signal. The datain_hp[7..0] is the 8 bit parallel data from the in_ hp signal.
Therefore, the system will send those certain AT commands at the beginning, to be able to indicate that there is a new incoming SMS. After the system indicates that there is a new incoming SMS, the SMS will be deleted and the system will be ready to receive data from GPS.
 GPS Receiver and Data Processing Module This module receives and process es the GPS data. Baud rate is set at 4800 bps, the data is 8N1 format, and flow control is none. At the beginning, the system sends certain AT commands to make the cellular phone be able to indicate the new incoming SMS. Then the system is ready to indicate the new incoming SMS.
After the system indicates the new incoming SMS, the SMS is deleted. Now the system is ready to receive and process data from GPS.

As alr eady been described in Chapter I, the designed system is implemented on Altera UP1X demoboard.
The Altera UP1X demoboard has two PLD
( Programmable Logic Device) chips, which are FPGA FLEX
10K EPF10K70RC240- 4 and CPLD (Complex Programmable Logic Device) MAX7000 EPM7128S [4]. From those two chips, only the FPGA one that is used here, because it has larger capacities, such as 70000 logic gates.
There are two testing and verification procedures of this
system. Both are limited until the in- circuit level.

T he baud rate is changed to GPS baud rate. Now the system is ready to receive and process the GPS data.
The in_gps is the serial data input signal from the GPS. The datain_gps[7..0] is the 8 bit parallel data from the in_gps signal.

After the information of position has been achieved, the system is ready to send the SMS. Therefore the baud rate is changed to cellular phone baud rate.
In the testbench for the first testing and verification procedure , the inputs for the designed system are supplied from two different modules, which are the variation of cellular input module and the variation of GPS input modul e. The variation of cellular input module has a start signal that trigger s the flowing of the data signal, which is a serial asynchronous data. Those serial data is supplied to the input signal of the designed system. The output of the designed system is the ser_hp signal which is a serial data that is transmitted to the cellular phone. The clk signal is the clock input provided from the oscillator crystal of 25.175 MHz. The baud signal is the output of the baud rate generator that supp lies the clk2 signal, which is the secondary clock for the system. A nd also, there are the rst signal which reset s the
In the second testing and verification procedure , we use the real cellular phone, which is Siemens C35, and for the GPS, we use a simulated data supplied from a serial port program using Visual C++.
In Figure 11, we can see the example of SMS as the result of this second testing and verification procedure. The vehicle position information gained from that SMS is 01° 04’ South Lattitude ; 307° 06’ West Longitude. This information is accurate according to the simulated data supplied from a serial port program on PC.
In this project, the result of the design ed system has accomplished the target . The maximum frequency of this system based on timing simulation is 3.74 MHz, while based on the in- circuit verification, the system works at both real baud rates , which are 19200 bps for cellular and 4800 bps for GPS . The total number of logic cells used is 2472 of 3744 (6 6%).

The authors would like to express gratitude s to PT Elektrindodaya Pakarnusa that supports the facilities for this p roject. Along with Mulyanto for his assistance during working this project.

Thursday, January 17, 2008

Internet Monitoring & PingER at SLAC

To provide a better expectation of the network performance between sites that SLAC collaborates with, in May 1996 the PingER project (initiated in Janary 1995) monitored about 100 hosts worldwide from SLAC. Since 2000, the emphasis is more on measuring the Digital Divide. Nowadays (April 2007) there are over 35 Monitoring Sites, over 700 remote sites being monitored in over 120 countries (containing over 99% of the worlds Internet connected population) and over 3000 monitor-site remote-site pairs included. More details on the deployment of PingER can be found in PingER Deployment and there is a map of the sites.

The main mechanism used is the Internet Control Message Protocol (ICMP) Echo mechanism, also known as the Ping facility. This allows you to send a packet of a user selected length to a remote node and have it echoed back. Nowadays it usually comes pre-installed on almost all platforms, so there is nothing to install on the clients. The server (i.e. the echo reponder) runs at a high priority (e.g. in the kernel on Unix) and so is more likely to provide a good measure of network performance than a user application. It is very modest in its network bandwidth requirements (~ 100 bits per second per monitoring-remote-host-pair for the way we use it).
Measurement method
In the PingER project, every 30 minutescron from a monitoring node (Measurement Point - MP), we ping a set of remote nodes with 11 pings of 100 bytes each (including the 8 ICMP bytes but not the IP header). The pings are separated by at least one second, and the default ping timeout of 20 seconds is used. The first ping is thrown away (it is presumed to be slow since it is priming the caches etc. (Martin Horneffer in reports that using UDP-echo packets and an inter-arrival-time of about 12.5 seconds the first packet takes about 20% more time to return)). The minimum/average/maximum RTT for each set of 10 pings is recorded. This is repeated for ten pings of 1000 data bytes. The use of two ping packet sizes enables us to make estimates of ping data rates and also to spot behaviors that differ between small and large packets (e.g. rate limiting). See Big vs. small packets, timing of ping measurements for more details. In general the RTT is proportional to l (where l is the packet length) up to the maximum datagram size (typically 1472 bytes including the 8 ICMP echo bytes). Behavior beyond that is undefined (some networks fragment the packets, others drop them). Documentation on the recommended measurement script that runs at each monitoring site is available. The ping response times are plotted for each half hour for each node. This is mainly used for trouble shooting (e.g. see if it got dramatically worse in the last few hours).

The set of remote hosts to ping is provided by a file called pinger.xml (for more on this see the documentation). This file consists of two parts: Beacon hosts which are automatically pulled daily from SLAC and are monitored by all MPs; other hosts which are of particular interest to the administrator of the MP. The Beacon hosts (and the particular hosts monitored by the SLAC MP) are kept in an Oracle database containing their name, IP address, site, nickname, location, contact etc. The Beacon list (and list of particular hosts for SLAC) and a copy of the database, in a format to simplify Perl access for analysis scripts, are automatically generated from the database on a daily basis.

Data Gathering
The architecture of the monitoring includes 3 components:
The remote monitoring sites. These simply provide a passive remote-host with the appropriate requirements.
The monitoring site. The PingER monitoring tools needs to be installed and configured on a host at each of these sites. Also the ping data collected needs to be made available to the archive hosts via the HyperText Tansport Protocol (HTTP) (i.e. there must be a Web server to provide the data on demand via the Web). There are also PingER tools to enable a monitoring site to be able to provide short term analysis and reports on the data it has in its local cache.
The archive and analysis sites. There must be at least one each of these for each PingER project. The archive and analysis sites maybe located at a single site, or even a single host or they may be separated. The HENP & ESnet project has two such sites, one at HEPNRC at FNAL, the other at SLAC. The HEPNRC site is the major arcive site and SLAC the major analysis site. They complement each other by providing access to different reports obtained by analyzing the data gathered, e.g. HEPNRC provides charts of response time and loss versus time on demand for selected sites and time windows, while SLAC provides tables of more extensive metrics going back over a longer period. The XIWT project has its archive/monitor site at the CNRI.
The archive sites gather the information, by using HTTP, from the monitor sites at regular intervals and archive it. They provide the archived data to the analysis site(s), that in turn provide reports that are available via the Web.

The data is gatheredgather on a regular basis (typically daily) from the monitoring nodes by two archive hosts, one at SLAC the other at FNAL (HEPNRC), that store, analyzeanalysis, prepare and via the web make available reportspingtable on the results (see figure below).

The PingER architecture is illustrated below:

Some care is needed in the selection of the node to ping (see Requirements for WAN Hosts being Monitored).
We have also observed various pathologies with various remote sites when using ping. These are documented in PingER Measurement Pathologies.

The calibration and context in which the round trip metrics are measured are documented in PingER Calibration and Context, and some examples of how ping results look when taken with high statistics, and how they relate to routing, can be found in High statistics ping results.

We have validated the use of ping by demonstrating that measurements made with it correlate with application response. The correlation between the lower bounds of Web and ping responses is seen in the figure below. The measurements were made on December 18th, 1996, from SLAC to about 1760 sites identified in the NLANR caches. For more details see Effects of Internet Performance on Web Response Times, by Les Cottrell and John Halperin, unpublished, January 1997.
The remarkably clear lower boundary seen around y = 2x is not surprising since: a slope of 2 corresponds to HTTP GETs that take twice the ping time; the minimum ping time is approximately the round trip time; and a minimal TCP transaction involves two round trips, one round trip to exchange the second to send the request and receive the response. The connection termination is done asynchronously and so does not show up in the timing.

The lower boundary can also be visualized by displaying the distribution of residuals between the measurements and the line y = 2 x(where y = HTTP GET response time and x = Minimum ping response time). Such a distribution is shown below. The steep in crease in the frequency of measurements as one approaches zero residual value (y=2x) is apparent. The Inter Quartile Range (IQR), the residual range between where 25% and 75% of the measurements fall, is about 220 msec, and is indicated on the plot by the red line.

An alternate way of demonstrating that ping is related to Web performance is to show that ping can be used to predict which of a set of replicated Web servers to get a Web page from. For more on this see Dynamic Server Selection in the Internet, by Mark E. Crovella and Robert L. Carter.

The Firehunter Case Study of the Whitehouse Web Server showed that though ping response does not track abnormal Web performance well, in this case ping packet loss did a much better job.

The Internet quality of service assessment by Christian Huitema, provides measurements of the various components that contribute to web response. These components include: RTT, transmission speed, DNS delay, connection delay, server delay, transmission delay. It shows that the delays between sending the GET URL command and the reception of the first byte of the reponse is an estimate of server delay ("in many servers, though not necessarily all, this delay corresponds to the time needed to schedule the page requests, prepare the page in memory, and start sending data") and represents between 30 and 40% of the duration of the average transaction. To ease that, you probably need more powerful servers. Getting faster connections would obviously help the other 60% of the delay.

Also see the section below on Non Ping based tools for some correlations of throughput with round trip time and packet loss.

What do we Measure
We use ping to measure the response time (round trip time in milli-seconds (ms)), the packet loss percentages, the variability of the response time both short term (time scale of seconds) and longer, and the the lack of reachability, i.e. no response for a succession of pings. For a discussion and definition of reachability and availability see Internet Performance: Data Analysis and Visualization a White Paper by the XIWT. Also see the NIST document on A Conceptual Framework for System Fault Tolerance for more on the distinction between reliability and availability. We also record information on out of order packets and duplicate packets.
With measured data we are able to create long term baselines for expectations on means/medians and variability for response time, throughput, and packet loss. With these baselines in place we can set expectation, provide planning information, make extrapolations and look for exceptions (e.g. today's response time is more than 3 standard deviations greater than the average for the last 50 working days) and raise alerts.

The loss is a good measure of the quality of the link (in terms of its packet loss rates) for many TCP based applications. Loss is typically caused by congestion which in turn causes queues (e.g. in routers) to fill and packets to be dropped. Loss may also be caused by the network delivering an imperfect copy of the packet. This is usually caused by bit errors in the links or in network devices. Paxson (see End-to-end Packet Dynamics) from measurements made in 1994 and 1995 concluded that most corruption errors came from T1 links, and the typical rate was 1 in 5000 packets. This corresponds to a bit error rate for a 300byte packet average of about 1 in 12 million bits. IP has a 16 bit checksum, so the probability of not detecting an error in a corrupted packet is 1 in 65536, or 1 in about 300 million packets. A more recent study on When the CRC and TCP checksum disagree published in August 2000, indicates that traces of Internet packets over the last two years show that 1 in 30,000 packets fails the TCP checksum, even on links where link-level CRCs should catch all but 1 in 4 billion errors. These TCP checksum errors are higher level (e.g. they can be caused by bus errors in network devices or computers, or by TCP stack errors) than the link level errors that should be caught by the CRC checks.
The response time or Round Trip Time (RTT) when plotted against packet size can give an idea of ping data rate (kilo Bytes/sec (KB/s)) - see for example TRIUMF's Visual-ping utility. This becomes increasingly difficult as one goes to high performance links, since the packet range is relatively small (typically < 1500bytes), and the timing resoltion is limited. The RTT is related to the distance between the sites plus the delay at each hop along the path between the sites. The distance effect can be roughly characterized by the speed of light in fiber, and is roughly given by distance / (0.6 * c) where c is the velocity of light (the ITU in document G.144, table A.1 recommends a multiplier of 0.005 msec/km, or 0.66c). Putting this together with the hop delays, the RTT R is roughly given by:
R = 2 * (distance / (0.6 * c) + hops * delay)
where the factor of 2 is since we are measuring the out and back times for the round-trip. This is illustrated in the figure below, that shows the measured ping response as a function of distance for between 16 pairs of sites located in the U.S., Europe and Japan (Bologna-Florence, Geneva-Lyon, Chicago-U of Notre Dame, Tokyo-Osaka, Hamburg-Dresden, Bologna-Lyon, Geneva-Mainz, Pittsburg-Cincinnatti, Geneva-Copenhagen, Chicago-Austin, Geneva-Lund, Chicago-San Francisco, Chicago-Hamburg, San Francisco-Tokyo, San Francisco-Geneva & Geneva-Osaka). The blue triangles are for the measured RTT (in milliseconds), the black line is a fit to the data, the green line is for y = x(distance) / (0.6 * c), and the red dots fold in the hop delays with a delay/hop of about 2.25ms for each direction (i.e. the red dots are the theoretical RTT fit). We used the How far is it? web page to obtain the "as the crow flies" distances between major points along each route. A more recent measuremet made by Mark Spiller in March 2001, to about 10 universities from UC Berkeley measured router delays of 500-700 usec with a few spikes in the 800-900 usec range.

Route length (Rkm) may be used in the place of distance for some Frame Transfer Delay (FTD) performance objectives. If Dkm is the air-route distance between the boundaries, then the route length is calculated as follows (this is the same calculation as found in the ITU document G.826).
if Dkm < 1000 km, then Rkm = 1.5 * Dkm
if 1000 km <= Dkm <= 1200 km, then Rkm = 1500 km
if Dkm > 1200 km, then Rkm = 1.25 * Dkm
This rule does not apply if there is a satellite in the route. If a satellite is present in any portion of the route, that portion is allocated a fixed FTD of 320 msec. The value of 320 msec takes into account factors such as low earth station viewing angles, and forward error correcting encoding. Most portions that contain a satellite are not expected to exceed 290 msec of delay. If it is a geostationary satellite then the lower limit on geosynchronous orbit is between 22,000 and 23,000 miles, the speed of light is ~ 186,000 miles, figure up and back is 45,000, and round-trip is 90,000 miles, so we get 500ms right there.
The delays at each hop are a function of 3 major components: the speed of the router, the interface clocking rates and the queuing in the router. The former two are constant over short (few days) periods of time. Thus the minimum RTTs give a measure of the distance/(0.6*c) + hops * ((interface speed / packet size) + minimum router forwarding time). This number should be a linear function of packet size. The router queuing effects, on the other hand, are dependent on more random queuing processes and cross-traffic and so are more variable. This is illustrated in the MRTG plot below that shows the very stable minimum RTT (green area) and the more random maximum RTTs (blue lines) measured from SLAC to University of Wisconsin from Sunday 25th February 2001, to Monday April 5th, 2001. The little blip in the RTT around mid-day Tuesday is probably caused by a route change.

Non Ping Based Tools
SLAC is also a Surveyor site. Surveyor makes one way delay measurements (not using ICMP), using Global Positioning System (GPS) devices to synchronize time, and dedicated monitoring/remote hosts. We are working on comparing the PingER and Surveyor data to compare and contrast the two methods and verify the validity of ICMP echo. One concern raised with ICMP echo is the possibility of Internet Service Providers (ISPs) rate limititing ICMP echo and thus giving rise to invalid packet loss measurements, for more on this see the section on Gotchas above.
We also use more complex tools such as FTP (to measure bulk transfer rates) and traceroute (to measure paths and number of hops). However, besides being more difficult to set up and automate, FTP is more intrusive on the network and more dependent on end node loading. Thus we use FTP mainly in a manual mode and to get an idea of how well the ping tests work (e.g. Correlations between FTP and Ping and Correlations between FTP throughput, Hops & Packet Loss). We have also compared PingER predictions of throughput with NetPerf measurements. Another way to correlate throughput measurements with packet loss is by Modeling TCP Throughput.

Calculating the Mean Opinion Score (MOS)
The telecommunications industry uses the Mean Opinion Score (MOS) as a voice quality metric. The values of the MOS are: 1= bad; 2=poor; 3=fair; 4=good; 5=excellent. A typical range for Voice over IP is 3.5 to 4.2 (see In reality, even a perfect connection is impacted by the compression algorithms of the codec, so the highest score most codecs can achieve is in the 4.2 to 4.4 range . For G.711 the best is 4.4 (or an R Factor (see ITU-T Recommendation G.107, "The E-model, a computational model for use in transmission planning.") of 94.3) and for G.729 which performs significant compression it is 4.1 (or an R factor of 84.3).
There are three factors that significantly impact call quality: latency, packet loss, jitter. Other factors include the codec type, the phone (analog vs. digital), the PBX etc.) We show how we calculate jitter later in this tutorial. Most tool-based solutions calculate what is called an "R" value and then apply a formula to convert that to an MOS score. We do the same. This R to MOS calculation is relatively standard (see for example ITU - Telecommunication Standardization Sector Temporary Document XX-E WP 2/12 for a new method). The R value score is from 0 to 100, where a higher number is better. Typical R to MOS values are: R=90-100 => MOS=4.3-5.0 (very satisfied), R=80-90=>MOS=4.0-4.3 (satisfied), R=70-80=>MOS=3.6-4.0 (some disatisfaction), R=60-70=>MOS=3.1-3.6 (more disatisfaction), R=50-60=>MOS=2.6-3.1 (Most disatisfaction), R=0-50=>MOS=1.0-2.6 (not recommended). To convert latency, loss, jitter to MOS we follow Nessoft's method. They use (in pseudo code):

#Take the average round trip latency (in milliseconds), add
#round trip jitter, but double the impact to latency
#then add 10 for protocol latencies (in milliseconds).
EffectiveLatency = ( AverageLatency + Jitter * 2 + 10 )
#Implement a basic curve - deduct 4 for the R value at 160ms of latency
#(round trip). Anything over that gets a much more agressive deduction.

if EffectiveLatency < 160 then
R = 93.2 - (EffectiveLatency / 40)
R = 93.2 - (EffectiveLatency - 120) / 10

#Now, let's deduct 2.5 R values per percentage of packet loss (i.e. a
#loss of 5% will be entered as 5).
R = R - (PacketLoss * 2.5)

#Convert the R into an MOS value.(this is a known formula)
if R < 0 then
MOS = 1
MOS = 1 + (0.035) * R + (.000007) * R * (R-60) * (100-R)

Calculating Network Contribution to Transaction Times
The ITU have come up with a method to calculate the network contribution to transaction time in ITU-T Rec.G1040 "Network contribution to transaction time". The contribution depends on the RTT, loss probability (p), the Retransmission Time Out (RTO) and the number of round trips involved (n) in a transaction. The Network Contribution to Transation Time (NCTT) is given as:

Average(NCTT) = (n * RTT) + (p * n * RTO)

Typical values for n are 8, for RTO we take 2.5 seconds, we take the RTT and loss probability (p) from the PingER measurements.
Deriving TCP throughput from ping measurements
The macroscopic behavior of the TCP congestion avoidance algorithm by Mathis, Semke, Mahdavi & Ott in Computer Communication Review, 27(3), July 1997, provides a short and useful formula for the upper bound on the transfer rate:
Rate < (MSS/RTT)*(1 / sqrt(p))
Rate: is the TCP transfer rate
MSS: is the maximum segment size (fixed for each Internet path, typically 1460 bytes)
RTT: is the round trip time (as measured by TCP)
p: is the packet loss rate.
Strictly speaking the losses are the TCP losses which are not necessarily identical to the ping losses (e.g. standard TCP provokes losses as part of its congestion estimation). Also the ping RTTs are different from the way TCP estimates the RTT (see for example Improving Round-Trip Time Estimates in Reliable Transport Protocol.) However, especially for lower performance links it is a reasonable estimator.
An improved form of the above equation can be found in: Modelling TCP throughput: A simple model and its empirical validation by J. Padhye, V. Firoiu, D. Townsley and J. Kurose, in Proc. SIGCOMM Symp. Communications Architectures and Protocols Aug. 1998, pp. 304-314.

The behavior of throughput as a function of loss and RTT can be seen by looking at Throughput versus RTT and loss. We have used the above formula to compare PingER and NetPerf measurements of throughput.

Normalizing the Derived Throughput
To reduce the effect of the 1/RTT in the Mathis formula for derived throughput, we normalize the throughputs by using
norm_throughput = throughput * min_RTT(remote region) / min_rtt(monitoring_region)

Access to data
The raw ping data is publicly accessible, see Accessing the PingER data for how to get the data and the format. The summarized data is also available from the web in Excel tab-separated-value (.tsv) format from the PingER detail reports.
Pinging of Collaborators' Nodes
Data Analysis & Presentation
Daily Plots
The ping response times are plotted for each half hour for each node, to see some current examples go to Generate Network Performance Graphs. This is mainly used for trouble shooting (e.g. see if it got dramatically worse in the last few hours).
3D Plots of Node vs Response vs Time of Day
By plotting a 3D plot of node versus time versus response we can look for correlations of several nodes having poor performance or being unreachable at the same time (maybe due to a common cause), or a given node having poor response or being unreachable for an extended time. To the left is an example showing several hosts (in black) all being unreachable around 12 noon.

Last 180 Days plots:
Long term graphs showing the response time, packet loss and unreachability for the last 180 days can also indicate whether a service is getting worse (or better).
Monthly Ping Response & Loss Averages going back for years:
Tables of monthly medians of the prime time (7am- 7pm weekday) 1000 byte ping response time and 100 byte ping packet loss allow us to view the data going back for longer periods. This tabular data can be exported to Excel and charts made of the long term ping packet loss performance.

Quiescent Network Frequency
When we get a zero packet loss sample (a sample refers to a set of n pings), we refer to the network as being quiescent (or non-busy). We can then measure the percentage frequency of how often the network was found to be quiescent. A high percentage is an indication of a good (quiescent or non-heavily loaded) network. For example a network that is busy 8 work hours per week day, and quiescent at other times would have a quiescent percentage of about 75% ~ (total_hours/week - 5 weekdays/week * 8 hours/day) / (total_hours/week). This way of representing teh loss is similar in intent to the phone metric of error-free seconds.
The Quiescent Network Frequency table shows the percentage (frequency) of the samples (where a sample is a set of 10 100 byte pings) that measured zero packet loss. The samples included in each percentage reported are all the samples for each site for each month (i.e. of the order of 30 days * 48 (30 min periods) or about 1440 samples) per site/month.
Jitter, see also Jitter,
The short term variability or "jitter" of the response time is very important for real-time applications such as telephony. Web browsing and mail are fairly resistent to jitter, but any kind of streaming media (voice, video, music) is quite suceptible to jitter. Jitter is a symptom that there is congestion, or not enough bandwidt to handle the traffic. The jitter specifies the length of the VoIP codec de-jitter buffer to prevent over- or under-flow. An objective could be to specify that say 95% of packet delay variations should be within the interval [-30msec, +30msec].
The ITU has a Proposed Method for Measuring Packet Delay Variation. This requires injecting packets at regular intervals into the network and measuring the variability in the arrival time. The IETF has IP Packet Delay Variation Metric for IP Performance Metrics (IPPM) (see also RTP: A Transport Protocol for Real-Time Applications and RFC 2679).

We measure the instantaneous variability or "jitter" in two ways.

Let the i-th measurement of the round trip time (RTT) be Ri, then we take the "jitter" as being the Inter Quartile Range (IQR) of the frequency distribution of R. See the SLAC <=> CERN round trip delay for an example of such a distribution.
In the second method we extend the IETF draft on Instantaneous Packet Delay Variation Metric for IPPM, which is a one-way metric, to two-way pings. We take the IQR of the frequency distribution of dR, where dRi=Ri-Ri-1. Note that when calculating dR the packets do not have to be adjacent. See the SLAC <=> CERN two-way instantaneous packet delay variation for an example of such a distribution.
Both of the above distributions can be seen to be non-Gaussian which is why we use the IQR instead of the standard deviation as the measure of "jitter".
By viewing the Ping "jitter" between SLAC and CERN, DESY & FNAL it can be seen that the two methods of calculating jitter track one another well (the first method is labelled IQR and the second labelled IPD in the figure). They vary by two orders of magntitude over the day. The jitter between SLAC & FNAL is much lower than between SLAC and DESY or CERN. It is also noteworthy that CERN has greater jitter during the European daytime while DESY has greater jitter during the U.S. daytime.

We have also obtained a measure of the jitter by taking the absolute value dR, i.e. |dR|. This is sometimes referred to as the "moving range method" (see Statistical Design and Analysis of Experiments, Robert L. Mason, Richard F. Guest and James L. Hess. John Wiley & Sons, 1989). It is also used in RFC 2598 as the definition of jitter (RFC 1889 has another definition of jitter for real time use and calculation) See the Histogram of the moving range for an example. In this figure, the magenta line is the cumulative total, the blue line is an exponentail fit to the data, and the green line is a power series fit to the data. Note that all 3 of the charts in this section on jitter are representations of identical data.

In order to more closely understand the requirements for VoIP and in particular the impacts of applying Quality of Service (QoS) measures, we have set up a VoIP testbed between SLAC and LBNL. A rough schematic is shown to the right. Only the SLAC half circuit is shown in the schematic, the LBNL end is similar. A user can lift the phone connected to the PBX at the SLAC end and call another user on a phone at LBNL via the VoIP Cisco router gateway. The gateway encodes, compresses etc. the voice stream into IP packets (using the G.729 standard) creating roughly 24kbps of traffic. The VoIP stream include both TCP (for signalling) and UDP packets. The connection from the ESnet router to the ATM cloud is a 3.5 Mbps ATM permanent virtual circuit (PVC). With no competing traffic on the link, the call connects and the conversation proceeds normally with good quality. Then we inject 4 Mbps of traffic onto the shared 10 Mbps Ethernet that the VoIP router is connected to. At this stage, the VoIP connection is broken and no further connections can be made. We then used the Edge router's Committed Access Rate (CAR) feature to label the VoIP packets' by setting Per Hop Behavior (PHB) bits. The ESnet router is then set to use the Weighted Fair Queuing (WFQ) feature to expedite the VoIP packets. In this setup voice connections can again be made and the conversation is again of good quality.

Service Predictability
A measure of variability of service (or ping predictability) may be obtained by means of a scatter plot of the dimensionless variables daily average ping data rate / maximum ping data rate versus the daily average ping success / maximum ping success (where % success = (total packets - packets lost) / Total Packets). Here ping data rate is defined as (2 * bytes in ping packet)/response time. The 2 is since the packet has to go out and come back. Another way of looking at the ratios is that numbers near 1 indicate that average performance is close to the best performance. Numbers not close to 1 are typically caused by large variations in ping time between work hours and non-work hours, see for example the UCD ping response for Oct 3, 1996 for an example of the diurnal variations. Some examples of ping predictability scatter plots for various parts of the Internet as measured from SLAC for July 1995 and March 1996 can be seen below. Date All Hosts ESnet N. America International
Jul '95
Mar '96
One can reduce this scatterplot information further by plotting the monthly average ping packet success / maximum ping packet success versus the monthly average ping thruput / maximum ping thruput for different months to see the changes. Such a plot for a few N. American nodes for July 1995 and March 1996 shows big changes, in all cases being for the worse (more recent points are more to the lower left of the plot).
One can also calculate the distance of each predictability point from the coordinate (1,1). We normalize to a maximum value of 1 by dividing the distance by sqrt(2). I refer to this as the ping unpredictability, since it gives a percentage indicator of the unpredictability of the ping performance.
By looking at the ping data to identify 30 minute periods when no ping responses were received from a given host, one can identify when the host was down. Using this information one can calculate ping unreachability= (# periods with Node down / total number of periods), # Down periods, Mean Time Between Failure (MTBF or Mean Time To Failue MTTF)) and Mean Time To Repair (MTTR). Note that MTBF = sample_time/ping_unreachability where for PingER sample time is 30 minutes. The reachability is very dependent on the remote host, for example if the remote host is renamed or removed, the host will appear unreachable yet there may be nothing wrong with the network. Thus before using this data to provide long term network trends the data should be carefully scrubbed for non-network effects. Examples of ping reachability and Down reports are available.
One can also measure the frequency of outage lengths using active probes and noting the time duration for which sequential probes do not get through.

Another metric that is sometimes used to indicate the availability of a phone circuit is Error-free seconds. Some measurements on this can be found in Error free seconds between SLAC, FNAL, CMU and CERN.

There is also an IETF RFC on Measuring Connectivity and a document on A Modern Taxonomy of High Availability which may be useful.

Out of Order packets
PingER uses a very simple algorithm for identifying and reporting out of order packets. For each sample of 10 packets, it looks to see if the sequence numbers of the responses are received in the same order as the requests were sent. If not than that sample is marked as having one or more out of order responses. For a given interval (say a month) the value reported for ou of order is the fraction of samples that were marked with out of order ping responses. Since the ping packets are sent at one second intervals it is expected that the fraction of out of order samples will be very small, and may be worth investigating whenever it is not.
Duplicate Packets
Duplicate ping responses can be caused by:
More than one host has the same IP address, so all these hosts will respond to the ICMP echo request.
The IP address pinged may be a broadcast address.
The host has multiple TCP stacks bound to an Ethernet adaptor (see
A router believes it has two routes by which it can reach the end host and (presumably mistakenly) forwards the ICMP echo requests by both routes, thus the end host sees two echo requests and responds twice.
There maybe two or more (non-routed) paths to the end host and each request is forwarded by more than one path.
A misbehaving NAT box.
Some tests that may help include:
Pinging the routers along the route to see if any of them respond with duplicates.
Capture the ping packets and look to see if all the packets are returned from the same Ethernet address.
Combination of all ping measures
One can put together a plot of all the above ping measures (loss, response, unreachability and unpredictability) to try and show the combination of measurements for a set of hosts for a given time period. The plot below for March 1-11, 1997, groups the hosts into logical groups (ESnet, N. America West, ...) and within the groups ranks the hosts by % 100 byte ping packet loss for SLAC prime time (7am - 7pm weekdays), also shown by a blue line is the prime time ping response time, and the negative of the ping % unreachability and unpredictability.
In the above plot, the loss and response time are measured during SLAC prime time (7am - 7pm, weekdays), the other measures are for all the time.

The loss rates are plotted as a bar graph above the y=0 axis and are for 100 byte payload ping packets. Horizontal lines are indicated at packet losses of 1%, 5% and 12% at the boundaries of the connection qualities defined above.
The response time is plotted as a blue line on a log axis, labelled to the right, and is the round trip time for 1000 byte ping payload packets.
The host unreachability is plotted as a bar graph negatively extending from the y=0 axis. A host is deemed unreachable at a 30 minute interval if it did not respond to any of the 21 pings made at that 30 minute interval.
The host unpredictability is plotted in green here as a negative value, can range from 0 (totally unpredictable) to 1 (highly predictable) and is a measure of the variability of the ping response time and loss during each 24 hour day. It is defined in more detail in Ping Unpredictability.
The following observations are also relevant:
ESnet hosts in general have good packet loss (median 0.79%). The average packet losses for the other groups varies from about 4.5% (N. America East) to 7.7% (International). Typically 25%-35% of the hosts in the non-ESnet groups are in the poor to bad range.
The response time for ESnet hosts averages at about 50ms, for N. America Wes it is about 80ms, for N. America East about 150ms and for International hosts around 200ms.
Most of the unreachable problems are limited to a few hosts mainly in the International group (Dresden, Novosibirsk, Florence).
The unpredictability is most marked for a few International hosts and roughly tracks the packet loss.
In order to be able to summarize the data so the significance can be quickly grasped, we have tried to characterize the quality of performance of the links.
The scarcest and most valuable commodity is time. Studies in late 1970s and early 1980s by Walt Doherty of IBM and others showed the economic value of Rapid Response Time: 0-0.4s High productivity interactive response
0.4-2s Fully interactive regime
2-12s Sporadically interactive regime
12s-600s Break in contact regime
600s Batch regime
For more on the impact of response times see The Psychology of Human-Computer Interaction, Stuart K. Card, Thomas P. Moran and Allen Newell, ISBN 0-89859-243-7, published by Lawrence Erlbaum Associates (1983).
There is a threshold around 4-5s where complaints increase rapidly. For some newer Internet applications there are other thresholds, for example for voice a threshold for one way delay appears at about 150ms (see ITU Recommendation G.114 One-way transmission time, Feb 1996) - below this one can have toll quality calls, and above that point, the delay causes difficulty for people trying to have a conversation and frustration grows.

For keeping time in music, Stanford researchers found that the optimum amount of latency is 11 milliseconds. Below that delay and people tended to speed up. Above that delay and they tend to slow down. After about 50 millisecond (or 70), performances tended to completely fall apart.

For real-time multimedia (H.323) Performance Measurement and Analysis of H.323 Traffic gives one way delay (roughly a factor two to get RTT), of: 0-150ms = Good, 150-300ms=Accceptable, and > 300ms = poor.

For real time haptic control and feedback for medical operations, Stanford researchers (see Shah, A., Harris, D., & Gutierrez, D. (2002). "Performance of Remote Anatomy and Surgical Training Applications Under Varied Network Conditions." World Conference on Educational Multimedia, Hypermedia and Telecommunications 2002(1), 662-667 ) found that a one way delay of <=80msec. was needed.

For the quality characterization we have focussed mainly on the packet losses. Our observations have been that above 4-6% packet loss video conferencing becomes irritating, and non native language speakers become unable to communicate. The occurence of long delays of 4 seconds or more at a frequency of 4-5% or more is also irritating for interactive activities such as telnet and X windows. Above 10-12% packet loss there is an unacceptable level of back to back loss of packets and extremely long timeouts, connections start to get broken, and video conferencing is unusable (also see The issue of useless packet transmission for multimedia over the internet, where they say at page 10 "we conclude that for this video stream, the video quality is unintelligible when packet loss rates exceeds 12%".
Originally the quality levels for packet loss were set at 0-1% = good, 1-5% = acceptable, 5-12% = poor, and greater than 12% = bad. More recently, we have refined the levels to 0-1% = good, 1-2.5% = acceptable, 2.5-5% = poor, 5%-12% = very poor, and greater than 12% = bad. Changing the thresholds reflects changes in our emphasis, i.e. 5 years ago we would primarily be concerned with email and ftp. Today we are also concerned with X-window applications, web performance, and packet video conferencing, and are starting to look at voice over IP. Also changing the thresholds reflects increased understanding of what is going on, this quote from Vern Paxson sums it up; Conventional wisdom among TCP researchers holds that a loss rate of 5% has a significant adverse effect on TCP performance, because it will greatly limit the size of the congestion window and hence the transfer rate, while 3% is often substantially less serious. In other words, the complex behaviour of the Internet results in a significant change when packet loss climbs above 3%.

The Automotive Network eXchange (ANX) sets the threshold for packet loss rate (see ANX / Auto Linx Metrics) to be less than 0.1%.

The ITU TIPHON working group (see General aspects of Quality of Service (QOS), DTR/TIPHON-05001 V1.2.5 (1998-09) technical Report) has also defined < 3% packet loss as being good, > 15% for medium degradation, and 25% for poor degradation, for Internet telephony. It is very hard to give a single value below which packet loss gives satisfactory/acceptable/good quality interactive voice. There are many other variables involved including: delay, jitter, Packet Loss Concealment (PLC), whether the losses are random or bursty, the compression algorithm (heavier compression uses less bandwidth but there is more sensitivity to packet loss since more data is contained/lost in a single packet). See for example Report of 1st ETSI VoIP Speech Quality Test Event, March 21-18, 2001, or Speech Processing, Transmission and Quality Aspects (STQ); Anonymous Test Report from 2nd Speech Quality Test Event 2002 ETSI TR 102 251 v1.1.1 (2003-10) or ETSI 3rd Speech Quality Test Event Summary report, Conversational Speech Quality for VoIP Gateways and IP Telephony.

Jonathan Rosenberg of Lucent Technology and Columbia University in G.729 Error Recovery for Internet Telephony presented at the V.O.N. Conference 9/1997 gave the following table showing the relation between the Mean Opinion Score (MOS) and consecutive packets lost. Consecutive packet losses degrade voice quality Consecutive frames lost 1 2 3 4 5
M.O.S. 4.2 3.2 2.4 2.1 1.7
where: Mean Opinion Score Rating Speech Quality Level of distortion
5 Excellent Imperceptible
4 Good Just perceptible, not annoying
3 Fair Perceptible, slightly annoying
2 Poor Annoying but not objectionale
1 Unsatisfactory Very annoying, objectionable
If the VoIP packets are spaced apart by 20msec then a 10% loss (assuming a random distribution of loss) is equivalent to seeing 2 consecutive frames lost about every 2 seconds, while a 2.5% loss is equivalent to 2 consecutive frames being lost about every 30 seconds.

So we set "Acceptable" packet loss at < 2.5%. The paper Performance Measurement and Analysis of H.323 traffic gives the following for VoIP (H.323): Loss = 0%-0.5% Good, = 0.5%-1.5% Acceptable and > 1.5% = Poor.

The above thresholds assumes a flat random packet loss distribution. However, often the losses come in bursts. In order to quantify consecutive packet loss we have used, among other things, the Conditional Loss Probability (CLP) defined in Characterizing End-to-end Packet Delay and Loss in the Internet by J. Bolot in the Journal of High-Speed Networks, vol 2, no. 3 pp 305-323 December 1993 (it is also available on the web). Basically the CLP is the probablility that if one packet is lost the following packet is also lost. More formally Conditional_loss_probability = Probability(loss(packet n+1)=true | loss(packet n) = true). The causes of such bursts include the convergence time required after a routing change (10s to 100s of seconds), loss and recovery of sync in DSL network (10-20 seconds), and bridge spanning tree re-configurations (~30 seconds). More on the impact of of bursty packet loss can be found in Speech Quality Impact of Random vs. Bursty Packet losses by C. Dvorak, internal ITU-T document. This paper shows that whereas for random loss the drop off in MOS is linear with % packet loss, for bursty losses the fall off is much faster. Also see Packet Loss Burstiness. The drop off in MOS is from 5 to 3.25 for a change in packet loss from 0 to 1% and then it is linear falling off to an MOS of about 2.5 by a loss of 5%.

Other monitoring efforts may choose different thresholds possibly because they are concerned with different applications. MCI's traffic page labels links as green if they have packet loss < 5%, red if > 10% and orange in between. The Internet Weather Report colors < 6% loss as green and > 12% as red, and orange otherwise. So they are both more forgiving than we or or at least have less granularity. Gary Norton in Network World Dec. 2000 (p 40), says "If more than 98% of the packets are delivered, the users should experience only slightly degraded response time, and sessions should not time out".

The figure below shows the frequency distribution for the average monthly packet loss for about 70 sites seen from SLAC between January 1995 and November 1997.

For real time haptic control and feedback for medical operations, Stanford researchers found that loss was not a critical factor and losses up to 10% could be tolerated.

The ITU TIPHON working group (see General aspects of Quality of Service (QoS) DTR/TIPHON-05001 V1.2.5 (1998-09) technical report) defines four categories of network degradation based on one-way jitter. These are: Levels of network degradation based on jitter Degradation category Peak jitter
Perfect 0 msec.
Good 75 msec.
Medium 125 msec.
Poor 225 msec.
We are investigating how to relate the one-way jitter thresholds to the ping (round-trip or two-way) jitter measurements. We used the Surveyor one-way delay measurements (see below) and measured the IQRs of the one-way delay (ja=>b and jb=>a, where the subscript a=>b indicates the monitoring node is at a and is monitoring a remote node at b) and inter packet delay difference (Ja=>b and Jb=>a). Then we add the two one way delays for equivalent time stamps together and derive the IQRs for the round trip delay (ja<=>b) and inter packet delay difference (Ja<=>b). Viewing a Comparison of one and two way jitter one can see that the distributions are not Gaussianly distributed (being sharper and yet with wider tails), the jitter measured in one direction can be very different from that measured in the other direction and that for this case the above approximation for the round trip IQR works quite well (within two percent agreement).
Web browsing and mail are fairly resistent to jitter, but any kind of streaming media (voice, video, music) is quite suceptible to Jitter. Jitter is a symptom that there is congestion, or not enough bandwidth to handle the traffic.

The jitter specifies the length of the VoIP codec playout buffers to prevent over- or under-flow. An objective could be to specify that say 95% of packet delay variations should be within the interval [-30msec, +30msec].

For real-time multimedia (H.323) Performance Measurement and Analysis of H.323 Traffic gives for one way: jitter = 0-20ms = Good, jitter = 20-50ms = acceptable, > 50ms = poor. We measure round-trip jitter which is roughly two times the one way jitter.

For real time haptic control and feedback for medical operations, Stanford researchers found that jitter was critical and jitters of < 1msec were needed.

Link utilization can be read out from routers via SNMP MIBs (assuming one has the authorization to read out such information). "At around 90% utilization a typical network will discard 2% of the packets, but this varies. Low-bandwidth links have less breadth to handle bursts, frequently discarding packets at only 80% utilization... A complete network health check should measure link capacity weekly. Here's a suggested color code:
RED: Packet discard > 2 %, deploy no new application.
AMBER: Utilization > 60%. Consider a network upgrade.
GREEN: Utilization < 60%. Acceptable for new application deployment."
High-speed slowdown, Gary Norton, Network Magazine, Dec. 2000. The above does not indicate over what period the utilization is measured. Elsewhere in Norton's article he says "Network capacity ... is calculated as an average of the busiets hours over 5 business days".
"Queuing theory suggests that the variation in round trip time, o, varies proportional to 1/(1-L) where L is the current network load, 0<=L<=1. If an Internet is running at 50% capacity, we expect the round trip delay to vary by a factor of +-2o, or 4. When the load reaches 80%, we expect a variation of 10." Internetworking with TCP/IP, Principle, Protocols and Architecture, Douglas Comer, Prentice Hall. This suggests one may be able to get a measure of the utilization by looking at the variability in the RTT. We have not validated this suggestion at this time.

The Bellcore Generic Requirement 929 (GR-929-CORE Reliability and Quality Measurements for Telecommunications Systems (RQMS) (Wireline), is actively used by suppliers and service providers as the basis of supplier reporting of quarterly performance measured against objectives. Each year, following publication of the most recent issue of GR-929-CORE, such revised performance objectives are implemented) indicates that the the core of the phone network aims for a 99.999% availability, That translates to less than 5.3 minutes of downtime per year. As written the measurement does not include outages of less than 30 seconds. This is aimed at the current PSTN digital switches (such as Electronic Switching System 5 (5ESS) and the Nortel DMS250), using todays voice-over-ATM technology. A public switching system is required to limit the total outage time during a 40-year period to less than two hours, or less than three minutes per year, a number equivalent to an availability of 99.99943%. With the convergence of data and voice, this means that data networks that will carry multiple services including voice must start from similar or better availability or end-users will be annoyed and frustrated.

Levels of availability are often cast into a Service Level Agreements. The table below (based on Cahners In-Stat survey of a sample of Application Service Providers (ASPs)) shows the levels of availability offered by ASPs and the levels chosen by customers.
Levels offered Chosen by customer
Less than 99% 26% 19%
99% availability 39% 24%
99.9% availability 24% 15%
99.99% availability 15% 5%
99.999% availability 18% 5%
More than 99.999% availability 13% 15%
Don't know 13% 18%
Weighted average of availability offered 99.5% 99.4%
For more information on availabilty etc. see: the Cisco White Paper on Always-On Availability for Multiservice Carrier Networks for how Cisco is striving for high availability on data networks; the NIST document on A Conceptual Framework for System Fault Tolerance (for more on the distinction between reliability and availability); A Modern Taxonomy of High Availability; and the IETF document RFC 2498: IPPM Metrics for Measuring Connectivity.

As the number of host pairs being monitored increased it becomes increasingly necessary to aggregate the data into groups representing areas of interest. We have found the following grouping categories to be useful:
by area (e.g. N. America, W. Europe, Japan, Asia, country, top level domain);
by host pair separation (e.g. trans-oceanic links, intercontinental links, Internet eXchange Points);
Network Service Provider backbone that the remote site is connected to (e.g. ESnet, vBNS/Internet 2, TEN-34 CALREN2...);
common interest affiliation (e.g. XIWT, HENP, experiment collaboration such as BaBar, the European or DOE National Laboratories, ESnet program interests, Surveyor monitoring sites)
by monitoring site;
one remote site seen from many monitoring sites. We need to be able to select thr groupings by monitoring sites and by remote sites. Also we need the capability to includes all members of a group, to join groups and to exclude members of a group.
Some examples of how many ~1100 PingER monitor-host remote-site pairs were in the global area groups and the affinity groups can be found in PingER pair grouping distributions.
At the same time it is critical to choose the remote sites and host-pairs carefully so that they are representative of the information one is hoping to find out. We have therefore selected a set of about 50 "Beacon Sites" which are monitored by all monitoring sites and which are representative of the various affinity groups we are interested in. An example of a graph showing ping response times for groups of sites is seen below:

The percentages shown to the right of the legend of the packet lost chart are the improvements (reduction in packet loss) per month for exponential trend line fits to the packet loss data. Note that a 5%/month improvement is equivalent to 44%/year improvement (e.g. a 10% loss would drop to 5.6% in a year).

One Way Measurements
SLAC is also collaborating in the Surveyor project to make one way delay and loss measurements between Surveyor sites. Each Surveyor site has a measurement point consisting of an Internet connected computer with a GPS receiver. This allows accurate synchronized time stamping of packets which enables the one way delay measurements. More details can be found in Presentations by the Surveyor staff and sample results. The delay estimates generated are more detailed than PingER's and clarify the asymmetry on Internet paths in the two directions. For more on comparing the two methods see the Comparison of PingER and Surveyor.
RIPE also has a Test Traffic project to make independent measurements of connectivity parameters, such as delays and routing-vectors in the Internet. A RIPE host is installed at SLAC.

The NLANR Active Measurement Program (AMP) for HPC awardees is intended to improve the understanding of how high performance networks perform as seen by participating sites and users, and to help in problem diagnosis for both the network's users and its providers. They install a rack mountable FreeBSD machine at sites and make full mesh active ping measurements between their machines, with the pings being launched at about 1 minute intervals. An AMP machine is installed at SLAC.

More detailed comparisons of Surveyor, RIPE, PingER and AMP can be found at Comparison of some Internet Active End-to-end Performance Measurement projects.

SLAC is also a NIMI (National Internet Measurement Infrastructure) site. This project may be regarded as complementary to the Surveyor project, in that it (NIMI) is more focussed on providing the infrastructure to support many measurement methodologies such as one way pings, TReno, traceroute, PingER etc.

Waikato University in New Zealand is also deploying Linux hosts each with a GPS receiver and making one way delay measurements. For more on this see Waikato's Delay Findings page. Unlike the AMP, RIPE and Surveyor projects, the Waikato project makes passive measurements, of the normal traffic between existing pairs, using CRC based packet signatures to identify packets recorded at the 2 ends.

The sting TCP-based network measurement tool is able to actively measure the packet loss in both the forward and reverse paths between pairs of hosts. It has the advantage of not requiring a GPS, and not being subject to ICMP rate limiting or blocking (according to an ISI study ~61% of the hosts in the Internet do not repond to pings), however it does require a small kernel modification.

If the one way delay (D) is known for both directions of an Internet pair of nodes (a,b), then the round trip delay R can be calculated as follows:
R = Da=>b + Db=>a
where Da=>b is the one way delay measured from node a to node b and vice versa.
The two way packet loss P can be derived from the one way losses (p) as follows:
P = pa=>b + pb=>a - pa=>b * pb=>a
where pa=>b is the one way packet loss from node a to b and vice versa.

There are some IETF RFCs related to measuring one way delay and loss as well as a round trip delay metric.
Another very powerful tool for diagnosing network problems is traceroute. This enables one to find the number of hops to a remote site and how well the route is working.
John MacAllister at Oxford has developed Traceping Route Monitoring Statistics based on the standard traceroute and ping utilities. Statistics are gathered at regular intervals for 24-hour periods and provide information on routing configuration, route quality and route stability. We (the PingER project) are in the process of extending traceping and installing it at major sites, in particular the PingER monitoring sites (see Interest in Traceping for more information).

TRIUMF also has a very nice Traceroute Map tool that shows a map of the routes from TRIUMF to many other sites. We are looking at providing a simplification of such maps to use the Autonomous Systems (AS) passed through rather than the routers.

One can also plot the FTP throughput versus traceroute hop count as well as ping response and packet loss to look for correlations.

Many sites are appearing that run traceroute servers (the source code (in Perl) is available) which help in debugging and in understanding the topology of the Internet, see for example BNL's Path Check utility.

Some sites provide access to network utilities such as nslookup to allow one to find out more about a particular node. A couple of examples are SLAC and TRIUMF.

Impact of Routing on Internet End-to-end Performance
DSCP Differentiated Services CodePoint. The Differentiated Services CodePoint is 6 bits in the IP header field that are used the select the per-hop-behaviour of a packet. The 6 bits for DSCP and 2 unused bits are intended to supersede the existing definitions of the IPv4 TOS octet, see RFC 2474 for more details.
MTU Maximum Transfer Unit. The maximum transfer unit is the largest size of IP datagram which may be transferred using a specific data link connection.
MSS Maximum Segment Size. MTU-40.
QBSS QBone Scavenger Services. QBone Scavenger Services is an additional class of best-effort service. A small amount of network capacity is allocated (in a non-rigid way) for this service; when the default best-effort capacity is underutilized, QBSS can expand to consume unused capacity.
Receive Window (Rwin) The size of the TCP buffer, the number of packets your machine will send out without receiving an ACK.
Further Information
SLAC Talks and Papers on Network Monitoring.
Research Network User's Guide to Performance: Knowledge Base from the Geant2 Performance Enhancement Response Team.
Considerations for the Specifications of IP Performance Objectives concludes that: IP packet size is a major determinant for loss performance; loss is not dependent on T3 vs. T1 trunk capacity; packet delay variation and delay are controlled by T3 vs. T1 trunk capacity; the access line rate has little effect on IP packet delay or IP packet delay variation.
NLANR (National Laboratory for Applied Network Research)
Network Measurement FAQ from the CAIDA Metrics Working Group.
Network management Tutorial.
Network Monitoring Tools Used at ESnet Sites
ICFA-NTF Monitoring Working Group Report, May-98 provides a lot of useful information on the results of end-to-end Internet monitoring.
References on Network Management.
Requirements of Network Monitoring for the Grid from the EuroGrid WP7 provides an interesting overview of methods for monitoring etc.
Network Tools for Users from Terena, provides a nice introduction to how the Internet works, and how to use some network tools (like ping and traceroute) to diagnose problems.

Wednesday, January 16, 2008

OpenVPN over 7050

I spoke too soon in a recent thread, when i thought I had openvpn working properly. In fact it seems that it is working inside my LAN but not from outside. I cannot reach it from a static IP to the router.

my internal lan is all on the subnet. I'm trying to establish the VON in bridge mode with the following topology:
vpn gateway:
dhcp server: (also serves DNS and WINS)
the dhcp will lease addresses to vpn clients.
all this works if i try to connect to the gateway inside the network (or, in fact). it doesn't work outside.
i have set an ipforwarding rule in the ar7.cfg file to point all incoming traffic on port 1194 to, which should do the trick, but seemingly not.
my server conf, debug.cfg are
proto tcp-server
port 1194
dev tap0
dev-node /dev/misc/net/tun
mode server
ca ca.crt
cert server.crt
key server.key
dh dh1024.pem
keepalive 10 60
verb 3
cd /var/tmp
mkdir vpn
cd vpn
chmod 0600 ./*
chmod +x ./openvpn
chmod +x ./brctl
/var/tmp/vpn/openvpn --config /var/tmp/vpn/fritz.conf
/var/tmp/vpn/brctl addif lan tap0
an ifconfig gives the following output
dsl Link encap:Point-Point Protocol
inet addr: P-t-P: Mask:
RX packets:1724 errors:0 dropped:0 overruns:0 frame:0
TX packets:2078 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:278391 (271.8 KiB) TX bytes:430953 (420.8 KiB)

eth0 Link encap:Ethernet HWaddr 00:15:0C:97:46:1E
RX packets:4078 errors:0 dropped:0 overruns:0 frame:0
TX packets:3054 errors:0 dropped:0 overruns:0 carrier:0
collisions:1367 txqueuelen:100
RX bytes:2603889 (2.4 MiB) TX bytes:505053 (493.2 KiB)
eth1 Link encap:Ethernet HWaddr 00:15:0C:97:46:1F
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
Base address:0x2800
lan Link encap:Ethernet HWaddr 00:15:0C:97:46:1E
inet addr: Bcast: Mask:
RX packets:4098 errors:0 dropped:0 overruns:0 frame:0
TX packets:3029 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:2555254 (2.4 MiB) TX bytes:496047 (484.4 KiB)
lan:0 Link encap:Ethernet HWaddr 00:15:0C:97:46:1E
inet addr: Bcast: Mask:
lo Link encap:Local Loopback
inet addr: Mask:
RX packets:223 errors:0 dropped:0 overruns:0 frame:0
TX packets:223 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:30170 (29.4 KiB) TX bytes:30170 (29.4 KiB)
tap0 Link encap:Ethernet HWaddr 00:FF:F6:AD:A2:76
inet addr: Bcast: Mask:
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:104 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:0 (0.0 B) TX bytes:27006 (26.3 KiB)
usbrndis Link encap:Ethernet HWaddr 00:15:0C:97:46:22
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
the addition to ar7.cfg was
"tcp 0 #openvpn",
do i, perhaps, need to explicitly bridge the tap0 interface? or what else am i doing wrong? all help gratefully received.

Tuesday, January 15, 2008

Modem Phone Tutorial

misal kalau udah install driver kabel data dku5 nya di COM7, ikuti langkah berikut :
1. start -> control panel -> phone and modem option
2. pilih tab modem -> add
3 . ikuti langkah2nya, tandai "don't detect my modem; i will select it from a list -> klik Next
4 . di Install new modem, pilih manufacture: Nokia, terus pilih handphone sesuai yang dimiliki.. misal kamu kan punya N6100 , pilih N6100 (Cable) *kalau ada driver coba klik "Have Disk" kemudian Browse kemudian pilih sesuai lokasi driver modem yang kamu miliki.
5 . kalau udah keinstal.. pilih COM nya... COM7 mungkin tidak tersedia, pilih sembarang COM dulu. kalau ada notifikasi, pilih "Continue Anyway"
6 . kalau udah ada Nokia 6100 (Cable) di modem list ikuti langkah berikut
6.1. close semua windows
6.2. start -> control panel -> phone and modem
6.3. pilih tab modem
6.4 . klik kanan "Nokia 6100 (Cable)" tadi, pilih duplicate
6.5 . nanti ada pilihan COM, pilih "selected port/com" pilih COM7
6.6. kalau ada notifikasi, pilih continue anyway
6.7 . nanti bertambah "Nokia 6100 (Cable) #2" di list modem nya
6.8. close semua windows
7 . start -> control panel -> phone and modem -> pilih tab modem -> klik kanan "Nokia 6100 (Cable) #2" -> kemudian pilih properties
8.pilih tab advance di properties modem tadi, terus masukkan
AT+CGDCONT=,,"" (khusus GSM) / kalau CDMA dikosongin aja
9. close semua windows
10.modem sudah terinstal
Next ~>

setelah modem terinstall, sekarang tinggal membuat connection nya, ikuti langkah berikut
1. start -> control panel -> network connection
2. create new connection -> next -> pilih connect to internet kemudian next
3. pilih "set up my connection manually" kemudian next
4. pilih "connect using dial-up modem", kemudian next
5 . select a drive, tandai "Modem - Nokia 6100 (Cable) # 2 (COM7)
6. ISP Name, terserah... misal kamu buat "Kabel Data" aja, kemudian next
7 . phone number, isi: #777 /*99 ***1# kemudian
8 . username: m8 (untuk fren) atau xlgprs (untuk ProXL), password: m8 /proxl, confirm password: m8
hilangin semua tanda ceck list, kemudian next
9. finish
10. nanti ada permintaan koneksi, isi dengan:
- username: m8/xlgprs
- password: m8/proxl
- tandai "save this username and password for following users:"
- pilih "anyone who uses this computer"
11. Dial

cyber addict

Monday, January 14, 2008

Enabling Client Certificates in IIS 6.0 (IIS 6.0)

You can require users attempting to access your Web site to log on with a client certificate. Requiring a client certificate is just one aspect of protecting your server against unauthorized access. Any user with a valid and trusted client certificate can establish a secure connection and access your resource. To protect your Web content from unauthorized access you must do one of the following:

• Use Basic, Digest, or Integrated Windows authentication, in addition to requiring a client certificate.
• Create a Windows account mapping for client certificates. For more information, see Mapping Client Certificates to User Accounts in IIS 6.0.
You must be a member of the Administrators group on the local computer to perform the following procedure or procedures, or you must have been delegated the appropriate authority. As a security best practice, log on to your computer by using an account that is not in the Administrators group, and then use the runas command to run IIS Manager as an administrator. At a command prompt, type runas /User:Administrative_AccountName "mmc systemroot\system32\inetsrv\iis.msc". For information about delegating administrative authority, see "Delegating administration" in Help and Support Center for Windows Server 2003.
To enable client certificates
1. In IIS Manager, double-click the local computer, and then right-click the Web site, directory, or file that you want and click Properties.
2. If you have not previously obtained a server certificate, click the Directory Security tab, and then under Secure Communications, click Server Certificate. For more information, see Obtaining Server Certificates.
3. If you have previously obtained a server certificate, click the Directory Security or File Security tab, and then under Secure Communications, click Edit.
4. In the Secure Communications box, select the Require secure channel (SSL) check box. Requiring a secure channel means that users cannot connect to this site without using a secure link (that is, the link's URL must begin with https://).
5. Under Client certificates select one of the following to enable client Certificate authentication:
• Accept client certificates Users can access the resource with a client certificate, but the certificate is not required.
• Require client certificates The server will request a client certificate before connecting the user to the resource. Users without a valid client certificate will be denied access.
• Ignore client certificates Users with or without a client certificate will be granted access.
Related Information
• For information about obtaining client certificates, see Obtaining Client Certificates in IIS 6.0.
• For information about backing up client certificates, see Backing Up Client Certificates in IIS 6.0.

A Tutorial on Geometric Programming

S. Boyd, S.-J. Kim, L. Vandenberghe, and A. Hassibi
Optimization and Engineering, 8(1):67-127, 2007
(software for generalized geometric programming)
A geometric program (GP) is a type of mathematical optimization problem characterized by objective and constraint functions that have a special form. Recently developed solution methods can solve even large-scale GPs extremely efficiently and reliably; at the same time a number of practical problems,

particularly in circuit design, have been found to be equivalent to (or well approximated by) GPs. Putting these two together, we get effective solutions for the practical problems. The basic approach in GP modeling is to attempt to express a practical problem, such as an engineering analysis or design problem, in GP format. In the best case, this formulation is exact; when this isn’t possible, we settle for an approximate formulation. This tutorial paper collects together in one place the basic background material needed to do GP modeling. We start with the basic definitions and facts, and some methods used to transform problems into GP format. We show how to recognize functions and problems compatible with GP, and how to approximate functions or data in a form compatible with GP (when this is possible). We give some simple and representative examples, and also describe some common extensions of GP, along with methods for solving (or approximately solving) them.

1 The GP modeling approach 4
2 Basic geometric programming 5
2.1 Monomial and posynomial functions . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Standard form geometric program . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Simple extensions of GP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.5 How GPs are solved . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3 Feasibility, trade-o®, and sensitivity analysis 10
3.1 Feasibility analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2 Trade-o® analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.3 Sensitivity analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4 GP examples 15
4.1 Power control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.2 Optimal doping pro¯le . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
5 Generalized geometric programming 18
5.1 Fractional powers of posynomials . . . . . . . . . . . . . . . . . . . . . . . . 19
5.2 Maximum of posynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
5.3 Generalized posynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
5.4 Generalized geometric program . . . . . . . . . . . . . . . . . . . . . . . . . 22
6 GGP examples 23
6.1 Floor planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
6.2 Digital circuit gate sizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
6.3 Truss design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
6.4 Wire sizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
7 More transformations 34
7.1 Function composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
7.2 Additive log terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
7.3 Mixed linear geometric programming . . . . . . . . . . . . . . . . . . . . . . 36
7.4 Generalized posynomial equality constraints . . . . . . . . . . . . . . . . . . 37
8 Approximation and ¯tting 39
8.1 Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
8.2 Local monomial approximation . . . . . . . . . . . . . . . . . . . . . . . . . 42
8.3 Monomial ¯tting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
8.4 Max-monomial ¯tting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
8.5 Posynomial ¯tting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
9 Extensions 52
9.1 Signomial programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
9.2 Mixed-integer geometric programming . . . . . . . . . . . . . . . . . . . . . 54
10 Notes and references

Sunday, January 13, 2008

Speed Up Your Site! 8 ASP.NET Performance Tips

Now that you've added the finishing touches to your web site and unleashed it onto the world, fame, fortune, and success will surely follow -- won't it?

Unfortunately, your web application's success can lead to something less desirable -- performance and scaling problems. On a traditional desktop application, one thousand users translate to one thousand client computers chugging away, sharing the load of running your application. The application is effectively spread among all the users' machines. When it comes to a web application, though, those same thousand users are usually serviced by a single machine -- your web server.

Success can come at a cost for web applications: a cost in bandwidth and server hardware. However, there are a few clever ways you can reduce -- and sometimes eliminate -- these problems. We'll take a look at some of the different approaches to improving the performance of an ASP.NET site in this chapter, which has been extracted from The ASP.NET 2.0 Anthology, 101 Essential Tips, Tricks & Hacks. Feel free to download this chapter -- along with three others -- for offline reference.

How do I determine what to optimize?
You want your web application to be the best, right? Like all of us, by "best" you mean "fastest." And what better way to create a blazingly fast application than to optimize everything? Optimize, optimize, optimize -- right? Wrong.

Premature optimization refers to the fixing of a performance problem before you understand the problem, or before there even is a problem, and it's a bad idea.
That's not to say that you should write sloppy, inefficient code. My point is that you should trust the ASP.NET Framework, and make use of the features that make it such a terrific environment in which to develop, until you hit a problem. Once you hit a problem, you should take the time to understand what that problem is, and only then should you start to look at how best to address it. Dr. Joseph M. Newcomer's essay, "Optimization: Your Worst Enemy," gives a fascinating overview of the perils of optimizing in the dark.
The tips in this chapter propose fairly lightweight solutions for some common performance issues. I've steered away from dramatic changes, because I don't want you to end up doubling your development or maintenance time in order to shave a meagre 2% off your page load time. While it is possible to bypass the in-built ASP.NET Page mechanism completely (by using Response.Write, or building an ASHX-based sites), I'm not a fan of these approaches. The ASP.NET system as a whole has been tuned and improved for more than half a decade, and it's reasonably fast straight out of the box. There's a good chance that trying to improve on it will result in a slower and less maintainable web site.
So, with all that out of the way, let's assume that some of your pages are running slowly. How do you figure out what to fix?
Isolate the bottleneck in your site by measuring the actual speed of the site. This exercise can be performed using logs, database profiling, and tracing.
We've discussed the task of logging using log4net in Chapter 13, Handling Errors. If you suspect that the database is the cause of the slowdown (for example, you know that your application makes use of some large queries), you can either step through the page in debug mode to see whether the database calls are taking a long time to return, or you can use the SQL Server Profiler discussed in the section called "How do I speed up my database queries?" later in this chapter. For a very basic analysis of what's going on in your pages, you can use the ASP.NET trace; while it's not nearly as good as a full-featured logging system, it's easy to implement and will provide you with plenty of timing information.
The first step in using the trace is to get into the habit of adding trace statements. Write to the trace whenever you think there's something you might want to see when you debug future problems. Tracing doesn't have any real performance impact on your site until it's enabled in the Web.config, and when you need to troubleshoot a problem, you'll be glad it's there.
There's more information on tracing in Chapter 13, Handling Errors, but the general idea is that you can write to the Trace object like this:
Trace.Write("Here's a trace message.");
Tracing is disabled by default; when you want your Trace.Write statements to actually do something, you'll need to turn tracing on in the Web.config file, as follows:
Example 15.1. Web.config (excerpt) xml version="1.0"?>
In this solution, we'll look at a sample page that performs a few different actions -- it makes a call to a web service, retrieves some data from a database, and throws an exception. Each function that we'll use writes a trace message when it begins and ends, via a straightforward utility method called writeTrace. However, it has one slightly complex aspect -- it uses the System.Diagnostics object to figure out the method name, so we don't have to pass it in. The code for our sample page is as follows:
Example 15.2. Trace.aspx.cs (excerpt) using System; using System.Web; public partial class _Default : System.Web.UI.Page { protected void Page_Load(object sender, EventArgs e) { hitAWebservice(); getSomeData(); doSomeProcessing(); breakSomething(); displayTheResults(); } private void getSomeData() { writeTrace(true); simulateWaiting(8000); writeTrace(false); } private void hitAWebservice() { writeTrace(true); Trace.Write("A message to demonstrate tracing."); simulateWaiting(2000); writeTrace(false); } private void doSomeProcessing() { writeTrace(true); simulateWaiting(1000); writeTrace(false); } private void displayTheResults() { writeTrace(true); simulateWaiting(500); writeTrace(false); } private void breakSomething() { writeTrace(true); try { int superBig = int.MaxValue; superBig += 1; } catch (Exception ex) { Trace.Warn("Exception", "Oops", ex); } } private void writeTrace(bool enteringFunction) { if (!Trace.IsEnabled) return; string callingFunctionName = "Undetermined method"; string action = enteringFunction ? "Entering" : "Exiting"; try { //Determine the name of the calling function. System.Diagnostics.StackTrace stackTrace = new System.Diagnostics.StackTrace(); callingFunctionName = stackTrace.GetFrame(1).GetMethod().Name; } catch { } Trace.Write(action, callingFunctionName); } /// /// Wait a bit. /// /// Time in milliseconds to wait. private void simulateWaiting(int waitTime) { System.Threading.Thread.Sleep(waitTime); } }
Right, we've got our trace statements in place. Now, let's assume that this page is taking abnormally long to load, and we'd like to get to the bottom of the problem. With tracing enabled, we can simply load the page, then browse to Trace.axd within our web site; it's at http://localhost:1209/MySite/Trace.axd.
Right away, we can see which aspect of our page load is taking the majority of time -- getSomeData takes eight seconds to execute. Without this information, we might have assumed the web service call was at fault and spent valuable time solving the wrong problem. This example shows how, armed with some real information, we can begin to fix the right problem.
By Jeff Atwood and Jon Galloway