North American Network Operators Group
2006.06.05 NANOG-NOTES BGP tools BOF notes
- From: Matthew Petach
- Date: Tue Jun 06 07:19:15 2006
(ok, last set of notes for tonight, and then it's off to bed for 90
minutes of sleep before heading back to the convention center. ^_^; --MNP)
2006.06.05 Welcome to the 4th BGP Tools BOF!
[slides are at
Nick Feamster, Georgia Tech
Dan Massey, CSU
Mohit Lad and Lixia Zhang, UCLA
They're sharing some tools developed from their research
that will hopefully be useful for the operations community,
and also collecting input on new tools operators would like
to see, so they can develop them.
Routing Configuration Checker
O-BGP data organization tool
[slides are at
The Datapository by Nick Feamster
[I'm sorry, that just sounds *far* too much like something
you do *NOT* want your bedside nurse administering...--MNP]
Visualizing BGP dynamics using Link-Rank by Mohit Lad
Open discussions and demos
Network Troubleshooting: rcc and beyond
rcc: router configuration checker
proactive routing configuration analysis
idea: analyze configs before deployment
many faults can be detected with static analysis.
preprocessor -> parser -> relational database (MySQL);
constraints <-> verifier <-> faults
The verifier is a template checker plus a set of constraints
that your configs are checked against.
He's looking for GUI developers.
very bare-bones command line right now.
Parsing configurations--shows some output.
He shows examples using the Abilene configs, which
are not anonymized.
show all routers peering with a given AS, can look
at route maps in each direction, etc.
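A minimal sketch of the kind of lookup this enables once configs are parsed into a relational database. The table layout (`bgp_sessions` and its columns) is a hypothetical stand-in, not rcc's actual schema, and sqlite3 replaces MySQL only to keep the example self-contained.

```python
import sqlite3

# Hypothetical table layout; rcc's actual MySQL schema isn't described in the talk.
SCHEMA = """
CREATE TABLE bgp_sessions (
    router      TEXT,     -- router whose config defines the session
    neighbor_ip TEXT,
    neighbor_as INTEGER,
    import_map  TEXT,     -- inbound route map, NULL if none
    export_map  TEXT      -- outbound route map, NULL if none
)
"""

def load_sessions(rows):
    """Create an in-memory database holding parsed session rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO bgp_sessions VALUES (?, ?, ?, ?, ?)", rows)
    return conn

def routers_peering_with(conn, asn):
    """Every session to neighbor AS `asn`, with the route maps in each direction."""
    return conn.execute(
        "SELECT router, neighbor_ip, import_map, export_map "
        "FROM bgp_sessions WHERE neighbor_as = ? ORDER BY router, neighbor_ip",
        (asn,),
    ).fetchall()

if __name__ == "__main__":
    conn = load_sessions([
        ("rtr-a", "192.0.2.1", 64500, "PEER-IN", "PEER-OUT"),
        ("rtr-b", "192.0.2.5", 64500, "PEER-IN", None),
        ("rtr-b", "198.51.100.9", 64501, "CUST-IN", "CUST-OUT"),
    ])
    for row in routers_peering_with(conn, 64500):
        print(row)
```

A session with no outbound route map comes back with `None` in the `export_map` slot, which is itself a handy thing to flag.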
After running rcc on them, you get web output
that shows the relationships. The pictures don't matter
much; with some more grease it could become a reasonable
representation of your network.
Q: Randy Bush asks whether it could show which peering
sessions are missing.
A: Not yet, but it could be added, thank you!
It shows processing and errors;
you get a page that summarizes the things rcc thinks are faults.
"Signalling partition"? That's a missing iBGP session;
he needs some better lingo in places.
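Sketch of a signalling-partition check, assuming a plain iBGP full mesh with no route reflectors or confederations (rcc's real checks are more general). The input format, a dict of router -> set of configured iBGP neighbors, is an assumption for illustration.

```python
from itertools import combinations

def missing_ibgp_sessions(ibgp_neighbors):
    """Find router pairs that lack a complete iBGP session in a full mesh.

    ibgp_neighbors maps each router name to the set of routers it has
    configured as iBGP neighbors. A session only comes up if BOTH ends
    configure it, so a one-sided configuration also counts as missing.
    Returns (a, b, a_configures_b, b_configures_a) for each broken pair.
    """
    missing = []
    for a, b in combinations(sorted(ibgp_neighbors), 2):
        a_side = b in ibgp_neighbors[a]
        b_side = a in ibgp_neighbors[b]
        if not (a_side and b_side):
            missing.append((a, b, a_side, b_side))
    return missing
```

For `{"r1": {"r2", "r3"}, "r2": {"r1"}, "r3": {"r1", "r2"}}` this reports `("r2", "r3", False, True)`: r3 is configured toward r2 but not the other way around, so that session never comes up.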
It also shows anomalous imports, which could be intentional
for traffic engineering; that's "inconsistent policy"
in ISP speak.
Some of the names will get fixed to make Randy Bush happy.
Yes, but surprises happen!
traffic volumes shift
network devices "wedged"
Need to marry static config analysis with dynamic
information (route is configured but isn't in the
he skips a closer look, just some jargon.
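One simple way to marry the two views, sketched under assumptions: take the prefixes a router is configured to originate (from the parsed config) and the prefixes actually seen in its RIB, and diff them. Both input lists are hypothetical; the talk doesn't say how the dynamic data is joined with the config database.

```python
import ipaddress

def _sorted_prefixes(nets):
    # IPv4 before IPv6, then numeric order within each address family.
    return [str(n) for n in sorted(nets, key=lambda n: (n.version, n))]

def config_vs_rib(configured, observed):
    """Compare configured prefixes with prefixes actually present in the RIB.

    Both arguments are iterables of prefix strings; they are parsed with
    ipaddress.ip_network so comparisons are on networks, not raw text.
    """
    conf = {ipaddress.ip_network(p) for p in configured}
    seen = {ipaddress.ip_network(p) for p in observed}
    return {
        "configured_not_seen": _sorted_prefixes(conf - seen),
        "seen_not_configured": _sorted_prefixes(seen - conf),
    }
```

The "configured_not_seen" list is the "route is configured but isn't showing up" case; the other direction catches routes nobody meant to originate.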
Detection: analyze routing dynamics;
drill down on interesting operational issues.
idea: routers exhibit correlated behaviour;
blips across signals may be more operationally
interesting than a spike in any single signal.
How do you spot things in the churn?
Detection: three types of events
multi-router bursts (common, and commonly missed
using simple thresholds)
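To illustrate why a per-router threshold misses multi-router bursts, here is a toy comparison. The per-bin update counts and the `high`, `low`, and `min_routers` parameters are made up for illustration; this is not the detection algorithm from the talk.

```python
def simple_threshold_bins(counts, high):
    """Time bins where any single router's update count exceeds `high`.

    counts maps router -> list of update counts per time bin (equal lengths).
    """
    n_bins = len(next(iter(counts.values())))
    return [t for t in range(n_bins) if any(c[t] > high for c in counts.values())]

def multi_router_bursts(counts, low, min_routers):
    """Time bins where at least `min_routers` routers each exceed `low` at once."""
    n_bins = len(next(iter(counts.values())))
    bursts = []
    for t in range(n_bins):
        routers = sorted(r for r, c in counts.items() if c[t] > low)
        if len(routers) >= min_routers:
            bursts.append((t, routers))
    return bursts
```

With `counts = {"r1": [2, 9, 3, 40], "r2": [1, 8, 2, 1], "r3": [3, 7, 1, 2]}`, `simple_threshold_bins(counts, high=30)` returns `[3]`, while `multi_router_bursts(counts, low=5, min_routers=3)` returns `[(1, ["r1", "r2", "r3"])]`: the correlated blip in bin 1 never trips the single-router threshold.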
Localization: joint dynamic/static
which routers are "border routers" for that burst
topological properties of routers in the burst.
proactive analysis -> deployment -> dynamic ->
reactive detection -> diagnosis/correction -> static ->
Going back to the configs lets you see whether it's
something happening inside the network or on the edge.
Specific Focus: firewall configuration
difficult to understand and audit configs
subject to continual modifications
roughly 1-2 touches per day
federated policy, distributed dependencies
each department has independent policies
local changes may affect global behaviour
(These are pulled from Georgia Tech: 130 firewall
configs. The tool builds a static connectivity matrix.)
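A rough sketch of building a static connectivity matrix from first-match firewall rules. The rule tuple format, the default-deny behaviour, and the one-firewall-per-destination model are all simplifying assumptions; real departmental firewall configs will be messier.

```python
import ipaddress

def first_match(rules, src, dst):
    """Action of the first rule covering src -> dst; default deny if none match."""
    for action, rule_src, rule_dst in rules:
        if src.subnet_of(rule_src) and dst.subnet_of(rule_dst):
            return action
    return "deny"

def connectivity_matrix(subnets, firewall_for, rulesets):
    """Return {(src, dst): allowed} for every ordered pair of distinct subnets.

    firewall_for: destination subnet -> name of the firewall in front of it.
    rulesets: firewall name -> ordered list of (action, src_prefix, dst_prefix).
    Assumes one firewall per path and that a rule only applies when it covers
    the whole subnet (partial overlaps are ignored).
    """
    nets = {s: ipaddress.ip_network(s) for s in subnets}
    parsed = {
        fw: [(a, ipaddress.ip_network(s), ipaddress.ip_network(d)) for a, s, d in rules]
        for fw, rules in rulesets.items()
    }
    matrix = {}
    for s in subnets:
        for d in subnets:
            if s != d:
                action = first_match(parsed[firewall_for[d]], nets[s], nets[d])
                matrix[(s, d)] = action == "permit"
    return matrix
```

Diffing two matrices built from consecutive config snapshots is one way to see whether a "local" change altered global reachability.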
Reactive monitoring...use probes from subnets to
(immediate) open issues
reachability and reliability of controller
diagnostic tools != service-level happiness
Q: can it give suggested remediation, or provide
config templates for new routers being added?
A: Good idea!
OK, over to next presenter. Helps with understanding
BGP data collection and organization (OBGP) Tool
Colorado State University / University of Arizona / UCLA
BGP data collection
takes lots of BGP data, from RIPE RIS, etc.
ISP BGP peer router -> update oreg -> rib+update ->
feeds into gigabytes of data, different formats,
potential errors enter in, and severe lack of metadata.
Other tools use it (LinkRank, BGP-Inspect), and a
bunch of people cite it in reports and research.
Large Volume of Data
data from many sources (RIPE, RV, private data)
Long time scales and very recent (real-time?) data
Slightly different formats
RIPE/RV use different naming conventions
different dump intervals
different timezones for older data
Lack of metadata
would like to see only the desired peers and desired update types
Possible errors in the data
are updates missing due to log errors?
what is lost due to session failures?
So, OBGP is the "thing" in the middle.
A simple Perl script called OBGP that simplifies
Uniform data organization
consistent and easy to use for scripts
consistent view of multiple monitoring points
can be stripped, help locate useful data easily
table transfer detection
distinguish updates from data collection peering
Data inconsistency detection and correction
understand and fix possible data errors
Uniform data organization
Uniform naming and organization conventions for all
RIB and update data split by peer
One RIB file and one update file per peer per day,
dumped at the beginning of the day.
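Sketch of the per-peer, per-day split, assuming records arrive as `(unix_ts, peer_ip, raw_update)` tuples. The `<collector>/<peer_ip>/<YYYY-MM-DD>.updates` layout is a hypothetical naming convention, not the one OBGP actually uses; normalizing timestamps to UTC sidesteps the mixed-timezone problem mentioned for older data.

```python
from collections import defaultdict
from datetime import datetime, timezone

def organize_by_peer_day(collector, records):
    """Group (unix_ts, peer_ip, raw_update) records into one bucket per peer per UTC day.

    Returns {path: [raw_update, ...]} using the hypothetical layout
    <collector>/<peer_ip>/<YYYY-MM-DD>.updates, with records in time order.
    """
    buckets = defaultdict(list)
    for ts, peer, raw in sorted(records):
        day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
        buckets[f"{collector}/{peer}/{day}.updates"].append(raw)
    return dict(buckets)
```

For example, an update timestamped 1149465700 from peer 192.0.2.1 at collector `rrc00` lands in `rrc00/192.0.2.1/2006-06-05.updates`.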
Labels and Annotations are more interesting
The existing format labels each update as
an announcement (A) or a withdrawal (W)
also includes some STATE messages
OBGP enhances the labels
Adds a status message
Adds an update type
More STATE messages
route table dump
(shows it's an announcement, it's incremental, and
it's updating the destination path)
OBGP added labels
|<original update type>:<status info>:<OBGP update type>|
<original update type>
add E for error correction
<status info>
INC incremental update
TT table transfer update
RIB correction update
<OBGP update type>
change in AS path (DPATH)
change in other attribute (not AS path)
If you don't need this, it's just a few extra characters
in your log; but could be useful.
Using Labels to filter data
example: find suballocation hijacks
Only need new announcements and withdrawals,
so 83% of the update data can be ignored.
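A minimal sketch of label-based filtering, assuming each label is a colon-separated `<original update type>:<status info>:<OBGP update type>` string such as `A:INC:DPATH`. The notes don't name the label for attribute-only changes, so anything other than `DPATH` is treated as one here; reading "new announcements and withdrawals" as incremental withdrawals plus incremental AS-path changes is also an assumption.

```python
from typing import NamedTuple, Optional

class Label(NamedTuple):
    update_type: str           # "A", "W", or "E" (error correction)
    status: str                # "INC", "TT", or "RIB"
    obgp_type: Optional[str]   # e.g. "DPATH"; may be absent

def parse_label(text):
    """Split a colon-separated label into its two or three fields."""
    parts = text.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected label: {text!r}")
    return Label(parts[0], parts[1], parts[2] if len(parts) == 3 else None)

def is_new_route_event(label):
    """Keep incremental withdrawals and incremental AS-path changes only."""
    if label.status != "INC":  # drop table transfers and correction updates
        return False
    if label.update_type == "W":
        return True
    return label.update_type == "A" and label.obgp_type == "DPATH"
```

So of `["A:INC:DPATH", "A:TT:DPATH", "W:INC", "A:INC:ATTR", "E:RIB:DPATH"]` (where `ATTR` is a made-up placeholder), only `"A:INC:DPATH"` and `"W:INC"` survive the filter.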
Is the collected data accurate?
May lose updates due to data collection errors
start with an accurate RIB
apply updates in log
should match the next RIB dumped by the router
modulo some race conditions near dump time
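The consistency check can be sketched as a replay: apply the logged updates to the starting RIB and diff the result against the next dump. RIBs are modeled here as `prefix -> AS path` dicts and updates as `(kind, prefix, as_path)` tuples, both hypothetical simplifications. Each mismatch is the kind of gap OBGP's E:RIB correction updates are meant to patch.

```python
def replay(rib, updates):
    """Apply (kind, prefix, as_path) updates to a copy of a RIB (prefix -> as_path)."""
    state = dict(rib)
    for kind, prefix, as_path in updates:
        if kind == "A":
            state[prefix] = as_path
        elif kind == "W":
            state.pop(prefix, None)
        else:
            raise ValueError(f"unknown update kind {kind!r}")
    return state

def inconsistencies(rib_start, updates, rib_next):
    """Prefixes where the replayed state disagrees with the next RIB dump.

    Each entry is (prefix, replayed_path, dumped_path); None means absent.
    Some mismatches near dump time are benign race conditions.
    """
    replayed = replay(rib_start, updates)
    diffs = []
    for prefix in sorted(set(replayed) | set(rib_next)):
        got, want = replayed.get(prefix), rib_next.get(prefix)
        if got != want:
            diffs.append((prefix, got, want))
    return diffs
```

A prefix that differs only in AS path points at a missed announcement; one present in the dump but absent after replay points at a missed announcement for a new route.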
does this clearly work with RouteViews?
85 of 111 peers from RV suffered inconsistencies in
About 25 were rock solid right on.
One peer had 378,998 inconsistencies in one day.
Is this evenly distributed? Not really.
Inconsistencies and session failures
session down: RIB-IN drops to empty
session up: table transfer
(failure to recognize a session dropping)
by looking for table transfers, you can estimate where
sessions went down and came back up.
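A heuristic sketch of spotting table transfers: if a peer re-announces most of its last known RIB within a short window, assume the session reset. The 300-second window and 0.9 fraction are hypothetical defaults, not values from the talk, and the input format is an assumption.

```python
def detect_table_transfers(announcements, rib_size, window=300, fraction=0.9):
    """Flag likely table transfers (session resets) in a peer's announcement stream.

    announcements: list of (unix_ts, prefix), sorted by time.
    A transfer is assumed when, within `window` seconds, the peer announces at
    least `fraction` * rib_size distinct prefixes. Returns the start timestamps
    of flagged windows; flagged windows do not overlap.
    """
    threshold = fraction * rib_size
    starts = []
    i, n = 0, len(announcements)
    while i < n:
        t0 = announcements[i][0]
        seen = set()
        j = i
        while j < n and announcements[j][0] - t0 < window:
            seen.add(announcements[j][1])
            j += 1
        if rib_size and len(seen) >= threshold:
            starts.append(t0)
            i = j  # skip past this transfer
        else:
            i += 1
    return starts
```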
How long does an error persist?
The lifetime of a correction update can be as long as 43 days!
If you miss an update, you can have bad data for
a long, LONG time!!
Correction updates added by OBGP
E:RIB updates: inferred where a change in the RIB must have
happened due to a routing update that was missed.
adds label to easily sort and limit
adds additional state messages
identifies and corrects update error messages
[NOTE URL at end of slide deck is WRONG --MNP]
If you're using RouteViews or RIPE RIS, consider
using this tool, and give feedback!
Randy is using it to check propagation of his
prefixes, and for research.
RIPE NCC asks about the performance of these tools; with
multiple collectors, Perl didn't scale.
The Perl version is mainly a demonstration.
He pulls data from RIPE and has it stored; he hopes
to make it public some day.
He has stacks of disks with text-format data for
easy searching, and is considering a binary format for it.
Randy, on that subject: Matt Rowan has spent
half his life getting the data out of the system,
putting it into a funny format, and sticking it back in.
Disk is cheap! Look at raw data. With binary data,
what tools are there? Hard enough to look at router
One tool to look at binary data, lots of tools to
look at text.
Q: Matt asks how much space it takes to store data
A: It takes about 1TB to store all the RIS data.
Q: Are they planning to make it available to the public?
A: Well, he'd like to host it at route-views or ripe,
rather than create a new site.
Q: How long does it take to process the RIPE data?
A: You need a fast CPU, and it will still take a couple
of days to process the data.
Q: Can it deal with live updates? It can keep up
with route-views and RIPE, but that's not live;
there is a lag, since route-views dumps every 15 minutes,
and the update files sometimes take up to 8 hours to show
up on the site.
Nick Feamster and David Anderson?
raw data -> compute engines -> storage and DB plus
archival storage -> analysis.
Very alpha right now
NOT realtime! inserting data in greedy approach;
when he needs it, he inserts it, and starts running
You can see a list of feeds; he has Abilene but not
Can restrict it, look at neighbor ASes, etc.
see it in graphical form, or list form
can diagnose issues,
has an XML query engine and output for programmatically
If you use MATLAB, it could be interesting to throw this
into a multidimensional time series.
Randy Bush notes that all his tools take MRT-format data.
Oh, he can spit out sparse matrices.
He could spit out MRT format; he has Python code that
speaks MRT format.
He'll look at adding that.
Do spammers hijack BGP routes?
1 announce BGP route for mail server
2 send lots of spam
3 withdraw route, becoming invisible
reality? let's check!
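Checking for the announce/spam/withdraw pattern could start with something like this: find prefixes whose route was withdrawn shortly after being announced. The one-hour `max_lifetime` cutoff and the `(unix_ts, kind, prefix)` event format are assumptions for illustration; a real study would also need to tie those prefixes to spam sources.

```python
def short_lived_announcements(events, max_lifetime=3600):
    """Find (prefix, announce_ts, withdraw_ts) where a route lived < max_lifetime seconds.

    events: iterable of (unix_ts, kind, prefix), kind "A" or "W", in time order.
    Re-announcing an already-announced prefix keeps the original start time;
    withdrawals for prefixes never seen announced are ignored.
    """
    up_since = {}
    found = []
    for ts, kind, prefix in events:
        if kind == "A":
            up_since.setdefault(prefix, ts)
        elif kind == "W" and prefix in up_since:
            start = up_since.pop(prefix)
            if ts - start < max_lifetime:
                found.append((prefix, start, ts))
    return found
```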
output to MATLAB
and per Randy Bush, MRT format would be good too!
Q: BGP-Inspect vs. this tool?
A: This one has additional datasets besides BGP, like active
probes, traffic, etc. It also has a better collection
setup, with unified formats.
Mohit will do the last one: Link-Rank, which shows the dynamics
Visualizing BGP dynamics with Link-Rank
constructing rank-change graphs
closest to BGPlay.
weight is the number of prefixes reached across that link
weight changes are on specific links, can do easy
Activity bar--routing activity across time.
green shows gains, red shows losses,
sums all gains, sums all losses.
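A small sketch of the rank-change idea as described: weight each AS link by the number of prefixes whose path crosses it, then diff the weights between two snapshots to get gains and losses. The input format (prefix -> AS path tuple from one observation point, observer side first) is an assumption, and this illustrates the concept rather than reproducing the Link-Rank code.

```python
from collections import Counter

def link_weights(rib):
    """Weight of each directed AS link = number of prefixes whose AS path crosses it.

    AS-path prepending is collapsed first so repeated ASNs don't form self-links.
    """
    weights = Counter()
    for path in rib.values():
        hops = [asn for k, asn in enumerate(path) if k == 0 or asn != path[k - 1]]
        for link in set(zip(hops, hops[1:])):
            weights[link] += 1
    return weights

def rank_changes(rib_before, rib_after):
    """Per-link weight change (positive = gain, negative = loss); zeros omitted."""
    before, after = link_weights(rib_before), link_weights(rib_after)
    changes = {}
    for link in set(before) | set(after):
        delta = after[link] - before[link]
        if delta:
            changes[link] = delta
    return changes
```

The activity bar's totals then fall out as `sum(d for d in changes.values() if d > 0)` for gains and the matching sum over negative values for losses.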
visualization graph of where prefixes gained and lost.
again, green are links that gain, red are links that lost.
other observation points highlighted in orange
May 23rd, instability
293 flapping from 1239 to 3356
for multiple observation points, dashes are lost,
solids are gains.
highlight sources and sinks
cutting one link explains most of the errors
3561 to 4134 link issue
case II: one node sucks in all the flows;
no single link (3356)
automated root cause identification
Can look at destination Link-Rank graphs to see how the
rest of the Internet's paths toward you change.
connectivity issues to 7018
should show your prefix hijackings moving from one
link to another.
Could simplify this to BGPlay if you wanted.
download client, configure...
work with any BGP data
Q: Matt asks when we'll be able to use our own BGP data;
A: about 4-5 months, hopefully!
Haven't looked at netflow yet.
Q: Randy Bush. Common problem we all face. I'm at 42
peering points; my neighbors are X. I have route views
dumps, I have my BGP dumps. I have my netflow data.
Want a whatifatron that shows what happens to my
traffic if I depeer someone, or add someone, or
peer with SingTel in Singapore, or stop peering
with Joe in SF.
That's a question many operators ask every day.
A: Matt notes that if they can solve that question/write
something that does all that, they'll have Arbor and
others beating on their door. ^_^
Panel wraps up at 1728 hours Pacific time.