North American Network Operators Group
Re: Resilience: faults, causes, statistics, open issues
- From: David Andersen
- Date: Fri Jan 28 13:45:33 2005
On Jan 28, 2005, at 5:30 AM, András Császár (IJ/ETH) wrote:
Just some comments about the root causes of BGP related problems,
maybe you find something useful from the research perspective,
although probably this is not going to be new for you.
I found a few author groups with very related and useful papers:
- Tim Griffin and co.
- Nick Feamster and co.
- Jennifer Rexford and co.
- Lixin Gao and co.
Yup. That particular group you mentioned has a lot of interplay.
These people often have joint publications but sometimes separate as
well. Also, Craig Labovitz and co have some very useful papers in the
area of routing convergence time.
Yes. There's also Morley Mao's convergence work.
As I see things now, in the case of BGP, routing divergence, configuration
and policies are very strongly correlated.
A high-level conclusion (what you can probably expect from half a year of
paper- and presentation-reading research) is that the first root cause
of BGP problems is the absence of a >>widely deployed and practical<<
formal language for policies. Since there is no formal language, there
is no compiler, and so you have unwanted anomalies resulting from your
In a sense. I think that this is one of the root causes, but it's
perhaps not the only one. I think we can group them into two areas:
a) Fundamental BGP problems (e.g., the convergence and flap damping
issues). By
"fundamental" I don't mean uncorrectable - I simply mean that they're
"features" of the protocol as it exists today. Some may be fundamental
trade-offs in global routing; I don't know.
b) The above-mentioned policy issue
Some of the issues in (a) can be corrected through (b) - for example,
the Gao/Rexford examination of what policies can be permitted if you
want to ensure stable routing. Given that BGP is a strongly
policy-driven beast, many, many of its problems do arise from this.
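To make the Gao/Rexford point a bit more concrete, here is a rough sketch of the export half of their guidelines, as I understand it: routes learned from a customer may be passed on to anyone, while routes learned from a peer or a provider may be passed on only to customers. The names `Rel`, `may_export` and `is_valley_free` are made up for illustration. The sketch ignores the route-preference rule and the assumption about the customer/provider hierarchy that the full result also relies on, so treat it as a toy, not as a verifier.

```python
from enum import Enum


class Rel(Enum):
    """Business relationship of a neighbor, as seen from the local AS."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


# If B is A's provider, then A is B's customer; peering is symmetric.
INVERSE = {
    Rel.CUSTOMER: Rel.PROVIDER,
    Rel.PROVIDER: Rel.CUSTOMER,
    Rel.PEER: Rel.PEER,
}


def may_export(learned_from: Rel, export_to: Rel) -> bool:
    """Export rule (illustrative): customer routes go to everyone;
    peer and provider routes go only to customers."""
    if learned_from is Rel.CUSTOMER:
        return True
    return export_to is Rel.CUSTOMER


def is_valley_free(links: list[Rel]) -> bool:
    """Check an announcement's path against the export rule, hop by hop.

    links[i] says what the receiving AS is to the sending AS on hop i,
    in the direction the announcement travels (hypothetical encoding).
    """
    return all(
        may_export(INVERSE[prev], nxt)
        for prev, nxt in zip(links, links[1:])
    )


if __name__ == "__main__":
    uphill_then_down = [Rel.PROVIDER, Rel.PEER, Rel.CUSTOMER]
    valley = [Rel.PROVIDER, Rel.CUSTOMER, Rel.PROVIDER]
    print(is_valley_free(uphill_then_down))
    print(is_valley_free(valley))
```

The second path has a "valley": a customer would be re-announcing a route learned from one provider to another provider, which is exactly the kind of policy the export rule forbids.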
So, in the end, although we can possibly identify the root causes
behind BGP problems, I'm not sure they can ever be fully eliminated. OK, I
can imagine a formal language and config compiler, and one can find
verification tools as well, but I can hardly imagine, e.g., the sharing
of policies (although some papers describe methods to infer the
necessary knowledge from measurements).
Agreed. I think we'll make steps, though, and I think that groups of
collaborating providers can probably implement some of the solutions
between themselves in ways that make sense.
p.s. Sorry for the long mail :) :)
No worries - quite interesting (to me, at least!).