Very technical, but lacks informative analysis.  The paper seems to
promote adaptive fault tolerance throughout and doesn't discuss
issues or disadvantages that might come out of this approach
(it does mention the disadvantages of how fault-tolerant systems are
done today, though).  The discussions of the representative systems
using the proposed fault-tolerant "model" are very detailed, but
again, they do not contain much analysis.  They read more like a list
of each system's features and how it supports the model, and neglect
to discuss any potential shortcomings.

Although this is not part of the thesis of the paper, it might
help (perhaps during the presentation) to briefly give some
basic background on CORBA (which is mentioned in Electra, AFTM,
and Proteus) and why it is prevalent in fault-tolerant distributed
systems, to familiarize those who do not know much about
it.

--------

I thought this was a very well-written paper.  The topic is relevant
and important.  However, it seemed to me that they didn't present
much new material - it seemed more of a summary of existing implementations.
While I thought it was good that they discussed the challenges involved
in designing these systems, it seemed like they didn't propose any of
their own solutions.  However, this was a survey paper.  They discussed
some key ideas and evaluated current systems for fault tolerance very
well, and all in all - I thought this was a well-organized and good paper.

--------

The approach was great, but I would have liked to see some algorithm
development, especially for group synchronization.  Also, the initial
approach up to section #4 is too general.  I think a more technical
explanation is needed.  Section #4 (Systems) was great; Electra was very
easy to understand and brought more insight into the authors'
model.  Although the separate models were described in great detail, it
still seemed like there was a lack of continuity between the models, sort
of like a list, but the detail was great.  I would have liked more
transitions, comparisons of the different systems, and a longer, more
representative conclusion.

--------

This paper presented an interesting topic, as distributed computing plays
an increasingly important role with the internet.  Good job detailing the
systems and describing the various critical aspects of distributed
computing.  It might be better to also discuss the cross-cutting issues
between the systems, though.  The systems seemed to be presented completely
separately from each other.  Could different techniques from the various
systems be used together for a better system?  A little more on basic
adaptive techniques might be good too.  For instance, what happens in a
failure? Does the system transparently adapt to the failure, or does it
alert the applications? As for writing style, shorter sentences and active
voice would make reading it easier...

--------
This paper is well structured and contains much relevant information.
The authors concisely summarized the approaches to fault tolerance they
researched without giving too many extraneous details.  I found, however,
that some of the explanations and definitions given in this paper did not
give me a complete understanding of the topic described.  As I have a very
minimal amount of prior knowledge on this topic (only the papers we have
read in this class), I found some of the definitions given early in the
paper a little frustrating.  For example, after reading the introduction
of the paper, I still did not have a very good sense of what the main
concept of an adaptive policy to fault tolerance is.  While the
explanations later in the paper made this clearer, it was a little
frustrating to not have an idea of the direction of the paper as I read
it.  I think perhaps this paper might have benefited from a lengthier,
simplified explanation of some of the key topics at the start of the
paper.  Often, I find it helpful to have someone with a minimal amount of
understanding of the topic I am describing edit my papers and point out
weak points.  But I don't want to dwell on the negative aspects of this
paper!  The descriptions of the different fault tolerant systems cleared
up many of my misunderstandings of the paper.  The analyses were coherent
and logical and flowed well from the first part of the paper.  Perhaps
some comparisons between the different systems might have been nice.

--------

The list of topics in the adaptive fault tolerant model seems very complete.  The summaries provide good coverage of system details.  The Chameleon system summary could be trimmed a little to match the amount of information provided on the other systems.  It would help to have some comparison of the various approaches.  How do they relate? Could they be used together, or would they interfere?  The overuse of acronyms should be kept in check.  In the conclusion, "Adaptive fault tolerance" starts a paragraph.  It seems unwieldy to use AFT in the next sentence.  It is unclear what an RTO.k object is.

--------

Adaptive Fault Tolerance in Distributed Systems
By Bharath, Dumas, Kurul

 From section 2, the problem statement, I think the authors try
to make their paper interesting by first defining the properties
of an adaptive fault tolerant model and then analyzing four exemplary
systems, focusing on the properties they have identified.

     In my opinion, only the last example, Chameleon, provides a good
explanation in terms of those properties. The others focus so much
on system details that I could not see the connection
between the properties and the system itself.

--------

Should use a diagram to explain the AFT model.  The figures in the
Group Agreement section are unclear.

--------

Fault tolerance is an important topic in operating systems. Distributed
environments make it more challenging. The traditional approach to
fault tolerance is duplication, which is very costly. The adaptive
approach is intended to utilize resources more efficiently.  In
section 3, the authors present the problems and decisions that are
required to architect an adaptive fault tolerant system. This part is
very clear. They then study the current systems. Four systems
are described in section 4, and these systems are representative.  The
figures in the paper are hard to read.  The description of the systems
in section 4 is not well organized.  It would be better if the authors
explicitly compared these systems, especially their decisions on
those important problems and the reasons behind those decisions.

I chose the scores for the following reasons:
Important: 6. A fault tolerant system is very desirable for large
systems, and the adaptive approach seems promising.
Novelty: 4. This is basically a description and comparison paper.
Quality: 4. Section 4 is not well organized for each system. The
figures are hard to read.
Overall: 5. For a survey paper, it is well done.

--------

This paper defines adaptive fault tolerance and its motivation,
defines the characteristics of such a system, and describes some
current implementations. The motivation for adaptive fault tolerance
is that static allocation of resources to provide fault tolerance is
very expensive in that the system is always prepared for some "worst"
case. Adaptive fault tolerance can provide more efficient use of
resources. The characteristics of an AFT system are the timing it
supports (real-time or not), how resources are replicated, how
replicated processes are grouped, how group members communicate, and
how faults are detected and dealt with transparently. Current AFT
systems that are built on top of CORBA include Electra, which defines
object groups to provide structure for redundancy; AFTM, which
provides real-time support and uses a highly componentized
architecture; and Proteus, which allows the user to dynamically
control the redundancy configuration.

One thing I thought was good about this paper was how it brings in
CORBA technology, which is fairly mainstream. I have some experience
with CORBA so I can understand the problem better. I also thought
the environmental awareness part was very interesting (3.9), but too
short. I would have liked another paragraph or two. I am not clear
on to what degree existing systems do this. The wording in most of
the paper is good. Most of it seems well-polished. The paper is also
very well organized. I knew which way the paper was going from the
first time through it.

A couple things gave me some trouble. The second paragraph of
section 3.5 confused me -- I could have used a little more detail
about the "majority voter" concept. Also, I had some minor troubles
with the wording in the first half of section 3. Going over this
part one more time would be beneficial. Lastly, I couldn't make out
most of the text in the diagrams. It looked like the text was gray.
Black would improve reproducibility.

--------

This was a very interesting paper. One thing that I thought
the paper could use was a better analysis of the described systems
and their relevance to the adaptive model. Also, I found the included
pictures hard to read, and references 1-3 were too vague (a URL
would be useful).

--------

1) The paper provides a good summary of the Adaptive Fault Tolerance
model. I found the material interesting to read, especially since I
hadn't read much on Adaptive Fault Tolerance.

2) I would like an answer to this question: Do we require Adaptive
Fault Tolerance to be a part of the OS? It can be provided through a layer
of abstraction on top of the OS, and the OS need not even be aware of
Adaptive Fault Tolerance. In fact, one of the examples they used, Chameleon,
is also used to detect and recover from faults in the OS.

3) A lot of research is ongoing on Adaptive Fault Tolerance, and
the paper provides no insight into it, such as Wolfpack, used by
Microsoft, or FRIENDS, which provides a reflection-oriented architecture
for metaobjects or fault tolerance.

4) The important feature of Chameleon is the implementation flexibility
that armors allow. For example, if checkpointing is not needed for
a specific user application, the checkpointing armor need not be
present. The paper fails to highlight this important point.

5) The paper talks about Electra with no reference to Piranha, which
addresses the issue of service availability in distributed applications
by using a sophisticated ORB that provides failure detection.

6) I don't find any reference to software-based approaches such as Delta-4,
Isis, Horus, or Totem.

7) The paper fails to highlight the problems with the group communication
paradigm. A process pair approach (Tandem's)

8) An adaptive design strategy can take into account available resources,
deadlines, and observed faults, and notify the on-line scheduling mechanism
about relative instances of tasks, their timing requirements, and both their
worst-case and active usage of resources. The paper failed to highlight this
aspect.

9) No information about how timing requirements are met in Adaptive Fault
Tolerance.

--------

It is not explained in 4.2 why the AFTM system must be real-time to manage
resources.  Perhaps this could be expanded to explain why the timing is
necessary.  Please label the figures, as they were confusing.  Perhaps
section 2.2 could be incorporated into section 2.3?