Control Systems
Magnus Larsson, Inger Klein, Dan Lawesson
and Ulf Nilsson
*Department of Electrical Engineering
Linköping University, SE-581 83 Linköping, Sweden
WWW: http://www.control.isy.liu.se
Email: magnusl,inger@isy.liu.se
**Department of Computer and Information Science
Linköping University, SE-581 83 Linköping, Sweden
WWW: http://www.ida.liu.se
Email: danla,ulfni@isy.liu.se
12 December, 2000
Report no.: LiTH-ISY-R-2324
Presented at SAFEPROCESS 2000
Technical reports from the Automatic Control group in Linköping are available
by anonymous ftp at the address ftp.control.isy.liu.se. This report is
CONTROL SYSTEMS
Magnus Larsson, Inger Klein, Dan Lawesson, Ulf Nilsson
Dept. of Electrical Engineering, Linköping University, Sweden
Dept. of Computer and Info. Science, Linköping University, Sweden
Abstract: This article addresses the problem of fault propagation between
software modules in a large-scale control system with object oriented
architecture. There exists a conflict between object-oriented design goals
such as encapsulation and modularity, and the possibility to suppress
propagating error conditions. The propagation manifests itself as many
irrelevant error messages, and hence causes problems for system operators
and service personnel when attempting to isolate the real fault. We propose
a fault isolation scheme aimed at achieving clear and concise fault
information to the operator without violating encapsulation and modularity.
The approach is implemented and tested on a commercial industrial robot
control system from ABB Robotics and a patent application has been filed
with the Swedish patent office (PRV) Larsson and Eriksson (1999).
Keywords: Fault isolation, object modeling techniques, control system,
safety-critical, propagation
1. INTRODUCTION
Developing control systems for complex systems is a difficult and
increasingly important task. Traditional software development methods based
on structured analysis and functional decomposition (see e.g. DeMarco
(1979)) are today often replaced by object oriented methods, see e.g.,
Douglass (1998); Rumbaugh et al. (1991). The new methods have many
advantages over traditional approaches, including better possibility to
master complexity and to facilitate maintenance and reuse (see e.g. Booch
(1994)). However, new problems arise; the problem addressed here is fault
propagation in an object oriented software architecture for a large-scale,
configurable and safety critical control system. As basic inspiration and
case study we have used a commercial control
system for industrial robots developed by ABB
grammable and has an object oriented architecture.
Object-oriented design goals such as encapsulation and modularity often
stand in direct conflict with the need to generate concise information
about a fault situation, and to avoid propagating error messages. Error
messages are sent by individual objects to notify an operator that an error
condition has been detected. The aim to encapsulate information implies
that individual objects, or groups of objects, in general do not know how
close they are to a fault or if the fault has already been adequately
reported.
The focus of this paper is an operational and safety critical control
system running without direct supervision; in case of a serious fault, the
first priority is to take the system to a safe state. Only then is it
possible to start analyzing what may have caused the fault. Operators or service
a failure are fairly inexperienced with the system and have little insight
into its internal design. Since error reporting often reflects the internal
design of the system, it can be very difficult for the operator to
understand which error message is most relevant and closest to the fault.
In this paper we propose a liberal error reporting policy in combination
with a fault isolation layer hiding the log and the core control system
from the operator. The layer performs post-processing of the fault
information and is able to present clear and concise fault information to
the operator, thus facilitating design principles such as encapsulation and
modularity.
It should be noted that the number of error messages or alarms in a fault
scenario need not be especially large to cause problems for an
inexperienced operator; the number typically ranges from 3 to 20 in our use
case. The strength of the proposed approach does not lie in the number of
error messages handled in each fault scenario, but in the wide range of
potential fault scenarios handled by a general method.
2. INFORMATION USED DURING FAULT ISOLATION
We propose a fault isolation scheme where error messages are explained
locally, by means of information already available to an object at
run-time, thus not violating the principle of encapsulation and modularity.
This local information, made available in an error message, is called the
error message signature, which together with a conceptual explanation model
makes it possible to infer cause-effect relations between the error
messages. The most relevant error message(s) can then be presented to the
operator. When the information from the error messages is inconclusive, we
use a structural system model to find dependencies between objects and
hence "fill in the gaps". To the authors' knowledge, this is a novel
approach.
Error messages are divided into internal error messages and relational
error messages; in a relational message the complaining object is called
the complainer, and the imputed object is called the complainee. Relational
error messages are further specialized into those where (1) the complainee
is known and where (2) the complainee is unknown. If the system is regarded
as a collection of collaborating, fairly intelligent, but narrow-minded
individuals, these three types can be characterized with the statements
"I did it", "he did it" and "I didn't do it" respectively.
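
The three message types can be captured in a small data structure. The
following Python sketch is only an illustration of the classification; the
names ErrorMessage, Kind, relational and complainee are assumptions made
here and do not come from the implemented system, whose signature format is
not described in this paper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    # The value is the informal characterization used in the text.
    INTERNAL = "I did it"
    RELATIONAL_KNOWN = "he did it"
    RELATIONAL_UNKNOWN = "I didn't do it"


@dataclass(frozen=True)
class ErrorMessage:
    """Hypothetical error message signature (field names are assumptions)."""
    code: int
    complainer: str                   # object that sent the message
    relational: bool = False          # False: the complainer blames itself
    complainee: Optional[str] = None  # imputed object, if known

    @property
    def kind(self) -> Kind:
        if not self.relational:
            return Kind.INTERNAL
        if self.complainee is not None:
            return Kind.RELATIONAL_KNOWN
        return Kind.RELATIONAL_UNKNOWN


# Made-up messages, one of each type.
examples = [
    ErrorMessage(103, "bus_driver"),
    ErrorMessage(102, "io", relational=True, complainee="bus_driver"),
    ErrorMessage(101, "program", relational=True),
]
for msg in examples:
    print(msg.code, msg.complainer, "says:", msg.kind.value)
```

Only information local to the complainer is stored, so a structure of this
kind would not require an object to know anything about the rest of the
system.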
The information provided in the error messages is complemented with a
structural system model. A
far too complex (in the order of 10 lines of code in the ABB case). Even if
it is not yet common practice to do so, it is widely recognized that
developing system models helps in designing correct systems; hence it is
not unreasonable to assume that software in the future will be accompanied
by models at different levels of abstraction. The modeling language used
here was the Unified Modeling Language (UML) (see e.g., Douglass (1998)).
The UML is a design notation for object oriented systems and also serves as
system documentation. For the fault isolation process we use the UML class
diagrams and task diagrams.
Closely collaborating and related classes can in the UML be collected into
modules called packages. A package models a specific subject or concern in
the system, and supplies a more high-level model of the system architecture
than the classes and the class relationships. How this information is used
in the fault isolation approach will be demonstrated below.
For the system model (in the form of UML class diagrams) to be useful for
fault isolation, the system and system model should be such that the static
class structure reflects the run-time object structure well. Classes in
control systems are often highly specialized. Even if the complete run-time
object structure often is very dynamic and constantly changing, there are
usually only a few "major players" among the objects that are always
present. If these objects have the main responsibility for error reporting,
they can provide enough similarity with the static class structure for the
system model based on class diagrams to be useful for fault isolation.
Another important property is that the inheritance hierarchies are seldom
very deep; often only one or two levels.
Since the information used for fault isolation is partitioned into a UML
model and error message signatures, the approach scales quite well. The
fault isolation scheme is easy to maintain and to extend when the system
changes, since it is an integral part of the software development process
and the software itself.
3. A FAULT ISOLATION SCENARIO
Due to space limitations it is not possible to fully describe the formal
notation or algorithms used in the implemented system. Instead we
illustrate the method by a real fault scenario from the ABB Robotics
industrial robot application. For a complete, and formal, treatment see the
thesis Larsson (1999) or the report Larsson et al. (1999). The
purpose of the example is to illustrate the fault
[Figure 1: UML class diagram. Recoverable labels: ibsser, eiodevIBS,
eiodev, eioexe (from EIO); multiplicities 1..* and 1.]
Fig. 1. Extract of class diagram from the package Drivers.
7. 10008 Program restarted 0105 13:45.9
The task MAIN has
restart to execute.
The originator is the production window.
8. 71061 I/O bus error 0105 13:45.30
Description\Reason:
- An abnormal rate of errors on
bus IBS has been detected.
9. 71107 InterBus-S bus failure 0105 13:45.31
Description\Reason:
- Lost contact at address 2.3
10. 71139 Access error from IO 0105 13:45.35
Description\Reason:
- Cannot Read or Write signal DO3_1
due to communication down.
11. 40503 Reference error 0105 13:45.35
Device descriptor is
not valid for a digital write operation
12. 40223 Execution error 0105 13:45.35
Task MAIN: Fatal runtime
error
13. 10020 Execution error state 0105 13:45.35
The program execution has reached
a spontaneous error state
14. 10005 Program stopped 0105 13:45.35
The task MAIN has
stopped. The reason is that
an external or internal stop has
occurred.
Fig. 2. Error log for the example.
In Figure 1, part of the system model relevant to our fault scenario is
shown in UML class diagram notation. Classes are shown graphically using
rectangles with the name of the class inside. That an object uses some
service performed by another object is modeled with an arrow between
classes, a so-called association. A class can be a specialization of a more
general class; this is called inheritance, and is indicated by an arrow with a
[Figure 3(a), original base graph. Recoverable labels: int, pgmexe, 40223,
RealInstruction, rlio, 40503, eio, 71139, eiount, eiobus, eioexe, ibsser,
71061, 71107; packages PGM, REAL, EIO, DRIVERS.]
[Figure 3(b), explanation graph. Nodes: 40503, 40223, 71139, 71061, 71107.]
Fig. 3. Original base and explanation graphs.
The fault considered here is a malfunctioning field bus. The resulting
error message log is given in Figure 2. The error message signatures are
not shown in the log, but the local information provided in the error
message signatures is visualized in a so-called base graph, see Figure
3(a). Each node of the base graph corresponds to an object that has either
sent an error message (a complainer) or is pointed out by another object (a
complainee). The edges between nodes correspond to relational error
messages and should be read "complains on". The self-loop adorned with int
corresponds to an internal error message. There is also one inheritance
relation in the base graph. The packages are shown using dashed boxes with
the package name in the upper left corner.
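
As an illustration, a base graph of this kind can be assembled directly
from the signatures. The Python sketch below is a simplification under
stated assumptions: the tuple encoding (code, complainer, complainee), the
object names and the codes are all made up for the example and are not read
off Figure 3; inheritance relations and packages are left out.

```python
def build_base_graph(messages):
    """Return (nodes, edges) for a list of (code, complainer, complainee).

    Assumed encoding: complainee == complainer is an internal message
    (drawn as a self-loop); complainee is None is a relational message
    whose complainee is unknown (a node without an outgoing edge).
    """
    nodes, edges = set(), []
    for code, complainer, complainee in messages:
        nodes.add(complainer)
        if complainee is not None:
            nodes.add(complainee)
            edges.append((complainer, complainee, code))  # "complains on"
    return nodes, edges


def components(nodes, edges):
    """Connected components of the base graph, ignoring edge direction."""
    neighbours = {n: set() for n in nodes}
    for src, dst, _ in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    seen, result = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbours[node] - group)
        seen |= group
        result.append(group)
    return result


# Hypothetical signatures: one known complainee, one unknown, one internal.
messages = [
    (101, "program", "instruction"),
    (102, "io", None),
    (103, "bus_driver", "bus_driver"),
]
nodes, edges = build_base_graph(messages)
parts = components(nodes, edges)
print(len(parts), "component(s):", parts)
```

More than one component means that the signatures alone do not relate all
complaining objects to each other.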
The base graph describes dependencies between objects, but the aim is to
point out the error message closest to the fault. For this purpose we
construct an explanation graph, see Figure 3(b). The explanation graph is
in some sense the dual of the base graph; the nodes correspond to error
messages and the edges represent dependencies between error messages. The
goal of the fault isolation scheme is to produce a connected explanation
graph without any cycles where all error messages can be traced to one
primary error message. This error message can then be presented to
the operator. In our scenario this primary error
[Figure 4(a), extended base graph. Recoverable labels: int, pgmexe, 40223,
RealInstruction, rlio, 40503, eio, 71139, eiount, eiobus, eioexe, ibsser,
eiodevIBS, 71061, 71107.]
[Figure 4(b), explanation graph. Nodes: 40503, 40223, 71139, 71061, 71107.]
Fig. 4. Extended base graph and explanation graph.
but by using the package information we achieve a connected explanation
graph anyway.
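
The step from signatures to an explanation graph and a primary error
message can be sketched as follows. This Python fragment is not the
algorithm of the implemented system, which is given in Larsson (1999). It
assumes one simple rule: a relational message with a known complainee is
explained by every message that the complainee itself has sent. Package
information is not used, error codes are assumed to be unique, and all
names and codes are hypothetical.

```python
def explanation_graph(messages):
    """Map each error code to the set of codes that explain it.

    messages: list of (code, complainer, complainee); complainee is None
    when unknown and equals the complainer for an internal message.
    """
    sent_by = {}
    for code, complainer, _ in messages:
        sent_by.setdefault(complainer, []).append(code)
    explained_by = {code: set() for code, _, _ in messages}
    for code, complainer, complainee in messages:
        if complainee is None or complainee == complainer:
            continue  # nothing to follow for unknown or internal messages
        explained_by[code].update(sent_by.get(complainee, []))
    return explained_by


def maximal_messages(explained_by):
    """Messages that no other message explains."""
    return sorted(c for c, causes in explained_by.items() if not causes)


def primary_message(explained_by):
    """The single message every other message can be traced to, else None."""
    roots = maximal_messages(explained_by)
    if len(roots) != 1:
        return None
    for start in explained_by:
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(explained_by[node])
        if roots[0] not in seen:
            return None  # e.g. messages that only blame each other
    return roots[0]


# Hypothetical chain: program blames io, io blames bus_driver,
# and bus_driver reports an internal error.
chain = [
    (101, "program", "io"),
    (102, "io", "bus_driver"),
    (103, "bus_driver", "bus_driver"),
]
graph = explanation_graph(chain)
print(graph, "primary:", primary_message(graph))
```

When more than one maximal message remains, the sketch returns None, and
all maximal messages would have to be shown to the operator.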
If the base graph is not connected, as in our case, it may be necessary to
extend the base graph using the UML system model (Figure 1). This can be
done both on the class and package levels. Algorithms for doing this are
further described in Larsson (1999); Larsson et al. (1999). In our example,
an extension on the class level is possible. The basic idea is to try to
find complainees for objects in the base graph which do not have "somebody
to blame".
The system model is searched for associations between classes corresponding
to objects in the base graph, possibly via intermediate classes. The result
is an extended base graph and corresponding explanation graph as in Figure
4. Note that the conclusions of the fault isolation approach are
strengthened by the extension, even though the original explanation graph
already was connected.
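
The search for a complainee through the system model can be illustrated
with a breadth-first search along associations. The Python sketch below is
an assumption about how such a search could look, not the published
algorithm; the class names and the association table are hypothetical, and
package-level extension is not covered.

```python
from collections import deque


def find_complainee(start, associations, base_nodes):
    """Follow associations from `start` to the nearest other base graph class.

    associations: dict mapping a class to the classes it uses.
    Returns the path [start, ..., complainee], or None if no class in
    base_nodes can be reached.
    """
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        for nxt in associations.get(path[-1], ()):
            if nxt in visited:
                continue
            if nxt in base_nodes:
                return path + [nxt]
            visited.add(nxt)
            queue.append(path + [nxt])
    return None


# Hypothetical system model: io uses unit, unit uses bus, bus uses bus_driver.
associations = {
    "io": ["unit"],
    "unit": ["bus"],
    "bus": ["bus_driver"],
}
base_nodes = {"program", "io", "bus_driver"}

path = find_complainee("io", associations, base_nodes)
print(path)
```

The intermediate classes on the returned path are the ones that would be
added to the base graph, giving the object at the start of the path
somebody to blame.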
In this example the generation of the explanation graph is easy, but the
situation becomes more complicated if, e.g., IPC error messages are present
(communication errors between concurrent tasks). In those cases the base
graph consists of two parts, one based on the class diagrams and one based
on task diagrams, and the explanation
We have presented a scheme for fault isolation in object oriented control
systems. The method is based on the error messages in the error log, and
uses a UML model of the system to complete the explanation graph which
shows the cause-effect relationships between error messages. The strength
of the proposed approach does not lie in the number of error messages
handled in each fault scenario, but in the wide range of potential fault
scenarios handled by a general method.
The method outlined here has been implemented. The core of the fault
isolation layer consists of ca. 2000 lines of C++ code and is able to
access UML models developed in Rational Rose. The algorithms have been
tried on a set of real fault scenarios from the ABB Robotics industrial
robot control system. In the nine examples considered, the system was able
to pinpoint the primary error message in seven cases. In the remaining two
cases the error was caused in a subsystem that was not part of the UML
model; hence, there was no hope of pinpointing the error. In those cases,
the fault isolation tool returned several maximal error messages, but
offered a deepened insight into the fault scenario.
The system model used above captures the structure of the system. In our
future work we will examine the possibility to use a system model
containing also behavioral information. Naturally, a more detailed model
allows for more precise diagnosis, but it also poses problems in terms of
maintenance of the model and finding (in some sense) correct rules for
reasoning. Statecharts are included in the UML, and they are our present
candidate for a behavior system model. One way of performing reasoning
would then be to use a model checker.
References
G. Booch. Object-Oriented Analysis and Design: With Applications.
Benjamin/Cummings, 2nd edition, 1994.
T. DeMarco. Structured Analysis and System Specification. Prentice-Hall,
1979.
B.P. Douglass. Real-Time UML: Developing Efficient Objects for Embedded
Systems. Addison Wesley, 1998.
M. Larsson. Behavioral and Structural Model Based Approaches to Discrete
Diagnosis. PhD thesis 608, Department of Electrical Engineering, Linköping
University, Linköping, Sweden, 1999. Can be acquired at
http://control.isy.liu.se/publications/.
M. Larsson and P. Eriksson. Fault isolation
Model based fault isolation for object-oriented control systems. Technical
Report LiTH-ISY-R-2205, Dept. of Electrical Engineering, Linköping
University, 1999. Can be acquired at
http://control.isy.liu.se/publications/.
J. Rumbaugh, M. Blaha, W. Premerlani, F. Eddy, and W. Lorensen.
Object-Oriented Modeling