DIAGNOSTICS OF INTERMITTENT ERRORS


UPTEC Q 21010

Degree project, 30 credits, September 2021


Niklas Lindborg

Master's Programme in Engineering Physics with


Diagnostics of Intermittent Errors

Niklas Lindborg

ABSTRACT

Intermittent faults/errors are infamous for being among the most challenging errors to diagnose. It is estimated that more than 80% of the total number of errors in real systems are intermittent errors. Previous research on intermittent errors suggests that they are the prelude to permanent faults. There seems to be a vast knowledge gap in general regarding intermittent errors, both in academia and industry. The term "No Fault Found" might have ingrained a culture of acceptance regarding faults that intermittent errors might cause. This master thesis aims to develop a generic algorithm for diagnostics of intermittent errors that allows for the early isolation of failing sensors, especially at the end of their life spans. It is desirable that Scania can identify intermittent errors efficiently to save maintenance costs and keep customer satisfaction high. Multiple intermittent error detection and diagnostics methods have been produced and tested through simulations in MATLAB. The results suggest that the most important factors when introducing algorithms for intermittent error detection are the sensors' self-diagnostic capabilities and their communication protocol. The developed algorithms can be used for efficient fault isolation, obtaining valuable data for research, and triggering Diagnostic Trouble Codes (DTCs) when the impact of the errors is too significant, which allows for proactive replacement. If the algorithms are introduced as suggested in this master thesis, the knowledge gap can be filled. Consequently, Scania can use the increased knowledge to further improve the algorithms for better detection of intermittent errors and increase the overall performance of Scania vehicles.

Keywords

Intermittent errors, diagnostics, knowledge gap, fault isolation

Faculty of Science and Technology, Uppsala University. Place of publication: Uppsala/Visby

Supervisor: Professor (adj.) Ola Stenlåås. Subject reviewer: Professor Hugo Nguyen


Diagnostics of Intermittent Errors

Niklas Lindborg

Intermittent errors are defined as errors that "come and go" in a machine system during its lifetime, and they have a reputation for being among the most difficult errors to diagnose. Errors of an intermittent nature often exist undetected, even though it has been estimated that more than 80% of the total number of errors in components are intermittent errors. Previous research on intermittent errors suggests that, over time, they essentially always lead to permanent faults.

There also appears to be a large knowledge gap regarding the effect and system impact of intermittent errors, both in academia and in industry. Furthermore, the term "No Fault Found" may have created a culture of acceptance regarding component faults that intermittent errors may have caused.

This master thesis aims to develop a generic algorithm for diagnostics of intermittent errors. The algorithm should enable early identification of sensors that are about to fail, or of cases where the intermittent errors cause too large a system impact, which is especially important at the end of the sensors' life spans. It is desirable that Scania can efficiently identify components with intermittent errors in order to save maintenance costs and keep customer satisfaction high. Several intermittent error detection and diagnostics methods have been developed and tested using simulations in MATLAB and Simulink.

Three sensors were studied in this master thesis: the exhaust back pressure sensor, the high temperature sensor and the NOx sensor. The exhaust back pressure sensor was an analog sensor, while the high temperature sensor and the NOx sensor were digital sensors. In addition, all sensors had different communication protocols and self-diagnostic capabilities.

To develop the algorithm efficiently, all relevant diagnostics of the three sensors were mapped in order to determine which types of errors are not detected by today's diagnostics. This was done, among other things, by studying internal Scania documentation and by interviewing the engineers responsible for each specific sensor. The developed algorithms focused on diagnosing the types of errors that were not fully captured by today's diagnostics.

During the master thesis, three customers of the algorithm were identified, all with different requirements and wishes regarding what the algorithm should deliver. The first customer is the workshop operator, who wants the algorithm to give clear instructions on how the detected error should be repaired. The second customer is the development engineers at Scania. They want statistics and information from the algorithm that can be used to gain more knowledge about intermittent errors. That knowledge could be used to develop the algorithms and to make design changes in the engine or the sensors to reduce the occurrence of intermittent errors. The last customer of the algorithm is the legislative authorities. They want the algorithms to warn the driver of the truck if intermittent errors are found that may affect emissions, or if safety has been impaired. All of these customers were taken into account when the algorithms were developed.

The results suggest that the most important factors to consider when developing algorithms for intermittent error diagnostics are the sensor's self-diagnostics and communication protocol.

Furthermore, the results of the literature study suggest that the signal symptoms that intermittent errors can cause are peaks and dips, oscillation, offset, damping, oversensitive signal status degradation, no signal, or maximum/minimum signal. The causes of these symptoms vary between loose/glitchy contacts in the solder joint or the wiring, component ageing, oxidation, moisture, leakage, and contamination.

No single algorithm can detect all of these possible symptoms in the sensor signals. Therefore, five different detection methods were developed, each of which can detect different types of errors. Unfortunately, no detection methods were developed that could find intermittent offsets or damping.

If the algorithms are implemented in the way proposed in this master thesis, the knowledge gap can be filled and all customers of the algorithm will be satisfied. This is achieved through efficient fault isolation, collection of valuable information, and generation of fault codes if the impact of the intermittent errors is too large or if a sensor is about to fail. This would enable proactive repair or replacement of sensors that are about to fail.

Scania can use the collected information on intermittent errors to increase knowledge and further improve the algorithms for better detection of intermittent errors, which would result in increased performance for all Scania vehicles.

Keywords

Intermittent errors, diagnostics, knowledge gap, fault isolation

Degree project, 30 credits, in the Master's Programme in Engineering Physics with Materials Science

Uppsala University, September 2021


Master Thesis performed in collaboration with

Scania CV AB, Södertälje, Sweden


Acknowledgements

First of all, I want to thank my supervisor at Scania, Professor (adj.) Ola Stenlåås, who has been incredibly dedicated and supportive throughout the master thesis work. Our meetings have constantly provided me with idea-generating and insightful discussions. I am very grateful for the opportunity to perform my master thesis at Scania.

Secondly, I want to thank my master thesis colleagues, Daniel Rodríguez Pascal and Daniel Strandberg. They have made my time at Scania more enjoyable through, for example, excellent coffee-break chit-chat. However, most importantly, they have been a source of motivation and inspiration when the master thesis, at times, felt overwhelming.

Thirdly, I want to thank all of the Scania engineers I have talked to during the master thesis. The engineers that I have interviewed have shown interest in my work and have been helpful in any way they can. Without the help of the Scania engineers, I would not have been able to complete the master thesis.

Lastly, I want to thank Uppsala University, especially Professor Hugo Nguyen, my subject reviewer, and Associate Professor Åsa Kassman Rudolphi, my grading professor. Uppsala University has proven to be very pragmatic, considering the strange situation in the world with the Covid pandemic.


List of Contents

Chapter 1: Introduction

1.1 Background

1.2 Goal of the Thesis

1.3 Research Questions

1.4 Delimitations

1.5 Research Method

1.6 Challenges

Chapter 2: Investigation of the Existing Diagnostics

2.1 Exhaust Back Pressure Sensor

2.1.1 Capacitive Pressure Sensor

2.1.2 Piezoresistive Pressure Sensor

2.1.3 Voltage Range and Available Signals

2.2 High Temperature Sensor

2.2.1 Thermocouple Temperature Sensor

2.2.2 Digital Messages and Available Signals

2.3 NOx Sensor

2.3.1 Amperometric Sensors

2.3.2 Smart-Sensor and Signals Available

2.4 Intermittent Errors

2.5 Existing Diagnostics

2.5.1 Exhaust Back Pressure Sensor

2.5.2 High Temperature Sensor

2.5.3 NOx Sensor

2.6 Knowledge Gap

Chapter 3: Prerequisites for developing an IE detection algorithm

3.1 Demands on an IE Detection Algorithm

3.1.1 Workshop Operator

3.1.2 R&D Engineers at Scania

3.1.3 Legislation Requirements and Vehicle Safety

3.2 Analog, Digital and Smart Sensors

3.3 Multiple Monitoring Algorithms

3.4 Real-Time Processing

3.5 Test Conditions

3.6 DTC Threshold

Chapter 4: Applicable Methods for IE Detection

4.1 Identification of Intermittent Errors

4.1.1 Out-of-Range Method

4.1.2 Response Time Method

4.1.3 Spectral Analysis Method

4.1.5 Absent Signal Method

4.1.6 Signal Status Method

4.2 Short-term Storage of Data

4.2.1 Matrix Method

4.2.2 Counter Method

4.3 Diagnosis

4.3.1 Percentage Method

4.3.2 Signal Deviance Method

4.3.3 DTC Threshold

4.3.4 Operational Data

Chapter 5: Simulation Result and Discussion

5.1 Exhaust Back Pressure Sensor

5.2 High-Temperature Sensor

5.3 NOx Sensor

5.4 General Remarks

Chapter 6: Implementation of Algorithm

Chapter 7: Conclusions

References

Appendix


Recurring Abbreviations

| Abbreviation | Description | Section |
|---|---|---|
| CAN | Controller Area Network | 2.3.2, 3.2 and 4.1.5 |
| DPF | Diesel Particle Filter | 2.3.2 |
| DTC | Diagnostic Trouble Code | 1.1, 1.2, 1.3, 2.4, 2.5, 3.1, 3.6, 4.1, 4.3, 5.3, 5.4 and 6.1 |
| ECU | Electronic / Engine Control Unit | 1.2, 1.3, 2.2.2, 2.3.2, 2.4, 2.5.2, 3.4, 4.1.1, 4.1.5, 4.2 and 5.4 |
| EMS | Engine Management System | 2.1.3, 2.5, 2.5.2, 2.5.3, 4.3.2 and 5.4 |
| FMI | Fault Mode Indication | 2.5.3 |
| HTS | High Temperature Sensor | 1.4, 2.2, 3.2, 3.4, 3.5, 4.1, 4.3.2, 5.2 and 5.4 |
| IE/s | Intermittent Error/s | 1, 2, 3, 4, 5, 6 and 7 |
| NOx | Nitrogen Oxides | 1.4, 1.5, 2.3, 3.1.3, 3.2, 3.4, 3.5, 4.1, 4.3.1, 5 and 5.3 |
| OBD | On-Board Diagnostics | 1.5, 2.5.3 and 7 |
| VT1 | Vehicle Test 1 | 6.1 |
| VT2 | Vehicle Test 2 | 6.1 |
| SCR | Selective Catalytic Reduction | 2.3 and 2.3.2 |
| SCU | Sensor Control Unit | 2.2.1, 2.2.2 and 2.3.2 |
| YSZ | Yttria Stabilized Zirconia | 2.3.1 |


Chapter 1: Introduction

1.1 Background

Intermittent faults/errors (hereafter called intermittent errors, abbreviated to IEs) are infamous for being among the most challenging errors to diagnose. This type of error often recurs irregularly within the service intervals. The cause of IEs is typically degradation of the component or its electrical connections, which may lead to a total malfunction or breakdown of the component over time. [1] Today’s diagnostics in Scania engines can be improved for better IE detection.

If a Scania truck is operated with one or more engine management or exhaust gas treatment sensors that suffer from IEs, their incorrect output signals may affect the performance and function of the engine and the exhaust gas treatment system. One potential risk is that the intermittently malfunctioning sensors may trigger diagnostic trouble codes (hereafter abbreviated to DTC) with no specified error cause. Since the DTC is unspecific, the number of possible reasons for errors is large, and thus the maintenance staff may have difficulties identifying the faulty component. Moreover, a sensor that suffers from IEs may cause a cascade of DTCs in other sensors or actuators that depend on its output signal.

It is desirable that IEs can be identified efficiently to save maintenance costs and keep customer satisfaction high. Furthermore, Scania Research and Development can research the data obtained by the detected IEs to avoid IEs in next-generation engines.

The sensors of the Scania trucks operate within embedded systems. Consequently, the available memory space is minimal. The limited memory space entails careful balancing when designing monitoring algorithms. The algorithm should be simple to save memory space and sophisticated enough to carry out its designated task efficiently. Furthermore, only the most essential of monitored data may be stored for diagnostics.

1.2 Goal of the Thesis

The goal of this master thesis was to develop a generic algorithm for diagnostics of IEs that allows for early isolation of failing sensors, especially at the end of their life spans. Examples of IE root causes that are desirable to fault isolate are glitchy electrical contacts and clogged pipes or membranes.

If the generic algorithm, through the diagnostics of IEs, detects a sensor that is about to malfunction or break down, the triggered DTC is to be stored in the electronic control unit (hereafter abbreviated to ECU). The DTC should provide information allowing for fault isolation and information for appropriate measures to be taken in the workshop, i.e., repair, cleaning, or replacement of sensor, pipe, membrane, or electrical contacts/wires.


1.3 Research Questions

RQ1) Identify the typical output signal behaviour of sensors that suffer from electrical/mechanical/software IEs. What are the similarities and differences? IEs make their presence known through the absence or the deviation of sensor output data.

RQ2) Find and develop an algorithmic method for precise and efficient detection of absent or deviating sensor output signals. Which method has the highest detection rate, and which has the fewest false positives?

RQ3) Describe and determine the best threshold value for when the IE DTC should be triggered. Which aspect is most important: sensor lifespan, cost efficiency, environmental sustainability or customer satisfaction? Sensors with IEs may function within the tolerances of the system until a certain threshold.

RQ4) Select a categorization of IEs that allows for easy fault isolation and cost-effective repair/replacement of sensors or electrical contacts with IEs at the workshop. Which aspect is most important: measures taken at the workshop, root cause or output signal behaviour?

RQ5) Determine the information storage solution for detected IEs, diagnosis data, fault isolation data and fault codes. What benefits are gained from the stored information considering the memory required? The available memory in the ECU is very limited.

1.4 Delimitations

This master thesis analysed the occurrence of IEs in the sensors of the Scania engine and exhaust gas treatment system. The project was limited to the exhaust back pressure sensor, the high temperature sensor (hereafter abbreviated to HTS) and the Nitrogen Oxides (hereafter abbreviated to NOx) sensor. Moreover, the generic algorithm only monitors the data available in the output signal from the sensors. No redundancy-based or model-based approaches were used.

1.5 Research Method

The first four weeks at Scania were devoted to a literature survey to gain a broad understanding of the subject at hand. The topics researched in the literature survey include general information on the NOx sensor, methods for algorithm development for signal pattern detection, and various engine control methods. The objective of the literature survey was to have enough information to set the final quantitative goal of the master thesis and to obtain most of the theory needed regarding IEs.

Much effort was put into the development of the generic algorithm. The generic algorithm code is to be stored in the on-board diagnostics (hereafter abbreviated to OBD) hardware with limited memory space. Therefore, the generic algorithm should be simple without sacrificing too much precision and effectiveness.

Meetings with Scania engineers, who have deep knowledge of the examined sensors, were planned for discussion regarding the development of the generic algorithm and to gain information regarding the sensors that were beneficial to the project.

When the algorithms were to be tested, there were two ways to acquire the test data. The first alternative was to collect the data containing IEs from Scania’s internal database, where statistics and real-life data from customer truck drives are available. The second alternative was to manipulate error-free data to mimic data containing IEs.

When the algorithm trials had been conducted, the results were evaluated. The purpose of the evaluation was to identify the areas for improvement. When improvements had been implemented, new algorithm trials were carried out. The iterations were repeated until the algorithms were sufficiently effective.

Additional literature study was carried out when needed. Starting from project week four, when the final goal of the master thesis was set, the report was written and updated continuously.

1.6 Challenges

The challenges identified in the initial stage of the project are listed below.

1. Detection of all potential IEs.

2. Overcoming the hardware limitations (e.g. available memory storage).

3. Verifying that the identified IEs are true errors.

4. Storage of time-resolved fault codes, diagnostics data and fault isolation data of the sensors.

5. Accurate fault isolation.

6. Obtaining operational data containing IEs, manipulating data to mimic IEs or accurately provoking IEs in the test cell.

Chapter 2: Investigation of the Existing Diagnostics

2.1 Exhaust Back Pressure Sensor

The exhaust back pressure sensor measures the pressure of the exhaust gas at the exhaust gas collector. The information is used, for example, by the boost system of the engine. It can also be used to detect engine errors: if the measured value is too low for some time, this suggests leakage in the exhaust gas manifold.
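The idea of flagging a value that stays too low "for some time" can be illustrated with a minimal Python sketch. The function name, the threshold and the sample values are hypothetical and are not taken from Scania's diagnostics; the sketch only shows one simple way to require persistence before reacting.

```python
def low_pressure_persists(samples_bar, threshold_bar, min_consecutive):
    """Return True if the pressure stays below threshold_bar for at least
    min_consecutive samples in a row (hypothetical persistence check)."""
    run = 0
    for p in samples_bar:
        # Extend the current run of low samples, or reset it.
        run = run + 1 if p < threshold_bar else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical readings in bar: one short dip and one sustained low period.
readings = [1.8, 1.7, 0.9, 1.8, 0.8, 0.7, 0.8, 0.9, 1.8]
leak_suspected = low_pressure_persists(readings, threshold_bar=1.0, min_consecutive=3)
```

Requiring several consecutive low samples keeps a single short dip from being interpreted as a leak.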


2.1.1 Capacitive Pressure Sensor

One common pressure sensor is of the capacitive type. The capacitive pressure sensor works by measuring the capacitance of two capacitor plates separated by a small gap. The medium in the gap can be of various materials such as air or oil. The dielectric constant of the medium influences the overall capacitance potential of the sensor, which in turn affects the sensor sensitivity. A higher dielectric constant of the medium results in a higher capacitance potential, and a higher capacitance potential increases the sensor sensitivity.

The diaphragm changes position, with respect to the capacitor plates, as a function of the pressure. The diaphragm position affects the capacitance, which is measured. Therefore, by measuring the capacitance, the pressure can be calculated. Eq. (1) describes how the variables affect the capacitance. [2]

$$C = \epsilon_r \epsilon_0 \frac{A}{d} \tag{1}$$

In equation (1), $C$ is the capacitance, $\epsilon_r$ is the dielectric constant of the medium between the capacitor plates, $\epsilon_0$ is the electric constant, $A$ is the area of the capacitor plates and $d$ is the distance between the capacitor plates. See Figure 1 for a simple sketch of a capacitive pressure sensor.

Figure 1. Illustration of a simplified design of capacitive pressure sensors. In modern capacitive pressure sensors, the design can be more advanced. The figure above is inspired by the works of Avnet. [2]
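As a worked illustration of Eq. (1), the following Python sketch evaluates the capacitance for two plate gaps and inverts the relation to recover the gap from a measured capacitance. The plate area, gap sizes and dielectric constant are hypothetical values chosen for illustration only, not data from the sensors in this thesis.

```python
EPSILON_0 = 8.8541878128e-12  # electric constant in F/m


def capacitance(eps_r, area_m2, gap_m):
    """Parallel-plate capacitance according to Eq. (1)."""
    return eps_r * EPSILON_0 * area_m2 / gap_m


def gap_from_capacitance(eps_r, area_m2, c_farad):
    """Eq. (1) solved for the plate distance d."""
    return eps_r * EPSILON_0 * area_m2 / c_farad


# Hypothetical geometry: 1 cm^2 plates with an air gap (eps_r assumed to be 1).
c_rest = capacitance(1.0, 1e-4, 10e-6)     # gap of 10 micrometres
c_pressed = capacitance(1.0, 1e-4, 8e-6)   # pressure has narrowed the gap
recovered_gap = gap_from_capacitance(1.0, 1e-4, c_rest)
```

A smaller gap gives a larger capacitance, which is the effect the sensor uses to infer the pressure.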

Compared with other types of pressure sensors, the advantages of capacitive pressure sensors are their low noise ratio and low power usage. Capacitive pressure sensors are temperature stable, and self-calibration is possible. However, capacitive sensors typically have a low signal; therefore, some signal amplification is usually needed. Otherwise, background signals might cause too much disturbance. [3]


2.1.2 Piezoresistive Pressure Sensor

Another common pressure sensor is the piezoresistive type. The piezoresistive pressure sensor works by detecting differences in the resistance of a sensor element. See Figure 2 for a simple sketch of a piezoresistive pressure sensor. If the pressure increases, the diaphragm applies a mechanical force upon the piezoresistive material. The piezoresistive material is stretched, which causes a shift in its electrical resistivity. The resistance change is measured and is directly correlated to the magnitude of the pressure. Therefore, by knowing the resistance, the pressure can be calculated.

Figure 2. Sketch of the basic design of piezoresistive pressure sensors. The sketch is inspired by the work of Avnet. [2]

The advantages of piezoresistive pressure sensors compared with other types of pressure sensors are their simple fabrication, strong signal, and simple interface circuits; the changes in resistance of the piezoresistive material are easily captured as voltage changes using a Wheatstone bridge. However, some drawbacks are that piezoresistive pressure sensors are typically temperature-sensitive and usually exhibit high thermal noise. [3]
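How a Wheatstone bridge turns a resistance change into a voltage can be sketched as follows. The sketch assumes a quarter-bridge configuration with three equal fixed resistors and one sensing element; this configuration and all numeric values are assumptions for illustration, since the bridge layout of the actual sensors is not described here.

```python
def bridge_output(v_excitation, r_sensor, r_fixed):
    """Output voltage of a quarter bridge: one divider holds the sensing
    element and a fixed resistor, the other holds two fixed resistors."""
    sensing_divider = r_sensor / (r_sensor + r_fixed)
    reference_divider = 0.5  # two equal fixed resistors
    return v_excitation * (sensing_divider - reference_divider)


# Hypothetical values: 5 V excitation, 1 kOhm resistors, 1 % resistance change.
v_balanced = bridge_output(5.0, 1000.0, 1000.0)
v_strained = bridge_output(5.0, 1010.0, 1000.0)
```

With a balanced bridge the output is zero, so even a small resistance change appears as a voltage that is easy to amplify.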

2.1.3 Voltage Range and Available Signals

The pressure sensors that are treated in this master thesis are analog. The voltage range is between LVL_ps (Lower Voltage Limit, Pressure Sensor) and UVL_ps (Upper Voltage Limit, Pressure Sensor) for both sensors. LVL_ps corresponds to p_min, while UVL_ps corresponds to p_max. See Figure 3 below for a graph that illustrates the voltage range.

Figure 3. Output voltage of the pressure sensor for different pressures. The graph above was inspired by K. Reif’s work. [4]
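The mapping between output voltage and pressure can be sketched in Python under the assumption that the characteristic between (LVL_ps, p_min) and (UVL_ps, p_max) is linear. The linearity, the function name and the numeric limits below are assumptions for illustration; the real limits are not disclosed in this thesis.

```python
def voltage_to_pressure(v, lvl_ps, uvl_ps, p_min, p_max):
    """Map a sensor voltage to pressure, assuming a linear characteristic.
    Returns None when the voltage lies outside the valid range."""
    if not lvl_ps <= v <= uvl_ps:
        return None
    fraction = (v - lvl_ps) / (uvl_ps - lvl_ps)
    return p_min + fraction * (p_max - p_min)


# Hypothetical limits: 0.5 V to 4.5 V corresponding to 0 bar to 4 bar.
p_mid = voltage_to_pressure(2.5, 0.5, 4.5, 0.0, 4.0)
p_invalid = voltage_to_pressure(4.8, 0.5, 4.5, 0.0, 4.0)
```

Returning None for voltages outside the range keeps invalid readings from being silently converted into a pressure value.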

The frequency of voltage sampling in the analog signal is ‘A’ Hz. The ‘A’ Hz signal is then subjected to a moving average with a ‘C’ ms window. The value of the moving average is sampled with a frequency of ‘B’ Hz. The ‘B’ Hz signal is then translated to bar, and those values are used by the engine management system (hereafter abbreviated to EMS). See Figure 4 for a visualization of the signals.

Figure 4. Block diagram of the available analog signals for diagnostics.

All of the signals, meaning the ‘A’ Hz signal in volts, the ‘B’ Hz signal in volts, and the ‘B’ Hz signal in bar, are available for use in the detection algorithm.
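The signal chain described above (fast sampling, moving average, slower resampling) can be sketched as follows. The window length and the decimation factor are hypothetical stand-ins for the undisclosed ‘A’, ‘B’ and ‘C’ values, and the sketch is not the EMS implementation.

```python
def moving_average(samples, window):
    """Causal moving average; the first outputs use the samples seen so far."""
    out = []
    total = 0.0
    for i, x in enumerate(samples):
        total += x
        if i >= window:
            total -= samples[i - window]  # drop the sample leaving the window
        out.append(total / min(i + 1, window))
    return out


def decimate(samples, factor):
    """Keep every factor-th sample to obtain the slower signal."""
    return samples[::factor]


# Hypothetical fast signal in volts with a single one-sample peak.
raw = [2.0, 2.0, 2.0, 5.0, 2.0, 2.0, 2.0, 2.0]
smoothed = moving_average(raw, window=4)
slow = decimate(smoothed, factor=4)
```

In this example the 5.0 V peak is attenuated to 2.75 V by the averaging, which illustrates why short peaks may be easier to see in the fast signal than in the averaged one.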

2.2 High Temperature Sensor

2.2.1 Thermocouple Temperature Sensor

The high-temperature sensor (hereafter abbreviated to HTS) treated in this master thesis is of the thermocouple type. Its measurements are used to regulate the heat of the exhaust after-treatment system. The HTS can have one to four probes connected to the sensor control unit (hereafter abbreviated to SCU), meaning that one SCU can measure the temperature of up to four different components simultaneously.


The thermocouple sensor works by having two different metals connected in a circuit. One end of the circuit is called the hot side, which is where the temperature is measured. In contrast, the other end of the circuit is called the cold side, where the temperature is known.

On the hot side, the metal wires are in electrical contact, while on the cold side, they are connected to a voltage measurement device. When the wires are exposed to a temperature gradient, a thermoelectric voltage arises because of the Seebeck effect. Because the wires are electrically connected at the hot side, they have the same voltage potential. However, there is a voltage difference between the wires on the cold side, which is measured. The measured voltage is directly correlated to the temperature difference between the hot and cold sides of the circuit. [5]

The HTS obtains the temperature difference between the cold and hot sides from the voltage over the wires. Since the temperature on the cold side is known, the temperature on the hot side is easily calculated.
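The calculation can be sketched in Python under the simplifying assumption of a constant Seebeck coefficient, so that the voltage is proportional to the temperature difference. Real thermocouples are nonlinear and are normally evaluated with reference tables; the coefficient and the readings below are hypothetical.

```python
def hot_side_temperature(v_measured, t_cold_c, seebeck_v_per_k):
    """Hot-side temperature from the measured thermoelectric voltage,
    assuming a constant (linearised) Seebeck coefficient."""
    delta_t = v_measured / seebeck_v_per_k
    return t_cold_c + delta_t


# Hypothetical values: 20.5 mV measured, cold side at 25 degrees C,
# assumed sensitivity of 41 microvolts per kelvin.
t_hot = hot_side_temperature(0.0205, 25.0, 41e-6)
```

With these assumed values, the temperature difference is about 500 K, which places the hot side at about 525 °C.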

2.2.2 Digital Messages and Available Signals

The signals available for monitoring from the HTS follow the Single Edge Nibble Transmission (hereafter abbreviated to SENT) protocol. See Figure 5 for a sketch of a typical SENT protocol signal.

Figure 5. Sketch of what a typical signal contains that is following the SENT protocol. The Figure is inspired by the work of Integrated Device Technology [6].

The SENT message is first synchronized by a synchronization pulse of predefined length. The second pulse, labelled status & communication in Figure 5, has 4 bits. The first two bits communicate which probe is going to send its message. The last two bits convey secondary information, such as fault modes like electronics errors or supply voltage failure. The last two bits are referred to as the slow channel because multiple SENT protocol messages are needed to transfer one piece of information; after all, only 2 bits are transferred per message.

After the information in the slow channel has been broadcast, the temperature of the probe is transferred. This nibble (which contains the temperature information) is called the fast channel because the whole piece of information is transmitted in one message. After that comes a checksum pulse, whose purpose is to verify that the information was transferred accurately. Lastly, there is a pause pulse that delays the next signal so that the messages are sent at an even tempo. [6, 7]
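The split of the status & communication nibble into a probe index and slow-channel bits can be sketched as below. The bit order (the "first two bits" taken as the two most significant bits) and the way slow-channel bits are concatenated are assumptions made for illustration; the sketch does not reproduce the SENT specification or the sensor supplier's message layout.

```python
def decode_status_nibble(nibble):
    """Split a 4-bit status nibble into (probe_index, slow_channel_bits).
    Assumes the first two bits are the most significant ones."""
    if not 0 <= nibble <= 0xF:
        raise ValueError("a nibble must be in the range 0..15")
    probe_index = (nibble >> 2) & 0b11
    slow_bits = nibble & 0b11
    return probe_index, slow_bits


def assemble_slow_value(nibbles):
    """Concatenate the 2 slow-channel bits of consecutive messages,
    earliest message first, into one integer."""
    value = 0
    for nibble in nibbles:
        _, slow_bits = decode_status_nibble(nibble)
        value = (value << 2) | slow_bits
    return value


# Four hypothetical messages from probe 1 carry one 8-bit slow-channel value.
messages = [0b0110, 0b0101, 0b0111, 0b0100]
probe, first_bits = decode_status_nibble(messages[0])
slow_value = assemble_slow_value(messages)
```

The example shows why the channel is slow: an 8-bit value needs four complete messages before it can be read.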


Up to four probes can communicate using the SENT protocol. The probes take turns broadcasting their information. One SENT message has a transfer time of ‘D’ ms. Therefore, each probe sends its temperature every ‘E’ ms. If one or more probes are missing, a dummy message is sent in its/their place.

Consequently, the temperature is sent with a frequency of ‘F’ Hz at all times, regardless of whether one or four probes are connected to the SCU. The messages in the SENT protocol are not in decimal but in hexadecimal (also called base 16). Furthermore, in the SENT protocol, messages only go from the sensor to the ECU, meaning that the ECU cannot send messages to the sensor.
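The turn-taking with dummy messages and the hexadecimal encoding can be sketched as follows. The sketch assumes that the cycle always contains four slots, so that a probe's period is four message times; this relation is an interpretation of the text, and the message time and the hexadecimal string are hypothetical placeholders for the undisclosed values.

```python
def slot_schedule(connected_probes, slots=4):
    """One round of the assumed cycle: a connected probe sends in its slot,
    a missing probe is replaced by a dummy message."""
    return [slot if slot in connected_probes else "dummy" for slot in range(slots)]


def probe_period_ms(message_time_ms, slots=4):
    """Time between two messages of the same probe when every slot,
    dummy or not, takes one message time."""
    return slots * message_time_ms


def hex_to_decimal(hex_text):
    """Convert a hexadecimal (base 16) string to an integer."""
    return int(hex_text, 16)


schedule = slot_schedule({0, 2})          # probes 1 and 3 are missing
period = probe_period_ms(1.0)             # hypothetical 1 ms per message
raw_value = hex_to_decimal("1A3")         # hypothetical payload
```

Because the dummy messages keep the cycle length fixed, the timing seen by a monitoring algorithm does not depend on the number of connected probes.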

2.3 NOx Sensor

NOx is a group of molecules, including NO and NO₂. NOx gases are one of the by-products of the combustion of diesel in the engine. The emission of NOx gases is harmful to the environment and poisonous to humans and animals. It is estimated that 23 500 people in Britain die each year due to complications caused by NOx gases. Some complications that NOx can cause are headaches, chronically reduced lung function, and eye irritation. Furthermore, NOx gases contribute to acid rain and suffocating smog. It is therefore desirable to minimise NOx gas emissions. [8] The Euro VI standard limits the NOx gas emissions of trucks to 0.4 g/kWh. [9]

There are typically two NOx sensors in a Euro VI truck, one upstream and one downstream of the selective catalytic reduction (hereafter abbreviated to SCR) catalyst. The upstream NOx sensor is used to measure the influx of NOx in the exhaust gas from the engine, and the downstream NOx sensor is used to measure the NOx content of the emission gas. The output signal from these sensors is used to make adjustments in the SCR. For example, the amount of added Diesel Exhaust Fluid is adjusted based on the values from these sensors.

2.3.1 Amperometric Sensors

The NOx sensors treated in this master thesis are of the amperometric type. In amperometric sensors, a current is measured. The measured current is directly correlated with the quantity of the variable that is to be detected.

When the exhaust gas enters the NOx sensor, there are one or two pre-chambers. The task of the pre-chambers is to remove all oxygen molecules from the gas. The oxygen is removed by applying a voltage over the “roof” of the chambers. The oxygen molecule is then split into oxygen ions that are conducted through the “roof” of the chambers. The “roof” of the chamber is a membrane that is made of yttria-stabilized zirconia (hereafter abbreviated to YSZ). The YSZ is a material that allows the oxygen ions to pass without too much resistance. [10, 11]

In the last chamber, there is a catalyst that splits the NO into oxygen and nitrogen ions. An applied voltage causes the oxygen ions to be conducted through the roof of the last chamber, which is also made of YSZ. This current is measured and converted to the NO content of the exhaust gas in ppm. The NO concentration is calculated by knowing that all of the conducted oxygen ions come from an NO molecule. See Figure 6 for a simplified sketch of the NOx sensor.

Figure 6. Simplified sketch illustrating how the NOx sensor functions. The picture is inspired by the works of Diesel Net Technology Guide. [11]

2.3.2 Smart-Sensor and Signals Available

The NOx sensor is the most complicated of the three sensors that are examined in this master thesis. The NOx concentration of the exhaust gas depends on many factors, such as the general health of the engine, the DPF, SCR, and urea dosing actuator.

Moreover, the NOx SCU has an internal sensor interface that regulates all of the signals going from the NOx sensor element. These signals are not available for monitoring in any external control unit algorithms because the supplier of the sensor implements them. See Figure 7 for a block diagram that visualizes the available signals.

Figure 7. Illustration of the flow of the signals within the sensor and of the signals going to the ECU. [12]

The signal going from the Sensor Interface to the Control Module, where the monitoring algorithm is located, is transmitted using a controller area network (hereafter abbreviated to CAN) bus. The CAN bus is a messaging protocol that allows multiple sensors and actuators to communicate without a host computer. The main difference between the CAN bus messaging protocol and the SENT messaging protocol is that numerous components are messaging the ECU simultaneously, and messages can go in both directions. Furthermore, because multiple components transmit information simultaneously, there is a priority order, meaning that the component with the highest priority broadcasts its message first while the other components wait. After the component with the highest priority is done with its message, it is the turn of the component with the next highest priority to broadcast its message. This goes on until all of the components have sent their messages, and then it starts over.

Some values might be delayed because of the priority system if many components have a higher priority in a row. The priority system needs to be considered when implementing the monitoring algorithm. [13]
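The priority order can be sketched with a small priority queue. On a CAN bus, the frame with the numerically lowest identifier wins arbitration; the identifiers and component names below are hypothetical, and the sketch ignores frame timing and retransmissions.

```python
import heapq


def transmission_order(pending):
    """Return component names in the order they get bus access.
    pending is a list of (can_id, name); the lowest identifier goes first."""
    heap = list(pending)
    heapq.heapify(heap)
    order = []
    while heap:
        _, name = heapq.heappop(heap)
        order.append(name)
    return order


# Hypothetical components that all want to transmit at the same time.
pending = [(0x200, "nox_sensor"), (0x100, "engine"), (0x300, "dashboard")]
order = transmission_order(pending)
nox_wait_slots = order.index("nox_sensor")  # messages sent before the NOx sensor
```

The position of a component in this order indicates how many messages it has to wait for, which is the delay a monitoring algorithm would need to tolerate.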

The information on the CAN bus contains multiple values that are available for the monitoring algorithm: the NOx concentration, the O₂ concentration, the O₂ reading stability, the NOx reading stability, and various other information. The CAN bus signal going from the Sensor Interface is sent with a ‘G’ Hz frequency. [12]

2.4 Intermittent Errors

Some common root causes of IEs are tabulated below. [14] The IEs listed in the tables are generic and apply to all sensors. For specific sensors, there may be unique root causes of IEs that only appear in that sensor, for example clogging of sensor connector pipes. [15]

In the literature, multiple reasons for the root causes are mentioned. The explanations vary between degradation of the sensor [1, 16, 17], stressed resources [18] and malfunctioning joints [19].

Table 1. Electrical root causes of IEs and their potential effect on the output signal of the sensor.

Root Cause [14] | Potential Output Signal
Loose Connections (Solder Joint, Wiring) | No signal or Maximum/Minimum Voltage Signal
Component Ageing | Peaks or Oscillations
Oxidation | Peaks or Oscillations
Moisture | Peaks or Oscillations

Table 2. Mechanical root causes of IEs and their potential effect on the output signal of the sensor.

Root cause [14] | Potential output signal
Leakage | Negative peaks or Oscillations
Contamination (example: clogging) | Peaks, Oscillations or Damping
Ageing | Peaks, Oscillations or Damping, Oversensitive Signal Status degradation

It is estimated that in real systems, more than 80% of the total number of errors are IEs. [20] Moreover, IEs seem to occur at random or in a non-deterministic way. [21] Typically, IEs caused by loose connections co-occur with external circumstances such as temperature, humidity, power fluctuations, and vibrations. [22] See Figure 8 for a sketch illustrating how IEs typically develop over time. [1]


Figure 8. Illustration of how a component may degrade over time. The graph is divided into a Fault Free, Intermittent Fault and Hard Fault Region. The graph is inspired by Wakil Ahmad Syed’s work. [1]

At the beginning of IE occurrence, there are no symptoms in the output signal of the sensor until a certain threshold is reached; see point A in Figure 8. After this threshold, the IEs make their presence known through signal deviance. When the sensor degradation reaches point B in Figure 8, the symptoms are no longer IEs but hard/permanent faults in the sensor or its electrical contacts. Between points A and B, the nature of the IEs changes and their severity increases as the IEs evolve. See Figure 9 for a sketch illustrating this phenomenon. [19]

Figure 9. Illustration of how an intermittent error may progress over time. The blue line in the sketch indicates the limit at which the signal deviances resulting from IEs become detectable. Stage 0 corresponds to the Fault Free Region in Figure 8. The error in Stage 4 is not classified as an IE because the current diagnostic is triggered. The figure is inspired by the work of Józef Stoklosa et al. [19]


The theory presented in Figure 8 and Figure 9 applies to all types of IEs. However, the amplitude on the y-axis of the graph in Figure 9 is interpreted differently depending on the IE type. For example, if the IE causes the signal to oscillate, the amplitude in Figure 9 corresponds to the strength of the signal's frequency content in dB.

Stage 0 in Figure 9 is the fault-free region in Figure 8. In this stage, nothing in the output signal suggests IE occurrence, even though the underlying degradation is present. Therefore, it is impossible to detect IEs at this stage. However, as the IEs develop, which they do over time [1, 16], the severity of the impact increases. In stages 1, 2, and 3, there is signal deviance because of the IEs, as shown in Figure 9. Stages 1, 2, and 3 correspond to the curve between points A and B in Figure 8. Stage 1 is close to point A. In stage 1, the signal deviance caused by the IEs is still not detectable. Therefore, it is impossible to identify that a sensor is suffering from IEs at stage 1.

Stage 2 is around the middle between points A and B. When the IEs are in this stage of their development, some signal deviance is detectable. Consequently, a monitoring algorithm can find IEs at stage 2. Stage 3 is close to point B. When IEs have developed to stage 3, signal deviance is noticeable and might cause (unspecified) DTCs and large subsystem impact. Stage 4 is the hard fault region. At this stage, the errors are no longer intermittent but permanent, and the sensor is completely malfunctioning. There is consensus in the literature that IEs are the prelude to permanent faults [1, 16, 18].

At stages 1 and 2, the output signal deviance may still be within the system’s tolerances. This means that even though there is deviance in the sensor's output signal, it is small enough that the impact on the subsystem is minimal and can be regarded as non-existent.

Therefore, it is not justifiable to repair or replace the sensor or its electrical contacts at that point in the degradation. Replacing the sensor at that point only decreases the average sensor lifespan and the truck uptime and increases the cost of ownership and the environmental impact, with minimal or no improvement in engine performance.

However, at some point in the degradation process of the sensor or its electrical contacts, the IEs will cause output signal deviations that generate DTCs or hard/permanent faults, i.e., at Stage 3 and Stage 4, respectively, in Figure 9. The generated DTCs or hard/permanent faults are attributed either to the sensor where the actual degradation occurs or to other sensors/actuators of the subsystem that use the intermittently faulty sensor's output signal as an input signal.

For IE detection, the algorithm should monitor the output signals of the sensor. As stated in Table 1 and Table 2, the most common signal behaviours when IEs are present are no signal, maximum/minimum voltage signal, peaks, damping, frequent signal status degradation, and oscillations. Therefore, when those signal behaviours are detected, especially if they are observed intermittently, the sensor is most likely suffering from IEs.


When IEs have been detected, the diagnostics of the errors should be based on their subsystem impact or occurrence. If a trend shows that the severity of the detected IEs is close to the point of generating DTCs or hard/permanent faults, i.e., Stage 3 and Stage 4 in Figure 9, the diagnostics of the IEs should trigger a DTC so that the truck operator has some time to schedule a workshop visit for proactive repair or replacement of the sensor or its electrical connectors.

As stated in Table 1 and Table 2, many root causes of intermittent errors make their presence known by similar output signal behavior. Therefore, it may be challenging to develop an exact fault isolation algorithm. Moreover, to enable fault isolation of every possible IE root cause, each root cause must have its own monitor, which requires a lot of memory space.

However, exact IE fault isolation is unnecessary because only the fact that the sensor is malfunctioning is of interest. After all, the action taken at the workshop is the same for all malfunctioning sensors: the whole component is replaced. Therefore, advanced fault isolation costs more than it is worth. One exception is when the cabling from the sensor to the ECU has a glitchy contact. In that case, fault isolation of the faulty cabling allows for easy and cost-efficient repair.

2.5 Existing Diagnostics

When designing an algorithm that is to be implemented in a product that already has existing diagnostic routines, such as the Scania engine, the new algorithm has to be compatible with what already exists. Therefore, it is essential to analyse the existing diagnostic routine to identify errors that are not detected today.

Generally, all of the existing diagnostics follow the same approach. The algorithm monitors a signal, and when it detects what it has been designed to find, it stores the value in the short-term memory. The algorithm is active over a predetermined time, called the loop time, and then it starts over. After every loop, the detected entity is treated and compared with a threshold. The treatment varies from algorithm to algorithm, but usually, it is the mean, standard deviation, percentage, or total sum of the detected entity.

If the treated detected entity is above the threshold, a DTC is triggered. If the treated detected entity is below the threshold, there are multiple standard courses of action: to disregard the number that is compared with the threshold (i.e., the number is deleted, not used in any function, and not saved), to save the number in the long-term memory (hereafter referred to as operational data), or to use it for calibration.
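The generic approach described above (collect during one loop, treat, compare with a threshold) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the choice of the mean as treatment, the threshold, and the sample values are hypothetical and do not come from the existing diagnostics.

```python
from statistics import mean


def evaluate_loop(detections, threshold, treat=mean):
    """Treat the entities detected during one loop and compare with a threshold.

    Returns ("DTC", value) if the treated value is above the threshold,
    otherwise ("STORE", value), meaning the value may be kept as operational data.
    """
    value = treat(detections) if detections else 0.0
    return ("DTC", value) if value > threshold else ("STORE", value)


operational_data = []  # stands in for the long-term memory
for loop_samples in ([0.1, 0.2, 0.1], [0.9, 1.1, 1.0]):  # hypothetical detected values
    action, value = evaluate_loop(loop_samples, threshold=0.5)
    if action == "STORE":
        operational_data.append(value)
    print(action, round(value, 3))
```

Passing the treatment as a parameter reflects that it differs between algorithms; `sum` or `statistics.pstdev` could be passed instead of the mean.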

If a DTC is triggered, it comes with a severity tag. [23] Some errors do not pose a hazard. Therefore, the truck can continue operation even though there is a fault present. In those cases, the fault is repaired during the next workshop visit. Some errors require immediate attention because the EMS cannot guarantee that legislative requirements are met, or because engine damage is possible, while these types of DTC are active. In those cases, the truck operator is informed to visit the workshop as soon as possible. The last kind of DTC is the most severe. If those errors are detected, they pose a risk to the safety of the truck operator. This can be the case if, for example, sensors within the steering wheel or brake system are broken. In those cases, the truck is put under a speed limit or has to request towing.

In the cases where data is stored as operational data, the information can be accessed by workshop personnel at all times. Furthermore, during workshop visits, some of the operational data is uploaded to a Scania server, which enables the R&D engineers at Scania to research the data for development purposes. The operational data also fulfils another role: workshop personnel can examine it for fault isolation if a truck has an unspecified DTC.

2.5.1 Exhaust Back Pressure Sensor

There is one already existing diagnosis identified as relevant to the exhaust back pressure sensor study. The diagnosis algorithm triggers a DTC if, within ‘I’ sec, ‘H’ different ‘C’ ms loops each contain at least one sample outside the voltage range (the voltage range is explained in 2.1.3). However, if a DTC is triggered but no sample is out of range for ‘J’ sec, the DTC is invalidated. When the DTC is invalidated, the information is saved in the operational data.
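One possible reading of this trigger and invalidation logic is sketched below in Python. It is not the existing Scania implementation: the function and parameter names are hypothetical stand-ins for the undisclosed ‘C’, ‘H’, ‘I’, and ‘J’, and the numeric values in the example are invented for illustration.

```python
def back_pressure_dtc_states(loop_has_oor, loop_ms, h_loops, i_sec, j_sec):
    """Return the DTC state (True = active) after each monitoring loop.

    loop_has_oor: list of booleans, True if the loop held an out-of-range sample.
    loop_ms, h_loops, i_sec, j_sec: hypothetical stand-ins for 'C', 'H', 'I', 'J'.
    """
    window = max(1, round(i_sec * 1000 / loop_ms))       # loops spanned by 'I' sec
    clear_after = max(1, round(j_sec * 1000 / loop_ms))  # loops spanned by 'J' sec
    history, clean_run, active, states = [], 0, False, []
    for has_oor in loop_has_oor:
        history = (history + [has_oor])[-window:]
        clean_run = 0 if has_oor else clean_run + 1
        if has_oor and sum(history) >= h_loops:
            active = True    # enough faulty loops inside the time window
        elif active and clean_run >= clear_after:
            active = False   # no out-of-range sample for 'J' sec: invalidate
        states.append(active)
    return states


# Hypothetical values: 100 ms loops, 2 faulty loops within 0.5 s, clear after 0.3 s.
print(back_pressure_dtc_states([True, False, True, False, False, False, False],
                               loop_ms=100, h_loops=2, i_sec=0.5, j_sec=0.3))
# [False, False, True, True, True, False, False]
```

With these example values, a single isolated out-of-range loop never activates the DTC, which is the kind of short event the existing routine does not report.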

If the number of past DTCs in the operational data is high, it indicates the occurrence of IEs. However, the current diagnosis leaves a window for specific IE data to go undetected. Furthermore, the current diagnosis does not capture the magnitude of the signal deviance. Consequently, the subsystem impact due to IEs is unknown. See Figure 10 for a visualization of the IEs whose full extent is not captured. [24]

Figure 10. Illustration of the type of IE whose full extent is not captured with today’s existing diagnostic routine for the exhaust manifold pressure sensor.

2.5.2 High Temperature Sensor

For the HTS, there is one already existing diagnosis identified. The current diagnosis is based on the sensor's self-diagnostics. The self-diagnostics in the sensor can detect multiple types of internal errors, such as overflow in the data processing, the common-mode voltage of a probe being out of range, and an insulation resistance lower than the limit. If any errors are detected, the SENT signal going from the sensor to the ECU contains the error code ‘K’ instead of the temperature. If ‘K’ is sent consecutively for ‘L’ sec, a DTC is triggered. Furthermore, if no ‘K’ code has been sent from the sensor for ‘L’ sec, the DTC is invalidated. See Figure 11 for a visualization of the IEs that are not detected.

Figure 11. Illustration of the type of IE whose full extent is not captured with today’s existing diagnostic routine for the HTS.
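The existing HTS diagnosis, and why short bursts of error codes slip through it, can be illustrated with a minimal Python sketch. The function name, the message representation, the sample period, and the value used for ‘L’ are hypothetical; only the rule itself (the error code sent consecutively for ‘L’ sec sets the DTC, and ‘L’ sec without it clears the DTC) is taken from the text.

```python
def hts_dtc_states(messages, sample_period_s, l_sec, error_code="K"):
    """Return the DTC state (True = active) after each fast-channel message.

    The DTC is set once the error code has been received consecutively for
    l_sec and cleared once no error code has been received for l_sec.
    """
    needed = max(1, round(l_sec / sample_period_s))
    error_run = clean_run = 0
    active, states = False, []
    for msg in messages:
        if msg == error_code:
            error_run, clean_run = error_run + 1, 0
        else:
            error_run, clean_run = 0, clean_run + 1
        if error_run >= needed:
            active = True
        elif active and clean_run >= needed:
            active = False
        states.append(active)
    return states


# Hypothetical stream: a 2-sample burst of 'K' (never reported), then a 3-sample burst.
stream = [300, "K", "K", 301, "K", "K", "K", 302, 303, 304]
print(hts_dtc_states(stream, sample_period_s=1.0, l_sec=3.0))
# [False, False, False, False, False, False, True, True, True, False]
```

The first burst is shorter than the assumed ‘L’ and therefore leaves no trace, which corresponds to the undetected IEs described above.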

The sensor's internal software can differentiate between different types of errors, even though the same fault code is transmitted for all errors in the fast channel. The kind of error causing the fault code is communicated through the slow channel; the fast and slow channels are explained in 2.2.2.

However, in this master thesis, only the information in the fast channel is used by the developed algorithms. The reason is that only the sensor's operational status is of interest, since the action taken at the workshop is the same for all malfunctioning sensors: the workshop operators exchange the entire sensor. Moreover, treating all types of errors separately would require extensive, unnecessary fault code management and memory space. Furthermore, if the root cause of the sensor malfunction is of interest, for example, for development purposes, Scania can examine the returned malfunctioning sensors in a lab at Scania or at the supplier. [25]

Moreover, the information in the slow channel is not always reliable because of the broadcasting priority order that applies when two errors co-occur. For example, suppose that probes ‘X’ and ‘Y’ are broken simultaneously. In that case, only the error that probe ‘X’ is suffering from is transmitted in the slow channel because probe ‘X’ is prioritized over probe ‘Y’. However, in the fast channel, it is communicated that both are indeed broken. It is best not to involve the information in the slow channel because it can result in double and confusing DTCs.

When a SENT message is missing, different values are used instead, depending on which probe's value is missing. Sometimes the measured values from other probes are used. Sometimes a modelled value is used.

Usually, the modelled value is based on the previous, fault-free sample. However, to not overcomplicate things, in this master thesis, it is assumed that the value used in the EMS when a SENT message is missing is equal to the last fault-free sample. If the SENT messages are missing for more than ‘L’ sec, a DTC is triggered. [26]
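The simplifying assumption made here, that a missing SENT message is replaced by the last fault-free sample, can be written as a short Python sketch. The function name and the use of `None` to mark a missing message are hypothetical choices for illustration.

```python
def substitute_missing(samples, initial=None):
    """Replace missing samples (None) with the last fault-free sample."""
    last_valid = initial
    result = []
    for sample in samples:
        if sample is not None:
            last_valid = sample
        result.append(last_valid)
    return result


# Hypothetical temperatures with three missing SENT messages.
print(substitute_missing([412.0, None, None, 415.5, None]))
# [412.0, 412.0, 412.0, 415.5, 415.5]
```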


2.5.3 NOx Sensor

The NOx sensors used in Scania have extensive self-diagnostics capabilities. For example, if any internal parameters, such as the power supply or various currents and voltages, are out of range, they are handled by the NOx sensor's internal diagnostics. If an error is persistent enough, an FMI is triggered. When Scania's software detects an FMI, a DTC is triggered. If a fault is detected but is not severe enough, OBD can degrade the signal status instead of triggering an FMI.

If the CAN bus messages from the NOx sensor are absent, a DTC is triggered within ‘M’ ms. If there is a signal status degradation because of an electrical error, a DTC is activated within ‘M’ ms. If the measured NOx and O₂ content are out of range, a DTC is triggered within ‘N’ sec. If there are plausibility errors, the signal status is lowered depending on the type of error and how long it takes to find the error. Once the error is encountered, a DTC is triggered within ‘M’ ms. Whenever a deviation resulting in a DTC is found, the measured values from the NOx sensor are not used in any EMS functions. In those cases, the measured values from the NOx sensor are only used to invalidate the DTC.

Moreover, if the signal status is degraded, there are two modes: not perfect and not OK. During not perfect, the upper layer diagnostics is inhibited, and some functions are not run or run a bit differently, resulting in subsystem impact. During not OK, the upper layer diagnostics is hindered, and no functions using the NOx sensor signal are used, resulting in subsystem impact.

The signal status of the NOx sensor can also be degraded because of natural reasons, not only because of errors within the sensor. For example, rapid changes in engine load may cause a considerable pressure change in the exhaust gas after-treatment system because of the abrupt change in the exhaust gas's mass flow. The pressure changes may cause the currents in the amperometric NOx sensor elements to go from static to oscillation, which results in degradation of the signal status.

2.6 Knowledge Gap

There is a vast knowledge gap regarding the effects of IEs on the signal going from sensors in the literature, academia, and industry. When searching on the internet and in books for information regarding typical signal behavior from sensors that suffer from IEs, only what is presented in section 2.4 was found. Furthermore, Khan S et al. stated that it is difficult to find statistics regarding IEs because the problem is severely underreported. [14]

Furthermore, Khan S et al. suggested that the numerous terms used for IEs indicate that the phenomenon is not well understood, even among researchers, practitioners, and other experts. They also suggested that the term “No Fault Found” has ingrained a culture of acceptance regarding unexplained faults that could be caused by IEs. The culture of acceptance could explain the lack of attention to IEs. [14]


Due to the knowledge gap as discussed above, it was not easy to decide what to look for in the signal from the sensors. However, one way to circumvent the knowledge gap is to fall back on the definition of IEs, namely the fact that they occur at periodic and irregular intervals, as described in 2.4. Moreover, there is an upper limit of how long each IE can be because the already existing diagnostic routine captures these errors, as described in 2.5.

By monitoring for periodic and irregular abnormalities in the signal while also considering the upper limit, IEs can be detected despite the knowledge gap regarding IE behavior in the signal.

It is desirable to obtain data resulting from IEs so that Scania R&D can research the data to avoid IEs in next-generation engines. Section 6.1 suggests how Scania can fill the knowledge gap by implementing the algorithms from this master thesis.

Chapter 3: Prerequisites for developing an IE detection algorithm

3.1 Demands on an IE Detection Algorithm

Intuitively, one might think that the only goal of the algorithm is to detect, diagnose, and fault isolate IEs. However, there is more to it than that. There are three identified customers of the algorithm being developed in this master thesis, all of which have different needs and demands on the diagnostic algorithm. The identified customers are listed below.

1. Workshop Operator (Customer Satisfaction)

2. R&D Engineers at Scania (Knowledge & Statistics)

3. Legislation Requirements (Vehicle / Truck-Operator Safety & Compliance)

3.1.1 Workshop Operator

The workshop operator has different demands on the algorithm depending on the circumstances. If the algorithm has detected that the number of IEs is above the threshold limit, a DTC should be triggered. The DTC should pinpoint the faulty component and state which tests to perform, if any, and in what order (if there are multiple tests), how to repair the component (if possible), or whether the workshop operator should replace it entirely.

However, suppose that there is a generic DTC for the whole subsystem in which the algorithm operates. In that case, the algorithm should save key figures in the operational data to help the workshop operator search for the faulty component. Different recommendations should be given depending on the predetermined range in which the values of the key figures fall. For example, low values indicate that the fault causing the unspecified DTC is not found in the components that this algorithm is monitoring. The probability that the component that the algorithm is monitoring is causing the unspecified DTC increases as the key figure in the operational data increases.

If it is likely that the problem concerns the component that this algorithm is monitoring, a list of measures to be taken should be provided to the workshop operator. The list should be ordered with regard to the probability that each measure solves the issue, the cost of the measure, the time required to perform it, and whether the measure hinders a subsequent measure if it does not solve the issue. An example of one measure hindering another is when one of the measures needs to be performed at an elevated temperature while the other has to be done at 0 degrees. It would then be problematic to perform these measures one after the other. The goal of ordering the recommended measures is to save money and time for the workshop operator. [27]

3.1.2 R&D Engineers at Scania

There is a knowledge gap regarding IEs in industry and academia (as explained in 2.6). Therefore, the R&D Engineers at Scania are interested in obtaining statistics regarding IEs, such as the signal deviance resulting from IEs, IE characteristics, and the conditions resulting in IEs. This data can be obtained by observing the freeze frames saved when a DTC is triggered by the algorithm, by recording algorithm data, and by observing how the key figures stored in the operational data develop over time.

Scania can use this knowledge to develop a theory that explains how IEs are connected to signal deviance, the root causes, and the typical development of IEs over time. The developed theory can be used to prevent IEs from happening or at least to considerably reduce the probability of IEs occurring. Moreover, the theory can also be used to further improve the algorithms from this master thesis. If this is done correctly, it can increase the uptime of all Scania vehicles.

3.1.3 Legislation Requirements and Vehicle Safety

Legislation requires that emissions are kept below certain limits and that the vehicle is safe to operate at all times. Therefore, one of the customers of this algorithm is the legislative authorities. The algorithm can be used to ensure no IEs affect critical sensors involved in emission measurement, such as the NOx sensor.

Furthermore, vehicle safety is a concern when DTCs are triggered. There should be clear indications if the vehicle is not allowed to be driven because of the error. Determining how the vehicle operator should react when an error has appeared is a careful balance between vehicle uptime, customer satisfaction, and hazard risk. [23]

3.2 Analog, Digital and Smart Sensors

The sensors that are examined in this master thesis are of different complexity. Arguably, the exhaust manifold pressure sensor is the least complex sensor because it is an analog sensor. There are few embedded self-diagnostics in the sensor, and the signal going from the sensor into higher software layers only contains information regarding the pressure.

The HTS is a bit more complicated. It deploys the SENT protocol for communication, which means that more information is available in each transmitted message. Each HTS can have from 1 up to 4 probes. The probes have to be treated separately, which essentially means that one monitor is required for each probe. Moreover, the HTS has self-diagnostics. The sensor sends an error code in the SENT message if its internal parameters are out of range, as explained in 2.5.2.

The NOx sensor is the most advanced and complicated sensor treated in this thesis, and it is arguably among the state of the art in sensor technology. The sensor’s embedded diagnostic routine is very sophisticated. Moreover, there is plenty of information available in each CAN message. For example, each CAN message contains the measured NOx and O₂ concentrations, among other values.

Because of the varying self-diagnostic capabilities of the sensors, the demands on the monitoring algorithm vary as well, depending on which sensor is to be monitored. The simpler the sensor, the simpler the monitoring algorithm can be, but more upper software layer diagnostic routines are needed to capture all errors. The smarter the sensor, the more complex the monitoring algorithm has to be to detect errors that are not captured by the sensor's self-diagnostics. However, fewer diagnostic routines are needed for smart sensors because many errors are already captured by the embedded software within the sensor itself.

3.3 Multiple Monitoring Algorithms

As described in 2.4, different IEs cause different behavior in the signal going from the sensor into the upper software layers. No single monitoring algorithm can capture all of these signal behaviours. Therefore, to capture a broad range of IEs, multiple monitoring algorithms are needed.

Whether to deploy a monitoring algorithm, which algorithm to deploy, and how many depend on the sensor at hand. In some cases, no monitoring algorithm might be needed at all. For example, this can be the case if the sensor is very robust, so that IEs are unlikely to occur, and if they do occur, the subsystem impact would likely not be significant enough to be a problem. In contrast, if a sensor is very problematic, it might be worth deploying many monitoring algorithms to capture future hard/permanent faults at an early intermittent stage, which would enable proactive replacement.

3.4 Real-Time Processing

The developed algorithms are to perform their diagnostics of IEs in an operating truck. That implies that the algorithms need to be able to perform real-time processing. In Figure 12 below, the “Data acquisition” is the time required to get the signal to the ECU, where the algorithm is running. The “Algorithm processing” is the time that the algorithm needs to perform its designated calculations and operations. Lastly, the “Frame” is the maximum time allowed for the total processing of a sample before the subsequent signal sample needs to be processed. Real-time processing is achieved if the sum of the data acquisition time and the algorithm processing time is less than or equal to the frame time (see Figure 12).

Figure 12. Illustration of a real-time processing system. The picture is inspired by Vidya Viswanathan’s online presentation (MathWorks). [28]
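The real-time condition can be expressed as a small Python check. This is a sketch only: the function names are hypothetical, the frame time is taken as the time between two samples (the reciprocal of the signal frequency), and the timings in the example are invented rather than measured values for any of the sensors.

```python
def frame_time_ms(signal_frequency_hz):
    """Time between two samples, i.e. the frame available per sample."""
    return 1000.0 / signal_frequency_hz


def is_real_time(acquisition_ms, processing_ms, signal_frequency_hz):
    """True if data acquisition plus algorithm processing fits within one frame."""
    return acquisition_ms + processing_ms <= frame_time_ms(signal_frequency_hz)


# Hypothetical timings for a 50 Hz signal (frame time 20 ms).
print(is_real_time(acquisition_ms=2.0, processing_ms=15.0, signal_frequency_hz=50))  # True
print(is_real_time(acquisition_ms=2.0, processing_ms=19.0, signal_frequency_hz=50))  # False
```

A lower signal frequency gives a longer frame and therefore leaves more time for the algorithm's calculations.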

The pressure sensor, HTS, and NOx-sensor signal frequencies have been stated in 2.1.3, 2.2.2, and 2.3.2. By knowing the sampling frequency, the calculation of the time between samples is possible. The time in between samples is equal to the frame time. See Table 3 for the frame time, data acquisition time, and algorithm processing time for the sensors examined in this master thesis.

Table 3. Frame time, data acquisition time, and algorithm processing time of the pressure sensor, HTS, and the NOx-sensor.

Sensor | Maximum Frame Time and Data Acquisition Time + Algorithm Processing Time [ms]
Pressure Sensor, 100 Hz Signal | ‘b’
Pressure Sensor, 1 kHz Signal | ‘a’
HTS, 250 Hz Signal | ‘f’
NOx-sensor, 20 Hz Signal | ‘g’

Since the time available for algorithm processing varies between sensors (because of the varying signal frequency), the computational cost of the algorithm operations has to be considered before deploying an algorithm to a specific sensor. For example, because the NOx-sensor has a low sampling frequency, more computationally intensive operations, such as spectral analysis, may be performed.

3.5 Test Conditions

The monitoring algorithms should not necessarily be active at all times. When the algorithm should be active differs between sensors and depends on sensor complexity, function, and location.


The pressure sensor algorithm should be active once the protocol ‘O’ is completed. The NOx sensor algorithm should be active once the protocol ‘O’ is completed and the sensor heater reaches the temperature ‘P’.

The HTS algorithm should be active once protocol ‘O’ is completed. Moreover, the HTS ambient temperature has to be between -‘Q’ and +‘R’ °C, which is the operational temperature range of the sensor. The temperature condition is there to avoid recordings of transient errors.
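These activation conditions amount to simple enable checks, sketched below in Python. The function and parameter names are hypothetical, and the temperatures in the example are invented; they are not the undisclosed values ‘P’, ‘Q’, and ‘R’.

```python
def hts_monitor_active(protocol_completed, ambient_temp_c, min_temp_c, max_temp_c):
    """True if the HTS monitoring algorithm should be active."""
    return protocol_completed and min_temp_c <= ambient_temp_c <= max_temp_c


def nox_monitor_active(protocol_completed, heater_temp_c, required_heater_temp_c):
    """True once the protocol is completed and the heater has reached its temperature."""
    return protocol_completed and heater_temp_c >= required_heater_temp_c


# Hypothetical limits and readings, for illustration only.
print(hts_monitor_active(True, ambient_temp_c=25.0, min_temp_c=-40.0, max_temp_c=900.0))  # True
print(nox_monitor_active(True, heater_temp_c=700.0, required_heater_temp_c=750.0))        # False
```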

3.6 DTC Threshold

The purpose of the algorithm threshold is to trigger a DTC if the subsystem impact is too significant or if the diagnostic prognosis states that the sensor is about to fail shortly. Multiple variables need to be taken into consideration to enable the calculation of the threshold. See the numbered list below for all of the questions that need to be answered.

1. How much can the signal deviate before too much subsystem impact?

2. Is the threshold constant or variable?

3. If the threshold is variable, what variables should be considered, and how often should the threshold be calibrated?

4. How do factors such as performance of other sensors and actuators in the subsystem, general engine health, engine load, vehicle speed, cold, heat, vibrations, and sensor age affect the threshold?

5. For how long does the signal have to deviate before DTC?

6. Is the threshold the same for all types of sensors and IEs?

7. How many IEs typically occur before resulting in a permanent error?

8. At what point can a prognosis regarding near future failure be stated depending on the occurrence of IEs?

When all eight questions above are answered, a threshold can be calculated. However, because of the knowledge gap (as discussed in 2.6), it is challenging to obtain the information needed. Alternatives to a theoretically perfect threshold are presented in 4.3.3.

Chapter 4: Applicable Methods for IE Detection

This chapter presents the developed methods that are applicable for IE detection.

Regardless of the methods used, the general flowchart of the algorithm is the same. See Figure 13 for the generic sketch of the algorithm flowchart. The developed methods are sorted into three groups: identification of IEs, short-term storage of data, and diagnosis.


Figure 13. Illustration of the generic flowchart used regardless of methods deployed.

4.1 Identification of Intermittent Errors

4.1.1 Out-of-Range Method

All of the sensors' parameters operate within predefined ranges. The ranges can apply to multiple metrics such as currents, resistances, temperatures, measured quantity, and voltages. For example, the output signal of the pressure sensor operates within LVL_PS to UVL_PS V, as described in 2.1.3. If there are peaks or oscillations in any of the sensor's parameters that are out of range, the explanation is the occurrence of an error.

For the pressure and NOx sensor, the output signal sent to the higher software layers can go out of range. In contrast, for the HTS, the self-diagnostics of the sensor stop any out-of-range values from being sent to higher software layers; instead, an error code is sent. Therefore, the out-of-range method can be applied for the HTS by monitoring those error codes. Moreover, the same error code is sent for the HTS if any internal parameters are out-of-range, as explained in 2.5.2.

For the sensors where no error codes are sent when the sensor is out-of-range, an algorithm must be written to detect out-of-range values. See Figure 14 for how that code can be written in MATLAB.

Figure 14. Illustration of the code that can be used to implement the out-of-range method in MATLAB.
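For readers without access to the figure, the idea of the out-of-range check can also be sketched in Python. This is not the MATLAB code of Figure 14: the function name, the sample voltages, and the limits are hypothetical, and the real LVL_PS and UVL_PS values are not assumed here.

```python
def out_of_range_flags(samples, lower_limit, upper_limit):
    """Return one boolean per sample: True if the sample is outside the range."""
    return [not (lower_limit <= s <= upper_limit) for s in samples]


# Hypothetical voltage samples and limits, for illustration only.
flags = out_of_range_flags([2.1, 4.9, 0.2, 2.5], lower_limit=0.5, upper_limit=4.5)
print(flags)  # [False, True, True, False]
```

The only operation per sample is a comparison against the two limits, which is why the method is computationally cheap.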

The types of errors that are captured by deploying this method are positive/negative peaks and oscillations. The root causes vary between component ageing, oxidation, moisture, leakage, loose electrical connections, and contamination, as described in 2.4. For analog sensors, the root cause might also be a fault in the cabling connecting the sensor to the ECU. For example, if there is an open circuit in the cabling, the residual voltage in the ECU will result in measured voltages below the range limit.

If the algorithm has triggered a DTC, the recommended action to take at the workshop differs between the sensors. Since the root cause of the IEs for analog sensors can be a fault in the cabling, the cost-effective measure is to repair the cabling. If repairing the cabling does not solve the issue at hand, the whole sensor has to be replaced. For digital sensors, the recommended action is to replace the sensor because, most likely, there is some internal fault within the sensor. The recommendation was made considering the first and second customers of the algorithm, namely the workshop operator and the R&D Engineers at Scania. See 3.1.1 and 3.1.2 for the customers’ demands and what was considered when the recommendations were made.

The advantage of the out-of-range method is that it can capture the complete duration of IEs because the time the signal is out of range, or the time the error code is being sent, is easily recorded. The out-of-range method is very robust because only error codes or out-of-range values are used, which guarantees that if the algorithm detects anything, there is some fault in the sensor. Furthermore, it can capture a broad range of IEs (positive/negative peaks and oscillations), as described above. The method is not computationally intensive because the only operation the algorithm performs is comparing the signal to a detection threshold (see Figure 14) or receiving self-diagnostic error codes. The disadvantage is that it can only capture IEs that cause the signal to deviate far enough to go out of range. Because all sensors can go out of range, the out-of-range method is suitable for all sensors treated in this master thesis. See Table 4 for a summary of the method.

Table 4. Summarization of the characteristics of the Out-of-Range Method.

| Sensor | Suitable for the out-of-range method? | Signal monitored | Signal behavior captured | Root causes | If DTC: Workshop action |
|---|---|---|---|---|---|
| Exhaust Back Pressure Sensor | Yes | Output signal | Peaks or oscillations | Component aging, oxidation, moisture, leakage, contamination, loose connections in the solder joint or wiring | Test if the error is because of loose connections. If that is the case, repair wiring; otherwise, exchange sensor. |
| HTS | Yes | Error codes | Error codes | Component aging, oxidation, or contamination | Exchange sensor |
| NOx Sensor | Yes | Output signal | Peaks or oscillations | Component aging, oxidation, leakage, sensor aging, or contamination | Exchange sensor |

4.1.2 Response Time Method

The sensor's response time limits how much adjacent samples may differ from each other when the sensor is fault-free. The response time is defined as the time required for the sensor output to cover 63.2 % of the change from the previously measured value to the new value when the measurand changes. For example, suppose a pressure sensor is kept at 1 bar and then exposed to a pressure of 5 bar. In that case, the response time is the time required for the sensor output to reach approximately 3.5 bar, that is, the old value of 1 bar plus 63.2 % of the 4 bar difference between the old and the new environment of the sensor.
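The arithmetic of the pressure example can be checked with a short Python sketch. The function name is hypothetical. The remark that 63.2 % corresponds to 1 − e⁻¹, the fraction of a step that a first-order system covers in one time constant, is a general fact about first-order responses; it is not a claim the thesis makes about these particular sensors.

```python
import math


def level_at_response_time(old_value, new_value, fraction=0.632):
    """Output level reached after one response time, per the 63.2 % definition."""
    return old_value + fraction * (new_value - old_value)


# Pressure example from the text: a step from 1 bar to 5 bar.
level_bar = level_at_response_time(1.0, 5.0)

# For a first-order system, the fraction covered after one time constant
# is 1 - exp(-1), which rounds to 0.632.
first_order_fraction = 1.0 - math.exp(-1.0)
```

Here `level_bar` evaluates to about 3.53 bar, which the text rounds to 3.5 bar.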

If the difference between the sensor's present sample and the previous sample is too large given the response time, the difference can be explained by a fault in the sensor. Since the sensors operate within predetermined ranges, the difference between adjacent samples that is considered too large depends on the value of the previous sample. See Figure 15 for a visualization of the accepted differences based on the previous sample's value.

Figure 15. Illustration of the expected present value as a function of previous sample value. The red areas illustrate unreasonable present values as a function of previous sample value.

See Figure 16 for the MATLAB code implementing the method. The upper range limit was used as an example. For the lower range limit, the code line is very similar.

Figure 16. Illustration of the code that can be used to implement the response time method in MATLAB.
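Since the code in Figure 16 is not reproduced in the text, the following Python sketch shows one possible way to realize the check; it is not the thesis implementation. It rests on two assumptions that the text does not confirm: the sensor behaves as a first-order system whose time constant equals the stated response time, and the true measurand never lies outside the range limits. Under these assumptions, the largest plausible change in one sampling interval is a fixed fraction of the distance from the previous sample to the relevant range limit. All names and numbers are hypothetical.

```python
import math


def max_step_fraction(sample_period_s, response_time_s):
    """Fraction of the remaining distance covered in one sample (first-order assumption)."""
    return 1.0 - math.exp(-sample_period_s / response_time_s)


def is_plausible(previous, present, lower_limit, upper_limit,
                 sample_period_s, response_time_s):
    """Return True if `present` is reachable from `previous` within one sample."""
    frac = max_step_fraction(sample_period_s, response_time_s)
    # Fastest possible rise: the measurand jumps to the upper range limit.
    highest = previous + frac * (upper_limit - previous)
    # Fastest possible fall: the measurand jumps to the lower range limit.
    lowest = previous - frac * (previous - lower_limit)
    return lowest <= present <= highest


# Example with an assumed range of 0 to 5, a 0.1 s sample period,
# and a 1 s response time; the previous sample was 1.0.
small_change_ok = is_plausible(1.0, 1.3, 0.0, 5.0, 0.1, 1.0)
large_jump_ok = is_plausible(1.0, 2.0, 0.0, 5.0, 0.1, 1.0)
```

In practice, a margin for measurement noise would likely be added to both bounds. If the sampling period approaches the response time, the bounds widen towards the range limits and the check loses its selectivity.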

The response time method works for all response times, regardless of whether they are constant or variable. The only requirement is that the response time of the sensor at hand is greater than the sampling time at all times. Otherwise, no differences between adjacent values can be regarded as too big. Table 5 lists the sensors' response times and sampling rates. For a more in-depth description of the sensors' sampling rates, see 2.1, 2.2, and 2.3.

Table 5. The sensors’ response time and sampling rates. (S>a, T>f, and U>g)

| Sensor | Response Time [sec] | Sampling Rate [ms] |
|---|---|---|
| Pressure Sensor | < ‘S’ [29] | ‘a’ |
| HTS | < ‘T’ [30] | ‘f’ |
| NOx Sensor | < ‘U’ [31] | ‘g’ |