Two-stage Soldering Defect Detection with Deep Learning


Degree project in Information and Communication Technology, second cycle, 30 credits

Stockholm, Sweden 2019


JINGZHI YE


Abstract

In the electronics industry, quality control of soldering points on printed circuit boards (PCB) is an important topic. Soldering points are usually inspected by an operator or by automatic optical inspection (AOI) techniques.


Sammanfattning (Swedish Summary)

Within the electronics industry, quality control of soldering points on the printed circuit board (PCB) is an important topic. The soldering point is usually inspected by the operator or by automatic optical inspection (AOI) technology.


Acknowledgment

I would like to thank Markus Flierl, my examiner, for his guidance and support of this thesis.

This thesis was conducted at Sony Mobile Communications in Lund, Sweden. I thank my industrial supervisor, Lijo George, my colleagues Bilal, Johan and Kieran, and many other amazing people in the company for their support and inspiring discussions.


List of Abbreviations

AOI   Automatic Optical Inspection
CCD   Charge-coupled Device
CNN   Convolutional Neural Network
CPU   Central Processing Unit
CSV   Comma-separated Values
FN    False Negative
FP    False Positive
GPU   Graphics Processing Unit
IOU   Intersection over Union
LED   Light-emitting Diode
MSE   Mean Squared Error
PCB   Printed Circuit Board
ReLU  Rectified Linear Unit
RGB   Red, Green, Blue
ROI   Region of Interest
RPN   Region Proposal Network
SMD   Surface-mount Device
SMT   Surface-mount Technology
SVM   Support Vector Machine
TN    True Negative


Chapter 1

Introduction

In the past few years, deep learning has proven successful in solving many computer vision problems, such as image classification, image segmentation and object detection. Beyond academic research, industry is also seeking deep-learning solutions to practical problems. This thesis focuses on one application of deep learning in the electronics industry: detecting defects of soldering points on printed circuit boards.

1.1 Motivation

In the printed circuit board (PCB) production process, soldering is an essential step that connects the electronic components to the bare circuit board with molten metal. As the soldering joint is formed by cooling the molten metal, the shape of each joint can vary, and this uncertainty brings a higher chance of defects. Quality control of the soldering joints is therefore necessary.

Traditionally, the operator examines the joints on the PCB and takes further action if defects are found. However, this requires the operator to be highly experienced and to pay constant attention. Automated Optical Inspection (AOI) was proposed to automate the inspection process with the help of optical sensors, so as to reduce the workload of the operator and improve the efficiency of the production process.


Most deep learning architectures for object detection can be labelled as either "one-stage" or "two-stage" methods. A one-stage method detects the desired objects directly from the original image, while a two-stage method usually first proposes bounding boxes as regions of interest (ROI) and then finds the objects within the proposed regions.

1.2 Problem

Treating the problem as an object detection task, this thesis focuses on the two-stage solution only: the first step detects all soldering points, and the second classifies each of them by its defectiveness and the type(s) of defect it has. We therefore investigate how to detect soldering points on the PCB and how a deep learning model performs in multi-label soldering defect classification.

1.3 Purpose & Goals

In this thesis, we aim to propose and evaluate a two-stage method based on deep learning for soldering fault detection. The company also imposes a limit on execution time: processing each panel image should take less than 17 seconds, which is the time currently spent on manual inspection and repair. If such a solution can be found, the operator will have more time for repairing rather than for repetitive inspection work, which improves the efficiency as well as the quality of the production process.

The goal of the project is to deliver a solution for soldering defect detection based on deep learning. It can be divided into the following sub-goals:

• Developing a tool for data annotation;

• Designing an algorithm to detect all the soldering points on the PCB;

• Implementing a deep learning model for the classification of soldering points;

• Evaluating the performance of the model.

1.4 Methodology


In the ROI proposal stage, we propose a template-based method and a semi-supervised method. The template-based method aligns a reference PCB image, whose soldering points have been manually specified, to each inspected image and thus locates the soldering points. The semi-supervised method trains a U-Net to segment the soldering parts and proposes bounding boxes from the segmentation result.

In the classification stage, we adopt a convolutional neural network (CNN) with an architecture similar to the VGG network [4]. The details of the architecture are described in the following chapters.

There are many existing two-stage object detection algorithms. The "R-CNN" series, including R-CNN [1], Fast R-CNN [5] and Faster R-CNN [6], is among the most well-known. However, these models are known not only for their precise object detection results but also for their high computational cost. Moreover, some of them require a pre-trained model for ROI proposal, which is difficult to acquire for PCB data due to the lack of benchmarks and related work. Therefore, we design our own method instead of using existing ones.

1.5 Delimitations

Since this thesis is linked to a project at Sony Mobile Communications, the experiments are conducted in accordance with the project and the needs of the company. Thus, we mainly concentrate on a specific type of PCB that is manufactured by the company and provide the AOI solution for that production line. Also, only four common types of soldering defects, which are fatal and require repair, are considered.

The data was collected and annotated by the operator of the production line. However, because of the heavy workload the operator already had, only a small amount of data was annotated and made available to us. The small scale of the dataset makes it less representative. Therefore, even though the method is designed to detect all four types of defect, our evaluation mainly focuses on the solder bridge, the most common one.


Chapter 2

Background

AOI of PCBs has always been a hot topic in the PCB industry. However, the scenario for this project is quite unique. In this chapter, we illustrate the four types of soldering defects to detect and introduce the common deep learning algorithms for two-stage object detection. At the end of the chapter, we briefly review the related work.

2.1 Soldering Defect

In this project, we focus on four common types of soldering defects: solder bridge, cold solder, dry joint and leg up. Below are the descriptions and sample images of each type of defect.

2.1.1 Solder Bridge

Figure 2.1: Solder Bridge Samples, panels (a) to (c)

2.1.2 Cold Solder

The cold solder happens when the metal is not heated to a high enough temperature during the soldering process. It results in a high probability of the soldering joint cracking, so that it may easily become detached from the board. Visually, a cold solder usually looks like a half-sphere, while a good joint has the shape of a sand pile. Sometimes there are even visible cracks on the cold solder.

Figure 2.2: Cold Solder Samples, panels (a) to (c)

2.1.3 Dry Joint

Figure 2.3: Dry Joint Samples, panels (a) and (b)

2.1.4 Leg Up

The leg up is not a very common soldering fault compared to the others. It happens when the pin fails to go through the hole. This may be caused by the misalignment between the component and the board, the unexpected short pin of the component, or the missing component. The soldering point with leg up is usually flat while the pin can be seen in the good soldering joint.


2.2 Object Detection with Deep Learning

This section gives a brief introduction to the history of the Convolutional Neural Network and the development of two-stage object detection with deep learning.

2.2.1 Convolutional Neural Network

The Convolutional Neural Network is a deep learning architecture widely used for image data. It rose to prominence in image classification in 2012 [7]. There are three key types of layer in a typical CNN architecture for classification: the convolutional layer, the pooling layer and the fully connected layer. The convolutional layer performs the convolution operation and extracts features. Stacking multiple convolutional layers makes the network "deep" and enables the model to extract higher-level features [8]. Generally, after the convolution operation in the convolutional layer, a non-linear function, called the activation function, is applied to improve the expressivity of the network. A commonly used activation function in CNNs is ReLU, which stands for Rectified Linear Unit.

The pooling layer acts as a feature-selecting process: it down-samples the feature map, which significantly reduces the number of parameters in the following layers. Most CNNs use max-pooling, which keeps the maximum value in each pooling window.

The fully connected layer is usually used for high-level reasoning in image classification. With the high-level features extracted by the convolutional and pooling layers as input, multiple fully connected layers combine the features and yield the final class prediction.
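To make the activation and pooling steps concrete, the following is a minimal NumPy sketch of ReLU followed by 2×2 max-pooling on a single feature map. The function names and the toy input values are illustrative assumptions and are not taken from the thesis implementation.

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: negative values become zero."""
    return np.maximum(x, 0)

def max_pool_2x2(feature_map):
    """2x2 max-pooling with stride 2 on a 2-D map with even height and width."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

# Toy 4x4 feature map (made-up values standing in for a convolution output).
feature_map = np.array([[ 1, -2,  3,  0],
                        [-1,  5, -3,  2],
                        [ 0, -4, -1, -2],
                        [ 7, -6, -5, -8]])
pooled = max_pool_2x2(relu(feature_map))
print(pooled)
```

The pooled map has a quarter of the original entries, which is the down-sampling effect described above.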

2.2.2 Two-stage Object Detection


extracted by the CNN are classified by a support vector machine (SVM), and the bounding boxes are refined with non-maximum suppression and Canny edge detection.

Selective search and bounding box regression are computationally expensive and are the main speed bottleneck of R-CNN. In 2015, Girshick improved R-CNN and named the new algorithm Fast R-CNN [5]. Fast R-CNN has two main improvements: it introduced ROI pooling to handle ROIs of different sizes, and it combined feature extraction, classification and bounding box regression in a single CNN model. These improvements significantly reduce the running time in both the training and the test phase. However, selective search is still a costly step. In the same year, Ren et al. proposed Faster R-CNN [6], which replaces the original proposal method with a region proposal network (RPN). The RPN estimates the rough position and scale of objects from the feature map, which is significantly faster than selective search. Moreover, the RPN and the network in the classification stage share the weights for feature extraction, which improves the efficiency of the whole algorithm.

2.3 Automatic Optical Inspection for the PCB

Much research on AOI techniques for PCBs has been carried out in past years. Nevertheless, few studies share the same scenario as ours.

Some studies aimed to find defects on the bare PCB, where no electronic component is assembled. Defects on the bare PCB mainly occur on the printed conductive part, as shown in Figure 2.5a, and a perfect pattern can be easily defined. In 1996, Wu et al. proposed a template-based AOI method for bare PCB fault detection [10]. The main pipeline of the method is to subtract the perfect template image from the inspected image and then classify according to the residual image. The idea of subtraction and classification is so classic that it is still widely adopted more than 20 years later. In 2018, Wei et al. applied deep learning to a similar pipeline [11]. In their work, the ROIs are proposed according to the residual image and are then classified by a CNN. Huang and Wei established a dataset [12] for bare PCBs one year later.

Figure 2.5: Different Parts of the PCB for Detection. (a) Printed conductive part to be detected on a bare PCB. (b) Soldering points to be detected on an assembled PCB.

shape of the soldering joint can thus be estimated according to the reflected color.

With the same image acquisition setup, most related works approached the problem with two-stage methods as well. However, there are various options for either stage. For instance, Song et al. [17] proposed the ROI by segmentation with traditional computer vision techniques and tried several machine learning techniques for classification, including the decision tree, the multi-layer perceptron and the support vector machine. Cai et al. [18], on the other hand, divided the image into blocks and trained multiple CNNs for defective block selection and defect classification respectively.


Chapter 3

Method

This chapter introduces the techniques we use and our main contributions. We go from data preparation to our own two-stage detection algorithm. For the first stage, ROI proposal, we propose two methods. Then we show how we achieve the second stage, classification. At the end of the chapter, we describe the related evaluation metrics as well as the tools used in this project.

3.1 Data Acquisition & Annotation

Deep learning is a data-driven technology in which data plays an essential role. Before we reach the body of the algorithm, let us look at how the data is acquired and annotated.

The PCB is scanned by a scanner, and the PCB panel image is annotated with an Android app that we developed. Figure 3.1 is a screenshot of the app with descriptions. When the operator notices a defective soldering point, they can draw a bounding box around it and assign one or more labels to the bounding box. The color of the bounding box indicates whether it is selected or already annotated.

Figure 3.1: The screenshot of the Android App for Annotation.

Table 3.1: Annotation File Sample

Left     Top      Right    Bottom   None   Solder Bridge  Cold Solder  Dry Joint  Leg Up
0.40788  0.43254  0.42624  0.45282  False  False          True         False      False
0.07041  0.88399  0.11262  0.91230  False  False          True         False      False
0.04696  0.09195  0.05423  0.15178  False  True           False        False      False
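As an illustration of how one row of such an annotation file could be consumed, the sketch below converts a row into a pixel bounding box and a list of active labels. It assumes that the four coordinates are fractions of the image width and height (the values in Table 3.1 all lie between 0 and 1); the thesis text does not state this explicitly. The function name and the 10000 × 10000 image size are hypothetical.

```python
# Column order as in Table 3.1.
LABELS = ["None", "Solder Bridge", "Cold Solder", "Dry Joint", "Leg Up"]

def parse_annotation_row(row, image_width, image_height):
    """Turn one annotation row into a pixel box and its active labels.

    Assumption: Left/Top/Right/Bottom are fractions of the image size.
    """
    left, top, right, bottom = (float(v) for v in row[:4])
    flags = [v == "True" for v in row[4:9]]
    box = (round(left * image_width), round(top * image_height),
           round(right * image_width), round(bottom * image_height))
    labels = [name for name, flag in zip(LABELS, flags) if flag]
    return box, labels

# First data row of Table 3.1, with a hypothetical panel image size.
row = "0.40788 0.43254 0.42624 0.45282 False False True False False".split()
box, labels = parse_annotation_row(row, image_width=10000, image_height=10000)
print(box, labels)
```

Because several flags may be True in the same row, the label list naturally supports the multi-label setting described above.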

3.2 Template-based ROI Proposal

ROI proposal is our first step towards the detection of defective soldering points. The expected outcome of the process is a set of bounding boxes that enclose all individual soldering points.

As we are focusing on a single PCB model in this project, it is natural to define a template pattern and map it to all the inspected images by a transformation. We call this template-based ROI proposal. This section focuses on the related image processing steps and the pipeline of the template-based method.

3.2.1 Image Processing


Color Conversion

Many classical feature extraction algorithms operate on single-channel or even binary images, such as Hough Circle Detection, which is adopted for feature point extraction in this thesis. Therefore, a proper conversion method is needed when the input image has multiple channels. A very common conversion rule is to simulate human perception of the brightness of different colors. Generally, the intensity is calculated as a linear combination of the red, green and blue (RGB) channels. For instance, the built-in color conversion method of OpenCV [19] is defined as:

Y = 0.299 · R + 0.587 · G + 0.114 · B.

The symbols R, G and B stand for the red, green and blue channels, and Y is the value of the single channel after the conversion.

However, the "standard" conversion method might not be a proper option considering the special color characteristics of our PCB panel image. As shown in Figure 3.2a, the board is mainly green, while a soldering point can have both bright and dark parts due to the reflective nature of the metal surface. If the "standard" conversion rule is applied (Figure 3.2b), the range of the converted values on the soldering part, which is considered the foreground, is very wide and overlaps with that of the background (the PCB board). This makes it difficult to distinguish the soldering part from the board.

Figure 3.2: A sample PCB panel image and its grayscale images by different conversion methods. (a) Color Image (b) Standard Conversion (c) Red-Green Difference

To better separate the soldering part and the green panel background on the single-channel image, we convert the original image to grayscale by using the difference between the red and green channels, and then normalize it. Mathematically, it is written as

$$\mathrm{gray}(x, y) = \frac{\mathrm{diff}(x, y) - \min_{x,y} \mathrm{diff}(x, y)}{\max_{x,y} \mathrm{diff}(x, y) - \min_{x,y} \mathrm{diff}(x, y)},$$

where diff is the difference image, gray is the grayscale image, and (x, y) indicates the pixel coordinate.

Figure 3.2c shows the result of our conversion method. After conversion, the green background becomes extremely dark, while the whole soldering part has a fairly even intensity. Thus, the foreground and the background can be easily separated based on the grayscale value.
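The two conversions can be compared on a toy example. The sketch below is a minimal NumPy version under two assumptions that are not confirmed by the text: the channel order is RGB, and the difference image is the red channel minus the green channel. The three pixel values are made up to mimic a green board pixel, a bright solder pixel and a dark solder pixel.

```python
import numpy as np

def standard_gray(rgb):
    """Weighted sum Y = 0.299 R + 0.587 G + 0.114 B."""
    r, g, b = (rgb[..., i].astype(float) for i in range(3))
    return 0.299 * r + 0.587 * g + 0.114 * b

def red_green_gray(rgb):
    """Min-max normalized red-green difference (assumed to be R minus G)."""
    diff = rgb[..., 0].astype(float) - rgb[..., 1].astype(float)
    span = diff.max() - diff.min()
    if span == 0:
        return np.zeros_like(diff)  # flat image: nothing to normalize
    return (diff - diff.min()) / span

# Made-up 1x3 image: green board, bright solder, dark solder.
rgb = np.array([[[30, 120, 40], [220, 220, 215], [60, 60, 65]]], dtype=np.uint8)
standard = standard_gray(rgb)
gray = red_green_gray(rgb)
print(standard, gray)
```

In this toy case the standard conversion places the board value between the two solder values, whereas the red-green difference maps both solder pixels to the same bright value and the board to zero.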

Hough Circle Detection

The Hough transform is a feature extraction technique that can detect certain shapes in an image. The initial version was invented and patented by Hough for line detection [20]. The method was later extended to more complicated shapes by Duda et al. [21]. In a Cartesian coordinate system, a circle can be described as

$$(x - a)^2 + (y - b)^2 = r^2,$$

where r is the radius of the circle, (a, b) are the coordinates of the center, and all points (x, y) in the image domain that satisfy the equation lie on the circle. The parameters define a 3-dimensional parameter space (a, b, r), in which each point refers to a circle in the image domain.

Edge detection techniques, such as Canny edge detection [22], are usually applied before the Hough transform to determine the contours. Each point on the contours votes for all the circles, represented as (a, b, r) in the parameter space, that pass through it. After voting, the maxima in the parameter space are considered the detected circles. In practice, constraints such as the range of the radius, the minimal distance between circles and the minimum number of votes are used for better performance and efficiency.
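The voting procedure can be sketched in a few lines of NumPy. This is a simplified illustration rather than the OpenCV implementation used in practice: edge points are given directly instead of coming from an edge detector, the radius range is a short hypothetical list, and only the single strongest circle is returned.

```python
import numpy as np

def hough_circle_votes(edge_points, shape, radii, n_angles=72):
    """Accumulate votes in (r, b, a) space for circles through each edge point."""
    height, width = shape
    votes = np.zeros((len(radii), height, width), dtype=np.int32)
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    for x, y in edge_points:
        for k, r in enumerate(radii):
            # Candidate centers lie on a circle of radius r around (x, y).
            a = np.rint(x - r * np.cos(angles)).astype(int)
            b = np.rint(y - r * np.sin(angles)).astype(int)
            inside = (a >= 0) & (a < width) & (b >= 0) & (b < height)
            if not inside.any():
                continue
            # One vote per distinct accumulator cell for this edge point.
            cells = np.unique(np.stack([b[inside], a[inside]], axis=1), axis=0)
            votes[k, cells[:, 0], cells[:, 1]] += 1
    return votes

def strongest_circle(votes, radii):
    """Return (a, b, r) of the accumulator maximum."""
    k, b, a = np.unravel_index(np.argmax(votes), votes.shape)
    return int(a), int(b), radii[k]

# Synthetic contour: pixels on a circle with center (20, 15) and radius 8.
t = np.linspace(0.0, 2.0 * np.pi, 36, endpoint=False)
edge_points = {(int(round(20 + 8 * np.cos(v))), int(round(15 + 8 * np.sin(v)))) for v in t}
radii = [6, 7, 8, 9, 10]
votes = hough_circle_votes(edge_points, shape=(32, 40), radii=radii)
circle = strongest_circle(votes, radii)
print(circle)
```

Restricting `radii` to a narrow range is what keeps the three-dimensional accumulator small, which mirrors the practical constraints mentioned above.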

Image Alignment

Two images can be easily aligned by correctly matched feature points. There are many existing algorithms for feature point detection and matching, such as the Harris corner detector [23] and the SIFT detector [24]. However, our data is difficult for these detectors because of the large image size and the huge number of similar and even repetitive patterns. As a result, detection and matching would take an extremely long time.


Figure 3.3 illustrates the pipeline of the aligning process. At least 3 point pairs are needed to calculate an affine transformation and 2 pairs for a similarity transformation. Therefore, we start with all four pairs and use fewer when no proper transformation can be found, until there are not sufficient pairs left. The mean squared error (MSE) between the two images is used as the metric to evaluate the alignment. Initially, the MSE is measured between the reference image and the original inspected image. Whenever a lower MSE is found between the reference image and the aligned inspected image, the corresponding transformation is regarded as a "good" one and saved for further use; otherwise, we assume that the calculated transformation is wrong and that the feature points were misdetected.
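The estimation step and the MSE check can be sketched as follows. This is a minimal least-squares illustration in NumPy; the function names, the anchor coordinates and the test transformation are hypothetical, and the image warping itself (for example with OpenCV) is omitted.

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares 2x3 affine matrix mapping src to dst (needs >= 3 pairs)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    design = np.hstack([src, np.ones((len(src), 1))])
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)
    return params.T

def estimate_similarity(src, dst):
    """Least-squares similarity (rotation, uniform scale, shift) from >= 2 pairs."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows += [[x, -y, 1.0, 0.0], [y, x, 0.0, 1.0]]  # u = ax - by + tx, v = bx + ay + ty
        rhs += [u, v]
    (a, b, tx, ty), *_ = np.linalg.lstsq(np.array(rows, dtype=float),
                                         np.array(rhs, dtype=float), rcond=None)
    return np.array([[a, -b, tx], [b, a, ty]])

def apply_transform(matrix, points):
    """Apply a 2x3 transformation matrix to an (N, 2) array of points."""
    return np.asarray(points, dtype=float) @ matrix[:, :2].T + matrix[:, 2]

def mse(image_a, image_b):
    """Mean squared error between two images of equal shape."""
    diff = np.asarray(image_a, dtype=float) - np.asarray(image_b, dtype=float)
    return float(np.mean(diff ** 2))

def is_good_alignment(reference, inspected, aligned):
    """Accept the transformation only if it lowers the MSE to the reference."""
    return mse(reference, aligned) < mse(reference, inspected)

# Hypothetical anchors moved by a 2-degree rotation, 1% scaling and a small shift.
theta, scale = np.deg2rad(2.0), 1.01
true_matrix = np.array([[scale * np.cos(theta), -scale * np.sin(theta), 5.0],
                        [scale * np.sin(theta), scale * np.cos(theta), -3.0]])
anchors = np.array([[100.0, 100.0], [900.0, 100.0], [900.0, 700.0], [100.0, 700.0]])
moved = apply_transform(true_matrix, anchors)
print(estimate_affine(anchors, moved))
print(estimate_similarity(anchors[:2], moved[:2]))
```

The similarity model has four unknowns and each point pair contributes two equations, which is why two pairs suffice; the affine model has six unknowns and therefore needs three pairs.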

Figure 3.3: Image Alignment Process

3.2.2 Pipeline

Figure panels: (a) The template bounding boxes on the reference image (zoomed to one PCB). (b) The transformed bounding boxes on the inspected image.

Then, for each inspected image, we calculate the transformation by aligning it to the target and map the template boxes accordingly. The transformation is calculated with the matched feature point pairs on the two images. We notice that there are four fixation "holes" (Figure 3.5), which we call "anchors", one near each of the four corners of the panel. Therefore, the Hough circle detection algorithm (explained in Section 3.2.1) is used to detect those anchors instead of complex feature point extraction and matching algorithms. The detected circle closest to each of the four corners is considered a feature point and can be easily paired with the corresponding point on the reference image, based on their positions.


the inspected image.

3.3 Semi-supervised ROI Proposal

The template-based method for ROI proposal can be very efficient and accurate. However, for every new PCB model, it takes time to prepare the new template bounding boxes. Also, the transformation is calculated solely from the feature points. If misdetection happens (which is very rare according to our experiments) or there is no good feature point for matching, the method cannot even yield a close result.

Therefore, apart from the template-based method, we propose a semi-supervised method that is more aware of the image content. The main idea of the method is to segment the soldering parts on the panel image and generate bounding boxes based on the segmentation result.

3.3.1 Image Processing

The semi-supervised method shares some image processing techniques with the template-based one, including color conversion and image alignment. Here we skip those that have already been introduced.

Otsu’s Binarization

Thresholding is a basic yet widely used technique to obtain a binary image from a single-channel image. When a threshold is selected, each pixel is classified by comparing its value to the threshold. A good threshold is critical to the result but usually hard to obtain. In 1979, Nobuyuki Otsu proposed an automatic threshold selection method [25] that minimizes the within-class intensity variance of the binary image. In the case of two-class thresholding, the within-class variance can be written as

$$\sigma_w^2 = \omega_0 \sigma_0^2 + \omega_1 \sigma_1^2,$$

where $\omega$ refers to the class probability and $\sigma^2$ is the variance. By iterating over all the possible thresholds, the one with minimal within-class variance is selected.
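The exhaustive search over thresholds can be written directly from the formula. The sketch below is a straightforward NumPy version that assumes an 8-bit grayscale image and assigns pixels below the threshold to class 0; it is meant to illustrate the criterion, not to replace an optimized library routine. The toy image values are made up.

```python
import numpy as np

def otsu_threshold(gray):
    """Threshold t (class 0: value < t, class 1: value >= t) minimizing
    the within-class variance w0 * var0 + w1 * var1 of an 8-bit image."""
    hist = np.bincount(np.asarray(gray, dtype=np.uint8).ravel(), minlength=256)
    prob = hist / hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, np.inf
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()
        if w0 == 0 or w1 == 0:
            continue  # one class would be empty
        mu0 = (levels[:t] * prob[:t]).sum() / w0
        mu1 = (levels[t:] * prob[t:]).sum() / w1
        var0 = ((levels[:t] - mu0) ** 2 * prob[:t]).sum() / w0
        var1 = ((levels[t:] - mu1) ** 2 * prob[t:]).sum() / w1
        within = w0 * var0 + w1 * var1
        if within < best_var:
            best_t, best_var = t, within
    return best_t

# Made-up image with a dark group (9..12) and a bright group (198..202).
gray = np.array([[10, 12, 11, 200], [9, 198, 202, 201]], dtype=np.uint8)
threshold = otsu_threshold(gray)
binary = gray >= threshold
print(threshold, int(binary.sum()))
```

Any threshold between the two groups yields the same split, so the returned value depends on tie-breaking, while the resulting binary image does not.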

Figure 3.6: The image binarized with Otsu's thresholding with different color conversion methods. (a) Binary image with standard color conversion (b) Binary image with channel-difference color conversion

For comparison, Figure 3.6a shows the binary image obtained with the "standard" color-to-grayscale conversion, in which the soldering part cannot be well separated from the background.

Mask Generation

Clearly, the binary image is noisy and contains many non-soldering parts. It cannot be used directly as training data to teach a neural network to segment the soldering parts. Hence, we take further steps to generate a cleaner mask for training purposes.

There are two main steps in mask generation. The first step is to use the bounding boxes to remove the non-soldering parts. Then, some image processing techniques are adopted to refine the mask.

Figure 3.7 is a set of images showing the intermediate results. As shown in Figure 3.7a, the input is a binary image and a set of bounding boxes indicating individual soldering points. After simply removing all the foreground (white part) outside the bounding boxes, we get a cleaner binary image as the rough mask (Figure 3.7b).
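The first step, removing the foreground outside the bounding boxes, amounts to a simple masking operation. The following NumPy sketch assumes boxes given in pixel coordinates as (left, top, right, bottom) with exclusive right and bottom edges; this box convention and the function name are illustrative assumptions.

```python
import numpy as np

def filter_with_boxes(binary, boxes):
    """Keep foreground pixels only inside the given bounding boxes."""
    keep = np.zeros(binary.shape, dtype=bool)
    for left, top, right, bottom in boxes:
        keep[top:bottom, left:right] = True
    return np.where(keep, binary, 0)

# Toy binary image that is entirely foreground, with one 2x2 box.
binary = np.ones((4, 6), dtype=np.uint8)
rough_mask = filter_with_boxes(binary, [(1, 1, 3, 3)])
print(rough_mask)
```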

Figure 3.7 panels: (a) The original binary image with bounding boxes. (b) The binary image filtered with the bounding boxes.

Bounding Box Generation

A good mask is not enough. We need to generate bounding boxes from the mask so that we can classify each soldering point individually.

A very basic strategy is to divide the mask into regions according to connectivity and then generate a bounding box for each connected region. However, as the mask is generated rather than human-annotated, there are sometimes errors, such as small "bridges" connecting separate soldering points, which result in one bounding box enclosing multiple soldering points. As shown in Figure 3.8a, there are many errors due to the small "bridges".

We have also tried the watershed algorithm [26] to segment the individual soldering points. We first apply a distance transform to the mask and find the local maxima after the transformation. The local maxima are used as the "sources", and the "water" coming out of the sources determines the segmentation boundaries by "flooding". As seen in Figure 3.8b, the outcome is very good and the algorithm can handle the "bridges" between soldering joints. Nevertheless, the biggest challenge of the method is speed: it takes more than 20 times as long as the plain connectivity-based method. As a trade-off between speed and performance, we finally choose to improve the connectivity-based method by adding an erosion operation before calculating the connectivity. The erosion partly eliminates small bridges between soldering joints and thus helps to generate correct bounding boxes. However, it may fail when the "bridge" is too wide (Figure 3.8c). The erosion parameter requires tuning so that it eliminates most bridges while still keeping the small solder points.
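The effect of erosion on the connectivity-based method can be reproduced on a toy mask. The sketch below is a plain NumPy illustration with a 3×3 square structuring element and 4-connectivity; both choices, as well as the function names, are assumptions, since the thesis does not specify the exact parameters here. In practice, library routines for erosion and connected components would normally be used instead.

```python
from collections import deque
import numpy as np

def erode(mask, iterations=1):
    """Binary erosion with a 3x3 square: a pixel survives only if its
    whole 3x3 neighbourhood is foreground."""
    out = np.asarray(mask, dtype=bool)
    h, w = out.shape
    for _ in range(iterations):
        padded = np.pad(out, 1, constant_values=False)
        result = np.ones_like(out)
        for dy in range(3):
            for dx in range(3):
                result &= padded[dy:dy + h, dx:dx + w]
        out = result
    return out

def component_boxes(mask):
    """Boxes (left, top, right, bottom), right/bottom exclusive, of the
    4-connected foreground regions, found by breadth-first search."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    h, w = mask.shape
    boxes = []
    for sy, sx in zip(*np.nonzero(mask)):
        if seen[sy, sx]:
            continue
        seen[sy, sx] = True
        queue = deque([(sy, sx)])
        top = bottom = sy
        left = right = sx
        while queue:
            y, x = queue.popleft()
            top, bottom = min(top, y), max(bottom, y)
            left, right = min(left, x), max(right, x)
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        boxes.append((int(left), int(top), int(right) + 1, int(bottom) + 1))
    return boxes

# Two 5x5 "soldering points" joined by a one-pixel-wide bridge.
mask = np.zeros((9, 15), dtype=bool)
mask[2:7, 1:6] = True
mask[2:7, 9:14] = True
mask[4, 6:9] = True
boxes_plain = component_boxes(mask)
boxes_eroded = component_boxes(erode(mask))
print(len(boxes_plain), len(boxes_eroded))
```

Note that boxes computed on the eroded mask are slightly smaller than the original regions, so a practical pipeline would probably need to enlarge them again; how this is handled is not described in the text.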

3.3.2 Segmentation Network

In addition to classification, the CNN also shows great potential in many other tasks such as image segmentation. U-Net [27] is an architecture for segmentation and can be regarded as an encoder-decoder network. The encoder part is a VGG-like architecture with only convolutional layers and max-pooling layers, while the decoder part is almost symmetric, except that the max-pooling layers are replaced by transposed convolutional layers for upsampling. There are also skip connections that pass low-level features from the encoder part to the decoder part for precision.

In our project, for convenience, we customize the architecture of U-Net and make it strictly symmetric so that the output segmentation mask has the same size as the input image. Figure 3.9 shows the architecture.

Figure 3.8 panels: (a) The Bounding Boxes Generated by Connectivity. (b) The Bounding Boxes Generated by Watershed Algorithm. (c) The Bounding Boxes Generated by Connectivity after Erosion.

3.3.3 Pipeline

The core of the semi-supervised method is to train the deep learning model for segmentation. The target images in the training set are obtained from the binary images with the help of the template bounding boxes, as described above. Once the model has learned to segment the soldering parts from the original image, the next step is to propose bounding boxes from the binary mask. As mentioned in Section 3.3.1, we have tried three strategies and selected an improved connectivity-based method. With it, we propose the bounding boxes indicating the regions of interest.

3.4 Classification

The ideal output of the ROI proposal stage is the panel image with bounding boxes enclosing each soldering point. Identifying the defects then becomes a typical multi-label image classification problem.

With the help of the bounding boxes, we can crop the individual soldering points from the whole panel image, each as a single instance for the deep learning model. This section describes how the data is processed and the network architecture for classification.

3.4.1 Data Preparation

Annotation Conversion

As shown in Figure 3.10, the original annotation is not appropriate for training a model to classify single soldering points, since an annotated bounding box may include multiple soldering joints. Therefore, after ROI proposal, we need to manually transfer the annotation to the bounding boxes proposed by our algorithms.


Data Augmentation

In deep learning, both the quantity and the quality of the dataset play critical roles. There are two main challenges in our dataset. The first is the small scale of the dataset, which leads to a high chance of overfitting. The other is the significant imbalance between the defective and the non-defective data, which may strongly bias the trained model.

To deal with the two issues, different strategies are adopted for the defective and non-defective data. The defective data is upsampled by augmentation, while the non-defective data is downsampled by random selection. The goal is to keep the number of samples in each class close in the end. The defective data is augmented by a random combination of scaling, shifting, flipping and 90-degree rotation. It is worth noting that the panel image is unaligned, which also introduces a slight deviation in the orientation.

Finally, all the chosen soldering points are cropped and resized to the same size (100 × 100) as training data for the classification model. Figure 3.11 shows samples of the augmented data.

Figure 3.11: The Same Soldering Point after Augmentation, panels (a) to (d)
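A reduced version of this balancing strategy can be sketched as follows. The sketch covers only random flips and 90-degree rotations (scaling and shifting are left out for brevity), treats the problem as two classes, and uses a hypothetical `target` count per class; none of these details are specified in the text.

```python
import numpy as np

def augment(crop, rng):
    """Random horizontal/vertical flip followed by a random 90-degree rotation."""
    if rng.random() < 0.5:
        crop = np.fliplr(crop)
    if rng.random() < 0.5:
        crop = np.flipud(crop)
    return np.rot90(crop, k=int(rng.integers(0, 4)))

def balance(defective, non_defective, target, rng):
    """Upsample defective crops by augmentation and downsample
    non-defective crops by random selection towards `target` samples each."""
    extra = [augment(defective[int(rng.integers(len(defective)))], rng)
             for _ in range(max(0, target - len(defective)))]
    chosen = rng.choice(len(non_defective),
                        size=min(target, len(non_defective)), replace=False)
    return list(defective) + extra, [non_defective[int(i)] for i in chosen]

# Made-up data: 3 defective and 50 non-defective square crops.
rng = np.random.default_rng(0)
defective = [rng.random((4, 4)) for _ in range(3)]
non_defective = [rng.random((4, 4)) for _ in range(50)]
balanced_defective, balanced_good = balance(defective, non_defective, target=10, rng=rng)
print(len(balanced_defective), len(balanced_good))
```

Because the crops are square, flips and 90-degree rotations preserve their shape, so the augmented samples can be resized in exactly the same way as the originals.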

3.4.2 Classification Network

"VGG" is a CNN architecture invented by the Visual Geometry Group of the University of Oxford [4]. The network uses small convolutional kernels (3 × 3) for the convolutional layers and a small window size (2 × 2) for the max-pooling layers, which makes it possible to build a rather deep neural network. VGG has proven useful in image classification problems, and the pre-trained VGG without fully connected layers is also recognized as a good feature extractor.


3.5 Evaluation Metrics

This section discusses the evaluation metrics for both segmentation and classification models.

3.5.1 Image Segmentation

Intersection over Union (IOU) is an evaluation metric for image segmentation. It evaluates a segmentation result by comparing it with the ground truth. As the name suggests, it is defined as

$$\mathrm{IOU} = \frac{\text{Area of Overlap}}{\text{Area of Union}}.$$

IOU ranges from 0 to 1, where 0 means no intersection between the ground truth and the result, and 1 means they are identical.
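For binary masks, the metric can be computed in a few lines. The following NumPy sketch treats any non-zero pixel as foreground; returning 1.0 when both masks are empty is a convention chosen here for illustration, not something stated in the text.

```python
import numpy as np

def iou(prediction, ground_truth):
    """Intersection over Union of two binary masks of equal shape."""
    prediction = np.asarray(prediction, dtype=bool)
    ground_truth = np.asarray(ground_truth, dtype=bool)
    union = np.logical_or(prediction, ground_truth).sum()
    if union == 0:
        return 1.0  # assumed convention: two empty masks are identical
    return float(np.logical_and(prediction, ground_truth).sum() / union)

# One overlapping pixel out of three pixels in the union.
score = iou([[1, 1, 0, 0]], [[0, 1, 1, 0]])
print(score)
```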

3.5.2 Image Classification

Table 3.2 shows the terminology we use for the classification evaluation, which compares the predicted results with the ground truth. The matrix is called the confusion matrix and gives a straightforward view of the classification performance. More insights can be derived from the four basic terms of the confusion matrix: True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN).

Table 3.2: Confusion Matrix

               Predicted True        Predicted False
Actual True    True Positive (TP)    False Negative (FN)
Actual False   False Positive (FP)   True Negative (TN)

The following metrics evaluate the classification results from different perspectives:

• Accuracy = (TP + TN)/(TP + TN + FP + FN)

• Precision = TP/(TP + FP)

• Recall = TP/(TP + FN)

• F1-score = 2 · Precision · Recall/(Precision + Recall)


Accuracy is the most common metric among the four; it is the ratio of correctly classified samples to all samples. However, it can be biased when the dataset is imbalanced, due to the dominance of the majority class. Precision and recall, on the other hand, concentrate on the classification quality of the positive samples, which are generally of greater interest than the negative ones. The F1-score is the harmonic mean of precision and recall and takes both into consideration.

The output of a neural network for classification is usually a vector of decimal numbers, which requires thresholding to determine the predicted class. Therefore, the terms in the confusion matrix change with the selected threshold. The trade-off between precision and recall also shows up when different thresholds are chosen, and it can be illustrated by a precision-recall curve.

In our scenario, FN samples are more harmful than FP samples, because FP samples can be easily corrected, while it costs much more effort to find the FN samples. Therefore, we prefer a classification model and threshold that achieve high recall while keeping precision at an acceptable level.
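The dependence of the confusion matrix on the threshold can be illustrated with a small example. The scores and labels below are made up, and the helper names are hypothetical; the formulas are the ones listed above, with the F1-score taken as the harmonic mean of precision and recall.

```python
def confusion(scores, labels, threshold):
    """Count (TP, FP, TN, FN) when score >= threshold is predicted positive."""
    tp = fp = tn = fn = 0
    for score, actual in zip(scores, labels):
        predicted = score >= threshold
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall, F1-score); empty ratios become 0."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Made-up network outputs for one defect class and their ground truth.
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [True, True, False, True, False, False]
for threshold in (0.5, 0.35):
    print(threshold, metrics(*confusion(scores, labels, threshold)))
```

Lowering the threshold from 0.5 to 0.35 turns the remaining false negative into a true positive, which raises recall to 1.0 in this toy case; this is the kind of operating point preferred when false negatives are the costlier error.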

3.6 Tools


Chapter 4

Workflow

This chapter presents the workflow of the whole project, which is not limited to this thesis. The workflow aims to show the motivation of the thesis at a higher level and to help readers understand the project better.

There are two workflows in the project: an offline workflow and a sustainable workflow. The purpose of the offline workflow is to train the classification model in an offline manner. In other words, a fixed training dataset is established and used to train the model. The sustainable workflow, on the other hand, uses the trained model to classify each newly inspected image. The prediction is then corrected and, in return, added to the training set for re-training. Thus the model can be updated and improved in a sustainable way.

4.1 Offline Workflow

The offline workflow is straightforward. The PCB images are scanned, annotated and stored on a local machine as the training data. The classification model is trained with the data and can then be used on the production line to detect soldering defects. Figure 4.1 illustrates the process.

The offline workflow is easy to set up, and it is how we train the classification model in the thesis. However, training a good model takes a huge amount of data, and annotating and maintaining the data requires much human effort.

4.2 Sustainable Workflow


As illustrated in Figure 4.2, the ROI is proposed from the image and classified before annotation. As a result, the operator can finish the annotation process by correcting the classification results rather than annotating the original image from scratch. The corrected annotation, in return, is added to the training set and used to re-train the classification model. The classification model is re-trained and updated once a certain amount of new data has been added to the training set.
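The re-training trigger of the sustainable workflow can be sketched as a small buffer that collects operator-reviewed labels. This is a hypothetical sketch, not the project's code: the class name `RetrainingBuffer`, its methods, the label strings and the `retrain_after` value are assumptions, since the text does not specify how much new data triggers re-training.

```python
class RetrainingBuffer:
    """Collect operator-reviewed ROI labels and signal when re-training is due."""

    def __init__(self, retrain_after):
        self.retrain_after = retrain_after  # hypothetical trigger size
        self.training_set = []              # (roi_id, reviewed_label) pairs
        self.pending = 0                    # samples added since the last re-training
        self.corrections = 0                # predictions the operator had to change

    def add_reviewed(self, roi_id, predicted_label, reviewed_label):
        """Store the reviewed label; return True when re-training is due."""
        if predicted_label != reviewed_label:
            self.corrections += 1
        self.training_set.append((roi_id, reviewed_label))
        self.pending += 1
        return self.pending >= self.retrain_after

    def mark_retrained(self):
        """Reset the counter after the model has been re-trained."""
        self.pending = 0


# Made-up reviews: (ROI id, model prediction, operator's label).
buffer = RetrainingBuffer(retrain_after=3)
reviews = [
    ("roi-0", "none", "none"),
    ("roi-1", "none", "solder_bridge"),
    ("roi-2", "none", "none"),
]
due_flags = [buffer.add_reviewed(*review) for review in reviews]
print(due_flags, buffer.corrections)
```

Counting corrections separately gives a rough indication of how much manual work the current model still leaves to the operator.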


Chapter 5

Results and Evaluation

This chapter shows the results and evaluates the proposed algorithm for soldering defect detection.

5.1 Dataset

The whole dataset consists of 50 images of PCB panels. Each panel has 6 PCBs, and there are 135 soldering points on each PCB. The first row of Table 5.1 shows the number of defective soldering points in the dataset. The original dataset is extremely imbalanced: the non-defective soldering points account for around 99.46% of all soldering points. The second row of the table shows the number of instances after augmentation, where each class has a similar number of instances.

Table 5.1: Number of defective soldering points in the dataset.

                        Solder Bridge  Cold Solder  Dry Joint  Leg Up  None
# Before Augmentation   171            17           13         20      40282
# After Augmentation    8098           8400         8008       8000    8000
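To make the effect of the balancing step concrete, the ratio between the two rows of Table 5.1 can be computed per class. This is a minimal sketch using only the counts in the table; the dictionary and variable names are ours, and the ratio says nothing about which augmentation operations were actually applied.

```python
# Instance counts copied from Table 5.1.
before = {"Solder Bridge": 171, "Cold Solder": 17, "Dry Joint": 13,
          "Leg Up": 20, "None": 40282}
after = {"Solder Bridge": 8098, "Cold Solder": 8400, "Dry Joint": 8008,
         "Leg Up": 8000, "None": 8000}

# Ratio > 1 means the class was enlarged, ratio < 1 means it was reduced.
ratios = {name: after[name] / before[name] for name in before}

for name, ratio in ratios.items():
    print(f"{name:14s} x{ratio:.2f}")
```

The defect classes are enlarged by factors in the tens to hundreds, while the non-defective class is reduced to roughly a fifth of its original size.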

5.2 Image Alignment


image have more than two anchors detected and all of them can be properly aligned. As the bounding boxes are hard-coded, we assume the proposed ROIs are correct as long as the image is aligned.

Figure 5.1: Local images of the misdetected anchors (a-f) and some correctly detected samples (g-i).

By plotting the detected anchors onto the inspected images, we find that only 6 out of 316 anchors are misdetected. Figure 5.1 shows all the misdetected anchors and some correctly detected samples. After inspecting the misdetected anchors, we notice that half of the cases are due to poor visibility of the anchor, while the rest are affected by other circular shapes on the panel image. Nevertheless, the detection algorithm is generally robust enough to detect even incomplete circles: an anchor showing only a quarter of its circle (Figure 5.1g) can still be detected.


Figure 5.2: The loss of the U-Net for segmentation.

bounding box transferring process.

5.3 Semi-supervised ROI Proposal

As no human-annotated ground truth is available in the semi-supervised ROI proposal step, it is tricky to properly evaluate the process. Therefore, we decide to evaluate the method by carefully inspecting and analyzing the segmentation results as well as the final proposed bounding boxes.

The 50 images in the training set are used to train the U-Net for segmentation. In addition, 14 unannotated images of a different type of PCB are used to evaluate the segmentation.

During the training of the segmentation model, the loss on the training data and the loss on the validation data stay similar (Figure 5.2), and so do the IOU values of both. The final IOU on the validation set is 0.9269.
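For reference, the IOU between a predicted mask and a target mask can be computed as below. This is a plain-Python sketch for small binary masks stored as nested lists; the function name and the toy masks are assumptions, and the thesis may compute the metric differently (for example per batch inside the training framework).

```python
def mask_iou(mask_a, mask_b):
    """Intersection over union of two equally sized binary masks (nested lists of 0/1)."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    # Two empty masks are treated as a perfect match.
    return intersection / union if union else 1.0


# Toy 3x3 masks, made up for illustration.
target = [[0, 1, 1],
          [0, 1, 1],
          [0, 0, 0]]
predicted = [[0, 1, 1],
             [0, 1, 0],
             [0, 1, 0]]
iou = mask_iou(target, predicted)
print(iou)
```

Here three pixels are shared and five pixels are covered by at least one mask, so the IOU is 3/5.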

Figure 5.3 shows a sample of the segmentation. Figure 5.3a is the mask generated from the original image, which can be regarded as the "ground truth", Figure 5.3b is the mask predicted by our U-Net, and Figure 5.3c is the original image with the proposed bounding boxes.


(a) The mask generated with the help of the template.

(b) The mask predicted by the segmentation model.

(c) The original image with the bounding boxes proposed from the predicted mask


they can be removed with basic processing steps, and do not have a big influence on the proposal, as we can see in Figure 5.3c. An unexpected phenomenon is that the segmentation model is sometimes able to eliminate the small bridge between two joints that occurs in the generated mask (the three soldering points near the top-left corner).

Figure 5.4 shows a sample segmentation result for another type of PCB panel. As we can see, most of the soldering points have been successfully segmented. The soldering points that have not been found are generally smaller in size. One possible reason for the failure is that the segmentation model learns the size of the soldering point, and such small soldering points do not appear in the training set.

(a) Original Image

(b) Predicted Mask

Figure 5.4: Segmentation Result of Different Type of PCB


5.4 Classification

As the number of solder bridge samples in the dataset is significantly larger than that of the other defects, we can assume that it is the most representative type of defect. Therefore, it becomes our main focus in the evaluation, but we still show the results of the remaining three types for reference.

Figure 5.5 illustrates the loss of the classification model during training. The training loss descends to a low level while the validation loss goes up after the third epoch, which suggests that the model is very likely overfitting.

Figure 5.5: The loss of the Classification Model.

Table 5.2 evaluates the performance statistically on the validation set with the threshold set to 0.5. Even though the accuracy for all four classes is relatively high, the recall, which is of more interest to us, can be frustratingly low.

Table 5.2: The performance of the classification model on different types of defect. The threshold is set as 0.5.

                TP    FP   FN   TN    Accuracy  Precision  Recall  F1-Score
Solder Bridge   1346  126  286  6922  95.25%    91.44%     82.47%  0.8672
Cold Solder     1194  237  806  6443  87.98%    83.43%     59.70%  0.6960
Dry Joint       935   213  913  6619  87.02%    81.44%     50.60%  0.6241
Leg Up          1424  511  176  6569  92.08%    73.59%     89.00%  0.8056
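The derived columns of Table 5.2 follow from the four counts through the metric definitions in Section 3.5.2. The sketch below recomputes them; only the counts come from the table, while the names `TABLE_5_2` and `metrics` are ours. The last printed digit may differ from the table by one unit, depending on whether values are rounded or truncated.

```python
# (TP, FP, FN, TN) per defect type, copied from Table 5.2.
TABLE_5_2 = {
    "Solder Bridge": (1346, 126, 286, 6922),
    "Cold Solder": (1194, 237, 806, 6443),
    "Dry Joint": (935, 213, 913, 6619),
    "Leg Up": (1424, 511, 176, 6569),
}


def metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


for name, counts in TABLE_5_2.items():
    accuracy, precision, recall, f1 = metrics(*counts)
    print(f"{name:14s} acc={accuracy:.4f} prec={precision:.4f} "
          f"rec={recall:.4f} f1={f1:.4f}")
```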


bridge to those of the others, one can easily tell that the classification model works best on the solder bridge, since it generally reaches relatively higher precision and recall at the same time. It seems quite challenging for the model to correctly classify the soldering points with leg up, as the precision is always at a lower level.

Figure 5.6: The Precision-Recall Curve Plot of the solder bridge

Figure 5.7: The Precision-Recall Curve Plot of the Defects except for the solder bridge

Also, the difference between the curves of the training data and the validation data reveals the chance of overfitting. The curve pair of the solder bridge is quite similar, while those of the other defects show significant differences. Thus it appears that the model has overfit to the defects other than the solder bridge, especially the leg up.


image.

5.5 Discussion

In the template-based ROI proposal stage, the results show that only 6 out of 316 anchors are misdetected. Considering that a panel image cannot be aligned when the detection algorithm misses at least 3 pairs of anchors, an event whose probability is 6.845 × 10^-6, we can conclude that the anchor we define is a good feature that can be robustly detected for aligning this type of PCB.
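The quoted probability is numerically consistent with cubing the empirical per-anchor miss rate of 6/316. The derivation is not spelled out in this section, so the short check below rests on our own assumption that three misses occur independently, each at the observed rate.

```python
misdetected, total = 6, 316        # counts reported in Section 5.2
miss_rate = misdetected / total    # roughly 0.019 per anchor

# Assumption: three independent misses, each at the empirical miss rate.
p_three_misses = miss_rate ** 3
print(f"{p_three_misses:.3e}")
```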

In the semi-supervised ROI proposal stage, the segmentation model has proven successful and generalizes to a different type of PCB. We notice that the model can fail on soldering points with certain characteristics, such as size. However, we believe this can be solved by collecting images of more PCB types as training data.

The biggest challenge in our experiment is still the small scale of the dataset. Except for the solder bridge, no defect type has more than 20 instances in the original dataset (Table 5.1). When the training set lacks variance, a deep learning model has a high chance of overfitting. As a result, the validation loss starts to rise after the third epoch (Figure 5.5), and the trained model underperforms on the defects other than the solder bridge even after augmentation. Nevertheless, the results on the solder bridge are fairly good, which indicates the potential of the model for identifying soldering defects as long as the training data includes sufficient instances.


Chapter 6

Conclusions and Future Work

The goal of the project is to explore the possibility of applying deep learning techniques to soldering defect detection. Instead of using existing object detection algorithms, we design our own detection pipeline as well as a sustainable workflow for industrial implementation. This chapter summarizes our work and discusses possible future directions on the topic.

6.1 Conclusions

In this thesis, a two-stage detection method is proposed for the soldering defect, and it has been proved feasible.

In the first stage, ROI proposal, we provide two options for different scenarios: the template-based method and the semi-supervised method. The template-based method detects the fixation holes near the corners and uses them to calculate the transformation between images. The pre-defined bounding boxes on the reference image are then mapped to the inspected image based on the transformation and serve as the proposed ROIs. The template-based method only works for panels with fixation holes, and the template bounding boxes need to be defined for each type of PCB. However, once the template is set up, the method is highly efficient. Hence, it is recommended when a large number of PCBs of a single type require inspection.


segmentation is variant to scale, which means the size of the object is vital to the detection result. Therefore, the segmentation model can fail when the soldering point is smaller or larger than those in the training data. However, this usually can be fixed by feeding richer training data.

We train a VGG-like network to classify the defect of each proposed ROI. The model does a good job of identifying the solder bridge, yet underperforms on the other defects. According to our observation, the classification model overfits to the defects other than the solder bridge due to the poor variance in the dataset. Still, the model shows great potential in identifying different soldering defects as long as the training data is of better quality.

6.2 Limitations

The detection method is specially designed for PCBs, whose patterns are highly repetitive and regular. Therefore, it can be a great challenge to migrate the method to a new use case. Also, due to the lack of a benchmark, we are unable to compare our results with existing object detection algorithms.

In deep learning, hyper-parameters usually have a great influence on the performance of models. However, in this thesis, we focus on proving the feasibility of the whole pipeline rather than tuning the hyper-parameters to reach the best results. Thus, we believe the models can achieve better performance with tuned hyper-parameters.

6.3 Future Work

The small scale of the data has always been the pain point of our project. The collection and annotation process has taken more time than we expected. Therefore, to carry on with the work, one needs to build a good dataset or even a benchmark in the first place. Another problem regarding the data comes from the annotation. The bounding box of an ideal annotation for object detection should exactly enclose the object. However, due to insufficient instructions to the operator and the heavy workload the operator already had, the annotations require manual inspection and conversion before they can be used as training data. Hence, close cooperation with the annotation worker needs to be established to obtain data of good quality.

In this thesis, we have not explored many different architectures for segmentation and classification. Even though our models have a good balance between computational complexity and accuracy, we would like to test more architectures and find the optimal ones for the problem in the future.


the defective part is a tiny region compared to the whole image, the segmentation model tends to classify all pixels as negative (non-defective). We think the loss function needs to be adjusted so that the model focuses more on the defective parts.


Bibliography

[1] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 580–587, 2014.

[2] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 779–788, 2016.

[3] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European conference on computer vision, pages 740–755. Springer, 2014.

[4] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[5] Ross Girshick. Fast r-cnn. In Proceedings of the IEEE international conference on computer vision, pages 1440–1448, 2015.

[6] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems, pages 91–99, 2015.

[7] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.

[8] Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In European conference on computer vision, pages 818–833. Springer, 2014.

[9] Jasper RR Uijlings, Koen EA Van De Sande, Theo Gevers, and Arnold WM Smeulders. Selective search for object recognition. International journal of computer vision, 104(2):154–171, 2013.


[11] Peng Wei, Chang Liu, Mengyuan Liu, Yunlong Gao, and Hong Liu. Cnn-based reference comparison method for classifying bare pcb defects. The Journal of Engineering, 2018(16):1528–1533, 2018.

[12] Weibo Huang and Peng Wei. A pcb dataset for defects detection and classification. arXiv preprint arXiv:1901.08204, 2019.

[13] Wenting Dai, Abdul Mujeeb, Marius Erdt, and Alexei Sourin. Towards automatic optical inspection of soldering defects. In 2018 International Conference on Cyberworlds (CW), pages 375–382. IEEE, 2018.

[14] Yi-Ming Chang, Chia-Chen Wei, Jeffrey Chen, and Pack Hsieh. An implementation of health prediction in smt solder joint via machine learning. In 2019 IEEE International Conference on Big Data and Smart Computing (BigComp), pages 1–4. IEEE, 2019.

[15] Bing-Jhang Lin, Ting-Chen Tsan, Tzu-Chia Tung, You-Hsien Lee, and Chiou-Shann Fuh. Use 3d convolutional neural network to inspect solder ball defects. In International Conference on Neural Information Processing, pages 263–274. Springer, 2018.

[16] Fupei Wu and Xianmin Zhang. Feature-extraction-based inspection algorithm for ic solder joints. IEEE Transactions on Components, Packaging and Manufacturing Technology, 1(5):689–694, 2011.

[17] Ji-Deok Song, Young-Gyu Kim, and Tae-Hyoung Park. Smt defect classification by feature extraction region optimization and machine learning. The International Journal of Advanced Manufacturing Technology, 101(5-8):1303–1313, 2019.

[18] Nian Cai, Guandong Cen, Jixiu Wu, Feiyang Li, Han Wang, and Xindu Chen. Smt solder joint inspection via a novel cascaded convolutional neural network. IEEE Transactions on Components, Packaging and Manufacturing Technology, 8(4):670–677, 2018.

[19] Opencv. https://opencv.org/, Jul 2019.

[20] Paul VC Hough. Method and means for recognizing complex patterns, December 18 1962. US Patent 3,069,654.

[21] Richard O Duda and Peter E Hart. Use of the Hough transformation to detect lines and curves in pictures. Technical report, Sri International Menlo Park Ca Artificial Intelligence Center, 1971.

[22] John Canny. A computational approach to edge detection. In Readings in computer vision, pages 184–203. Elsevier, 1987.

[23] Christopher G Harris, Mike Stephens, et al. A combined corner and edge detector. In Alvey vision conference, volume 15, pages 10–5244. Citeseer, 1988.


[25] Nobuyuki Otsu. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 9(1):62–66, 1979.

[26] Serge Beucher and Christian Lantujoul. Use of watersheds in contour detection. volume 132, 01 1979.

[27] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015.

[28] Scikit-image: image processing for python. https://scikit-image.org/.

[29] Keras: The python deep learning library. https://keras.io/.


