
DEVELOPMENT OF SYNTHETIC CAMERAS FOR VIRTUAL COMMISSIONING

Bachelor Degree Project in Automation Engineering, Bachelor Level, 30 ECTS

Spring term 2020

Francisco Vico Arjona
Daniel Pérez Torregrosa


Abstract


Certificate of Authenticity

This thesis has been submitted by Francisco Vico Arjona and Daniel Pérez Torregrosa to the University of Skövde as a requirement for the degree of Bachelor of Science in Production Engineering.

The undersigned certifies that all the material in this thesis that is not my own has been properly acknowledged using accepted referencing practices and, further, that the thesis includes no material for which I have previously received academic credit.

Francisco Vico Arjona Daniel Pérez Torregrosa

Skövde 2020-05-18


Acknowledgements

In these brief lines we would like to extend our most sincere gratitude to all the people who have helped us during the development of this project.

Firstly, we would like to thank the whole team behind Simumatik AB for the great opportunity to work with them on this project. In particular, we wish to express our sincere thanks to our industrial supervisor Mikel Ayani for his constant and tireless assistance throughout the project. Secondly, we would like to thank the University of Skövde for providing us with all the facilities needed for the research. Furthermore, our gratitude goes to our university supervisor Wei Wang for all the support regarding this report.


Table of Contents

1. INTRODUCTION
1.1 Background
1.2 Problem statement
1.3 Objectives
1.4 Delimitations
1.5 Overview
2. SUSTAINABLE DEVELOPMENT
2.1 Environmental Sustainability
2.2 Economic Sustainability
2.3 Social Sustainability
3. FRAME OF REFERENCE
3.1 Virtual Commissioning
3.2 Emulation and simulation
3.2.1 Emulation
3.2.2 Simulation
3.2.3 Differences between emulation and simulation
3.3 Graphics engine
3.4 Application Programming Interface (API)
3.4.1 OpenGL
3.4.2 DirectX
3.4.3 Vulkan
3.4.4 Other alternatives
3.5 Machine Vision
3.5.1 Benefits
3.5.2 Equipment
3.6 Computer Vision
3.6.1 Applications of computer vision in manufacturing industry
3.6.2 Computer vision tools
3.7 Cameras
3.7.1 Optical system
3.7.2 Digital camera
3.7.3 Synthetic cameras
3.8 3D Sensors
3.8.1 Stereoscopic cameras
3.8.2 Refocusing techniques
3.8.3 Time of Flight cameras (TOF)
3.8.4 Structured Light
3.8.5 Light Detection And Ranging (LIDAR)
3.8.6 RGBD Cameras
4. LITERATURE REVIEW
4.1 Application of Virtual Commissioning in manufacturing industry
4.2 Generation of synthetic images
4.3 Obtention of depth map of a 3D scene
4.4 Applications of Depth Sensing
4.4.1 Augmented and Virtual Reality (AR/VR)
4.4.2 Robotics
4.4.3 Facial recognition
4.4.4 Gesture and proximity detection
5. METHOD
6. DESIGN AND CREATION IMPLEMENTATION
6.1 Proposal
6.2 Tentative design
6.3 General Development Design
6.4 General Implementation Development
6.4.1 Final synthetic camera model
6.5 Virtual demonstrator development
6.5.1 Simumatik OEP system
6.5.2 OpenCV and OPCUA gateway script
9. CONCLUSION
10. FUTURE WORK
11. REFERENCES
12. APPENDICES


Nomenclature

2D/3D Two/Three Dimensions

API Application Programming Interface

AR Augmented Reality

ESG Environmental, Social and Governance

FOV Field Of View

GLB Graphics Library Binary

GPU Graphics Processing Unit

HMI Human Machine Interface

I/O Inputs and Outputs

IBVS Image-Based Visual Servo control

IR Infrared Radiation

IT Information Technology

LIDAR Light Detection And Ranging

OEP Open Emulation Platform

OPCUA Object linking and embedding for Process Control Unified Architecture

OpenCV Open source Computer Vision

OpenGL Open Graphics Library

OS Operating System

PBVS Position-Based Visual Servo control

PLC Programmable Logic Controller

PNG Portable Network Graphics

RADAR Radio Detection And Ranging

RGB Red Green Blue


SDK Software Development Kit

SONAR Sound Navigation And Ranging

TOF Time Of Flight

VC Virtual Commissioning


List of figures

Figure 1. Venn diagram for Sustainable Development
Figure 2. Virtual commissioning
Figure 3. Simplification of a lens model
Figure 4. Radial distortion error depending on Focal distance and Vision Angle
Figure 5. Radial distortion types
Figure 6. A graphic of light entering photosites with Bayer filters layered on
Figure 7. LIDAR technology
Figure 8. Kinect depth image example
Figure 9. Pin-hole camera projection geometry
Figure 10. Depth maps with different near and far z-values
Figure 11. Design and Creation strategy
Figure 12. Prototype window creation
Figure 13. Prototype PNG file
Figure 14. OpenGL transformations structure
Figure 15. OpenGL perspective projection
Figure 16. Code state-machine diagram
Figure 17. Render state diagram
Figure 18. Demonstrator general performance
Figure 19. Demonstrator Simumatik OEP System
Figure 20. Depth image of example screw
Figure 21. Zernike moments conversion
Figure 22. OPCUA Gateway's workflow
Figure 23. Communication protocol between OPCUA gateway, PLC and Robot
Figure 24. Camera component inside Simumatik OEP
Figure 25. Camera in example system
Figure 26. Depth images in collision mode example
Figure 27. Virtual demonstrator sorting different products
Figure 28. Products A, B and C respectively
Figure 29. Camera's validation system in Simumatik OEP
Figure 30. Comparison between real and synthetic cameras' colour image
Figure 31. Comparison between real and synthetic cameras' depth image
Figure 32. Demonstrator's results
Figure 33. Work Breakdown
Figure 34. Initial Time Plan


List of Tables

Table 1. Comparison of graphics APIs
Table 2. Other computer vision libraries
Table 3. RGBD Cameras Comparison
Table 4. Objects' attributes table based on their type
Table 5. Different parameters for each image format


1. INTRODUCTION

This project was carried out as a Bachelor's degree project for the University of Skövde, at the School of Engineering Science, and was developed and supervised within the company Simumatik.

Virtual commissioning (VC) is becoming more important every day. Computer vision has opened a new field in automated-system tasks such as robot guidance and machine control, so implementing these techniques in a simulation reduces costs and enables future research without the need to own the physical system.

1.1 Background

Any manufacturing system project can be divided into four stages: problem analysis, design, development and commissioning. The commissioning stage is where the system is tested, looking for errors and for possible improvements to the final system. Traditionally, this stage was performed directly on the physical implementation of the system, which wastes a great deal of time and money because everything is tested there for the first time and production can be affected or even stopped. Nowadays, VC has become a crucial point in the implementation of a manufacturing system because it allows everything to be tested without the waste of resources that the physical implementation entails (Elektroautomatik, 2020) (Rouse, 2018). Simumatik, the company behind this project, develops an Open Emulation Platform (OEP), which is presented as the next-generation VC tool to support the entire life cycle of automated solutions.

In recent years, with the evolution of graphics engines and software tools, a new world of possibilities has opened up through the use of synthetic cameras. This, alongside the computer vision tools developed recently, offers a wide range of possibilities for programming automated industrial cells (Angel, 2011).

1.2 Problem statement

One of the biggest challenges of VC is integrating real-world elements and simulating their functionality as accurately as possible. Much progress has been made in this task, but modelling vision cameras inside a virtual model poses a new challenge.

Nowadays, research projects and industrial commissioning related to robot control using cameras or other 3D sensors require owning all the material that composes the physical system. This implies a large economic cost, in addition to possible damage to the material before its final release.


1.3 Objectives

The main objective of this project is to generate synthetic camera images inside the simulation model run in Simumatik: not only depth (D) images but also colour (RGB), luminance (L) and red-green-blue-depth (RGBD) images. To achieve this, Python will be used along with a graphics application programming interface (API) to communicate directly with the graphics engine.

Afterwards, as a way of validating the results of the developed synthetic camera, a virtual demonstrator showing a simple industrial robot application that uses camera images is built. For the demonstrator development, different computer vision competencies need to be applied, adding a new, wide range of programming possibilities for virtual commissioning.

In order to work properly on this project, some questions need to be considered:

• Which are the most common 3D sensors and cameras used in modern industry?

• Which graphics API is the best option for our problem?

• How are synthetic cameras modelled in a virtual environment?

• Which computer vision technologies are the most used, and how are they applied in real-life automated cells?

• How far can we go in programming an automated robot cell with the possibilities that computer vision provides?

1.4 Delimitations

In order to delimit the project, the following limitations will be considered:

• The programming will be done using the Python programming language; this is necessary because all of Simumatik's source code is already written in Python.

• The number and size of the libraries used should be limited as much as possible, so that the program can be executed on as many devices, and with as few resources, as possible.

• Image generation should not involve creating a window to render the image. In game rendering it is usual to create a window in order to start rendering, but in a multi-camera system many windows would be created, increasing processing times and reducing the refresh rate.

• Work only with the 3D file formats used in Simumatik, e.g. GLB (Graphics Library Binary transmission format).

1.5 Overview


2. SUSTAINABLE DEVELOPMENT

The concept of sustainable development was introduced by the United Nations World Commission on Environment and Development in 1987 (United Nations, 1987) with the following statement: "Sustainable development is development that meets the needs of the present without compromising the ability of future generations to meet their own needs".

Sustainability in industry continues to gain relevance for engineers. Due to overpopulation and pollution, our resources and energy sources are being consumed, and, as industrial engineers, we have the responsibility to incorporate sustainability concepts into our work by coming up with original and innovative ideas.

The sustainable development process is divided into three main pillars, also referred to in ESG terms: economic, social and environmental sustainability. Their interrelation can be represented by the Venn diagram in Figure 1 (Amenabar & Carreras, 2018) (Parkin, 2000). These three pillars are analysed in the next chapters, together with their application to the present project.

2.1 Environmental Sustainability

Environmental sustainability can be defined as responsible interaction with the environment to avoid depletion or degradation of natural resources and to allow long-term environmental quality. Having a beneficial impact on the planet can also have a positive financial impact: reducing the amount of material used at every stage of an engineering project decreases the overall investment. In other words, conserving our finite natural resources today will allow future generations to fulfil their needs.

Researchers have shown that using VC considerably reduces the amount of energy, time and material required (Ayani, et al., 2018) (Eguti & Trabasso, 2018) (Wüncsh & Reinhart, 2007). This project aims to make VC a more accurate tool by simulating the behaviour of a camera, such as a depth camera, enabling virtual testing of automated systems where this technology could be applied, which will reduce the waste of resources in future engineering projects.


2.2 Economic Sustainability

Economic sustainability refers to all the practices that allow profitable economic growth without negatively affecting other aspects, such as the social, environmental and cultural aspects of the community (Spangenberg, 2005). For economic sustainability, engineers must use, safeguard and sustain human and material resources to create long-term sustainable projects with optimal use, recovery and recycling.

Thanks to VC, the virtual model of the camera can be reused and modified for future projects without the need to buy physical material. These features make VC a resource-optimisation tool that follows the main principles of economic sustainability.

2.3 Social Sustainability


3. FRAME OF REFERENCE

3.1 Virtual Commissioning

VC can be described as an environment for testing, simulating and debugging programmable logic controller (PLC) programs, human machine interface (HMI) code and the automation scheme with a virtual model of a system (Gurgul, 2019). VC is the manufacturers' way of improving and validating production equipment and programming in a virtual environment before implementation in the real production. VC allows a system to be tested in scenarios that would not be possible in the real system because of the safety of personnel or equipment (Wüncsh & Reinhart, 2007). Furthermore, VC gives the customer the possibility to see how the system would work and, based on that, to discuss which functionalities need to be modified (Stephan, et al., 2012). A comparison between real commissioning and VC is shown in the following Figure:

Figure 2. Virtual commissioning

The main advantages of VC are the parallel cooperation of automation and robotics engineers from an early stage of the project and the reduction of commissioning time at the customer's site, due to fewer errors in the system design, which translates into a reduction of waste. Furthermore, VC allows the system to be tested and adjusted towards the simplest equipment solution that correctly answers the problem.

Nevertheless, VC also has some disadvantages. The most important one is that the simulation can be complex and can require a huge amount of time and resources for modelling the system; furthermore, for much of the equipment it may not be possible to model its full functionality (Lee & Park, 2014). Also, the number of equipment libraries for the virtual model is currently quite limited, and not all VC software is implemented correctly and free of bugs and open issues.

3.2 Emulation and simulation


3.2.1 Emulation

Emulation is based on the idea of duplicating every aspect of the original system's behaviour. The emulation is effectively a complete imitation of the real system; it just operates in a virtual environment instead of the real world. It basically simulates all the hardware that the real device uses, allowing the exact same software to run on it without any modification. For example, the Android Studio Emulator emulates the Android OS on a completely different platform: Windows, Mac or Linux computers.

If the emulation model is properly built, it can be used for training operators in how to handle the equipment (Oppelt & Urbas, 2014). Furthermore, emulation can also be used to study the effects of adding or changing equipment, testing changes to sequences, flow and other aspects to improve the system's equipment in an early design phase.

According to McGregor (McGregor, 2002), there are some cases when emulation is specifically useful:

• When full testing before starting up is not available

• When testing is on the most critical path of the design

• When emulated testing is cheaper than real testing

3.2.2 Simulation

Simulation tries to copy something from the real world into a virtual environment, often to give an idea of how something works. It simulates the basic behaviour, but it does not necessarily stick to all the rules of the real environment that it simulates. It allows theories to be confirmed and questions regarding changes to the system to be answered (Banks, et al., 2009). For example, a flight simulator is a simulation because it feels like flying an airplane even though, in reality, it is completely disconnected from any part of flying an actual airplane.

Simulation can be useful for showing the eventual real effects of alternative conditions and courses of action. As it does not replicate hardware, it is very important to acquire valid source information about the key characteristics and behaviours of the system, to use suitable assumptions and approximations within the simulation, and to ensure the fidelity and validity of the simulation outcomes.

3.2.3 Differences between emulation and simulation

As said before, there are subtle differences, but they are especially relevant in the automation industry.


3.3 Graphics engine

A graphics engine is software used by application programs to draw graphics on computer display screens (The Linux Information Project, 2005). Although an engine is usually thought of as a mechanical element, in the computer field the term is now used for software that performs any type of rendering or powering for other programs.

Rendering, also called image synthesis, is the process of adding shading, colour and illumination to a 2D or 3D wireframe in order to create images on a screen. It allows different illumination and optical effects to be generated that bring the visualisation of the image closer to how the real world is seen through our eyes (Phong, 1975).

Rendering can be done ahead of time (pre-rendering) for better-quality images at a high CPU demand, or in real time (real-time rendering) with lower-quality images but a higher level of interaction with the model. Pre-rendering is typically used for movie creation, while real-time rendering is typically used for 3D interactive scenarios such as 3D video games, which rely on graphics cards with 3D hardware accelerators (Kajiya, 1986).

A graphics engine gives programmers and designers a rendering engine that allows their designs to be matched with 3D schematics to create complete models. Different programming tools can then complement the models for more efficient access to the processor and graphics card. In short, a graphics engine provides all the tools needed to create the physics of a virtual environment (Santos, 2018).

3.4 Application Programming Interface (API)

An API is an interface or communication protocol between different parts of a computer program that aims to simplify the implementation and maintenance of software. It allows different products and services to communicate with each other without needing to know how they are implemented (Emery, 2015).

Programmers can control engines directly through their APIs, instead of going through the ordinary application programs that standard users control. This simplifies the development, design, administration and use of applications, resulting in a large reduction in wasted money and time (Clarke, 2004).

The most famous existing graphics APIs can be grouped as in the following table:

Table 1. Comparison of graphics APIs

             Cross platform        Vendor specific
High level   OpenGL, RenderWare    Direct3D (DirectX), Glide API
Low level    Vulkan                Direct3D 12 (DirectX), Metal, Mantle

3.4.1 OpenGL

Open Graphics Library (OpenGL) is an open-source, multi-language, high-level and cross-platform graphics API for writing applications that render 2D and 3D graphics. It was initially developed by Silicon Graphics Inc, and the Khronos Group later assumed responsibility for it. It is widely used in different fields such as virtual reality (VR), computer-aided design and scientific visualisation (Silicon Graphics Inc, 2020).

This API is used to interact with the graphics processing unit (GPU), which generates all the graphics that programmers specify through OpenGL. In other words, OpenGL is the way to communicate with the GPU. The interface consists of more than 250 different functions that can be used to draw 3D scenarios starting from primitive geometric figures such as points, lines and triangles.

3.4.2 DirectX

DirectX is Microsoft's alternative set of graphics APIs; in addition to graphics, it supports sound, music, input, networking and other multimedia (Microsoft, n.d.). Its main disadvantage is that it is exclusive to Microsoft environments such as Xbox and the different Windows versions. In terms of graphics efficiency, OpenGL is faster because it seems to have a smoother and more efficient pipeline.

DirectX gives users a set of applications to help control different tasks, the most famous of which is Direct3D, designed to virtualise 3D hardware interfaces. It can be seen as the equivalent of OpenGL within the set of applications that DirectX provides. Direct3D maintains OpenGL's high-level programming characteristics, although there is another subset of DirectX, called Direct3D 12, which works as a low-level programming interface.

3.4.3 Vulkan

Vulkan is an open-source, low-overhead and cross-platform API capable of generating 3D graphics. Like OpenGL, it is developed by the Khronos Group; it was announced in 2015 and is seen as the next-generation OpenGL initiative.


3.4.4 Other alternatives

• RenderWare is a combined cross-platform rendering API and game engine that became popular especially for video games. Nevertheless, it has nowadays practically disappeared.

• Glide API was developed for the pioneering 3dfx accelerators, although with the evolution of computers it has not been necessary any more since the late 90s.

• Metal is Apple's alternative, a low-level and low-overhead hardware-accelerated 3D graphics API. It is widely used, but in most cases through high-level frameworks.

• Mantle is the alternative proposed by AMD, strongly focused on game development.

3.5 Machine Vision

According to the Automated Imaging Association, machine vision is the execution of programs based on the capture and processing of images to provide operational guidance to devices in both industrial and non-industrial applications.

In industry, vision systems need to have greater robustness, reliability and stability than academic or educational systems, and they normally cost less than systems destined for governmental and military applications. Nevertheless, they still have acceptable accuracy, high robustness, high reliability, and high mechanical and temperature stability for their applications (Cognex, 2016).

Machine vision is based on digital sensors inside industrial cameras with highly specialised optics to acquire images. Hardware and software are then able to process, analyse and measure various image key points or characteristics for decision making.

3.5.1 Benefits

Human vision allows a qualitative interpretation of a complex, unstructured scene, whereas machine vision takes quantitative measurements of a structured scene thanks to its speed, accuracy and repeatability (Cognex, 2016). A machine vision system with high camera resolution and good optics can easily inspect small object details that cannot be seen by the human eye and inspect thousands of product parts per minute.

Machine vision brings safety by reducing human interaction in a manufacturing process and protects human workers from hazardous environments. Removing human contact between test systems and the parts being tested prevents material damage and reduces the associated maintenance time and costs.

3.5.2 Equipment

Computer vision consists of algorithms that take the image and extract the information required for decision making. Lastly, communication typically consists of discrete input/output (I/O) signals or data sent to the device that is logging the information or using it, such as a robot.

Three categories of machine vision systems can be distinguished: 1D, 2D and 3D.

• 1D vision uses a digital signal one line at a time instead of looking at a whole picture at once. This technique detects and classifies defects on materials manufactured in a continuous process, such as paper, metals and plastics.

• Common 2D systems perform area scans that involve capturing images at various resolutions.

• 3D machine vision systems use multiple cameras or laser displacement sensors (Cognex, 2016). Multi-camera 3D vision in robot guidance applications provides the robot with part orientation information thanks to camera triangulation. Meanwhile, 3D laser-displacement sensor applications use surface inspection and volume measurement to produce 3D images as point clouds. A height map is generated from the reflection of the laser on the object. If an entire product needs to be scanned, the camera/laser must be moved around the object, or the object must rotate in front of the camera.

3.6 Computer Vision

Computer vision is a scientific discipline based on the extraction of information from images. The image data can take many forms, such as video, multiple cameras view, laser displacement sensors, or multi-dimensional data (Morris, 2004).

Recent research in computer vision has developed mathematical techniques for obtaining the 3D shape and appearance of objects from overlapping photographs. Given different views of an object, 3D sensors can create accurate 3D models thanks to stereo matching (Szeliski, 2010). Even with all the advances in this field, computers are still far from achieving the goal of recognising an image at the same level humans do.

The models recently used in computer vision are developed in computer graphics and physics (radiometry, optics and sensor design). Both fields model object movement and animation, and light reflection through camera lenses (or human eyes) projected onto a flat or curved image plane (Szeliski, 2010).

Computer vision tries to describe the world in one or more images and to recreate properties such as illumination, shape, texture and colour (Szeliski, 2010).

3.6.1 Applications of computer vision in manufacturing industry


3.6.1.1 Visual servo control

Visual servo control is in charge of the motion of a robot using computer vision. The data can be obtained from a camera attached to the robot manipulator, or from one or several cameras placed in a fixed position observing the robot motion (S.Hutchinson & F.Chaumette, 2006).

The purpose of vision-based control is to eliminate the position error e(t), which can be defined by:

e(t) = s(m(t), a) − s*

where the vector m(t) is a set of image measurements, such as the image coordinates of interest points or of the centroid of an object. The data from the picture is used to build a vector of k visual features, s(m(t), a); a is a set of parameters that gives additional knowledge about the system, such as the intrinsic parameters of the camera or 3D models of objects; and s* contains the desired values of the features (S.Hutchinson, et al., 1996).

Depending on the data stored in s, visual control can be image-based visual servo control (IBVS) or position-based visual servo control (PBVS).

Image-Based Visual Servo (IBVS)

In IBVS, s is a set of features that are already in the image data. Image measurements m in the definition of s = s(m, a) are pixel coordinates of the image features and the parameters a are camera’s intrinsic parameters (S.Hutchinson, et al., 1996).
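As an illustration of this error, the sketch below builds s and s* from four tracked points, converting pixel measurements m into normalised image-plane features with the camera intrinsic parameters (the set a). All numerical values are purely illustrative, not taken from the demonstrator.

```python
import numpy as np

def features_from_pixels(pixels, fx, fy, cx, cy):
    """Convert pixel coordinates (u, v) of tracked points into normalised
    image-plane features x = (u - cx)/fx, y = (v - cy)/fy."""
    feats = []
    for u, v in pixels:
        feats.extend([(u - cx) / fx, (v - cy) / fy])
    return np.array(feats)

# Hypothetical intrinsic parameters (the set 'a') and measurements m(t)
fx, fy, cx, cy = 800.0, 800.0, 320.0, 240.0
m_t    = [(350, 260), (410, 255), (405, 320), (345, 325)]   # current pixel positions
m_star = [(300, 220), (380, 220), (380, 300), (300, 300)]   # desired pixel positions

s      = features_from_pixels(m_t, fx, fy, cx, cy)     # s(m(t), a)
s_star = features_from_pixels(m_star, fx, fy, cx, cy)  # s*
e = s - s_star                                          # error the controller drives to zero
print(e)
```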

Position-Based Visual Servo (PBVS)

In PBVS, s consists of a set of 3-D parameters that are estimated from image measurements using the camera position with respect to some reference coordinate frame. Three coordinate frames are considered: the current camera frame Fc, the desired camera frame Fc*, and a reference frame attached to the object Fo (S.Hutchinson, et al., 1996). To compute the robot motion from s, the system needs the 3-D model of the observed object and the camera intrinsic parameters.

3.6.1.2 Industrial quality inspection

To inspect parts, three-dimensional models are reconstructed using stereoscopic cameras or laser sensors. The part model is compared with a ground-truth model of the same object; by comparing both models, machine vision can find damage on the surface or deviations from the object's correct position. Depending on the material of the object's surface, different techniques have to be used, as metallic parts can be reflective.

3.6.1.3 Reactive navigation scheme with 3D sensor

In robot guidance, the perception problem is a field that is still being researched. The main purpose of machine vision in this field is to bring as much environment data as possible to the robot in real time, so that it can take decisions based on its tasks.

Different sensory systems have been implemented to capture this data, such as SONAR, 2D LIDAR (Bui, et al., 2009) and 3D LIDAR (Nüchter, et al., 2007) (Cole & Newman, 2006). 3D LIDAR and RGBD cameras give more data and allow the robot to create not only depth maps but also height maps using point cloud representations.

3.6.2 Computer vision tools

There are many computer vision tools on the market nowadays, although the most common, reliable and complete software tools are considered to be OpenCV and Matlab.

3.6.2.1 OpenCV

Open source Computer Vision (OpenCV) is an open-source and cross-platform computer vision library, written in optimised C and C++ but also available through interfaces for Python, Matlab and other languages. It was originally developed by Intel Research and was later supported by Willow Garage.

OpenCV has had a tremendous role in the growth of computer vision, focusing on real-time vision and establishing itself as one of the most used tools in this area of knowledge. OpenCV provides a simple-to-use computer vision infrastructure that has helped students and professionals implement projects efficiently and quickly. The OpenCV library contains over 500 functions that cover many areas of vision, such as product inspection, medical imaging, security, camera calibration, user interfaces, stereo vision and robotics (Bradski & Kaehler, 2008).
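As a small illustration of the kind of processing OpenCV enables, the sketch below thresholds an image and extracts the area and centroid of each object contour. It assumes the OpenCV 4.x Python bindings, and the file name is a placeholder rather than an image from the demonstrator.

```python
import cv2

# Load an image (file name is illustrative), convert it to grayscale and threshold it
image = cv2.imread("synthetic_frame.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

# Extract outer contours and report the area and centroid of each detected object
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for contour in contours:
    moments = cv2.moments(contour)
    if moments["m00"] > 0:
        cx = moments["m10"] / moments["m00"]
        cy = moments["m01"] / moments["m00"]
        print(f"area={cv2.contourArea(contour):.0f}, centroid=({cx:.1f}, {cy:.1f})")
```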

3.6.2.2 Matlab

Matlab has an Image Processing Toolbox and Computer Vision Toolbox that provide algorithms and workflow apps for image analysis and processing. For 2D images Matlab performs tasks such as image segmentation, image deblurring, noise reduction, geometric transformations and image registration.


3.6.2.3 Other alternatives

Table 2. Other computer vision libraries

Computer Vision Library   Supported programming languages
TensorFlow                Python, C++, CUDA
AForge.NET                C#
SimpleCV                  Python
Torch                     C, CUDA
SciPy                     Python
BoofCV                    Java
DLib                      C++

3.7 Cameras

This chapter explains general information about how images are generated by different types of cameras. First, an introduction of how any optical system works is explained, specifying the lens effects of any image captured. Then, how digital cameras are able to take pictures thanks to digital sensors will be explained. Finally, the concept of synthetic cameras is introduced.

3.7.1 Optical system

An optical system is a combination of lenses, prisms and mirrors that constitutes the optical part of an optical instrument such as a camera or microscope (Mora, et al., 2012). This chapter describes how these optical systems work.

To focus the real-world light rays on a plane, as the human eye does, every camera needs some combination of lenses. In a camera these rays converge at a point thanks to the lens; this point is called the focus, as can be seen in Figure 3:

The lens parameters are the following:

• F: Focal distance, measured from the centre of the lens to the focus. It affects the size of the projected image.

• s: Focus distance, measured from the plane of the lens to the plane that remains focused on the image. Since F and v are fixed, the distance to the subject must be adjusted to s to focus on it.

• Depth of field: determines the width of the focused area. It depends on the aperture and the focus distance.

• Aperture: the amount of light that reaches the image, regulated with the diaphragm. It affects the depth of field (the larger the aperture, the shallower the depth). It is expressed as "F-n", where n = focal length / aperture diameter.

Although lenses are necessary, all lenses distort the image to a greater or lesser degree. Usually, the shorter the focal distance, the more distortion the image has. The distortion error at a point of the image depends on the position of the point, which is known as radial distortion (Figure 4).

Figure 4. Radial distortion error depending on Focal distance and Vision Angle.

There are two types of radial distortion (Figure 5):

Figure 5. Radial distortion types.
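For illustration, a widely used way to describe radial distortion is the polynomial model sketched below (the same family of coefficients used by common camera-calibration tools); the coefficient values in the example are purely illustrative, not measured for any specific lens.

```python
def apply_radial_distortion(x, y, k1, k2, k3=0.0):
    """Apply the polynomial radial distortion model to normalised image
    coordinates (x, y). A scaling factor above 1 pushes points outward
    (pincushion-like), below 1 pulls them inward (barrel-like)."""
    r2 = x**2 + y**2
    factor = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    return x * factor, y * factor

# Illustrative coefficients; short focal lengths typically show stronger distortion
print(apply_radial_distortion(0.5, 0.3, k1=-0.25, k2=0.05))
```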

3.7.2 Digital camera

This chapter explains how a digital camera's sensor works. The sensor is made up of millions of cavities called photosites; the number of photosites is exactly the number of pixels the camera has. Photons that hit each photosite are interpreted as an electrical signal that varies in strength based on how many photons were captured in the cavity, and the camera's bit depth sets how precise this process is.

With the electrical data mentioned above alone, the sensor would generate a grey-scale image; to obtain coloured images a "Bayer filter array" is needed. A Bayer filter is a coloured filter placed over the top of each photosite (Figure 6), used to determine the colour of an image based on how the electrical signals from neighbouring photosites measure. Each colour filter allows only light of its own colour to be captured; light that does not match that photosite's colour filter is reflected (Cambridge In Colours, n.d.).

Figure 6. A graphic of light entering photosites with Bayer filters layered on.

In summary, the raw image file is what the sensor's data interpreter receives through the Bayer filter array. Finally, the camera goes through a process to estimate how much of each colour of light there was for each photosite and then colours the image based on that estimate.

3.7.3 Synthetic cameras

Synthetic cameras can be defined as the paradigm of creating a computer-generated image in a way similar to forming an image with an optical system (Courses Washington, 2014).

Several specifications are needed to define the viewer or camera:

• Position – position of the camera along the 3 axes.

• Orientation – rotation of the camera about the 3 axes.

• Focal length – determines the size of the image.

• Film plane – has a height and width and can be adjusted independently of the lens orientation.

• Depth of field – near and far distances.

With all these parameters defined, the mathematical model can form an image similar to that of any optical system, for example a digital camera image. However, some aspects of optical systems are far more difficult to imitate: lens flare and distortion, noise, and the lighting or darkening conditions of the image.
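Purely as an illustration, these viewer specifications can be gathered in a simple structure such as the sketch below; the field names and default values are assumptions for the example, not the camera model implemented later in this thesis.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class SyntheticCameraSpec:
    """Illustrative container for the parameters a synthetic camera needs."""
    position: Tuple[float, float, float] = (0.0, 0.0, 0.0)      # camera position (x, y, z)
    orientation: Tuple[float, float, float] = (0.0, 0.0, 0.0)   # rotation about the 3 axes (degrees)
    focal_length_mm: float = 35.0                                # determines the image size
    film_plane: Tuple[int, int] = (640, 480)                     # width and height in pixels
    depth_of_field: Tuple[float, float] = (0.1, 100.0)           # near and far distances

camera = SyntheticCameraSpec(position=(1.0, 0.5, 2.0), focal_length_mm=50.0)
print(camera)
```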

3.8 3D Sensors

Thanks to 3D vision it is possible to measure the distance between objects and the camera, and to measure their speed, direction and movement in order to follow them (Mora, 2012). To obtain this data there are several 3D sensors, which are discussed in the following chapters:

3.8.1 Stereoscopic cameras

A pair of cameras is placed at different points of the system at the same height to simulate the position of human eyes (binocular disparity). After taking the images, an algorithm calculates which points of one image match the other image in order to calculate depth (Mora, 2012). Using calibrated cameras makes the matching easier.
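For a rectified, calibrated stereo pair, the depth of a matched point follows directly from its disparity; a minimal sketch of this standard relation is shown below, with illustrative values.

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth of a matched point for a rectified stereo pair: Z = f * B / d,
    with f in pixels, baseline B in metres and disparity d in pixels."""
    return focal_px * baseline_m / disparity_px

# Example: 700 px focal length, 12 cm baseline, 35 px disparity -> 2.4 m
print(stereo_depth(700.0, 0.12, 35.0))
```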

3.8.2 Refocusing techniques

(28)

28

3.8.3 Time of Flight cameras (TOF)

TOF cameras calculate distance by measuring the time it takes the infrared light emitted by the camera to reach objects, reflect off them and return to the camera's sensor. Position data is collected as point clouds where every point has its corresponding XYZ coordinates (Mora, 2012). This type of sensor is less affected by ambient light than other sensors, but it costs more and has a lower resolution.
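A minimal sketch of the underlying calculation, assuming the distance is simply half the round trip of the light pulse, is shown below.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def tof_distance(round_trip_time_s):
    """Distance from the round-trip time of the emitted infrared pulse: d = c * t / 2."""
    return SPEED_OF_LIGHT * round_trip_time_s / 2

# Example: a pulse returning after 20 ns corresponds to roughly 3 m
print(tof_distance(20e-9))
```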

3.8.4 Structured Light

Structured light patterns are projected onto the object and captured by another image sensor. The second sensor's hardware compares the deformations caused by the object in the light pattern and calculates the object's depth (Mora, 2012). The most common patterns are parallel stripe patterns; normally, patterns with different stripe widths are used on the same object to obtain higher-quality details and improve resolution. Structured light is a robust technique against ambient light, but errors occur with translucent objects.

3.8.5 Light Detection And Ranging (LIDAR)

LIDAR sensors are similar to Radio Detection And Ranging (RADAR) and Sound Navigation And Ranging (SONAR) sensors, but they use light pulses instead of radio waves or sound pulses to capture data. An Infrared (IR) laser is reflected on a rotating mirror that allows the sensor to calculate depth using a time-of-flight calculation (Mora, 2012). As shown in Figure 7, depending on the mirror rotation, LIDAR sensors can be 2D or 3D. Because these sensors use light, they work well indoors, and their efficiency decreases in open areas.

3.8.6 RGBD Cameras

RGBD cameras combine an RGB sensor with a depth sensor that can use structured light or TOF measurement; as seen in Figure 8, the captured data is a point cloud that contains XYZ and colour information for every measured point (Mora, 2012). RGBD cameras became popular when Microsoft marketed its Kinect sensor for the Xbox 360; furthermore, Microsoft created a software development kit (SDK) that allows users to develop applications on Windows, so, apart from playing games, the Kinect is now used for robotics and computer vision applications. Table 3 compares different RGBD cameras on the market.

Figure 8. Kinect depth image example

Table 3. RGBD Cameras Comparison

Sensor               Range        Field Of View (FOV)             Frame rate    RGB resolution     Depth resolution
Orbec Astra S        0.4m-2m      60°(H) × 49.5°(V) × 73°(D)      30 fps        640×480 pixel      640×480 pixel
Microsoft Kinect I   0.8m-3.5m    57°(H) × 43°(V)                 30 fps        1280×960 pixel     640×480 pixel
Microsoft Kinect II  0.5m-4.5m    70°(H) × 60°(V)                 30 fps        1920×1080 pixel    512×424 pixel
Intel SR300          0.3m-2m      41.5°(H) × 68°(V) × 75.2°(D)    30, 60 fps    1920×1080 pixel    640×480 pixel
ZED                  1.5m-20m     90°(H) × 60°(V)                 100 fps       2208×1242 pixel    2208×1242 pixel
Vico VR              0.5m-4.5m    60°(H) × 50°(V)                 30 fps        640×480 pixel      640×480 pixel
BlasterX Senz3D      0.2m-1.5m    85°(H)                          30, 60 fps    1920×1080 pixel    640×480 pixel


4. LITERATURE REVIEW

In this section, different reports, papers and theses have been studied to reflect the knowledge previously developed in the field of study this project addresses. The research has therefore focused on investigations and experiments centred on VC, synthetic images, depth maps and computer vision.

4.1 Application of Virtual Commissioning in manufacturing industry

Modern manufacturing plants are becoming more and more complex every day, so it is important to find a way of testing the solution to a problem before its real-life implementation. This is significant because real commissioning in manufacturing plants may cause delays, reduction or even the stopping of production. According to Automation & Digitisat magazine (Automation & Digitisat magazine, 2018), VC is gaining ground on real commissioning because the sooner an error in the manufacturing process is detected, the more money is saved, and vice versa.

The report by Nazli Shahim and Charles Möller (Shahim & Moller, 2016) justifies VC economically in the automation industry; they conducted interviews and questionnaires with employees responsible for VC about their experiences with, and evaluation of, VC and the change to it from real commissioning. The conclusion was that the business value of the service offered through VC is three times higher than that of real commissioning. Furthermore, the interviews showed that much less interruption is experienced, which allows a more efficient ramp-up and on-time delivery of the customer's demand.

The advantages of VC applied to an assembly cell with cooperating robots have been studied by Makris, Michalos and Chryssolouris (Makris, et al., 2012); in this research they conclude that the ramp-up time was reduced, affecting the total installation time by 15-25 %. Furthermore, the cost of human resources is reduced by up to 15 %.

Once the use of VC is more than justified, different studies of VC implementation in automated cells have been examined. Lee and Park (Lee & Park, 2015) and Guerrero, López and Mejía (Guerrero, et al., 2014) both agree that modelling the parts and their kinematics in the cell is the main issue in most cases. However, the report of Seidel (Seidel, et al., 2012) affirms that the additional work of modelling the material in the simulator was compensated by a reduced commissioning stage on-site. Furthermore, it was incredibly useful for testing PLC and material flow controller programs, allowing a 25 % reduction in planned time thanks to the boosted software maturity.


4.2 Generation of synthetic images

Although synthetic cameras have existed for many years, their use was practically restricted to academia and research; nowadays, however, their use has expanded to modern industry and is becoming more important every day.

The study of Potmesil and Chakravarty (Potmesil & Chakravarty, 1981) defines the purpose of generating synthetic images as the ability to direct the viewer's attention to a particular segment of the image, as well as to allow selective highlighting or optical effects; furthermore, it permits the adaptation of cinematographic techniques to a scene, such as depth of field, lens distortion and filtering. The image generation process used in this study is based on two stages:

• A hidden-surface processor generates point samples of intensity in the image using a digital pin-hole camera model. Each sample has image-plane coordinates, a z depth distance, RGB intensities and an identification of the visible surface.

• A post-processor converts the sampled points into circles of confusion to produce the actual raster image. Every circle of confusion has a size and intensity determined by the z-depth of the visible surface and the characteristics of the lens and aperture of the camera model.

The same study also explains the traditional pin-hole camera projection geometry, which allows a more realistic camera model that approximates the effects of the lens and aperture function of an actual camera. This model, shown in Figure 9, allows the depth of fragments to be measured knowing the diameter of the hole, the image reflected on the image plane and the distance between the hole and the image plane.

Figure 9. Pin-hole camera projection geometry
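As a small worked example of this geometry, the sketch below projects a 3D point through an ideal pin-hole model, u = f·X/Z and v = f·Y/Z; the numerical values are illustrative only.

```python
def pinhole_project(point, focal):
    """Project a 3D point (X, Y, Z) in camera coordinates onto the image plane
    of an ideal pin-hole camera: u = f * X / Z, v = f * Y / Z."""
    x, y, z = point
    return focal * x / z, focal * y / z

# A point 2 m in front of the hole, imaged with a 35 mm distance to the image plane
print(pinhole_project((0.4, 0.1, 2.0), 0.035))
```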

In summary, this model was used to gain a better understanding of how synthetic images are generated and of the mathematical models used to approximate the effects of the lens and of depth.

4.3 Obtention of depth map of a 3D scene

In 3D computer graphics, a depth map is an image that contains information about the distance of scene features and objects from a specific viewpoint. The term is related to, and may be used analogously to, depth buffer or Z-buffering.

Before making the projection, a primary visibility test known as z-culling is performed. When viewing an image, objects can be far away from the viewer or behind other objects, so the z-culling test identifies and removes those pixels in order to reduce the fill rate, lighting, texturing and pixel-shader work at rendering time.

When an object is projected by a 3D rendering engine, its depth (z-value) is stored in a buffer as its perpendicular distance from the projection plane. The range of depth values to be rendered is usually defined between a near and a far value of z. As seen in Figure 10, after the perspective or orthographic projection, the new value of z is normalised (between 0 and 1) and coloured depending on its depth; values outside this range are not rendered (Massal, 2008).
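For illustration, assuming the standard OpenGL perspective projection and a depth buffer stored in [0, 1], the non-linear stored value can be converted back to an eye-space distance as in the sketch below; the near/far values are illustrative.

```python
def linear_depth(depth_buffer_value, near, far):
    """Recover the eye-space distance from a value read out of an OpenGL depth buffer,
    assuming a standard perspective projection and depth stored in [0, 1]."""
    z_ndc = 2.0 * depth_buffer_value - 1.0                  # back to normalised device coordinates
    return (2.0 * near * far) / (far + near - z_ndc * (far - near))

# With near=0.1 and far=100, a stored value of 0.5 still lies very close to the near plane
print(linear_depth(0.5, 0.1, 100.0))
```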

Limitations

Single-channel depth maps record the first surface seen, but if the object is transparent or reflects other objects this information is not displayed. This limits the depth buffer's use in accurately simulating depth of field or fog effects.

Depending on the use of the depth map, it may be necessary to encode it at higher bit depths, depending on the required precision (an 8-bit depth map can represent a range of only 256 distances).

Uses (Malik & Saeed, 2011)

• To simulate the effect of fog, smoke or large volumes of water within a scene.

• Z-culling is used in real-time applications, such as computer games, to reduce render time and improve real-time performance.

• In machine vision, to transform 3D images so that they can be processed by 2D image tools to guide robotic arms.

• To provide the distance information needed to create and generate auto-stereograms.

• The Z-buffer is also used for creating computer-generated special effects for films.

This chapter was useful for learning how to extract depth data from a 3D scene and which tools are the most used. Furthermore, it informs about the limitations and the most common uses of depth maps obtained from any 3D scene.


4.4 Applications of Depth Sensing

Depth sensing has become a rising technology in recent years, and its possibilities have multiplied. When the real-world scene is reduced from 3D to 2D, many trivial questions, such as which things are further away and how near or how big they are, suddenly become a big problem to solve. Depth sensing has appeared to answer these questions and capture the full information of our real world (Li, 2017).

The following applications were researched to learn the actual use of depth sensing in industry. They also help to understand the limitations and potential of developing a 3D sensor or camera inside a virtual environment. Finally, they were useful for selecting which type of demonstrator was most appropriate for testing the behaviour of the synthetic camera.

4.4.1 Augmented and Virtual Reality (AR/VR)

There are different uses, mainly sensing real 3D environments and reconstructing them in the virtual world. Depth sensing is important for the human-machine interaction of VR/AR devices. These types of devices need high-performance depth sensors because they must respond accurately to the 3D movement of users.

There are many examples where this is applied: Project Tango, the AR technology developed by Google for smartphones, uses depth sensors to measure real distances so that virtual content is placed at the proper positions (Roberto, et al., 2016). This technology greatly improves the AR performance offered by, for example, Pokemon Go, where Pokemon are often placed in inaccurate positions because the application has no environment depth information.

4.4.2 Robotics

Depth sensors are mainly used in robotics for navigation, localisation, mapping and collision avoidance.

Fully autonomous vehicles have appeared in many warehouses to transport objects from one place to another. Each of these vehicles requires depth sensing to know where it is in the environment, where other things are, and how it can safely move from A to B. Furthermore, any type of picking robot uses depth sensing to identify the target and to know where it is and how to reach it safely (Einevik & Kurri, 2017).

4.4.3 Facial recognition

Most facial recognition systems use a 2D camera to capture a photo and an algorithm to decide the person's identity. This leads to a big security problem: a simple printed photo can fool the recognition system. In order to make facial recognition as safe as possible, 3D cameras with depth sensing are strictly necessary (Martin, 2017).


4.4.4 Gesture and proximity detection


5. METHOD

Methodology can be defined as the guide or plan to follow for producing the models and the process stages to be followed to solve problems using IT (Oates, 2006). It can be considered the framework for how the work will be done.

This is a research project that, in addition to giving a technical solution to a problem, carries out a deep academic and theoretical study of the main concepts needed for the technical part and for a full understanding of the project. All this research must be done with the highest possible rigour, something that has always characterised any good scientific project. This means that this project will also include the basic academic characteristics of analysis, argument, justification and critical evaluation.

For any IT artefact or product, Oates explains that the most common strategy is Design and Creation research (Oates, 2006). This strategy explains that any design and creation activity must follow the established principles of systems development, with the aim of developing the artefact to contribute to knowledge.


Figure 11. Design and Creation strategy

Figure 11 above shows a visual representation of how to work with this methodology (Kosan & Saltuk, 2014).

• Awareness: the recognition and understanding of the problem that we are trying to solve. This stage should be carried out by studying the literature from different authors in order to gain a better comprehension of the different fields of study involved in this project. This stage receives information from the later steps and results in a proposal on how to address the problem.

• Suggestion: the design of a tentative idea of how the problem might be addressed. In this stage we must end up with an idea of how we will generate the synthetic cameras, and with which tools and libraries.

• Development: the implementation and development of the suggested idea to turn it into the proposed IT artefact. How the work is done in this stage depends on the kind of IT artefact proposed. In our case, as the artefact is the synthetic camera, the main thread of work is the construction of a tool that records images inside the Simumatik software. Once that is finished, we will work on a demonstrator of a robot cell where we can explore the huge number of possibilities that computer vision offers.

• Evaluation: the examination and testing of the developed artefact. In this stage, we look for any errors or possibilities of optimization of the artefact; furthermore, we look for an assessment of its worth and deviations from expectations. For our case, we will see if the synthetic cameras work properly and we will check if the solution for automating the cell implemented with computer vision tools works as expected.

• Conclusion: the consolidation of the results from the design process and the recompilation of the knowledge gained together with any loose ends that may remain.


6. DESIGN AND CREATION IMPLEMENTATION

In this chapter, the implementation of the different steps taken to develop the synthetic camera artefact following the methodology is described. Once the theoretical framework necessary for the awareness stage of this project was concluded, it was time to carry on with the different stages of the methodology. Thanks to the theoretical framework researched, a proposal for choosing the different tools needed for this project is explained in the section "Proposal" of this chapter. Then, the tentative design was carried out to evaluate whether the proposed methods were able to fulfil the requirements for all the features and objectives set for the synthetic camera. Next, the development stage of the method is described in two sections: "General Development Design", where the prototype's results were fully analysed for improvements, and "General Implementation Development". For the general implementation development, an iterative method was used over the tentative design, researching improvement proposals and testing them on it.

6.1 Proposal

As the awareness of the problem was gained from the theoretical framework research, the different goals that the chosen tools need to fulfil were completely established. Firstly, the programming language had to be chosen. Then, for the development of the synthetic camera, a graphics API that could render both basic shapes and more complex geometries had to be selected. Finally, for creating a demonstrator that uses the camera to control an automated system, a computer vision tool and a way to communicate back to the automated system had to be selected.

Regarding the different graphics APIs, an initial study was carried out for the Frame of Reference. The most dominant graphics APIs found were OpenGL, DirectX and Vulkan. Among these three, the choice for the tentative design was OpenGL. The software requirement of being able to operate on any OS led to discarding Microsoft's DirectX. With Vulkan and OpenGL both made by the same developer (Khronos Group), the short life of Vulkan, despite its predicted good future in the industry, and the already extensive documentation available for OpenGL were the reasons for choosing the latter.

To validate the final artefact (the synthetic camera), the camera will be integrated in a Simumatik virtual automated system for machine control; this automated system consists of the treatment of the synthetic images generated in OpenGL by a computer vision tool in order to control it. Once the image treatment was done, it was mandatory to find a way to communicate back to the robotic automated cell.

In order to communicate back to the automated system, a gateway between the Python-OpenCV application and the PLC's automated system was developed. This gateway was implemented using Object linking and embedding for Process Control Unified Architecture (OPCUA) in order to be compatible with most automation software. OPCUA is a machine-to-machine communication protocol for industrial automation; it can be seen as a standardised, scalable and reliable mechanism for transferring information between clients and servers. Given all these proposals, a tentative design was presented in order to test the chosen tools.
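The exact tag layout of the gateway is not described here, so the following is only a minimal sketch of how such a gateway could publish the results of the image processing to the PLC over OPC UA, using the Python `opcua` (freeopcua) package; the endpoint URL, node identifiers and written values are placeholders.

```python
from opcua import Client, ua

# Placeholder endpoint; the real gateway connects to the address exposed by the PLC / automation software
ENDPOINT = "opc.tcp://localhost:4840"

client = Client(ENDPOINT)
client.connect()
try:
    # Placeholder node identifiers for the values the gateway exchanges with the PLC
    product_type_node = client.get_node("ns=2;s=Gateway.ProductType")
    new_image_node = client.get_node("ns=2;s=Gateway.NewImageProcessed")

    # Publish the result of the OpenCV processing and raise a handshake flag
    product_type_node.set_value(2, ua.VariantType.Int16)   # e.g. classified as product B
    new_image_node.set_value(True)
finally:
    client.disconnect()
```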

6.2 Tentative design

Once the proposal was made, the tentative design started with the initial goal of creating a prototype of the camera to prove that the chosen graphics API could be programmed to render synthetic images. For this purpose, a 3D scene was rendered using PyOpenGL, the most common cross-platform Python binding for OpenGL and related APIs.

As it is just a tentative idea, the development of the prototype was restricted to creating a window where simple geometries such as cubes or pyramids could be rendered, and then saving the window contents to a file for later use by a computer vision tool.

The first step was generating a virtual environment in which any element could be rendered. For this first prototype, it was decided to use the easiest possibility: rendering to a window. To generate the window, the native OpenGL utility toolkit 'GLUT' was used, to avoid adding another library.
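As an illustration of this step, the snippet below is a minimal sketch of a GLUT-based window setup of the kind used in the prototype; the window size, title and empty display callback are illustrative, not the prototype's actual code.

```python
from OpenGL.GL import glClear, glClearColor, GL_COLOR_BUFFER_BIT, GL_DEPTH_BUFFER_BIT
from OpenGL.GLUT import (glutInit, glutInitDisplayMode, glutInitWindowSize,
                         glutCreateWindow, glutDisplayFunc, glutSwapBuffers,
                         glutMainLoop, GLUT_DOUBLE, GLUT_RGB, GLUT_DEPTH)

def display():
    # Placeholder display callback: the prototype draws the cube and pyramid
    # geometry here before swapping buffers.
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT)
    glutSwapBuffers()

glutInit()
glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH)
glutInitWindowSize(640, 480)
glutCreateWindow(b"Synthetic camera prototype")
glClearColor(0.1, 0.1, 0.1, 1.0)
glutDisplayFunc(display)
glutMainLoop()
```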

The second step was rendering basic three-dimensional shapes, in this case directly to the window. For this prototype, as seen in Figure 12, a 3D cube and a pyramid were rendered by directly defining their vertices and telling OpenGL in which order and structure they were connected.


As a third step, a way to interact with and navigate inside the scene was implemented using the keyboard. This was done mainly to gain a better understanding of how the perspective, translations and rotations work inside an OpenGL environment.
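
A minimal sketch of this keyboard navigation idea, assuming the GLUT callbacks of the prototype above; the key bindings and step size are illustrative, not the prototype’s actual ones.

# Minimal keyboard navigation with GLUT: the callback adjusts an offset that the display
# function applies with glTranslatef before drawing the scene.
from OpenGL.GLUT import glutKeyboardFunc, glutPostRedisplay

camera_offset = [0.0, 0.0, -5.0]     # applied with glTranslatef in the display function

def on_key(key, x, y):
    step = 0.1
    if key == b'w':
        camera_offset[2] += step     # move the scene towards the viewer
    elif key == b's':
        camera_offset[2] -= step
    elif key == b'a':
        camera_offset[0] += step
    elif key == b'd':
        camera_offset[0] -= step
    glutPostRedisplay()              # request a redraw with the new offset

# registered once during initialisation:
# glutKeyboardFunc(on_key)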

Finally, the scene data had to be extracted to a file in the simplest way possible. To do so, it was only necessary to read the window’s pixel values (RGB) and convert them into a portable network graphics (PNG) file using any available library, as represented in Figure 13.

Figure 13. Prototype PNG file
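
A minimal sketch of this extraction step, assuming Pillow and NumPy are the “available libraries”; the function name and output path are illustrative.

# Read the current colour buffer and write it to a PNG file.
import numpy as np
from OpenGL.GL import glReadPixels, glPixelStorei, GL_RGB, GL_UNSIGNED_BYTE, GL_PACK_ALIGNMENT
from PIL import Image

def save_frame(width, height, path="frame.png"):
    glPixelStorei(GL_PACK_ALIGNMENT, 1)
    data = glReadPixels(0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE)
    image = np.frombuffer(data, dtype=np.uint8).reshape(height, width, 3)
    # OpenGL's origin is the bottom-left corner, so flip vertically before saving
    Image.fromarray(np.flipud(image)).save(path)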

This prototype clearly shows that the chosen tool, OpenGL, allows 3D scenes to be rendered in a flexible and scalable way. The next steps focused on the remaining features of the synthetic camera. There were several aspects still to be developed, mainly off-screen rendering, rendering GLB files and other geometries, and controlling the perspective to simulate the operation of a camera.

6.3 General Development Design


OpenGL, as a graphics API, allows rendering the different elements that form a 3D scene, which can then be extracted, in this case, to 2D PNG files. Rendering each element at a certain position in the scene, defining the view of each of them and applying the projection defined for the whole scene is a complex process that requires a good understanding of how OpenGL transformations work.

Vertex positions in OpenGL are treated as vectors, and each transformation maps a vector into a new coordinate system. Looking at the camera as a whole, its main input can be considered to be the object coordinates, including the camera’s own coordinates, and its final output the objects displayed in window coordinates.

Figure 14 graphically explains the transformations that OpenGL applies internally to define the final view of the scene (GLProgramming, n.d.).

Object coordinates are the local coordinate system of the objects, in this case the coordinate system used by Simumatik. By multiplying the model-view matrix with the object coordinates, the position and orientation of all elements are transformed from object space to eye space. The eye coordinates are then multiplied by the projection matrix in order to define how the vertex data are projected onto the screen. As a consequence of this multiplication, the clip coordinates carry a scalar weight (the w component), so the perspective division has to be applied to normalise the values on all three axes. Finally, the normalised device coordinates are scaled and translated to fit the rendering screen by means of the viewport transformation.
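
In matrix form, and following the standard OpenGL column-vector convention, the chain described above can be summarised as a sketch:

v_clip = M_projection · M_modelview · v_object
v_ndc  = (x_clip, y_clip, z_clip) / w_clip

after which the viewport transformation scales and translates v_ndc into window coordinates.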

For the development of this project, it is only necessary to modify two transformations, the projection matrix and the model-view matrix, so a more detailed description of these two follows:

• Projection Matrix: it is used to define the viewing frustum, choosing between an orthographic projection and a perspective projection. For the modelling of a camera, the most realistic option is a perspective projection (Figure 15), which defines a viewing frustum in the world coordinate system (Vries, n.d.). This matrix requires parameters such as the near and far planes, the aspect ratio and the vertical field of view.

Figure 14. OpenGL transformations structure (object coordinates to window coordinates)


Figure 15. OpenGL perspective projection

• Modelview Matrix: it is created by multiplying the modelling transform by the viewing transform. This matrix therefore defines all the translations, rotations and scaling of the whole scene, so that anything can be rendered at the desired coordinates and with the desired view. Applying these transformations required an in-depth investigation of how transformations act on the scene. It was found that, in OpenGL, the view and the objects cannot be moved independently of each other. For this reason, the contents of the 3D world have to be rendered from the camera’s point of view, not just from the world coordinate system; in other words, the coordinate system has to be translated to the camera’s position (OpenGL-tutorial, n.d.). A minimal sketch of how both matrices can be set up follows this list.
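
The sketch below shows one way to set up both matrices with the legacy fixed-function pipeline and GLU helper functions; the numeric values (eye position, field of view, clipping planes) are illustrative only.

# Sketch of setting up the projection and model-view matrices with GLU helpers.
from OpenGL.GL import glMatrixMode, glLoadIdentity, GL_PROJECTION, GL_MODELVIEW
from OpenGL.GLU import gluPerspective, gluLookAt

def set_camera(width, height, fov_y=45.0, near=0.1, far=100.0):
    # Projection matrix: perspective frustum from vertical FOV, aspect ratio, near and far planes
    glMatrixMode(GL_PROJECTION)
    glLoadIdentity()
    gluPerspective(fov_y, width / float(height), near, far)

    # Modelview matrix: place the "camera" by transforming the whole scene into eye space
    glMatrixMode(GL_MODELVIEW)
    glLoadIdentity()
    gluLookAt(2.0, 2.0, 2.0,   # eye position
              0.0, 0.0, 0.0,   # point looked at
              0.0, 0.0, 1.0)   # up vector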

As seen in the prototype, a window was generated every time the scene was rendered. Since creating a window is a relatively slow operation, it was convenient to find an alternative approach. After studying several possibilities, the best option turned out to be creating a single hidden window each time the simulation of the Simumatik system starts. Using a hidden window means that the creation of the window is completely transparent to the user interface; it is created as a back-end process, which is much more efficient.
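
A minimal sketch of this idea using the ‘glfw’ library (the library adopted later in this chapter); the window size and title are illustrative.

# Obtain an OpenGL context through an invisible GLFW window, so nothing is shown on screen.
import glfw

if not glfw.init():
    raise RuntimeError("GLFW initialisation failed")

glfw.window_hint(glfw.VISIBLE, glfw.FALSE)       # hint: never show the window
window = glfw.create_window(640, 480, "hidden", None, None)
glfw.make_context_current(window)                # OpenGL calls now target this hidden context

# ... off-screen rendering to framebuffers happens here ...

glfw.destroy_window(window)
glfw.terminate()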

After in-depth research into the best way to render textures and depth data, rendering to framebuffer objects turned out to be faster and more efficient than rendering directly to the window. A framebuffer represents the pixels of an image as locations in memory; in OpenGL, a framebuffer object is an off-screen render target to which different textures can be attached, storing colour and depth data in separate attachments. Therefore, after the scene is rendered to the framebuffer, colour and/or depth data are extracted depending on the desired image format: RGB, luminance (L), depth (D) or RGBD.
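
A minimal PyOpenGL sketch of creating such a framebuffer object with a colour texture and a depth texture attached, so that both can be read back after rendering; resolution and filter settings are illustrative.

# Create an off-screen framebuffer object with colour and depth texture attachments.
from OpenGL.GL import *

def create_framebuffer(width, height):
    fbo = glGenFramebuffers(1)
    glBindFramebuffer(GL_FRAMEBUFFER, fbo)

    # Colour attachment
    color_tex = glGenTextures(1)
    glBindTexture(GL_TEXTURE_2D, color_tex)
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0, GL_RGB, GL_UNSIGNED_BYTE, None)
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR)
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, color_tex, 0)

    # Depth attachment
    depth_tex = glGenTextures(1)
    glBindTexture(GL_TEXTURE_2D, depth_tex)
    glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT, width, height, 0,
                 GL_DEPTH_COMPONENT, GL_FLOAT, None)
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST)
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_TEXTURE_2D, depth_tex, 0)

    if glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE:
        raise RuntimeError("Framebuffer is not complete")
    glBindFramebuffer(GL_FRAMEBUFFER, 0)             # unbind until rendering time
    return fbo, color_tex, depth_tex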

Finally, it is worth mentioning the parameters that Simumatik sends to the synthetic camera, since any existing Simumatik scene is rendered from this data. For each object, its position is sent, along with the relative position of each of its shapes, the shape type, the material (texture or colour) and the attributes specific to each shape type. A shape can be a box, a plane, a cylinder, a sphere, a capsule or a 3D mesh.
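
Purely as an illustration, the data received per object can be pictured as a structure like the one below; the field names and layout are assumptions made for readability and are not Simumatik’s actual message format.

# Hypothetical illustration of the per-object data described above (not the real format).
example_object = {
    "position": [1.2, 0.0, 0.75],                      # object pose in the workspace
    "shapes": [
        {
            "type": "box",
            "relative_position": [0.0, 0.0, 0.1],
            "material": {"color": [0.8, 0.1, 0.1]},    # or a texture reference
            "attributes": {"width": 0.2, "height": 0.2, "length": 0.4},
        },
        {
            "type": "mesh",
            "relative_position": [0.0, 0.0, 0.0],
            "material": {"texture": "screw.png"},
            "attributes": {"path": "screw.glb", "scale": [1.0, 1.0, 1.0]},
        },
    ],
}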


6.4 General Implementation Development

In this section, the implementation of the final version of the camera is described. Its objective is to simulate the taking of pictures by a virtual synthetic camera located in the Simumatik workspace. According to the chosen methodology, all the features explained in the previous section, “General Development Design”, were implemented one by one by iterating over the tentative design. For example, to render GLB files, the goal of the first iteration was to render the vertices, the second one added colours, and the final iteration rendered the textures properly.

After the research done for the general development design, the first rendering goal was to stop depending on window creation and to render directly to the framebuffer. This was laborious work that, thanks to the acquired knowledge, was solved. Once the rendering was correct, the camera development focused on obtaining the framebuffer data in the chosen camera format. Finally, the rendering of simple geometries was replaced by obtaining the Simumatik workspace objects’ data, such as translation, rotation, scale or textures, in order to render an image of the real workspace.
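
A minimal sketch, assuming PyOpenGL and NumPy, of reading the rendered frame back in the formats mentioned earlier (RGB, L, D, RGBD); the helper function itself is illustrative, not the camera’s actual code.

# Read the current framebuffer contents back in one of the supported image formats.
import numpy as np
from OpenGL.GL import (glReadPixels, GL_RGB, GL_DEPTH_COMPONENT,
                       GL_UNSIGNED_BYTE, GL_FLOAT)

def read_frame(width, height, image_format="RGB"):
    rgb = np.frombuffer(
        glReadPixels(0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE),
        dtype=np.uint8).reshape(height, width, 3)       # rows are bottom-to-top in OpenGL
    if image_format == "RGB":
        return rgb
    if image_format == "L":                             # luminance as a simple channel average
        return rgb.mean(axis=2).astype(np.uint8)
    depth = np.frombuffer(
        glReadPixels(0, 0, width, height, GL_DEPTH_COMPONENT, GL_FLOAT),
        dtype=np.float32).reshape(height, width)
    if image_format == "D":
        return depth
    return rgb, depth                                   # "RGBD": both buffers together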

All these features have been introduced into the code in order to reproduce the behaviour of a real camera. The camera development is focused on extending the flexibility of the Simumatik OEP to create any type of industrial system. The implementation of the camera will allow users to create machine control systems that apply computer vision to the images taken by the camera, or to use the cameras as depth sensors, for example in safety control applications.

6.4.1 Final synthetic camera model

The final version of the camera is described below from a high-level perspective, focusing on its general functioning and design decisions. The camera was developed following an object-oriented programming paradigm, which means that a camera class was created as an abstraction that can be instantiated whenever necessary, allowing more than one camera in the same workspace.

The script is included in the Simumatik OEP source code, ready to run on the platform and communicate with it. Every time the simulation starts running in Simumatik, a camera instance is created with its own window.

The software works as shown in Figure 16. First, it creates and initialises all the camera’s intrinsic and extrinsic parameters, such as width, height, vertical FOV, near, far, position, orientation and image format, with the values received from Simumatik. Once all the parameters are initialised, it is time to create the environment; the creation of the hidden window, as well as its destruction, is handled by the ‘glfw’ library.
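
A high-level sketch of this class structure follows (not the actual Simumatik source code); only the parameters listed above are modelled and the rendering itself is omitted.

# Sketch of the object-oriented camera abstraction; rendering details are left out.
class SyntheticCamera:
    def __init__(self, width, height, fov_y, near, far,
                 position, orientation, image_format):
        # Intrinsic parameters
        self.width, self.height = width, height
        self.fov_y, self.near, self.far = fov_y, near, far
        self.image_format = image_format              # "RGB", "L", "D" or "RGBD"
        # Extrinsic parameters (pose in the Simumatik workspace)
        self.position, self.orientation = position, orientation

    def update_pose(self, position, orientation):
        """Called whenever the camera moves in the workspace."""
        self.position, self.orientation = position, orientation

    def take_picture(self, scene_objects):
        """Render the received scene objects and return data in self.image_format."""
        raise NotImplementedError                     # covered by the rendering steps above

# More than one camera can be instantiated in the same workspace:
overview_camera = SyntheticCamera(640, 480, 45.0, 0.1, 100.0, (0, 0, 2), (0, 0, 0, 1), "RGB")
depth_camera = SyntheticCamera(320, 240, 60.0, 0.1, 10.0, (1, 1, 1), (0, 0, 0, 1), "D")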


Figure 16. Code state-machine diagram


Table 4. Objects’ attributes based on their type.

Object type | Attributes
Plane       | Plane normal vector.
Box         | Width, Height and Length.
Cylinder    | Radius and Length.
Capsule     | Radius and Length.
Sphere      | Radius.
Mesh        | Path to mesh model (GLB file) and Scale.

Figure 17. Render state diagram.

Mesh models deserve special mention: they are GLB files that contain complex 3D geometries. Because these files can contain different amounts of data, different mesh rendering methods had to be created, so that the camera is able to render mesh files with or without textures. A GLB file may contain several scenes, each with several nodes arranged in their own hierarchy, and each node has its own translation, rotation and scale. The ‘gltflib’ library was needed to obtain this data. This implementation allows the camera to render every new component that users add to their systems. Since the code continuously iterates to obtain Simumatik data, the camera script reflects every change of rotation or translation of any workspace object, or even of the camera itself.
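
A minimal sketch of traversing a GLB file’s node hierarchy with gltflib, assuming its model objects mirror the glTF 2.0 schema (scenes, nodes, translation, rotation, scale); default values are applied when a node omits them, and the printed output is only illustrative.

# Walk the node hierarchy of a GLB/glTF file and collect each node's local transform.
from gltflib import GLTF

def walk_nodes(path):
    gltf = GLTF.load(path)                  # handles both .gltf and binary .glb files
    model = gltf.model
    scene = model.scenes[model.scene or 0]  # default scene, falling back to the first one

    def visit(node_index, depth=0):
        node = model.nodes[node_index]
        translation = node.translation or [0.0, 0.0, 0.0]
        rotation = node.rotation or [0.0, 0.0, 0.0, 1.0]   # quaternion (x, y, z, w)
        scale = node.scale or [1.0, 1.0, 1.0]
        print("  " * depth, node.name, translation, rotation, scale)
        for child in node.children or []:
            visit(child, depth + 1)

    for root in scene.nodes:
        visit(root)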


6.5 Virtual demonstrator development

The main objective of the demonstrator is to verify that the synthetic camera can be used in a system. The demonstrator’s operation is very simple: it has a robotic arm with the camera and a magnetic gripper integrated. Different types of screws are transported and, once a screw is detected at the picking position, the robotic arm moves on top of it and the camera takes a picture.

Then, an OpenCV script classifies the model of the screw and the robot performs a pick-and-place operation with the magnetic gripper, placing the screw in its corresponding box. The next figure summarises the whole process.

In the next chapters, the different scripts and programmes that shape the demonstrator as well as the communication protocol used between them will be explained.
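
Purely as an illustration of the kind of classification the OpenCV script performs (the actual scripts are described in the next chapters), a screw could, for instance, be segmented with a threshold and classified by the area of its silhouette; the thresholds and class names below are assumptions, not the demonstrator’s real values.

# One possible OpenCV classification step: guess the screw model from its silhouette area.
import cv2

def classify_screw(image_path):
    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, mask = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None                              # nothing in the picking position yet
    area = max(cv2.contourArea(c) for c in contours)
    if area > 5000:                              # illustrative thresholds per screw model
        return "M10"
    if area > 2500:
        return "M8"
    return "M6"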
