
Department of Science and Technology

Institutionen för teknik och naturvetenskap

LiU-ITN-TEK-A--17/049--SE

Towards automatic asset management for real-time visualization of urban environments

Erik Olsson

Thesis carried out in Media Technology
at the Institute of Technology, Linköping University

Supervisor: Patric Ljung
Examiner: Jonas Unger

Norrköping, 2017-09-08


Copyright

The publishers will keep this document online on the Internet, or its possible replacement, for a considerable time from the date of publication barring exceptional circumstances. The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

Linköping University | Department of Science and Technology
Master thesis, 30 ECTS | Media Technology
2017 | LIU-ITN/LITH-EX-A--2017/001--SE

Towards automatic asset management for real-time visualization of urban environments
Realtidsvisualisering av stadsmiljöer

Erik Olsson

Supervisors: Patric Ljung and Per Larsson
Examiner: Jonas Unger


Abstract

This thesis describes how a pipeline was developed to reconstruct an urban environment from terrestrial laser scanning and photogrammetric 3D maps of Norrköping, visualized in first person and in real time. Together with Linköping University and the City Planning Office of Norrköping, the project was carried out as a preliminary study to get an idea of how much work is needed and at what accuracy a few buildings can be recreated. The visualization is intended to demonstrate a new way of exploring the city in virtual reality, as well as to visualize the geometric and textural details at a higher quality compared to the 3D map that the municipality of Norrköping uses today. Until now, the map has only been intended to be displayed from a bird's-eye view and has poor resolution at closer ranges. In order to improve the resolution, HDR photos were used to texture the laser scanned model and cover a particular area of the low-resolution 3D map. This thesis explains which methods were used to process a point-based environment for texturing and to set up an environment in Unreal Engine using both the 3D map and the laser scanned model.


Acknowledgments

I would like to thank all engineers, PhD students and lecturers from C-research who helped me with guidance and expertise during my master thesis project. I would especially like to thank my supervisors Patric Ljung and Per Larsson for all their support. Additionally, I would like to thank my examiner Jonas Unger, who came up with the idea of a real-time rendering of urban environments. Finally, I would like to thank KJ for teaching me the basics of cameras and image projection and Denny Lindberg for guidance in Unreal Engine.


Contents

Abstract
Acknowledgments
Contents
List of Figures
1 Introduction
  1.1 Motivation and aim
  1.2 Research questions
  1.3 Delimitations
2 Background and Related Work
  2.1 Related work
  2.2 3d-maps
  2.3 Photogrammetry
  2.4 Lidar
  2.5 Texel Density
  2.6 Lens distortion
3 Method
  3.1 Laser scanning and photography
  3.2 Software Survey
  3.3 Point cloud alignment
  3.4 Meshing
  3.5 Mesh Simplification
  3.6 UV-mapping
  3.7 HDR-image assembly
  3.8 Lens correction
  3.9 Perspective warping
  3.10 Panorama stitching
  3.11 Spherical projection
  3.12 Planar projection
  3.13 Ptex
  3.14 ReMake
  3.15 RealityCapture
4 The Pipeline
  4.1 Unreal (Environment set-up)
6 Discussion
  6.1 Method
  6.2 Results
7 Conclusion
  7.1 Further work
Bibliography


List of Figures

2.1 The environment creation of Rise
2.2 The creation of Scott's apartment
2.3 A small section from the municipality's latest 3d-map, produced by Slagboom en Peters
2.4 Differences in texel density
3.1 (a) Faro Laser Scanner Focus3D. (b) Canon EOS 5DSR with 8 mm circular fisheye lens, mounted on a calibrated Nodal Ninja tripod head
3.2 Eight point clouds, measured from different locations, aligned into one PEM in Recap. The yellow circles show the scanner positions
3.3 Meshed model, divided into 16 cells in Sequoia
3.4 (a) Before simplification. (b) After 30% simplification in Simplygon
3.5 (a) Before redundant geometry has been deleted. (b) After the clean-up
3.6 UV-mapping in Modo
3.7 Comparison between the results of the softwares' UV projections, generated from one mesh (cell 15)
3.8 (a) The result of one HDR from Photomatix. (b) Tone-mapped HDR in Photoshop
3.9 (a) Original photo with perspective distortion. (b) Perspective-warped photo in Photoshop
3.10 Divided perspective warping
3.11 Equirectangular projected panorama image, HDR merged and stitched in Photomatix
3.12 (a) The black square indicates the camera position of the spherical projection. The green circle indicates position. (b) An estimation of three camera positions. The projections have been overlapped by painting smooth transitions
3.13 (a) Spherical projection alignment in Mari, adjusted by typing in coordinates. (b) Spherical projection alignment in Sequoia, adjusted by rotating/translating the colored handles
3.14 (a) Paint buffer mapped to wrong UVs after baking. (b) Brush tool noise that became visible after baking
3.15 Differences in texel resolution per face
3.16 Texture quality in ReMake
3.17 Combined laser scans and photos captured from the ground. Dimensions: 120x31x34 m. Polygons: 9.2M. Texture resolution: 8k (1 texture map)
3.18 Combined laser scans and photos captured from a drone. Dimensions: 250x145x48 m. Polygons: 3M. Texture resolution: 4k (44 texture maps)
4.1 (a) The laser scanner's ability to capture glass. (b) Hollow windows filled with a plane in Maya
4.2 (a) The green line shows the seam between two cells. (b) An overview of how much texture I was able to paint. The building in the right corner has been left with the base channel color
4.3 Asset creation in Unreal
5.1 The Pipeline
5.2 Comparison between the 3d map and laser scanned model with HDR textures
5.3 Screenshot of the architect model
6.1 The gap between the green lines missing resolution
6.2 Spherical projections. (a) View pointed from the correct camera position. (b) The

1 Introduction

Today, the municipality of Norrköping uses two types of platforms to visualize the city in 3D. One is a web-based interface called City Planner, developed by the Swedish company Agency9. This interface is intended for the public to share ideas about urban planning by adding comments at particular destinations. The other platform is a multi-touch table called Urban Explorer Table 2, developed by Rise Interactive. This map is provided with CAD models of upcoming building projects that have been arbitrarily placed over the ground. Since the 3D map is created from oblique aerial photographs, the level of detail becomes very limited in a close view. It is meant to visualize the city from a bird's-eye view rather than from street view if accurate resolution is required. To improve the visual experience, we wanted to perform a real-time rendering of the city in first person with higher geometry and texture quality than the existing 3D map. To achieve this, a combination of terrestrial laser scanning (TLS) and HDR photos was used to recreate Yllefabriken and a smaller area of Strömparken. The procedure of texturing a scanned model with High Dynamic Range (HDR) reference images has been done earlier by the professional VFX artist Scott Metzger [19]. Proceeding from his study, we wanted to investigate how much of his pipeline could be used for our purpose, examine other software to gain more experience, and finally determine my own pipeline for a real-time rendering.

1.1 Motivation and aim

The purpose of the project has been to demonstrate for the City Planning Office of Norrköping how the 3D map and Yllefabriken can be visualized in higher resolution, as well as to replace the old Yllefabriken with a new apartment model created by an architect. The geometry and the textures in the 3D map look a bit like "melted ice cream" at close range, which makes the visualization uninteresting in first-person view. Therefore, terrestrial laser scanning had to be performed, as well as photographing the site. How the textures should be applied to the measured geometry with the best possible resolution had to be investigated, as well as what accuracy the geometry required in order not to overload the rendering. In a web-based visualization like City Planner, it takes a lot of time to load the details of the models and the user can only navigate through the scene with the mouse. In order to make the visualization faster and more like a computer game, Unreal Engine will be used as the game engine.

A game engine like Unreal is suitable for creating environments including visual effects such as animations, illumination, procedural materials and other assets. It also has great conditions for running the game in virtual reality. The aim of the project was to obtain a pipeline that could be used for processing the scanned data into a highly detailed mesh, texturing it with HDRs and rendering it together with the 3D map in virtual reality. It was also desirable to reduce as much manual work as possible and rather rely on automatic software tools. The pipeline would not only be useful for visualizing urban planning in real time; it could also be used for a short demo video, pre-rendered with a physical camera in V-Ray for example.

1.2 Research questions

The following questions were the focus of the thesis:

1. Is it worth the time and work to manually texture a laser scanned environment to achieve high texture quality, compared to doing it automatically by photogrammetry?

2. Can the scanned data be processed through the pipeline without manually resurfacing the mesh, with a reasonable performance?

3. At what accuracy is it possible to align projected images with the geometry directly in a texturing software, without carefully estimating the camera position in an external application?

4. Which factors are the most important to take into consideration to achieve as high texture resolution as possible when the model is rendered in first person?

1.3 Delimitations

Yllefabriken was the main object to measure, but a lot of other geometry nearby was also recorded during the scanning and could be interesting to visualize. Hence it was decided to retain 3-4 buildings plus some of the terrain, but not more. The visualization did not have to run online like City Planner, which is why Unreal Engine was used and more time could be spent on the visual result. There was no requirement on how efficient the rendering had to be, as long as Unreal could run without perceivable lag. The geometry would be textured using HDR photos at as high a resolution as the game engine was capable of; using 8-bit JPG files as textures was not acceptable. The texture projection should be done with a combination of spherical projections with equirectangular images and planar projections with regular rectangular images. The placement of new models in Unreal did not have to align perfectly with the 3D map, just approximately. The report will not describe any advanced theory about image projection, texturing, HDR images or memory management, instead focusing on the practical work.

2 Background and Related Work

2.1 Related work

The idea of recreating an environment from laser scanned data and texturing it with HDRs came from a presentation by the visual effects artist Scott Metzger where he demonstrates his pipeline [20] for visual effects used in the short film RISE (figure 2.1), as well as his recreation of his own apartment (figure 2.2). His pipeline consisted of:

1. Creating a mesh from a laser scanned point cloud in Geomagic
2. Resurfacing the mesh in Modo
3. Merging HDRs in Photomatix
4. Lining up cameras in Maya
5. Texture painting HDRs in Mari
6. Rendering in V-Ray

In the apartment project, all textures were projected using only spherical HDRs and painted on a UV-less mesh by using the texture mapping system Ptex. In the Rise project, the textures were baked onto the geometry by first UV-unwrapping the mesh and then texturing it by projecting both spherical and planar images. What differs Scott's projects from ours is that his purpose was to render a short movie sequence, while we wanted to do it in real time like a computer game, which means that the resolution has to be optimized everywhere the user can view the surface at close distance.

2.2 3d-maps

Today, there is a variety of map services that mainly focus on route planning and location information. Only a few applications have the ability to visualize the map in full 3D, where Google Maps and Apple Maps are the most dominant. When Google introduced Google Maps in 2005, the visualization consisted of satellite images together with oblique aerial photos projected from a bird's-eye view.


Figure 2.1: The environment creation of Rise

Figure 2.2: The creation of Scott’s apartment

A couple of years later, some cities were provided with modeled buildings made by enthusiasts in SketchUp. Since there were only a few 3D buildings in each city, of varying quality and poorly aligned with the aerial photos, new technologies were investigated to automate the recreation of a three-dimensional world. In 2008, hitta.se started a collaboration with C3 Technologies and Agency9, where they together created an online 3D visualization of cities in Sweden. C3 Technologies was a subsidiary of SAAB Dynamics that had developed a measuring technology for missile targeting, and the same measurement technology was used for the 3D maps [2], hence the maps received very good precision. The maps are created by stereophotogrammetry, where oblique photos have been taken from aircraft mounted with multiple cameras at different angles. Hitta.se's 3D maps had such good quality that C3 Technologies was bought by Apple, and many more cities all over the world have since been photographed and are used in Apple Maps today. In 2012, Google Maps also began using stereophotogrammetry and photographed cities from aircraft. In the last few years, a whole series of other companies working with 3D visualizations of urban environments has emerged. Today, the municipality of Norrköping uses a 3D map from 2013, made by SAAB Dynamics, for the visualization of the city planning project Let's create Norrköping [22]. For our project, we were provided with a newer map from 2016, created by the Dutch company Slagboom en Peters [24], with much better resolution than the earlier maps from SAAB, shown in figure 2.3.

2.3 Photogrammetry

Photogrammetry is a technique for creating a 3D model based on a number of overlapping two-dimensional images taken from multiple positions. In a single 2D image there is no information about depth; we have no idea how far the light has traveled from the object to the camera sensor. The fundamental idea of photogrammetry is to compute the camera position and the depth in order to recreate the exterior as a 3D model. This is done by identifying keypoints in the pictures with common features (also called tie points). Rays are then "drawn" from the center of the camera lens through the image planes in the direction of the keypoints. Where the rays intersect corresponds to a point located in 3D space. This procedure is called exterior orientation; mathematically it is based on solving two collinearity equations that relate 3D coordinates to the image coordinates, described in [27].


Figure 2.3: A small section from the municipality’s latest 3d-map, produced by Slagboom en Peters

Before the 3D coordinates can be determined, the direction and position of the cameras have to be known, given by x, y, z for the position and ω, φ, κ for the rotation. Some cameras are equipped with GPS receivers and are able to record the camera position from GPS coordinates; otherwise the camera position has to be estimated mathematically. It is a non-trivial problem and requires different solutions depending on whether the camera is calibrated or not [7]. Basically, the calculation is done by forming two calibration matrices K1 and K2 using the cameras' known intrinsic parameters, such as focal length, optical center and pixel scaling factors, and then computing the fundamental matrix F from corresponding points in the images, x1 ↔ x2. We can then compute the essential matrix E, which stores the rotation and translation of the camera, as E = K1^T F K2. The camera location is finally estimated by extracting the rotation R and translation t from E through SVD decomposition. The whole process of solving the camera orientation is explained in more detail in chapter 9 of Multiple View Geometry in Computer Vision [12][21]. After the camera positions are calculated for all images, thousands of corresponding points are sampled over the images to obtain a dense point cloud. The matching of corresponding points is done automatically with computer vision algorithms, where SIFT is the most common algorithm; it registers invariant descriptors for each point [15]. The descriptors are generated from local image gradients. Invariant means that the descriptor value will not change if the camera view is translated, scaled or rotated. Each registered point is described by a vector which contains values for the pixel location in the image, the image scale, the orientation in world space and the descriptor. The vectors are then compared between two different images to establish the correspondence between points. When two corresponding points are found, rays are projected from the camera positions in the directions where the points are located and the rays are intersected. By knowing the distance between the intersection point and the cameras, 3D coordinates can be calculated and a dense point cloud can be obtained.


The point cloud is finally triangulated to a mesh and texture-mapped by projecting the images onto the 3D model.
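As an illustration of the two-view pose estimation described above (not part of the thesis pipeline, which relied on existing photogrammetry software), a minimal sketch with OpenCV could look as follows. The intrinsic matrix K and the matched keypoint arrays pts1 and pts2 are assumed to be given.

```python
# Minimal two-view pose estimation sketch with OpenCV (illustrative only).
import cv2
import numpy as np

def estimate_relative_pose(pts1, pts2, K):
    # Fundamental matrix from point correspondences; RANSAC rejects outliers.
    F, inliers = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)
    # Essential matrix E = K^T F K (the same camera is assumed for both views).
    E = K.T @ F @ K
    # recoverPose decomposes E (internally via SVD) into rotation R and translation t.
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
    return R, t
```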

Aerial photogrammetry is a good method if a large area is to be recreated, a 3D map for example. But it has two disadvantages. The camera angle becomes limited from an aircraft flying 300-600 meters above the terrain, which makes it difficult to recreate geometry that is obscured by other geometry, for example trees, tunnels, narrow alleys and cars. This is because the rays from two different images will not intersect in the exact orientation, which makes such objects look as if they have melted together with the ground. The other disadvantage is that the texture resolution will only be sharp from a distance.

2.4 Lidar

The best accuracy for measuring geometry is obtained by using LiDAR (Light Detection and Ranging). LiDAR can be performed from spaceborne, airborne or terrestrial platforms, depending on how large an area is going to be measured. The measurement is performed by a scanner that emits thousands of laser beams (IR light) per second in all possible directions; for each beam, data is recorded on the target position (X, Y, Z), intensity and color (RGB).

Terrestrial laser scanners can use two types of measurement techniques, either time-of-flight or phase-based [26]. Time-of-flight scanners measure the distance by calculating the time it took for the light to reflect back. For each beam that is emitted, a point is recorded; the formula for calculating the distance between the scanner and the point is distance = 0.5 · v · t, where v is the speed of light and t is the round-trip time.

Phase-based scanners emit continuous waves of infrared light; the phase shift between the outgoing waves and the returned waves is then used to calculate the distance between the object and the scanner. The recorded points then build up a 3D model, and in that case the geometry becomes more detailed than a 3D model created from a photogrammetric algorithm, since the scanner's position does not have to be estimated. However, a lidar instrument is significantly more expensive compared to the camera equipment used for airborne photogrammetry. A scanner costs around one million SEK, which is on the other hand significantly less than manned aircraft operations. So both methods have pros and cons. A shortcoming of both methods is the texture quality for close-up views.
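The two ranging principles above can be summarized in a few lines. This is a hedged illustration of the formulas, not FARO-specific behavior; the modulation frequency and times are made-up example values.

```python
# Illustration of time-of-flight vs phase-based ranging (example values only).
import math

C = 299_792_458.0  # speed of light in m/s

def tof_distance(round_trip_time_s):
    # distance = 0.5 * v * t, since the light travels to the target and back
    return 0.5 * C * round_trip_time_s

def phase_distance(phase_shift_rad, modulation_freq_hz):
    # The phase shift locates the target within half a modulation wavelength.
    wavelength = C / modulation_freq_hz
    return (phase_shift_rad / (2 * math.pi)) * wavelength / 2

print(tof_distance(400e-9))          # ~60 m for a 400 ns round trip
print(phase_distance(math.pi, 5e6))  # ~15 m at a 5 MHz modulation frequency
```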

2.5 Texel Density

Texel density describes how uniformly texture pixels (texels) are spread across a 3D surface, i.e. a value of how much texture resolution will be represented on the mesh. The larger the UVs are, the more texels they can hold, which allows higher resolution in the rendering. In order to keep a uniform texel density over the entire mesh, the UV shells should always be scaled in relation to each other and not individually [32]. If the texel density is badly distributed, it will result in shifting resolution when the texture is applied, shown in figure 2.4. This should not be a problem if the UVs are mapped automatically using a software tool, since then all UV shells will be scaled with the same ratio.

An important rule is to never use a texture with higher (or lower) resolution than the object's texel density calls for, which L. Iezzi describes in [13]. For example, if the object is determined to have a texel density of 1024 px/m, the texture map's resolution should also be 1024x1024 px. If a 2k texture map is used instead, the UV coordinates will not correspond to the pixels in the texture, which results in a magnified texture when it is applied. To achieve higher resolution in the rendering, both the texel density and the texture map resolution have to be increased. But the texel density is limited in relation to the UV space; if we scale up the UV shells just to get a higher texel density, the UV coordinates will start to fall outside the (0-1) UV space. The solution is to use multiple UV tiles; then we have a larger UV space to distribute the UVs over and the mesh can hold more resolution.
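A small sketch of the relation between texel density, covered surface area and texture map size may make the rule above concrete. The numbers are illustrative and not taken from the thesis.

```python
# Rough texel-density bookkeeping (illustrative values, not thesis data).
import math

def required_map_resolution(surface_area_m2, texel_density_px_per_m):
    texels_needed = surface_area_m2 * texel_density_px_per_m ** 2
    side = math.sqrt(texels_needed)
    # Round up to the next power of two, as texture maps usually are sized.
    return 2 ** math.ceil(math.log2(side))

# A 5 x 5 m wall at 1024 px/m would need roughly an 8k map (or several tiles).
print(required_map_resolution(25.0, 1024))  # 8192
```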


(a) No uniform texel density → varying resolution

(b) Uniform texel density → equal resolution

Figure 2.4: Differences in texel density

2.6 Lens distortion

Lens distortion arises due to the spherical shape of the lens and flaws in the optical elements, which results in the recorded image not being perfectly projected onto the image plane [6]. Radial distortion and tangential distortion are the most important types of lens distortion to correct if the image should have the same perspective as reality. Radial distortion creates a displacement of a given point in the image in relation to its real location, which causes unwanted effects such as straight lines becoming curved, as well as disproportionate magnifications. It becomes more noticeable farther away from the center of the lens, because the rays are bent most there. Depending on what lens is used, different types of radial distortion appear. Distortion is typically classified into pincushion and barrel distortion. Pincushion distortion makes straight lines curve inwards and barrel distortion does the opposite. Lenses with longer focal lengths, like telephoto lenses, are more prone to pincushion distortion, while wide-angle lenses with short focal lengths cause barrel distortion since they have more curved glass elements. Tangential distortion is caused by the image sensor not being completely parallel to the camera lens, which results in a tilted image.

To remove lens distortion, the camera can be calibrated by calculating the lens distortion parameters. Using the distortion parameters and the known camera intrinsic parameters, the distortion can be removed by recalculating the pixel coordinates. Instead of going into too much detail on how lens distortion is corrected, I will explain how I used software to correct it automatically. The mathematical formulas for computing the distortion parameters and how they are applied to obtain a distortion-free image are described in [14].
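As a point of reference for the automatic correction used later in the method chapter, the standard calibrate-then-undistort workflow can be sketched with OpenCV. The camera matrix and distortion coefficients are assumed to come from a prior checkerboard calibration (cv2.calibrateCamera); none of the values are from the thesis.

```python
# Undistortion sketch with OpenCV (camera_matrix/dist_coeffs from a prior calibration).
import cv2

def undistort_image(img, camera_matrix, dist_coeffs):
    h, w = img.shape[:2]
    # Refine the camera matrix so the undistorted image keeps all source pixels.
    new_K, _roi = cv2.getOptimalNewCameraMatrix(camera_matrix, dist_coeffs, (w, h), 1)
    return cv2.undistort(img, camera_matrix, dist_coeffs, None, new_K)
```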

Perspective effects

When the camera is tilted up or down, or angled to the side, another type of distortion appears, called perspective distortion. The distortion makes objects look like they are falling backward or forward (shown in figure 3.9a) because the principal axis (the center of the lens) has been pointed above or below the horizon.

To avoid perspective distortion, the picture must be taken when the principal axis is parallel to the horizon. That keeps the image plane parallel to the vertical lines of the object.

Considering figure 3.9a, this is the way the eye actually perceives the building, but when we see the picture on a screen our mind knows that we are looking at a picture from a different position and the perspective feels distorted. The advantage of removing the perspective distortion is that it simplifies the image projection during texturing, since it becomes much easier to project images in an orthogonal view when the camera angle is always perpendicular to the object.

There are two ways of removing such distortion. One is using a tilt-shift lens, which is a movable lens that removes the distortion optically; the photographer manually shifts the lens in the vertical or horizontal direction, or tilts it, until the perspective looks straight. The other way is to correct the perspective in post-processing software. Simply explained, four source points are selected by the user in the source image, forming a rectangular grid. The points in the source image are parameterized by x = (x, y) and are mapped onto the planar shape of the grid, parameterized by u = (u, v); the correspondence is defined by two functions u(x, y) and v(x, y) [11]. When the corner points of the grid are moved to their target destinations, the grid shape changes (figure 3.9b) and the transformation can be obtained by computing the corresponding points in the image through the warp functions. This calculation is sampled in discrete points U_ij over the whole image, as Robert Carroll et al. describe in [25]. To generate new pixel values, the discretized points are then multiplied with computed interpolation coefficients.

If the picture is stretched too much, the image quality will deteriorate because of the interpolation. Using a tilt-shift lens does not cause any quality deterioration, but such a lens costs about 11-12 000 SEK, which is why Photoshop was used instead.
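The four-point correction described above is essentially a homography warp. A hedged sketch with OpenCV is shown below; src_pts would be the four corners picked on the leaning facade and dst_pts the upright rectangle they should map to (both are user input, not values from the thesis).

```python
# Four-point perspective correction via a homography (OpenCV sketch).
import cv2
import numpy as np

def perspective_correct(img, src_pts, dst_pts, out_size):
    H = cv2.getPerspectiveTransform(np.float32(src_pts), np.float32(dst_pts))
    # Pixels are resampled by interpolation here, which is the source of the
    # quality loss mentioned above when the image is stretched too far.
    return cv2.warpPerspective(img, H, out_size)
```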

3 Method

This chapter describes how the capture and reconstruction pipeline works and which methods are used in each step. Before a pipeline could be determined, a variety of software packages were compared and evaluated to find a structured workflow, which is explained after the laser scanning section.

3.1 Laser scanning and photography

The laser scanner measurements were performed with a FARO Focus3D laser scanner (figure 3.1a). The scanner is phase-based and distributes laser beams by deflecting them against a mirror rotating in the vertical direction. Simultaneously as the mirror rotates, the scanner itself rotates horizontally to distribute beams in all possible directions. When a beam hits an opaque surface, the light reflects back to the scanner and the phase shift is estimated to determine the distance, while the vertical and horizontal angles of each point are determined using angle encoders. A point cloud is then created by transforming the polar coordinates to Cartesian coordinates [31].
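The polar-to-Cartesian conversion mentioned above is the standard spherical coordinate transform. A small sketch is given below; angle conventions differ between scanner vendors, so this is only one common choice, not necessarily FARO's.

```python
# Polar (range + two angles) to Cartesian conversion, one common convention.
import numpy as np

def polar_to_cartesian(r, vertical_angle, horizontal_angle):
    # r: measured range in meters, angles in radians,
    # vertical_angle measured from the scanner's vertical axis (zenith).
    x = r * np.sin(vertical_angle) * np.cos(horizontal_angle)
    y = r * np.sin(vertical_angle) * np.sin(horizontal_angle)
    z = r * np.cos(vertical_angle)
    return np.stack([x, y, z], axis=-1)
```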

The scanner was set to its maximum measurement speed: 976 000 points/sec. It has a maximum field of view of 305° in the vertical range and 360° in the horizontal range, and is able to register points at distances between 0.6 m and 120 m with a ranging error of ±2 mm (more technical details can be found in the Faro Laser Scanner Focus3D user manual [18]).

We scanned Yllefabriken from eight different positions in Strömparken to be sure that the most important angles of Yllefabriken were captured. The laser scanner has a built-in color camera, but the sensor is only 8 MP and can only capture the photos in one exposure. Therefore, additional photographs were captured with a 50 MP Canon EOS 5DSR preset to five different exposures, from 1/2000 s to 1/2 s (3 EV apart). For spherical HDRs we used an 8 mm circular fisheye lens; the camera was mounted on a calibrated tripod head and, for each camera position, images were taken in four different directions by rotating the camera 90 degrees after each shot, see figure 3.1b. The photographs were taken from almost the same positions where the laser scanner had been placed, to capture the same field of view as the scanner. For normal (non-spherical) HDRs we used a regular 35 mm lens and photographed Yllefabriken plus some other buildings in Strömparken with overlapping directions.


Figure 3.1: (a) Faro Laser Scanner Focus3D. (b) Canon EOS 5DSR with 8 mm circular fisheye lens, mounted on a calibrated Nodal Ninja tripod head

3.2 Software Survey

After the data was captured, it had to be processed through several steps before it could be rendered with textures. The fundamental steps of the pipeline are listed below:

• Point cloud alignment
• Meshing
• Simplification
• UV-mapping
• HDR-merging
• Texturing

Each step required new software to be examined to find its pros and cons. Most of the software was installed as 30-day trials and some could be activated as full versions with my supervisor's licenses. By performing basic comparisons of usability and performance, the software with the best results was adopted into the pipeline.

3.3 Point cloud alignment

For each new scan session the scanner creates a new point cloud and stores it on the memory card. Identical geometries will not have exactly the same point coordinates if they are measured from different positions, but the scans can be aligned into a single point-based environment model (PEM [31]) by interpolating common points. This was done in Recap, either by using "Auto registering" or the manual point matching tool, seen in figure 3.2. The PEM was then exported, ready to be meshed in another software.
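The alignment itself was done inside Recap, but to illustrate the underlying registration idea, a minimal point-to-point ICP between two scans can be sketched with the open-source Open3D library (not part of the thesis pipeline; the file paths and voxel size are placeholders).

```python
# Minimal ICP registration of two scans with Open3D (illustrative only).
import open3d as o3d

def align_scans(source_path, target_path, voxel=0.05):
    src = o3d.io.read_point_cloud(source_path)
    tgt = o3d.io.read_point_cloud(target_path)
    # Downsample to speed up the correspondence search.
    src_d = src.voxel_down_sample(voxel)
    tgt_d = tgt.voxel_down_sample(voxel)
    result = o3d.pipelines.registration.registration_icp(
        src_d, tgt_d, voxel * 2,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation  # 4x4 rigid transform that maps source onto target
```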


Figure 3.2: Eight point clouds, measured from different locations, have been aligned into one PEM in Recap. The yellow circles show the scanner positions

3.4 Meshing

There are several different programs for creating meshes from point clouds, and each uses its own method to compute the mesh. These programs are also used to remove redundant data in the point cloud captured during scanning. Meshlab [1] was initially used for cleaning and meshing the point cloud. In total, the point cloud consisted of 74 750 000 points and covered an area of approximately 15 000 m², almost the entire Strömparken including Yllefabriken and Gammelbron.

First, we decided to keep only Yllefabriken and delete everything else in Strömparken, but later realised that it was interesting to visualize some other buildings in Strömparken as well. Since it was too difficult to do a proper clean-up without deleting interesting geometry, we decided to keep all data from the point cloud and perform the clean-up after the point cloud was meshed. This made Meshlab an unnecessary software in the pipeline, and it was replaced with Sequoia, which does not offer tools to erase points but has other useful tools such as image projection and cell division.

Before the meshing is executed, the user has to decide how densely the vertices should be sampled. In Sequoia this is done by adjusting a parameter called Radius, which represents the average distance between neighboring points. A high radius will sample the vertices sparsely, which makes the mesh thicker and details can be lost. A low radius will sample the vertices more densely, which increases the number of polygons at a cost in memory usage. Sequoia calculates a suggested radius (about 8 cm), which was too large to keep an accurate level of detail, hence it is important to try out what looks good. Sampling the vertices at millimeter precision was impossible, since Sequoia always ran out of memory, which led to the process failing. A point radius of 5 cm was enough to bring out details like windowsills, downspouts and fences.

Which meshing method is used has a big impact on how accurately detailed the reconstruction will be. Sequoia has three different methods to choose from: Union of Spheres, Metaballs and Zhu/Bridson, where the last one was preferred. More about which parameters can be adjusted for the selected method can be found on the software developer's website, Thinkbox Software [29]. The meshing algorithm creates a two-sided mesh because the method is based on a fluid simulation (developed by Yongning Zhu et al. [33]) where the thickness becomes approximately equal to the meshing radius times the point depth.

A two-sided mesh is of course unwanted, since we are never going to inspect the mesh from below or behind. Changing the mesh from double-sided to single-sided can simply be done by enabling "Conform to points".
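Sequoia's Zhu/Bridson mesher cannot be scripted from the outside, but the role of the sampling radius can be illustrated with an open-source alternative: ball-pivoting surface reconstruction in Open3D, where the pivot radii play a role comparable to the Radius parameter discussed above (a sketch under that assumption, not the method used in the thesis).

```python
# Ball-pivoting reconstruction as an open-source stand-in for Sequoia's mesher.
import open3d as o3d

def mesh_point_cloud(pcd, radius=0.05):
    # Normals are required by the ball-pivoting algorithm.
    pcd.estimate_normals()
    radii = o3d.utility.DoubleVector([radius, radius * 2])
    # Smaller radii keep more detail but cost more memory, as with Sequoia's Radius.
    return o3d.geometry.TriangleMesh.create_from_point_cloud_ball_pivoting(pcd, radii)
```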

3.5 Mesh Simplification

A reduction of the polygons should always be done before exporting a meshed point cloud, to make it less complex and memory consuming. Dealing with 10-million polygon models during the editing was too much for reasonable performance; I found that models of 1-2 million polygons were a more reasonable size to work with. Usually in computer games, multiple versions of a model are created at different resolutions (LODs), which saves a lot of memory during rendering. I decided not to use LODs because it is a lot of extra work to export and import several different models, and the goal was not to perform a memory-efficient visualization.

The simplification was first performed in Sequoia, where the user adjusts how many percent of the mesh's polygons will be retained. Sequoia claims that "as a rule of thumb, a value of 10.0 percent is often very good". If the mesh is reduced too much, details will disappear and sharp edges will be smoothed out. It is important to inspect how the mesh looks afterwards and determine an appropriate reduction value. I found that a reduction value of 25 percent was enough to still preserve the shape of the geometry.

Exporting the model as a single mesh would be too much data to edit in the clean-up and texture mapping process later on, given that the model consisted of 15 million polygons after the first reduction. Sequoia's Hacksaw tool makes it possible to divide the mesh into multiple cubic cells and export them as individual FBX files. This means that each cell will have its own texture map, which increases the texture resolution for the entire model. I chose to split the model into 16 cells (70x50x40 m each), seen in figure 3.3.

Figure 3.3: Meshed model, divided into 16 cells in Sequoia

The simplification tool in Sequoia works only for a rough reduction. For a better result I used the Maya plug-in Simplygon, which is especially good at simplifying flat surfaces while preserving complex surfaces. A further 10-30 percent (depending on the size of the mesh) was reduced. Figure 3.4 shows Simplygon's ability to reduce a 3.7 million polygon model to 1.1 million polygons. After the cells were simplified, the process of cleaning up unwanted geometry went much faster. Figure 3.5 shows how much of the redundant geometry was removed afterwards in Maya.
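For completeness, the kind of percentage-based decimation described above can also be expressed with an open-source tool; the sketch below uses Open3D's quadric decimation rather than Sequoia or Simplygon, so it only illustrates the idea.

```python
# Quadric-decimation stand-in for the percentage-based reduction step.
import open3d as o3d

def reduce_mesh(mesh, keep_ratio=0.25):
    target = int(len(mesh.triangles) * keep_ratio)
    # Collapses edges until only `target` triangles remain.
    return mesh.simplify_quadric_decimation(target_number_of_triangles=target)
```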


Figure 3.4: (a) Before simplification. (b) After 30% simplification in Simplygon


Figure 3.5: (a) Before redundant geometry has been deleted. (b) After the clean-up

3.6 UV-mapping

Normally, UV-mapping should be done manually to achieve the best texture resolution. This is a very time-consuming step in which the user has to manually mark seams and divide groups of polygons into UV islands (also called UV shells). It can be done automatically in most 3D applications, which can also pack the UVs into multiple tiles. The disadvantage is that some algorithms distribute the UV islands very sparsely, which results in poor texture resolution. The applications I tested were Zbrush, Mudbox, Modo and Sequoia.

Since Zbrush uses a rather unique user interface, unlike other software, it presented a difficult learning threshold, so I left that program for a moment to see if the other programs suited me better.

Learning Mudbox went quite fast, since it is developed by Autodesk and the user interface is reminiscent of Maya. The disadvantage was that it took over 5 hours to generate 8 UV tiles for a 0.4M polygon model. The UVs also became extremely downscaled and no UV shells were generated; all polygons were mapped individually. Another problem is that it was not possible to assign material IDs to individual UV tiles directly in Mudbox; it has to be done in Maya or 3DS Max afterwards.

Maya provides a UV-mapping tool that automatically projects the polygons to UVs and packs them into a UV map. The problem was that there is no tool for packing the UVs into multiple tiles automatically; it can only be done manually by translating the UV shells outside the (0,0)-(1,1) UV space, and the single tile it generated was sparsely packed.

I decided to use Modo for UV-mapping because the software was easy to use and it is developed by the same company as the texture painting program Mari.

There are also many good tutorials about the workflow between the two programs. UVs are created automatically in Modo by selecting "Atlas" as the projection type. As the developers of Modo explain, atlas projection maps every single polygon of the mesh into a UV map while maintaining relative scale based on the 3D volume of the polygons [16]. This is exactly what we want in order to maintain a uniform texel density. One drawback with automatic UV-mapping was that UVs were also created from the model's back and underside. These two sides will never be shown during the visualization, hence these UVs are unnecessary information that takes up space in the UV map. Another UV projection type is "projection from view". This method creates UVs only from polygons facing the camera, which optimizes the space in the UV map. It is a good method for projecting polygons from a flat wall, for example, but with the disadvantage that polygons hidden from the camera view never become UV-mapped. Multi-tiling can be done automatically in Modo by using the "Pack UV" tool. This tool packs the UV islands automatically into an arbitrary number of tiles, as the user determines. The model that was UV-mapped with view projection was better packed manually, by increasing the number of tiles and translating the UV islands into the empty tiles without scaling the UVs, which the UV packer does. Performing the atlas projection and the automatic UV packing took about 3.5 hours for a 0.7M polygon model. The two different UV-mapping methods are shown in figure 3.6.

(a) Projection from view (b) Atlas projection

Figure 3.6: UV-mapping in Modo

Another advantage of using Modo is the material assignment. It is simply done by selecting all the UVs in a tile, right-clicking and assigning them a so-called "Unreal Material", and naming the materials something like u1_v1, u2_v1, u3_v1 etc. Naming the materials with a unique id is an important step in order to later know which texture file belongs to which material slot in Unreal. Unreal does NOT number the material slots in the same order as the tiles are numbered in Modo, due to a bug in Unreal, which Epic Games claims was fixed in version 4.15, but it was not [9]. It is worth mentioning that each material results in a separate draw call and each draw call has some cost [8]. In conclusion, the number of material slots/tiles should not be more than absolutely necessary.
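The u1_v1, u2_v1, ... names correspond to UV tiles, and the common UDIM convention numbers tile (u, v) as 1001 + u + 10·v with zero-based indices. A small helper like the one below (a sketch, not something used in the thesis) can keep exported texture files, Mari tiles and Unreal material slot names consistent.

```python
# Map zero-based UV tile indices to a UDIM number and a u#_v# style name.
def tile_names(u_index, v_index):
    udim = 1001 + u_index + 10 * v_index
    material_name = f"u{u_index + 1}_v{v_index + 1}"   # 1-based, as in Modo/Mari
    return udim, material_name

print(tile_names(0, 0))  # (1001, 'u1_v1')
print(tile_names(2, 0))  # (1003, 'u3_v1')
```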

Figure 3.7 shows Mudbox's, Sequoia's, Modo's and Maya's ability to distribute UVs into one tile. Sequoia did not support multi-tiling, but I still did a test to compare the result of one tile with the other programs, seen in figure 3.7b. It actually seems like Sequoia's UV algorithm is the best, considering how closely it packs the UVs. For example, the straight lines in Sequoia's UV map correspond to the small-scale UV area in Maya's UV map, figure 3.7d. For this reason, it was also worth evaluating Sequoia for image projection.

3.7 HDR-image assembly

All the images captured with different exposures had to be combined to bring the high dynamic range into a single image. This operation can be done automatically for all images in HDR merging software like PTGui, Photoshop or Photomatix Pro. Photomatix was used to perform the HDR merging, mostly because the basic version is free, it merges HDRs faster than Photoshop, and all images can be merged in one batch.


(a) Mudbox (b) Sequoia (c) Modo (d) Maya

Figure 3.7: Comparison between the results of the softwares' UV projections, generated from one mesh (cell 15)

After the images were merged, the result looked really dull and matte because the monitor cannot properly display a 32-bit HDR image. This is why HDRs must be tone mapped before they are rendered. That is, colors from a high dynamic range are mapped to a more limited dynamic range adapted for the screen, which also brings out details and balances the brightness locally.

To get rid of the dullness I tone mapped some images in Photoshop using the default filter, see figure 3.8b. The problem was that the images had been captured under different cloud conditions, hence the tone mapping gave different results for different highlight conditions. So I decided not to tone map the HDRs, to be sure that all information was retained.
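The merge and tone mapping steps were done in Photomatix and Photoshop; purely as an illustration of what they do, the same two steps can be sketched with OpenCV's HDR module (file names and exposure times below are placeholders).

```python
# Exposure merging and tone mapping sketch with OpenCV (placeholder inputs).
import cv2
import numpy as np

exposure_times = np.array([1/2000, 1/250, 1/30, 1/4, 1/2], dtype=np.float32)
images = [cv2.imread(f"bracket_{i}.jpg") for i in range(5)]

merge = cv2.createMergeDebevec()
hdr = merge.process(images, times=exposure_times)        # 32-bit float HDR image

tonemap = cv2.createTonemapReinhard(gamma=2.2)
ldr = np.clip(tonemap.process(hdr) * 255, 0, 255).astype("uint8")
cv2.imwrite("merged_tonemapped.png", ldr)
```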


Figure 3.8: (a) The result of one HDR from Photomatix. (b) Tone mapped HDR in Photoshop

3.8 Lens correction

The 35 mm lens generated a small amount of barrel distortion, which complicates the projection process. Lens correction of HDR images can be done in Photoshop using the Camera Raw Filter, but then it has to be adjusted manually for each image. To apply lens correction to all images in one batch, the node system in Nuke was utilized. An image of a checker pattern was captured with the Canon camera and imported into an image node in Nuke. The checker pattern was analyzed with a LensDistortion node that calculated the camera's radial distortion parameters.

The LensDistortion node was then connected to all HDRs, which removed the distortion. White balancing was also performed in Nuke, using one image as a reference and applying its white point value over the whole image batch.

3.9 Perspective warping

Perspective distortion was unavoidable when capturing photographs of Yllefabriken, because trees obstructed the view and they could only be avoided by angling the camera. To remove perspective effects, Photoshop's perspective warp tool was used. First, the warp grid was aligned with the vertical lines in the image and then transformed into a perpendicular rectangle, which also made the image perspective perpendicular, see figure 3.9b.

The image with the round part of Yllefabriken could not be perspective warped properly without dividing the image into smaller parts with some overlap and straightening them separately, see figure 3.10.


Figure 3.9: (a) Original photo with perspective distortion. (b) Perspective warped photo in Photoshop

Figure 3.10: Divided perspective warping

3.10 Panorama stitching

The photos that were captured with the fisheye lens had to be stitched together into spherical panoramas (also called lat/long images or equirectangular projections) before they could be used for a spherical projection. This could be done in Photoshop, but it became too much manual work since the images need to be HDR-merged first and then stitched into panorama images. Instead, PTGui was used, which performs circular cropping, HDR merging, lens correction, stitching and tone mapping in one procedure.

PTGui does not require much input from the user to create a panorama, since all the information about the camera and the lens is stored in the image file's EXIF data. The only settings that have to be changed are the projection type (equirectangular for spherical panoramas) and the export format. Because the tripod head had been calibrated in advance, the result became very good, without notable seams (stitching errors) or differences in brightness between overlaps, see figure 3.11.

Figure 3.11: Equirectangular projected panorama image, HDR merged and stitched in Photomatix

3.11 Spherical projection

In the demo video of Rise [20], Metzger used the spherical images as a base layer because they fill a larger area with texture compared to planar images. The farther the light has traveled from the objects to the lens, the worse the resolution will be in the equirectangular image. This means that the resolution will be highest on the geometry closest to the camera location. Hence, spherical projections are very well suited for texturing the ground. For projections of spherical panoramas we evaluated two applications, Sequoia and Mari.

Mari is reminiscent of Photoshop in its structure. It supports multiple channels, which determine the paint resolution and color depth, and stores multiple layers. Layers in Mari are categorized into paint layers, layer masks, adjustments and procedurals. Especially the adjustment layers were beneficial for changing the shading of the painted texture.

One good feature in Mari is that the user can choose to paint in a texture buffer, which means that the image projection is not baked onto the geometry until the B key is pressed. If the painting goes completely wrong, the user can simply clear the buffer and start over instead of painting over the old layer. Like other texture painting programs, images can be painted onto the geometry with a soft brush, which makes the images overlap each other smoothly.

For spherical projection there are two types of parameters to adjust for the alignment: translation and rotation, which change the location of the camera's projection relative to the model's pivot point. The transformation units in the interface are relative to the model's scale; if the model is scaled in centimeters, the projection will also be translated/rotated in centimeters. Since no information about the camera positions had been recorded, apart from the image view, the position could be estimated by translating the projection to its corresponding view in the scene. Then the camera rotation was estimated, to aim the projection at a certain surface. The Z and X rotations were almost always zero because the camera tripod had a built-in bull's eye level that made it possible to regulate the camera's x-direction perpendicularly. When the camera estimation began to approach its proper position, the lower black area from the panorama appeared as a square, because a nadir shot [5] was never taken. This black square could instead be used as a reference for the camera.

Each scanner position is indicated by a hollow circle, because the scanner is not able to emit rays just below the tripod. Translating the black squares next to the circles was a good method to find an approximate camera position, shown in figure 3.12. Figure 3.12a shows one of the best attempts to align a spherical HDR with a building in Strömparken. It was still too much work to align even a single image, hence Sequoia was investigated to see if it had better facilities for spherical projection.


Figure 3.12: (a) The black square indicates the camera position of the spherical projection. The green circle indicates the scanner position. (b) An estimation of three camera positions. The projections have been overlapped by painting smooth transitions

Sequoia offers two features that make spherical projection a little bit easier than Mari. The first is that the projection location is shown in the scene and can be transformed by dragging the colored handles (displayed in figure 3.13b), which is much more intuitive than Mari's interface. The second good feature is Sequoia's alignment tool, which is based on point matching. The user needs to mark at least three points in the projection image and then mark the corresponding points on the geometry; from this information the program calculates the perspective plus the camera position and projects the image from that location. Sequoia did not always align the image correctly, but in some cases the result became surprisingly good. In cases where the projection did not align very well, it had to be corrected manually, but sometimes it was impossible to get the alignment right no matter how much the projection was translated/rotated. First I thought that the equirectangular projection was incorrectly created in PTGui and posted a thread on Sequoia's forum regarding spherical projection [28], but later realized that everything was correct. The projection cannot perfectly align with all geometries from one angle; the projection area is limited by the camera location. Hence, the projected texture has to be smoothly overlapped with the next projection. This can only be done by painting the new texture over the old. Since Sequoia is not equipped with any painting tool, and the program also stopped responding very often during the image alignment, there was no point in continuing to use Sequoia for image projection.

I went back to Mari, focusing more on painting smooth overlaps between the projections, but it took too much time to finish even one overlap, so I stopped texturing with spherical projections for a while and moved over to planar projections to see if they were easier. I also examined other texture painting software in case Mari was not optimal for the pipeline.
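To make the spherical projection alignment above more concrete: texturing with an equirectangular panorama amounts to turning each surface point into a direction from the estimated camera position and looking that direction up in the lat/long image. The sketch below shows one common convention; axis order and the v origin differ between Mari and Sequoia, so treat it as illustrative only.

```python
# Equirectangular (lat/long) lookup for a surface point, one common convention.
import numpy as np

def equirect_uv(point, cam_pos):
    d = point - cam_pos
    d = d / np.linalg.norm(d)
    u = 0.5 + np.arctan2(d[0], d[2]) / (2 * np.pi)   # longitude -> horizontal coordinate
    v = 0.5 - np.arcsin(d[1]) / np.pi                # latitude  -> vertical coordinate
    return u, v
```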


Figure 3.13: (a) Spherical projection alignment in Mari, adjusted by typing in coordinates. (b) Spherical projection alignment in Sequoia, adjusted by rotating/translating the colored handles

3.12 Planar projection

Unlike spherical projection, the camera position for planar projection is not estimated by typing in coordinates. The projected image appears as a half-transparent image plane in screen space, and the camera position can then be found by navigating the camera until the image plane aligns with the geometry. For fine tuning, the image plane can also be rotated, translated and scaled.

To perform planar image projection, three different texture painting programs were examined: Mari, Zbrush and Mudbox. The requirements for adopting a program into the pipeline were: easy to use, support for multiple tiles, and good painting tools for overlapping images.

Since the user interface in Zbrush was difficult to manoeuvre and it took much time to learn the basics of texture painting, Zbrush was excluded from the project. However, it has plenty of useful tools for both sculpting and texture painting, which are more suitable for painting game characters than architecture.

The worst issue with Mudbox was that it is not possible to perform a non-uniform scaling of the image projection, which is very important for aligning images; some artists even do the alignment in Photoshop beforehand. Another problem with Mudbox is that the texture is baked directly (the pixels are not stored in a buffer) when the user starts painting. Since 50 MP HDR images were to be painted, it became too much information for the program to process in real time, which made the program stop responding very often.

Finally, Mari was adopted into the pipeline because it met all the requirements. Mari has three different view modes for texture painting: perspective, orthogonal and UV painting. The orthogonal view was preferable because it was much easier to find the camera location when the view was always orthogonal to the mesh. This required that the images had been perspective warped beforehand. The other option was to project the images from the perspective view. This was much harder, since the camera position had to be located from almost the exact same position as where the photo was taken. One option is to align the camera positions in another 3D application with more camera adjustments (Maya or PhotoScan, for example), export the cameras as an FBX file and then import it into Mari. But that was a process I wanted to avoid in order to save time.

Unfortunately, there were some issues in Mari when the painting was baked in the perspective view. The colors in the paint buffer could in some cases be mapped to the wrong UVs after baking, see figure 3.14a. Sometimes noise was also produced in the textures during painting (figure 3.14b), but this could be avoided by using the stamp tool instead of the brush tool. It is also important to set the correct resolution for each texture tile before painting: if the size is set to 2k, the resolution will not be increased if it is changed to 8k afterwards, which happened to me and forced me to start the painting over.

Figure 3.14: (a) Paint buffer mapped to wrong UVs after baking. (b) Brush tool noise that became visible after baking.

Much time was spent getting all windows to align with the projection and painting soft overlaps between the pictures. It is desirable that a single image covers as much of a wall as possible, to get as few seams as possible. The manufacturer of Mari states: "If you are using patch resolutions higher than 4K, we recommend that you zoom in to the surface when painting, to keep the resolution sharp". It was really challenging to align the images in a zoomed-in view, since large parts of the model and the projection were outside the screen.

When the painting was done, all texture tiles (called UDIMs in Mari) could be exported as either 16/32-bit EXR or 8-bit PNG files, depending on the color depth the texture channel was set to.

3.13 Ptex

Ptex has several major advantages over traditional UV mapping. No UV unwrapping has to be done before the mesh can be textured, and the resolution can be increased radically compared to UV maps. Avoiding the unwrapping process saves a lot of time, and no artifacts from seams will be visible since Ptex uses a seamless technique [30]. Each face of the geometry has an individual texture patch; hence the texture resolution is determined by the individual face resolutions. For example, one region of faces can have 16x16 texels per face and another region 128x128 texels per face or even higher, depending on what software is used. A demonstration of different texel resolutions per face is shown in figure 3.15.

In Mari there are usually two options for deciding the face resolution: Uniform Face Size or Worldspace density. Uniform Face Size uses a fixed square size for all faces regardless of the face dimensions of the model. This can cause non-smooth transitions between faces due to the differences in texel density, so it is not a good option for complex models. With Worldspace density the software determines each face's resolution based on a given number of texels per unit of world space; in other words the resolution depends on the size (longest edge) of the face [17], as illustrated in the sketch below.

Another advantage of Ptex is that the texture channels are not exported as separate image files. If the mesh has been painted with multiple channels (diffuse, dirt, specular, luminescence, displacement, etc.) all textures are stored in one Ptex file.

In the beginning of the project Ptex was used because no good method for UV unwrapping had been evaluated yet. The texture resolution became almost as good as the image itself at a really close zoom. The problem is that today's game engines do not support Ptex, since the format was intended for rendering models in animated movies rather than for real-time rendering. It would save a lot of work to remove UV mapping from the pipeline, but for now one has to wait until game engines and graphics cards are developed for Ptex, as Neil Blevins explains in [4].
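As a concrete illustration of the Worldspace density option (the rule below is an assumption for illustration, not Mari's exact formula), a per-face resolution can be derived from the requested texel density and the face's longest edge, rounded up to a power of two:

# Illustrative sketch (not Mari's exact rule): pick a per-face Ptex resolution
# from a world-space texel density, based on the face's longest edge.
import math

def face_resolution(edge_lengths_m, texels_per_meter, min_res=4, max_res=2048):
    longest = max(edge_lengths_m)
    target = longest * texels_per_meter
    res = 2 ** max(0, math.ceil(math.log2(max(target, 1))))  # round up to power of two
    return min(max(res, min_res), max_res)

# A 0.1 m trim detail vs. a 3 m wall face, at 256 texels per metre:
print(face_resolution([0.1, 0.08], 256))   # small face  -> 32 (32x32 texels)
print(face_resolution([3.0, 2.4], 256))    # large face  -> 1024 (1024x1024 texels)

With a rule of this kind, small trim faces get small patches while large wall faces get high-resolution patches, which keeps the texel density roughly uniform across the model.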

Figure 3.15: Differences in texel resolution per face.

In order to investigate whether the texturing process could be simplified into a few steps, I examined two applications that calculate the cameras, project the images and bake the texture automatically.

3.14 ReMake

An attempt to recreate only Yllefabriken using nothing but the photographs was made in ReMake. ReMake recreates models through photogrammetry and performs UV unwrapping and texturing automatically, and the model can simply be exported as an FBX file with one associated texture file. The application is really easy to use, but for our purpose it did not meet the requirements. It only supports JPG files, it did not manage to align the images properly, and both the geometry and the texture quality became very low at close distances because the software interpolates new colors from the images that have been aligned, see figure 3.17b. The second problem was that images captured from positions where trees obscured the view also became projected onto the mesh. ReMake seems to be a better choice for recreating smaller objects that can be photographed at a closer distance and from more angles, considering what is demonstrated in their tutorials.


3.15 RealityCapture

RealityCapture is a program for creating accurate 3D models from images through photogrammetry and from lidar data. To obtain both high texture quality and high geometric resolution, laser scans and photos can be combined, which also saves a lot of time compared to the manual work of UV mapping and texture painting. Since RealityCapture cannot import HDR images, I imported a RAW-format image set taken with the same exposure. After the images and the point cloud have been imported there are basically six steps to go through before the model is done:

• Alignment (aligns all the imported laser data into one point cloud)
• Reconstruction (meshes the point cloud)
• Alignment (aligns all the images)
• Texture (textures the mesh with the aligned images)
• Simplify (reduces the number of polygons)
• Mesh (exports the mesh and the texture)

The result of the alignment can vary depending on how much data RealityCapture is able to align into one component. Generally, several components are created in the first alignment; these components then have to be realigned manually by point matching a few scanner images from different components. Just as in Sequoia, the accuracy of the vertex sampling can be adjusted by changing the "Minimal sample distance". By default the accuracy is set to 2 mm, which is also what they recommend in their tutorial, so I adopted this value. For some reason RealityCapture never managed to complete the mesh over the entire scanned area; it crashed every time in the middle of the process, probably because there was too much data to process and the program ran out of memory, like Sequoia did. The only way to complete the process was to import a smaller region of the scanned area, so I chose to only recreate Yllefabriken.

The texturing was not particularly successful either. The program seemed to overlap the textures too much and interpolate a mix between two images instead of applying them image by image, a problem that is avoided when the texturing is done manually. In contrast to Sequoia, there is no Hacksaw feature in RealityCapture; everything has to be exported as one mesh, or as a specific area that the user marks. This means that the exported texture file will have a very limited resolution if it is exported as a single texture. RealityCapture supports multiple texture tiles, but for some reason it only generated one even though I changed the setting to multiple, so exporting a single 8k texture file was the best option. The result is seen in figure 3.17, where the model is rendered in Unreal.

For a further comparison, the city planning office had sent me a new 3D model of an industrial site located at the harbor in Norrköping. This model had also been processed in RealityCapture, with a combination of lidar data and photographs captured from a drone. Its area was almost as large as that of my model, and the surface was divided into 44 UV tiles, each provided with a 4k texture image. The model was imported into Unreal and its material was applied; the result is shown in figure 3.18.

Figure 3.17: Combined laser scans and photos captured from the ground. Dimensions: 120x31x34 m. Polygons: 9.2M. Texture resolution: 8k (1 texture map).

Figure 3.18: Combined laser scans and photos captured from a drone. Dimensions: 250x145x48 m. Polygons: 3M. Texture resolution: 4k (44 texture maps).


4 The Pipeline

This chapter describes the pipeline for the recreation of the scanned site, step by step, including how the scene was arranged in Unreal.

Recap (point cloud alignment)

The scanned data set was imported into Recap and automatically aligned into a single PEM.

Sequoia (meshing and Hacksaw)

The PEM was imported into Sequoia and meshed with an accuracy of 5 cm. The mesh was then reduced to keep 25% of the polygons and divided into 16 cells. After the reduction, all cells together contained 15 million polygons. All 16 cells were then exported as individual FBX files. Only the six most central cells of the model were processed through the pipeline; the other cells contained too much redundant geometry.

Simplygon (simplification)

In Simplygon, the cells were reduced by a further 15-30%.

Maya (clean up)

Incoherent geometry such as bushes and detached "islands" was removed in Maya by simply selecting the polygons and deleting them. After this, the cells consisted of only 198 000 - 776 000 polygons each.

Maya (creating windows)

The scanner is not capable of capturing transparent materials like water or glass, since the laser beams do not reflect back until they hit an opaque material, see figure 4.1a. With no texture in the windows, the model would look very incomplete, so planes were placed inside the walls to fill the hollow windows, see figure 4.1b. The planes were merged with the rest of the mesh (using the Combine tool) and the new models were exported as OBJ files.

Figure 4.1: (a) The laser scanner's ability to capture glass. (b) Hollow windows filled with a plane in Maya.

Modo (UV mapping)

Cell number 10 (the eastern part of Yllefabriken) was UV-mapped by view projection and packed into 5 tiles. The other cells were UV-mapped by atlas projection and packed into 4x2 tiles. Material IDs were assigned to each tile and the models were then exported as FBX files. Packing the UVs took about 3.5 hours for the cell with the most polygons and about 30 minutes for the cell with the fewest.

Photomatix (HDR-merging)

All exposure images were imported into Photomatix and an HDR batch merge was performed. The HDR images were exported as 32-bit uncompressed OpenEXR with retained pixel dimensions and without any tone mapping. The merging process took about 1.5 hours to complete 84 HDRs.
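Photomatix was used as a black box for this step, but the same kind of exposure merge can be sketched with OpenCV's Debevec implementation. This is only an illustration of the principle: it operates on 8-bit LDR inputs rather than RAW files, and the file names and shutter times below are assumptions:

# Sketch of exposure merging into a linear 32-bit HDR, using OpenCV's Debevec
# implementation as a stand-in for the Photomatix batch (paths/times are examples).
import cv2
import numpy as np

paths = ["exp_-2.jpg", "exp_0.jpg", "exp_+2.jpg"]          # bracketed exposures
times = np.array([1/250, 1/60, 1/15], dtype=np.float32)    # shutter times in seconds

images = [cv2.imread(p) for p in paths]

calib = cv2.createCalibrateDebevec()
response = calib.process(images, times)                    # recover camera response curve

merge = cv2.createMergeDebevec()
hdr = merge.process(images, times, response)               # linear float32 radiance map

# Store without tone mapping; writing OpenEXR requires an OpenCV build with EXR support.
cv2.imwrite("merged.exr", hdr)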

Nuke (Lens correction)

The HDRs were imported into Nuke, where they were globally white balanced and exposure adjusted. Lens distortion was removed by applying a lens correction node to all images.
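For reference, the effect of such a lens correction can be illustrated with OpenCV's undistortion function; the intrinsics and distortion coefficients below are placeholders, not the profile of the actual lens:

# Illustration of lens-distortion removal (the pipeline used a Nuke node; this
# OpenCV sketch uses placeholder coefficients, not the real lens profile).
import cv2
import numpy as np

img = cv2.imread("facade_hdr_preview.jpg")
h, w = img.shape[:2]

# Pinhole intrinsics and radial/tangential coefficients (k1, k2, p1, p2, k3).
K = np.array([[3200.0, 0, w / 2], [0, 3200.0, h / 2], [0, 0, 1]])
dist = np.array([-0.18, 0.03, 0.0, 0.0, 0.0])   # mild barrel distortion

undistorted = cv2.undistort(img, K, dist)
cv2.imwrite("facade_undistorted.jpg", undistorted)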

Photoshop (perspective warping)

A few non-spherical images of Yllefabriken were perspective warped in Photoshop using the Perspective Warp tool.
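The same kind of rectification can be expressed as a homography that maps four marked facade corners to a rectangle, which is roughly what the interactive tool does. The file name and corner coordinates below are made up for illustration:

# Sketch of perspective rectification similar to Photoshop's Perspective Warp:
# map four marked facade corners to a rectangle via a homography (example corners).
import cv2
import numpy as np

img = cv2.imread("yllefabriken_facade.jpg")

src = np.float32([[410, 260], [1830, 310], [1905, 1460], [360, 1390]])  # marked corners
w, h = 1600, 1200                                                       # output rectangle
dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])

H = cv2.getPerspectiveTransform(src, dst)
warped = cv2.warpPerspective(img, H, (w, h))
cv2.imwrite("yllefabriken_facade_warped.jpg", warped)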

MARI 3 (texture projection and image baking)

After all models had been reduced, cleaned and UV-mapped, they were ready to be textured in Mari. All cells were imported at the same time, which facilitates the painting when an image covers two cells, see figure 4.2a.

All 32-bit HDR images were imported, the paint buffer was set to 8k resolution and 16-bit depth, and the texture-map resolution was changed to 8k for all tiles.

Yllefabriken was the first object to be textured, by aligning and painting the warped images in the ortho view. These pictures did not cover the entire building, so they were supplemented with regular (non-warped) images painted in the perspective view. Pictures with lower or higher brightness had their light settings adjusted to blend better with overlapping images. Much of the area in Strömparken could not be textured since we had a limited set of images taken from only a few angles; these areas were left with the basic gray channel color shown in figure 4.2b. After all the painting work was done, each cell's texture tiles were exported as 16-bit EXR files into individual folders for each cell.

Figure 4.2: (a) The green line shows the seam between two cells. (b) An overview of how much texture I was able to paint. The building in the right corner has been left with the base channel color.

4.1 Unreal (Environment set-up)

The municipality had sent nine tiles of their 3D map of central Norrköping. The files were delivered in Collada format and had to be converted to FBX before they could be imported into Unreal. A new project was created with a first-person character. New projects always have a start level which contains:

• SkyDome
• SkySphere
• Directional light
• A character with a gun
• Fog
• A floor

The floor was deleted and replaced by the nine map models. When a model is imported into Unreal, the user is always asked whether LODs should be created. I chose never to create LODs, since it took too much time for Unreal to go through that process. For texture assignment it is simply a matter of checking "import textures" and "import materials", and the engine will automatically import the textures, create materials from them and assign them to the model's material slots, as long as the files are located in the same folder. Each map tile is stored with SWEREF coordinates, which made the map appear several kilometers away from the origin. For simplicity, the tiles were therefore moved closer to the origin.
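A minimal sketch of such a re-centering, assuming that a common SWEREF reference point is subtracted from each tile's coordinates before placement (the reference values and function name are placeholders):

# Sketch of re-centering map tiles near the Unreal origin by subtracting a
# shared SWEREF reference point (coordinates are placeholders, in metres).
ORIGIN_E, ORIGIN_N = 568000.0, 6495000.0   # chosen reference easting/northing

def local_position(easting, northing, height):
    """SWEREF metres -> local metres relative to the chosen reference point."""
    return easting - ORIGIN_E, northing - ORIGIN_N, height

# Example tile pivot:
print(local_position(568420.5, 6495310.2, 12.0))   # -> (420.5, 310.2, 12.0)

In Unreal the resulting offsets would additionally be scaled to centimeters, the engine's default unit.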
