Faculty of Engineering and Sustainable Development
Feature Extraction From Images of Buildings Using Edge Orientation and Lengths
Viktor Danhall June 2012
Bachelor Thesis, 15 credits, C Computer Science
Computer Science Program
Examiner: Peter Jenke
Supervisor: Stefan Seipel
Feature Extraction from Images of Buildings Using Edge Orientation and Length
by
Viktor Danhall
Faculty of Engineering and Sustainable Development University of Gävle
SE-801 76 Gävle, Sweden
Email:
ndv09vdl@student.hig.se
Abstract
Extracting information from a scene captured in digital images, where the information represents some kind of feature, is an important process in image analysis. Both the speed and the accuracy of this process are very important, since many analysis applications either require analysis of very large data sets or require the data to be extracted in real time. Examples of such applications are two-dimensional and three-dimensional object recognition and motion detection. This work focuses on the extraction of salient features from scenes of buildings, using a joint histogram based on both the edge orientation and the edge length to aid in the extraction of the relevant features. The results are promising, but the method needs further refinement before it can be used successfully; a fair amount of this work therefore consists of reflective theory.
Keywords: Feature extraction, Edge detection, Joint histogram.
Content
1 Introduction
2 Theoretic Background
3 Method
3.1 Outline
3.2 Filtering the image
3.3 Determine the strongest edge
3.4 Calculating the line length
3.5 Selecting Edge Features
3.6 Expected Result
4 Results
5 Discussion
6 Conclusions
References
1 Introduction
Mobile devices (both smartphones and tablet computers) have found their way into modern-day households. Most mobile devices are equipped with a digital camera, an internet connection through either Wi-Fi or 3G/4G, a global positioning system (GPS) receiver, and enough computational power for many applications with heavy computational demands. Many interesting and innovative applications could be created using the available hardware.
Such applications could be novelty augmented reality games that use markerless surface tracking to place computer-created objects "in the real world" using the camera and display of the mobile device. Markerless, in the context of augmented reality, means that no special object is needed to act as a point of reference or point of measure. Dynamic information overlay of a scene is another possible application, mentioned by Specht et al. [1], where the mobile device is used to display information about the real world using the camera and the display, for instance displaying the name and number of a school building as a virtual marker on the mobile display.
Object recognition is another possible application suitable for powerful mobile devices. This could be done by extracting feature points on the mobile device, sending that data to a server that cross-references it against reference data, and returning the most probable objects in the mobile camera view. This could be used for virtual tours that guide tourists cheaply and efficiently around unfamiliar areas, for instance using the mobile device to offer a personalized tour around archaeological sites (Vlahakis et al. [2]).
For the aforementioned applications, there is a pervasive need to extract structural features from the surrounding scene. This can be done by means of image segmentation and pattern recognition.
The purpose of this work is to find a reliable method to extract feature information using computer vision, with the main focus on images of buildings and urban areas. The objective is to implement and test a new feature detection approach that utilizes two visual properties in images, the orientation of an edge feature and its length, in an attempt to create a statistical model.
2 Theoretic Background
Developing reliable methods to extract features from a scene is a challenging task, but is necessary for a wide range of applications. For instance, feature extraction methods could be used for estimating geometric structure and camera motion, e.g. using the RANSAC-based approach described by Chum [3]; they could be used to model objects (e.g. facades, Müller et al. [4]); or they could be used for recognition of different objects (e.g. Lowe [9]).
In computer vision, feature extraction is the process of extracting sets of data points, lines or any other kind of feature from a digital image in order to simplify the computational processes later on. Most applications therefore require fast feature extraction: not only real-time applications, such as augmented reality, where the result is shown while the application is running, but also asynchronous applications, such as building extraction from satellite imagery, where the result is shown after the algorithm has completed.
Typical image segmentation algorithms usually bring out some features of the image, for instance edges (e.g. using the operators in [5][6][7]), or they label regions by clustering algorithms. Fu and Mui [8] mention that "the image segmentation problem is basically one of psychophysical perception and therefore not susceptible to a purely analytic solution". To paraphrase: there is no universal solution to the image segmentation problem; instead, case-dependent solutions are required, i.e. a flower-extraction algorithm requires a different image segmentation approach than a face-recognition algorithm.
A common method for image segmentation is edge detection. Edge detection generally works by estimating rapid changes in intensity in a grayscale image. This is often done by calculating the derivative of the intensity channel using convolution with a discrete kernel: where the derivative is at an extremum (or, for second-derivative methods, where it crosses zero), the pixels are labeled as part of an edge. The problem with filtering digital image data using the existing convolution-based methods, such as the Prewitt operator [5], the Sobel operator [6] and the Roberts operator [7], is that intensity does not always represent a structural feature of the real world; it can, for example, also represent shadows. Image intensity can also be degraded by poor camera conditions, such as too dark scenes, or by motion blur introduced by moving objects or a moving camera.
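For illustration, a minimal Octave sketch of such a convolution-based edge detector using the Sobel operator [6] could look as follows. This is not the method of this work, and the relative threshold of 0.3 is an assumed value.

% Gradient-magnitude edge detection with the Sobel operator
sobel_x=[ -1 0 1; -2 0 2; -1 0 1 ];
sobel_y=sobel_x';
gx=conv2(image, sobel_x, "same");   % horizontal intensity change
gy=conv2(image, sobel_y, "same");   % vertical intensity change
magnitude=sqrt(gx.^2 + gy.^2);
edges=magnitude > 0.3*max(magnitude(:)); % assumed relative threshold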
Using special equipment such as laser scanners instead of ordinary cameras can reduce noise, since they capture more information from the scene. Special equipment can however be both expensive and less accessible than a simple digital camera. Also, ordinary digital images already exist in large quantities in public databases such as Flickr [16], and in private photo collections that could benefit from photo-organizing algorithms dependent on some sort of image segmentation, such as the face recognition available in Google Picasa [17].
Many methods utilize the Scale-Invariant Feature Transform (SIFT) method first developed by Lowe [9] in order to extract salient features. SIFT works by detecting interest points, or keypoints, from a training set that describes each object that should be detectable by the algorithm. SIFT then uses those keypoints to attempt to find objects within the "real" image data. The actual keypoint extraction used by Lowe is the difference-of-Gaussians method, which works by subtracting one blurred version of the original image from a less blurred version, where the blurring is performed by convolution with a Gaussian filter kernel. Both Zhang and Kosecka [10] and Schindler et al. [11] use a SIFT-based technique for their segmentation, and Fritz et al. [12] use an altered version of the SIFT algorithm called informative SIFT (iSIFT for short) that works in a similar way.
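The difference-of-Gaussians step can be sketched in a few lines of Octave. This is a minimal sketch; the helper gauss_kernel and the standard deviations 1.0 and 1.6 are assumptions for illustration.

function g=gauss_kernel(stdev)
  % Normalized Gaussian filter kernel, truncated at 3 stdev
  half=ceil(3*stdev);
  [x,y]=meshgrid(-half:half);
  g=exp(-(x.^2+y.^2)/(2*stdev^2));
  g/=sum(g(:));
end
blur_small=conv2(image, gauss_kernel(1.0), "same");
blur_large=conv2(image, gauss_kernel(1.6), "same");
dog=blur_small-blur_large;  % extrema of dog mark candidate keypoints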
Daugman [15] presents a method to create two-dimensional Gabor filters, which can detect edge features at different orientations and frequencies.
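For comparison with the simplified filters used later (Code Segment 2), a full two-dimensional Gabor kernel in the spirit of Daugman [15] could be sketched as follows; all parameter values are assumed examples.

% Sketch of a 9 by 9 Gabor kernel: a sinusoid at orientation theta
% under a Gaussian envelope
theta=0;      % orientation in radians (assumed)
lambda=4;     % wavelength in pixels (assumed)
stdev=2;      % standard deviation of the Gaussian envelope (assumed)
[x,y]=meshgrid(-4:4);
xr=x*cos(theta)+y*sin(theta);   % coordinate along the wave direction
gabor=cos(2*pi*xr/lambda).*exp(-(x.^2+y.^2)/(2*stdev^2));
gabor-=mean(gabor(:));          % zero mean: no response on flat areas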
3 Method
3.1 Outline
The method proposed in this work is to extract edge features, determine the length and orientation of each of those features, and use that information to attempt to select the best features. This could for example be used to extract feature points by finding where edge structures of different orientations intersect.
In Figure 1, a synthetic image shows the basic principle of the method: the building has many recurrent edge structures composed of windows. All window features have approximately the same edge length and only populate the horizontal and vertical directions. This creates a joint histogram, as shown in Table 1, with a clear overpopulation of the horizontal and vertical orientations. The algorithm also keeps the spatial positions by labeling the edges with their structural length and orientation. Table 1 shows an example of a joint histogram for Figure 1, where a clear pattern can be seen: the horizontal and vertical lines show a clear majority of short lines, representing the repeated window structures in the image.
Figure 1: Synthetic edge image example. All edges (Upper left). Horizontal lines (Upper center). Vertical lines (Upper right). Other angles (Bottom left, center and right).
            Horizontal  Vertical  Angle 1  Angle 2  Angle 3
Very short           0         0        8        9        4
Short               29        28        0        0        0
Medium               0         2        0        0        0
Long                 0         2        0        0        0
Very long            2         2        0        0        0

Table 1: Joint histogram based on Figure 1.
The algorithm can be described in four major steps:
1. Determine the discrete set of angles to use in the algorithm, and for each of those angles determine the strength of the edges at that angle. Each of the discrete angles used in this work is given an index number as shown in Table 3 (i.e. angle 0.0 is referred to as index 1 and angle 0.25 as index 5).
2. For each pixel, determine the dominant orientation and label the pixel with the index number of that angle, creating an "angle map" that contains the angle of the strongest edge value for each pixel in the image (i.e. if the strongest edge angle at a given pixel is that of index 5, then the value of the angle map at that pixel is 5).
3. Calculate the length of each feature in its given orientation. This creates the joint histogram shown in Table 1 along with a length image (example in Table 2) for each angle. The length image contains the detected edge feature lengths for each feature of an edge direction; its purpose is to keep the spatial positions of each edge feature, which simplifies the masking of the final edge features (i.e. to select the edge features of length 10 and direction index 5, we simply select all pixels with the value 10 in the length image of angle 5).
4. Lastly, select the edge features with a dominant orientation and a frequent edge length in the histogram.
Code Segment 1 shows a brief outline of the code used in the method, where each function call is explained in depth in its appropriate subchapter.
The code has been implemented using GNU Octave, a Matlab clone, and should work in Matlab with some minor adjustments. Each full code segment is a separate function and should be placed in a separate file; however, Code Segments 5 to 7 are one function split into three parts to fit into this text (and belong in one file). The images used to test the algorithm are from publicly available image collections [18][19][20] together with some private images (for the non-building images).
0 0 0 0 0 0 0
3 0 0 4 0 0 0
0 3 0 0 4 0 0
0 0 3 0 0 4 0
0 0 0 0 0 0 4
0 0 0 0 0 0 0
0 0 0 0 0 0 0
Table 2: Length image example
Index   1      2       3      4       5     6       7      8
Angle  0.0  0.0625  0.125  0.1875  0.25  0.3125  0.375  0.4375

Index   9     10      11     12      13    14      15     16
Angle  0.5  0.5625  0.625  0.6875  0.75  0.8125  0.875  0.9375

Table 3: Used angles (in π radians).
function features=main( image, angle_list, max_line_length )
%% out:
%% features = the detected features
%% in:
%% image = gray-scale image
%% angle_list = the list of angles to use
%% max_line_length = the maximum length of a line
%%   edge. Any lines longer will be marked as
%%   max_line_length

% Variables
[iheight iwidth]=size(image);
filter_count=length(angle_list);
filter_size=9;
filter_waves=2.0;
filter_wave_falloff=1.0;
filter_stdev=0.5;
threshold=0.01;

% 3.2 Filtering the image
filter_list={};
raw_edges={};
for( ind=1:filter_count )
  angle=angle_list{ind};
  filter=angular_filter( angle, filter_size, filter_waves,
                         filter_wave_falloff, filter_stdev );
  filter_list=cat(2, filter_list, filter);
end
for( ind=1:filter_count )
  filter=filter_list{ind};
  edges=conv2(image, filter, "same");
  raw_edges=cat(2, raw_edges, edges);
end

% 3.3 Determine the strongest edge
angle_index=index_angles(raw_edges, iwidth, iheight, threshold);

% 3.4 Calculate the line length
angle_length_histogram=[];
angle_length_image={};
for( ind=1:filter_count )
  angle=angle_list{ind};
  edges=(angle_index == ind);
  seeds=create_seedpoint(angle, iwidth, iheight);
  [length_histogram length_image]=ray_cast(
      angle, edges, seeds, max_line_length );
  angle_length_histogram=cat(1,
      angle_length_histogram, length_histogram);
  angle_length_image=cat(2, angle_length_image, length_image);
end

% 3.5 Selecting the edge features to use
features=select_features(
    angle_length_histogram, angle_length_image, angle_list, 5, 6);

Code Segment 1: Code outline.
3.2 Filtering the image
The first step of the algorithm is to filter the image for each given direction. This could be done by any edge detector, as long as it can detect edges at specific angles. For the sake of this work, I have developed a function, shown in Code Segment 2, that dynamically creates a filter for a given angle. The filters used are a simplified version of the two-dimensional Gabor filters [15]: the simplified version detects only the orientation of image features by discarding the trailing kernel waves. By discarding the trailing waves, we get a higher response for edges that are not periodically recurrent.
The continuous function for a filter with a centered kernel spike (Figure 2, left) is shown in Formula 2, and the function for off-centered spikes (Figure 2, right) is shown in Formula 3. Both formulas use Formula 1, which calculates the distance from the center line of the kernel.
Figure 2: Directed filter kernels with Gaussian distribution. Using the cosine function (Left). Using the sine function (Right). Variables used: 30 by 30 samples, angle of 0, 2 waves, falloff of 0.5 and a standard deviation of 0.5.
dist = (sin(α)·x + cos(α)·y) · wfactor

Formula 1: Distance from the kernel center line.

kernel = { cos(wfalloff) · e^(−(x² + y²)/(2·stdev²))   if |dist| ≥ wfalloff
         { cos(dist) · e^(−(x² + y²)/(2·stdev²))       if |dist| < wfalloff

Formula 2: Kernel with a centered spike.

kernel = { sin(wfalloff) · e^(−(x² + y²)/(2·stdev²))   if |dist| ≥ wfalloff
         { sin(dist) · e^(−(x² + y²)/(2·stdev²))       if |dist| < wfalloff

Formula 3: Kernel with off-centered spikes.
Code Segment 2 shows the actual function implementation used in GNU Octave. The function requires the following five arguments: (1) the angle of the edges that the filter should be sensitive to, (2) the filter width (e.g. 9 produces a 9 by 9 filter), (3) the wave_factor, the number of waves that should be fitted onto the filter, (4) the wave_falloff, which represents at what wave the function truncates the filter, and (5) the standard deviation (stdev) of the Gaussian kernel multiplied onto the filter. The function also accepts a sixth argument: the trigonometric function used to produce the waves; the cosine function produces a filter with a centered wave, while the sine function produces two equally intense off-centered spikes.
function filter=angular_filter(
    angle, width, wave_factor, wave_falloff, stdev, trig_func )
%% out:
%% filter = the angular filter
%% in:
%% angle = the used angle in radians
%% width = filter side width (and height)
%% wave_factor = the number of waves
%% wave_falloff = the number of waves until truncation
%% stdev = standard deviation of the Gaussian kernel
%% trig_func = trigonometric function to use for the
%%   wave, e.g. @cos / @sin

% If no trigonometric function is provided then use cosine
if( !exist("trig_func", "var") )
  trig_func=@cos;
end

filter=zeros(width, width);
wave_falloff*=0.5;
wave_falloff*=pi;
coords=-1:(2/width):1; % Pre-calculate coordinates

for( i=1:width )
  for( j=1:width )
    tmp_filter=sin(angle)*coords(i)+cos(angle)*coords(j);
    tmp_filter*=wave_factor;
    if( wave_falloff > 0 )
      mask=abs(tmp_filter)>wave_falloff;
      tmp_filter(mask)=wave_falloff;
    end
    tmp_filter=trig_func(tmp_filter);
    gaussian=e^(-((coords(i)^2+coords(j)^2) / (2*stdev^2)));
    tmp_filter*=gaussian;
    filter(i,j)=tmp_filter;
  end
end

% Correction: fit the filter to sum to approximately 0
threshold=0.01;
s=sum(filter(:));
while( abs(s) > threshold )
  filter-=s/(width*width);
  s=sum(filter(:));
end

Code Segment 2: Edge filter function.
Figure 3 shows an example filter created by the function in Code Segment 2, using the following parameters: an angle of 0, a width of 9 (9 by 9), a wave_factor of 2.3, a wave_falloff of 1.5, a standard deviation (stdev) of 1.5, and cosine as the trigonometric function (trig_func). All filters used are presented in the results chapter (Table 4).
Figure 3: Discrete filter example that is sensitive to horizontal lines.
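As a usage sketch, the filter in Figure 3 can be reproduced and inspected with a few lines of Octave:

% Create and display the example filter from Figure 3
f=angular_filter(0, 9, 2.3, 1.5, 1.5, @cos);
imagesc(f);
colormap(gray);
axis("off");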
3.3 Determine the strongest edge
This step determines the strongest edge intensity among all directions at each pixel, in order to remove overlapping edge features at different directions. To mark the pixels with the dominant angle, we use an index number (as explained in the method outline in chapter 3.1). Code Segment 3 provides the code for this step.
function angle_index=index_angles(raw_edges, width, height, threshold)
%% out:
%% angle_index = a two dimensional array (image)
%%   where each element represents an indexed angle
%% in:
%% raw_edges = the raw edge data. Could be viewed
%%   as an image with one channel for each angle
%% width = image/raw_edges width
%% height = image/raw_edges height
%% threshold = index threshold. All edge values
%%   below it will be ignored

angle_count=length(raw_edges);
angle_index=zeros(height, width);
% Current max weight
current_weight=ones(height, width)*threshold;
for( ind=1:angle_count )
  edge_map=raw_edges{ind};
  mask=(current_weight<edge_map);
  current_weight(mask)=edge_map(mask);
  angle_index(mask)=ind;
end
Code Segment 3: Classifying angles to index.
3.4 Calculating the line length
The edge length is calculated by ray casting in each of the directions. The ray casting is performed once per orientation for the creation of the angle and length joint histogram, and once per orientation for the creation of the length images. Each code segment in this subchapter works on one direction channel at a time.
The first step is to determine the seed points for the ray casting, as the code in Code Segment 4 shows. The seed points can be described as the points where each ray should start sampling. Figure 4 shows that there are four different cases to take into consideration when determining the seed points: where the ray travels horizontally (angle = 0.0π radians), where the ray travels vertically (angle = 0.5π radians), and where the ray travels at a diagonal angle (either 0.0π < angle < 0.5π or 0.5π < angle < 1.0π radians). For the horizontal or vertical ray cases, the seed points need to cover the vertical edge or the horizontal edge of the image respectively (the left and middle right parts of Figure 4), and for the diagonal rays, the seed points need to cover both a vertical edge and a horizontal edge (the middle left and right parts of Figure 4).
Notice that the directions shown in Figure 4 only cover a half circle, since the ray casting is symmetrical: calculating the line length can be done both from left to right and from right to left without changing the resulting length of the edge feature.
Angle = 0π (Left). 0π < Angle < 0.5π (Middle left). Angle = 0.5π (Middle right). 0.5π < Angle < 1.0π (Right).
Figure 4: Seed points. The seed points are represented by the black borders of the images. (Angles in radians.)
function seeds=create_seedpoint(angle, w, h)
%% out:
%% seeds = seed points to start ray casting from
%% in:
%% angle = angle in radians to ray cast
%% w = image width
%% h = image height

% Clamp the angle to a half circle
angle=mod(angle, pi);

% Determine the initial seed points
if( angle == 0 * pi ) % Horizontal
  seeds=[ 1:w; ones(1,w) ];
elseif( angle < 0.5 * pi ) % Diagonal
  seeds=cat(2,
    [ 2:w; ones(1,w-1) ],
    [ ones(1,h); 1:h ]);
elseif( angle == 0.5 * pi ) % Vertical
  seeds=[ ones(1,h); 1:h ];
else % Diagonal
  seeds=cat(2,
    [ 2:w; ones(1,w-1)*h ],
    [ ones(1,h); 1:h ]);
end

Code Segment 4: Creating seed points.
Figure 5: Increment correction. Normalized vector (Left). Individually normalized increments (Right).
Code Segment 5 shows the preparation for the actual ray casting. In order to remove smaller discontinuities in the edge features, we apply a morphological closing to the edge map with a directed structuring element shaped as a line with the same direction as the edge features. The sine and cosine of the angle (clamped to a half circle) are also precomputed, to be used as the increments in the x and y dimensions during the actual ray casting. Since this increment is normalized, i.e. the increment vector has a length of one (illustrated in Figure 5), some pixels would be visited more than once. This is solved by denormalizing the increments as shown in Formula 4, so that the ray always advances by one full pixel in one of the dimensions.
x = x / max(|x|, |y|)
y = y / max(|x|, |y|)

Formula 4: Denormalization of the increments.
function [length_histogram length_image]=ray_cast(
    angle, edges, seeds, max_line_length )
%% This code works on one edge map at a time, i.e. only
%% one angle at a time
%% out:
%% length_histogram = histogram of the lengths of the
%%   edges (for this angle)
%% length_image = an image where the intensity value
%%   corresponds to the length of the edge feature
%%   each pixel belongs to
%% in:
%% angle = angle in radians
%% edges = edge map
%% seeds = seed points
%% max_line_length = the maximum edge length.
%%   This will be used to clamp the length.

% Close small gaps to connect lines.
% The function lineSE creates a structuring element
% consisting of only a line
edges=imclose(edges, lineSE(9, angle) );
[w,h]=size(edges);

% Clamp the angle to a half circle
angle=mod(angle, pi);

% Calculate the increment in both the x and y
% direction
y_inc=-cos(angle);
x_inc=sin(angle);

% Prevent the ray from visiting each pixel more than
% once. The largest absolute component is used so that
% negative increments are handled correctly
m=max(abs(y_inc), abs(x_inc));
y_inc/=m;
x_inc/=m;

% One histogram bin per (clamped) length
length_histogram=zeros(1, max_line_length);
length_image=zeros(size(edges));
...

Code Segment 5: Ray casting part 1 (preparation).
The next step is the actual ray casting that calculates the edge lengths, shown in Code Segment 6. Each ray starts at a seed point, increments its position with the precomputed increment vector (from Code Segment 5), and counts every edge pixel sampled until a non-edge pixel is reached (where the count is reset). For each sampled pixel, the count is stored in the length image, and every time the count is reset (by reaching a non-edge pixel) it is first stored in the length histogram. This causes the length image to store the incremental line length for each pixel, as shown in the first two steps of Figure 6.
...
for( i=1:length(seeds) )
  % Initialize the ray
  x=double(seeds(1,i));
  y=double(seeds(2,i));
  cumulative_length=0;
  % While the ray is within the image
  while( (1<=x) && (1<=y) && (x<w+1) && (y<h+1) )
    % Discretize the ray coordinate
    fx=floor(x);
    fy=floor(y);
    % Move the ray forward
    x+=x_inc;
    y+=y_inc;
    % If the ray finds an edge, then increment the
    % count of edge pixels found since the last
    % non-edge pixel
    if( edges(fx, fy) == 1 )
      cumulative_length+=1;
    else
      if( cumulative_length > max_line_length )
        cumulative_length=max_line_length;
      end
      if( cumulative_length > 0 )
        length_histogram(cumulative_length)+=1;
      end
      cumulative_length=0;
    end
    length_image(fx,fy)=cumulative_length;
  end
...

Code Segment 6: Ray casting part 2 (forward ray casting).
1 0 0 0 0     1 0 0 0 0     1 0 0 0 0     4 0 0 0 0
0 1 0 0 0     0 2 0 0 0     0 2 0 0 0     0 4 0 0 0
0 0 1 0 0  →  0 0 3 0 0  →  0 0 4 0 0  →  0 0 4 0 0
0 0 0 1 0     0 0 0 4 0     0 0 0 4 0     0 0 0 4 0
0 0 0 0 0     0 0 0 0 0     0 0 0 0 0     0 0 0 0 0

Figure 6: Ray casting diagonally. Edge map (Left). Length image after the first ray cast (Middle left). Length image in the middle of the second ray cast (Middle right). Length image complete (Right).
We now have the joint histogram completed, but in order to complete the length image we need to ray cast once more, in the opposite direction, as shown in the last two steps of Figure 6. This time all we need to do is substitute each incremental edge feature length in the length image with the total length of its feature; since we now cast in the other direction, that total length is the first non-zero length encountered after a zero length. The code for this can be seen in Code Segment 7.
...
  % Reverse ray cast: step back inside the image
  x-=x_inc;
  y-=y_inc;
  % Total length of the current feature
  total_length=0;
  while( (1<=x) && (1<=y) && (x<w+1) && (y<h+1) )
    % Discretize the ray coordinate
    fx=floor(x);
    fy=floor(y);
    % Move the ray back
    x-=x_inc;
    y-=y_inc;
    if( length_image(fx,fy) > 0 )
      if( total_length == 0 )
        total_length=length_image(fx,fy);
      end
    else
      total_length=0;
    end
    length_image(fx,fy)=total_length;
  end
end % (ends the for loop from Code Segment 6)

Code Segment 7: Ray casting part 3 (backwards ray casting).
3.5 Selecting Edge Features
Here we choose the actual features that the method should return, as shown in Code Segment 8. First we identify the most prominent angles in the image, which we do by summing the histogram over the length bins for each direction and choosing the "winning" directions based on the largest counts. In other words: we choose the directions that are present in the most edge features of the image. This direction selection should however be performed after thresholding the short, noisier edge lines out of the histogram. Using the edge length for more than this lower-length thresholding proved to be tricky, and is addressed in the results chapter.
The final step is to extract the final features, which is done by simply selecting from the length images all pixels belonging to the chosen directions.
function features=select_features( histogram, images,
    angle_list, lower_threshold, angle_count )
%% out:
%% features = the final features of the algorithm
%% in:
%% histogram = two dimensional array where the
%%   dimensions represent classified direction
%%   and length
%% images = a cell list of edge images with lengths
%% angle_list = list of the used angles
%% lower_threshold = all lines shorter than the
%%   lower_threshold will be ignored
%% angle_count = how many directions will be used
%%   for the final result. Must be smaller than the
%%   total amount of directions classified.

% Find the dominant angles
histogram=histogram(:,lower_threshold:end);
channel_sum=sum(histogram');
top_channel_sum=sort(channel_sum);
top_channel_sum=top_channel_sum(end-angle_count+1:end);
index_list=[];
for( ind=1:length(top_channel_sum) )
  % Store the direction indices whose sums are among
  % the top angle_count values
  index_list=cat(2, index_list,
                 find(channel_sum==top_channel_sum(ind)));
end
index_list=unique(index_list);

features=zeros(size(images{1}));
for( i=index_list )
  length_histogram=histogram(i,:);
  image=images{i};
  angle=angle_list{i};
  selected=zeros(size(image));
  % Extract the spatial position of each edge pixel
  % within the joint histogram selection: column len
  % of the truncated histogram corresponds to the
  % length len+lower_threshold-1
  for( len=find(length_histogram>0) )
    selected(image==(len+lower_threshold-1))=1;
  end
  selected=imdilate(logical(selected),
                    logical(lineSE(9, angle)) );
  features+=selected;
end

% Select feature points where two or more edges
% are overlapping
features=(features>1);

Code Segment 8: Final feature extraction.
3.6 Expected Result
The primary expectation of this method is that it produces statistics of the edge features' lengths and orientations that can be used to further label structures in the image. We can assume that most man-made objects have repeated structures sharing the same or similar angles, leading to a systematic overrepresentation in the histogram. Hence we can select the structures that share the same or similar angles and have approximately the same length, and label them as features of the same visual structure.
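As a minimal sketch of this selection idea, assuming the joint histogram angle_length_histogram produced in Code Segment 1 (one row per orientation, one column per length):

% Find the single most overrepresented (orientation, length) bin
H=angle_length_histogram;
[max_per_angle, best_len]=max(H, [], 2);   % best length per angle
[best_count, best_angle]=max(max_per_angle);
printf("dominant orientation index %d, length %d (count %d)\n",
       best_angle, best_len(best_angle), best_count);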
4 Results
The image data used to produce the results consists, as previously mentioned, of images from publicly available image collections [18][19][20] together with some private images (for the non-building images). The angles are 16 evenly distributed angles (as presented in Table 3 and visualized in Figure 7), and the filter size used for the edge detection was 9 by 9 pixels. The results shown subsequently have been obtained using the image set in Figure 8 (unless otherwise stated), where the top row consists of images of buildings and the bottom row consists of images of non-building scenes as a reference.
Figure 7: The 16 angles used.
Figure 8: Images used for the results (unless otherwise stated). Example image 1 (Top left). Example image 2 (Top right). Example image 3 (Bottom right). Example image 4 (Bottom left).
The filter function shown in Code Segment 2 proved to require the final correction step, since the sum of the filter cells was not 0, which could otherwise cause the filter convolution to produce minor unexpected results.
0.0 π     0.0625 π  0.125 π   0.1875 π
0.25 π    0.3125 π  0.375 π   0.4375 π
0.5 π     0.5625 π  0.625 π   0.6875 π
0.75 π    0.8125 π  0.875 π   0.9375 π

Table 4: The actual filter kernels used (one per angle; angles in radians).
Table 4 shows the final 16 discrete filters used for the convolution. All angles in the table are written in radians; together they cover the half circle of distinct edge orientations. Figure 9 shows a result of the dominant edge classification from step 2.
Figure 9: Detected edges (ignoring the edge directions).
In Figure 10 the joint histograms are shown, represented by intensity maps. The horizontal dimension of each intensity map represents the directional samples, where the left side is at an angle of 0 radians and the right side is at an angle of π radians; the vertical dimension represents the edge feature length, where the top of the intensity map represents 1 pixel long features and the bottom represents 50 pixels. Figure 10 shows a clear pattern in the joint histograms of the example 1 and example 2 images (the ones with building scenes): high counts of lines can be seen at certain edge feature directions, together with a periodic pattern along the edge length dimension. Neither the example 3 nor the example 4 image (the non-building example images) shows any significant pattern with respect to length; however, they give a small response with respect to edge feature direction. A pattern visible in both the building scenes and the non-building scenes is a spike of features at the edge feature length of 9.
Figure 10: Joint histograms truncated at an edge length of 50 pixels. Example image 1 (Left). Example image 2 (Middle left). Example image 3 (Middle right). Example image 4 (Right).
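An intensity map like the ones in Figure 10 can be rendered directly from the joint histogram; a hypothetical sketch, again assuming angle_length_histogram from Code Segment 1:

% Rows of the displayed map are edge lengths (1 at the top,
% 50 at the bottom), columns are the 16 orientations
imagesc(angle_length_histogram(:,1:50)');
colormap(gray);
xlabel("orientation index");
ylabel("edge feature length");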
Images of low resolution proved to pose a problem (as shown in Figure 11), since their edge features have a high probability of being classified as noise by the static thresholding of line length. Such images are however a problem for most methods (if not all), since they contain too little data.
Figure 11: Low resolution problem. Original image (Left). Detected edges (Right).
Another problem, visible in Figure 12, occurs with buildings with too much decoration on the facade, which creates excessive amounts of edges. Those edges interfere with the window edges, which are the salient visual features we intend to identify. This is however a problem with the edge detection itself and somewhat outside the scope of this work.
Figure 12: Facade clutter problem. Original (Left). Detected Edges (Right).
Another problem with this technique is due to shifts in perspective. Figure 13 shows that parallel edges of the same building get different orientations due to the perspective shift, which causes a wider spread of angles in the histogram. This effect is however mostly present when the surface projected onto the image is at an angle away from the camera (such as in Figure 13), and should therefore not cause too many problems.
Figure 13: Perspective offset problem (example image 2). All detected edges (Top). Edges with orientation 0.875π (Bottom left). Edges with orientation 0.9375π (Bottom right).
5 Discussion
The method successfully extracts edges and stores them in the joint histogram, and the histograms shown in Figure 10 display clear patterns. Figure 14 shows how the segmentation of the joint histogram could look: first the short edge features are removed from the histogram (Middle left); the inferior angles, i.e. the angles less represented in the image, are then removed (Middle right); the last step is to select only the edge features with a prominent length, i.e. peaks in the histogram (Right).
Figure 14: Example image 2 joint histogram with masking. Original joint histogram (Left). Shorter features masked (Middle left). Minor angles masked (Middle right). Periodic mask (Right).
The algorithm is however very dependent on the edge detection in step 1, since it cannot classify any edge features that were not found by the edge detector. With that said, the algorithm may require relatively heavy interaction from the user to get good results in the edge detection step.
As for the joint histogram selection in step 4, the angle selection might not be optimal. The solution used for now simply counts the edge features for each direction while discarding the length information, and selects the most populated orientations of the histogram. This solution requires knowing in advance how many channels to use for the final feature selection.
The discrete nature of the image grid also causes some problems when ray casting. This problem arises because we are trying to fit a continuous ray onto the discrete grid that is the image; for this purpose, the ray is discretized by sampling it at discrete intervals. When an edge is offset by a fraction of the discrete steps, the discretized line shifts at some points (pixels) but not all. This can cause edge features in the image to be detected as dashed when the edge offset differs from the discretized line.
The rightmost part of Figure 15 shows such a case, where the black pixels represent edge feature pixels that the ray recognizes, the dark gray pixels represent where the ray travels without recognizing any edge feature pixels, and the light gray pixels represent edge feature pixels that the ray does not traverse. This is nevertheless somewhat mitigated by the morphological operation before the ray casting step, which makes the edge features cover more pixels and thereby bypasses the offset problem to some degree.
Figure 15: Line offset problem. Single digitized line (Left). Two overlapping lines (Right).
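A small standalone sketch (for illustration only, not part of the thesis code) makes the effect visible by printing which pixels a ray at an in-between angle actually visits:

% Trace a ray at an angle between two of the 16 discrete
% directions and print the discretized pixels it visits
angle=0.1*pi;                  % between index 2 and index 3
y_inc=-cos(angle);
x_inc=sin(angle);
m=max(abs(y_inc), abs(x_inc)); % denormalize as in Formula 4
y_inc/=m;
x_inc/=m;
x=1; y=10;
for( step=1:8 )
  printf("pixel (%d, %d)\n", floor(x), floor(y));
  x+=x_inc;
  y+=y_inc;
end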
Problems also arise when a detected line has a direction between two detection angles, or when the edge features have a somewhat wave-like structure. This causes the detected feature orientation to be classified as both directions, like a dashed line. The line length is also affected by this dashed-line effect, possibly causing one long line to be classified as noise.
A possible way to continue this work is to improve the selection criterion for the edge segment length in the histogram. The criterion currently employed is to simply threshold the shorter lines out of the selection; an improved criterion could, for example, exploit the periodic pattern visible in the joint histogram.
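One simple refinement along these lines would be to keep only the local maxima of each orientation row instead of everything above the threshold; a hedged sketch, assuming one row of the joint histogram at a time for some orientation index ind:

% Keep only length bins that are local peaks in one
% orientation row of the joint histogram
row=angle_length_histogram(ind,:);
peaks=false(size(row));
for( len=2:(length(row)-1) )
  peaks(len)=(row(len)>row(len-1)) && (row(len)>=row(len+1)) ...
             && (row(len)>0);
end
selected_lengths=find(peaks);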
Since the proposed algorithm works on discretized lines, there are cases where the detected edge direction is at the border between two different directions. This may cause the line to be detected as two different entwined dashed lines, as Figure 16 shows.
Figure 16: The dashed edge problem in example image 1. Light and dark gray areas consist of different angles.
One could also, between determining the dominant angle in step 2 and the ray casting in step 3, replace those dashed edge features with a hybrid angle in between the two directions; this would double the number of covered directions. It could be done by extending the ray casting with hybrid angles: given that we detect only two directions in step 2, we would first ray cast in the two primary directions, but also in the hybrid direction using both edge maps, as sketched below. Using an approach like this, the "dashed line" problem can be avoided, and only half of the angles need to be filtered in step 2, since the covered angles are doubled from the detected raw edges. A possible drawback is that the hybrid edges may become stronger than the equivalent normal edges, since two edge maps are used instead of one.
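A hedged sketch of this hybrid-angle idea, reusing the functions from the method chapter (this is not implemented in the thesis; ind and ind+1 are assumed to be two neighbouring directions, and the other variables are those of Code Segment 1):

% Combine the edge maps of two neighbouring directions and
% ray cast at the angle in between them
edges_a=(angle_index == ind);
edges_b=(angle_index == ind+1);
hybrid_edges=edges_a | edges_b;          % union of both edge maps
hybrid_angle=(angle_list{ind}+angle_list{ind+1})/2;
seeds=create_seedpoint(hybrid_angle, iwidth, iheight);
[hybrid_histogram hybrid_image]=ray_cast(
    hybrid_angle, hybrid_edges, seeds, max_line_length );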
Something that may give promising results is to alter step 2 so that, instead of determining the dominant directions, it just thresholds the raw data. This would make it possible to label pixels as belonging to more than one edge direction, and may thereby give better results where edge features cross each other. This improvement would be most prominent for edge features crossing at similar angles, as seen in Figure 17 (keep in mind that the edges can be wider than the illustration shows, so the effect would be greater). Perpendicularly crossing edge features only give a small gap in one of the edges, which should be covered by the morphological operation in the ray casting at step 3. For this change to fit in, some of the code at step 3 would have to be adjusted.
Figure 17: Intersection size difference at different angles.
A pattern I found when viewing the dominant angles in the angle_index variable was that noisy regions, such as trees or shrubbery, had not one dominant angle but a clutter of different angles, as highlighted in Figure 18. This could be handled by thresholding the lines out by their length (since the clutter consists of short lines), but that could still leave some noise, depending on factors such as the threshold used or whether many lines happen to line up. Using too large a threshold could also remove feature lines, giving a bad response. Due to this, I conceived another approach that could be used: to create a clutter map containing the density of the clutter.
Table 5: Clutter map idea. Example image 1 (Left column). Example image 2 (Right column). Edges (Top). Clutter map (Second from the top). Clutter map with morphological closing (Bottom).
The idea is an extension of the basic feature point extraction idea of this work: apply a morphological dilation with a standard round structuring element to each directional edge map (from angle_index, not from the raw edges) and sum them all together. This produces a clutter map of how many edges from the various angles are present, at a pixel-to-pixel level.
Table 5 (Middle row) shows the clutter map output together with a crudely refined version (Bottom row), where a morphological closing using a large structuring element has been applied to the clutter map. The refined clutter map clearly shows a very high response in the noisy areas and little to no response at the points of interest, so it may prove useful as a mask for removing noise.
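A sketch of the clutter map computation (not part of the thesis code; the structuring element radii and the threshold of 3 overlapping directions are assumed values, strel from the Octave image package provides the round element, and the other variables are those of Code Segment 1):

% Sum dilated per-direction edge maps into a clutter map
clutter_map=zeros(size(angle_index));
round_se=strel("disk", 5, 0);            % round structuring element
for( ind=1:filter_count )
  direction_edges=(angle_index == ind);
  clutter_map+=double(imdilate(direction_edges, round_se));
end
% Crude refinement as in Table 5 (Bottom row): closing with a
% large structuring element
refined=imclose(clutter_map > 3, strel("disk", 15, 0));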
Figure 18: Close-up of the clutter.
Using the vanishing points of the line segments as a histogram criterion, rather than just the absolute angles of the edge features, could give better results, since that takes the perspective shift into account, which using only the absolute angle does not. With this approach, one would have to merge the edge detection in step 1 and the angle classification in step 2 to detect edges in all directions at once (using a filter with properties similar to the filter shown in Figure 19). Step 3 would then have to detect the vanishing point and ray cast out from that vanishing locus in the different directions to determine the edge lengths; the different directions would however not be represented in the histogram.
         1 2 1         0.0625 0.125 0.0625
(1/16) · 2 4 2    =    0.125  0.25  0.125
         1 2 1         0.0625 0.125 0.0625

Figure 19: Discrete Laplace operator.
This could give a stronger response with respect to the angles, and possibly the line lengths as well, since the cast rays would be better aligned with the edge features. A Hough transform, either the original approach by Hough [13] or the generalized approach by Duda and Hart [14] (used for circle detection, but possibly adaptable), could be used in the merged steps 1 and 2 to determine the vanishing locus.
This could however give uncertain results when ray casting from the locus, since the rays would only have an optimal response within a small interval of distances from the locus: the rays are very sparse far away from the locus and very close together near it, to the point that several rays visit the same pixels (i.e. since the neighborhood of a pixel is 8 pixels and the number of angles used in my code is 16, each pixel close to the locus would be traversed by at least two rays). Figure 7 illustrates this problem.
Using continuous angles instead of discrete ones could also be a different but promising way to continue this work. This has two impacts on the code I provided: step 1 and step 2 would have to be merged, and step 3 would not have any fixed angles to cast along, so an alternative solution for the histogram creation would be needed, such as "manually" tracing (e.g. by flood filling) all edge features and calculating the angle between pixels.
The last way to continue this work that I can see would be to apply the method in a different color space (other than gray scale) and compare the responses. Interesting color spaces for this would be the cylindrical-coordinate color spaces based on hue (HSV, HSI, HSL or HSB), where the algorithm would be applied to the hue channel. The hue channel may however need some kind of stabilization from the saturation channel, since hue is very unpredictable at low saturation. If this were used, the edge extraction in step 1 could give better edge results.
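A hedged sketch of that idea (facade.png is a hypothetical input image, and the saturation cut-off of 0.2 is an assumed value):

% Run the method on the hue channel, masking out low-saturation
% pixels where the hue is unstable
rgb=double(imread("facade.png"))/255;    % hypothetical input image
hsv=rgb2hsv(rgb);
hue=hsv(:,:,1);
sat=hsv(:,:,2);
hue(sat < 0.2)=0;    % stabilize unreliable hue using the saturation
features=main(hue, angle_list, max_line_length);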
6 Conclusions
I find that the algorithm performs relatively well; however, it still needs some work to become fully usable and to run efficiently. Using discrete angles posed a problem, since classifying an edge to an angle is difficult, especially for shorter edge features.
The proposed method seems promising and may give good results if given the appropriate attention. The Discussion mentions several ideas on how to develop the method further.