
ISPRS Journal of Photogrammetry and Remote Sensing 144 (2018) 453–468
https://doi.org/10.1016/j.isprsjprs.2018.08.009
Extraction of residential building instances in suburban areas from mobile
LiDAR data
Shaobo Xia b, Ruisheng Wang a,b,⁎

a School of Geographical Sciences, Guangzhou University, Guangzhou 510006, China
b Department of Geomatics Engineering, University of Calgary, T2N 1N4, Canada
⁎ Corresponding author at: Department of Geomatics Engineering, University of Calgary, T2N 1N4, Canada. E-mail address: [email protected] (R. Wang).

Keywords: Mobile LiDAR; Individual buildings; Hypotheses and selection; Point cloud segmentation; Shape prior

Abstract
In recent years, mobile LiDAR data has become an important data source for building mapping. However, it is challenging to extract building instances in residential areas, where buildings of different structures are closely distributed and surrounded by cluttered objects such as vegetation. In this paper, we present a new "localization then segmentation" framework to tackle these problems. First, a hypothesis-and-selection method is proposed to localize buildings: rectangle proposals indicating building locations are generated from the projections of vertical walls obtained by region growing, and the selection of rectangles is formulated as a constrained maximization problem solved by linear programming. Then, the point clouds are divided into groups, each of which contains one building instance. A foreground-background segmentation method is then proposed to extract the buildings from their complex surroundings in each group: based on a graph of points, an objective function integrating local geometric features and shape priors is minimized by graph cuts. Experiments are conducted in two large and complex scenes, residential areas in Calgary and Kentucky. The completeness and correctness of building localization are 87.2% and 91.34% in the former dataset, and 100% and 96.3% in the latter. Based on the tests, our binary segmentation method outperforms existing methods in terms of the F1 measure. These results demonstrate the feasibility and effectiveness of our framework for extracting instance-level residential buildings from mobile LiDAR point clouds in suburban areas.
1. Introduction
Building extraction from various remote sensing data is of great
importance for many applications, such as cadastral surveying
(Rutzinger et al., 2011), 3D reconstruction (Chen et al., 2017), change
detection (Qin et al., 2015), urban analysis (Yu et al., 2010), energy
management (Cheng et al., 2018), and visualization (Deng et al., 2016).
Automatic or semi-automatic building extraction algorithms from aerial
(Cote and Saeedi, 2013) or satellite images (Ok et al., 2013) have been
widely studied in the past. These methods are useful in large-scale
mapping, but image resolutions and object occlusions limit the accuracy. Light Detection and Ranging (LiDAR) is an efficient and accurate
tool to acquire point clouds of object surfaces. In the work of Lafarge
et al. (2008), airborne LiDAR is used for building extraction. However,
similar to the results from optical images, buildings extracted from
airborne LiDAR are rooftops, and façade information is missing.
Mobile LiDAR, which refers to vehicle-mounted laser scanning systems, can acquire accurate and precise point clouds and has been widely used in mapping the environment along roads (Guan et al., 2016). Mobile LiDAR is good at collecting detailed façade information, which is an important supplement to aerial data. In recent years, building extraction from mobile LiDAR point clouds has been studied in several works (Fan et al., 2014; Wang et al., 2016). However, there
still exist problems in extracting buildings in residential areas. Compared with large buildings in downtown areas, residential houses are often smaller and have complex components such as porches. Besides, residential buildings are often closely distributed and surrounded by dense vegetation and cluttered objects, which also increases the difficulty of extracting individual buildings from the original point clouds.
In this paper, we propose a new framework for building instance
extraction from mobile LiDAR point clouds in suburban residential
areas where outer walls do not connect buildings. In our framework,
buildings are first localized, then points of each building are extracted
from surroundings. The main contributions of this paper are threefold:
(1) we propose a new “hypotheses and selection” method for independent building localization; (2) we propose a segmentation
algorithm for separating buildings from overlapping objects by integrating local geometric features and shape priors; (3) our proposed
framework can achieve instance-level building extraction results in
dense suburban residential areas.
The paper is organized as follows. In Section 2, related work on building extraction and point cloud segmentation is reviewed. In Section 3, the proposed framework is described in detail. Experiments and analysis are presented in Section 4, and conclusions are given in Section 5.
2. Related works
In this section, building extraction methods from remote sensing data, with a focus on mobile LiDAR point clouds, are reviewed. As building point extraction can also be viewed as a special case of point cloud segmentation, which aims at partitioning the input data into individual objects, existing LiDAR point cloud segmentation methods are also discussed.
2.1. Building extraction methods
Building extraction has been widely studied with various data
sources, including remote sensing images, airborne LiDAR data, and
mobile LiDAR point clouds. When dealing with 2D images, corners and
edges are often detected first, which are used to construct object
boundaries. For example, if a closed outline formed by corners in the images has a reasonable size, the area it covers is recognized as one building instance (Cote and Saeedi, 2013). Similarly, building
areas can be extracted by using rectangular outline models which are
generated by edges in the aerial images (Lin and Nevatia, 1998). Xiao
et al. (2012) first detect walls from oblique airborne images. By coupling adjacent walls, many building hypotheses are generated, which
are then verified by checking the elevation discontinuity of roofs. The
elevation discontinuity around roof boundaries can also be used to
distinguish between buildings and adjacent vegetation areas (Liu et al.,
2013). Yang et al. (2013) propose a building extraction method for
airborne LiDAR point clouds. In this work, buildings are approximated
by cubes and detected by optimizing the cube configurations using the
Reversible Jump Markov Chain Monte Carlo (RJMCMC). In the work of
Awrangjeb et al. (2010), two kinds of aerial data, LiDAR data and multispectral imagery are combined to detect buildings in the residential
areas. Building masks are obtained from LiDAR data, and line segments
detected from images are used to indicate building boundaries. Finally,
buildings are detected by forming rectangles using neighboring line
segments. In general, most building extraction methods for aerial data
focus on utilizing roof boundary information and solving the extraction
problem via identifying regular shapes from the input data.
Compared with airborne remote sensing data, mobile LiDAR mainly
acquires detailed façade information instead of the rooftops. Therefore,
the key to building extraction from mobile LiDAR is utilizing the wall
information. In the work of Hernández and Marcotegui (2009), 2D
images are created by projecting 3D LiDAR point clouds onto horizontal
grids, and grids with a large number of accumulated points are identified as building areas. Similarly, in the work of Yang et al. (2012), 2D
geo-referenced feature images are first generated and classical edge
detection methods like Canny edge detector are applied to extract object contours, then buildings are identified by analyzing the contour
shapes. It should be pointed out that these methods only concern building regions in MLS point clouds; the extraction of independent building instances is not discussed.
Some building extraction methods are based on the spatial distribution patterns of buildings. For example, Fan et al. (2014) divide the
input point clouds into three layers and group points of each layer into
clusters. They assume that building points are consecutive in the vertical direction. Thus, if there exist clusters in the same location at each
layer, these clusters will be extracted as building objects. Gao and Yang
(2013) propose an independent building detection method for MLS point clouds. A histogram is computed whose x-axis corresponds to the distance along the trajectory and whose y-axis corresponds to the number of points. If there are gaps between neighboring buildings, the corresponding bin values in the histogram will be much lower than the building bins; thus, adjacent houses can be separated by the gaps in the histogram. However, this method suffers from under-detection, over-detection, and miss detection of buildings. The assumption that buildings are parallel to the trajectory may not always hold in the real world. Also, only building locations are detected by their method, while building points are not identified from the surroundings.

As most building surfaces are flat, buildings can be detected from MLS point clouds by identifying flat regions. Pu and Vosselman (2009) apply region growing to segment mobile LiDAR point clouds and recognize buildings based on prior knowledge such as wall orientation. Wall flatness can also be observed at the super-voxel scale. Aijazi et al. (2013) propose a building recognition method based on super-voxels: super-voxels of the input point clouds are generated first, and then neighboring voxels with similar properties (e.g., color, intensity) are merged into objects; after that, buildings are recognized by analyzing the shape of each object. Yang et al. (2015) propose a hierarchical object extraction method based on multi-scale super-voxels, which are generated from Euclidean distance, colors and eigenvalue-based features at two scales. These super-voxels are first grouped into larger segments and then merged into meaningful objects based on specific rules, such as geometric similarities between neighboring segments. This bottom-up grouping can achieve good results when the geometric features are accurate and the predefined rules are correct, but these conditions may not always be satisfied with real-world point clouds. Similarly, the work of Wang et al. (2016) presents a category-oriented, voxel-based grouping scheme to group point clouds into individual objects. They improve the geometric shape labeling for each segment, and different rules are applied to merge segments with different shape properties. They also propose an indicator named the horizontal hollow ratio, the ratio of the projected area of the object points to the area enclosed by the object contours, to recognize individual building objects. However, this indicator does not work well for low-rise buildings (Wang et al., 2016).

In summary, the existing methods have various problems in extracting buildings from mobile LiDAR point clouds. First, as buildings in the real world vary in size, shape and detailed structure (e.g., balconies, porches), it is difficult to identify all building components with specific rules (Yang et al., 2012; Gao and Yang, 2013; Fan et al., 2014). Second, existing building extraction methods (Yang et al., 2012; Aijazi et al., 2013; Wang et al., 2016) often follow a "segmentation then recognition" route to identify buildings; the accuracy of extraction is thus highly restricted by the performance of scene segmentation, which is still an open problem, especially in complex scenes (Yang et al., 2015; Golovinskiy et al., 2009; Xu et al., 2017). Third, most studies focus on extracting building regions from MLS point clouds, and few methods have been proposed to extract independent buildings. However, instance-level building extraction is critical for many applications such as building reconstruction (Li et al., 2016) and land-use classification (Kang et al., 2018). In this study, we aim at extracting building instances from dense residential areas in mobile LiDAR point clouds.

2.2. Point clouds segmentation methods
Segmenting unorganized, incomplete and uneven mobile LiDAR
point clouds into individual objects is a challenging task. By using 2D
grids (Serna and Marcotegui, 2014), point clouds can be segmented by
2D image processing methods such as morphological operations.
However, 3D information is lost, and such projections also have difficulty partitioning overlapping objects. In most cases, segmentation methods dealing with the original 3D point clouds are preferred.
Region growing is popular in LiDAR point clouds segmentation
(Belton and Lichti, 2006). In general, it begins by initializing seed
points and then groups adjacent points into one cluster according to
some criteria. For example, if normal differences between two points
are smaller than a predefined threshold, they will be grouped into one
cluster (Nurunnabi et al., 2016). However, region growing is often
sensitive to thresholds and relies on the accuracy of local features (Che
and Olsen, 2018). Euclidean clustering is a widely used algorithm for
point clouds segmentation (Klasing et al., 2008). It groups points into
one cluster if the distance between neighboring points is no larger than
a given threshold. The distance threshold is difficult to determine, and the method also fails to separate close or overlapping objects.
Maalek et al. (2018) present a clustering method to segment linear and planar structures in point clouds: points are first classified based on features derived from robust principal component analysis (PCA) of local points, and then grouped based on the feature similarity between adjacent points. In the work of Xu et al. (2017), a hierarchical clustering algorithm is proposed, which measures the similarity between points based on Euclidean distances and normal differences; the clustering results are optimized by solving a bipartite matching problem. One important and common attribute of these methods is that further processing, such as rule-based segment merging, is required to achieve object-level segmentation results. However, merging segments into instances is difficult because of the complexity and diversity of objects in the real world.
Several studies focus on the optimal segmentation of foreground
and background objects in point clouds. Golovinskiy and Funkhouser
(2009) present a foreground and background segmentation algorithm
called MinCut. First, a k-NN (k nearest neighbor) graph of original point
clouds is built. Second, an objective function consisting of data term
and smoothness term is designed. The data term for the foreground
label is set to a constant, and the cost for the background label is calculated based on the distance from each point to the user-defined foreground center. Finally, the objective function on the k-NN graph is optimized by max-flow/min-cut (Boykov et al., 2001). MinCut often requires the foreground center and the object radius to be input manually, and it is prone to over-segmentation and under-segmentation (Zheng et al., 2017). In the work of Yu et al. (2015), an extended normalized cut algorithm termed 3DNCut is proposed for the segmentation of mobile LiDAR point clouds. A group of points is first voxelized and a graph consisting of the non-empty voxels is built; the graph is then partitioned into two independent segments by a spectral clustering method (Shi and Malik, 2000). 3DNCut is a promising tool for the binary segmentation of point clouds, but there are two limitations when applying it to mobile LiDAR point clouds. First, the number of objects in the input data should be known in advance. Second, the normalized cut tends to partition the input data into two segments of similar size, which may not hold in the real world (Zheng et al., 2017).
In summary, current segmentation methods for mobile LiDAR point clouds mainly rely on local geometric properties of points, including distances between neighbors and local features such as normal differences. However, these local geometric features are unreliable, especially in uneven point clouds, and they depend strongly on the neighborhood size (Weinmann et al., 2015). Besides, existing methods often require specific rules and manual input, and their performance in partitioning overlapping objects still needs to be improved.
3. The method

We propose a new framework for extracting building instances from mobile LiDAR point clouds. It mainly consists of three steps. First, independent buildings are localized using clues from walls. Then, the original point clouds are divided into groups, each of which contains only one potential building. Finally, building points in each group are extracted by a newly proposed foreground/background segmentation method. To reduce the computational burden and eliminate the connectivity between different objects, the ground is first removed by a ground filtering algorithm (Zhang et al., 2016).

3.1. Building localization
Although buildings vary in shape and size, most have two dominant directions and can be approximately represented by bounding rectangles, according to building footprint analysis (Zhang et al., 2006; Liqiang et al., 2013). Based on this fact, we propose a new method to detect buildings from MLS point clouds, which consists of three main steps. In the first step, vertical wall segments are detected by region growing. Then, the wall segments are projected horizontally and used to construct 2D rectangles. Finally, a subset of the rectangle hypotheses is selected to indicate building locations. In short, we aim at localizing buildings by detecting 2D rectangles from the projected MLS point clouds.
3.1.1. Rectangle hypothesis
Region growing with an angle threshold θ (e.g., 5.0°) is used to segment the non-ground point clouds into groups. To improve efficiency, segments containing fewer than 50 points are discarded. This threshold is empirically selected and has little effect on the following steps, as only large segments are used for rectangle generation. To find potential vertical walls, two rules are applied to filter out non-wall objects such as fences and vehicles. First, only planes whose normals point horizontally are kept, i.e., if the absolute dot product between a plane normal and (0, 0, 1) is larger than 0.05, the segment is removed. Second, wall segments should maintain a minimum height over the ground and should be large enough. Concretely, the minimum segment-to-ground height Hmin and the minimal segment length Lmin are used to filter out non-wall segments, where Hmin equals the distance from the nearest ground point to the highest point in the segment. In this paper, Hmin and Lmin are both set to 2.0 m. Finally, the remaining segments are recognized as potential walls.
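For illustration, the wall filtering just described might look as follows. This is a minimal sketch, not the authors' implementation: the segment representation and the ground_z elevation lookup are assumptions of this example.

```python
import numpy as np

def filter_wall_candidates(segments, ground_z, min_points=50,
                           vert_dot=0.05, h_min=2.0, l_min=2.0):
    """Keep planar segments that look like vertical walls (Section 3.1.1).

    segments: iterable of (points, normal) pairs from region growing,
              points an (N, 3) array and normal a unit 3-vector.
    ground_z: callable (x, y) -> local ground elevation.
    """
    walls = []
    for pts, n in segments:
        if len(pts) < min_points:                 # drop tiny segments early
            continue
        if abs(np.dot(n, [0.0, 0.0, 1.0])) > vert_dot:
            continue                              # keep only horizontal normals
        top = pts[np.argmax(pts[:, 2])]
        if top[2] - ground_z(top[0], top[1]) < h_min:
            continue                              # minimum height over ground
        extent = pts[:, :2].max(axis=0) - pts[:, :2].min(axis=0)
        if np.linalg.norm(extent) < l_min:        # minimum horizontal length
            continue
        walls.append((pts, n))
    return walls
```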
To generate rectangle hypotheses, the potential walls are projected onto the ground as 2D line segments. Supposing there are n line segments and every two line segments can generate one rectangle, this results in $\binom{n}{2}$ hypotheses, which is a huge number. However, some pairs of line segments cannot make up a rectangle. First, line segment pairs that are neither parallel nor perpendicular are discarded. Second, collinear line segments, as in Fig. 1(a), and intersecting lines, as in Fig. 1(b), are also not used to generate hypotheses; Fig. 1(c) and (d) give two examples of valid rectangle hypotheses. It should be noted that if two selected line segments are not strictly parallel or perpendicular, the included angle between the lines is forcibly adjusted: the longer line segment remains unchanged while the shorter one is rotated. The number of hypotheses is further reduced by removing rectangles that are too small or too large to form a building outline. Given a rectangle, if its width is below Bmin (e.g., 5.0 m) or its length is over Bmax (e.g., 30.0 m), the rectangle is discarded. Besides, if the width-to-length ratio of a rectangle is smaller than a threshold Rwl (e.g., 0.3), the outline proposal is also removed. After filtering out unqualified hypotheses, only about 5% of the $\binom{n}{2}$ candidates are kept in most cases.
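The pair filtering and size checks can be sketched likewise. The rectangle construction below is a simplified stand-in: it axis-aligns the two segments in the frame of the longer one and takes their joint bounding box, and the collinearity and intersection tests of Fig. 1(a) and (b) are omitted for brevity.

```python
import itertools
import numpy as np

def rectangle_hypotheses(segs, ang_tol=5.0, b_min=5.0, b_max=30.0, r_wl=0.3):
    """Enumerate rectangle proposals from projected 2D wall segments.

    segs: list of (p0, p1) endpoint pairs, each a length-2 numpy array.
    """
    rects = []
    for (a0, a1), (b0, b1) in itertools.combinations(segs, 2):
        da = (a1 - a0) / np.linalg.norm(a1 - a0)
        db = (b1 - b0) / np.linalg.norm(b1 - b0)
        cos_ab = abs(float(np.clip(np.dot(da, db), -1.0, 1.0)))
        ang = np.degrees(np.arccos(cos_ab))        # acute angle between pair
        if min(ang, 90.0 - ang) > ang_tol:
            continue                # neither parallel nor perpendicular
        longer = da if np.linalg.norm(a1 - a0) >= np.linalg.norm(b1 - b0) else db
        rot = np.array([[longer[0], longer[1]],    # rotate longer axis onto x
                        [-longer[1], longer[0]]])
        pts = rot @ np.stack([a0, a1, b0, b1]).T
        w, l = np.sort(pts.max(axis=1) - pts.min(axis=1))
        if w < b_min or l > b_max or w / l < r_wl:
            continue                # implausible building outline
        rects.append((longer, pts.min(axis=1), pts.max(axis=1)))
    return rects
```

Each returned triple (axis, lower corner, upper corner in the rotated frame) determines one rectangle; rotating back yields its corners in the original coordinates.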
3.1.2. Rectangle selection
The problem now turns into selecting a subset of rectangles that stand for coarse building outlines. A binary variable x_i is defined to indicate whether the ith hypothesized rectangle is selected (x_i = 1) or not (x_i = 0). For a given line segment t, if the minimal distance between its middle point and the four edges of the ith rectangle is smaller than a threshold Rectin (e.g., 0.5 m), the line segment t is identified as an inlier of the ith hypothesis, i.e., covered by the ith rectangle. Similarly, a variable y_t is introduced to indicate whether the tth line segment is covered by some selected rectangle (y_t = 1) or not (y_t = 0).
Since buildings in the real world are distributed at intervals, no building overlaps are allowed. Based on these facts, two constraints are introduced. The first is the overlapping constraint: the overlapping area a_{i,j} between the ith and jth rectangles is calculated, and if a_{i,j} > 0, the ith and jth rectangles cannot coexist, i.e., x_i + x_j ⩽ 1. For example, in Fig. 2(a), the two blue rectangle proposals overlap, so they cannot be selected simultaneously. Also, buildings should not be too close to each other. Thus the second constraint controls the minimal distance D_{i,j} between parallel edges of adjacent rectangles i and j: if D_{i,j} is smaller than a predefined threshold D_n (e.g., 2.0 m), the ith and jth rectangles cannot be selected simultaneously, i.e., x_i + x_j ⩽ 1. For example, in Fig. 2(b), two blue rectangles are very close to each other, so they cannot be selected simultaneously. This rule reduces multiple-detection errors, i.e., one building being detected by multiple rectangles (Awrangjeb et al., 2010).

Besides the constraints on building distribution, we also want to select rectangles that cover as many wall points as possible; for example, no rectangle covers the wall points of the right building in Fig. 2(c) if the coverage of the data is not considered. To this end, the constraint can be written as ∑_{i⊢t} x_i ⩾ y_t, where i ⊢ t means that rectangle i covers wall segment t. The consequence of adding this rule is that y_t must be zero if no rectangle covering line segment t is selected, because of the resulting constraint 0 ⩾ y_t.

By combining the constraints mentioned above, the objective function can be formed as Eq. (1), which accumulates the variables y_t weighted by the projected lengths L_t of the wall segments; N_l is the number of wall segments. In general, by maximizing the objective function under these constraints, we prefer to select non-conflicting rectangles that cover as many wall segments as possible. This is a typical linear programming problem and can be solved by software such as Gurobi (Gurobi Optimization, 2016). An example of an optimal rectangle configuration is given in Fig. 2(d).

\begin{aligned}
\text{Maximize} \quad & \sum_{t=0}^{N_l} y_t \cdot L_t \\
\text{subject to} \quad & \forall \{i, j\},\ a_{i,j} > 0:\ x_i + x_j \leqslant 1 \\
& \forall \{i, j\},\ D_{i,j} < D_n:\ x_i + x_j \leqslant 1 \\
& \sum_{i \vdash t} x_i \geqslant y_t \\
& x_i, y_t \in \{0, 1\}
\end{aligned} \qquad (1)

Fig. 1. From line segments to rectangles. Proposed rectangles are formed by dashed red lines. (a) Collinear line segments. (b) Intersecting lines. (c) A rectangle formed by two perpendicular line segments. (d) A rectangle formed by two parallel lines.

Fig. 2. Optimal selection of rectangles. Black line segments stand for walls and dashed rectangles are generated hypotheses. (a) Overlapping rectangles (blue) cannot be selected simultaneously. (b) Two close rectangles (blue) are rejected and the red one is selected. (c) The building on the right is missed. (d) The optimal selection of rectangles.
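As an illustration of the selection step, the 0-1 program of Eq. (1) can be posed directly with an off-the-shelf MILP solver. The paper uses Gurobi; the sketch below uses SciPy's milp instead, and the input encoding (a list of conflicting rectangle pairs and, per wall segment, the list of rectangles covering it) is an assumption of this example.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

def select_rectangles(n_rect, seg_len, conflicts, covers):
    """Solve Eq. (1): choose non-conflicting rectangles covering long walls.

    n_rect:    number of rectangle hypotheses.
    seg_len:   projected lengths L_t of the wall segments, shape (n_seg,).
    conflicts: (i, j) rectangle pairs that may not coexist (overlapping,
               or with parallel edges closer than D_n).
    covers:    covers[t] = ids of rectangles covering wall segment t.
    """
    n_seg = len(seg_len)
    n = n_rect + n_seg                 # variable vector is [x; y]
    c = np.concatenate([np.zeros(n_rect), -np.asarray(seg_len, float)])
    rows, lb, ub = [], [], []
    for i, j in conflicts:             # x_i + x_j <= 1
        r = np.zeros(n); r[i] = r[j] = 1.0
        rows.append(r); lb.append(-np.inf); ub.append(1.0)
    for t, ids in enumerate(covers):   # sum_{i covers t} x_i - y_t >= 0
        r = np.zeros(n); r[list(ids)] = 1.0; r[n_rect + t] = -1.0
        rows.append(r); lb.append(0.0); ub.append(np.inf)
    res = milp(c=c,                    # milp minimizes, so L_t was negated
               constraints=LinearConstraint(np.array(rows), lb, ub),
               integrality=np.ones(n), bounds=Bounds(0, 1))
    return np.flatnonzero(np.round(res.x[:n_rect]) == 1)
```

Because each y_t carries a positive objective weight and is capped by its coverage constraint, the solver sets y_t = 1 exactly when some selected rectangle covers segment t.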
Fig. 3 gives a step-by-step example of building localization in a scene consisting of four individual buildings. The input of our method is the non-ground point cloud shown in Fig. 3(a). The point cloud is first segmented into planes, and only the vertical walls are kept, as shown in Fig. 3(b); 25 wall segments exist in this example. Then 44 potential rectangles are generated from the projections of these walls, as shown in Fig. 3(c). By selecting a subset of the hypothesized rectangles, the building locations can be estimated; here, four rectangles corresponding to the four houses are selected, as shown in Fig. 3(d). It should be pointed out that the fences of the rightmost building are not covered by the selected rectangle: the rectangle is used to find the main walls of one building, not to construct an accurate building outline.

Fig. 3. An example of building localization. (a) Non-ground point clouds. (b) Segmented wall segments. (c) Hypothesized rectangles, colored randomly. (d) Selected rectangles in red. Wire-frames derived from the rectangles are added for visualization.
3.2. Dividing individual buildings into groups

After building localization, the non-ground points can be divided into groups, each of which contains only one building instance. First, the Euclidean clustering algorithm with a distance threshold (e.g., 1.0 m) is applied to group the non-ground points into clusters. Isolated buildings are grouped into individual clusters, while adjacent buildings may be merged into one cluster. For example, the points in Fig. 4(a) are all merged into one cluster after clustering because of the grass in the green circle. To tackle this problem, similarly to the method in Gao and Yang (2013), a profile of point counts between the two rectangles can be built, as illustrated in Fig. 4(b), and the building cluster can be divided at the lowest bin of this profile, as indicated by the red arrow. To reduce the noise from non-building points, only points from planar segments are counted in the profile. Fig. 4(c) shows the grouping results guided by the localization rectangles. As the number and locations of the potential buildings are known, the drawbacks of the trajectory-based method (Gao and Yang, 2013), such as over-detection and miss detection, are avoided in this step. Finally, the problem of building extraction turns into segmenting the building points from their surroundings; Fig. 4(d) shows an example of the final building extraction results. The segmentation method used is described in the following sections.

Fig. 4. An example of dividing individual buildings into clusters. (a) Points are merged into one cluster by Euclidean clustering. (b) A histogram of point counts; red rectangles indicate the building localization results. (c) Point grouping results based on (b). (d) Illustration of building extraction from each group; points in black and blue indicate two separated buildings, and non-building points are colored green. Wire-frames derived from the rectangles are added for visualization.
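As an aside, Euclidean clustering with a fixed distance threshold is exactly the set of connected components of the ε-neighborhood graph, which DBSCAN reproduces when min_samples=1. A minimal sketch (not the authors' implementation):

```python
import numpy as np
from sklearn.cluster import DBSCAN

def euclidean_clusters(points, eps=1.0):
    """Clusters of points connected by chains of steps no longer than eps.

    With min_samples=1 every point is a core point, so DBSCAN degenerates
    to plain Euclidean clustering with distance threshold eps (meters).
    """
    labels = DBSCAN(eps=eps, min_samples=1).fit_predict(points)
    return [np.flatnonzero(labels == k) for k in np.unique(labels)]
```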
3.3. Segmentation-based building extraction

After dividing buildings into individual groups, the buildings need to be further extracted from their surroundings. This problem can be viewed as a foreground/background segmentation task and formulated as an energy minimization problem (Rother et al., 2004; Boykov and Kolmogorov, 2004). To this end, a k-NN graph (e.g., k = 10) of the points is first constructed; then the general form of the energy model can be written as

E(l) = \sum_{i \in P} D_i(l_i) + \beta \sum_{(i,j) \in N} V_{ij}(l_i, l_j), \qquad (2)

where P is the input point set and N is the neighbor system in the k-NN graph. l_i indicates the label of point i; l_i equals 1 when point i is a foreground (building) point and 0 otherwise. The first term is the data term, which accumulates the cost D_i(l_i) of assigning the foreground/background label to point i. The second term is the smoothness term, which accumulates the penalty of label differences between neighboring points i and j in the k-NN graph. The coefficient β is a non-negative number balancing the two terms. This function can be optimized by min-cut/max-flow algorithms (Boykov et al., 2001). In this study, the data term is set according to the geometric distribution of local points, and the smoothness term focuses on the similarity between neighbors. Besides, we extend the model in Eq. (2) by adding a shape term that incorporates shape priors derived from planar segments.

3.3.1. Data term
According to Boykov and Jolly (2001), the data term should be set based on prior information about the background/foreground, which can either be defined beforehand or modeled from seeds. For example, a constant value is adopted to penalize the foreground in MinCut (Golovinskiy and Funkhouser, 2009), and Gaussian mixture models (GMMs) learned from seeds are applied to predict the penalties for background and foreground in Rother et al. (2004). In our problem, the "foreground" refers to buildings, and the "background" mainly consists of cluttered vegetation.

The prior knowledge adopted here is that buildings mainly consist of flat regions. To predict the data cost, three eigenvalue-based features, linear (f_1D), planar (f_2D) and volumetric (f_3D) (Demantké et al., 2011; Yang et al., 2015), are calculated as follows:

f_{1D} = \frac{\lambda_1 - \lambda_2}{\lambda_1}, \qquad f_{2D} = \frac{\lambda_2 - \lambda_3}{\lambda_1}, \qquad f_{3D} = \frac{\lambda_3}{\lambda_1}, \qquad (3)

where λ_1, λ_2 and λ_3 (λ_1 ⩾ λ_2 ⩾ λ_3 > 0) are the three eigenvalues derived from PCA of the local neighborhood. f_1D reaches a high value on linear structures such as branches and railings. f_2D measures local planarity and is large in flat regions such as walls. f_3D measures the degree of volumetric distribution of the local points and is large in scattered regions such as vegetation. Typically, local points with a large value of f_2D have a high probability of being labeled as building, whereas points with a large value of f_1D or f_3D are often located in non-building areas such as vegetation.
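A compact sketch of Eqs. (3) and (4) using per-point PCA over the k nearest neighbors; the neighborhood size and the small numerical guard are assumptions of this example, not values from the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

def dimensionality_features(points, k=20):
    """Per-point (f_1D, f_2D, f_3D) of Eq. (3) from local PCA eigenvalues."""
    _, idx = cKDTree(points).query(points, k=k + 1)   # includes the point itself
    feats = np.empty((len(points), 3))
    for i, nb in enumerate(idx):
        q = points[nb] - points[nb].mean(axis=0)
        lam = np.sort(np.linalg.eigvalsh(q.T @ q / len(nb)))[::-1]
        l1, l2, l3 = np.maximum(lam, 1e-12)           # keep lam1 >= lam2 >= lam3 > 0
        feats[i] = [(l1 - l2) / l1, (l2 - l3) / l1, l3 / l1]
    return feats

def data_costs(feats):
    """Unary costs of Eq. (4); column 0 = foreground, column 1 = background."""
    f2d = feats[:, 1]
    fmax = np.maximum(feats[:, 0], feats[:, 2])
    return np.stack([(1.0 - f2d) * fmax, (1.0 - fmax) * f2d], axis=1)
```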
The penalty for labeling one point as "foreground" (building) or "background" (non-building) is defined in Eq. (4). Fig. 5 gives an example of how f_2D and max{f_1D, f_3D} behave on real data.

D_i(l_i) = \begin{cases} (1 - f_{2D}) \cdot \max\{f_{1D}, f_{3D}\} & l_i = 1 \\ (1 - \max\{f_{1D}, f_{3D}\}) \cdot f_{2D} & l_i = 0 \end{cases} \qquad (4)

Fig. 5. Example of planar (f_2D) and max{f_1D, f_3D} values. (a) Original point clouds of a building, a tree and bushes. (b) The values of f_2D. (c) The values of max{f_1D, f_3D}.
3.3.2. Smoothness term
The second term in Eq. (2) accumulates the penalty on label inconsistency between neighboring points. It is calculated as

V_{ij}(l_i, l_j) = \begin{cases} 0 & l_i = l_j \\ \exp\left\{-\frac{d_{ij}}{\min(d_i, d_j)}\right\} & l_i \neq l_j \end{cases} \qquad (5)

where d_{ij} is the Euclidean distance between points i and j, and d_i corresponds to the mean distance between point i and its neighbors. The exponential term weights the distance between the two neighboring points, and min(d_i, d_j) is used to normalize the distance values. The closer two points are, the higher the penalty imposed for their label disagreement.
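Eq. (5) only needs one weight per k-NN edge. A sketch, with d_i taken as the mean distance from point i to its k neighbors as defined above:

```python
import numpy as np
from scipy.spatial import cKDTree

def smoothness_weights(points, k=10):
    """Undirected k-NN edges and their V_ij weights from Eq. (5)."""
    dist, idx = cKDTree(points).query(points, k=k + 1)
    dist, idx = dist[:, 1:], idx[:, 1:]      # drop the self-match in column 0
    d_mean = dist.mean(axis=1)               # d_i: mean distance to neighbors
    edges, weights = [], []
    for i in range(len(points)):
        for j, d_ij in zip(idx[i], dist[i]):
            if i < j:                         # store each undirected edge once
                edges.append((i, int(j)))
                weights.append(np.exp(-d_ij / min(d_mean[i], d_mean[j])))
    return edges, np.asarray(weights)
```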
3.3.3. Implementing shape priors
Point cloud segmentation based only on the geometric properties of local neighborhoods is not reliable. First, mobile LiDAR point clouds are often noisy, incomplete and uneven, so features derived from PCA are easily corrupted; the choice of neighborhood size also affects the performance of local geometric features (Demantké et al., 2011). Second, real-world scenes are complicated and cluttered: objects are often close to each other or even overlapping, which makes it impossible to find boundaries between neighboring objects by local features alone.

A possible solution to these problems is to introduce high-level geometric constraints to assist the segmentation. In recent years, object extraction from 2D images with geometric constraints has been widely studied. In Freedman and Zhang (2005), the distances between pixels and a shape template of the foreground object are introduced as a shape prior: the penalty for discontinuity between two pixels is low if they are close to the object contour and becomes large for pixels away from the shape template, so minimizing the energy function facilitates boundary-preserving segmentations. Veksler (2008) proposes a general approach termed star-shape to incorporate shape priors based on generic shape properties, i.e., if the center of the foreground object is known to be c and a point p is labeled as foreground, all points on the line segment with endpoints c and p should also be labeled as foreground.

Shape prior from planar segments. Planar segments have been retrieved in Section 3.1.1 by region growing. Most of these segments come from building structures and can be used to set shape constraints. However, for walls adjacent to vegetation, the estimated normal vectors may deviate from the true normals; thus, parts of walls will be missing after region growing. A 2D example is given in Fig. 6. To human perception, there is a potential linear structure consisting of the red and blue points, plus two cluttered groups (green points). After region growing, the red points are grouped into one segment; the growing stops because the angle difference between points A and B is larger than θ, although the blue points in Fig. 6 may also be regarded as part of the linear structure.

To find such missing potential structures, a set of points termed "anchor points" is selected from the non-planar points. A non-planar point i is selected as an anchor point if it meets two criteria: (1) the distance between point i and a planar segment ps is no larger than dt (e.g., 0.05 m), and (2) the angle difference between the normal at point i and the surface normal of ps is no larger than 3θ. For example, in Fig. 6, the blue points are selected as anchor points. Finally, the points of the planar segments and the anchor points are combined into one point set S, termed the "shape set" in this paper.

Fig. 6. A 2D example of anchor points. Normals at anchor points may not be accurate.
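A sketch of the anchor-point test under the two stated criteria. Representing every planar segment by its member points plus one unit normal, and measuring the point-to-plane distance through the nearest segment point, are assumptions of this example.

```python
import numpy as np
from scipy.spatial import cKDTree

def find_anchor_points(pts, normals, seg_pts, seg_ids, seg_normals,
                       d_t=0.05, theta_deg=5.0):
    """Indices of non-planar points that qualify as anchor points.

    pts/normals:  non-planar points and their estimated unit normals.
    seg_pts:      stacked points of all planar segments.
    seg_ids:      per point in seg_pts, the id of its segment.
    seg_normals:  unit normal of each planar segment.
    """
    _, nearest = cKDTree(seg_pts).query(pts, k=1)
    cos_tol = np.cos(np.radians(3.0 * theta_deg))    # criterion (2): <= 3*theta
    anchors = []
    for i, j in enumerate(nearest):
        n_s = seg_normals[seg_ids[j]]
        d = abs(np.dot(pts[i] - seg_pts[j], n_s))    # criterion (1): <= d_t
        if d <= d_t and abs(np.dot(normals[i], n_s)) >= cos_tol:
            anchors.append(i)
    return np.asarray(anchors, dtype=int)
```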
458
ISPRS Journal of Photogrammetry and Remote Sensing 144 (2018) 453–468
S. Xia, R. Wang
Fig. 7. A 2D example of segmentation on a k-NN
graph. Black dashed line: a cut without shape prior.
Red dashed line: a cut with shape prior. The thickness of edge indicates the weight derived from shape
priors. Thicker edge has more weights. (For interpretation of the references to colour in this figure
legend, the reader is referred to the web version of
this article.)
arrow. Anchor points may also come from the non-building objects (e.g.
blue points in the bush). After minimizing the energy function in Eq.
(8), a better segmentation result is achieved as shown in Fig. 8(c).
extract the bottom linear segment from adjacent cluttered groups. First,
a k-NN graph is built. By minimizing the energy function Eq. (2) which
does not consider the shape prior, a cut may be achieved like Cut A in
Fig. 7. However, a preferred cut is Cut B which divides the whole graph
into two parts where clutters and the linear segment are well separated.
In the favorite cut, cluttered points (e.g. point A) may be labeled as
foreground and anchor points may also be labeled as background (e.g.
point B), which depend on the distance between points and evidence
from neighbors.
To achieve the optimized cut with shape priors, we introduce an additional term S_{ij}(l_i, l_j):

S_{ij}(l_i, l_j) = \begin{cases} 0 & l_i = l_j \\ \phi\left(\frac{p_i + p_j}{2}\right) & l_i \neq l_j \end{cases} \qquad (6)

In this term, the penalty for label inconsistency between neighbors is \phi((p_i + p_j)/2), which is defined in Eq. (7) and always returns a non-negative value; p_i and p_j stand for the positions (coordinates) of points i and j.

\phi\left(\frac{p_i + p_j}{2}\right) = \begin{cases} \mathrm{Dist}^{-1}\left(\frac{p_i + p_j}{2}\right) & i, j \in S \\ 1.0 - \exp\left\{-\frac{\mathrm{Dist}\left(\frac{p_i + p_j}{2}\right)}{d_t}\right\} & \text{otherwise} \end{cases} \qquad (7)

In this equation, Dist((p_i + p_j)/2) returns the positive distance between the mean point of the two neighboring points i, j and the nearest planar segment, and Dist^{-1}((p_i + p_j)/2) is its inverse. According to Eq. (7), if two adjacent points are both in the shape set S, the penalty is large, which means they are more likely to share the same label, either foreground or background. Otherwise, the label inconsistency penalty is small if the two adjacent points are close to the planar segments and increases to a large value if the points are far from any planar segment. As a result, edges near planar structures carry less weight and are more easily removed (cut); for example, in Fig. 7, the edges near the linear segment are much thinner than the edges along the segment and the edges far from the segment. Finally, the energy function with shape priors can be rewritten as Eq. (8), with the shape term weighted by γ. This last term, derived from the shape priors, is also called the "shape term" in this paper. The final objective function can be optimized by graph cuts (Kolmogorov and Zabih, 2004).

E(l) = \sum_{i \in P} D_i(l_i) + \beta \sum_{(i,j) \in N} V_{ij}(l_i, l_j) + \gamma \sum_{(i,j) \in N} S_{ij}(l_i, l_j) \qquad (8)

The point clouds in Fig. 8 are used to show the anchor points and the importance of the shape prior. By minimizing the energy function of Eq. (2), the segmentation result in Fig. 8(a) is obtained, in which some points of the wall and stairs are mislabeled as non-building. In Fig. 8(b), points from region growing are colored red, anchor points are colored blue, and the remaining points are colored green. In this example, some planar segments, such as the parts of the roof indicated by the red arrow, may be missed because of inaccurate normal estimates and the small sizes of the segments. Also, points near the contours may not be contained in planar regions, such as the regions near the window indicated by the black arrow. Anchor points may also come from non-building objects (e.g., the blue points in the bush). After minimizing the energy function of Eq. (8), a better segmentation result is achieved, as shown in Fig. 8(c).

Fig. 8. A 3D example of segmentation. (a) Segmentation without shape prior. (b) Points (red) from all segments after region growing, anchor points (blue) and other points (green). (c) Segmentation with shape prior, β = 1.0, γ = 1.0.
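Because the labeling here is binary, an energy of the form of Eq. (8) is submodular and can be minimized exactly by a single min-cut, without iterating α-expansion. The sketch below uses the PyMaxflow library; the t-link convention (source side = foreground) and the input encoding, which follows the helper functions sketched earlier, are assumptions of this example.

```python
import numpy as np
import maxflow  # PyMaxflow

def segment_building(data_cost, edges, v_weights, s_weights,
                     beta=1.0, gamma=1.0):
    """Exact minimizer of an Eq. (8)-style binary energy via one min-cut.

    data_cost: (n, 2) array; column 0 = cost of foreground, 1 = background.
    edges:     list of k-NN graph edges (i, j).
    v_weights: smoothness weights V_ij of Eq. (5), one per edge.
    s_weights: shape-term values phi of Eq. (7), one per edge.
    Returns a boolean mask, True = building (foreground).
    """
    g = maxflow.Graph[float]()
    nodes = g.add_nodes(len(data_cost))
    for i, (c_fg, c_bg) in enumerate(data_cost):
        # a node cut to the sink side pays the source capacity and vice
        # versa, so each label's cost goes on the opposite terminal link
        g.add_tedge(nodes[i], c_bg, c_fg)
    for (i, j), v, s in zip(edges, v_weights, s_weights):
        w = beta * v + gamma * s          # penalty paid only if labels differ
        g.add_edge(nodes[i], nodes[j], w, w)
    g.maxflow()
    return np.array([g.get_segment(n) == 0 for n in nodes])
```

With beta = gamma = 1.0, this matches the weighting reported for Fig. 8(c).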
4. Experiments and analysis

4.1. Datasets

Two datasets of residential areas acquired by different mobile LiDAR systems are used in the experiments. A RIEGL VMX-450 system collected LiDAR point clouds in a typical residential area in Calgary, Canada. This dataset contains more than 340 million points and covers a rectangular region around 1200 m long and close to 210 m wide. The scene is complex and contains buildings, vegetation, vehicles, power lines, pole lights and other objects (e.g., pedestrians). An Optech Lynx V200 mobile mapper acquired about 53 million LiDAR points in a residential region of Kentucky, USA. The shape of this region is irregular, and its bounding box is about 370 m long and 140 m wide. The ground truth for building localization and extraction was obtained manually. Specifically, as buildings in MLS point clouds are often incomplete, a valid building should contain at least one wall segment with a height over 2.0 m; accordingly, 375 buildings in the Calgary dataset and 78 buildings in the Kentucky dataset are regarded as ground truth. Before further processing, the ground filtering algorithm (Zhang et al., 2016) is applied to classify the original data into ground and non-ground point sets. For convenience, the Calgary dataset is split into 17 subsets, each containing around 20 million points; the results of all subsets are combined after processing. The proposed algorithms are implemented in C++, and the experiments are conducted on a computer with an Intel Core i7-6700 3.4-GHz CPU. Currently, no parallel programming strategy is adopted in our implementation.
4.2. Building localization

4.2.1. Results and analysis
There are several parameters and thresholds in the building detection step. Based on our tests, these values are empirically set and fixed in all experiments; Table 1 summarizes the eight parameters for building detection. Most of the execution time for building localization is spent in the region growing step, which takes nearly one hour for the whole scene. In contrast, the total time for rectangle generation and selection is about 150 s. Taking one subset as an example, 141 vertical wall segments are detected after region growing, and 520 rectangle proposals are generated from them; Eq. (1) then contains 661 variables and 17,863 constraints, and 26 rectangles are selected. The time spent on rectangle generation and selection for this subset is around 11 s.

Overviews of the original point clouds overlaid with the detected building rectangles are shown in Figs. 9 and 10. Correctly selected rectangles are colored red, detected non-building objects and multiple detections are colored blue, and missed buildings are indicated by black rectangles. According to these results, building instance localization performs better on the Kentucky dataset than on the Calgary dataset. We further give some detailed examples of building localization from the Calgary dataset in Fig. 11.
Table 1. Parameter descriptions and settings for building localization.

Notation | Description | Setting
θ | Angle threshold in region growing | 5°
Hmin | Minimal wall height | 2.0 m
Lmin | Minimal wall length | 2.0 m
Bmin | Minimal building width | 5.0 m
Bmax | Maximal building length | 30.0 m
Rwl | Minimal width-to-length ratio of a rectangle | 0.3
Dn | Minimal distance between rectangles | 2.0 m
Rectin | Threshold for rectangle inliers | 0.5 m

Fig. 9. Overview of building localization results in the Calgary residential dataset. Ground points are colored yellow; non-ground points are colored by height (from low blue to high red). Red rectangles are correctly detected buildings, blue rectangles are falsely detected buildings, and black rectangles indicate the locations of undetected buildings.

Fig. 10. Overview of building localization results in the Kentucky residential dataset. The color scheme is the same as in Fig. 9.
In Fig. 11(a), all buildings are correctly localized by rectangles. In the right middle of Fig. 11(b), one small building indicated by a black rectangle is missed by our method. In this case, only one side of the building was scanned by the MLS, so no rectangle can be formed from a single wall segment. In fact, the insufficiency of wall segments is the main reason for miss detection. The problem of missing walls is caused by occlusions, some of which can be avoided by using multiple scans from different viewpoints. There are two blue rectangles in Fig. 11(b); both are multiple-detection errors. This problem may occur when a building has a complex shape, and it can be mitigated by post-processing methods such as removing multiple detections by analyzing the connectivity between detected buildings. However, due to the complexity of the real world, there is no universal solution, and multiple detections are difficult to avoid. There are three false detections in Fig. 11(c): the two small blue rectangles are multiple detections, while the largest blue rectangle covers non-building objects, i.e., mainly bare ground. This rectangle is formed by wall segments from two different houses. This kind of false detection may be corrected in post-processing by analyzing the point distribution within the region indicated by the localized rectangle. In some cases, the false and miss detections are related. For example, if a large rectangle that covers two building instances is falsely selected, the correct rectangles of the two buildings will not be selected, as they overlap heavily with the large one. This problem may be alleviated by reducing some thresholds, such as the minimum rectangle width; however, changing thresholds may only solve specific cases while increasing the probability of multiple detections.
4.2.2. Quantitative analysis and comparisons
To quantitatively evaluate the performance of the proposed building detection algorithm, the completeness and correctness are defined in Eq. (9):

\mathrm{Completeness} = \frac{TP}{TP + FN}, \qquad \mathrm{Correctness} = \frac{TP}{TP + FP}. \qquad (9)
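For concreteness, a tiny helper reproduces the scores reported below for the Calgary dataset (358 detections with 31 false positives, hence TP = 327, against 375 reference buildings, so FN = 48); this is an illustration, not part of the paper's tooling.

```python
def localization_metrics(tp, fp, fn):
    """Completeness and correctness as defined in Eq. (9)."""
    return tp / (tp + fn), tp / (tp + fp)

# Calgary dataset: TP = 327, FP = 31, FN = 48
print(localization_metrics(327, 31, 48))   # (0.872, 0.9134...)
```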
True positive (TP) is the number of buildings which are localized by our method and are also in the building reference set. False positive (FP) is the number of buildings recognized by our method but not in the reference set. There are two types of FP: in the first type, the selected rectangle covers non-building objects; the second type is multiple detections, in which only the rectangle covering the highest number of LiDAR points is recognized as the correct detection,
and the other rectangles are counted as FP. False negative (FN) is the number of buildings missed by our method. In the Calgary dataset, 358 rectangles are detected by our method; the number of FP is 31, and the number of FN is 48. Therefore, the completeness of our method is 87.2%, and the correctness is 91.34%. In the Kentucky dataset, 81 rectangles are selected by our method, three of which are false positives. Thus, the completeness on this dataset is 100% and the correctness is 96.3%. The main reason for the better performance on the Kentucky dataset lies in the different scene complexities: compared with the Calgary dataset, there is much less vegetation in the Kentucky dataset, which results in more complete building walls due to fewer occlusions.
To date, few methods have been proposed to localize buildings in mobile LiDAR point clouds. Fan et al. (2014) report that 32 out of 46 (69.57%) buildings are detected by their method in a residential scene; this detection rate is relatively low compared with other methods. In the work of Gao and Yang (2013), the average completeness and correctness of the proposed building detection method are 86.46% and 91.41%, respectively. However, their method requires an accurate trajectory, and only buildings that are parallel and close to the trajectory can be detected. A more critical problem in Gao and Yang (2013) is that their method falsely detects adjacent buildings as one object if there are trees or shrubs between them. This problem rarely happens in their test data, but in many residential areas, including our test data, flourishing vegetation between buildings is common. Besides, it is difficult for their method to distinguish buildings from large trees, which also makes it infeasible in residential areas with lush vegetation.
Wang et al. (2016) test their method in a typical urban scene with 192 buildings; the average completeness is 94.7% and the correctness is 91%. Specifically, the completeness for low-rise buildings is 86.3% according to their paper. It should be pointed out that only buildings containing at least two vertical walls are counted as ground truth in their evaluation. In fact, if the same criterion is used in our dataset, only 341 buildings are identified as ground truth, and the completeness of our method increases to 95.89%. In their research (Wang et al., 2016), independent buildings are detected by analyzing the horizontal hollow ratio of the projected point clouds: buildings are assumed to have a much lower horizontal hollow ratio than other objects, as building roofs are often missed during scanning. However, this rule performs poorly in residential scenes, where most buildings are low and parts of the roofs can be fully scanned. In our dataset, only 24.3% of all buildings have a horizontal hollow ratio lower than 50%, which means the completeness of the hollow-ratio based method would be much lower than that of our proposed method. Besides, the buildings in Wang et al. (2016) are often far from each other, so the problem of separating neighboring buildings rarely arises. In summary, our proposed method achieves state-of-the-art building detection results in complex residential areas with dense vegetation.
Fig. 11. Details of building localization. Point clouds and rectangles are colored as in Fig. 9. (a) All buildings are localized correctly. (b) Miss detection and multiple detection of buildings. (c) Detection of non-building objects and multiple detections.
Compared with the assumptions used in existing methods, such as that no vegetation exists between adjacent buildings (Gao and Yang, 2013) or that few roof points are recorded by MLS (Wang et al., 2016), our assumption that buildings consist of vertical walls is more general.

4.3. Building instance extraction

4.3.1. Experimental results
The performance of the proposed building extraction method is evaluated on seven typical residential building samples. Sample names, sizes, numbers of points and execution times of our method are listed in Table 2. According to the running times in the last column, the speed of our method is strongly related to the number of points. In fact, most of the time is spent on the initialization step, which includes feature calculation, finding anchor points and graph construction, while the α-expansion itself costs little time. For instance, the initialization of Sample 4 takes around 26.0 s, and only 4.2 s are used for optimization.

The manually labeled ground truth is shown in the first column of Fig. 12. These samples form a representative group of residential building point clouds acquired by mobile LiDAR. There are different kinds of surroundings in these scenes: for example, shrubs and conifers stand against the walls in Fig. 12(a), and in Fig. 12(b) the walls are surrounded by bushes and the roofs are connected with tall vegetation. The buildings also vary in size and structure: Fig. 12(c) shows a rectangular house with a porch, Fig. 12(d) a squared building, and Fig. 12(e) a one-story large-area house. Besides, the sample point clouds differ in point density and degree of incompleteness: large parts of the walls in Fig. 12(f) are missing, and the point density in Fig. 12(g) is lower than in the others.

Four existing methods are introduced for comparison. The first is 3DNcut (Yu et al., 2015); the original implementation is provided by Shi and Malik (2000) and extended to 3D point clouds. To obtain better performance, the input point clouds are iteratively segmented into eight parts, and the parts consisting mostly of building points are combined manually as the extracted building points. The implementation of MinCut (Golovinskiy and Funkhouser, 2009) is taken from the Point Cloud Library (PCL) (Rusu and Cousins, 2011); the seeds and radius for MinCut are selected manually. Besides, we also compare our method with the multi-scale super-voxel grouping method (MSG) (Yang et al., 2015) and the voxel grouping method (VG) (Wang et al., 2016). It should be pointed out that no color information is used with MSG (Yang et al., 2015) in our test. The results of the four existing methods and our proposed method are shown in Fig. 12.

To quantitatively evaluate the performance of these methods, three indicators, completeness, correctness and F1 measure, are calculated. The completeness and correctness take the same form as Eq. (9); here, TP is the number of building points that are correctly extracted, FN is the number of missed building points, and FP is the number of non-building points falsely labeled as building points. The F1 measure is defined as

F_1 = \frac{2TP}{2TP + FN + FP}, \qquad (10)

which balances the completeness and correctness.
Table 2. Summary of the seven building samples.

Sample | Figure | Size (L/W/H, m) | Points | Running time (s)
Sample 1 | Fig. 12(a) | 14.5/13.7/6.8 | 40,893 | 4.26
Sample 2 | Fig. 12(b) | 18.6/16.8/11.1 | 71,540 | 4.54
Sample 3 | Fig. 12(c) | 17.2/7.5/9.9 | 31,231 | 3.31
Sample 4 | Fig. 12(d) | 15.0/13.8/10.4 | 264,409 | 30.7
Sample 5 | Fig. 12(e) | 30.9/14.6/11.4 | 187,641 | 19.49
Sample 6 | Fig. 12(f) | 14.9/13.0/5.2 | 53,375 | 6.62
Sample 7 | Fig. 12(g) | 16.2/16.1/13.5 | 46,025 | 4.95

Fig. 12. Examples of building extraction. Sub-figures (a)-(g): Samples 1-7. From left to right: ground truth, results of 3DNcut (Yu et al., 2015), MinCut (Golovinskiy and Funkhouser, 2009), MSG (Yang et al., 2015), VG (Wang et al., 2016), and our results (β = 1.0, γ = 1.0). Blue points are building points and green points are non-building points.

The completeness values of all results are shown in Fig. 13. Among the five methods, our proposed method achieves the highest completeness on all samples. High completeness indicates that most of the building points have been retrieved from the original point clouds. Compared with our algorithm, the other methods score much lower in terms of completeness. The completeness of MinCut is the highest among the four compared methods, but its results depend strongly on the foreground seed and the predefined object radius. The average completeness of 3DNCut (85.75%) is a little lower than that of MinCut (89.32%), and it also needs manual work when merging multiple segments. In rectangle #6 in Fig. 12(f), shrubs are closely adjacent to the walls, which strongly affects the classification results of the voxel-based features (i.e., linear, planar and volumetric): if most of the points in a super-voxel belong to vegetation, all the points in that super-voxel will be labeled as non-building points. The accuracy of the feature calculation can be improved by scale selection, as in VG (Wang et al., 2016), which yields better completeness on Sample 6, but this type of improvement is limited in complex scenes. For example, in rectangle #2 of Fig. 12(b), the roof points are still mistakenly labeled as non-building points by VG (Wang et al., 2016). Although our proposed method is also based on local geometric features, the graph-cut framework with shape priors is effective in overcoming the shortcomings of local features; for example, the roof points in Fig. 12(b) and the incomplete wall points in Fig. 12(f) are all labeled correctly. Besides, the predefined grouping rules in MSG and VG are not always valid, especially in complex residential areas. For instance, in VG (Wang et al., 2016), linear and planar structures are merged if their normal or principal directions differ by less than a small threshold (e.g., 10°). This rule may not be applicable when merging some small components of buildings, such as the low steps in rectangle #1 in Fig. 12(a), the porch in rectangle #3 in Fig. 12(c), and the eaves in rectangle #6 in Fig. 12(f). By comparison, our method achieves better results on buildings containing detailed structures.

The correctness values are shown in Fig. 14. High correctness indicates high precision of a building extraction method, i.e., a large percentage of actual building points among the extracted points. The correctness of both MSG and VG is over 95%. The main reason for the good correctness of the voxel-based methods is that, in most cases, only planar voxels are merged by the rules, and planar voxels mainly come from buildings in residential areas. In fact, the correctness of our method is slightly worse than that of MSG and VG. This is mainly because our method extracts more of a building's non-planar structures (e.g., balustrades) than the voxel-based techniques, which also increases the probability of labeling adjacent non-building objects as buildings. In general, the correctness of 3DNCut and MinCut is much worse than that of the other three methods, for three main reasons. First, local geometric features are not used in either method; thus, the prior knowledge that most buildings consist of planar structures is not utilized. Second, the extraction results of both methods depend heavily on human-computer interaction. For example, in rectangle #4 in Fig. 12(d), a large part of the building is mistakenly labeled as background by MinCut with a foreground radius of 12.0 m; by increasing the radius to 15.0 m, the points in the rectangle may be marked as building points, but the tree in the middle will also be mistakenly labeled as building. Third, small objects close to a building are often merged into it by these two methods; for instance, the low grass in rectangle #4 in Fig. 12(d) is entirely recognized as building points.

The balanced accuracy, the F1 measure, is given in Fig. 15. Our proposed method achieves the highest average F1 value (around 97.89%) among the five methods. The second highest F1 mean is over 92%, by VG (Wang et al., 2016). The F1 values of MSG (Yang et al., 2015) and MinCut (Golovinskiy and Funkhouser, 2009) also reach nearly 90% on these samples. The lowest average F1, 87%, is by 3DNcut (Yu et al., 2015). In general, our proposed method has the best performance on these residential building samples.
residential area with dense vegetation. Compared with assumptions
used in the existing methods, such as no vegetations exist between
adjacent buildings (Gao and Yang, 2013) and few roof points are recorded by MLS (Wang et al., 2016), our assumption that buildings
consist of vertical walls is more general.
4.3. Building instances extraction
4.3.1. Experimental results
The performance of the proposed building extraction method is
evaluated with seven typical residential building samples. Sample
name, size, number of points and executed time of our method are listed
in Table 2. According to the running time listed in the last column, the
speed of our method is highly related to the number of points. In fact,
most of the time is spent on the initialization step which includes feature calculation, finding anchor points and graph construction while the
α -expansion costs little time. For instance, the initialization of Sample 4
is around 26.0 s and only 4.2 s are used in optimization.
The manually labeled ground truth is shown in the first column of Fig. 12. These samples form a representative group of residential building point clouds acquired by mobile LiDAR. The scenes contain different kinds of surroundings: shrubs and conifers stand against the walls in Fig. 12(a), while in Fig. 12(b) walls are surrounded by bushes and roofs touch tall vegetation. The buildings also vary in size and structure. For example, Fig. 12(c) shows a rectangular house with a porch, Fig. 12(d) a square building, and Fig. 12(e) a one-story large-area house. Besides, the sample point clouds differ in point density and degree of incompleteness; large parts of the walls in Fig. 12(f) are missing, and the point density in Fig. 12(g) is lower than in the others.
Four existing methods are used for comparison. The first is 3DNcut (Yu et al., 2015); the original implementation by Shi and Malik (2000) is extended to 3D point clouds. To obtain better performance, the input point clouds are iteratively segmented into eight parts, and the parts consisting mostly of building points are manually combined into the extracted building points. The implementation of Mincut (Golovinskiy and Funkhouser, 2009) comes from the Point Cloud Library (PCL) (Rusu and Cousins, 2011); the seeds and the radius for Mincut are selected manually. Besides, we also compare our method with the multi-scale super-voxel grouping method (MSG) (Yang et al., 2015) and the voxel grouping method (VG) (Wang et al., 2016). It should be pointed out that no color information is used with MSG (Yang et al., 2015) in our test. The results of the four existing methods and our proposed method are shown in Fig. 12.
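For readers unfamiliar with the Mincut baseline, the following Python sketch illustrates the idea in the spirit of Golovinskiy and Funkhouser (2009): smoothness edges on a k-nearest-neighbor graph plus a radius-based background penalty, solved by a minimum cut. It is a simplified stand-in, not the PCL implementation used in our tests; the weighting scheme and parameter names are our assumptions.

    import numpy as np
    import networkx as nx
    from scipy.spatial import cKDTree

    def mincut_segment(points, seed, radius, sigma=0.25, bg_weight=1.0, k=8):
        # Smoothness edges: k-NN graph with Gaussian weights on point distance.
        tree = cKDTree(points)
        dist, idx = tree.query(points, k=k + 1)
        g = nx.Graph()
        n = len(points)
        for i in range(n):
            for d, j in zip(dist[i, 1:], idx[i, 1:]):
                g.add_edge(i, int(j), capacity=float(np.exp(-(d / sigma) ** 2)))
        # Hard foreground seed, and a background penalty that grows with the
        # distance from the seed relative to the expected object radius.
        center_dist = np.linalg.norm(points - points[seed], axis=1)
        g.add_edge("src", seed, capacity=float("inf"))
        for i in range(n):
            g.add_edge(i, "sink", capacity=bg_weight * center_dist[i] / radius)
        _, (src_side, _) = nx.minimum_cut(g, "src", "sink")
        return np.array(sorted(i for i in src_side if i != "src"))

The radius sensitivity discussed above for rectangle #4 in Fig. 12(d) corresponds to varying `radius` here: a small radius pushes distant building points to the background, while a large radius pulls nearby trees into the foreground.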
To quantitatively evaluate the performance of these methods, three indicators are calculated: completeness, correctness, and the F1 measure. Completeness and correctness take the same form as Eq. (9). TP is the number of building points that are correctly extracted, FN is the number of missed building points, and FP is the number of non-building points falsely labeled as building points. The F1 measure is defined as

F1 = 2TP / (2TP + FN + FP).    (10)
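In code, the three indicators reduce to a few lines. The completeness and correctness formulas below follow the standard recall/precision form that Eq. (9) (not reproduced in this excerpt) takes; only the TP, FN, and FP counts are needed.

    def evaluate(tp: int, fn: int, fp: int) -> dict:
        # Completeness (recall): fraction of ground-truth building points found.
        completeness = tp / (tp + fn)
        # Correctness (precision): fraction of extracted points that are building.
        correctness = tp / (tp + fp)
        # F1, Eq. (10): harmonic mean of completeness and correctness.
        f1 = 2 * tp / (2 * tp + fn + fp)
        return {"completeness": completeness, "correctness": correctness, "f1": f1}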
Fig. 12. Examples of building extraction. Sub-figures (a)–(g): Sample 1–Sample 7. From left to right: ground truth, results of 3DNcut (Yu et al., 2015), results of Mincut (Golovinskiy and Funkhouser, 2009), results of MSG (Yang et al., 2015), results of VG (Wang et al., 2016), and our results (β = 1.0, γ = 1.0). Blue points are building points and green points are non-building points. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 13. Completeness of extraction results using different methods.
Fig. 14. Correctness of extraction results using different methods.
Fig. 15. F1 measure of extraction results using different methods.
Fig. 16. Evaluation of different parameter settings of β and γ. The Z axis is the mean F1 measure over the seven samples.
The balanced accuracy F1 measure is given in Fig. 15. Our proposed method achieves the highest average F1 value (around 97.89%) among the five methods. The second highest F1 mean, over 92%, is obtained by VG (Wang et al., 2016). The F1 values of MSG (Yang et al., 2015) and Mincut (Golovinskiy and Funkhouser, 2009) also reach nearly 90% on these samples, while the lowest average F1, 87%, is obtained by 3DNcut (Yu et al., 2015). In general, our proposed method performs best on these residential building samples. Voxel-based methods can also achieve good results when the local features are accurately estimated and the predefined voxel grouping rules are correct. The performance of the 3DNcut and Mincut methods is not as good as that of the other three methods, but they are more flexible in handling various scenes and can always output acceptable results; in this building extraction task, the average F1 measures of these two methods are both over 86%.
According to the experiments, our proposed method also has limitations. If the shape prior is wrong or unavailable, the extraction results are affected. For instance, the planar walls in rectangle #7 in Fig. 12(g) failed to be extracted because no shape prior was available in that region. Besides, if non-building objects with flat surfaces (e.g., cars, bushes, trunks) are very close to the house, they may be mislabeled as buildings. For instance, although the roof points in rectangle #2 are correctly recognized as building parts, the regular hedges in rectangle #8 in Fig. 12(b) are mislabeled as parts of the building.
4.3.2. Parameter tuning
There are two parameters, β and γ, in the final formulation of the energy function in Eq. (8). In the above examples, the values of β and γ were both fixed to 1.0. To analyze the influence of the parameters as well as the importance of each term, the ranges of β and γ are both set to [0, 2.0] with an interval of 0.1. The upper limit of 2.0 is selected because the segmentation accuracy decreases when larger values are used. The proposed building extraction algorithm is executed with all combinations of these parameters on the seven samples, and the results are evaluated against the ground truth. Finally, the average F1 measure over all samples is calculated and drawn as a mesh in Fig. 16. According to Fig. 16, the lowest F1 is close to 85% when no spatial relationship between points is considered, i.e., when β and γ are both zero. The F1 increases with β and γ and reaches a plateau of high values in the middle of the mesh. The highest F1 measure, 97.99%, is achieved when β and γ are both 0.9. Beyond this point, the F1 value decreases as β and γ increase; for example, increasing β and γ to 2.0 drops the average F1 to 96.67%, and with γ fixed at 0.9, increasing β to 2.0 drops the F1 to 97.5%. In this paper, we set β and γ to 1.0 for simplicity, and a high F1 measure of 97.89% is achieved.
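The grid evaluation itself is a plain exhaustive sweep. A minimal sketch follows, assuming a segmentation routine extract_building(points, beta, gamma) implementing the method of Section 3.3 and an f1(pred, gt) helper implementing Eq. (10); both names are hypothetical.

    import itertools
    import numpy as np

    def tune_parameters(samples, extract_building, f1):
        # 21 x 21 grid: beta, gamma in [0, 2.0] with a step of 0.1.
        grid = np.round(np.arange(0.0, 2.0 + 1e-9, 0.1), 1)
        mean_f1 = np.zeros((grid.size, grid.size))
        for (i, beta), (j, gamma) in itertools.product(
                enumerate(grid), enumerate(grid)):
            # Average the per-sample F1 over the (points, ground truth) pairs.
            scores = [f1(extract_building(pts, beta, gamma), gt)
                      for pts, gt in samples]
            mean_f1[i, j] = np.mean(scores)
        return grid, mean_f1  # mean_f1 is the mesh drawn in Fig. 16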
In Fig. 16, the importance of the shape term (the last term in Eq. (8)) is obvious. If γ is set to zero, which means the last term in Eq. (8) is discarded, the mean F1 first increases steadily with β and then stays at around 94%. By increasing the weight of the shape term, however, the F1 value quickly exceeds 97%. Although the F1 increment is relatively small (about three percentage points), most of the improvement occurs at detailed structures such as porches and walls surrounded by other objects. This indicates that considering the shape prior derived from segments improves building extraction considerably, especially for incomplete buildings with complex surroundings. A more pronounced example is given in Fig. 17, which shows how the F1 value of Sample 6 (Fig. 12(f)) changes with different parameters: the highest F1 without the shape term (γ = 0) is 90.8%, achieved at β = 1.4, while the F1 rapidly exceeds 98.0% once γ is set to 0.1.
Fig. 17. Performance of the proposed method with different parameters on Sample 6. The Z axis is the F1 measure of Sample 6.
Compared to the shape term, the second term in Eq. (8) appears less important. For example, with β set to zero, an F1 as high as 97.4% is achieved with the shape term alone (γ = 0.9), and the F1 increases to 97.9% when β is set to 1.0. This improvement (around 0.5 percentage points) is not significant in point cloud processing, and the term could be discarded to reduce model complexity. However, the existence of the second term in Eq. (8) guarantees a more reliable model when shape priors from segments are unavailable or even wrong. Therefore, we keep this term in our final formulation in Eq. (8).
4.3.3. Large-scale results
Our building extraction method is also tested on both datasets. The point-by-point accuracy of dividing non-ground points into groups in Section 3.2 is not evaluated in this paper due to the difficulty of defining the ground truth; for example, it is hard to determine the ownership of fences shared by adjacent buildings in point clouds. If the building dividing step is evaluated at the object level, the 327 correctly localized buildings are all separated correctly in the Calgary dataset.
Fig. 18. Instance-level building extraction overview for the Calgary dataset. Orange points are ground. Green points are non-building objects. Individual buildings are randomly colored. To reduce the huge data volume, the ground points shown in the overview are down-sampled. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
The remaining 48 buildings that failed in the detection step are manually clipped from the original data and serve as the input for the building extraction algorithm.
In this paper, we focus on the performance of the proposed segmentation-based building extraction method in Section 3.3. It took about 1.5 h to extract all buildings from the MLS point clouds in the Calgary dataset and about 20 min in the Kentucky dataset. The building extraction results of the Calgary dataset are shown in Fig. 18 and the results of the Kentucky dataset are shown in Fig. 19. By examining the two overviews and the enlarged details, we find that most building points are correctly extracted from their surroundings, regardless of building shapes and cluttered vegetation.
As evaluating the whole dataset is extremely time-consuming and labor-intensive, two relatively small scenes in the Calgary dataset, named Scene1 and Scene2, are selected to estimate the performance of our building extraction method in large-scale scenes. Scene1 covers an area of 161 m by 55 m and consists of 22,618,943 points. Scene2 covers an area of 294 m by 103 m and consists of 21,423,155 points. There are 56 buildings in these two scenes: nine from Scene1 and 47 from Scene2. The results of Scene1 are shown in Fig. 20 and those of Scene2 in Fig. 21. The average completeness is 99.1%, the mean correctness is 97.9%, and the mean F1 measure equals 98.6%. Furthermore, we also randomly select ten building instances from the Kentucky dataset and evaluate the accuracy: the mean completeness is 98.83%, the mean correctness is 96.76%, and the F1 measure equals 97.78%. The accuracy in the Kentucky dataset is slightly lower than in the Calgary dataset, mainly because many vehicles are parked close to the buildings in the Kentucky scene.
In Fig. 20, buildings are large and most vegetation is not adjacent to walls. By visually inspecting the details, most of the extraction results are correct. In the scene of Fig. 21, buildings vary in size and shape. Although vegetation is much closer to the buildings than in Fig. 20, our method is still able to extract most buildings correctly.
There exist problems in the large-scale tests. For example, in the enlarged view in Fig. 20, the building points indicated by the dashed circle are mis-recognized as ground points during the ground filtering step. Another example lies in the dashed circle in the enlarged rectangle #1 in Fig. 21, where a car very close (less than 0.5 m) to the building is mislabeled as a building part. One potential way to solve this problem is to further segment the building points into smaller clusters and then identify the class of each cluster. In summary, these experiments demonstrate that our proposed building extraction method achieves high-quality building extraction results at the instance level.
Fig. 19. Instance-level building extraction overview of the Kentucky dataset. Colors are the same as those in Fig. 18. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
4.4. Discussion
In this study, retrieving building instances from mobile LiDAR point clouds is treated as a specific case of instance-level object extraction. Generally, instance-level object extraction can be decomposed into two subtasks, object localization and semantic segmentation (Golovinskiy et al., 2009; Liang et al., 2017), and its final output should be separated objects with class labels. If our goal were merely extracting building regions from mobile LiDAR point clouds, localization could only have adverse effects on the final results. However, the goal of our research is extracting building instances, so localization is the key to dividing building regions into individual instances, or to merging discrete and unorganized building points into independent building objects.
There are other ways to group building points into individual building instances. For example, Wang et al. (2016) propose a rule-based method which merges potential building points into independent instances. However, their low-rise building detection rate (86.3%) is lower than ours (95.89% and 100% on the two datasets). More importantly, the performance of rule-based instance extraction methods depends on two factors. The first is the predefined rules, which are difficult to design because real-world buildings have various structures. The second is the accuracy of building point recognition: wrongly labeled points reduce the grouping accuracy. For example, if non-building points between two buildings are falsely marked as buildings, the two buildings are likely to be merged into one instance. In fact, building point labeling is still a challenging task. The state-of-the-art accuracy of building point labeling in ground-based LiDAR point clouds is approximately between 85% and 95% (Wang et al., 2015; Weinmann et al., 2015), depending on many factors such as data quality and scene complexity. Even if the semantic labeling of building points were infallible, i.e., the overall accuracy were 100%, dividing building points into instances would still be challenging. For example, a distance-based clustering method may be used to separate adjacent building instances. In Fig. 22(a), although the distance between the closest walls of two adjacent buildings is over 2.0 m, we have to use a smaller clustering threshold (< 0.4 m) to separate them because of a small protruding component between them. However, for other buildings, a grouping threshold below 0.4 m may result in over-detection, which means that one building instance is divided into several clusters; for example, the building in Fig. 22(b) has a gap of 0.82 m due to occlusion. It is also difficult to retrieve building instances when buildings are connected by components such as enclosing walls or adjacent eaves. For example, the buildings in Fig. 22(c) are connected by enclosure walls (not building outer walls), which results in one merged building with a rule-based or clustering method. In contrast, our proposed building localization method deals with all these situations without tuning parameters or introducing specific rules.
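The single-threshold dilemma can be reproduced with a few lines of distance-based clustering. The sketch below, which treats clusters as connected components of a fixed-radius neighborhood graph, is our illustration of the generic method discussed here, not a method from this paper.

    import numpy as np
    from scipy.spatial import cKDTree
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import connected_components

    def euclidean_cluster(points, threshold):
        # Connect every pair of points closer than `threshold` (metres) ...
        pairs = cKDTree(points).query_pairs(r=threshold, output_type="ndarray")
        n = len(points)
        graph = csr_matrix((np.ones(len(pairs)), (pairs[:, 0], pairs[:, 1])),
                           shape=(n, n))
        # ... and label the connected components as clusters.
        _, labels = connected_components(graph, directed=False)
        return labels

    # No single threshold works for Fig. 22: a threshold below 0.4 m is needed
    # to split the adjacent houses in (a), yet it also splits the single house
    # in (b) across its 0.82 m occlusion gap.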
There are three main advantages of localizing buildings before extracting building points. First, knowing the positions of building instances, non-building objects far away from the potential buildings, such as vehicles on the street, can be removed at an early stage, which largely reduces the computational burden. Second, if the input for the graph-based segmentation were the large-scale point cloud itself, a graph consisting of tens of millions of vertices would have to be constructed, which is infeasible to process on most desktop computers. To solve this problem, we divide the original point clouds into small groups based on the localization results, which reduces the problem size and improves efficiency. Third, the shape prior is used in our proposed segmentation algorithm; as different buildings often have different structural priors, shape information from adjacent buildings can be excluded with the help of localization.
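As an illustration of the second advantage, cropping the scene to a localized rectangle before graph construction is a one-liner per building. The rectangle representation (center, unit axes, half sizes) and the buffer value below are assumptions about the localization output, not this paper's exact interface.

    import numpy as np

    def crop_to_rectangle(points, center, axes, half_sizes, buffer=2.0):
        # Project XY offsets onto the rectangle's two unit axes ...
        rel = points[:, :2] - center
        u, v = rel @ axes[0], rel @ axes[1]
        # ... and keep points inside the rectangle enlarged by `buffer` metres.
        inside = (np.abs(u) <= half_sizes[0] + buffer) & \
                 (np.abs(v) <= half_sizes[1] + buffer)
        return points[inside]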
One major limitation of the proposed framework is that it cannot separate buildings which are connected by outer walls, such as façades. These situations often appear in downtown areas, and such buildings may have only one wall scanned by mobile LiDAR due to occlusions.
Fig. 20. Building extraction details in Scene1. Orange points are ground. Non-building points are colored green. Building points from different instances are randomly colored. Building points in the circle are mislabeled as ground after ground filtering. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 21. Building extraction details in Scene2. Orange points are ground. Non-building points are colored green. Building points from different instances are randomly colored. In the circled region, car points are mislabeled as building. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 22. Challenges of building instance extraction in different scenes. (a) A protruding component between two adjacent building instances. (b) Gaps within one building due to occlusions. (c) Building instances connected by enclosure walls. Colors are the same as those in Fig. 18. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
In fact, partitioning connected façades into instances is a complicated problem, and a different approach, such as a parsing-based algorithm, may be needed. In the work of Martinovic et al. (2015), façade images are first classified into semantic regions such as windows, walls, and doors based on color and geometric features, and the façade separation is then turned into a multi-label optimization problem. In the field of mobile LiDAR point cloud processing, Hammoudi et al. (2010) divide façade point clouds into building instances with the help of existing cadastral maps. Serna et al. (2016) propose a city-block-level façade segmentation method based on influence zone analysis and test it on an urban building dataset acquired by mobile LiDAR (Vallet et al., 2015), but how to divide block-level façades into building instances is not discussed. In summary, extracting instance-level buildings directly from façade point clouds is a challenging task and remains unsolved; tackling it requires new methods, which may resemble façade parsing studies (Shen et al., 2011).
In short, the proposed building localization method divides building regions into building instances. Existing approaches, such as rule-based and clustering methods, do not perform well in many real-world situations and cannot process connected façades. Moreover, our building localization method also provides a way to detect buildings from raw point clouds without supervised classification, and the building positions can be used to extract instances from classified point clouds. To deal with connected building instances such as façades, methods based on façade parsing may be developed in the future.
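To make the hypothesis-and-selection idea concrete, the following is a schematic sketch that selects non-overlapping rectangle proposals by maximizing the total proposal score as a binary program. The score vector, the pairwise overlap list, and the use of scipy.optimize.milp are illustrative assumptions; the formulation in this paper is solved with Gurobi (Gurobi Optimization, 2016) and differs in its exact objective and constraints.

    import numpy as np
    from scipy.optimize import Bounds, LinearConstraint, milp

    def select_rectangles(scores, overlaps):
        scores = np.asarray(scores, dtype=float)
        n = scores.size
        if not overlaps:  # no conflicts: keep every positively scored proposal
            return np.flatnonzero(scores > 0)
        # One constraint row per conflicting pair (i, j): x_i + x_j <= 1.
        rows = np.zeros((len(overlaps), n))
        for r, (i, j) in enumerate(overlaps):
            rows[r, i] = rows[r, j] = 1.0
        res = milp(c=-scores,                # milp minimizes, so negate scores
                   integrality=np.ones(n),   # integer variables ...
                   bounds=Bounds(0, 1),      # ... restricted to {0, 1}
                   constraints=LinearConstraint(rows, ub=np.ones(len(overlaps))))
        return np.flatnonzero(res.x > 0.5)   # indices of the kept rectangles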
5. Conclusion

Building instance extraction from MLS point clouds in residential areas faces several challenges, such as separating adjacent buildings and extracting buildings from cluttered vegetation. In this paper, we propose a "localization then segmentation" framework which solves most of these problems and achieves instance-level building extraction. Building localization is turned into the problem of finding rectangles formed by projected vertical wall segments, and a hypothesis-and-selection strategy is proposed to approach it: hundreds of rectangle proposals are first generated from vertical walls, and the selection of rectangle hypotheses is then formulated as an energy maximization problem solved by linear programming. The building detection results demonstrate that our method can localize buildings in dense and complex residential areas with high accuracy. To extract building points from complex surroundings, we propose a foreground-background segmentation method which integrates local geometric features and planar shape priors derived from segments into an energy model, which is finally minimized by graph cuts. The experimental results show the advantages of our method, especially when walls are incomplete or intimately connected with other objects; this is mainly contributed by the proposed shape term in the objective function. We also argue that the use of shape priors can improve the performance of point cloud segmentation in other applications.

Our methods still have some limitations. In the building localization step, multiple detections occur, and buildings with only one detected wall segment are easily missed. Our building extraction method has difficulty distinguishing planar non-building objects close to walls, and for non-planar building structures a shape prior may not be available. Besides, our methods cannot extract building instances whose walls are spatially connected, such as façades in urban areas. Future work will therefore focus on reducing multiple detections, finding advanced shape priors for complex structures, and developing instance-level segmentation methods for connected buildings.

Acknowledgments

The first author is supported by the China Scholarship Council and the University of Calgary. This work is partially supported by the Natural Sciences and Engineering Research Council (NSERC). We would like to thank the City of Calgary Council and Dr. Ruigang Yang for providing the mobile LiDAR data.

References
Aijazi, A.K., Checchin, P., Trassoudaine, L., 2013. Segmentation based classification of 3d
urban point clouds: a super-voxel based approach with evaluation. Rem. Sens. 5 (4),
1624–1650.
Awrangjeb, M., Ravanbakhsh, M., Fraser, C.S., 2010. Automatic detection of residential
buildings using lidar data and multispectral imagery. ISPRS J. Photogramm. Rem.
Sens. 65 (5), 457–467.
Belton, D., Lichti, D.D., 2006. Classification and segmentation of terrestrial laser scanner
point clouds using local variance information. Int. Arch. Photogram., Rem. Sens.
Spatial Inform. Sci. 36 (Part 5), 44–49.
Boykov, Y., Kolmogorov, V., 2004. An experimental comparison of min-cut/max-flow
algorithms for energy minimization in vision. IEEE Trans. Pattern Anal. Mach. Intell.
26 (9), 1124–1137.
Boykov, Y., Veksler, O., Zabih, R., 2001. Fast approximate energy minimization via graph
cuts. IEEE Trans. Pattern Anal. Mach. Intell. 23 (11), 1222–1239.
Boykov, Y.Y., Jolly, M.-P., 2001. Interactive graph cuts for optimal boundary & region segmentation of objects in nd images. In: Proceedings. Eighth IEEE International Conference on Computer Vision, 2001. ICCV 2001, vol. 1. IEEE, pp. 105–112.
Che, E., Olsen, M.J., 2018. Multi-scan segmentation of terrestrial laser scanning data based on normal variation analysis. ISPRS J. Photogramm. Rem. Sens.
Chen, D., Wang, R., Peethambaran, J., 2017. Topologically aware building rooftop reconstruction from airborne laser scanning point clouds. IEEE Trans. Geosc. Rem. Sens. 55 (12), 7032–7052.
Cheng, L., Xu, H., Li, S., Chen, Y., Zhang, F., Li, M., 2018. Use of lidar for calculating solar irradiance on roofs and façades of buildings at city scale: methodology, validation, and analysis. ISPRS J. Photogramm. Rem. Sens. 138, 12–29.
Cote, M., Saeedi, P., 2013. Automatic rooftop extraction in nadir aerial imagery of suburban regions using corners and variational level set evolution. IEEE Trans. Geosci. Rem. Sens. 51 (1), 313–328.
Demantké, J., Mallet, C., David, N., Vallet, B., 2011. Dimensionality based scale selection in 3d lidar point clouds. Int. Arch. Photogramm., Rem. Sens. Spatial Inform. Sci. 38 (Part 5), W12.
Deng, H., Zhang, L., Mao, X., Qu, H., 2016. Interactive urban context-aware visualization via multiple disocclusion operators. IEEE Trans. Visual. Comput. Graphics 22 (7), 1862–1874.
Fan, H., Yao, W., Tang, L., 2014. Identifying man-made objects along urban road corridors from mobile lidar data. IEEE Geosci. Rem. Sens. Lett. 11 (5), 950–954.
Freedman, D., Zhang, T., 2005. Interactive graph cut based segmentation with shape priors. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005. CVPR 2005, vol. 1. IEEE, pp. 755–762.
Gao, J., Yang, R., 2013. Online building segmentation from ground-based lidar data in urban scenes. In: 2013 International Conference on 3D Vision-3DV 2013. IEEE, pp. 49–55.
Golovinskiy, A., Funkhouser, T., 2009. Min-cut based segmentation of point clouds. In: 2009 IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops). IEEE, pp. 39–46.
Golovinskiy, A., Kim, V.G., Funkhouser, T., 2009. Shape-based recognition of 3d point clouds in urban environments. In: 2009 IEEE 12th International Conference on Computer Vision. IEEE, pp. 2154–2161.
Guan, H., Li, J., Cao, S., Yu, Y., 2016. Use of mobile lidar in road information inventory: a review. Int. J. Image Data Fusion 7 (3), 219–242.
Gurobi Optimization, I., 2016. Gurobi optimizer reference manual. URL <http://www.gurobi.com>
Hammoudi, K., Dornaika, F., Soheilian, B., Paparoditis, N., 2010. Extracting wire-frame models of street facades from 3d point clouds and the corresponding cadastral map. IAPRS 38 (Part 3A), 91–96.
Hernández, J., Marcotegui, B., 2009. Point cloud segmentation towards urban ground modeling. In: Urban Remote Sensing Event, 2009 Joint. IEEE, pp. 1–5.
Kang, J., Körner, M., Wang, Y., Taubenböck, H., Zhu, X.X., 2018. Building instance classification using street view images. ISPRS J. Photogramm. Rem. Sens.
Klasing, K., Wollherr, D., Buss, M., 2008. A clustering method for efficient segmentation of 3d laser data. In: IEEE International Conference on Robotics and Automation, 2008. ICRA 2008. IEEE, pp. 4043–4048.
Kolmogorov, V., Zabin, R., 2004. What energy functions can be minimized via graph cuts? IEEE Trans. Pattern Anal. Mach. Intell. 26 (2), 147–159.
Lafarge, F., Descombes, X., Zerubia, J., Pierrot-Deseilligny, M., 2008. Automatic building extraction from dems using an object approach and application to the 3d-city modeling. ISPRS J. Photogramm. Rem. Sens. 63 (3), 365–381.
Li, M., Nan, L., Liu, S., 2016. Fitting boxes to manhattan scenes using linear integer programming. Int. J. Digital Earth 9 (8), 806–817.
Liang, X., Lin, L., Wei, Y., Shen, X., Yang, J., Yan, S., 2017. Proposal-free network for instance-level semantic object segmentation. IEEE Trans. Pattern Anal. Mach. Intell.
Lin, C., Nevatia, R., 1998. Building detection and description from a single intensity image. Comput. Vis. Image Understanding 72 (2), 101–121.
Liqiang, Z., Hao, D., Dong, C., Zhen, W., 2013. A spatial cognition-based urban building clustering approach and its applications. Int. J. Geogr. Inform. Sci. 27 (4), 721–740.
Liu, C., Shi, B., Yang, X., Li, N., Wu, H., 2013. Automatic buildings extraction from lidar data in urban area by neural oscillator network of visual cortex. IEEE J. Sel. Top. Appl. Earth Observations Rem. Sens. 6 (4), 2008–2019.
Maalek, R., Lichti, D.D., Ruwanpura, J.Y., 2018. Robust segmentation of planar and linear features of terrestrial laser scanner point clouds acquired from construction sites. Sensors 18 (3), 819.
Martinovic, A., Knopp, J., Riemenschneider, H., Van Gool, L., 2015. 3d all the way: Semantic segmentation of urban scenes from start to end in 3d. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4456–4465.
Nurunnabi, A., Belton, D., West, G., 2016. Robust segmentation for large volumes of laser
scanning three-dimensional point cloud data. IEEE Trans. Geosc. Rem. Sens. 54 (8),
4790–4805.
Ok, A.O., Senaras, C., Yuksel, B., 2013. Automated detection of arbitrarily shaped
buildings in complex environments from monocular vhr optical satellite imagery.
IEEE Trans. Geosci. Rem. Sens. 51 (3), 1701–1717.
Pu, S., Vosselman, G., 2009. Knowledge based reconstruction of building models from
terrestrial laser scanning data. ISPRS J. Photogramm. Rem. Sens. 64 (6), 575–584.
Qin, R., Huang, X., Gruen, A., Schmitt, G., 2015. Object-based 3-d building change detection on multitemporal stereo images. IEEE J. Sel. Top. Appl. Earth Observ. Rem.
Sens. 8 (5), 2125–2137.
Rother, C., Kolmogorov, V., Blake, A., 2004. Grabcut: interactive foreground extraction
using iterated graph cuts. In: ACM Transactions on Graphics (TOG), vol. 23. ACM, pp.
309–314.
Rusu, R.B., Cousins, S., 2011. 3d is here: Point cloud library (pcl). In: 2011 IEEE
International Conference on Robotics and automation (ICRA). IEEE, pp. 1–4.
Rutzinger, M., Höfle, B., Oude Elberink, S., Vosselman, G., 2011. Feasibility of facade
footprint extraction from mobile laser scanning data. PhotogrammetrieFernerkundung-Geoinformation 2011 (3), 97–107.
Serna, A., Marcotegui, B., 2014. Detection, segmentation and classification of 3d urban
objects using mathematical morphology and supervised learning. ISPRS J.
Photogramm. Rem. Sens. 93, 243–255.
Serna, A., Marcotegui, B., Hernández, J., 2016. Segmentation of façades from urban 3d
point clouds using geometrical and morphological attribute-based operators. ISPRS
Int. J. Geo-Inform. 5 (1), 6.
Shen, C.-H., Huang, S.-S., Fu, H., Hu, S.-M., 2011. Adaptive partitioning of urban facades.
In: ACM Transactions on Graphics (TOG), vol. 30. ACM, pp. 184.
Shi, J., Malik, J., 2000. Normalized cuts and image segmentation. IEEE Trans. Pattern
Anal. Mach. Intell. 22 (8), 888–905.
Vallet, B., Brédif, M., Serna, A., Marcotegui, B., Paparoditis, N., 2015. Terramobilita/
iqmulus urban point cloud analysis benchmark. Comput. Graph. 49, 126–133.
Veksler, O., 2008. Star shape prior for graph-cut image segmentation. Comput. Vis.–ECCV
2008, 454–467.
Wang, Y., Cheng, L., Chen, Y., Wu, Y., Li, M., 2016. Building point detection from vehicleborne lidar data based on voxel group and horizontal hollow analysis. Rem. Sens. 8
(5), 419.
Wang, Z., Zhang, L., Fang, T., Mathiopoulos, P.T., Tong, X., Qu, H., Xiao, Z., Li, F., Chen,
D., 2015. A multiscale and hierarchical feature extraction method for terrestrial laser
scanning point cloud classification. IEEE Trans. Geosc. Rem. Sens. 53 (5), 2409–2425.
Weinmann, M., Jutzi, B., Hinz, S., Mallet, C., 2015. Semantic point cloud interpretation
based on optimal neighborhoods, relevant features and efficient classifiers. ISPRS J.
Photogramm. Rem. Sens. 105, 286–304.
Xiao, J., Gerke, M., Vosselman, G., 2012. Building extraction from oblique airborne
imagery based on robust façade detection. ISPRS J. Photogramm. Rem. Sens. 68,
56–68.
Xu, S., Wang, R., Zheng, H., 2017. Lidar point cloud segmentation via minimum-cost
perfect matching in a bipartite graph. arXiv preprint arXiv:1703.02150.
Yang, B., Dong, Z., Zhao, G., Dai, W., 2015. Hierarchical extraction of urban objects from
mobile laser scanning data. ISPRS J. Photogramm. Rem. Sens. 99, 45–57.
Yang, B., Wei, Z., Li, Q., Li, J., 2012. Automated extraction of street-scene objects from
mobile lidar point clouds. Int. J. Rem. Sens. 33 (18), 5839–5861.
Yang, B., Xu, W., Dong, Z., 2013. Automated extraction of building outlines from airborne
laser scanning point clouds. IEEE Geosci. Rem. Sens. Lett. 10 (6), 1399–1403.
Yu, B., Liu, H., Wu, J., Hu, Y., Zhang, L., 2010. Automated derivation of urban building
density information using airborne lidar data and object-based method. Landscape
Urban Plann. 98 (3), 210–219.
Yu, Y., Li, J., Guan, H., Wang, C., Yu, J., 2015. Semiautomated extraction of street light
poles from mobile lidar point-clouds. IEEE Trans. Geosci. Rem. Sens. 53 (3),
1374–1386.
Zhang, K., Yan, J., Chen, S.-C., 2006. Automatic construction of building footprints from
airborne lidar data. IEEE Trans. Geosci. Rem. Sens. 44 (9), 2523–2533.
Zhang, W., Qi, J., Wan, P., Wang, H., Xie, D., Wang, X., Yan, G., 2016. An easy-to-use
airborne lidar data filtering method based on cloth simulation. Rem. Sens. 8 (6), 501.
Zheng, H., Wang, R., Xu, S., 2017. Recognizing street lighting poles from mobile lidar
data. IEEE Trans. Geosci. Rem. Sens. 55 (1), 407–420.