HYPERSPECTRAL INK DETECTION ICCST2014 ID46

Conference Paper · September 2014
DOI: 10.1109/CCST.2014.6986980
The use of Hyperspectral Analysis for Ink
Identification in Handwritten Documents
Aythami Morales¹, Miguel A. Ferrer¹, Moises Diaz-Cabrera¹, Cristina Carmona¹, Gordon L. Thomas²
¹Instituto Universitario para el Desarrollo Tecnológico y la Innovación en Comunicaciones (IDeTIC), Universidad de Las Palmas de Gran Canaria, Campus de Tafira s/n, E35017, Las Palmas de Gran Canaria, Spain
²Independent consultant and author, UK
Abstract - Hyperspectral analysis is employed in many different
areas, such as medicine, environmental studies, security and
forensics. Focusing on law enforcement, ink discrimination has
become an important factor for the detection of fraudulent
documents. This paper proposes an approach for ink analysis in
handwritten documents and pen verification using hyperspectral
analysis and Least Squares SVM classification. The proposed
method obtains immediate results in a non-contact way from the
document or test sample. The first step is to determine the best
possible lighting conditions. Then a detailed study is made of
components and properties of the ink and pens used. This paper
proposes a classification method based on the hyperspectral
characteristics of the ink derived from its physical properties.
Furthermore, a database of hyperspectral curves of several types
of inks is made, which is used to obtain the characteristics of
different inks. The proposed method for automated ink type
identification is tested using 25 different pens and more than
1000 samples. The achieved discrimination between types of ink
was 87.5%. The experimental protocol includes three different
scenarios.
Keywords - ink identification, pen verifier, hyperspectral analysis, handwritten document analysis, forensics.
I. INTRODUCTION
The analysis of inks, particularly in Document
Examination, is of great importance. The ink type and
temporal factors can be important evidence in criminal
prosecutions [1].
There are many document analysis techniques, perhaps as many as there are techniques for forging documents. Documentoscopy is the area of knowledge aimed at determining the authenticity of a document, its authorship, structure and content [2], a document being defined as any medium capable of hosting graphical content, either printed or handwritten.
According to the Spanish Directorate General of Police, the Document Examination section of the Forensic Science Department has the following equipment available to study a document's authorship: 1. A binocular microscope or magnifying loupe for examining the finer details of documents; the system usually incorporates a photographic camera. 2. An infrared microscope for the spatial analysis of inks, which allows certain pigmented inks to be optically removed, thereby revealing traces produced by others. 3. Profile projectors for precision measurements. 4. A videospectral comparator for the optical analysis of ink reflection under different lighting conditions and wavelengths (UV, IR, and green and blue filters of different wavelengths). 5. Fluotest, for observing luminescence under UV light of different wavelengths. As can be seen, all the methods used are non-contact, so as not to interfere with the evidence.
There are other non-contact techniques, such as colour analysis, absorption spectrum analysis, examination under ultraviolet radiation, infrared radiation detection and infrared absorption. The contact techniques are chromatography (either thin-layer or high-performance liquid chromatography) and the use of test chemicals [3][4].
There are also more technologically advanced techniques
that require more complex instrumentation, for instance,
specialised spectroscopic techniques for studying the
interaction between electromagnetic radiation and the ink.
These techniques include FTIR (Fourier Transform InfraRed), Raman Spectroscopy, Electrophoresis and Mass
Spectroscopy.
This paper is focused on the application of optical
spectrometry. As is well-known, each natural element has
properties of absorption and reflection depending on its atomic
structure. When an ink is irradiated with white light, it absorbs
some wavelengths and reflects others. As the white light
contains energy at several wavelengths, the spectral response
of the deposited ink can be used to characterise it and extract
useful information (e.g. determine if two different samples
have been made with the same ink). So as not to bias the
measurements, the light source must contain the same amount
of energy in all the radiated wavelengths. Thus, it is possible
to infer the composition of an ink given the dispersion of the
light reflected by it.
In recent decades, the industry has commercialized devices for spectroscopic analysis (e.g. the Spectrum FORAM 685-2 by Foster and Freeman or the HSI Examiner 100 QD by ChemImage). In general, these devices provide detailed microscopic analysis alongside a spectral curve which can be used to compare inks, papers, holograms and other forms of image. The main drawback of such devices is their high price, which some small forensic offices cannot afford.
The Spectrum FORAM 685-2 works as follows: white
light is used to illuminate the questioned document while a
narrow bandpass optical filter is placed in front of a camera in
order to analyze the ink sample at the selected wavelength.
This is a multispectral device, i.e., tens of bands can be
analyzed. The device allows visualization of the strokes at
different microscopic magnifications. This way it is possible
to analyze how the ink is deposited on the paper and how it
fills the interstices between the fibres. This can give revealing
clues about the fluidity and viscosity of the ink, which,
supplemented with other tests, can help to reach a deeper
knowledge of the document authorship. Moreover, if the
spectral responses of the inks are significantly different, it
could mean they are different or at least a sufficient time
interval has elapsed between the imprints to justify
discrepancies.
Another, more recent system used for document analysis is the HSI Examiner 100 QD produced by ChemImage [5] [6], which provides hardware and software for many chemical and biological applications such as pathology, forensic studies, pharmaceutical studies and threat detection.
The HSI Examiner 100 QD is a hyperspectral imaging
system and software package specifically designed for
forensic document examination. This platform provides,
according to its manufacturer, the most sensitive commercially
available device for ink discrimination purposes. Again the
price is its major drawback.
The aim of the work reported here is to develop an automatic ink classifier based on the optical properties of the ink. The spectral range is from 400 nm to 1100 nm, which includes the near infrared. The proposed algorithm runs in real time, giving the probability that a given line was written with a given pen, and providing additional evidence of whether a document is fraudulent or not.
The block diagram of the system can be summarized as:
1. Acquisition, which includes the hyperspectral camera, the light source and the "box" for document acquisition.
2. Hyperspectral image processing to reduce the noise and obtain the ink hyperspectral curve at different locations.
3. Characterization of the ink hyperspectral reflection and classifier design.
4. Database building and the tests performed to validate the final system.
II. ACQUISITION DEVICE
To obtain the hyperspectral image we use a spectrograph in conjunction with a CCD camera. The system scans a line image, obtaining the spectral response at each point of the line. The gain and exposure parameters are set up to increase the contrast between bands.
Figure 1. Spectral response of different bulbs tested.
The spectrograph used in this paper is the ImSpector V10E, and the camera is the model TM-1327GE, which has a resolution of 1392×1024 pixels. The vertical axis represents the wavelength, so we have 1024 spectral bands between 400 and 1100 nm. The spatial resolution depends on the angle of view of the camera lens and the focus distance. It is possible to acquire up to 30 frames per second.
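As a rough illustration of how the 1024 spectral rows relate to wavelength, the sketch below assumes that the rows are spread linearly over 400 to 1100 nm. The linear scale and the helper names (`band_wavelengths`, `band_index`) are assumptions made for this example; the actual calibration of the spectrograph is not given in the text.

```python
import numpy as np

N_BANDS = 1024          # spectral rows of the sensor (from the text)
LAMBDA_MIN_NM = 400.0   # lower end of the spectral range (from the text)
LAMBDA_MAX_NM = 1100.0  # upper end of the spectral range (from the text)


def band_wavelengths(n_bands=N_BANDS, lo=LAMBDA_MIN_NM, hi=LAMBDA_MAX_NM):
    """Wavelength (nm) of each spectral row, ASSUMING a linear scale."""
    return np.linspace(lo, hi, n_bands)


def band_index(wavelength_nm, wavelengths):
    """Index of the spectral row closest to the requested wavelength."""
    return int(np.argmin(np.abs(wavelengths - wavelength_nm)))


wavelengths = band_wavelengths()
step_nm = wavelengths[1] - wavelengths[0]   # spacing between adjacent rows
row_800 = band_index(800.0, wavelengths)    # row used later as the 800 nm line
```

Under this assumption adjacent rows are a little under 0.7 nm apart, so a moving average over 21 rows spans roughly 14 nm of the spectrum.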
For illumination, we tested different lamps (fluorescent, halogen, CFL, LEDs, etc.), looking for white light emission as uniform as possible between 400 and 1100 nm, and chose the Philips EcoClassic bulb and the OSRAM bulb, each rated at 100 W. Figure 1 shows the spectral radiation of the chosen bulbs alongside the other two tested bulbs when their light is projected onto a white sheet.
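One possible way to rank lamps by how uniform their emission is can be sketched as follows. The text does not say which criterion was used, so the coefficient of variation below is an assumption, and the three spectra are synthetic placeholders rather than measured data.

```python
import numpy as np


def spectral_flatness_cv(spectrum):
    """Coefficient of variation of a spectrum: close to 0 for a flat emission."""
    s = np.asarray(spectrum, dtype=float)
    return s.std() / s.mean()


def flattest_lamp(spectra):
    """Name of the lamp whose spectrum has the lowest coefficient of variation."""
    return min(spectra, key=lambda name: spectral_flatness_cv(spectra[name]))


# Synthetic spectra over 400-1100 nm, for illustration only (hypothetical lamps).
wl = np.linspace(400.0, 1100.0, 1024)
spectra = {
    "flat_example": np.full_like(wl, 0.8),
    "ramp_example": np.linspace(0.1, 1.0, wl.size),
    "narrow_peak_example": np.exp(-((wl - 550.0) / 40.0) ** 2) + 0.01,
}
best = flattest_lamp(spectra)
```

A lamp with a narrow emission peak scores much worse than one with a smooth ramp, which is the kind of difference the comparison in Figure 1 is meant to expose.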
The data acquisition is made under controlled conditions inside a box 30 cm wide, 30 cm deep and 40 cm high. The interior of the box is painted white with a paint that also reflects in the infrared. There are two apertures in the box, one for the camera lens and another for the document. Each is covered with a black felt curtain to avoid interference from external light. Figure 2 shows the box, the camera and the curtain configuration. The camera gain and exposure time were experimentally fixed to 1.96 and 0 respectively.
III. HYPERSPECTRAL IMAGE PROCESSING
After a sheet of paper with ink lines drawn upon it is introduced, the reflection of the lines is broken down into different wavelengths, which are subsequently projected onto the CCD detector, creating a two-dimensional image of the reflection, as shown in Figure 2, where one axis represents the spatial information and the other the spectral information. In this grayscale image, white tones indicate high levels of reflection while dark shades indicate less reflection.
Figure 2. Procedure to obtain the ink hyperspectral curve along a line in a document: a) sample document (sheet with 3 ink samples and the scanned line); b) hyperspectral image (wavelength in nm versus angle); c) hyperspectral image processed (background removal); and d) hyperspectral curves.
The image processing is performed in the following steps:
1. The sample document is introduced into the closed enclosure and optimally illuminated. Focus, gain values and camera exposure are set. Inside the box there is a mark which shows the line analyzed by the hyperspectral camera, see Figure 2.
2. The hyperspectral image of the line analyzed is acquired by the hyperspectral device, giving the image in Figure 2.b.
3. The hyperspectral image is processed in order to remove the background noise. This is done by subtracting the hyperspectral image of a white sheet from the document hyperspectral image, which equalizes the effect of non-flat spectral illumination. We thus obtain Figure 2.c.
4. The hyperspectral curves of ink pixels are extracted as follows:
a. The line corresponding to a wavelength of 800 nm, which is a maximum for ink reflection, is extracted and differentiated (it corresponds to a row of the image matrix).
b. The largest negative peaks of the differentiated line give the positions (angles) of the ink pixels.
c. At each pixel position, the hyperspectral curve is obtained (the column at the angle position of the ink).
5. The hyperspectral curve is smoothed by a moving average filter of length 21 pixels, thus obtaining hyperspectral curves as shown in Figure 2.d.
The curves characterize the ink composition and need to be
parameterized in order to identify the ink.
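The processing steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the image is taken to be a 2-D array with wavelength along the rows and position (angle) along the columns, the number of strokes `n_inks` is supplied by the caller, and the peak picking (local minima of the derivative, keeping the deepest ones) is one simple choice among many. The function name `ink_curves` is hypothetical.

```python
import numpy as np


def ink_curves(doc_img, white_img, wavelengths, n_inks, ref_nm=800.0, win=21):
    """Return (positions, curves) for the n_inks strongest ink strokes.

    doc_img, white_img: 2-D arrays, rows = wavelength, columns = position.
    """
    # Step 3: subtract the white-sheet image to compensate non-flat illumination.
    diff = doc_img.astype(float) - white_img.astype(float)

    # Step 4a: take the row closest to 800 nm and differentiate along position.
    row = diff[int(np.argmin(np.abs(wavelengths - ref_nm)))]
    d_row = np.diff(row)

    # Step 4b: find local minima of the derivative and keep the n_inks deepest,
    # i.e. the strongest paper-to-ink transitions (assumed peak-picking rule).
    is_min = (d_row[1:-1] < d_row[:-2]) & (d_row[1:-1] <= d_row[2:])
    minima = np.flatnonzero(is_min) + 1
    deepest = minima[np.argsort(d_row[minima])[:n_inks]]
    positions = np.sort(deepest + 1)  # +1: derivative index k is the step k -> k+1

    # Steps 4c and 5: column at each ink position, smoothed by a moving average.
    kernel = np.ones(win) / win
    curves = np.array(
        [np.convolve(diff[:, p], kernel, mode="same") for p in positions]
    )
    return positions, curves
```

Note that `mode="same"` pads with zeros, so the first and last ten samples of each smoothed curve are attenuated; a real pipeline might crop or pad them differently.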
IV. DATABASE
For the database, we have used 25 different pens of
different ink types as follows: 7 different pens of viscous ink,
4 different pens of liquid ink, 7 different pens of gel ink and 7
different marker pens [7].
Two different databases have been built, the first for
system design and the second for evaluation.
A. Database for system design
We start with lines drawn on paper with all the pens. We have used the same kind of sheet for all the documents: business paper of 80 g/m².
With each of the pens described above, we draw 50 lines and, just after drawing (minimum time lapse), we work out the hyperspectral curve of the central pixel of each line. We thus obtain 50×25=1250 hyperspectral curves. An example of a document belonging to this database being placed in the box can be seen in Figure 3 (upper).
After a week, with the ink dried, the lines were scanned again, thus obtaining another 1250 hyperspectral curves. In total, 2500 hyperspectral curves comprise the design corpus.
B. Database for validation
The database for validation consists of 30 bank checks. Ten of them were written with just one pen (fictitiously, of course, since there was no intention of using them for bank transactions). Another 10 were written with one type of ink and afterwards forged with a different type of ink. The remaining 10 were written with one pen and fraudulently altered with a different pen of the same ink type. An example can be seen in Figure 3 (lower), where the amount 900 is altered to 90,000 with another pen.
Figure 3. Upper: sample of a document belonging to the database for system design being introduced into the box. Lower: example of an altered check belonging to the validation database.
V. INK HYPERSPECTRAL CURVE PARAMETRIZATION
We represent each hyperspectral curve by several features in order to enter it into the ink recognizer. Two kinds of hyperspectral curve parameters have been developed: the first based on area and the second based on curve slope [8].
Prior to working out the parameters, the hyperspectral curve is divided uniformly into sections of $\Delta$ nm from 400 nm to 1100 nm, and features based on area and slope are obtained from each section.
The area of each section is numerically calculated with the trapezoidal rule using:
$$A_i = \Delta \cdot \frac{C(400 + \Delta \cdot i) + C(400 + \Delta \cdot (i+1))}{2} \qquad (1)$$
$C(\lambda)$ being the value of the hyperspectral curve at $\lambda$ nm.
For slope parameters, the derivative of each section is approximated as:
$$S_i = \frac{C(400 + \Delta \cdot (i+1)) - C(400 + \Delta \cdot i)}{\Delta} \qquad (2)$$
The area and slope based features of the hyperspectral curve are obtained by concatenating the parameters of all the sections as follows:
$$A = \{A_i \mid i \ge 0,\ 400 + \Delta \cdot i < 1100\} \qquad (3)$$
$$S = \{S_i \mid i \ge 0,\ 400 + \Delta \cdot i < 1100\} \qquad (4)$$
The ink feature vector is obtained by concatenating both characteristics $A$, $S$.
VI. CLASSIFIER
The model we use to discriminate one ink from another is built using a Least Squares Support Vector Machine (LS-SVM). Support Vector Machines (SVMs) are frequently used within the context of statistical learning theory and structural risk minimization. Least Squares Support Vector Machines are reformulations of standard SVMs which lead to solving the indefinite linear systems generated within them. Robustness, sparseness, and weightings can be imposed on LS-SVMs where needed. We apply a Bayesian framework with three levels of inference [9].
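Before an ink curve reaches the classifier, it is reduced to the area and slope parameters described in Section V. A minimal sketch of that parameterization is given below, assuming the curve is sampled on a known wavelength grid and that the section boundaries are obtained by linear interpolation. The section width used in the test call (50 nm) is hypothetical, since the text does not state the value of Δ, and dividing the slope by Δ is likewise an assumption about how the derivative is normalized.

```python
import numpy as np


def area_slope_features(wavelengths, curve, delta_nm, lo=400.0, hi=1100.0):
    """Concatenate per-section trapezoid areas and finite-difference slopes."""
    # Section boundaries lo, lo + delta, ... up to hi (in nm).
    edges = np.arange(lo, hi + 1e-9, delta_nm)
    # Curve value C(lambda) at each boundary, by linear interpolation.
    c = np.interp(edges, wavelengths, curve)
    # Trapezoidal rule over each section.
    areas = delta_nm * (c[:-1] + c[1:]) / 2.0
    # Finite-difference approximation of the derivative over each section.
    slopes = (c[1:] - c[:-1]) / delta_nm
    return np.concatenate([areas, slopes])
```

With 50 nm sections the 400 to 1100 nm range yields 14 sections, so the feature vector has 28 components: 14 areas followed by 14 slopes.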
The meta-parameters of the LS-SVM model are the width $\sigma$ of the Gaussian kernels and the regularization factor. The model is trained with parameter vectors from the modelled ink as positive samples and from other inks as negative samples. The regularization factor is taken as 30 and the Gaussian width $\sigma$ is optimized as follows: the training sequence is randomly partitioned into two equal subsets. The LS-SVM is trained $T = 30$ times with the first subset, using Gaussian widths $\sigma_t$, $1 \le t \le T$, equal to $T$ logarithmically equally spaced values between two powers of 10. Each one of the $T$ LS-SVM models is tested with the second subset so as to obtain $T$ Equal Error Rate measures $EER_t$, $1 \le t \le T$. The Gaussian width of the model is obtained as $\sigma = \sigma_j$, where $j = \arg\min_t EER_t$. Finally, the ink model is obtained by training the LS-SVM with the complete training sequence.
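The width selection procedure can be sketched with a small NumPy implementation. Several points are assumptions rather than details from the text: the LS-SVM is solved in its function-estimation form on +1/-1 labels (equivalent to the classifier form for such labels), the Equal Error Rate is approximated at the threshold where false rejection and false acceptance are closest, the grid of widths is supplied by the caller, and the Bayesian inference framework is not included. All function names are hypothetical.

```python
import numpy as np


def rbf(a, b, sigma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def lssvm_train(x, y, sigma, gamma=30.0):
    """Solve the LS-SVM linear system; y holds +1 / -1 labels."""
    n = len(y)
    a = np.zeros((n + 1, n + 1))
    a[0, 1:] = 1.0
    a[1:, 0] = 1.0
    a[1:, 1:] = rbf(x, x, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(a, np.concatenate([[0.0], y.astype(float)]))
    return {"b": sol[0], "alpha": sol[1:], "x": x, "sigma": sigma}


def lssvm_score(model, x):
    """Score of each row of x; larger means closer to the positive class."""
    return rbf(x, model["x"], model["sigma"]) @ model["alpha"] + model["b"]


def equal_error_rate(pos, neg):
    """Approximate EER: error at the threshold where FRR and FAR are closest."""
    thresholds = np.sort(np.concatenate([pos, neg]))
    frr = np.array([(pos < t).mean() for t in thresholds])
    far = np.array([(neg >= t).mean() for t in thresholds])
    k = int(np.argmin(np.abs(frr - far)))
    return (frr[k] + far[k]) / 2.0


def select_sigma(x, y, sigmas, gamma=30.0, seed=0):
    """Pick the width with the lowest EER on a random half split, then retrain."""
    idx = np.random.default_rng(seed).permutation(len(y))
    first, second = idx[: len(y) // 2], idx[len(y) // 2:]
    eers = []
    for s in sigmas:
        model = lssvm_train(x[first], y[first], s, gamma)
        scores = lssvm_score(model, x[second])
        eers.append(equal_error_rate(scores[y[second] > 0], scores[y[second] < 0]))
    best = sigmas[int(np.argmin(eers))]
    # Final model: trained with the complete training sequence.
    return best, lssvm_train(x, y, best, gamma)
```

A call such as `select_sigma(x, y, np.logspace(-1, 1, 30))` would try 30 logarithmically spaced widths; the exponents here are placeholders, not the range used in the experiments.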
This training procedure is employed to work out an LS-SVM model per ink or pen, depending on the experiment, using its own training samples as positive vectors and training samples of other inks as negative vectors. To verify that a questioned ink vector corresponds to a given ink model, the score of the questioned ink is worked out with the LS-SVM model of the given ink. If the score is greater than the threshold, the questioned ink is accepted as the same ink.
VII. EXPERIMENTS
Several sets of experiments have been performed. The first were aimed at determining the ability of the device to distinguish between inks and between pens. The last was designed to validate the scheme with the bank check database.
Ink Classification – no time lapse: The first experimental session was aimed at working out the ink verification rate, i.e., the ability to distinguish among viscous, gel, liquid and marker ink. With the 1250 samples of the design database collected just after writing (with the ink fresh), the classifier was trained with 30% of the samples and tested with the remaining 70%. Table I shows the results. It can be seen that the viscous and liquid inks are the least stable while the gel ink is the most stable. The marker is very different from the other pens, so it is not difficult to discriminate.
Ink Classification – one week time lapse: With the trained ink models, we tested the samples acquired in the second session, i.e., the dried ink. The results can also be seen in Table I. As expected, the performance is reduced except in the case of the viscous ink. This is because the viscous ink dries very quickly and there is no real difference between the first and second scans.
TABLE I. HIT RATIO FOR INK IDENTIFICATION: FRESH (NO TIME LAPSE) AND DRIED (AFTER ONE WEEK)
Ink state   Ink       Hit ratio (%)
Fresh       Viscous   85
Fresh       Gel       95
Fresh       Liquid    75
Fresh       Marker    95
Dried       Viscous   85
Dried       Gel       90
Dried       Liquid    70
Dried       Marker    85
TABLE II. HIT RATIO FOR PEN IDENTIFICATION WITHIN EACH INK TYPE: FRESH (NO TIME LAPSE) AND DRIED (AFTER ONE WEEK)
Ink state   Ink       Hit ratio (%)
Fresh       Viscous   63
Fresh       Gel       75
Fresh       Liquid    65
Fresh       Marker    73
Dried       Viscous   63
Dried       Gel       65
Dried       Liquid    53
Dried       Marker    66
Pen Classification – Same ink: The third experimental set investigates the ability of the scheme to discriminate between pens using the same type of ink. For the gel ink, for example, we have 7 classes (the 7 pens) and 50×7=350 freshly inked samples. Again we trained with 30% of the samples and tested with the remaining 70%. The results can be seen in Table II, which shows the difficulty the scheme has in distinguishing among different pens. Again, the gel ink achieves the best performance.
Pen Classification – Same ink after one week: In the fourth experimental set, with the trained models of experiment 3, we work out the hit ratio to distinguish between the pens after the ink had dried. The results are also given in Table II.
Forgery Detection: The last set of experiments validates the scheme with the bank check database. The 20 altered checks are presented to the system and the written amounts are scanned. The task is to determine whether all the digits were written with the same pen. This is done by training the classifier with samples of the first digit and testing with samples of the remaining digits. In all cases, the checks were scanned a week later, with the ink dried.
The results are given in Table III. It can be seen that when the ink type is different, all forgeries were detected. When the same ink type is used in a different pen, 80% of the alterations are found. No false alarms were raised by our scheme on our database.
TABLE III. HIT RATIO OF THE VALIDATION TEST WITH THE CHECKS
Checks forged with:        Forgery detection rate
Different ink              100 %
Same ink, different pen    80 %
Not forged                 0 %
VIII. CONCLUSIONS
This paper proposes a methodology to detect forgeries in handwritten documents. The proposal includes the design of the acquisition device, which is meant to decrease the cost of the scheme, since commercially available systems are generally expensive. The proposed scheme is based on the hyperspectral physics of the ink, parameterizing the hyperspectral curve of the ink pixels. The hyperspectral ink curve is modeled with an LS-SVM classifier. The validation experiments were performed with a database of altered bank checks to detect forgeries. The results are encouraging: all alterations made with a different ink type and 80% of those made with a different pen of the same ink type were detected, with no false alarms.
ACKNOWLEDGMENT
This study was funded by the Spanish Government's MCINN TEC2012-38630-C04-02 research project.
REFERENCES
[1] Comisaría General de Policía Científica, Departamento de Documentoscopía, Spain. [Online]. Available: http://www.policia.es/org_central/cientifica/servicios/tp_docum_copia.html. Accessed Sep. 22, 2014.
[2] Tony Roig, "Documentoscopía: Discriminación de tintas", Blog El Investigador 2.0, Spain, Sep. 2009. [Online]. Available: http://policiasenlared.blogspot.com.es/2009/09/documentoscopiadiscriminacion-de.html. Accessed Sep. 22, 2014.
[3] Headwall Photonics – Forensics applications, Fitchburg, Massachusetts, USA. [Online]. Available: http://www.headwallphotonics.com/applications#forensics. Accessed Sep. 22, 2014.
[4] ForensicXP: The next generation in questioned documents examination, Global Marketing & Research Inc, New York, USA. [Online]. Available: http://arxmar.com/index-1.html. Accessed Sep. 22, 2014.
[5] ChemImage Corporation website, Pittsburgh, Pennsylvania, USA. [Online]. Available: http://www.chemimage.com/. Accessed Sep. 22, 2014.
[6] ChemImage Corporation - The HSI Examiner 100 QD, Pittsburgh, Pennsylvania, USA. [Online]. Available: http://www.chemimage.com/products/instrumentation/examiner/100.aspx. Accessed Sep. 22, 2014.
[7] K. Franke, O. Bünnemeyer, and T. Sy, "Writer identification using ink texture analysis", in Proc. 8th International Workshop on Frontiers in Handwriting Recognition (IWFHR), pp. 268–273, Canada, 2002.
[8] Miguel A. Ferrer, Aythami Morales and Alba Díaz, "An approach to SWIR Hyperspectral Hand Biometrics", Information Sciences, vol. 268, 2014, pp. 3-19.
[9] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition", Data Mining and Knowledge Discovery, vol. 2 (2), 1998, pp. 121-167.