a method to reduce large number of concordances

Anuncio
A METHOD TO REDUCE LARGE NUMBEROF CONCORDANCES.
Maria Pozzi, Javier Becerra, Jaime Rangel, Luis Fernando Lara.
Diccionario del Espa~ol de M~xico.
El Colegio de M~xico
Camino al Ajusco #20
M~xico 20,D.F.
MEXICO
Summary
possible to have present in mind everything that
is being analysed. At f i r s t
thought this could
In order to help to solve the problem of
be solved by taking at random a smaller number
analysing large number of concordances of a
of concordances; however, when reducing in this
given word 'W', the 'Diccionario del Espa~ol de
way, one is about to loose the grammatical and
M~xico~ (DEM), has implemented a programme that
semantic information contained in a l l
i) Reduces t h i s number, as to obtain the
those
concordances to be taken away; hence a method
maximum possible informati.on with the minimum
had to be implemented as to a t t a i n the maximum
number of concordances to be handled.
possible information.
ii)
Sortes and rearranges the output so that
In order to solve this problem, the DEM
s i m i l a r concordances are printed out together.
presents a method whose aim is to obtain optimal
This was done by comparing up to four words
information with the minimum number of concor-
to the l e f t and to the r i g h t of word W, through
dances to be handled.
the whole set of concordances, associating toge:
This method consists of, for each concordance
ther those which were repeated in a p a r t i c u l a r
context. Once knowing t h i s ,
to analyse and compare four words to the l e f t
some s i g n i f i c a n t
and to the r i g h t of word W together with t h e i r
concordances were selected to be printed out,
grammatical category associated; and e s t a b l i s h i n g
and the rest was discarded.
which one of them is i d e n t i c a l
to which other in
a p a r t i c u l a r context: A tree s t r u c t u r e is
I Introduction
generated.
Having known t h i s ,
In the composition of a d i c t i o n a r y , t h o s e
it
is proceeded to reduce
the number, by selecting some of them considered
involved in the d e f i n i t i o n of each word have to
to be representatives.
study very consciously i t s set of concordances,
so that no meaning or use is missed.
there are, of course, some d i f f i c u l t i e s
I I Preliminary Requirements
since
on one hand, the sample is never large enough as
to insure the occurrence of all
the d i f f e r e n t
Our sample (Corpus del Espa~ol Mexicano Contempor~neo: CEMC), consists of 1,973,151
meanings and uses of every word to be defined.
occurrences, r e s u l t i n g in 65,200 d i f f e r e n t
This problem is solved by consulting other dic-
types, I whose frequency vary from I to 68,252. 2
t i o n a r i e s and expertees on the p a r t i c u l a r sub-
Some preliminary work has been done consis-
ject.
ting in the automatic labeling of each and every
On the other hand, there are words having a
word of the corpus with i t s grammatical catego-
very large number of occurrences, making t h e i r
ry, 2 in which from the t o t a l
analysis a very d i f f i c u l t
ces, 1,083,945
task, since i t
is not
--590--
number of occurren-
were automatically solved, and
the rest had to be solved by hand, then the
computer was fed with the r e s u l t s , obtaining
Each of the next four words to the r i g h t of
in
W will
this form, the complete sample l a b e l l e d . We took
take i t s place in Oi+ 1 i f and only
if
advantage of this work, since otherwise i t would
w5+i ~-
have been impossible to t r y to reduce the number
mark Pi
05+i
and1# punctuation
such that
of concordances in terms of the same grammatical
w5+i-I
category.
Pi
w5+i
and P i ~ { " ; : L ? i ]
as they are considered to break up the
Next, was to implement a programme that prod£
ces, for any given word, i t s set of concordances;
c o n t i n u i t y of a context.
each word s t a t i n g i t s own grammatical category.
In s i m i l a r way, the words to the l e f t of W
This is stored in a f i l e
are associated to t h e i r place in ORDENA.
it
called CONCUERDA, and
is organized in the following
way:
Figure No. 2 shows how to construct table
Every concordance has three l i n e s , each one
ORDENA from a given concordance.
of them consisting of:
- 6 characters (nnnnnn) reserved for the number
3.2 Generation of a Tree Structure s t a r t i n 9
of occurrence.
- 12 characters ( t t t p p p l l l )
from ORDENA.
reserved f o r the
r e g i s t e r of that l i n e , according to the
original
and
Once obtained this set of up to nine words,
t e x t , and s t a t i n g t e x t code, page
it
line.
is proceeded to construct a tree structure
- 72 characters reserved f o r the actual t e x t
f o r the words to the r i g h t of W and one f o r the
- 18 characters f o r the label of each word of
words to the l e f t of W.
the l i n e , stating
code. The f i r s t
It will
the grammatical category
two characters indicate the
is generated immediately a f t e r ,
number of words in the l i n e .
tric
Figure number 1 shows part of f i l e
only be described here the construc-
tion, of the r i g h t branch of the tree. The l e f t
CONCUERDA
though in symme-
form:
- The tree has a root node which is the word W
itself,
and i t s organization.
and has f i v e l e v e l s , being the root in
level 5.
I I I The Algorithm
-
A d i r e c t descendant of a node wi is given by
the word wj such that wiw j are adjascent, i . e .
i f wi-~ORDENAi and wj.~ORDENAi+1
3.1 Association of the i-Concordance to table
wj is a d i r e c t descendant of wi .
O R D E N A .
-
For each concordance, a table ORDENA is asso-
The label of each node consists of:
-
ciated in the following way:
-
- The word in question is located in the
-
Word w associated.
I t s grammatical category.
- Its frequency.
middle l i n e and associated to ORDENA(5)
And pointers to:
Four words are selected to the r i g h t and
- Direct ascendant.
to the l e f t
- F i r s t d i r e c t descendant.
of W, since they are supposed
to be carrying the most s i g n i f i c a n t grammatical
then
and semantic information about the
-
Next node whose d i r e c t ascendant is the same
as the one of i t s e l f .
word W. 3 We took this idea from the Centre
- Another f i l e
called
CONCORD,where i t
du Tr#sor de la Langue frangaise"s work
stored the number of the concordance or
concerning to the treatment of binary groupes
concordances where that word in that
--591--
is
..t"
cO
0
d" O 0
( 0 -'~ d"
~OO
O,,-I
c~o
O
0
0
I~ O 0
00=t
~D 000",
O~
.;.1" I~
0
0,1 rn
....~ 0 0 , I
,~
d"
I~
001'3
~
O
eO~O
~ I ~-
eO0
,,.00
O~ c o d "
I/~ 0D CO
00~
O ~t
(3 mo
o~D~
"~P~O
~,o,
g2-g
0~0
1=3 0 ~ 0
0 tt~O
d" O,,,-I
~ oO',cO
go,=,
mO
oO',
n'~ec
,..Oi'~-
~o,:,'
'=~,I
~1"0~
040
000
=.'.1"i"-
0
,,.~0
O0
0
,,-40
eO(;O
040,1
0..t"=,1"
eO0
g
C,J O
~.o
O~O
-.1" ~0
g ~C°) , ~
° ' :",.0° °C~
o
0
,,.-~ ,-t ,"r*,
O0
c3o
(:z, fO
~, ,::}
),-
U')
WbJ
0 ~
~ O Z
0
"~
c-
n
~
I--
,~
>..
0
r ~,
Z
H
O W
~
Z
o
Z
.~ ,<
.Jr~O
0
~
H
I~
~
0
O0
O
rr'
~
-Z
,,J ,,~n,'
J,mW
U~
-0
OI-I--~
'
~
(:3
17'1
OV~J
0
~E
~
~
O
"3
hI~
,~ (J,
,~,,~
~C
,.J
ZO
..J ~,,.-,
W~ ~r ~ - r
O~
u~w
mD~
>,--
Z
/
u~
,<
b..I
b'}
~
J
-
w<m
I-.-
bJ
-r
- O0
0
m
~
~oo
~ tj~..
n
~
.cbJ --IW
0
I--
b~ , , : ( J
,,_I c [ O
b~
~
m,,~°
n.'~.
I..,J
~.J
.Q~
row<
wz
bJ~-
:"
,~1--
o
MUJ
~7
a~-
w
~a
=DJ
~
b..I ,~
~'~
IM Z
o
:D 0. o
b9 b ~ J
b.I
•~ H
W
t/~
~
H
r~r~
Z) ~ ,
>
~
O W ~
o
OZ6~
I1)
e-
~-o
WZ
-m
~m
~0
b.l I--
ILl
bl
ul>0
Z,~.
~
W
~° , J
D
J
I (3~ 13
W
~-~ UIC3
(:r. LO Ld
4o
0
ct~
0
Z) ;--
rr "3 rr'
(~
O
IZI < ~
W
O
~
L/~
J ~ U
"O
J
~
~/~
>
~0
•
Zl--Z
J
W
Q
)J
r'~" X
Z ~.
~ o~
Z 'YH
~.WbJ
.
I,.- LO ."~
u1~
~-I
Z 7
CC l i p "
0
~
0
0
C~ ."~ --I
0
~.~-.Z
m
,<I-
m
ow
~-
O~-
o
~
~'~o
~-~
~
OC ZbO
~.~,<
O
U
-~ " ~ 0
~d
.~ u , ,
0
~ (~)
~:
O~Ld
u~
>-W~
O~-~
I.-=~
dlJ
I,dh
~- ~ b J
. •
010
OZ
+~
I& J - J
<0
d~ ~/) trl
~ Ul L,.I
~ td~
Oh-cq
',1
~o~
>
=~
0--3
~5o
~:>
Ztm
b'~°
WO
W J0.
L~ 0 ~ .
N ,~ b-I
N
ZO
~..-~ "r"
b'~
0
< Jb~
J ' ~
O .~W
U~ J r-~
W
_rl M O
OJ
~
"~ ° U')
0
~W
W UIC~
.J . J O
_J ~ t J
~dH
D. 6 q ~
~
,.J
~
O~
~W
0.~
O
OW
n
,d
""~ '
~:
0
~ ~lr~
.'3.. b..I ~
e-
J
i~O
~
~
:D ~1 I.I.
~
::D ~ .~
I,~
Z
~
~-4
,~ ..-I I-.
l-- I~I ~
H
IWWOJ
~'~
e4~
,J
A. O W
m
~
O
U
~ g ~
~2
JbJW
~
l.h
~1~ ,Y
~J
Uo
~-• eb_
o
W
T
~ .
g,.,×
g~,~
.I=,,
<o
o-,~
~ o
o-,,,
OO
-J
Z
h
o
o
o
g
g
~Q
Q
°
o
o
o
o
4"
o
o
:I"
c)
C)
,r;
0
W
o-z
•4
0
D~
,-~
O~
0~
,~I
o
ed
c3
J',l
o
•-4
c~I
r~
4"
.I'I
,,9
o
z
o
•
t ~-
--592--
,'0
C)
0
LZ
0
4-~
in
particular context came from, making in this way
possible the retrieval operation.
The process repeats i t s e l f until the last
concordance has been processed.
'-A node has as many branches as different words
are found to be direct descendants to that
Figure No. 3 shows, for a set of 14 concor-
word, with the same grammatical category
dances, the l e f t and right trees generated.
through the whole set of concordances.
12
ObOl6B030
SI/, LA AooLESCENCIA NO PUEDE SER SUPERAOA SIPIO COMOOLVIDO DE SI/p
12500100030045
COMO ENTRCGA. POR ESO LA AOOLESCENCIA NO ES SO/LO LA EDAD DE LA
130045001g16846
SOLLDAD, SINO TAMBIE/N LA E/POCA DE LOS GRANDES ANOREs, DEL HEROI/SMO Y 12631004000783
ORDENA[I :9]
NO
ES
SOL.O
LA
I
9...
i
6
EDAD'
DE
8
q
,LA
SOLEDAD
8
Figure No. 2 Table ORDENA is obtained from a given concordance. Note that ORDENA(9)
is void, since there is a comma (,) after the word 'soledad'
--593--
90
323025064
SUCEDI/A ALLA/ POR EL A+O DE 18,~!, CUANDO DON~ PEPE~ T!:NI/A UNOS 55
A+OS DE EDAD Y MUCHOS RI+ONES AU/N° TUVO UN IMITAOOR NOTABLE~ QUE FUE
UN BANDERILLERO LLAMADO ANTONIOP GONZA/LEZ~ EL~,ORIZA6E+O~ QUIEN DIO A
91
32403#073
AHORA, LA EMPRESA GUE LA TIEN!~ R~NTAOA~ SE ~ s T A / GASTANDO UN DINERAL EN 13100:,l.! 5 9 i 6 8 4
ESTE SERIAL~ BUSCANDO NUEVOS VALORES~ MISMO5 QUE - HASTA QLiE SU EDAD SE 1 2 2 8 0 U ! : 3 4 5 2 8 5
LOS PERMITA - NO HABRA/N DE SALIR DE ENTRC LOS NI+OS TOREi~OS,
1L59L0404:,683
92
33306(,012
CONSEGuIR DINERO PARA SACAR ADELANTE LA FUNDACIO/No PRIMERO HABLO/ EL
SE+OR CURA eUE FNTONCES NO T E H I / A NI TRZZi4TA A+OS DE EDAD. LUEGO DON
TOMA/S8 SA/NCHEZO (ESTE S I / VIEJO Y COLUDO) PROPUSO COLECTAS Y R I F A S .
I00 ~ ; : .: 96
1380,) 1..'.73 B~:;4CI0
11B52!9300'i3 .'
93
3351#8021
CABALLOS.
I 0
EN SAN ~ JOSE/g H A B I / A MEDIO MILLAR DE HOMBRES EN EDAD DE TOMAR LAS
ARMAS E IRSE A LA GUERRA~ PERO NO TODOS SE SINTIERON CON A/NIMOS DE
134850084~3#3",~96
I ~ 8 3 0 #0 .:31359~9q.
94
33.;148034
CASADOS Y TENI/AN H I J O S . LOS MA/S ERAN JO/VEN[:S Z:N EL VERDOR DE LA
EDAD, DE 16 A 30 A+OS, CON ALGUNA DESTRE2A EN L:L MANEJODE ARMAS Y
CABALI..OS Y SIN DISCIPLINA MiL!TAR.
5 03#OJ
95
3,~50#q.023
ENCUBIERTOS DEL DIABLO~ O AL MENOS DO/CILES INSTRUMENTOS DE SUS AVIESOU l.i_ 076370;" 426
DESIGNIOS, LA BEATA IMAGEN DE LA EDAD [)E ORO REDIVIVA SE TRANSMUTO/~ AL 13008~;#6843:~597
CONJURO DEL DES~NGA+O~ EN EDAD DE HIER~<O EN QUE DOMINABA LA CRECIENTZ
1287~4C~L~O'I.OL; ;
96
3:'.50 q.# 02t~
DESIGNIOS, LA BEATA IMAGEN DE LA EDAD DE ORO REDIVZVA SE TRANSMUTO/~ AL 1300' ,468L~0 .597
CONJURO DEL DFSENGA+O~ EN EDAD bE HIERRO EN QUE DOMINABA LA CRECIENT~:
12875#54:3q.50 OC!
CONVICC:Io/N DE QUE ESOS DESNUDOS HIJOS DEL OCE/ANO ~ FORMABAN PARTE DEL 1.L0#02637C907
97
5#2251019
INDI/GEHAS~ COMO ES AU/N~ EN PARTE E s T E / R I L , SINO ~UE R E A L I Z A R I / A SU
PROGRESIVA EDUCACIO/N EN LA ADOLESCENCIA Y HASTA EN LA EDA9 ADULTA'.
EN EL PLAN DEFINITZVAMENTE REG::Ni:RADOR DICTADO EN EL LLANO DEL RODEO~
98
3450650g5
J U R A / I S Y YO PIERDO UN ALUMNO.
6 935958
PERO DESD.r:.ILA EDAD OE OCHO O NUEVE A+OSHASTA LA DE DIEC!NUZVZ O VEINTE 153468#839840#639
NO EXISTE ILL DESI:O DE UN TRABAJO MANUAL PESADO. ESTO ESE:XAC'FO EN LA
1#i0684580.~593#:;
3~09603~
PERCIBIR SUS CUALiDADES TANTO MATERIALES COMOFUNCIONALES, ASI/ COMO
9 0280,! '1
SU CONVENIENCIA RESPECTO A LA EDAD DE QUIEN LA IBA A USARI OBSERVO/ SU 14281468L~5594092
CONTENIDO Y MANEJOp SE DIO CUENTA DE SU PESOY RESISTENCIA AS// COMODE 1 4 8 3 0 5 9 8 # 2 8 3 0 1 0 4
100
34409603(~
DESARI<OLLO DE LA IMAGINACIO/N CREADORA YALGUNAS HABILIDAOES PARA oFERAR 9 8 4 0 9 : '
CON HERi.,AMIENTAS SENCILLASI ES, ADEMA/S, ADECUADO A LA EDAD DE MI H I J O , 1 2 4 0 ; ] 9 i 8 4 6 8 4 2 8
INSTRUCCIONES, LA PRESENTE "SCALA CONTIENI] OCHOASPECTOS E~iCNCIALES EN 9 0 L ) . ; ' : . : 4
101
345322048
L E E S F A / C I L HACER AMISTADES l 'ME ES BASTANI'E F A I C I L HACERLAS Y ME
12590~),599
35
GUSTA QUE SEAM ALEGRZS, DE MI EDN) Y TENGAH UN I ; I V E L CULTURAL POCO MA/S 1 4 9 0 9 8 4 2 8 3 9 6 6 0 q
0 MF'NOS COMO EL M I / O . ' Y FRENTE A UN GRUPO DE NI+OS: 'ME DA GUSTO VER
1630d6553468485980
102
3#5355029
TERMINAR LA CAR'~EIA DE M',ZDICINA.
5 O. 4~
C A S O 2 . ALUMNO DE 19 A+OS DE E:DAO! SEXO MASCULINO. PROCED~NTE DE LA
ESCUELA" DE ARQUITECTURA DE UNA UNIVERSIDADDE PROVINCZA.
103
3#6037919
I D E N T I F I C A CON LA PLENA REALIT.~CZO/N DE LAS A~SPIR/',CtONE.~; f~UE ::L HOMBRE 1~.9#0,) 4 i
66
"IIENL Dt:SDF: !:L TI_'/RMINO DE LA EDALr" M E D I A " , Y NO SE DANCUEliTA DE QU[:
150468#68';31598#0
EL A+O 20,~' PIED[: NO SE LA CULMINACIO/N ROTUNDA Y FELIZDE UN PERIODO
1#66C0153C 3 # 6 8
99
1391468#C!0805C
1384,932(~106,.30:,9
1068081~68594
1 3 8 3 9 8 0 0 ,LZq-6846
158@Cz!C 0f~O:~q-66483
110:~91#01;3 ! 2
I J,80t+O ' ,34;~68~
11466.[ 0 :) q.687,'3
130C8#C8#GOJ'4:
6 0#u468~',.0
UNOSNI~NUME.-DE~
ANO5---- DE,
MILLAR-DE-- HOMBRES--EN
E'L--TER M I N O ~ _ _
/¥
EL-- VERDOR ~ D E \
BEATA--IHAGEN--",X
Y--HASTA
EN~L~
PERO--DESDE7 '
CONVENIENCIA
Figure No. 3
//MEDIA
//jADULTA
//-,I
HI~HIJO
IEDADF
\\
At
QUE-----HASTA,
~ M U C H O S - - - - R I QONES--AU N
TENGAN--UN
NIVEL
\
\
DE--M
QUE--SU
/.TOHAR--LAS--ARHAS
\ ~DE~-- HIERRO-- EN
QUE
~OCHO--O ~NUEVE
QUI E N - - L A ~ I B A
~E--LOS--PERHITA--NO
Left and right trees generated from a set of 14 concordances.
--594--
3.3
The a l g o r i t h m
to
select
significant
Once t h e
is
tree
proceeded
found
is
fully
constructed
to make t h e
actual
it
some f a c t s
to
be c o n s i d e r e d
repeated
same c o n t e x t ,
probability
W in
ber o f
than
the
context
of
repeated
since
there
meanings
functions
of
~ word
me s e t
words.
it
is
of
to
the
the word
a small
such
or
however,
way to
words
find
all
own d i r e c t
is
into
grammatical
W followed
by t h e
s~
be d e s c r i b e d
the
proced
then
was t h o u g h t
by t h e
It
is
by
proceeded
the
then
of
its
same f r e -
category,
these
that
same gramma-
descendants
number o f
the
have to
r e d u c e was not
but
direct
is
clear
account
duction
is
text
less
is
that
that
and
concordances
chosen
the
process
as t h e
closer
to
5,
level
then
significant;
ger number o f
will
it
is
t h e word
I,
would
ascendant with
the
It
so ma-
of
it
reduced.
num-
not
level
~ode i s
branch
category.
then
num-
a larger
are
frequency
that
quency and g r a m m a t i c a l
same.
one r e p e a t e d
times
Next,
greater
the
ny d i f f e r e n t
the
of
the
recon-
hence a l a r -
concordances
to m a n t a i n
takes
have to
required
be
quality
information.
ure:
In o r d e r
most
path
A 6th
to
is
level
analysed
is
level
in
right
of
the
If
this
branch
the
5,
of
the
and t h a t
being
is
greater
direct
After
a left-
tree
the
the
tree
analysed).
than
I,
descendant
is
to
If
then
is
form,
means t h a t
level
rode
and t h e
is
frequency
t h e words
four
words o c u r r e d
a times
of
rent
concordances.
As i t
re t h e r e
is
meaning o f
t h e word W i n
context
is
the
n concordances"
two o f
dom f u n c t i o n ,
concordance,
same in
hence,
them,
we o b t a i n
and t h e
can be s a f e l y
+ I and
if
level
is
F~50
by t h e s e
befothe
of
is
6
the
Q would
4 or
number
be
7 or 3 then
F >50
If
team o f
pattern
F>30.
for
of
level
is
8 or 2 then
Q = F//4+I
~or F ~ 7 0
Q = F//7
for
F>70
level
is
Finally,
if
the
Q = F//5
+ 1 for
only
Q = F//IO
+ i
9 or
I then
F~ 50
for
F>50
a ran-
1) or
from
selected
+ 1 for
* At
a significant
omited
reduction
F~ 30 then
Q = F//5
by t a l k i n g
( n -
of
concordances
If
particu-
by means o f
level
frequency
Q = F//3
that
all
the
by our
it
following:
it
was s a i d
this
was t h e
Q=F//2
n diffe-
a good p r o b a b i l i t y
that
Q=F//4
in
n > I,
in
a reasonable
If
analysed
W followed
decided
linguists*
and t h e
the
and many t r i a l s
was e m p i r i c a l l y
the
its
reached
some s t u d y
reduction
root
same way.
a 9th
one or
tree,
(Remember t h a t
W is
frequency
leftmost
analyse
followed.
first
nal
of
a possible
tical
in
may be more s i g n i f i c a q t
another
of
exactly
meaning
is
words
times
ber o f
the
that
that
A set
left
the
to
identical
The more words
lar
that
analysis
reduc-
beforehand:
in
same i n t e r m e d i a t e
be s t o p p e d ;
There a r e
-
at
associated
tion.
the
If
-
concordances.
thank
(n-2)
the
for
fi-
--595--
in
point,
particular
her v a l u a b l e
esting
output.
this
we would
to
Paulette
discussions
suggestions.
like
to
Levy
and i n t e r -
It
this
has to be mentioned
pattern
of r e d u c t i o n
ged a c c o r d i n g
to o b t a i n
here,
is
to the wprd analysed.,
already
of concordances
( Q out of
that
F) i t
them a g a i n ,
is
stored
may be chan-
the best r e s u l t s
When i t
ARBOL: Each node of the t r e e
that
characters,
as
llowing
each t i m e .
7 for
Known the number
will
composed
distributed
I for
to s e l e c t
own address
the word
and each one of them i s marked as
2 for
such,
to avoid any one of them be s e l e c -
3 for
the
4 for
the f r e q u e n c y
7 for
the address of
or more t i m e s .
3.4 Output.
cating
output
category
when a p p l i c a b l e
ted
presented
of the l a s t
7 for
the
rect
word
rect
IV
presented.
The Computational
University
of Norway v e r s i o n
the "Centro
de Procesamiento A r t u r o
Rosenblueth"
of the Secretar~a
ciSn
P~blica
(Ministry
with
262K words of 36 b i t e s
memory and 8 , 0 0 0 , 0 0 0
of
dants
(i.e.
ging
from i t )
is
The r o o t ,
the word W is
of c e n t r a l
in
disc.
emer-
and
in f i l e
stored
the t r e e s
the f o l l o w i n g
-
di-
descen-
No of branches
the address
each one of
Education),
next b r o t h e r )
the number of d i r e c t
CONCORD
the number of
From the c o m # u t a t i o n a l
de Educa-
of c h a r a c t e r s
like
own d i r e c t
the concordance where i t
of ALGOL 60
a UNIVAC 1106 computer of
(i.e.
its
the address of the f i r s t
where i t
in the
NUALGOL f o r
direct
descendant
6 for
System~
The system was implemented
its
the address of the next d i -
4 for
Figure No 4 shows the form in which
is
category
of the wo~d
descendant of
7 for
chosen are l i s -
below.
the o u t p u t
leng~
ascendant
and the f r e q u e n c y .
the Q concordances
the grammatical
ascendant
indi-
the group of words repeated
grammatical
Next,
is
in f i l e
the l e v e l
24 f o r
by means of a random f u n c -
The f i n a l
in the f o -
way:
its
tion,
ted t w i c e
of 72
ARBOL
be chosen
proceeded
in a l i n e
is
comes from,
point
of view,
is generated
in
whose node a s s o c i a t e d
is
way:
in a p r e f i x e d
and i t
will
dance.
This word is
be p r e s e n t
address,
in every c o n c o r -
taken
from ORDENA
(5)
4.1
Data Storage.
-
The next word in ORDENA w i l l
be s t o -
We made use of 3 f i l e s :
red by means of a hash f u n c t i o n ,
a)
is decided
to be the same node as one
previously
stored,
File
CONCUERDA, where the whole
set of concordances
W was s t o r e d ,
and i t
of the word
word,
was d e s c r i -
b) F i l e s
same,
ARBOL and CONCORD~ these
two f i l e s
obtained
the r i g h t
and o n l y
stored
while
trees.
--596--
the
category,
level
ascendant
are e x a c t l y
the
in such case the f r e q u e n c y
the number of t h i s
in a d d i t i o n
and l e f t
if
grammatical
aumented by one and in f i l e
are supposed to c o n t a i n
the i n f o r m a t i o n
generating
its
and d i r e c t
bed above.
if
and i t
is
CONCORD is
concordance
to the p r e v i o u s
one.
r~
I./1
o
D
-/
I-o
I-..
ZDI--
~1
O
O
~
OV1
e~by
~-'~ ~ . Z
~
O1~
.~ I - - H
W
01 ~ W
0
0
o
z
W
O0
~H
,,~ r T Z
~ W~
(~ C3 I O
Z
0 0 ~
l~m-"
:3
Ld
U'I .~
0
"r~
I~
.J
Z
O
-J
69 ~
• :~ J r ' , "
...I
:3
Wk-
W
O~r~
.~- - I O
0
,,~
L) O , ~
O
ZOl
0
U
I
I
NW
0
7n~
ZOhJ
~
&.
,~m
~..1
Z
OV)
U'I
W
121W
~
I, ~V I
'~
ZZ
7~]1
~
D
W
wmm
O
H
C3 W ~
,~ n~" . J
MO
ffl U 3 Z
O
O~
.-I
0. Z
b~ X W
O
~
,~
W
m.J
U1 v l X
(D
W
- J I./1
r-,
L~
H
~.- I--
4 Qz
t3d
HO
m (3::
O IsJ l:g
,.J n O
/
~
m
n~ ~
~
Z
0
o
ul
W ~0
1:3 0 0 ~
Z
~Z
I
If)
W O W
.J
•J
Z W~
4., ~
~'~ ~ 0
m n o
:~
H
L ~JO
bJ _~ ~_
t~ O ~
,~_J
--I
Z
O
Z
~m~
~o<
~ OW
~. ~ . J
~-I O 0
Z ~n
kd .J
J
'~.
H
O
,~ I~I
t7
._1
N
Z
,~
o
e~
I
WZ
Z_lr,
~
O
~O
0~ , - ~ . O ~W
Z~3t~
bJ ~J. ~
W
60
WbJ
0
bJbJ.
0
~ .
Z
Z
:-40
+
J
WW
rn~
=~ ~ C I
bJO
ID~ ~J,
b.J
nvI
I-(z}
U1
,~
W(.)
r'~ C3
J
m
Z~
0 ~-
r'7
Jl~
Z~,--I
b.I
W
:~
W N (
J"~
Z
WZ
H
WbJ
~
L~
O
eq-.
O
W
O
ct~
"O
H X b::~
r'~ b..i r'~
2)
O
U
cO
O
Z
W
0~I
LdO
V~J
--I
n<~
bJ (~ ,w
O
Ld~-~
%
O
--I
F-m
m
J
.J
09
W
LD
v
DLQ
m
09
~W
I-~0
C~
O_ O W
I
cO
0 I J I~1
0
~'4
O
Z
W
h.IO
L~ O
~9C
C~I~J
W~
H ~T3
01 _.I Z
!
El
tO
LLI -~Zt~ t
t~ L J O
0 ~-~ b9
~ 0~"
~l~O
GhOUl
Z no
~
.J
~
O
~ W
L~I bJ t 7
~ :~ b.I
O
Ln Z 0 -
~i~
1::3
Z
~
I
•
I
•
W
4o
4~
~ 0 ~
E
I~
II
II
l:g
e¢
(J
I
O
%
iLL
I.L
II
u~
,,
Z
al
+
~
~
g
.~
m
f2~
~
c~
~
Z
+
o
e4
o
o
O
+
~
o
C~l
n
6
Z
ft.)
L
W
ul
~
~W
W
W
J
~-~ .-I O
n~
U
W
~lZ
ZUI
OLd
~ ~-"~
t.~'T"
0
H
WOJ.
t-- U1 0
o ow
I-- H c ~
o 0,. ;.-.
b9
W
*O
0
~
II
U
bJ
W
C-
4-.)
!
c2~
,,~ t',"
~. U'I
W
ZJ
W'~
0
W
~.0
n~ O ~
O
u
I./') ,,~
"O
HH
i,
~,~ ,,~
Q r~ W
~w
O
,j
~I~Q
ZU
Wb9
I-- C~
01WW
O
Z'~
I!
O
W
I--W
ON
bJ
n
cl
ZO
69n~
J
L/1
.~ W
VlZ
•-
bJ~'
~-~ L~ 0
L~ H ~
.JO
:3>0
~:: H C3
0
bJ
t/l~
~td
Wkt.
Zb9
:E H = E
I-- ~ r-~
~ O
L41 ~ - ~ O
O
0.
O
W'~
J
ZO
O
H
.J
Id
~'0
-'~' Ul
ZE~
~-~
.J
~
QV~
~,-~
bJ tw
W
-I
OE~
~/I
Z
DH
WO
~
l:1
LO LL. Z
0
W
W
O(:3J
Q
Ld
~
m
O
j
~
O
r~
tlJ
uJ
o
ul
bJ
--597--
,,.1
Z
W
W
cl
e-~
O~
TREE STNUCTURE GENERATED FOR WORD *EDAD* (AGE).
51596
31608
31620
31652
31644
31656
51668
31680
31692
51716
51728
31740
31752
31800
31812
31824
31856
31848
51860
31896
51908
51920
31932
31944
31956
31968
31980
31992
32004
32016
32028
32040
52052
32004
320?6
3~088
32100
32112
52124
52136
32148
52160
32172
32184
32196
52208
32220
52252
52244
32256
32268
32280
52292
32304
32516
32528
52540
Figure No. 5
2DE
3Y
4MI
2MODERADAMENTE
2PUES
5Y
3Y
20E
2DE
3SO/LO
4ESA
2ALGU/N
2PROBABLENENTE
2CON
2CON
2CON
45U
4TU
4TAL
4POCA
2INVERSAMENTE
2PARA
3A
5A
3A
5A
3@UE
4TODA
3A
80UE
3A
3CIERTAMENTE
#ESTA
2POR
4CUYA
5DE
5OK
3DE
40TRA
3DE
5DE
2HASTA
2i4ASTA
30E
2CONFORME
bEN
6YA
SEN
SEN
6140
4ESTE
3A
5PERO
3A
2LE
3EXACTAMENTE
3A
PR2
COl
AJ2
AV13
C04
COl
COl
PR2
PR2
AV5
AJ5
AJ6
AVIS
PR3
PR5
PR5
AJ2
AJ2
AJ3
AJ4
AVI2
PR4
PRI
PR1
PRI
PRI
C03
AJ4
PR1
C03
PRI
AVII
AJ4
PR3
AJ4
PR2
PR2
PR2
AJ4
PR2
PR2
PR5
PR5
PR2
C08
PR2
AV2
PR2
PR2
AV2
AJ4
PRI
C04
PRI
PN2
AVll
PRi
i
i
9
I
1
1
i
2
1
1
5
2
I
1
1
1
16
2
1
2
i
i
i0
25
1
1
2
1
2
i
I
i
6
2
1
9
6
21
I
1
2
I
1
I
i
15
2
1
I
2
2
1
2
2
1
1
i
30936
32712
12
36360
32436
34008
32688
33372
31056
3#008
12
36312
31944
31980
35400
35664
12
12
12
12
32424
51980
54344
34008
52832
51620
34008
12
31728
31856
33060
34008
12
50504
12
31856
31620
34008
12
32124
31896
32016
32208
32052
35628
34008
24
51728
34344
24
12
31860
52712
52052
36900
34008
31848
36576
33168
36552
32100
I
4
32496
37344
5252
1
1
32208
31896
3#644
32232
1
#
55028
31920
33192
3434# 32088 6
31728 31116 2
52552 32280
1
32076 32148 1
32580
5556
50828
32880
32112
32124
56524
52568
5472
3468
31532
31560
31800
32532
35496
52160
5
20
1
1
2
I
1
1
35604
33252
52184
5
82556
32016
51968
320#0
31848
34856
32504
31980
54080
32004
32796
39264
51860
31608
52856
50468
35256 8
31344 4
37520
19
32156 I
34524 1
30996 2
35124
1
35340
39324
35976
55892
37884
36156
36168
11
1
1
1
2
1
1
36804
1
36516
5220
1
1
File ARBOL, where the tree ~tructure is generated.
--598--
4647
3228
69
2046
3879
3987
4149
4674
4749
141
507
2685
1563
219
633
1680
3
q98
2802
540
1749
260?
15
27
54
i14
216
366
557
1128
1740
411
1980
2625
648
6
72
345
381
384
543
1131
1215
1983
2546
189
222
510
1809
651
2787
2805
2877
3000
3159
3165
5192
Otherwise
Figure
it
will
be a new rode.
No 5 shows p a r t
EDAD (AGE) is
being
of f i l e
VI
References
ARBOL,
processed.
1.-
Roberto
Ham Chande:
en L e x i c o g r a f ~ a ,
V
Results
The f i r s t
ging,
And
results
since
for
Applications~
those words w i t h
medium
2.-
say up to 600 -
to reduce the number b e t -
ween 30% and 40%, a c c o r d i n g
comparing
ces w i t h
It
is
set of concordan-
expected t h a t
frequency,
will
However,
is
in
(by
the reduced v e r s i o n )
higher
ties,
was r e p o r t e d
the o r i g i n a l
cribed
of view,
words w i t h
3.-
the method here des.
Investigaciones
the g e n e r a t i o n
of each t r e e
question
working
to o p t i m i z e
The most i m p o r t a n t
des the o r i g i n a l
that
by t h i s
find
expressions
increases.
~e
it.
application
besi-
main o b j e c t i v e s ,
method i t
is
is
possible
and p a t t e r n s
of
to
langua-
and used c o n s i s t e n t l y .
~-599--
del
DEM
lingu~sticas
Lexicograf~a.
Jornadas
gio de M~xico,
1979.
Le T r a i t e m e n t
des
Cahiers
I~-[~0-I~
en
89 El Cole-
Centre du T r ~ s o r de la Langue
from the c o m p u t a t i o n a l
some d i f f i c u l -
Gramatical
Base E s t a d ~ s t i c a
Lexicologie.
are s t i l l
La F o r m a l i -
y
Francaise:
point
1979
Fernando Lara y Roberto Ham
Groupes B i n a r i e s .
of the word in
ge repeated
Analizador
be more e f f i c i e n t .
there
since
for
very time consuming as the f r e q u e n c y
are s t i l l
del
Chande:
information
Jor-
de M~xico,
zaci6n
DEM
i00
en L e x i c o g r a f ~ a ,
Garc~a H i d a l g o :
Luis
1 al
Invest!~1~ones
Isabel
del
to the word
in q u e s t i o n .
No l o s t
in
nadas 89 El Colegio
were very encoura-
number of concordances
we were able
Lingu~sticas
Del
de
J
Descargar