A METHOD TO REDUCE LARGE NUMBEROF CONCORDANCES. Maria Pozzi, Javier Becerra, Jaime Rangel, Luis Fernando Lara. Diccionario del Espa~ol de M~xico. El Colegio de M~xico Camino al Ajusco #20 M~xico 20,D.F. MEXICO Summary possible to have present in mind everything that is being analysed. At f i r s t thought this could In order to help to solve the problem of be solved by taking at random a smaller number analysing large number of concordances of a of concordances; however, when reducing in this given word 'W', the 'Diccionario del Espa~ol de way, one is about to loose the grammatical and M~xico~ (DEM), has implemented a programme that semantic information contained in a l l i) Reduces t h i s number, as to obtain the those concordances to be taken away; hence a method maximum possible informati.on with the minimum had to be implemented as to a t t a i n the maximum number of concordances to be handled. possible information. ii) Sortes and rearranges the output so that In order to solve this problem, the DEM s i m i l a r concordances are printed out together. presents a method whose aim is to obtain optimal This was done by comparing up to four words information with the minimum number of concor- to the l e f t and to the r i g h t of word W, through dances to be handled. the whole set of concordances, associating toge: This method consists of, for each concordance ther those which were repeated in a p a r t i c u l a r context. Once knowing t h i s , to analyse and compare four words to the l e f t some s i g n i f i c a n t and to the r i g h t of word W together with t h e i r concordances were selected to be printed out, grammatical category associated; and e s t a b l i s h i n g and the rest was discarded. which one of them is i d e n t i c a l to which other in a p a r t i c u l a r context: A tree s t r u c t u r e is I Introduction generated. Having known t h i s , In the composition of a d i c t i o n a r y , t h o s e it is proceeded to reduce the number, by selecting some of them considered involved in the d e f i n i t i o n of each word have to to be representatives. study very consciously i t s set of concordances, so that no meaning or use is missed. there are, of course, some d i f f i c u l t i e s I I Preliminary Requirements since on one hand, the sample is never large enough as to insure the occurrence of all the d i f f e r e n t Our sample (Corpus del Espa~ol Mexicano Contempor~neo: CEMC), consists of 1,973,151 meanings and uses of every word to be defined. occurrences, r e s u l t i n g in 65,200 d i f f e r e n t This problem is solved by consulting other dic- types, I whose frequency vary from I to 68,252. 2 t i o n a r i e s and expertees on the p a r t i c u l a r sub- Some preliminary work has been done consis- ject. ting in the automatic labeling of each and every On the other hand, there are words having a word of the corpus with i t s grammatical catego- very large number of occurrences, making t h e i r ry, 2 in which from the t o t a l analysis a very d i f f i c u l t ces, 1,083,945 task, since i t is not --590-- number of occurren- were automatically solved, and the rest had to be solved by hand, then the computer was fed with the r e s u l t s , obtaining Each of the next four words to the r i g h t of in W will this form, the complete sample l a b e l l e d . We took take i t s place in Oi+ 1 i f and only if advantage of this work, since otherwise i t would w5+i ~- have been impossible to t r y to reduce the number mark Pi 05+i and1# punctuation such that of concordances in terms of the same grammatical w5+i-I category. Pi w5+i and P i ~ { " ; : L ? i ] as they are considered to break up the Next, was to implement a programme that prod£ ces, for any given word, i t s set of concordances; c o n t i n u i t y of a context. each word s t a t i n g i t s own grammatical category. In s i m i l a r way, the words to the l e f t of W This is stored in a f i l e are associated to t h e i r place in ORDENA. it called CONCUERDA, and is organized in the following way: Figure No. 2 shows how to construct table Every concordance has three l i n e s , each one ORDENA from a given concordance. of them consisting of: - 6 characters (nnnnnn) reserved for the number 3.2 Generation of a Tree Structure s t a r t i n 9 of occurrence. - 12 characters ( t t t p p p l l l ) from ORDENA. reserved f o r the r e g i s t e r of that l i n e , according to the original and Once obtained this set of up to nine words, t e x t , and s t a t i n g t e x t code, page it line. is proceeded to construct a tree structure - 72 characters reserved f o r the actual t e x t f o r the words to the r i g h t of W and one f o r the - 18 characters f o r the label of each word of words to the l e f t of W. the l i n e , stating code. The f i r s t It will the grammatical category two characters indicate the is generated immediately a f t e r , number of words in the l i n e . tric Figure number 1 shows part of f i l e only be described here the construc- tion, of the r i g h t branch of the tree. The l e f t CONCUERDA though in symme- form: - The tree has a root node which is the word W itself, and i t s organization. and has f i v e l e v e l s , being the root in level 5. I I I The Algorithm - A d i r e c t descendant of a node wi is given by the word wj such that wiw j are adjascent, i . e . i f wi-~ORDENAi and wj.~ORDENAi+1 3.1 Association of the i-Concordance to table wj is a d i r e c t descendant of wi . O R D E N A . - For each concordance, a table ORDENA is asso- The label of each node consists of: - ciated in the following way: - - The word in question is located in the - Word w associated. I t s grammatical category. - Its frequency. middle l i n e and associated to ORDENA(5) And pointers to: Four words are selected to the r i g h t and - Direct ascendant. to the l e f t - F i r s t d i r e c t descendant. of W, since they are supposed to be carrying the most s i g n i f i c a n t grammatical then and semantic information about the - Next node whose d i r e c t ascendant is the same as the one of i t s e l f . word W. 3 We took this idea from the Centre - Another f i l e called CONCORD,where i t du Tr#sor de la Langue frangaise"s work stored the number of the concordance or concerning to the treatment of binary groupes concordances where that word in that --591-- is ..t" cO 0 d" O 0 ( 0 -'~ d" ~OO O,,-I c~o O 0 0 I~ O 0 00=t ~D 000", O~ .;.1" I~ 0 0,1 rn ....~ 0 0 , I ,~ d" I~ 001'3 ~ O eO~O ~ I ~- eO0 ,,.00 O~ c o d " I/~ 0D CO 00~ O ~t (3 mo o~D~ "~P~O ~,o, g2-g 0~0 1=3 0 ~ 0 0 tt~O d" O,,,-I ~ oO',cO go,=, mO oO', n'~ec ,..Oi'~- ~o,:,' '=~,I ~1"0~ 040 000 =.'.1"i"- 0 ,,.~0 O0 0 ,,-40 eO(;O 040,1 0..t"=,1" eO0 g C,J O ~.o O~O -.1" ~0 g ~C°) , ~ ° ' :",.0° °C~ o 0 ,,.-~ ,-t ,"r*, O0 c3o (:z, fO ~, ,::} ),- U') WbJ 0 ~ ~ O Z 0 "~ c- n ~ I-- ,~ >.. 0 r ~, Z H O W ~ Z o Z .~ ,< .Jr~O 0 ~ H I~ ~ 0 O0 O rr' ~ -Z ,,J ,,~n,' J,mW U~ -0 OI-I--~ ' ~ (:3 17'1 OV~J 0 ~E ~ ~ O "3 hI~ ,~ (J, ,~,,~ ~C ,.J ZO ..J ~,,.-, W~ ~r ~ - r O~ u~w mD~ >,-- Z / u~ ,< b..I b'} ~ J - w<m I-.- bJ -r - O0 0 m ~ ~oo ~ tj~.. n ~ .cbJ --IW 0 I-- b~ , , : ( J ,,_I c [ O b~ ~ m,,~° n.'~. I..,J ~.J .Q~ row< wz bJ~- :" ,~1-- o MUJ ~7 a~- w ~a =DJ ~ b..I ,~ ~'~ IM Z o :D 0. o b9 b ~ J b.I •~ H W t/~ ~ H r~r~ Z) ~ , > ~ O W ~ o OZ6~ I1) e- ~-o WZ -m ~m ~0 b.l I-- ILl bl ul>0 Z,~. ~ W ~° , J D J I (3~ 13 W ~-~ UIC3 (:r. LO Ld 4o 0 ct~ 0 Z) ;-- rr "3 rr' (~ O IZI < ~ W O ~ L/~ J ~ U "O J ~ ~/~ > ~0 • Zl--Z J W Q )J r'~" X Z ~. ~ o~ Z 'YH ~.WbJ . I,.- LO ."~ u1~ ~-I Z 7 CC l i p " 0 ~ 0 0 C~ ."~ --I 0 ~.~-.Z m ,<I- m ow ~- O~- o ~ ~'~o ~-~ ~ OC ZbO ~.~,< O U -~ " ~ 0 ~d .~ u , , 0 ~ (~) ~: O~Ld u~ >-W~ O~-~ I.-=~ dlJ I,dh ~- ~ b J . • 010 OZ +~ I& J - J <0 d~ ~/) trl ~ Ul L,.I ~ td~ Oh-cq ',1 ~o~ > =~ 0--3 ~5o ~:> Ztm b'~° WO W J0. L~ 0 ~ . N ,~ b-I N ZO ~..-~ "r" b'~ 0 < Jb~ J ' ~ O .~W U~ J r-~ W _rl M O OJ ~ "~ ° U') 0 ~W W UIC~ .J . J O _J ~ t J ~dH D. 6 q ~ ~ ,.J ~ O~ ~W 0.~ O OW n ,d ""~ ' ~: 0 ~ ~lr~ .'3.. b..I ~ e- J i~O ~ ~ :D ~1 I.I. ~ ::D ~ .~ I,~ Z ~ ~-4 ,~ ..-I I-. l-- I~I ~ H IWWOJ ~'~ e4~ ,J A. O W m ~ O U ~ g ~ ~2 JbJW ~ l.h ~1~ ,Y ~J Uo ~-• eb_ o W T ~ . g,.,× g~,~ .I=,, <o o-,~ ~ o o-,,, OO -J Z h o o o g g ~Q Q ° o o o o 4" o o :I" c) C) ,r; 0 W o-z •4 0 D~ ,-~ O~ 0~ ,~I o ed c3 J',l o •-4 c~I r~ 4" .I'I ,,9 o z o • t ~- --592-- ,'0 C) 0 LZ 0 4-~ in particular context came from, making in this way possible the retrieval operation. The process repeats i t s e l f until the last concordance has been processed. '-A node has as many branches as different words are found to be direct descendants to that Figure No. 3 shows, for a set of 14 concor- word, with the same grammatical category dances, the l e f t and right trees generated. through the whole set of concordances. 12 ObOl6B030 SI/, LA AooLESCENCIA NO PUEDE SER SUPERAOA SIPIO COMOOLVIDO DE SI/p 12500100030045 COMO ENTRCGA. POR ESO LA AOOLESCENCIA NO ES SO/LO LA EDAD DE LA 130045001g16846 SOLLDAD, SINO TAMBIE/N LA E/POCA DE LOS GRANDES ANOREs, DEL HEROI/SMO Y 12631004000783 ORDENA[I :9] NO ES SOL.O LA I 9... i 6 EDAD' DE 8 q ,LA SOLEDAD 8 Figure No. 2 Table ORDENA is obtained from a given concordance. Note that ORDENA(9) is void, since there is a comma (,) after the word 'soledad' --593-- 90 323025064 SUCEDI/A ALLA/ POR EL A+O DE 18,~!, CUANDO DON~ PEPE~ T!:NI/A UNOS 55 A+OS DE EDAD Y MUCHOS RI+ONES AU/N° TUVO UN IMITAOOR NOTABLE~ QUE FUE UN BANDERILLERO LLAMADO ANTONIOP GONZA/LEZ~ EL~,ORIZA6E+O~ QUIEN DIO A 91 32403#073 AHORA, LA EMPRESA GUE LA TIEN!~ R~NTAOA~ SE ~ s T A / GASTANDO UN DINERAL EN 13100:,l.! 5 9 i 6 8 4 ESTE SERIAL~ BUSCANDO NUEVOS VALORES~ MISMO5 QUE - HASTA QLiE SU EDAD SE 1 2 2 8 0 U ! : 3 4 5 2 8 5 LOS PERMITA - NO HABRA/N DE SALIR DE ENTRC LOS NI+OS TOREi~OS, 1L59L0404:,683 92 33306(,012 CONSEGuIR DINERO PARA SACAR ADELANTE LA FUNDACIO/No PRIMERO HABLO/ EL SE+OR CURA eUE FNTONCES NO T E H I / A NI TRZZi4TA A+OS DE EDAD. LUEGO DON TOMA/S8 SA/NCHEZO (ESTE S I / VIEJO Y COLUDO) PROPUSO COLECTAS Y R I F A S . I00 ~ ; : .: 96 1380,) 1..'.73 B~:;4CI0 11B52!9300'i3 .' 93 3351#8021 CABALLOS. I 0 EN SAN ~ JOSE/g H A B I / A MEDIO MILLAR DE HOMBRES EN EDAD DE TOMAR LAS ARMAS E IRSE A LA GUERRA~ PERO NO TODOS SE SINTIERON CON A/NIMOS DE 134850084~3#3",~96 I ~ 8 3 0 #0 .:31359~9q. 94 33.;148034 CASADOS Y TENI/AN H I J O S . LOS MA/S ERAN JO/VEN[:S Z:N EL VERDOR DE LA EDAD, DE 16 A 30 A+OS, CON ALGUNA DESTRE2A EN L:L MANEJODE ARMAS Y CABALI..OS Y SIN DISCIPLINA MiL!TAR. 5 03#OJ 95 3,~50#q.023 ENCUBIERTOS DEL DIABLO~ O AL MENOS DO/CILES INSTRUMENTOS DE SUS AVIESOU l.i_ 076370;" 426 DESIGNIOS, LA BEATA IMAGEN DE LA EDAD [)E ORO REDIVIVA SE TRANSMUTO/~ AL 13008~;#6843:~597 CONJURO DEL DES~NGA+O~ EN EDAD DE HIER~<O EN QUE DOMINABA LA CRECIENTZ 1287~4C~L~O'I.OL; ; 96 3:'.50 q.# 02t~ DESIGNIOS, LA BEATA IMAGEN DE LA EDAD DE ORO REDIVZVA SE TRANSMUTO/~ AL 1300' ,468L~0 .597 CONJURO DEL DFSENGA+O~ EN EDAD bE HIERRO EN QUE DOMINABA LA CRECIENT~: 12875#54:3q.50 OC! CONVICC:Io/N DE QUE ESOS DESNUDOS HIJOS DEL OCE/ANO ~ FORMABAN PARTE DEL 1.L0#02637C907 97 5#2251019 INDI/GEHAS~ COMO ES AU/N~ EN PARTE E s T E / R I L , SINO ~UE R E A L I Z A R I / A SU PROGRESIVA EDUCACIO/N EN LA ADOLESCENCIA Y HASTA EN LA EDA9 ADULTA'. EN EL PLAN DEFINITZVAMENTE REG::Ni:RADOR DICTADO EN EL LLANO DEL RODEO~ 98 3450650g5 J U R A / I S Y YO PIERDO UN ALUMNO. 6 935958 PERO DESD.r:.ILA EDAD OE OCHO O NUEVE A+OSHASTA LA DE DIEC!NUZVZ O VEINTE 153468#839840#639 NO EXISTE ILL DESI:O DE UN TRABAJO MANUAL PESADO. ESTO ESE:XAC'FO EN LA 1#i0684580.~593#:; 3~09603~ PERCIBIR SUS CUALiDADES TANTO MATERIALES COMOFUNCIONALES, ASI/ COMO 9 0280,! '1 SU CONVENIENCIA RESPECTO A LA EDAD DE QUIEN LA IBA A USARI OBSERVO/ SU 14281468L~5594092 CONTENIDO Y MANEJOp SE DIO CUENTA DE SU PESOY RESISTENCIA AS// COMODE 1 4 8 3 0 5 9 8 # 2 8 3 0 1 0 4 100 34409603(~ DESARI<OLLO DE LA IMAGINACIO/N CREADORA YALGUNAS HABILIDAOES PARA oFERAR 9 8 4 0 9 : ' CON HERi.,AMIENTAS SENCILLASI ES, ADEMA/S, ADECUADO A LA EDAD DE MI H I J O , 1 2 4 0 ; ] 9 i 8 4 6 8 4 2 8 INSTRUCCIONES, LA PRESENTE "SCALA CONTIENI] OCHOASPECTOS E~iCNCIALES EN 9 0 L ) . ; ' : . : 4 101 345322048 L E E S F A / C I L HACER AMISTADES l 'ME ES BASTANI'E F A I C I L HACERLAS Y ME 12590~),599 35 GUSTA QUE SEAM ALEGRZS, DE MI EDN) Y TENGAH UN I ; I V E L CULTURAL POCO MA/S 1 4 9 0 9 8 4 2 8 3 9 6 6 0 q 0 MF'NOS COMO EL M I / O . ' Y FRENTE A UN GRUPO DE NI+OS: 'ME DA GUSTO VER 1630d6553468485980 102 3#5355029 TERMINAR LA CAR'~EIA DE M',ZDICINA. 5 O. 4~ C A S O 2 . ALUMNO DE 19 A+OS DE E:DAO! SEXO MASCULINO. PROCED~NTE DE LA ESCUELA" DE ARQUITECTURA DE UNA UNIVERSIDADDE PROVINCZA. 103 3#6037919 I D E N T I F I C A CON LA PLENA REALIT.~CZO/N DE LAS A~SPIR/',CtONE.~; f~UE ::L HOMBRE 1~.9#0,) 4 i 66 "IIENL Dt:SDF: !:L TI_'/RMINO DE LA EDALr" M E D I A " , Y NO SE DANCUEliTA DE QU[: 150468#68';31598#0 EL A+O 20,~' PIED[: NO SE LA CULMINACIO/N ROTUNDA Y FELIZDE UN PERIODO 1#66C0153C 3 # 6 8 99 1391468#C!0805C 1384,932(~106,.30:,9 1068081~68594 1 3 8 3 9 8 0 0 ,LZq-6846 158@Cz!C 0f~O:~q-66483 110:~91#01;3 ! 2 I J,80t+O ' ,34;~68~ 11466.[ 0 :) q.687,'3 130C8#C8#GOJ'4: 6 0#u468~',.0 UNOSNI~NUME.-DE~ ANO5---- DE, MILLAR-DE-- HOMBRES--EN E'L--TER M I N O ~ _ _ /¥ EL-- VERDOR ~ D E \ BEATA--IHAGEN--",X Y--HASTA EN~L~ PERO--DESDE7 ' CONVENIENCIA Figure No. 3 //MEDIA //jADULTA //-,I HI~HIJO IEDADF \\ At QUE-----HASTA, ~ M U C H O S - - - - R I QONES--AU N TENGAN--UN NIVEL \ \ DE--M QUE--SU /.TOHAR--LAS--ARHAS \ ~DE~-- HIERRO-- EN QUE ~OCHO--O ~NUEVE QUI E N - - L A ~ I B A ~E--LOS--PERHITA--NO Left and right trees generated from a set of 14 concordances. --594-- 3.3 The a l g o r i t h m to select significant Once t h e is tree proceeded found is fully constructed to make t h e actual it some f a c t s to be c o n s i d e r e d repeated same c o n t e x t , probability W in ber o f than the context of repeated since there meanings functions of ~ word me s e t words. it is of to the the word a small such or however, way to words find all own d i r e c t is into grammatical W followed by t h e s~ be d e s c r i b e d the proced then was t h o u g h t by t h e It is by proceeded the then of its same f r e - category, these that same gramma- descendants number o f the have to r e d u c e was not but direct is clear account duction is text less is that that and concordances chosen the process as t h e closer to 5, level then significant; ger number o f will it is t h e word I, would ascendant with the It so ma- of it reduced. num- not level ~ode i s branch category. then num- a larger are frequency that quency and g r a m m a t i c a l same. one r e p e a t e d times Next, greater the ny d i f f e r e n t the of the recon- hence a l a r - concordances to m a n t a i n takes have to required be quality information. ure: In o r d e r most path A 6th to is level analysed is level in right of the If this branch the 5, of the and t h a t being is greater direct After a left- tree the the tree analysed). than I, descendant is to If then is form, means t h a t level rode and t h e is frequency t h e words four words o c u r r e d a times of rent concordances. As i t re t h e r e is meaning o f t h e word W i n context is the n concordances" two o f dom f u n c t i o n , concordance, same in hence, them, we o b t a i n and t h e can be s a f e l y + I and if level is F~50 by t h e s e befothe of is 6 the Q would 4 or number be 7 or 3 then F >50 If team o f pattern F>30. for of level is 8 or 2 then Q = F//4+I ~or F ~ 7 0 Q = F//7 for F>70 level is Finally, if the Q = F//5 + 1 for only Q = F//IO + i 9 or I then F~ 50 for F>50 a ran- 1) or from selected + 1 for * At a significant omited reduction F~ 30 then Q = F//5 by t a l k i n g ( n - of concordances If particu- by means o f level frequency Q = F//3 that all the by our it following: it was s a i d this was t h e Q=F//2 n diffe- a good p r o b a b i l i t y that Q=F//4 in n > I, in a reasonable If analysed W followed decided linguists* and t h e the and many t r i a l s was e m p i r i c a l l y the its reached some s t u d y reduction root same way. a 9th one or tree, (Remember t h a t W is frequency leftmost analyse followed. first nal of a possible tical in may be more s i g n i f i c a q t another of exactly meaning is words times ber o f the that that A set left the to identical The more words lar that analysis reduc- beforehand: in same i n t e r m e d i a t e be s t o p p e d ; There a r e - at associated tion. the If - concordances. thank (n-2) the for fi- --595-- in point, particular her v a l u a b l e esting output. this we would to Paulette discussions suggestions. like to Levy and i n t e r - It this has to be mentioned pattern of r e d u c t i o n ged a c c o r d i n g to o b t a i n here, is to the wprd analysed., already of concordances ( Q out of that F) i t them a g a i n , is stored may be chan- the best r e s u l t s When i t ARBOL: Each node of the t r e e that characters, as llowing each t i m e . 7 for Known the number will composed distributed I for to s e l e c t own address the word and each one of them i s marked as 2 for such, to avoid any one of them be s e l e c - 3 for the 4 for the f r e q u e n c y 7 for the address of or more t i m e s . 3.4 Output. cating output category when a p p l i c a b l e ted presented of the l a s t 7 for the rect word rect IV presented. The Computational University of Norway v e r s i o n the "Centro de Procesamiento A r t u r o Rosenblueth" of the Secretar~a ciSn P~blica (Ministry with 262K words of 36 b i t e s memory and 8 , 0 0 0 , 0 0 0 of dants (i.e. ging from i t ) is The r o o t , the word W is of c e n t r a l in disc. emer- and in f i l e stored the t r e e s the f o l l o w i n g - di- descen- No of branches the address each one of Education), next b r o t h e r ) the number of d i r e c t CONCORD the number of From the c o m # u t a t i o n a l de Educa- of c h a r a c t e r s like own d i r e c t the concordance where i t of ALGOL 60 a UNIVAC 1106 computer of (i.e. its the address of the f i r s t where i t in the NUALGOL f o r direct descendant 6 for System~ The system was implemented its the address of the next d i - 4 for Figure No 4 shows the form in which is category of the wo~d descendant of 7 for chosen are l i s - below. the o u t p u t leng~ ascendant and the f r e q u e n c y . the Q concordances the grammatical ascendant indi- the group of words repeated grammatical Next, is in f i l e the l e v e l 24 f o r by means of a random f u n c - The f i n a l in the f o - way: its tion, ted t w i c e of 72 ARBOL be chosen proceeded in a l i n e is comes from, point of view, is generated in whose node a s s o c i a t e d is way: in a p r e f i x e d and i t will dance. This word is be p r e s e n t address, in every c o n c o r - taken from ORDENA (5) 4.1 Data Storage. - The next word in ORDENA w i l l be s t o - We made use of 3 f i l e s : red by means of a hash f u n c t i o n , a) is decided to be the same node as one previously stored, File CONCUERDA, where the whole set of concordances W was s t o r e d , and i t of the word word, was d e s c r i - b) F i l e s same, ARBOL and CONCORD~ these two f i l e s obtained the r i g h t and o n l y stored while trees. --596-- the category, level ascendant are e x a c t l y the in such case the f r e q u e n c y the number of t h i s in a d d i t i o n and l e f t if grammatical aumented by one and in f i l e are supposed to c o n t a i n the i n f o r m a t i o n generating its and d i r e c t bed above. if and i t is CONCORD is concordance to the p r e v i o u s one. r~ I./1 o D -/ I-o I-.. ZDI-- ~1 O O ~ OV1 e~by ~-'~ ~ . Z ~ O1~ .~ I - - H W 01 ~ W 0 0 o z W O0 ~H ,,~ r T Z ~ W~ (~ C3 I O Z 0 0 ~ l~m-" :3 Ld U'I .~ 0 "r~ I~ .J Z O -J 69 ~ • :~ J r ' , " ...I :3 Wk- W O~r~ .~- - I O 0 ,,~ L) O , ~ O ZOl 0 U I I NW 0 7n~ ZOhJ ~ &. ,~m ~..1 Z OV) U'I W 121W ~ I, ~V I '~ ZZ 7~]1 ~ D W wmm O H C3 W ~ ,~ n~" . J MO ffl U 3 Z O O~ .-I 0. Z b~ X W O ~ ,~ W m.J U1 v l X (D W - J I./1 r-, L~ H ~.- I-- 4 Qz t3d HO m (3:: O IsJ l:g ,.J n O / ~ m n~ ~ ~ Z 0 o ul W ~0 1:3 0 0 ~ Z ~Z I If) W O W .J •J Z W~ 4., ~ ~'~ ~ 0 m n o :~ H L ~JO bJ _~ ~_ t~ O ~ ,~_J --I Z O Z ~m~ ~o< ~ OW ~. ~ . J ~-I O 0 Z ~n kd .J J '~. H O ,~ I~I t7 ._1 N Z ,~ o e~ I WZ Z_lr, ~ O ~O 0~ , - ~ . O ~W Z~3t~ bJ ~J. ~ W 60 WbJ 0 bJbJ. 0 ~ . Z Z :-40 + J WW rn~ =~ ~ C I bJO ID~ ~J, b.J nvI I-(z} U1 ,~ W(.) r'~ C3 J m Z~ 0 ~- r'7 Jl~ Z~,--I b.I W :~ W N ( J"~ Z WZ H WbJ ~ L~ O eq-. O W O ct~ "O H X b::~ r'~ b..i r'~ 2) O U cO O Z W 0~I LdO V~J --I n<~ bJ (~ ,w O Ld~-~ % O --I F-m m J .J 09 W LD v DLQ m 09 ~W I-~0 C~ O_ O W I cO 0 I J I~1 0 ~'4 O Z W h.IO L~ O ~9C C~I~J W~ H ~T3 01 _.I Z ! El tO LLI -~Zt~ t t~ L J O 0 ~-~ b9 ~ 0~" ~l~O GhOUl Z no ~ .J ~ O ~ W L~I bJ t 7 ~ :~ b.I O Ln Z 0 - ~i~ 1::3 Z ~ I • I • W 4o 4~ ~ 0 ~ E I~ II II l:g e¢ (J I O % iLL I.L II u~ ,, Z al + ~ ~ g .~ m f2~ ~ c~ ~ Z + o e4 o o O + ~ o C~l n 6 Z ft.) L W ul ~ ~W W W J ~-~ .-I O n~ U W ~lZ ZUI OLd ~ ~-"~ t.~'T" 0 H WOJ. t-- U1 0 o ow I-- H c ~ o 0,. ;.-. b9 W *O 0 ~ II U bJ W C- 4-.) ! c2~ ,,~ t'," ~. U'I W ZJ W'~ 0 W ~.0 n~ O ~ O u I./') ,,~ "O HH i, ~,~ ,,~ Q r~ W ~w O ,j ~I~Q ZU Wb9 I-- C~ 01WW O Z'~ I! O W I--W ON bJ n cl ZO 69n~ J L/1 .~ W VlZ •- bJ~' ~-~ L~ 0 L~ H ~ .JO :3>0 ~:: H C3 0 bJ t/l~ ~td Wkt. Zb9 :E H = E I-- ~ r-~ ~ O L41 ~ - ~ O O 0. O W'~ J ZO O H .J Id ~'0 -'~' Ul ZE~ ~-~ .J ~ QV~ ~,-~ bJ tw W -I OE~ ~/I Z DH WO ~ l:1 LO LL. Z 0 W W O(:3J Q Ld ~ m O j ~ O r~ tlJ uJ o ul bJ --597-- ,,.1 Z W W cl e-~ O~ TREE STNUCTURE GENERATED FOR WORD *EDAD* (AGE). 51596 31608 31620 31652 31644 31656 51668 31680 31692 51716 51728 31740 31752 31800 31812 31824 31856 31848 51860 31896 51908 51920 31932 31944 31956 31968 31980 31992 32004 32016 32028 32040 52052 32004 320?6 3~088 32100 32112 52124 52136 32148 52160 32172 32184 32196 52208 32220 52252 52244 32256 32268 32280 52292 32304 32516 32528 52540 Figure No. 5 2DE 3Y 4MI 2MODERADAMENTE 2PUES 5Y 3Y 20E 2DE 3SO/LO 4ESA 2ALGU/N 2PROBABLENENTE 2CON 2CON 2CON 45U 4TU 4TAL 4POCA 2INVERSAMENTE 2PARA 3A 5A 3A 5A 3@UE 4TODA 3A 80UE 3A 3CIERTAMENTE #ESTA 2POR 4CUYA 5DE 5OK 3DE 40TRA 3DE 5DE 2HASTA 2i4ASTA 30E 2CONFORME bEN 6YA SEN SEN 6140 4ESTE 3A 5PERO 3A 2LE 3EXACTAMENTE 3A PR2 COl AJ2 AV13 C04 COl COl PR2 PR2 AV5 AJ5 AJ6 AVIS PR3 PR5 PR5 AJ2 AJ2 AJ3 AJ4 AVI2 PR4 PRI PR1 PRI PRI C03 AJ4 PR1 C03 PRI AVII AJ4 PR3 AJ4 PR2 PR2 PR2 AJ4 PR2 PR2 PR5 PR5 PR2 C08 PR2 AV2 PR2 PR2 AV2 AJ4 PRI C04 PRI PN2 AVll PRi i i 9 I 1 1 i 2 1 1 5 2 I 1 1 1 16 2 1 2 i i i0 25 1 1 2 1 2 i I i 6 2 1 9 6 21 I 1 2 I 1 I i 15 2 1 I 2 2 1 2 2 1 1 i 30936 32712 12 36360 32436 34008 32688 33372 31056 3#008 12 36312 31944 31980 35400 35664 12 12 12 12 32424 51980 54344 34008 52832 51620 34008 12 31728 31856 33060 34008 12 50504 12 31856 31620 34008 12 32124 31896 32016 32208 32052 35628 34008 24 51728 34344 24 12 31860 52712 52052 36900 34008 31848 36576 33168 36552 32100 I 4 32496 37344 5252 1 1 32208 31896 3#644 32232 1 # 55028 31920 33192 3434# 32088 6 31728 31116 2 52552 32280 1 32076 32148 1 32580 5556 50828 32880 32112 32124 56524 52568 5472 3468 31532 31560 31800 32532 35496 52160 5 20 1 1 2 I 1 1 35604 33252 52184 5 82556 32016 51968 320#0 31848 34856 32504 31980 54080 32004 32796 39264 51860 31608 52856 50468 35256 8 31344 4 37520 19 32156 I 34524 1 30996 2 35124 1 35340 39324 35976 55892 37884 36156 36168 11 1 1 1 2 1 1 36804 1 36516 5220 1 1 File ARBOL, where the tree ~tructure is generated. --598-- 4647 3228 69 2046 3879 3987 4149 4674 4749 141 507 2685 1563 219 633 1680 3 q98 2802 540 1749 260? 15 27 54 i14 216 366 557 1128 1740 411 1980 2625 648 6 72 345 381 384 543 1131 1215 1983 2546 189 222 510 1809 651 2787 2805 2877 3000 3159 3165 5192 Otherwise Figure it will be a new rode. No 5 shows p a r t EDAD (AGE) is being of f i l e VI References ARBOL, processed. 1.- Roberto Ham Chande: en L e x i c o g r a f ~ a , V Results The f i r s t ging, And results since for Applications~ those words w i t h medium 2.- say up to 600 - to reduce the number b e t - ween 30% and 40%, a c c o r d i n g comparing ces w i t h It is set of concordan- expected t h a t frequency, will However, is in (by the reduced v e r s i o n ) higher ties, was r e p o r t e d the o r i g i n a l cribed of view, words w i t h 3.- the method here des. Investigaciones the g e n e r a t i o n of each t r e e question working to o p t i m i z e The most i m p o r t a n t des the o r i g i n a l that by t h i s find expressions increases. ~e it. application besi- main o b j e c t i v e s , method i t is is possible and p a t t e r n s of to langua- and used c o n s i s t e n t l y . ~-599-- del DEM lingu~sticas Lexicograf~a. Jornadas gio de M~xico, 1979. Le T r a i t e m e n t des Cahiers I~-[~0-I~ en 89 El Cole- Centre du T r ~ s o r de la Langue from the c o m p u t a t i o n a l some d i f f i c u l - Gramatical Base E s t a d ~ s t i c a Lexicologie. are s t i l l La F o r m a l i - y Francaise: point 1979 Fernando Lara y Roberto Ham Groupes B i n a r i e s . of the word in ge repeated Analizador be more e f f i c i e n t . there since for very time consuming as the f r e q u e n c y are s t i l l del Chande: information Jor- de M~xico, zaci6n DEM i00 en L e x i c o g r a f ~ a , Garc~a H i d a l g o : Luis 1 al Invest!~1~ones Isabel del to the word in q u e s t i o n . No l o s t in nadas 89 El Colegio were very encoura- number of concordances we were able Lingu~sticas Del de J