Floresta sintá(c)tica: some examples of the Floresta treebank

Demo at LREC'2002 for the Floresta sintá(c)tica project, prepared by Diana Santos and Susana Afonso
The first two examples show the need to revise/change the CG format. The remaining examples show the steps from the CG format to the AD (visl) format.

  1. In this first example, a wrong function has to be corrected.
    <s>
    Este	 [este] <dem> DET M S @>N
    golpe=de=Estado	 [golpe=de=Estado] <*1> <*2> N M S @SUBJ>
    deixa	 [deixar] <fmc> V PR 3S IND VFIN @FMV
    céptica	 [céptico] ADJ F S @<OC
    a	 [o] <artd> DET F S @>N
    maior	 [grande] <KOMP> ADJ F S @>N
    parte	 [parte] N F S @<ACC
    de	 [de] <new> <sam-> PRP @N<
    os	 [o] <new> <-sam> <artd> DET M P @>N
    grandes	 [grande] ADJ M P @>N
    mestres	 [mestre] N M P @P<
    de	 [de] PRP @N<
    xadrez	 [xadrez] N M S @P<
    $(
    cerca=de	 [cerca=de] ADV @>A
    300	 [300] <card> NUM M P @>N
    em	 [em] PRP @N<
    todo	 [todo] SPEC M S @>N
    o	 [o] DET M S @>N
    mundo	 [mundo] N M S @P<
    $)
    $, 
    que	 [que] <rel> SPEC M P @SUBJ> @#FS-N
    esperam	 [esperar] V PR 3P IND VFIN @FMV
    ver	 [ver] V INF @IMV @#ICL-<ACC
    a	 [o] <artd> DET F S @>N
    situação	 [situação] N F S @<ACC
    clarificada	 [clarificar] V PCP F S @N<  --> @<OC 
    $, 
    independentemente	 [independentemente] ADV @<ADVL
    de	 [de] <sam-> PRP @N<PRED 
    a	 [o] <-sam> <artd> DET F S @>N
    parte	 [parte] N F S @P<
    que	 [que] <rel> SPEC F S @SUBJ> @#FS-N<
    acabe	 [acabar] <fmc> V PR 3S SUBJ VFIN @FMV
    vencedora	 [vencedor] ADJ F S @<SC
    $.
    </s>
    
    The relative clause que esperam ver a situação clarificada can be translated literally as "who hope to see the situation clarified". The problem is that, out of context, situação clarificada can also be understood as "clarified situation".
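
As an illustration of how the CG lines above are laid out (word form, lemma in square brackets, tags in angle brackets, morphological tags, and function tags starting with @), here is a minimal Python sketch of a line reader. The class name, field names and function name are hypothetical and not part of the Floresta tools; the sketch also assumes that a revision note is always introduced by an arrow such as "-->".

```python
import re
from typing import List, NamedTuple, Optional


class CGToken(NamedTuple):
    word: str
    lemma: str
    secondary: List[str]   # tags in angle brackets, e.g. <rel>
    morph: List[str]       # e.g. V PR 3P IND VFIN
    functions: List[str]   # tags starting with @, e.g. @SUBJ>


# word form, whitespace, [lemma], then the remaining tags
CG_LINE = re.compile(r"^(?P<word>\S+)\s+\[(?P<lemma>[^\]]+)\]\s*(?P<rest>.*)$")


def parse_cg_line(line: str) -> Optional[CGToken]:
    """Return a CGToken, or None for punctuation ($,) and markup (<s>) lines."""
    m = CG_LINE.match(line.strip())
    if m is None:
        return None
    # Drop a revision note such as "--> @<OC" before reading the tags.
    rest = re.split(r"-{2,}>", m.group("rest"))[0]
    tags = rest.split()
    return CGToken(
        word=m.group("word"),
        lemma=m.group("lemma"),
        secondary=[t for t in tags if t.startswith("<") and t.endswith(">")],
        morph=[t for t in tags if not t.startswith(("<", "@"))],
        functions=[t for t in tags if t.startswith("@")],
    )


token = parse_cg_line("que\t [que] <rel> SPEC M P @SUBJ> @#FS-N")
print(token.functions)
```

Punctuation lines such as "$," and the sentence delimiters return None, so a caller can skip them when collecting the tokens of a sentence.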
  2. The second needs another function (due to embedding):
    <s>
    $"
    Quantas	 [quanto]  <interr> DET F P @>N
    Laura=Pausini	 [Laura=Pausini] PROP F P @SUBJ>
    estão	 [estar] <fmc> V PR 3P IND VFIN @FMV  ----> @#FS-ACC> 
    aqui	 [aqui] ADV @<ADVS
    esta	 [este] <dem> DET F S @>N
    noite	 [noite] N F S @<ADVL
    $?
    $"
    $,
    pergunta	 [perguntar] <fmc> V PR 3S IND VFIN @FMV
    $.
    </s>
    
    Literally, the whole sentence can be rendered as: "How_many Laura=Pausini are here this night?", asks. A more idiomatic translation would be: "How_many Laura=Pausini are there here tonight?", he asks.

    The annotator has to encode (add) the information that the direct speech question is the object of the clause headed by the verb perguntar (ask).

  3. This example reflects attachment revision (underspecified in the CG format)
    <s>
    Em=relação=a	 [em=relação=a] <sam-> PRP @ADVL>
    o	 [o] <artd> <-sam> DET M S @>N
    Iraque	 [Iraque] PROP M S @P<
    $, 
    Valeri=Progrebenkov	 [Valeri=Progrebenkov] PROP M S @SUBJ>
    $, 
    porta-voz	 [porta-voz] N M S @N<PRED 
    de	 [de] <sam-> PRP @N<
    a	 [o] <-sam> <artd> DET F S @>N
    sociedade	 [sociedade] N F S @P<
    de	 [de] PRP @N<
    Estado=Rosvooroujenie	 [Estado=Rosvooroujenie] PROP M S @P<
    $, 
    responsável	 [responsável] ADJ M/F S @N<PRED 
    por	 [por] <sam-> PRP @A<
    as	 [o] <-sam> <artd> DET F P @>N
    exportações	 [exportação] N F P @P<
    militares	 [militar] ADJ F P @N<
    $, 
    desmentiu	 [desmentir] <fmv> V PS 3S IND VFIN @FMV
    a	 [o] <artd> DET F S @>N
    existência	 [existência] N F S @<ACC
    de	 [de] PRP @N<
    uma	 [um] <arti> DET F S @>N
    encomenda	 [encomenda] N F S @P<
    de	 [de] PRP @N<
    4000	 [4000] <card> NUM M P @>N
    carros=de=combate	 [carro=de=combate] N M P @P<
    russos	 [russo] ADJ M P @N<
    $, 
    como	 [como] <rel> <ks> ADV @ADVL> @#FS-<ADVL
    afirmara	 [afirmar] V MQP 3S IND VFIN @FMV
    o	 [o] <artd> DET M S @>N
    genro	 [genro] N M S @<SUBJ
    de	 [de] PRP @N<
    Saddam=Hussein	 [Saddam=Hussein] PROP M S @P<
    que	 [que] <rel> SPEC M S @SUBJ> @#FS-N<
    desertou	 [desertar] V PS 3S IND VFIN @FMV
    para	 [para] PRP @<ADVL
    a	 [o] <artd> DET F S @>N
    Jordânia	 [Jordânia] PROP F S @P<
    $.
    </s>
    
    The final clause, como afirmara o genro de Saddam Hussein que desertou para a Jordânia, can be rendered literally as: as had_stated the son-in-law of S.H. who deserted to the Jordan.

    In the CG format, there is no information about which N the relative clause attaches to, only that it modifies a noun to its left.

    Automatic conversion to the AD format yields the following:

    <s>
    
    SOURCE: CETEMPúblico n=63 sec=pol sem=95b
    C63-4 Em relação a o Iraque, Valeri Progrebenkov, porta-voz da sociedade de Estado Rosvooroujenie, responsável por as exportações militares, desmentiu a existência de uma encomenda de 4000 carros de combate russos, como afirmara o genro de Saddam Hussein que desertou para a Jordânia.
    A1
    STA:fcl
    ADVL:pp 
    =H:prp('em_relação_a' <sam->)	Em_relação_a
    =P<:np 
    ==>N:art('o' <-sam> M S)	o
    ==H:prop('Iraque' M S)	Iraque
    ,
    SUBJ:np
    =H:prop('Valeri_Progrebenkov' M S)	Valeri_Progrebenkov
    =,
    =N<PRED:np
    ==H:n('porta-voz' M S)	porta-voz
    ==N<:pp 
    ===H:prp('de' <sam->)	de
    ===P<:np 
    ====>N:art('o' <-sam> F S)	a
    ====H:n('sociedade' F S)	sociedade
    ====N<:pp 
    =====H:prp('de')	de
    =====P<:prop('Estado_Rosvooroujenie' M S)	Estado_Rosvooroujenie
    ====,
    ====N<PRED/N<PRED[-3]:ap 
    =====H:adj('responsável' M/F S)	responsável
    =====A<:pp 
    ======H:prp('por' <sam->)	por
    ======P<:np 
    =======>N:art('o' <-sam> F P)	as
    =======H:n('exportação' F P)	exportações
    =======N<:adj('militar' F P)	militares
    ,
    P:v-fin('desmentir' PS 3S IND)	desmentiu
    ACC:np 
    =>N:art('o' F S)	a
    =H:n('existência' F S)	existência
    =N<:pp 
    ==H:prp('de')	de
    ==P<:np 
    ===>N:art('um' <arti> F S)	uma
    ===H:n('encomenda' F S)	encomenda
    ===N<:pp 
    ====H:prp('de')	de
    ====P<:np 
    =====>N:num('4000' <card> M P)	4000
    =====H:n('carro_de_combate' M P)	carros_de_combate
    =====N<:adj('russo' M P)	russos
    ,
    ADVL:fcl 
    =ADVL:adv('como' <rel> <ks>)	como
    =P:v-fin('afirmar' MQP 3S IND)	afirmara
    =SUBJ:np 
    ==>N:art('o' M S)	o
    ==H:n('genro' M S)	genro
    ==N<:pp 
    ===H:prp('de')	de
    ===P<:np 
    ====H:prop('Saddam_Hussein' M S)	Saddam_Hussein
    ====N<:fcl 
    =====SUBJ:pron-indp('que' <rel> M S)	que
    =====P:v-fin('desertar' PS 3S IND)	desertou
    =====ADVL:pp 
    ======H:prp('para')	para
    ======P<:np 
    =======>N:art('o' F S)	a
    =======H:prop('Jordânia' F S)	Jordânia
    .
    
    </s>
    
    
    This can be rendered in English as: as stated by the son-in-law of S.H., who (S.H.) deserted to Jordan.

    To obtain the interpretation "as stated by S.H.'s son-in-law, who deserted to Jordan", which a human annotator knows from world knowledge to be the correct one, manual revision of the AD (visl) format is required:

    <s>
    
    SOURCE: CETEMPúblico n=63 sec=pol sem=95b
    C63-4 Em relação a o Iraque, Valeri Progrebenkov, porta-voz da sociedade de Estado Rosvooroujenie, responsável por as exportações militares, desmentiu a existência de uma encomenda de 4000 carros de combate russos, como afirmara o genro de Saddam Hussein que desertou para a Jordânia.
    A1
    STA:fcl
    ADVL:pp 
    =H:prp('em_relação_a' <sam->)	Em_relação_a
    =P<:np 
    ==>N:art('o' <-sam> M S)	o
    ==H:prop('Iraque' M S)	Iraque
    ,
    SUBJ:np
    =H:prop('Valeri_Progrebenkov' M S)	Valeri_Progrebenkov
    =,
    =N<PRED:np
    ==H:n('porta-voz' M S)	porta-voz
    ==N<:pp 
    ===H:prp('de' <sam->)	de
    ===P<:np 
    ====>N:art('o' <-sam> F S)	a
    ====H:n('sociedade' F S)	sociedade
    ====N<:pp 
    =====H:prp('de')	de
    =====P<:prop('Estado_Rosvooroujenie' M S)	Estado_Rosvooroujenie
    ====,
    ====N<PRED/N<PRED[-3]:ap 
    =====H:adj('responsável' M/F S)	responsável
    =====A<:pp 
    ======H:prp('por' <sam->)	por
    ======P<:np 
    =======>N:art('o' <-sam> F P)	as
    =======H:n('exportação' F P)	exportações
    =======N<:adj('militar' F P)	militares
    ,
    P:v-fin('desmentir' PS 3S IND)	desmentiu
    ACC:np 
    =>N:art('o' F S)	a
    =H:n('existência' F S)	existência
    =N<:pp 
    ==H:prp('de')	de
    ==P<:np 
    ===>N:art('um' <arti> F S)	uma
    ===H:n('encomenda' F S)	encomenda
    ===N<:pp 
    ====H:prp('de')	de
    ====P<:np 
    =====>N:num('4000' <card> M P)	4000
    =====H:n('carro_de_combate' M P)	carros_de_combate
    =====N<:adj('russo' M P)	russos
    ,
    ADVL:fcl 
    =ADVL:adv('como' <rel> <ks>)	como
    =P:v-fin('afirmar' MQP 3S IND)	afirmara
    =SUBJ:np 
    ==>N:art('o' M S)	o
    ==H:n('genro' M S)	genro
    ==N<:pp 
    ===H:prp('de')	de
    ===P<:prop('Saddam_Hussein' M S)	Saddam_Hussein 
    ==N<:fcl 
    ===SUBJ:pron-indp('que' <rel> M S)	que
    ===P:v-fin('desertar' PS 3S IND)	desertou
    ===ADVL:pp 
    ====H:prp('para')	para
    ====P<:np 
    =====>N:art('o' F S)	a
    =====H:prop('Jordânia' F S)	Jordânia 
    .
    
    </s>
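
The difference between the automatic and the revised analysis lies only in the depth (number of leading "=" signs) of the N<:fcl node. The following Python sketch reads a fragment of AD lines into parent links and reports the head word of the constituent the relative clause is attached to. The function names are hypothetical, and the sketch assumes that each node's parent is the closest preceding node that is one level shallower.

```python
def parse_ad(lines):
    """Turn AD lines into nodes with a parent index derived from '=' depth."""
    nodes = []
    open_at = {}  # depth -> index of the most recent node at that depth
    for raw in lines:
        line = raw.strip()
        depth = len(line) - len(line.lstrip("="))
        body = line[depth:]
        if ":" not in body:
            continue  # punctuation such as "," has no function:form label
        label, _, word = body.partition("\t")
        nodes.append({
            "depth": depth,
            "label": label.strip(),
            "word": word.strip(),
            "parent": open_at.get(depth - 1),
        })
        open_at[depth] = len(nodes) - 1
    return nodes


def head_of_parent(nodes, label):
    """Return the word of the H: sister of the first node carrying `label`."""
    idx = next(i for i, n in enumerate(nodes) if n["label"] == label)
    parent = nodes[idx]["parent"]
    return next(
        n["word"] for n in nodes
        if n["parent"] == parent and n["label"].startswith("H:")
    )


automatic = [
    "=SUBJ:np",
    "==>N:art('o' M S)\to",
    "==H:n('genro' M S)\tgenro",
    "==N<:pp",
    "===H:prp('de')\tde",
    "===P<:np",
    "====H:prop('Saddam_Hussein' M S)\tSaddam_Hussein",
    "====N<:fcl",
]
revised = [
    "=SUBJ:np",
    "==>N:art('o' M S)\to",
    "==H:n('genro' M S)\tgenro",
    "==N<:pp",
    "===H:prp('de')\tde",
    "===P<:prop('Saddam_Hussein' M S)\tSaddam_Hussein",
    "==N<:fcl",
]

print(head_of_parent(parse_ad(automatic), "N<:fcl"))  # Saddam_Hussein
print(head_of_parent(parse_ad(revised), "N<:fcl"))    # genro
```

In the automatic output the relative clause is a sister of Saddam_Hussein; after revision it is a sister of genro, which is the attachment the annotator intends.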
    
  4. Example of the treatment of ambiguous sentences:

    It is not always possible (or sensible) to determine some attachment choices:

    <s>
    A	 [o] <artd> DET F S @>N
    polícia	 [polícia] N F S @SUBJ>
    alemã	 [alemã] ADJ F S @N<
    localizou	 [localizar] <fmc> V PS 3S IND VFIN @FMV
    $,
    ontem=de=manhã	 [ontem=de=manhã] ADV @<ADVL
    $,
    os	 [o] <artd> DET M P @>N
    dois	 [dois] <card> NUM M P @>N
    criminosos	 [criminoso] N M P @<ACC
    que	 [que] <rel> SPEC M P @SUBJ> @#FS-N<
    fizeram	 [fazer] V PS 3P IND VFIN @FMV
    cinco	 [cinco] <card> NUM M/F P @>N
    reféns	 [refém] N M/F P @<ACC
    durante	 [durante] PRP @<ADVL
    uma	 [um] <arti> DET F S @>N
    fuga	 [fuga] N F S @P<
    de	 [de] PRP @N<
    mais=de	 [mais=de] <quant> ADV @>A
    27	 [27] <card> NUM F P @>N
    horas	 [hor] N F P @P<
    por	 [por] <sam-> PRP @N<
    a	 [o] <-sam> <artd> DET F S @>N
    Alemanha	 [Alemanha] PROP F S @P<
    $,
    após	 [após] PRP @N<
    terem	 [ter] V INF 3P @IAUX @#ICL-P<
    assaltado	 [assaltar] V PCP @IMV @#ICL-AUX<
    um	 [um] <arti> DET M S @>N
    banco	 [banco] N M S @<ACC
    $,
    em	 [em] <sam-> PRP @<ADVL @N<
    a	 [o] <-sam> <artd> DET F S @>N
    segunda-feira	 [segunda-feira] N F S @P<
    $, 
    em	 [em] PRP @<ADVL
    Stuttgart	 [Stuttgart] PROP F S @P< 
    $.
    </s>
    

    Lit. The police German located, yesterday_morning, the two criminals who did five hostages during an escape of more_than 27 hours around the Germany, after having robbed a bank, in the Monday, in Stuttgart.

    Translation: The German police located, yesterday morning, the two criminals who took five hostages during an escape of over 27 hours through Germany, after having robbed a bank on Monday in Stuttgart.

    It is not necessary to determine whether "in Stuttgart" applies to the bank or to the robbery, because one implies the other.

    In the AD (visl) format, we use a compact notation to explicitly encode ambiguity:

    <s>
    
    SOURCE: CETEMPúblico n=149 sec=soc sem=94b
    C149-2 A polícia alemã localizou, ontem de manhã, os dois criminosos que fizeram cinco reféns durante uma fuga de mais de 27 horas por a Alemanha, após terem assaltado um banco, na segunda-feira, em Stuttgart.
    A1
    STA:fcl
    SUBJ:np
    =>N:art(F S)	A
    =H:n(F S)	polícia
    =N<:adj(F S)	alemã
    P:v-fin(PS 3S IND)	localizou
    ,
    ADVL:adv	ontem_de_manhã
    ,
    ACC:np
    =>N:art(M P)	os
    =>N:num(<card> M P)	dois
    =H:n(M P)	criminosos
    =N<:fcl
    ==SUBJ:pron-indp(<rel> M P)	que
    ==P:v-fin(PS 3P IND)	fizeram
    ==ACC:np
    ===>N:num(<card> M/F P)	cinco
    ===H:n(M/F P)	reféns
    ==ADVL:pp
    ===H:prp	durante
    ===P<:np
    ====>N:art(<arti> F S)	uma
    ====H:n(F S)	fuga
    ====N<:pp
    =====H:prp	de
    =====P<:np
    ======>N:ap
    =======>A:adv(<quant>)	mais_de
    =======H:num(<card> F P)	27
    ======H:n(F P)	horas
    ====N<:pp
    =====H:prp(<sam->)	por
    =====P<:np
    ======>N:art(<-sam> F S)	a
    ======H:prop(F S)	Alemanha
    ====,
    ====N<:pp
    =====H:prp	após
    =====P<:icl
    ======P:vp
    =======AUX:v-inf(3P)	terem
    =======MV:v-pcp	assaltado
    ======ACC:np
    =======>N:art(<arti> M S)	um
    =======H:n(M S)	banco
    ======,
    ======ADVL:pp
    =======H:prp(<sam->)	em
    =======P<:np
    ========>N:art(<-sam> F S)	a
    ========H:n(F S)	segunda-feira
    ======, 
    ======ADVL/N<[+1]:pp       <------------------
    =======H:prp	em
    =======P<:prop(F S)	Stuttgart 
    .
    
    </s>
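
The compact labels ADVL/N<[+1]:pp above and N<PRED/N<PRED[-3]:ap in example 3 list alternative functions separated by "/", optionally followed by a signed number in square brackets. A minimal Python sketch for splitting such a label into its readings follows. The function name is hypothetical, and since this page does not define the bracketed number, the sketch only extracts it without interpreting it.

```python
import re

# one alternative: a function name, optionally followed by [+n] or [-n]
ALTERNATIVE = re.compile(r"^(?P<func>[^\[\]]+)(?:\[(?P<offset>[+-]\d+)\])?$")


def parse_ambiguous_label(label):
    """Split 'F1/F2[+n]:form' into ([(function, offset or None), ...], form)."""
    functions, _, form = label.partition(":")
    readings = []
    for alternative in functions.split("/"):
        m = ALTERNATIVE.match(alternative)
        if m is None:
            raise ValueError(f"unexpected function label: {alternative!r}")
        offset = m.group("offset")
        readings.append((m.group("func"), int(offset) if offset else None))
    return readings, form


print(parse_ambiguous_label("ADVL/N<[+1]:pp"))
print(parse_ambiguous_label("N<PRED/N<PRED[-3]:ap"))
```

An unambiguous label such as ACC:np comes back with a single reading, so the same function can be applied to every node line.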
    
  5. Finally, we present examples of discontinuous constituents and how they are encoded in the AD format.
    <s>
    Importantes	 [importante] ADJ M P @>N
    reforços	 [reforço] N M P @SUBJ>
    foram	 [ser] <fmc> V PS 3P IND VFIN @FAUX
    já	 [já] ADV @ADVL>
    enviados	 [enviar] V PCP M P @IMV @#ICL-AUX<
    para	 [para] PRP @<ADVS
    a	 [o] <artd> DET F S @>N
    zona	 [zona] N F S @P<
    $.
    </s>
    
    In the AD (visl) format, constituents have to be explicitly coded as discontinuous:
    <s>
    
    SOURCE: CETEMPúblico n=149 sec=soc sem=94b
    C149-5 Importantes reforços foram já enviados para a zona.
    A1
    STA:fcl
    SUBJ:np
    =>N:adj(M P)	Importantes
    =H:n(M P)	reforços 
    P:vp-
    =AUX:v-fin(PS 3P IND)	foram 
    ADVL:adv	já 
    -P:vp
    =MV:v-pcp(M P)	enviados 
    ADVS:pp
    =H:prp	para
    =P<:np
    ==>N:art(F S)	a
    ==H:n(F S)	zona
    .
    
    </s>
    
    Lit. Important additional_troops were already sent to the area.

    Translation: Considerable troops have already been sent to the area.
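
In the AD example above, the verb phrase is split into P:vp- and -P:vp around the adverb já. The Python sketch below shows one way the two halves could be rejoined when collecting the words of each top-level constituent. The function name is hypothetical, and the sketch assumes that a trailing hyphen opens a discontinuous constituent and a leading hyphen on the same label resumes it.

```python
def collect_constituents(lines):
    """Return [(label, words)] with 'X-' ... '-X' halves merged into one entry."""
    result = []
    pending = {}    # label -> word list of a constituent still waiting for its second half
    current = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("="):
            label = line.split("\t")[0].strip()
            if label.startswith("-"):
                # second half: continue filling the word list opened earlier
                current = pending.pop(label[1:])
            else:
                current = []
                result.append((label.rstrip("-"), current))
                if label.endswith("-"):
                    pending[label[:-1]] = current
        if "\t" in line:
            current.append(line.split("\t")[1].strip())
    return result


fragment = [
    "P:vp-",
    "=AUX:v-fin(PS 3P IND)\tforam",
    "ADVL:adv\tjá",
    "-P:vp",
    "=MV:v-pcp(M P)\tenviados",
]
print(collect_constituents(fragment))
```

The predicate keeps the position of its first half in the result, so the output lists the verb phrase with both of its words before the intervening adverbial.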

Last update: 22 August 2006.