

Please use this identifier to cite or link to this item: http://comum.rcaap.pt/handle/123456789/470

Title: Introducing the Portuguese web archive initiative
Authors: Gomes, Daniel; Nogueira, André; Miranda, João; Costa, Miguel
Keywords: Archive
Issue Date: Sep-2009
Publisher: Springer
Citation: Daniel Gomes, André Nogueira, João Miranda, Miguel Costa, Introducing the Portuguese web archive initiative, 8th International Web Archiving Workshop, Aarhus, Denmark, September 2008
Abstract: This paper introduces the Portuguese Web Archive initiative, presenting its main objectives and work in progress. Term search over web archive collections is a desirable feature that raises new challenges. The paper discusses how the size of the term index could be reduced without significantly decreasing the quality of search results. The results of the first crawl show that the Portuguese web comprises at least 54 million contents, corresponding to 2.8 TB of data. The crawl of the Portuguese web was stored in 2 TB of disk space using the compressed ARC format.
URI: http://comum.rcaap.pt/handle/123456789/470
Appears in Collections: FCCN - Fundação para a Computação Científica Nacional

Files in This Item:

File: introducing-the-portuguese-web-archive-initiative.pdf
Size: 225.4 kB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

