| Title: | Introducing the Portuguese web archive initiative |
| Authors: | Gomes, Daniel; Nogueira, André; Miranda, João; Costa, Miguel |
| Keywords: | Archive; Portugal; Preservation; History |
| Issue Date: | Sep-2009 |
| Publisher: | Springer |
| Citation: | Daniel Gomes, André Nogueira, João Miranda, Miguel Costa, Introducing the Portuguese web archive initiative, 8th International Web Archiving Workshop, Aarhus, Denmark, September 2008 |
| Abstract: | This paper introduces the Portuguese Web Archive initiative, presenting its main objectives and work in progress. Term search over web archive collections is a desirable feature that raises new challenges. We discuss how the size of the term index could be reduced without significantly decreasing the quality of search results. Results from the first crawl show that the Portuguese web comprises at least about 54 million contents, corresponding to 2.8 TB of data. The crawl was stored in 2 TB of disk space using the compressed ARC format. |
| URI: | http://comum.rcaap.pt/handle/123456789/470 |
| Appears in Collections: | FCCN - Fundação para a Computação Científica Nacional |
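The abstract mentions reducing the size of the term index without significantly hurting result quality, but does not say how. One well-known family of techniques is static index pruning. The following is a minimal, hypothetical illustration of term-based pruning in the spirit of Carmel et al. (SIGIR 2001), not the method used by the Portuguese Web Archive. It assumes raw term frequency as the posting score; `build_index`, `prune_index`, `index_size`, `k` and `epsilon` are illustrative names.

```python
from collections import Counter, defaultdict


def build_index(docs: dict[str, str]) -> dict[str, dict[str, int]]:
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return dict(index)


def prune_index(index: dict[str, dict[str, int]], k: int = 1,
                epsilon: float = 0.5) -> dict[str, dict[str, int]]:
    """Hypothetical term-based static pruning: for each term, take z as the
    k-th highest score in its postings list and drop postings scoring below
    epsilon * z. Score here is simply term frequency (an assumption)."""
    pruned = {}
    for term, postings in index.items():
        scores = sorted(postings.values(), reverse=True)
        z = scores[min(k, len(scores)) - 1]
        threshold = epsilon * z
        pruned[term] = {d: tf for d, tf in postings.items() if tf >= threshold}
    return pruned


def index_size(index: dict[str, dict[str, int]]) -> int:
    """Total number of postings, a rough proxy for on-disk index size."""
    return sum(len(postings) for postings in index.values())


# Toy example with made-up documents.
docs = {
    "d1": "lisboa lisboa lisboa porto",
    "d2": "lisboa porto porto porto",
    "d3": "lisboa braga",
}
index = build_index(docs)
pruned = prune_index(index, k=1, epsilon=0.5)
print(index_size(index), index_size(pruned))  # 6 3
```

With `epsilon` at most 1, every term keeps at least its highest-scoring posting, so no term disappears from the vocabulary; only low-scoring postings are dropped, which is where the quality/size trade-off is tuned.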
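The crawl figures in the abstract can be related with simple arithmetic. This is a minimal sketch, assuming decimal units (1 TB = 10^12 bytes) and that the 2.8 TB of raw data and the 2 TB of compressed ARC storage describe the same set of about 54 million contents; `crawl_stats` is a hypothetical helper name.

```python
TB = 10**12  # assumption: decimal terabytes


def crawl_stats(num_contents: int, raw_bytes: int,
                stored_bytes: int) -> tuple[float, float, float]:
    """Return (average raw bytes per content, stored/raw ratio, fraction saved)."""
    avg_bytes = raw_bytes / num_contents
    ratio = stored_bytes / raw_bytes
    return avg_bytes, ratio, 1 - ratio


# Figures taken from the abstract: ~54 million contents, 2.8 TB raw, 2 TB stored.
avg, ratio, savings = crawl_stats(54_000_000, 28 * TB // 10, 2 * TB)
print(f"average content size: {avg / 1000:.1f} KB")  # average content size: 51.9 KB
print(f"stored/raw ratio: {ratio:.2f}")               # stored/raw ratio: 0.71
print(f"space saved: {savings:.1%}")                  # space saved: 28.6%
```

Under these assumptions each content averages roughly 52 KB, and the compressed ARC storage takes about 71% of the raw size, a saving of roughly 29%.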
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.