Browsing by Author "Miranda, João"
- An Updated Portrait of the Portuguese Web (Publication). Miranda, João; Gomes, Daniel. This study presents an updated characterization of the Portuguese Web, derived from a crawl of 48 million contents of all media types (2.5 TB of data) performed in March 2008. The resulting data was analyzed to characterize contents, sites and domains. This study was performed within the scope of the Portuguese Web Archive.
- Arquivo e Medição da Web Portuguesa [Archiving and Measuring the Portuguese Web] (Publication). Gomes, Daniel; Miranda, João. … National. The project aims to preserve the information published on the Web for future generations, in the same way as is done for national printed publications. Providing efficient services for searching and analyzing the archived information is essential for the Archive to become a tool used by all citizens. The first crawl of the Portuguese Web was carried out in February 2008, and quantitative measurements were taken. According to the results obtained, the Portuguese Web consists of at least 56 million contents, corresponding to 2.8 TB of information.
- How Are Web Characteristics Evolving? (poster) (Publication). Miranda, João; Gomes, Daniel. The Web is a hypertextual environment in permanent evolution, with new technologies and Web publishing behaviors emerging every day. This study presents trends in the evolution of the Web, derived from comparing two characterizations of a portion of the Web performed five years apart. The Portuguese Web was used as a case study. Several metrics regarding content and site characteristics were analyzed.
- Introducing the Portuguese web archive initiative (Publication). Gomes, Daniel; Nogueira, André; Miranda, João; Costa, Miguel. This paper introduces the Portuguese Web Archive initiative, presenting its main objectives and work in progress. Term search over web archive collections is a desirable feature that raises new challenges. The paper discusses how the size of the term index could be reduced without significantly decreasing the quality of search results. The results of the first crawl show that the Portuguese web consists of at least approximately 54 million contents, corresponding to 2.8 TB of data. The crawl of the Portuguese web was stored in 2 TB of disk space using the compressed ARC format.
- Trends in Web characteristics (Publication). Miranda, João; Gomes, Daniel. Abstract: The Web is permanently changing, with new technologies and publishing behaviors emerging every day. Tracking trends in the evolution of the Web is important for developing efficient tools to process its data. For instance, Web trends influence the design of browsers, crawlers and search engines. This study presents trends in the evolution of the Web, derived from the analysis of three characterizations performed over an interval of five years. The Portuguese Web was used as the case study. Several metrics regarding site and content characteristics were analyzed.