Access Services

Customizable access platforms

There are two ways to make a domain crawl available to users: deploy and support an access system internally (usually Wayback or a variant), or use a hosted instance that IA supports and maintains and that can be designed to match a partner’s website. The first approach depends on the custodial institution’s resources and capabilities. The second is something IA has provided for domain-scale crawling partners, and examples are provided below.

Wayback Portal

  • Access to content from specific domains held within the Internet Archive’s global web archive.
  • Ability to include Internet Archive’s Save Page Now function, which allows users to submit a webpage or individual URL for archiving. This functionality mirrors what is currently available via the Wayback Machine.
  • In addition to the traditional search box, other access points can be developed, such as screenshots that point towards the most visited or linked-to websites or specific topics, subjects, or categories of websites found within a specific domain.
Example of a Wayback Portal Designed for the German National Library.
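As a rough illustration of how a domain-scoped portal might route requests, the sketch below checks whether a URL falls under a portal's domain and builds replay and Save Page Now links. It follows the URL patterns publicly visible on the Wayback Machine (`/web/<timestamp>/<url>` and `/save/<url>`). The function names and the `de` domain suffix are assumptions made for this example, and a hosted partner portal may use a different base address.

```python
from urllib.parse import urlsplit

# Public Wayback Machine address; a hosted portal would likely use its own base (assumption).
WAYBACK_BASE = "https://web.archive.org"


def in_scope(url: str, suffix: str) -> bool:
    """Return True if the URL's host falls under the portal's domain suffix."""
    host = (urlsplit(url).hostname or "").lower()
    suffix = suffix.lower().lstrip(".")
    return host == suffix or host.endswith("." + suffix)


def replay_url(url: str, timestamp: str = "") -> str:
    """Build a Wayback-style replay link; without a timestamp no specific capture is requested."""
    if timestamp:
        return f"{WAYBACK_BASE}/web/{timestamp}/{url}"
    return f"{WAYBACK_BASE}/web/{url}"


def save_url(url: str) -> str:
    """Build a Save Page Now link for the given URL."""
    return f"{WAYBACK_BASE}/save/{url}"


if __name__ == "__main__":
    candidate = "https://www.dnb.de/EN/Home"
    if in_scope(candidate, "de"):
        print(replay_url(candidate, "20200101000000"))
        print(save_url(candidate))
```

Matching on a dot-delimited suffix rather than a plain substring keeps hosts such as `example.de.evil.com` out of a `.de` portal.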

Some examples:

Search functionalities

Site search: Includes both URL search and keyword search. Keywords are derived from the anchor text of all webpages linking to a host. Site search functionality is currently viewable in the new Wayback Machine at https://web.archive.org.
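The idea of deriving a host's keywords from inbound anchor text can be sketched in a few lines. This is a minimal illustration, not the Wayback Machine's implementation: the input format (pairs of target URL and anchor text), the function names, and the simple count-based ranking are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def build_host_keywords(links):
    """Aggregate anchor-text words per target host.

    links: iterable of (target_url, anchor_text) pairs extracted from crawled pages.
    """
    keywords = defaultdict(Counter)
    for target_url, anchor_text in links:
        host = urlsplit(target_url).hostname
        if not host:
            continue  # skip links without a resolvable host
        keywords[host.lower()].update(re.findall(r"\w+", anchor_text.lower()))
    return keywords


def search_hosts(keywords, query):
    """Return hosts whose anchor-text keywords contain every query term, best match first."""
    terms = re.findall(r"\w+", query.lower())
    scores = {}
    for host, counts in keywords.items():
        if terms and all(counts[t] > 0 for t in terms):
            scores[host] = sum(counts[t] for t in terms)
    return sorted(scores, key=lambda h: (-scores[h], h))


if __name__ == "__main__":
    links = [
        ("https://www.dnb.de/", "German National Library"),
        ("https://www.dnb.de/catalog", "library catalog"),
        ("https://example.org/", "example library"),
    ]
    index = build_host_keywords(links)
    print(search_hosts(index, "library"))
    print(search_hosts(index, "german library"))
```

Because the keywords come from how other pages describe a host, a site can be found by terms that never appear in its own URL.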

Media search: Media search takes an archived web media resource (such as an image) and “tokenizes” its URL by splitting the filename into individual words, which then become the text for a search index. An example of URL tokenization search can be seen in GifCities, where the search engine is powered by the words in the .gif filenames. Tokenization makes it possible to search resources that may themselves contain no text.
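A minimal sketch of filename tokenization, assuming a simple in-memory inverted index, is shown below. It is not the GifCities implementation: the splitting rules (separators, digit runs, camelCase boundaries), the function names, and the example URLs are hypothetical choices made for illustration.

```python
import re
from collections import defaultdict
from posixpath import basename, splitext
from urllib.parse import unquote, urlsplit


def tokenize_media_url(url):
    """Split the filename part of a media URL into lowercase word tokens."""
    path = unquote(urlsplit(url).path)
    stem, _ext = splitext(basename(path))
    # Insert a space at camelCase boundaries, e.g. "UnderConstruction" -> "Under Construction".
    stem = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", stem)
    # Keep runs of letters and runs of digits; separators such as "_" and "-" are dropped.
    return [t.lower() for t in re.findall(r"[A-Za-z]+|\d+", stem)]


def build_media_index(urls):
    """Map each token to the set of media URLs whose filename contains it."""
    index = defaultdict(set)
    for url in urls:
        for token in tokenize_media_url(url):
            index[token].add(url)
    return index


def search_media(index, query):
    """Return the URLs whose filename tokens include every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))


if __name__ == "__main__":
    urls = [
        "http://example.com/a/dancing_baby-02.gif",
        "http://example.com/b/UnderConstruction.gif",
        "http://example.com/c/baby%20steps.gif",
    ]
    index = build_media_index(urls)
    print(tokenize_media_url(urls[0]))
    print(search_media(index, "baby"))
```

In a production setting the tokens would typically be fed to a search engine rather than held in a dictionary, but the tokenization step is the same.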

All search indexing at the Internet Archive is done using Elasticsearch, an open-source and widely used search tool. Elasticsearch is used across the Internet Archive for both web and non-web search, and the deployment includes a monitored and maintained search cluster for high performance and easy addition of multiple indices.

Explore our service offerings

Web Archiving

Archive-It is our user-controlled web service for creating curated, publicly accessible web archives and born-digital collections.

Digital Preservation

Vault is our low-cost, easy-to-use digital repository and preservation service to store, manage, and preserve digital files and collections.

Text & Data Mining

ARCH is our research and education service that helps users easily build, access, and analyze digital collections computationally at scale.
