ARCH

ARCH (Archives Research Compute Hub) is a platform for building research collections, analyzing them computationally, and generating datasets from terabytes and even petabytes of data. ARCH supports the open publication and preservation of user-generated datasets created from the collections of thousands of libraries, archives, and memory organizations worldwide, giving researchers, students, and information professionals the power to study and understand digital collections in new ways.

Interested? Start now!

For pricing and service offering inquiries, please fill out our interest form. Try ARCH anytime by signing up for the free service tier, ARCHWay. For other questions or to talk to ARCH staff, contact us at arch@archive.org.

ARCH Features

Build: Curate a research collection for analysis using primary source web, text, and image digital collections.

Access: Generate more than a dozen different datasets (full text, images, PDFs, graph data, etc.) from primary source digital collections with the click of a button. Download generated datasets in-browser or via API.

Analyze: Easily work with research-ready datasets, both through in-browser visualizations in ARCH and in interactive computational tools such as Jupyter Notebooks, Google Colab, Gephi, and Voyant.

Publish and Preserve: Publish datasets with one click on archive.org, where they can be openly accessed and shared. All published datasets are preserved in perpetuity.

Support: Technical support, online training, and extensive help center documentation are all available.

ARCH Product One Sheet

Streamline Data-Driven Research

ARCH leverages the Internet Archive’s non-profit infrastructure and open-source tools to streamline the computational use of digital collections. Librarians, collection managers, and educators can offer ARCH to their researchers and students to facilitate sophisticated research processes that would otherwise require coding/scripting skills and significant computing resources.

Screenshot of an image search engine for the Artists Websites web archive collection, created by Hugging Face with the ARCH image graph dataset

Publications

Recent research using ARCH software and datasets:

Background

ARCH was made possible in part by funding from the Mellon Foundation and by a long-running collaboration with the Archives Unleashed project of the University of Waterloo and York University.

Developing the ARCH platform to support all digital collections is made possible with funding from the U.S. Institute of Museum and Library Services.

Explore our service offerings

Web Archiving

Archive-It is our user-controlled web service for creating curated, publicly accessible web archives and born-digital collections.

Learn about Archive-It

Digital Preservation

Vault is our low-cost, easy-to-use digital repository and preservation service to store, manage, and preserve digital files and collections.

Learn about Vault

Text & Data Mining

ARCH is our research and education service that helps users easily build, access, and analyze digital collections computationally at scale.

Learn about ARCH