About the NDLTD Union Catalog

This service is operated on behalf of the Networked Digital Library of Theses and Dissertations (NDLTD). It aggregates and exposes metadata for electronic theses and dissertations (ETDs) contributed by participating repositories worldwide. The service collects metadata via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and provides discovery and downstream dissemination pathways.

Using the Union Catalog

How to use the catalog is covered in the documentation. The API reference documents every machine interface with working examples, including OAI-PMH harvesting, the JSON search API, item metadata in JSON and JSON-LD, the MCP endpoint for AI assistants, RSS, and sitemaps. The OAI-PMH harvester guide covers the harvesting endpoint in depth. See the Terms of Service.
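For readers planning to harvest, the sketch below shows how an OAI-PMH ListRecords response in the standard oai_dc (unqualified Dublin Core) format might be parsed. It is a minimal illustration, not the catalog's own client code. The sample record, its identifier, and the resumption token are made up for the example; only the XML namespaces come from the OAI-PMH and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response body; a real one would come from an HTTP request.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.edu:etd-1</identifier>
        <datestamp>2020-01-15</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Thesis</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        header = rec.find("oai:header", NS)
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            # Deleted records carry no metadata, so these lists may be empty.
            "titles": [e.text for e in rec.iterfind(".//dc:title", NS)],
            "creators": [e.text for e in rec.iterfind(".//dc:creator", NS)],
        })
    # An absent or empty token means the result list is complete.
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    return records, (token or None)


records, token = parse_list_records(SAMPLE_RESPONSE)
print(records[0]["titles"], token)
```

A harvester would typically repeat the request with the returned resumption token until no token comes back.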

History

Origins of the NDLTD and Early ETD Initiatives

1987 Representatives from UMI (University Microfilms International), the University of Michigan, SoftQuad and ArborText met in Ann Arbor to discuss using SGML to structure theses and dissertations. Virginia Tech worked with SoftQuad to develop an SGML Document Type Definition (DTD) for theses and dissertations, leading to one of the first ETDs in 1988^[1]^[2].

1992–1993 By 1992 the Virginia Tech team (John Eaton, Edward Fox and Gail McMillan) worked with pre-release versions of Adobe Acrobat to submit theses as PDF files. Meetings organized by the Coalition for Networked Information and the Council of Graduate Schools brought together ten U.S. and Canadian universities to plan a National Digital Library of Theses and Dissertations (NDLTD)^[2].

1994–1997 A 1994 meeting at Virginia Tech set two goals for ETDs: to create SGML-encoded theses and to allow PDF submission using Adobe's new tools. Funding from the Southeastern Universities Research Association (SURA) and the U.S. Department of Education supported a 1996–1999 project that launched the NDLTD. On 1 January 1997 Virginia Tech made electronic submission of theses mandatory, and West Virginia University adopted a similar requirement in 1998^[1]^[2].

The NDLTD Union Catalog grew out of early ETD and digital library work in the 1990s, when NDLTD member institutions began moving from local, heterogeneous search approaches to interoperable metadata exchange. The adoption of OAI-PMH enabled a harvest-based union architecture that could scale across institutions and countries.

Goals and Early Vision: Fox, Eaton and their colleagues defined the NDLTD's vision as improving graduate education, preparing students to work with digital libraries and ensuring broader access to theses and dissertations^[4]. Early benefits included cost savings and wider dissemination of student research, as well as the possibility of integrating multimedia and hyperlinks into theses. By September 1997 the project had grown to 20 member institutions worldwide and the initiative was renamed the Networked Digital Library of Theses and Dissertations to reflect its international reach^[3].

Towards a Union Catalog and Early Search Tools

1993–1998 A federated search prototype developed at Virginia Tech allowed users to submit a query to multiple ETD repositories and receive separate result sets. The system mapped queries to each site's search language and cached results, but it was fragile—any change at a remote site required reconfiguration and results could not be merged^[5].

1999–2000 To address scalability and interoperability problems, representatives from NCSTRL (a technical report library) and NDLTD met in Santa Fe in 1999 and agreed upon a simple metadata-harvesting approach. This meeting produced the Santa Fe Agreement, which led to the Open Archives Initiative (OAI) Protocol for Metadata Harvesting (OAI-PMH)^[5].

1999–2001 Around the time of the Santa Fe meeting, the NDLTD began a vital partnership with UNESCO. With UNESCO funding and support, the NDLTD published the UNESCO International Guide for the Creation of Electronic Theses and Dissertations in 2001. This multi-lingual guide served as a major catalyst for global adoption, particularly in developing nations, shifting the ETD movement from a primarily North American initiative to a worldwide standard^[6].

2000 The Online Computer Library Center (OCLC) proposed a Public Catalog of Theses and Dissertations. OCLC's WorldCat contained about 4.3 million thesis/dissertation records, and an OAI-harvestable prototype contained 4,760 ETD records with name-authority linking^[7].

2001 OCLC launched the Advanced Library Collection Management Environment (ALCME) project. The project began in October 2000, and the NDLTD Authority Linking Proposal was published on 15 March 2001. ALCME developed open-source components (OAICat, OAIHarvester and XTCat) conforming to OAI-PMH, and OCLC extracted thesis/dissertation records from WorldCat for experimental services^[8].

2001–2002 Hussein Suleman and Edward Fox described an OAI-based "Union Archive" for the NDLTD. The Union Archive harvested metadata from OAI-compliant ETD repositories to create a central collection, avoiding the fragility of federated search. They noted that past efforts at federated search had limited success and that metadata harvesting via OAI-PMH promised greater reliability and easier integration^[5].

VTLS‑based Union Catalog and Visualizer

2002 The NDLTD Steering Committee established a partnership with VTLS (Virginia Tech Library Systems) to develop a Union Catalog that would create a global database of all ETDs for NDLTD members, offer a single searchable interface, and accept metadata in MARC, Dublin Core or ETD-specific formats. The committee designated VTLS as the Union Catalog agency, responsible for hosting the catalog, converting submitted metadata to a standardized format, creating search indexes and providing a web client and Z39.50 server. The VTLS Union Catalog initially included ETDs from seven countries and supported 14 languages^[9].

2003 After the catalog's launch, VTLS built the Visualizer interface, described as a dynamic search and discovery platform for the NDLTD Union Catalog that supported faceted browsing and visualization. It served as one of the primary search interfaces alongside services provided by OCLC and Scirus (Elsevier's search engine)^[2].

Prototype Union Archive, Service Providers, and Formalization

The prototype Union Archive described by Suleman and Fox harvested records from nine sites (Virginia Tech, Humboldt University of Berlin, University of Duisburg, TU Dresden, PhysNet, MIT, CalTech, Uppsala University and University of South Florida). The authors proposed daily harvesting schedules and double-escaping Unicode characters to ensure correct encoding.

Importantly, they distinguished between the metadata harvesting layer and end-user services: the NDLTD Union Catalog (VTLS Inc.) and the experimental ODL-based Union Catalog provided search and browsing services based on the harvested metadata. The Union Archive (metadata store) thus served as the backbone for multiple service providers.

2003 An important administrative milestone occurred in May 2003 when the NDLTD officially incorporated as a 501(c)(3) non-profit organization. It transitioned from a loosely configured, volunteer-run project based primarily out of Virginia Tech into a formal entity with a Board of Directors, bylaws, and standing committees, solidifying its long-term sustainability^[10].

OCLC Production Service and Authority Linking

After the experimental stage, NDLTD decided to separate the metadata repository from the search interfaces. Hussein Suleman's 2012 paper, "The NDLTD Union Catalog: Issues at a Global Scale," provides a detailed retrospective^[12].

Launch (2001): The Union Catalog project began in 2001 with a prototype built at Virginia Tech using an OAI-PMH harvester and data provider from the Open Digital Library (ODL) suite^[11]. The prototype included 14 sites and fewer than 50,000 records.

Transfer to OCLC: After the experimental phase, NDLTD migrated the service to OCLC, which operated a production system for about ten years. OCLC extracted ETD metadata from WorldCat using a custom harvester and XTCat software to build the collection. The number of records doubled annually during the first five years of this period, eventually surpassing one million records.

Division into Archive and Service Providers: When the service moved to OCLC, NDLTD decided to divide the union catalog into (i) a metadata-only Union Archive and (ii) downstream service providers responsible for search and browse interfaces. OCLC maintained the Union Archive; VTLS and Scirus provided the search and visualization services. OCLC also worked on name-authority linking, which resulted in the NDLTD Authority Linking Proposal, published in March 2001. These efforts laid the groundwork for linking ETD records to standardized author and subject headings.

University of Cape Town Era and Global Search

2011–2012 In 2011 OCLC transferred the Union Archive (metadata repository) to the University of Cape Town's Digital Libraries Laboratory, run by Hussein Suleman. The collection was migrated using ETDPortal. After months of tweaking, the system entered limited production in early 2012, with automated daily harvesting. The central metadata repository stored records in MySQL and provided an OAI-PMH data provider and RSS feed generator^[12].

2012 At the ETD 2012 conference Hussein Suleman presented "The NDLTD Union Catalog: Issues at a Global Scale," which noted that the Union Catalog had grown to almost two million records^[12].

2015 Dr. Suleman has run the NDLTD Global Search system since 2015^[13]. The Global Search service builds upon the Union Catalog by providing faceted browsing and search across millions of ETD records. A 2017 article describing scenarios for advanced services notes that there were "some 5 million ETDs in the NDLTD Union Catalog" and that basic faceted browsing and searching of those works was supported through the NDLTD Global Search at that time^[14].

2016–2020s The focus of the NDLTD and the broader ETD community began shifting from basic PDF access to integrating ETDs into the wider scholarly communications infrastructure. Institutions increasingly championed the use of Persistent Identifiers (PIDs), such as ORCID iDs for authors and Digital Object Identifiers (DOIs) for the theses themselves, treating ETDs with the same infrastructural respect as formal journal articles^[15].

Concurrently, ETDs evolved beyond static text documents into "complex digital objects." As research data management became a priority in higher education, the NDLTD began addressing how to handle supplementary research data, code, and multimedia files attached to an ETD, ensuring these components adhere to FAIR (Findable, Accessible, Interoperable, and Reusable) data principles^[16].

The Global Search architecture has been modernized over time and aggregates well over 6 million records from hundreds of universities and national consortia worldwide (such as the Brazilian BDTD and UK EThOS)^[17]. To manage "dirty data" from non-standard metadata across global repositories and to resolve broken links, the backend infrastructure relies heavily on modern search indexing technologies such as Apache Solr, which support rapid, multi-lingual queries.

The Union Catalog links ETDs to their home institutions and has become an important gateway for global scholarship. OCLC's Search & Retrieve Web Service, VTLS's Visualizer, Elsevier's Scirus (discontinued in 2013), and the NDLTD Global Search service have historically provided different ways to access the metadata, cementing the NDLTD's legacy in open science.

NDLTD acknowledges the long-term contributions of many partners across these phases, including work led by Hussein Suleman and collaborators at the University of Cape Town.

Reassessment and Rebuild Initiative

Late 2010s–Early 2020s Over time, the Global Search and Union Archive infrastructure began to experience technical and operational strain. The underlying software stack relied on legacy components, custom harvesting logic, and manually maintained configurations. As repository platforms evolved and metadata formats diversified, maintaining reliable harvesting and link integrity required increasing effort.

During this period, downstream discovery services such as library knowledge bases and indexing providers continued to rely on the NDLTD Union Archive as a centralized metadata source. The separation between the metadata archive and end-user search services—originally designed as a strength—became harder to sustain without dedicated engineering capacity and formal operational governance.

By the mid-2020s, portions of the Global Search and Union Archive services were no longer operating at full reliability. This effectively created a gap in the global ETD metadata infrastructure. While alternative discovery layers continued to provide user-facing search, the absence of a stable, machine-readable aggregation endpoint reduced interoperability and affected services that depended on centralized harvesting^[18].

2024–Present In response to the service interruption, the NDLTD Board initiated a reassessment of its infrastructure role. Technical review concluded that the legacy system was not portable or maintainable under modern security and deployment expectations. Rather than attempt incremental patching, the Board began planning a modular rebuild based on contemporary harvesting, indexing, and API design principles.

The rebuild initiative emphasizes a clear separation between: (i) metadata aggregation and normalization, (ii) public search interfaces, and (iii) versioned machine-readable endpoints for downstream services. This architecture returns to the original Union Archive concept—metadata as infrastructure—while modernizing implementation and governance.

Current Stewardship

Virginia Tech University Libraries is leading the redevelopment of the NDLTD Union Catalog and Global Search infrastructure. The effort replaces legacy components with a modular, standards-based architecture designed to restore reliable metadata aggregation, indexing, and machine-readable access. The goal is to reestablish the Union Archive as a stable part of the global scholarly communications ecosystem, consistent with the original harvest-based vision of the NDLTD.

Contributing repositories

If you operate an ETD repository and want it included, provide an OAI-PMH base URL and (if applicable) a setSpec that contains ETD records. Contact NDLTD via the information on ndltd.org.
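Before submitting a repository, it can help to confirm that the base URL and setSpec return ETD records. The sketch below builds a ListRecords request URL using the standard OAI-PMH query parameters (`verb`, `metadataPrefix`, `set`, `from`). The base URL and the setSpec value `etd` are placeholders, not real endpoints, and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, set_spec=None,
                           metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Restrict the harvest to the set that holds ETD records.
        params["set"] = set_spec
    if from_date:
        # Optional lower bound on record datestamps (YYYY-MM-DD).
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Placeholder base URL and setSpec, for illustration only.
url = build_list_records_url("https://repository.example.edu/oai/request",
                             set_spec="etd")
print(url)
```

Opening the resulting URL in a browser or HTTP client should return an XML document listing records from the chosen set if the endpoint is configured as expected.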

References

[1] Edward A. Fox (2005). Improving Education through the Networked Digital Library of Theses and Dissertations (NDLTD). https://fox.cs.vt.edu/~fox/russia05.htm
[2] Fox, McMillan, Srinivasan (2010). Electronic Theses and Dissertations: Progress, Issues, and Prospects. http://hdl.handle.net/10919/9198
[3] Edward A. Fox (1997). Networked Digital Library of Theses and Dissertations (D-Lib Magazine). https://www.dlib.org/dlib/september97/theses/09fox.html
[4] Hussein Suleman, Edward A. Fox, et al. (2001). Networked Digital Library of Theses and Dissertations: Bridging the Gaps for Global Access - Part 1: Mission and Progress. D-Lib Magazine, 7(9). https://www.dlib.org/dlib/september01/suleman/09suleman-pt1.html
[5] Hussein Suleman; Edward A. Fox (2002). Towards Universal Accessibility of ETDs: Building the NDLTD Union Archive (ETD 2002 paper). https://docs.ndltd.org/collection/etd2002/188_1.pdf
[6] Fineman, R., et al. (2001). UNESCO International Guide for the Creation of Electronic Theses and Dissertations. UNESCO. https://web.archive.org/web/20010405235024/http://etdguide.org/ via the Internet Archive's Wayback Machine.
[7] OCLC (2000). A Public Catalog of Theses and Dissertations (presentation). https://docs.ndltd.org/collection/etd2000/176_1.pdf
[8] OCLC Research (2001). ALCME project page. https://www.oclc.org/research/activities/alcme.html
[9] Vinod Chachra (VTLS Inc.) (2002). A Union Catalog for Networked Digital Library for Theses and Dissertations (presentation). https://docs.ndltd.org/collection/etd2002/196_1.pdf
[10] Networked Digital Library of Theses and Dissertations (NDLTD). (2003). Articles of Incorporation. https://ndltd.org/about-ndltd/governance/
[11] Hussein Suleman, Edward A. Fox, et al. (2001). Networked Digital Library of Theses and Dissertations: Bridging the Gaps for Global Access - Part 2: Services and Research. D-Lib Magazine, 7(9). https://www.dlib.org/dlib/september01/suleman/09suleman-pt2.html
[12] Hussein Suleman (2012). The NDLTD Union Catalog: Issues at a Global Scale (ETD 2012 paper). http://hdl.handle.net/10757/622568
[13] NDLTD (2019). 2019 NDLTD Leadership Award (Hussein Suleman). https://ndltd.org/ndltd-awards/ndltd-leadership-awards/2019-ndltd-leadership-award/
[14] Ma et al. (2017). Scenarios for Advanced Services in an ETD Digital Library. https://www.cs.odu.edu/~jwu/downloads/pubs/ma-2017-scenarios-etds/ma-2017-scenarios-etds.pdf
[15] Schöpfel, J., & Rasuli, B. (2018). Are electronic theses and dissertations (still) grey literature in the digital age? A FAIR debate. The Electronic Library. https://doi.org/10.1108/EL-02-2017-0039
[16] Greenberg, J., et al. (2014/ongoing). Research data management and the complex digital object in ETDs. (Various ETD Symposium Proceedings). https://ndltd.org/etd-symposia/
[17] NDLTD Global ETD Search Portal. https://search.ndltd.org/
[18] NDLTD (2026). Union Catalog. https://union.ndltd.org/