404 File Not Found: How to Combat Link Rot
Have you ever clicked on a promising link only to encounter a “404 File Not Found” error? Do your bookmarked links vanish over time? Have hyperlinks in your documents stopped working? This is known as link rot. How can you prevent it?
What Is Link Rot?
Vast amounts of information and data exist in digital-only format. It is most often made available through the web. But the digital world is dynamic. Unlike printed materials that are archived in libraries, online content is at the mercy of constant updates and shifting digital landscapes. For you, this volatility poses a significant risk: a vital source that once bolstered your argument might not be accessible when you need to refer to it.
Digital information is inherently ephemeral. Unlike printed material that can survive centuries, online content is often transient. The ease with which content is updated or removed—even for legitimate reasons such as privacy concerns or site redesigns—creates a precarious environment for any professional who depends on the permanence of the written word. For you, this instability means that the digital trail left behind by prior research or online evidence may vanish without a trace.
Imagine citing a blog post, legal commentary or official document in your brief, only to find that the hyperlink no longer works. This is the essence of link rot. It occurs when a URL, once active and accessible, begins to deteriorate. As you depend increasingly on digital sources, the erosion of this content can undermine the integrity of your research and, by extension, the reliability of your legal arguments.
In 2014, Harvard law professors Lawrence Lessig, Jonathan Zittrain and Kendra Albert reviewed links published in three Harvard legal journals, as well as the links across all published U.S. Supreme Court opinions. More than 70 percent of the URLs in the journals and 50 percent of the URLs within the U.S. Supreme Court opinions suffered from “reference rot,” meaning the link did not produce the information originally cited. In 2021, Professors Zittrain, Bowers and Stanton examined link rot and content drift in the New York Times from the launch of the Times website in 1996 through mid-2019 and found that a quarter of the more than 2.2 million hyperlinks were broken. Pew Research Center estimates that a quarter of all web pages that existed at some point between 2013 and 2023 are no longer accessible.
Why Link Rot Happens
In your daily practice, you may click on a link to review a legislative update, a court decision or a legal journal article, only to find that the page is no longer available. This frustrating experience occurs for several reasons:
- Website restructuring. Law firms, government agencies and academic journals regularly update their sites. In doing so, URLs can change without proper redirection.
- Content removal. Websites may remove content for a variety of reasons, such as privacy concerns or the decision to update outdated material.
- Domain changes. When organizations rebrand or their domains expire, previously accessible pages may vanish.
- Temporary server issues. Occasionally, a server error might render a page temporarily inaccessible.
- Content goes behind a paywall. Content that was freely available is placed behind a paywall after a period, or the content provider is bought and is no longer available for free.
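The causes above often show up as different HTTP status codes, so a quick check can hint at which one you are facing. The following is a minimal sketch, not a definitive diagnostic: the function names `classify_status` and `check_link` and the category wording are illustrative assumptions, and the mapping from status code to cause is only a rough guide.

```python
import urllib.error
import urllib.request


def classify_status(code):
    """Map an HTTP status code to a rough, illustrative link-rot category."""
    if 200 <= code < 300:
        return "ok"
    if code in (404, 410):
        return "missing (page moved or removed)"
    if code in (401, 402, 403):
        return "restricted (login or paywall possible)"
    if 500 <= code < 600:
        return "server error (may be temporary; retry later)"
    return "other"


def check_link(url, timeout=10):
    """Request a URL and classify the outcome. Requires network access."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-check-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # Raised for 4xx and 5xx responses; the code is on the exception.
        return classify_status(err.code)
    except urllib.error.URLError:
        # No HTTP response at all, e.g. the domain no longer resolves.
        return "unreachable (domain may have expired or changed)"
```

Some servers reject HEAD requests, so a result other than "ok" is a prompt to look at the page in a browser rather than proof that the content is gone.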
A more recent phenomenon is Google’s decision to stop supporting the URL shortening service that turned a lengthy URL into a shortened version starting with “goo.gl.” While Google deprecated the service in 2018, the company announced that after August 25, 2025, URLs starting with “goo.gl/” will return a 404 HTTP error. Before that date, Google will warn developers and users who click shortened links by displaying a page with a warning about the 2025 expiration before redirecting them to the original target page.
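If your briefs, memos or web pages may contain Google's shortened links (which begin with goo.gl/), you can scan their text to find them before they stop resolving. This is a minimal sketch that assumes the document has already been converted to plain text; the pattern and the function name `find_goo_gl_links` are illustrative, and the pattern deliberately skips longer hosts that merely end in goo.gl.

```python
import re

# Matches "goo.gl/..." with or without a scheme, but not hosts such as
# "maps.app.goo.gl", which merely end in "goo.gl".
GOO_GL_PATTERN = re.compile(r"(?<![\w.-])(?:https?://)?goo\.gl/[\w/-]+")


def find_goo_gl_links(text):
    """Return every goo.gl short link found in a block of plain text."""
    return GOO_GL_PATTERN.findall(text)
```

Each link found this way can then be opened once to record the full destination URL, which can replace the shortened version in the document.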
Finding Missing Content
When you encounter a “404 Page Not Found” from web search results, your recourse in finding the missing content has become more challenging. In the past, search engines like Google and Bing provided a link to a cached page, a snapshot of the page when it was crawled and indexed. However, Google has ceased providing the cached view in the search results and no longer supports finding a cached version of a page by appending “cache:” before the page’s URL in your browser bar. Similarly, the Bing search engine removed the cache link in December 2024.
The Internet Archive Wayback Machine, which has been preserving web pages since 1996, is now taking on the heavy lifting. In addition to 835 billion archived web pages, the nonprofit organization has preserved 44 million books and texts, plus millions of audio recordings, videos, images and software programs. Though the Internet Archive has been at odds with for-profit publishers over copyrighted materials, the work that it does helps preserve the digital record for posterity.
If you run across a broken link, you can check the Internet Archive Wayback Machine to see if the page was archived. Go to https://web.archive.org/ and enter the URL or words related to the site’s home page. In some cases, for larger websites, the Wayback Machine captures the site over time, so you can view iterations by date.
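If you have many broken links to check, the Internet Archive also offers a Wayback Machine availability endpoint that returns JSON. The sketch below only builds the query address and reads a response you have already retrieved; the function names are illustrative, and the response layout shown (an "archived_snapshots" object with a "closest" entry) should be treated as an assumption to confirm against the Internet Archive's current documentation.

```python
import json
from urllib.parse import urlencode


def availability_query(url, timestamp=None):
    """Build a Wayback Machine availability query for a page address."""
    params = {"url": url}
    if timestamp:
        # Optional YYYYMMDD hint; the closest capture to this date is returned.
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the archived copy's URL from a JSON response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

Paste the address produced by `availability_query` into a browser to see the raw response, or fetch it with a script and pass the text to `closest_snapshot`.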
Google is now using the Wayback Machine as a replacement for cached pages. If you perform a Google search and the link returns a 404 error, go back to the search results and click on the three horizontal dots. A panel will appear that provides more information about the web page. To get to the Wayback Machine snapshot, click “More About This Page” on the panel, and a new page loads in the browser. On that page, click the link labeled “See previous versions on Internet Archive’s Wayback Machine.” If you are lucky, the page will appear in the Wayback Machine archive.
Another method, though it is haphazard, is to deconstruct a web page URL. If a link returns a 404 error, you can sometimes find the page again on the hosting website, depending on the website’s CMS (content management system). For instance, if this page is broken, https://www.ncbar.org/members/resources/center-for-practice-management/cpm-icymi-newsletter/, you can check whether the resource is still available by moving backwards in the URL. Try https://www.ncbar.org/members/resources/center-for-practice-management/ and see if the page still exists.
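Moving backwards through a URL can be sketched in a few lines. This is only an illustration of the idea; the function name `parent_urls` is an assumption, and it simply lists each shorter address to try, from the closest parent up to the site's home page.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url):
    """List progressively shorter versions of a URL, nearest parent first."""
    parts = urlsplit(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    parents = []
    while segments:
        segments.pop()  # drop the last path segment
        path = "/" + "/".join(segments) + ("/" if segments else "")
        # Query strings and fragments are discarded on purpose.
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents
```

For the broken newsletter address above, the first suggestion is the Center for Practice Management page and the last is https://www.ncbar.org/.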
Another option is to view a sitemap. A sitemap is like the outline of a website, meant to help web crawlers find pages. To view a sitemap, type the URL and then sitemap.xml like this: www.yourwebsiteurl.com/sitemap.xml. The page you see may be written in code but could help if you are desperate.
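Because a sitemap is structured XML, a short script can pull out the page addresses so you do not have to read the code by eye. The sketch below assumes the file follows the standard sitemaps.org format and that you have already saved its contents as text; the function name `sitemap_locations` and the optional keyword filter are illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemaps that follow the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locations(xml_text, keyword=""):
    """Return page addresses listed in a sitemap, optionally filtered."""
    root = ET.fromstring(xml_text)
    locations = [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]
    # An empty keyword matches everything; matching ignores case.
    return [url for url in locations if keyword.lower() in url.lower()]
```

Searching the list for a distinctive word from the missing page's title or old address is often enough to spot where the page has moved.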
Finally, see if the content is on another website. When you find a promising looking result in a Google search only to find the page link results in a 404 error, copy the page title and any description and try using that information as the basis of your new search.
Preserving Content
There are ways to preserve information that you find online. Let’s break down specific remedies into two main categories: one for your personal reference library and another for outward-facing documents that support your client and court communications.
For Your Personal Reference Library
Print to PDF
One of the simplest ways to preserve online content is to create local copies. When you find a source that you know is important, immediately save a PDF version of the page. Most modern browsers have built-in “print to PDF” functionality that allows you to capture the entire web page. Right-click on the page and choose Print. In the panel that appears, choose Save as PDF or select your PDF software, such as Adobe PDF, as the printer. In some instances, print to PDF may strip out ads, videos or hovering content. If you need to preserve those, as well as the date/time and URL of the capture, click on More Settings, scroll down to Options and check the boxes for Headers and Footers and Background Graphics. Want to capture a “clean” version of a web page that strips out extraneous ads and makes it more readable? In Microsoft Edge, click on the circle with three dots next to the URL, choose Immersive Reader and then right-click to print that view to PDF.
Screenshots
If you need to capture a web page exactly as it appears, and you do not necessarily need it to be searchable, you can take a screenshot. If you are using Windows 10 or 11, you have the Snipping Tool built into the operating system. However, it is difficult to capture an entire web page even if you expand it to full screen with F11. In the Mac operating system, press Command + Shift + 3 to create a screenshot.
To capture a scrolling screenshot of a web page, the screenshot utility by TechSmith, Snagit, is tried and true. An individual Snagit license is $39 per year. Install Snagit and choose Scrolling Screenshot then scroll vertically or horizontally on a web page to capture the entire page. The resulting screenshot will be saved in the Snagit Editor. It preserves the page and adds metadata including the application, website URL, title and the date/time it was created. You can also annotate the screenshot and OCR the text.
Web Clipping Tools
For a more sophisticated approach, consider using web clipping tools or browser extensions like Evernote Web Clipper, OneNote or similar applications. These tools can capture not only the visual appearance of a page but also its underlying structure and metadata. Some services even offer tagging and annotation features, which can help you organize your captures in a meaningful way.
Evernote
Evernote has been on the market for a long time. It is useful for taking notes and collaboration, but it is also an excellent tool to help capture web content, mainly through the web clipper browser extension. If you plan to use Evernote to capture the entire text of a web page, you will need to pay for a professional or teams plan to have enough storage space to store your saved content.
Through the web clipper browser extension, you can save articles, web pages and screen captures directly to Evernote notebooks. They are searchable and automatically date and time stamped and include a link to the source content. When you clip a web page you can choose article, simplified article, full page or screenshot. You can add it to a notebook of your choosing and insert notes, tags and tasks.
OneNote
OneNote comes with the Windows operating system and with Microsoft 365 subscriptions. Like Evernote, you can save a full page, region, article and more with the web clipper browser extension. Choose the notebook where you want the capture to be stored. It adds the page title, page URL, date and time and more information. If you want the capture to be full text searchable and editable be sure to capture the page as an article, versus a full page, which results in an image.
OneNote notebooks can be shared on a server or through a Microsoft 365 subscription, making this an excellent way to share research notebooks with others in the firm and to capture content that could later go missing.
For Outward-Facing Documents
For critical legal documents and evidence, you might need to go beyond web clippers or PDF captures. There are tools that you can use to preserve web content through a third party so that others may access it, and you do not need to be as concerned that hyperlinks shared with third parties suffer from link rot.
Archive.org
For limited use, you can capture a web page on the Archive.org website by copying and pasting the URL under Save Page Now. You can do this without creating a login, but if you do sign in, you can save a screenshot, save to your own web archive, get an email of the results and more. Accounts are free. Archive.org also offers additional archiving and data services under Archive-It, including web archiving, text and data mining, digital preservation and more.
FreezePage
FreezePage describes itself as being like the Wayback Machine and Google Cached pages. It specifically lists lawyers and other professionals as intended users, as well as journalists and content managers. The service has been in operation since 2003, and they state they have never experienced data loss. They also mention that their Terms of Use do not, however, give any guarantees.
There is a free plan, though a premium account will provide more storage space, priority access and advanced features. A professional plan costs $249 for 12 months with 1000 MB of storage.
Perma.cc
Perma.cc “helps scholars, journals, courts and others create permanent records of the web sources they cite.” It is developed and maintained by the Harvard Law School Library in conjunction with university law libraries and other organizations in the “forever” business.
When a user creates a Perma.cc link, Perma.cc archives the referenced content and generates a link to an archived record of the page. No matter what happens to the original source, the archive will always be available through Perma.cc.
The site states “Organizations (such as law firms, publishers, non-profits and others) or individuals not associated with an academic institution or court are both able to use Perma via paid subscription.”
Page Vault
Page Vault provides legally admissible screen captures of web pages, video, documents and social media. There are several ways to use Page Vault: you can employ the Page Vault team to assist with captures through an “on-demand” service, or you can use its tools in house. The tools are built for legal use, so they provide affidavits and metadata. Pricing varies depending on what is being captured, with free quotes available.
Best Practices for Law Firms
In addition to collecting and archiving fragile web content, firms can deploy a few best practices to ensure that efforts made to archive content are documented.
Standardized Citation Format
Use a consistent format for digital citations that includes the date of access and details about the archival process. For instance:
“Source accessed on [Date]. Archived version available at [Archived URL] via [Tool Name].”
This practice not only reassures your audience of the source’s reliability but also simplifies future reference checks.
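If your firm generates citations from a spreadsheet or document template, the format above can be produced the same way every time. This is a minimal sketch: the function name `format_citation` is illustrative, the date is written in year-month-day form as one possible house style, and the archived address in the usage note is a placeholder rather than a real record.

```python
from datetime import date


def format_citation(accessed, archived_url, tool_name):
    """Fill in the citation sentence with an access date, archive link and tool."""
    return (
        f"Source accessed on {accessed.isoformat()}. "
        f"Archived version available at {archived_url} via {tool_name}."
    )
```

For example, `format_citation(date(2025, 3, 1), "https://perma.cc/XXXX-XXXX", "Perma.cc")` yields a sentence that names the access date, the placeholder archive link and the tool used.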
Version Control and Documentation
Maintain version-controlled documents for key submissions. This practice allows you to track changes in citations over time and provides an audit trail if any discrepancies arise later.
Client Briefings
In your communications with clients, briefly explain how you handle digital citations and the measures you take to ensure that all referenced materials remain available. This transparency can build client confidence and demonstrate your commitment to thorough research.
Link rot represents a significant challenge in the digital age, particularly for professionals who rely heavily on online sources. The transient nature of digital content, coupled with frequent updates, removals and domain changes, creates an unstable environment where vital information can disappear without warning. This instability not only undermines the integrity of research but also poses a risk to the reliability of legal arguments and other professional work.
Understanding the causes of link rot—such as website restructuring, content removal, domain changes, server issues and paywalls—can help mitigate its impact. Additionally, staying informed about changes in digital services, like Google’s decision to stop supporting its URL shortening service, is crucial for maintaining access to important resources.
Ultimately, while the digital landscape offers unparalleled access to information, it also requires vigilance and proactive strategies to ensure the longevity and reliability of online content. By recognizing the risks and taking steps to address them, professionals can better navigate the challenges posed by link rot and continue to leverage the benefits of digital information.
Author’s Note: Consider supporting Internet Archive and the Data Rescue Project and other efforts to preserve our digital heritage.
©2025. First published in Law Practice Magazine Volume 51, Issue 2 March/April 2025 by the American Bar Association. Reproduced with permission. All rights reserved. This information or any portion thereof may not be copied or disseminated in any form or by any means or stored in an electronic database or retrieval system without the express written consent of the American Bar Association or the copyright holder.