Duplicate Auditor

When solving the problem requires the right tool

Available April 8

Define Audit Criteria

Define the comparison criteria needed for your particular scenario using a combination of 40+ properties and any custom fields defined on the library.

Assign a point total to be awarded when these properties match those of another document, or when one of the other available comparison conditions matches for two documents.

A hash code representing each document, along with title, filename, and file extension, are probably the best-known criteria, and as you use the tool you’ll learn the subtle ways that others can be used.
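To make the point-total idea concrete, here is a minimal sketch in Python. It is not the tool's own engine: the property names, the point values in WEIGHTS, and the use of SHA-256 as the document hash are all assumptions made for illustration.

```python
import hashlib
from itertools import combinations

# Hypothetical point values; in the tool you choose your own properties and points.
WEIGHTS = {"content_hash": 50, "title": 20, "filename": 20, "extension": 10}


def content_hash(data: bytes) -> str:
    """Return a SHA-256 hex digest, standing in for the tool's document hash."""
    return hashlib.sha256(data).hexdigest()


def score_pair(a: dict, b: dict, weights: dict = WEIGHTS) -> int:
    """Add up the points for every property whose value matches in both documents."""
    return sum(
        points
        for prop, points in weights.items()
        if a.get(prop) is not None and a.get(prop) == b.get(prop)
    )


def rank_pairs(docs: list) -> list:
    """Score every pair of documents; return non-zero pairs, highest score first."""
    scored = [(score_pair(a, b), a["id"], b["id"]) for a, b in combinations(docs, 2)]
    return sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)


docs = [
    {"id": 1, "title": "Budget", "filename": "budget.xlsx", "extension": "xlsx",
     "content_hash": content_hash(b"q1 numbers")},
    {"id": 2, "title": "Budget copy", "filename": "budget-copy.xlsx", "extension": "xlsx",
     "content_hash": content_hash(b"q1 numbers")},
    {"id": 3, "title": "Minutes", "filename": "minutes.docx", "extension": "docx",
     "content_hash": content_hash(b"meeting notes")},
]
ranked = rank_pairs(docs)
```

In this sample, documents 1 and 2 share a hash and an extension, so they are the only pair that scores points. Raising or lowering individual weights changes which pairs rise to the top.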

Ranked Results

The comparison engine returns an Excel workbook with a list of document pairs and a score that represents the likelihood that a problem needs to be addressed.  Output is highly customizable to show only the columns needed to visualize your content.  Hyperlinks can be added that navigate directly to the content instead of requiring you to pick your way through a series of libraries, views, and IDs to find the document that needs troubleshooting.

This file is extremely valuable when presenting end users with problems that need their attention.

Libraries, Webs, Sites

The tool works seamlessly across these boundaries.  It’s simply not an issue.  Results are returned sorted by their score to let you prioritize scenarios that have to be addressed immediately.

Job results can be customized to show only the properties and fields needed to solve the problem at hand, with links to the documents themselves so you don’t have to hunt and peck your way through your content.

Version Support

The toolkit allows comparisons of current file versions to all previous versions of other documents, and their metadata.

This supports solving a fiendish but all-too-common problem where two identical files are unknowingly branched by different authors who are unaware of each other.
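The branching scenario can be sketched as a lookup of each document's current content against the earlier versions of every other document. The sketch below is illustrative only: the function name, the input shapes, and the SHA-256 hashing are assumptions, not the toolkit's actual interface.

```python
import hashlib


def digest(data: bytes) -> str:
    """Hash file content so identical versions compare equal."""
    return hashlib.sha256(data).hexdigest()


def find_branches(current: dict, histories: dict) -> list:
    """Match each document's current content against previous versions of other documents.

    current   maps a document name to the bytes of its current version.
    histories maps a document name to a list of (version_label, bytes) pairs.
    Returns sorted (document, other_document, version_label) tuples.
    """
    # Index every previous version by its content hash.
    index = {}
    for name, versions in histories.items():
        for label, data in versions:
            index.setdefault(digest(data), []).append((name, label))

    matches = []
    for name, data in current.items():
        for other, label in index.get(digest(data), []):
            if other != name:  # a document matching its own history is not a branch
                matches.append((name, other, label))
    return sorted(matches)


# Sample data: Bob copied Alice's file, then kept editing his copy.
current = {"plan-alice.docx": b"draft v1", "plan-bob.docx": b"draft v1 plus edits"}
histories = {
    "plan-alice.docx": [],
    "plan-bob.docx": [("1.0", b"draft v1")],
}
branches = find_branches(current, histories)
```

Here the current version of plan-alice.docx is identical to version 1.0 of plan-bob.docx, which is exactly the kind of branch that comparing only current versions would miss.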

Version support is also invaluable when troubleshooting requires tracking metadata changes to content.


Visualize Content

What problems could you solve if you had easy access to a data dump of all properties and fields from all available versions of your documents, from any combination of libraries?  This type of visualization of SharePoint content is a game changer and may be the single most useful tool offered.

Scheduling this report regularly is a good way to maintain snapshots of your data over time.


Configure Alerting

Documents that meet the criteria you identify as a duplicate can be output to a SharePoint list, where the power and flexibility of MS Flow can be used to notify the correct person based on the content found.  Or start simple with SharePoint’s out-of-the-box alert system and customize over time.

The document set approach does double duty as a job history log that shows details of every job run.

Schedule Jobs

Each job you define can be scheduled using Windows Task Scheduler to ensure the integrity of your content at whatever frequency you require.
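As a rough illustration of daily scheduling, the sketch below builds an argument list for Windows' schtasks command. The task name and the wrapper script path are placeholders invented for this example; how a job is actually launched depends on your installation.

```python
def build_schtasks_command(task_name: str, job_command: str, start_time: str = "02:00") -> list:
    """Build a schtasks argument list that runs job_command once a day."""
    return [
        "schtasks", "/Create",
        "/TN", task_name,    # task name shown in Task Scheduler
        "/TR", job_command,  # command line the task runs
        "/SC", "DAILY",      # schedule type
        "/ST", start_time,   # start time in 24-hour HH:mm format
    ]


# Hypothetical wrapper script that launches one audit job; not a real path from the tool.
command = build_schtasks_command(
    "Duplicate audit - Contracts",
    r"C:\Jobs\run-contracts-audit.cmd",
)

# On a Windows machine the task could then be registered with:
#   import subprocess
#   subprocess.run(command, check=True)
```

Building the arguments as a list rather than a single string avoids quoting problems when the task name or path contains spaces.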

The Job Output library does double duty as a Job Tracking system that shows management details for all jobs over time and allows quick comparison of results.


Maybe you need to compare documents in an InfoPath library by running a series of XPath expressions on each XML file, or compare files contained in a zip archive.

The system supports these types of advanced scenarios by allowing developers to create custom providers that inherit from a base class.
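The provider idea can be sketched as a small base class with one method to override. This is a conceptual illustration in Python, not the product's real API: ComparisonProvider, XPathProvider, and the fingerprint and matches methods are hypothetical names, and the sample XML is invented.

```python
from abc import ABC, abstractmethod
import xml.etree.ElementTree as ET


class ComparisonProvider(ABC):
    """Hypothetical base class: turns a file's bytes into a comparable fingerprint."""

    @abstractmethod
    def fingerprint(self, data: bytes) -> tuple:
        ...

    def matches(self, a: bytes, b: bytes) -> bool:
        """Two files match when their fingerprints are equal."""
        return self.fingerprint(a) == self.fingerprint(b)


class XPathProvider(ComparisonProvider):
    """Compares XML files by the text found at a fixed list of paths."""

    def __init__(self, paths):
        self.paths = list(paths)

    def fingerprint(self, data: bytes) -> tuple:
        root = ET.fromstring(data)
        # findtext returns None when a path has no match.
        return tuple(root.findtext(path) for path in self.paths)


provider = XPathProvider(["./customer/id", "./order/number"])
form_a = b"<form><customer><id>42</id></customer><order><number>A-7</number></order><note>x</note></form>"
form_b = b"<form><customer><id>42</id></customer><order><number>A-7</number></order><note>y</note></form>"
form_c = b"<form><customer><id>43</id></customer><order><number>A-7</number></order></form>"

same_order = provider.matches(form_a, form_b)       # differ only in <note>
different_order = provider.matches(form_a, form_c)  # differ in customer id
```

Note that Python's ElementTree supports only a subset of XPath, and real InfoPath forms use XML namespaces, so a production provider would need namespace-aware expressions.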