Content Audit Scenarios

A fictitious company named AnyCo implemented SharePoint on Office 365 to maintain support documentation used by its remote customer support staff, who work from home.  A six-month review of the portal found three issues the company hopes to address.


Issue 1: Duplicate files in their documentation library

The support group has a document library where different departments across the company can maintain documents they believe will be helpful to the team.  To encourage participation they haven’t put many limits on how content is submitted and maintained.  After six months they find themselves with a large amount of valuable information, and a few issues to be addressed.

Different people have independently and unknowingly submitted the same PDFs multiple times, with different file names and titles.  This wasn’t a problem at first because the files contained the same information, but as individual copies are updated, staff can no longer tell new information from old.

They don’t know what to do about the duplicates other than to review every file individually, which isn’t realistic with 600 documents.


Issue 2: Conflicting Metadata - terms used twice

New staff are expected to follow a very specific script when walking customers through troubleshooting steps.  The products are complicated and the steps are detailed.  These scripts are stored in Microsoft Word format and are tagged with a Model taxonomy field so staff can find them easily.  It’s very common for similar models to share the same troubleshooting guide, so these files carry multiple tags.

They’ve had a number of cases where a model number is tagged on multiple troubleshooting documents.  This always results in an embarrassing call as the new agent instructs the customer to perform an action not relevant to their product.  The product engineering team maintains these documents for the support group and their high turnover has made it difficult to prevent the issue from recurring.


Issue 3: Overlapping Dates

The department maintains policy and procedure documents that staff are expected to review when responding to support requests.  Support is offered 24 x 7 and they use Start and End datetime fields combined with filtering to ensure calls in the early hours apply the correct policies.  Documents are shown in a view that filters using the current date.  They’ve had incidents where two timespans overlapped and two different policies were shown at the same time.

These three issues can be addressed with Duplicate Auditor and its ability to compare file properties and metadata between content.  


Resolution 1: Locating Duplicate Files

The Duplicate Auditor can be used to locate the duplicate documents.

Files whose current versions are hash duplicates are easily found, and the output contains links to each document’s settings page, where the file can be deleted.  Files that were once exact duplicates but have since diverged because one of them was edited are known as Branched Duplicates.  They are found by using the option to review all versions instead of just the current version.  Note this is possible only when the library’s versioning settings retain enough versions to include the early versions of a file.
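
The difference between current-version duplicates and Branched Duplicates can be illustrated with a short Python sketch. This is not Duplicate Auditor’s implementation or API; the function name, the in-memory library structure, and the file names are all hypothetical, and SHA-256 is assumed as the hash. Each file maps to its list of version contents, oldest first.

```python
import hashlib
from collections import defaultdict


def find_duplicates(library, all_versions=False):
    """Group file names that share a content hash.

    library: dict mapping file name -> list of version contents (bytes),
             oldest first. Hypothetical stand-in for a document library.
    all_versions: compare every retained version, not just the current one.
    """
    by_hash = defaultdict(set)
    for name, versions in library.items():
        candidates = versions if all_versions else versions[-1:]
        for content in candidates:
            by_hash[hashlib.sha256(content).hexdigest()].add(name)
    # Keep only hashes carried by more than one file.
    return sorted(sorted(names) for names in by_hash.values() if len(names) > 1)


# Hypothetical library contents for illustration only.
library = {
    "Reset Guide.pdf": [b"reset v1"],
    "reset-guide-copy.pdf": [b"reset v1"],
    "Warranty.pdf": [b"warranty v1", b"warranty v2"],  # edited after upload
    "warranty_final.pdf": [b"warranty v1"],
}

current = find_duplicates(library)
with_history = find_duplicates(library, all_versions=True)
# Groups that only appear once version history is considered.
branched = [group for group in with_history if group not in current]
```

In this sketch the two warranty files match only when the first version of Warranty.pdf is still available, which mirrors the note above about retained versions.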

A nightly job was scheduled to run the same comparison and catch future occurrences.


Resolution 2: Finding duplicate managed metadata fields

Comparing standard file properties such as contents, title, and file name is usually what comes to mind when thinking about file comparison, but this is only the beginning of Duplicate Auditor’s abilities.  It was created to compare all managed properties and custom fields defined on a document stored in SharePoint, including Managed Metadata fields.  In the Model termset used to tag troubleshooting documents, it’s able to identify a term that’s been used on two documents and flag this as a conflict.  Note that this is a matter of configuration.  With different settings it can compare the full set of terms on a document as a single value instead of as individual terms.  In that scenario the comparison below would not be seen as a conflict.

A job was scheduled to run four times throughout the business day to check for this duplicate scenario.  It sends alerts if a conflict is found again on any documents in its scope that are tagged with the Model termset.
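
The two comparison modes described above (each term on its own versus the whole set of terms as one value) can be sketched in Python as follows. This is an illustration of the idea only, not the product’s configuration or API; the function name, the per_term flag, the document names, and the model numbers are all made up for the example.

```python
from collections import defaultdict


def term_conflicts(docs, per_term=True):
    """Return {key: [document names]} for keys found on more than one document.

    docs: dict mapping document name -> list of Model terms (hypothetical data).
    per_term=True  : every term is compared on its own.
    per_term=False : a document's whole set of terms is compared as one value.
    """
    seen = defaultdict(list)
    for name, terms in docs.items():
        keys = set(terms) if per_term else {frozenset(terms)}
        for key in keys:
            seen[key].append(name)
    return {key: sorted(names) for key, names in seen.items() if len(names) > 1}


# Hypothetical documents and model numbers.
docs = {
    "Troubleshooting A.docx": ["MX-100", "MX-110"],
    "Troubleshooting B.docx": ["MX-110", "MX-200"],
}

single_term = term_conflicts(docs)                 # MX-110 appears on both documents
whole_set = term_conflicts(docs, per_term=False)   # the two tag sets differ
```

Under the per-term setting the shared model number is reported, while under the whole-set setting nothing is reported because the two documents do not carry identical sets of terms.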


Resolution 3: Interpret overlapping dates as duplicate metadata

With date fields, Duplicate Auditor can be configured to interpret two dates as the start and end of a timespan for a SharePoint document, and it will treat overlapping timespans as duplicates.  The Start and End times below have a clear overlap, and this is another check that can be scheduled and alerted on with Duplicate Auditor.
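
As a general illustration of the overlap rule (not Duplicate Auditor’s own logic), two timespans intersect when each one starts before the other ends. The Python sketch below assumes half-open windows, where a policy that ends at the exact moment another begins does not count as overlapping; the function name, policy names, and dates are hypothetical.

```python
from datetime import datetime
from itertools import combinations


def overlapping(policies):
    """Return pairs of policy names whose [start, end) windows intersect.

    policies: dict mapping policy name -> (start, end) datetimes.
    """
    pairs = []
    for (a, (a_start, a_end)), (b, (b_start, b_end)) in combinations(
        sorted(policies.items()), 2
    ):
        # Two windows overlap when each begins before the other ends.
        if a_start < b_end and b_start < a_end:
            pairs.append((a, b))
    return pairs


# Hypothetical policy documents with Start and End values.
policies = {
    "Refund Policy v1": (datetime(2024, 1, 1, 0, 0), datetime(2024, 7, 1, 6, 0)),
    "Refund Policy v2": (datetime(2024, 7, 1, 0, 0), datetime(2025, 1, 1, 0, 0)),
    "Holiday Hours": (datetime(2025, 1, 1, 0, 0), datetime(2025, 1, 2, 0, 0)),
}

conflicts = overlapping(policies)
```

Here the first policy runs six hours past the start of its replacement, so a view filtered on the current time would show both during that window. The third policy begins exactly when the second ends and is not flagged under the half-open assumption.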