If you're rolling out Microsoft Copilot — or you just need a defensible data classification posture — you have a problem on day one: the millions of files already sitting in SharePoint and OneDrive are unlabeled. Default library labels handle new files going forward. Auto-labeling policies that look for sensitive information types (SITs) handle the regulated stuff. But what about the long tail of generic business documents that don't trigger a SIT and were created before your label policy existed?
The answer most people land on is the metered Graph assignSensitivityLabel API, which requires Syntex billing. It works, but it's pay-per-call across millions of files. There's a free path that almost nobody talks about: service-side auto-labeling policies with a file-extension match condition. Yes, it works in enforcement, retroactively, against existing files at rest, with no Syntex bill.
It also looks completely broken until you understand the pipeline that sits behind it. This post is the playbook for getting it working and — more importantly — for not giving up when the Purview dashboard lies to you.
The configuration
A policy that stamps Internal on every Office document and PDF in a SharePoint site, regardless of content:
Connect-IPPSSession
# 1. Create the policy
New-AutoSensitivityLabelPolicy `
    -Name "Org-DefaultInternal" `
    -SharePointLocation "https://contoso.sharepoint.com/sites/MySite" `
    -ApplySensitivityLabel "Internal" `
    -OverwriteLabel:$false `
    -Mode TestWithoutNotifications
# 2. Add a rule with the extension condition
New-AutoSensitivityLabelRule `
    -Policy "Org-DefaultInternal" `
    -Name "Org-DefaultInternal-OfficeAndPDF" `
    -ContentExtensionMatchesWords @(
        "docx","docm","dotx","dotm",
        "xlsx","xlsm","xlsb","xltx","xltm",
        "pptx","pptm","ppsx","ppsm","potx","potm",
        "pdf"
    ) `
    -Workload SharePoint
# 3. Let simulation complete, review results, then promote
Set-AutoSensitivityLabelPolicy -Identity "Org-DefaultInternal" -Mode Enable
Three things to call out:
- OverwriteLabel:$false is critical. The policy will respect existing higher-priority labels (Confidential, Restricted) and only stamp the unlabeled and lower-priority files. You don't have to scope around your already-classified content.
- ContentContainsSensitiveInformation is left null. This is what makes the policy free. SIT-based auto-labeling has its own limits, but extension-only is a different code path with no metering.
- Workload SharePoint can be repeated for OneDrive (-Workload OneDriveForBusiness) on a separate rule, or you can scope the policy to OneDrive locations directly.
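Before promoting, it can help to estimate locally how many files the extension list should catch, so you have a number to compare against the simulation. A minimal sketch in Python, assuming you have a plain list of file names (for example from a library export). `LABELED_EXTENSIONS` mirrors the rule above; `matches_rule`, `expected_matches`, and the case-insensitive, last-suffix-only comparison are illustrative assumptions, not the engine's documented matching logic.

```python
from pathlib import PurePosixPath

# Same extensions as the -ContentExtensionMatchesWords list in the rule above.
LABELED_EXTENSIONS = {
    "docx", "docm", "dotx", "dotm",
    "xlsx", "xlsm", "xlsb", "xltx", "xltm",
    "pptx", "pptm", "ppsx", "ppsm", "potx", "potm",
    "pdf",
}


def matches_rule(file_name: str) -> bool:
    """Return True if the file's final extension is in the rule's list.

    Assumption: matching is case-insensitive and uses only the last suffix.
    """
    suffix = PurePosixPath(file_name).suffix  # ".DOCX", or "" if there is none
    return suffix.lstrip(".").lower() in LABELED_EXTENSIONS


def expected_matches(file_names):
    """Filter an iterable of file names down to the ones the rule should catch."""
    return [name for name in file_names if matches_rule(name)]
```

If the simulation reports far fewer matches than this estimate, the gap is more likely indexing lag or existing labels than a broken rule.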
The trap that makes everyone abandon this approach
Promote the policy to Enable, come back four days later, and the Purview dashboard says:
| Total Labeled | Pending | Failed |
|---|---|---|
| 0 | 0 | 0 |
You'd be forgiven for concluding that extension-based enforcement doesn't work. The simulation matched eight files. The enforcement queue is empty. There's no error. Microsoft Support's first reply will likely tell you to use the Purview portal — which is exactly what you did.
It is working. The dashboard is lying.
How the pipeline actually behaves
Service-side auto-labeling, in both simulation and enforcement, depends on the SharePoint crawl-and-index pipeline having indexed each file after the policy was activated. Files that haven't been crawled since you flipped the policy on simply won't appear. This is how the mechanism works, not a bug, and it isn't documented anywhere prominent.
Practical implications:
- A brand-new library is invisible to the engine for hours. The crawler has to discover it, index its items, and then the labeling engine evaluates them on its own pass. End-to-end I see ~24 hours in test scenarios. Production sites with regular activity get evaluated faster.
- Files that aren't indexed don't get evaluated. Open files, checked-out files, files in transient states, and files of unsupported types are silently skipped.
- Existing labels of equal-or-higher priority suppress matching. This is correct behavior but contributes to "where did my expected matches go" confusion.
- The dashboard counters update on a cadence that's not coupled to actual labeling activity. I have a tenant where every file in a test library is verifiably labeled and the per-policy CompletedItemsCount is still zero days later.
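The implications above can be condensed into a rough mental model. This is a toy sketch in Python, not the labeling engine's actual logic: `FileState`, `would_be_labeled`, and the numeric priorities (higher number meaning higher priority) are hypothetical names and values that simply encode the conditions described in this post.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class FileState:
    name: str
    last_indexed: Optional[datetime]        # None = never crawled
    existing_label_priority: Optional[int]  # None = unlabeled


def would_be_labeled(f: FileState, policy_activated: datetime,
                     policy_label_priority: int, extensions: set) -> bool:
    """Toy model of the conditions described above (not Microsoft's code)."""
    # 1. The file must have been indexed after the policy was activated.
    if f.last_indexed is None or f.last_indexed <= policy_activated:
        return False
    # 2. The extension must be in the rule's list.
    ext = f.name.rsplit(".", 1)[-1].lower() if "." in f.name else ""
    if ext not in extensions:
        return False
    # 3. An existing label of equal or higher priority suppresses the match.
    if (f.existing_label_priority is not None
            and f.existing_label_priority >= policy_label_priority):
        return False
    return True
```

Walking a file through these three checks is usually enough to explain why it did or did not show up in a simulation.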
How to verify it's actually working
Don't trust the Purview dashboard. Verify three ways, in order of confidence:
- Read _IpLabelId directly off the SharePoint list item. Use PnP PowerShell:
  Connect-PnPOnline -Url $SiteUrl -Interactive
  $items = Get-PnPListItem -List "Documents" -PageSize 500 |
      Where-Object { $_.FileSystemObjectType -eq "File" }
  foreach ($it in $items) {
      $fv = $it.FieldValues
      Write-Host ("{0,-40} {1,-36} method={2}" -f `
          $fv["FileLeafRef"], $fv["_IpLabelId"], $fv["_IpLabelAssignmentMethod"])
  }
  A populated _IpLabelId is the GUID of the applied label. Cross-reference with Get-Label for the display name. Important gotcha: PnP's -Fields parameter sometimes drops _IpLabelId from the output; omit -Fields and read the full FieldValues dictionary directly.
- Activity Explorer in the Purview portal. Look for Label apply succeeded events with the rule name in the Rule column. This is the closest thing to ground truth that the portal offers.
- The dashboard counters. Useful only as a sanity check, not as primary evidence.
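If you export the per-file results (for instance by writing the same fields to a CSV), a short script can turn them into a coverage number to set against the dashboard. A sketch in Python, assuming a hypothetical CSV with FileLeafRef and _IpLabelId columns; that column layout and the `label_coverage` helper are illustrative, not something PnP produces by default.

```python
import csv
import io


def label_coverage(csv_text: str):
    """Return (labeled, total, unlabeled_names) from exported per-file rows.

    Assumes columns named FileLeafRef and _IpLabelId; an empty or missing
    _IpLabelId is treated as "no label applied".
    """
    labeled, total, unlabeled = 0, 0, []
    for row in csv.DictReader(io.StringIO(csv_text)):
        total += 1
        if (row.get("_IpLabelId") or "").strip():
            labeled += 1
        else:
            unlabeled.append(row["FileLeafRef"])
    return labeled, total, unlabeled


# Made-up sample data; the GUID is a placeholder, not a real label ID.
sample = """FileLeafRef,_IpLabelId
Board-Minutes.docx,00000000-0000-0000-0000-000000000001
Budget.xlsx,
"""
labeled, total, unlabeled = label_coverage(sample)
print(f"{labeled}/{total} labeled; still unlabeled: {unlabeled}")
```

Re-running this on fresh exports over a few days shows the sweep progressing even while the dashboard counters stay at zero.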
The other trap: malformed test files
If you generate test files in a hurry — or worse, ask an AI to produce them — verify the binaries before you draw conclusions. A .docx file with ASCII text inside is not a Word document. The auto-labeling engine will try to parse it as OOXML, fail, and report EncryptedFileNotSupported for every file. This error has nothing to do with encryption; it's the engine's bucket for any unparseable but structurally-expected document.
file Board-Minutes.docx
# Real: Microsoft Word 2007+
# Fake: ASCII text
Real Office files via Python:
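# Requires the python-docx package: pip install python-docx
from docx import Document

d = Document()
d.add_heading("Test", 0)          # level 0 uses the Title style
d.add_paragraph("Real content.")
d.save("Board-Minutes.docx")      # written as a real OOXML package
The same applies to .xlsx (openpyxl), .pptx (python-pptx), and .pdf (weasyprint / reportlab).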
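Where the file utility isn't available (on Windows, say), a rough equivalent check can be scripted. A minimal sketch in Python using only the standard library; `looks_like_ooxml` and `looks_like_pdf` are hypothetical helper names, and the checks are heuristics (a zip container with a [Content_Types].xml part, a %PDF- header) rather than full validation.

```python
import zipfile


def looks_like_ooxml(path: str) -> bool:
    """Heuristic: real .docx/.xlsx/.pptx files are zip archives that
    contain a [Content_Types].xml part at the root."""
    if not zipfile.is_zipfile(path):
        return False  # e.g. plain ASCII text saved with a .docx name
    with zipfile.ZipFile(path) as zf:
        return "[Content_Types].xml" in zf.namelist()


def looks_like_pdf(path: str) -> bool:
    """Heuristic: PDF files begin with the bytes %PDF-."""
    with open(path, "rb") as fh:
        return fh.read(5) == b"%PDF-"
```

Running these over a test library before upload filters out the fake files that would otherwise surface as EncryptedFileNotSupported.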
When this approach makes sense
- You need to retroactively label existing files at rest in SharePoint or OneDrive
- You don't want to enable Syntex billing for the metered Graph API
- Your "default" classification (typically Internal) is something you want applied to all unclassified Office documents and PDFs
- You're willing to wait 24–72 hours for the first sweep across a new scope
- You have higher-priority labels in place for sensitive content so the catch-all doesn't paint over them
When to use something else
- For brand-new files going forward, library default sensitivity labels are simpler and stamp at upload time
- For sensitive content, SIT-based auto-labeling is the right tool and is also free
- For one-off remediation of a small file set with strict timing, the metered Graph API is fine if you accept the cost
Closing
Service-side auto-labeling with extension matching is one of the genuinely useful free features in Purview, and it's almost completely buried under bad telemetry and a dashboard that doesn't reflect the labeling engine's actual state. If you walk away with one thing: don't trust the policy dashboard. Verify on the file. The engine is doing its job; the UI hasn't caught up yet.