Near Real Time Classification using File Classification Infrastructure (FCI)
Resource Page Description
This resource page describes a sample near real-time classification application that builds on the File Classification Infrastructure (FCI) in Windows Server 2008 R2. The sample tool uses the public FCI API to demonstrate one possible way to implement a near real-time classification solution. The code sample is developed in C# with Visual Studio 2008 and includes a Visual Studio project file that can be used to compile and run the code.
This sample shows third-party developers how to interact with FCI in Windows Server 2008 R2 in order to extend the in-box functionality. It is intended to help customers and partners implement a complete end-to-end scenario with a small investment in code that builds on in-box FCI capabilities.
Ever-expanding storage requirements and capacity have increased the cost of data management, so effective data management has become more important than ever.
File Classification Infrastructure (FCI), shipping with Windows Server 2008 R2, is our attempt to help administrators gain insight into their data. FCI is also an important step towards fulfilling our commitment to provide the best tools to administrators for managing data more effectively, reducing cost, and mitigating risk.
The following blog entries dive deeper into FCI.
Windows Server 2008 R2 File Classification Infrastructure – Managing data based on business value
Classifying files based on location and content using the File Classification Infrastructure (FCI) in Windows Server 2008 R2
Dealing with stale data on File Servers
Customizing File Management Tasks
FCI allows administrators to schedule classification and policy execution. However, in some scenarios, administrators may want to classify a file, and optionally apply policies to it, in real time: right when the file is created or modified. For example, whenever an employee in the finance department creates a spreadsheet, the administrator may want that spreadsheet to be classified as a High Business Impact file and may optionally want a leakage prevention policy applied to it.
Generally, a user creates a file and edits its content in a long editing session, during which the user may save the file multiple times. A real-time file classification solution classifies the file right after each modification. Classifying a file consumes system resources, so in some situations it is useful to classify the file only after it is stable (for example, at the end of the editing session). This is referred to as near real-time file classification.
This resource page presents a solution that builds on in-box FCI capabilities to achieve near real-time file classification. The solution allows administrators to specify UpdateWindow, the time period to wait for a file to become stable before classifying it. The UpdateWindow determines how close the solution is to a real-time solution.
Goals:
Build a sample near real-time classification application. This application should classify specific files once the UpdateWindow time period has elapsed since they were created or last modified. Optionally, the sample application should also be able to apply a policy to a classified file if the file meets the policy application criteria specified using a classification property.
Non-goals:
Build a near real-time classification application with a 0% file miss rate. Please refer to the Alternative Design section for more details.
Support rich configuration, such as complex conditions involving multiple file properties.
Provide a customized UI for configuration.
Provide an automated test suite for the sample application.
A console application that classifies and displays files being processed in near real time.
Monitor user actions (file create, rename, modify, change in file size, system attributes, last write time, or security permissions of a file) on the target scope.
The target scope can only be a path on the local computer. If a target scope contains a mount point, the mount point and all its subdirectories are excluded from the target scope. However, a mount point, or any directory underneath it, is itself a valid target scope. For example, if the path C:\foo\bar is a mount point, then:
Target scope C:\foo excludes C:\foo\bar and all its subdirectories.
Target scope C:\foo\bar is a valid target scope.
Target scope C:\foo\bar\subDir is also a valid target scope.
Allow users to specify wildcard filters, such as "D:\Dir1\*.docx", to describe the target scope to monitor.
Allow users to specify the condition that triggers the policy action using a classification property. Only one property condition is supported, in the form "PropertyName=PropertyValue".
Support a user-specified policy action in the form of a command to be executed. When executing this command, the tool expands the path of the file being processed. Command execution is triggered by the policy condition described above.
The application runs as administrator and executes the specified action command using administrator credentials.
Embedded usage help.
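The mount point rule for target scopes can be illustrated with a small sketch. It is written in Python only for brevity (the sample itself is C#), and the function names and the explicit list of mount points are assumptions made for illustration; they are not part of the sample.

```python
from pathlib import PureWindowsPath


def is_under(path: PureWindowsPath, root: PureWindowsPath) -> bool:
    """True if path equals root or lies beneath it (Windows, case-insensitive)."""
    return path == root or root in path.parents


def in_target_scope(path: str, scope: str, mount_points: list) -> bool:
    """Hypothetical helper: decide whether a file path belongs to the scope."""
    p, s = PureWindowsPath(path), PureWindowsPath(scope)
    if not is_under(p, s):
        return False
    for mp in map(PureWindowsPath, mount_points):
        # A mount point strictly below the scope root hides its whole subtree.
        # A scope that is itself the mount point, or lies under it, is unaffected.
        if mp != s and is_under(mp, s) and is_under(p, mp):
            return False
    return True
```

With C:\foo\bar as the only mount point, a file under C:\foo\bar is out of scope for target scope C:\foo, but in scope for target scopes C:\foo\bar and C:\foo\bar\subDir.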
Path: Takes a <string> specifying the path of the directory to be monitored for changes. This is a required argument.
IncludeSubdirectories: Takes a <boolean>: true to monitor subdirectories; false otherwise. Default is false.
Filter: Takes a <string> specifying the filter used to determine which files are monitored. Filter string examples and the files each one monitors:
No filter specified: all files. Default.
*.docx: all files with a "docx" extension.
HR review 2009.ppt: only the file named HR review 2009.ppt.
Policy: Takes a <string> specifying the policy that governs execution of the command, in the form "PropertyName=PropertyValue". This parameter is optional.
Command: Takes a <string> specifying the command to be executed when the policy evaluates to true for a file. This parameter is optional.
CommandArguments: Takes a <string> specifying the arguments to be passed to the command. Use [FILEPATH] as a substitute for the full path of the file being processed.
UpdateWindow: Takes a <number> specifying the time (in seconds) to wait after the last change before classifying a file. It defaults to 60.
CacheWindow: Takes a <number> specifying the time (in seconds) for which information about recently classified files is cached. It defaults to 300.
MaxAttempts: Takes a <number> specifying the maximum number of attempts made to classify a file that produces errors. It defaults to 10.
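As a rough illustration of how a wildcard filter selects files, the sketch below matches a file name against a filter string. It is written in Python only for brevity and uses `fnmatch` as a stand-in; the sample relies on FileSystemWatcher's own filter matching, whose exact semantics may differ, and the function name is hypothetical.

```python
import fnmatch
import ntpath
from typing import Optional


def matches_filter(file_path: str, filter_string: Optional[str] = None) -> bool:
    """Approximate check of a file against a filter; no filter means all files."""
    if not filter_string:
        return True
    # Compare only the file name, ignoring case as Windows file names do.
    name = ntpath.basename(file_path)
    return fnmatch.fnmatchcase(name.lower(), filter_string.lower())
```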
Get embedded help:
Monitor and classify all files under c:\foo.
Monitor and classify *.docx files under c:\foo including subdirectories under it. Invoke sample.exe with filename as parameter for files that have BusinessImpact=HBI classification property.
NearRealTimeClassification.exe /Path:c:\foo /IncludeSubdirectories:true /Filter:*.docx /Policy:BusinessImpact=HBI /Command:sample.exe /CommandArguments:[FILENAME]
All parameters are optional, unless specified as required.
If a policy is specified, then the Command argument is required.
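The argument rules above (required Path, documented defaults, Command required whenever Policy is given) can be sketched as follows. This is an illustrative Python sketch, not the sample's C# parser. It assumes every parameter uses the /Name:Value form shown in the example command line and that names are case-insensitive; the /UpdateWindow, /CacheWindow, and /MaxAttempts switch spellings are assumptions.

```python
# Defaults taken from the parameter descriptions; stored as raw strings.
DEFAULTS = {
    "includesubdirectories": "false",
    "updatewindow": "60",
    "cachewindow": "300",
    "maxattempts": "10",
}


def parse_args(argv):
    """Hypothetical parser for /Name:Value arguments."""
    raw = dict(DEFAULTS)
    for arg in argv:
        if not arg.startswith("/") or ":" not in arg:
            raise ValueError(f"unrecognized argument: {arg}")
        # Split on the first colon only, so values such as c:\foo stay intact.
        name, value = arg[1:].split(":", 1)
        raw[name.lower()] = value
    if "path" not in raw:
        raise ValueError("Path is required")
    if "policy" in raw and "command" not in raw:
        raise ValueError("Command is required when Policy is specified")
    return {
        "path": raw["path"],
        "include_subdirectories": raw["includesubdirectories"].lower() == "true",
        "filter": raw.get("filter"),
        "policy": raw.get("policy"),
        "command": raw.get("command"),
        "command_arguments": raw.get("commandarguments", ""),
        "update_window": int(raw["updatewindow"]),
        "cache_window": int(raw["cachewindow"]),
        "max_attempts": int(raw["maxattempts"]),
    }
```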
Data Flow Diagram:
This sample tool uses the .NET System.IO.FileSystemWatcher class to monitor file changes.
Overview of FileSystemWatcher:
FileSystemWatcher listens to file system change notifications and raises events when a directory, or a file in a directory, changes. FileSystemWatcher uses a buffer allocated from non-paged memory to receive file change notifications from the Windows operating system. If there are many changes in a short time, the buffer can overflow, which causes FileSystemWatcher to lose track of changes in the directory. In such cases, it provides only a blanket notification and does not identify the individual events that were lost. It is possible to increase the size of the buffer, but doing so is expensive because the buffer comes from non-paged memory that cannot be swapped out to disk. Currently, this sample does not allow the buffer size to be changed; allowing administrators to increase it is a possible enhancement. See the Enhancements section for the full list of enhancements.
File Discovery using FileSystemWatcher:
This sample tool uses the .NET System.IO.FileSystemWatcher class to receive file create, change, and rename notifications. It registers event handlers for the following events raised by the FileSystemWatcher class.
Changed: when changes are made to the size, system attributes, last write time, creation time, or security permission of a file or directory in the path being monitored.
Created: when a file or directory is created in the path being monitored.
Renamed: when a file or directory in the path being monitored is renamed.
This sample maintains a queue of files that need to be classified. The create, change, and rename event handlers append a new file entry (consisting of the file name and the current timestamp) to this queue. If an entry for the file already exists in the queue, it is removed and re-appended at the end of the queue with the new timestamp. Hence, the entries in the queue are always sorted by the time of the last change notification received for each file.
FileSystemWatcher may raise multiple events for one user file operation; for example, moving a file into a monitored directory raises both a file created and a file changed event. Also, during an editing session, a user may save a document multiple times. To conserve system resources, it is useful to avoid classifying a file multiple times in such scenarios. One way to do this is to wait for the file to become stable before classifying it. This sample tool uses the UpdateWindow parameter to indicate the time period to wait.
A file entry stays in the queue for at least the UpdateWindow time period. Once it is older than that, the corresponding file is ready for processing. Because the entries are sorted by timestamp, they are dequeued from the front of the queue and processed one at a time. A possible enhancement to the sample tool is to process multiple eligible file entries concurrently using multiple threads. See the Enhancements section for the full list of possible enhancements.
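The queue behaviour described above can be sketched as follows. This is a minimal Python illustration, not the sample's C# code. The class and method names are hypothetical, timestamps are passed in explicitly so the logic is easy to follow, and an entry is treated as ready once its age reaches UpdateWindow.

```python
from collections import OrderedDict


class PendingQueue:
    """Files waiting to be classified, ordered by last change notification."""

    def __init__(self, update_window: float = 60.0):
        self.update_window = update_window
        self._entries = OrderedDict()  # path -> time of last change notification

    def notify(self, path: str, now: float) -> None:
        # Remove any existing entry and re-append it with the new timestamp,
        # which keeps the queue sorted by time of last change.
        self._entries.pop(path, None)
        self._entries[path] = now

    def next_ready(self, now: float):
        # Only the front entry needs checking: it is always the oldest one.
        if not self._entries:
            return None
        path, stamp = next(iter(self._entries.items()))
        return path if now - stamp >= self.update_window else None

    def remove(self, path: str) -> None:
        self._entries.pop(path, None)
```

Because a repeated notification moves the entry to the back, a file that is saved several times during an editing session is classified only once, after it has been quiet for the whole UpdateWindow.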
A file entry is processed by classifying the corresponding file. To do so, this sample tool uses the File Classification Infrastructure APIs. An instance of FsrmClassificationManager is created to classify the file. The file is classified by calling the IFsrmClassificationManager::EnumFileProperties API with the None option parameter. This API classifies the file on the fly and also retrieves all properties from all enabled storage modules. However, it currently does not set the returned classification properties on the file. Hence, this sample tool next calls the IFsrmClassificationManager::SetFileProperty API to save all file properties on the file. Note that properties are stored in all registered storage modules, including the in-file storage module (used, for example, for docx files).
When a property is saved using the in-file storage module, the file itself changes. FileSystemWatcher notices this change and fires another change event. Processing that event re-classifies the file, and these activities would continue in a loop.
Also consider a scenario in which a file is updated without affecting its classification properties, for example changing the attributes of a file (hiding or un-hiding it) right after it is created, in the absence of a classification rule that depends on that attribute. In such scenarios, FileSystemWatcher raises create and change events. If both events occur within the UpdateWindow time period, this sample tool classifies the file only once. However, if the change event occurs after the UpdateWindow time period has elapsed since the create event, the tool classifies the file twice. One can argue that this can be avoided by increasing the UpdateWindow; after all, it is a parameter to the tool. However, the longer the UpdateWindow time period, the further this tool is from being real time. Nonetheless, it is useful not to set file properties on the file again in such scenarios.
To avoid the loop described above, and to avoid setting file properties again when they have not changed, this sample tool maintains an in-memory cache of recently classified files. After classifying a file, the tool first checks whether the file exists in this cache. If so, it checks whether the file's classification properties have changed. If they have not, the tool skips setting file properties and evaluating the policy for the file. Only if the classification properties have changed, or the file is not found in the cache, does the tool set all FSRM classification properties on the file by calling the IFsrmClassificationManager::SetFileProperty API and then move on to evaluating the policy condition. Note that this cache can grow without bound if it is not pruned periodically. This sample tool uses the CacheWindow parameter to indicate the time period after which a file in the cache is eligible to be discarded. The tool periodically discards all files older than the CacheWindow time period from the cache.
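The role of the recently classified files cache can be sketched as follows. This is an illustrative Python sketch, not the sample's C# code. The class and method names are hypothetical, and properties are modelled as a plain dictionary of property name to value.

```python
class RecentCache:
    """In-memory record of recently classified files and their properties."""

    def __init__(self, cache_window: float = 300.0):
        self.cache_window = cache_window
        self._entries = {}  # path -> (properties, time the entry was cached)

    def is_unchanged(self, path: str, properties: dict) -> bool:
        # True only when the file is cached and its properties are identical;
        # in that case setting properties and policy evaluation can be skipped.
        entry = self._entries.get(path)
        return entry is not None and entry[0] == properties

    def update(self, path: str, properties: dict, now: float) -> None:
        self._entries[path] = (dict(properties), now)

    def prune(self, now: float) -> None:
        # Discard entries older than CacheWindow so the cache stays bounded.
        stale = [p for p, (_, t) in self._entries.items()
                 if now - t > self.cache_window]
        for p in stale:
            del self._entries[p]
```

When saving properties through the in-file storage module triggers a new change event, the re-classification yields the same properties, `is_unchanged` returns true, and the loop stops after one extra classification.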
Policy Action Execution:
Once all the classification properties are set on the file, this sample tool evaluates the policy condition, if the user has specified one. If the condition evaluates to true, the tool invokes the user-specified command in a separate process and waits for it to finish. The tool also passes the user-specified command arguments to the command, after expanding the file path macro (if it exists in the user-specified command arguments).
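Policy evaluation and macro expansion can be sketched as follows. This is an illustrative Python sketch, not the sample's C# code. It assumes an exact, case-sensitive comparison of the property value and uses the [FILEPATH] macro named in the parameter description; the function names and the `/scan` switch in the usage note are hypothetical.

```python
def policy_matches(policy: str, properties: dict) -> bool:
    """Evaluate a single "PropertyName=PropertyValue" condition."""
    name, _, value = policy.partition("=")
    # A property that is missing from the file never satisfies the policy.
    return properties.get(name) == value


def build_command_line(command: str, command_arguments: str, file_path: str) -> str:
    """Expand the [FILEPATH] macro and assemble the command line to run."""
    expanded = command_arguments.replace("[FILEPATH]", file_path)
    # The tool would then start this command in a separate process and wait
    # for it to finish; that step is omitted from this sketch.
    return f"{command} {expanded}".strip()
```

For example, with the policy BusinessImpact=HBI and the arguments `/scan [FILEPATH]`, a file classified as HBI at c:\foo\a.docx would produce the command line `sample.exe /scan c:\foo\a.docx`.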
The recently classified files cache is updated to include the current file and its properties only after the policy action has executed successfully (if one was requested) or the properties have been set successfully (if no policy action was requested). The file entry is then removed from the files to be classified queue.
If any error occurs while classifying a file, the cache is not updated. For a File does not exist error, the file entry is deleted from the files to be classified queue. For any other error (for example, a sharing violation), the file entry is given a new timestamp and moved to the end of the queue to reflect the change. This results in periodic retries on the same file. Administrators can control the maximum number of classification attempts using the MaxAttempts parameter. This step also ensures that an error in one file does not block other file entries in the queue from being processed.
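The error handling described above can be sketched as follows. This is an illustrative Python sketch, not the sample's C# code. The function name and return values are hypothetical, Python exception types stand in for the Windows errors, and dropping the entry once MaxAttempts is reached is an assumption, since the text does not say what happens at that point.

```python
from collections import OrderedDict


def on_classify_error(pending: OrderedDict, attempts: dict, path: str,
                      error: Exception, now: float, max_attempts: int = 10) -> str:
    """Decide what to do with a queue entry after a failed classification."""
    if isinstance(error, FileNotFoundError):
        # The file no longer exists: nothing left to classify.
        pending.pop(path, None)
        attempts.pop(path, None)
        return "dropped"
    attempts[path] = attempts.get(path, 0) + 1
    if attempts[path] >= max_attempts:
        # Assumption: after the last allowed attempt the entry is discarded.
        pending.pop(path, None)
        attempts.pop(path, None)
        return "gave_up"
    # Any other error (for example a sharing violation): refresh the timestamp
    # and move the entry to the back so other files are not blocked.
    pending.pop(path, None)
    pending[path] = now
    return "retry"
```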
5. Alternative Design
One possible enhancement to this sample tool is to use a USN Journal based file system watcher, which does not suffer from the buffer overflow problem of the .NET System.IO.FileSystemWatcher class. However, a USN Journal based file system watcher adds complexity related to keeping track of the last USN record read, journal wrap, and so on. Please refer to the USN Journal documentation for more details.
6. Known Issues
• Hidden files are not ignored.
• Monitored files may be reported using the short 8.3 file name format.
• The monitoring mechanism may miss a few triggers during heavy activity.
7. Enhancements
• Support a rich set of operators for specifying policy conditions. Please refer to the FCI File Management Task UI for examples.
• Allow the administrator to set BufferSize for the FileSystemWatcher.
• Use a USN Journal based file system watcher implementation in order to guarantee a 0% file miss rate.
• Support a timeout for policy action execution so that a long-running policy action does not block classification of files.
• Use a thread pool for policy execution, with N threads executing policy actions on multiple files in parallel.
8. Test Cases
This sample tool has been tested to work as expected in the following scenarios.
Manual test cases:
• Command line parser tests: All combinations of required and optional arguments with
• Few test cases from below when policy is not specified.
• Few test cases from below when policy is specified, but the property in the policy is not found on the file.
• All test cases from below when the policy is specified, the property in the policy is found, and the command is executed with the file path macro and other arguments.
• Copying a small text file to the target directory from outside the scope of the directory being monitored.
• Copying a large text file (10 GB) to the target directory from outside the scope of the directory being monitored.
• Copying files to the subfolder in the target directory from inside the scope of directory being monitored.
• Deleting files from target directory being monitored.
• Creating text files in the target directory using explorer.
• Programmatically creating text files in the target directory.
• Programmatically appending to the existing text files in the target directory.
• Sharing violation condition, created by keeping a file open in exclusive mode programmatically and by using MS Word 2007.
• Programmatically creating a text file, writing to file, flushing writes to disk and closing handle in the target directory with and without wait period between consecutive steps.
• Renaming a text file in the target directory.
• Renaming a Word (docx) file in the target directory.
• Editing a Word file from the target directory using MS Word 2007 and saving the file while it is still open.
• Editing a Word file from the target directory using MS Word 2007 and saving the file while closing it.
• Monitoring c:\foo when c:\foo\bar is a mount point. Any changes in c:\foo\bar are not monitored.
• Monitoring c:\foo\bar when c:\foo\bar is a mount point. All changes in c:\foo\bar are monitored.
• Monitoring c:\foo\bar\subDir when c:\foo\bar is a mount point. All changes in c:\foo\bar\subDir are monitored.
Sep 4 2009 at 9:03 PM, version 3
Fri Sep 18 2009 at 7:00 AM
© 2008 Microsoft Corporation. All rights reserved.