Opening up Outlook’s data format

June 26th, 2010

In Q4 last year, Microsoft announced through its Interoperability @ Microsoft blog that it was planning to open up its proprietary PST email format used by Outlook.

The data in .pst files has been accessible through the Messaging API (MAPI) and Outlook Object Model (two things of which my understanding is minimal at best), but only if the user has Outlook installed:

In order to facilitate interoperability and enable customers and vendors to access the data in .pst files on a variety of platforms, we will be releasing documentation for the .pst file format. This will allow developers to read, create, and interoperate with the data in .pst files in server and client scenarios using the programming language and platform of their choice. The technical documentation will detail how the data is stored, along with guidance for accessing that data from other software applications. It also will highlight the structure of the .pst file, provide details like how to navigate the folder hierarchy, and explain how to access the individual data objects and properties.

The documentation will be released under Microsoft’s Open Specification Promise, under which Microsoft pledges not to assert its patent claims against anyone implementing the specification. Other Microsoft Office formats, such as the XML-based .docx and .xlsx and the older binary formats .doc and .xls, are also covered by this promise.

This seems like a big win for users of Microsoft Outlook. Together with CodePlex, which hosts open source projects, it suggests that Microsoft is slowly opening things up and making life easier for its customers. It certainly has the potential to make it easier for customers to leave the Outlook platform. From GigaOM:

In the past, if someone was moving from Outlook/Exchange to Gmail or any other platform, there was a pretty tedious process of exporting pieces of data from Outlook into various formats before moving over to the new platform. Basically, once you didn’t have Outlook, that .pst was a useless brick of data. Now in that case you’ll be able to take that .pst file with you and if other apps/platforms build readers, they will be able to access that data. So migration to other platforms is a valid use case where there’s some benefit.

Some further ideas about why Microsoft is making this change were floated on ZDNet a day after the announcement:

[Rob Helm, an analyst with Directions on Microsoft,] added that he believed Microsoft is trying to wean large customers from storing mail in .PST files or file systems “because doing that makes it hard for organizations to back up all their e-mail, enforce e-mail retention policies, and locate relevant e-mails during legal discovery.”

Not just retention, but perhaps helping organizations mine their email data for knowledge which can all too frequently be lost forever if an employee leaves the company? Here’s an idea: How about a tool that will gather information from emails dating back years and populate a wiki automatically for new employees?

[Rob Sanfilippo, another Directions on Microsoft analyst] added that .PSTs “are used most frequently for archiving purposes and Exchange Server 2010 includes a new server-based Personal Archive feature that gives users a separate mailbox to use for archiving on the server instead of using a PST.” He said this gives weight to the aforementioned idea that Microsoft is trying to help organizations get users off PSTs and onto server storage.

Then, in February of this year, the promised documentation was released on the MSDN website. Finally, about a month ago, two open source tools that make use of the documentation were released on CodePlex:

  • The PST Data Structure View Tool is a graphical tool allowing developers to browse the internal data structures of a PST file. The primary goal of this tool is to assist people who are learning the .pst format and help them to better understand the documentation.
  • The PST File Format SDK is a cross-platform C++ library for reading .pst files that can be incorporated into solutions that run on top of the .pst file format. The capability to write data to .pst files is part of the roadmap and will be added to the SDK.
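
To give a flavour of what the published documentation makes possible without Outlook installed, here is a minimal Python sketch (not taken from either CodePlex project) that looks only at the first few bytes of a file. The byte values are my understanding of the header layout in the [MS-PST] documentation and should be checked against it: a 4-byte magic of `!BDN`, a 2-byte client magic of `SM` at offset 8, and a 2-byte version at offset 10 (14 or 15 for the older ANSI format, 23 or higher for Unicode). The function names are made up for illustration.

```python
import struct

# Values below are assumptions based on the [MS-PST] header description;
# verify them against the published documentation before relying on them.
PST_MAGIC = b"!BDN"      # dwMagic: first 4 bytes of the file
CLIENT_MAGIC = b"SM"     # wMagicClient: 2 bytes at offset 8 for .pst files


def describe_pst_header(data: bytes) -> str:
    """Classify the leading header bytes as 'ansi', 'unicode', 'unknown' or 'not-pst'."""
    if len(data) < 12 or data[0:4] != PST_MAGIC or data[8:10] != CLIENT_MAGIC:
        return "not-pst"
    # wVer: 2-byte little-endian version field at offset 10
    (version,) = struct.unpack_from("<H", data, 10)
    if version in (14, 15):
        return "ansi"
    if version >= 23:
        return "unicode"
    return "unknown"


def describe_pst_file(path: str) -> str:
    """Read just the first 12 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return describe_pst_header(f.read(12))
```

Calling `describe_pst_file` on a path of your own (for example a hypothetical `archive.pst`) reports which flavour of the format the file claims to be. Navigating folders and messages is a far bigger job, which is where a library such as the PST File Format SDK comes in.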

The project has seen some exciting progress, which is good news for organizations that use Outlook. And as you might know, data visualization used to enhance understanding is a favourite topic of mine!

What risk do these developments address within Outlook’d organizations? Knowledge/information management is critical to so many companies. The use, retention and (hopefully) reuse of knowledge developed by employees and stored in email conversations within Outlook will be enhanced through this openness.

Has your organization taken these developments into account in your audits of knowledge/information management and strategy?

Category: Technology