Opening up Outlook’s data format

In Q4 last year, Microsoft announced through its Interoperability @ Microsoft blog that it was planning to open up its proprietary PST email format used by Outlook.

The data in .pst files has been accessible through the Messaging API (MAPI) and the Outlook Object Model (two things of which my understanding is minimal at best), but only if the user has Outlook installed:

In order to facilitate interoperability and enable customers and vendors to access the data in .pst files on a variety of platforms, we will be releasing documentation for the .pst file format. This will allow developers to read, create, and interoperate with the data in .pst files in server and client scenarios using the programming language and platform of their choice. The technical documentation will detail how the data is stored, along with guidance for accessing that data from other software applications. It also will highlight the structure of the .pst file, provide details like how to navigate the folder hierarchy, and explain how to access the individual data objects and properties.

The documentation will be released under Microsoft’s Open Specification Promise, which means that it is protected against patent claims. Other Microsoft Office formats, such as the XML-based .docx and .xlsx, and the older binary formats .doc and .xls, are covered under this promise.

This seems like a big win for users of Microsoft Outlook. Along with CodePlex, which hosts open source projects, it seems like Microsoft is slowly opening things up and making life easier for their customers. It certainly has the potential to make it easier for customers to leave the Outlook platform. From GigaOM:

In the past, if someone was moving from Outlook/Exchange to Gmail or any other platform, there was a pretty tedious process of exporting pieces of data from Outlook into various formats before moving over to the new platform. Basically, once you didn’t have Outlook, that .pst was a useless brick of data. Now in that case you’ll be able to take that .pst file with you and if other apps/platforms build readers, they will be able to access that data. So migration to other platforms is a valid use case where there’s some benefit.

Some more ideas as to why Microsoft is making this change were floated on ZDNet a day after the announcement:

[Rob Helm, an analyst with Directions on Microsoft,] added that he believed Microsoft is trying to wean large customers from storing mail in .PST files or file systems “because doing that makes it hard for organizations to back up all their e-mail, enforce e-mail retention policies, and locate relevant e-mails during legal discovery.”

Not just retention, but perhaps helping organizations mine their email data for knowledge which can all too frequently be lost forever if an employee leaves the company? Here’s an idea: How about a tool that will gather information from emails dating back years and populate a wiki automatically for new employees?

[Rob Sanfilippo, another Directions on Microsoft analyst] added that .PSTs “are used most frequently for archiving purposes and Exchange Server 2010 includes a new server-based Personal Archive feature that gives users a separate mailbox to use for archiving on the server instead of using a PST.” He said this gives weight to the aforementioned idea that Microsoft is trying to help organizations get users off PSTs and onto server storage.

Then, in February of this year, the promised documentation was released on the MSDN website. Finally, about a month ago, two open source tools that make use of the documentation were released on CodePlex:

  • The PST Data Structure View Tool is a graphical tool that lets developers browse the internal data structures of a .pst file. Its primary goal is to help people who are learning the .pst format better understand the documentation.
  • The PST File Format SDK is a cross-platform C++ library for reading .pst files that can be incorporated into solutions that run on top of the .pst file format. The capability to write data to .pst files is on the roadmap and will be added to the SDK (a sketch of reading a .pst programmatically follows below).
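
The SDK itself is C++, but to give a rough idea of what “navigating the folder hierarchy” and pulling out individual messages looks like in practice, here is a minimal sketch using pypff, the Python bindings for the independent open source libpff reader (not Microsoft’s SDK). The file name is a placeholder and the attribute names are from memory, so treat this as an illustration rather than a recipe:

    # Minimal sketch: walk the folder hierarchy of a .pst file and print
    # message subjects, using pypff (Python bindings for libpff).
    # "archive.pst" is a placeholder file name.
    import pypff

    def walk(folder, depth=0):
        indent = "  " * depth
        print(indent + (folder.name or "(unnamed folder)"))
        for i in range(folder.number_of_sub_messages):
            message = folder.get_sub_message(i)
            print(indent + "  - " + (message.subject or "(no subject)"))
        for i in range(folder.number_of_sub_folders):
            walk(folder.get_sub_folder(i), depth + 1)

    pst = pypff.file()
    pst.open("archive.pst")
    walk(pst.get_root_folder())
    pst.close()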

The project has seen some exciting progress, which is good news for organizations that use Outlook. And as you might know, data visualization used to enhance understanding is a favourite topic of mine!

What risk do these developments address within Outlook’d organizations? Knowledge/information management is critical to so many companies. The use, retention and (hopefully) reuse of knowledge developed by employees and stored in email conversations within Outlook will be enhanced through this openness.

Has your organization taken these developments into account in your audits of knowledge/information management and strategy?

Payroll system conversion horror story

The Fort Worth (Texas) school district’s payroll system conversion has resulted in serious errors to the tune of more than $1.5 million.

The school district overpaid employees and former employees at least $1.54 million, according to the [internal] audit. It also found that the district’s payroll system lacked proper controls, was cumbersome and inconsistent, and included manual paper entries that led to human error.

Aside from the poor conversion, it doesn’t sound like the new system is all that great if it requires manual entries. I’m assuming the entries are needed because the payroll system doesn’t interface with their general ledger system. In that case, additional review controls are required over the process between the two systems.
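
Even a simple automated reconciliation would serve as one of those controls. Here’s a minimal sketch, assuming hypothetical CSV exports of the payroll register and the general ledger postings (the file names and column names are made up for illustration):

    # Hypothetical reconciliation control: compare payroll register totals
    # to general ledger postings for each pay period and flag differences.
    # File names and column names are illustrative only.
    import csv
    from collections import defaultdict
    from decimal import Decimal

    def totals_by_period(path, period_col, amount_col):
        """Sum amounts per pay period from a CSV export."""
        totals = defaultdict(Decimal)
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                totals[row[period_col]] += Decimal(row[amount_col])
        return totals

    payroll = totals_by_period("payroll_register.csv", "pay_period", "gross_pay")
    ledger = totals_by_period("gl_postings.csv", "pay_period", "amount")

    for period in sorted(set(payroll) | set(ledger)):
        diff = payroll.get(period, Decimal("0")) - ledger.get(period, Decimal("0"))
        if diff:
            print(f"{period}: payroll and GL differ by {diff}")

A check like this, run every pay period, would surface discrepancies between the two systems long before an after-the-fact audit does.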

Some trustees are seeking an independent audit of the problems to get more assurance that fraud wasn’t a factor and that all the issues have been resolved.

[Trustee Christene] Moss said she wasn’t comfortable with parts of the report in which the [internal] auditors could not determine why various issues happened.

Yeah, I’d be concerned about that too! As well, the auditors aren’t certain that all the overpayments have been identified and fixed. I think these are the main reasons why an independent audit is needed. The situation calls for a specific engagement looking at the system conversion process and subsequent issues.

Board President Ray Dickerson reiterated that he didn’t think there was a need for a costly external audit. He said controls will be put in place.

[…]

Dickerson said the problems that were found are typical in such a transition.

“No matter how well you plan and train, once you flip that switch, you’re going to find things you didn’t know,” he said.

Uh, not really dude! And certainly not $1.5 million worth of “things you didn’t know” (on a monthly average payroll of $41 million)!

As a not inconsequential footnote, the conversion to a new system was required because the old system’s vendor was no longer going to be supporting it. A quick search for “open source payroll software” turns up many options which will prevent vendor lock-in in the future.

Update: Another story, this one in the Fort Worth Weekly, has more details about the internal audit’s findings and the attempts by the district to have some former employees repay the erroneous amounts.

WSJ on why work tech sucks

You’ll have to hurry before Rupert puts it behind a paywall and blocks Google from indexing it, but the WSJ had a good article recently about technology in the workplace.

At the office, you’ve got a sluggish computer running aging software, and the email system routinely badgers you to delete messages after you blow through the storage limits set by your IT department. Searching your company’s internal Web site feels like being teleported back to the pre-Google era of irrelevant search results.

I don’t have a sluggish computer at work (it’s actually newer and better than my personal laptop), but it does still run Windows XP. Email storage limits should be a thing of the past and likely will be in 5-10 years as more businesses take advantage of cloud computing (or are forced to compete with that level of service). And I think we’ve all had bad intranet search experiences!

Even more galling, especially to tech-savvy workers, is the nanny-state attitude of employers who block access to Web sites, lock down PCs so users can’t install software and force employees to use clunky programs.

For me, preventing software installation is a much more heinous crime than blocking websites. Both treat employees like children, but the former hurts productivity far more than the latter. YouTube is a bandwidth hog, but explain to me why the default browser is still IE6?

“Virtual machine” software, for example, lets companies install a package of essential work software on a computer and wall it off from the rest of the system. So, employees can install personal programs on the machine with minimal interference with the work software.

This is an interesting idea. Has anyone experienced this method of organizing a work computer? It seems like a good compromise.

When they get fed up with work technologies, employees often become digital rogues, finding sneaky ways to use better tools that aren’t sanctioned by the IT department.

Is this really what the company (or the IT department) wants? Clearly not.

Instant Messaging (IM) is one area where corporations have really dropped the ball. Before I graduated from school I worked remotely part-time for a dotcom and I used MSN to communicate with my manager much more often than email. And it worked superbly. But that type of environment seems like a dream now.

The article talks about the changes Kraft Foods implemented to take better advantage of new technologies and improve worker productivity. They give employees an allowance for a phone and let them choose which one they want (60% chose iPhones). They even let employees choose their own computer, with the rule that they must consult forums for technical support if they choose not to use Windows.

For many of us, our computers and mobile phones are the primary tools we use to do our jobs. Companies that fail to provide their employees with the best tools will not get the best results.

If you enjoy hardware and software freedom at work, tell me about it in the comments!

Google Docs to surpass Office in a year

Now this is interesting. Comments from Google’s president of the enterprise division indicate he believes that Google Docs will “reach a ‘point of capability’ next year that it will serve the ‘vast majority’s needs.'”

He acknowledged that Docs is currently “much less mature” than Google Mail or Calendar. “We know it. We wouldn’t ask people to get rid of Microsoft Office and use Google Docs because it is not mature yet,” he said.

But this is expected to change in about a year, after the company introduces another “30 to 50” updates.

Less mature by a long shot, in my experience. Every time I’ve tried to edit spreadsheets using the software, I’ve thrown up my hands in frustration very early on. Granted, I think I’m nearing the stage of “advanced” Excel user (I should hope I am by now anyway), but I find the assertion that Google Docs will eclipse Office in only a year’s time hard to believe.

We shall see once those 30-50 updates are released into the wild. For now, hang on to your desktop office suite if you’re producing professional documents.

Has anyone else attempted to use Google Docs (or Zoho) to replace Office for professional work? How did it turn out?

Why your organization should be using open document standards

Microsoft has the enterprise market cornered with its Office productivity suite. Skill with Outlook, Excel and Word is pretty much required in the corporate world. As a result, most companies have significant data tied up in the proprietary binary file formats .doc and .xls.

That’s not to mention all the web-based software designed for Internet Explorer (usually an obsolete version like IE6), which presents a similar vendor lock-in problem. Corporations still overwhelmingly use IE6 as their default browser, but the missed opportunities related to browsers in industry are a topic for another day.

In Office 2007, Microsoft made its XML-based formats (.docx, .xlsx) the default; Office Open XML was certified as an open standard by Ecma International in 2006, and then by ISO in late 2008. But did we really need a second open document standard? We already had OpenDocument, which was an ISO standard as far back as 2006.

OpenDocument is now supported in Word 2007 SP2, and in my informal testing I noted only a few formatting issues. There are problems with formula handling in Excel, however: Microsoft built its support on version 1.1 of the standard instead of the newer 1.2, and so Excel strips formulas from ODF spreadsheets even if they’ve been created using the Excel add-in. For the time being, businesses might be safer using Office Open XML for spreadsheets.
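
You can see this for yourself, because an .ods file is just a zip archive and its formulas are stored as table:formula attributes inside content.xml. Here’s a quick, informal check (the file names are hypothetical):

    # Rough check for stripped formulas after a round trip through Excel:
    # count table:formula attributes in the content.xml of each .ods file.
    # File names are placeholders.
    import zipfile

    def count_odf_formulas(path):
        with zipfile.ZipFile(path) as ods:
            content = ods.read("content.xml").decode("utf-8")
        return content.count("table:formula=")

    before = count_odf_formulas("budget_original.ods")
    after = count_odf_formulas("budget_after_excel.ods")
    print(f"formulas before: {before}, after: {after}")
    if after < before:
        print("Warning: formulas appear to have been stripped on save.")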

Despite this, ODF is the future. Rob Weir puts it succinctly:

With an open standard, like ODF, I own my document. I choose what application I use to author that document. But when I send that document to you, or post it on my web site, I do so knowing that you have the same right to choose as I had, and you may choose to use a different application and a different platform than I used. That is the power of ODF.

There is a plugin available from Sun for other versions of Office, including Microsoft Office 2000, Office XP, Office 2003 and Office 2007 (Service Pack 1 or higher), or the equivalent stand-alone versions of Microsoft Office Word, Excel or PowerPoint.

Governments and educational institutions have been making the move to OpenDocument, and it’s time for the private sector to follow suit. Preserving the integrity of data within critical files should be a top priority. OpenOffice.org is a free and open source productivity suite that with its latest 3.0 release has reached a level of maturity appropriate for business use, and its implementation of the ODF standard is without the caveats associated with Microsoft’s.

The most important benefit is the freedom to choose how to view and edit your data within documents and spreadsheets. But the cost differential between OpenOffice.org and Microsoft Office should also be a factor. And the history of Microsoft’s unique interpretation of the term ‘interoperability’ should be considered if your business chooses to continue to use closed standards.