Office SharePoint fails to fully crawl a content source containing Excel files with think-cell links, and the following message appears in the crawl log:
Error in the Site Data Web Service. (Invalid high surrogate character (0xXXXX). A high surrogate character must have a value from range (0xD800 - 0xDBFF).)
This is due to a bug in Excel 2000 and Excel XP that causes them to generate Excel files with corrupt metadata. The problem occurs when a custom document property of type string with a linked source is added to an Excel document and the source cannot be resolved. In later versions of Excel, the document property value is set to something valid (e.g. an empty string). In Excel 2000 and Excel XP, however, the value contains garbage and may cause the Office SharePoint crawler to fail. The Excel documentation explicitly states that the document property value is set to a default value before being updated when the source is resolved, so this behavior is a bug in Excel 2000 and Excel XP.
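To illustrate the kind of check behind the error message above, here is a minimal Python sketch that scans a sequence of UTF-16 code units for surrogates that are not part of a valid pair. The function name find_unpaired_surrogates and the sample values are hypothetical and chosen for illustration only. This is not the SharePoint crawler's actual implementation, which is not described here. The sketch only assumes the standard UTF-16 rule that a high surrogate (0xD800 - 0xDBFF) must be immediately followed by a low surrogate (0xDC00 - 0xDFFF).

```python
# Standard UTF-16 surrogate ranges (upper bounds are exclusive in range()).
HIGH_SURROGATES = range(0xD800, 0xDC00)  # 0xD800 - 0xDBFF
LOW_SURROGATES = range(0xDC00, 0xE000)   # 0xDC00 - 0xDFFF


def find_unpaired_surrogates(code_units):
    """Return the indices of code units that are surrogates without a valid partner.

    code_units is a sequence of integers, each one a 16-bit UTF-16 code unit.
    Hypothetical helper for illustration; not part of any Microsoft API.
    """
    bad_indices = []
    i = 0
    n = len(code_units)
    while i < n:
        unit = code_units[i]
        if unit in HIGH_SURROGATES:
            # A high surrogate is only valid when a low surrogate follows it.
            if i + 1 < n and code_units[i + 1] in LOW_SURROGATES:
                i += 2  # skip the complete, valid pair
                continue
            bad_indices.append(i)
        elif unit in LOW_SURROGATES:
            # A low surrogate that was not consumed as part of a pair is invalid.
            bad_indices.append(i)
        i += 1
    return bad_indices


# Usage: "A" followed by a valid pair, then "A" followed by a lone low surrogate.
valid_units = [0x0041, 0xD83D, 0xDE00]
garbage_units = [0x0041, 0xDE00]
print(find_unpaired_surrogates(valid_units))
print(find_unpaired_surrogates(garbage_units))
```

A property value filled with arbitrary garbage can easily contain such unpaired code units, which would explain why a strict consumer rejects it while an empty string passes.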
The problem can be reproduced using the following steps:
think-cell uses custom document properties and, after noticing this behavior, we altered our code to add our document properties with type boolean rather than string. Both Excel 2000 and Excel XP set the document property to a valid boolean value, and this value remains valid when the link source cannot be resolved.
Files created using think-cell 5.0 and higher use this workaround, so Office SharePoint should be able to crawl them successfully.
Please contact Microsoft Office Support directly for advice about repairing corrupt document property values in Excel 2000 or Excel XP generated files.