While Microsoft and Google cross swords over consumer-oriented Internet search, IBM is getting ready to launch its paid corporate search onslaught. IBM Software (www.ibm.com/software) is expected to announce soon the general availability of its DB2 Information Integrator, which will search not only the HTML data prevalent on the Web but also the structured and unstructured data that is the lifeblood of corporate IT. That includes the whole spectrum of Microsoft Word documents, Excel spreadsheets, PDF files and calendar entries that fuel business activity.
The software, code-named Masala, has been in beta since June, and the company has previously said it would ship in the fourth quarter of 2004. IBM Software execs have long said that a truly effective corporate search engine needs to handle both the rows and columns of information in structured databases and the reams of free-form data in desktop applications.
Observers said that unlike Web-based documents, which are typically in HTML format, these internal documents are not usually interlinked, and thus Google's relevancy engine is not a factor. Google, the search engine giant, scours terabytes of Internet data, but the bulk of that is in HTML format, they said. A Google spokesman, however, said the company's technology searches 12 main file formats, including HTML, Acrobat/PDF and Microsoft Office, as long as the relevant documents are posted to the Web.