Extract text from MS Office

Some useful programs for extracting text:

  • catdoc – extract text from MS Word
  • xls2csv – extract text from MS Excel
  • pdftotext – extract text from PDF
  • ppthtml – extract text from MS PowerPoint (outputs HTML)

A simple PHP function can then capture the output, e.g.:

function extractWord($word_file)
{
    if (file_exists($word_file)) {
        // escapeshellarg() prevents malicious command execution
        exec('/usr/bin/catdoc -w ' . escapeshellarg($word_file), $output);

        // $output is an array corresponding to lines of output
        return join("\n", $output);
    }
    // file not found: nothing to extract
    return null;
}
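
The same pattern extends to the other tools in the list. What follows is only a minimal sketch under a few assumptions: the function names buildExtractCommand and extractText are made up for illustration, the binaries are assumed to live in /usr/bin, and the tool is chosen from the file extension alone. pdftotext is given "-" as its output file so that it writes to standard output, and because ppthtml prints HTML, the tags are stripped afterwards.

```php
<?php
// Hypothetical helper: build the shell command for a given file.
// The /usr/bin paths are an assumption; adjust them for your system.
function buildExtractCommand($file)
{
    $commands = array(
        'doc' => '/usr/bin/catdoc -w %s',
        'xls' => '/usr/bin/xls2csv %s',
        'pdf' => '/usr/bin/pdftotext %s -', // "-" sends the text to stdout
        'ppt' => '/usr/bin/ppthtml %s',     // prints HTML, cleaned up later
    );

    $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
    if (!isset($commands[$ext])) {
        return null; // unsupported file type
    }

    // escapeshellarg() prevents malicious command execution
    return sprintf($commands[$ext], escapeshellarg($file));
}

// Hypothetical wrapper: run the command and return the captured text.
function extractText($file)
{
    if (!file_exists($file)) {
        return null;
    }

    $command = buildExtractCommand($file);
    if ($command === null) {
        return null;
    }

    // $output is an array corresponding to lines of output
    exec($command, $output);
    $text = join("\n", $output);

    // ppthtml emits HTML, so reduce it to plain text
    if (strtolower(pathinfo($file, PATHINFO_EXTENSION)) === 'ppt') {
        $text = html_entity_decode(strip_tags($text));
    }

    return $text;
}
```

Calling extractText('/path/to/report.pdf') would then return the text of the PDF, or null if the file is missing or its extension is not in the table. Keeping the command construction in its own function makes it easy to check the generated command line without running any of the tools.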
