Useful command-line programs for extracting text:
- catdoc – extract text from MS Word (.doc) files
- xls2csv – extract text from MS Excel (.xls) files as CSV
- pdftotext – extract text from PDF files
- ppthtml – extract text from MS PowerPoint (.ppt) files (output is HTML)
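As a rough sketch, the list above can be turned into an extension-to-command lookup so that one script handles all four formats. This assumes the tools are installed and on the PATH and that the file extension reliably identifies the format; `buildExtractCommand` is a hypothetical name. catdoc's `-w` flag (disable word wrapping) comes from the function below; for pdftotext, passing `-` as the output file writes the text to standard output, while xls2csv and ppthtml print to standard output by default.

```php
<?php
// Hypothetical helper: map a file's extension to the shell command that
// extracts its text. Assumes catdoc, xls2csv, pdftotext and ppthtml are on the PATH.
function buildExtractCommand(string $file): ?string
{
    $arg = escapeshellarg($file); // quote the path so it is passed as a single argument

    switch (strtolower(pathinfo($file, PATHINFO_EXTENSION))) {
        case 'doc':
            return 'catdoc -w ' . $arg;        // -w: no word wrapping
        case 'xls':
            return 'xls2csv ' . $arg;          // CSV on stdout
        case 'pdf':
            return 'pdftotext ' . $arg . ' -'; // "-" sends the text to stdout
        case 'ppt':
            return 'ppthtml ' . $arg;          // HTML on stdout
        default:
            return null;                       // unsupported format
    }
}

echo buildExtractCommand('Report.PDF'), "\n"; // pdftotext 'Report.PDF' -
```

Returning `null` for unknown extensions leaves it to the caller to skip the file or report an error.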
A simple PHP function can then capture the output, e.g.:
function extractWord($word_file)
{
    if (file_exists($word_file)) {
        // escapeshellarg() quotes the path, preventing malicious command execution
        exec('/usr/bin/catdoc -w ' . escapeshellarg($word_file), $output);
        // $output is an array with one element per line of output
        return implode("\n", $output);
    }
    return false; // file not found
}
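The function above ignores catdoc's exit status, so a missing binary or an unreadable file simply yields an empty string. A minimal sketch of a more defensive variant checks the exit code that `exec()` reports through its third argument; `runExtractor` and `htmlToText` are hypothetical names, and the command passed in is assumed to have its arguments already escaped with `escapeshellarg()`. Because ppthtml emits HTML, its output is passed through `strip_tags()` and `html_entity_decode()` to leave plain text.

```php
<?php
// Hypothetical helper: run an extractor command and return its stdout as one
// string, or null if the command exits with a non-zero status.
function runExtractor(string $command): ?string
{
    $output = [];                       // exec() appends to this array, so start empty
    exec($command, $output, $status);   // $status receives the command's exit code
    if ($status !== 0) {
        return null;                    // e.g. tool not installed, unreadable file
    }
    return implode("\n", $output);      // exec() drops each line's trailing newline
}

// Hypothetical helper: turn ppthtml's HTML output into plain text.
function htmlToText(string $html): string
{
    return trim(html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8'));
}
```

For a .ppt file, calling `runExtractor('ppthtml ' . escapeshellarg($file))` and passing a non-null result to `htmlToText()` should leave just the slide text.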