Hi, you can now extract the text content from doc/docx files without installing external dependencies.
You can use the Node library called any-text.
Currently, it supports a number of file extensions, such as PDF, XLSX, XLS, and CSV.
Usage is simple:
- Install the library as a dependency (or dev dependency):
```
npm i -D any-text
```
- Use the `getText` method to read the text content:
```
const reader = require('any-text');

// getText returns a promise that resolves with the extracted text
reader.getText('path-to-file').then(function (data) {
  console.log(data);
});
```
- You can also use the `async/await` notation (inside an `async` function):
```
const reader = require('any-text');
// In a CommonJS module, await must be inside an async function
(async () => {
  const text = await reader.getText('path-to-file');
  console.log(text);
})();
```
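If you read files whose type you don't control, it can help to check the extension first and catch failures. Below is a minimal sketch, not part of any-text itself: `KNOWN_EXTENSIONS`, `hasKnownExtension` and `readTextSafely` are hypothetical names, the extension list only contains the types mentioned in this answer, and it assumes `getText` rejects its promise when a file cannot be read. The `getText` function is passed in as an argument so the helper can be tried with a stub.

```javascript
const path = require('path');

// Only the extensions named in this answer; the library may support more.
const KNOWN_EXTENSIONS = ['.doc', '.docx', '.pdf', '.xlsx', '.xls', '.csv'];

// Hypothetical helper: true when the file extension is in the list above.
function hasKnownExtension(filePath) {
  return KNOWN_EXTENSIONS.includes(path.extname(filePath).toLowerCase());
}

// Hypothetical helper: resolves with the extracted text, or with null when
// the file is skipped (unknown extension) or getText rejects.
async function readTextSafely(filePath, getText) {
  if (!hasKnownExtension(filePath)) {
    return null;
  }
  try {
    return await getText(filePath);
  } catch (err) {
    console.error(`Could not read ${filePath}: ${err.message}`);
    return null;
  }
}

// Possible usage with the library (assumes any-text is installed):
// const reader = require('any-text');
// readTextSafely('path-to-file.docx', (p) => reader.getText(p)).then(console.log);
```

Returning `null` instead of throwing keeps the calling code simple when you loop over many files; if you would rather see the failure, rethrow the error in the `catch` block.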
### Sample Test
```
const reader = require('any-text');
const chai = require('chai');
const expect = chai.expect;

describe('file reader checks', () => {
  it('check docx file content', async () => {
    // Resolve the sample file relative to the directory the tests are run from
    expect(
      await reader.getText(`${process.cwd()}/test/files/dummy.docx`)
    ).to.contains('Lorem ipsum');
  });
});
```
I hope this helps!