Abhinaba Ghosh
1 min read · Jul 9, 2020


Hi! You can now extract the text content of doc/docx files without installing any external dependencies.

You can use the Node.js library called any-text.

Currently, it supports a number of file extensions, such as PDF, XLSX, XLS, and CSV.

Usage is very simple:

- Install the library as a dependency (or as a dev dependency, as shown here)

```
npm i -D any-text
```

- Make use of the `getText` method to read the text content

```
var reader = require('any-text');

reader.getText(`path-to-file`).then(function (data) {
  console.log(data);
});
```

- You can also use the `async/await` notation

```
var reader = require('any-text');

// `await` must run inside an async function in a CommonJS script
(async function () {
  const text = await reader.getText(`path-to-file`);
  console.log(text);
})();
```
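
If you need the text of several files at once, the same `getText` call can be run in parallel. Below is a minimal sketch, not part of any-text itself: `readAll` is a hypothetical helper name, and it accepts any object that exposes `getText(path)` returning a promise, so you can pass in the library or a stub. It assumes Node.js 12.9 or later for `Promise.allSettled`.

```javascript
// Hypothetical helper (not part of any-text): read many files in parallel.
// `reader` is any object with getText(path) that returns a Promise of a string.
async function readAll(reader, filePaths) {
  // Start every extraction at once. The async arrow turns a synchronous
  // throw into a rejection, and allSettled never rejects as a whole.
  const outcomes = await Promise.allSettled(
    filePaths.map(async (filePath) => reader.getText(filePath))
  );

  // Pair each path with either its extracted text or the error it produced.
  return outcomes.map((outcome, index) => ({
    path: filePaths[index],
    ok: outcome.status === 'fulfilled',
    text: outcome.status === 'fulfilled' ? outcome.value : null,
    error: outcome.status === 'rejected' ? outcome.reason : null,
  }));
}
```

With the library installed, you could call it as `readAll(require('any-text'), [...paths])` and check the `ok` flag of each result, so one unreadable file does not hide the text of the others.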

### Sample Test

```
var reader = require('any-text');

const chai = require('chai');
const expect = chai.expect;

describe('file reader checks', () => {
  it('check docx file content', async () => {
    expect(
      await reader.getText(`${process.cwd()}/test/files/dummy.docx`)
    ).to.contains('Lorem ipsum');
  });
});
```

I hope it helps!

