Some time back, I came across a requirement to process a huge Excel file in Node.js. An obvious choice was a Transform stream. Development was not as smooth as I was expecting, so I decided to share my experience.
I have taken a small part of the code, which accepts an Excel file from an API (multipart form data via a Multer custom storage engine), processes it using a Transform stream, and pipes the processed file to the response.
Step 1: In the controller, create an excelWriter using the xlsx-write-stream package and attach it to the request object. I attached it to the request object so I can access it from the Multer custom storage callback.
req.xlsxWriter = new XLSXWriteStream();
Now we will use Multer with a custom storage engine to access the uploaded file as a stream and process it. Once the file is processed, we get the processed stream using req.xlsxWriter.getOutputStream() and pipe it to the response object with: xlsxStream.pipe(res)
Step 2: Multer Custom Storage:
Here, in the _handleFile function, we get access to the uploaded file stream. I have used the excel-stream package to transform the input Excel file stream into a stream of JSON objects. We then pipe that stream into a custom Transform stream, which does the processing and emits a stream of processed JSON objects.
This processed stream is then set as the input stream for xlsxWriter, which converts the JSON stream back into an Excel stream.
Step 3: Custom Transform stream:
The function getTransformObject() returns a new Transform stream object that reads and writes JSON objects. We log the process's current memory consumption to check for memory leaks. Before returning the stream, we declare a records array so that we can process rows in batches (this step is optional if we want to process one row at a time).
This Transform stream batch-processes 10 rows at a time using the saveDataToDB() function.
This function adds a random id to the passed data and resolves after 10 milliseconds. It can stand in for any async task, such as saving data to a database.
package.json (forked excel-stream package):
The excel-stream package depends on the csv-stream package, which needed to be updated, so I forked excel-stream and updated its package.json.
Hit the API with an Excel file attached:
Compare the input and output files:
We can also add headers to the output file if required.
With Node.js Transform streams and pipe(), we can do complex, time-consuming processing on a file of any size.