aws-lambda

Should your EC2 be a Lambda?

What applications are a good fit for serverless? Which can utilize the benefits of event-driven architecture, blazing-fast deployment times, incredible scalability, and decreased cost? Use this tool to help!

Accessing Amazon CloudWatch Logs for AWS Lambda

AWS Lambda automatically monitors Lambda functions on your behalf, reporting metrics through Amazon CloudWatch. To help you troubleshoot failures in a function, Lambda logs all requests handled by your function and also automatically stores logs generated by your code through Amazon CloudWatch Logs.

You can insert logging statements into your code to help you validate that your code is working as expected. Lambda automatically integrates with CloudWatch Logs and pushes all logs from your code to a CloudWatch Logs group associated with a Lambda function, which is named /aws/lambda/<function name>.
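As a rough illustration of the logging described above, here is a minimal sketch of a Python handler that writes to the standard logging module. The handler body, the returned dictionary, and the helper log_group_name are hypothetical names chosen for this example; only the /aws/lambda/<function name> naming pattern comes from the text above.

```python
import logging

# Module-level logger; inside Lambda, records written here end up in
# the function's CloudWatch Logs group.
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def log_group_name(function_name):
    """Hypothetical helper: build the log group name for a function,
    following the /aws/lambda/<function name> pattern."""
    return "/aws/lambda/{}".format(function_name)


def lambda_handler(event, context):
    """Illustrative handler that logs what it received."""
    # context is None when called locally, so fall back to a placeholder.
    function_name = getattr(context, "function_name", "unknown")
    keys = sorted(event)
    logger.info("Function %s received keys: %s", function_name, keys)
    logger.info("Expected log group: %s", log_group_name(function_name))
    return {"status": "ok", "keys": keys}
```

Calling lambda_handler({"a": 1}, None) locally is a quick way to check the handler logic before deploying; the log lines only show up in CloudWatch once the function runs in Lambda.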

lambda-text-extractor

lambda-text-extractor is a Python 3.6 app that works with the AWS Lambda architecture to extract text from common binary document formats.

Features

Some of its key features are:

out-of-the-box support for many common binary document formats (see section on Supported Formats),

scalable PDF parsing using OCR in parallel using AWS Lambda and asyncio,

creation of text searchable PDFs after OCR,

serverless architecture makes deployment quick and easy,

detailed instructions for preparing libraries and dependencies necessary for processing binary documents, and

sensible Unicode handling
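The "OCR in parallel using AWS Lambda and asyncio" feature above can be pictured as a fan-out over pages. The sketch below is not lambda-text-extractor's actual code: ocr_page is a hypothetical stand-in for whatever invokes an OCR worker per page, and max_concurrency is an assumed tuning knob. Note that asyncio.run needs Python 3.7 or later.

```python
import asyncio
from typing import List


async def ocr_page(page_number: int) -> str:
    """Hypothetical stand-in for OCR-ing one page, e.g. by invoking
    a worker Lambda. Here it only yields control and returns a label."""
    await asyncio.sleep(0)
    return "text of page {}".format(page_number)


async def ocr_document(page_count: int, max_concurrency: int = 8) -> List[str]:
    """Run OCR for every page concurrently, with a cap on in-flight calls."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(page_number: int) -> str:
        async with semaphore:
            return await ocr_page(page_number)

    # gather returns results in the order the awaitables were passed in,
    # so page order is preserved even if pages finish out of order.
    tasks = [bounded(n) for n in range(1, page_count + 1)]
    return list(await asyncio.gather(*tasks))


def extract_text(page_count: int) -> str:
    """Synchronous entry point that joins the per-page text."""
    return "\n".join(asyncio.run(ocr_document(page_count)))
```

The semaphore is there because an unbounded fan-out over a large PDF could exceed concurrency limits; the value 8 is arbitrary.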

Supported Formats

lambda-text-extractor supports many common and legacy document formats:

Portable Document Format (.pdf),

PDFs with a text layer using Poppler utilities,

PDFs with OCR using Tesseract and Ghostscript 9.21 for PDF manipulation,

Microsoft Word 2, 6, 7, 97, 2000, 2002 and 2003 (.doc) using Antiword with fallback to Catdoc,

Microsoft Word 2007 OpenXML files (.docx) using python-docx,

Microsoft PowerPoint 2007 OpenXML files (.pptx) using python-pptx,

Microsoft Excel 5.0, 97-2003, and 2007 OpenXML files (.xls, .xlsx) using xlrd,

OpenDocument 1.2 (.odm, .odp, .ods, .odt, .oth, .otm, .otp, .ots, .ott) using odfpy,

Rich Text Format (.rtf) using UnRTF v0.21.9,

XML files and HTML web pages (.html, .htm, .xml) using lxml,

CSV files (.csv) using Python csv module,

Images (.tiff, .jpg, .jpeg, .png) using Tesseract, and

Plain text files (.txt)

How to Create Your First Python 3.6 AWS Lambda Function

Let’s learn how to quickly write and run a Lambda function that executes basic Python 3.6 code and takes its input from environment variables. This code, which is also available on GitHub under the blog-post-examples repository, can be changed so that you can build much more complicated Python programs.
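To give a feel for what such a function might look like, here is a minimal sketch of a handler that reads its input from environment variables. It is not the code from the blog-post-examples repository; the variable names GREETING and REPEAT_COUNT and their defaults are assumptions made up for this example.

```python
import os


def lambda_handler(event, context):
    """Illustrative handler driven by environment variables."""
    # GREETING and REPEAT_COUNT are hypothetical variable names; in Lambda
    # they would be set in the function's configuration.
    greeting = os.environ.get("GREETING", "Hello, Lambda!")
    # Environment variables are always strings, so convert explicitly.
    repeat_count = int(os.environ.get("REPEAT_COUNT", "1"))

    lines = [greeting for _ in range(repeat_count)]
    for line in lines:
        # print output is captured in the function's logs when run in Lambda.
        print(line)
    return {"lines": lines}
```

Reading configuration with os.environ.get and a default keeps the handler runnable locally, where the variables may not be set.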
