Extracting Email Address from a PDF Using Power Automate and Dataverse Plugin Without Third-Party Paid Services
Automation plays a pivotal role in improving business efficiency. One recurring requirement is extracting email addresses from PDF attachments on incoming emails, a task many businesses need to perform on a regular basis. In this blog, we will explore a solution for extracting the email address from the second-last page of a PDF using Power Automate and a custom Dataverse plugin, without relying on costly third-party paid services.
The Challenge
A business requirement emerged where every time an email with a PDF attachment was received, we needed to extract an email address from the second-last page of the PDF and forward the email to that specific address. Power Automate, while incredibly powerful, does not natively support text extraction from PDFs, making it difficult to directly handle this requirement. Using third-party paid services for this task would be costly, and this was a critical concern for the project’s success. So, how do we resolve this challenge without breaking the bank?
Solution
We turned to a custom Microsoft Dataverse plugin that uses the third-party iText library to extract the text from the second-last page of a PDF. This approach was a cost-effective alternative to commercial paid services, while still achieving the desired functionality.
Here’s an overview of how this solution was implemented:
- PDF Text Extraction with iText Library: The iText library is a powerful tool for working with PDF documents in .NET environments. Using this library, we created a custom Dataverse plugin that could extract the text from the PDF document and focus on the second-last page where the email address is located.
- Integrating iText Library into Dataverse Plugin: Once we had the necessary iText library for text extraction, we integrated it into the Dataverse Plugin. The plugin would process the incoming PDF and extract the content from the second-last page.
- Forwarding the Email: After extracting the email address, we used Power Automate to forward the original email to the extracted email address. The automation workflow was built in Power Automate, making it seamless and efficient.
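As a rough illustration of the text-extraction step, the sketch below reads the second-last page of a PDF supplied as a byte array. It is a minimal example and not the project's actual code: the class and method names (SecondLastPageReader, ReadSecondLastPageText) are hypothetical, and it assumes the iText kernel assembly is available to the plugin.

```csharp
using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;

namespace PDFTextExtractor
{
    // Hypothetical helper; the name is an assumption made for this sketch.
    public static class SecondLastPageReader
    {
        // Returns the text of the second-last page,
        // or null when the PDF has fewer than two pages.
        public static string ReadSecondLastPageText(byte[] pdfBytes)
        {
            using (var stream = new MemoryStream(pdfBytes))
            using (var document = new PdfDocument(new PdfReader(stream)))
            {
                int pageCount = document.GetNumberOfPages();
                if (pageCount < 2)
                {
                    return null;
                }

                // iText page numbers are 1-based, so the second-last page is pageCount - 1.
                PdfPage page = document.GetPage(pageCount - 1);
                return PdfTextExtractor.GetTextFromPage(page);
            }
        }
    }
}
```

Returning null for short documents is a design choice for this sketch; it lets the calling flow decide what to do when there is no second-last page.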
Challenges Faced
Despite the promise of an effective solution, we encountered a few obstacles along the way, particularly when working with the Microsoft Dataverse plugin environment.
Plugin Sandbox Environment Restrictions:
The most significant challenge arose when trying to load the iText library in the Microsoft plugin sandbox environment. Dataverse plugins execute in a sandbox, so a plugin cannot directly reference third-party libraries that are not deployed to the Dataverse server. iText is such an external assembly: a plugin that simply references the iText DLLs fails at runtime because the iText assembly cannot be found in the plugin execution environment. The workaround is to ship iText inside the plugin assembly as an embedded resource.
Resolving the Issue:
To overcome this hurdle, we had to take a couple of strategic steps:
- DLL Inclusion: We included the required iText DLLs directly in the plugin project. This packages the third-party library with the plugin, but the sandbox still has to be told how to load it.
- Assembly Resolver: The next step was implementing an Assembly Resolver in the code. The resolver lets the plugin load the required assembly dynamically at runtime, even within the restrictive plugin sandbox environment. With this in place, the plugin was able to use the iText library for text extraction without any further issues.
Note: Please make sure to install version 8.0.1 of the iText library to ensure it works in the plugin execution environment.
To add the iText library to a Dataverse plugin as an embedded resource, follow these steps:
1. Add the DLL to Your Project:
- Right-click your project in Solution Explorer in Visual Studio.
- Select Add > Existing Item.
- Navigate to the folder containing the iText DLL files, select all of them, and click Add.
2. Set the Build Action:
- Select each iText DLL file in Solution Explorer.
- In the Properties window, set the Build Action to Embedded Resource.
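If the DLLs do not seem to load later, one thing worth checking is whether they were actually embedded, and under which names. The sketch below lists the embedded resources that end in ".dll". The helper name EmbeddedResourceCheck is hypothetical and is not part of the original project.

```csharp
using System;
using System.Reflection;

namespace PDFTextExtractor
{
    // Hypothetical diagnostic helper, introduced only for this example.
    public static class EmbeddedResourceCheck
    {
        // Returns the manifest resource names of the given assembly that end in ".dll".
        public static string[] ListEmbeddedDlls(Assembly assembly)
        {
            return Array.FindAll(
                assembly.GetManifestResourceNames(),
                name => name.EndsWith(".dll", StringComparison.OrdinalIgnoreCase));
        }
    }
}
```

Calling ListEmbeddedDlls(Assembly.GetExecutingAssembly()) from the plugin and writing the result to the trace log shows the exact resource names. Visual Studio normally prefixes each name with the project's default namespace (plus any folder path), so a name that does not follow the "<assembly name>.<dll name>.dll" form will not be found by a resolver that builds the name that way.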
3. Implement an Assembly Resolver:
Add a custom resolver to load embedded assemblies at runtime.
Code for Resolver:
using System;
using System.IO;
using System.Reflection;

namespace PDFTextExtractor
{
    public static class EmbeddedAssemblyResolver
    {
        // Subscribes a handler that serves missing assemblies from embedded resources.
        public static void AttachResolver()
        {
            AppDomain.CurrentDomain.AssemblyResolve += (sender, args) =>
            {
                // Resources are expected to be named "<plugin assembly name>.<requested assembly name>.dll".
                Assembly executing = Assembly.GetExecutingAssembly();
                string resourceName = $"{executing.GetName().Name}.{new AssemblyName(args.Name).Name}.dll";

                using (Stream stream = executing.GetManifestResourceStream(resourceName))
                {
                    if (stream == null)
                    {
                        return null; // Not one of the embedded assemblies
                    }

                    using (var buffer = new MemoryStream())
                    {
                        // CopyTo reads the whole stream; a single Read call may return fewer bytes.
                        stream.CopyTo(buffer);
                        return Assembly.Load(buffer.ToArray());
                    }
                }
            };
        }
    }
}
Note: Add all DLL files of the iText library to the project and set the Build Action of each one to 'Embedded Resource'.
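The resolver only helps if it is attached before any iText type is needed. One possible way to arrange that, shown here as a sketch and not as the project's actual wiring, is to call AttachResolver from the plugin's static constructor. The class name ExtractEmailPlugin is hypothetical, and the example assumes the EmbeddedAssemblyResolver class above lives in the same project.

```csharp
using System;
using Microsoft.Xrm.Sdk;

namespace PDFTextExtractor
{
    // Hypothetical plugin class name; the source does not show the real one.
    public class ExtractEmailPlugin : IPlugin
    {
        // A static constructor runs once, before the first instance is created,
        // so the resolver is in place before Execute is ever called.
        static ExtractEmailPlugin()
        {
            EmbeddedAssemblyResolver.AttachResolver();
        }

        public void Execute(IServiceProvider serviceProvider)
        {
            var context = (IPluginExecutionContext)serviceProvider.GetService(typeof(IPluginExecutionContext));
            var tracing = (ITracingService)serviceProvider.GetService(typeof(ITracingService));

            tracing.Trace("Resolver attached; handling message: {0}", context.MessageName);

            // The iText-dependent work (reading the PDF, extracting the page text)
            // would be called from here.
        }
    }
}
```

Attaching in the static constructor also avoids subscribing the handler again on every execution, which would happen if AttachResolver were called inside Execute.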
How It Works:
- Receive an Email with PDF Attachment: Power Automate triggers when an email is received with a PDF attachment.
- Extract the Email Address: The Dataverse plugin processes the PDF attachment and uses the iText library to extract text from the second-last page of the document.
- Forward the Email: Power Automate then forwards the email to the extracted email address, completing the workflow.
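Once the page text is available, the email address still has to be located in it. The source does not show how the project does this, so the following is only a sketch that uses a regular expression. The class name EmailFinder and the pattern are assumptions, and the pattern is deliberately simple.

```csharp
using System.Text.RegularExpressions;

namespace PDFTextExtractor
{
    // Hypothetical helper; not taken from the original project.
    public static class EmailFinder
    {
        // Simple pattern for common addresses; it does not cover every form the RFCs allow.
        private static readonly Regex EmailPattern =
            new Regex(@"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}");

        // Returns the first email-like token in the text, or null if there is none.
        public static string FindFirstEmail(string pageText)
        {
            if (string.IsNullOrEmpty(pageText))
            {
                return null;
            }

            Match match = EmailPattern.Match(pageText);
            return match.Success ? match.Value : null;
        }
    }
}
```

For example, FindFirstEmail("Contact: jane.doe@example.com for details") returns "jane.doe@example.com". A null result gives the flow a clear signal that no address was found, so it can skip the forward step.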
Conclusion
By leveraging Power Automate, the Dataverse plugin, and the iText library, we were able to design a solution that meets the business requirements without relying on expensive third-party services. The custom plugin, once implemented, was able to successfully extract the email address from the second-last page of the PDF and trigger a forward action to send the email to the correct address.
This solution is an excellent example of how businesses can integrate various tools, such as Power Automate and Dataverse, with custom code to build cost-effective and powerful automations. It also showcases the flexibility of Microsoft’s ecosystem, allowing developers to solve complex problems while staying within budget constraints.
If you’re facing similar requirements, this solution can be adapted to meet your needs. By combining the power of Power Automate with custom Dataverse plugins, you can automate complex processes efficiently and without the overhead of expensive services.
You can view the complete plugin project solution on GitHub:
GitHub Plugin Project Solution - PDFTextExtractor
For more information on the iText library, visit:
iText PDF Library