Centralizing Data and Enhancing Workflow Orchestration for a National Pathology Lab

Challenge

A national pathology laboratory partnered with us to implement a centralized Laboratory Operations Data Layer that uses APIs to turn disparate data sources into actionable information. Choosing AWS as its cloud infrastructure laid the foundation for the lab's next phase, Workflow Orchestration, while reducing IT ownership costs and improving data management processes.

Background

The national pathology lab provides molecular and diagnostic testing services. Its challenge lay in managing large volumes of HL7 messages from multiple systems and transforming that data into a unified, actionable format. The lab needed a cloud-based data infrastructure that could ingest and process these messages, integrate with its existing CRM, and support data visualization and error handling. We were tasked with designing and implementing a solution that met all of these requirements.

Approach

The first step was assessing several cloud infrastructure platforms (Snowflake, AWS, Azure Databricks, and MuleSoft) against key criteria, including integration capabilities, setup speed, and cost efficiency. AWS was chosen as the best fit because the laboratory already used it internally, it removed the need for third-party services, and it was cost-effective.

Once the infrastructure was defined, we designed the solution, which leveraged AWS components such as Amazon API Gateway, Amazon Kinesis, and AWS Lambda. The architecture enabled the lab to ingest HL7 messages, handle errors, and seamlessly integrate data with their Salesforce CRM. Our solution relied on a combination of the following AWS services:

Amazon API Gateway: Served as the data entry point, allowing HL7 messages encapsulated in JSON to be processed securely.
Amazon Kinesis: Handled real-time data streaming from the lab’s systems and stored it in Amazon S3 buckets.
AWS Lambda: Performed data transformation and validation, ensuring that the data was accurate before moving into the next phase.
Amazon Athena: Allowed for SQL-like queries on data stored in Amazon S3, ensuring rapid access to critical insights.
AWS Glue and Amazon Redshift: Managed large data sets and ensured that only the most up-to-date information was stored for reporting and integration with the CRM.
Amazon AppFlow: Streamlined the data integration with Salesforce, ensuring accurate and timely reporting.
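
As an illustration of the entry point described above, the following minimal Python sketch shows how an HL7 v2 message wrapped in JSON might be unpacked and checked before further processing. The envelope key "hl7", the function name, and the returned field names are assumptions made for this example; the case study does not specify the actual payload schema. The sketch also assumes the default HL7 field separator "|".

```python
import json


def parse_hl7_envelope(body: str) -> dict:
    """Extract basic header fields from an HL7 v2 message wrapped in JSON.

    Assumes a hypothetical envelope of the form {"hl7": "<raw message>"}.
    """
    envelope = json.loads(body)
    raw = envelope["hl7"]

    # HL7 v2 separates segments with carriage returns; tolerate \n and \r\n too.
    normalized = raw.replace("\r\n", "\r").replace("\n", "\r")
    segments = [s for s in normalized.split("\r") if s]

    if not segments or not segments[0].startswith("MSH|"):
        raise ValueError("message does not start with an MSH segment")

    # In MSH the field separator itself counts as MSH-1, so after splitting
    # on "|" the field MSH-n sits at index n - 1 (for n >= 2).
    msh = segments[0].split("|")
    if len(msh) < 12:
        raise ValueError("MSH segment is too short")

    msg_type = msh[8].split("^")  # MSH-9, e.g. "ORU^R01"
    return {
        "message_type": msg_type[0],
        "trigger_event": msg_type[1] if len(msg_type) > 1 else "",
        "control_id": msh[9],   # MSH-10
        "version": msh[11],     # MSH-12
        "segment_ids": [s.split("|", 1)[0] for s in segments],
    }
```

A function like this could run inside the validation step, rejecting messages that lack a usable header before they reach storage.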

Solution

The final architecture created a robust Laboratory Operations Data Layer capable of handling large volumes of HL7 messages, transforming them into actionable data, and integrating with the laboratory’s CRM system. This included:

Data Transformation and Validation: HL7 messages were ingested via API Gateway and streamed through Amazon Kinesis, where AWS Lambda functions validated and transformed the data.
Error Handling and Storage: Validated messages were stored in a “Processed” S3 bucket, while errors were directed to a separate bucket for review.
Data Querying and Reporting: Amazon Athena allowed for easy querying of stored data, while Glue Jobs ensured that only the most current data was available for operational use.
CRM Integration: Data was synced with Salesforce using Amazon AppFlow, keeping reports and CRM records up to date.
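
The validation and error-routing steps above can be sketched as follows. This is a simplified illustration, not the lab's actual Lambda code: it assumes the event shape that Lambda receives from a Kinesis data stream (base64-encoded data under Records[].kinesis.data), a hypothetical "hl7" key in each JSON payload, and hypothetical "processed/" and "errors/" key prefixes standing in for the two buckets. It only decides where each record should go; the actual write to S3 is left out.

```python
import base64
import json

PROCESSED_PREFIX = "processed/"  # hypothetical destination for valid messages
ERROR_PREFIX = "errors/"         # hypothetical destination for rejected messages


def validate(payload) -> list:
    """Return a list of problems found in a decoded payload (empty if valid)."""
    if not isinstance(payload, dict):
        return ["payload is not a JSON object"]
    hl7 = payload.get("hl7")  # assumed envelope key
    if not isinstance(hl7, str) or not hl7.startswith("MSH|"):
        return ["missing or malformed MSH segment"]
    return []


def route_records(event: dict) -> list:
    """Produce one routing decision per Kinesis record in the event."""
    decisions = []
    for record in event.get("Records", []):
        sequence = record["kinesis"]["sequenceNumber"]
        try:
            raw = base64.b64decode(record["kinesis"]["data"])
            problems = validate(json.loads(raw))
        except ValueError as exc:
            # Covers bad base64, invalid UTF-8, and invalid JSON.
            problems = [f"undecodable record: {exc}"]
        prefix = ERROR_PREFIX if problems else PROCESSED_PREFIX
        decisions.append({"key": f"{prefix}{sequence}.json", "problems": problems})
    return decisions


def handler(event, context):
    """Lambda-style entry point; a real function would also write to S3 here."""
    return route_records(event)
```

Keeping the routing decision separate from the storage call makes the validation logic easy to test locally without AWS credentials.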

To support compliance with healthcare data regulations, a de-identification process was implemented to remove protected health information (PHI) from the data during processing.
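
To make the idea concrete, here is a minimal Python sketch of field-level redaction on an HL7 v2 message. It is an assumption-laden illustration rather than the process used in this project: the choice of PID fields to blank (identifier, name, date of birth, address, phone, SSN) is hypothetical, and a real de-identification procedure would cover other segments and free-text fields and would need compliance review.

```python
# Hypothetical selection of PID fields to blank: PID-3 (patient identifier),
# PID-5 (name), PID-7 (date of birth), PID-11 (address), PID-13 (phone),
# PID-19 (SSN). Not a complete list of identifying fields.
PID_FIELDS_TO_BLANK = (3, 5, 7, 11, 13, 19)


def deidentify(message: str) -> str:
    """Blank selected PID fields in a carriage-return-delimited HL7 v2 message."""
    cleaned = []
    for segment in message.split("\r"):
        fields = segment.split("|")
        if fields[0] == "PID":
            # For non-MSH segments, field PID-n sits at index n after splitting.
            for index in PID_FIELDS_TO_BLANK:
                if index < len(fields):
                    fields[index] = ""
        cleaned.append("|".join(fields))
    return "\r".join(cleaned)
```

Segments other than PID pass through unchanged in this sketch, so test results in OBX segments remain available for reporting.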

Impact

The centralized Data Layer resulted in significant operational improvements for the national pathology lab:

Standardized Communication: The lab now benefits from standardized communication between disparate systems, improving overall data quality.
Seamless Interoperability: Integration with AWS and Salesforce has enabled a smoother flow of data across platforms, reducing the time needed for manual processes.
Agile Integration: The lab’s ability to integrate new systems quickly has greatly improved, making it easier to handle growing data volumes.
Reduced Dependencies and Costs: By eliminating the need for third-party services, the lab has reduced IT dependencies and lowered operational costs, freeing up resources for more critical projects.

Conclusion

As the laboratory continues to build out its Workflow Orchestration capabilities, we remain an active partner, helping them automate processes and configure rule-based triggers to streamline data handling further. Our ongoing support ensures that the lab can continue to focus on delivering critical diagnostic services while benefiting from a highly efficient, cost-effective data infrastructure.
