Overview

A pharmaceutical client engaged us to bring a machine learning data engineering proof of concept up to production standards. The product applies natural language processing to text from social media, publications, conference papers, and other sources, and categorizes that text by product, topic, and sentiment.

Solution

The project was built on Azure Data Factory (ADF) and Databricks, with data stored in an Azure Data Lake and SQL Server. We integrated ADF and Databricks with Azure DevOps to support code review, quality checks, and pipeline releases.

Key engineering contributions included:

  • Infrastructure as Code: Delivered structural database changes through a consistent, reproducible, and reversible process
  • Observability: Instrumented the code with logging and monitoring throughout the pipeline
  • Deployment: Containerized the solution and deployed it to Azure Kubernetes Service
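
To illustrate what a reproducible and reversible schema change with logging can look like, here is a minimal sketch. It is not the tooling used on the project, which is not described here. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server, and the table names (document, label, schema_version) and the Migration and migrate helpers are hypothetical, chosen only for illustration.

```python
import logging
import sqlite3
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("migrations")


@dataclass(frozen=True)
class Migration:
    version: int
    up: str    # statement that applies the change
    down: str  # statement that reverses it


# Hypothetical schema changes, listed in ascending version order.
MIGRATIONS = [
    Migration(1, "CREATE TABLE document (id INTEGER PRIMARY KEY, body TEXT)",
              "DROP TABLE document"),
    Migration(2, "CREATE TABLE label (document_id INTEGER, sentiment TEXT)",
              "DROP TABLE label"),
]


def current_version(conn: sqlite3.Connection) -> int:
    """Return the highest applied migration version, or 0 if none."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def migrate(conn: sqlite3.Connection, target: int) -> int:
    """Move the schema up or down to `target`, logging every step."""
    version = current_version(conn)
    if target > version:
        for m in MIGRATIONS:
            if version < m.version <= target:
                conn.execute(m.up)
                conn.execute(
                    "INSERT INTO schema_version (version) VALUES (?)",
                    (m.version,),
                )
                conn.commit()
                logger.info("applied migration %d", m.version)
    else:
        # Roll back in reverse order so later changes are undone first.
        for m in reversed(MIGRATIONS):
            if target < m.version <= version:
                conn.execute(m.down)
                conn.execute(
                    "DELETE FROM schema_version WHERE version = ?",
                    (m.version,),
                )
                conn.commit()
                logger.info("reverted migration %d", m.version)
    return current_version(conn)
```

Calling migrate(conn, 2) on a fresh database applies both changes, and migrate(conn, 1) then reverts only the second. Because each change carries its own reversal and the applied version is recorded in the database, the same script yields the same schema in every environment, and each step leaves a log entry.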

Results

These contributions made it possible to deploy enhancements reliably and continuously while tightening quality control over development operations. The solution allows the client to detect emerging signals in discourse about their therapies and to answer questions from healthcare providers and patients preemptively.