NDA
SharePoint via Microsoft Graph API
Secure, automated SharePoint data integration into Azure Storage using Microsoft Graph API and token-based authentication.
GfK, a leading market research company, provides critical consumer goods data, but accessing it required manual downloads from a website whose login flow is protected by SAML federation. Sourcing and integrating that data into a modern analytics layer was cumbersome and error-prone, and the manual step blocked any attempt at automation.
We built a fully automated workflow that handles SAML login, file discovery, and ingestion end-to-end. Authentication uses the Python mechanize library to drive the GfK Federation login flow programmatically, with credentials retrieved at runtime from Azure Key Vault, so secrets never live in code or config files.
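As an illustration of the authentication step, here is a minimal sketch, not the project's actual code. It assumes the login page exposes its credential form as the first form on the page, and that the identity provider answers with a second form carrying the SAML response that must be posted manually, since mechanize does not execute JavaScript. The secret names (`gfk-username`, `gfk-password`), the form field names, and both function names are hypothetical.

```python
from typing import Callable


def make_secret_getter(vault_url: str) -> Callable[[str], str]:
    """Return a function that reads one secret value from Azure Key Vault."""
    # Imported lazily so the login helper below has no hard Azure dependency.
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    client = SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())
    return lambda name: client.get_secret(name).value


def saml_login(browser, login_url: str, get_secret: Callable[[str], str],
               user_field: str = "username", password_field: str = "password"):
    """Drive a form-based SAML login with a mechanize-style browser.

    `browser` is expected to behave like mechanize.Browser. The field names
    and the secret names are assumptions made for this sketch.
    """
    browser.open(login_url)

    # Step 1: fill in and submit the credential form (assumed to be form 0).
    browser.select_form(nr=0)
    browser[user_field] = get_secret("gfk-username")      # hypothetical secret name
    browser[password_field] = get_secret("gfk-password")  # hypothetical secret name
    browser.submit()

    # Step 2: the identity provider typically returns an auto-submitting form
    # that carries the SAMLResponse. mechanize runs no JavaScript, so the
    # form is posted by hand (again assumed to be form 0).
    browser.select_form(nr=0)
    browser.submit()
    return browser
```

In use, the browser would be created with `mechanize.Browser()` and the getter with `make_secret_getter("https://<vault-name>.vault.azure.net")`. Passing both in as arguments keeps the credentials out of the code and lets the flow be exercised with stand-ins during testing. A real federation flow may involve more redirects or intermediate forms than the two steps shown.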
Once authenticated, the system uses BeautifulSoup to parse the GfK Connect portal pages and extract the list of available data files, eliminating manual discovery. A Databricks compute cluster then runs the ingestion: it filters out files that have already been processed and transfers only new data into Azure Data Lake. The process is efficient, scalable, and idempotent: rerunning it transfers nothing that is already in the lake.
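The discovery and filtering steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the production job: it assumes the portal lists files as ordinary `<a href>` links, that data files can be recognized by their extension, and that the set of already-processed files is available as a plain set of URLs. The names `discover_files`, `select_new_files`, and `DATA_SUFFIXES` are hypothetical.

```python
from urllib.parse import urljoin

# Assumed file types; the real portal may offer others.
DATA_SUFFIXES = (".csv", ".zip", ".xlsx")


def discover_files(html: str, base_url: str) -> list[str]:
    """Return absolute URLs of all data-file links found in a portal page."""
    # Imported here so the filtering helper stays usable without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for link in soup.find_all("a", href=True):
        href = link["href"]
        if href.lower().endswith(DATA_SUFFIXES):
            # Resolve relative links against the page URL.
            urls.append(urljoin(base_url, href))
    return urls


def select_new_files(available: list[str], processed: set[str]) -> list[str]:
    """Keep only files not yet processed, preserving order and dropping repeats."""
    seen = set(processed)
    new_files = []
    for url in available:
        if url not in seen:
            seen.add(url)
            new_files.append(url)
    return new_files
```

The filter is what makes reruns safe: once a transferred file is recorded in the processed set, a second run over the same listing returns an empty list. How that set is persisted is a design choice this sketch leaves open; listing the target directory in the data lake or keeping a small tracking table are two common options.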
Solution areas: Data Foundations