How to Access CosmosDB from Databricks Workspace

There are two ways to access CosmosDB from an Azure Databricks workspace:

  1. Using CosmosDB Spark 3 Connector
  2. Using CosmosClient with credentials

Using CosmosDB Spark 3 Connector

With this approach, you provide the Cosmos DB account endpoint and account key. With the azure-cosmos-spark library installed on the cluster, the following snippet works in a notebook:

cosmosEndpoint = "https://<cosmosdb-name>"
cosmosMasterKey = "<cosmos-account-key>"
cosmosDatabaseName = "<db-name>"
cosmosContainerName = "<container-name>"
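# For all config options, see the azure-cosmos-spark configuration reference.
config = {
  "spark.cosmos.accountEndpoint" : cosmosEndpoint,
  "spark.cosmos.accountKey" : cosmosMasterKey,
  "spark.cosmos.database" : cosmosDatabaseName,
  "spark.cosmos.container" : cosmosContainerName
}
# Read through the connector, running a custom query instead of the query
# the connector would otherwise generate from predicate push-down
df = (spark.read.format("cosmos.oltp").options(**config)
  .option("spark.cosmos.read.customQuery", "SELECT * FROM c OFFSET 0 LIMIT 1")
  .load())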
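If several notebooks read from Cosmos DB, it can help to assemble the connector options in one place. The sketch below is a minimal illustration, assuming a hypothetical helper named build_cosmos_read_config (it is not part of azure-cosmos-spark). It reuses the four option keys used above and, optionally, adds spark.cosmos.read.customQuery, which is assumed here to be the connector's custom-query option; verify it against the configuration reference for your connector version.

```python
from typing import Dict, Optional


def build_cosmos_read_config(
    endpoint: str,
    account_key: str,
    database: str,
    container: str,
    custom_query: Optional[str] = None,
) -> Dict[str, str]:
    """Return an options dict for spark.read.format("cosmos.oltp").

    Hypothetical helper for illustration only; not part of azure-cosmos-spark.
    """
    # Fail early on empty values rather than at load() time in Spark.
    required = {
        "endpoint": endpoint,
        "account_key": account_key,
        "database": database,
        "container": container,
    }
    for name, value in required.items():
        if not value:
            raise ValueError(f"{name} must be a non-empty string")

    config = {
        "spark.cosmos.accountEndpoint": endpoint,
        "spark.cosmos.accountKey": account_key,
        "spark.cosmos.database": database,
        "spark.cosmos.container": container,
    }
    # Assumed option name for a custom query; check it against the
    # azure-cosmos-spark configuration reference for your connector version.
    if custom_query is not None:
        config["spark.cosmos.read.customQuery"] = custom_query
    return config
```

In a notebook, the returned dict plugs into the reader the same way as the hand-written config, for example `spark.read.format("cosmos.oltp").options(**cfg).load()`. Keeping validation in the helper means a missing endpoint or key fails with a clear message before Spark contacts the account.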


Using CosmosClient with credentials

Another way is to use CosmosClient from the azure-cosmos Python SDK. With this approach, you create a Service Principal (SP) and authenticate with its client ID and client secret instead of the account key.

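  1. Install the required Python packages, azure-cosmos and azure-identity, on the cluster (for example, with %pip install azure-cosmos azure-identity in a notebook cell).
  2. Use PowerShell or the Azure CLI to create a role assignment that grants the SP data-plane access to the Cosmos DB account, as described in the Azure Cosmos DB role-based access control documentation. For example, with the Azure CLI: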
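# For service principals, make sure to use the Object ID found in the Enterprise applications section of the Azure Active Directory portal blade.
# --scope "/" grants the role on the whole account.
# $DataContributorRoleDefinitionId can be the built-in "Cosmos DB Built-in Data Contributor" role ID: 00000000-0000-0000-0000-000000000002
az cosmosdb sql role assignment create --account-name $accountName --resource-group $resourceGroupName --scope "/" --principal-id $principalId --role-definition-id $DataContributorRoleDefinitionId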
  3. Create a CosmosClient that authenticates with the SP's credential, following the azure-cosmos SDK documentation, and use the client to access CosmosDB. Example code snippet:
from azure.identity import ClientSecretCredential
from azure.cosmos import CosmosClient
ENDPOINT = "https://<cosmosdb-name>"
TENANT_ID = "<tenant-id>"
CLIENT_ID = "<client-id>"
CLIENT_SECRET = "<client-secret>"
# Authenticate as the service principal through Azure Active Directory
credential = ClientSecretCredential(
    tenant_id=TENANT_ID, client_id=CLIENT_ID, client_secret=CLIENT_SECRET
)
client = CosmosClient(ENDPOINT, credential)
db = client.get_database_client("<db-name>")
container = db.get_container_client("<container-name>")
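result = container.query_items(
  query="SELECT * FROM c OFFSET 0 LIMIT 1",
  # The query does not target a single partition key value
  enable_cross_partition_query=True,
)
for item in result:
  print(item)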