How to Access CosmosDB from Databricks Workspace

There are two ways to access CosmosDB from an Azure Databricks workspace:

  1. Using CosmosDB Spark 3 Connector
  2. Using CosmosClient with credentials

Using CosmosDB Spark 3 Connector

This approach requires the Cosmos DB account endpoint and account key. With the azure-cosmos-spark library installed on the compute, the following snippet runs in a notebook.

cosmosEndpoint = "https://<cosmosdb-name>.documents.azure.com:443/"
cosmosMasterKey = "<cosmos-account-key>"
cosmosDatabaseName = "<db-name>"
cosmosContainerName = "<container-name>"
# Config options:
# https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/cosmos/azure-cosmos-spark_3_2-12/docs/configuration-reference.md
config = {
  "spark.cosmos.accountEndpoint" : cosmosEndpoint,
  "spark.cosmos.accountKey" : cosmosMasterKey,
  "spark.cosmos.database" : cosmosDatabaseName,
  "spark.cosmos.container" : cosmosContainerName
}
# Wrap the chained calls in parentheses so they can span several lines.
df = (
    spark.read.format("cosmos.oltp")
    .options(**config)
    .option("spark.cosmos.read.customQuery", "SELECT * FROM c OFFSET 0 LIMIT 1")
    .load()
)
df.head()
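
The snippet above hardcodes the account key in the notebook. A minimal sketch of one alternative, assuming the key is stored in a Databricks secret scope: the helper below builds the same options dictionary but fetches the key through a callable. The function name build_cosmos_config, the scope name "cosmos-scope", and the secret name "cosmos-account-key" are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict


def build_cosmos_config(
    endpoint: str,
    database: str,
    container: str,
    get_secret: Callable[[str, str], str],
    secret_scope: str = "cosmos-scope",      # hypothetical scope name
    secret_key: str = "cosmos-account-key",  # hypothetical secret name
) -> Dict[str, str]:
    """Return the connector options, reading the account key from a secret store."""
    return {
        "spark.cosmos.accountEndpoint": endpoint,
        "spark.cosmos.accountKey": get_secret(secret_scope, secret_key),
        "spark.cosmos.database": database,
        "spark.cosmos.container": container,
    }
```

In a notebook, get_secret could be something like lambda scope, key: dbutils.secrets.get(scope=scope, key=key), and the returned dictionary can be passed to options(**config) exactly as before. Taking the secret lookup as a parameter keeps the helper usable outside Databricks, where dbutils is not defined.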

Using CosmosClient with credentials

The other option is to use CosmosClient from the azure-cosmos Python package. This requires a Service Principal (SP); you authenticate with its client ID and client secret.

  1. Install the required Python packages, azure-cosmos and azure-identity, on the compute.
  2. Create a role assignment for the SP on the CosmosDB account using PowerShell or the Azure CLI. With the Azure CLI:
resourceGroupName="<resource-group-name>"
accountName="<cosmosdb-name>"
DataContributorRoleDefinitionId="00000000-0000-0000-0000-000000000002"
# For Service Principals make sure to use the Object ID as found in the Enterprise applications section of the Azure Active Directory portal blade.
principalId="<sp-object-id>"
az cosmosdb sql role assignment create \
  --account-name "$accountName" \
  --resource-group "$resourceGroupName" \
  --scope "/" \
  --principal-id "$principalId" \
  --role-definition-id "$DataContributorRoleDefinitionId"
  3. Create a CosmosClient with the SP credentials and use it to access CosmosDB. Example code snippet:
from azure.identity import ClientSecretCredential
from azure.cosmos import CosmosClient
ENDPOINT = "https://<cosmosdb-name>.documents.azure.com:443/"
TENANT_ID = "<tenant-id>"
CLIENT_ID = "<client-id>"
CLIENT_SECRET = "<client-secret>"
credential = ClientSecretCredential(
    tenant_id=TENANT_ID, client_id=CLIENT_ID, client_secret=CLIENT_SECRET
)
client = CosmosClient(ENDPOINT, credential)
db = client.get_database_client("<db-name>")
container = db.get_container_client("<container-name>")
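# Run a cross-partition query; query_items returns a lazy iterable.
result = container.query_items(
    query="SELECT * FROM c OFFSET 0 LIMIT 1",
    enable_cross_partition_query=True
)
# list() forces the query to execute and collects the items.
print(list(result))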
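
The query above is a fixed string. When a value has to be supplied at run time, query_items also accepts a parameters list, which avoids building the query text by string concatenation. The following is a minimal sketch under that assumption; the helper name find_items_by_id is hypothetical, and it only relies on the id property that every Cosmos DB item has.

```python
from typing import Any, Dict, List


def find_items_by_id(container: Any, item_id: str) -> List[Dict[str, Any]]:
    """Run a parameterized cross-partition query and collect the matching items."""
    results = container.query_items(
        query="SELECT * FROM c WHERE c.id = @id",
        parameters=[{"name": "@id", "value": item_id}],
        enable_cross_partition_query=True,
    )
    # query_items returns a lazy iterable; list() executes it.
    return list(results)
```

With the container object from the snippet above, a call would look like find_items_by_id(container, "<item-id>"). The container is passed in as an argument so the helper can be exercised with a stand-in object during testing.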