Azure Table Storage inserts

I recently had a requirement to provide a service that would rapidly query a large dataset, which was provided as a CSV file.

After some research I decided that Azure Table Storage would be a good choice of storage, enabling rapid retrieval via PartitionKeys and RowKeys. I decided to insert the data into the table using PowerShell, and to query it from an existing .NET function app.
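
A point lookup against Table Storage needs both keys. In my case the querying happens in the .NET function app, but purely for illustration, the equivalent lookup in PowerShell (via the same AzTable module the insert script below uses) looks something like this; the key values are made up, and $cloudTable is assumed to be obtained as in the script:

# Hypothetical lookup: the partition key is the first 3 characters of the
# whitespace-stripped postcode, the row key is the full stripped postcode
Get-AzTableRow -Table $cloudTable -PartitionKey "SW1" -RowKey "SW1A1AA"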

I created a PowerShell script to insert the rows, something like the below:

Insert rows into table storage via batch operations:

Param(
    $resourceGroup = "",
    $storageAccountName = "",
    $tableName = "",
    $csvFile = ""
)

function CreateTableEntity {
    param (
        [string]$partitionKey,
        [string]$rowKey,
        [string]$prop1,
        [string]$prop2,
        [string]$prop3
    )
    $entry = New-Object Microsoft.Azure.Cosmos.Table.DynamicTableEntity($partitionKey, $rowKey)
    $entry.Properties.Add("prop1", $prop1)
    $entry.Properties.Add("prop2", $prop2)
    $entry.Properties.Add("prop3", $prop3)

    return $entry
}

$partitionKeyLength = 3

# Requires the Az module and an authenticated session (Connect-AzAccount), plus AzTable
Install-Module -Name AzTable
Import-Module AzTable

$storageAccount = Get-AzStorageAccount -ResourceGroupName $resourceGroup -Name $storageAccountName
$ctx = $storageAccount.Context
$cloudTable = (Get-AzStorageTable -Name $tableName -Context $ctx).CloudTable

$csvObject = Import-Csv -Path $csvFile

[Microsoft.Azure.Cosmos.Table.TableBatchOperation]$batchOperation = New-Object -TypeName Microsoft.Azure.Cosmos.Table.TableBatchOperation

# Strip whitespace before taking the prefix, matching the per-row handling below
$insertPartitionKey = (($csvObject | Select-Object -First 1).pcd -replace '\s','').Substring(0, $partitionKeyLength)

foreach ($row in $csvObject) {
    # Remove whitespace for easier retrieval
    $postcode = $row.pcd -replace '\s',''

    # Set partition key for this row
    $partitionKey = $postcode.Substring(0, $partitionKeyLength)

    # Execute the batch and start a new one when the partition key changes
    # (all entities in a batch must share a PartitionKey), or when the batch
    # reaches 100 operations (the maximum for a single batch)
    if ($insertPartitionKey -ne $partitionKey -or $batchOperation.Count -eq 100) {
        # Block on the async call so any failures surface here
        $cloudTable.ExecuteBatchAsync($batchOperation).GetAwaiter().GetResult() | Out-Null
        $batchOperation = New-Object -TypeName Microsoft.Azure.Cosmos.Table.TableBatchOperation
        $insertPartitionKey = $partitionKey
    }

    $entry = CreateTableEntity -partitionKey $partitionKey -rowKey $postcode -prop1 $row.pconcd -prop2 $row.pconnm -prop3 $row.pconnmw
    $batchOperation.Insert($entry)
}

# Flush any rows remaining in the final batch
if ($batchOperation.Count -gt 0) {
    $cloudTable.ExecuteBatchAsync($batchOperation).GetAwaiter().GetResult() | Out-Null
}
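
Running the script is then just a case of supplying the parameters; the script name and values here are placeholders:

.\Insert-TableRows.ps1 -resourceGroup "my-rg" -storageAccountName "mystorage" -tableName "postcodes" -csvFile ".\postcodes.csv"
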
Note: this method of inserting rows into Azure Table Storage works well for relatively small numbers of rows (fewer than around 20,000).

Because my use case required around 3 million entries, we instead used a Python script to prepare the CSV with the PartitionKey and RowKey included, and then an Azure Data Factory pipeline with a Copy Data activity to copy the rows from blob storage into the table. That operation took around 12 minutes.
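
The Python preparation script isn't shown here, but purely as an illustration (and keeping this post to one language), the same idea in PowerShell could look like the sketch below. The column names come from the insert script above; the file paths and the 3-character prefix length are assumptions.

# Sketch: add PartitionKey/RowKey columns to the CSV ahead of the ADF copy
# (paths and prefix length are assumptions, mirroring the insert script above)
Import-Csv -Path ".\postcodes.csv" | ForEach-Object {
    $postcode = $_.pcd -replace '\s',''
    [PSCustomObject]@{
        PartitionKey = $postcode.Substring(0, 3)  # same 3-character prefix
        RowKey       = $postcode                  # full whitespace-stripped postcode
        pconcd       = $_.pconcd
        pconnm       = $_.pconnm
        pconnmw      = $_.pconnmw
    }
} | Export-Csv -Path ".\postcodes-prepared.csv" -NoTypeInformation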