azure table storage inserts
Recently I had a requirement to provide a service that would rapidly query a large dataset, which was supplied as a CSV file.
After some research I decided that Azure Table Storage would be a good choice, since it enables rapid retrieval via PartitionKeys and RowKeys. I decided to insert the data into the table using PowerShell, and to query it from an existing .NET function app.
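The key scheme (first three characters of the postcode as the PartitionKey, the full postcode as the RowKey, as set up in the script below) means each lookup is a single point query. As a minimal sketch, assuming the AzTable module and hypothetical resource, table and key names, retrieval from PowerShell looks like this:

    Import-Module AzTable
    # Hypothetical names; PartitionKey is the first three characters of the
    # postcode and RowKey is the full postcode with whitespace removed
    $storageAccount = Get-AzStorageAccount -ResourceGroupName "my-rg" -Name "mystorageaccount"
    $cloudTable = (Get-AzStorageTable -Name "postcodes" -Context $storageAccount.Context).CloudTable
    Get-AzTableRow -Table $cloudTable -PartitionKey "SW1" -RowKey "SW1A1AA"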
I created a PowerShell script to insert the rows, something like the below:
Param(
    $resourceGroup = "",
    $storageAccountName = "",
    $tableName = "",
    $csvFile = ""
)
# Build a DynamicTableEntity from a CSV row; the parameter and property
# names mirror the CSV columns used in the loop below
function CreateTableEntity {
    param (
        [string]$partitionKey,
        [string]$rowKey,
        [string]$pconcd,
        [string]$pconnm,
        [string]$pconnmw
    )
    $entry = New-Object Microsoft.Azure.Cosmos.Table.DynamicTableEntity($partitionKey, $rowKey)
    $entry.Properties.Add("pconcd", $pconcd)
    $entry.Properties.Add("pconnm", $pconnm)
    $entry.Properties.Add("pconnmw", $pconnmw)
    return $entry
}
$partitionKeyLength = 3

Install-Module -Name AzTable
Import-Module AzTable

$storageAccount = Get-AzStorageAccount -ResourceGroupName $resourceGroup -Name $storageAccountName
$ctx = $storageAccount.Context
$cloudTable = (Get-AzStorageTable -Name $tableName -Context $ctx).CloudTable

$csvObject = Import-Csv -Path $csvFile

# Seed the partition key from the first row, stripping whitespace to match
# the treatment inside the loop, and create the first batch operation
$insertPartitionKey = (($csvObject | Select-Object -First 1).pcd -replace '\s','').Substring(0, $partitionKeyLength)
$batchOperation = New-Object -TypeName Microsoft.Azure.Cosmos.Table.TableBatchOperation
foreach ($row in $csvObject) {
    # Remove whitespace for easier retrieval
    $postcode = $row.pcd -replace '\s',''
    # Set the partition key for this row
    $partitionKey = $postcode.Substring(0, $partitionKeyLength)
    # A batch must target a single partition key and holds at most 100
    # operations, so flush whenever the partition key changes or the
    # batch is full; block on the task so failures surface
    if ($insertPartitionKey -ne $partitionKey -or $batchOperation.Count -eq 100) {
        $cloudTable.ExecuteBatchAsync($batchOperation).GetAwaiter().GetResult() | Out-Null
        $batchOperation = New-Object -TypeName Microsoft.Azure.Cosmos.Table.TableBatchOperation
        $insertPartitionKey = $partitionKey
    }
    $entry = CreateTableEntity -partitionKey $partitionKey -rowKey $postcode -pconcd $row.pconcd -pconnm $row.pconnm -pconnmw $row.pconnmw
    $batchOperation.Insert($entry)
}

# Flush any rows remaining in the final batch
if ($batchOperation.Count -gt 0) {
    $cloudTable.ExecuteBatchAsync($batchOperation).GetAwaiter().GetResult() | Out-Null
}
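Saving the script as, say, Insert-PostcodeRows.ps1 (the file name and resource names here are placeholders), it can then be run like so:

    # Sign in first so the Az cmdlets can resolve the storage account
    Connect-AzAccount
    .\Insert-PostcodeRows.ps1 -resourceGroup "my-rg" `
        -storageAccountName "mystorageaccount" `
        -tableName "postcodes" `
        -csvFile ".\postcodes.csv"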
This method of inserting rows into Azure Table Storage works well for relatively small datasets (fewer than around 20k rows).
Because my use case required around 3 million entries, we instead used a Python script to prepare the CSV with PartitionKey and RowKey columns included, and then an Azure Data Factory pipeline with a Copy Data activity to copy the rows from blob storage into the table. That operation took 12 minutes.
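The preparation step itself is straightforward; ours was a Python script, but as a sketch of the same transformation in PowerShell (the file paths are placeholders, and the partition scheme matches the insert script above):

    # Add PartitionKey and RowKey columns so the Copy Data activity can map
    # them directly onto the table's keys
    $partitionKeyLength = 3
    Import-Csv -Path ".\postcodes.csv" | ForEach-Object {
        $postcode = $_.pcd -replace '\s',''
        $_ | Add-Member -NotePropertyName PartitionKey -NotePropertyValue $postcode.Substring(0, $partitionKeyLength) -PassThru |
             Add-Member -NotePropertyName RowKey -NotePropertyValue $postcode -PassThru
    } | Export-Csv -Path ".\postcodes-prepared.csv" -NoTypeInformation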