Think Airflow: it moves data around and has connectors to enable ingest and egress. This graphic oozes details.
"The main engine that is responsible for the running of Data Factory is called the integration runtime." [3]
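To make the Airflow analogy concrete, here is a toy Python model of what the integration runtime does: run a pipeline's activities in order, each pulling rows from a source connector and pushing them to a sink connector. Every name here (`CopyActivity`, `Pipeline`, `IntegrationRuntime`, the in-memory source and sink) is hypothetical and is not the Data Factory SDK. It is a sketch of the idea under those assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# Hypothetical model of the idea, NOT the Data Factory API or SDK.
Row = Dict[str, object]
Source = Callable[[], Iterable[Row]]   # a connector that yields rows (ingest)
Sink = Callable[[Iterable[Row]], int]  # a connector that writes rows and returns a count (egress)


@dataclass
class CopyActivity:
    name: str
    source: Source
    sink: Sink


@dataclass
class Pipeline:
    name: str
    activities: List[CopyActivity]


class IntegrationRuntime:
    """Toy engine: runs each activity in order, moving rows from source to sink."""

    def run(self, pipeline: Pipeline) -> Dict[str, int]:
        return {a.name: a.sink(a.source()) for a in pipeline.activities}


# Usage: one in-memory source and one in-memory "warehouse" sink.
def readings_source() -> Iterable[Row]:
    yield {"id": 1, "temp": 21.5}
    yield {"id": 2, "temp": 22.0}


warehouse: List[Row] = []


def warehouse_sink(rows: Iterable[Row]) -> int:
    before = len(warehouse)
    warehouse.extend(rows)
    return len(warehouse) - before


pipeline = Pipeline("ingest", [CopyActivity("copy_readings", readings_source, warehouse_sink)])
result = IntegrationRuntime().run(pipeline)
print(result)  # {'copy_readings': 2}
```

The runtime knows nothing about where rows come from or go to; swapping in a different source or sink function is the toy equivalent of choosing a different connector.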
Azure Pipelines
CI/CD in the Azure world.
External Tables
Rather than bulk inserting the data, Azure SQL can point at a blob store and use the data there as if it were a DB table. Which strategy is used when the underlying blob storage is updated appears to be configurable. You do this with something like "CREATE EXTERNAL TABLE ... WITH (LOCATION = ...)".
"This specifies where Synapse [see below] should load the data files (CSV files) from, and the column delimiters used in the CSV." [1]
This appears to be available only on Azure SQL, not SQL Server 2016, according to this SO answer (nor on 2022, if my Docker container is anything to go by). Apparently, PolyBase [SO] is needed.
The big downsides of external tables are the lack of indexes and referential integrity, and poorer performance.
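A toy Python model of an external table, assuming a plain dict stands in for blob storage: the rows are never copied into the database, so every query re-reads and fully scans the file, which is where the missing-index cost shows up. `ExternalTable`, `blob_store` and the file path are made up for illustration, and this sketch always re-reads on each query, which is only one of the possible update strategies.

```python
import csv
import io
from typing import Callable, Dict, Iterator, List

# Assumption: a dict from path to file contents stands in for blob storage.
blob_store: Dict[str, str] = {
    "readings/2023.csv": "device,temp\nA,21.5\nB,19.0\n",
}


class ExternalTable:
    """Hypothetical sketch: the rows stay in the 'blob'; each query re-reads them."""

    def __init__(self, location: str, columns: List[str], delimiter: str = ","):
        self.location = location    # where to load the data files from
        self.columns = columns
        self.delimiter = delimiter  # the column delimiter used in the CSV
        self.rows_scanned = 0

    def scan(self) -> Iterator[dict]:
        text = blob_store[self.location]  # read from the "blob" at query time
        reader = csv.reader(io.StringIO(text), delimiter=self.delimiter)
        next(reader)                      # skip the header row
        for values in reader:
            self.rows_scanned += 1
            yield dict(zip(self.columns, values))

    def select(self, predicate: Callable[[dict], bool]) -> List[dict]:
        # No index: every query is a full scan of the underlying file.
        return [row for row in self.scan() if predicate(row)]


readings = ExternalTable("readings/2023.csv", ["device", "temp"])
print(readings.select(lambda r: r["device"] == "B"))  # [{'device': 'B', 'temp': '19.0'}]

# In this sketch, an update to the blob is visible to the next query with no reload step.
blob_store["readings/2023.csv"] += "C,23.1\n"
print(len(readings.select(lambda r: True)))  # 3
```

Even the lookup for a single device touched every row, and the counter keeps growing with each query; a loaded, indexed table would avoid that.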
PolyBase
"PolyBase enables your SQL Server instance to query data with T-SQL directly from SQL Server, Oracle, Teradata, MongoDB, Hadoop clusters, Cosmos DB, and S3-compatible object storage without separately installing client connection software" [Microsoft]. Its purpose "is to allow the data to stay in its original location and format".
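PolyBase itself needs a SQL Server instance, so as a stand-in analogy this sketch uses SQLite's `ATTACH DATABASE` via Python's standard `sqlite3` module to show the same idea: one query joins data that lives in another store, without first copying it across. This is not PolyBase and shares none of its syntax; the tables, columns and values are hypothetical.

```python
import sqlite3

# Analogy only: SQLite ATTACH, not PolyBase.
conn = sqlite3.connect(":memory:")                    # the server we query from
conn.execute("ATTACH DATABASE ':memory:' AS remote")  # the "other" data source

conn.execute("CREATE TABLE main.devices (id INTEGER PRIMARY KEY, site TEXT)")
conn.execute("CREATE TABLE remote.readings (device_id INTEGER, temp REAL)")
conn.executemany("INSERT INTO main.devices VALUES (?, ?)", [(1, "north"), (2, "south")])
conn.executemany(
    "INSERT INTO remote.readings VALUES (?, ?)", [(1, 21.5), (1, 22.5), (2, 19.0)]
)

# One SQL statement joins local and "remote" data where each one lives.
rows = conn.execute(
    """
    SELECT d.site, AVG(r.temp)
    FROM main.devices AS d
    JOIN remote.readings AS r ON r.device_id = d.id
    GROUP BY d.site
    ORDER BY d.site
    """
).fetchall()
print(rows)  # [('north', 22.0), ('south', 19.0)]
```

The `readings` rows never leave the attached database; only the query result crosses over.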
Synapse
"Azure Synapse Analytics enables you to use SQL and Spark technologies to analyze big data, offers a Data Explorer for log and time series data analytics, and can persist data in its native data warehouse." [1]
"Azure Synapse provides more analytic storage."[2]
"Synapse prefers an ELT (extract, load, transform) process for data ingestion over an ETL (extract, transform, load) process... the process basically moves data into Synapse and “loads” the data before the transforms happen." [2]
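A minimal ELT versus ETL sketch, using Python's built-in `sqlite3` as a stand-in for the warehouse (an assumption; Synapse is not involved). In the ELT path the raw rows are loaded into a staging table untouched and then transformed with SQL inside the engine; in the ETL path they are cleaned in application code before loading. The sample readings and table names are made up.

```python
import sqlite3

# Hypothetical raw sensor rows; one temperature is unparseable.
raw = [("A", "21.5"), ("B", "bad"), ("A", "22.5")]

# ELT: load the raw rows as-is into a staging table (SQLite stands in for the warehouse)...
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE staging (device TEXT, temp TEXT)")
db.executemany("INSERT INTO staging VALUES (?, ?)", raw)

# ...then transform inside the engine with SQL (a crude numeric filter, fine for this toy).
db.execute(
    """
    CREATE TABLE readings AS
    SELECT device, CAST(temp AS REAL) AS temp
    FROM staging
    WHERE temp GLOB '[0-9]*'
    """
)
elt = db.execute("SELECT device, temp FROM readings ORDER BY device, temp").fetchall()


# ETL: transform in application code first, then load only the cleaned rows.
def clean(rows):
    for device, temp in rows:
        try:
            yield device, float(temp)
        except ValueError:
            continue  # drop unparseable readings before they reach the store


etl = sorted(clean(raw))
print(elt)         # [('A', 21.5), ('A', 22.5)]
print(etl == elt)  # True
```

One practical consequence of ELT shows up here: the raw rows, including the bad one, are still sitting in `staging`, so the transform can be rerun or changed later without extracting again.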
Think of Synapse as Azure's Hive.
Managed Instances
"Because of its design, Azure SQL Managed Instance provides many more features that provide parity with SQL Server and yet provides the benefits of a fully managed service." [4]
[1] Azure Cookbook [O'Reilly]
[2] Architecting IoT Solutions on Azure [O'Reilly]
[3] Azure for Architects [O'Reilly]
[4] The Developer's Guide to Azure [Microsoft]