Saturday, November 12, 2022

Architectural patterns

Some architectural terms (old and new) that I keep bumping into.

Eventual Consistency
"Eventual consistency — also called optimistic replication — is a consistency model used in distributed computing to achieve high availability that informally guarantees that, if no new updates are made to a given data item, ultimately all accesses to that item will return the last updated value. Eventually-consistent services are often classified as providing BASE semantics (basically-available, soft-state, eventual consistency), in contrast to traditional ACID ... Another great model that can be directly implemented in the application layer is strong eventual consistency (SEC), which can be achieved via conflict-free replicated data types (CRDT), giving us the missing safety property of eventual consistency." [Wikipedia]

"Event-driven applications usually favor eventual consistency, for the most part. However, we could also opt-in for strong consistency in particular system areas. Thus, it is fair to say we can combine both consistency models depending on our use case." - Functional Event-Driven Architecture, Volpe
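As a toy illustration of convergence under eventual consistency, here is a last-writer-wins merge sketched in Python. The `(timestamp, value)` representation and the name `lww_merge` are my own assumptions, not from any of the quoted sources:

```python
# Two replicas accept writes independently; a last-writer-wins merge means
# that once updates stop, both converge to the same (latest) value.

def lww_merge(a, b):
    # each value is a (timestamp, data) pair; the later write wins
    return a if a[0] >= b[0] else b

replica1 = (1, "draft")   # written at t=1
replica2 = (2, "final")   # written at t=2

# Merge order doesn't matter: both replicas converge on the latest write.
assert lww_merge(replica1, replica2) == lww_merge(replica2, replica1) == (2, "final")
```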

The impacts of consistency on Microservices

Microservices should ideally be totally independent. For example, in a highway management system, the weather service is totally orthogonal to the roadworks service even though both have an impact on congestion. However, microservices in the real world often have soft dependencies. As a result, "in a microservices world, we don’t have the luxury of relying on a single strongly consistent database. In that world, inconsistency is a given." [James Roper]

Hugo Oliveira Rocha outlines some antipatterns here. The first is "events as simple notifications. The source system publishes an event notifying the consumers that something changed in its domain. Then the consumers will request additional information to the source system... The main issue and the main reason why this option should be seldom used is when you apply it to a larger scale."

"[I]nstead of requesting the source system for additional information, it is possible to save the data internally as a materialized read model... The main issue isn’t the disk space, it is the initialization, maintenance, and keeping that data accurate."

He says event sourcing is just a band-aid and suggests using fat (i.e., denormalised) messages. The downside is that they can be chunky.
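The contrast between a thin notification and a fat event can be sketched in Python. The event names and fields here are hypothetical, purely for illustration:

```python
from dataclasses import dataclass, field

# A thin "notification" event: consumers must call back to the source
# system to learn what actually changed (the antipattern described above).
@dataclass
class OrderChangedNotification:
    order_id: str

# A fat (denormalised) event: carries the full state the consumers need,
# at the cost of a chunkier payload.
@dataclass
class OrderChangedFat:
    order_id: str
    status: str
    customer_name: str
    line_items: list = field(default_factory=list)

fat = OrderChangedFat("o-1", "SHIPPED", "Alice", [("widget", 2)])
```

With the fat event, consumers never have to query the source system, removing the soft dependency at the cost of message size.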

CRDT
"To implement eventually consistent counting correctly, you need to make use of structures called conflict-free replicated data types (commonly referred to as CRDTs). There are a number of CRDTs for a variety of values and operations: sets that support only addition, sets that support addition and removal, numbers that support increments, numbers that support increments and decrements, and so forth." - Big Data, Nathan Marz

To a functional programmer, this looks a lot like semigroups and reducing.
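The simplest CRDT from Marz's list, a grow-only counter, can be sketched in a few lines (a minimal illustration; real implementations handle replica identity and delivery more carefully):

```python
# A minimal grow-only counter (G-Counter) CRDT. Each replica tracks its
# own increments; merge takes the element-wise maximum, which is
# commutative, associative and idempotent -- a semigroup, as noted above.

def increment(counts: dict, replica: str) -> dict:
    merged = dict(counts)
    merged[replica] = merged.get(replica, 0) + 1
    return merged

def merge(a: dict, b: dict) -> dict:
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def value(counts: dict) -> int:
    return sum(counts.values())

# Two replicas diverge, then converge regardless of merge order.
a = increment(increment({}, "r1"), "r1")   # r1 counted twice
b = increment({}, "r2")                    # r2 counted once
assert merge(a, b) == merge(b, a)
assert value(merge(a, b)) == 3
```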

Data Mesh
"Unlike traditional monolithic data infrastructures that handle the consumption, storage, transformation, and output of data in one central data lake, a data mesh supports distributed, domain-specific data consumers and views “data-as-a-product,” with each domain handling their own data pipelines. The tissue connecting these domains and their associated data assets is a universal interoperability layer that applies the same syntax and data standards." [TowardsDataScience]

"Data Mesh is a journey so you cannot implement Data Mesh per-se, you need to adopt the principles and start to make incremental changes." Adidas's journey [Medium]. Of the seven points given, two (decentralization and self-service) are the antithesis of ontologies.

Batch Views
"The batch views are like denormalized tables in that one piece of data from the master dataset may get indexed into many batch views. The key difference is that the batch views are defined as functions on the master dataset. Accordingly, there is no need to update a batch view because it will be continually rebuilt from the master dataset. This has the additional benefit that the batch views and master dataset will never be out of sync."  Big Data, Nathan Marz
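The point that a batch view is a function of the master dataset can be shown with a toy example (the pageview data and function names are my own, not from the book):

```python
# The master dataset is an immutable log of events; a batch view is a pure
# function of it. The view is never updated in place, only rebuilt, so it
# can never drift out of sync with the master data.

master = [
    {"user": "alice", "url": "/home"},
    {"user": "bob",   "url": "/home"},
    {"user": "alice", "url": "/about"},
]

def pageviews_by_url(events):
    view = {}
    for e in events:
        view[e["url"]] = view.get(e["url"], 0) + 1
    return view

# Rebuilding from scratch always gives the same answer for the same master data.
assert pageviews_by_url(master) == {"/home": 2, "/about": 1}
```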

Saga Pattern
"The Saga Pattern is a microservices architectural pattern to implement a transaction that spans multiple services. A saga is a sequence of local transactions. Each service in a saga performs its own transaction and publishes an event. The other services listen to that event and perform the next local transaction." [DZone]

Example in Cats here.
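The linked Cats example isn't reproduced here, but the control flow can be sketched in Python. `run_saga` and the step names are hypothetical; the essential idea is that each local transaction is paired with a compensating action, and a failure triggers compensation in reverse order:

```python
# A toy saga: each step is an (action, compensate) pair. If a step fails,
# the compensations for all previously completed steps run in reverse order.

def run_saga(steps):
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):
            compensate()
        return "rolled back"
    return "committed"

def fail():
    raise RuntimeError("payment failed")

log = []
steps = [
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (fail,                                lambda: log.append("refund")),
]
assert run_saga(steps) == "rolled back"
assert log == ["reserve stock", "release stock"]   # only completed steps compensated
```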

Type 1 and 2 data evolution
Slowly changing dimensions [Wikipedia] is a "concept that was introduced by Kimball and Ross in The Data Warehouse Toolkit." A strategy could be that the data source "tracks historical data by creating multiple records. This is called a type 2 dimension." [The Enterprise Big Data Lake - Gorelik]

Type 1 overwrites a row's data, as opposed to type 2, which adds a new row.
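A minimal sketch of the two strategies in Python (the table shape and function names are my own, for illustration only):

```python
import copy

# Type 1: overwrite the row in place, losing history.
def scd_type1(table, key, updates):
    rows = copy.deepcopy(table)
    for row in rows:
        if row["id"] == key:
            row.update(updates)
    return rows

# Type 2: close off the current row and append a new one, keeping history.
def scd_type2(table, key, updates):
    rows = copy.deepcopy(table)
    for row in rows:
        if row["id"] == key and row["current"]:
            row["current"] = False
            rows.append({**row, **updates, "current": True})
            break
    return rows

t = [{"id": 1, "city": "London", "current": True}]
assert scd_type1(t, 1, {"city": "Leeds"}) == [{"id": 1, "city": "Leeds", "current": True}]
assert len(scd_type2(t, 1, {"city": "Leeds"})) == 2   # history preserved
```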

Data Marts
Definitions for data marts tend to be a bit woolly but the best I've heard was from a colleague who defined it as "data structured for use cases and particularly queries."

Data Marts tend to use type 2 dimensions (see above). 

Hexagonal Architecture
Hexagonal, a.k.a. Onion, a.k.a. Ports and Adapters, "give us patterns on how to separate our domain from the ugliness of implementation." [Scala Pet Store on GitHub] This is an old pattern, as anybody who has written microservices will know, but the name was new to me. The idea is that there are many faces the app shows the outside world for means of communication but the kernel inside "is blissfully ignorant of the nature of the input device." [Alistair Cockburn] This facilitates testing and reduces the cognitive overhead that comes from having business logic scattered over many tiers and codebases.
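The shape of a port and its adapters can be sketched in Python (the `UserRepository` port and `greet` function are hypothetical names of my own):

```python
from typing import Protocol

# Port: how the core talks to the outside world, defined by the domain.
class UserRepository(Protocol):
    def find(self, user_id: str) -> str: ...

# The domain kernel depends only on the port, never on an implementation.
def greet(repo: UserRepository, user_id: str) -> str:
    return f"Hello, {repo.find(user_id)}!"

# Adapter: an in-memory implementation, handy for tests. A production
# adapter backed by a database would satisfy the same port.
class InMemoryUsers:
    def __init__(self, users):
        self._users = users

    def find(self, user_id):
        return self._users[user_id]

assert greet(InMemoryUsers({"42": "Alice"}), "42") == "Hello, Alice!"
```

Swapping the adapter (HTTP, database, in-memory) never touches the kernel, which is exactly what makes the domain easy to test.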

Microservices
This is a huge area but here are some miscellaneous notes.

Before you jump on board with the Java-based Lagom, it's worth noting that Martin Fowler wrote "Don't start with microservices – monoliths are your friend". This provoked a whole debate here. It's all worth reading but the comment that stuck out for me was:
"Former Netflix engineer and manager here. My advice:
Start a greenfield project using what you know ... Microservices is more often an organization hack than a scaling hack. Refactor to separate microservices when either: 1) the team is growing and needs to split into multiple teams, or 2) high traffic forces you to scale horizontally. #1 is more likely to happen first. At 35-50 people a common limiting factor is coordination between engineers. A set of teams with each team developing 1 or more services is a great way to keep all teams unblocked because each team can deploy separately. You can also partition the business complexity into those separate teams to further reduce the coordination burden."
A fine example of Conway's Law.

Builds in large organisations

Interestingly, Facebook reported that Git did not scale for them. Meanwhile, Google uses Bazel, which is supposed to be polyglot and very scalable.

Strangler Pattern
This is one of those obvious patterns that I never knew had a name.

"The Strangler pattern is one in which an “old” system is put behind an intermediary facade. Then, over time external replacement services for the old system are added behind the facade... Behind the scenes, services within the old system are refactored into a new set of services." [RedHat]

One downside can be the maintenance effort.
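The facade's routing can be sketched in a few lines (the paths and handler names here are hypothetical):

```python
# The facade routes by path: migrated endpoints go to the replacement
# service, everything else still falls through to the legacy system.
# As migration proceeds, paths move into MIGRATED until the old system
# handles nothing and can be retired.

MIGRATED = {"/orders"}

def facade(path, legacy, replacement):
    handler = replacement if path in MIGRATED else legacy
    return handler(path)

legacy = lambda p: f"legacy:{p}"
new = lambda p: f"new:{p}"

assert facade("/orders", legacy, new) == "new:/orders"
assert facade("/billing", legacy, new) == "legacy:/billing"
```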

Medallion architecture

This [DataBricks] divides data sets into bronze (raw), silver (cleaned) and gold (application-ready).
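A toy pipeline makes the three tiers concrete (the sensor data and the `to_silver`/`to_gold` names are assumptions of mine, not from Databricks):

```python
# Bronze: raw records exactly as ingested, warts and all.
bronze = [{"temp": " 21.5 "}, {"temp": "bad"}, {"temp": "19.0"}]

# Silver: cleaned and typed.
def to_silver(rows):
    out = []
    for r in rows:
        try:
            out.append({"temp": float(r["temp"])})
        except ValueError:
            pass  # drop malformed records during cleaning
    return out

# Gold: aggregated, application-ready.
def to_gold(rows):
    temps = [r["temp"] for r in rows]
    return {"avg_temp": sum(temps) / len(temps)}

assert to_gold(to_silver(bronze)) == {"avg_temp": 20.25}
```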
