Wednesday, June 5, 2019

Google Cloud Analytics - Notes


The Google Cloud ecosystem is a sprawling conurbation of tools. These are some notes I made to help me remember what each piece does.

One sentence summaries

DataProc - Spark, Hadoop, YARN.

DataProc components - Tools that complement DataProc, including Apache Zeppelin.

DataLab - Jupyter notebooks. Not to be confused with DataStudio, which is for dashboards and reports (ad campaigns etc). DataLab can integrate with DataProc (see Google's documentation).

BigTable - Basically, HBase.

DataStore - Basically, a NoSQL document database (BLOBs belong in Cloud Storage).

PubSub - Messaging.

BigQuery - Basically, a SQL engine behind a REST API.

Cloud SQL - MySQL or PostgreSQL.

Cloud Dataflow - Apache Beam.

Composer - Apache Airflow.

Google Cloud SDK - has emulators that let you run some services locally. Emulators are limited to BigTable, DataStore, FireStore and PubSub.

Stackdriver - think JConsole mixed with Splunk. "gives you access to logs, metrics, traces, and other signals from your infrastructure platform(s), virtual machines, containers, middleware, and application tier, so that you can track issues all the way from your end user to your backend services and infrastructure" (documentation).
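
As an illustration of the emulators mentioned above, here is a rough sketch of starting the PubSub one. The function name, the default port and the project name in the usage note are my own placeholders, and it assumes the gcloud beta component and the PubSub emulator are installed; treat it as a starting point rather than the definitive recipe.

```shell
# Sketch: start the PubSub emulator in the background and point
# client libraries at it. start_pubsub_emulator is a hypothetical helper name.
start_pubsub_emulator() {
  project="$1"
  host_port="${2:-localhost:8085}"   # assumed default; pick any free port

  # Fail early if the Cloud SDK is not installed.
  if ! command -v gcloud >/dev/null 2>&1; then
    echo "gcloud not found on PATH" >&2
    return 1
  fi

  # Run the emulator in the background.
  gcloud beta emulators pubsub start --project="$project" --host-port="$host_port" &

  # Client libraries read this variable to find the emulator.
  export PUBSUB_EMULATOR_HOST="$host_port"
}
```

Calling start_pubsub_emulator my-test-project (a made-up project name) should leave PUBSUB_EMULATOR_HOST set in the current shell, so code run from that shell talks to the local emulator instead of the real service.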

Prepare your Environment

Unlike Amazon, you can quite easily do a lot of the admin from your local machine. On my home Ubuntu box, I ran:

export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)"
echo "deb http://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-get update && sudo apt-get install google-cloud-sdk
sudo apt-get install google-cloud-sdk-datalab
sudo apt-get install kubectl
gcloud init
sudo apt-get install docker.io

and had everything I needed to play.
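
A quick sanity check after an install like this is to confirm the commands are actually on the PATH. The helper below is only a sketch; check_tools is a name I made up, and the list of tools is my guess at what the packages above provide.

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 when all are present, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Assumed tool names for the packages installed above.
check_tools gcloud datalab kubectl docker || echo "some tools still need installing"
```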

Running DataLab locally

One very nice feature is that you can run pieces of Google's infrastructure locally.

export IMAGE=gcr.io/cloud-datalab/datalab:latest
export PROJECT_ID=$(gcloud config get-value project)
# Bind to localhost only on Linux; [[ ]] is needed for the glob match to work
if [[ "$OSTYPE" == linux* ]]; then PORTMAP="127.0.0.1:8081:8080"; else PORTMAP="8081:8080"; fi
docker pull "$IMAGE"
docker run -it -p "$PORTMAP" -v "https://github.com/PhillHenry/googlecloud.git" -e "PROJECT_ID=philltest" "$IMAGE"
.
.
Open your browser to http://localhost:8081/ to connect to Datalab.
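
The port-mapping choice in the snippet above depends on matching the start of $OSTYPE (for example linux-gnu). A case statement is one portable way to do that prefix match. This is just a sketch, and portmap_for is a hypothetical helper name, not something from Google's documentation.

```shell
# Pick the docker -p argument for a given OS type string.
# On Linux the port is bound to localhost only; elsewhere it is left open.
portmap_for() {
  case "$1" in
    linux*) echo "127.0.0.1:8081:8080" ;;
    *)      echo "8081:8080" ;;
  esac
}

# OSTYPE is set by bash; fall back to an empty string in other shells.
PORTMAP="$(portmap_for "${OSTYPE:-}")"
echo "Using port mapping: $PORTMAP"
```

Either way the notebook ends up on port 8081 of the local machine; the Linux form just stops other hosts on the network from reaching it.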

Note that not all components can be run locally. Significantly, BigQuery cannot.
