First, log in to GCP.
Next, create a VM within "Compute Engine".
I create a small VM named Airflow for this demo.
I choose Ubuntu 18.04 LTS Minimal as the boot image, then create the VM.
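For readers who prefer the command line to the console, the same VM can be sketched with the gcloud CLI. This is only a rough equivalent of the console steps above, not what I actually ran: the zone and the g1-small machine type are assumptions standing in for "a small VM", and the instance name is lowercased because GCP instance names cannot contain capital letters.

```shell
# Hypothetical values; adjust to your own project.
VM_NAME="airflow"            # lowercase form of the demo VM name
ZONE="us-central1-a"         # assumed zone, not from the original walkthrough
MACHINE_TYPE="g1-small"      # assumed "small" machine type

# Create the VM from the Ubuntu 18.04 LTS Minimal public image family.
create_airflow_vm() {
  gcloud compute instances create "$VM_NAME" \
    --zone="$ZONE" \
    --machine-type="$MACHINE_TYPE" \
    --image-family="ubuntu-minimal-1804-lts" \
    --image-project="ubuntu-os-cloud"
}
```

Calling create_airflow_vm from Cloud Shell, or from any machine with an authenticated gcloud, should leave you with a VM you can then open in the browser SSH client.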
Connect to the VM using the browser SSH client.
sudo su
apt-get update
apt install python
apt-get install software-properties-common
apt-get install python-pip
export SLUGIFY_USES_TEXT_UNIDECODE=yes
pip install apache-airflow
pip uninstall marshmallow-sqlalchemy
pip install marshmallow-sqlalchemy==0.17.1
airflow initdb
airflow webserver -p 8080
The first thing I'll do when connected is elevate my user with sudo su.
Next, I'll refresh the OS package lists with apt-get update.
Next, install Python.
Next, we'll install software-properties-common. This helps manage the repositories that we install software from.
Next, let's install pip.
We also want to export the SLUGIFY_USES_TEXT_UNIDECODE environment variable before installing Airflow. Without it, the install stops with an error about one of Airflow's dependencies pulling in a GPL dependency.
You can read more on this here: https://stackoverflow.com/questions/52203441/error-while-install-airflow-by-default-one-of-airflows-dependencies-installs-a
Now install Apache Airflow using pip.
As of October 2019, you'll get a marshmallow-sqlalchemy error if you attempt to initialize the default SQLite database.
To prevent this error, uninstall the version that pip pulled in and install an earlier one, 0.17.1.
Initialize the database.
Run the web server on port 8080.
Open the GCP firewall to allow traffic to the Airflow server on port 8080.
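If you would rather script the firewall change than click through the console, a sketch with gcloud might look like the following. The rule name, network tag, zone, and source range are all hypothetical. The source range in particular is a placeholder documentation address that you should replace with your own IP, since opening port 8080 to the whole internet exposes the Airflow UI to everyone.

```shell
# Hypothetical values; none of these names come from the original walkthrough.
VM_NAME="airflow"
ZONE="us-central1-a"
NETWORK_TAG="airflow-web"
RULE_NAME="allow-airflow-8080"
SOURCE_RANGE="203.0.113.10/32"   # placeholder; use your own public IP

# Tag the VM, then allow inbound TCP 8080 only to instances carrying that tag.
open_airflow_port() {
  gcloud compute instances add-tags "$VM_NAME" \
    --zone="$ZONE" \
    --tags="$NETWORK_TAG" &&
  gcloud compute firewall-rules create "$RULE_NAME" \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:8080 \
    --source-ranges="$SOURCE_RANGE" \
    --target-tags="$NETWORK_TAG"
}
```

After running open_airflow_port, the web server should be reachable in a browser at the VM's external IP on port 8080.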
At this point you may be wondering why there is a warning at the top of the page related to the scheduler. This is due to the "max_threads" setting in the Airflow config being greater than 1. With SQLite as the DB, this setting needs to be set to 1 and the scheduler needs to be started.
OK, I'm going to log back into the console and use the browser to SSH into my instance.
Once I'm in, I'll switch users and open the Airflow config file. Once the config file is open, scroll down until you see "max_threads". If you're using SQLite, change this value to 1. Save the file.
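If you'd rather not edit the file by hand, the same change can be sketched with sed. The helper name below is hypothetical, and the sketch assumes GNU sed (as shipped on Ubuntu) and a config line of the form "max_threads = N". It keeps a .bak copy of the file so the edit is easy to undo.

```shell
# Hypothetical helper: set max_threads to 1 in the given airflow.cfg.
set_max_threads_to_one() {
  cfg="$1"
  # Keep a backup next to the original before editing in place.
  cp "$cfg" "$cfg.bak" &&
  sed -i 's/^max_threads[[:space:]]*=.*/max_threads = 1/' "$cfg"
}
```

With a default install the config lives in the airflow directory under the home of the user who ran airflow initdb, so the call would be something like set_max_threads_to_one "${AIRFLOW_HOME:-$HOME/airflow}/airflow.cfg".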
Now we can start the scheduler.
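As a small safeguard, the scheduler start can be wrapped in a check that the config really has max_threads set to 1. This is just a sketch with a hypothetical function name. It runs the scheduler in the foreground, so use a separate SSH session from the one running the web server.

```shell
# Hypothetical wrapper: start the scheduler only if max_threads is 1.
start_scheduler_if_safe() {
  cfg="$1"
  if grep -Eq '^max_threads[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$cfg"; then
    airflow scheduler
  else
    echo "max_threads is not 1 in $cfg; fix it before starting the scheduler" >&2
    return 1
  fi
}
```

For a default install the call would be start_scheduler_if_safe "${AIRFLOW_HOME:-$HOME/airflow}/airflow.cfg". Once the scheduler is running, refreshing the web UI should clear the warning.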
Airflow docs: https://airflow.apache.org/start.html