This blog series is posted by WeCloudData’s Data Science Immersive Bootcamp student Bob Huang (Linkedin)

OVERVIEW:

The digital marketing project lets you manage and analyze your marketing data from different platforms such
as Google Analytics, Gmail, Eventbrite, and Google Ads. You can find your emails by sent status, campaign, and
type, which makes it easy to create and edit email content. You can visualize a summary of public marketing event data and analyze
the conversion rate. You can also build customized dashboards from the acquired data for your own purposes.

Part 1 of this blog will mainly focus on the tools and data pipeline infrastructure.

FEATURE SUMMARY:

The prospective scope of this project is close to Software as a Service (SaaS). According to Wikipedia, software as a service
is a software licensing and delivery model in which software is licensed on a subscription basis and is centrally hosted.
It is sometimes referred to as “on-demand software”, and was formerly referred to as “software plus services” by Microsoft.
The project offers the following features:

  1. Low cost: Monthly GCP service cost starts from $40.
  2. Customizable: We select the useful data during ingestion, choose the storage method, customize the dashboard
    layout, run small machine learning projects on the data, combine marketing data with other data, etc.
  3. Secure: With Kubernetes clustering, authentication keys are not exposed to the public.
  4. Data ownership: You own all the data that we retrieve through APIs from the various platforms. If you don’t
    want ongoing data updates, we can provide a program that queries all historical data once, so you can analyze
    it with traditional tools such as Excel.
  5. Open source: All programs, services, and applications are open source products with no license cost. If you have
    a strong technical team, you can maintain the services yourself once they are deployed, without our support.
  6. Easy to use: Superset is easy to learn, and non-technical people can use it to build customized
    dashboards.
  7. Recommendations: We have good recommendations on how to use your data to build meaningful visualizations and perform
    useful statistical or machine learning analysis. Details will be in the second part of the blog post.
  8. Big data: All the components that this project uses are scalable. For example, Kubernetes can handle scaling
    and load balancing automatically.

PROCEDURES:

  1. Collect data from different sources: Build a Docker container that hosts Apache Airflow with various DAGs
    that gather emails, event registrations, and other information from different sources and store them in Google BigQuery.
  2. Visualization using Apache Superset: Build a Docker container that hosts Apache Superset. Connect Superset
    to BigQuery, then create dashboards to display the data.
  3. Host Docker applications on Google Cloud: Create a Google Compute instance with Kubernetes that hosts the multiple
    Docker containers serving the entire project. The advantages of Kubernetes include auto scaling, application isolation,
    and good security.
  4. Extensions: Consider creating more Docker containers to set up applications such as a machine learning model
    based on email data, a Dash-Plotly application, etc.
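
Step 1 above ends with records from each source being stored in BigQuery tables. A minimal sketch of the parsing part of that step in Python might look like the following. The schema, the field names (`id`, `profile.email`, `event.name`, `created`), and the sample response are hypothetical and do not reflect the actual Gmail or Eventbrite payloads; the project's real DAGs may parse responses differently.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical schema: table column name -> dotted path into the API response.
ATTENDEE_SCHEMA: Dict[str, str] = {
    "attendee_id": "id",
    "email": "profile.email",
    "event_name": "event.name",
    "registered_at": "created",
}


def get_path(record: Dict[str, Any], dotted_path: str) -> Any:
    """Walk a nested dict along a dotted path; return None if any key is missing."""
    current: Any = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def to_rows(records: Iterable[Dict[str, Any]], schema: Dict[str, str]) -> List[Dict[str, Any]]:
    """Flatten nested API records into flat rows, one key per table column."""
    return [
        {column: get_path(record, path) for column, path in schema.items()}
        for record in records
    ]


# Made-up sample response, for illustration only.
sample_response = [
    {"id": "a1", "profile": {"email": "ann@example.com"},
     "event": {"name": "Intro to SQL"}, "created": "2019-03-01T10:00:00Z"},
    {"id": "a2", "profile": {}, "event": {"name": "Intro to SQL"}},
]

rows = to_rows(sample_response, ATTENDEE_SCHEMA)
```

Each resulting row is a flat dictionary whose keys match the table columns, and fields missing from the response become `None` instead of raising an error, so an incomplete record does not stop the whole load.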

FLOWCHART:

PROJECT COMPONENTS:

  1. Google Cloud Platform – Project monitoring (https://cloud.google.com/)
    Google Cloud Platform, offered by Google, is a suite of cloud computing services that runs on the same infrastructure that Google uses internally for its end-user products. Alongside a set of management tools, it provides a series of modular cloud services including computing, data storage, data analytics, and machine learning.

  2. Apache Superset – Front end (https://superset.incubator.apache.org/)
    Superset is an easy-to-use data visualization tool with fantastic templates. Non-technical people can quickly learn it and create customizable dashboards for their business purposes. It supports various database connections and has security modules. Superset recently added support for BigQuery connections.
    To containerize Superset, we refer to this GitHub example: https://github.com/amancevice/superset/blob/master/Dockerfile
    For Kubernetes deployment, do the following:
    To set up the Superset-BigQuery connection, create tables … follow the official Superset documentation.

  3. BigQuery – Back end (https://cloud.google.com/bigquery/)
    Follow Google's official instructions to create datasets (equivalent to databases) in BigQuery. Generate keys with different permissions to read or write data. For the back end, we can also use Cloud SQL, MySQL, Redshift, MongoDB, PostgreSQL, etc. We will adjust according to customers’ needs.

  4. Apache Airflow – Automation (https://airflow.apache.org/)
    Apache Airflow (or simply Airflow) is a platform to programmatically author, schedule, and monitor workflows. When workflows are defined as code, they become more maintainable, versionable, testable, and collaborative.
    Build a Docker image that hosts Airflow. Write DAGs that gather data from the different sources using the corresponding credentials (one source per DAG), and create DAGs that delete obsolete data. To build the image, we mainly refer to this example (https://medium.com/@shahnewazk/dockerizing-airflow-58a8888bd72d), which uses supervisor. Mainly we follow documents in this GitHub repository.
    Each DAG gets data from one source and stores it in multiple tables. Table schemas are determined in advance by inspecting the data. In the Python code, we parse the full query response and store its fields in the matching table columns. DAG properties such as the number of retries, the failure email, and the run frequency can be specified in the .py script. All DAGs are stored in AIRFLOW_HOME/dags.
    Since all the authentication files, passwords, and tokens are in this Docker container, we cannot expose it to the public. Kubernetes provides a ClusterIP service type that keeps the Airflow container reachable only from inside the cluster, as follows:
    DAG script sample and some explanation:

  5. Kubernetes – Container hosting (https://kubernetes.io/)
    Create a Google Compute Engine instance with Kubernetes that hosts the multiple Docker containers serving this project, one container per application. We build the Docker images locally, push them to Google Container Registry, and then deploy them to the Kubernetes cluster.

  6. Docker – Containerization (https://www.docker.com/)
    Docker is a computer program that performs operating-system-level virtualization, also known as “containerization”. It is an open platform for developers and sysadmins to build, ship, and run distributed applications, whether on laptops, data center VMs, or the cloud.
    Sample Dockerfile for Apache Airflow:

    We set up environment variables, copy files, install dependencies, and run commands to build the Docker image.
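
Returning to the Airflow component: the DAG properties mentioned there (number of retries, failure email, run frequency) are typically set in the .py script through `default_args` and the schedule. The sketch below is illustrative only and is not the project's actual DAG. The DAG id, owner, email address, and task body are placeholders, and it assumes Airflow 1.10 or 2.x, where the `schedule_interval` argument is accepted.

```python
from datetime import datetime, timedelta

# DAG-level properties: retry count, failure email, and start date.
# All values here are placeholders.
default_args = {
    "owner": "marketing",
    "start_date": datetime(2019, 1, 1),
    "retries": 2,                         # retry a failed task twice
    "retry_delay": timedelta(minutes=5),  # wait between retries
    "email": ["alerts@example.com"],
    "email_on_failure": True,             # send an email when a task fails
}

SCHEDULE = "@daily"  # run frequency: once per day


def fetch_eventbrite(**context):
    """Placeholder task body; a real task would call the source API and load BigQuery."""
    return "ok"


def build_dag():
    """Create the DAG. Airflow is imported lazily so the settings above can be read without it."""
    from airflow import DAG
    try:  # Airflow 2.x import path
        from airflow.operators.python import PythonOperator
    except ImportError:  # Airflow 1.10 import path
        from airflow.operators.python_operator import PythonOperator

    dag = DAG(
        dag_id="eventbrite_to_bigquery",  # hypothetical name, one source per DAG
        default_args=default_args,
        schedule_interval=SCHEDULE,
        catchup=False,  # do not backfill runs between start_date and today
    )
    PythonOperator(
        task_id="fetch_eventbrite",
        python_callable=fetch_eventbrite,
        dag=dag,
    )
    return dag
```

In a file placed under AIRFLOW_HOME/dags, the result would be assigned at module level (for example `dag = build_dag()`), because Airflow discovers DAG objects by scanning each module's global variables.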

