Google Connector

Index documents, sheets, slides, files, and email across your Google Workspace (Google Drive and Gmail).

The Google connector enables the indexing of:

  • Documents, Sheets, Slides, and other Files in Google Drive
  • Email in Gmail
  • Pages in Google Sites
  • Users and Groups as metadata in your Google Workspace to enable user mapping / permissions

Content and updates are streamed as changes are detected and processed in your Google Workspace.

Create Terraform Service Account for Google

Terraform will need a service account to make changes to your GCP project. We will grant it permissions which will essentially allow it to create, read, and delete all project resources. As such, you should keep the credentials for this account safe and not reuse them. The Terraform scripts create a much more restricted service account for normal operation.

  1. Visit the IAM Service Accounts page https://console.cloud.google.com/iam-admin/serviceaccounts and select the project created previously to set up the Terraform service account.
  2. Click CREATE SERVICE ACCOUNT
  3. Choose a name. While it can be anything, the name “Atolio Terraform” is recommended.
  4. Select CREATE AND CONTINUE.
  5. Grant the following roles: “Owner”, “Security Center Admin”, “Project IAM Admin” (without conditions).
  6. Click DONE. You can skip the final optional step, as you will not need to grant other users access to this service account.

Now that the service account is created, create an API key for it:

  1. Navigate to https://console.cloud.google.com/iam-admin/serviceaccounts and select the project created previously.
  2. Click the service account that was created in the previous step.
  3. Select the KEYS tab and click the ADD KEY pull-down menu.
  4. Select Create new key and use JSON for key type. Click CREATE.

This will create another JSON file with the API key and related information. This file will be needed by your Deployment Engineer (in their deploy/terraform directory).

In order for Terraform to programmatically make modifications, you need to manually enable the Google Cloud Resource Manager API.

  1. Navigate to https://console.cloud.google.com/apis/library
  2. Search for “Cloud Resource Manager API”
  3. Select “Cloud Resource Manager API”. Click ENABLE.

The project will use additional APIs, but now that this API is available to Terraform, the Atolio deployment scripts will use Terraform to enable those APIs automatically.

Grant Service Account Permissions

The Google Workspace domain needs to grant permission to the integration’s default service account to perform operations. Unfortunately Terraform can’t automate this, so it has to be performed manually.

Your Deployment Engineer will need to make note of the unique ID for the default App Engine service account and the list of scopes. Note that this ID belongs to the default service account, typically named App Engine default service account. It is not the service account you created previously.

The list of scopes is:

https://www.googleapis.com/auth/gmail.readonly,https://www.googleapis.com/auth/admin.directory.user.readonly,https://www.googleapis.com/auth/admin.directory.group.readonly,https://www.googleapis.com/auth/calendar.readonly,https://www.googleapis.com/auth/drive.readonly,https://www.googleapis.com/auth/admin.directory.domain.readonly,https://www.googleapis.com/auth/drive.activity.readonly

We now need to set up Domain-wide Delegation:

  1. Navigate to the Google Admin console: https://admin.google.com/ac/owl. An account with admin access is required.
  2. Scroll down and click “MANAGE DOMAIN WIDE DELEGATION”.
  3. Click Add new and enter the service account client ID (the unique ID obtained from your Deployment Engineer in the previous step). Do not select the option to overwrite an existing client ID.
  4. Enter the value for google_service_account_id as obtained from your Deployment Engineer.
  5. Enter the list of scopes as listed above (as a single comma-delimited string).
  6. Click AUTHORIZE.

Now Atolio is authorized to retrieve Google Workspace data.
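
Because the scopes must be entered as a single comma-delimited string, a stray space or a duplicated entry is easy to miss. The following is a minimal sketch, not part of the Atolio tooling, that assembles the string from the scopes listed above and applies a few basic checks. The helper name build_scope_string and the prefix check are assumptions made for illustration.

```python
# Scopes copied from the list above.
SCOPES = [
    "https://www.googleapis.com/auth/gmail.readonly",
    "https://www.googleapis.com/auth/admin.directory.user.readonly",
    "https://www.googleapis.com/auth/admin.directory.group.readonly",
    "https://www.googleapis.com/auth/calendar.readonly",
    "https://www.googleapis.com/auth/drive.readonly",
    "https://www.googleapis.com/auth/admin.directory.domain.readonly",
    "https://www.googleapis.com/auth/drive.activity.readonly",
]

# Assumption: every scope used here shares this prefix.
SCOPE_PREFIX = "https://www.googleapis.com/auth/"


def build_scope_string(scopes):
    """Join scopes with commas after basic sanity checks (hypothetical helper)."""
    seen = set()
    for scope in scopes:
        if any(ch.isspace() for ch in scope):
            raise ValueError(f"scope contains whitespace: {scope!r}")
        if not scope.startswith(SCOPE_PREFIX):
            raise ValueError(f"unexpected scope prefix: {scope!r}")
        if scope in seen:
            raise ValueError(f"duplicate scope: {scope!r}")
        seen.add(scope)
    return ",".join(scopes)


scope_string = build_scope_string(SCOPES)
print(scope_string)
```

The printed value can be pasted into the scopes field of the Domain-wide Delegation dialog.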

Installing the Gmail / Google Drive Connector

Note: For this step, your deployment engineer will be required.

The Gmail and Google Drive connectors depend on a shared Google Workspace configuration section. Google Workspace requires three properties to be set during configuration:

  • ProjectID must reference the project created in GCP. Specify the project ID, not the project name.
  • ServiceUser is currently a reference to a super user or administrator within your organization. For example, the primary IT email. This may change in the future.
  • ServiceAccountKey is the base64-encoded string as obtained during initial install of the Google source.

To obtain the service account key, your deployment engineer will run the following Terraform command:

terraform output -raw google_connector_service_account_key

The following properties are optional:

  • Domains is a comma-delimited list of domains applicable to this deployment.
  • Parallelism controls the concurrency used when indexing Google data. The default is 32, so if you set it, use a value higher than 32.

Once the Google source is configured, the Google Drive and Gmail sources can be configured with the same values.
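
Before the ServiceAccountKey value is pasted into the configuration, it can help to confirm that it decodes cleanly. The sketch below assumes the value is base64-encoded JSON in the standard GCP service account key format (with fields such as type, project_id, client_email, and client_id). The function names are hypothetical, and the sample key is a placeholder built inline so that the sketch runs without real credentials.

```python
import base64
import json


def decode_service_account_key(encoded):
    """Decode a base64 string and parse it as a JSON service account key."""
    raw = base64.b64decode(encoded.strip(), validate=True)
    key = json.loads(raw)
    # Assumption: standard GCP key files carry "type": "service_account".
    if key.get("type") != "service_account":
        raise ValueError("decoded JSON is not a service account key")
    return key


def summarize(key):
    """Return only non-secret fields, so the private key is never printed."""
    return {name: key.get(name) for name in ("project_id", "client_email", "client_id")}


# Placeholder standing in for the output of the Terraform command above.
sample = base64.b64encode(
    json.dumps(
        {
            "type": "service_account",
            "project_id": "example-project",
            "client_email": "example@example-project.iam.gserviceaccount.com",
            "client_id": "000000000000000000000",
            "private_key": "placeholder",
        }
    ).encode()
).decode()

print(summarize(decode_service_account_key(sample)))
```

In practice, the string returned by the Terraform command would be passed to decode_service_account_key in place of sample. Treat the decoded key as a secret and avoid writing it to logs.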

Installing the Google Sites Connector

Note: For this step, your deployment engineer will be required.

Prerequisites

The Google Sites connector requires Google Vault to be enabled for your organization. Google Vault enrollment details are available in the official documentation.

You will need a Service Account as described in the Grant Service Account Permissions section. In addition to the scopes listed there, you must grant the following scopes to the Service Account:

- https://www.googleapis.com/auth/ediscovery
- https://www.googleapis.com/auth/devstorage.read_only
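
Since Domain-wide Delegation takes one comma-delimited string, the two scopes above need to be appended to the string that is already authorized. A minimal sketch of that merge follows; the helper name add_scopes is hypothetical, and the current value is shortened for illustration.

```python
# The two additional scopes required for Google Sites, as listed above.
SITES_SCOPES = [
    "https://www.googleapis.com/auth/ediscovery",
    "https://www.googleapis.com/auth/devstorage.read_only",
]


def add_scopes(existing, extra):
    """Append extra scopes to a comma-delimited string, skipping duplicates."""
    scopes = [s.strip() for s in existing.split(",") if s.strip()]
    for scope in extra:
        if scope not in scopes:
            scopes.append(scope)
    return ",".join(scopes)


# Shortened example input; in practice use the full string already authorized.
current = (
    "https://www.googleapis.com/auth/gmail.readonly,"
    "https://www.googleapis.com/auth/drive.readonly"
)
print(add_scopes(current, SITES_SCOPES))
```

The merge preserves the original order and is safe to run more than once, because scopes that are already present are not added again.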

Vault API access must be manually enabled for the GCP project associated with the Service Account. This can be done via the GCP API Console by following the steps in the official documentation.

Configuration

Once the prerequisites above are completed, you need to configure the connector.

  • ServiceUser, ServiceAccountKey, and Domains are configured in the same way as for the Gmail / Google Drive connectors.
  • SiteUrls is a list of top-level Google Site URLs you want to index. e.g., https://sites.google.com/your-org/your-site
  • SiteConfigs is an optional map of site-specific configuration you want to set up.

To keep Atolio automatically updated with recent changes on your Google Site, provide an update frequency to your Deployment Engineer, e.g., every day or every hour.

An example configuration would be as follows:

configuration:
  ...
  site-urls:
    - https://sites.google.com/your-org/your-site
  site-configs:
    https://sites.google.com/your-org/your-site:
      # If you have a custom domain for your Google Site, specify it here.
      top-level-url: https://your-custom-domain.com/path

      # If you want to configure deeplinks for each page, 
      # you can do it by providing a "page name" -> "URL suffix" map here.
      # 
      # For a page titled "Home", the deeplink would be:
      # https://your-custom-domain.com/path/home
      page-name-to-url-suffix:
        Home: /home
        About: /about
      
      # If you have set up "Restricted" access for your site, specify it here.
      # Default is `false`.
      restricted: true

If you prefer not to maintain a static configuration for deeplinks, you can instead provide a “url path” on each Google Site page. Add text in the form urlpath:[/relative-path-to-your-page] to the body of each page, and Atolio will index the page with the user-provided deeplink.
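
To illustrate how the two deeplink options relate, the sketch below resolves a deeplink either from the static page-name-to-url-suffix map or from a urlpath marker in the page body. It is an illustration only and does not describe how Atolio implements this. The function names are hypothetical, and the regular expression assumes the marker is the literal text urlpath: followed by a relative path starting with "/", optionally wrapped in square brackets.

```python
import re


def deeplink_from_map(top_level_url, suffix_map, page_title):
    """Build a deeplink from the static map, or return None if the page is not listed."""
    suffix = suffix_map.get(page_title)
    if suffix is None:
        return None
    return top_level_url.rstrip("/") + "/" + suffix.lstrip("/")


# Assumption: "urlpath:" followed by a "/"-prefixed path, brackets optional.
URLPATH_MARKER = re.compile(r"urlpath:\[?(/[^\s\]]*)\]?")


def deeplink_from_body(top_level_url, body):
    """Build a deeplink from a urlpath marker in the page body, or return None."""
    match = URLPATH_MARKER.search(body)
    if match is None:
        return None
    return top_level_url.rstrip("/") + match.group(1)


top = "https://your-custom-domain.com/path"
suffixes = {"Home": "/home", "About": "/about"}

print(deeplink_from_map(top, suffixes, "Home"))
# https://your-custom-domain.com/path/home
print(deeplink_from_body(top, "Welcome to the team page. urlpath:/team/intro"))
# https://your-custom-domain.com/path/team/intro
```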