# Create a Sync from ClickHouse to Batch Profile Events

### Before you start

To create a ClickHouse → Batch Profile Events sync, you'll need:

* Access to the **Batch dashboard**
* A **ClickHouse table or view** containing one row per event
* Your **ClickHouse credentials** (Host, Port, Database, Username, Password)
* A table (or view) that follows the **Cloud Sync events input format** (see below)

***

### 1) Prepare your ClickHouse table or view

Cloud Sync expects your ClickHouse source (table or view) to include:

1. A **profile identifier** (to know which profile the event belongs to)
2. An **event name** (the type of event being recorded)
3. A **cursor field** (to know which rows are new since the last run)
4. Any number of **event attribute columns** (sent to Batch as event properties)

***

#### 1.1 One row per event

Your source must contain **one row per event**.\
Unlike profile attribute syncs, multiple rows can share the same `custom_id` — each row creates a separate event on the corresponding profile.

***

#### 1.2 Required columns

Your table (or view) must include the following columns:

| Column            | Required | Description                                            |
| ----------------- | :------: | ------------------------------------------------------ |
| `custom_id`       |     ✅    | The identifier of the profile the event belongs to     |
| `event_name`      |     ✅    | The name of the event (e.g. `add_to_cart`, `purchase`) |
| `last_updated_at` |     ✅    | Cursor used for incremental sync                       |

{% hint style="warning" %}
`last_updated_at` must be set **at the time the event row is inserted** and must not change afterwards. It is used only to determine which rows to fetch on the next sync run, not as the event timestamp.
{% endhint %}

***

#### 1.3 Optional columns

| Column       | Description                                                                                                                         |
| ------------ | ----------------------------------------------------------------------------------------------------------------------------------- |
| `event_time` | The timestamp of the event, in **RFC 3339 UTC** format (e.g. `2026-04-17T04:25:00Z`). If omitted, Batch uses the time of reception. |

***

#### 1.4 Event attribute naming rules (ClickHouse-compatible)

ClickHouse column names do not support characters like `$`, `(`, or `)`. Cloud Sync relies on **prefixes** in column names to represent typed event attributes.

**Supported prefixes**

| Prefix   | Meaning        | Example              |
| -------- | -------------- | -------------------- |
| `date__` | Date attribute | `date__purchased_at` |
| `url__`  | URL attribute  | `url__item_url`      |

Any column without a prefix is sent to Batch as a standard string, number, or boolean attribute.

***

#### 1.5 Handling arrays

Array event attributes must be passed as a **`String` column** containing a stringified JSON array. Native ClickHouse `Array()` types are not supported.

```sql
'["value1","value2","value3"]'
```

Example:

```sql
SELECT
  user_id               AS custom_id,
  'add_to_cart'         AS event_name,
  inserted_at           AS last_updated_at,
  '["yoga","wellness"]' AS interests    -- String, not Array(String)
FROM raw.events;
```

***

#### 1.6 Handling objects

Object event attributes must also be **stringified as a JSON object** before being passed to Cloud Sync. The connector parses the string and sends it in the correct format to the Profile API.

```sql
'{"color":"red","size":"M"}'
```

Example:

```sql
SELECT
  user_id                           AS custom_id,
  'purchase'                        AS event_name,
  inserted_at                       AS last_updated_at,
  toJSONString(product_details_map) AS product_details  -- stringified object
FROM raw.purchases;
```

***

#### 1.7 Example schema (e-commerce)

```sql
CREATE TABLE events_data.batch_events (
    custom_id          String,
    event_name         String,
    last_updated_at    DateTime64(3, 'UTC'),

    -- Optional event fields
    event_time         String,        -- RFC 3339 UTC, e.g. '2026-04-17T04:25:00Z'

    -- Standard event attributes
    item               String,
    quantity           UInt32,
    price              Float64,

    -- Typed event attributes
    url__item_url      String,        -- URL attribute
    date__purchased_at String,        -- date attribute

    -- Object event attribute (stringified)
    product_details    String         -- e.g. '{"brand":"Nike","size":"42"}'
) ENGINE = MergeTree()
ORDER BY (custom_id, last_updated_at);
```

**How this maps in Batch:**

* `custom_id` identifies the profile the event is attached to
* `event_name` sets the event type
* `event_time` sets the event timestamp (falls back to reception time if omitted)
* `item`, `quantity`, `price` become standard event attributes
* `url__item_url` is interpreted as a URL attribute
* `date__purchased_at` is interpreted as a date attribute
* `product_details` is interpreted as an object attribute

***

#### 1.8 Using a View

If your raw event table doesn't match the expected naming or format, create a **ClickHouse View** that converts your schema into the correct conventions.

```sql
CREATE OR REPLACE VIEW events_data.batch_events_view AS
SELECT
  toString(e.user_id)                              AS custom_id,
  e.event_type                                     AS event_name,
  e.inserted_at                                    AS last_updated_at,

  formatDateTime(e.occurred_at, '%Y-%m-%dT%H:%M:%SZ') AS event_time,

  e.item_name                                      AS item,
  e.qty                                            AS quantity,
  e.unit_price                                     AS price,

  e.item_url                                       AS url__item_url,
  formatDateTime(e.purchase_date, '%Y-%m-%d')      AS date__purchased_at,

  -- Stringify object attributes
  toJSONString(map('brand', e.brand, 'size', e.size)) AS product_details
FROM raw.events e;
```

This approach lets you:

* rename fields with the correct prefixes (`date__`, `url__`)
* set a reliable `last_updated_at` based on insertion time
* stringify array and object columns correctly
* format `event_time` to RFC 3339 UTC

***

#### 1.9 Handling nulls

If an optional column value is `NULL`, that attribute is simply omitted from the event. It does not affect other attributes on the same event or on the profile.

***

### 2) Create the Sync in the Batch dashboard

Cloud Sync is configured from the dashboard via a dedicated **Sync module**.

1. Open the **Batch dashboard**
2. Go to **Data → Cloud Sync**
3. Click **Create Sync**
4. Select **ClickHouse** as the source

***

#### 2.1 Configure your ClickHouse connection

Enter the following fields:

| Field    | Description                                                         |
| -------- | ------------------------------------------------------------------- |
| Host     | Your ClickHouse server hostname (e.g. `your-host.example.com`)      |
| Port     | ClickHouse HTTP(S) port — defaults to `8123`                        |
| Database | The database containing your source table/view (default: `default`) |
| Username | Your ClickHouse username                                            |
| Password | Your ClickHouse password                                            |
| SSL      | Toggle on to use a secure connection (recommended for production)   |
| Table    | The table or view name                                              |

Batch validates the connection before continuing.

***

#### 2.2 Select the destination

In the **Destination** dropdown, select **Batch > Profile events**.

***

#### 2.3 Configure event mapping

Cloud Sync applies a simple mapping model:

* `custom_id` → identifies which Batch profile to attach the event to
* `event_name` → sets the event type
* `last_updated_at` → used only for incremental sync logic
* all other columns → mapped to event attributes

***

### 3) How incremental sync works

Cloud Sync uses **incremental processing**, which means it does not re-import your full dataset at every run. Instead, it fetches only the rows that are new since the last successful sync.

***

#### 3.1 The `last_updated_at` cursor

Batch stores the last successful cursor value internally.

At each run, Batch fetches only rows where:

* `last_updated_at` is greater than the last stored cursor

This makes sync runs faster, more scalable, and more cost-efficient.

{% hint style="warning" %}
For events, `last_updated_at` should reflect when the row was **inserted** into the table, not when the event occurred (`event_time`). Do not backfill or modify `last_updated_at` after insertion.
{% endhint %}

***

#### 3.2 Inserts and deletes

Incremental syncs capture:

* ✅ new event rows (inserts)

They do **not** capture:

* ❌ deletions — events sent to Batch are immutable; they cannot be removed via Cloud Sync

***

#### 3.3 Best practices for reliable incremental syncs

To avoid missing events:

* Set `last_updated_at` at **insertion time** and never modify it afterwards
* Use a View if you need to rename columns or apply type conversions
* Use `ReplacingMergeTree` and order by `last_updated_at` for large tables

***

### 4) Test and enable your Sync

Before enabling the schedule:

1. Run a **test sync**
2. Verify:
   * Events are created on the correct profiles
   * `event_time` is set correctly (or defaults to reception time as expected)
   * `date__` and `url__` fields are interpreted correctly
   * Array and object attributes are correctly stringified

Once enabled, Batch automatically handles batching and retries.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://doc.batch.com/getting-started/features/customer-engagement-platform/profiles/cloud-sync/create-a-sync-from-clickhouse-to-batch-profile-events.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
