Troubleshooting

As you wire up data ingestion in Materialize, you might run into some snags or unexpected scenarios. This guide collects common questions about data ingestion to help you troubleshoot your sources. See also Monitoring data ingestion.

If you're looking for troubleshooting guidance for slow or unresponsive queries, check out the Transform data troubleshooting guide instead.

{{< tip >}} {{< guided-tour-blurb-for-ingest-data >}} {{< /tip >}}

Why isn't my source ingesting data?

First, check the status of your source in the Materialize console by navigating to https://console.materialize.com/, clicking the Sources tab in the navigation bar, and clicking the affected source.

Alternatively, you can get this information from the system catalog by querying the mz_source_statuses table:

```mzsql
SELECT * FROM mz_internal.mz_source_statuses
WHERE name = '<SOURCE_NAME>';
```

| Status | Description/recommendation |
|--------|----------------------------|
| `paused` | Source is running on a cluster with 0 replicas. To resolve this, increase the replication factor of the cluster. |
| `stalled` | You likely have a configuration issue. The returned `error` field will provide more details. |
| `failed` | You likely have a configuration issue. The returned `error` field will provide more details. |
| `starting` | If this status persists for more than a few minutes, reach out to our team for support. |
| `running` | If your source is in a `running` state but you are not receiving data when you query the source, the source may still be ingesting its initial snapshot. See "Has my source ingested its initial snapshot?" below. |
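
The recommendations in the table can be turned into a couple of catalog queries. The following is a minimal sketch, assuming a managed cluster: `my_source` and `my_cluster` are hypothetical names, and a replication factor of `1` is only an example value. It looks up the cluster that hosts the source, raises that cluster's replication factor if it is `0` (the `paused` case), and lists the `error` field for any `stalled` or `failed` sources.

```mzsql
-- Find the cluster that hosts the source and its current replication factor.
-- `my_source` is a hypothetical source name.
SELECT s.name AS source_name,
       c.name AS cluster_name,
       c.replication_factor
FROM mz_catalog.mz_sources AS s
JOIN mz_catalog.mz_clusters AS c ON s.cluster_id = c.id
WHERE s.name = 'my_source';

-- `paused`: if replication_factor is 0, give the (managed) cluster a replica.
-- `my_cluster` is a hypothetical cluster name; 1 is an example value.
ALTER CLUSTER my_cluster SET (REPLICATION FACTOR = 1);

-- `stalled` or `failed`: surface the error message for each affected source.
SELECT name, status, error
FROM mz_internal.mz_source_statuses
WHERE status IN ('stalled', 'failed');
```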

Has my source ingested its initial snapshot?

While a source is snapshotting, the source (and its associated subsources) cannot serve queries. That is, queries issued against the snapshotting source (or its subsources) will not return until the snapshot completes, unless you cancel the query.

{{< include-md file="shared-content/snapshotting-cluster-size-postgres.md" >}}

To determine whether your source has completed ingesting the initial snapshot, you can query the mz_source_statistics system catalog table:

```mzsql
SELECT bool_and(snapshot_committed) AS snapshot_committed
FROM mz_internal.mz_source_statistics
WHERE id = '<SOURCE_ID>';
```

You generally want to aggregate the snapshot_committed field across all worker threads, as done in the above query. The snapshot is only considered committed for the source as a whole once all worker threads have committed their components of the snapshot.
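
If you know the source's name but not its ID, you can resolve the ID through `mz_catalog.mz_sources` and aggregate in the same query. This is a sketch; `my_source` is a hypothetical name. Because `bool_and` returns `true` only when every input row is `true`, a single worker that has not yet committed its part of the snapshot keeps the result `false`.

```mzsql
-- Returns true only once every worker has committed its part of the snapshot.
-- `my_source` is a hypothetical source name.
SELECT bool_and(st.snapshot_committed) AS snapshot_committed
FROM mz_internal.mz_source_statistics AS st
JOIN mz_catalog.mz_sources AS s ON st.id = s.id
WHERE s.name = 'my_source';
```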

Even if your source has not yet committed its initial snapshot, you can still monitor its progress. See Monitoring data ingestion.

How do I speed up the snapshotting process?

{{< include-md file="shared-content/snapshotting-cluster-size-postgres.md" >}}

To speed up the snapshotting process, you can scale up the size of the cluster used for snapshotting, then scale it back down once the snapshot completes.

{{< include-md file="shared-content/resize-cluster-for-snapshotting.md" >}}

For upsert sources, a larger cluster can not only speed up snapshotting, but may also be necessary to support increased memory usage during the process. For more information, see Use a larger cluster for upsert source snapshotting.

Adding a new subsource to an existing source blocks replication. Should I just create a new source instead?

It depends. Materialize provides transactional guarantees across subsources of the same source, but not across different sources. So, if you need transactional guarantees across tables that would end up split between two sources, you cannot use a new source. In addition, creating a new source means that you are reading the replication stream twice.

To keep using the same source, consider scaling up the cluster to speed up the snapshotting process for the new subsource, and once the snapshot completes, resizing the cluster back down for steady-state operation.
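
The workflow above can be sketched for a PostgreSQL source as follows. This assumes a managed cluster; `ingest_cluster`, `my_pg_source`, `new_table`, and the `'200cc'`/`'100cc'` sizes are hypothetical, and the sizes available and the exact syntax for adding subsources depend on your Materialize version.

```mzsql
-- 1. Scale up the cluster that hosts the source (hypothetical name and size).
ALTER CLUSTER ingest_cluster SET (SIZE = '200cc');

-- 2. Add the new subsource to the existing source (hypothetical names).
ALTER SOURCE my_pg_source ADD SUBSOURCE new_table;

-- 3. Poll until every worker has committed the new subsource's snapshot.
SELECT bool_and(st.snapshot_committed) AS snapshot_committed
FROM mz_internal.mz_source_statistics AS st
JOIN mz_catalog.mz_sources AS s ON st.id = s.id
WHERE s.name = 'new_table';

-- 4. Once the query above returns true, scale back down for steady state.
ALTER CLUSTER ingest_cluster SET (SIZE = '100cc');
```

Keeping the subsource on the existing source preserves transactional guarantees across its tables; the temporary resize only affects how quickly the snapshot completes.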

See also