
Elasticsearch

Elasticsearch is used in Gen3 for indexing and searching data. It provides:

  • Full text search
  • Flexible querying
  • Scalable analytics
  • Fast aggregations
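To make "flexible querying" and "fast aggregations" concrete, here is a minimal sketch of an Elasticsearch Query DSL request body that combines a full text match, an exact-match filter, and a terms aggregation. The field names (`description`, `project_id`, `gender`) and the aggregation name `by_value` are hypothetical; the body would be sent as JSON to `POST /<index>/_search`.

```python
import json

def build_search_body(text, project_id, agg_field, agg_size=10):
    """Build an Elasticsearch Query DSL body (field names are hypothetical)."""
    return {
        "size": 0,  # return only aggregation buckets, not matching documents
        "query": {
            "bool": {
                # Full text search against an analyzed text field.
                "must": [{"match": {"description": text}}],
                # Exact-match filter; filter clauses do not affect scoring.
                "filter": [{"term": {"project_id": project_id}}],
            }
        },
        "aggs": {
            # Bucket matching documents by the distinct values of agg_field,
            # which should be a keyword (not analyzed text) field.
            "by_value": {"terms": {"field": agg_field, "size": agg_size}}
        },
    }

body = build_search_body("lung", "demo-project", "gender")
print(json.dumps(body, indent=2))
```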

For production, a managed Elasticsearch service is recommended:

On AWS: Amazon OpenSearch Service (formerly Amazon Elasticsearch Service)

Using a fully managed service simplifies operations, since the provider handles cluster provisioning, software patching, and backups.

Gen3 currently supports Elasticsearch versions up to 7.x, and support for newer versions is in progress. When deploying, use Elasticsearch 7 or earlier.
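One way to catch an unsupported cluster before deploying is to read the version that Elasticsearch reports at its root endpoint (`GET /`, under `version.number`). The sketch below assumes you have already fetched and parsed that JSON; the function name is hypothetical. Because OpenSearch clusters report `"distribution": "opensearch"` with their own 1.x/2.x numbering, the sketch refuses to judge them rather than misreading, say, 2.x as "earlier than 7".

```python
import json

# Per the Gen3 documentation: use Elasticsearch 7 or earlier.
MAX_SUPPORTED_MAJOR = 7

def is_supported_version(root_response: dict) -> bool:
    """Return True if the root endpoint (GET /) reports Elasticsearch <= 7."""
    version = root_response["version"]
    # OpenSearch sets "distribution": "opensearch" and restarts numbering at 1.x,
    # so a plain major-version comparison would misclassify it.
    distribution = version.get("distribution", "elasticsearch")
    if distribution != "elasticsearch":
        raise ValueError(f"not an Elasticsearch cluster: {distribution}")
    major = int(version["number"].split(".")[0])  # "7.10.2" -> 7
    return major <= MAX_SUPPORTED_MAJOR

# Sample response body; in practice, fetch GET / from your cluster and parse it.
sample = json.loads('{"version": {"number": "7.10.2"}}')
print(is_supported_version(sample))
```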

ETL process for Guppy

Gen3 uses an ETL (Extract, Transform, Load) process to optimize data for easier exploration.

This ETL pipeline:

  • Extracts data from the Sheepdog database
  • Transforms the data into an optimized format
  • Loads the transformed data into Elasticsearch
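The three steps can be illustrated with a toy pipeline. This is not Gen3's actual ETL implementation: the in-memory `SUBJECTS`/`SAMPLES` rows stand in for records extracted from the Sheepdog database, and the transform shown (denormalizing each subject and its linked samples into one flat document) is just one example of reshaping a graph for fast filtering. The load step renders the documents in the Elasticsearch `_bulk` NDJSON format (an action line followed by the document, with a required trailing newline).

```python
import json
from collections import defaultdict

# Hypothetical rows standing in for records extracted from Sheepdog.
SUBJECTS = [
    {"id": "s1", "project_id": "demo", "gender": "female"},
    {"id": "s2", "project_id": "demo", "gender": "male"},
]
SAMPLES = [
    {"id": "a1", "subject_id": "s1", "sample_type": "blood"},
    {"id": "a2", "subject_id": "s1", "sample_type": "tissue"},
    {"id": "a3", "subject_id": "s2", "sample_type": "blood"},
]

def extract():
    """Extract: return raw records (a real pipeline would query the database)."""
    return SUBJECTS, SAMPLES

def transform(subjects, samples):
    """Transform: denormalize so each subject becomes one flat document."""
    by_subject = defaultdict(list)
    for sample in samples:
        by_subject[sample["subject_id"]].append(sample)
    docs = []
    for subj in subjects:
        linked = by_subject[subj["id"]]
        docs.append({
            "subject_id": subj["id"],
            "project_id": subj["project_id"],
            "gender": subj["gender"],
            # Pre-computed fields let queries filter without joins.
            "sample_types": sorted({s["sample_type"] for s in linked}),
            "sample_count": len(linked),
        })
    return docs

def load(docs, index):
    """Load: render documents as an Elasticsearch _bulk request body (NDJSON)."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["subject_id"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline

bulk_body = load(transform(*extract()), "subject")
print(bulk_body)
```

Doing the joins once at ETL time is what lets the index answer filter and aggregation queries on a single document type.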

The resulting Elasticsearch index powers the Guppy service that backs the Data Explorer UI, enabling:

  • Fast filtering and aggregation
  • Quick exploration of complex datasets
  • Summary stats and visualizations
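Guppy exposes the index through GraphQL. The sketch below builds, but does not send, a request asking for a histogram of `gender` over filtered subjects. The endpoint URL, the `subject` type, and the field names are hypothetical, and the query shape (`_aggregation`, `histogram { key count }`, a `$filter: JSON` variable with `AND`/`=`/`in` operators) reflects our understanding of Guppy's query syntax; check it against the Guppy documentation for your version.

```python
import json
import urllib.request

# Hypothetical commons hostname; Guppy is assumed to be served at /guppy/graphql.
GUPPY_URL = "https://gen3.example.org/guppy/graphql"

# GraphQL query: histogram of `gender` over subjects matching $filter.
QUERY = """
query ($filter: JSON) {
  _aggregation {
    subject(filter: $filter) {
      gender { histogram { key count } }
    }
  }
}
"""

def build_guppy_request(filter_obj, token):
    """Build (but do not send) a POST request carrying the query and filter."""
    payload = {"query": QUERY, "variables": {"filter": filter_obj}}
    return urllib.request.Request(
        GUPPY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # access token obtained separately
        },
        method="POST",
    )

# Hypothetical filter: subjects in project "demo" with at least one blood sample.
flt = {"AND": [{"=": {"project_id": "demo"}}, {"in": {"sample_types": ["blood"]}}]}
req = build_guppy_request(flt, token="<ACCESS_TOKEN>")
# To send it: resp = urllib.request.urlopen(req); data = json.load(resp)
```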

The ETL process is run on new data added to Sheepdog, keeping the Elasticsearch data that Guppy queries in sync with the source database.
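A common Elasticsearch pattern for refreshing an index without disrupting readers is to load a fresh, versioned index and then repoint an alias at it. This is offered as a general technique, not as a description of how Gen3's ETL manages its indices. The index names below are hypothetical, and the returned body would be sent to `POST /_aliases`, which applies all listed actions atomically.

```python
import json

def alias_swap_actions(alias, old_index, new_index):
    """Build an Elasticsearch _aliases request body that repoints `alias`.

    Elasticsearch applies every action in one request atomically, so clients
    querying the alias switch from old_index to new_index in a single step.
    """
    actions = []
    if old_index is not None:
        actions.append({"remove": {"index": old_index, "alias": alias}})
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}

# Hypothetical versioned indices; queries would target the alias "subject".
body = alias_swap_actions("subject", "subject_v1", "subject_v2")
print(json.dumps(body))
```

Swapping only after the new index is fully loaded means readers never query a partially built index.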

Because the ETL job does the heavy work of ingesting and transforming large volumes of data ahead of time, Guppy can serve queries and visualizations quickly.
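Ingesting large volumes usually means splitting the load into many moderately sized `_bulk` requests rather than one huge one. The sketch below batches documents into NDJSON bodies of at most `chunk_size` documents each; the chunk size of 500 and the document fields are illustrative assumptions, not Gen3 settings.

```python
import json
from itertools import islice

def bulk_chunks(docs, index, id_field, chunk_size=500):
    """Yield _bulk request bodies (NDJSON), each holding at most chunk_size docs."""
    it = iter(docs)
    while True:
        batch = list(islice(it, chunk_size))
        if not batch:
            return
        lines = []
        for doc in batch:
            lines.append(json.dumps({"index": {"_index": index, "_id": doc[id_field]}}))
            lines.append(json.dumps(doc))
        yield "\n".join(lines) + "\n"  # each _bulk body must end with a newline

# Hypothetical documents, generated lazily so the full set never sits in memory.
docs = ({"subject_id": f"s{i}", "sample_count": i % 3} for i in range(1200))
bodies = list(bulk_chunks(docs, "subject", "subject_id", chunk_size=500))
print(len(bodies))  # 1200 docs at 500 per request -> 3 requests
```

The official `elasticsearch` Python client's `helpers.bulk` performs similar batching (via its `chunk_size` parameter) if you prefer not to build request bodies by hand.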