Elasticsearch
Elasticsearch is used in Gen3 for indexing and searching data. It provides:
- Full text search
- Flexible querying
- Scalable analytics
- Fast aggregations
For production, a managed Elasticsearch service is recommended, since a fully managed service simplifies operations:
- On AWS: Amazon Elasticsearch Service (since renamed Amazon OpenSearch Service)
Gen3 currently supports Elasticsearch versions up to 7; support for newer versions is in progress. When deploying, use Elasticsearch 7 or earlier.
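As a quick illustration of the version constraint, the sketch below checks a cluster's reported version against a maximum major version. It assumes the cluster info document returned by the Elasticsearch root endpoint (`GET /`), which carries the version under `version.number`. The function names and the sample payload are hypothetical and are not part of Gen3.

```python
def major_version(cluster_info: dict) -> int:
    """Return the major version from a cluster info document (GET /)."""
    return int(cluster_info["version"]["number"].split(".")[0])


def is_supported(cluster_info: dict, max_major: int = 7) -> bool:
    """True if the cluster's major version does not exceed max_major."""
    return major_version(cluster_info) <= max_major


# Hypothetical payload shaped like the response of GET / on a cluster.
sample_info = {"name": "node-1", "version": {"number": "7.10.2"}}
newer_info = {"name": "node-2", "version": {"number": "8.1.0"}}

print(is_supported(sample_info))
print(is_supported(newer_info))
```

In a real deployment the dictionary would come from an HTTP request to the cluster rather than from a literal.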
ETL process for Guppy
Gen3 uses an ETL (Extract, Transform, Load) process to restructure data so that it is easier to explore.
This ETL pipeline:
- Extracts data from the Sheepdog database
- Transforms the data into an optimized format
- Loads the transformed data into Elasticsearch
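The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the actual Gen3 ETL implementation: the in-memory `CASES` and `SAMPLES` rows stand in for records read from Sheepdog, and the field names and the index name `example_case_index` are hypothetical. The load step only builds a request body in the newline-delimited format that the Elasticsearch bulk API expects; it does not send it.

```python
import json

# Hypothetical rows standing in for records read from the Sheepdog database.
CASES = [
    {"case_id": "c1", "project": "P1"},
    {"case_id": "c2", "project": "P1"},
]
SAMPLES = [
    {"sample_id": "s1", "case_id": "c1", "tissue": "blood"},
    {"sample_id": "s2", "case_id": "c1", "tissue": "saliva"},
    {"sample_id": "s3", "case_id": "c2", "tissue": "blood"},
]


def extract():
    """Extract: return the source records (here, in-memory stand-ins)."""
    return CASES, SAMPLES


def transform(cases, samples):
    """Transform: build one flat document per case with sample fields folded in."""
    docs = []
    for case in cases:
        own = [s for s in samples if s["case_id"] == case["case_id"]]
        docs.append({
            "case_id": case["case_id"],
            "project": case["project"],
            "sample_count": len(own),
            "tissues": sorted({s["tissue"] for s in own}),
        })
    return docs


def to_bulk_body(docs, index_name):
    """Load: serialize documents as an NDJSON body for the bulk API."""
    lines = []
    for doc in docs:
        # Each document is preceded by an action line naming the target index.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc["case_id"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


cases, samples = extract()
docs = transform(cases, samples)
bulk_body = to_bulk_body(docs, "example_case_index")
print(bulk_body)
```

Flattening related records into one document per case is what lets a search engine filter and aggregate without joins at query time.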
The Elasticsearch index powers the Guppy service, which the data explorer UI uses to provide:
- Fast filtering and aggregation
- Quick exploration of complex datasets
- Summary stats and visualizations
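To make the filtering and aggregation concrete, the sketch below builds the kind of Elasticsearch search request body that combines exact-match filters with a terms aggregation. It illustrates the Elasticsearch query DSL only and is not Guppy's own API. The helper name and the `project` and `tissues` fields are hypothetical.

```python
import json


def build_filtered_aggregation(filters: dict, agg_field: str, bucket_count: int = 10) -> dict:
    """Build a search body that filters on exact values and counts documents per agg_field value."""
    return {
        # size 0: return only aggregation results, no individual hits.
        "size": 0,
        "query": {
            "bool": {
                "filter": [{"term": {field: value}} for field, value in filters.items()]
            }
        },
        "aggs": {
            f"by_{agg_field}": {"terms": {"field": agg_field, "size": bucket_count}}
        },
    }


body = build_filtered_aggregation({"project": "P1"}, "tissues")
print(json.dumps(body, indent=2))
```

A body like this would be sent to an index's `_search` endpoint; the per-value counts in the response are what a summary chart would display.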
The ETL process runs on new data added to Sheepdog, which keeps the Elasticsearch indices behind Guppy synchronized with the database.
Because the ETL job ingests large amounts of data ahead of time, Guppy can serve queries and visualizations quickly.