Documentation
Guidance for browsing trait data, working with the API, and downloading harmonized datasets.
Website Overview
The metaTraits interface is designed for fast, transparent exploration and annotation of microbial phenotypic traits.
Use the global search bar to look up taxa by name or identifier (e.g. Bifidobacterium bifidum, GCA_001606525.1, or spire_mag_00047739). Autocomplete ensures valid taxonomy ranks, and suggested examples help newcomers explore without prior knowledge.
For a broader overview, the Traits catalogue lists all harmonized features with concise descriptions, ontology links, database coverage, and, for prediction-based traits, the number of genomes used for model training.
Trait Summaries and Provenance
Each result page presents taxonomy-level summaries that combine data from all currently selected databases. The Databases menu allows users to include or exclude specific sources, for instance, to focus on experimentally observed traits or to include predictions from genome-based tools. Selections apply consistently across searches, result views, and annotation workflows.
Each summary panel displays:
- Trait distributions and proportions of genomes from the selected taxon that are assigned to each state, together with the number of contributing records.
- A database badge indicating how many of the currently selected databases contain annotation data for that feature.
- An AI badge when prediction data are included, signalling that measured and inferred values are merged.
- A “no robust majority” label whenever fewer than 85% of genomes in a clade agree, making uncertainty explicit rather than enforcing a consensus.
Percentages represent the fraction of observations and/or predictions from the selected databases; they are not confidence scores. This aggregation strategy highlights inconsistencies within lineages and helps users assess the stability of predicted traits across taxa.
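The 85% rule described above can be illustrated with a minimal sketch. It assumes each genome contributes one state per trait; the function name summarize_trait and the input format are hypothetical and do not reflect the site's actual implementation.

```python
from collections import Counter

ROBUST_MAJORITY = 0.85  # threshold stated in the documentation


def summarize_trait(states):
    """Return (proportions, majority) for a list of per-genome trait states.

    majority is None when no single state reaches the threshold,
    which corresponds to the "no robust majority" label.
    """
    counts = Counter(states)
    total = sum(counts.values())
    if total == 0:
        return {}, None
    proportions = {state: n / total for state, n in counts.items()}
    top_state, top_n = counts.most_common(1)[0]
    majority = top_state if top_n / total >= ROBUST_MAJORITY else None
    return proportions, majority
```

The returned proportions are plain fractions of records, consistent with the note that percentages are not confidence scores.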
Every trait entry links back to its original evidence in BacDive, BV-BRC, JGI GOLD/IMG, or the corresponding prediction model (BacDive-AI, GenomeSPOT, MICROPHERRET, Traitar). Use the Download Summary button to export the full result, including trait distributions, provenance metadata, and majority flags.
Record View
To explore individual genomes or strains, search by accession or record identifier. The system automatically recognizes genome, SPIRE, JGI, and BV-BRC IDs and retrieves the corresponding Record View, displaying all available trait annotations for that entry.
Example searches include:
GCA_001606525.1 (genome accession), spire_mag_00047739 (SPIRE ID), Go0007368 (JGI ID), or 205913.6 (BV-BRC ID).
Each record view lists the raw annotations that underpin higher-level summaries, including source provenance and prediction context.
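When preparing a list of identifiers for lookup, it can help to sort them by type first. The sketch below guesses the identifier type from the shape of the example searches above. The regular expressions are assumptions inferred from those four examples only; the site's own recognition logic is not documented here and may differ.

```python
import re

# Patterns inferred from the documented examples; treat them as assumptions.
ID_PATTERNS = [
    ("genome accession", re.compile(r"^GC[AF]_\d{9}\.\d+$")),
    ("SPIRE ID", re.compile(r"^spire_mag_\d+$")),
    ("JGI ID", re.compile(r"^Go\d+$")),
    ("BV-BRC ID", re.compile(r"^\d+\.\d+$")),
]


def classify_identifier(text):
    """Return a label for the identifier type, or None if nothing matches."""
    candidate = text.strip()
    for label, pattern in ID_PATTERNS:
        if pattern.match(candidate):
            return label
    return None
```

Anything that returns None is probably a taxon name and belongs in the regular taxonomy search.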
Annotation Workflows
The website also provides two integrated workflows for user-submitted data; each is described in more detail on its dedicated page:
- Genome Annotation (porTraits): input isolate genomes or MAGs to get phenotypic trait predictions and context from similar records in our database.
- Profile Annotation (sumTraits): input a taxonomic profile to receive trait profiles summarized at the community level.
Both workflows are directly accessible from the site and can also be run on the CloWM platform for automated, reproducible execution.
Downloads
Every results page includes a Download button that exports the current summary for the queried taxon, including trait distributions and the “no robust majority” flags. Bulk exports are available below.
Taxonomic mappings between GTDB and NCBI:
Taxonomic mapping from GTDB (r220) to NCBI (2025-07-28) and vice versa, built from genomes and MAGs classified under both taxonomies, including those from the GTDB release files, proGenomes3, SPIREv1, and JGI IMG. For each taxon in one taxonomy, a corresponding taxon in the other was assigned if at least 85% of its genomes were assigned to the same taxon in the other taxonomy. For GTDB → NCBI and NCBI → GTDB, this approach had an average agreement rate of 99.47% and 99.87%, respectively. The table shows the respective taxonomy IDs and lineage from domain to strain level, the count of genomes involved in the mapping, and the fraction that corresponded to the majority vote.
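The majority-vote assignment described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the pipeline used to build the table: the input is assumed to be one (source taxon, target taxon) pair per genome classified in both taxonomies, and the function name map_taxa and the output keys are hypothetical.

```python
from collections import Counter, defaultdict


def map_taxa(genome_pairs, min_fraction=0.85):
    """Map each source taxon to its majority target taxon.

    genome_pairs: iterable of (source_taxon, target_taxon) tuples,
    one per genome. A source taxon is mapped only if at least
    min_fraction of its genomes share the same target taxon.
    """
    by_source = defaultdict(Counter)
    for source, target in genome_pairs:
        by_source[source][target] += 1

    mapping = {}
    for source, targets in by_source.items():
        total = sum(targets.values())
        target, n = targets.most_common(1)[0]
        fraction = n / total
        if fraction >= min_fraction:
            mapping[source] = {
                "target": target,
                "genomes": total,
                "majority_fraction": fraction,
            }
    return mapping
```

The genomes and majority_fraction values correspond to the genome count and majority-vote fraction columns mentioned in the table description.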
metaTraits harmonized trait annotations:
metaTraits aggregates harmonized trait annotations at multiple taxonomic levels. Below you can find annotations at family, genus, and species levels in compressed JSONL format.
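A downloaded file can be streamed record by record rather than loaded whole. The sketch below assumes the files are gzip-compressed, which the text does not state explicitly; adjust the opener if another compression format is used. The helper name iter_jsonl_gz is hypothetical, and the record fields are not documented here.

```python
import gzip
import json


def iter_jsonl_gz(path):
    """Yield one parsed JSON object per non-empty line of a gzip-compressed JSONL file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Because the function is a generator, large species-level files can be filtered without holding every record in memory.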
API Access
The website exposes a lightweight JSON API rooted at https://metatraits.embl.de/api/v1. Endpoints require no authentication and return HTTP 400 errors for invalid requests, 404 when no records are found, and 500 for unexpected failures.
Retrieve Traits by Taxonomy ID (NCBI only)
Fetch all trait observations for a single taxonomy ID with a GET request to /traits/taxonomy/<taxonomy_id>:
GET /traits/taxonomy/123?databases=db1&databases=db2
The taxonomy_id is mandatory. If the databases parameter is omitted, all available databases are used. Multiple databases can be provided by repeating the databases parameter. Invalid database names trigger an error listing the valid options.
The response is returned as JSON and contains trait observations aggregated for the specified taxonomy ID.
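A minimal client sketch for the single-ID endpoint, using only the Python standard library. The function names are hypothetical, db1 and db2 are the placeholder database names from the example above, and returning None on a 404 is a design choice of this sketch rather than documented behavior. The response structure is not specified here, so the parsed JSON is returned as is.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "https://metatraits.embl.de/api/v1"


def build_traits_url(taxonomy_id, databases=None):
    """Build the GET URL, repeating the databases parameter once per database."""
    # NCBI taxonomy IDs are integers; int() rejects malformed input early.
    url = f"{BASE_URL}/traits/taxonomy/{int(taxonomy_id)}"
    if databases:
        query = urllib.parse.urlencode([("databases", db) for db in databases])
        url = f"{url}?{query}"
    return url


def fetch_traits(taxonomy_id, databases=None, timeout=30):
    """Return the parsed JSON response, or None when the API answers 404."""
    url = build_traits_url(taxonomy_id, databases)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no records found for this taxonomy ID
            return None
        raise  # 400 and 500 responses are surfaced to the caller
```

Calling build_traits_url(123, ["db1", "db2"]) reproduces the query string shown in the example request.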
Bulk Trait Queries (NCBI only)
Retrieve trait observations for multiple taxonomy IDs in one request with a POST to /traits/taxonomy:
POST /traits/taxonomy
Content-Type: application/json
{
  "taxonomy_ids": [123, 456, 789],
  "databases": ["db1", "db2"]
}
taxonomy_ids must be a JSON list. A maximum of MAX_IDS IDs can be submitted per request. The databases field is optional; if omitted, all available databases are used. Invalid or missing fields result in 400 errors. If no records are found for the provided IDs, the endpoint returns 404.
The response aggregates trait observations across all provided IDs and databases, mirroring the structure of single-ID responses.
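For ID lists longer than the per-request limit, the request can be split into batches. The sketch below uses only the standard library. The limit is passed in as max_ids because its actual value is not given here; the function names are hypothetical, and skipping batches that return 404 is a choice made for this sketch.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "https://metatraits.embl.de/api/v1"


def chunked(ids, size):
    """Yield consecutive slices of ids with at most size elements each."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_bulk_request(taxonomy_ids, databases=None):
    """Build a POST request whose JSON body matches the documented shape."""
    payload = {"taxonomy_ids": list(taxonomy_ids)}
    if databases:  # optional; omitted means all available databases
        payload["databases"] = list(databases)
    return urllib.request.Request(
        f"{BASE_URL}/traits/taxonomy",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_bulk(taxonomy_ids, max_ids, databases=None, timeout=30):
    """Send one POST per batch and return the list of parsed responses."""
    results = []
    for batch in chunked(list(taxonomy_ids), max_ids):
        request = build_bulk_request(batch, databases)
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                results.append(json.load(response))
        except urllib.error.HTTPError as err:
            if err.code != 404:  # 404 means no records in this batch
                raise
    return results
```

Each element of the returned list is the response for one batch; merging them depends on the response structure, which is not detailed here.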