Data download

Representative genome sets

The representative genome sets available here are non-redundant collections that include the highest-quality genome from every specI species cluster. Because many specI clusters can be assigned to a habitat, we also provide habitat-specific sets of representative genomes.

Type | Contigs | Genes | Proteins
Representative genomes | contigs.representatives.fasta.bz2 | genes.representatives.fasta.bz2 | proteins.representatives.fasta.bz2
Aquatic | aquatic.contigs.fa.gz | aquatic.genes.fa.gz | aquatic.proteins.fa.gz
Disease associated | disease_associated.contigs.fa.gz | disease_associated.genes.fa.gz | disease_associated.proteins.fa.gz
Food associated | food_associated.contigs.fa.gz | food_associated.genes.fa.gz | food_associated.proteins.fa.gz
Freshwater | freshwater.contigs.fa.gz | freshwater.genes.fa.gz | freshwater.proteins.fa.gz
Host associated | host_associated.contigs.fa.gz | host_associated.genes.fa.gz | host_associated.proteins.fa.gz
Host plant associated | host_plant_associated.contigs.fa.gz | host_plant_associated.genes.fa.gz | host_plant_associated.proteins.fa.gz
Sediment mud | sediment_mud.contigs.fa.gz | sediment_mud.genes.fa.gz | sediment_mud.proteins.fa.gz
Soil | soil.contigs.fa.gz | soil.genes.fa.gz | soil.proteins.fa.gz
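Judging by their extensions, the contig, gene and protein files are compressed FASTA: bzip2 for the full representative set and gzip for the habitat sets. The following is a minimal Python sketch, assuming standard FASTA records introduced by ">" header lines, that streams any of these files without first decompressing them to disk. The helper names open_text, read_fasta and summarize are hypothetical illustrations, not part of any proGenomes tooling, and no particular header/identifier format is assumed.

```python
import bz2
import gzip
from typing import Iterator, Tuple


def open_text(path: str):
    """Open a plain, gzip- or bzip2-compressed file in text mode, chosen by file suffix."""
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, streaming the file line by line.

    The header is returned verbatim (without the leading '>'); its internal
    structure is not interpreted because it is not assumed here.
    """
    header = None
    chunks = []
    with open_text(path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header = line[1:]
                chunks = []
            else:
                chunks.append(line.strip())
        if header is not None:
            yield header, "".join(chunks)  # emit the final record


def summarize(path: str) -> Tuple[int, int]:
    """Return (number of records, total sequence length) for a FASTA file."""
    n_records = 0
    total_length = 0
    for _, seq in read_fasta(path):
        n_records += 1
        total_length += len(seq)
    return n_records, total_length
```

For example, summarize("soil.proteins.fa.gz") would count the protein records in the soil set and their total length in residues. Streaming record by record keeps memory use bounded by the longest single sequence, which matters for the larger contig files.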

Other datasets

Type | File
Habitats per isolate | proGenomes3_habitat_isolates.tab.bz2
Habitats per specI cluster | proGenomes3_habitat_specI.tab.bz2
Marker genes | proGenomes3_markerGenes.tar.gz
SpecI clustering data | proGenomes3_specI_clustering.tab.bz2
GTDB taxonomy | proGenomes3_specI_lineageGTDB.tab.bz2
Highly important strains | highly_important_strains.tab.bz2
Excluded genomes | proGenomes3_excluded_genomes.txt.bz2
MGE ORFs | representatives_mge_ORFS.tsv.bz2
MGE annotation | representatives_mge_annotation.tsv.bz2
GECCO biosynthetic gene clusters (GenBank records) | progenomes3_gecco_clusters.gbk.gz
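Most of these tables are bzip2-compressed text with tab-separated columns (.tab.bz2 / .tsv.bz2), and the marker genes ship as a gzip-compressed tar archive. Below is a minimal Python sketch for peeking into them using only the standard library. It assumes plain tab-delimited rows without quoting and makes no assumption about column names or whether a header row is present, so inspect the first rows before relying on any column. The function names read_tab_bz2 and list_tar_members are hypothetical.

```python
import bz2
import csv
import tarfile
from typing import Iterator, List


def read_tab_bz2(path: str) -> Iterator[List[str]]:
    """Yield each non-empty row of a bzip2-compressed, tab-separated file as a list of strings.

    QUOTE_NONE treats quote characters as ordinary text, which suits plain TSV.
    No header handling is done because the column layout is not assumed here.
    """
    with bz2.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
            if row:  # csv.reader yields [] for blank lines
                yield row


def list_tar_members(path: str) -> List[str]:
    """Return the names of regular files inside a .tar.gz archive without extracting it."""
    with tarfile.open(path, "r:gz") as archive:
        return [member.name for member in archive.getmembers() if member.isfile()]
```

For instance, itertools.islice(read_tab_bz2("proGenomes3_specI_lineageGTDB.tab.bz2"), 5) would return the first five rows of the GTDB taxonomy table, and list_tar_members("proGenomes3_markerGenes.tar.gz") would show what the marker-gene archive contains before you extract it.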