ProcessDataFile¶
Description¶
A Process Data File is a File that contains data produced by an Analysis or workflow.
Fields¶
format¶
description : The file format of the Process Data File (e.g., CRAM, BAM).
required : True
data type : Controlled Vocabulary
Permissible Values
Permissible Values | Description |
---|---|
BAI |
BAM indexing format |
BAM |
BAM format, the binary, BGZF-formatted compressed version of SAM format for alignment of nucleotide sequences (e.g., sequencing reads) to (a) reference sequence(s). May contain base-call and alignment qualities and other data. |
BCF |
BCF, the binary version of Variant Call Format (VCF) for sequence variation (indels, polymorphisms, structural variation). |
BED |
Browser Extensible Data (BED) format of sequence annotation track, typically to be displayed in a genome browser. BED detail format includes 2 additional columns (http://genome.ucsc.edu/FAQ/FAQformat#format1.7) and BED 15 includes 3 additional columns for experiment scores (http://genomewiki.ucsc.edu/index.php/Microarray_track). |
CRAM |
Reference-based compression of alignment format. |
GFF |
GFF feature format (of indeterminate version). |
HDF5 |
HDF5 is a data model, library, and file format for storing and managing data, based on Hierarchical Data Format (HDF). An HDF5 file appears to the user as a directed graph. The nodes of this graph are the higher-level HDF5 objects that are exposed by the HDF5 APIs: Groups, Datasets, Named datatypes. Currently supported by the Python MDTraj package. HDF5 is the new version, according to the HDF group, a completely different technology (https://support.hdfgroup.org/products/hdf4/ compared to HDF. |
SAM |
Sequence Alignment/Map (SAM) format for alignment of nucleotide sequences (e.g., sequencing reads) to (a) reference sequence(s). May contain base-call and alignment qualities and other data. The format supports short and long reads (up to 128Mbp) produced by different sequencing platforms and is used to hold mapped data within the GATK and across the Broad Institute, the Sanger Centre, and throughout the 1000 Genomes project. |
VCF |
Variant Call Format (VCF) for sequence variation (indels, polymorphisms, structural variation). |
WIG |
Wiggle format (WIG) of a sequence annotation track that consists of a value for each sequence position. Typically to be displayed in a genome browser. |
OTHER |
A file format not captured by the controlled vocabulary. |
analysis¶
description : The alias of the Analysis that produced this Process Data File.
required : True
data type : Analysis
name¶
description : The given filename.
required : True
data type : string
dataset¶
description : The Dataset alias associated with this File.
required : True
data type : Dataset
ega_accession¶
description : The EGA accession ID of an entity.
required : False
data type : string
included_in_submission¶
description : Whether a File is included in the Submission or not.
required : True
data type : boolean
alias¶
description : The alias for an entity at the time of submission.
required : True
data type : string