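I. LOCUS
In the GenBank format, a record begins like this:
LOCUS NM_001469 2156 bp mRNA linear PRI 16-DEC-2004
DEFINITION Homo sapiens thyroid autoantigen 70kDa (Ku antigen) (G22P1), mRNA.
Here linear is the molecule topology (linear rather than circular) and PRI is the GenBank division for primate sequences.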
The LOCUS field contains a number of different data elements, including locus name, sequence length, molecule type, GenBank division, and modification date. Each element is described below.
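The LOCUS line shown above can be split into its elements with a few lines of Python. This is only a minimal sketch: the function name `parse_locus` and the dictionary keys are hypothetical, and it assumes the line carries all seven whitespace-separated values (name, length, the unit "bp", molecule type, topology, division, date) in that order.

```python
def parse_locus(line):
    """Split a GenBank LOCUS line into named elements (illustrative sketch)."""
    tokens = line.split()
    # Assumption: the keyword LOCUS followed by exactly seven tokens.
    if len(tokens) != 8 or tokens[0] != "LOCUS":
        raise ValueError("unexpected LOCUS line layout")
    name, length, unit, molecule, topology, division, date = tokens[1:]
    if unit != "bp":
        raise ValueError("this sketch only handles lengths given in bp")
    return {
        "name": name,              # locus name, e.g. NM_001469
        "length": int(length),     # sequence length in base pairs
        "molecule": molecule,      # molecule type, e.g. mRNA
        "topology": topology,      # linear or circular
        "division": division,      # GenBank division, e.g. PRI
        "date": date,              # modification date, e.g. 16-DEC-2004
    }


record = parse_locus("LOCUS NM_001469 2156 bp mRNA linear PRI 16-DEC-2004")
print(record["name"], record["length"], record["division"])
```

Records whose LOCUS line omits an element or reports the length in another unit would need extra handling that this sketch leaves out.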
II. COMMENT
1. REVIEWED REFSEQ: describes how this RefSeq record was generated.
2. Summary: describes the function of the sequence.
III. FEATURES, definition of the term: information about genes and gene products, as well as regions of biological significance reported in the sequence. These can include regions of the sequence that code for proteins and RNA molecules.
The subheadings under FEATURES are too complex to cover here; when needed, look them up in The DDBJ/EMBL/GenBank Feature Table.
1. Key: the feature key (for example CDS) sits in the first column; the second column is usually headed Location/Qualifiers.
2. complement: the complementary strand. If a feature is located on the complementary strand, the word "complement" will appear before the base span.
3. < and >: mark a partial 5' or 3' end. If the "<" symbol precedes a base span, the sequence is partial on the 5' end (e.g., CDS <1..206). If the ">" symbol marks the end position of a base span, the sequence is partial on the 3' end (e.g., CDS 435..>915).
4. /db_xref: its string is a link to another database.
/db_xref="taxon:9606" links to Taxonomy (species classification).
/db_xref="GeneID:2547" links to Gene.
/db_xref="LocusID:2547" links to LocusLink.
/db_xref="MIM:152690" links to OMIM.
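Each /db_xref value listed above has the shape database:identifier, so it can be taken apart with plain string handling. The sketch below is illustrative only; `parse_db_xref` is a hypothetical helper name, and it assumes the qualifier is written on one line as /db_xref="database:identifier".

```python
def parse_db_xref(qualifier):
    """Return (database, identifier) from a /db_xref qualifier (illustrative sketch)."""
    name, _, value = qualifier.strip().partition("=")
    if name != "/db_xref":
        raise ValueError("not a /db_xref qualifier")
    # Drop the surrounding double quotes, then split at the first colon.
    database, sep, identifier = value.strip('"').partition(":")
    if not sep:
        raise ValueError("expected a value of the form database:identifier")
    return database, identifier


for q in ['/db_xref="taxon:9606"', '/db_xref="GeneID:2547"', '/db_xref="MIM:152690"']:
    print(parse_db_xref(q))
```

Splitting at the first colon only keeps any later colons inside the identifier intact.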
IV. Two examples:
Key             Location/Qualifiers
CDS             23..400
                /product="alcohol dehydrogenase"
                /gene="adhI"
might be read as:
The feature CDS is a coding sequence beginning at base 23 and ending at base 400, has a product called 'alcohol dehydrogenase' and is coded for by a gene called 'adhI'.
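The reading above can be mirrored in code by splitting a feature entry into its key, location, and qualifiers. This is a rough sketch under stated assumptions: `parse_feature` is a hypothetical helper, the key and location are assumed to be separated by whitespace on the first line, and every qualifier is assumed to fit on a single line starting with "/".

```python
def parse_feature(text):
    """Split one feature entry into (key, location, qualifiers) (illustrative sketch)."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    # First line: feature key, whitespace, then the location.
    key, location = lines[0].split(None, 1)
    qualifiers = {}
    for ln in lines[1:]:
        if not ln.startswith("/"):
            continue  # multi-line qualifier values are ignored in this sketch
        name, _, value = ln[1:].partition("=")
        qualifiers[name] = value.strip('"')
    return key, location, qualifiers


entry = '''
CDS             23..400
                /product="alcohol dehydrogenase"
                /gene="adhI"
'''
key, location, qualifiers = parse_feature(entry)
print(key, location, qualifiers["product"], qualifiers["gene"])
```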
A more complex description:
Key             Location/Qualifiers
CDS             join(544..589,688..>1032)
                /product="T-cell receptor beta-chain"
which might be read as:
This feature, which is a partial coding sequence, is formed by joining the elements indicated to form one contiguous sequence encoding a product called T-cell receptor beta-chain.
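The location strings used in these notes (a plain span, complement(...), join(...), and the "<" and ">" partial markers) can be decoded with a small parser. The following is a minimal sketch, not a full implementation of the feature table location syntax: `parse_location` is a hypothetical name, ">" is assumed to be written before the end position as in 688..>1032, and nested forms such as join(complement(...),...) are rejected rather than handled.

```python
import re

# One segment: optional "<", start, "..", optional ">", end.
_SPAN = re.compile(r"(<?)(\d+)\.\.(>?)(\d+)")


def parse_location(location):
    """Decode a simple feature location string (illustrative sketch)."""
    strand = "+"
    if location.startswith("complement(") and location.endswith(")"):
        strand = "-"  # feature lies on the complementary strand
        location = location[len("complement("):-1]
    if location.startswith("join(") and location.endswith(")"):
        location = location[len("join("):-1]
    segments = []
    partial = False
    for part in location.split(","):
        match = _SPAN.fullmatch(part.strip())
        if match is None:
            raise ValueError(f"unsupported location segment: {part!r}")
        lt, start, gt, end = match.groups()
        # Either marker means the feature extends beyond the sequence shown.
        partial = partial or lt == "<" or gt == ">"
        segments.append((int(start), int(end)))
    return {"strand": strand, "segments": segments, "partial": partial}


print(parse_location("join(544..589,688..>1032)"))
print(parse_location("complement(<1..206)"))
```

The result keeps the segments in the order written, so the joined elements can be concatenated in that order to rebuild the contiguous sequence.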