
* GeneMerge Gene-Association Files *

For use with GeneMerge, genomic analysis and hypothesis-testing software
Castillo-Davis, C.I. and D.L. Hartl 2003. Bioinformatics 19(7):891-892

For more information please see: http://genemerge.net

This file describes packaged 'gene association' and 'description' files
for use with GeneMerge. To make your own gene-association files, please
visit the website or consult the GeneMerge Manual.
________________________________________________________________________________

Naming of Gene Association and Description Files

1. Gene-association files are named for the organism (first letter of the
   genus plus the species name, G_species) and the type of gene association.

   For example, the human chromosome location association file is called:

   H_sapiens.CHR

2. Corresponding 'description files' are labeled with ".use":

   H_sapiens.CHR.use

   For Gene Ontology (GO) annotations, every species uses one of three
   shared description files: GO.MF.use, GO.BP.use, or GO.CC.use.

   GO gene-association files are prefixed with the generation date that the
   GO Consortium (GOC) records inside each .gaf file, followed by the name
   of the .gaf file from which the association file was derived.
   For example: 2019-07-01-fb-D_melanogaster.MF
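
As an illustration of the naming scheme above, the following Python sketch
splits a file name into its parts. The function parse_association_name and
the keys it returns are hypothetical and are not part of GeneMerge. The
date-prefix pattern is an assumption based only on the single example above,
and the shared GO.*.use description files are deliberately not handled.

```python
import re

# Hypothetical helper, not part of GeneMerge.
# Assumed layout: [YYYY-MM-DD-<gaf name>-]G_species.TYPE[.use]
NAME_RE = re.compile(
    r"^(?:(?P<date>\d{4}-\d{2}-\d{2})-(?P<gaf>[^-]+)-)?"  # optional GO prefix
    r"(?P<organism>[A-Z]_[A-Za-z]+)\."                    # e.g. H_sapiens
    r"(?P<type>[A-Za-z]+)"                                # e.g. CHR, MF, hBP
    r"(?P<use>\.use)?$"                                   # description file?
)


def parse_association_name(filename):
    """Split a gene-association or description file name into its parts."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized file name: {filename!r}")
    return {
        "date": match.group("date"),          # None when there is no prefix
        "gaf": match.group("gaf"),            # None when there is no prefix
        "organism": match.group("organism"),
        "type": match.group("type"),
        "is_description": match.group("use") is not None,
    }


if __name__ == "__main__":
    for name in ("H_sapiens.CHR", "H_sapiens.CHR.use",
                 "2019-07-01-fb-D_melanogaster.MF"):
        print(name, parse_association_name(name))
```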

________________________________________________________________________________

--- Gene Ontology (GO) Files ---

Each species has two sets of three gene-association files. One set includes 
the most specific GO annotations only for molecular function (MF),
biological process (BP), and cellular component (CC). The other set includes
the full GO hierarchy of annotation for each gene (hMF, hBP, hCC).
Generically, the files are named as follows:

	* G_species.MF - molecular function 
	* G_species.BP - biological process
	* G_species.CC - cellular component

	* G_species.hMF - molecular function (full GO hierarchy)
	* G_species.hBP - biological process (full GO hierarchy)
	* G_species.hCC - cellular component (full GO hierarchy)

	For multi-species gene-association data, an X in place of the genus
	initial marks a 'cross-species' gene-association file (X_Genus.MF etc.).

All GO species use the corresponding description files named:

	* GO.MF.use, GO.BP.use, and GO.CC.use. 
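
The generic GO names above can be enumerated with a short Python sketch.
The function go_file_names is hypothetical and not part of GeneMerge. It
assumes that the full-hierarchy files (hMF, hBP, hCC) share the description
file of the matching aspect, and it omits the date and gaf-name prefix that
the packaged GO files carry.

```python
# Hypothetical helper, not part of GeneMerge.
GO_ASPECTS = ("MF", "BP", "CC")


def go_file_names(organism):
    """Return (association file, description file) pairs for one organism.

    The first three pairs are the most-specific sets, the last three the
    full-hierarchy sets. Pairing hMF/hBP/hCC with GO.MF.use, GO.BP.use and
    GO.CC.use is an assumption.
    """
    pairs = []
    for prefix in ("", "h"):
        for aspect in GO_ASPECTS:
            pairs.append((f"{organism}.{prefix}{aspect}", f"GO.{aspect}.use"))
    return pairs


if __name__ == "__main__":
    for association, description in go_file_names("D_melanogaster"):
        print(association, description)
```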

_______________________________________________________________________________

--- Chromosome Location Files ---
	
	Drosophila melanogaster
	* D_melanogaster.CHARM - chromosome arm location
	* D_melanogaster.CHR   - chromosome location
	* D_melanogaster.CYTO  - cytogenetic map location

	Drosophila species (12 genomes+) 
	* X_Drosophila.CHARM - chromosome arm location
	* X_Drosophila.CHR   - chromosome location

	Homo sapiens
	* H_sapiens.CHR   - chromosome location
	* H_sapiens.CYTO  - cytogenetic map location
	* H_sapiens.CYTOD - detailed cytogenetic map location

	Mus musculus
	* M_musculus.CHR - chromosome location

	Saccharomyces cerevisiae
	* S_cerevisiae.CHR - chromosome location

Corresponding description files use the names above with '.use' appended.

_______________________________________________________________________________

###########################
# Data Source and Details #
###########################

## Gene Ontology (GO) Annotations ##

--- All GO species and multi-species collections ---

Name:	GO molecular function (.MF) - most specific annotations only
Name:	GO biological process (.BP) - most specific annotations only
Name:	GO cellular component (.CC) - most specific annotations only

Name:	GO molecular function (.hMF) - higher order (all parent terms)
Name:	GO biological process (.hBP) - higher order (all parent terms)
Name:	GO cellular component (.hCC) - higher order (all parent terms)

Source:	Gene Ontology Consortium
Date:	updated monthly
Notes:	Each file is prefixed with the date found in each .gaf gene
	association file as well as the originating gaf filename that
	provides the database source and type.


## Chromosome Location ##

--- D. melanogaster ---

Name:	Chromosome Location (.CHR)
Source:	FlyBase
Date:	12/28/2014
File:	gene_map_table_fb_2014_06.tsv
Notes:	2, 3, 4, X, Y, mitochondrion, UN. Arm and other labels removed.
	Scaffold and other unmapped genes are labelled 'unknown'.

Name:	Chromosome Arm Location (.CHARM)
Source:	FlyBase
Date:	12/28/2014
File:	gene_map_table_fb_2014_06.tsv
Notes:	2L, 2R, 3L, 3R, 4, X, Y, mitochondrion, UN. Scaffold and other
	unmapped genes are labelled 'unknown'.

Name:	Cytogenetic Map Location (.CYTO)
Source:	FlyBase
Date:	12/28/2014
File:	gene_map_table_fb_2014_06.tsv
Notes:	Cytological range designations, e.g. '91A3-91A3'
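
The relationship between the .CHARM and .CHR labels listed above can be
sketched in Python. The function arm_to_chromosome is hypothetical and is
not the script used to build these files. It assumes that an arm label is
chromosome 2 or 3 followed by L or R, and it passes every other label
through unchanged.

```python
import re

# Hypothetical helper, not part of GeneMerge.
# Assumed rule: an arm label is chromosome 2 or 3 followed by L or R.
ARM_RE = re.compile(r"^([23])[LR]$")


def arm_to_chromosome(label):
    """Collapse an arm label such as '2L' to its chromosome, '2'."""
    match = ARM_RE.match(label)
    return match.group(1) if match else label


if __name__ == "__main__":
    for label in ("2L", "2R", "3L", "3R", "4", "X", "Y", "mitochondrion"):
        print(label, "->", arm_to_chromosome(label))
```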


--- Drosophila species (12 genomes+) ---

Name:	Chromosome Location (.CHR)
Source:	FlyBase
Date:	12/28/2014
File:	gene_map_table_fb_2014_06.tsv
Notes:	2, 3, 4, X, Y, mitochondrion, UN. Arm and other labels removed.
	Scaffold and other unmapped genes are labelled 'unknown'.

Name:	Chromosome Arm (.CHARM)
Source:	FlyBase
Date:	12/28/2014
File:	gene_map_table_fb_2014_06.tsv
Notes:	2L, 2R, 3L, 3R, 4, X, Y, mitochondrion, UN, and chromosome
	subgroups, e.g., 'XL_group1a'. All details from the table are
	preserved, including scaffold and supercontig IDs for unmapped
	genes, e.g., 'scaffold_10027', 'chr2h_Mrandom_022'.


--- H. sapiens ---

Name:	Chromosome Location (.CHR)
Source:	HUGO
Date:	1/15/2015
File:	BioMart Query
Notes:	1, 2, 3, 4..., X, Y, mitochondrion

Name:	Cytogenetic Location (.CYTO)
Source:	HUGO
Date:	1/15/2015
File:	BioMart Query
Notes:	Xq22, 11q13, etc. Total of 355 regions. For genes with a range,
	only the first location is used (1740 of 19,828 genes).

Name:	Detailed Cytogenetic Location (.CYTOD)
Source:	HUGO
Date:	1/15/2015
File:	BioMart Query
Notes:	Xq22.1, 11q13.1, etc. Total of 2109 regions.


--- M. musculus ---

Name:	Chromosome Location (.CHR)
Source:	MGI
Date:	12/27/2014
File:	MRK_List1.rpt
Notes:	1, 2, 3... 19, X, Y, XY, MT, UN. Includes annotations for
	withdrawn marker symbols.


--- S. cerevisiae ---

Name:	Chromosome Location (.CHR)
Source:	SGD
Date:	1/15/2015
File:	SGD_features.tab
Notes:	chromosome 1, 2, 3, 4... 17. 



