merge_edges.py: make a better key

Use a combination of the target gene ID and the TF gene ID as the key. For example, given the following:

    Target: AT5G09445 AT5G09445
    TF:     AT1G53910 RAP2.12

the key is now "AT5G09445_AT1G53910". Before, it was "AT5G09445 AT5G09445 AT1G53910 RAP2.12". The old key is OK in most cases, as long as a gene ID's corresponding gene name is consistent. But if "AT1G53910" appears with a different gene name, we get a DIFFERENT key for the same edge, which is not what we want.
author: Hui Lan <lanhui@zjnu.edu.cn> 2020-02-11 18:03:52 +0800
committer: Hui Lan <lanhui@zjnu.edu.cn> 2020-02-11 18:03:52 +0800
commit: cc6858ae8e1a3eb24f68047ec4009b0bb9ab10ad (patch)
tree: 89562dbe9a7c1180939ec6ee3e04336f23d77230 /Code
parent: 02685414bd26fc446eb08152506f34ff2a642dfe (diff)
1 files changed, 1 insertions, 1 deletions
diff --git a/Code/merge_edges.py b/Code/merge_edges.py
index 84535d7..62a958a 100644
--- a/Code/merge_edges.py
+++ b/Code/merge_edges.py
@@ -160,7 +160,7 @@ for fname in sorted(glob.glob(os.path.join(EDGE_POOL_DIR, 'edges*.*'))):
             strength = lst[8]
             method_or_tissue = lst[9]
 
-            key = target + tf
+            key = target.split()[0] + '_' + tf.split()[0] # target or tf has two fields, Gene ID and Gene Name, split()[0] means using Gene ID only.
             t = (target, tf, score, type_of_score, rids, cids, ll, date, strength, method_or_tissue)
 	
             if not key in d:
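
The effect of the new key, as described in the commit message, can be sketched in isolation. This is a minimal illustration, not code from merge_edges.py: the helper names make_key and old_key are hypothetical, and "ALT_NAME" is a made-up gene name standing in for any inconsistent name attached to the same gene ID.

```python
def make_key(target: str, tf: str) -> str:
    """New key: first whitespace-separated field (the gene ID) of each side."""
    return target.split()[0] + '_' + tf.split()[0]


def old_key(target: str, tf: str) -> str:
    """Old key: plain concatenation, so the gene names are part of the key."""
    return target + tf


target = "AT5G09445 AT5G09445"
tf_a = "AT1G53910 RAP2.12"
tf_b = "AT1G53910 ALT_NAME"  # hypothetical: same gene ID, different gene name

# The new key ignores gene names, so both spellings map to one entry.
print(make_key(target, tf_a))
print(make_key(target, tf_a) == make_key(target, tf_b))

# The old key differs as soon as the gene name differs.
print(old_key(target, tf_a) == old_key(target, tf_b))
```

Note that split()[0] assumes each field is non-empty and starts with the gene ID; an empty target or tf string would raise an IndexError.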