path: root/Code/merge_edges.py
Age        Commit message        Author
2020-05-04  merge_edges.py: Remove 'May take a while.' Actually, after the insertion speed improvement, writing one million rows to a sqlite db table is really fast.  (Hui Lan)
2020-05-04  merge_edges.py: Remove colon after [merge_edges.py].  (Hui Lan)
2020-05-04  merge_edges.py: Rename make_new_edge2 by removing the trailing 2.  (Hui Lan)
2020-05-04  merge_edges.py: Make sqlite inserts really fast: commit once at the end instead of committing after each insertion.  (Hui Lan)
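A minimal sketch of the batched-commit idea in that entry, assuming a table named edge with two text columns (the actual schema in merge_edges.py may differ):

    import sqlite3

    conn = sqlite3.connect('edges.sqlite')
    cur = conn.cursor()
    cur.execute('CREATE TABLE IF NOT EXISTS edge (geneid TEXT, value TEXT)')

    rows = [('AT1G20910_AT1G30100', 'r=0.80'), ('AT5G09445_AT1G53910', 'r=0.52')]
    for row in rows:
        cur.execute('INSERT INTO edge VALUES (?, ?)', row)  # queue the insert; no commit here

    conn.commit()  # a single commit at the end avoids one transaction per row
    conn.close()

With one transaction instead of a million, SQLite no longer has to flush to disk after every row, which is what makes the million-row load fast.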
2020-05-04  merge_edges.py: String does not have an isnumeric method.  (Hui Lan)
2020-02-15  merge_edges.py: Remove the old function make_new_edge().  (Hui Lan)
2020-02-15  merge_edges.py: db_fname is now included in the printed output.  (Hui Lan)
2020-02-15  merge_edges.py: An integer in a list prevents the join method from working.  (Hui Lan)
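For context, str.join raises a TypeError when the sequence contains a non-string element; converting every element to str first is the usual fix (the values below are illustrative):

    fields = ['AT5G09445', 'AT1G53910', 3]
    # '\t'.join(fields) would raise TypeError: sequence item 2: expected str instance, int found
    line = '\t'.join(str(x) for x in fields)
    print(line)  # the integer is now safely converted and joined with the other fields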
2020-02-15  merge_edges.py: A more memory-efficient method to compute an edge's net strength.  (Hui Lan)
Compute an edge's strength on the fly instead of saving everything and then computing the net strength. The new function make_new_edge2 will replace make_new_edge.
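A sketch of the on-the-fly idea behind make_new_edge2, keeping only running aggregates instead of per-edge lists. The field layout and the exact definition of net strength (a simple average here) are assumptions:

    from collections import defaultdict

    total = defaultdict(float)  # running sum of strengths per edge key
    count = defaultdict(int)    # number of lines seen per edge key

    def add_edge_line(key, strength):
        # Update the running aggregates; nothing per-line is retained in memory.
        total[key] += strength
        count[key] += 1

    def net_strength(key):
        return total[key] / count[key] if count[key] else 0.0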
2020-02-15  merge_edges.py: Do not show edge file names in network.log.  (Hui Lan)
2020-02-15  merge_edges.py: datetime.now() does not work. Should be datetime.datetime.now().  (Hui Lan)
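For context: with import datetime, the name datetime is bound to the module, not the class, so the call has to go through datetime.datetime:

    import datetime

    t = datetime.datetime.now()  # works: the datetime class inside the datetime module
    # datetime.now()             # AttributeError: module 'datetime' has no attribute 'now'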
2020-02-14  merge_edges.py: Write edge file names to network.log.  (Hui Lan)
When merging many big edge files, the computer may run out of memory. Record the edge files that have been considered so far, so we can figure out where merging stopped.
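A minimal sketch of that bookkeeping, assuming network.log is a plain append-only text file (the actual log format used by the pipeline may differ):

    def log_edge_file(fname, log_path='network.log'):
        # Record each edge file as soon as it is picked up, so that after an
        # out-of-memory crash the log shows where merging stopped.
        with open(log_path, 'a') as f:
            f.write('Merging edge file %s\n' % fname)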
2020-02-11  merge_edges.py: Use the most recent update date as the merged edge's date.  (Hui Lan)
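A one-line illustration of that rule, assuming the update dates are ISO-formatted strings, which sort chronologically when compared as text:

    dates = ['2019-12-26', '2020-01-21', '2020-02-11']  # dates of the edge lines being merged
    merged_date = max(dates)  # '2020-02-11', the most recent one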
2020-02-11  merge_edges.py: Add a few comments for function make_new_edge.  (Hui Lan)
2020-02-11  merge_edges.py: Log more information in network.log.  (Hui Lan)
2020-02-11  merge_edges.py: Log number of edge files scanned.  (Hui Lan)
2020-02-11  merge_edges.py: What is the purpose of variable d?  (Hui Lan)
2020-02-11  merge_edges.py: Make a better key.  (Hui Lan)
Use a combination of target gene ID and TF gene ID as the key. So if we have the following: Target: AT5G09445 AT5G09445; TF: AT1G53910 RAP2.12, then the key will be "AT5G09445_AT1G53910". Before, it was "AT5G09445 AT5G09445 AT1G53910 RAP2.12". This is OK in most cases, as long as a gene ID's corresponding gene name is consistent. But if "AT1G53910" appears with a different gene name, then we will get a DIFFERENT key, which is not what we want.
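A sketch of the key construction described above, assuming each edge line stores the target and the TF as 'ID name' pairs (the exact field layout in the edge files is not shown here):

    target = 'AT5G09445 AT5G09445'  # gene ID followed by gene name
    tf = 'AT1G53910 RAP2.12'

    key = target.split()[0] + '_' + tf.split()[0]
    print(key)  # AT5G09445_AT1G53910, independent of the gene names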
2020-02-11  merge_edges.py: Why 10?  (Hui Lan)
2020-02-11  merge_edges.py: Consider all files in directory EDGE_POOL whose file name starts with 'edges'.  (Hui Lan)
2020-01-21  merge_edges.py: Write all edge information to an SQLite database file called edges.sqlite.  (Hui Lan)
When I saved a static html page for each edge (e.g., http://118.25.96.118/static/edges/AT1G20910_AT1G30100_0.html), it took 5GB of disk space to store 1 million html pages. Not very disk space efficient. An alternative is to save all edge information in a database table (i.e., edge) and query this table for a particular edge. The database file edges.sqlite takes less than 200MB for 1 million edges, about 10 times less space than the static approach, because the database does not store all the HTML tags. Quite happy about that, though filling the database is a bit slower (roughly 2 hours for 1 million rows). Also updated the two affected files: update_network.py and update_network_by_force.py. Now, instead of copying 1 million static html pages to the Webapp, I just need to copy edges.sqlite to static/edges/. Faster. In the Webapp, I updated start_webapp.py and added templates/edge.html to handle dynamic page generation. -Hui
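A sketch of the dynamic-page idea: a small Flask view that looks up one edge in edges.sqlite and renders templates/edge.html. The route, the table name edge, and the edge_id column are assumptions; the actual code in start_webapp.py may differ:

    import sqlite3
    from flask import Flask, render_template

    app = Flask(__name__)

    @app.route('/edges/<edge_id>')
    def show_edge(edge_id):
        # edge_id looks like 'AT1G20910_AT1G30100_0', as in the old static file names.
        conn = sqlite3.connect('static/edges/edges.sqlite')
        cur = conn.cursor()
        cur.execute('SELECT * FROM edge WHERE edge_id = ?', (edge_id,))
        row = cur.fetchone()
        conn.close()
        return render_template('edge.html', edge=row)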
2019-12-26  merge_edges.py: Save memory by removing the dictionary variable duniq.  (Hui Lan)
The purpose of duniq was to avoid duplicated edge lines. Now, we just make sure we don't insert the same tuple. -Hui
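One way to avoid inserting the same tuple without keeping a large in-memory dictionary like duniq is to let SQLite enforce uniqueness; this is a hedged sketch, not necessarily how merge_edges.py actually does it:

    import sqlite3

    conn = sqlite3.connect('edges.sqlite')
    cur = conn.cursor()
    cur.execute('CREATE TABLE IF NOT EXISTS edge '
                '(geneid TEXT, value TEXT, UNIQUE(geneid, value))')
    cur.execute('INSERT OR IGNORE INTO edge VALUES (?, ?)',
                ('AT5G09445_AT1G53910', 'r=0.52'))  # silently skipped if already present
    conn.commit()
    conn.close()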
2019-12-04  merge_edges.py: Run dos2unix on merge_edges.py to remove ^M characters.  (Hui Lan)
2019-12-04  merge_edges.py: Clean up source code by removing commented-out lines and editing the header comments.  (Hui Lan)
2019-12-04  brain: Add Python and R code to the local repository.  (Hui Lan)