Speed up CnaEvent lookup during import #25
sheridancbio
commented
Mar 15, 2024
- store existing events from the database in a HashMap instead of HashSet
- retrieval from HashMap uses CnaEvent.Event equals() and hashCode() semantics
- retrieval is necessary in order to obtain the associated event_id from the db record
- this avoids a linear search through the set of all CnaEvent.Events in the database
This is an in-progress attempt to implement an efficient lookup of existing CnaEvent.Event references, as intended by #1. Instead of storing the CnaEvent.Events in a Java Set, this PR stores them in a Map that maps each event to itself. This allows retrieval with HashMap.get(), which is an expected constant-time hash lookup rather than a linear search. For that to work correctly, the equals() and hashCode() functions of CnaEvent.Event must depend only on the gene and alteration fields (not event_id); they were already overridden that way previously. The previous Map was demoted to a Set in this PR: cBioPortal/cbioportal#9847
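A minimal sketch of the self-mapping idea described above, not the importer's actual code. `EventSketch` and `EventLookupSketch` are hypothetical stand-ins for CnaEvent.Event and the importer's cache, and the field names and types (`entrezGeneId`, `alteration` as a `short`) are assumptions made only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for CnaEvent.Event; field names and types are assumptions.
final class EventSketch {
    final long eventId;       // assigned by the database; 0 when not yet known
    final long entrezGeneId;  // stands in for the gene field
    final short alteration;   // stands in for the alteration field

    EventSketch(long eventId, long entrezGeneId, short alteration) {
        this.eventId = eventId;
        this.entrezGeneId = entrezGeneId;
        this.alteration = alteration;
    }

    // Equality deliberately ignores eventId: only gene and alteration count.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof EventSketch)) return false;
        EventSketch other = (EventSketch) o;
        return entrezGeneId == other.entrezGeneId && alteration == other.alteration;
    }

    // Must hash exactly the fields that equals() compares.
    @Override
    public int hashCode() {
        return Objects.hash(entrezGeneId, alteration);
    }
}

// Hypothetical cache of events already present in the database.
final class EventLookupSketch {
    // Each event maps to itself so that get() returns the stored instance,
    // which carries the event_id from the database record.
    private final Map<EventSketch, EventSketch> existing = new HashMap<>();

    void addExisting(EventSketch event) {
        existing.put(event, event);
    }

    // Returns the stored event_id, or -1 if no matching event is cached.
    long findEventId(long entrezGeneId, short alteration) {
        EventSketch probe = new EventSketch(0L, entrezGeneId, alteration);
        EventSketch stored = existing.get(probe);
        return stored == null ? -1L : stored.eventId;
    }
}
```

A Set can only answer whether an equal element is present. It cannot return the stored instance without iterating, which is why the sketch keys a Map on the event itself.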
Note: the validity of using the HashMap to retrieve event_id-populated objects via a Map.get(key) argument that lacks an event_id was tested. However, a built importer with these code changes failed to obtain the event_id values as expected. Some additional debugging is needed; the problem is probably related to the equals() and hashCode() functions of the other types contained inside the CnaEvent.Event type.
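The following sketch illustrates one way the suspected cause could show up. It is not a diagnosis of the actual importer failure. All class names here (`GeneNoEquals`, `GeneWithEquals`, `NestedEvent`, `NestedEqualsDemo`) are hypothetical. The point is only that if an event's equals() and hashCode() delegate to a contained type that falls back to identity semantics, a probe built from a fresh instance of that type will miss.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical nested type that does NOT override equals()/hashCode(),
// so it inherits identity-based semantics from Object.
final class GeneNoEquals {
    final long entrezGeneId;
    GeneNoEquals(long entrezGeneId) { this.entrezGeneId = entrezGeneId; }
}

// Hypothetical nested type with value-based equals()/hashCode().
final class GeneWithEquals {
    final long entrezGeneId;
    GeneWithEquals(long entrezGeneId) { this.entrezGeneId = entrezGeneId; }

    @Override
    public boolean equals(Object o) {
        return o instanceof GeneWithEquals
                && ((GeneWithEquals) o).entrezGeneId == entrezGeneId;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(entrezGeneId);
    }
}

// Hypothetical event whose equality delegates to the nested gene type.
final class NestedEvent<G> {
    final long eventId;
    final G gene;
    final int alteration;

    NestedEvent(long eventId, G gene, int alteration) {
        this.eventId = eventId;
        this.gene = gene;
        this.alteration = alteration;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NestedEvent)) return false;
        NestedEvent<?> other = (NestedEvent<?>) o;
        return Objects.equals(gene, other.gene) && alteration == other.alteration;
    }

    @Override
    public int hashCode() {
        return Objects.hash(gene, alteration);
    }
}

final class NestedEqualsDemo {
    // Stores one event with event_id 42, then probes with a separately
    // constructed gene instance and no event_id. Returns true on a hit.
    static <G> boolean probeFinds(G storedGene, G probeGene) {
        Map<NestedEvent<G>, NestedEvent<G>> existing = new HashMap<>();
        NestedEvent<G> stored = new NestedEvent<>(42L, storedGene, -2);
        existing.put(stored, stored);
        NestedEvent<G> hit = existing.get(new NestedEvent<>(0L, probeGene, -2));
        return hit != null && hit.eventId == 42L;
    }
}
```

Under these assumptions, `probeFinds` misses for two `GeneNoEquals` instances with the same ID and hits for two `GeneWithEquals` instances. Checking each type nested inside CnaEvent.Event for value-based equals() and hashCode() would be one place to start debugging.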