Socorro:PyServe
From MozillaWiki
HBase Thrift integration
The Socorro:HBase cluster runs Thrift servers that let remote clients communicate with HBase from several languages. The Socorro PyServe middleware talks to this Thrift service through a client wrapper we created, hbaseClient.py.
Thrift client API Refactoring/Cleanup
I propose the following changes associated with bug 565962:
- Split hbaseClient.py into two modules. The second module would be called socorroHBaseClient.py and would contain all the additional Socorro-specific methods.
- Move generic hbaseClient.py into third-party
- Review HBase API endpoints
- Delete any methods that are not useful
- Add any non-existent methods that would be useful
- Rename current methods as determined appropriate by Socorro devs.
- Change descriptions where Socorro devs need clarification (the descriptions below are the current Python method documentation strings)
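The proposed split could look roughly like the sketch below. It is only an illustration of the layering: the class names `HBaseConnection` and `SocorroHBaseConnection`, the `put_row`/`get_row` helpers, the in-memory dict standing in for the Thrift client, and the table and column names are all assumptions made for the example, not the actual hbaseClient.py code.

```python
import json


class HBaseConnection(object):
    """Generic layer (would stay in hbaseClient.py); nothing Socorro-specific.

    A plain dict stands in for the real Thrift client so that the sketch
    runs without a cluster.
    """

    def __init__(self):
        # Hypothetical stand-in storage: {table: {row_key: {column: value}}}
        self._tables = {}

    def put_row(self, table, row_key, columns):
        self._tables.setdefault(table, {}).setdefault(row_key, {}).update(columns)

    def get_row(self, table, row_key):
        return self._tables.get(table, {}).get(row_key, {})


class SocorroHBaseConnection(HBaseConnection):
    """Socorro-specific layer (would move to socorroHBaseClient.py)."""

    # Table and column names here are assumptions for illustration only.
    def put_processed_json(self, ooid, processed_json):
        self.put_row('crash_reports', ooid,
                     {'processed_data:json': json.dumps(processed_json)})

    def get_processed_json_as_string(self, ooid):
        # Mirrors the documented behaviour: empty string for an unknown ooid.
        return self.get_row('crash_reports', ooid).get('processed_data:json', '')
```

With this layering, the generic class has no knowledge of ooids or crash reports, so it could be moved into third-party without dragging Socorro code along.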
Key HBase API endpoints
- put_json_dump(self, ooid, json_data, dump, add_to_unprocessed_queue = True)
Create a crash report record in HBase from serialized JSON and the bytes of the minidump
- put_processed_json(self,ooid,processed_json)
Create a crash report from the cooked json output of the processor
- get_json_meta_as_string(self,ooid)
Return the json metadata for a given ooid as an unexpanded string. If the ooid doesn't exist, return an empty string.
- get_json_meta(self,ooid)
Return the json metadata for a given ooid as a json data object
- get_dump(self,ooid)
Return the minidump for a given ooid as a string of bytes. If the ooid doesn't exist, return an empty string. XXX: Do we want a different return?
- get_processed_json_as_string(self,ooid)
Return the cooked json (jsonz) for a given ooid as a string. If the ooid doesn't exist, return an empty string.
- Previously known as jsonz, but that name should be deprecated since the data is no longer stored as a gzip file
- get_processed_json(self,ooid)
Return the cooked json (jsonz) for a given ooid as a json object. If the ooid doesn't exist, return an empty string.
- get_raw_report(self,ooid)
Return the json and dump for a given ooid. If the ooid doesn't exist, return an empty array.
- saves a separate request to the cluster
- Is this useful? Might be candidate for deletion
- get_report_processing_state(self,ooid)
Return the current state of processing for this report and the submitted_timestamp needed for processing queue manipulation. If the ooid doesn't exist, return an empty array.
- union_scan_with_prefix(self,table,prefix,columns)
A lazy chain of iterators that yields unordered rows starting with a given prefix. The implementation opens up 16 scanners (one for each leading hex character of the salt) one at a time and returns all of the rows matching the prefix.
- merge_scan_with_prefix(self,table,prefix,columns)
A generator-based iterator that yields totally ordered rows starting with a given prefix. The implementation opens up 16 scanners (one for each leading hex character of the salt) simultaneously and then yields the next row in order from the pool on each iteration.
- limited_iteration(self,iterable,limit=10**6)
- No description
- iterator_for_all_legacy_to_be_processed(self)
- No description
- This is the special iterator used by the monitor to gather ooids to be processed and remove them from the HBase unprocessed queue
- acknowledge_ooid_as_legacy_priority_job(self,ooid)
- No description
- If the ooid exists in the unprocessed queue, remove it because it will be processed as a priority job.
- delete_from_legacy_processing_index(self,index_row_key)
- No description
- Deletes from the unprocessed queue and decrements the current queue size
- put_crash_report_indices(self,ooid,timestamp,indices)
- No description
- Adds an ooid to the given set of index tables, prefixed with the timestamp to allow time-range-based iteration
- put_crash_report_hang_indices(self,ooid,hang_id,process_type,timestamp)
- No description
- Adds a hangID to the hang-specific index tables to allow lookup of hang pairs
- update_metrics_counters_for_submit(self,submitted_timestamp,legacy_processing,process_type,is_hang,add_to_unprocessed_queue)
Increments a series of counters in the 'metrics' table related to crash report submission
- put_json_dump_from_files(self,ooid,json_path,dump_path,openFn=open)
Convenience method for creating a crash report record for an ooid from JSON and dump files on disk
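The difference between union_scan_with_prefix and merge_scan_with_prefix described above can be illustrated with plain Python iterators. This is a sketch under stated assumptions, not the hbaseClient.py implementation: `fake_scanner`, the sample `ROWS` data, the simplified function signatures, and the assumption that each row key is a single hex salt character followed by the key that the prefix is matched against are all hypothetical stand-ins for real Thrift scanners.

```python
import heapq
import itertools

SALTS = '0123456789abcdef'  # one scanner per leading hex character

# Hypothetical sample data: salted row key -> row value.
ROWS = {
    'a100607-ooid1': 'row1',
    '3100607-ooid2': 'row2',
    'f100608-ooid3': 'row3',
    '3100609-ooid4': 'row4',
}


def fake_scanner(salt, prefix):
    """Stand-in for one HBase scanner: yields (unsalted_key, value) in key order."""
    start = salt + prefix
    for key in sorted(ROWS):
        if key.startswith(start):
            yield key[1:], ROWS[key]


def union_scan_with_prefix(prefix):
    """Drain the 16 scanners one after another; rows are not globally ordered."""
    return itertools.chain.from_iterable(
        fake_scanner(salt, prefix) for salt in SALTS)


def merge_scan_with_prefix(prefix):
    """Hold all 16 scanners at once and always yield the smallest pending key."""
    return heapq.merge(*[fake_scanner(salt, prefix) for salt in SALTS])


union_keys = [key for key, _ in union_scan_with_prefix('1006')]
merge_keys = [key for key, _ in merge_scan_with_prefix('1006')]
```

In this sketch both scans return the same four rows; the union scan groups them by salt, while the merge scan returns them sorted by unsalted key at the cost of holding 16 scanners at the same time.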