Socorro:SOLR API
Writing Solr Queries
Solr Admin page available at: http://cm-hadoop24.mozilla.org:8983/solr/admin
Solr Admin Schema page available at: http://cm-hadoop24.mozilla.org:8983/solr/admin/schema.jsp
Rules
- Strings must be URL-encoded according to RFC 1738
- Date/timestamps must adhere to ISO 8601
Values
- branches - n/a ... expected to work ... q=branch:1.9.2
- build_id - q=build:20100722155716
- date_end - q=client_crash_date:[2010-09-13T09:33:00Z+TO+2010-09-13T10:33:00Z]
- date_start - q=client_crash_date:[2010-09-13T09:33:00Z+TO+2010-09-13T10:33:00Z]
- domain - n/a ... expected to work ... q=url:*gmail*
- limit - rows=100
- offset - start=0
- ooid - q=ooid:010081800002baa-2526-4545-b575-3d3b12100818
- os_names - q=os_name:windows
- os_versions - q=os_version:5.1.2600
- plugin_filename - n/a
- plugin_name - n/a
- report_process - n/a
- report_type - n/a
- signature - q=signature:flash
- products - q=product:thunderbird
- versions - q=version:3.6.8
- url - n/a ... expected to work ... q=url:http%3A%2F%2Fwww.gmail.com%
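The mapping above can be sketched in Python as a small translation step in the middleware. This is only an illustration under assumptions: the name `build_params` and the `FIELD_MAP` table are hypothetical, only fields with a working example above are mapped, and joining multiple filters with AND is an assumed default rather than documented behaviour.

```python
from urllib.parse import urlencode

# Search-parameter name -> Solr field name, taken from the examples listed above.
# Parameters marked "n/a" or "expected to work" are deliberately left out.
FIELD_MAP = {
    "build_id": "build",
    "ooid": "ooid",
    "os_names": "os_name",
    "os_versions": "os_version",
    "signature": "signature",
    "products": "product",
    "versions": "version",
}
DATE_FIELD = "client_crash_date"


def build_params(search, limit=100, offset=0):
    """Translate a dict of search parameters into Solr request parameters."""
    clauses = [
        f"{FIELD_MAP[name]}:{value}"
        for name, value in search.items()
        if name in FIELD_MAP
    ]
    # date_start and date_end share one range clause on client_crash_date.
    if "date_start" in search and "date_end" in search:
        clauses.append(
            f"{DATE_FIELD}:[{search['date_start']} TO {search['date_end']}]"
        )
    # "*:*" is Solr's match-all query, used when no filter was given.
    # Joining clauses with AND is an assumption made for this sketch.
    query = " AND ".join(clauses) if clauses else "*:*"
    return {"q": query, "rows": limit, "start": offset}


params = build_params({"products": "thunderbird", "versions": "3.6.8"}, limit=10)
# urlencode() percent-encodes reserved characters and turns spaces into "+".
print(urlencode(params))
```

Keeping the query as a plain string until the last step means the URL-encoding rule is applied once, by `urlencode`, instead of by hand in each caller.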
Notes
Use facet.field=os_name to get a count for each OS
Use facet.field=os_version to get a count for each OS version
Use &wt=json to return the results as JSON; Solr returns XML by default
Use AND/OR to query for more than one value in a specific field.
Use NOT to exclude a value from a specific field.
Use parentheses to query for more than one value in more than one field.
Use * as a wildcard, similar to a SQL LIKE match.
Use brackets to prepare date ranges:
- http://cm-hadoop24.mozilla.org:8983/solr/select/?q=client_crash_date:[2010-09-13T09:33:00Z+TO+2010-09-13T10:33:00Z]
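The operators in these notes could be composed with a few string helpers before the request URL is built. The following is a minimal sketch, not the middleware's actual code: all function names are hypothetical, the `facet=true` switch is the standard Solr parameter that enables faceting alongside `facet.field`, and whether leading wildcards such as `*gmail*` are accepted depends on the Solr configuration.

```python
from urllib.parse import urlencode

# Select endpoint taken from the example URL above.
SOLR_SELECT = "http://cm-hadoop24.mozilla.org:8983/solr/select/"


def any_of(field, values):
    """OR together several values for one field, grouped with parentheses."""
    return f"{field}:({' OR '.join(values)})"


def exclude(field, value):
    """Build a clause that removes one value of a field from the results."""
    return f"NOT {field}:{value}"


def contains(field, fragment):
    """Wildcard match on both sides of a fragment, e.g. url:*gmail*."""
    return f"{field}:*{fragment}*"


def date_range(field, start, end):
    """Inclusive range between two ISO 8601 timestamps."""
    return f"{field}:[{start} TO {end}]"


def select_url(query, facet_fields=(), rows=100, start=0):
    """Return a select URL that asks for JSON and optional facet counts."""
    params = [("q", query), ("rows", rows), ("start", start), ("wt", "json")]
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", name) for name in facet_fields)
    return SOLR_SELECT + "?" + urlencode(params)


query = " AND ".join([
    any_of("product", ["firefox", "thunderbird"]),
    exclude("os_name", "linux"),
    date_range("client_crash_date", "2010-09-13T09:33:00Z", "2010-09-13T10:33:00Z"),
])
print(select_url(query, facet_fields=["os_name", "os_version"]))
```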
Python APIs
The following APIs and calls will need to be provided by the Python middleware layer. The new names for these calls should reflect their functionality.
Bugzilla
- bug.php - getBugsForSignatures()
Crash
- common.php - getCommentsByParams()
- common.php - queryReports()
- Combine with common.php - totalNumberReports()
- extension.php - getExtensionsForReport()
- report.php - getPairedUUID()
- report.php - getAllPairedUUIDByUUid()
Server Status
Top Crashers
- common.php - queryTopSignatures()
- Combine with common.php - queryFrequency()
- topcrashersbyurl.php - getTopCrashersByUrl()
- topcrashersbyurl.php - getTopCrashersByDomain()
- topcrashersbyurl.php - getTopCrashersByTopsiteRank()
- topcrashersbyurl.php - getUrlsByDomain()
- topcrashersbyurl.php - getSignaturesByUrl()
- topcrashers.php - getTopCrashersByBranch()
- topcrashers.php - getTopCrashersByVersion()
- topcrashers.php - ooppForSignatures()
- topcrashers.php - formatTopcrasherVersions()
Socorro UI Methods
This is a list of data accessed by the webapp that seems better suited to retrieval through a SOLR query than through a SQL query.
Bugzilla Associations
bug.php
- bugsForSignatures
- Since bugs change constantly and Socorro needs to keep up to date with them, it would be easy for us to keep a table, keyed by bug_id, that contains the bug data relevant to Socorro with a link to the signature(s). Once that table is indexed, a SOLR query that specifies a list of signature strings could return the list of bugs associated with those signatures.
signature:+(Hello_world OR Fubar)
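A query of this shape could be generated from a list of signatures as sketched below. This is an assumption-laden illustration: `signatures_query` and `quote_term` are hypothetical names, the sketch omits the `+` prefix, and it wraps each signature in double quotes because real signatures often contain characters that the Lucene syntax treats as operators. Whether a quoted signature matches depends on how the signature field is analyzed in the index.

```python
def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def signatures_query(signatures):
    """Build one OR clause over the signature field for a list of signatures."""
    if not signatures:
        raise ValueError("at least one signature is required")
    joined = " OR ".join(quote_term(s) for s in signatures)
    return f"signature:({joined})"


print(signatures_query(["Hello_world", "Fubar"]))
```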
common.php
- getCommentsByParams
- Comments are a field contained in the crash report record, so given a list of crash ids or a signature or any other criteria that can retrieve crash reports, this data can easily be returned through a SOLR query. Further, it would be possible to do SOLR searches for specific comment terms.
comment:~suck
- queryTopSignatures
- I believe this query can be serviced by the Correlation API that Xavier has been working on. In the worst case, if we have a SOLR query that filters for the appropriate conditions (i.e. platform, version etc.), it can return the signature field for every report matching those conditions. We can then count the occurrences of every signature and return the top N.
- totalNumberReports
- This is the result set size of the desired criteria.
- queryReports
- The building block query. Can give plenty of examples of SOLR usage, but here is the link to Lucene syntax (which SOLR is based on): Lucene Query Syntax
- queryFrequency
Get a list of crash signatures by any number of search parameters including:
- 1 or more products
- 1 or more product versions
- 1 or more operating systems
- 1 or more branches
- start timestamp
- end timestamp
- stack signature
- build id
- report process (any, browser only, plugin only)
- report type (any, crash, hang)
- plugin name
- plugin filename
Order by number of crashes per signature. Include in the results for each crash:
- number of crashes per signature
- signature
- plugin filename
- number of crashes per each O/S platform (All, Win, Mac, Linux)
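The result-set size used by totalNumberReports, the counting fallback described for queryTopSignatures, and the per-platform counts listed for queryFrequency could all be read from a Solr JSON response roughly as follows. This is a sketch that assumes the request used wt=json and that facet counts arrive in Solr's default flat layout (alternating value and count). The function names are hypothetical and the sample response is made up for illustration.

```python
from collections import Counter


def total_reports(response):
    """Size of the full result set, independent of rows/start paging."""
    return response["response"]["numFound"]


def facet_counts(response, field):
    """Turn the flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


def top_signatures(response, n=10):
    """Count signatures among the returned documents and keep the top n."""
    counts = Counter(
        doc["signature"]
        for doc in response["response"]["docs"]
        if "signature" in doc
    )
    return counts.most_common(n)


# Made-up response, shaped like Solr's JSON output, for illustration only.
sample = {
    "response": {
        "numFound": 3,
        "start": 0,
        "docs": [
            {"signature": "flash", "os_name": "windows"},
            {"signature": "flash", "os_name": "mac"},
            {"signature": "js_gc", "os_name": "windows"},
        ],
    },
    "facet_counts": {"facet_fields": {"os_name": ["windows", 2, "mac", 1]}},
}

print(total_reports(sample))
print(facet_counts(sample, "os_name"))
print(top_signatures(sample, n=1))
```

Counting in the client only covers the documents actually returned, so for large result sets a facet on the signature field would likely be the cheaper way to get the same numbers.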
extension.php
- getExtensionsForReport
- This is just a simple request for the extensions field of the report in HBase. Middleware layer only, no SOLR needed.
report.php
- getPairedUUID
- Use the middleware layer (mwl) to retrieve the hang record via its hang id, then filter for the desired UUID
Lorentz crashes come in pairs. They are matched via OOIDs. This query is used to find the OOID for a crash report that is paired with the provided OOID.
- getAllPairedUUIDByUUid
- Same as above but don't filter.
Lorentz crashes come in pairs. They are matched via OOIDs. If a crash report is resubmitted, it's possible to have more than 2 crash reports per OOID. This variation of the prior query will retrieve all of the OOIDs for crash reports that are paired with the provided OOID.
server_status.php
Working on this in bug 579575
topcrashersbyurl.php
- getTopCrashersByUrl
Get all of the top crashing signatures that are associated with a particular URL.
- getTopCrashersByDomain
Get the domains that are associated with the highest number of crashes, ordered by the number of crashes.
- getTopCrashersByTopsiteRank
Get the domains that are associated with the highest number of crashes, ordered by the number of crashes. Only display the domains that are found within the top 1000 sites of Alexa's topsite rankings.
The Alexa Topsite rankings are currently pulled once per week and placed in the alexa_topsites table.
- getUrlsByDomain
Get the urls that are associated with the highest number of crashes, grouped by domain name, and ordered by the number of crashes.
- getSignaturesByUrl
Get all of the signatures that are associated with a particular URL.
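The domain-level rankings above amount to grouping crash URLs by host name and counting. The following in-memory sketch shows that grouping under stated assumptions: the function name `crashes_by_domain` is hypothetical, the crash URLs are assumed to have been retrieved already, and the Alexa top sites are passed in as a plain set rather than read from the alexa_topsites table.

```python
from collections import Counter
from urllib.parse import urlparse


def crashes_by_domain(urls, top_sites=None):
    """Count crashes per domain, optionally restricted to a set of top sites.

    Returns (domain, count) pairs ordered by descending count.
    """
    counts = Counter()
    for url in urls:
        domain = urlparse(url).hostname
        if not domain:
            # Skip values without a host part, such as about:blank.
            continue
        if top_sites is not None and domain not in top_sites:
            continue
        counts[domain] += 1
    return counts.most_common()


urls = [
    "http://a.example/1",
    "http://a.example/2",
    "http://b.example/",
    "about:blank",
]
print(crashes_by_domain(urls))
print(crashes_by_domain(urls, top_sites={"b.example"}))
```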
topcrashers.php
- lastUpdatedByBranch
Get the time (window_end) when the top_crashes_by_signature table was last updated for a specific branch.
- lastUpdatedByVersion
Get the time (window_end) when the top_crashes_by_signature table was last updated for a specific product and product version.
- getTopCrashersByBranch
Get the top crashing signatures from the top_crashes_by_signature table for a specific branch, between a start timestamp and an end timestamp. Order the results by signatures that are associated with the most crashes.
- getTopCrashersByVersion
Get the top crashing signatures from the top_crashes_by_signature table for a specific product and product_version, between a start timestamp and an end timestamp. Order the results by signatures that are associated with the most crashes.
- ooppForSignatures
Get meta information for the crash reports for a specific product and product version between a start timestamp and end timestamp. The meta information obtained is the type of crash (hang, or not a hang) and type of process (plugin or browser).
DEPRECATED
job.php
- getByUUID
- Don't know purpose.
mtbf.php
DEPRECATED - no need to implement this
- getMtbfOf
- Looks like this is just math on the uptime field of reports matching particular criteria. If that is the case, this should be fairly simple?
- listReports
- Seems out of place. Purpose?
priorityjobs.php
I believe most of this functionality is now deprecated? [laura: done for 1.8 by ryan in bug 584136]
report.php
- getByUUID
- Simple middleware layer (mwl) retrieval
- sig_exists
- Simple mwl retrieval
topcrashers
- listReports
- Does this query belong here?
- getTotalCrashesByVersion
- SOLR query to filter on product+version
- getTotalCrashesByBranch