User:Andrew Sutherland/MailNews/GlobalDatabase/Review
bug 450494 is tracking the process of getting gloda onto the trunk.
This page reflects the status of the feedback from the bug as of comment 39. The following comment chains have not been included because, although making for interesting discussion, aren't particularly actionable, but could form the basis of an action-y thing:
Contents
Un-Addressed
Not yet/currently being worked, no notable progress, perhaps some discussion.
bienvenu
gloda suffixtree confusing comments
(bienvenu c26 q2) Some of this code is commented out - does that make it inconsistent with the comment about 'infinity'?
// However, if this state is a leaf node (end == 'infinity'), then 'end' // isn't describing an edge at all and we want to avoid accounting for it. let delta; /* if (state.end != this._infinity) //delta = index - end + 1; delta = end - (index - state.length); else */ delta = index - state.length - end + 1;
dmose
gloda.js :: initMyIdentities
(dmose) (Possibly premature) optimization:
for (let iIdentity=0; iIdentity < numIdentities; iIdentity++) { let msgIdentity = msgAccountManager.allIdentities.GetElementAt(iIdentity) .QueryInterface(Ci.nsIMsgIdentity);
If you cache msgAccountManager.allIdentities in a local variable before starting the for loop, you'll avoid an XPConnect crossing every iteration.
(asuth) XPConnect turns out to be oddly expensive, so this sounds reasonable and easy to do. I could also change-up to using fixIterator perhaps.
Half-Finished Comments
(dmose) (gloda.js) the kAttrDerived doxygen comment is incomplete
Noun Definition
(gloda.js)
(dmose) defineNoun: the doxygen commentary needs a bit of cleanup: I think the nounMeta property names want to be documented semi-separately in their own doxygen block, to give us a place to document the actual defineNoun function signature itself. Also make sure that the property list is complete (eg includes cacheBudget, queryClass, etc). Also, the parameter ordering in the function signature itself seems backwards from an intuitiveness point of view.
(asuth)Yes, the doxygen could use some love and its own block. Noun ID comes after noun meta because it is optional, but it's probably cleaner to just let the caller put 'id' in the meta object in the first place.
(dmose) similarly, would "instanceOf" be a clearer property name than "class"?
(asuth) This is also a good idea. I feel a pang of guilt about using something so close to a keyword, but even if we used "instanceof", spidermonkey would apparently be okay with it. I shall do so.
(dmose) defineAttribute: it's not really clear to me why it's worth having "bindName". Wouldn't it be simpler to just always use the attribute name?
(asuth) This was intentional to allow the database name to differ from the binding name. I think the idea was that the database names would make more sense standalone, whereas the binding name might be less intuitive in some ways. But I think in retrospect, the ability for them to deviate just adds potential confusion... so yes.
Being Addressed
Things being addressed. Next action is asuth's.
dmose
Need Input
asuth needs more info from peoples, grouped by peoples.
bienvenu
dmose
general
* With the new indexer, I'm seeing a moderate number of error messages of the form: 2008-09-21 12:17:29 gloda.fundattr ERROR Message with subject 'Mark Banner' somehow lacks a valid author. Bailing. where in every case, the subject is purported to be the author name. Which makes me think the code is getting confused somewhere.
I think this was a transient problem involving IMAP. Are you still seeing this? I do occasionally see things like "Message with subject somehow lacks a valid author", but I think that is a different problem either involving messages with odd MIME things happening or perhaps an event train hiccup.
gloda.js
- in getIdentitiesForFullMailAddresses, are we actually guaranteed that the email addresses aren't relays?
I don't think the methods claim that they do this. I think the goal is that the attribute providers pierce the relay nonsense so it doesn't happen... since we basically depend on the attribute providers to mark relays as such, perhaps we could require them to do so? The result would be that either we don't know something is a relay, or we know it is and you shouldn't really hear about it. Could you clarify what your desired action would be here? (I'm sorry if my delay in getting to this point has made you forget...)
Agreed Addressed
bienvenu
dmose
Processing
Andrew is processing this set of comments. Do not touch!
dmose
gloda.js spelling typos
* Inspired by reasonable uses of triple-stores, I have tried to leverage * existing model and terminology rather than rolling out own for everything.
s/out/our/
gloda.js -- * I kinda suspect introducing the concept of predicates into the explanation of attributes is likely to make it harder for many people to understand rather than easier. general -- * for all modules, all exported symbols should have associated documentation comments (these could just be pointers) * API points like getters on exported classes need doxygen commentary. datamodel.js -- * "allowing the use of attribute-parameter as an attribute" seems unclear on first reading gloda.js -- * given that most of the rest of code-base uses the word "thread", it would probably be good to include in the documentation of NOUN_CONVERSATION a sentence or two about how this is (and/or isn't) different from that * parameterized identity: ** s/constrainted/constrained/ ** do we want to introduce some JS assertions about the constraints here? I think Fx has a concept of such things... ** The intent of the "if (typeof [...]" is not obvious on first reading; a comment here would help, I think. ** shortNames need to come from .properties files, I suspect ** both bindAttribute and defineAttribute seem long enough that it's a little hard to keep all the necessary state in one's head at once. Maybe split them up? * _bindAttribute ** please either add doxygen comments for the params or at least a @seealso pointing to defineAttribute ** Defining functions completely anonymously can make it hard to understand stacks in (e.g.) Venkman; prefer named functions.
To Be Processed
asuth's exciting holding pen for things....
dmose
everything below here is done in the ugly-but-easy style!
dmose puts his new comments here...
collection.js -- * doxygen comment on cacheLookupOneByUniqueValue appears to have been copied from cacheLookupOne but not modified. When fixing, please be sure to mention that the caller is expected to guarantee the "uniqueness" invariant. Backing up a step, though, is there any particular reason not to collapse these two functions into one by parameterizing the array to search? (If it turns out that makes it hard to read, I can live with keeping it as is). gloda.js -- * getFolderForFolder is not a very intuitive name. How about getFolderForMsgFolder? collection.js -- * it would be good for cacheLoadUnify* to have @return and @param dox collection.js -- * Please add a comment explaining the reasoning behind GlodaCollection maintaining both an array (items) and an object/hash (_idMap) that point to the same data. collection.js -- * the call to this._collectionsByNoun[nounID].filter in registerCollection would be more readable if it used an expression closure. indexer.js -- > if (msgHdr && !(msgHder.flags&MSG_FLAG_EXPUNGED)) * The lack of spaces around the & operator goes against prevailing style and makes it hard to read collection.js -- > // (_lruNext cannot be null) * I suspect that making the above comment a bit more explicit about why it's true will help prevent future code-maintainers from inadvertently breaking this invariant. gloda.js -- * Do the constrainer functions in _bindAttribute really want to live in query.js? Perhaps the thing to do is to the solve the _bindAttribute naming issue by splitting _bindAttribute itself into _bindAttribute (which could stay in gloda.js) and _setAttributeConstrainers (which could live in query.js). If it's helpful, a _setupAttribute convenience method that calls both could be written. * As far as I can tell, usesParameter in nounMeta is not set by any callers. Should it be discarded, or should (at least) the parameterized-identity definition set it? query.js -- * The use of "explicit" in naming GlodaExplicitQueryClass is a little non-obvious to newbies; it'd be nice if we could think of something better. So far, I haven't been able to, however. :-/ * Looks like neither or()/unions/owner are actually used by any callers. Unless you know of a specific use case for them, I'm somewhat inclined to suggest axing them in the interest of simplicity. This would also allow the newQuery documentation comment in gloda.js to be made more understandable as well. On the other hand, this isn't a huge win. * The purity-of-essence part of me kinda wonders if there shouldn't be a separate array for all these constrainer functions, but maybe it's not worth it. glautocomp.js -- * nsAutoCompleteGlodaResult._problem doesn't appear to ever actually be set anywhere. Can it be removed? (Yes, this is historical.) * Given that the nsAutoCompleteGlodaResult constructor sets this._results to [] by default, it looks to me as though the test for null in the matchCount getter can never be true, so it's probably not necessary. (Sounds reasonable.) * the references to "rich" in the comments a bit non-obvious; maybe be a bit more explicit? * the various uses of thing.{somegetter} mostly appear to be abstraction violations. :-)
more comments
* given the current game plan, removing glautocomp.js from the tree seems like the right thing. alternately; we could leave it in the tree with a NOT_YET_REVIEWED comment if anticipate future enabling, however. mimeMsg.js -- * please add a doxygen comment describing CallbackStreamListener * would wrappedJSObject solve (or at least be somewhat less of a hack than) RESULT_RENDEVOUZ (sic)? jsmimeemitter.js --- /** * Custom nsIMimeEmitter to build a sub-optimal javascript representation of a * MIME message. The intent is that a better mechanism than is evolved to * provide a javascript-accessible representation of the message. the last sentence is missing a word * seems like it would be trivial to factor out common code from the constructor and mime_emitter_complete and that would leave the code less susceptible to future errors of omission other -- * Same question about the stuff in gloda/content as already asked about glautocomp.js: should all of this stuff live in comm-central going forward? If so but we're not using it now, maybe mark it as unreviewed somehow. mimemsg.js -- /** * Starts retrieval of a MimeMessage instance for the given message header. * Your callback will be called with the message header you provide and the * Trails off mid-sentence. * Having MimeMessage.coerceBodyToPlaintext call bodies.join("") seems like it means that the two tokens at the body boundary will be smashed together and therefore not indexed correctly. How about joining with some whitespace? * Since it looks like the prettyString properties are purely for logging / debugging, how about being explicit about that in the commentary so that future patchers realize that they don't need to worry about l10n or about changing the format of the output. jsmimeemitter.js -- /** * Custom nsIMimeEmitter to build a sub-optimal javascript representation of a * MIME message. The intent is that a better mechanism than is evolved to * provide a javascript-accessible representation of the message. The second sentence there is unclear; please rephrase. * _partRE looks fragile in the sense that it I'd be pretty afraid of breaking something if I ever needed to touch it. I suspect some unit tests would go a long way towards helping that. * please file a bug about uncommenting and fixing the commented out code in updateCharacterSet databind.js -- * It would be helpful if GlodaDatabind and its methods had at least minimal documentation comments. indexer.js -- * Since iteratorUtils.jsm is in the tree now, the custom copy of fixIterator can presumably go away. * The comments in this entire file are extra super awesome; nice work! * Should the various listener issues described there be spun off into bugs? suffixtree.js -- * I would suggest removing this from the tree until such time as you are actually committed to supporting it (presumably whenever the first front-end feature that uses it wants to land). This will keep folks from stumbling on it and expecting it to work/be supported for whatever it is they want. I'm betting "function examplar()" can go away too. :-) datastore.js -- * If _mapFolder and _mapFolderId were named in a way that more explicitly described what they did, reading calling code would be easier. datamodel.js -- * I suspect pictureURL should probably just default to null. datastore.js -- * The commentary in MessagesByMessageIdCallback implies, but doesn't actually say, that if someone other than the indexer calls getMessagesByMessageID, things could get squirrelly. If that's actually the case, I'd suggest renaming to _getMessagesByMessageID as a hint to extension authors and explicitly documenting it as part of the function Doxygen commentary as well. * queryFromQuery is a fairly non-intuitive name. In the interest of making calling code easier to read, maybe performQueryAsync? * presumably, the database needs to have some sort of vacuuming strategy, whether that's simply using autovacuuming or periodically issuing vacuum commands. This seems like it's probably a blocker for Tb3. In a recent post, sdwilsh said: "There is almost never a good time to VACUUM, except for maybe when the user is idle. In 1.9.1, we've added an asynchronous API for statement execution, and if you use that all the time, it shouldn't be noticeable to the user that you are doing a VACUUM." * Assuming the comment "(Namely, we fail to purge items from the cache as they are purged from the database.)" is still correct, it's probably worth filing a spin-off bug about. * The @namespace at the bottom of the big comment before GlodaDatastore appears to be stray, since it doesn't have an argument. * The orphaned data problem probably wants a spin-off bug. * Having doxygen comments documenting kConstraint* somewhere would be good. * A comment explaining why the service is passed in to gloda_ds_init rather than just created locally on-demand would be nice. * The _beginTransaction doxygen comment seems to trail off mid-sentence. It would be helpful to have a more complete description of the overall nested transactional model. As it stands, it's not very clear to the reader what assumptions should be made / depended on when dealing with existing transactions (eg when it does and doesn't make sense to do the intermediate commits that are seen in some parts of the code). * The comment for trackAsync could stand to be fleshed out. As it stands, it's not obvious from reading calling code or even the doxygen comment when or why someone would want to use it. Seems like a more obvious function name could help. * A quick grep suggests that there are no callers for getMessageByID or getMessageIDsByFolderID. Perhaps these are something extensions might want, however... If we do want to keep them, it seems like there's a high potential for unnecessary confusion of Message-ID headers and Gloda IDs for messages; perhaps a little name surgery is order. * There are a few remaining references to APV stuff both here in and gloda.js, some (all?) of them would appear to be obsolete. One example is that getMessageAttributes doesn't have any in-tree callers. * comments for adjustAttributes, _convert*ByAttributeID, loadNounItem, and _queryFromSQLString would improve code maintainability. All done! You've done a great job keeping code that deals with a lot of complexity as simple and maintainable as it can be.