Firefox/Projects/Places async expiration

From MozillaWiki
Jump to: navigation, search

Summary

Decouple expiration from history service and make it completely async and adaptive to user habits (expire harder for heavy history users, less for casual users).

Current Status

Getting started with design.

Next Steps

  • Take decision on position and form of the code (service? component? module?)
  • Evaluate interactions with other components/services
  • Code it
  • Test it

Related Bugs

bug 516940 bug 520165

Feedback discussion

Leave your comments or ideas here please: http://groups.google.com/group/mozilla.dev.apps.firefox/browse_thread/thread/421f6c7566499e4f#


Team

  • Project Leader: mak
  • Reviewer: mano
  • Consultant: Places Team

Designs

Separate expiration from history service, make it async in background.

Goals/Use Cases

  • Work in background: no ui-locking
  • Avoid idle if possible: mobiles could hardly hit idle, it's unclear how often a mobile dev can use idle. In any case we should not loop in idle to avoid killing batteries (mobile and laptop)
  • simplify bloat history code
  • simplify and add coherency to expiration code
  • ensure notifications consistency

Non Goals

  • over engineer things

Things to expire

Actually:

  • visits: older than a date
  • pages: without visits and bookmarks
  • annotations: older than a date or orphans
  • favicons: older than a date or orphans
  • inputhistory: degrade and remove useless

In future:

  • page thumbnails

When to expire

  • On startup: by no means, expiration should not hurt Ts in any way.
  • In background: small chunks, adaptive to user habits
  • During syncs: yes and no, we can do that to avoid some fsync, but actually it causes code to be hard to follow, more error prone, and makes sync transactions longer and more prone to corruptions. We also plan to add more expiration stuff in future, that will make things worse. Also we hope to remove temp tables and syncs in future but expiration will survive.
  • On idle: larger chunks, but only once near idle start, we don't want to rely too much on idle because on mobile devices it could be an unreachable state. Also we don't want to hit idle hard to avoid stressing mobile and laptop batteries.
  • On shutdown: avoid this at all if possible, but we have to take in count privacy.
  • On pref changes: if the user changes expiration prefs, we should try to satisfy his choices. this could be done increasing the number of entries expired in background till we reach a good point of cleanup, then going back with usual expiration values.

Visits expiration algorythm

Is current expiration algorythm too complex? Current expiration has 2 limits, an hard limit (180 days) and a lazy limit (90 days). It also has a maximum number of pages limit. We always clear any visit older than the hard limit (this works like the old FX2 setting). We clear visits between the 2 limits if we are over maximum number of pages allowed. we never clear visits before minimum limit.

in bug 425219 Connor said: "Our original goal was to have 180 days of history. We added the 40k visit limit as a way of limiting overhead. But for very heavy users, this shortened the set of available history dramatically, so we added the minimum cap. I don't think a straight 90 day cap is ideal, and I don't really get why "algorithim complexity" is something end users care about. If it made expiring history more painful, sure, we should look at that, but as far as I understand, that's not the case. The original bugs have lots of debate, but as far as I'm concerned, this is working well and as intended, so marking WONTFIX. Users who really want to expire after N days can set history_expire_days to the same as history_expire_days_min and be content."

But current algo has clear disadvantages:

  • queries are more complex and slower than they could be, we have to run multiple queries to check if we are over limit, to check visits in a timeframe and so on.
  • it's hard for users to set correct values and understand them, for example we have lots of reports saying "i changed my settings to 6 days because my computer can't handle all of that history but it's still slow" (indeed he changed only the lazy limit).
  • it's harder for support to suggest changes to those limits.

while the advantages:

  • if we can't clear all history fast enough we can't be blamed because user knows (maybe) we hold more history than what he set in preferences dialog.
  • we don't have to evaluate how users use history, we set 2 limits and try to do more than expected.
  • the pref is always satisfied (visits newer than lazy limit are never touched)

At that time we did not have lots of data to take decisions based on numbers, today we have Places Stats, so we can actually evaluate numbers better.

We know we can handle far more than 90 days of history, and we know that the 40000 unique pages limit is hardly hit (avg number of pages is 23000). Places also has better performances today (and will get better). We could actually move back to the simple hard limit, to have advantages:

  • simple and fast queries (only 1 limit!)
  • simple to understand for anyone

Disadvantages:

  • the pref could not be satisfied if we impose a page limit (so we expire some visit because we have too many pages)

Actually the disadvantage could be fixed in two ways:

  • discard any pages limit, or put one hard to hit (just-in-case) and not tweakable (we can't handle more than that, sorry!). Based on Places Stats avg number of pages is 23000, users that increased the number of days also increased the pages limit, so it's not really helpful.
  • change the pref to "Save a maximum of XXX unique pages for at last YYY days", so it's clear to the user we will expire once one of the two conditions is hit.

Who to notify

We have to fire notifications in a sane way when we expire pages (onDeleteURI vs onPageExpired). On the other side notifying about each removed visit would be a perf hog (what happens when we expire 20000 visits?) We should probably notify onDeleteURI when a page is removed, and onDeleteVisits when visits for an uri are expired but page is not expired. currently onPageExpired notification is an hybrid between the two.

Currently we avoid to notify page removals during large batch deletes, if that is not a problem for clearHistory (everything is removed), it's hard for third parties to follow pages, so we could evaluate to replace current notifications with multiple entries notifications (onDeleteURIs (list_of_uris), onDeleteVisits(list_of_uris) to notify just once.

Observers should ignore or cache notifications during batches, not us not sending them.