Annotations

From MozillaWiki

Introduction

Annotations are provided by a browser service that can associate arbitrary information with URLs. The goal is to use them for both history and bookmarks. The GUI requires close integration with bookmarks so that annotations are shown together with tags. The annotations for a URI should be loaded into the sidebar when that URI is opened. A flat editable mode in the sidebar would simplify their use.

Possible uses:

  • Fav icon
  • Page thumbnail
  • Notes / Citations
  • Persistent storage for Javascript on the page
  • Microformat meta-data, e.g. geo location, dates, events

The bug for this feature is 306640.

Internal design

The design consists of two tables. The first maps URLs to internal IDs.

The second stores all annotations:

  • URL ID
  • Annotation name
  • Value
  • Expiration information
  • Flags (perhaps combined with expiration information)
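The two-table layout above can be sketched with SQLite, which is the engine behind mozStorage. This is only an illustration under assumptions: the table names, column names, the use of NULL for "never expires", and the helper functions are all hypothetical, since the design lists the fields but not a concrete schema.

```python
import sqlite3

# Hypothetical table and column names; the design only lists the fields.
SCHEMA = """
CREATE TABLE urls (
    id  INTEGER PRIMARY KEY,
    url TEXT NOT NULL UNIQUE
);
CREATE TABLE annotations (
    url_id     INTEGER NOT NULL REFERENCES urls(id),
    name       TEXT NOT NULL,
    value      BLOB,
    expiration INTEGER,              -- assumption: NULL means "never expires"
    flags      INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (url_id, name)
);
"""

def open_store():
    """Create an in-memory database holding both tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def set_annotation(conn, url, name, value, flags=0, expiration=None):
    # Map the URL to its internal ID, creating the row on first use.
    conn.execute("INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,))
    (url_id,) = conn.execute(
        "SELECT id FROM urls WHERE url = ?", (url,)).fetchone()
    # Overwrite any previous annotation with the same URL/name pair.
    conn.execute(
        "INSERT OR REPLACE INTO annotations VALUES (?, ?, ?, ?, ?)",
        (url_id, name, value, expiration, flags),
    )

def get_annotation(conn, url, name):
    row = conn.execute(
        "SELECT a.value FROM annotations a JOIN urls u ON u.id = a.url_id "
        "WHERE u.url = ? AND a.name = ?", (url, name)).fetchone()
    return row[0] if row else None
```

Keying the second table on (URL ID, name) gives the overwrite behaviour for free, and the long URL string is stored only once no matter how many annotations a page has.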

Binary data

mozStorage can handle binary data. It would be nice to be able to store thumbnails, favicons, and other images as binary data instead of base-64 encoded data URLs. We would also have to store the MIME type for these.

The problem is then how to get the data out of the annotation service and into some UI widget. The data URLs are nice because they can be used in place of "src=" for images. However, it is inefficient to copy around 16K URLs just to display an image. It would be nice if there were a new protocol handler "annot:URL:name" that automatically retrieved the data from the annotation service. These URLs would have the same privileges as "chrome:" URLs, so would only be accessible to chrome.
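To make the cost of data URLs concrete, the following sketch builds one from raw bytes and measures the result. The 12,000-byte payload and the helper name to_data_url are assumptions chosen for illustration. Base-64 encodes every 3 bytes as 4 characters, so the URL is about a third larger than the binary it carries.

```python
import base64

def to_data_url(raw: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base-64 data URL usable as an image src."""
    encoded = base64.b64encode(raw).decode("ascii")
    return "data:" + mime_type + ";base64," + encoded

# Stand-in for a 12,000-byte image; the contents do not matter for the size.
raw = bytes(12000)
url = to_data_url(raw, "image/png")

# Base-64 turns 3 bytes into 4 characters, so 12,000 bytes become 16,000.
encoded_length = len(url) - len("data:image/png;base64,")
```

Storing the raw bytes plus a MIME type avoids this inflation in the database, but some mechanism such as the proposed protocol handler is still needed to hand the bytes to an image widget.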

Expiration

Because some annotations can be large, and the number of pages can also become very large, some sort of annotation expiration scheme is required. Annotations beyond the expiration date will be deleted. The time frame must be variable because some annotations may be large and should expire faster, while others may be small and have minimal overhead for keeping them around. The option of never expiring will also be provided.

This time frame may be measured from annotation creation, but it could potentially be more valuable to measure from last page access time: "this annotation expires after one month of page disuse". Clearing the history should implicitly expire most annotations unless they never expire or are needed for something like a bookmark.

Annotation expiration grows more complicated because the new history and bookmark systems will store some data in the annotation service such as favicons. The favicon for a site, and other annotations, should probably never expire as long as you have a bookmark to it. This introduces dependencies that are difficult to manage. I am leaning toward some kind of callback service where you can register yourself as interested in expired annotations. When the annotation service deletes annotations, it will ask these services, which can veto the delete.
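The veto idea can be sketched as follows. This is a minimal illustration, not a proposed interface: the class ExpiringStore, the callback signature, and the in-memory dictionary are all hypothetical. Services register a callback, and the expiration pass skips any expired annotation that at least one callback wants to keep.

```python
from typing import Callable, Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (url, annotation name)
# A veto callback receives (url, name) and returns True to keep the annotation.
VetoCallback = Callable[[str, str], bool]

class ExpiringStore:
    def __init__(self) -> None:
        # Maps (url, name) to (value, expires_at).
        # Assumption: expires_at of None means the annotation never expires.
        self.items: Dict[Key, Tuple[object, Optional[float]]] = {}
        self.veto_callbacks: List[VetoCallback] = []

    def register_veto(self, callback: VetoCallback) -> None:
        """Register a service that is interested in expired annotations."""
        self.veto_callbacks.append(callback)

    def expire(self, now: float) -> List[Key]:
        """Delete annotations past their expiration date unless vetoed."""
        removed = []
        for key, (_value, expires_at) in list(self.items.items()):
            if expires_at is None or expires_at > now:
                continue  # not expired yet, or never expires
            if any(callback(*key) for callback in self.veto_callbacks):
                continue  # a registered service still needs this annotation
            del self.items[key]
            removed.append(key)
        return removed
```

A bookmark service could register a callback that returns True for every URL that is currently bookmarked, which keeps favicons alive without the annotation service knowing anything about bookmarks.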

Flags

Flags could indicate whether an annotation is user-entered (e.g. notes), automatic/service-entered (e.g. favicons, last visit date, etc.), or web page entered (as with IE's userData storage, this would need more aggressive limits). It could also store whether that annotation should be synced remotely or not, and possibly other bits.
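One way to picture the flags is as a small bit field. The names and bit values below are hypothetical, since no flags have been defined yet.

```python
from enum import IntFlag

class AnnotationFlags(IntFlag):
    # Hypothetical bit assignments; the design has not defined real values.
    USER_ENTERED = 1     # e.g. notes
    SERVICE_ENTERED = 2  # e.g. favicons, last visit date
    PAGE_ENTERED = 4     # written by a web page; needs more aggressive limits
    SYNC_REMOTELY = 8    # whether the annotation should be synced remotely

def is_page_entered(flags: int) -> bool:
    """True if the annotation was written by a script on a web page."""
    return bool(flags & AnnotationFlags.PAGE_ENTERED)

# A user note that should also be synced remotely.
flags = AnnotationFlags.USER_ENTERED | AnnotationFlags.SYNC_REMOTELY
```

The three origin values are mutually exclusive, so a real design might pack them into a two-bit field rather than three separate bits. Because the values fit in a plain integer, they match the "long aFlags" parameter in the preliminary IDL.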

Security

Initially, the annotation service will only be available to trusted chrome code. It will be able to read and write any annotation.

If scripts on web pages are allowed to store data, they should only be able to see data that they themselves have written, and not user-entered data or service-entered data (favicons, etc.). Limiting access to pages on the same path allows some flexibility with different pages from the same service, and should provide minimal opportunity for data leakage.
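A possible reading of "same path" is sketched below. It assumes that two pages share a path when they have the same scheme, host, and directory, which is an interpretation rather than something the design specifies. The function names are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit

def directory_of(url: str):
    """Return (scheme, host, directory) for a URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Treat a path ending in "/" as a directory; otherwise drop the file name.
    directory = path if path.endswith("/") else posixpath.dirname(path)
    if not directory.endswith("/"):
        directory += "/"
    return parts.scheme, parts.netloc.lower(), directory

def page_can_read(reader_url: str, writer_url: str, page_entered: bool) -> bool:
    """A page sees only page-written data from the same scheme, host, directory."""
    if not page_entered:
        return False  # user-entered and service-entered data stay hidden
    return directory_of(reader_url) == directory_of(writer_url)
```

Including the scheme in the comparison keeps data written over https from being read by a page loaded over http.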

Quota

If, in the future, limited access to the annotations is given to web apps, they should be restricted in the amount of data that they are allowed to store. We probably want to limit the amount of data per host and possibly also at a finer grained level like pages (as with IE) or paths.

Question: What if a web page wants more storage? Some web services could legitimately need more storage, and provide enough value to the user that they don't care. Should there be a way for the user to specify that a web page can use more storage? Here's one possibility: if the page tries to store too much data, the write fails, and the security bar announces what happened and gives the user the option to increase storage for this page. This would require that web pages check for the quota condition, potentially notify the user, and redo the operation if they think it's been fixed.
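The fail-then-retry flow described above might look like the sketch below. Everything here is hypothetical: the class names, the per-host accounting, and the default limit are placeholders, and grant() stands in for the user accepting the security bar prompt.

```python
class QuotaExceeded(Exception):
    """Raised when a write would push a host past its storage limit."""

class QuotaStore:
    def __init__(self, default_limit: int = 64 * 1024) -> None:
        # default_limit is an illustrative number, not a proposed value.
        self.default_limit = default_limit
        self.limits = {}  # host -> limit raised by the user
        self.data = {}    # (host, name) -> bytes

    def used(self, host: str) -> int:
        """Total bytes currently stored for this host."""
        return sum(len(v) for (h, _), v in self.data.items() if h == host)

    def write(self, host: str, name: str, value: bytes) -> None:
        limit = self.limits.get(host, self.default_limit)
        # Overwriting an annotation frees the space its old value used.
        old = len(self.data.get((host, name), b""))
        if self.used(host) - old + len(value) > limit:
            raise QuotaExceeded(host)  # the write fails; nothing is stored
        self.data[(host, name)] = value

    def grant(self, host: str, new_limit: int) -> None:
        """Stand-in for the user accepting the security bar prompt."""
        self.limits[host] = new_limit
```

A page would call write(), catch QuotaExceeded, tell the user what happened, and call write() again once the limit may have been raised.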

External interface

Preliminary IDL:

   /**
    * Sets an annotation, overwriting any previous annotation with the same
    * URL/name
    */
   void setAnnotation(in nsIURI aURI, in wstring aName, in nsIVariant aValue,
                      in long aFlags, in long aExpiration);
   /**
    * Retrieves the value of an annotation
    */
   nsIVariant getAnnotation(in nsIURI aURI, in wstring aName);
   /**
    * Retrieves info about the annotation. SetDate is the time that this
    * annotation was last set.
    */
   void getAnnotationInfo(in nsIURI aURI, in wstring aName,
                          out long long aSetDate, out long aFlags,
                          out long aExpiration);
   /**
    * Get the names of all annotations for this URI.
    */
   nsIArray getAnnotations(in nsIURI aURI);
   /**
    * Test for annotation existence.
    */
   boolean hasAnnotation(in nsIURI aURI, in wstring aName);
   /**
    * Removes a specific annotation
    */
   void removeAnnotation(in nsIURI aURI, in wstring aName);
   /**
    * Removes all annotations for the given page.
    * We may want some other similar functions to get annotations with given
    * flags (once we have flags defined).
    */
   void removePageAnnotations(in nsIURI aURI);
   /**
    * Get the values of several annotations with arbitrary URI/name pairs.
    * There is some latency associated with each annotation query, so it is
    * a good idea to use this function if it is possible for you to batch
    * your requests together.
    *
    * This will return an array with the same number of values you requested.
    * If the requested URI/name pair does not exist, the corresponding result
    * element will be NULL.
    *
    * @param aURIList The list of URIs
    */
   void getMultipleAnnotations([array, size_is(aCount)] in nsIURI aURIList,
     [array, size_is(aCount)] in wstring aNameList, in unsigned long aCount,
     out unsigned long aResultCount,
     [retval, array, size_is(aResultCount)] out wstring aResultList);
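The contract documented for getMultipleAnnotations (one result per requested pair, with an empty slot where the pair does not exist) can be modelled in a few lines. This is a Python stand-in over a plain dictionary, not the XPCOM implementation, and the function name and store layout are assumptions.

```python
from typing import Dict, List, Optional, Sequence, Tuple

def get_multiple_annotations(
    store: Dict[Tuple[str, str], str],
    uris: Sequence[str],
    names: Sequence[str],
) -> List[Optional[str]]:
    """Return one result per URI/name pair, None where the pair is absent."""
    if len(uris) != len(names):
        raise ValueError("uris and names must have the same length")
    # Results stay aligned with the request order, so callers can index them.
    return [store.get((uri, name)) for uri, name in zip(uris, names)]
```

Keeping the result list the same length as the request lets callers match results to requests by position.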

Issues: do we want to namespace annotation names? Probably we should just say that people should always namespace their names, but do it manually with the names they pick for their annotations. For example "history:thumbnail" or "my_extension:annoying_data".

Do we want to support more advanced queries? Examples: Give me all pages with annotation X. Give me all pages where annotation X > Y, etc.
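If the annotations live in SQL tables, both example queries map onto simple joins. The sketch below uses a hypothetical schema and sample rows. The annotation name "history:visit_count" is invented for illustration, while "history:thumbnail" follows the naming example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE urls (id INTEGER PRIMARY KEY, url TEXT NOT NULL UNIQUE);
CREATE TABLE annotations (
    url_id INTEGER, name TEXT, value, PRIMARY KEY (url_id, name));
INSERT INTO urls VALUES (1, 'http://a.example/'), (2, 'http://b.example/');
INSERT INTO annotations VALUES
    (1, 'history:visit_count', 12),
    (2, 'history:visit_count', 3),
    (1, 'history:thumbnail', 'placeholder');
""")

def pages_with(conn, name):
    """All pages that carry the annotation `name`."""
    rows = conn.execute(
        "SELECT u.url FROM urls u JOIN annotations a ON a.url_id = u.id "
        "WHERE a.name = ? ORDER BY u.url", (name,))
    return [row[0] for row in rows]

def pages_where_greater(conn, name, threshold):
    """All pages where annotation `name` has a value above `threshold`."""
    rows = conn.execute(
        "SELECT u.url FROM urls u JOIN annotations a ON a.url_id = u.id "
        "WHERE a.name = ? AND a.value > ? ORDER BY u.url", (name, threshold))
    return [row[0] for row in rows]
```

Queries like these scan by annotation name, so an index on the name column would probably be needed if they became part of the interface.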

IE's Implementation

IE has a persistence mechanism for web pages called 'userData': http://216.239.63.104/search?q=cache:msdn.microsoft.com/workshop/author/behaviors/reference/behaviors/userdata.asp&hl=en&lr=&sa=G&strip=1 This information is persisted in the cache along with the web page it was written on. You can also pick other data stores, including favorites, history, and snapshot (when saved to the local machine).

Using IE's userData

In IE, you need to use DHTML behaviors (http://www.w3.org/TR/becss). Behaviors are not currently supported in Mozilla, but there exists an extension wrapping XBL that supports them: http://dean.edwards.name/moz-behaviors/

In CSS you add "behavior:url('#default#userData')" to the CSS for the elements that you want to persist. This overrides the get/setAttribute functions for that element. You can then call element.save(<store name>) to persist the element under the given tag <store name>.

Security

IE places limits on the visibility of tags to other web pages in the same directory and over the same protocol (to avoid https leakage). There are also limits placed on the size of the data, both for a given web page and for a given domain name.

IE allows an optional expiration date for stored elements. Since these attributes are stored in the cache/history/favorites, they probably disappear when the corresponding store is deleted. There doesn't seem to be any forced expiration, but given the relatively short lifespan of the cache, it probably doesn't matter.

Questions

  • How is persistence shared across different pages in the same directory? Are the names just magically accessible to other pages?
  • Are all tags for the saved object stored, or only the ones you setAttribute on?
  • What happens if I set a given attribute from multiple pages with the same name but different values? Is the correct value distributed to all?