Labs/Bespin/DesignDocs/MetaData
Examples
There are a number of areas where we have a growing need to store data about data:
- History of saved and unsaved changes to a file since it left VCS
- Status messages
- Mobwrite diff records
This proposal provides a generic mechanism for all these uses.
Requirements
The meta-data system should:
- Be accessed via an API so the disk layout can be changed in the future
- Have zero risk of data and meta-data files colliding
- Allow the storage of large amounts of data (e.g. the current edit version of a file)
- Allow a fast append-only mode that doesn't require rewriting large amounts of data
- Count storage towards a user's quota (TODO: Are there any cases where this should not be the case?)
- Ensure that meta-data on a file is deleted/moved/renamed with the file
- Not waste space by leaving unowned flotsam files or directories behind
- Allow efficient serialization of Python objects (pickling?)
Proposed Solution
API
    # Get a File object
    project = get_project(user, owner, 'MyProject')
    file = project.get_file_object("example.js")

    # Read the 'live-edit' meta-data
    current = file.metadata['live-edit']
    # Reads from MyProjectMeta/example.js/live-edit

    # Write to the 'status-messages' meta-data
    file.metadata['status-messages'] = new_msg
    # Writes to MyProjectMeta/example.js/status-messages
TODO: Missing from this API are examples of pickling and appending.
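One possible shape for the missing pickling and appending calls is sketched below. Everything here is an assumption made for illustration: the class name `FileMetadata` and the methods `append`, `set_object` and `get_object` are hypothetical and are not part of any existing Bespin API. The sketch simply stores each key as one file inside the per-file meta-data directory.

```python
import pickle
import tempfile
from pathlib import Path


class FileMetadata:
    """Hypothetical dict-like view over one file's meta-data directory."""

    def __init__(self, meta_dir):
        self.meta_dir = Path(meta_dir)

    def _path(self, key):
        return self.meta_dir / key

    def __getitem__(self, key):
        # Plain text read, as in file.metadata['live-edit']
        try:
            return self._path(key).read_text(encoding="utf-8")
        except FileNotFoundError:
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        # Plain text write, replacing any previous value
        self.meta_dir.mkdir(parents=True, exist_ok=True)
        self._path(key).write_text(value, encoding="utf-8")

    def append(self, key, text):
        # Add to the end of the entry without rewriting what is already there
        self.meta_dir.mkdir(parents=True, exist_ok=True)
        with open(self._path(key), "a", encoding="utf-8") as f:
            f.write(text)

    def set_object(self, key, obj):
        # Store an arbitrary Python object by pickling it (assumed method name)
        self.meta_dir.mkdir(parents=True, exist_ok=True)
        with open(self._path(key), "wb") as f:
            pickle.dump(obj, f)

    def get_object(self, key):
        # Load a previously pickled object (assumed method name)
        with open(self._path(key), "rb") as f:
            return pickle.load(f)


# Usage against a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    metadata = FileMetadata(Path(tmp) / "MyProjectMeta" / "example.js")
    metadata["live-edit"] = "var x = 1;\n"
    metadata.append("chat-log", "jwalker: Hello\n")
    metadata.append("chat-log", "kdangoor: Hi Joe\n")
    metadata.set_object("status-messages", ["fixing bug #42"])
    live = metadata["live-edit"]
    chat = metadata["chat-log"]
    messages = metadata.get_object("status-messages")
```

Keeping text access on the dict-style interface and putting pickling behind separate methods is only one option; the caller could equally pass a flag or use a different key namespace.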
Data Storage
Inside a user's project directory we should have something like:
- SomeProject/
  - example1.js
  - some-dir/
    - example2.js
    - ...
- SomeProjectMeta/
  - example1.js/
    - status-messages
    - live-edit
    - chat-log
    - ...
  - some-dir/
    - example2.js/
      - status-messages
      - live-edit
      - chat-log
      - ...
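I think this system is extensible (a new kind of meta-data is just a new file name under the per-file directory), and because the meta-data lives in its own parallel tree there isn't any danger that the data will collide with the meta-data.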
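The mapping from a project file to its meta-data location, and the requirement that meta-data follows a file when it is renamed, could be sketched as follows. The function names `metadata_path` and `rename_with_metadata` are hypothetical, and the sketch assumes that the meta-data tree is a sibling directory named by appending "Meta" to the project name, as in the layout above.

```python
import shutil
import tempfile
from pathlib import Path

META_SUFFIX = "Meta"  # from the layout above: SomeProject -> SomeProjectMeta


def metadata_path(project_dir, relative_file, key=None):
    """Map a project file (and optionally a key) to its meta-data location."""
    project_dir = Path(project_dir)
    meta_root = project_dir.with_name(project_dir.name + META_SUFFIX)
    path = meta_root / relative_file
    return path / key if key else path


def rename_with_metadata(project_dir, old_name, new_name):
    """Rename a file and carry its meta-data directory along with it."""
    project_dir = Path(project_dir)
    new_file = project_dir / new_name
    new_file.parent.mkdir(parents=True, exist_ok=True)
    (project_dir / old_name).rename(new_file)

    old_meta = metadata_path(project_dir, old_name)
    if old_meta.exists():
        new_meta = metadata_path(project_dir, new_name)
        new_meta.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(old_meta), str(new_meta))


# Usage against a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "SomeProject"
    project.mkdir()
    (project / "example1.js").write_text("alert('hello');\n")
    status = metadata_path(project, "example1.js", "status-messages")
    status.parent.mkdir(parents=True)
    status.write_text("fixing bug #42\n")

    rename_with_metadata(project, "example1.js", "some-dir/example2.js")
    moved = metadata_path(project, "some-dir/example2.js", "status-messages")
    moved_text = moved.read_text()
    old_meta_exists = metadata_path(project, "example1.js").exists()
```

Because callers only ever go through `metadata_path`, the on-disk layout could later change without touching them, which is the point of the first requirement.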
Potential Uses
This is an annotated JSON (ish) dump of the potential meta-data that we might record against a file:
    {
      // We add to this list on each save when there is a status message
      // And clear the list on a commit, having offered the list
      status-messages:[
        "fixing bug #42",
        "frobbing the foo setting to see whats up"
      ],

      // We need to store the current version of the file separately from
      // the saved version. This could be large, and should probably be stored
      // in a separate file to avoid unnecessary IO
      live-edit:"The full text of the file\nincluding new lines\n",

      // We need a set of diffs for time machine to take us from the saved
      // version to the live version. This example is raw from mobwrite but
      // I suspect we will need a more compact, more coalesced version
      // Also while the individual changes may not be large, this could have
      // a high write frequency
      diffs-saved-to-live:[
        {
          timestamp:2009-04-06-12-10-00,
          creator:kdangoor,
          diff:"u:ycxraw:-\nF:11:sharer/sharer_project/test.txt\nd:11:=7+inserted",
          status:null
        },
        {
          timestamp:2009-04-06-12-10-30,
          creator:jwalker,
          diff:"u:ycxraw:-\nF:12:sharer/sharer_project/test.txt\nd:12:",
          status:"Bug #42"
        }
      ],

      // We should record each time the file is saved back to the last commit
      // This allows time machine to work properly. The changes will be
      // larger than with the diffs-saved-to-live case but will be much more
      // coalesced. We should certainly use an external diff format rather
      // than mobwrite for this
      diffs-tip-to-saved:[
        {
          timestamp:2009-04-06-01-12-00,
          creator:dalmaer,
          diff:"24,25d23\n< \n< alert('hello');\n27c25\n< alert('world');\n---\n",
          status:"Bug #42"
        }
      ],

      // If discussion of features can be tied to a file then we could have a
      // big head-start in writing documentation
      chat-log:[
        {
          timestamp:2009-04-06-13-09-00,
          sender:jwalker,
          message:"Hello"
        },
        {
          timestamp:2009-04-06-13-09-05,
          sender:kdangoor,
          message:"Hi Joe"
        }
      ]
    }
TODO: Work out how fast-append might work with pickling
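One way fast-append might work with pickling, offered only as a sketch: since each pickle is self-delimiting, records can be written one after another to a file opened in append mode and read back with repeated `pickle.load` calls until the end of the file. The helper names `append_record` and `read_records` are hypothetical, and the record fields are borrowed from the diffs-saved-to-live example above.

```python
import pickle
import tempfile
from pathlib import Path


def append_record(path, record):
    """Append one pickled record; existing bytes are never rewritten."""
    with open(path, "ab") as f:
        pickle.dump(record, f)


def read_records(path):
    """Yield every record in the order it was appended."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                # No more pickles in the file
                return


# Usage against a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    diffs_file = Path(tmp) / "diffs-saved-to-live"
    append_record(diffs_file, {"creator": "kdangoor", "status": None})
    append_record(diffs_file, {"creator": "jwalker", "status": "Bug #42"})
    records = list(read_records(diffs_file))
```

The cost of an append depends only on the size of the new record, which suits the high write frequency expected for diffs. The trade-off is that reading the whole list means unpickling every record, and this sketch does nothing about concurrent writers or a partially written final record.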