SVG:Data Storage
Feel free to edit this document as heavily or lightly as you want, but please try to keep comments explicitly denoted as having been made by you to a minimum. People are reluctant to remove such comments, and this prevents documents from evolving into a spec/final documentation because they just become an unmanageable mass of comments no one will touch. Instead try to integrate your comments into the flow of the document where possible. Do a significant rewrite if necessary.
Introduction
One of the ways in which the SVG specification stands out from other W3C XML specifications is the extent to which it crams lots of data into attributes. For example, it's not uncommon to see an SVG file containing a <path> tag with a 'd' attribute (for the path data points) similar to the following:
<path d="M600,350 l 50,-25 a25,25 -30 0,1 50,-25 l 50,-25 a25,50 -30 0,1 50,-25 l 50,-25 a25,75 -30 0,1 50,-25 l 50,-25 a25,100 -30 0,1 50,-25 l 50,-25">
Having so much data crammed into attributes makes it difficult to programmatically access and change much of the data in an SVG graphic using get/setAttribute. To make that easier, almost all SVG attributes (data-crammed or otherwise) are mirrored by multi-level trees of objects in the SVG DOM which contain typed data corresponding to the attributes' values. This eliminates the need for scripters and others to write their own parsing/serializing code for attribute values. However, it presents challenges to implementers, who must provide these object-heavy interfaces while minimizing memory use and maximizing rendering speed.
The problem this document aims to address is how and where to store the typed data.
Current Implementation Strategy
The strategy currently employed in Mozilla's SVG implementation is to implement the object-heavy SVG DOM interfaces as full objects and store the typed data used by the internal code on those objects. For each attribute that has meaning on a particular SVG element there is always a corresponding DOM object tree in memory, even when the attribute isn't set on the element in the SVG markup. The internal code obtains typed data (often default values) as required from these trees, since they always exist.
This strategy is the most obvious and straightforward one (and it has several non-obvious advantages), but it is inherently too memory intensive, for three reasons. There is a profusion of objects (e.g., three objects for each SVGAnimatedLength). The objects in our implementation are XPCOM objects implementing multiple interfaces. And objects exist even for attributes that are not set.
We need to come up with a new implementation strategy to drastically reduce our excessive memory consumption while allowing for fast rendering and declarative animation. Of course each strategy has significant implementation problems of its own. The rest of this document describes possible strategies and the problems we need to solve.
Alternative Implementation Strategies
The following strategies assume that parsing data from attribute values every time it's needed would be too expensive. Therefore we will continue to store typed data mirroring SVG attributes in some way.
The strategies below also assume we will use tearoffs to implement the object-heavy SVG DOM element interfaces.
There are two different strategies: one for attributes which are usually present on a particular kind of SVG element, and one for attributes which are usually not present.
Strategy A: Frequently Present Attributes
Store the typed data directly in the content element object as a member field held by value (NOT a pointer or reference) and not reference counted. For example, animated lengths can be stored in about 8 bytes (a float, plus some units and other metadata) in the common case where animation is not being used.
Pros.
- The typed data is always there for the internal code to access. Using flags (including null pointers), the internal code may not need to have so much knowledge about default values or have much branching code to handle whether the attribute is present or not.
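As a rough illustration of Strategy A, here is a minimal C++ sketch of an inline length stored by value in an element. The names InlineLength, LengthUnit, kIsSet and RectElement are hypothetical and chosen only for this example; they are not the actual Mozilla classes, and the unit codes are not the real SVGLength constants.

```cpp
#include <cstdint>

// Hypothetical unit codes, for illustration only.
enum class LengthUnit : std::uint8_t { Number, Percentage, Px, Em };

// Typed data for one length attribute: a float plus one byte of unit
// and one byte of flags. No vtable, no reference count.
struct InlineLength {
  static constexpr std::uint8_t kIsSet = 1;  // attribute present in markup

  float mBaseValue = 0.0f;               // doubles as the default value
  LengthUnit mUnit = LengthUnit::Number;
  std::uint8_t mFlags = 0;

  bool IsSet() const { return (mFlags & kIsSet) != 0; }

  void Set(float aValue, LengthUnit aUnit) {
    mBaseValue = aValue;
    mUnit = aUnit;
    mFlags |= kIsSet;
  }
};

// Holds on common ABIs (4-byte float, two 1-byte fields, 2 bytes padding).
static_assert(sizeof(InlineLength) <= 8, "inline length should stay compact");

// The element owns its lengths by value, so internal code can read them
// without null checks or allocation, whether or not the attribute is set.
class RectElement {
 public:
  InlineLength mX, mY, mWidth, mHeight;
};
```

Internal code can read `rect.mWidth.mBaseValue` unconditionally and consult `IsSet()` only where the distinction between a default and an explicit value matters.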
Strategy B: Infrequently Present Attributes
Store the typed data as a "property object" attached to the content element via SetProperty/GetProperty. Create the property object whenever data is needed by DOM access or by SVG rendering. Property objects can be XPCOM objects so we don't need tearoffs for them. The content element holds a strong reference to the object, and releases that reference when the content element dies. Use state bits in the content element to record whether the property is present.
Pros.
- No storage required when the attribute is not present. Reasonably fast access to the data when required.
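Strategy B could look roughly like the following C++ sketch. It uses std::shared_ptr as a stand-in for XPCOM reference counting and a hypothetical PropertyTable class as a stand-in for the SetProperty/GetProperty mechanism; PropertyObject, TransformList, Element and kTransformBit are likewise names invented for this example, not Mozilla APIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

struct PropertyObject {
  virtual ~PropertyObject() = default;
};

// Example of typed data for an infrequently present attribute.
struct TransformList : PropertyObject {
  std::vector<float> mMatrixEntries;
};

class Element;

// Side table keyed by (owner, key); elements without properties cost nothing here.
class PropertyTable {
 public:
  std::shared_ptr<PropertyObject> Get(const Element* aOwner, std::uint32_t aKey) const {
    auto owner = mTable.find(aOwner);
    if (owner == mTable.end()) return nullptr;
    auto prop = owner->second.find(aKey);
    return prop == owner->second.end() ? nullptr : prop->second;
  }
  void Set(const Element* aOwner, std::uint32_t aKey,
           std::shared_ptr<PropertyObject> aValue) {
    mTable[aOwner][aKey] = std::move(aValue);
  }
  void DeleteAllFor(const Element* aOwner) { mTable.erase(aOwner); }
  std::size_t OwnerCount() const { return mTable.size(); }

 private:
  std::unordered_map<const Element*,
      std::unordered_map<std::uint32_t, std::shared_ptr<PropertyObject>>> mTable;
};

class Element {
 public:
  enum : std::uint32_t { kTransformBit = 1u << 0 };

  explicit Element(PropertyTable& aTable) : mTable(aTable) {}
  // The strong references are released when the content element dies.
  ~Element() { mTable.DeleteAllFor(this); }

  // State bit check: no table lookup needed to learn the property is absent.
  bool HasTransform() const { return (mStateBits & kTransformBit) != 0; }

  // Creates the property object on first use by DOM access or rendering.
  std::shared_ptr<TransformList> GetOrCreateTransform() {
    if (HasTransform()) {
      return std::static_pointer_cast<TransformList>(mTable.Get(this, kTransformBit));
    }
    auto created = std::make_shared<TransformList>();
    mTable.Set(this, kTransformBit, created);
    mStateBits |= kTransformBit;
    return created;
  }

 private:
  PropertyTable& mTable;  // assumption: a real element would reach this via its document
  std::uint32_t mStateBits = 0;
};
```

Repeated calls to `GetOrCreateTransform()` return the same object, and an element that never asks for the property adds no entry to the table.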
Tearoffs
Tearoff objects will hold strong references to content element objects, because their underlying data resides in the content element object. Tearoffs cannot copy their data because that would break consistency between DOM properties and attribute values (and besides, it's wasteful).
We may want to store tearoffs as properties of content element objects, to be sure we reuse a tearoff if the same getter is used many times.
Tearoffs retrieve/set their data by calling methods on content elements. We would like to share tearoff classes as much as possible; this may be aided by defining common value getter/setter methods in nsSVGElement. For example, we may want a common method GetBaseValue() which takes a tag parameter specifying which base value is being retrieved (e.g., tag_X). Then we can have a single tearoff class for "base values" which contains an mTag field and can be used to retrieve base lengths from all kinds of SVG elements.
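The shared "base value" tearoff described above might be sketched as follows. This is only an illustration: SVGElementBase stands in for nsSVGElement, std::shared_ptr stands in for an XPCOM strong reference, and LengthTag, SetBaseValue, RectElement and BaseLengthTearoff are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical tags identifying which base value is meant (cf. tag_X).
enum class LengthTag { X, Y, Width, Height };

// Stand-in for the common getter/setter methods on the element base class.
class SVGElementBase {
 public:
  virtual ~SVGElementBase() = default;
  virtual float GetBaseValue(LengthTag aTag) const = 0;
  virtual void SetBaseValue(LengthTag aTag, float aValue) = 0;
};

class RectElement final : public SVGElementBase {
 public:
  float GetBaseValue(LengthTag aTag) const override { return mLengths[Index(aTag)]; }
  void SetBaseValue(LengthTag aTag, float aValue) override { mLengths[Index(aTag)] = aValue; }

 private:
  static std::size_t Index(LengthTag aTag) { return static_cast<std::size_t>(aTag); }
  float mLengths[4] = {0.0f, 0.0f, 0.0f, 0.0f};
};

// One tearoff class serves every element type. It holds no copy of the data,
// only a strong reference to the element and the tag to ask for.
class BaseLengthTearoff {
 public:
  BaseLengthTearoff(std::shared_ptr<SVGElementBase> aElement, LengthTag aTag)
      : mElement(std::move(aElement)), mTag(aTag) {
    assert(mElement && "a tearoff must always have an owning element");
  }

  float Value() const { return mElement->GetBaseValue(mTag); }
  void SetValue(float aValue) { mElement->SetBaseValue(mTag, aValue); }

 private:
  std::shared_ptr<SVGElementBase> mElement;
  LengthTag mTag;
};
```

Because the tearoff forwards every read and write to the element, a change made through the tearoff is visible to the element and vice versa, which is the consistency requirement stated earlier in this section.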
Issues to Solve
- Notifications. Attribute changes notify the content element, so it can update inline data (A) or any present property objects (B). For updates via DOM API calls, the API implementation (in a tearoff, content element object, or freestanding XPCOM object) will have to route notifications itself. Style change notifications go through frames' DidSetStyleContext which needs to route notifications.
- Shape-changing DOM object trees. Animated lengths are a simple example because the shape of the DOM object tree never changes; there's always an animated length object, which always has baseVal and animVal children. Path data, for example, is more complicated. What happens if JS retrieves a PathSeg and then replaces that segment in the underlying path? It depends on the SVG semantics. Perhaps the semantics are that pathsegs just have to be copied. (Otherwise, regardless of implementation, it's unclear what should happen when someone grabs a reference to a PathSeg and then changes the attribute ... how can you know "which" path segment has changed?) Exactly how this works is going to be resolved on a case by case basis.
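To make the "pathsegs just have to be copied" possibility concrete, here is a small C++ sketch of copy-on-retrieve semantics. It illustrates only that one option, not what the SVG specification requires; PathSeg, PathData and their methods are simplified, hypothetical stand-ins.

```cpp
#include <cstddef>
#include <vector>

// Simplified segment: a command letter and one coordinate pair.
struct PathSeg {
  char mType;
  float mX;
  float mY;
};

class PathData {
 public:
  void Append(const PathSeg& aSeg) { mSegs.push_back(aSeg); }

  // Returns a copy, so the caller never holds a reference into the list.
  PathSeg GetItem(std::size_t aIndex) const { return mSegs.at(aIndex); }

  // Replaces the stored segment; copies handed out earlier are unaffected.
  void ReplaceItem(std::size_t aIndex, const PathSeg& aSeg) { mSegs.at(aIndex) = aSeg; }

  std::size_t Length() const { return mSegs.size(); }

 private:
  std::vector<PathSeg> mSegs;
};
```

Under these semantics a segment retrieved by script simply keeps its old values after the underlying path changes, so the question of "which" segment changed never arises. The cost is that edits to a retrieved segment do not reach the path unless the segment is written back.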