User:Rkent/Folder Data Persistence
Intro
The point of this page is to record notes on understanding data persistence of information associated with folders. This is complicated because the information is frequently stored in multiple in-memory objects, as well as in two different databases: dbFolderInfo, which is a table of the main folder message summary database, and panacea.dat, which is a cache of the same information.
These notes were prepared to understand proposals in bug 1032360: "nsMsgLocalMailFolder::GetSizeOnDisk seems to return wrong value for maildir store" to persistently store the folder size, but should be applicable to other issues of folder persistence.
Notes
Folder Cache (panacea.dat)
The folder cache file, named "panacea.dat", is a profile-wide file that contains summary information for each folder. Its main purpose is to allow the folder pane to display the folder tree and associated folder information (such as unread count) without having to open the mork summary file for each folder to get that information, which would take both excessive time and memory. The canonical source for folder metadata, though, is the dbFolderInfo object that is stored within the mork summary file, so this design creates a sync issue between these files, as well as with associated memory objects owned by the folder.
(The naming of panacea.dat is in nsMailDirProvider.cpp for the leaf name, but NS_APP_MESSENGER_FOLDER_CACHE_50_FILE is used to access the full path to that file).
panacea.dat is accessed using nsIMsgFolderCacheElement, which uses a string key to access data for a folder. The string key is defined in nsMsgDBFolder::GetFolderCacheKey using the persistent path of the .msf mork file for the folder (for non-root folders) or the main folder (for the root folder). Examples:
C:\\Users\\Kent\\AppData\\Roaming\\Thunderbird\\Profiles\\3gtxygma.default\\ImapMail\\mail.caspia.com\\Sent.msf
C:\\Users\\Kent\\AppData\\Roaming\\Thunderbird\\Profiles\\3gtxygma.default\\ImapMail\\mail.caspia.com
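As a rough illustration of this keying scheme, the sketch below models the cache as a map from a path string to a small element. The types FolderCacheModel and CacheElement and the helper CacheKeyFor are hypothetical stand-ins, not the real nsIMsgFolderCache API. Only the idea that non-root folders are keyed by the .msf path and the root folder by its own path comes from the notes above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the data kept in one folder cache element.
struct CacheElement {
  int32_t totalMessages = 0;
  int32_t unreadMessages = 0;
};

// Hypothetical helper: non-root folders are keyed by the path of their .msf
// summary file, the root folder by its own path.
inline std::string CacheKeyFor(const std::string& folderPath, bool isRoot) {
  assert(!folderPath.empty());
  return isRoot ? folderPath : folderPath + ".msf";
}

// Hypothetical model of the profile-wide cache: one element per string key.
class FolderCacheModel {
 public:
  // Returns the element for the key, creating an empty one if needed.
  CacheElement& GetElement(const std::string& key) { return mElements[key]; }

  // Reports whether an element already exists for the key.
  bool Has(const std::string& key) const { return mElements.count(key) != 0; }

 private:
  std::map<std::string, CacheElement> mElements;
};
```

With this model, a folder pane could read the unread count for a folder from the element found under its key without touching the summary database at all.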
panacea.dat is owned by nsMsgAccountManager, which has methods to acquire the folder cache object and write to it.
Updating the Folder Cache
Here is an example of how the Folder Cache gets updated (along with dbFolderInfo) in a sample operation. The sample used is copying an unread message from an IMAP folder to a local folder, which will require updating the unread count and the total count for the local folder.
Stack to write to local folder cache element (using esr31):
> xul.dll!nsMsgLocalMailFolder::WriteToFolderCacheElem(nsIMsgFolderCacheElement * element) Line 1088
xul.dll!nsMsgDBFolder::WriteToFolderCache(nsIMsgFolderCache * folderCache, bool deep) Line 1386
xul.dll!nsMsgDBFolder::FlushToFolderCache() Line 1362
xul.dll!nsMsgDBFolder::UpdateSummaryTotals(bool force) Line 4178
xul.dll!nsMsgDBFolder::EnableNotifications(int notificationType, bool enable, bool dbBatching) Line 5085
xul.dll!nsMsgLocalMailFolder::EndCopy(bool aCopySucceeded) Line 2447
xul.dll!nsCopyMessageStreamListener::EndCopy(nsISupports * url, tag_nsresult aStatus) Line 114
xul.dll!nsCopyMessageStreamListener::OnStopRequest(nsIRequest * request, nsISupports * ctxt, tag_nsresult aStatus) Line 144
xul.dll!nsImapCacheStreamListener::OnStopRequest(nsIRequest * request, nsISupports * aCtxt, tag_nsresult aStatus) Line 8629
The method UpdateSummaryTotals is key to keeping the folder cache current, but the way information is managed there is quite convoluted. It does the following things:
- Calls ReadDBFolderInfo(force), which makes sure that the folder object member variables have been initialized from the cache. If force == true, or the initialization from the cache fails, then the variables are initialized from dbFolderInfo instead. So this method is badly misnamed; it should be something like "InitializeFolderMetadata". Because force==true will cause the folder db to be opened, it is important that force==true is only used in cases where the db is already open, or we expect it to be opened. If force==false, ReadDBFolderInfo is a no-op except at initialization.
- Sends OnItemIntPropertyChanged notifications to nsIFolderListener objects for the unread count and total count for the folder.
- Writes updated information to the folder cache.
If force==true, then things are really confusing and convoluted. In ReadDBFolderInfo, certain folder metadata (mNumTotalMessages, mNumUnreadMessages, mExpungedBytes, mName, mCharset, mCharsetOverride, nsMsgFolderFlags::GotNew) are read from dbFolderInfo (overwriting any local values in the folder object). The cache element metadata mNumPendingUnreadMessages, mNumPendingTotalMessages, mFolderSize, and mFlags are not. The handling of pending counts is particularly confusing. They seem to be mostly managed through the method ChangeNumPending... which updates mNumPendingUnreadMessages, then updates dbFolderInfo but not the cache, and also sends notifications. (This seems like a performance issue. You should not have to open a folder database to record that there are messages pending that have not been downloaded.)
How to understand all of this? The philosophy seems to be the following:
1) Any time that folder metadata is changed, that change needs to be written immediately to dbFolderInfo.
2) dbFolderInfo is maintained by the db, so adding a message header will implicitly update dbFolderInfo.
3) db operations do message db listener notifications, but not folder-level notifications. folderCache is considered a folder-level notification, so is done by the folder.
Updating unread count in dbfolderinfo
Stack to update unread count:
xul.dll!nsDBFolderInfo::ChangeNumUnreadMessages(int delta) Line 517
xul.dll!nsMsgDatabase::AddNewHdrToDB(nsIMsgDBHdr * newHdr, bool notify) Line 3508
xul.dll!nsMsgLocalMailFolder::EndCopy(bool aCopySucceeded) Line 2378
xul.dll!nsCopyMessageStreamListener::EndCopy(nsISupports * url, tag_nsresult aStatus) Line 114
xul.dll!nsCopyMessageStreamListener::OnStopRequest(nsIRequest * request, nsISupports * ctxt, tag_nsresult aStatus) Line 144
xul.dll!nsImapCacheStreamListener::OnStopRequest(nsIRequest * request, nsISupports * aCtxt, tag_nsresult aStatus) Line 8629
So nsCopyMessageStreamListener is managing the calls. EndCopy is called first, which adds the message to the database, which in turn increments numUnreadMessages in dbFolderInfo.
Updating folderSize in dbFolderInfo
Stack:
xul.dll!nsDBFolderInfo::SetFolderSize(unsigned __int64 size) Line 423
xul.dll!nsMsgBrkMBoxStore::SetSummaryFileValid(nsIMsgFolder * aFolder, nsIMsgDatabase * aDB, bool aValid) Line 301
xul.dll!nsMailDatabase::SetSummaryValid(bool aValid) Line 126
xul.dll!nsMsgLocalMailFolder::OnCopyCompleted(nsISupports * srcSupport, bool moveCopySucceeded) Line 1352
xul.dll!nsMsgLocalMailFolder::EndCopy(bool aCopySucceeded) Line 2455
xul.dll!nsCopyMessageStreamListener::EndCopy(nsISupports * url, tag_nsresult aStatus) Line 114
xul.dll!nsCopyMessageStreamListener::OnStopRequest(nsIRequest * request, nsISupports * ctxt, tag_nsresult aStatus) Line 144
xul.dll!nsImapCacheStreamListener::OnStopRequest(nsIRequest * request, nsISupports * aCtxt, tag_nsresult aStatus) Line 8629
Analysis
As a general rule, changes to folder metadata are first written to dbFolderInfo without changing the equivalent member variables in the msgFolder. The variables in the msgFolder are changed at the end of operations, by reading from dbFolderInfo in ReadDBFolderInfo.
ReadDBFolderInfo(force)
nsMsgDBFolder::ReadDBFolderInfo(bool force) with force==false is a no-op except the first time it is called after a folder is created, when the folder member variables for folder metadata are initialized from the folder cache.
ReadDBFolderInfo(false) is used typically in a Get...() call, where you want to make sure that the variable has been initialized from the cache before returning it.
ReadDBFolderInfo(true) is used typically after folder metadata has changed in dbFolderInfo, and you want to update the folder member variables, typically also doing any required notifications. See nsMsgDBFolder::UpdateSummaryTotals
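The two modes can be sketched with a small model. This is only an illustration of the behaviour described in these notes, not the real nsMsgDBFolder code: FolderModel, Totals, cachePresent and dbReads are hypothetical names, and the counts stand in for the whole set of folder metadata.

```cpp
#include <cassert>

// Hypothetical stand-in for the counts shared by the cache, dbFolderInfo,
// and the folder member variables.
struct Totals {
  int total;
  int unread;
};

// Hypothetical model of a folder; not the real nsMsgDBFolder.
struct FolderModel {
  bool cachePresent = false;          // does a folder cache element exist?
  Totals cache{0, 0};                 // values stored in the cache element
  Totals db{0, 0};                    // values stored in dbFolderInfo
  Totals members{0, 0};               // folder member variables
  bool initializedFromCache = false;
  int dbReads = 0;                    // counts (expensive) database reads

  void ReadDBFolderInfo(bool force) {
    // First use only: try to initialize the members from the cache.
    if (!initializedFromCache && cachePresent) {
      members = cache;
      initializedFromCache = true;
    }
    // Forced, or the cache could not supply values: read dbFolderInfo.
    if (force || !initializedFromCache) {
      members = db;
      ++dbReads;
    }
    // Either the cache supplied values at some point or the db was read.
    assert(initializedFromCache || dbReads > 0);
  }
};
```

In this model a Get...() style caller would use ReadDBFolderInfo(false) and never trigger a database read once the cache has supplied values, while ReadDBFolderInfo(true) always pays for a read.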
UpdateSummaryTotals(force)
In all cases, UpdateSummaryTotals will:
- Initialize member variables using ReadDBFolderInfo(force)
- Notify changes in kTotalMessagesAtom and kTotalUnreadMessagesAtom
- Call FlushToFolderCache to update the cache.
When force==true, then the objects are read from dbFolderInfo, so this is the method used to update relevant member variables by reading from dbFolderInfo, doing folder-level notifications, and flushing the changes to the folderCache.
When force==false the situation is trickier. Member variables are initialized from the folder cache, with notifications, without opening the database. This is done when the folder object is first initialized (in GetSubfolders), or in SummaryChanged().
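The three steps can be put together in a minimal sketch. FolderSketch, Counts and Notification are hypothetical names, the property strings are placeholders for the atoms, and the sketch assumes that a notification is only sent when a value actually changes and that the members were already initialized, so the force==false read is a no-op.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the message counts.
struct Counts {
  int total;
  int unread;
};

// Hypothetical record of one folder-level notification.
struct Notification {
  std::string property;
  int oldValue;
  int newValue;
};

// Hypothetical model of a folder; not the real nsMsgDBFolder.
struct FolderSketch {
  Counts db{0, 0};       // values in dbFolderInfo
  Counts cache{0, 0};    // values in the folder cache element
  Counts members{0, 0};  // folder member variables
  std::vector<Notification> sent;

  void UpdateSummaryTotals(bool force) {
    const Counts old = members;

    // Step 1: stand-in for ReadDBFolderInfo(force). Members are assumed to be
    // initialized already, so only force==true changes them.
    if (force) members = db;

    // Step 2: folder-level notifications, assumed to fire only on change.
    if (old.total != members.total)
      sent.push_back({"TotalMessages", old.total, members.total});
    if (old.unread != members.unread)
      sent.push_back({"TotalUnreadMessages", old.unread, members.unread});

    // Step 3: stand-in for FlushToFolderCache().
    cache = members;
    assert(cache.total == members.total && cache.unread == members.unread);
  }
};
```

Calling UpdateSummaryTotals(false) in this model sends nothing and only copies the current member values into the cache, which is the flush-only behaviour the notes attribute to SummaryChanged().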
SummaryChanged()
This is just a synonym for UpdateSummaryTotals(false), and is only used in IMAP. Its main purpose seems to be to write folder metadata to the folder cache. This assumes that the metadata was written to dbFolderInfo at the time it was changed. One common use seems to be after ChangeNumPending... which makes changes in dbFolderInfo and sends notifications but does not update the folderCache. The call to UpdateSummaryTotals(false) then only flushes the changes to the folder cache.
folderSize and IMAP databases
Overloaded meaning and definition of folderSize
One of the things that makes folderSize so complex is that its meaning is overloaded, with at least three different uses:
- On local folders with mbox, folderSize is used to report to the UI the size of the message folder on disk, and is also used as an indicator of whether the summary file is valid. Although these values are the same, the update timing required for these two uses may differ.
- For maildir, calculating folderSize directly from the disk is slow, but it is not used as an indicator of validity of the message summary file.
- On IMAP, the summary file is still used with offline folders, but the meaning of folderSize is changed to represent the server-side storage used for messages. For this reason, folderSize is unavailable for use with offline folders to represent the validity of the summary file.
How does IMAP use the summary file and mbox storage, and avoid the paths that check for summary file valid?
For a local folder with mbox, toggling a message as read results in marking the summary file valid through this stack:
xul.dll!nsMailDatabase::SetSummaryValid(bool aValid) Line 119
xul.dll!nsMailDatabase::EndBatch() Line 78
xul.dll!nsMsgDBFolder::EnableNotifications(int notificationType, bool enable, bool dbBatching) Line 5085
xul.dll!nsMsgDBView::ApplyCommandToIndices(int command, unsigned int * indices, int numIndices) Line 2932
xul.dll!nsMsgDBView::CycleCell(int row, nsITreeColumn * col) Line 2067
The same operation on IMAP has an overridden EndBatch which is a no-op:
xul.dll!nsImapMailDatabase::EndBatch() Line 71
xul.dll!nsMsgDBFolder::EnableNotifications(int notificationType, bool enable, bool dbBatching) Line 5085
xul.dll!nsMsgDBView::ApplyCommandToIndices(int command, unsigned int * indices, int numIndices) Line 2932
xul.dll!nsMsgDBView::CycleCell(int row, nsITreeColumn * col) Line 2067
But also, SetSummaryValid for IMAP is different, without the folderSize test:
NS_IMETHODIMP nsImapMailDatabase::SetSummaryValid(bool valid)
{
  if (m_dbFolderInfo)
  {
    m_dbFolderInfo->SetVersion(valid ? GetCurVersion() : 0);
    Commit(nsMsgDBCommitType::kLargeCommit);
  }
  return NS_OK;
}
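To contrast the two behaviours, here is a minimal sketch of a local mbox database that ties validity to the recorded store size, next to an IMAP database that only records a version. All the Sketch class names, kCurVersionSketch and the explicit storeSize parameter are hypothetical (the real code obtains the size from the store), and what the local database does when marked invalid is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the dbFolderInfo fields used here.
struct DbFolderInfoSketch {
  uint64_t folderSize = 0;
  uint32_t version = 0;
};

// Hypothetical current summary version; the real value is not given in these notes.
constexpr uint32_t kCurVersionSketch = 1;

class MailDatabaseSketch {
 public:
  virtual ~MailDatabaseSketch() = default;
  virtual void SetSummaryValid(bool valid, uint64_t storeSize) = 0;
  virtual bool IsSummaryValid(uint64_t storeSize) const = 0;
  DbFolderInfoSketch info;
};

// Local mbox: validity is tied to the recorded size of the mbox file.
class LocalDatabaseSketch : public MailDatabaseSketch {
 public:
  void SetSummaryValid(bool valid, uint64_t storeSize) override {
    info.version = valid ? kCurVersionSketch : 0;
    if (valid) info.folderSize = storeSize;
    assert(!valid || info.folderSize == storeSize);
  }
  bool IsSummaryValid(uint64_t storeSize) const override {
    return info.version == kCurVersionSketch && info.folderSize == storeSize;
  }
};

// IMAP: only the version is recorded, so folderSize is free to mean
// something else (for example, storage used on the server).
class ImapDatabaseSketch : public MailDatabaseSketch {
 public:
  void SetSummaryValid(bool valid, uint64_t /*storeSize*/) override {
    info.version = valid ? kCurVersionSketch : 0;
  }
  bool IsSummaryValid(uint64_t /*storeSize*/) const override {
    return info.version == kCurVersionSketch;
  }
};
```

In the local model a change in the store size invalidates the summary, whereas the IMAP model never looks at folderSize when deciding validity.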
Where do we use summaryValid?
- In nsMsgFolderCompactor, to skip compacting folders that are being parsed
- In nsMsgDatabase::CheckForErrors, which is used in checking for errors when opening databases
- In MoveMail, LocalFolders, and POP3Server (all POP3/local)
So for non-local folders, this is only used when opening the folder.
Conclusions for maildir folderSize
- maildir needs to send folderSize updates to the msgFolder, but IMAP needs to ignore those. That means that IMAP needs to override SetSizeOnDisk with a no-op. Perhaps it needs a separate SetSizeOnServer to use instead.
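One possible shape for this proposal is sketched below. It is only an illustration of the idea and not existing code: FolderBaseSketch and ImapFolderSketch are hypothetical classes, and SetSizeOnServer is the tentative method suggested above rather than an established API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical base folder: the message store reports disk size through
// SetSizeOnDisk, and the UI reads it back through GetSizeOnDisk.
class FolderBaseSketch {
 public:
  virtual ~FolderBaseSketch() = default;

  virtual void SetSizeOnDisk(int64_t size) {
    assert(size >= 0);
    mFolderSize = size;
  }
  int64_t GetSizeOnDisk() const { return mFolderSize; }

 protected:
  int64_t mFolderSize = 0;
};

// Hypothetical IMAP folder: store-driven updates are ignored, and the value
// shown to the UI comes from the server instead.
class ImapFolderSketch : public FolderBaseSketch {
 public:
  void SetSizeOnDisk(int64_t /*size*/) override {
    // Intentionally a no-op: folderSize means server-side storage on IMAP.
  }
  void SetSizeOnServer(int64_t size) {
    assert(size >= 0);
    mFolderSize = size;
  }
};
```

With this split, a maildir store could call SetSizeOnDisk on any folder through the base interface without disturbing the server-side size held by an IMAP folder.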