Enhanced Towers of Hanoi Backup Scheduling
Extended Abstract

Last Revised $Date: 2000-06-13 01:47:53-07 $

Vincent Cordrey & Jordan Schwartz

http://etoh.wopr.net/ex.abstract.html

  1. Abstract

    Enhanced Towers of Hanoi is an adaptation of the standard Towers of Hanoi backup scheduling scheme that employs additional dimensions to reduce overall backup load, while maintaining robustness. We introduce the concept of viewing a backup schedule as a set of patterns which can be described as dimensions. Temporal striping, which can be considered a dimension, is discussed with regard to load balancing. Included are details of an implementation from a production site, as well as algorithms to calculate the ETOH cycle elements complete with temporal striping offsets. The discussion indicates where additional backup software infrastructure is required to fully implement a higher order multidimensional ETOH schedule.

  2. Definition of Terms
  3. Background

    Backup & Recovery duties generally fall within the realm or scope of the Systems Manager role. Backup scheduling becomes a larger concern as the amount of data to be backed up grows and or the backup window shrinks. Traditional schemes have not changed much since the early days.

    1. Previous Schemes

      1. Standard schemes

        1. All Full

          A full backup (level 0) of all data is performed in each backup window.

        2. Weekly Fulls

          A weekly cycle started with a full (level 0) backup and various patterns of subsequent incrementals. In general, the weekly fulls were done on the weekends when the backup window can potentially span up to two days. This also took advantage of a reduced usage due to a 9 to 5 Monday through Friday work week. Some examples of the weekly full sequences are:

          1. 055555
          2. 0123456 or 0iiiiii
          3. 0335577

        3. Monthly Fulls

          As the amount of data and the reliability of backup and recovery systems increased, it became possible to run full backups only once per month. Monthly backup cycles also permit backup striping so that not all systems are being backed up on the same weekend. The remaining days of the month were typically filled in with either incremental, differential, or layered approaches. The inclusion of a numerically lower level on the subsequent weekends than the levels run during the week also provides a way to capture all of the data backed up in a given week. A typical pattern looked like this:

          Monthly Full
          0 F 9 I 9 I 9 I 9 I 9 I 9 I
          5 W 9 I 9 I 9 I 9 I 9 I 9 I
          5 W 9 I 9 I 9 I 9 I 9 I 9 I
          5 W 9 I 9 I 9 I 9 I 9 I 9 I
          5 W 9 I 9 I 9 I 9 I 9 I 9 I

      2. One dimensional of Towers of Hanoi

        This is the traditional implementation of the modified Towers of Hanoi approach which is described in detail in Curtis Preston's Backup & Recovery O'Reilly & Associates Book.

        1. Weekly Towers of Hanoi

          Towers of Hanoi implementations are typically weekly in scope. By virtue of making sure that any changed file appears on at least two backup images, a Towers of Hanoi scheme provides a more robust backup schedule. At the same time, the number of times a changed file is backed up is limited and therefore the tape utilization is reduced as compared to other schemes. The standard pattern can be described as weekly fulls followed by daily incrementals.

          One dimensional Towers of Hanoi, Weekly Cycle
          0 3 2 5 4 7 6

        2. Monthly Towers of Hanoi

          It is also possible to implement a monthly modified Towers of Hanoi cycle. From the schedule described in Curtis Prestons Book, we see the schedule look like this:

          One Dimensional Towers of Hanoi, Monthly Cycle
          0 F 3 I 2 I 5 I 4 I 7 I 6 I
          1 W 3 I 2 I 5 I 4 I 7 I 6 I
          1 W 3 I 2 I 5 I 4 I 7 I 6 I
          1 W 3 I 2 I 5 I 4 I 7 I 6 I
          1 W 3 I 2 I 5 I 4 I 7 I 6 I

  4. Enhanced Pattern

    1. Basic Discussion

      The Enhanced Tower of Hanoi backup scheduling pattern employs a multidimensional approach. When implemented with a monthly cycle, one dimension addresses incremental backups during the week (levels 5 through 9), and the weekly deltas (levels 0-3) form a second dimension.

      Two Dimensional Enhanced Towers of Hanoi, Monthly Cycle
      0 F 6 I 5 I 8 I 7 I 9 I 8 I
      3 W 6 I 5 I 8 I 7 I 9 I 8 I
      2 W 6 I 5 I 8 I 7 I 9 I 8 I
      4 W 6 I 5 I 8 I 7 I 9 I 8 I
      3 W 6 I 5 I 8 I 7 I 9 I 8 I

      The ETOH backup cycle is essentially monthly oriented: a month is approximately the smallest unit into which you can cram two dimensions. The technique is not Earth shattering when compared to the one dimensional traditional monthly Towers of Hanoi backup cycle, but the concept of adding dimensions does allow for additional flexibility to address other schemes or issues such as a third dimension.

    2. Dimensions in ETOH

      Each dimension is its own pattern orthogonal to (independent of) the other dimensions. We accomplish splitting the linear progression of levels into two dimensions by limiting the range of levels used in a particular dimension. The upshot of this is that since there is no overlap between the dimensions, they can be considered independent.

      In the case of the ETOH backup cycle, the dimensions are also complementary: each higher order dimension encloses all of the lower dimensions. The higher order dimensions are also sparsely populated into the linear space of the lower dimensions: the weekends are scattered more or less evenly (every 7 days) through the progression of days we call a month (even though the month is not divisible by 7).

      Thus, when an individual cycle element from a higher order dimension is encountered, all information in the lower order dimensions is captured and reset. This is what we mean by enclosure: the lower dimensions can be thought of as living in the spaces between the elements of the next higher dimension.

      Another way of looking at the dimensions is that each dimension is a sub cycle of the overall cycle, in a fractal like descent from higher dimensions to lower dimensions.

  5. Additional Dimensions

    1. Temporal Striping as Load Balancing

      Traditional striping distributes load across several elements which operate in tandem with one another to share a portion of the workload, there by reducing the work done by any given element. For backups, we use the same concept of distributing the load onto multiple elements, but in this case the elements being striped onto are temporal.

      The basic idea is to reduce the concentration of full dumps on a particular day by spreading the days on which the monthly full dumps are done across the weekends of a month.

      Generally, this is done by scheduling the different hosts to full dump on different weekends, but when dealing with extremely large collections of data such as Network Attached Storage, or Storage Area Networks, there is an advantage to spreading the volumes of data within the file servers across the weekends as well.

    2. Temporal Striping as an Additional Dimension

      The basic concept of a dimension as pertains to backup cycles refers to controlling how much data is selected for backup in a particular backup window. Temporal striping has the same effect by generating additional copies of the cycle which operate in parallel to one another with offsets of their cycle start points. Each of these copies can be thought of as a temporal striping channel. In that sense, temporal striping has elements (channels) which form a dimension orthogonal to (independent of) the cycle dimensions.

      Schedule offsets are not new, but viewing them as one of the dimensions in a scheme of backup cycles is.

    3. Baseline Schemes

      Traditional Baseline schemes add an additional level to the backup cycle which causes the "full" backup to backup only the data which has changed since the Baseline was created. In that sense, it is similar to any other backup level, and can be thought of as a -1 backup level.

      In point of fact, the Epoch Infinite Storage System implemented this with an actual level of "-1" which was generally used to store data on alternate media and effectively removed it from the monthly level 0 backups.

      Baselines can also be applied to ETOH cycles. In this case, the simplest implementation runs a baseline of level 0 at the start of a quarter and the monthly "full" becomes a level 1 backup.

      This adds a degenerate dimension to the cycle, which consists of a single point added to the scheme in the form of a once per quarter baseline.

    4. Four Dimensional ETOH

      The discussion so far has addressed two cycle patterns, one across the weekdays, and one across the weekends. These two dimensions can be complemented by Temporal Striping to add a third dimension.

      Now, it becomes possible to add a third cycle which replaces the "monthly fulls" with baseline deltas. This cycle can also be a Towers of Hanoi progression. It takes the degenerate (point like) dimension of the baseline and expands it into a full cycle.

      The result is three cycle dimensions on which Temporal Striping can be imposed, and presto, we have a 4 dimensional Enhanced Towers of Hanoi pattern. [Additional dimensions can be added by splitting the year into quarters, and running individual cycles within each quarter and a cycle for the start of each quarter, or adding a cycle onto each year with the baseline being done on the decade, but we've digressed too far already.]

  6. Retention Policies

    When backing up large amounts of data, media utilization becomes important for maximizing online availability by conserving tape library space. Additionally, media and offsite storage costs can be reduced. One issue which directly influences the media utilization is the length of time the data is kept on the media. By setting up volume groups with different retention periods, media retirement and reuse becomes possible for media containing interim data.

    Retention periods can associated with the type of backup data sets present on the media. For this paper, the data set types are broken into several retention classes. Each of these retention classes corresponds to one of the cycle dimensions described previously. (The temporal striping dimension does not create its own retention class since this would have the effect of multiplying the number of media pools.)

    The retention classes and their corresponding cycle dimensions are:

    1. Baseline
    2. Baseline Deltas
    3. Full Backup
    4. Deltas
    5. Incrementals

    Backup data sets placed on a specific piece of media should not have different retention periods because the longest retention period will prevent the media from expiring and becoming available for reuse. To support this, media pools can be established which are organized by retention.

    To simplify the implementation, and limit the number of media pools, retention classes which share a common retention period can be grouped. For example, the baselines and Full Backups can be set for a one year retention and assigned to one media pool, while the Deltas and Incrementals could have a 2 month retention and assigned to a second media pool. In this example, there are two retention classes and two media pools. [A detailed discussion is presented in the implementation section.]

  7. Impacts

    1. Resource Utilization

      The ETOH scheme inherits the characteristics of modified Towers of Hanoi backup schedules in that it nominally makes at least two copies of any changed data before the end of the cycle, while also limiting the number of copies.

      1. Reduced work load

        This limit on the number of copies causes a reduction of the total data backed up in a cycle. And by backing up less data, the workload of the backup server is reduced. Imposing temporal striping further reduces the per window backup load. Thus, a backup server may even be able to serve more clients or continue to provide satisfactory performance as the data sets grow in size.

        With this reduced work load, the backup times for any given system are decreased. This allows the backup window to shrink in response to the demand for production data availability.

      2. Media consumption

        Use of retention classes allows media to be expired and reused more frequently, this earlier exipration reduces the total number of media required for each retention class. Further, ETOH uses fewer media by limiting the number of times a file is put on all media, so aggregate media use is reduced.

      3. Meta data size

        With a reduction of the overall number of times a given file or element is backed up, there is a concomitant reduction in the size of the backup meta data. This reduced volume of meta data can mitigate the degradation of file index search times, and requires less disk space for the storage of the meta data over time. This last point, disk savings is particularly apropos where the backup system is being used as a long term archive.

      4. Savings Calculations

        As an example, let's assume a 5% per month churn rate on the files in a filesystem using a standard weekly full scheme where the weekly full tapes are retained for 1 year. Looking at the amount of data in the weekly fulls retention class, we have 52 times the size of the data set:

        storage = 52 * A

        With a Monthly ETOH scheme, the volume is significantly reduced. Making the assumption that data changed during the month will be backed up twice in the weekly delta retention class, we have:

        storage = 12 * A + (52 - 12) * 0.05 * 2 * A

        storage = 12 * A + 4 * A = 16 * A

        This represents a significant reduction on the order of 52 / 16 or a 3.25:1 reduction ratio.

    2. Restores

      1. Number of media required

        As compared to nightly full backups, ETOH requires a larger number of backup data sets to be read during a restore. The additional media mounts can have an impact on the restore time, but with modern removable media technologies and their high per volume data density, the effect is minor compared to the total time spent reading each piece of media.

        By comparison to the weekly full with daily differentials (0iiiiiii or 0123456), where the number of backup data sets can be as large as 7 for a full restore, ETOH offers an advantage by limiting the number of data sets to about three.

        The difference is even more dramatic when compaired to the monthly full with daily differentials, since the number of tapes can be significant. The motivation behind a monthly full with daily differentials is maintaining an absolute minimum media and meta data utilization at the cost of the number of volumes required for a restore.

        Daily differentials also fail to make more than one backup copy of a changed file. This is exactly the opposite philosophy of the daily Full which maximizes the number of copies of a changed file at the cost of high resource utilization.

      2. Probability of Failure

        Since any given backup data set may wind up being on an unreadable media volume, the greater than one copy effect of a Towers of Hanoi scheme offers a lowered probability of failure versus the daily differential approach. The daily full offers the lowest probability of failure provided the backup window is long enough to accomplish a full backup of the data each night. At larger sites, it may not be financially feasible to build a system which has sufficient performance to accomplish a full backup every night.

        ETOH walks the middle ground between these two extremes by making at least two copies of any changed file in a month, while limiting the number of times a file is copied to conserve resources.

        There is a weak spot in the Monthly ETOH cycle: data changed immediately prior to the lowest level in the second dimension (weekly deltas) is not backed up by subsequent weekly deltas until the next cycle is started. This risk can be reduced by duplicating or copying the lowest level backup in the second dimension.

        Many modern backup systems employ a media duplication strategy. The use of a dedicated backup system can mean that the backup server stands idle during the interval between backup windows, and this time can be leveraged to perform media duplication without impacting the performance of the production systems.

        While all backup systems can benefit from media duplication, ETOH can run with duplication limited to the lower levels of the higher order dimensions, reducing duplication load and resource utilization.

  8. Implementation

    Completion of this section is pending further data gathering on the implementation details. A project using ETOH is presently in work, and the lessons learned from that implementation will be used to write a generalized discussion of the process and pitfalls.

    The following outline describes our current thinking and the scope of our investigations.

    1. When to Implement ETOH
      1. Conditions
        1. Increase of Backup Load
        2. Running Past End of Backup Window
        3. High Churn Rate on Filesystem
      2. Data Types (How the data changes)
        1. Read-Only Filesystems
        2. Replicated Filesystems
        3. Regular Files
        4. Database Files
          1. Restoring the database from two weeks ago is worthless. (Data expires quickly)
      3. Business Types (When the data changes)
        1. Non Retail (9-5 Mon-Fri)
        2. Retail Stores (Open Weekend Days)
        3. Network Based Retailers (Active 24x7?)
    2. Algorithms to Calculate Cycle
      1. Pseudocode
      2. Full perl code for level w/ weekly striping offset on http://etoh.wopr.net
    3. Planning the Schedules
      1. Estimate Total Load
      2. Determine Cycle Type
      3. Create Backup Groups
      4. Evaulate need for Striping
      5. Scheduling the Cycles
        1. Ordinary Start of Month Fulls
        2. Striping Offsets
          1. Weekend Full offset schedules
          2. Monthly Delta (Baselined Strategies)
        3. Schedule Start Point Strategy
          1. Start the First Day of Cycle Unit
          2. Start First Weekend Day of Cycle
    4. Media pools & Retention Classes
      1. Typical Implementation
      2. Simplified Implementation
    5. Platform Specific
      1. Full Perl Executable (on etoh.wopr.net)
      2. Scripting Backup Schedule Table Creation
      3. Front-end Scripts
        1. Hostdump.sh
        2. Amanda
        3. Legato
  9. Case Studies

    The case studies have not been completed, but the outline of the general design follows. We will seek approval to publish the names of the organizations, and include them if permission is granted.

    1. Public Policy Research Organization

      This case is used as an example of a backup scheme that does not employ ETOH. It is included for comparison.

      1. Weekly Cycle
      2. Every 4th full dump 1 Yr. Retention
      3. "Numbering Schemes"
        1. Fouth full dumps were called Sets"
        2. Saved one backup, Set 0,r per year forever
      4. Sets stored off site
      5. No cloning

    2. Aerospace Firm

      This is the site where we first employed ETOH on a fairly large scale with a dedicated backup server servicing network attached storage.

      1. Genisis of ETOH
      2. Striping Applied
      3. Started with hand coded Legato schedule tables
      4. Cloning
      5. Year retention of monthly fulls and weekly deltas
      6. Quarterly retention of daily incrementals

  10. Summary

    1. Benefits

      1. Cost Reduction

        By limiting the number of times a changed file is backed up, ETOH and the concept of dimensional backups reduce both the size and redundancy of backup media collections. These smaller collections mean reduced storage and maintenance costs and lower media cost.

        A Properly implemented multidimensional ETOH scheduling system helps reduce the costs of purchasing and maintaining backup hardware and media by load balancing and reducing the resource utilization of the backup systems. Additionally, since ETOH reduces the volume of data backed up in any given backup window, a given network and backup system does not have to be scaled as rapidly as the data grows. Likewise, backup times are shortened and the backup windows can be shrunk to accommodate a longer production window.

      2. Robustness

        The "two copies of each changed file" property of Towers of Hanoi and multidimensional ETOH schemes increases the robustness of a backup system by guarding against media failures. It removes the single point of failure and consequently increases the likelihood of successful restores. While no system is 100% reliable, the mitigation of risk due to media failure is significant.

      3. Reduced Index Size

        For systems with long retention requirements, multidimensional ETOH offers significantly reduced meta data storage requirements. This smaller volume of data means it is possible to retain more data for a longer period and keep the meta data online.

    2. Impacts

      1. Risks

        Without automated software integrated into the backup & recovery system to generate multidimensional ETOH schedules, errors in manual construction of the sequences can result in either too much data backed up, or the loss of redundancy. The algorithm provided with this paper can be leveraged to create the sequences in a semi automated fashion. Further, since the algorithm can produce a sequence number based on the current real time clock, the backup scheduling system can be supplanted with crontab entries to bypass the internal scheduling tables in the backup software.

      2. Complexity

        For backup systems which use dump, the restore progression is less obvious than traditional cycles. This can once again be handled by an algorithm.

        Taking advantage of additional retention classes will increase the complexity of managing media utilization and library space for improperly sized systems. Also, temporal striping increases the number of different retention classes active in any given backup window. For systems with few drives, this may reveal a choke point as backups destined for a specific media pool wait for an available drive.

  11. Future Enhancements

    The multidimensional approach to backup cycles solidifies previous avenues of thinking about backup systems design and makes some limitations in current backup software apparent. In this section, we talk about some of the things we could do if the software and other infrastructure supported the necessary concepts and tools.

    1. Call For More Levels

      More than two cycle dimensions require more levels. The current range of levels 0 through 9 common to most backup software does not provide enough linear address space into which to map the third and higher order dimension. The result is that for software with levels 0 through 9, there is a sweet spot with either a Monthly ETOH cycle or a Baselined Monthly ETOH progression.

      1. Baseline levels

        If a base line level of -1 is available, this can permit a multi tier baseline which, while it is not strictly speaking a three dimensional ETOH progression, provides some of the benefits. This multi-tier baseline is constructed with a yearly baseline of level -1, and a quarterly tier two of level 0. The monthly baseline delta can then be implemented with level 1.

        If no -1 baseline level is available, all hope is not lost. The monthly ETOH progression presented here reserves levels 0 and 1 to permit a baseline to be created at a quarterly or yearly interval with the monthly baseline delta running at level 1.

      2. Levels 10, 11, 12, 13, 14, ....

        Going back to that hair brained stuff we were digressing about earlier, three or four cycle dimensions of ETOH can be created for longer term archive systems. These and higher order cases (like a semi monthly dimension) require additional levels.

    2. Variable Length Cycle Implementation

      The algorithms used to generate the Towers of Hanoi progressions can generate sequences of nearly any length. This permits flexible assignment of dimension mapping to calendar days. As mentioned above, semi monthly sequences and decade sequences are also possible.

      Backup & Recovery software could incorporate cycle generating algorithms similar to the one written for this paper. A graphical user interface could be constructed which would have some of the standard dimension to calendar day mappings such as week days, weekends, monthly weekend, quarters, semi years, years, semi decades, decades, etc. The temporal striping could also be easily worked into the interface. This infrastructure would reduce the complexity cost and integrity risk of implementing a manually generated ETOH schedule.

      Temporal striping offsets are not limited to a four weekend pattern, and with a bimonthly pattern, can be extended to 8 channels. Longer patterns are possible and can be used to stripe baselines.

  12. Beyond the Horizon

    Systems and data are surviving much longer than originally expected. This trend is likely to continue, especially as data models mature and stabilize. Object orientation of systems and data types with hierarchical structures present a fractal approach to computing which can lead to vast quantities of stored information. These information objects will continue to be useable with their imbedded methods, so the legacy systems will not be hardware, but rather objects which are never discarded.

    Thus, while at first it might seem ludicrous to establish a cycle dimension each element of which occurs on a semi century mark, because the second time data would be written to the engraved stone tablet backup system would be long after the death of the human who established the pattern, books have come to us across the millennia, preserved in libraries, data objects will likewise be similarly preserved.

  13. Bibliography
    1. Curtis Preston, UNIX Backup & Recovery
    2. Nemeth, Systems Administration
    3. Man pages on ufsdump
    4. Amanda Reference Manual
    5. Legato Admin Guide
    6. NetBackup Admin Guide
    7. Towers of Hanoi at Toronto.edu
    8. How to see in More Dimensions, Ragnar-Olaf Buchweitz