Taha Yavuz Bodur

Sire, there is no Royal Road to Wisdom

Posts tagged data structures


B-trees, Shadowing, and Clones

** Stand on the shoulders of giants!

From: Ohad Rodeh of IBM Haifa Research Labs

B-trees are used by many file systems to represent files and directories. They provide guaranteed
logarithmic time key-search, insert, and remove. File systems like WAFL and ZFS use shadowing,
or copy-on-write, to implement snapshots, crash-recovery, write-batching and RAID. Serious
difficulties arise when trying to use b-trees and shadowing in a single system.

*

B-trees [R. Bayer and E. McCreight 1972],
(
Prefix B-Trees by RUDOLF BAYER and KARL UNTERAUER,
Deletion Without Rebalancing in Multiway Search Trees,
and more,
The Ubiquitous B-Tree by Douglas Comer,
B+ Trees and Indexed Sequential Files: A Performance Comparison by D.S. Batory
The K-D-B-Tree: A Search Structure for Large Multidimensional Dynamic Indexes by John T. Robinson,
Efficient Locking for Concurrent Operations on B-Trees by PHILIP L. LEHMAN
) are used by
several file systems 
[A. Sweeny et al. 1996; S. Best 2002; J. Menon et al. 2003; H. Reiser] to represent files and directories.
Compared to
(traditional indirect blocks [M. McKusick et al. 1984: A Fast File System for UNIX, in which a reimplementation of the UNIX file system is described],
more,
Mechanics of Disk Access, Design and Implementation of the Second Extended Filesystem
) 
b-trees offer guaranteed logarithmic time key-search, insert and remove. Furthermore, b-trees can represent sparse files well.

*

Shadowing is a technique used by file systems like WAFL [D. Hitz et al. 1994] and
ZFS [V. Henson et al. 2003] to ensure atomic update to persistent data-structures.
It is a powerful mechanism that has been used to implement snapshots, crash-recovery, write-batching, and RAID. The basic scheme is to look at the file system as a large tree made up of fixed-sized pages. Shadowing means that to update an on-disk page, the entire page is read into memory, modified, and later written to disk at an alternate location.

When a page is shadowed, its location on disk changes. This creates a need to update (and shadow) the immediate ancestor of the page with the new address. Shadowing propagates up to the file system root. We call this kind of shadowing strict to distinguish it from other forms of shadowing [J. Gray and A. Reuter 1993].
Figure 1 shows an initial file system with root A that contains seven nodes. After leaf node C is modified, a complete path to the root is
shadowed, creating a new tree rooted at A’. Nodes A, B, and C become unreachable and will later on be deallocated.
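
A minimal Python sketch of the strict shadowing described above. It assumes a hypothetical append-only `Disk` whose page addresses are simply list indices; the names `Page`, `Disk`, and `shadow_update` are illustrative and do not come from the paper. The placement of the four nodes other than A, B, and C (called D, E, F, G here) is also an assumption, since Figure 1 is not reproduced.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Page:
    """An immutable page: a payload plus the disk addresses of its children."""
    payload: str = ""
    children: Tuple[int, ...] = ()


class Disk:
    """Hypothetical append-only page store; an address is a list index."""

    def __init__(self) -> None:
        self.pages: List[Page] = []

    def write(self, page: Page) -> int:
        # Every write lands at a fresh location; nothing is overwritten.
        self.pages.append(page)
        return len(self.pages) - 1

    def read(self, addr: int) -> Page:
        return self.pages[addr]


def shadow_update(disk: Disk, root: int, path: Sequence[int], payload: str) -> int:
    """Shadow the page reached by following child indices in `path`.

    Returns the address of the new root. The old tree stays readable.
    """
    page = disk.read(root)
    if not path:
        # Target page: write the modified copy at an alternate location.
        return disk.write(Page(payload, page.children))
    i = path[0]
    new_child = shadow_update(disk, page.children[i], path[1:], payload)
    # The ancestor must point at the new child address, so it is shadowed too.
    children = page.children[:i] + (new_child,) + page.children[i + 1:]
    return disk.write(Page(page.payload, children))


# Assumed seven-node layout: A -> (B, E), B -> (C, D), E -> (F, G).
disk = Disk()
c, d = disk.write(Page("C")), disk.write(Page("D"))
b = disk.write(Page("B", (c, d)))
f, g = disk.write(Page("F")), disk.write(Page("G"))
e = disk.write(Page("E", (f, g)))
a = disk.write(Page("A", (b, e)))

# Modify leaf C: only the path C, B, A is rewritten; D, E, F, G are shared.
a_prime = shadow_update(disk, a, (0, 0), "C'")
```

After the update, the pages at the old addresses `a`, `b`, and `c` are no longer reachable from `a_prime`, which matches the three nodes the text says will later be deallocated. Keeping `a` around instead of freeing it is what would give a snapshot.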

Filed under papers important data structures reference people cite definition file organization unix


What are some must-know topics in discrete math and probability for competitive programming?

Discrete mathematics is a vast topic, but here are some essential things that you should know to be able to solve programming contest problems:

  • Counting : You are going to see a lot of problems like “Count the number of ways to do X”. In most of the cases, enumerating all possible ways is simply not going to work, and you are expected to use some kind of dynamic programming. To be able to solve counting problems properly, you need to have a good understanding of certain topics such as:
    • Recurrence relations : Ubiquitous in counting problems. In particular, learn to come up with recurrences for counting. Also, learn to convert recurrences of high complexity into recurrences of lower complexity.
    • Matrix exponentiation : In the case of recurrences that have constant coefficients, it is possible to express the final state vector as the product of the power of a matrix and the initial state vector. A good example is calculating Fibonacci numbers using matrix multiplication. This trick helps to reduce the complexity of finding the Nth term of some recurrences from O(N) to O(log N).
    • Binomial coefficient: Many counting problems with symmetries can be reduced to finding a few cases and multiplying them with some binomial coefficients. So learn to calculate these efficiently depending on the type of problem. There are different kinds of precomputations one does. Also, try to learn the properties of binomial coefficients modulo primes.
    • Pigeonhole principle: More of a proof-aid technique than something that you directly apply. But I have used this in cases where I was required to find two cases which share some property. The birthday attack can be thought of as the probabilistic variant of the pigeonhole principle, but it is not that commonly used.
    • Inclusion–exclusion principle : How to formalise dealing with double counting. It is sometimes so much easier to double count in the beginning and use inclusion-exclusion later than to try to find a recurrence that avoids double counting.
    • Subsets and Permutations : Learn to generate subsets of a set and permutations of a sequence. There are some counting problems where you are expected to sum up quantities over permutations or subsets. DP over subsets is something that you should learn too in this regard.
    • Indistinguishable and distinguishable objects : Learn the difference, and learn how counting changes between the two cases.
    • Lattice problems : Counting the number of ways to go between two points on a lattice with various kinds of restrictions is a very common problem. Learn the basic method, and how things like binomial coefficients and Catalan numbers appear in the solutions.
    • Counting on graphs : The questions of this type that I have seen commonly include tree DPs. Counting problems on non-tree graphs are usually hard. You see problems coming up based on Kirchhoff’s theorem once in a while too.
    • Pólya enumeration theorem : I have seen this coming up in contests only a couple of times. But learn this and if something comes up, it should be easy for you.
  • Learn to work with moduli : Most of the time, you are asked to do the counting mod some big prime (the favourite is 10^9 + 7). Learn to do the calculations without overflow, and learn some basic properties of primes from group theory which help solve some problems.
  • Probability and Expectation value : Here are the basic things you should know:
    • Calculating probabilities as ratios of counting problems : Do this and you can calculate things to very good accuracy
    • Recurrences for probabilities and expectation values : In a lot of cases, even when you can count and take ratios to get probabilities, it is much easier to find a recurrence directly in terms of probabilities. Taking care of double counting can be a pain though.
    • Linearity of expectation value : Learn to use it right (I can’t) and many expectation value problems should be cakewalks for you
    • Random walk: Modified random walk problems come up very often as probability/expectation value problems, so learn the basic results
    • Coin toss problems : Another set of practically overused problems
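
Two of the items above (matrix exponentiation and binomial coefficients modulo a prime) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `mat_mul`, `mat_pow`, `fib`, `build_factorials`, and `binom` are made up for this example, the modulus is the usual 10^9 + 7, and the factorial table assumes n stays below the modulus.

```python
MOD = 10**9 + 7  # the usual contest prime


def mat_mul(a, b):
    """Multiply two 2x2 matrices modulo MOD."""
    return [
        [(a[0][0] * b[0][0] + a[0][1] * b[1][0]) % MOD,
         (a[0][0] * b[0][1] + a[0][1] * b[1][1]) % MOD],
        [(a[1][0] * b[0][0] + a[1][1] * b[1][0]) % MOD,
         (a[1][0] * b[0][1] + a[1][1] * b[1][1]) % MOD],
    ]


def mat_pow(m, n):
    """Raise a 2x2 matrix to the n-th power by repeated squaring: O(log n)."""
    result = [[1, 0], [0, 1]]  # identity matrix
    while n > 0:
        if n & 1:
            result = mat_mul(result, m)
        m = mat_mul(m, m)
        n >>= 1
    return result


def fib(n):
    """F(n) mod MOD with F(0) = 0, F(1) = 1.

    Uses [[1, 1], [1, 0]]^n = [[F(n+1), F(n)], [F(n), F(n-1)]].
    """
    return mat_pow([[1, 1], [1, 0]], n)[0][1]


def build_factorials(limit):
    """Precompute factorials and inverse factorials mod MOD up to `limit`."""
    fact = [1] * (limit + 1)
    for i in range(1, limit + 1):
        fact[i] = fact[i - 1] * i % MOD
    inv_fact = [1] * (limit + 1)
    # Fermat's little theorem: x^(MOD-2) is the inverse of x modulo a prime.
    inv_fact[limit] = pow(fact[limit], MOD - 2, MOD)
    for i in range(limit, 0, -1):
        inv_fact[i - 1] = inv_fact[i] * i % MOD
    return fact, inv_fact


def binom(n, k, fact, inv_fact):
    """C(n, k) mod MOD in O(1) after precomputation; 0 when k is out of range."""
    if k < 0 or k > n:
        return 0
    return fact[n] * inv_fact[k] % MOD * inv_fact[n - k] % MOD


fact, inv_fact = build_factorials(1000)
print(fib(10))                         # 55
print(binom(10, 3, fact, inv_fact))    # 120
# Monotone lattice paths from (0, 0) to (2, 2): C(4, 2) = 6
print(binom(4, 2, fact, inv_fact))
```

Python integers never overflow, so the `% MOD` after every multiplication is there only to keep the numbers small; in C++ or Java the same reductions are what prevent 64-bit overflow.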

From: http://www.quora.com/Competitive-Programming/What-are-some-must-know-topics-in-discrete-math-and-probability-for-competitive-programming

Filed under list computer science algorithms data structures quora discrete linear algebra math graph theory people probability