## Segment tree and Binary indexed tree (1)

Recently, I discovered two interesting data structures, Segment tree and Binary index tree (BIT).

Both data structures are good at managing some range of aggregated values. For example, in a discrete range, and I want to query some summarised value for a sub-range. I also want to updates this value for elemental interval. Both segment tree and BIT make query and update operations efficient with worst case complexity O(lgn).

To me, segment tree is easier to understand. It is a binary tree that each leafs forms a distinct contiguous ranges. And internal nodes is a union range of both children. So it looks like this:

Each node stores some interesting value. For example, it can store the sum of sub array with index range matching the node's range. Or it can store the max/min value in that range as well.

When computing the range, we always use two pointers, where first one represents the starting position **inclusive** and second one represents the ending position **exclusive**.
The following code can be used to construct the tree.

```
def segment_tree(root):
start, end = root.start, root.end
if end == start:
return None
mid = start + (end - start) / 2
root.left = segment_tree(TreeNode(start, mid))
root.right = segment_tree(TreeNode(mid, end))
```

With segment tree, we can efficiently query data in any sub range. The presumption here is that the computation of the data associated with the range is associative. That is, a + b = b + a and (a + b) + c = a + (b + c), + can be any operation.

Let's say we are looking at sum. And the sum of individual element in the tree is equal to it's index. And the sum of all element in node range is store with the node as `val`

. So obviously sum of range 0~9 is 45.

If we want sum of range 2~8, we look at the root node first, root has range 0~9, this means the requested range is valid, and there is a sum for it. We break down 2~8 into 2~4 and 5~8, pass 2~4 to the left child, and 5~8 to the right. And apply the same logic recursively on children. Child will return a sum, then the parent/root node add the sum to get the result. The code would be:

```
def get_value(root, s, e):
start,end=root.start,root.end
if end==start: return 0
if e==end and s==start: return tree.val
if e<=start or end<=s: return 0 # out of range
mid=start+(end-start)/2
return get_value(root.left, s, mid) + get_value(root.right, mid, e)
```

Update of value are per value, so let's say we update value at index 4 to 8. By looking the the tree graph, we know that we need to update several range nodes because they contains index 4, namely, 0~9,0~4,2~4,3~4 and 4. The update itself is quite simple, just add the difference. To find the ranges that needs to be updated, we apply the same logic as `get_value`

, instead of return a sum, we apply the update to the node.

```
def update(root, i, diff):
start,end=root.start,root.end
if start==i and end==start+1: return
if not start<=i<=end: return
root.val += diff
mid=start+(end-start)/2
return update(root.left, s, mid) + update(root.right, mid, e)
```

There is a variant of interval tree called **segment tree**. So each node instead of containing a range, contains an segment. So segments are mutually exclusive to each other. And the value of segment is the aggregated value of all segments in that interval. For example, we have a list of intervals 1~5, 3~6, 2~4, segment tree will break it down into segments 1~2,3~4,4~5,5~6
and with each segment node it stores the summarized value of all intervals containing that segment.

EDIT: 2017-10-21 I find that building the tree from top down is not efficient in some cases. For example, if we are interested in the max of a range. Then we should build the tree bottom up.

An iterative way to build the tree up is:

```
function segment(list) {
const nodes = [];
for (let i=0;i<list.length;i++) {
nodes.push({max: list[i], range: [i, i+1]});
}
let l = list.length;
while (l > 0) {
for (let i=0;i<l-1;i+=2) {
nodes[i/2] = {
max: Math.max(nodes[i].max, nodes[i+1].max),
range: [nodes[i]['range'][0], nodes[i+1]['range'][1]],
left: nodes[i],
right: nodes[i+1]
};
}
if (l % 2 != 0) {
nodes[(l-1)/2] = nodes[l-1];
}
l /= 2;
}
return nodes[0];
}
```

The disadvantage of segment tree is that it can become unbalanced when it's modified. That will affect the efficiency of the lookup and update operations. Thus, segment tree is a static data structure.

I will discuss BIT in next post.