Integration By Parts

Integration by parts is a common approach to simplify an integral, it is often useful when two fucntions are multiplied together, the rule is like this: u(x) is a funtion, its independent variable is x, dependent variable is u, v(x) is another funtion with x as independent variable and u as its dependent variable,  then we have:

(1)   \begin{equation*} \int u v dx = u \int v dx - \int \dfrac{du}{dx} \cdot (\int v dx) dx \end{equation*}

For example, if we want to compute:

    \[\int x \cos (x) dx\]

here we have u = x and v = \cos (x), so

    \[\dfrac{du}{dx} = 1\]

    \[\int v dx = \int \cos (x) dx = \sin (x)\]

Put everything together, we have

    \[\int x \cos (x) dx = x \sin (x) - \int \sin(x) dx = x \sin(x) + \cos(x) + C\]

This rule is very helpful if \dfrac{du}{dx} is simpler than u and \int{vdx} is easy to compute. Where does this rule come from? Here is a way to get it through Product Rule for derivatives:

(2)   \begin{equation*} \dfrac{d(uv)}{dx} = u \cdot \dfrac{dv}{dx} + v \cdot \dfrac{du}{dx} \end{equation*}

Integrate both side and rearrange, we will have:

    \[\int \dfrac{d(uv)}{dx} dx = \int {u \cdot \dfrac{dv}{dx}} dx + \int {v \cdot \dfrac{du}{dx}} dx\]

    \[uv = \int {u \cdot \dfrac{dv}{dx}} dx + \int {v \cdot \dfrac{du}{dx}} dx\]

    \[\int {u \cdot \dfrac{dv}{dx}} dx = uv - \int {v \cdot \dfrac{du}{dx}} dx\]

if we assume \dfrac{dv}{dx} = w, so we have v = \int{wdx}, replace the terms with v in the equation above, we will have:

    \[\int u w dx = u \int {w dx} - \int \dfrac{du}{dx} \cdot (\int w dx) dx\]

This is exactly the integration by parts equation we give at the begin, only with v replaced by w.


假定我现在有 Google 的股票100万,但是我发现 Amazon 的股票涨势更好,我是不是应该卖掉 Google 股票,换成 Amazon 的股票呢?


最简单的,既然 Amazon 股票涨得更好,那当然应该卖掉 Google 股票,换成 Amazon 的股票。

可是接下来我又想到,把现在的 Google 股票卖掉本身就要交税,只能用剩下的钱去买 Amazon 股票,这样一来虽然 Amazon 的股票涨势更好,但是基数却更小,这样下来一定划算吗?

再进一步,即使基于持有 Google 股票将来总数更高,但是要交税的部分也更高(因为换成 Amazon 股票的时候就已经交过一部分税了),这样下来即使总数更高也不一定划算吧?


假定我现在 Google 股票有 100,其中 50 是原有资本, 50 是 截至目前为止需要交税的 Capital Gain。另外,假定 Capital Gain 部分的税率是 20%,假定未来一年里 Google 股票的涨幅 是 x,Amazon 股票的涨幅 是 y,那么:


    \[100 + 100x\]


    \[((100 + 100x) - 50) \times (1 - 20 \%) + 50\]


    \[50 + 50 \times (1 - 20 \%) = 90\]

这也是可以买到的 Amazon的股票数目,那么一年以后持有的股票总额是:

    \[90 + 90y\]


    \[90 + 90y \times (1 - 20 \%)\]


    \[90 + 90y \times (1 - 20 \%) > ((100 + 100x) - 50) \times (1 - 20 \%) + 50\]


    \[y > \dfrac{10}{9} x\]

更一般的,如果假定现有股票中 cost basis 是 a,capital gain 是 b,个人所得税的税率是 p,那么想要股票换手不亏的话,我们需要:

    \[(a + b + (a+b)x - a) (1-p) + a < a + b (1-p) + (a+b(1-p)) y (1-p)\]


(1)   \begin{equation*} \dfrac{y}{x} > \dfrac{a+b}{a+b-bp} \end{equation*}

换手之后是否亏损,虽然 yx 是线性关系,但是和abp都是非线性相关。特别的,如果现在的股票总额中capital gain (也就是b)占的比重越多,或者 tax 的税率越高,那么对于换手去的那只股票的增长幅度就要求更高,这也符合我们的直觉。



KMP (Knuth Morris Pratt) Algorithm

Question: Search a pattern with length m in a string with length n. We assume that n \ge m.

The naive approach will be matching the pattern at every location of the string and perform m comparison at each location, so the complexity is around O(m \times n).

KMP algorithm preprocess the pattern to produce a table T with the following m entries (we assume the string and pattern are all 0-based) to help backtracing when a mismatch is found:

  • The pattern will contain m sub-patterns, each pattern starts with the location 0, and ends (inclusive) at location i = 0, 1, 2, ... m-1. For example, for pattern “AAABAAA“, it has 7 sub patterns as “A“, “AA“, “AAA“, “AAAB“, “AAABA“, “AAABAA” and “AAABAAA“.
  • For each sub pattern, we look at its begining and end, and find its longest proper prefix which happens to be a suffix. Notice that proper prefix means that the prefix should not cover the whole string. For example, for sub-pattern “AAABAA“, the prefix which happens to be suffix is “AA”, which has a length 2. For sub-pattern “AA”, the prefix and suffix is “A”, which has the length 1. For single character pattern like “A”, since it has no proper prefix, its value is always 0. By going through all the sub patterns, we construct the table with each entry as an integer to indicate the length of maximum proper prefix as following:

    \[0, 1, 2, 0, 1, 2, 3\]

With such a table constructed, now we can start the search as the following:

  1. Aligning the pattern with the string at string position i = 0;
  2. Compare each character in pattern with the character in the string. The string within the range [i, i+m-1] is called the current string window. We use j to represent the index to the pattern.
  3. If all the characters in the current window match, we find a match and stop. By this way we find the leftmost match. The KMP algorithm could easily be adapted to find all the matches in the string, but that will make the algorithm a little more complicated and here we will try to make things easier by focus on its core idea.
  4. Now suppose at the location j which is not equal to m-1, the match fails. This means the text in the current window does not match the pattern. However, we don’t need to move forward the string window by only 1 and start over with the comparison from the begin of the pattern. Instead, we can use the table we construct before: since the characters from location [i, i+j-1] already match the current sub-pattern (a.k.a. pattern [0, j-1] inclusively), by looking at the size of proper prefix at the current sub-pattern T[j-1], we can move the window start directly to i+j-T[j-1], and confidently know that the string from that start to the position right before the current location already matches the start of the pattern (since for the sub pattern, this is a suffix and also a prefix), and we just continue to comparison at current string location and pattern location T[j-1].

The above process can be illurstrated by the following diagram:

This covers the most element of the KMP algorithm, but when I read the materials listed in the reference section, I had the following question: Why do we have to shift the pattern to the start of the suffix? Isn’t possible that the next potential match could start with somewhere else within the pattern? For example, given the following sub-pattern:

If we have successfully matched the sub pattern above, do we have to just assuming only A_s is matched, why can’t we shift the pattern to start at the A in the middle?

I actually did some research on the articles I can find on Internet, most of them are focused on how to efficiently implement the step 1-4 described above, instead of reasoning about the process. Fortuntely, it is not hard to prove the new match has to start at A_s. In the above example sub-pattern, x and y stands for arbitrary sub-string, A stands for the same sub string, which happens to be the proper prefix, proper suffix, and also shows up in the middle. Consider the following two possibilities:

  • If x is not equal to y, since we know the string in the current window has matched the sub-pattern so far (a.k.a. AxAyA), if we just shift the pattern to the middle A, then we are trying to match pattern x with a portion of string which we already know is y, and they can’t match each other, so shift pattern to middle A can not produce a match.
  • If x is EQUAL to y, then A is NOT the longest prefix any more, Instead, the longest proper prefix will be AxA (or AyA), this contradicts with the fact that we assume A is already the longest proper prefix.

So, if  we know A is the proper longest prefix, then shifting the pattern to the start of the suffix A is the only way which could produce a match. The similar proof can be done for the sub pattern below.

The KMP algorithm is very smart, salute to those inventors of the algorithm!


Greedy algorithm

greedy algorithm is any algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage.

Following this strategy, we can get solutions to a variaty of problems, not necessarily the best solution.

We can make whatever choice seems best at the moment and then solve the subproblems that arise later. The choice made by a greedy algorithm may depend on choices made so far, but not on future choices or all the solutions to the subproblem. It iteratively makes one greedy choice after another, reducing each given problem into a smaller one. In other words, a greedy algorithm never reconsiders its choices. This is the main difference from dynamic programming, which is exhaustive and is guaranteed to find the solution. After every stage, dynamic programming makes decisions based on all the decisions made in the previous stage, and may reconsider the previous stage’s algorithmic path to solution.


Heap Algorithm

Heap is a binary tree where the max element is always at the root.  Heap is usually implemented by using an array where:

  • Root is at element index 0.
  • A node’s two children is at indices 2i +1 and 2i+2.
  • A node’s parent is at index i/2.

Insert a new element can be done by appending it to the end of the array then do a “SIFT UP” operation to restore the heap. To remove the largest element can be done by removing the element at index 0, move the last element of the array to the index 0, then do a “SIFT DOWN” operation.

Heap is useful in many situations, for example:

  • Top N elements from an array.
  • K-way merging
  • Sorting.

The power of heap comes from the point where it finds max element avoid doing the comparison with all other elements again, instead, the heap structure keeps some comparison result so the new element’s position can be decided in less time.


前一阵子一时兴起,想要用现有自己的电脑和设备,打造一个家庭影院。反正电视音响都是现成的,只需要一个能够播放节目源的 HTPC 就好啦。不过自己很小气,舍不得买一部新的电脑,结果就用以前公司发的 Chromebook 作为替代。Chromebook 一切都好,就是每次操作的时候要跑到电脑跟前,有些不方便。想来想去,后来自己就搞了一个Chromecast,反正很便宜。这样手机拿在手里,直接在选好节目,然后在chromecast上就可以开始播放。这样搞了一阵子,发现还是有些不方便,因为有些节目源不支持cast。折腾来折腾去,后来突然想到干脆搞一套遥控的键盘和鼠标,这样不就可以远程操作我的chromebook了吗?一念即此我就又把以前买苹果电脑 IMac 附带的无线键盘和鼠标拿出来,擦掉灰装上电池开始使用。键盘倒是很快就设好了,可是鼠标怎么也不工作。搞来搞去,最后上网一查,发现老款的苹果无线鼠标,质量就是差,很容易掉线。按照网上很多人的攻略,我给鼠标的电池仓里垫了一些纸,结果也没有完全修好。最后一咬牙,干脆花十块钱买了一个新的无线鼠标。心想这下子问题该全部解决了吧。嘿嘿,我没有想到,现在的无线鼠标需要占用一个USB A接口,但是接了这个USB以后,我的chromebook就连不了功放了。郁闷了一个小时,最后发现。我以前买过一个 HDMI spliter, 这个设备可以在一个接口上同时连接功放和电视,这样省下来的USB A接口就可以连接无线鼠标啦。

折腾了这么久,终于把设备全部都设置好了。可是除了头一天刚设好的时候,兴趣盎然,玩了很久以外,接下来的两个星期根本没有碰过这套设备。看着一大堆东西,杂七杂八的堆在电视旁边,自己觉得很不爽。仔细考虑之下,我就又把chromebook,HDMI split 还有无线鼠标等等全部又都收拾起来了。前后这么折腾了几个星期,重新又回到原点,劳民伤财,费神费力。现有的东西真的已经足够好了,以后再也不能干这些事情啦!