The last stop of the summer trip

I like the last stop of our trip. It is a big quiet house in countryside. It has a great view to the ranch, through the windows and patio door, when you sit in the living room. When I was enjoying a piece of quite time in the couch, the voice of Meryl Streep in the movie “Out of Africa” slowly came to my mind : ” I had a farm …”.

The idea of buying a farm came to my mind, but some inititial research vaporized the dream: farming by no mean is an easy job: water, soil, corp, marketing, weather, fire, … Staying with the IT sector is definitely a better choice for me, though.

 

完全成熟的杏真好吃

我家后院有一颗杏树,今年结的特别多。小时候每年麦黄的时候都有人在集市上卖杏,妈妈有时候买回来一些,甜甜的真好吃。长大以后自己都是从超市里买杏吃,大概是没有完全成熟就采摘下来放熟的,这样的杏吃多了,都忘了完全成熟的杏是什么味道。今年我们的杏挂在树上,虽然颜色已经完全变为金黄,但是尝一尝还有点儿酸,我们一时也搞不清楚到底成熟了没有,后来干脆就不去管它,结果完全成熟的杏就从树上掉下来落到草地上,捡起来一尝,又甜又软,好吃得不得了。分头送给邻居们,大家都说好久没有吃过这么好吃的杏了。这是2021年迄今为止的一件大快乐,把童年的记忆都带回来了。

学习量子力学是个正确的想法

自己有好一阵子不知道该学点儿什么。在人工神经网络、概率与统计、变分法分析、程序设计语言、Linux kernel、Dark Web 等各种题目中游移不定。最近终于想明白了,量子力学还是自己的最爱。它是通向微观物理世界的桥梁,是人类新的哲学观点的发源地。对于自由意志还是决定论这个终极哲学问题,提供了决定性的武器。值得学习。

乒乓球的新握拍姿势

自己这一阵子手指疼痛,特别是左右手两个食指,钢琴弹不了,乒乓球也打不了,感到自己老之将至,心情很是低落。今天早晨邀请妈妈打乒乓球的时候,突然发现自己可以用中指和食指两根指头从后面支撑球拍,而只用无名指和小指握住拍柄。这样的握法对食指的压力很小,打起来在在小力量下似乎也没有什么太大的问题,只要注意大拇指在正反手转换的时候调整位置就好了,自己顿时感觉心头很高兴。

贝多芬在耳聋的情况下没有自杀,而是继续创作出了很多音乐,也许上天也在给自己这些考验吧。自己总是能够找出来一条路继续向前的。

Integration By Parts

Integration by parts is a common approach to simplify an integral, it is often useful when two fucntions are multiplied together, the rule is like this: u(x) is a funtion, its independent variable is x, dependent variable is u, v(x) is another funtion with x as independent variable and u as its dependent variable,  then we have:

(1)   \begin{equation*} \int u v dx = u \int v dx - \int \dfrac{du}{dx} \cdot (\int v dx) dx \end{equation*}

For example, if we want to compute:

    \[\int x \cos (x) dx\]

here we have u = x and v = \cos (x), so

    \[\dfrac{du}{dx} = 1\]

    \[\int v dx = \int \cos (x) dx = \sin (x)\]

Put everything together, we have

    \[\int x \cos (x) dx = x \sin (x) - \int \sin(x) dx = x \sin(x) + \cos(x) + C\]

This rule is very helpful if \dfrac{du}{dx} is simpler than u and \int{vdx} is easy to compute. Where does this rule come from? Here is a way to get it through Product Rule for derivatives:

(2)   \begin{equation*} \dfrac{d(uv)}{dx} = u \cdot \dfrac{dv}{dx} + v \cdot \dfrac{du}{dx} \end{equation*}

Integrate both side and rearrange, we will have:

    \[\int \dfrac{d(uv)}{dx} dx = \int {u \cdot \dfrac{dv}{dx}} dx + \int {v \cdot \dfrac{du}{dx}} dx\]

    \[uv = \int {u \cdot \dfrac{dv}{dx}} dx + \int {v \cdot \dfrac{du}{dx}} dx\]

    \[\int {u \cdot \dfrac{dv}{dx}} dx = uv - \int {v \cdot \dfrac{du}{dx}} dx\]

if we assume \dfrac{dv}{dx} = w, so we have v = \int{wdx}, replace the terms with v in the equation above, we will have:

    \[\int u w dx = u \int {w dx} - \int \dfrac{du}{dx} \cdot (\int w dx) dx\]

This is exactly the integration by parts equation we give at the begin, only with v replaced by w.

If we apply definite integral on (2), we will have

    \[\int_{a}^{b} \dfrac{d(uv)}{dx} dx = \int_{a}^{b} {u \cdot \dfrac{dv}{dx}} dx + \int_{a}^{b} {v \cdot \dfrac{du}{dx}} dx\]

    \[\int_{a}^{b} {u \cdot \dfrac{dv}{dx}} dx = - \int_{a}^{b} {v \cdot \dfrac{du}{dx}} dx + uv \Biggr|_{a}^{b}\]

The above result, as someone called it: “… you can peel a derivative off one factor in a product, and slap it onto the other factor – it will cost you a minus sign, and you will pick up a boundary term. …”.

继续持有或者换一种股票,哪一个更划算?

假定我现在有 Google 的股票100万,但是我发现 Amazon 的股票涨势更好,我是不是应该卖掉 Google 股票,换成 Amazon 的股票呢?

这个问题看上去挺简单,实际上却有多种因素的作用在里面,我最近跟一个投资股票市场多年的朋友请教了一下,算是有了一些理解。

最简单的,既然 Amazon 股票涨得更好,那当然应该卖掉 Google 股票,换成 Amazon 的股票。

可是接下来我又想到,把现在的 Google 股票卖掉本身就要交税,只能用剩下的钱去买 Amazon 股票,这样一来虽然 Amazon 的股票涨势更好,但是基数却更小,这样下来一定划算吗?

再进一步,即使基于持有 Google 股票将来总数更高,但是要交税的部分也更高(因为换成 Amazon 股票的时候就已经交过一部分税了),这样下来即使总数更高也不一定划算吧?

朋友帮我列了一个公式计算如下:

假定我现在 Google 股票有 100,其中 50 是原有资本, 50 是 截至目前为止需要交税的 Capital Gain。另外,假定 Capital Gain 部分的税率是 20%,假定未来一年里 Google 股票的涨幅 是 x,Amazon 股票的涨幅 是 y,那么:

如果不卖掉股票,那么未来一年持有的股票总额是:

    \[100 + 100x\]

如果一年以后立刻卖掉的话,到手的收益是:

    \[((100 + 100x) - 50) \times (1 - 20 \%) + 50\]

如果现在立即卖掉股票,到手数目是:

    \[50 + 50 \times (1 - 20 \%) = 90\]

这也是可以买到的 Amazon的股票数目,那么一年以后持有的股票总额是:

    \[90 + 90y\]

如果一年以后立即卖掉Amazon股票,到手数目是:

    \[90 + 90y \times (1 - 20 \%)\]

由此,如果我们需要股票换手更划算,就是要求:

    \[90 + 90y \times (1 - 20 \%) > ((100 + 100x) - 50) \times (1 - 20 \%) + 50\]

求解以后得到:

    \[y > \dfrac{10}{9} x\]

更一般的,如果假定现有股票中 cost basis 是 a,capital gain 是 b,个人所得税的税率是 p,那么想要股票换手不亏的话,我们需要:

    \[(a + b + (a+b)x - a) (1-p) + a < a + b (1-p) + (a+b(1-p)) y (1-p)\]

简化以后即可得到:

(1)   \begin{equation*} \dfrac{y}{x} > \dfrac{a+b}{a+b-bp} \end{equation*}

换手之后是否亏损,虽然 yx 是线性关系,但是和abp都是非线性相关。特别的,如果现在的股票总额中capital gain (也就是b)占的比重越多,或者 tax 的税率越高,那么对于换手去的那只股票的增长幅度就要求更高,这也符合我们的直觉。

比较与幸福

人生活在世界上都追求幸福,但是幸福是什么呢?这是一个主观判断问题。凡是这类问题就必然涉及到价值判断的标准。从逻辑上来讲,要涉及到判断和标准必然牵扯到比较,所以在幸福与否这个问题上,和别人的比较似乎是不可避免的。然而与他人的比较,常常是导致生活烦恼的根源。当今社会传媒发达,媒体里的成功人士千千万万,绝大部分人在金钱和社会地位方面与之比较只能自叹不如。但是我想,这些差别并不一定构成生活的不幸。判断幸福与否,恐怕还是不能脱离生命的本质:考虑到人是一个生命体,生存和繁衍作为最基本的需求,在当今社会绝大部分人那里应该不会成为问题。除开这两点之外,每个人都有权追求自己认为的生活中可以让自己更加幸福的事情,和他人的比较、模仿和超越只是其中的一种办法罢了。

KMP (Knuth Morris Pratt) Algorithm

Question: Search a pattern with length m in a string with length n. We assume that n \ge m.

The naive approach will be matching the pattern at every location of the string and perform m comparison at each location, so the complexity is around O(m \times n).

KMP algorithm preprocess the pattern to produce a table T with the following m entries (we assume the string and pattern are all 0-based) to help backtracing when a mismatch is found:

  • The pattern will contain m sub-patterns, each pattern starts with the location 0, and ends (inclusive) at location i = 0, 1, 2, ... m-1. For example, for pattern “AAABAAA“, it has 7 sub patterns as “A“, “AA“, “AAA“, “AAAB“, “AAABA“, “AAABAA” and “AAABAAA“.
  • For each sub pattern, we look at its begining and end, and find its longest proper prefix which happens to be a suffix. Notice that proper prefix means that the prefix should not cover the whole string. For example, for sub-pattern “AAABAA“, the prefix which happens to be suffix is “AA”, which has a length 2. For sub-pattern “AA”, the prefix and suffix is “A”, which has the length 1. For single character pattern like “A”, since it has no proper prefix, its value is always 0. By going through all the sub patterns, we construct the table with each entry as an integer to indicate the length of maximum proper prefix as following:

    \[0, 1, 2, 0, 1, 2, 3\]

With such a table constructed, now we can start the search as the following:

  1. Aligning the pattern with the string at string position i = 0;
  2. Compare each character in pattern with the character in the string. The string within the range [i, i+m-1] is called the current string window. We use j to represent the index to the pattern.
  3. If all the characters in the current window match, we find a match and stop. By this way we find the leftmost match. The KMP algorithm could easily be adapted to find all the matches in the string, but that will make the algorithm a little more complicated and here we will try to make things easier by focus on its core idea.
  4. Now suppose at the location j which is not equal to m-1, the match fails. This means the text in the current window does not match the pattern. However, we don’t need to move forward the string window by only 1 and start over with the comparison from the begin of the pattern. Instead, we can use the table we construct before: since the characters from location [i, i+j-1] already match the current sub-pattern (a.k.a. pattern [0, j-1] inclusively), by looking at the size of proper prefix at the current sub-pattern T[j-1], we can move the window start directly to i+j-T[j-1], and confidently know that the string from that start to the position right before the current location already matches the start of the pattern (since for the sub pattern, this is a suffix and also a prefix), and we just continue to comparison at current string location and pattern location T[j-1].

The above process can be illurstrated by the following diagram:

This covers the most element of the KMP algorithm, but when I read the materials listed in the reference section, I had the following question: Why do we have to shift the pattern to the start of the suffix? Isn’t possible that the next potential match could start with somewhere else within the pattern? For example, given the following sub-pattern:

If we have successfully matched the sub pattern above, do we have to just assuming only A_s is matched, why can’t we shift the pattern to start at the A in the middle?

I actually did some research on the articles I can find on Internet, most of them are focused on how to efficiently implement the step 1-4 described above, instead of reasoning about the process. Fortuntely, it is not hard to prove the new match has to start at A_s. In the above example sub-pattern, x and y stands for arbitrary sub-string, A stands for the same sub string, which happens to be the proper prefix, proper suffix, and also shows up in the middle. Consider the following two possibilities:

  • If x is not equal to y, since we know the string in the current window has matched the sub-pattern so far (a.k.a. AxAyA), if we just shift the pattern to the middle A, then we are trying to match pattern x with a portion of string which we already know is y, and they can’t match each other, so shift pattern to middle A can not produce a match.
  • If x is EQUAL to y, then A is NOT the longest prefix any more, Instead, the longest proper prefix will be AxA (or AyA), this contradicts with the fact that we assume A is already the longest proper prefix.

So, if  we know A is the proper longest prefix, then shifting the pattern to the start of the suffix A is the only way which could produce a match. The similar proof can be done for the sub pattern below.

The KMP algorithm is very smart, salute to those inventors of the algorithm!

References:

Greedy algorithm

greedy algorithm is any algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage.

Following this strategy, we can get solutions to a variaty of problems, not necessarily the best solution.

We can make whatever choice seems best at the moment and then solve the subproblems that arise later. The choice made by a greedy algorithm may depend on choices made so far, but not on future choices or all the solutions to the subproblem. It iteratively makes one greedy choice after another, reducing each given problem into a smaller one. In other words, a greedy algorithm never reconsiders its choices. This is the main difference from dynamic programming, which is exhaustive and is guaranteed to find the solution. After every stage, dynamic programming makes decisions based on all the decisions made in the previous stage, and may reconsider the previous stage’s algorithmic path to solution.

贪心算法在很多场合都非常有效,最少它可以帮助你写出第一个工作的版本,然后从这里开始在探索更好的解决方案(比如动态规划等等)。