Categories
English

A Simple Spider for Researcher Ranking Project

An important issue in ranking researchers is constructing the citation network between them. To do this, I need to crawl the citation database on the HistCite website, whose URL is http://www.garfield.library.upenn.edu/histcomp/index.html

 

I downloaded a Python spider from the web and revised it to make it usable. It is not perfect, but it is good enough for this project. It is really cool to watch the command window scroll by as it downloads thousands of pages. Wow, is this what they call geek behaviour?

 

Source Code

The original version comes from this website http://xlvector.net/blog/?p=18
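My revised spider is not reproduced in this post. As a rough stand-in, here is a minimal breadth-first crawler that uses only Python's standard library. It is not the xlvector.net code or my revision: the same-directory filter, the `max_pages` limit, the one-second `delay`, and the names `crawl`, `fetch_page`, and `LinkParser` are hypothetical choices made for illustration.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_page(url, timeout=10):
    """Download one page and decode it as text (undecodable bytes are replaced)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=100, delay=1.0, fetcher=fetch_page):
    """Breadth-first crawl that stays inside the start URL's directory.

    Returns a dict mapping each downloaded URL to its HTML text.
    """
    prefix = start_url.rsplit("/", 1)[0] + "/"  # e.g. .../histcomp/
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetcher(url)
        except OSError as exc:  # URLError and timeouts are OSError subclasses
            print(f"skipped {url}: {exc}")
            continue
        pages[url] = html
        print(f"[{len(pages)}] {url}")  # the scrolling progress log
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts before deduplicating.
            link, _fragment = urldefrag(urljoin(url, href))
            if link.startswith(prefix) and link not in seen:
                seen.add(link)
                queue.append(link)
        time.sleep(delay)  # pause between requests to be polite to the server
    return pages
```

A call such as `crawl("http://www.garfield.library.upenn.edu/histcomp/index.html", max_pages=50)` would fetch up to 50 pages under `histcomp/`. The `fetcher` parameter exists so the link-following logic can be tried offline with fake pages; a real crawl should also honour the site's robots.txt.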

Categories
English

Draft of My Report on the ReRank Project


This is the draft of my report on the ReRank Project. It is sketchy but covers most of the main points.

My basic idea is to transform the ranking problem into a network optimization problem. Others have proposed an entropy-maximization scheme for ranking web pages on the WWW, which is fairer than the PageRank algorithm that Google uses. I borrowed this framework and applied it to the citation network formed by papers. Then I modeled the relation between researchers and papers as a bipartite graph. By maximizing entropy again, I obtained the most general (least biased) indicators of researchers' popularity and influence.
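To make the bipartite researcher-paper model concrete, here is a small hypothetical sketch: each paper carries a score (for example, from a citation-based ranking), and the score flows across the authorship edges to the researchers. Splitting each paper's score equally among its authors is an assumption for illustration only; it is not the entropy-maximization step described above, which would derive the weights from the data instead. The names `researcher_scores`, `authorship`, and `paper_score` are hypothetical.

```python
from collections import defaultdict


def researcher_scores(authorship, paper_score):
    """Aggregate paper scores to researchers over a bipartite authorship graph.

    authorship: dict paper -> list of author names (the bipartite edges).
    paper_score: dict paper -> score of that paper (missing papers count as 0).
    Each paper's score is split equally among its authors (an assumption).
    """
    scores = defaultdict(float)
    for paper, authors in authorship.items():
        if not authors:
            continue  # a paper with no known authors gives credit to no one
        share = paper_score.get(paper, 0.0) / len(authors)
        for author in authors:
            scores[author] += share
    return dict(scores)


authorship = {"p1": ["alice", "bob"], "p2": ["alice"], "p3": ["bob", "carol"]}
paper_score = {"p1": 0.5, "p2": 0.3, "p3": 0.2}
print(researcher_scores(authorship, paper_score))
```

Because every paper's score is split rather than copied, the researcher scores add up to the same total as the paper scores, so heavily co-authored papers do not inflate the overall credit.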
The attachment is a brief summary of my idea. I would greatly appreciate it if you could look it over before the discussion.
This is only the basic idea of the theoretical framework; there is a long way to go before getting any meaningful results.
I will update it if I revise it later.
Categories
English

ReRank Algorithm Progress Mar 17

Do you remember the idea of ranking researchers? I spent the whole spring break exploring it.
I read some articles about the h-index, a research-ability indicator used by Science and Nature. The h-index can be summarized as: “a researcher has index h if h of his or her papers have at least h citations each, and the remaining papers have no more than h citations each.” It makes use of only very limited information. Ignoring most of the information makes it robust and, in some senses, hard to manipulate, but it is also disputable, because the information filter is defined arbitrarily by humans. In fact, we can define many h-index-like indicators that produce different rank orders, and it is impossible to justify which one is the fairest.
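A short sketch of the standard definition, plus a hypothetical `alpha` parameter added here only to illustrate the point about h-index-like indicators: raising the citation threshold from `rank` to `alpha * rank` is one arbitrary but plausible variant, and it can reverse the order of two researchers. The researcher data below is made up.

```python
def h_like_index(citations, alpha=1.0):
    """Largest k such that k papers each have at least alpha * k citations.

    alpha = 1 gives the standard h-index; other values are hypothetical
    variants used only to show that the rank order depends on this choice.
    """
    counts = sorted(citations, reverse=True)
    k = 0
    for rank, c in enumerate(counts, start=1):
        if c >= alpha * rank:
            k = rank
        else:
            break  # counts only decrease while the threshold grows
    return k


def h_index(citations):
    return h_like_index(citations, alpha=1.0)


a = [5, 5, 5, 5, 5]   # many moderately cited papers
b = [20, 20, 20, 4]   # fewer, highly cited papers
print(h_index(a), h_index(b))                      # 5 4  -> A ranks above B
print(h_like_index(a, 2.0), h_like_index(b, 2.0))  # 2 3  -> B ranks above A
```

Neither threshold is more "correct" than the other, which is exactly the difficulty with human-defined filters.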
Another way to rank researchers is citation analysis. Similar methods have been widely used to rank webpages; probably the most famous is the PageRank algorithm that powers Google. The essential idea of PageRank is to compute the stationary probability distribution of a random walk in which the surfer follows each out-link with equal probability.
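A minimal power-iteration sketch of this random walk. It includes the damping factor of the standard PageRank formulation (with probability 1 - `damping`, commonly 0.85, the surfer jumps to a random page) and spreads the rank of pages with no out-links uniformly; the dangling-page rule and the function name are choices made for this sketch. The equal-probability assumption shows up as the division by `len(targets)`.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank.

    out_links: dict node -> list of nodes it links to.
    Returns a dict node -> stationary probability (values sum to 1).
    """
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by pages without out-links is spread over all pages.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u, targets in out_links.items():
            if targets:
                # Equal-probability assumption: each out-link gets the same share.
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
        diff = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if diff < tol:
            break
    return rank


ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print(sorted(ranks, key=ranks.get, reverse=True))  # ['c', 'a', 'b']
```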
PageRank works very well for ranking webpages. Yet it relies on an assumption: “the surfer follows each out-link with equal probability.” The fairest ranking system should be based on facts only and should not rely on any human-defined assumption.
Removing this assumption from the PageRank algorithm leads to the TrafficRank algorithm.
The key idea of the TrafficRank algorithm is to derive the most general (that is, most uncertain, or least biased) ranking order that is consistent with the existing information. Because the uncertainty of a system is characterized by its entropy, this is actually an optimization problem: maximize the entropy subject to the constraints imposed by the known information. For details, see the paper “A New Paradigm for Ranking Pages on the World Wide Web” by John A. Tomlin, or the report I will post later.
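One way to write this down, following my reading of Tomlin's formulation (check the paper for the exact constraints and for how the final ranking is read off): choose a flow p_ij on every link i -> j to maximize the entropy -Σ p_ij ln p_ij, subject to Σ p_ij = 1 and, at every page, total inflow equal to total outflow. Setting the derivatives of the Lagrangian to zero gives p_ij ∝ exp(λ_i - λ_j) (up to the sign convention for the multipliers λ), so the sketch below finds λ by gradient descent on log Z, where Z = Σ exp(λ_i - λ_j) over all links; the gradient with respect to λ_k is exactly (outflow of k) - (inflow of k), so it vanishes when traffic is conserved. Ranking pages by inflow, the step size, and the name `traffic_rank` are assumptions of this sketch. It also assumes every link lies on some directed cycle (true for a strongly connected graph); otherwise no strictly positive flow conserves traffic and the multipliers diverge.

```python
import math


def traffic_rank(edges, step=0.5, tol=1e-10, max_iter=100000):
    """Maximum-entropy link flows that conserve traffic at every node.

    edges: list of distinct (i, j) pairs, each meaning a link i -> j.
    Returns (flows, inflow): flows[(i, j)] = p_ij, inflow[j] = sum over i of p_ij.
    """
    nodes = sorted({v for e in edges for v in e})
    lam = {v: 0.0 for v in nodes}  # one Lagrange multiplier per node
    flows = {}
    for _ in range(max_iter):
        # p_ij is proportional to exp(lam_i - lam_j), normalised to sum to 1.
        weights = {(i, j): math.exp(lam[i] - lam[j]) for i, j in edges}
        z = sum(weights.values())
        flows = {e: w / z for e, w in weights.items()}
        # Gradient of log Z with respect to lam_k is outflow_k - inflow_k.
        grad = {v: 0.0 for v in nodes}
        for (i, j), p in flows.items():
            grad[i] += p
            grad[j] -= p
        if max(abs(g) for g in grad.values()) < tol:
            break  # traffic is conserved at every node
        for v in nodes:
            lam[v] -= step * grad[v]
    inflow = {v: 0.0 for v in nodes}
    for (i, j), p in flows.items():
        inflow[j] += p
    return flows, inflow


flows, inflow = traffic_rank([("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")])
# Here inflow["a"] and inflow["c"] agree (up to the tolerance) and exceed inflow["b"].
print(inflow)
```

Unlike the PageRank sketch, nothing here says how a page splits its traffic among its out-links; the split is whatever maximizes entropy under the conservation constraints.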
Both PageRank and TrafficRank target webpages. Ranking researchers and ranking webpages share some characteristics, but the two problems are different.
Webpages are connected only by links, while the relationships between researchers are much more complicated (for example, co-authorship and citations between their papers). My goal for the following days is to characterize the relations between researchers with a suitable network model.
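As a first, deliberately simple candidate for such a model (an assumption, not a settled choice): combine the paper citation network with the author-paper bipartite graph, and add one unit of weight from researcher A to researcher B for every citation from a paper A co-authored to a paper B co-authored. Dropping self-citations, the unit weights, and the name `researcher_citation_graph` are all hypothetical choices for this sketch.

```python
from collections import Counter


def researcher_citation_graph(paper_citations, authorship):
    """Project paper-level citations onto researchers.

    paper_citations: iterable of (citing_paper, cited_paper) pairs.
    authorship: dict paper -> list of authors (papers not listed have no known authors).
    Returns a Counter mapping (citing_author, cited_author) -> weight.
    """
    weights = Counter()
    for citing, cited in paper_citations:
        for a in authorship.get(citing, []):
            for b in authorship.get(cited, []):
                if a != b:  # skip self-citations (an assumption)
                    weights[(a, b)] += 1
    return weights


citations = [("p2", "p1"), ("p3", "p1"), ("p3", "p2")]
authorship = {"p1": ["alice"], "p2": ["bob", "alice"], "p3": ["carol"]}
graph = researcher_citation_graph(citations, authorship)
print(graph)
```

The resulting weighted, directed researcher graph could then be fed to a network ranking method such as those above; dividing each unit by the number of author pairs would be an alternative weighting that stops large author lists from dominating.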

 

Categories
Chinese

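Bad Habits I Am Determined to Break

Back when Microsoft tried to acquire Yahoo, Jerry Yang's wavering made Yahoo miss a great opportunity. Although Jerry Yang had met with an opportunity as good as the Internet era, his indecisive character came back to bite him 15 years later. In contrast, Steve Jobs and Shi Yuzhu were able to make comebacks after suffering enormous setbacks, and what carried them through was their resolute character.

Lately I have become more and more aware of some of my own shortcomings.

1. I am indecisive. I want to work out all the gains and losses clearly before doing anything, but often they simply cannot be calculated, so I waver and sometimes cannot make a decision for a long time.

2. I lack perseverance. Although I can stay consistent in my big-picture goals, when it comes to specific tasks I rarely keep at something for long on my own initiative. Often I back off as soon as I run into difficulties.

3. I am poor at working with others. I tend toward individual heroics, and sometimes I feel the people around me are not very good, so I am not well suited to teamwork.

4. I am easily influenced by my surroundings. When the people around me work hard, I can do very well, but when everyone else goes off to have fun, I start to slack off.

In 2011 I must get rid of these bad habits. Let this post stand as proof!!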