Do you remember the idea of Ranking Researchers? I spent the whole spring break exploring that idea.
I read some articles about the h-index, which is a research-ability indicator used by Science and Nature. The h-index can be summarized as: “a researcher has h-index h if h of his papers have at least h citations each and his remaining papers have no more than h citations each.” In practice it uses only a very limited part of the available information. Ignoring so much information makes it robust and, in some senses, resistant to manipulation, but it is also debatable, because the rule that decides which information counts is chosen arbitrarily by humans. In fact, we can define many h-index-like indicators that produce different rank orders, and there is no principled way to decide which one is the fairest.
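To make the “very limited information” point concrete, here is a minimal Python sketch of the usual h-index computation (the largest h such that at least h papers have at least h citations each). The function name `h_index` is just an illustrative choice. Sorting the citation counts in descending order lets us read h off as the last rank whose paper still has at least that many citations.

```python
def h_index(citations: list[int]) -> int:
    """Return the largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break  # counts only decrease from here, so h cannot grow
    return h


print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([3, 3, 3]))        # 3
print(h_index([100, 100, 100]))  # 3
```

The last two calls show how much information is discarded: a researcher with three papers of 100 citations each gets the same h-index as one with three papers of 3 citations each.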
Another method to rank researchers is citation analysis. Similar methods have been widely used to rank webpages; probably the most famous one is the “PageRank” algorithm that powers Google. The essential idea of the PageRank algorithm is to compute the stationary probability distribution of a random walk in which the surfer follows each out-link of a page with equal probability.
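The random-walk idea can be sketched in plain Python with power iteration. Two details here are assumptions not stated above: a damping (teleportation) factor of 0.85, as in the original PageRank paper, and the choice to spread the probability of pages without out-links uniformly over all pages. The equal-probability assumption itself is the line that divides a page's rank by its number of out-links.

```python
def pagerank(out_links: dict[int, list[int]], n: int, damping: float = 0.85,
             tol: float = 1e-12, max_iter: int = 1000) -> list[float]:
    """Power iteration for the stationary distribution of the PageRank random walk.

    out_links maps a page id (0..n-1) to the pages it links to.
    """
    rank = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # With probability (1 - damping) the surfer jumps to a random page.
        new = [(1.0 - damping) / n] * n
        for page in range(n):
            targets = out_links.get(page, [])
            if targets:
                # The key assumption: every out-link is followed with equal probability.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page (assumed handling): spread its mass over all pages.
                for t in range(n):
                    new[t] += damping * rank[page] / n
        diff = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if diff < tol:
            break
    return rank


print(pagerank({0: [1, 2], 1: [2], 2: [0]}, 3))
```

In this small graph page 2 ranks highest and page 1 lowest, because page 1 receives only half of page 0's rank.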
PageRank works very well for ranking webpages. Yet it relies on an assumption: “The surfer follows each out-link with equal probability.” The fairest ranking system should be based on facts only and should not rely on any human-defined assumption.
After removing this assumption from the PageRank algorithm, we get the TrafficRank algorithm.
The key idea of the TrafficRank algorithm is to derive the most general (that is, most uncertain) ranking order that is consistent with the existing information. Because the uncertainty of a system is characterized by its entropy, this becomes an optimization problem: maximize the entropy subject to what is known. For details, refer to the paper “A New Paradigm for Ranking Pages on the World Wide Web” by John A. Tomlin, or to the report I will post later.
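Here is a minimal sketch of the entropy-maximization idea, under simplifying assumptions of my own. The only facts used are the links themselves. The unknowns are traffic shares p_ij on each link i→j, with two constraints: the shares sum to 1, and traffic is conserved at every page (inflow equals outflow). Tomlin's paper may impose further constraints and use a different solver. Maximizing −Σ p_ij log p_ij with Lagrange multipliers gives p_ij ∝ exp(λ_j − λ_i), one multiplier per page (sign convention mine). Finding the multipliers then reduces to a matrix-balancing problem, solved here with an Osborne-style coordinate iteration; that solver is my substitution, not necessarily Tomlin's. The graph is assumed to be strongly connected with no self-links, and the names `traffic_rank` and `x` are hypothetical.

```python
import math


def traffic_rank(edges: list[tuple[int, int]], n: int, tol: float = 1e-12,
                 max_iter: int = 10_000) -> tuple[dict[tuple[int, int], float], list[float]]:
    """Maximum-entropy link traffic under normalization and flow conservation.

    Returns (flow, score): flow[(i, j)] is the traffic share on link i->j,
    and score[j] is the total traffic entering page j.
    Assumes a strongly connected graph; self-links are ignored.
    """
    edges = sorted({(i, j) for i, j in edges if i != j})  # simple graph
    out_nb = [[] for _ in range(n)]
    in_nb = [[] for _ in range(n)]
    for i, j in edges:
        out_nb[i].append(j)
        in_nb[j].append(i)
    if any(not out_nb[v] or not in_nb[v] for v in range(n)):
        raise ValueError("every page needs at least one in-link and one out-link")

    # x[v] = exp(lambda_v), so the traffic on link i->j is proportional to x[j] / x[i].
    x = [1.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(n):
            # Choose x[j] so that inflow(j) == outflow(j) with the other x fixed:
            #   x[j] * sum(1 / x[i] for i in in(j)) == sum(x[k] for k in out(j)) / x[j]
            outflow_part = sum(x[k] for k in out_nb[j])
            inflow_part = sum(1.0 / x[i] for i in in_nb[j])
            new_xj = math.sqrt(outflow_part / inflow_part)
            max_change = max(max_change, abs(math.log(new_xj / x[j])))
            x[j] = new_xj
        if max_change < tol:
            break

    weights = {(i, j): x[j] / x[i] for i, j in edges}
    total = sum(weights.values())
    flow = {e: w / total for e, w in weights.items()}  # normalize so shares sum to 1
    score = [0.0] * n
    for (i, j), p in flow.items():
        score[j] += p  # rank pages by the traffic flowing into them
    return flow, score


flow, score = traffic_rank([(0, 1), (1, 2), (2, 0), (0, 2)], 3)
print(score)
```

Unlike the PageRank sketch, nothing here says how a surfer splits traffic among a page's out-links. That split comes out of the entropy maximization instead of being assumed.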
Both PageRank and TrafficRank target webpages. Ranking researchers and ranking webpages share some characteristics, but the two problems are different.
Webpages are connected only by links, while the relationships between researchers are much more complicated. My goal for the coming days is to characterize the relations between researchers with a suitable network model.