AD ALTA
JOURNAL OF INTERDISCIPLINARY RESEARCH
approach through a variety of techniques such as the Rocchio algorithm, Bayesian classifiers, the Winnow algorithm, and the cosine similarity measure [5].
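Two of the techniques named above can be made concrete with a minimal Python sketch of a Rocchio-style user profile combined with cosine-similarity ranking. It is an illustration under stated assumptions, not the method of any cited system: papers are assumed to be already represented as sparse term-weight dictionaries, the weights `beta=0.75` and `gamma=0.15` are conventional Rocchio defaults chosen here, and clipping negative profile weights to zero is an assumed convention. The helper names and example data are hypothetical.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Average a list of sparse term-weight vectors (dicts of term -> weight)."""
    total = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    n = len(vectors)
    return {term: w / n for term, w in total.items()} if n else {}


def rocchio_profile(liked, disliked, beta=0.75, gamma=0.15):
    """Profile = beta * centroid(liked) - gamma * centroid(disliked).

    Negative weights are clipped to zero (an assumed convention).
    """
    pos, neg = centroid(liked), centroid(disliked)
    profile = {}
    for term in set(pos) | set(neg):
        weight = beta * pos.get(term, 0.0) - gamma * neg.get(term, 0.0)
        if weight > 0:
            profile[term] = weight
    return profile


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical term weights for papers the user liked or disliked.
liked = [{"recommender": 1.0, "filtering": 0.8}, {"recommender": 0.9, "citation": 0.5}]
disliked = [{"protein": 1.0, "folding": 0.9}]
candidates = {
    "paper_a": {"recommender": 0.7, "citation": 0.6},
    "paper_b": {"protein": 0.8, "folding": 0.7},
}

profile = rocchio_profile(liked, disliked)
ranking = sorted(candidates, key=lambda p: cosine(profile, candidates[p]), reverse=True)
print(ranking)  # ['paper_a', 'paper_b']
```

The profile keeps only terms from the liked papers, so a candidate that shares vocabulary only with disliked papers scores zero.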
Content-based filtering is the predominant approach in research-paper recommender systems: of 62 reviewed approaches, 34 (55%) applied it [16]. A user's interest in items can be inferred from several kinds of relationship between users and items: authorship [17], having papers in one's personal collection, adding social tags [18], or downloading, reading, and browsing papers [19].
Most of the reviewed approaches use plain words as features, although some use n-grams, topics (words and word combinations that occurred as social tags on CiteULike), and concepts that were inferred from the ACL Anthology Reference Corpus (ACL ARC) via Latent Dirichlet Allocation [20] and assigned to papers through machine learning. Only a few approaches use non-textual features, and those that do typically use them in addition to words.
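Since most approaches use plain words as features, a sketch of how word features are commonly weighted and compared may help. The following assumes a TF-IDF variant with raw term counts and `idf = log(N / df)`, a simple lowercase tokenizer, and cosine similarity; the helper names (`tokenize`, `tfidf_vectors`, `cosine`) and the example titles are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as words."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Map each doc id to {term: raw count * log(N / document frequency)}."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)
    df = Counter()
    for toks in tokens.values():
        df.update(set(toks))  # count each term once per document
    return {
        doc_id: {t: c * math.log(n_docs / df[t]) for t, c in Counter(toks).items()}
        for doc_id, toks in tokens.items()
    }


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = {
    "p1": "Collaborative filtering for research paper recommendation",
    "p2": "Content-based research paper recommendation using citations",
    "p3": "Protein folding simulation",
}
vecs = tfidf_vectors(docs)
print(cosine(vecs["p1"], vecs["p2"]) > cosine(vecs["p1"], vecs["p3"]))  # True
```

Under this variant a term that occurs in every document gets an IDF of zero and therefore does not contribute to similarity.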
Giles et al. treated citations in the same way as words and weighted them with the standard TF-IDF measure, a scheme they called CC-IDF. Others have used CC-IDF as a baseline. However, Beel recently found some initial evidence that CC-IDF might not be an ideal weighting scheme [21].
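Following the description above (citations handled like words and weighted with TF-IDF), a CC-IDF-style representation can be sketched as below. This is an assumption-laden illustration rather than Giles et al.'s implementation: each cited reference is treated as a binary term (term frequency 1), weighted by `log(N / df)` where `df` is the number of papers citing it, and papers are compared with cosine similarity. The paper and reference identifiers are made up.

```python
import math
from collections import Counter


def cc_idf_vectors(references):
    """references: {paper_id: set of cited reference ids}.

    Each cited reference is a binary 'term' (tf = 1) weighted by
    log(N / number of papers citing it), i.e. TF-IDF applied to citations.
    """
    n_papers = len(references)
    df = Counter()
    for cited in references.values():
        df.update(cited)  # sets, so each reference counts once per paper
    return {
        paper: {ref: math.log(n_papers / df[ref]) for ref in cited}
        for paper, cited in references.items()
    }


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


references = {
    "a": {"rare_ref", "classic_ref"},
    "b": {"rare_ref", "classic_ref"},
    "c": {"classic_ref", "ref_x"},
    "d": {"classic_ref", "ref_y"},
}
vectors = cc_idf_vectors(references)
print(cosine(vectors["a"], vectors["b"]), cosine(vectors["a"], vectors["c"]))
```

In the example, `classic_ref` is cited by every paper and gets weight zero, so papers `a` and `c`, which share only that reference, come out as unrelated (similarity 0.0), while `a` and `b`, which share a rarely cited reference, are maximally similar (similarity of about 1.0).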
Zarrinkalam and Kahani considered authors as features and
determined similarities by the number of authors two items share
[22].
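The author-based similarity described above reduces to counting shared authors. A minimal sketch follows; the author names and the `rank_by_shared_authors` helper are hypothetical, and exact string matching of names is assumed, whereas a real system would likely need author disambiguation first.

```python
def shared_author_count(authors_a, authors_b):
    """Similarity = number of authors two papers have in common."""
    return len(set(authors_a) & set(authors_b))


def rank_by_shared_authors(target_authors, candidates):
    """Sort candidate ids by shared-author count, highest first (ties keep input order)."""
    return sorted(
        candidates,
        key=lambda pid: shared_author_count(target_authors, candidates[pid]),
        reverse=True,
    )


target = ["Author A", "Author B"]
candidates = {
    "p1": ["Author C"],
    "p2": ["Author A", "Author B", "Author D"],
    "p3": ["Author B"],
}
print(rank_by_shared_authors(target, candidates))  # ['p2', 'p3', 'p1']
```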
The content-based approach also has weaknesses, such as low serendipity, overspecialization, and disregard for the quality and popularity of items. For example, a content-based recommender system may consider two research papers equally relevant because both share the same terms with the user model. This relevance might not always be justified, for example if one paper was written by an authority in the field while the other was written by a student. A recommender system should then recommend only the first paper, but a content-based system would fail to do so.
Another criticism of the content-based approach is that access to item features is limited. For research-paper recommendations, PDFs usually must be processed and converted to text, document fields must be identified, and features such as terms must be extracted. None of these tasks is trivial, and each may introduce errors into the recommendations [23].
4 Collaborative filtering approaches
The term collaborative filtering was coined by Goldberg et al. (1992), who proposed that “information filtering can be more effective when humans are involved in the filtering process” [24]. The concept of collaborative filtering as it is known today was introduced two years later by Resnick et al. Their theory was that users like what like-minded users like, where two users were considered like-minded when they rated items alike. Items that one user rated positively were recommended to the other user, and vice versa [25].
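One common way to formalise “users like what like-minded users like” is a mean-centred, similarity-weighted average of neighbours' ratings, with the Pearson correlation as the similarity weight, as used in GroupLens-style systems. The notation is introduced here for illustration and does not come from the text: $r_{u,i}$ is user $u$'s rating of item $i$, $\bar{r}_u$ is $u$'s mean rating, $I_{a,u}$ is the set of items rated by both $a$ and $u$, and $N(a)$ is the set of neighbours of the active user $a$ who have rated $i$.

```latex
\hat{r}_{a,i} = \bar{r}_a + \frac{\sum_{u \in N(a)} w_{a,u}\,\bigl(r_{u,i} - \bar{r}_u\bigr)}{\sum_{u \in N(a)} \lvert w_{a,u} \rvert},
\qquad
w_{a,u} = \frac{\sum_{i \in I_{a,u}} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}
               {\sqrt{\sum_{i \in I_{a,u}} (r_{a,i} - \bar{r}_a)^2}\,\sqrt{\sum_{i \in I_{a,u}} (r_{u,i} - \bar{r}_u)^2}}
```

A like-minded neighbour (positive $w_{a,u}$) pulls the prediction up for items they rated above their own average; variants differ in whether the means are taken over all of a user's ratings or only over co-rated items.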
Collaborative filtering is widely used in e-commerce and has been successful in applications such as Amazon and Netflix. It is a popular approach for reducing information overload [9]. Amazon, for example, recommends books to its customers using collaborative filtering. A recommender system based on collaborative filtering recommends items to a particular user based on the ratings that other, similar users have given to items. For example, a movie recommender system based on collaborative filtering finds a group of users whose preferences are similar to those of the query user, and then recommends to that user the movies those similar users have rated highly in the past [13].
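The movie example can be sketched as user-based collaborative filtering in a few lines of Python. Assumptions made here for illustration: ratings are on a 1 to 5 scale stored in nested dicts, unrated items count as zero when computing cosine similarity, the `k=2` most similar users form the group of similar users, and “rated highly” means a rating of at least 4. The users, titles, and helper names are invented.

```python
import math


def cosine(r1, r2):
    """Cosine similarity of two users' rating dicts; unrated items count as 0."""
    dot = sum(r1[i] * r2[i] for i in set(r1) & set(r2))
    n1 = math.sqrt(sum(v * v for v in r1.values()))
    n2 = math.sqrt(sum(v * v for v in r2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def recommend(ratings, target, k=2, min_rating=4):
    """Recommend unseen items that the k most similar users rated >= min_rating."""
    others = [u for u in ratings if u != target]
    neighbours = sorted(others, key=lambda u: cosine(ratings[target], ratings[u]), reverse=True)[:k]
    seen = set(ratings[target])
    recs = []
    for user in neighbours:
        # Visit each neighbour's items from highest to lowest rating.
        for item, score in sorted(ratings[user].items(), key=lambda kv: -kv[1]):
            if score >= min_rating and item not in seen and item not in recs:
                recs.append(item)
    return recs


ratings = {
    "alice": {"Alien": 5, "Heat": 4, "Up": 1},
    "bob": {"Alien": 5, "Heat": 5, "Ronin": 5, "Up": 1},
    "carol": {"Up": 5, "Cars": 5, "Alien": 1},
    "dave": {"Heat": 4, "Ronin": 4, "Alien": 4, "Fargo": 5},
}
print(recommend(ratings, "alice"))  # ['Ronin', 'Fargo']
```

Carol's highly rated “Cars” is not recommended to Alice because Carol's ratings disagree with Alice's, so she does not enter the group of similar users.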
Collaborative filtering approaches are grouped into two general
categories:
Memory-based approaches: They use the entire collection
of the rated items in order to make recommendations or
predictions.
Model-based approaches: They allow systems to learn to
recognize patterns in the data sets in order to make
recommendations or predictions.
Measuring the similarity between users or items is central to memory-based approaches, and many different similarity measures, such as cosine similarity and the Pearson correlation coefficient, are used for this purpose [9].
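Because the choice of similarity measure matters, the sketch below implements two measures commonly used in memory-based collaborative filtering: the Pearson correlation over co-rated items and the Jaccard index over the sets of rated items. The function names and toy ratings are hypothetical.

```python
import math


def pearson(r1, r2):
    """Pearson correlation over co-rated items; 0.0 when it is undefined."""
    common = sorted(set(r1) & set(r2))
    if len(common) < 2:
        return 0.0
    x = [r1[i] for i in common]
    y = [r2[i] for i in common]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


def jaccard(r1, r2):
    """Overlap of the sets of rated items, ignoring the rating values."""
    a, b = set(r1), set(r2)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


u = {"i1": 5, "i2": 3, "i3": 1}
v = {"i1": 4, "i2": 3, "i3": 2}  # same preference order as u, milder scale
w = {"i1": 1, "i2": 3, "i3": 5}  # opposite preference order
print(round(pearson(u, v), 3), round(pearson(u, w), 3), jaccard(u, w))  # 1.0 -1.0 1.0
```

Jaccard treats users `u` and `w` as identical because they rated the same items, whereas Pearson shows that their tastes are opposite.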
Model-based approaches can use classification, clustering, and regression algorithms. For example, Bayesian classifiers and the K-Means clustering algorithm have been used in model-based collaborative filtering [8].
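A model-based variant using K-Means can be sketched as follows. Users are clustered offline by their rating vectors (the learned “model”), and a missing rating is predicted from the ratings of the user's cluster mates. This is a simplified illustration, not the method of [8]. It assumes a dense user-by-item matrix with 0 meaning “unrated”, Euclidean distance, the first k users as initial centroids, and a hypothetical `predict` helper; `math.dist` requires Python 3.8 or later.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain k-means; the first k points serve as initial centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def predict(users, labels, target, item):
    """Mean rating of `item` among the target's cluster mates who rated it (0 = unrated)."""
    names = list(users)
    cluster = labels[names.index(target)]
    rated = [
        users[name][item]
        for name, label in zip(names, labels)
        if label == cluster and name != target and users[name][item] > 0
    ]
    return sum(rated) / len(rated) if rated else None


# Hypothetical user-by-item matrix over four items; 0 means "not rated".
users = {
    "u1": [5, 0, 0, 1],
    "u2": [1, 0, 5, 4],
    "u3": [4, 5, 1, 0],
    "u4": [0, 1, 4, 5],
}
labels = kmeans(list(users.values()), k=2)
print(labels)  # [0, 1, 0, 1]
print(predict(users, labels, "u1", 1))  # 5.0
```

Treating “unrated” as 0 lets the distance computation run on a dense matrix, but it also makes unrated items look like strong dislikes; it is kept here only for brevity.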
Compared with the content-based approach, collaborative filtering has three advantages. First, it is content independent, that is, it does not need to process the items' content. Second, because humans provide the ratings, collaborative filtering takes real quality assessments into account. Finally, collaborative filtering is supposed to provide serendipitous recommendations, because its recommendations are based not on item similarity but on user similarity [26].
Of the reviewed approaches, only 11 (18%) applied collaborative filtering [27]. Yang et al. intended to let users rate research papers, but users were “too lazy to provide ratings” [28].
Naak et al. faced the same problem and created artificial ratings for their evaluation [29]. This illustrates one of the main problems of collaborative filtering: it requires user participation, but the motivation to participate is often low. This problem is referred to as the “cold-start” problem. If a new user rates few or no items, the system cannot find like-minded users and therefore cannot provide recommendations. If an item is new in the system and has not yet been rated by at least one user, it cannot be recommended. In a new community, no users have rated items, so no recommendations can be made, and as a result the incentive for users to rate items is low.
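The cold-start cases above can be illustrated directly: a user with no ratings shares no rated items with anyone, and an item with no ratings has no raters from whom it could be recommended. The helper names and data below are hypothetical, and “like-minded” is reduced here to sharing at least one rated item.

```python
def neighbours(ratings, user):
    """Users who share at least one rated item with `user`."""
    mine = set(ratings.get(user, {}))
    return [other for other, r in ratings.items() if other != user and mine & set(r)]


def raters(ratings, item):
    """Users who have rated `item`."""
    return [user for user, r in ratings.items() if item in r]


ratings = {
    "alice": {"p1": 5, "p2": 3},
    "bob": {"p1": 4, "p3": 5},
    "newcomer": {},  # new user who has not rated anything yet
}
print(neighbours(ratings, "alice"))     # ['bob']
print(neighbours(ratings, "newcomer"))  # [] -> no like-minded users, no recommendations
print(raters(ratings, "p4"))            # [] -> a new, unrated item cannot be recommended
```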
A general problem of collaborative filtering in the domain of research-paper recommender systems is sparsity. Vellino compared the implicit ratings on Mendeley (research papers) and Netflix (movies), and found that sparsity on Netflix was three orders of magnitude lower than on Mendeley [30]. This is caused by the different ratio of users to items. In domains like movie recommendations, there are typically few items and many users. In the research-paper domain, by contrast, there are typically few users but millions of papers, so most papers are rated by very few users, if any.
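Sparsity is usually measured as the fraction of empty cells in the user-item rating matrix, 1 - |R| / (|U| · |I|), where |R| is the number of ratings, |U| the number of users and |I| the number of items. The sizes below are hypothetical and are not the Mendeley or Netflix figures from [30]. They only show how, with the same activity per user, a catalogue of millions of papers leaves the matrix far sparser than a catalogue of thousands of movies.

```python
def sparsity(n_users, n_items, n_ratings):
    """Fraction of the user-item rating matrix that holds no rating."""
    return 1 - n_ratings / (n_users * n_items)


# Hypothetical sizes (NOT the data from [30]); 100 ratings per user in both cases.
movies = sparsity(n_users=500_000, n_items=20_000, n_ratings=500_000 * 100)
papers = sparsity(n_users=50_000, n_items=5_000_000, n_ratings=50_000 * 100)
print(f"movies: {movies:.4%}, papers: {papers:.4%}")
```

With these assumed sizes, the filled fraction of the matrix is 0.5% for movies but only 0.002% for papers, a factor of 250.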
There are further criticisms of collaborative filtering. Its computing time tends to be higher than that of the content-based approach, and it is generally less scalable and requires more offline data processing.
Torres et al. argue that collaborative filtering creates similar users [31], and Sundar et al. observe that collaborative filtering dictates opinions [32].
Lops criticized that collaborative filtering systems cannot explain why an item is recommended, other than that other users liked it. Another problem of collaborative filtering is manipulation: because it is based on user opinions, malicious users might try to manipulate ratings to promote their products so that they are recommended more often [33].
4.1 Limitations of collaborative filtering approaches
One problem of collaborative filtering concerns new users entering the system. To make recommendations to a user, the system needs to know the user's preferences, which it learns from the ratings that the user makes. Since the user is new