<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet href="/css/rss20.xsl" type="text/xsl"?>
<rss version="2.0" xmlns:pheedo="http://www.pheedo.com/namespace/pheedo">
	<channel>
		<title>IEEE Transactions on Knowledge and Data Engineering</title>
		<link>http://www.computer.org/tkde</link>
		<description>The IEEE Transactions on Knowledge and Data Engineering is an archival journal published monthly. The information published in this Transactions is designed to inform researchers, developers, managers, strategic planners, users, and others interested in state-of-the-art and state-of-the-practice activities in the knowledge and data engineering area. We are interested in well-defined theoretical results and empirical studies that have potential impact on the acquisition, management, storage, and graceful degeneration of knowledge and data, as well as in the provision of knowledge and data services. Specific topics include, but are not limited to: a) artificial intelligence techniques, including speech, voice, graphics, images, and documents; b) knowledge and data engineering tools and techniques; c) parallel and distributed processing; d) real-time distributed systems; e) system architectures, integration, and modeling; f) database design, modeling and management; g) query design and implementation languages; h) distributed database control; j) algorithms for data and knowledge management; k) performance evaluation of algorithms and systems; l) data communications aspects; m) system applications and experience; n) knowledge-based and expert systems; and o) integrity, security, and fault tolerance.</description>
		<language>en-us</language>
		<pubDate>Fri, 6 Nov 2009 11:00:03 GMT</pubDate>
		<image>
			<url>http://csdl.computer.org/common/images/logos/tkde.gif</url>
			<title>IEEE Computer Society</title>
			<description>List of recently published journal articles</description>
			<link>http://www.computer.org/tkde</link>
		</image>
		<item>
			<title>PrePrint: Dictionary-Based Compression for Long Time-Series Similarity</title>
			<link>http://www.pheedcontent.com/click.phdo?i=9144f1277e0efd43caa7bb8493423f9d</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.201</pheedo:origLink>
			<description>Long time-series datasets are common in many domains, especially scientific ones. Applications in these fields often require comparing trajectories using similarity measures. Existing methods perform well for short time-series, but their evaluation cost grows rapidly as time-series become longer. In this work, we develop a new time-series similarity measure called the Dictionary Compression Score (DCS). We show that this method calculates similarity accurately and quickly for both short and long time-series. We use the well-known Kolmogorov complexity from information theory and the Lempel-Ziv compression framework as a basis for calculating similarity scores. We show that off-the-shelf compressors do not fare well for computing time-series similarity. To address this problem, we develop a novel dictionary-based compression technique to compute time-series similarity. We also develop heuristics to automatically identify suitable parameters for our method, thus removing the parameter tuning required by other existing methods. We have extensively compared DCS with existing similarity methods for classification. Our experimental evaluation shows that for long time-series datasets, DCS is accurate and also significantly faster than existing methods.
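A minimal sketch of the general compression-based idea follows, assuming Python 3.9+. It is not the authors' DCS: the paper's dictionary construction and parameter heuristics are not described in this abstract. Instead, the example substitutes a plain LZ78 phrase count as the compressed-size estimate C(.) and plugs it into the normalized compression distance NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). The equal-width binning, the bin count `n_bins`, and all function names are hypothetical.

```python
def symbolize(series: list[float], lo: float, hi: float, n_bins: int = 4) -> str:
    """Map each value to one of n_bins equal-width bins over [lo, hi], as letters."""
    width = (hi - lo) / n_bins or 1.0  # guard against a constant series
    return "".join(
        chr(ord("a") + min(int((v - lo) / width), n_bins - 1))  # clamp hi into last bin
        for v in series
    )


def lz78_phrases(s: str) -> int:
    """Number of phrases in the LZ78 parse of s, used as a compressed-size proxy."""
    dictionary: set[str] = set()
    phrase, count = "", 0
    for ch in s:
        phrase += ch
        if phrase not in dictionary:  # longest known prefix extended by one symbol
            dictionary.add(phrase)
            count += 1
            phrase = ""
    return count + (1 if phrase else 0)  # trailing phrase that was already known


def ncd(x: str, y: str) -> float:
    """Normalized compression distance with LZ78 phrase counts as C(.)."""
    cx, cy, cxy = lz78_phrases(x), lz78_phrases(y), lz78_phrases(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


a = [0.0, 1.0, 2.0, 3.0] * 8
b = [0.1, 1.1, 2.1, 3.1] * 8                      # a with a small offset
c = [3.0, 0.0, 2.0, 1.0, 0.0, 3.0, 1.0, 2.0] * 4  # same values, different order
lo, hi = min(a + b + c), max(a + b + c)           # shared bins for all series
sa, sb, sc = (symbolize(s, lo, hi) for s in (a, b, c))
print(ncd(sa, sb), ncd(sa, sc))  # 0.5 0.6875
```

Because `a` and `b` fall into the same bins, they share most LZ78 phrases and receive a lower distance than the reordered series `c`.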
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.201</guid>
		</item>
		<item>
			<title>IEEE Transactions on Knowledge and Data Engineering - December 2009 (Vol. 21, No. 12)</title>
			<link>http://opac.ieeecomputersociety.org/opac?year=2009&amp;volume=21&amp;issue=12&amp;acronym=tkde</link>
			<description>IEEE Transactions on Knowledge and Data Engineering</description>
			<guid isPermaLink="true">http://www.computer.org/portal/site/tkde/</guid>
		</item>
		<item>
			<title>PrePrint: Unsupervised Semantic Similarity Computation Between Terms Using Web Documents</title>
			<link>http://www.pheedcontent.com/click.phdo?i=8da66f5aa55741167c76ef5bc4158b35</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.193</pheedo:origLink>
			<description>In this work, web-based metrics that compute the semantic similarity between words or terms are presented and compared with the state of the art. Starting from the fundamental assumption that similarity of context implies similarity of meaning, relevant web documents are downloaded via a web search engine, and the contextual information of the words of interest is compared (context-based similarity metrics). The proposed algorithms work automatically, do not require any human-annotated knowledge resources (e.g., ontologies), and can be generalized and applied to different languages. Context-based metrics are evaluated both on the Charles-Miller dataset and on a medical term dataset. It is shown that context-based similarity metrics significantly outperform co-occurrence-based metrics, in terms of correlation with human judgment, for both tasks. In addition, the proposed unsupervised context-based similarity computation algorithms are shown to be competitive with state-of-the-art supervised semantic similarity algorithms that employ language-specific knowledge resources. Specifically, context-based metrics achieve correlation scores of up to 0.88 and 0.74 for the Charles-Miller and medical datasets, respectively. The effect of stop-word filtering is also investigated for word and term similarity computation. Finally, the performance of context-based term similarity metrics is evaluated as a function of the number of web documents used and for various feature weighting schemes.
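The context-based idea can be sketched as follows, assuming Python 3.9+. A term's context vector counts the words appearing within a fixed window around it, and two terms are compared by the cosine of their vectors. The window size, raw-count weighting, stop-word set, and toy in-memory documents (standing in for downloaded web pages) are assumptions for illustration, not the paper's settings.

```python
import math
from collections import Counter


def context_vector(term: str, documents: list[str], window: int = 2,
                   stop_words: frozenset = frozenset()) -> Counter:
    """Count words occurring within `window` tokens of `term` across documents."""
    vec: Counter = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        for i, tok in enumerate(tokens):
            if tok != term:
                continue
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i and tokens[j] not in stop_words:
                    vec[tokens[j]] += 1
    return vec


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[word] for word, count in u.items())  # missing words count as 0
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


docs = [
    "the car drove down the road",
    "the automobile drove down the road",
    "the banana is a yellow fruit",
]
stop = frozenset({"the"})
car, auto, banana = (context_vector(t, docs, stop_words=stop)
                     for t in ("car", "automobile", "banana"))
print(round(cosine(car, auto), 3), cosine(car, banana))  # 1.0 0.0
```

Terms that share their surrounding words ("car", "automobile") score close to 1, while a term with disjoint contexts scores 0.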
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.193</guid>
		</item>
		<item>
			<title>PrePrint: Enhanced Visual Analysis for Cluster Tendency Assessment and Data Partitioning</title>
			<link>http://www.pheedcontent.com/click.phdo?i=2e4757d8bff296146cf03ce77e21d653</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.192</pheedo:origLink>
			<description>Visual methods have been widely studied and used in data cluster analysis, e.g., the VAT algorithm for visual analysis of cluster tendency. Given a pairwise dissimilarity matrix $\bm{D}$ of a set of $n$ objects, methods such as VAT generally represent $\bm{D}$ as an $n\times n$ image $\mathrm{I}(\tilde{\bm{D}})$, in which the objects are reordered to highlight cluster structure as dark blocks along the diagonal of the image. A major limitation of such visual methods is their inability to highlight cluster structure in $\mathrm{I}(\tilde{\bm{D}})$ when $\bm{D}$ contains clusters with highly complex structure. In this paper, we address this limitation by proposing a Spectral VAT algorithm, in which $\bm{D}$ is mapped to $\bm{D'}$ in an embedding space by spectral decomposition of the Laplacian matrix and then reordered to $\bm{\tilde{D'}}$ using the VAT algorithm. We propose a strategy to automatically determine the number of clusters in $\mathrm{I}(\bm{\tilde{D'}})$, as well as a visual method for cluster formation from $\mathrm{I}(\bm{\tilde{D'}})$ based on the difference between diagonal blocks and off-diagonal blocks. In addition, we propose a sampling-based extension to enable visual cluster tendency assessment and data partitioning for large data sets. Extensive experimental results on several synthetic and real-world data sets demonstrate the effectiveness of our algorithms.
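For reference, the classic VAT reordering that Spectral VAT builds on can be sketched as follows, assuming Python 3.9+. It follows a Prim-style minimum-spanning-tree order: start from an object involved in the largest dissimilarity, then repeatedly append the unordered object closest to the already-ordered set. The spectral embedding step (mapping D to D' via the Laplacian) is not shown, and the function names and toy data are hypothetical.

```python
def vat_order(D: list[list[float]]) -> list[int]:
    """Return the VAT permutation of objects for a symmetric dissimilarity matrix D."""
    n = len(D)
    start = max(range(n), key=lambda i: max(D[i]))  # a row holding the largest entry
    order = [start]
    remaining = set(range(n)) - {start}
    # best[k]: smallest dissimilarity between k and any already-ordered object
    best = {k: D[start][k] for k in remaining}
    while remaining:
        j = min(remaining, key=lambda k: best[k])  # nearest unordered object
        order.append(j)
        remaining.remove(j)
        for k in remaining:
            best[k] = min(best[k], D[j][k])
    return order


def reorder(D: list[list[float]], order: list[int]) -> list[list[float]]:
    """Permute rows and columns of D; clusters appear as low-valued diagonal blocks."""
    return [[D[i][j] for j in order] for i in order]


points = [0, 10, 1, 14, 3]  # two 1-D groups: {0, 1, 3} and {10, 14}
D = [[abs(p - q) for q in points] for p in points]
order = vat_order(D)
print(order)                 # [0, 2, 4, 1, 3]
print(reorder(D, order)[0])  # [0, 1, 3, 10, 14]
```

After reordering, the three nearby points occupy consecutive rows and columns, which is what produces the dark diagonal blocks in the VAT image.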
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.192</guid>
		</item>
		<item>
			<title>PrePrint: A Survey on Transfer Learning</title>
			<link>http://www.pheedcontent.com/click.phdo?i=812db68c6f048c9bc0948db126e2b379</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.191</pheedo:origLink>
			<description>A major assumption in many machine learning and data mining systems is that the training and test data share the same feature representation and follow the same distribution. However, in many real-world applications, this assumption does not hold. For example, we may have a classification task in one domain of interest but sufficient training data only in another domain, where the data may lie in a different feature space or follow a different distribution. In such cases, knowledge transfer, if done successfully, would greatly improve learning in the domain of interest by avoiding expensive data-labeling efforts. In recent years, transfer learning has emerged as a new technique to address this problem. This survey focuses on categorizing and reviewing the current progress on transfer learning for classification, regression, and clustering problems. We discuss the relationship between transfer learning and other related research areas, such as domain adaptation, multi-task learning, sample selection bias, and covariate shift, and explore some potential future problems in knowledge transfer research.
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.191</guid>
		</item>
		<item>
			<title>PrePrint: From t-Closeness-Like Privacy to Postrandomization via Information Theory</title>
			<link>http://www.pheedcontent.com/click.phdo?i=344a2258d848b4b9a28634bd5995a17c</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.190</pheedo:origLink>
			<description>t-Closeness is a privacy model recently defined for data anonymization. A data set is said to satisfy t-closeness if, for each group of records sharing a combination of key attributes, the distance between the distribution of a confidential attribute in the group and the distribution of that attribute in the entire data set is no more than a threshold t. Here, we define an information-theoretic privacy measure similar to t-closeness. Then, we use the tools of information theory to show that our privacy measure can be achieved by the postrandomization method (PRAM) for masking in the discrete case, and by a form of noise addition in the general case.
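The t-closeness condition stated above can be checked directly. The sketch below, assuming Python 3.9+, groups records by their key attributes and compares each group's distribution of the confidential attribute with the overall one. It uses the variational distance (half the sum of absolute probability differences), which equals the Earth Mover's Distance for categorical values at equal ground distance. The paper's own measure is information-theoretic and is not reproduced here; the attribute names and records are made up.

```python
from collections import Counter, defaultdict


def distribution(values: list) -> dict:
    """Empirical distribution of a list of categorical values."""
    counts = Counter(values)
    return {v: c / len(values) for v, c in counts.items()}


def variational_distance(p: dict, q: dict) -> float:
    """Half the L1 distance between two categorical distributions."""
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in set(p) | set(q))


def check_t_closeness(records: list[dict], key_attrs: tuple, confidential: str, t: float):
    """Return (satisfied, worst_distance) over all groups sharing key-attribute values."""
    overall = distribution([r[confidential] for r in records])
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[a] for a in key_attrs)].append(r[confidential])
    worst = max(variational_distance(distribution(vals), overall) for vals in groups.values())
    return t >= worst, worst


records = [
    {"zip": "476**", "age": "2*", "disease": "flu"},
    {"zip": "476**", "age": "2*", "disease": "flu"},
    {"zip": "476**", "age": "2*", "disease": "cancer"},
    {"zip": "479**", "age": "3*", "disease": "flu"},
    {"zip": "479**", "age": "3*", "disease": "cancer"},
    {"zip": "479**", "age": "3*", "disease": "cancer"},
]
ok, worst = check_t_closeness(records, ("zip", "age"), "disease", 0.2)
print(ok, round(worst, 4))  # True 0.1667
```

Each group deviates from the overall 50/50 split by about 1/6, so the table satisfies t-closeness for t = 0.2 but not for t = 0.1.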
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.190</guid>
		</item>
		<item>
			<title>PrePrint: Enriching One Taxonomy Using Another</title>
			<link>http://www.pheedcontent.com/click.phdo?i=8debb8b180ac44d9247a886853732393</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.189</pheedo:origLink>
			<description>Taxonomies, representing hierarchical data, are a key knowledge source in multiple disciplines. Information processing across taxonomies is not possible unless they are appropriately merged, accounting for their commonalities and differences. The first task in taxonomy merging is to identify the concepts common to the taxonomies. These common concepts, along with their associated concepts in the two taxonomies, then need to be integrated. Doing this in a conflict-free manner is challenging and generally requires human intervention. In this paper, we explore the possibility of automatically merging one taxonomy into another in an asymmetric fashion. Given one or more source taxonomies and a destination taxonomy, modeled as directed acyclic graphs, we present intuitive algorithms that merge the relevant portions of the source taxonomies into the destination taxonomy. We prove that our algorithms are conflict-free, information-lossless, and scalable. We also define precision and recall measures for evaluating an enriched taxonomy, such as TA (the result of merging two taxonomies), against TI, the ideal merger. Our experiments indicate the effectiveness of our approach.
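The abstract does not spell out the precision and recall measures. One plausible reading, sketched below under that assumption (Python 3.9+), represents each taxonomy as the set of parent-child edges of its DAG and scores the merged taxonomy TA against the ideal merger TI by edge overlap. The edge-based definition, the function name, and the toy taxonomies are hypothetical.

```python
Edge = tuple[str, str]  # (parent, child) in a taxonomy DAG


def edge_precision_recall(ta: set[Edge], ti: set[Edge]) -> tuple[float, float]:
    """Precision and recall of the merged taxonomy's edges against the ideal merger."""
    correct = len(ta.intersection(ti))
    precision = correct / len(ta) if ta else 0.0
    recall = correct / len(ti) if ti else 0.0
    return precision, recall


ti = {("animal", "mammal"), ("animal", "bird"), ("mammal", "dog"),
      ("mammal", "cat"), ("bird", "owl")}
ta = {("animal", "mammal"), ("animal", "bird"), ("mammal", "dog"),
      ("animal", "cat")}  # "cat" attached one level too high; "owl" missing
print(edge_precision_recall(ta, ti))  # (0.75, 0.6)
```

A misplaced concept lowers precision, while a concept the merge failed to carry over lowers recall.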
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.189</guid>
		</item>
		<item>
			<title>PrePrint: Heuristic Approaches for the Quartet Method of Hierarchical Clustering</title>
			<link>http://www.pheedcontent.com/click.phdo?i=09edca5a1dd9842ab2a7af291279078f</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.188</pheedo:origLink>
			<description>Given a set of objects and their pairwise distances, we wish to determine a visual representation of the data. We use the quartet paradigm to compute a hierarchy of clusters of the objects. The method is based on an NP-hard graph optimization problem called the Minimum Quartet Tree Cost problem. This paper presents and compares several heuristic approaches to approximating the optimal hierarchy. The performance of the algorithms is tested through extensive computational experiments, which show that the Reduced Variable Neighbourhood Search heuristic is the most effective approach to the problem, obtaining high-quality solutions in short running times.
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.188</guid>
		</item>
		<item>
			<title>PrePrint: Combating the Small Sample Class Imbalance Problem Using Feature Selection</title>
			<link>http://www.pheedcontent.com/click.phdo?i=202c28198047167e5a243153ee58bc60</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.187</pheedo:origLink>
			<description>Researchers have rigorously studied resampling, algorithmic, and feature selection approaches to the class imbalance problem. However, no systematic studies have been conducted to understand how well these methods combat the class imbalance problem and which of them best manage the different challenges posed by imbalanced data sets. In particular, feature selection has rarely been studied outside of text classification problems. Moreover, no studies have looked at the additional problem of learning from small samples. This paper presents the first systematic comparison of the three types of methods and of seven feature selection metrics, evaluated on small-sample data sets from different applications. We evaluated the performance of these metrics using the area under the receiver operating characteristic curve and the area under the precision-recall curve. We compared each metric on its average performance across all problems and on its likelihood of yielding the best performance on a specific problem. We also examined the performance of these metrics within each problem domain. Finally, we evaluated the efficacy of these metrics to see which perform best across algorithms. Our results show that signal-to-noise ratio and Feature Assessment by Sliding Thresholds are strong candidates for feature selection in most applications, especially when selecting very small numbers of features.
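Signal-to-noise ratio is commonly defined per feature as |mean_pos - mean_neg| / (std_pos + std_neg): the gap between the class means divided by the sum of the class standard deviations. The sketch below, assuming Python 3.9+, ranks features by that score. The abstract does not state whether the paper uses population or sample standard deviations, or how it handles zero variance, so both choices here are assumptions; the data set is a toy example.

```python
import statistics


def snr(values: list[float], labels: list[int]) -> float:
    """|mean_pos - mean_neg| / (std_pos + std_neg) for one feature (labels are 0/1)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    gap = abs(statistics.mean(pos) - statistics.mean(neg))
    spread = statistics.pstdev(pos) + statistics.pstdev(neg)  # population std (assumed)
    if spread == 0:
        return float("inf") if gap else 0.0  # perfectly separated or constant feature
    return gap / spread


def rank_features(X: list[list[float]], labels: list[int], k: int) -> list[int]:
    """Indices of the k features with the highest SNR."""
    scores = [snr([row[j] for row in X], labels) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


X = [[1.0, 5.0, 0.3],
     [1.2, 4.0, 0.1],
     [3.0, 5.5, 0.2],
     [3.1, 4.5, 0.6]]
y = [0, 0, 1, 1]
print(rank_features(X, y, k=2))  # [0, 2]
```

Feature 0 separates the two classes with little within-class spread, so it ranks first; because this metric scores each feature independently, it remains cheap even when only a handful of samples are available.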
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.187</guid>
		</item>
		<item>
			<title>PrePrint: Decision Trees for Uncertain Data</title>
			<link>http://www.pheedcontent.com/click.phdo?i=a848ca33a91c347501d1cbf506773db4</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.175</pheedo:origLink>
			<description>Traditional decision tree classifiers work with data whose values are known and precise. We extend such classifiers to handle data with uncertain information. Value uncertainty arises in many applications during the data collection process. Example sources of uncertainty include measurement/quantisation errors, data staleness, and multiple repeated measurements. With uncertainty, the value of a data item is often represented not by one single value, but by multiple values forming a probability distribution. Rather than abstracting uncertain data by summary statistics (such as the mean and median), we find that the accuracy of a decision tree classifier can be much improved if the "complete information" of a data item, which takes into account the probability density function (pdf) of that item's value, is utilised. We extend classical decision tree building algorithms to handle data tuples with uncertain values. Extensive experiments show that the resulting classifiers are more accurate than those using value averages. Since processing pdfs is computationally more costly than processing single values (e.g., averages), decision tree construction on uncertain data is more CPU-demanding than that for certain data. To tackle this problem, we propose a series of pruning techniques that can greatly improve construction efficiency.
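To make the pdf-based idea concrete, the sketch below (Python 3.9+) scores a candidate split point z on one numeric attribute when each tuple carries a discrete pdf, given as (value, probability) pairs. Each tuple sends its probability mass at or below z to the left branch and the rest to the right, and the split is scored by the resulting weighted class entropy. Representing pdfs by a few sample points and trying every sample value as a candidate split are assumptions; the paper's pruning techniques are not shown, and the names and data are hypothetical.

```python
import math

Pdf = list[tuple[float, float]]  # (value, probability) pairs summing to 1


def entropy(class_mass: dict[str, float]) -> float:
    """Entropy (bits) of a class distribution given as possibly fractional counts."""
    total = sum(class_mass.values())
    if total == 0:
        return 0.0
    return -sum(m / total * math.log2(m / total) for m in class_mass.values() if m > 0)


def split_entropy(tuples: list[tuple[Pdf, str]], z: float) -> float:
    """Weighted entropy after splitting at z, dividing tuples fractionally by their pdfs."""
    left: dict[str, float] = {}
    right: dict[str, float] = {}
    for pdf, label in tuples:
        p_left = sum(p for v, p in pdf if z >= v)  # mass at or below the split point
        left[label] = left.get(label, 0.0) + p_left
        right[label] = right.get(label, 0.0) + max(0.0, 1.0 - p_left)
    n_left, n_right = sum(left.values()), sum(right.values())
    return (n_left * entropy(left) + n_right * entropy(right)) / (n_left + n_right)


def best_split(tuples: list[tuple[Pdf, str]]) -> float:
    """Candidate split point (any pdf sample value) with the lowest weighted entropy."""
    candidates = sorted({v for pdf, _ in tuples for v, _ in pdf})
    return min(candidates, key=lambda z: split_entropy(tuples, z))


data = [
    ([(1.0, 0.8), (2.0, 0.2)], "a"),
    ([(1.5, 1.0)], "a"),
    ([(3.0, 0.9), (2.0, 0.1)], "b"),
    ([(3.5, 1.0)], "b"),
]
print(best_split(data))  # 2.0
```

Evaluating a split now requires summing over every sample point of every pdf rather than reading one value per tuple, which illustrates why construction on uncertain data is more CPU-demanding.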
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.175</guid>
		</item>
		<item>
			<title>PrePrint: An Efficient Concept-Based Mining Model for Enhancing Text Clustering</title>
			<link>http://www.pheedcontent.com/click.phdo?i=cf535a21e7a30a1d3150d126a974dcc3</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.174</pheedo:origLink>
			<description>Most common techniques in text mining are based on the statistical analysis of a term, either a word or a phrase. Statistical analysis of term frequency captures the importance of a term within a document only. However, two terms can have the same frequency in their documents while one of them contributes more to the meaning of its sentences than the other. Thus, the underlying text mining model should indicate terms that capture the semantics of the text. In this case, the mining model can capture terms that present the concepts of a sentence, which leads to discovering the topic of the document. A new concept-based mining model that analyzes terms on the sentence, document, and corpus levels is introduced. The concept-based mining model can effectively discriminate between unimportant terms with respect to sentence semantics and terms that hold the concepts representing the sentence meaning. The proposed mining model consists of sentence-based concept analysis, document-based concept analysis, corpus-based concept analysis, and a concept-based similarity measure. A term that contributes to the sentence semantics is analyzed on the sentence, document, and corpus levels, rather than only on the document level as in traditional analysis. The proposed model can efficiently find significant matching concepts between documents according to the semantics of their sentences. The similarity between documents is calculated with a new concept-based similarity measure, which takes full advantage of the concept analysis measures on the sentence, document, and corpus levels. Large sets of experiments using the proposed concept-based mining model on different datasets in text clustering are conducted, including an extensive comparison between concept-based analysis and traditional analysis. Experimental results demonstrate a substantial improvement in clustering quality using sentence-based, document-based, corpus-based, and combined concept analysis.
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.174</guid>
		</item>
		<item>
			<title>PrePrint: Personalizing Web Directories with the Aid of Web Usage Data</title>
			<link>http://www.pheedcontent.com/click.phdo?i=6ebe0ac265f17d79e572b51f9b57dae7</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.173</pheedo:origLink>
			<description>This paper presents a knowledge discovery framework for the construction of Community Web Directories, a concept that we introduced in our recent work, applying personalization to Web directories. In this context, the Web directory is viewed as a thematic hierarchy, and personalization is realized by constructing user community models on the basis of usage data. In contrast to most work on Web usage mining, the usage data analyzed here correspond to user navigation throughout the Web, rather than within a particular Web site, and as a result exhibit a high degree of thematic diversity. To model the user communities, we introduce a novel methodology that combines the users' browsing behavior with thematic information from the Web directories. Following this methodology, we enhance the clustering and probabilistic approaches presented in previous work, and we also present a new algorithm that combines these two approaches. The resulting community models take the form of Community Web Directories. The proposed personalization methodology is evaluated on both a specialized artificial Web directory and a general-purpose one, indicating its potential value to the Web user. The experiments also assess the effectiveness of the different machine learning techniques on the task.
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.173</guid>
		</item>
		<item>
			<title>PrePrint: Nearest Surrounder Queries</title>
			<link>http://www.pheedcontent.com/click.phdo?i=63e57cd56bc2a95d1f2a29039d2f0531</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.172</pheedo:origLink>
			<description>In this paper, we present a new type of spatial query called the Nearest Surrounder (NS) query. An NS query searches for the nearest polygon-shaped spatial objects (referred to as nearest surrounder (NS) objects) for consecutive ranges of angles around a specified query point. With the additional angular information provided with NS objects, an NS query is more informative than many other spatial queries. We derive two NS query variants, namely, the multi-tier NS (m-NS) query and the angle-constrained NS (ANS) query. An m-NS query searches multiple layers of NS objects for the same range of angles from a query point. An ANS query searches for NS objects within a specified range of angles. To evaluate NS queries and their variants, we explore angle-based and distance-based bound properties of polygons. Based on these properties, we devise two efficient algorithms, namely, Sweep and Ripple. Using an R-tree, they access objects in order of their orientation and their distance to the query point, respectively. Both can finish a search with at most one index lookup and deliver the query result progressively. Through empirical studies, we evaluate the proposed algorithms and report their performance on both synthetic and real object sets.
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.172</guid>
		</item>
		<item>
			<title>PrePrint: Adaptive Subspace Symbolization for Content-Based Video Search</title>
			<link>http://www.pheedcontent.com/click.phdo?i=4ff29159fceb6873b1cdf7dec02d6794</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.171</pheedo:origLink>
			<description>Efficiently and effectively searching for similar videos is an important and non-trivial problem in content-based video search systems. In this paper, we propose a subspace symbolization approach, namely SUDS, for content-based search on very large video databases. The novelty of SUDS is that it explores the data distribution in subspaces to build a visual dictionary, with which the video data are processed by string matching techniques with a two-step data simplification. Specifically, we first propose an adaptive approach, called VLP, to divide the whole visual feature space into a series of subspaces of variable lengths, from which the dominant ones are selected. By clustering the video keyframes over each dominant subspace, a stable visual dictionary is built, and a compact video representation model is developed by transforming each keyframe into a word, i.e., a series of symbols in the dominant subspaces, and each video into a sequence of words. Then, we present an innovative similarity measure called CVE, which adopts a complementary information compensation scheme based on the visual features and sequence context of videos. Finally, an efficient two-layered index strategy with a number of query optimizations is proposed to facilitate video search. The experimental results demonstrate the high effectiveness and efficiency of SUDS.
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.171</guid>
		</item>
		<item>
			<title>PrePrint: Kernel Discriminant Learning for Ordinal Regression</title>
			<link>http://www.pheedcontent.com/click.phdo?i=05ee3d3663cd722151a9ce87f22132e2</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.170</pheedo:origLink>
			<description>Ordinal regression has wide applications in many domains where human evaluation plays a major role. Most current ordinal regression methods are based on Support Vector Machines (SVM) and suffer from two problems: they ignore the global information of the data, and they have high computational complexity. On the other hand, Linear Discriminant Analysis (LDA) and its kernel version, Kernel Discriminant Analysis (KDA), take into consideration the global information of the data as well as the distribution of the classes, and their performance in classification has been proven; however, they cannot be used for solving ordinal regression problems because they do not utilize the ordinal information of the data. To solve this problem, in this paper, we propose a novel regression approach that extends Kernel Discriminant Learning with a rank constraint. The proposed algorithm is very efficient, since its computational complexity is linear in the data size. We demonstrate experimentally that the proposed method is capable of preserving the rank of data classes in a projected data space. In comparison to several ordinal regression methods, our method is more efficient and is competitive with them in accuracy.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.170</guid>
		</item>
		<item>
			<title>PrePrint: Non-negative Matrix Factorization for Semi-supervised Heterogeneous Data Co-clustering</title>
			<link>http://www.pheedcontent.com/click.phdo?i=dd093824b7d3c78d4f1b0d861a2bb5e7</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.169</pheedo:origLink>
			<description>Co-clustering heterogeneous data has attracted extensive attention recently due to its high impact on various important applications, such as text mining, image retrieval, and bioinformatics. However, data co-clustering without any prior knowledge or background information is still a challenging problem. In this paper, we propose a Semi-Supervised Non-negative Matrix Factorization (SS-NMF) framework for data co-clustering. Specifically, our method computes new relational matrices by incorporating user-provided constraints through simultaneous distance metric learning and modality selection. Using an iterative algorithm, we then perform tri-factorizations of the new matrices to infer the clusters of different data types and their correspondence. Theoretically, we prove the convergence and correctness of SS-NMF co-clustering. In addition, we show that our framework provides a unified view of data co-clustering and has several advantages over existing approaches. Through extensive experiments conducted on publicly available text, gene expression, and image data sets, we demonstrate the superior performance of SS-NMF for heterogeneous data co-clustering.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.169</guid>
		</item>
		<item>
			<title>PrePrint: The Context and the SitBAC Models for Privacy Preservation - An Experimental Comparison of Model Comprehension and Synthesis</title>
			<link>http://www.pheedcontent.com/click.phdo?i=68491d35fabfcca26dbfeac121490c51</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.161</pheedo:origLink>
			<description>Situation-Based Access Control (SitBAC) is a conceptual model for representing the access-control policies of healthcare organizations by characterizing situations of access to patient data. The SitBAC model enables formal representation of access situations as an ontology of concepts (Patient, Data-Requestor, EHR, Task, and Response), along with their attributes and relationships. A competing access-control model is the Contextual Role-Based Access Control (Context) model. The Context model uses logical expressions (rules) that specify contextual authorizations (i.e., characteristics of access requests that are available at access time). Open questions that relate to the formal representation of scenarios involving access to patient data are: 1) which of the two models yields a formal representation that is easier to comprehend; 2) which of the two models facilitates the synthesis of correct models; and 3) how task complexity affects the performance of comprehension and synthesis. In this study, we address these questions through a controlled experiment. The results of the experiment suggest that while there are no differences between the two models when it comes to comprehending or synthesizing simple scenarios of data access, for complex scenarios there is a significant advantage to the SitBAC model, in terms of both comprehension and synthesis.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.161</guid>
		</item>
		<item>
			<title>PrePrint: Managing Multi-Dimensional Historical Aggregate Data in Unstructured P2P Systems</title>
			<link>http://www.pheedcontent.com/click.phdo?i=a36c38e2bce9658ddc03290e8451c9f9</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.160</pheedo:origLink>
			<description>We propose a P2P-based framework that supports the extraction of aggregates from historical multi-dimensional data and provides efficient and robust query evaluation. When a data population is published, the data are summarized in a synopsis consisting of an index built on top of a set of sub-synopses (which store compressed representations of distinct data portions). The index and the sub-synopses are distributed across the network, and suitable replication mechanisms, which take into account the query workload and network conditions, provide appropriate coverage for both the index and the sub-synopses.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.160</guid>
		</item>
		<item>
			<title>PrePrint: Completely Lazy Learning</title>
			<link>http://www.pheedcontent.com/click.phdo?i=255eb0f67927e20fd762576f21b0f5c0</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.159</pheedo:origLink>
			<description>Local classifiers are sometimes called lazy learners because they do not train a classifier until presented with a test sample. However, such methods are generally not completely lazy, because the neighborhood size k (or another locality parameter) is usually chosen by cross-validation on the training set, which can require significant preprocessing and risks overfitting. We propose a simple alternative to cross-validation of the neighborhood size that requires no preprocessing: instead of committing to one neighborhood size, average the discriminants for multiple neighborhoods. We show that this forms an expected estimated posterior that minimizes the expected Bregman loss with respect to the uncertainty about the neighborhood choice. We analyze this approach for six standard and state-of-the-art local classifiers, including discriminative adaptive metric kNN (DANN), a local support vector machine (SVM-KNN), hyperplane distance nearest neighbor (HKNN), and a new local Bayesian quadratic discriminant analysis. Experiments on seven benchmark data sets show that this technique attains the same classification performance as cross-validation without any training.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.159</guid>
		</item>
		<item>
			<title>PrePrint: Incremental Evaluation of Visible Nearest Neighbor Queries</title>
			<link>http://www.pheedcontent.com/click.phdo?i=86920e24cf70c111bbf9d3b205c9624a</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.158</pheedo:origLink>
			<description>In many applications involving spatial objects, we are only interested in objects that are directly visible from query points. In this article, we formulate the visible k nearest neighbor (VkNN) query and present incremental algorithms as a solution, with two variants that differ in how they prune objects during the search process. One variant applies visibility pruning only to objects, whereas the other also applies visibility pruning to index nodes. Our experimental results show that the latter outperforms the former. We further propose the aggregate VkNN query, which finds the visible k nearest objects to a set of query points based on an aggregate distance function. We also propose two approaches to processing the aggregate VkNN query. One accesses the database via multiple VkNN queries, whereas the other issues an aggregate k nearest neighbor query to retrieve objects from the database and then re-ranks the results based on the aggregate visible distance metric. With extensive experiments, we show that the latter approach consistently outperforms the former.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.158</guid>
		</item>
		<item>
			<title>PrePrint: Iso-Map: Energy-Efficient Contour Mapping in Wireless Sensor Networks</title>
			<link>http://www.pheedcontent.com/click.phdo?i=c579acd6a080da42933adac81c69aeab</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.157</pheedo:origLink>
			<description>Contour mapping is a crucial part of many wireless sensor network applications. Many efforts have been made to avoid collecting data from all the sensors in the network and producing maps at the sink, an approach that has been shown to be inefficient. The existing approaches (often aggregation-based), however, suffer from heavy transmission traffic and incur large computational overheads on each sensor node. We propose Iso-Map, an energy-efficient protocol for contour mapping, which builds contour maps based solely on the reports collected from intelligently selected &amp;#x201C;isoline nodes&amp;#x201D; in wireless sensor networks. Iso-Map achieves high-quality contour mapping while significantly reducing the generated traffic from O(n) to O(&amp;#x221A;n), where n is the total number of sensor nodes in the field. The per-node computation overhead is also bounded by a constant. We conduct comprehensive trace-driven simulations to verify this protocol and demonstrate that Iso-Map outperforms previous approaches in that it produces contour maps of high fidelity with significantly reduced energy cost.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.157</guid>
		</item>
		<item>
			<title>PrePrint: The Impact of Diversity on On-line Ensemble Learning in the Presence of Concept Drift</title>
			<link>http://www.pheedcontent.com/click.phdo?i=1036e09ccea979504aafb6d421df054e</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.156</pheedo:origLink>
			<description>On-line learning algorithms often have to operate in the presence of concept drift (i.e., the concepts to be learnt can change with time). This paper presents a new categorization for concept drift, separating drifts according to different criteria into mutually exclusive and non-heterogeneous categories. Moreover, although ensembles of learning machines have been used to learn in the presence of concept drift, there has been no in-depth study of why they can be helpful in this setting and which of their features contribute (or not) to this. As diversity is one of these features, we present a diversity analysis in the presence of different types of drift. We show that, before the drift, ensembles with less diversity obtain lower test errors. On the other hand, maintaining highly diverse ensembles is a good strategy for obtaining lower test errors shortly after the drift, regardless of the type of drift, although high diversity is more important for more severe drifts. Longer after the drift, high diversity becomes less important. Diversity by itself can help to reduce the initial increase in error caused by a drift, but does not provide a faster recovery from drifts in the long term.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.156</guid>
		</item>
		<item>
			<title>PrePrint: Closing the Loop in Webpage Understanding</title>
			<link>http://www.pheedcontent.com/click.phdo?i=b5a1ece99aa4d846953ca810cea65089</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.155</pheedo:origLink>
			<description>The two most important tasks in information extraction from the Web are web page structure understanding and natural language sentence processing. However, little work has been done towards an integrated statistical model for understanding web page structures and processing natural language sentences within the HTML elements. Our recent work on web page understanding introduces a joint model of Hierarchical Conditional Random Fields (HCRF) and extended Semi-Markov Conditional Random Fields (Semi-CRF) to leverage the page structure understanding results in free text segmentation and labeling. In this top-down integration model, the decision of the HCRF model can guide the decision-making of the Semi-CRF model. However, the drawback of the top-down integration strategy is also apparent: the decision of the Semi-CRF model cannot be used by the HCRF model to guide its decision-making. This paper proposes a novel framework called WebNLP, which enables bidirectional integration of page structure understanding and text understanding in an iterative manner. We have applied the proposed framework to local business entity extraction and Chinese person and organization name extraction. Experiments show that the WebNLP framework achieves significantly better performance than existing methods.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.155</guid>
		</item>
		<item>
			<title>PrePrint: Automatic Ontology Matching Via Upper Ontologies: A Systematic Evaluation</title>
			<link>http://www.pheedcontent.com/click.phdo?i=138e882134b7a3751c774479229ecb27</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.154</pheedo:origLink>
			<description>&amp;#x201C;Ontology matching&amp;#x201D; is the process of finding correspondences between entities belonging to different ontologies. This paper describes a set of algorithms that exploit upper ontologies as semantic bridges in the ontology matching process and presents a systematic analysis of the relationships among features of matched ontologies (number of simple and composite concepts, stems, concepts at the top level, common English suffixes and prefixes, ontology depth), matching algorithms, the upper ontologies used, and experiment results. This analysis allowed us to state under which circumstances the exploitation of upper ontologies gives significant advantages with respect to traditional approaches that do not use them. We ran experiments with SUMO-OWL (a restricted version of SUMO), OpenCyc, and DOLCE. The experiments demonstrate that when our &amp;#x201C;structural matching method via upper ontology&amp;#x201D; uses a sufficiently large upper ontology (OpenCyc, SUMO-OWL), recall is significantly improved while the precision obtained without upper ontologies is preserved. In contrast, our &amp;#x201C;non-structural matching method&amp;#x201D; via OpenCyc and SUMO-OWL improves precision and maintains recall. The &amp;#x201C;mixed method&amp;#x201D;, which combines the results of structural alignment without upper ontologies and structural alignment via upper ontologies, improves recall and maintains the F-measure regardless of the upper ontology used.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.154</guid>
		</item>
		<item>
			<title>PrePrint: Privacy-Preserving Gradient Descent Methods</title>
			<link>http://www.pheedcontent.com/click.phdo?i=6bef6076bebecd2d55c4916aa94aa2b4</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.153</pheedo:origLink>
			<description>Gradient descent is a widely used paradigm for solving many optimization problems. Gradient descent aims to minimize a target function in order to reach a local minimum. In machine learning or data mining, this function corresponds to a decision model that is to be discovered. In this paper, we propose a preliminary formulation of gradient descent with data privacy preservation. We present two approaches, a stochastic approach and a least-squares approach, under different assumptions. Four protocols are proposed for the two approaches, incorporating various secure building blocks for both horizontally and vertically partitioned data. We conduct experiments to evaluate the scalability of the proposed secure building blocks and the accuracy and efficiency of the protocols for four different scenarios. The experimental results show that the proposed secure building blocks are scalable and that the proposed protocols allow us to determine a better secure protocol for the application in each scenario.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.153</guid>
		</item>
		<item>
			<title>PrePrint: Mining Predictive k-CNF Expressions</title>
			<link>http://www.pheedcontent.com/click.phdo?i=8be77969eb9991024f13ddcaa50b258e</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.152</pheedo:origLink>
			<description>We adapt Mitchell's version space algorithm for mining k-CNF formulae. The advantages of this algorithm are that it runs in a single pass over the data, is conceptually simple, can be used for missing-value prediction, and has interesting theoretical properties; moreover, an empirical evaluation on classification tasks shows competitive predictive results.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.152</guid>
		</item>
		<item>
			<title>PrePrint: Parallelizing Itinerary-Based KNN Query Processing in Wireless Sensor Networks</title>
			<link>http://www.pheedcontent.com/click.phdo?i=c15d11e2abe7c1100c5dae594aa27d9b</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.146</pheedo:origLink>
			<description>Wireless sensor networks have been proposed for facilitating various monitoring applications (e.g., environmental monitoring and military surveillance) over a wide geographical region. In these applications, spatial queries that collect data from wireless sensor networks play an important role. One such query is the K Nearest Neighbor (KNN) query, which facilitates the collection of sensor data samples based on a given query location and the number of samples specified (i.e., K). Recently, itinerary-based KNN query processing techniques, which propagate queries and collect data along a pre-determined itinerary, have been developed. Prior studies demonstrate that itinerary-based KNN query processing algorithms achieve better energy efficiency than other existing algorithms built upon tree-based network infrastructures. However, how to derive itineraries for KNN queries based on different performance requirements remains a challenging problem. In this paper, we propose a Parallel Concentric-circle Itinerary-based KNN (PCIKNN) query processing technique that derives different itineraries by optimizing either query latency or energy consumption. The performance of PCIKNN is analyzed mathematically and evaluated through extensive experiments. Experimental results show that PCIKNN outperforms the state-of-the-art techniques.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.146</guid>
		</item>
		<item>
			<title>PrePrint: Multimodal Fusion for Video Search Reranking</title>
			<link>http://www.pheedcontent.com/click.phdo?i=e6450aa1731f8a574841114ea596e1cd</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.145</pheedo:origLink>
			<description>Analysis of click-through data from a very large search engine log shows that users are usually interested in the top-ranked portion of returned search results. Therefore, it is crucial for search engines to achieve high accuracy on the top-ranked documents. While many methods exist for boosting video search performance, they either pay little attention to this factor or encounter difficulties in practical applications. In this paper, we present a flexible and effective reranking method, called CR-Reranking, to improve retrieval effectiveness. To offer high accuracy on the top-ranked results, CR-Reranking employs a cross-reference (CR) strategy to fuse multimodal cues. Specifically, multimodal features are first utilized separately to rerank the initially returned results at the cluster level, and then all the ranked clusters from different modalities are cooperatively used to infer the shots with high relevance. Experimental results show that the search quality, especially on the top-ranked results, is improved significantly.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.145</guid>
		</item>
		<item>
			<title>PrePrint: Deriving Concept-Based User Profiles from Search Engine Logs</title>
			<link>http://www.pheedcontent.com/click.phdo?i=27aff7bcfecd2a9be5a75f0efaa1265b</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.144</pheedo:origLink>
			<description>User profiling is a fundamental component of any personalization application. Most existing user profiling strategies are based on objects that users are interested in (i.e., positive preferences), while ignoring the objects that users dislike (i.e., negative preferences). In this paper, we focus on search engine personalization and develop several concept-based user profiling methods that are based on both positive and negative preferences. We evaluate the proposed methods against our previously proposed personalized query clustering method. Experimental results show that profiles which capture and utilize both the user's positive and negative preferences perform the best. An important result from the experiments is that profiles with negative preferences can increase the separation between similar and dissimilar queries. The separation provides a clear threshold at which an agglomerative clustering algorithm can terminate, improving the overall quality of the resulting query clusters.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.144</guid>
		</item>
		<item>
			<title>PrePrint: Flexible Frameworks for Actionable Knowledge Discovery</title>
			<link>http://www.pheedcontent.com/click.phdo?i=6ee8e04c11e51b57c7ef7a3429663caf</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.143</pheedo:origLink>
			<description>Most data mining algorithms and tools stop at the mining and delivery of patterns that satisfy expected technical interestingness. Often many patterns are mined, but business users either are not interested in them or do not know what follow-up actions to take to support their business decisions. This issue has seriously limited the widespread adoption of advanced data mining techniques for improving enterprise operational quality and productivity. In this paper, we present a formal view of actionable knowledge discovery (AKD) from the system and decision-making perspectives. AKD is a closed optimization problem-solving process that spans problem definition, framework/model design, and actionable pattern discovery, and is designed to deliver operable business rules that can be seamlessly associated or integrated with business processes and systems. To support such processes, we propose, formalize, and illustrate four types of generic AKD frameworks: Post-Analysis-based AKD, Unified Interestingness-based AKD, Combined Mining-based AKD, and Multi-Source Combined Mining-based AKD (MSCM-AKD). A real-life case study applies MSCM-AKD to extract debt prevention patterns from social security data. Extensive experiments show that the proposed frameworks are sufficiently general, flexible, and practical to tackle many complex problems and applications by extracting actionable deliverables for instant decision-making.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.143</guid>
		</item>
		<item>
			<title>PrePrint: Conic Programming for Multi-Task Learning</title>
			<link>http://www.pheedcontent.com/click.phdo?i=4ff0566b435aa6ebc4bc04281c74e3a0</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.142</pheedo:origLink>
			<description>When we have several related tasks, solving them simultaneously has been shown to be more effective than solving them individually. This approach is called multi-task learning (MTL). In this paper, we propose a novel MTL algorithm. Our method controls the relatedness among the tasks locally, so that all pairs of related tasks are guaranteed to have similar solutions. We apply this idea to support vector machines and show that the optimization problem can be cast as a second-order cone program, which is convex and can be solved efficiently. The usefulness of our approach is demonstrated in ordinal regression, link prediction, and collaborative filtering, each of which can be formulated as a structured multi-task problem.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.142</guid>
		</item>
		<item>
			<title>PrePrint: Performance Comparison of the R*-tree and the Quadtree for kNN and Distance Join Queries</title>
			<link>http://www.pheedcontent.com/click.phdo?i=acbbcbf094dc6fd3e55e3a1487889879</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.141</pheedo:origLink>
			<description>Multi-dimensional point indexing plays a critical role in a variety of data-centric applications, ranging from image retrieval and sequence matching to moving object search. A common choice of indexing method for these applications is the "ubiquitous" R*-tree. Choosing the right indexing method requires careful consideration of various factors such as query operations and index construction methods. In this work, we present an experimental study comparing the R*-tree and the Quadtree using various criteria, including the query operations and index construction methods. Although a variety of query operations can be performed using these index structures, previous work has largely focused only on the range search operation. We go beyond this previous work and compare the performance of these index structures using k-nearest neighbor (kNN) and distance join queries. We also consider the impact of index construction methods in evaluating these index structures. Our study sheds light on how the choice of the underlying index structure affects the performance of different query operations, and shows that the method used for constructing the index and the dynamic nature of the dataset have a dramatic impact on the performance of these index structures.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.141</guid>
		</item>
		<item>
			<title>PrePrint: Dynamic Dissimilarity Measure for Support-Based Clustering</title>
			<link>http://www.pheedcontent.com/click.phdo?i=1a0a5a535a9d0860c0beb91e1ee0e689</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.140</pheedo:origLink>
			<description>Clustering methods utilizing support estimates of a data distribution have recently attracted much attention because of their ability to generate cluster boundaries of arbitrary shape and to deal with outliers efficiently. In this paper, we propose a novel dissimilarity measure based on a dynamical system associated with support estimating functions. Theoretical foundations of the proposed measure are developed and applied to construct a clustering method that can effectively partition the whole data space. Simulation results demonstrate that clustering based on the proposed dissimilarity measure is robust to the choice of kernel parameters and able to control the number of clusters efficiently.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.140</guid>
		</item>
		<item>
			<title>PrePrint: Closeness: A New Privacy Measure for Data Publishing</title>
			<link>http://www.pheedcontent.com/click.phdo?i=8e230014bbf88f47c291a8d7ec323663</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.139</pheedo:origLink>
			<description>The $k$-anonymity privacy requirement for publishing microdata requires that each equivalence class (i.e., a set of records that are indistinguishable from each other with respect to certain "identifying" attributes) contains at least $k$ records. Recently, several authors have recognized that $k$-anonymity cannot prevent attribute disclosure. The notion of $\ell$-diversity has been proposed to address this; $\ell$-diversity requires that each equivalence class has at least $\ell$ well-represented values for each sensitive attribute. In this article, we show that $\ell$-diversity has a number of limitations. In particular, it is neither necessary nor sufficient to prevent attribute disclosure. Motivated by these limitations, we propose a new notion of privacy called "closeness". We first present the base model $t$-closeness, which requires that the distribution of a sensitive attribute in any equivalence class is close to the distribution of the attribute in the overall table (i.e., the distance between the two distributions should be no more than a threshold $t$). We then propose a more flexible privacy model called $(n,t)$-closeness that offers higher utility. We describe our desiderata for designing a distance measure between two probability distributions and present two distance measures. We discuss the rationale for using closeness as a privacy measure and illustrate its advantages through examples and experiments.
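A minimal Python sketch of the three definitions, assuming a toy table held as a list of dicts: it groups records into equivalence classes, checks k-anonymity and the simplest ("distinct") form of l-diversity, and measures how far each class's sensitive-value distribution is from the overall one using the Earth Mover's Distance for an ordered attribute, computed as (1/(m-1)) times the sum over i of |r_1 + ... + r_i|, where r_j is the difference between the two distributions at the j-th of m ordered values. The helper names and data are hypothetical, and this distance is only one reasonable choice; it is not claimed to reproduce both of the article's measures.

```python
from collections import Counter, defaultdict

def equivalence_classes(rows, quasi_ids):
    """Group records that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[a] for a in quasi_ids)].append(row)
    return list(classes.values())

def is_k_anonymous(classes, k):
    return all(len(c) >= k for c in classes)

def is_distinct_l_diverse(classes, sensitive, ell):
    # Simplest variant: at least ell distinct sensitive values in every class.
    return all(len({r[sensitive] for r in c}) >= ell for c in classes)

def distribution(records, sensitive, domain):
    counts = Counter(r[sensitive] for r in records)
    return [counts[v] / len(records) for v in domain]

def ordered_emd(p, q):
    """EMD over m ordered values with ground distance |i - j| / (m - 1)."""
    cumulative, total = 0.0, 0.0
    for pi, qi in zip(p, q):
        cumulative += pi - qi
        total += abs(cumulative)
    return total / (len(p) - 1)

def max_class_distance(rows, classes, sensitive, domain):
    overall = distribution(rows, sensitive, domain)
    return max(ordered_emd(distribution(c, sensitive, domain), overall) for c in classes)

# Hypothetical toy table: salary in thousands, one coarse zone as the quasi-identifier.
domain = list(range(3, 12))
groups = {"A": [3, 4, 5], "B": [6, 8, 11], "C": [7, 9, 10]}
rows = [{"zone": z, "salary": s} for z, salaries in groups.items() for s in salaries]
classes = equivalence_classes(rows, ["zone"])
print(is_k_anonymous(classes, 3), is_distinct_l_diverse(classes, "salary", 3))
print(max_class_distance(rows, classes, "salary", domain))
```

In this toy table every class has three records with three distinct salaries, so it is 3-anonymous and distinct 3-diverse, yet class A (salaries 3 to 5) lies at distance 0.375 (up to rounding) from the overall distribution, so it would violate t-closeness for any smaller t.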
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.139</guid>
		</item>
		<item>
			<title>PrePrint: Credibility: How Agents Can Handle Unfair Third-party Testimonies in Computational Trust Models</title>
			<link>http://www.pheedcontent.com/click.phdo?i=3098167591390a821551668d78453710</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.138</pheedo:origLink>
			<description>Agents within multi-agent systems usually represent different stakeholders that have their own distinct and sometimes conflicting interests and objectives. They behave so as to achieve their own objectives, even at the expense of others. Therefore, there are risks in interacting with other agents. A number of computational trust models have been proposed to manage such risks. However, the performance of most computational trust models that rely on third-party recommendations as part of the mechanism to derive trust is easily degraded by the presence of unfair testimonies. There have been several attempts to combat the influence of unfair testimonies. Nevertheless, they are either not readily applicable, because they require additional information that is not available in realistic settings, or ad hoc, because they are tightly coupled with specific trust models. Against this background, this paper proposes a general credibility model. Empirical studies show that the proposed credibility model is more effective than related work in mitigating the adverse influence of unfair testimonies.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.138</guid>
		</item>
		<item>
			<title>PrePrint: Superseding Nearest Neighbor Search on Uncertain Spatial Databases</title>
			<link>http://www.pheedcontent.com/click.phdo?i=7519abe8828595d1951e0b2870301c36</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.137</pheedo:origLink>
			<description>This paper proposes a new problem, called {\em superseding nearest neighbor search}, on uncertain spatial databases, where each object is described by a multidimensional probability density function. Given a query point $q$, an object is a {\em nearest neighbor} (NN) {\em candidate} if it has a non-zero probability of being the NN of $q$. Given two NN candidates $o_1$ and $o_2$, $o_1$ {\em supersedes} $o_2$ if $o_1$ is more likely than $o_2$ to be the closer of the two to $q$. An object is a {\em superseding nearest neighbor} (SNN) of $q$ if it supersedes all the other NN candidates. Sometimes no object is able to supersede every other NN candidate. In this case, we return the {\em SNN-core}: the {\em minimum} set of NN candidates {\em each of which} supersedes {\em all} the NN candidates outside the SNN-core. Intuitively, the SNN-core contains the best objects, because any object outside the SNN-core is worse than {\em all} the objects in the SNN-core. We show that the SNN-core can be efficiently computed by utilizing a conventional multidimensional index, as confirmed by extensive experiments.
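The superseding relation and the SNN-core can be made concrete with a brute-force Python sketch. It assumes each uncertain object is a small discrete distribution (a list of (point, probability) samples) rather than a continuous density, treats o1 as superseding o2 when the probability that o1 is strictly closer to q exceeds the probability that o2 is, keeps as NN candidates only objects whose nearest sample is closer than every other object's farthest sample, and finds the SNN-core by enumerating candidate subsets in order of size. It is exponential and meant only for tiny inputs; the paper's index-based algorithm is not reproduced, and all names are hypothetical.

```python
import math
from itertools import combinations

def p_closer(a, b, q):
    """P(dist(a, q) is strictly smaller than dist(b, q)) for independent discrete objects."""
    total = 0.0
    for pa, wa in a:
        da = math.dist(pa, q)
        for pb, wb in b:
            if math.dist(pb, q) > da:
                total += wa * wb
    return total

def supersedes(a, b, q):
    return p_closer(a, b, q) > p_closer(b, a, q)

def nn_candidates(objects, q):
    """Keep o if its nearest sample is closer than the farthest sample of every other object."""
    near = {k: min(math.dist(p, q) for p, _ in v) for k, v in objects.items()}
    far = {k: max(math.dist(p, q) for p, _ in v) for k, v in objects.items()}
    result = []
    for k in objects:
        others = [far[j] for j in objects if j != k]
        if not others or min(others) > near[k]:
            result.append(k)
    return result

def snn_core(objects, q):
    """Smallest set of candidates each of which supersedes every candidate outside it."""
    cands = nn_candidates(objects, q)
    for size in range(1, len(cands) + 1):
        for core in combinations(cands, size):
            outside = [c for c in cands if c not in core]
            if all(supersedes(objects[i], objects[o], q) for i in core for o in outside):
                return list(core)
    return cands

# Hypothetical objects on a line whose distances to q behave like non-transitive dice;
# D is too far away to be an NN candidate.
third = 1 / 3
objects = {
    "A": [((2, 0), third), ((4, 0), third), ((9, 0), third)],
    "B": [((1, 0), third), ((6, 0), third), ((8, 0), third)],
    "C": [((3, 0), third), ((5, 0), third), ((7, 0), third)],
    "D": [((20, 0), 1.0)],
}
print(snn_core(objects, (0, 0)))
```

Here B supersedes A, C supersedes B, and A supersedes C, so no single SNN exists and the core is all three of A, B and C, while D is pruned as a non-candidate.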
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.137</guid>
		</item>
		<item>
			<title>PrePrint: Aging Bloom Filter with Two Active Buffers for Dynamic Sets</title>
			<link>http://www.pheedcontent.com/click.phdo?i=569ff154c3df981822b3e31d38e64a2d</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.136</pheedo:origLink>
			<description>A Bloom filter is a simple but powerful data structure that can check membership in a static set. As Bloom filters become more popular for network applications, membership queries on dynamic sets are also required. Some network applications require high-speed packet processing; for this purpose, Bloom filters should reside in a fast but small memory such as SRAM. In this case, due to the limited memory size, stale data in the Bloom filter should be deleted to make space for new data. In other words, the Bloom filter needs aging, as in LRU caching. In this paper, we propose a new aging scheme for Bloom filters. The proposed scheme utilizes the memory space more efficiently than double buffering, the current state of the art. We prove theoretically that the proposed scheme outperforms double buffering. We also perform experiments on real Internet traces to verify the effectiveness of the proposed scheme.
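For readers new to the structure, the Python sketch below shows a plain Bloom filter (k hash positions in an m-slot array, so no false negatives but occasional false positives) and a generic two-filter rotation in which the older filter is discarded once the active one has absorbed a fixed number of insertions, so stale items eventually age out. The rotation policy, the SHA-256-based index derivation and the capacity parameter are illustrative assumptions; this is not the paper's proposed scheme, whose details are not given in the abstract.

```python
import hashlib

class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per slot, for clarity rather than compactness

    def _indexes(self, item):
        # Assumption: derive k indexes by hashing the item with k different salts.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for idx in self._indexes(item):
            self.bits[idx] = 1

    def __contains__(self, item):
        return all(self.bits[idx] for idx in self._indexes(item))

class RotatingBloom:
    """Two filters; when the active one is full, the older one is dropped and a fresh one starts."""

    def __init__(self, m, k, capacity):
        self.m, self.k, self.capacity = m, k, capacity
        self.active, self.old = BloomFilter(m, k), BloomFilter(m, k)
        self.count = 0

    def add(self, item):
        if self.count == self.capacity:
            # Age out: the active filter becomes the old one; the previous old one is discarded.
            self.old, self.active = self.active, BloomFilter(self.m, self.k)
            self.count = 0
        self.active.add(item)
        self.count += 1

    def __contains__(self, item):
        return item in self.active or item in self.old

recent = RotatingBloom(m=65536, k=3, capacity=2)
for flow in ["x1", "x2", "x3", "x4", "x5"]:
    recent.add(flow)
print("x5" in recent, "x1" in recent)
```

Because a query checks both filters, an item stays visible for between capacity and 2 * capacity - 1 further insertions, after which it is forgotten (barring false positives).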
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.136</guid>
		</item>
		<item>
			<title>PrePrint: An UpDown Directed Acyclic Graph Approach for Sequential Pattern Mining</title>
			<link>http://www.pheedcontent.com/click.phdo?i=7793771ca6e376f66ae920b72fc34d24</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.135</pheedo:origLink>
			<description>Traditional pattern-growth-based approaches for sequential pattern mining recursively derive length-(k+1) patterns from the projected databases of length-k patterns. At each level of recursion, they grow the length of detected patterns by 1, unidirectionally along the suffix of detected patterns, so k levels of recursion are needed to find a length-k pattern. In this paper, a novel data structure, the UpDown Directed Acyclic Graph (UDDAG), is proposed for efficient sequential pattern mining. UDDAG allows bidirectional pattern growth along both ends of detected patterns. Thus a length-k pattern can be detected in &#x230A;log&#x2082;k+1&#x230B; levels of recursion at best, which results in fewer levels of recursion and faster pattern growth. When minSup is so large that the average pattern length is close to 1, UDDAG and PrefixSpan have similar performance, because the problem degenerates into a frequent item counting problem. However, UDDAG scales up much better: it often outperforms PrefixSpan by almost one order of magnitude in scalability tests. UDDAG is also considerably faster than Spade and LapinSpam. Except in extreme cases, UDDAG uses memory comparable to that of PrefixSpan and less memory than Spade and LapinSpam. Additionally, the special feature of UDDAG enables its extension toward applications involving searching in large spaces.
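The recursion-depth comparison is simple arithmetic. The Python sketch below tabulates k levels for one-directional growth against the best case of floor(log2(k) + 1) levels quoted above for UDDAG; for a positive integer k this equals k.bit_length(), which avoids floating-point logarithms. It only evaluates the two formulas and does not implement UDDAG; the function names are hypothetical.

```python
def unidirectional_levels(k: int) -> int:
    # Suffix-only pattern growth adds one item per recursion level.
    return k

def uddag_best_case_levels(k: int) -> int:
    # floor(log2(k) + 1) == floor(log2(k)) + 1 == k.bit_length() for k >= 1.
    return k.bit_length()

for k in (1, 2, 4, 10, 100, 1000):
    print(f"k={k:5d}  one-directional={unidirectional_levels(k):5d}  "
          f"UDDAG best case={uddag_best_case_levels(k):3d}")
```

For a length-100 pattern, for instance, this is 100 levels versus 7.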
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.135</guid>
		</item>
		<item>
			<title>PrePrint: An Information Theoretic Foundation for the Measurement of Discrimination Information</title>
			<link>http://www.pheedcontent.com/click.phdo?i=7235034ff0557c79c7eef4ecf41d489e</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.134</pheedo:origLink>
			<description>Hitherto, it has not been easy to interpret, rationally and explicitly within practical application contexts, the amount of discrimination information conveyed by a term; nor has it been simple to introduce the concept of the degree of semantic relatedness between two terms meaningfully and successfully into scientific discussions. This study is part of an attempt to do so. We attempt to answer two important questions: (1) What is the discrimination information conveyed by a term, and how can it be measured? (2) What is the relatedness between two terms, and how can it be estimated? We focus on the first question and present an in-depth investigation of discrimination measures based on several information measures. The relatedness measures are then naturally defined according to the individual discrimination measures. Key points are made to clarify potential problems arising from the use of the relatedness measures, and solutions are suggested. Two example applications, in the contexts of text mining and information retrieval, are provided. The aim of this study, of which this paper forms part, is to establish a unified theoretical framework, with MDI (measurement of discrimination information) at the core, for achieving effective MSR (measurement of semantic relatedness).</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.134</guid>
		</item>
		<item>
			<title>PrePrint: Incremental and General Evaluation of Reverse Nearest Neighbors</title>
			<link>http://www.pheedcontent.com/click.phdo?i=a49113ce37f927fb6dc7ab8ef8fbc242</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.133</pheedo:origLink>
			<description>This paper presents a novel algorithm for Incremental and General Evaluation of continuous Reverse Nearest neighbor queries (IGERN, for short). The IGERN algorithm is general in that it is applicable to both continuous monochromatic and bichromatic reverse nearest neighbor queries. This problem arises in a number of applications, such as enhanced 911 services and army strategic planning. A main challenge in these problems is to maintain up-to-date query answers as the dataset changes frequently over time. Previous algorithms for monochromatic continuous reverse nearest neighbor queries rely mainly on monitoring, in the worst case, six pie regions, whereas IGERN takes a radical approach by monitoring only a single region around the query object. The IGERN algorithm clearly outperforms the state-of-the-art algorithms for monochromatic queries. We also propose a new optimization for monochromatic IGERN. Furthermore, we propose a filter-and-refine approach for IGERN (FR-IGERN) for the continuous evaluation of bichromatic reverse nearest neighbor queries; it is an optimized version of our previous approach. The computational complexity of IGERN and FR-IGERN is presented, their correctness is proved, and an experimental analysis using synthetic and real datasets is reported.
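To pin down the two query types, here is a brute-force Python sketch of the standard definitions on a single static snapshot: a point p is a monochromatic reverse nearest neighbor of q if no other data point is closer to p than q is, and in the bichromatic case a client is a reverse nearest neighbor of site q if no other site is closer to it than q. It is a quadratic-time reference for checking answers, not IGERN or FR-IGERN; it counts ties in q's favor, which is an assumption, and the function names are hypothetical.

```python
import math

def monochromatic_rnn(q, points):
    """Points p for which q is at least as close as every other data point."""
    result = []
    for i, p in enumerate(points):
        d_q = math.dist(p, q)
        if all(math.dist(p, o) >= d_q for j, o in enumerate(points) if j != i):
            result.append(p)
    return result

def bichromatic_rnn(q, other_sites, clients):
    """Clients for which site q is at least as close as every other site."""
    return [c for c in clients
            if all(math.dist(c, s) >= math.dist(c, q) for s in other_sites)]

points = [(1, 0), (3, 0), (10, 0), (-2, 0)]
print(monochromatic_rnn((0, 0), points))
print(bichromatic_rnn((0, 0), [(10, 0)], [(1, 0), (6, 0), (-5, 0)]))
```

A continuous algorithm has to keep such answers current as points move; rerunning this baseline after every update is exactly the cost that incremental methods aim to avoid.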
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.133</guid>
		</item>
		<item>
			<title>PrePrint: Bayesian Classifiers Programmed in SQL</title>
			<link>http://www.pheedcontent.com/click.phdo?i=34914a2686fa2dd4ebd012d0e82bfa44</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.127</pheedo:origLink>
			<description>One of the most popular techniques for classification is the Bayesian classifier. In this work, we focus on programming efficient and accurate Bayesian classifiers in SQL. We introduce two classifiers: Naive Bayes and a classifier based on class decomposition using K-means clustering. We consider two complementary tasks: model computation and scoring a data set. We study several layouts for tables and several indexing alternatives. We analyze how to transform equations into efficient SQL queries and introduce important query optimizations. We conduct experiments with real and synthetic data sets to evaluate classification accuracy, query optimizations, and scalability. Our Bayesian classifier is more accurate than Naive Bayes and decision trees. We study how to tune classification accuracy by varying the number of clusters, setting class priors, and turning a probability-based decision on and off. Distance computation is significantly accelerated with a horizontal layout for tables and query optimizations. SQL queries are faster than UDFs for computing distances and determining the closest cluster per class. We also compare our Naive Bayes implementation in SQL with an efficient implementation in C++: SQL is four times slower, but not an order of magnitude slower. Our Bayesian classifier in SQL achieves high classification accuracy, can efficiently analyze large data sets, and has linear scalability.
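To make the idea of computing a Bayesian model inside the database concrete, here is a small Python sketch using the standard-library sqlite3 module: a Gaussian Naive Bayes model whose per-class counts, means and (population) variances come from a single GROUP BY query, with scoring done in Python. The table name train, its one-row-per-point layout with columns x1, x2 and label, and the Gaussian treatment of numeric attributes are assumptions for illustration; the paper's table layouts, K-means class decomposition and query optimizations are not reproduced.

```python
import math
import sqlite3

def fit_gaussian_nb(conn):
    """Per-class prior plus per-attribute mean and variance, from one SQL aggregate query."""
    rows = conn.execute(
        """
        SELECT label, COUNT(*),
               AVG(x1), AVG(x1 * x1) - AVG(x1) * AVG(x1),
               AVG(x2), AVG(x2 * x2) - AVG(x2) * AVG(x2)
        FROM train
        GROUP BY label
        """
    ).fetchall()
    total = sum(r[1] for r in rows)
    return {label: (n / total, [(m1, v1), (m2, v2)])
            for label, n, m1, v1, m2, v2 in rows}

def predict(model, point, eps=1e-9):
    """Return the class with the largest Gaussian log posterior."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for x, (mean, var) in zip(point, stats):
            var = max(var, eps)  # guard against zero or slightly negative variance
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE train (x1 REAL, x2 REAL, label TEXT)")
conn.executemany(
    "INSERT INTO train VALUES (?, ?, ?)",
    [(0.0, 0.0, "a"), (1.0, 1.0, "a"), (0.0, 1.0, "a"),
     (10.0, 10.0, "b"), (11.0, 11.0, "b"), (10.0, 11.0, "b")],
)
model = fit_gaussian_nb(conn)
print(predict(model, (0.5, 0.5)), predict(model, (10.5, 10.5)))
```

Computing the sufficient statistics with one aggregate pass keeps the data inside the database; only one small row per class is returned to the client.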
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.127</guid>
		</item>
		<item>
			<title>PrePrint: Bregman Divergence Based Regularization for Transfer Subspace Learning</title>
			<link>http://www.pheedcontent.com/click.phdo?i=ac2153e1de1c3deecbe6340995d69f52</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.126</pheedo:origLink>
			<description>Regularization principles lead to approximation schemes for dealing with various learning problems, e.g., the regularization of the norm in a reproducing kernel Hilbert space for ill-posed problems. In this paper, we present a family of subspace learning algorithms based on a new form of regularization, which transfers the knowledge gained from training samples to testing samples. In particular, the new regularization minimizes the Bregman divergence between the distribution of training samples and that of testing samples in the selected subspace, so it boosts performance when training and testing samples are not independent and identically distributed. To test the effectiveness of the proposed regularization, we incorporate it into popular subspace learning algorithms, e.g., Principal Components Analysis (PCA) for cross-domain face modelling; and Fisher&#x2019;s linear discriminant analysis (FLDA), locality preserving projections (LPP), marginal Fisher&#x2019;s analysis (MFA), and Discriminative Locality Alignment (DLA) for cross-domain face recognition. Finally, we present experimental evidence on the FERET, UMIST, and YALE face image datasets, suggesting that the proposed Bregman divergence based regularization is effective for cross-domain learning problems.
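For reference, the Bregman divergence generated by a strictly convex, differentiable function F is D_F(p, q) = F(p) - F(q) - grad F(q) . (p - q). The Python sketch below evaluates this general definition for two textbook generators: the squared norm, which yields the squared Euclidean distance, and the negative entropy sum_i x_i log x_i, which yields the Kullback-Leibler divergence when p and q are probability vectors. It illustrates only the divergence itself; the paper's regularizer, which compares the training and testing distributions inside the learned subspace, is not reproduced, and the function names are hypothetical.

```python
import math

def bregman(F, grad_F, p, q):
    """D_F(p, q) = F(p) - F(q) - grad_F(q) . (p - q)."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_F(q), p, q))
    return F(p) - F(q) - inner

def sq_norm(x):
    return sum(v * v for v in x)

def sq_norm_grad(x):
    return [2 * v for v in x]

def neg_entropy(x):
    return sum(v * math.log(v) for v in x)

def neg_entropy_grad(x):
    return [math.log(v) + 1 for v in x]

p, q = [0.2, 0.3, 0.5], [0.4, 0.4, 0.2]
print(bregman(sq_norm, sq_norm_grad, p, q))          # squared Euclidean distance
print(bregman(neg_entropy, neg_entropy_grad, p, q))  # KL(p || q), since p and q sum to 1
```

Unlike a metric, D_F is generally asymmetric (swapping p and q changes the KL value here), which is why the direction of comparison between training and testing distributions matters.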
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.126</guid>
		</item>
		<item>
			<title>PrePrint: &#x03B4;-Presence Without Complete World Knowledge</title>
			<link>http://www.pheedcontent.com/click.phdo?i=930a63eaee90d0bd4fb4d817681d8078</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.125</pheedo:origLink>
			<description>Advances in information technology, and their use in research, are increasing both the need for anonymized data and the risks of poor anonymization. Previous work presented a new privacy metric, &#x03B4;-presence, that clearly links the quality of anonymization to the risk posed by inadequate anonymization. It was shown that existing anonymization techniques are inappropriate for situations where &#x03B4;-presence is a good metric (specifically, where knowing that an individual is in the database poses a privacy risk). This article addresses a practical problem with the previous work by extending it to situations where the data anonymizer is not assumed to have complete world knowledge. The algorithms are evaluated in the context of a real-world scenario, demonstrating the practical applicability of the approach.
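Under the complete-world-knowledge assumption that this article relaxes, delta-presence can be checked directly: if every individual in a public population table is generalized into the same classes as the released private table, the probability that a person in a class appears in the private table is the class's private count divided by its public count, and the release is (delta_min, delta_max)-present when every such probability lies within those bounds. The Python sketch below encodes that check on lists of generalized class keys; the names and the toy data are hypothetical, and the article's algorithms for the incomplete-knowledge case are not shown.

```python
from collections import Counter

def presence_probabilities(public_classes, private_classes):
    """Map each generalized class to (# private records in it) / (# public individuals in it)."""
    public, private = Counter(public_classes), Counter(private_classes)
    return {cls: private[cls] / public[cls] for cls in public}

def is_delta_present(public_classes, private_classes, delta_min, delta_max):
    probs = presence_probabilities(public_classes, private_classes)
    return all(delta_max >= p >= delta_min for p in probs.values())

# Hypothetical toy data: each entry is the generalized class of one person.
public = ["age 20-29"] * 4 + ["age 30-39"] * 2
private = ["age 20-29"] * 2 + ["age 30-39"] * 2
print(presence_probabilities(public, private))
print(is_delta_present(public, private, 0.2, 0.8))
```

Here everyone in the "age 30-39" class is in the private table, so membership is certain for that class and the check fails for delta_max = 0.8.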
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.125</guid>
		</item>
		<item>
			<title>PrePrint: Feature Selection Using f-Information Measures in Fuzzy Approximation Spaces</title>
			<link>http://www.pheedcontent.com/click.phdo?i=792cd502391b716aa5590f1d208f061a</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.124</pheedo:origLink>
			<description>The selection of nonredundant and relevant features of real-valued data sets is a highly challenging problem. A novel feature selection method based on fuzzy-rough sets is presented here; it maximizes the relevance and minimizes the redundancy of the selected features. By introducing the fuzzy equivalence partition matrix, a novel representation of Shannon's entropy for fuzzy approximation spaces is proposed to measure the relevance and redundancy of features in real-valued data sets. The fuzzy equivalence partition matrix also offers an efficient way to calculate many more information measures, termed f-information measures. Several f-information measures are shown to be effective for selecting nonredundant and relevant features of real-valued data sets. This paper compares the performance of different f-information measures for feature selection in fuzzy approximation spaces. Several quantitative indices based on fuzzy-rough sets are introduced for evaluating the performance of the proposed feature selection method. The effectiveness of the proposed method, along with a comparison with other methods, is demonstrated on a set of real-life data sets.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.124</guid>
		</item>
		<item>
			<title>PrePrint: A Non-Supervised Learning Framework of Human Behavior Patterns Based on Sequential Actions</title>
			<link>http://www.pheedcontent.com/click.phdo?i=964d058b2ac9b13dc273dae9d2b719d5</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.123</pheedo:origLink>
			<description>In designing autonomous service systems such as assistive robots for the aged and the disabled, the discovery and prediction of human actions are important and often crucial. Patterns of human behavior, however, involve ambiguity, uncertainty, complexity, and inconsistency caused by physical, logical, and emotional factors. Their modeling and recognition are therefore known to be difficult. In this paper, we propose a non-supervised learning framework for human behavior patterns that takes human behavioral characteristics into account. Our approach consists of two steps. In the first step, a meaningful structure of the data is discovered using Agglomerative Iterative Bayesian Fuzzy Clustering (AIBFC) with a newly proposed cluster validity index. In the second step, the sequence of actions is learned on the basis of the structure discovered in the first step, using the proposed Fuzzy-state Q-learning (FSQL) process. These two learning steps are incorporated into an amalgamated framework, AIBFC-FSQL, which can learn human behavior patterns in a non-supervised manner and predict subsequent human actions. Through a number of simulations with typical benchmark datasets, we show that the proposed learning method outperforms several well-known methods. We further conduct experiments with two challenging real-world databases to demonstrate its usefulness from a practical perspective.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.123</guid>
		</item>
		<item>
			<title>PrePrint: Probabilistic Topic Models for Learning Terminological Ontologies</title>
			<link>http://www.pheedcontent.com/click.phdo?i=52f6f181dacf6854f94609cb668db94e</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.122</pheedo:origLink>
			<description>Probabilistic topic models were originally developed and utilised for document modeling and topic extraction in Information Retrieval. In this paper, we describe a new approach, based on such models, for automatically learning terminological ontologies from a text corpus. In our approach, topic models are used as efficient dimension reduction techniques that capture word-topic and topic-document semantic relationships, interpreted in terms of probability distributions. We propose two algorithms for learning terminological ontologies that use the principle of topic relationship and exploit information theory together with the learned probabilistic topic models. Experiments with different model parameters were conducted, and the learned ontology statements were evaluated by domain experts. We have also compared the results of our method with two existing concept hierarchy learning methods on the same dataset. The study shows that our method outperforms the other methods in terms of recall and precision. The precision level of the learned ontology is sufficient for it to be deployed for browsing, navigation, and information search and retrieval in digital libraries.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.122</guid>
		</item>
		<item>
			<title>PrePrint: The Tiled Bitmap Forensic Analysis Algorithm</title>
			<link>http://www.pheedcontent.com/click.phdo?i=6b34ac951830b363d58e59e8d5d0b20b</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.121</pheedo:origLink>
			<description>Tampering with a database can be detected through the use of cryptographically strong hash functions. Subsequently applied forensic analysis algorithms can help determine when, what, and perhaps ultimately who and why. This paper presents a novel forensic analysis algorithm, the Tiled Bitmap Algorithm, which is more efficient than prior forensic analysis algorithms. It introduces the notion of a candidate set (all possible locations of detected tampering(s)) and provides a complete characterization of the candidate set and its cardinality. An optimal algorithm for computing the candidate set is also presented. Finally, the implementation of the Tiled Bitmap Algorithm is discussed, along with a comparison to other forensic algorithms in terms of space/time complexity and cost. An example of candidate set generation, and proofs of the theorems, lemmata, and algorithm correctness, can be found in the appendix.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.121</guid>
		</item>
		<item>
			<title>PrePrint: k-Anonymity in the Presence of External Databases</title>
			<link>http://www.pheedcontent.com/click.phdo?i=272512237768c6ef3547d6e0d3978842</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.120</pheedo:origLink>
			<description>The concept of k-anonymity has received considerable attention because several organizations need to release microdata without revealing the identity of individuals. Although all previous k-anonymity techniques assume the existence of a public database (PD) that can be used to breach privacy, none utilizes PD during the anonymization process. Specifically, existing generalization algorithms create anonymous tables using only the microdata table (MT) to be published, independently of the external knowledge available. This omission leads to high information loss. Motivated by this observation, we first introduce the concept of k-join-anonymity (KJA), which permits more effective generalization by exploiting the records of PD to reduce the information loss. Then, we propose two methodologies for adapting k-anonymity algorithms to their KJA counterparts. The first generalizes the combination of MT and PD, under the constraint that each group should contain at least one tuple of MT (otherwise, the group is useless and discarded). The second anonymizes MT and then refines the resulting groups using PD. Finally, we assess the effectiveness of our contributions through an extensive experimental evaluation using real and synthetic datasets.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.120</guid>
		</item>
		<item>
			<title>PrePrint: A Distance Measure Approach to Exploring the Rough Set Boundary Region for Attribute Reduction</title>
			<link>http://www.pheedcontent.com/click.phdo?i=a1e7588ac133955eb979b838bb38c40b</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.119</pheedo:origLink>
			<description>Feature Selection (FS) or Attribute Reduction techniques are employed for dimensionality reduction and aim to select a subset of the original features of a dataset that are rich in the most useful information. The benefits of employing FS techniques include improved data visualisation and transparency, reduced training and utilisation times and, potentially, improved prediction performance. Up to now, many approaches based on rough set theory have employed the dependency function, which is based on lower approximations, as an evaluation step in the FS process. However, these approaches examine only the information that is considered certain and ignore the boundary region, or region of uncertainty, so much useful information is lost.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.119</guid>
		</item>
		<item>
			<title>PrePrint: Building a Rule-Based Classifier: A Fuzzy-Rough Set Approach</title>
			<link>http://www.pheedcontent.com/click.phdo?i=569774a472a0f7d87cbebc30062c91b5</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.118</pheedo:origLink>
			<description>The fuzzy-rough set (FRS) methodology, a useful tool for handling discernibility and fuzziness, has been widely studied. Some researchers have studied the rough approximation of fuzzy sets, while others have focused on one application of FRS: attribute reduction (i.e., feature selection). However, another application of FRS, constructing classifiers, has been less studied. In this paper, we build a rule-based classifier using a generalized FRS model, after proposing a new concept named &#x2018;consistence degree&#x2019;, which is used as the critical value to keep the discernibility information invariant during rule induction. First, we generalize the existing FRS to a model that is robust with respect to misclassification and perturbation by incorporating a controlled threshold into the knowledge representation of FRS. Second, we propose the concept of &#x2018;consistence degree&#x2019; and show, through rigorous mathematical reasoning, that it is a reasonable critical value for reducing redundant attribute values in a database. Using this concept, we then design a discernibility vector from which the rule induction algorithms are developed. The induced rule set can function as a classifier. Finally, the experimental results show that the proposed rule-based classifier is feasible and effective on noisy data.
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.118</guid>
		</item>
	</channel>
</rss>