
Collaborating Authors

University of Pittsburgh


Not All Samples are Equal: Class Dependent Hierarchical Multi-Task Learning for Patient Diagnosis Classification

AAAI Conferences

An interesting machine learning problem is to learn predictive models that can automatically assign diagnoses or diagnostic categories to patient cases. However, we often do not have enough positive samples for many diagnoses, either because they are rare or because available datasets are limited in size. This motivates the use of multi-task learning methods, which tend to improve model performance by imposing similarities between the models of related tasks. In this work, we tackle this important problem by exploring the benefits of existing expert-defined diagnostic hierarchies. We argue that related tasks (models) organized in expert-defined hierarchies do not have the same level of similarity for different classes of samples. We discuss how task similarities differ for positive and negative samples and between parent and child diagnoses. We propose a new asymmetric version of the Adaptive Hierarchical Multi-task Learning (AHMTL) method that allows models to learn separate relatedness coefficients for tasks in the hierarchy based on their class values. Finally, we show that our model outperforms individually trained SVM models and the symmetric AHMTL baseline.
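The hierarchical tie between parent and child diagnosis models can be sketched as a regularizer over the diagnostic tree. The snippet below shows only the standard symmetric penalty with one relatedness coefficient per task; the paper's asymmetric variant would learn separate, class-dependent coefficients, a detail omitted here. All names are illustrative.

```python
import numpy as np

def hierarchy_penalty(W, parent, rho):
    """Symmetric hierarchical multi-task penalty (illustrative sketch).

    W: (n_tasks, d) array of per-task weight vectors.
    parent: parent[t] is the index of task t's parent in the diagnostic
        hierarchy, or -1 if t is a root.
    rho: rho[t] is the relatedness coefficient tying task t to its parent.
    """
    total = 0.0
    for t, p in enumerate(parent):
        if p >= 0:
            diff = W[t] - W[p]            # pull child toward parent
            total += rho[t] * float(diff @ diff)
    return total
```

This penalty is added to the sum of per-task losses (e.g., hinge losses for SVM-style models); a larger rho[t] forces the child model closer to its parent.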


Detecting Trait versus Performance Student Behavioral Patterns Using Discriminative Non-Negative Matrix Factorization

AAAI Conferences

Recent studies have shown that students follow stable behavioral patterns while learning in online educational systems. These behavioral patterns can further be used to group the students into different clusters. However, as these clusters include both high- and low-performance students, the relation between behavioral patterns and student performance is yet to be clarified. In this work, we study the relation between students' learning behaviors and their performance in a self-organized online learning system that allows them to freely practice with various problems and worked examples. We represent each student's behavior as a vector of high-support sequential micro-patterns. Assuming that some behavioral patterns are shared across high- and low-performance students, and some are specific to each group, we group the students according to their performance. Under this assumption, we discover both the prevalent behavioral patterns within each group and the patterns shared across groups using discriminative non-negative matrix factorization. Our experiments show that there are such common and specific patterns in students' behavior that discriminate among students with different performance levels.
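As a rough illustration of the factorization machinery (plain NMF, not the paper's discriminative variant), multiplicative-update NMF decomposes a nonnegative student-by-pattern matrix into a small set of latent pattern groups:

```python
import numpy as np

def nmf(V, k, iters=300, eps=1e-9, seed=0):
    """Plain multiplicative-update NMF: V (n x m) is approximated by W @ H
    with W (n x k) and H (k x m) nonnegative. For the behavioral-pattern
    setting, rows of V would be students and columns micro-patterns."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 0.1
    H = rng.random((k, m)) + 0.1
    for _ in range(iters):
        # Lee-Seung multiplicative updates for squared Frobenius loss
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

The discriminative version in the paper additionally constrains which factors are shared across performance groups and which are group-specific; that structure is not modeled in this sketch.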


Inexact Proximal Gradient Methods for Non-Convex and Non-Smooth Optimization

AAAI Conferences

In machine learning research, proximal gradient methods are popular for solving various optimization problems with non-smooth regularization. Inexact proximal gradient methods are extremely important when exactly solving the proximal operator is time-consuming, or when the proximal operator does not have an analytic solution. However, existing inexact proximal gradient methods only consider convex problems; knowledge of inexact proximal gradient methods in the non-convex setting is very limited. To address this challenge, in this paper we first propose three inexact proximal gradient algorithms, including the basic version and Nesterov's accelerated version. We then provide a theoretical analysis of the basic and accelerated versions. The theoretical results show that our inexact proximal gradient algorithms can achieve the same convergence rates as their exact counterparts in the non-convex setting. Finally, we apply our inexact proximal gradient algorithms to three representative non-convex learning problems. Empirical results confirm the superiority of the new inexact proximal gradient algorithms.
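For intuition, here is a minimal proximal gradient loop for the lasso, where the proximal operator (soft-thresholding) happens to have a closed form and is exact; the paper's setting concerns regularizers whose prox subproblem must instead be solved approximately:

```python
import numpy as np

def soft_threshold(v, t):
    """Closed-form prox of t*||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_gradient_lasso(A, b, lam, step, iters=500):
    """Proximal gradient for  0.5*||Ax - b||^2 + lam*||x||_1.

    Each iteration takes a gradient step on the smooth part, then applies
    the prox of the non-smooth part. When the prox lacks an analytic
    solution, it would be computed approximately -- the inexact case the
    paper analyzes."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - step * grad, step * lam)
    return x
```

With `A = I` the minimizer is simply `soft_threshold(b, lam)`, which makes the loop easy to sanity-check.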


Argument Mining for Improving the Automated Scoring of Persuasive Essays

AAAI Conferences

End-to-end argument mining has enabled the development of new automated essay scoring (AES) systems that use argumentative features (e.g., number of claims, number of support relations) in addition to traditional legacy features (e.g., grammar, discourse structure) when scoring persuasive essays. While prior research has proposed different argumentative features as well as empirically demonstrated their utility for AES, these studies have all had important limitations. In this paper we identify a set of desiderata for evaluating the use of argument mining for AES, introduce an end-to-end argument mining system and associated argumentative feature sets, and present the results of several studies that both satisfy the desiderata and demonstrate the value-added of argument mining for scoring persuasive essays.
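A hypothetical sketch of how an argument miner's output might be turned into scoring features; the annotation schema and feature names below are assumptions for illustration, not the paper's actual feature sets:

```python
def argument_features(components, relations):
    """Count-based argumentative features from a (hypothetical) annotation.

    components: list of dicts with a 'type' key, e.g.
        'MajorClaim' | 'Claim' | 'Premise'.
    relations: list of dicts with a 'type' key, e.g. 'support' | 'attack'.
    """
    return {
        'n_claims': sum(c['type'] == 'Claim' for c in components),
        'n_premises': sum(c['type'] == 'Premise' for c in components),
        'n_support': sum(r['type'] == 'support' for r in relations),
        'n_attack': sum(r['type'] == 'attack' for r in relations),
    }
```

Such counts would then be concatenated with legacy features (grammar, discourse structure) before training the scoring model.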


Asking Friendly Strangers: Non-Semantic Attribute Transfer

AAAI Conferences

We propose an attention-guided transfer network. Briefly, our approach works as follows. First, the network receives training images for attributes in both the source and target domains. Second, it separately learns models for the attributes in each domain, and then measures how related each target-domain classifier is to the classifiers in the source domains. Finally, it uses these measures of similarity (relatedness) to compute a weighted combination of the source classifiers, which then becomes the new classifier for the target attribute. We develop two methods, one where the target and source domains are disjoint, and another where there is some overlap between them. Importantly, we show that when the source attributes come from a diverse set of domains, the gain we obtain from this transfer of knowledge is greater than if we only use attributes from the same domain.

…Nickisch, and Harmeling 2009; Parikh and Grauman 2011; Akata et al. 2013), learn object models expediently by providing information about multiple object classes with each attribute label (Kovashka, Vijayanarasimhan, and Grauman 2011; Parkash and Parikh 2012), interactively recognize fine-grained object categories (Branson et al. 2010; Wah and Belongie 2013), and learn to retrieve images from precise human feedback (Kumar et al. 2011; Kovashka, Parikh, and Grauman 2015). Recent ConvNet approaches have shown how to learn accurate attribute models through multi-task learning (Fouhey, Gupta, and Zisserman 2016; Huang et al. 2015) or by localizing attributes (Xiao and Jae Lee 2015; Singh and Lee 2016). However, deep learning with ConvNets requires a large amount of data to be available for the task of interest, or for a related task (Oquab et…
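The relatedness-weighted combination step can be sketched as follows, under our illustrative assumption that relatedness is measured by cosine similarity between classifier weight vectors (the paper's network learns these measures rather than computing them in closed form):

```python
import numpy as np

def transfer_target_classifier(source_W, target_w):
    """Combine source attribute classifiers into a target classifier.

    source_W: (n_sources, d) array, one linear classifier per source attribute.
    target_w: (d,) initial target-attribute classifier.
    Weights each source by softmax-normalized cosine similarity to the
    target, then returns the weighted combination."""
    sims = source_W @ target_w / (
        np.linalg.norm(source_W, axis=1) * np.linalg.norm(target_w) + 1e-12)
    weights = np.exp(sims) / np.exp(sims).sum()   # relatedness weights
    return weights @ source_W
```

Sources most similar to the target dominate the combination, which is the intuition behind transferring from "friendly strangers" in diverse domains.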


Directional Label Rectification in Adaptive Graph

AAAI Conferences

With the explosive growth of multivariate time-series data, failure (event) analysis has gained widespread application. A primary goal of failure analysis is to identify the fault signature, i.e., the unique feature pattern that distinguishes failure events. However, the complex nature of multivariate time-series data brings challenges to the detection of fault signatures. Given a time series from a failure event, the fault signature and the onset of failure are not necessarily adjacent, and the interval between the signature and the failure is usually unknown. The uncertainty of this interval causes uncertainty in labeling timestamps, making it inapplicable to directly employ standard supervised algorithms for signature detection. To address this problem, we present a novel directional label rectification model that identifies fault-relevant timestamps and features simultaneously. Different from previous graph-based label propagation models that use a fixed graph, we propose to learn an adaptive graph that is optimal for the label rectification process. We conduct extensive experiments on both synthetic and real-world datasets and illustrate the advantage of our model in both effectiveness and efficiency.
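For contrast with the adaptive graph learned in the paper, standard label propagation on a fixed affinity graph looks like this:

```python
import numpy as np

def label_propagation(W, y_init, alpha=0.9, iters=200):
    """Label propagation on a FIXED affinity graph (the paper instead learns
    the graph jointly with label rectification).

    W: (n, n) symmetric nonnegative affinity matrix over timestamps.
    y_init: (n, c) one-hot rows for labeled timestamps, zeros otherwise.
    Iterates F <- alpha * S F + (1 - alpha) * y_init with the symmetrically
    normalized graph S = D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))
    F = y_init.copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * y_init
    return F.argmax(axis=1)
```

On a chain of timestamps with one label at each end, the propagated labels split the chain at its midpoint, which is the behavior a good affinity graph should refine.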


Accelerated Method for Stochastic Composition Optimization With Nonsmooth Regularization

AAAI Conferences

Stochastic composition optimization has drawn much attention recently and has been successful in many emerging applications in machine learning, statistical analysis, and reinforcement learning. In this paper, we focus on the composition problem with a nonsmooth regularization penalty. Previous works either have a slow convergence rate or do not provide a complete convergence analysis for the general problem. We tackle these two issues by proposing a new stochastic composition optimization method for the composition problem with a nonsmooth regularization penalty. In our method, we apply a variance reduction technique to accelerate convergence. To the best of our knowledge, our method admits the fastest convergence rate for stochastic composition optimization: for the strongly convex composition problem, our algorithm is proved to attain linear convergence; for the general composition problem, our algorithm significantly improves the state-of-the-art convergence rate from O(T^{-1/2}) to O((n_1 + n_2)^{2/3} T^{-1}). Finally, we apply the proposed algorithm to portfolio management and to policy evaluation in reinforcement learning. Experimental results verify our theoretical analysis.
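A composition objective F(x) = f(g(x)) with an l1 penalty can be illustrated with one full-gradient proximal step via the chain rule; the paper's contribution is to replace these full inner/outer averages with variance-reduced stochastic estimates, which this sketch deliberately omits:

```python
import numpy as np

def composition_prox_step(x, g_funcs, g_jacs, f_grads, lam, step):
    """One proximal step for
        F(x) = (1/n1) sum_i f_i( (1/n2) sum_j g_j(x) ) + lam * ||x||_1,
    using FULL averages for clarity (no variance reduction).

    g_funcs: inner maps g_j; g_jacs: their Jacobians; f_grads: gradients
    of the outer functions f_i. All are callables of x (or of the inner
    value y, for f_grads)."""
    y = np.mean([g(x) for g in g_funcs], axis=0)              # inner value
    J = np.mean([Jg(x) for Jg in g_jacs], axis=0)             # inner Jacobian
    grad = J.T @ np.mean([df(y) for df in f_grads], axis=0)   # chain rule
    v = x - step * grad
    return np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)  # prox of l1
```

With the identity inner map and a quadratic outer function, a single step with unit step size lands exactly on the minimizer, which makes the chain-rule bookkeeping easy to verify.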


Sentiment Analysis via Deep Hybrid Textual-Crowd Learning Model

AAAI Conferences

Crowdsourcing provides an efficient way to employ human skills in sentiment analysis, which is a difficult task for automatic language models due to large variations in context, writing style, viewpoint, and so on. However, standard crowdsourcing aggregation models are incompetent when the number of crowd labels per worker is not sufficient to train their parameters, or when it is not feasible to collect labels for every sample in a large dataset. In this paper, we propose a novel hybrid model that exploits both crowd and text data for sentiment analysis, consisting of a generative crowdsourcing aggregation model and a deep sentiment autoencoder. The combination of these two sub-models is obtained through a probabilistic framework rather than a heuristic one. We introduce a unified objective function that incorporates the objectives of both sub-models, and derive an efficient optimization algorithm to jointly solve the corresponding problem. Experimental results indicate that our model achieves superior results in comparison with state-of-the-art models, especially when crowd labels are scarce.
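As a baseline sketch only (not the paper's hybrid probabilistic model), a simple two-round aggregation weights each worker by agreement with an initial majority vote, a crude stand-in for learned worker reliability:

```python
import numpy as np

def aggregate(votes):
    """Two-round crowd label aggregation for binary sentiment.

    votes: dict sample -> dict worker -> label in {0, 1}.
    Round 1: plain majority vote per sample.
    Round 2: re-vote, weighting each worker by agreement with round 1."""
    maj = {s: int(np.mean(list(wv.values())) >= 0.5) for s, wv in votes.items()}
    workers = {w for wv in votes.values() for w in wv}
    acc = {w: np.mean([maj[s] == wv[w] for s, wv in votes.items() if w in wv])
           for w in workers}
    out = {}
    for s, wv in votes.items():
        # signed, reliability-weighted vote: label 1 counts +acc, label 0 -acc
        score = sum(acc[w] * (1 if label == 1 else -1) for w, label in wv.items())
        out[s] = int(score >= 0)
    return out
```

The paper's generative model goes further by coupling such worker reliability estimates with a text model, so unlabeled samples also inform the aggregation.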


Matrix Variate Gaussian Mixture Distribution Steered Robust Metric Learning

AAAI Conferences

Mahalanobis Metric Learning (MML) has been actively studied in the machine learning community. Most existing MML methods aim to learn a powerful Mahalanobis distance for computing the similarity of two objects. More recently, several methods have used matrix norm regularizers to constrain the learned distance matrix M to improve performance. However, in real applications the structure of the distance matrix M is complicated and cannot be characterized well by a simple matrix norm. In this paper, we propose a novel robust metric learning method that learns the structure of the distance matrix in a new and natural way. We partition M into blocks and consider each block as a random matrix variate, which is fitted by a matrix variate Gaussian mixture distribution. Different from existing methods, our model makes no assumption about the structure of M and automatically learns it from real data, where the distance matrix M is often neither sparse nor low-rank. We design an effective algorithm to optimize the proposed model and establish the corresponding theoretical guarantee. We conduct extensive evaluations on real-world data. Experimental results show our method consistently outperforms related state-of-the-art methods.
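A minimal sketch of the two objects involved: the squared Mahalanobis distance under a learned positive semidefinite matrix M, and the block partition of M that the paper models with a matrix variate Gaussian mixture (the mixture fitting itself is beyond this illustration):

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y) for a PSD matrix M.
    With M = I this reduces to the squared Euclidean distance."""
    d = x - y
    return float(d @ M @ d)

def blocks(M, bs):
    """Partition a square matrix M into bs-by-bs blocks in row-major order.
    Each block is the 'matrix variate' the paper fits with a Gaussian
    mixture instead of imposing a sparse or low-rank prior on M."""
    n = M.shape[0]
    return [M[i:i + bs, j:j + bs]
            for i in range(0, n, bs) for j in range(0, n, bs)]
```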


Jointly Parse and Fragment Ungrammatical Sentences

AAAI Conferences

However, the sentences under analysis may not always be grammatically correct. When a dependency parser nonetheless produces fully connected, syntactically well-formed trees for these sentences, the trees may be inappropriate and lead to errors. In fact, researchers have raised valid questions about the merit of annotating dependency trees for ungrammatical sentences (Ragheb and Dickinson 2012; Cahill 2015). On the other hand, previous work has …

… experiments, we find that both joint methods produce tree fragment sets that are more similar to those produced by the oracle method than the previous pipeline method; moreover, the seq2seq method's pruning decision has a significantly higher accuracy. In terms of downstream applications, we show that dependency arc pruning is helpful for two applications: sentential grammaticality judgment and semantic role labeling.
