Student Honors Papers
The Student Honors Papers collection represents exemplary work in computer science at Illinois Wesleyan University. The Ames Library is proud to archive these and other honors projects in Digital Commons @ IWU, the University's online archive of student, faculty and staff scholarship and creative activity.
Generative adversarial networks (GANs) are now one of the key techniques for detecting anomalies in images, yielding remarkable results. Applying similar methods to discrete structures, such as text sequences, is still largely an unknown. In this work, we introduce a new GAN-based text anomaly detection method, called ARAE-AnoGAN, that trains an adversarially regularized autoencoder (ARAE) to reconstruct normal sentences and detects anomalies via a combined anomaly score based on the building blocks of ARAE. Finally, we present experimental results demonstrating the effectiveness of ARAE-AnoGAN and other deep learning methods in text anomaly detection.
Symmetry occurs in many constraint satisfaction problems, and it is important to deal with it efficiently and effectively, as it often leads to an exponential number of isomorphic assignments. Symmetric rows and columns in matrices are an important class of symmetries in constraint programming. In this work, we develop a new SAT encoding for partial lexicographical ordering constraints to break symmetries in such places. We also survey all the previous complete lex-leader encodings in literature and translate them into SAT encodings. We perform experimental analysis on how these lex-leader constraints impact the solving of Balanced Incomplete Block Design (BIBD) instances. Each encoding is able to outperform the other encodings on some instances, and they all perform close to each other; no clear winner can be drawn. Finally, the result shows that though using any lex-leader constraints is detrimental to finding a single BIBD, they are necessary in enumerating all BIBDs and proving non-existing designs.
Constraint satisfaction problems (CSPs) involve finding assignments to a set of variables that satisfy some mathematical constraints. Unsatisfiable constraint problems are CSPs with no solution. However, useful characteristic subsets of these problems may be extracted with algorithms such as the MARCO algorithm, which outperforms the best known algorithms in the literature. A heuristic choice in the algorithm affects how it traverses the search space to output these subsets. This work analyzes the effect of this choice and introduces three improvements to the algorithm. The first of these improvements sacrifices completeness in terms of one type of subset in order to improve the output rate of another; the second and third are variations of a local search in between iterations of the algorithm which result in improved guidance in the search space. The performance of these improvements is analyzed both individually and in combinations across a variety of benchmarks and they are shown to improve the output rate of MARCO.
Boolean cardinality constraints are commonly translated (encoded) into Boolean CNF, a standard form for Boolean satisfiability problems, which can be solved using a standard SAT solving program. However, cardinality constraints are a simple generalization of clauses, and the complexity entailed by encoding them into CNF can be avoided by reasoning about cardinality constraints natively within a SAT solver. In this work, we compare the performance of two forms of native cardinality constraints against some of the best performing encodings from the literature. We designed a number of experiments, modeling the general use of cardinality constraints including crafted, random and application problems, to be run in parallel on a cluster of computers. Results show that native implementations substantially outperform CNF encodings on instances composed entirely of cardinality constraints, and instances that are mostly clauses with few cardinality constraints exhibit mixed results warranting further study.