# Articles

## Thesis Defenses 2011

### Verification and modification of Randomized algorithms in Extended SAT applied to cryptographic systems and comparison with DPLL based algorithm

**By:** Leila Samimi Dehkordy

**Supervisors:** Dr. Ali Moeini

**Co-advisor:** Dr. Faramarz Hendessi

**Defense room:** Engineering Science 4

**Committee:** Dr. Kambiz Badie, Dr. Morteza Mohammad Nouri

**Date:** Monday, January 31, 2011

**Abstract:** One of the most interesting discrete mathematics problems is Satisfiability problem that has several applications and is called the king of discrete Math science. SAT problem has two parts. First instance: a set U contain variables and a collection named C of clauses on these variables; and second question: Is there a truth assignment for variables of U that satisfies all the clauses in C? SAT problem is used to study on the performance and speedy of different algorithms. In fact, this problem is a way for analysis of algorithms. In two recent years, several methods are prepared for solving SAT, and every year these methods are improved. In this thesis, we first study them. Then we propose a new SAT solver which is combined randomized technique with imperialist competitive algorithm.

Since there are different methods for solving SAT, real world problems can be converted to it as a result of Cook theorem and then will be solved. One the important application of SAT is cryptography algorithm. We can solve encryption system of equations with SAT solvers. This method is called logical cryptanalysis; it is a subset of algebraic attacks. SAT problem consists of Boolean operators like ∧ and ∨ but cipher system of equations may consists of xor operator, too. Conversion of each xor-clause to CNF clause increase number of variables exponentially. In recent years, some methods are proposed to solve this problem and we study them. Then we propose a new pattern for logical cryptanalysis. We also apply this pattern on A5/1, a stream cipher that is used in mobile communication and show advantages and disadvantages of this new method.

**Keywords:** Satisfiability problem, Imperialist Competitive algorithm, stochastic local search techniques, logical cryptanalysis, algebraic attack.

### Prediction of splice sites in DNA sequences based on pattern finding algorithm

**By:** Zeinab Movassaghinia

**Supervisors:** Dr. Mehdi Sadeghi, Dr. Kambiz Badie, Dr. Fatemeh Zare

**Defense room:** Engineering Science 1

**Committee:** Dr. Ali Moeini, Dr. Abbas Nozari

**Date:** Saturday, February 18, 2012

**Abstract:** Nowadays, it is known that all information required for cells activity are encrypted in Acid nucleic sequences or existents' genome and decryption from genome can provide us with a deep understanding of existents' function.

One of above mentioned functions is Splicing process in RNAs in which some parts of molecules (introns) is splited from exons. It seems that parts of RNA sequence which is known as motif are incorporated in this function. In biology sequence, motif is said to sets of some similar subsequences which are normally having special & common function.

In this thesis we have looked after a way to find motifs which are incorporated in splicing process. To do so, we used most popular motif finding algorithms such as Gibbse-sampling & MEME which are based on probability and also the basis of our new proposed algorithm in this thesis.

Results are showing that 5’ splice sites motifs & branch point motifs –which are incorporated in splicing process & are preserved in terms of sequence-, are found able with this algorithm. But 3’ splice site is not proofing sequence preservation and consequently is not resolvable using this algorithm which is working on this basis.

We’ve considered that, such as most of biological events, the activity of this site is not independent and is probably dependent to another site in RNA sequence, so we’ve tried to propose an algorithm which is able to resolve dependent sites in sequence which is acting as motif.

A new algorithm based on GIBBES-SAMPLING named GENETIC-MI-GIBBES is proposed using genetic algorithm & mutual information scoring techniques which is suitable for dependent motif finding. This algorithm is able to find two dependent motifs in one sequence set. It’s also possible to use this algorithm for analyzing other sequence behavior such as secondary structure of RNA molecule.

**Keywords:** motif finding, RNA splicing, mutual information, genetic algorithm

### Improved Algorithms for Mobile Facility Location Problem and Other Variants

**By:** Abdolhamid Ghodsollahi

**Supervisors:** Dr. Ali Moeini

**Advisor:** Dr. Mohammad Ali safari

**Defense room:** Engineering Science 1

**Committee:** Dr. Kambiz Badie, Dr. Yahya Tabesh

**Date:** Saturday, February 26, 2011

**Abstract:** In this thesis, we consider facility location problem(FLP) that is an important and pivotal problem in the field of operations research. Standard version of this problem has studied since the early 1960's and the researchers have extended many variants of the standard version and presented many different solutions for them. They have used linear programming techniques, local search methods, and heuristics to solve the different types of FLP.

We focus our attention on online version of some type of FLP. In chapter 4, online version of mobile facility location problem is considered and we present an algorithm for it. The main contribution of this thesis is presentation a general model for online facility location problem (OFLP) and a feasible solution for it. We study a worst case scenario in OFLP where an adversary adaptively brings the demands one at a time and we should decide to either open a new facility or use an existing open facility for a coming demand right when it comes. The demands arrive one at a time and can locate at any points in the region. There is no prior information on the input order and the location of the demands.

Our algorithm is deterministic and competitive for the fixed location model in every metric space and general case of non-uniform facility costs. We have achieved a competitive ratio of logarithmic order.

**Keywords:** Online facility location; Competitive algorithm; Fixed location model

### Analysis of influence of communication in social networks

**By:** Marziyeh Anjerani

**Supervisors:** Dr. Ali Moeini

**Co-advisor:** Dr. Hassan Abolhassani

**Defense room:** Engineering Science 1

**Committee:** Dr. Kambiz Badie, Dr. Fereydoon Shams

**Date:** Saturday, February 26, 2011

**Abstract:** Social networks are commonly a set of persons or organizations that are connected by one or more type of relations. Social network is represented by undirected graph that nodes and edges are individuals and relationships or interactions within them, respectively. One of the most practical example in these networks is influence maximization problem that are used in wide areas such as in medicine for prevention of the spread of infectious diseases, in criminology for identifying the nature and extent of conspiratorial involvement, in politics for selecting one candidate, in business for marketing a new product and etc. The influence maximization problem selects subset of users in social network which have more influence among the rest of people and cause the propagation of much more information. The problem of finding k users in the social network that influence the largest number of people is NP-hard. For this reason, a greedy approximation algorithm is used with (1-1/e) factor for some information cascade models. Computations of algorithm are too time-consuming and are not suitable for large scale social network. This thesis presents a new community detection algorithm for social network analysis. A community-based metric is computed for hubs, and then nodes of graphs are partitioned based on a betweenness centrality measure. In local community detection, all nodes belong to no community are associated to a neighbor’s community based on a measure. After that, sole nodes in a community and also communities with two nodes are merged by neighbor’s community based on a local community measure. Finally, two communities are merged based on modularity Q iteratively. Moreover, it proposes three heuristic algorithms for influential nodes selection after detecting communities in social networks. These algorithms are divided by some steps based on a number of seed size to extract influential nodes and the number of communities. They are extremely faster than the greedy algorithm; therefore they are very suitable for large social networks.

**Keywords:** Social network, Influence maximization problem, Independent cascade model, Community detection, Heuristic algorithms.

### Aspect-oriented document clustering for facilitating retrieval process

**By:** Marjan Hosseinia

**Supervisors:** Dr. Kambiz Badie

**Co-advisor:** Dr. Ali Moeini

**Defense room:** Engineering Science 4

**Committee:** Dr. Amin Ghoddousian, Dr. Mirmohsen Pedram

**Date:** Thursday, March 3, 2011

**Abstract:** Information is essential to us all times in our lives but achieving information needed without organizing them is impossible while its sources are growing quickly.

Document clustering is an unsupervised text mining task for organizing information through which documents are grouped into meaningful clusters. Although clustering is an old research topic and so many clustering approaches have been proposed until now, it is still needed to improve its performance.

This thesis is an attempt to solve the problem of clustering text documents. In this thesis, we first present a review of document clustering including problem definition, clustering processes, document representation and properties of clustering algorithms. For improving our approach we first tried to enrich text documents using Wikipedia, a huge information resource. Then we evaluated them with two datasets. Results which show that enriching document context has good effect on clustering performance, guide us to propose our new approach for document clustering to facilitate retrieval process, called aspect-oriented document clustering. In this approach similarity between documents is computed based on a special aspect which has been enriched using Wikipedia. We evaluated the approach with two popular datasets. Results demonstrate that the aspect-oriented clustering enhances clustering performance significantly when we applied it for grouping those documents which can be equivalent to a part of retrieved documents from frequent aspect based queries, queries which tries to find information about an issue from the aspect of another issue.

**Keywords:** document clustering, document representation, Wikipedia, aspect-oriented document clustering

### Analysis of Recommendation Algorithms for Vulnerability Calculation in Networks with using Tenacity parameter

**By:** Davoud Jelodar

**Supervisor:** Dr. Dara Moazzami

**Adviser:** Dr. Ali Moeini

**Defense room:** Engineering Science 1

**Committee:** Dr. Kambiz Badie, Dr. Hassan Yousefi Azari

**Date:** Monday, October 3, 2011

**Abstract:** The concept of graph tenacity was introduced by Cozzens, Moazzami and Stueckle, as a measure of network vulnerability and reliability. Conceptually graph vulnerability relates to the study of graph intactness when some of its elements are removed.

The motivation for studying vulnerability measures is derived from design and analysis of networks under hostile environment. Graph tenacity has been an active area of research since the concept was introduced in 1992. The tenacity T(G) of a graph G is defined as

where m(G − S) denotes the order (the number of vertices) of a largest component of G − S and ω(G − S) is the number of components of G − S. A set S ⊆ V (G) is said to be a T-set of G if

The Mix-tenacity, Tm(G) of a graph G is defined as

Where m(G − S) denotes the order (the number of vertices) of a largest component of G − S and ω(G − S) is the number of components of G − S.

In this thesis we show some bounds on the tenacity of graphs and present some algorithms for its computation. This thesis is base of a paper and an application.

**Keywords:** Vulnerability, Tenacity, Integrity, Toughness, Algorithm

### Analysis of protein structure with using of Graph Theory

**By:** Seyed Morteza Dadvand

**Supervisor:** Dr. Ali Moeini

**Co-adviser:** Dr. Mehdi Sadeghi

**Defense room:** Engineering Science 1

**Committee:** Dr. Kambiz Badie, Dr. Abbas Nozari

**Date:** Wednesday, October 5, 2011

**Abstract:** One of the most important problem in biology is domain detection in protein. In this thesis by using graph theory, network theory and related algorithms, we try to detect domains in protein.

In this method by calculating shortest path between all vertices in graph, we try to find frequency of vertices in nodes distribution diagram. Then after smoothing of this diagram, graph can be cut from of local maximum nodes of this diagram. Then some of this subgraph can be merge to construct domains. Also we show that the results and complexity of this method are better from previous methods.

### Analysis of Approximation Algorithms for Steiner Trees

**By:** Safar Vafadar Dolagh

**Supervisor:** Dr. Dara Moazzami

**Co-adviser:** Dr. Ali Moeini

**Defense room:** Engineering Science 1

**Committee:** Dr. Kambiz Badie, Dr. Abbas Nozari

**Date:** Wednesday, October 5, 2011

**Abstract:** The classical Steiner tree problem in weighted graphs seeks a minimum weight connected subgraph containing a given subset of the vertices which are called required. If the edge weights are all positive, then the resulting subgraph is obviously a tree. The computational nature of the problem makes it a traditional research subject in theory of computing. Steiner tree problem in graphs is NP-complete, i.e., the exact optimal solutions are unlikely computable in polynomial-time. We present a new polynomial time approximation algorithm that achieves an approximation ratio of 1.5, which improves upon the previously best-known approximation algorithm of ratio 1.55. Time complexity of our algoruthm is O(|V|3×|R|3) so is bether than the previous approximation algorithms.

Considering the above applications of Steiner tree problem, finding better solutions to solve the problem can be helpful to problem areas that it use. We also introduce new applications for Steiner tree problem. We use the Steiner tree problem for finding phylogeny tree in the manuscripts.

### Improving Convergence Mechanism in Reinforcement Learning Using Case-Based Reasoning

**By:** Kaveh Ghiaee

**Supervisor:** Dr. Kambiz Badi

**Co-adviser:** Dr. Ali Moeini

**Defense room:** Engineering Science 1

**Committee:** Dr. Amin Ghoddousian, Dr. Mohammad Ganj Tabesh

**Date:** Thursday, October 6, 2011

**Abstract:** One of the general aims in machine learning, is to use learning methods for developing intelligent applications. Reinforcement learning is one approach to reach this goal. Reinforcement learning has been attracting much interest in the machine learning and artificial intelligence communities during the past decades. Traditional Reinforcement learning approaches, are seeking for an optimal behavior in an unknown environment by interacting with it, which is based on the state's value estimations. This process could be very time consuming.

In this thesis, we introduce an original algorithm which combines Q-learning algorithm with Case-Based reasoning on a basis of a free interpretation of the Knapsack problem. In fact, our proposed algorithm is presenting a new approach to the problem of value function approximation.

We evaluate our proposed algorithm by applying it on the maze problem. Results obtained show improvement on both speed and quality of learning.

**Keywords:** Reinforcement Learning, Case-based Reasoning, Value function estimation, Maze