Summary of doctoral thesis: Some improvements of string contraint solving in automated test cases generation for symbolic execution

Chia sẻ: _ _ | Ngày: | Loại File: PDF | Số trang:27

Thêm vào BST

Báo xấu

50
lượt xem 4
download

Download Vui lòng tải xuống để xem tài liệu đầy đủ

This thesis aims to research on overview of automatic test case generating, symbolic execution applied into automatic test case generating. The modeling techniques based on Automata and Bitvector are also studied in this thesis. Apart from that, the analysis and evaluation of available test case generating methods on different constraint are mentioned. The quality and effectiveness of test cases generated by using symbolic execution are assessed.

Chủ đề:

Bình luận(0) Đăng nhập để gửi bình luận!

Lưu

Nội dung Text: Summary of doctoral thesis: Some improvements of string contraint solving in automated test cases generation for symbolic execution

MINISTRY OF EDUCATION AND VIETNAM ACADEMY TRAINING OF SCIENCE AND TECHNOLOGY GRADUATE UNIVERSITY SCIENCE AND TECHNOLOGY ---------------------------- To Huu Nguyen SOME IMPROVEMENTS OF STRING CONTRAINT SOLVING IN AUTOMATED TEST CASES GENERATION FOR SYMBOLIC EXECUTION Major: Mathematical foundations for Informatics Code: 9 46 01 10 SUMMARY OF DOCTORAL THESIS HaNoi - 2020
This thesis is completed at: Graduate University of Science and Technology, Vietnam Academy of Science and Technology Supervisors 1: Dr. NGUYEN TRUONG THANG Supervisors 2: Assoc.Prof.Dr. DANG VAN DUC Reviewer 1: ……………………………………………………………………… ……………………………………………………………………… Reviewer 2: ……………………………………………………………………… ……………………………………………………………………… Reviewer 3: ……………………………………………………………………… ……………………………………………………………………… The thesis is to be presented to the Defense Committee of the Graduate University of Science and Technology – Vietnam Academy of Science and Technology At Date Month Year 2020 The thesis can be found at: - Vietnam National Library - Library of Graduate University of Science and Technology
1 INTRODUCTION In general, for each technique used in dynamic run-time testing, any software can be separated into two stages: preparing the test cases for software testing and performing the program on a background that supports to testing frameworks for available test cases. One of important activities in order to reduce testing cost is to generate test cases automatically and completely. The software developing organizations often pay a large amount for activities related to software testing. The effectiveness of verification and validation progress depends on the number of errors that has been found and fixed before transferring the software. It means that the quality of a software depends strictly on the quality of generated test cases. In past years, many researches from international scientists focused to “automatic test case generating” [3, 4] in order to reduce the cost of software developing. There are two main approaches in this topic including testing based on code and testing based on model. Testing based on code has a high coverage. This approach can remove unnecessary instructions that may cause the errors. This approach is also optimizing the computations of testing analysis softwares. Thus, many recent researches concentrate to testing based on code [5, 6, 7]. There are also many researches in automatic test case generating such as test case generating based on specification, test case generating based on model, test case generating by path approach and smart technique. However, testing based on symbolic execution is an attractive research strategy. The basic test case generating techniques were proposed including test case generating by theorem proving [8, 9], test case generating by symbolic execution [5, 7,10,11,12], test case generating by model checking [13], test case generating by an event-flow model [14], test case generating using a Markov chains model [15]. In which, symbolic execution gets the attention of many researchers. Symbolic execution is used to develop and construct many applications [5, 6, 7] on various tools and programming languages such as C/C++, JavaScrip, .NET, java, HTML, etc. In this thesis, some improvements in string constraint solvers applied in test case generation are studied. The experiments and analysis are evaluated using Java, a strong, modern programming language. The research objective: - To study the methods used to modeling constraints and improve constraint solving capability. - To apply symbolic execution technique in automatic test case generating. - To implement proposed methods in automatic test case generating on string constraints and mixed constraints. - To analyze and evaluate the obtained results. Object and scope of the research: This thesis aims to research on overview of automatic test case generating, symbolic execution applied into automatic test case generating. The modeling techniques based on Automata and Bitvector are also studied in this thesis. Apart from that, the analysis and evaluation of available test case generating methods on different constraint are mentioned. The quality and effectiveness of test cases generated by using symbolic execution are assessed. Content of the research: - The automatic test case generating techniques, the contents related to constraint modeling on different types of constraints. - Improvements of constraint modeling and constraint solving on string constraints. - Applying these improvements in symbolic execution to generate test cases automatically on string constraints for testing programs. Research methodology:
2 - Study, analyze, synthesize the documents related to symbolic execution, the role of constraint solving and string constraint solvers in symbolic execution to generate test cases automatically (papers from journals, conferences, etc). - Improve the constraint modeling methods and constraint solving capability on string constraints and mixed constraints. - Analyze and evaluate the published research results. The main contributions: - Constructed a constraint modeling on string and mixed constraints. - Improved the capability of constraint solving in symbolic execution. - Implemented constraint modeling techniques and constraint sovling based on automata and bitvector. - Compared the obtained results with available researches. The structure of the thesis: The structure of this thesis is organized as below Chapter 1 presents overview of software tesing and symbolic execution techniques in automatic test case generation. The basic theory related to contents of thesis is also given in this chapter. Based on mentioned theory, the research problem and research approaches are stated. Chapter 2 shows the results of researches on constraint modeling, constraint solving in symbolic execution. The applications of these tools in typical situations and the evaluation of these methods on typical data types. Chapter 3 introduces the results of researches of improvements in constraint modeling, the constraint solving in symbolic execution on string constraints and mixed constraints. An extension of symbolic execution technique and the string constraint solving based on automata are also presented in this chapter. The conclusions of this thesis address the main contributions, the future works and the list of references are presented in the thesis.
3 CHAPTER 1. OVERVIEW OF SOFTWARE TESTING AND SYMBOLIC EXECUTION This chapter presents the overview of fundamental aspects of software testing, symbolic execution and applications of symbolic execution in generating test cases. The contents of this chapter mainly focus on basic concepts and analysis of symbolic execution. Moreover, some challenges and future strategies are also presented in this chapter. 1.1. Software testing Ensuring a software system or a component of a software system work as expected is a big challenge in software industry. The mistake softwares cause the loss of economy and other serious effects on the fields that the softwares are used. Figure 1.8. Relation between software development and software testing 1.1.2 Testing techniques [24]: - Black box testing - White box testing - Unit test 1.2. Control flow white box testing technique Control flow white box testing is divided into static and dynamic testing. The input of these two strategies are the source code of program and testing coverage criteria. After analyzing progress, the output is a set of test cases. Each strategy has the advantages and disadvantages. The objective of control flow white box testing technique is to find the minimum set of test cases that gets the highest coverage.
4 c c c c sequentially if…else do….while while….do for Figure 1.6. Common control structures 1.3. The comparison of static and dynamic control flow white box testing technique The comparison of these static and dynamic control flow white box testing technique in term of the advantages and disadvantages is presented in the thesis. The number of test cases, the run time needed in test case generating, the loop test capability and the effects of the complexity in source code are used in comparison. 1.4. The challenges in software testing There are some the wrong ideas in software testing such as a good test case is always the high complexity; automatic testing can replace software engineers in order to test the software well; software testing is a simple and easy work; everyone can test a software without any training. Software testing meets other limitations. The correction of software specification, the accuracy of tesing system are not guaranteed. There is no testing tool that is suitable for every software. Software engineers can not sure about the understand of software product. The resource for complete testing is not enough. We can not perform the testing activities completely. 1.5. Symbolic execution 1.5.1. Overview of symbolic execution In computer science, symbolic execution is a tool used in analysing a program in order to define the input of each component in that component. A compiler of a program get a symbolic value as input instead of a crispt input as usual. def f(x,y); if( x>y); x=x+y y=x-y x=x-y if (x-y>0); assert false return (x, y) Figure 1.7. Example of a concrete idea of a symbolic execution
5 1.5.2. Static symbolic execution The main idea of symbolic execution [31] is to execute the program with symbolic values instead of typical vales of input parameters. Symbolic values of variable x can be represented by: (a). An input notation. (b). A combined expression among symbolic values using operators. (c). A combined expression between symbolic values and typical values using operators. Figure 1.8. Symbolic execution tree 1.5.3. Dynamic symbolic execution Dynamic symbolic execution [32, 33] is symbolic execution technique based on dynamic program analysis. Dynamic symbolic execution is the combination of typical execution and symbolic execution. In dynamic symbolic execution, the program is executed many times with different values of input parameters. Table 1.1. An example of a dynamic symbolicexecution. Constraints are remembered Contrains C’(i) Input i C(i) null a = = null a!=null {} a!=null && !(a.length>0) a!=null && a.length>0 a!=null&& a.length>0 {0} && a[0]!=123456789 a!=null && a.length>0 && a!=null && a.length>0 {123456789} a[0]!=123456789 && a[0]!=123456789 1.5.4 Concolic execution Concolic execution is a typical performance model. Concolic execution is a modified version of symbolic execution. DART [36] and CUTE [35] are two tools of concolic execution. 1.6. The conclusion of chapter 1 Chapter 1 of the thesis presents the fundamental concepts of software testing techniques, basic techniques of automatic test case generating from the models. Apart from that, the comparison of static white box testing with dynamic white box testing is also stated. Moreover, the challenges of software testing is given
6 in this chapter. The basic aspects, overview of symbolic execution methods and unresolved problems are presented. CHAPTER 2. SYMBOLIC EXECUTION AND CONSTRAINT MODELING In this chapter, the difficulties of symbolic execution are analysed and assessed. Besides, the extended tools, the installation of symbolic execution techniques used in improving string constraint modeling technique are also presented. The modeling methods used in installation, evaluation of string constraint and mixed constraint solving methods are addressed in this chapter. The contents of this chapter are introduced in published results of PhD student, including [CT1], [CT2], [CT3]. 2.1. The difficulties and some solutions In this section, the difficulties in applying symbolic execution are discussed. Some solutions used to overcome the challenges are presented. 2.1.1. Path exploration One of the biggest challenges is path exploration. It means that a large number of paths is generated even though the tested program is small. The number of paths is often by a power function. In a certain perioid of time, the testing program has to discover the first path that is the most suitable. 2.1.2. Memory modeling The accuracy of instructions in program when transferring to symbolic constraints affects significantly to the coverage of symbolic execution and the constraint sovling capability as well. For example, in order to establish the range of integer parameters, a memory modeling technique is more effective than the normal way. However, in this case, the accuracy of code analysis is worse. It depends on the selection of value range. For example, math memory over flow error can cause the lack of paths in symbolic execution or mining the impossible paths, etc. 2 .2. Symbolic execution and extended tools 2.2.1. Symbolic execution and software testing Test case automatic generating techniques used symbolic execution are installed on Java to generate test data for programs. However, the performance of available tools has their own disadvantages. Constraint optimization algorithm is applied in order to reduce the execution time and guarantee the effectiveness of coverage. This solution is implemented on different Jave programs and gets the good results. This section presents the aspects of Java PathFinder (JPF) including the checking ability of JPF, high level architecture of JPF and extension capability of JPF. 2.2.2. Symbolic execution on Java programming language Introduction to Java PathFinder (JPF) Nowadays, symbolic execution techniques is performed by various tools on different programming languages such as DART[36], CUTE[35] on C/C++ source code or PEX[36, 37] on .NET source code. In this thesis, path finding is set up on Java because Java is a programming language that independs on environment. Java is also used in many important applications. Moreover, JPF is a powerful tool that support for symbolic execution. This leads to the widely extension ability and a huge developmental community of JPF.
7 JPF is a verification set of script state model for Java [68]. Not same as normal virtual machine, JPF is a virtual machine performing Java programs more than once. JPF executes on all branches, all possible paths of a program. JPF checks the range of attributes such as the exceptions that can not catch through potential execution paths. Figure 2.1 describe the model of JPF Figure 2.1. The model of JPF Checking ability of JPF JPF can check all Java programs. It can find dead key or exception. Apart from that, JPF can develop and extend by itself to test other attributes. High level architecture of JPF JPF is designed by two main components, including JVM and Search (as in Figure 2.3 below). In which, JVM is a Java typical state generation by executing Java bytecode statements. Search is used to choose the state that JVM should process, to direct JVM in generating the next state or to request JVM to go back a previous state.
8 Figure 2.3. High level architecture of JPF. The main installation of Search includes Deep first search (DFSearch) and HeuristicSearch. Extension ability of JPF JPF is considered as a framework that each developer can extend it in order to support for a typical purpose. JPF supplies an extension mechanism for developers in order to allow to add new functions without changing directly the installation of Search or JVM With this flexible extension structure, there are several extensions of JPF. 2.3. Constraint solving and symbolic execution Related to generating test cases, genetic algorithm approach has been studied in research work [CT[1] of PhD student. However, this approach containts many limitations. Although many improvements of constraint solving techniques have been proposed, the constraint solving is one of big obstacles of symbolic execution. Constraint solvers often take a high cost in run time. Thus, symbolic execution can not extend in some kind of program which the source code requires a high performance in constraint solving. To overcome this challenge, the optimization step in constraint solving is necessary in symbolic execution. This is very effective in real programs. There are two optimization methods used in modern tools, including: - Removing unrelated constraints - Constraint solving using incremental methods (Increamental Solving) [71] In constraint sets that are generated in symbolic execution, the representation for a set of static branches from source code is one of the most important properties. In KLEE, all query results is stored in buffer mapping the set of constraints and the typical values corresponding to those constraints. The special case happens when there are no solution or no satisfying a special notation.
9 2.4. Mixed constraints and improvement in string constraint solving In fact, string constraints appear in most of programs, especially in programs or procedures that clean strings in input data in order to test the validity, to discover and to remove harmful contents. String data in Jave is equipped by a huge library. String data conceals many errors. To accomplish testing capability and error discovering ability in Java programs with string constraints, the thesis focuses to the progress of defining errors in order to get the better programs. By applying software testing on string constraints, the coverage ability increases and then the quality of software is also improved. Why does symbolic execution on string data meet difficulty? The operators on string data are the mix of two value domains: string and integer. For example, we need to get the nth character in a string. In present, there are several solutions of symbolic execution on string data. But these solutions have just support for a subset of these operators. The iterative approach is chosen. First, we sovle the constraints on integer data. Then, the obtained results are used to solve the string constraints. If string constraints are satisfied, the task is complete. Otherwise, integer constraints are added into the constraint set. This procedure is repeated until there is no unsatisfied string constraint. The main idea for proposing the extension of symbolic execution is presented as in Figure 2.7 below. Figure 2.7. An example a string graph The iterative progress is performed by two following steps: - Step 1: Current solver of JPF is called to process integer constraints, real constraints, logic constraints,… If all these constraints is solved, the step 2 is performed. - Step 2: + The typical values obtained from Step 1 are loaded in string graph. + The string solver is called to solve these constraints. In this case, string solver can be AutomataSolve or BitVectorSolve. The constraints of string graph may not satisfied with input values. Then, if there is enough time, Step 1 is run and string constraint set is added by a new integer constraint Solve(PathCondition pc){ 1: StringGraph sg; 2: sg=BuilStringGraph(pc); 3: boolean sat=false; 4: While(!sat and !timeout){ 5: sat=CurrentSolver(pc,sg); if(sat) 6: 7: sat=StringSolver(pc,sg); 8: return sat; } }
10 buildStringGraph(PathCondition pc) { 1: StringGraph sg ; 2: for string or mixed constraint c 2 pc: 3: sg= sg hyperedge(c) 4: return preprocess(pc; sg) } Figure 2.8. Mixed constraint solver algorithm The limitation of time “timeout” is added in order to guarantee the combination of string solver and integer solver can not give the final conclusion in a allowable time. If it reaches to “timeout”, the algorithm stops, the given constraint is concluded as unsatisfied. 2.4.1. String graph There are some types of constraints including string constraints, integer constraints, string and integer mixed constraints. In the installation progress, string constraints and mixed constraints are represented by special graph called as string graph. Definition: String graph is a direct graph denoted by H=(X, E, Ʊ). In which, the vertex set is defined by X=IS where I and S in separated sets (I presents for integer variables and S presents for string variables). Ʊ presents for the operators on string variables. E presents for labeled direct edges in graph (E = E1 E2 E3 E4) satisfying En= Ʊ Xn. Using graph is a popular method in constraint satisfying programming. Figure 3.1 describe an example for string graph corresponding to constraint: s1.trim().equals(s2)^s1.equals(s3.concat(s2)) 2.4.2. Constraint re-construction Preparing for pre-processing progress, for each string constraint or mixed constraint of path constraint condition, considered as an edge of string graph, the graph re-constructs constraints by constraints and returned values of operators that are mapped directly into the edges of string graph. For example, the constraint s1.equals(s2) is added to edge (equals, s1,s2) with other operators. The returned value can be a char, an integer or a string. An extra variable is created and represent for result of operator. This extra variable is added as a vertex of string graph. The graph allows to add an edge as a returned value. In the example showed in Figure 2.7, each circle is a vertex of string graph. The dots and lines represent for edges in this graph. Each edge is labeled by the operator and the number on each edge shows the order of vertex in string graph corresponding to a constraint. The string variables s1, s2 and s3 appear as the vertex, elements of S. In this graph, Ʊ = {trim, equals, concat}. Operator trim creates an extra variable s‟ added into S. The operator is the edge (trim, s 1, s‟) is labeled by element trim of Ʊ. This edge connects s1 and s‟. The operator concat is the same. It is the second constraint that creates extra variable s” and edge (concat, s3, s2, s”). The operator equals corresponds to two last edges. When a string graph is completed, it is processed by two following ways: 1. Define the constraints that have no satisfied values in a simple way. 2. Add the values generated by integer solver in path constraint expression.
11 2.4.3. Pre-processing progress When unsatisfied constraints are determined, execution progress will skip string constraint solving. Consequently, time consuming decreases. Eliminating equals edge from string graph is one of proposals. In above example, string graph is adjusted to new graph as shown in Figure 2.9. Figure 2.9. String graph obtained by removing equals operators. This also supports to define rapidly set of constraints represented in string graph that have no satisfied value. For example (shown in Figure 2.10), after removing edge in Figure 2.10.(a), vertex s 1 connects to itself by notEquals edge. This connection is presented in Figure 2.10.(b). Then, there is not existing any string which satisfies this constraint. Most of proposals are introduced to deal with the constant operators. For example, the constraint s1.concat(“xyz”) = ”vwxyz” will lead to the elimination of the complex edges. Then, vertex s 1 is replaced by string “vw”. Another example, the constraint s1.startsWith(“abc”).chatAt(0) ≠ „a‟ is not suitable and it is easy to discover. If this situation appears in pre-processing stage, the use of string constraint solver is unnecessary and false value is returned. Figure 2.10. Unsatisfied constraints after removing equals operators 2.4.4. String constraint generating and the implemental results Before the constraints are generated, new vertices are added in string graph. For each vertex that is not a string constant SiS, an integer variable liI is added in order to represent for the length of string Si. Figure 3.4 shows that the new vertex of string graph in Figure 3.2 is connected to corresponding string vertices by dashes.
12 Figure 2.11. New vertices represent for the length of added string Other constraints are based on the edges as shown in Table 2.1. Table 2.1. Constraint construction for operators on string. Java operators Heper-edge New constraint For each constraint of vertex is not a string constant Si S li>0 Sa.charAt(n) charAt,Sa,n,c la≥n Sa.concat(Sb) concat, Sa , Sb, Sx lx=la+lb Sa.indexOf(Sb) indexOf, Sa, Sb,x (x=-1)˅(la≥lb+x) Sa.lastIndexOf(Sb) lastIndexOf, Sa, Sb,x (x=-1)˅(la≥lb+x) Sa.subString(n,k) subString,Sa,n,k,Sx la=n+lx Sa.contains(Sb) contains,Sa,Sb la≥lb 2.4.5. Constraint solving using automata Constraint solvers is popular tools used to deal with many problems in reality such as theorem proving, time scheduling in real time. Symbolic execution in automatic generating test cases is used in research works CT[2], CT[3] by PhD student. Many researchers reduce the limitations of available constraint solvers in order to improve the applications in symbolic execution, especially in test case generating. Although there are various improvements, the constraint solvers are not effective with some certaint constraints. In computer programs, string data appears frequently, especially in the program components or procedure. The Java programming language provides a huge library of string data types but still has many errors. In software testing, identifying bugs is concerned. In symbol execution, string constraints are also considered in most of cases. Deterministic Finite Automaton (DFA) is used to solve string constraints based on specific algorithms (CT[5]). 2.5. The conclusion of chapter 2 This chapter has presented the issues related to symbol execution and extension tools with an effective role in test case automatic generation. The main content related to symbol execution includes various symbol
13 execution techniques; advantages and disadvantages of these techniques and the processings that need to apply symbolic execution in software testing progress. Chapter 2 also addresses the difficulties in implementing the symbolic execution and several solutions are pointed out in this chapter. Constraint modeling approaches using string graphs, constructing constraints on string graphs, and preprocessing to detect unsatisfied constraints are presented in this chapter. The contents presented in chapter 2 are the research results from the works CT[1], CT[2], CT[3], CT[5]. CHAPTER 3. STRING CONSTRAINT SOLVING In this chapter, the thesis presents a proposal to improve string constraint solving in symbol execution based on automata theory and apply it on string data type. Experimental results, evaluation of text data processing of the improved model with some existing models are also shown in this chapter. The contents of this chapter are published in [CT4], [CT5], [CT6] of PhD student. 3.1. String constraint solver Many applications depend on the handling of a text data type. Since the use of the Internet has become common with the proliferation of interactive applications, this problem becomes more serious. Interactive applications include social media, chat, sensitive and important information management, personal information management. The input data of the programs is used in SQL query statements and stored in database management systems. This allows users to access directly to the database. The reason is that the input data is open and public and it is also open to other objects. Some objects have bad intentions, so "cleaning" the text input is a requirement that appears mostly in web services. The current methods of applying symbol execution on string code are divided into two groups: based on automata [5, 53] and based on Bitvector [7]. 3.2. Bitvector and satisfiability modulo theories (SMT) 3.2.1. Satisfiability modulo theories (SMT) The study of this thesis uses the SMT Z3 constraint solver [70] constructed by Microsoft. This study also considers to the use of CVC [25] but realizes that it is slower than Z3 because of using the SMT-lib2 standard for the bitvector constraint. The thesis also applies alternative symbols for addressing by byte instead of by bit. For example, a [0: 7] = 01100010 will be replaced with a [0] = 'b'. Constraint resolving by bitvector is beyond the scope of this study. 3.2.2. String constraint solving based on BitVector Consider the example where the edge (contains, Sa, Sb) corresponds to the constraint Sa.contains(Sb), it is provided that the current estimate for the lengths of Sa and Sb is 5, 3 respectively, the constraint result for this edge is: (Sa[0]=Sb[0])˄(Sa[1]=Sb[1]) ˄ (Sa[2]=Sb[2]) ∨((Sa[1]=Sb[0]) ˄ (Sa[2]=Sb[1]) ˄ (Sa[3]=Sb[2]) ∨((Sa[2]=Sb[0])˄(Sa[3]=Sb[1])˄(Sa[4]=Sb[2]) If the platform of constraint solver supports the array type, it can be simplified to (Sa[0:2]=Sb)∨(Sa[1:3]=Sb) ∨(Sa[2:4]=Sb). The process of compiling the edges of the string graph into the BitVector constraints is given in the second column of Table 3.1
14 Table 3.1. Construction of constraint corresponding to string operators. Edge Automation formula Bitvector constraint (charAt,sa,n,x) Ma= Ma [n] {x} [*] Sa[n]=x (concat,sa,sb,sx) Ma= Ma substring(Mx,0,la) (sa=sx[la:0]) (sb=sx[lx:la]) Mb=Mb subString(Mx,lb, ) Mx=Mx (Ma Mb) (indexOf, sa,sb,x) Ma= Ma ([x] Mb) Mb [*] (sa[x+lb:x]=sb) Mb=Mb substrings(Ma,x,x+lb) sa[i+lb:i] sb, i,0 i
15 10: return(false,pc,sg); 11: else if(Mi is modified) 12: W=W {all edges connected to si}; 13 return checkNegativeEdges(pc;sg;(M1,M2..)); Figure 3.1. String constraint solving based on automata For negative constraints such as !equals, !constain, !startsWith and !endsWith, assumption that M1 and M2 are two automata corresponding to two string variables s1,s2 that have to solve the constraint s1.!equals(s2). Then, there are three following cases: Case 1: Both automata guess only one word and then this constraint is satisfied when these two words are not equal. Case 2: There is only one automation that can guess a word. Then, we need to remove that word from the other automation. In this case, the constraint will also be satisfied. Case 3: In all other cases, the constraints are satisfied and no automation must be changed. The main part of the algorithm in Figure 3.6 describes formally the steps used for finding the solutions to the negative constraints. CheckNegativeEdges(pathcondition pc, StringGraph sg, (M1,M2..)) 1: i=1; M‟1=M1; M‟2=M2;… M‟k=Mk; 2: while(1≤i
16 Step 6: Constraint solving progress ends if a satisfactory string value is found or within a limited time, no new integer value can be found, and constraint solving progress on the automata and bitvector has no matching value. . 3.4.1 String constraint modeling using graph The process of string constraint modeling using the graph is shown in the diagram below: Figure 3.3. String constraint modeling using the graph diagram The process is as follows: In both cases above, the results are evaluated by using a set of suitable assessments for each of the mentioned cases. The evaluator is used to compare the constraint solvers. 3.4.2 Discovering more string constraints in string data For a symbolic execution model, in addition to a string variable, all variables include integer variables, which are assigned run-time values collected in dynamic symbol execution. This study is not modeling for mixed constraint solving (constraint solving for mixed string and integer data). However, in string constraint modeling, the execution progress extracts integer constraints on the length of the strings. This helps to find a satisfied length assignment. Thus, integer constraints are arised on string data. Most modern constraint solvers for integer data are efficient and accurate. Hence, generating integer constraints is not interfere with execution. 3.5. Experiment and result evaluation All experiments are conducted on a Windows 10 system with 8 GB RAM, using the open source tool JavaPathFinder downloaded at http://javapathfinder.sourceforge.net/ và phần mềm Eclipse Java EE IDE
17 Implementation SOLVE(PathCondition pc) 1: StringGraph sg 2: (pc,sg)←BuildStringGraph(pc); 3: boolean sat←false; 4: while (!sat˄!timeout) 5: (sat,pc,sg) ←IntegerSolver(pc,sg) 6: if(sat) 7: (sat,pc,sg) ←StringSolver(pc,sg) 8: if( pc unchanged): break; 9: return sat BuildStringGraph(PathCondition pc) 11: StringGraph sg:= ∅ 12: for( String or mixed contraint c pc) 13: sg:=sg Hyperedge(c) 14: return Preprocess(pc,sg) Figure 3.4: String constraint solving algorithm The algorithm follows two these main steps Step 1: JPF constraint solver will be used for types of constraints on integer, real number, boolean, v..v.. Step 2: If Step 1 perform successfully, it will provide some specific values. These values will be loaded to the string graph (the associated values in the mixed constraint) and a string constraint solver is called to solve them. 1: public static String s = "http://www./BigWeb"; 2: private static boolean IsBigWebQuery(String str) { 3: // (1) Check if after the last „/‟ there is //exist"BigWeb" 4: int lastSlash = str.lastIndexOf('/'); 5: if (lastSlash < 0) { 6: return false; 7: } 8: String rest = str.substring(lastSlash + 1); 9: if (!rest.contains("BigWeb")) { 10: return false; 11: } 12: // (2) Check if string s begins with "http://" 13: if (!str.startsWith("http://")) { 14: return false; 15: } //(3) Get a middle string of http://, the last //character of "/" and if it starts by “www”. //Remove 16: them 17: 18: String t = 19: str.substring("http://".length(), lastSlash); 20: if (t.startsWith("www.")) {
18 21: 22: t = t.substring("www.".length()); 23: } 24: //4) Check if after removing “www”, the string //still contains "other.com" or "google.com" 25: if (!(t.equals("other.com")) && !(t.equals("google.com"))) { 26: return false; 27: } 28: return true; 29: } 30: public static void doTest() { 31: IsBigWebQuery(s); 32: } 33: 34: 35: 36: 37: 38: Figure 3.5. Testing Java program