### many interesting solution strategies

Aims: In this project, you will select an NP-hard problem, with several applications and many interesting solution strategies, implement some of these algorithms, and report on how they work in practice.

The recommended problems are either the graph problem Vertex Cover or the SAT problem which is the original NP-hard problem. The Vertex Cover problem has a structure that is simple to understand, yet many different approaches work for solving it (see below) and there are many interesting algorithms. There are also good benchmark repositories with example problems to test your code against. The SAT problem comes from logic, and simply asks whether a given propositional formula has a satisfying assignment. For SAT there is a very powerful yet simple family of algorithms that are successfully used in industry for solving many applied problems such as program verification. Like Vertex Cover there are also good benchmark repositories with example problems to test your code against. Students are expected to choose from two options in this project: option 1 (heuristics and approximations) and Option 2 (SAT-solving).

Background: NP-hard problems are an important challenge to computer science, both in theory and in practice. You may have heard this term from the important “P vs NP” question (or from some of your courses).

These problems have the important feature that they are “puzzle-like”: you can easily recognise or verify a solution to these problems if you are given one, but it can be very challenging to find a solution given an instance of such a problem, or decide whether there is no solution.

In particular, it is quite strongly believed that there are no algorithms that can efficiently find solutions (if they exist) for all instances of such a problem, in time that is “worst-case tractable” (e.g., in polynomial worst-case time).

Problems like this are ubiquitous in reality – verify that a proposed circuit design has no bugs in it; plan routes for your delivery vehicles to deliver some goods with a minimum number of delivery runs; schedule speakers at an event so that you never have two talks at the same time about the same topic… In all these cases, you may run into situations where there is no algorithm that is guaranteed to always work efficiently, but you still have to find a solution to the problem somehow. Many approaches for this have been proposed, both in theory and in practice.

Some strategies for solving NP-complete problems are using heuristics; approximation algorithms; tractable special cases; algorithms that are slow in the worst case but fast on some ‘non-worst-case’ inputs.

An example of an algorithm for SAT that is known to be slow for some worst-case inputs but is actually very fast for many typical is the DPLL algorithm and its extensions. DPLL is the name of a very simple algorithm for SAT introduced in 1962, which basically amounts to (a partial) exhaustive search of the space of solutions to a given SAT instance. Almost all successful industrial level SAT-solvers nowadays use variants of DPLL equipped with clever heuristics to improve their performance. Though theoretically, DPLL is known to require huge (namely, exponential) run-time, striking experience shows that current SAT-solvers are very efficient on real-world instances. Hence the ubiquitous successful use of SAT-solving in the industry.

Prerequisites: CS2860 Algorithms and Complexity and a general interest in algorithms. Further algorithms courses are helpful, but not required. Good programming skills. Basic knowledge in algorithms and run-time analysis is a plus.

Early Deliverables

1. You are to choose either Option 1 (heuristics and approximations) or Option 2 (SAT-solving).
2. Report: A full formal description of the problem.
3. Report: P vs NP, NP-completeness, NP-hardness
4. Option 1: Report: Strategies for solving NP-complete problems, such as: heuristics; approximation algorithms; tractable special cases; algorithms that are slow in the worst case but fast on some ‘non-worst-case’ inputs.
5. Option 1: Report: Fast in theory, slow in practice. Merge sort and heap sort are sorting algorithms with excellent theoretical worst-case guarantees, but they may find themselves beaten in practice by quicksort, which has worse worst-case guarantees. Another example: linear programming is the basis of many planning systems, and there are algorithms for solving LPs in polynomial time, but these are not used in practice; instead, algorithms are used that have worst-case exponential-time behaviour. Report and comment on this phenomenon.
6. Option 1: Proof of concept program: A program that generates a random instance of Vertex Cover, shows it to the user, reads a suggested solution from the user, and verifies whether this solution is correct.
7. Option 1: Proof of concept program: A simple greedy heuristic
8. Option 1: Proof of concept program: The simple 2-approximation algorithm
9. Option 2: Encoding and data structures: write a program that reads a CNF from a file in the DIMACS format and a truth-assignment (generated randomly), and outputs true or false depending if the assignment satisfies the input CNF or not.
10. Option 2: Investigate the DPLL algorithm and write a report that shows manually worked examples of its operation. The report should include the pseudo code of the DPLL algorithm, providing also diagrammatic examples of its run on simple example input CNF formulas. Explain the algorithm and its running time.
11. Option 2: Write procedures that will serve as building blocks for the e cental SAT solver. Specifically, write a procedure that gets as an input a CNF formula and a *partial* assignment to its variable and outputs the new CNF formula resulting from assigning to the original formula. This should include simplifying (getting rid) of satisfied clauses, if they exist, and should output False if the CNF formula becomes false under the given assignment.

Final Deliverables

1. Option 1: A program with a GUI that can show an instance on the screen and illustrate a solution for it. Your program should be able to read a problem instance from disk, in a format taken from an existing ‘benchmark library’, and it should also be able to generate an instance of the problem at random with specified parameters.
2. Option 1: The program should offer at least three different solution strategies, such as a heuristic, an approximation algorithm, and a backtracking search ‘branching algorithm’.
3. Option 1: The report should contain a section on the outcomes of experiments done between the different algorithms for instances of some benchmark library, with conclusions drawn about which algorithms seem suitable in different situations.
4. Option 2: Write a program which solves SAT using the DPLL algorithm. It is advisable to write the code in a language like C++ or C. But it is also possible to implement it in java or python, as the program is intended for educational purposes.
5. Option 2: Demonstrate the run of the SAT-solver on substantial instances (taken from known benchmarks).
6. Option 2: Investigate the conflict-driven clause learning heuristic (CDCL). Show in your report manually worked examples of DPLL with the clause learning heuristic as well a pseudocode for CDCL.
7. Option 2: Extend your DPLL solver to uses some kind of very simple clause learning heuristic, and explain your choice, showing how it improves the run time on some of the benchmarks.
8. Final report should contain a section on the outcomes of experiments done between the different algorithms/SAT-solvers for instances of some benchmark library, with conclusions drawn about which algorithms seem suitable in different situations.

Suggested Extensions

• A more careful theory report, including notions of ‘problem reductions’ for NP-complete problems. For Vertex Cover, this should at least cover the two problems Independent Set and Max Clique.
• A larger range of algorithms, or algorithms using more advanced concepts. One advanced approach for this problem would be the use of an LP relaxation. Another advanced approach would be to use methods from parameterized complexity (you may ask for references here).
• Careful optimisation of one particular algorithm for the problem, for example a carefully tweaked branching algorithm, to make it as fast as possible across a range of instances.
• Investigations into ‘lower bounds’ for your problem, to test how far off each algorithm is from an optimal solution.
• For option 2: Extend the DPLL SAT-solver to include a fully functional conflict driven clause learning (CDCL). The full specification for this could be taken from the Handbook (see references below) or from the Wikipedia page. Apply these extensions to the SAT-solver and systematically compare the run-time improvement.
• For option 2: Extend the DPLL SAT-solver to include Restarts. The full specification for this could be taken from the Handbook (see references below) or from the Wikipedia page. Apply these extensions to the SAT-solver and systematically compare the run-time improvement.
• Investigate the notion of Satisfiability Modulo a Theory (SMT). Show how it extends SAT. Apply this to your tool for a chosen theory and systematically compare the run-time improvement on relevant instances.
• Investigate the theoretical foundation of SAT solving: the resolution refutation system and its provable limitations.