## MPIR Development Projects

Here we provide a list of projects for future development of MPIR resulting from brainstorming sessions of the current developers. We're particularly interested in identifying individuals who would like to join our team and take on one of the projects. Some of the projects are already assigned to individuals. If you'd like to help, please contact us on our development list.

We are also pleased to hear from individuals and organisations who wish to fund specific projects from our list or suggest additional projects. Requests from projects and organisations which use MPIR will be given careful consideration, whether open source, commercial or other. In cases where funding is offered for a specific project, this is tied to an individual who will be contracted to complete the project. Note that commercially funded improvements to the MPIR codebase must be made available to the MPIR project and the public as open source. In general we discourage the use of non-disclosure agreements or embargos which may compromise our ultimate responsibility to the open source community to make our code publicly available in our fully public development repository and in our public releases. Having said that, information which is not germane to that process will be treated with confidence.

- Speed up large integer multiplication by rewriting the FFT. The
FLINT project has a very fast FFT which is up to twice as fast as the
existing FFT in MPIR. This project involves the following steps:

a) Implement the lowest level routines used by the FLINT FFT in assembly language for important architectures (this will give further gains over the factor of two that the FLINT FFT already offers). Assembly language for relatively few routines would make a significant difference to overall performance.

b) Switch over to GMP's memory manager (the FLINT FFT currently uses the FLINT memory manager which is not suitable).

c) Switch the GMP multiplication functions over to use the new FFT code. This involves some reasonably straightforward build system work. - Speed up multiplication base case by implementing a small FFT and fixed size Karatsuba code in assembly language. A general purpose FFT is not efficient below a certain size of integer. A similar thing applies for Karatsuba multiplication. By implementing special cases of these algorithms a dramatic speedup can be obtained over certain size ranges.
- Improve truncated products. If one wishes to multiply two integers A and B and only return the top or bottom half of the result, this is known as a truncated product. This can in general be done faster than a full product followed by a truncation. Such functions have numerous important applications in other algorithms such as division by Newton iteration, truncated polynomial multiplication (power series) and others. Some algorithms can use any form of partial product eg the remainder part of the Barrett algorithm, so we want some sort of wrapper which hides this detail (ie the thresholds between the different partial products).
- Unbalanced operands. Karatsuba and Toom-k algorithms can be adapted to cases where one is multiplying integers of different length. Further optimisation is possible in MPIR by implementing unbalanced Toom 7 routines. The FLINT FFT already deals with unbalanced operands optimally in the FFT range, so this only needs to be done for the Karatsuba and Toom algorithms assuming the FLINT FFT is already ported.
- Montgomery multiplication. If integers are converted to Montgomery format, multiplications of those numbers in Montgomery format can be done faster. This is not generally better for doing single multiplications, as the numbers must also be converted in and out of Montgomery format, which takes some time. However, for multiple operations where the cost of conversion is amortised, there can be a significant performance improvement. MPIR currently uses Montgomery format internally for some algorithms, but this functionality is not exposed to the user.
- Expand the mpn layer by adding new functions. Currently the mpn layer has few functions, with an inconsistent interface. This makes the functions difficult to use in practice as they do not provide general enough functionality. For example some of the functions only work for odd inputs, source operands are destroyed, zero inputs are not accepted, etc. Furthermore a number of basic functions available at the mpz level are not present at the mpn level. Numerous useful mpn functions are also implemented internally in MPIR but not exposed to the user. As many projects have used the mpn layer to great effect, e.g. Pari, NTL, FLINT (it's faster for many applications because this layer lacks the overheads associated with the mpz layer which does automatic memory allocation and deals with signed coefficients), it is logical to extend this layer to provide a complete interface and remove the currently annoying conditions on some of the existing functions and remove inconsistencies in the interface.
- Develop a basic parallel module in MPIR for automatically taking advantage of multicore processors. This would involve writing multicore integer multiplication routines for each of the existing integer multiplication routines, using the new OpenMP pragmas available in gcc 4.3.0 and later. Almost all other functions in MPIR rely on integer multiplication routines, so parallelising that is sufficient. It is expected that a build time option will specify how many cores are available. Parallelising the FFT is excluded from this topic (too much work for a single job) and is listed below as a separate project.
- Implement a helper function module. Currently many of the mpn functions rely on special functions for single limbs, 2x2 matrices, longlong's, etc. These should be abstracted out into a helper module so there is less code duplication.
- Port to the Cell processor (already funded - grant of Bill Hart).
- Proof of principle code for a port of MPIR to GPUs, i.e. to the CUDA language and/or NVIDIA GPU assembly language (we now have access to suitable hardware).
- Improved assembly support for the new Sparc processors, including the new Fujitsu processors which provide fast addmul instructions.
- Parallelise the FFT (ostensibly already funded by Bill Hart's grant) - proof of principle code written for FLINT some years ago.
- Multimodular integer multiplication routine. This allows for multiplication of integers much larger than the size of available ram. It provides an enormous improvement over a Schoenhage-Strassen FFT in these cases. This routine will be essential in parallelising integer multiplication, but is a separate project as no such routines exist in MPIR. Such code has been included in FLINT, but will need porting to MPIR in much the same way as the FFT code.
- Scalar operations (i.e. multiplication and division by a constant). Multiplying or dividing a large number of integers by the same integer can usually be done faster than doing each of the separate multiplications in turn in a loop. The provision of scalar or array operations in MPIR would be useful for polynomial arithmetic, linear algebra and certain other number theoretical routines. MPIR should offer a module for performing scalar operations, with a consistent interface. To some extent the limb_invert code in MPIR already does this, but the interface is not public and there are other schemes which are faster on certain architectures.
- Improved Itanium assembly support (already funded) - will be optimised by Jason Martin.
- Improved MIPS assembly support.
- Shifting modulo Mersenne and Fermat numbers. Rings Z/pZ for p a Mersenne of Fermat number are particularly efficient to work in on a computer and have many applications.
- Further K8 assembler functions: mullow, mulhigh, all the linear divisions, schoolbook divison basecase.
- Partial products for unbalanced operands.
- Reimplement the GCD algorithms from Moller's paper. We have Moller's patches which work fine, but much cleaning up of the code is necesary.
- The 2-adic and real nth-root algorithms described in Berstein's paper. The algorithms decribed there are asymptotically faster than what is in MPIR and algorithms such as perfect power detection would be dramatically sped up by working on this.
- "Super un-rolled" versions of mpn_add, sub, com, shift, copy for fft sized operands , with no consideration for small sizes. Say call it mpn_large_add_n etc. Could even require sizes to be multiplies of 32 limbs or whatever.
- A p-adic arithmetic module. Lazy/relaxed p-adics. This would not only be infinitely useful, but p-adic arithmetic basically *is* large integer arithmetic.
- A module similar to the FLINT F_mpz module with fast code for integers which fit into a single limb, but automatically switching to something equivalent to an mpz_t for multiprecision. Big cache benefits, and speed benefits for small integers arise from this.