Statistical Machine Learning and Natural Language Processing of Programming Language Text (Microsoft PhD Scholarship)
Dr C Sutton
This project will apply advanced statistical techniques from natural language processing to a completely new textual domain: programming language text. Think about how you program when you are using a new library or new environment for the first time. You "program by search engine", i.e., you search for examples of people who have used the same library, and you copy chunks of code from them. The goal of this project is to systematize this process and apply it at a large scale.
We have collected a corpus of 1.5 billion lines of source code from 8000 software projects, and we want to find syntactic patterns that recur across projects. These can then be presented to a programmer as she is writing code, providing autocomplete functionality that can suggest entire function bodies. Statistical techniques involved include language modeling, data mining, and Bayesian nonparametrics. This also raises some deep and interesting questions in software engineering, e.g., why do syntactic patterns occur in professionally written software when they could be refactored away?
The project is suitable for a student with a top MSc or first-class bachelor's degree in computer science, statistics, physics, or a related numerate discipline. Previous coursework or experience in statistics, machine learning, or statistical natural language processing is desirable, although we do not expect students to have all three of these. Because of the scale of the data set involved, a strong programming background will be very useful for this project.
This is an opportunity to join a world-leading research group in machine learning. The Research Programme in Machine Learning is hosted by the Institute for Adaptive and Neural Computation (ANC), a research group of the School of Informatics, University of Edinburgh. According to the 2008 Research Assessment Exercise (RAE), the School delivers more world-leading (4*) research than all other RAE institutions in the computer science category, and also delivers more internationally excellent or world-leading (3* and 4*) research. ANC is a world leader in Machine Learning, with 6 Academic Teaching Staff specialising in developing machine learning methods (Chris Bishop, Chris Williams, Amos Storkey, Charles Sutton, Guido Sanguinetti and Iain Murray).
For more information about the supervisor and the machine learning group at Edinburgh, see the supervisor's Web page: http://homepages.inf.ed.ac.uk/csutton/
For informal enquiries about the studentship, please contact csutton@inf.ed.ac.uk, copying in the PhD Secretary at anc-phdsec@anc.ed.ac.uk.
Formal applications must be made through the School's normal PhD application process: http://www.ed.ac.uk/schools-departments/informatics/postgraduate/apply (select the "Informatics: Institute for Adaptive and Neural Computation" research area).
For full consideration, please apply by January 13. However, we encourage students to apply before 16 December 2011, which is the main application deadline for the School of Informatics. All applications that arrive by January 13 will receive full consideration for this studentship, but students who apply before 16 December will also receive full consideration for other potential funding sources in the School of Informatics. This is especially important for overseas applicants.
Funding Notes:
The Microsoft Scholarship consists of an annual bursary for up to a maximum of three years. This is a fully funded studentship for UK and EU students. We welcome overseas applicants, and can provide overseas students with funding for maintenance and for fees at the EU rate. The remaining fees component will need to come from another source. Overseas applicants are advised to apply before the standard Informatics deadlines and to apply for other scholarships. See http://www.ed.ac.uk/schools-departments/informatics/postgraduate/fees and http://www.ed.ac.uk/schools-departments/informatics/postgraduate/apply/keydatesresearchappns for further information.