Introduction
The Prague Stringology Club was founded in 1996 at the Department of Computer Science and Engineering, Faculty of Electrical Engineering, Czech Technical University in Prague, as a research group working on stringology. In 2009 the Prague Stringology Club moved to the Department of Theoretical Computer Science of the new Faculty of Information Technology. The term stringology (first used by Zvi Galil in 1984) denotes the science of algorithms on strings and sequences. It addresses problems such as exact and approximate pattern matching and searching for repetitions in various texts. Many areas use the results of stringology, including information retrieval, computer vision, computational biology, and DNA processing. The Prague Stringology Club uses the finite automata approach to solve problems in stringology. This theory has been developed and successfully used in the field of compiler construction, and it can therefore be very useful in stringology too.
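As a minimal illustration of the finite automata approach, the following Python sketch builds the classical deterministic string-matching automaton for a single pattern. It is a textbook construction chosen for illustration, not presented as the club's own method, and the function names `build_dfa` and `find_all` are hypothetical. State q records the length of the longest prefix of the pattern that is a suffix of the text read so far, so the search makes exactly one transition per text symbol.

```python
def build_dfa(pattern: str, alphabet) -> list[dict[str, int]]:
    """Deterministic string-matching automaton for `pattern` (illustrative sketch).

    State q (0 <= q <= len(pattern)) means: the longest prefix of `pattern`
    that is a suffix of the text read so far has length q.
    `alphabet` must contain every symbol of the pattern and of the text.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    symbols = tuple(alphabet)
    m = len(pattern)
    delta = [dict.fromkeys(symbols, 0) for _ in range(m + 1)]
    delta[0][pattern[0]] = 1
    x = 0  # state reached by reading pattern[1:j]; used on mismatches
    for j in range(1, m + 1):
        for a in symbols:
            delta[j][a] = delta[x][a]     # mismatch: behave like state x
        if j < m:
            delta[j][pattern[j]] = j + 1  # match: extend the prefix
            x = delta[x][pattern[j]]
    return delta


def find_all(text: str, pattern: str) -> list[int]:
    """Start positions of all (possibly overlapping) occurrences of `pattern`."""
    delta = build_dfa(pattern, set(text) | set(pattern))
    m, q, hits = len(pattern), 0, []
    for i, a in enumerate(text):
        q = delta[q][a]         # exactly one transition per text symbol
        if q == m:              # final state: an occurrence ends at position i
            hits.append(i - m + 1)
    return hits
```

For example, `find_all("abababa", "aba")` returns `[0, 2, 4]`, including the overlapping occurrences.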
Another topic of interest is compiler construction, from both the theoretical and the practical point of view. Compiler construction has a long tradition at the Department of Computer Science and Engineering. Recently, members of the Prague Stringology Club have produced several theoretical results and practical commercial compiler implementations.
In 2005 the Prague Stringology Club extended its research topics to data compression. The members apply their rich experience from stringology to data compression. This has already resulted in several data compression papers and a joint project with a commercial company.
In 2008 some members of the Prague Stringology Club started working on arbology, a new algorithmic discipline focusing on tree algorithms. Arbology solves problems such as tree pattern matching, tree indexing, and finding repeats in trees. Its algorithms use the deterministic pushdown automaton as the basic model of computation.
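A ranked tree is commonly written in prefix notation, a string in which each node symbol is followed by the prefix notations of its children. The Python sketch below assumes that representation and a hypothetical ranked alphabet (`a2`, `a1`, `a0`, with arities 2, 1, 0). A single left-to-right pass with an explicit stack, in the spirit of a pushdown automaton reading prefix notation, computes where each subtree ends; a subtree pattern then occurs at a node exactly when the node's span equals the pattern's prefix notation. This is a simplified illustration, not the club's subtree-matching pushdown automata, and the function names are hypothetical.

```python
# Hypothetical ranked alphabet: symbol -> arity (number of children).
ARITY = {"a2": 2, "a1": 1, "a0": 0}


def subtree_spans(prefix: list[str], arity: dict[str, int]) -> list[int]:
    """Return end[i], the index just past the subtree rooted at prefix[i].

    One left-to-right pass; the stack holds nodes whose children are still
    being read, much like the pushdown store of a pushdown automaton.
    """
    end = [0] * len(prefix)
    stack = []  # entries: [node_index, children_still_to_read]
    for i, sym in enumerate(prefix):
        stack.append([i, arity[sym]])
        # Pop every node whose children have all been read.
        while stack and stack[-1][1] == 0:
            node, _ = stack.pop()
            end[node] = i + 1
            if stack:
                stack[-1][1] -= 1  # the parent has one more complete child
    if stack:
        raise ValueError("input is not a complete tree in prefix notation")
    return end


def subtree_occurrences(tree: list[str], pattern: list[str],
                        arity: dict[str, int]) -> list[int]:
    """Prefix-notation positions of nodes whose subtree equals `pattern`."""
    end = subtree_spans(tree, arity)
    return [i for i in range(len(tree)) if tree[i:end[i]] == pattern]
```

For the tree a2(a1(a0), a2(a1(a0), a0)), written as `["a2", "a1", "a0", "a2", "a1", "a0", "a0"]`, the subtree a1(a0) is found at positions `[1, 4]`. Comparing slices keeps the sketch short but is quadratic in the worst case; an automaton-based approach avoids that cost.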
Events
- Summer Stringmasters 2024 will be held on August 28, 2024
- Prague Stringology Conference 2024 (PSC2024): August 26-27, 2024
- Annual Symposium on Combinatorial Pattern Matching (CPM2022): June 27-29, 2022
- Prague Stringology Conference in 2001–2006, 2008–2021, 2023
- Prague Stringology Club Workshop in 1996–2000
- PSC is indexed by Scopus
- PSC Proceedings from 2005 onward (at least up to 2012) are sparsely indexed by WoS (search Topic: Prague AND Stringology).
- Summer Stringmasters 2019 was held on August 29–31, 2019
- Summer Stringmasters 2018 was held on August 29–31, 2018
- Summer Stringmasters 2017 was held on August 31–September 2, 2017
- Summer Stringmasters 2015 was held on August 27–29, 2015
- Summer Stringmasters 2013 was held on September 5–7, 2013 (photos)
- Conference on Implementation and Application of Automata 2007 (CIAA2007)
Contact
Projects
- 2019-2020: Efficient String Matching for Bioinformatics (Czech Science Foundation project No. GA19-20759S)
- Contact person: Jan Holub
- The aim of the project is to develop indexing methods for degenerate and elastic patterns, as well as techniques for specialized domains of bioinformatics such as highly similar texts. Some online methods for elastic pattern matching will also be developed.
- 2013-2015: Text and Tree Structures Processing and Their Applications (Czech Science Foundation project No. GA13-03253S)
- Contact person: Jan Holub
- The project deals with four closely related topics: Arbology, Data Compression for natural languages, and selected topics of Stringology and Bioinformatics. In Arbology we research new indexing and pattern matching algorithms on trees. In Bioinformatics we work on mapping millions of short reads to genomic sequences and on their indexing. In Data Compression we focus on efficient algorithms for natural languages based on knowledge of the source language, and on algorithms allowing fast compression and decompression as well as efficient search. In Stringology we work on 2D text indexing and on algorithms for identifying cribbed texts and source code, which may be compressed.
- 2009-2011: String and Tree Analysis and Processing (Czech Science Foundation project No. GA201/09/0807)
- Contact person: Jan Holub
- The project further develops algorithms for finding nontrivial exact and approximate patterns (repeats, palindromes, seeds, covers) and addresses problems from bioinformatics, musicology, data compression and other research fields. It also focuses on arbology, which deals with trees.
- 2006-2008: Text Processing and Analysis (Czech Science Foundation project No. GA201/06/1039)
- Contact person: Bořivoj Melichar
- Exact and approximate string matching,
- finding nontrivial exact and approximate patterns like repeats, palindromes, seeds, covers,
- problems from bioinformatics, musicology, data compression and other research fields.
- 2008-: ExCom (Extensible Compression Library)
- Contact person: Jan Holub
- The goal of the project is to build a gcc library of efficient implementations of various data compression algorithms and a testbed for comparing them. It allows researchers to test a new data compression method against existing ones without having to search for implementations. The second use is for software developers, who simply select a method and link the library.
- 2009-: The Prague Corpus
- Contact person: Jan Holub
- The project aims to maintain an up-to-date corpus of the most common file types in typical sizes. The main purpose of the corpus is to test data compression algorithms.
- Indexing Automata
- Contact person: Jan Holub
- The project aims at an efficient implementation of indexing automata built over a given text.
Such automata allow a pattern to be found in time linear in the length of the pattern,
regardless of the length of the text.
- Data Compression in Memory Databases (Sitronics Center)
- Contact person: Jan Holub
- The project aims to design a compression algorithm for in-memory
databases, with a very strong emphasis on the speed of compression and
decompression.
- 2002-2007: Verilog-AMS and MAST compilers (Lynguent)
- Contact person: Jan Janoušek
- This project deals with compilers from and to the analog hardware
description languages Verilog-AMS and MAST. About 15 students
participated over the years.
- 2007–2009: IEC 61131 compliant structured text development and
run-time system (Energocentrum)
- Contact person: Jan Janoušek
- The goal of this project is to design and implement a comprehensive
development environment and a portable run-time environment for industrial
PLC controllers. The project comprises IDE tools for IEC 61131 sources, a
compiler, a linker, and a virtual machine.
Related Links
Created by: Jan Holub
Last updated: Jun 28 2024