This page contains information as well as instructions concerning the use of the HNC.
The only system requirement is an Internet connection and a Web Browser (preferably Microsoft Internet Explorer or Mozilla Firefox).
1. Private Persons (1 user) - for research purposes only
1.1 Standard fee
1.2 Student fee
2. Organisations - for research purposes only
2.1 1-5 users
2.2 6-10 users
2.3 11-30 users
2.4 more than 30 users
3. Companies - for research/commercial use
3.1 1-5 users
3.2 6-10 users
3.3 11-30 users
3.4 more than 30 users
For further information concerning subscriptions (process, costs etc.), please contact hnc@ilsp.gr.


4. You can define the maximum number of sentences that will be returned by the system as well as the maximum number of sentences per page.
5. You can use wildcards to replace one or more characters in a word.
To replace exactly one character, use the "_" (underscore) wildcard; to replace more than one character, use the "%" (per cent) wildcard.
Examples:
- ακαδημαϊκο_ (all words beginning with "ακαδημαϊκο" plus another character)
- ακαδη% (all words beginning with "ακαδη"),
- %μαϊκός (all words ending with "μαϊκός"),
- %δημα% (all words containing "δημα").
6. For every parameter, either lemma or word, you can use the logical operators OR (|) and NOT (|^).
Examples:
- αυγό|αβγό (returns sentences containing either of these two words)
- πέν_|^πένα|^πέντ (returns sentences containing four-letter words whose first three letters are "πέν", excluding the words "πένα" and "πέντ")
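
The wildcard and operator rules in items 5 and 6 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the HNC's actual implementation: the function names `term_to_regex` and `query_matches` are hypothetical, "%" is assumed to match any run of characters (including an empty one), and a query made up solely of NOT terms is assumed to match nothing.

```python
import re
import unicodedata


def term_to_regex(term: str) -> re.Pattern:
    """Compile one wildcard term into a regex.

    "_" stands for exactly one character; "%" stands for any run of
    characters (assumption: the run may be empty).
    """
    parts = []
    # NFC keeps accented Greek letters as single code points, so that
    # "_" really consumes one visible character.
    for ch in unicodedata.normalize("NFC", term):
        if ch == "_":
            parts.append(".")
        elif ch == "%":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def query_matches(query: str, word: str) -> bool:
    """Check a single word against a query such as "πέν_|^πένα|^πέντ".

    Terms are separated by "|" (OR); a term prefixed with "^" is a NOT
    term that excludes any word it matches.
    """
    word = unicodedata.normalize("NFC", word)
    terms = [t for t in query.split("|") if t]
    include = [t for t in terms if not t.startswith("^")]
    exclude = [t[1:] for t in terms if t.startswith("^") and len(t) > 1]

    # Exclusions win over inclusions.
    if any(term_to_regex(t).fullmatch(word) for t in exclude):
        return False
    return any(term_to_regex(t).fullmatch(word) for t in include)


if __name__ == "__main__":
    for candidate in ["αυγό", "αβγό", "αυγά"]:
        print(candidate, query_matches("αυγό|αβγό", candidate))
```

In a full search, a sentence would be returned when at least one of its words satisfies `query_matches`; how the HNC itself evaluates queries is not documented on this page.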

The ILSP Corpus has been developed by the Institute of Language and Speech Processing. It currently contains about 47,000,000 words and is constantly being updated. All texts have been selected so as to present a realistic picture of modern language use.
Written language
The ILSP Corpus contains samples of written language exclusively. Oral samples have not been incorporated in this version of the Corpus.
Modern Greek language
Texts in the ILSP Corpus represent modern Greek language use, most of them having been written after 1990. Texts written in highly idiomatic language have been excluded from the corpus. Most texts have been selected for their wide readership (high-circulation newspapers, best-selling books etc.).
Variety of sources
In order to include different types of language, texts have been selected from several media, belonging to different genres and dealing with various topics.
These texts have been provided to ILSP for this purpose by their copyright holders and are available for research purposes only.
For all HNC texts users can have access to the following information:
Bibliographic data
All the information that is necessary for identifying each text, such as its title, author, publisher, translator (in case the text is a translation) and date of publication.
Classification data
All the information that is necessary for classifying each text into specific categories, based on its
a) medium,
b) genre and
c) topic.
Alongside these, there are further sub-categories (detailed genre, detailed topic), which are “open” and are gradually enriched as new texts are included in the HNC.
Classification according to medium
| MEDIUM | PERCENTAGE |
|---|---|
| Book | 9.41% |
| Internet | 0.32% |
| Newspaper | 61.29% |
| Magazine | 5.89% |
| Miscellaneous | 23.08% |
Classification according to genre
HNC texts are classified according to genre into the following categories:
| GENRE | DESCRIPTION | EXAMPLE |
|---|---|---|
| NON-fiction | biography, including obituaries and autobiographies; sermons; school and student essays; etc. | «Μάης 36: Αναμνήσεις ενός πρωταγωνιστή» |
| FEAture | article in newspaper etc. which does not belong to INF or another, more specific genre; reviews, radio/TV magazines, etc. | «Υπολογιστές στην εκπαίδευση: πώς και γιατί» |
| ADVertising | various advertising texts, leaflets, spots etc. | «Το Ίδρυμα Ελληνικού Πολιτισμού εξορμά σε Αμερική και Ευρώπη» |
| OFFicial | laws, government circulars, official announcements, business correspondence | «Σύνταγμα της Ελλάδας» |
| PRIvate | various private texts, such as diaries and private letters | «Μονόλογος οργής και απόγνωσης» |
| FICtion | fiction and comic strips, entertainment, children’s and youth pages, jokes, games, drama film manuscripts, poems, song lyrics etc. | «Η μητέρα του σκύλου» |
| INFormation | news articles, folders and leaflets from the authorities, posters, signs etc. | «Ταχύπλοα: Διασκέδαση με κανόνες» |
| DIScussion | discussion, debate and conversation, including interviews, parliamentary speeches, letters to the editor etc. | «Η ιστορική συνέντευξη στο ABC» |
| N/A | non-applicable / mixed / unknown / unidentified / miscellaneous | |
Classification according to topic
HNC texts are classified according to topic into the following categories:
| TOPIC | DESCRIPTION | EXAMPLE |
|---|---|---|
| LEIsure | sport, television, food, car, motorbike, shopping, home, astrology, fashion etc. | «Μπράβο Σπόρτινγκ!» |
| GEOgraphy | geography and travel, anthropology, folklore, cities etc. | «Οι παγίδες στα λιμάνια του Αιγαίου» |
| SCIence | science and technology, including mathematics, environment, space etc. | «Η Ανθρακική Πλατφόρμα Παρνασσού κατά το ανώτερο Ιουρασικό-κατώτερο Κρητιδικό: Στρωματογραφική διάρθρωση και Παλαιογεωγραφική εξέλιξη» |
| BUSiness | business and economy including advertising | «Πονοκέφαλος ύψους 1,5 τρισ.» |
| HIStory | history, archaeology, history of art, biographies etc. | «Ένα ταξίδι στην ιστορία που καταξιώνει το μύθο» |
| SOCiety | politics, sociology, law, defence, European Union etc. | «Διαλύεται 1 στους 3 γάμους στην Ε.Ε.» |
| HUManities | humanities, literature, philosophy, religion, fine arts, education etc. | «Αυτός που έκανε το κόμικς τέχνη» |
| HEAlth | health, medicine, psychology etc. | «Έμφραγμα: Μεγάλος κίνδυνος οι μικρές βλάβες» |
| N/A | non-applicable / mixed / unknown / unidentified / miscellaneous | «Διηγήσεις παραφυσικών φαινομένων» |
Users can search HNC texts for:
- specific word forms: by entering the word «παίζω», users can get every sentence containing this word form,
- lemmas (the base form of each word, as it usually appears in dictionaries, which covers every inflected form of that word): by entering the lemma «παίζω», users can get every sentence containing any inflected form of the lemma «παίζω» that can be found in HNC texts, such as «παίζει», «παίξω», «παίζοντας»,
- parts of speech (as well as morphological features): by entering «noun», users can get every sentence containing a noun, and
- combinations of the three (e.g. word-lemma, word-part of speech, word-lemma-part of speech).
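
The difference between searching by word form, lemma, part of speech, or a combination of them can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the `Token` class, the `search` function, the tag names and the sample sentences are hypothetical and do not reflect the HNC's internal data model. In particular, the sketch assumes that combined criteria must all hold for the same token.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Token:
    word: str   # the word form as it appears in the text
    lemma: str  # the dictionary (base) form
    pos: str    # part-of-speech tag (hypothetical tag set)


Sentence = List[Token]


def token_matches(token: Token, word: Optional[str] = None,
                  lemma: Optional[str] = None,
                  pos: Optional[str] = None) -> bool:
    """A criterion left as None is ignored; the rest must all match."""
    return ((word is None or token.word == word)
            and (lemma is None or token.lemma == lemma)
            and (pos is None or token.pos == pos))


def search(sentences: List[Sentence], word: Optional[str] = None,
           lemma: Optional[str] = None,
           pos: Optional[str] = None) -> List[Sentence]:
    """Return the sentences with at least one token meeting all criteria."""
    return [s for s in sentences
            if any(token_matches(t, word, lemma, pos) for t in s)]


# Hypothetical, hand-annotated sample data.
SENTENCES: List[Sentence] = [
    [Token("το", "ο", "article"), Token("παιδί", "παιδί", "noun"),
     Token("παίζει", "παίζω", "verb")],
    [Token("θέλω", "θέλω", "verb"), Token("να", "να", "particle"),
     Token("παίξω", "παίζω", "verb")],
    [Token("παίζω", "παίζω", "verb"), Token("μπάλα", "μπάλα", "noun")],
    [Token("έρχομαι", "έρχομαι", "verb")],
]

if __name__ == "__main__":
    print(len(search(SENTENCES, word="παίζω")))   # word form only
    print(len(search(SENTENCES, lemma="παίζω")))  # every inflected form
    print(len(search(SENTENCES, pos="noun")))     # any noun
```

With this sample data, the word-form search finds only the sentence containing «παίζω» itself, while the lemma search also finds the sentences containing «παίζει» and «παίξω».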
Publishing companies / Organizations
Γκοβόστης Εκδοτική Α.Ε.
Εκδόσεις GUTENBERG
Εκδόσεις Αθανασόπουλου - Παπαδάμη
Εκδόσεις Γαρταγάνη
Εκδόσεις Καστανιώτη
Εκδόσεις Κριτική
Εκδόσεις ΝΕΑ ΣΥΝΟΡΑ - Α. Α. Λιβάνη
Εκδόσεις Νεφέλη
Ιατρικές Εκδόσεις Λίτσας
Χ. Κ. Τεγόπουλος Εκδόσεις Α.Ε. (ΕΛΕΥΘΕΡΟΤΥΠΙΑ)
Δημοσιογραφικός Οργανισμός Λαμπράκη (ΤΟ ΒΗΜΑ, RAM)
Εκδόσεις Νέων Τεχνολογιών
Εκδοτική ΑΛΦΑ ΕΠΕ
Εκδόσεις Ψυχογιός
Βιβλιοσυνεργατική
MOTOTECH ΑΒΕΕ. (ΜΟΤΟ)
Mουσικοεκδοτική ABEE 2000 (Δίφωνο)
ΛΟΓΙΣΤΗΣ
Εκπαιδευτήρια Δούκα
NEMECIS
Διεθνής Αμνηστία
Εκδόσεις Δεσπ. Μαυρομάτη
Κέντρο Διεθνούς και Ευρωπαϊκού Οικονομικού Δικαίου-Δικηγορικός Σύλλογος Θεσσαλονίκης
ΚΥΒΕΡΝΟΓΡΑΦΟΙ
Μελέτες για την Ελληνική Γλώσσα
ΠΑΡΑΤΗΡΗΤΗΣ
ΤΟΠΙΚΑ (Εταιρεία Μελέτης Επιστημών του Ανθρώπου)
Εκδοτικός Οίκος ΣΑΚΚΟΥΛΑ ΟΕ, Θεσσαλονίκη
ΘΕΤΙΣ AUTHENTICS ΕΠΕ
Εκδόσεις Εξάντας
Καθημερινή
ΠΑΕ Καλλιθέα
Public foundations
Βουλή των Ελλήνων
Γραφείο Πρωθυπουργού
Κέντρο Βυζαντινών Ερευνών
Κέντρο Έρευνας ΑΣΟΕΕ
Υπηρεσία Δημοσιευμάτων Α.Π.Θ.
Υπουργείο Δικαιοσύνης
Υπουργείο Εμπορίου (νυν Υπουργείο Ανάπτυξης)
Υπουργείο Εξωτερικών
Εθνικό Μετσόβιο Πολυτεχνείο
Ινστιτούτο Γεωλογικών και Μεταλλευτικών Ερευνών (ΙΓΜΕ)
Κέντρο Έρευνας για Θέματα Ισότητας, Κ.Ε.Θ.Ι.
Υπουργείο Γεωργίας
Υπουργείο Εθνικής Άμυνας
Υπουργείο Τύπου & Μ.Μ.Ε.
Γενική Γραμματεία Απόδημου Ελληνισμού
Εθνικό Ινστιτούτο Εργασίας
Παν/μιο Αθηνών
Εθνικό Κέντρο Τεκμηρίωσης