Joaquim Ferreira da Silva is an Assistant Professor in the Computer Science Department of the Universidade Nova de Lisboa. He is also a member of the Centro de Informática e Tecnologias de Informação (CITI).

He received his PhD in Text Mining from the Universidade Nova de Lisboa in 2004. His current research interests are:

. Text Mining

. Document Classification using Machine Learning

. Extraction of relevant elements in raw text

. Other areas

. Sound Classification using Machine Learning

. Finding correlations between parameters of continuous data  

Publications

PhD Thesis

Joaquim Francisco Ferreira da Silva. Unsupervised, language independent multiwords extraction, clustering, characterization, and classification of documents (Extracção de Unidades Textuais, Agrupamento, Caracterização e Classificação de Documentos). PhD thesis, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 3 February 2004. Written in Portuguese, supervised by Gabriel Pereira Lopes and João Tiago Mexia (FCT/UNL).

Articles    

Using Covariance as a Similarity Measure for Document Language Identification in Hard Contexts. Silva, Joaquim F.; Lopes, Gabriel P. Pliska Studia Mathematica Bulgarica, Vol. 18, pp. 341-360, 2007.

A Statistical Approach for Multilingual Document Clustering and Topic Extraction in Hard Contexts. Silva, Joaquim F.; Lopes, Gabriel P.; Mexia, João T.; Coelho, C. Agra. Pliska Studia Mathematica Bulgarica, Vol. 15, pp. 207-228, 2004.

Book Chapters

Ranking and Extraction of Relevant Single Words in Text. Ventura, João; Silva, Joaquim F. In: Brain, Vision and AI, Cesare Rossi (ed.). ISBN 978-95-7619-04-6. InTech Education and Publishing, pp. 265-284, 2008.

In Proceedings

Unsupervised Music Genre Classification with a Model-Based Approach. Barreira, Luis; Cavaco, Sofia; Silva, Joaquim. 15th Portuguese Conference on Artificial Intelligence (EPIA 2011). Lecture Notes in Computer Science, Vol. 7026. Antunes, Luis; Pinto, H. Sofia (eds.). Springer Berlin Heidelberg (Germany), pp. 268-281.

Text Categorization: An extensive comparison of classifiers, feature selection metrics and document representation. Peleja, Filipa; Silva, Joaquim; Lopes, Gabriel. Proceedings of the 15th Portuguese Conference on Artificial Intelligence, EPIA 2011, Lisbon, October 2011. Luis Antunes, H. Sofia Pinto, Rui Prada, and Paulo Trigo (eds.). ISBN: 978-989-95618-4-7, pp. 660-674.

Mining Causality from Non-categorical Numerical Data. Silva, Joaquim; Lopes, Gabriel. Proceedings of the 2011 International Workshop on Behavior Informatics (BI 2011), held with the 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining. To be published.

Towards Automatic Building of Document Keywords. Silva, Joaquim; Lopes, Gabriel. Proceedings of COLING 2010, August 2010. http://www.aclweb.org/anthology/C10-2132

A Document Descriptor Extractor Based on Relevant Expressions. Silva, Joaquim; Lopes, Gabriel. 14th Portuguese Conference on Artificial Intelligence (EPIA’09), Aveiro, Portugal. Lecture Notes in Computer Science, Volume 5816. Seabra Lopes, L.; Lau, N.; Mariano, P.; Rocha, L.M. (eds.). Berlin: Springer-Verlag.

Efficient Multi-Word Expressions Extractor Using Suffix Arrays and Related Structures. Aires, José; Lopes, José; Silva, Joaquim. CIKM 2008 Workshop (ACM Conference on Information and Knowledge Management), Napa Valley, California, USA, October 2008.

New Techniques for Relevant Word Ranking and Extraction. Ventura, João; Silva, Joaquim F. 13th Portuguese Conference on Artificial Intelligence (EPIA’07), Guimarães, Portugal. Lecture Notes in Computer Science, Volume 4874/2007. Berlin: Springer-Verlag, pp. 691-702.

Detection of Strange and Wrong Automatic Part-of-Speech Tagging. Rocio, Vitor; Silva, Joaquim; Lopes, Gabriel. 13th Portuguese Conference on Artificial Intelligence (EPIA’07), Guimarães, Portugal. Lecture Notes in Computer Science, Volume 4874/2007. Berlin: Springer-Verlag, pp. 683-690.

Language Identification in Documents, Including Unknown Languages: a Statistical Approach. Silva, Joaquim F.; Lopes, Gabriel; Reis, José; Mexia, João T. In Proceedings of the Text Mining and Applications (TeMA) Workshop, 13th Portuguese Conference on Artificial Intelligence (EPIA’07), Guimarães, Portugal, pp. 824-835.

Identification of Document Language is Not yet a Completely Solved Problem. Silva, Joaquim F.; Lopes, Gabriel. In Proceedings of CIMCA, International Conference on Computational Intelligence for Modeling, Control & Automation, jointly with the International Conference on Intelligent Agents, Web Technologies & Internet Commerce. IEEE Computer Society. 29 November - 1 December 2006, Sydney, Australia.

Identification of Document Language in Hard Context. Silva, Joaquim F.; Lopes, Gabriel. In Proceedings of the “New Directions in Multilingual Information Access” SIGIR Workshop, Seattle, 6-11 August 2006.

Cross-Lingual Classification of Function Words. Gamallo P., Da Silva J., Lopes G.P. (2005). 10th International Conference on Computer Aided Systems Theory (Eurocast), Las Palmas, Spain, February 2005, pp. 92-95. ISBN: 84-689-0432-5.

A Divide-And-Conquer Approach to Learn Syntactic Categories. Gamallo P., Lopes G.P., Da Silva J.F. (2004). ICGI'04, Athens, Greece. Grammatical Inference: Algorithms and Applications, LNAI vol. 3264, Springer-Verlag.

Cluster Analysis of Named Entities. Z. Kozareva, J. F. da Silva, P. Gamallo and G. P. Lopes. 2004. In Proceedings of the International Intelligent Information Processing and Web Mining Conference, May 17-20, 2004, Zakopane, Poland. Lecture Notes in Artificial Intelligence (LNCS/LNAI). Berlin: Springer-Verlag, 2004.

Cluster Analysis and Classification of Named Entities. J. F. da Silva, Z. Kozareva and G. P. Lopes. (2004). In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC), Lisbon, Portugal, May 2004.

Extracting Named Entities. A Statistical Approach. Joaquim Ferreira da Silva, Zornitsa Kozareva, Veska Noncheva, Gabriel Pereira Lopes. (2004). In Proceedings of the Conférence sur le Traitement Automatique des Langues Naturelles (TALN 2004), Morocco, April 2004.

Automatic acquisition of word interaction patterns from corpora. Noncheva, V., J.F. da Silva and G.P. Lopes. (2003). EACL-03 Workshop on Language Modeling for Text Entry Methods, Budapest, April 14, 2003. Association for Computational Linguistics, pp. 25-32.

Document Clustering and Cluster Topic Extraction in Multilingual Corpora. Silva, J.F.; Mexia, J.; Coelho, C.A.; Lopes, J.G.P. 2001. In: Nick Cercone, T. Y. Lin, Xindong Wu (Eds.). Proceedings of the IEEE 2001 International Conference on Data Mining (ICDM’01), San Jose, California, 29 November - 2 December 2001. IEEE Computer Society, pp. 513-520.

Multilingual Document Clustering, Topic Extraction and Data Transformations. Silva, J.F.; Mexia, J.; Coelho, C. and Lopes, J.G.P. 2001. In: Pavel Brazdil and Alípio Jorge (Eds.). Progress in Artificial Intelligence: Knowledge Extraction, Multi-agent Systems, Logic Programming and Constraint Solving, 10th Portuguese Conference on Artificial Intelligence (EPIA’01), Porto, Portugal. Lecture Notes in Artificial Intelligence LNCS/LNAI 2258. Berlin: Springer-Verlag, pp. 74-87.

Using LocalMaxs Algorithm for the Extraction of Contiguous and Non-contiguous Multiword Lexical Units. J.F. da Silva, Gael Dias, Sylvie Guilloré, José Gabriel P. Lopes. 1999. In: P. Barahona (ed.), Progress in Artificial Intelligence: 9th Portuguese Conference on AI, EPIA'99, Évora, Portugal, September 1999, Proceedings. Lecture Notes in Artificial Intelligence, Vol. 1695, Springer-Verlag, pp. 113-132 (1999).

Extracting Multiword Terms from Document Collections. J.F. da Silva and J.G.P. Lopes. 1999. In Proceedings of VExTAL: Venezia per il Trattamento Automatico delle Lingue, November 22-24, 1999.

Relevant Expressions in Large Corpora. J.F. da Silva, J.G.P. Lopes, M. F. Xavier, and G. Vicente. 1999. In Anne Condamines, Cécile Fabre et Marie-Paule Péry-Woodley (eds.). Actes de l'atelier "Corpus et Traitement Automatique des Langues: Pour une réflexion méthodologique" (TALN'99), Institut d'Études Scientifiques, Cargèse, Corse (France), July 12-17, pp. 86-94. Published by ATALA.

A Local Maxima method and a Fair Dispersion Normalization for extracting multi-word units from corpora. J.F. da Silva and J.G.P. Lopes. 1999. In Proceedings of the Sixth Meeting on Mathematics of Language (MOL6), Orlando, Florida, July 23-25, 1999, pp. 369-381.

Teaching

I have taught the following courses over the last five years:

Bases de Dados I (Databases I) - 1st cycle, 2009/2010

Machine Learning and Knowledge Extraction - 3rd cycle, 2009/2010

Aprendizagem Automática e Data Mining (Machine Learning and Data Mining) - 2nd cycle, 2008/2009

Programação em Lógica com Restrições (Constraint Logic Programming) - 1st cycle, 2008/2009

Machine Learning and Knowledge Extraction - 3rd cycle, 2008/2009

Bases de Dados I (Databases I) - 1st cycle, 2008/2009

Introdução às Bases de Dados (Introduction to Databases) - 1st cycle, 2008/2009

Aprendizagem Automática e Data Mining (Machine Learning and Data Mining) - LEI, 2007/2008

Lógica Computacional (Computational Logic) - LEI, 2007/2008

Bases de Dados I (Databases I) - LEI, 2007/2008

Introdução às Bases de Dados (Introduction to Databases) - LM, 2007/2008

Introdução à Inteligência Artificial (Introduction to Artificial Intelligence) - LEI, 2006/2007

Bases de Dados I (Databases I) - LEI, 2006/2007

Métodos Quantitativos (Quantitative Methods) - LEI, 2006/2007

Text and Data Mining - MEI, 2006/2007

Bases de Dados e Data Warehousing (Databases and Data Warehousing) - LEI, 2005/2006

Introdução aos Computadores e Programação (Introduction to Computers and Programming) - 2005/2006

Seminários de Informática (Computer Science Seminars) - LEI, 2005/2006

Bases de Dados I (Databases I) - LEI, 2005/2006

Métodos Quantitativos (Quantitative Methods) - LEI, 2005/2006

Text and Data Mining - MEI, 2005/2006

Research Projects

Ongoing PhD Thesis: A Problem Solving Environment for Parallel Extraction of Multiwords and Applications from Large Corpora

Advisors: Joaquim Ferreira da Silva and José C. Cunha

Student: Carlos Jorge Gonçalves

Scope of the thesis: Doctoral Program in Computer Science of FCT/UNL (http://di.fct.unl.pt/ensino/doutoramento/). This research is pursued within the NOVA LINCS research center of FCT/UNL (http://nova-lincs.di.fct.unl.pt/).

Abstract: Multi-word Relevant Expressions (REs) can be defined as sequences of words (n-grams) with strong semantic meaning, such as “ice melting”, “human rights abuses” and “Ministère des Affaires Étrangères”. They are very useful in Information Retrieval, Document Clustering and Classification, and Document Indexing. The need to extract REs in several languages has led researchers to use statistical approaches rather than symbolic methods, because statistical approaches allow language independence. The LocalMaxs algorithm is a language-independent approach to extracting REs, based on the assumption that REs have strong cohesion between their consecutive n-grams. Despite its good precision, this extractor is time-consuming, which makes a sequential implementation impractical for Big Data. This project aims to develop the first parallel and distributed version of this algorithm. Preliminary experiments have achieved almost linear speedup and sizeup when processing corpora of up to 1 billion words, using up to 54 virtual machines on a public cloud platform.
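The following small sequential sketch illustrates the local-maximum idea behind LocalMaxs. It is a hedged illustration under stated assumptions, not the project's parallel implementation.

It assumes the following, following the LocalMaxs and Fair Dispersion Normalization papers listed under Publications:

. The cohesion ("glue") measure is SCP_f, g(W) = f(W)^2 / Avp, where Avp averages f(w1...wi) * f(wi+1...wn) over the n-1 split points of W.

. An n-gram W is kept when its glue is strictly greater than that of every (n+1)-gram containing it and, for n > 2, at least as great as that of its two (n-1)-gram prefix and suffix.

The frequency cutoff `min_freq`, the function names and the toy corpus are hypothetical choices for this example.

```python
from collections import Counter
from typing import Dict, List, Tuple

Gram = Tuple[str, ...]


def ngram_counts(tokens: List[str], max_n: int) -> Counter:
    """Count every contiguous n-gram of length 1..max_n."""
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def scp_f(gram: Gram, counts: Counter) -> float:
    """Assumed SCP_f glue for an n-gram with n >= 2: f(W)^2 / Avp.

    Avp is the mean of f(left) * f(right) over the n - 1 ways of splitting W.
    Raw frequencies are used because the corpus size cancels out of the ratio.
    """
    n = len(gram)
    avp = sum(counts[gram[:i]] * counts[gram[i:]] for i in range(1, n)) / (n - 1)
    return counts[gram] ** 2 / avp


def local_maxs(tokens: List[str], max_n: int = 5, min_freq: int = 2) -> List[Gram]:
    """Return n-grams (2 <= n <= max_n) whose glue is a local maximum (sketch)."""
    # Count one extra length so every candidate has its (n+1)-gram neighbours.
    counts = ngram_counts(tokens, max_n + 1)
    glue: Dict[Gram, float] = {g: scp_f(g, counts) for g in counts if len(g) >= 2}

    # For each n-gram, collect the glue of the (n+1)-grams that extend it.
    extensions: Dict[Gram, List[float]] = {}
    for g, value in glue.items():
        if len(g) >= 3:
            for sub in (g[:-1], g[1:]):
                extensions.setdefault(sub, []).append(value)

    selected = []
    for g, value in glue.items():
        n = len(g)
        # Hypothetical frequency cutoff: skip rare n-grams.
        if n > max_n or counts[g] < min_freq:
            continue
        # Must be strictly greater than every (n+1)-gram containing it.
        if any(value <= other for other in extensions.get(g, [])):
            continue
        # For n > 2, must be at least as great as its prefix and suffix (n-1)-grams.
        if n > 2 and (value < glue[g[:-1]] or value < glue[g[1:]]):
            continue
        selected.append(g)
    return sorted(selected)


if __name__ == "__main__":
    corpus = "new york is big . new york is old . i like new york".split()
    print(local_maxs(corpus, max_n=3))  # prints [('new', 'york')]
```

On this toy corpus, "new york" is the only n-gram kept, for three reasons:

. "york is" ties with the trigram "new york is", so it is not a strict local maximum.

. "new york is" has lower glue than its prefix "new york".

. The frequency cutoff excludes one-off pairs such as "i like", even though their glue is high.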

Other projects:

These are the projects in which I participated as a team member:

VIP-ACCESS - Ubiquitous Web Access for Visually Impaired People (Acesso Ubíquo à WEB para cegos). Duration: 2008-2010. Leader: Gaël Harry Dias. University of Porto, Portugal; New University of Lisbon, Portugal; MIT, USA; University of North Texas, USA. Funding agency: Fundação para a Ciência e a Tecnologia (Portugal). Reference: PTDC/PLP/72142/2006.

PATRAS - PARallelism for Machine Learning of TRAnslationS. Ref: POSC/PLP/61520/2004. Participants: Universidade Nova de Lisboa. Funding entity: FCT/MCTES. From May 2005 to November 2007.

WE-LEARN - Web Intelligent Portal for e-learning with Teacher-Student interaction mediated by the Machine. Ref: 4.1.3/CAPES/CPLP. Participants: Faculdade de Ciências da Universidade de Lisboa and Universidade Federal do Rio Grande do Sul. Funding entity: GRICES (Gabinete de Relações Internacionais da Ciência e do Ensino Superior). From January 2004 to December 2005.

ASTROLABIUM - Portuguese Researchers' Mobility Portal and Network of Mobility Centres. Ref: MOBI-CT-2003-003344. Participants: Universidade Nova de Lisboa, FCT/MCTES and GRICES. Funding entity: European Union. From June 2004 to June 2006.

Leonardo 1 - New Computer Technologies for Access to Vast Arrays of Multilingual Information. Coordinated by CITI (Centro de Informática e Tecnologias de Informação). Project type: ICI (R&D in collaboration with international companies). Funding entity: European Commission - Research Directorate General (RTD). From March 2003 to July 2003.

TRADAUT-PT - Automatic Translation System from and into Portuguese for Public Administration. Coordinated by FCT/MCTES. Project type: PI (international basic research or R&D projects). Participants: Fundação da Faculdade de Ciências e Tecnologia of Universidade Nova de Lisboa (FFCT/UNL); SYSTRAN; Centro de Linguística da Faculdade de Ciências Sociais e Humanas da Universidade Nova de Lisboa (CLUNL) and Instituto Camões (ICA). Funding entity: European Commission - Research Directorate General (RTD). From December 2000 to April 2003.

PGR - Acesso Selectivo aos Pareceres da Procuradoria Geral da República (Selective Access to the Legal Opinions of the Prosecutor General's Office). Programme: PRAXIS XXI. Ref. LO59-P31B-02/97. Participants: Heurística; CENTRIA, Departamento de Informática, FCT/UNL; and Procuradoria Geral da República. From January 1998 to January 2001.

IGM - Acesso Selectivo aos relatórios do IGM sobre prospecção mineira de Pirites no Alentejo (Selective Access to IGM Reports on Pyrite Mining Prospection in the Alentejo). Included in the GEOMIST project. Participants: Instituto Geológico e Mineiro (IGM); Departamento de Informática, FCT/UNL. From October 1999 to March 1999.

DILEMMA - Logic engineering in primary care, shared care and oncology. Project included in AIM (Advanced Informatics in Medicine) A2005. From July 1992 to October 1994.

Organization of Events  

Organizing Committee Co-Chair @ TEMA - Third Text Mining and Applications Track, EPIA 2011; 10 Oct 2011 to 13 Oct 2011. University of Lisboa

Organizing Committee Co-Chair @ TEMA - Third Text Mining and Applications Track, EPIA’09; 14 Oct 2009 to 15 Oct 2009. University of Aveiro

Organizing Committee Co-Chair @ TEMA - Second Text Mining and Applications Workshop.  EPIA’07; 4 Dec 2007 to 5 Dec 2007. Guimarães.

Organizing Committee Co-Chair @ TEMA - Second Text Mining and Applications Workshop. EPIA’05; 5 Dec 2005 to 6 Dec 2005. Covilhã.

Graduation Activities 

As supervisor:

Zornitsa Kozareva (a Bulgarian student from Plovdiv University), for a Diploma thesis (licentiate degree). The thesis title was "Automatic extraction and cluster analysis of Named Entities". Presented in July 2004 and classified as "Excellent".

José Vicente Pereira dos Reis, for his MSc thesis titled Identificação Automática da Língua em Texto (Automatic Language Identification in Text). Presented at the Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, February 2007. Classified as "Muito Bom" (Very Good).

João Ventura, for his MSc thesis titled Detecção Automática de Unidades Relevantes em Texto (Automatic Detection of Relevant Unigrams in Text). Presented at the Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, November 2008. Classified as "Muito Bom" (Very Good).

Talks

Identification of Document Language in Hard Contexts. Research Talk by Joaquim Silva and Gabriel Lopes. FCT/ Universidade Nova de Lisboa; 24th January 2007.

Language-independent Clustering of Highly Frequent Words into Morpho-syntactic Classes - a statistical approach. Research Talk by Gabriel Lopes and Joaquim Silva. Universidade Aberta, Lisbon, Primeiro Workshop de Estatística Matemática e Computação; 5th May 2005.


Utilização de Expressões Relevantes na Extracção de Tópicos e no Agrupamento de Documentos a partir de Corpora Multi-Língua (Using Relevant Expressions for Topic Extraction and Document Clustering from Multilingual Corpora). Research Talk by Joaquim Silva and Gabriel Lopes. FCT/ Universidade Nova de Lisboa; 2nd October 2002.

Crefuncionario.sql

InsAlgFuncs.sql

TestRestrFunc.sql

CreLecionacao.sql

DadosPremitLecionacao

DadosNaoPremitLecionacao

AlterCadeira

RestrCadeiras

TestRestrCadeiras

AlterHistCatg

CreViewMudancas

UpdHistCatg

UpdDocentes

TrigNovDocente

TestInserNovoDocente

TrigEleimDocente

TrigPromoveDocente

UltimaAulaApex