Joaquim Francisco Ferreira da Silva Assistant Professor at the Departamento de Informática of Faculdade de Ciências e Tecnologia of
Universidade Nova de Lisboa jfs@di.fct.unl.pt
tel. 351 212948536 ext. 10732
Joaquim Ferreira da Silva is an Assistant Professor at the Computer Science
Department of the Universidade Nova de Lisboa. He is also a member of the
Centro de Informática e Tecnologias de Informação (CITI).
He received his PhD in the area of Text Mining from the Universidade Nova de
Lisboa in 2004. His current research interests are:
. Text Mining
. Document Classification using Machine Learning
. Extraction of relevant elements in raw text
. Other areas
. Sound Classification using Machine Learning
. Finding correlations between parameters of continuous data
PhD Thesis
Joaquim Francisco Ferreira da Silva. Unsupervised, language
independent multiwords extraction, clustering, characterization, and
classification of documents (Extracção
de Unidades Textuais, Agrupamento, Caracterização e Classificação de Documentos).
PhD thesis, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 3
February 2004. Written in
Portuguese, supervised by Gabriel Pereira Lopes and João
Tiago Mexia (FCT/UNL).
Articles
Using Covariance as a Similarity Measure for Document Language Identification in Hard
Contexts. Silva, Joaquim F.; Lopes, Gabriel P. Pliska Studia
Mathematica Bulgarica, Vol. 18, pp. 341-360. 2007.
A Statistical Approach for Multilingual Document Clustering and Topic Extraction
in Hard Contexts. Silva, Joaquim F.; Lopes, Gabriel P.; Mexia, João T.; Coelho, C. Agra.
Pliska Studia Mathematica Bulgarica, Vol. 15, pp. 207-228. 2004.
Book Chapters
Ranking and Extraction of Relevant Single Words in Text. Ventura, João; Silva, Joaquim F.
Brain, Vision and AI. ISBN 978-95-7619-04-6, Cesare Rossi (ed.). InTech, Education and Publishing
(pub), pp. 265-284. 2008.
In Proceedings
Unsupervised Music Genre Classification with a Model-Based Approach. Barreira, Luis; Cavaco,
Sofia; Silva, Joaquim. 15th Portuguese Conference on Artificial
Intelligence (EPIA 2011). Lecture Notes in Computer Science. Springer Berlin
Heidelberg (Germany), Vol. 7026. Antunes, Luis; Pinto, H. Sofia (eds). Pages 268-281.
Text Categorization: An extensive comparison of classifiers, feature selection
metrics and document representation. Peleja, Filipa; Silva, Joaquim; Lopes, Gabriel.
Proceedings of the 15th Portuguese Conference on Artificial Intelligence,
EPIA 2011, Lisbon, October 2011. Luis Antunes, H. Sofia Pinto, Rui Prada, and
Paulo Trigo (eds). ISBN: 978-989-95618-4-7. Pages 660-674.
Mining Causality from Non-categorical Numerical Data. Silva, Joaquim; Lopes,
Gabriel. Proceedings
of the 2011 International Workshop on Behavior Informatics (BI 2011); the 15th
Pacific-Asia Conference on Knowledge Discovery and Data Mining. To be published.
Towards Automatic Building of Document Keywords. Silva, Joaquim; Lopes, Gabriel.
Proceedings of COLING 2010. Aug 2010. http://www.aclweb.org/anthology/C10-2132
A Document Descriptor Extractor Based on Relevant Expressions. Silva, Joaquim; Lopes,
Gabriel. 14th Portuguese Conference on Artificial Intelligence (EPIA’09),
Aveiro, Portugal. Lecture Notes in Computer Science. Volume 5816. Berlin:
Springer-Verlag, Seabra Lopes, L.; Lau, N.; Mariano, P.; Rocha, L.M. (eds).
Efficient Multi-Word Expressions Extractor Using Suffix Arrays
and Related Structures. Aires, José; Lopes, José;
Silva, Joaquim. CIKM-2008 Workshop - ACM conference. October 2008, Napa Valley,
California, USA.
New Techniques for Relevant Word Ranking and Extraction. Ventura, João; Silva, Joaquim F.
13th Portuguese Conference on Artificial Intelligence (EPIA’07), Guimarães, Portugal.
Lecture Notes in Computer Science. Volume 4874/2007. Berlin: Springer-Verlag,
pp. 691-702.
Detection of Strange and Wrong Automatic Part-of-Speech Tagging. Rocio, Vitor; Silva, Joaquim; Lopes, Gabriel. 13th
Portuguese Conference on Artificial Intelligence (EPIA’07), Guimarães, Portugal.
Lecture Notes in Computer Science. Volume 4874/2007. Berlin: Springer-Verlag,
pp. 683-690.
Language Identification in Documents, Including Unknown Languages: a Statistical
Approach. Silva, Joaquim F.; Lopes, Gabriel; Reis, José; Mexia, João T.
In proceedings of the Text Mining and Applications (TeMA) workshop. 13th Portuguese
Conference on Artificial Intelligence (EPIA’07), Guimarães, Portugal, pp. 824-835.
Identification of Document Language is Not Yet a Completely Solved Problem.
Silva, Joaquim F.; Lopes, Gabriel. In proceedings of CIMCA,
International Conference on Computational Intelligence for Modeling, Control
& Automation, jointly with International Conference on Intelligent Agents,
Web Technologies & Internet Commerce. Computer Society, IEEE. 29 November -
1 December 2006. Sydney, Australia.
Identification of Document Language in Hard Context. Silva, Joaquim F.; Lopes, Gabriel.
In proceedings of the "New Directions in Multilingual Information Access" SIGIR
Workshop. Seattle, 6-11 Aug. 2006.
Cross-Lingual Classification of Function Words. Gamallo P., Da Silva J., Lopes G.P. (2005).
10th International Conference on Computer Aided Systems Theory, Eurocast,
Las Palmas, Spain, February 2005, pp. 92-95. ISBN: 84-689-0432-5.
A Divide-And-Conquer Approach to Learn Syntactic Categories. Gamallo P., Lopes G.P., Da Silva, F. (2004).
ICGI'04, Athens, Greece. Grammatical Inference: Algorithms and Applications,
LNAI, vol. 3264, Springer-Verlag.
Cluster Analysis of Named Entities. Z. Kozareva, J. F. da Silva, P. Gamallo and G. P. Lopes. 2004.
In Proceedings of International Intelligent Information
Processing and Web Mining Conference, May 17-20, 2004, Zakopane,
Poland. In Lecture Notes in Artificial Intelligence
LNCS/LNAI. Berlin: Springer-Verlag. 2004.
Cluster Analysis and Classification of Named Entities. J. F. da Silva, Z. Kozareva and
G. P. Lopes. (2004).
In Proceedings of 4th International Conference on Language Resources and
Evaluation (LREC). May 2004, Lisbon, Portugal.
Extracting Named Entities: A Statistical Approach. Joaquim Ferreira da Silva,
Zornitsa Kozareva, Veska Noncheva, Gabriel Pereira Lopes. (2004).
In Proceedings of the Conférence de Traitement de
Language Naturel (TALN 2004), Morocco, April 2004.
Automatic Acquisition of Word Interaction Patterns from Corpora. Noncheva, V.; da Silva, J.F.; Lopes, G.P. (2003).
EACL-03 Workshop on Language Modeling for Text Entry Methods, Budapest, April
14, 2003. Association for Computational Linguistics.
pp. 25-32.
Document Clustering and Cluster Topic Extraction in Multilingual Corpora.
Silva, J.F.; Mexia, J.; Coelho, C.A.; Lopes, J.G.P. 2001. In: Nick Cercone, T. Y.
Lin, Xindong Wu (Eds.). Proceedings of the IEEE 2001
International Conference on Data Mining (ICDM’01), San Jose, California, 29
November - 2 December, 2001. IEEE Computer Society. pp. 513-520.
Multilingual Document Clustering, Topic Extraction and Data Transformations.
Silva, J.F.; Mexia, J.; Coelho, C. and Lopes, J.G.P. 2001. In: Pavel Brazdil and
Alípio Jorge (Eds.). Progress in Artificial Intelligence: Knowledge Extraction,
Multi-agent Systems, Logic Programming and Constraint Solving, 10th Portuguese
Conference on Artificial Intelligence (EPIA’01), Porto, Portugal. Lecture
Notes in Artificial Intelligence LNCS/LNAI 2258.
Berlin: Springer-Verlag, pp. 74-87.
Using LocalMaxs Algorithm for the Extraction of Contiguous and
Non-contiguous Multiword Lexical Units. J.F. da Silva, Gaël
Dias, Sylvie Guilloré, José Gabriel P. Lopes. 1999. In: P. Barahona (ed.) Progress in
Artificial Intelligence: 9th Portuguese Conference on AI, EPIA'99, Évora, Portugal, September 1999,
Proceedings. Lecture Notes in Artificial Intelligence, Springer-Verlag, Vol. 1695, pp. 113-132 (1999).
Extracting Multiword Terms from Document Collections.
J.F. da Silva and J.G.P. Lopes. 1999. In Proceedings of the VExTAL: Venezia per il Trattamento
Automatico delle Lingue, November 22-24, 1999.
Relevant Expressions in Large Corpora. J.F. da Silva,
J.G.P. Lopes, M. F. Xavier, and G. Vicente. 1999.
In Anne Condamines, Cécile Fabre and Marie-Paule Péry-Woodley (eds.). Actes de l'atelier "Corpus et Traitement
Automatique des Langues: Pour une réflexion méthodologique" (TALN'99), Institut d'Etudes Scientifiques, Cargèse, Corsica
(France), July 12-17. pp. 86-94. Published by ATALA.
A Local Maxima Method and a Fair Dispersion Normalization for Extracting Multi-word
Units from Corpora. J.F. da Silva and J.G.P. Lopes. 1999. In Proceedings of
the Sixth Meeting on Mathematics of Language (MOL6), Orlando, Florida, July
23-25, 1999. pp. 369-381.
I have been
involved in teaching the following courses (last 5 years):
Bases de Dados I (Databases I), 1st Cycle, 2009/2010
Machine Learning and Knowledge Extraction, 3rd Cycle, 2009/2010
Aprendizagem Automática e Data Mining (Machine Learning and Data Mining), 2nd Cycle, 2008/2009
Programação em Lógica com Restrições (Constraint Logic Programming), 1st Cycle, 2008/2009
Machine Learning and Knowledge Extraction, 3rd Cycle, 2008/2009
Bases de Dados I (Databases I), 1st Cycle, 2008/2009
Introdução às Bases de Dados (Introduction to Databases), 1st Cycle, 2008/2009
Aprendizagem Automática e Data Mining (Machine Learning and Data Mining), LEI, 2007/2008
Lógica Computacional (Computational Logic), LEI, 2007/2008
Bases de Dados I (Databases I), LEI, 2007/2008
Introdução às Bases de Dados (Introduction to Databases), LM, 2007/2008
Introdução à Inteligência Artificial (Introduction to Artificial Intelligence), LEI, 2006/2007
Bases de Dados I (Databases I), LEI, 2006/2007
Métodos Quantitativos (Quantitative Methods), LEI, 2006/2007
Text and Data Mining, MEI, 2006/2007
Bases de Dados e Data Warehousing (Databases and Data Warehousing), LEI, 2005/2006
Introdução aos Computadores e Programação (Introduction to Computers and Programming), 2005/2006
Seminários de Informática (Informatics Seminars), LEI, 2005/2006
Bases de Dados I (Databases I), LEI, 2005/2006
Métodos Quantitativos (Quantitative Methods), LEI, 2005/2006
Text and Data Mining, MEI, 2005/2006
These are the projects in which I participated as a team member:
VIP-ACCESS - Ubiquitous Web Access for Visually Impaired People (Acesso
Ubíquo à WEB para cegos). Duration: 2008-2010. Leader: Gaël
Harry Dias. University of Porto, Portugal; New University of Lisbon, Portugal; MIT, USA;
North Texas University, USA. Funding
Agency: Fundação para a Ciência e a Tecnologia (Portugal). Reference:
PTDC/PLP/72142/2006.
PATRAS - PArallelism for Machine Learning
of TRAnslationS. Ref: POSC/PLP/61520/2004.
Participants: Universidade Nova de Lisboa. Funding entity: FCT / MCTES. From May 2005 to November 2007.
WE-LEARN - Web Intelligent Portal for
e-learning with Teacher-Student interaction mediated by the Machine. Ref: 4.1.3/CAPES/CPLP. Participants:
Faculdade de Ciências da Universidade de Lisboa and Universidade Federal do Rio
Grande do Sul.
Funding entity: GRICES (Gabinete de Relações Internacionais da Ciência e do
Ensino Superior). From January 2004 to December 2005.
ASTROLABIUM - Portuguese Researchers' Mobility Portal and Network of Mobility Centres.
Ref: MOBI-CT-2003-003344. Participants:
Universidade Nova de Lisboa, FCT / MCTES and GRICES. Funding entity: European Union. From June 2004 to June 2006.
Leonardo 1 - New Computer Technologies for access to Vast Arrays of
Multilingual Information. Coordinated by CITI (Centro de Informática e Tecnologias de
Informação). Project type: ICI (R&D
in collaboration with international companies). Funding
entity: European Commission - Research Directorate General (RTD). From March 2003 to July 2003.
TRADAUT-PT - Automatic Translation System
from and into Portuguese for Public Administration. Coordinated
by FCT/MCTES. Project type: PI (International basic research or R&D projects). Participants: Fundação da Faculdade de Ciências e Tecnologia of Universidade
Nova de Lisboa (FFCT/UNL); SYSTRAN; Centro de Linguística da Faculdade de
Ciências Sociais e Humanas da Universidade Nova de Lisboa (CLUNL) and Instituto
Camões (ICA). Funding entity:
European Commission - Research Directorate General (RTD). From December 2000 to April 2003.
PGR - Acesso Selectivo aos Pareceres da Procuradoria Geral da República
(Selective Access to the Opinions of the Attorney General's Office). PRAXIS XXI
Programme. Ref. LO59-P31B-02/97. Participants: Heurística; CENTRIA of the
Departamento de Informática of FCT/UNL and Procuradoria Geral da República.
From January 1998 to January 2001.
IGM - Acesso Selectivo aos relatórios do IGM sobre prospecção mineira de Pirites no
Alentejo (Selective Access to IGM reports on pyrite mining prospecting in
Alentejo). Included in the GEOMIST project. Participants: Instituto Geológico e
Mineiro (IGM); Departamento de Informática of FCT/UNL. From October 1999 to March 1999.
DILEMMA - Logic engineering in primary care,
shared care and oncology. Project included in AIM (Advanced Informatics in
Medicine) A2005. From July 1992 to October 1994.
Organizing Committee Co-Chair @ TEMA - Third Text Mining and
Applications Track, EPIA’11; 10 Oct 2011 to 13 Oct 2011. University of Lisboa
Organizing Committee Co-Chair @ TEMA - Third Text Mining and
Applications Track, EPIA’09; 14 Oct 2009 to 15 Oct 2009. University of Aveiro
Organizing Committee Co-Chair @ TEMA - Second Text Mining and
Applications Workshop. EPIA’07; 4 Dec 2007 to 5 Dec 2007. Guimarães.
Organizing Committee Co-Chair @ TEMA - Second Text Mining and Applications
Workshop. EPIA’05; 5 Dec 2005 to 6 Dec 2005. Covilhã.
As supervisor:
Zornitsa Kozareva (a Bulgarian student from Plovdiv University) for a
Diploma thesis (licentiate degree). The thesis
title was "Automatic extraction and cluster analysis of Named
Entities". Presented in July 2004 and classified as "Excellent".
José Vicente Pereira dos Reis for his MSc thesis titled Identificação Automática da
Língua em Texto (Automatic Language Identification in Text). Presented at
Faculdade de Ciências e Tecnologia - Universidade Nova de Lisboa, February
2007. Classified as "Muito Bom" (Very Good).
João Ventura for his MSc thesis titled Detecção Automática de Unidades Relevantes em
Texto (Automatic Detection of Relevant Unigrams in Text). Presented at
Faculdade de Ciências e Tecnologia - Universidade Nova de Lisboa, November
2008. Classified as "Muito Bom" (Very Good).
Identification of Document Language in Hard Contexts. Research Talk by Joaquim Silva and Gabriel Lopes.
FCT/Universidade Nova de Lisboa; 24th January 2007.
Language-independent
Clustering of Highly Frequent Words into Morpho-syntactic
Classes - a statistical approach. Research
Talk by Gabriel Lopes and Joaquim Silva. Universidade Aberta,
Lisbon, Primeiro Workshop de Estatística Matemática e Computação; 5th May 2005.
Utilização de Expressões Relevantes na Extracção de Tópicos e no Agrupamento de Documentos
a partir de Corpora Multi-Língua (Using Relevant Expressions for Topic Extraction
and Document Clustering from Multilingual Corpora). Research
Talk by Joaquim Silva and Gabriel Lopes. FCT/Universidade Nova
de Lisboa; 2nd October 2002.