Servers
The NNPDF collaboration runs a storage server that hosts various data files, meant for both public and internal consumption. It serves the following URLs:
https://data.nnpdf.science: Hosts public NNPDF data such as PDF fits, releases etc.
https://vp.nnpdf.science: Hosts the validphys report and displays an index of all of the reports.
https://wiki.nnpdf.science: Hosts the github wiki version.
https://packages.nnpdf.science/: Hosts the conda binary packages.
https://docs.nnpdf.science/: Hosts this documentation.
SSH is used to interact with the server, as described in Access below.
The NNPDF server is a virtual machine (VM) maintained by the Centro Calcolo at the physics department of the University of Milan. The machine has 2 CPUs, 4 GB of RAM and 1 TB of disk, and runs CentOS 7.
The full disk is backed up every week by the Centro Calcolo.
Every Sunday we also run an rsync from the /home/nnpdf folder
to the nnpdf@lxplus account at CERN.
Access
User access
Access to the server is provided through
ssh/vp-upload, with the following restrictions:
ssh access to root is forbidden.
There is a shared nnpdf user with low privileges. In order to log in, the user must send their public ssh key (usually in ~/.ssh/id_rsa.pub) to SC.
The nnpdf user is not allowed to log in with a password.
The nnpdf user shares a common /home/nnpdf folder where all NNPDF
material is stored. Public access to data is available for all files
in the /home/nnpdf/WEB folder. The validphys reports are stored in
/home/nnpdf/validphys-reports and the wiki in
/home/nnpdf/WEB/wiki.
Access for continuous deployment tools
The conda packages as well as the documentation are
automatically uploaded to the server by the Continuous Integration service
(Travis), through a user called dummy which has further reduced
privileges (it uses the rssh shell and
is only allowed to run the scp command). An accepted private key is stored
securely in the Travis configuration. The packages
are uploaded to /home/nnpdf/packages.
HTTP access
Tools such as conda and vp-get require access to
private URLs, which are password-protected using HTTP basic_auth.
Access is granted by a ~/.netrc file containing the user and password
for the relevant servers. The ~/.netrc file is typically generated
at installation time. It should look similar to:
machine vp.nnpdf.science
login nnpdf
password <PASSWORD>
machine packages.nnpdf.science
login nnpdf
password <PASSWORD>
The relevant passwords can be found here.
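As an illustration of how a client might read such credentials, the following minimal sketch uses the netrc module from the Python standard library. The helper name credentials_for and the throwaway file with a placeholder password are assumptions made for this example; this is not the code used by conda or vp-get.

```python
import netrc
import os
import tempfile


def credentials_for(host, netrc_path):
    """Return (login, password) for host from a netrc file, or None if absent."""
    entry = netrc.netrc(netrc_path).authenticators(host)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password


# Demonstration on a temporary file; "example-password" is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "netrc")
    with open(demo_path, "w") as f:
        f.write(
            "machine vp.nnpdf.science\n"
            "    login nnpdf\n"
            "    password example-password\n"
        )
    os.chmod(demo_path, 0o600)  # netrc files should only be readable by their owner
    demo_credentials = credentials_for("vp.nnpdf.science", demo_path)
    missing_credentials = credentials_for("example.org", demo_path)
```

In real use the path would be the netrc file in the user's home directory, and a host without an entry yields None rather than an error.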
Web scripts
Validphys2 interacts with the NNPDF server by downloading resources and uploading results.
The server scripts live in the validphys2
repository under the serverscripts folder.
The server side
infrastructure that makes this possible currently aims to be
minimalistic, although it may need to be expanded to a more robust web
application in time.
At the moment, the only thing that is done is maintaining some index
files (currently for theories, fits, reports and LHAPDF sets)
which essentially list the files in a given directory. The indexes are
regenerated automatically when their corresponding folders are
modified. This is achieved by waiting for changes using the Linux
inotify API and the
asynwatch module. These scripts are
often controlled by cron jobs.
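The idea of an index that lists the files in a directory can be sketched as follows. This is only an illustration under stated assumptions: the function names, the JSON layout with a "files" key and the file name index.json are hypothetical, and the change detection (inotify) used by the real scripts is deliberately left out.

```python
import json
import os
import tempfile


def build_index(folder):
    """Return a sorted list of the visible entries in folder."""
    return sorted(name for name in os.listdir(folder) if not name.startswith("."))


def write_index(folder, index_path):
    """Write the listing of folder to index_path as JSON, replacing it atomically."""
    payload = {"files": build_index(folder)}
    tmp_path = index_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(payload, f)
    os.replace(tmp_path, index_path)  # readers never see a half-written index
    return payload


# Demonstration on a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    fits_dir = os.path.join(tmp, "fits")
    os.mkdir(fits_dir)
    for name in ("fit_b.tar.gz", "fit_a.tar.gz", ".hidden"):
        open(os.path.join(fits_dir, name), "w").close()
    index_path = os.path.join(tmp, "index.json")
    demo_index = write_index(fits_dir, index_path)
    with open(index_path) as f:
        demo_on_disk = json.load(f)
```

A watcher would call write_index again each time the folder changes; writing to a temporary file and renaming it keeps clients from downloading a partially written index.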
The report index is used to display a webpage indexing the reports. It
retrieves extra information from a meta.yaml file in the top level
output directory, and (with lower priority) by parsing an index.html
page contained in the report folder. Properties like title, author and tags
are retrieved from the HTML header of this file, and are expected to
be in the same format that Pandoc would have used to write them when
meta.yaml is passed as an input. To produce it, the most convenient
way is setting the main flag of a report, as described in Uploading
the result.
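The priority rule described above (meta.yaml first, HTML header second) can be sketched with the standard library HTML parser. The class and function names are hypothetical, the meta.yaml content is assumed to be already loaded into a dictionary, and the header layout (a title element plus meta tags named author and keywords) is an assumption about what Pandoc writes. The real indexing script may differ in all of these respects.

```python
from html.parser import HTMLParser


class HeaderMetadataParser(HTMLParser):
    """Collect the title and the author/keywords meta tags of an HTML page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") in ("author", "keywords"):
            self.metadata[attrs["name"]] = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data


def report_metadata(html_text, meta=None):
    """Merge HTML header metadata with meta.yaml values; the latter take priority."""
    parser = HeaderMetadataParser()
    parser.feed(html_text)
    merged = dict(parser.metadata)
    merged.update(meta or {})
    return merged


demo_html = (
    "<html><head><title>Test report</title>"
    '<meta name="author" content="A. Author" />'
    '<meta name="keywords" content="nnpdf31, test" />'
    "</head><body></body></html>"
)
demo_html_only = report_metadata(demo_html)
demo_with_meta = report_metadata(demo_html, meta={"title": "Title from meta.yaml"})
```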
Additionally, information from the mailing list is added to the index
page. Specifically we query the list for links to validphys reports
and add links to the emails next to the entries of the reports that
are mentioned. This is achieved with the index-email.py script. It
needs some authentication credentials to access the mailing list. The
password is stored in a file called EMAIL_BOT_PASSWORD, which is not
tracked by git. The script outputs two files in the root folder,
email_mentions.json which should be used by other applications (such
as the report indexer) and seen_emails_cache.pkl, which is there to
avoid downloading emails that are already indexed. These files need to
be deleted when the format of the index is updated.
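The logic of collecting report mentions while skipping already seen emails can be sketched as below. This is not the index-email.py script: the URL pattern, the message identifiers, the function name and the shape of the returned mapping are all assumptions made for illustration, and downloading the emails is left out entirely.

```python
import re
from collections import defaultdict

# Assumed URL shape: https://vp.nnpdf.science/<report-id>/...
REPORT_URL = re.compile(r"https?://vp\.nnpdf\.science/([A-Za-z0-9_=\-]+)")


def collect_mentions(emails, seen=None):
    """Map report ids to the ids of the emails mentioning them.

    emails is an iterable of (message_id, body) pairs. Messages whose id is
    already in seen are skipped, which plays the role of the cache file.
    """
    seen = set() if seen is None else seen
    mentions = defaultdict(list)
    for message_id, body in emails:
        if message_id in seen:
            continue
        seen.add(message_id)
        for report_id in sorted(set(REPORT_URL.findall(body))):
            mentions[report_id].append(message_id)
    return dict(mentions), seen


demo_emails = [
    ("msg-1", "See https://vp.nnpdf.science/abc123==/ and "
              "https://vp.nnpdf.science/def456/index.html"),
    ("msg-2", "Follow-up on https://vp.nnpdf.science/abc123==/"),
]
demo_mentions, demo_seen = collect_mentions(demo_emails)
# A second pass over the same emails finds nothing new.
demo_second_pass, _ = collect_mentions(demo_emails, seen=demo_seen)
```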
The report index uses the DataTables JS library, which provides filtering and sorting capabilities for the index tables. The source file is:
serverscripts/validphys-reports/index.html
in the validphys2 directory. It should be updated from time to time to highlight the most interesting reports at a given moment. This can be done, for example, by displaying the reports marked with some keyword (for example ‘nnpdf31’) in a separate table at the beginning.
The Makefile in the same folder synchronizes these files with the server.
The report indexing script generates thumbnails in the
WEB/thumbnails folder, which are then associated with each report. This is
done by looking at the image files inside the figures folder of each
uploaded report (see the source of the script for more details). It is
expected that the server redirects the requests for
vp.nnpdf.science/thumbnails to this folder.
Cron jobs
The following cron jobs are registered for the nnpdf user:
every day at 4 AM run the index-email.py script.
at every reboot run index-reports.py, index-fits.py, index-hyperscan.py, index-packages-public.sh and index-packages-private.sh, which continuously monitor the respective folders and create indexes that can be used by various applications. The first two are homegrown scripts (see Web scripts) and the latter two use conda-index.
The following cron jobs are registered for the root user:
perform a backup of /home/nnpdf to lxplus every Saturday at noon.
perform a certbot renew every Monday.
reboot every Sunday at 6am (in order to use new kernels).
perform system update every day.
Web server Configuration
We are using nginx as a lightweight and simple web server engine. The
initial nginx configuration depends on the Linux distribution in
use. Usually Debian packages provide a ready-to-go version where
/etc/nginx/nginx.conf is already set up to work with server blocks
(subdomains).
Other distributions, like CentOS 7, require more work. Here are some tips:
make sure the /home/nnpdf folder can be accessed by the nginx user.
folders served by nginx must have permission 755.
create 2 folders in /etc/nginx: sites-available and sites-enabled.
in the /etc/nginx/nginx.conf file indicate the new include path with include /etc/nginx/sites-enabled/*; and remove all location statements.
for each server block create a new file in /etc/nginx/sites-available and build a symbolic link in /etc/nginx/sites-enabled.
remember to perform a sudo service nginx restart or sudo nginx -s reload to update the server block configuration.
Finally, here is an example of an nginx configuration for the vp.nnpdf.science server block without SSL encryption:
server {
    listen 80;
    listen [::]:80;
    server_name vp.nnpdf.science;
    root /home/nnpdf/validphys-reports;
    location / {
        try_files $uri $uri/ =404;
        auth_basic "Restricted";
        auth_basic_user_file /home/nnpdf/validphys-reports/.htpasswd;
    }
    location /thumbnails {
        alias /home/nnpdf/thumbnails;
        try_files $uri $uri/ =404;
        auth_basic "Restricted";
        auth_basic_user_file /home/nnpdf/validphys-reports/.htpasswd;
    }
}
Some URLs are password protected using the HTTP basic_auth mechanism. This is
implemented by setting the corresponding configuration in nginx, as shown above
(specifically with the auth_basic and auth_basic_user_file keys). The
.htpasswd files mentioned in the configuration are generated with the
htpasswd tool.
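On the client side, HTTP basic authentication amounts to sending an Authorization header that contains the base64 encoding of user:password. The short sketch below shows how such a header is built; the function name is hypothetical and the credentials are placeholders, not the real ones.

```python
import base64


def basic_auth_header(user, password):
    """Build the Authorization header for HTTP basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials, for illustration only.
demo_header = basic_auth_header("nnpdf", "example-password")
```

Because base64 is an encoding and not encryption, the password is only protected in transit when the request goes over HTTPS.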
DNS
The domain is hosted by Namecheap, which also manages the
DNS entries. For each subdomain there is an A record always pointing to the
same server IP, currently 159.149.47.24. The subdomains are then handled as
described in Web server. For example, a DNS query for
packages.nnpdf.science returns:
$ dig packages.nnpdf.science
; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> packages.nnpdf.science
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26766
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;packages.nnpdf.science. IN A
;; ANSWER SECTION:
packages.nnpdf.science. 1799 IN A 159.149.47.24
;; Query time: 170 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Tue May 28 14:26:53 BST 2019
;; MSG SIZE rcvd: 67
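The statement that every subdomain points to the same IP can be checked with a small script. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and the demonstration uses a stand-in resolver that always returns the address quoted above, so that it runs without network access.

```python
import socket


def resolve_all(hosts, resolve=socket.gethostbyname):
    """Return a mapping from each host name to its resolved IPv4 address."""
    return {host: resolve(host) for host in hosts}


def share_single_ip(addresses):
    """Return True if all resolved hosts point to the same address."""
    return len(set(addresses.values())) == 1


def _offline_resolver(host):
    # Stand-in for a DNS lookup; returns the address quoted in the text.
    return "159.149.47.24"


demo_addresses = resolve_all(
    ["vp.nnpdf.science", "packages.nnpdf.science"], resolve=_offline_resolver
)
demo_consistent = share_single_ip(demo_addresses)
```

Calling resolve_all without the resolve argument performs real lookups through socket.gethostbyname.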
SSL encryption
SSL encryption is provided by Let’s Encrypt.
The certificates are created using the certbot program with the nginx module.
In order to create new ssl certificates, first prepare the nginx server block
configuration file and then run the interactive command:
sudo certbot --nginx -d <domain>
This will ask you several questions, including if you would like to automatically
update the nginx server block file. We fully recommend this approach.
The certificate is automatically renewed by a cron job.