Author：Longofo@Knownsec 404 Team
Time: August 8, 2019
Chinese Version: https://paper.seebug.org/1009/
On August 1st, 2019, Apache Solr officially released a vulnerability. The DataImportHandler, an optional but popular module to pull in data from databases and other sources, has a feature in which the whole DIH configuration can come from a request's "dataConfig" parameter. The debug mode of the DIH admin screen uses this to allow convenient debugging / development of a DIH config. Since a DIH config can contain scripts, this parameter is a security risk. Starting with version 8.2.0 of Solr, use of this parameter requires setting the Java System property "enable.dih.dataConfigParam" to true.
I made an emergency for this vulnerability. The constructed PoC is not so good, because it needs a database driver to connect the database and there is no echo. So it is difficult to use. Later, new PoCs were gradually constructed. After several versions of PoC upgrades, these problem were all fixed. I have recorded the PoC upgrade process and some of the problems I have encountered. At last, I'd like to thank @Badcode and @fnmsd for their help.
The Solr-related environment involved in the analysis is as follows:
- JDK 1.8.0_181
At first, I didn't go through the related information of Solr carefully. I just flipped through the document and made recurrence. At that time, I thought that the data should be able to be echoed. So I started debugging and tried to construct the echo, but it didn't work. Later, I saw the new PoC. I felt that I had not really understood the principle of this vulnerability and went to blind debugging. So I went back and looked at those documentation. Here are some concepts related to the vulnerability.
1.Solr is encapsulated on the basis of the Lucene toolkit and provides external indexing functions in the form of web services.
2.When the business system needs to create indexes and search by indexes, it can send http request and parse returned data.
(1) Construction of index data
According to the configuration file, extract some data for searching (packaged into various Fields), repackage each field into document, and analyze the document (separating every fields) to get some index directories. The document itself will also be written to a information database.
(2) Query of index data
Parse query condition (Termquery) according to the key word, use the tool (indexSearcher) to obtain the document id from index library, and then use this id to obtain document information from the database.
Solr DataImportHandler can import data into the index library in batches. According to the description in Solr Document, DataImportHandler has The following functions:
- Read data residing in relational databases
- Build Solr documents by aggregating data from multiple columns and tables according to configuration
- Update Solr with such documents
- Provide ability to do full imports according to configuration
- Detect inserts/update deltas (changes) and do delta imports (we assume a last-modified timestamp column for this to work)
- Schedule full imports and delta imports
- Read and Index data from xml/(http/file) based on configuration
- Make it possible to plugin any kind of datasource (ftp,scp etc) and any other format of user choice (JSON,csv etc)
Through the official documentation and information I researched, I made a general flow chart of DataImport processing as follows (only the main parts related to the vulnerability are drawn):
A few nouns explain:
- Core: Core is an index library which contains schema.xml/managed-schema. Schema.xml is the traditional name of the schema file, which can be manually edited by the user. Managed-schema is the name of the schema file used by Solr, which can be changed dynamically at runtime. The data-config file can be configured as xml or passed by request parameter (can be passed through the dataConfig parameter when dataimport is in debug mode)
Create a core:
The -d parameter is the specified configuration template. Under solr 7.7.2, there are two templates: _default and sample_techproducts_configs
Create a core via a web page
At first I thought that I couldn't create a core from the web page. Although there is an Add Core, the core directory created by clicking on it is empty. I can't find the configuration file. I have to create the corresponding core in the solr directory and add it in the web interface. Then try to use the absolute path configuration, the absolute path can also be seen in the web interface, but solr does not allow the configuration file other than the created core directory by default, if this switch is set to true, you can use the configuration outside the corresponding core file:
Later, when I look back, I found in the Solr Guide 7.5 document that the core can also be created by the configSet parameter. configSet can Specified as _default and sample_techproducts_configs, the following indicates that the creation was successful, but the core created in this way has no conf directory, and its configuration is equivalent to linking to the configSet template instead of using the copy template:
Core can be created in the above two ways, but to use the dataimport function, you still need to edit the configuration solrconfig.xml file. If you can change the configuration file through web request to configure the dataimport function, you can make better use of this vulnerability.
- Schema.xml/managed-schema: This defines the field associated with the data source (Field) and how to handle Field when Solr builds the index. You can open the schema.xml/managed-schema under the newly created core to watch its content. The content is too long, I don't paste it.
explaining several elements related to the vulnerability:
Field: The definition of the domain, equivalent to the field of the data source Name: the name of the domain Type: the type of the domain Indexed: Whether to index Stored: Whether to store multiValued: whether it has multiple values, if yes, it can maintain multiple values in one domain example： <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" /> <field name="name" type="string" indexed="true" stored="true" required="true" multiValued="false" /> dynamicField: dynamic domain, the last stage of PoC is echoed according to this. Dynamic field definitions allow the use of conventions over configuration. Field names were matched by pattern specification. Example: name ="*_i" will match any field in dataConfig that ends with _i (eg myid_i, z_i) Restriction: Patterns like glob in the name attribute must have "*" only at the beginning or end. The meaning is that when dataConfig inserts data and finds that a domain is not defined, you can use the dynamic domain as the field name for data storage. This will be seen in the evolution of PoC later. example: <dynamicField name="*_i" type="pint" indexed="true" stored="true"/> <dynamicField name="*_is" type="pints" indexed="true" stored="true"/> <dynamicField name="*_s" type="string" indexed="true" stored="true" /> <dynamicField name="*_ss" type="strings" indexed="true" stored="true"/> <dynamicField name="*_l" type="plong" indexed="true" stored="true"/> <dynamicField name="*_ls" type="plongs" indexed="true" stored="true"/>
- ataConfig: This configuration item can be passed through file configuration or by request (you can pass dataConfig parameter when dataimport starts Debug mode). How to get data (query statement, url, etc.) when reading how to read data (relationship) The column in the database, or the domain of the xml), what kind of processing (modification/add/delete), etc. Solr creates an index for the data and saves the data as a Document.
an index for the data and saves the data as a Document.
The first phase -- database driver + external connection + no echo
According to the official vulnerability warning description, a DIH config can contain scripts. There is an example of a ScriptTransformer I found in [Documents] (https://cwiki.apache.org/confluence/display/SOLR/DataImportHandler):
We can see that the java code can be executed in the script. Construct PoC (see the related error message through logs to view the problem of the PoC structure). This database can be externally connected, so the related information of the database can be controlled by itself.
In the example of ScriptTransformer, you can see the word row.put. Maybe we can get the echo. Test it:
Here we can see the id field but cannot see the name field. There is no error in logs. I tried to put the data into the id:
We can see the information that is echoed. At first I didn't know why putting to name failed, but when I saw the PoC in the third stage and then went back to find information, I realized that dataConfig and schema should be used together. The name field is not configured in the schema, but the fileid id is defined by default, so solr does not put the name field data into the Document but it will put the id field. In the third phase of PoC, the name attribute in each Field has "_s". I searched to find that dynamicField can be configured in the schema configuration file. The following is the default configured dynamicField:
In the above related concepts, this field is introduced. And it passed the test:
As long as the dynamicField can match the name attribute of the field in the dataConfig, Solr will be automatically added to the document. If the schema is configured with the corresponding field, the configured field will take precedence. If there is no configuration, it will match according to the dynamicField.
The second phase -- external connection + no echo
In the documentation, JdbcDataSource can use JNDI,
Test if we can do the injection:
There is a malicious demo of JNDI+LDAP and it does not require the goal CLASSPATH to have a database driver.
The third phase -- no external connection + echo
@fnmsd made this PoC by using [ContentStreamDataSource] (https://cwiki.apache.org/confluence/display/SOLR/DataImportHandler#), which is not described in the documentation. I found an example in stackoverflower
The ContentStreamDataSource can receive Post data as a data source. We can get a echo by combining it with the dynamicField mentioned in the first stage.
And here is the result.
After going back to look at other types of DataSources, you can also use URLDataSource/HttpDataSource. An example is provided in the documentation：
Constructing a test is also possible, you can use protocols such as http, ftp, etc.
About Knownsec & 404 Team
Beijing Knownsec Information Technology Co., Ltd. was established by a group of high-profile international security experts. It has over a hundred frontier security talents nationwide as the core security research team to provide long-term internationally advanced network security solutions for the government and enterprises.
Knownsec's specialties include network attack and defense integrated technologies and product R&D under new situations. It provides visualization solutions that meet the world-class security technology standards and enhances the security monitoring, alarm and defense abilities of customer networks with its industry-leading capabilities in cloud computing and big data processing. The company's technical strength is strongly recognized by the State Ministry of Public Security, the Central Government Procurement Center, the Ministry of Industry and Information Technology (MIIT), China National Vulnerability Database of Information Security (CNNVD), the Central Bank, the Hong Kong Jockey Club, Microsoft, Zhejiang Satellite TV and other well-known clients.
404 Team, the core security team of Knownsec, is dedicated to the research of security vulnerability and offensive and defensive technology in the fields of Web, IoT, industrial control, blockchain, etc. 404 team has submitted vulnerability research to many well-known vendors such as Microsoft, Apple, Adobe, Tencent, Alibaba, Baidu, etc. And has received a high reputation in the industry.
本文由 Seebug Paper 发布，如需转载请注明来源。本文地址：https://paper.seebug.org/1010/