How to Recursively Flatten a Nested List of Dictionaries in Python

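This article presents a general, extensible recursive method for flattening a deeply nested list of dictionaries (for example, one expanded by geographic level) into a single-level list of dictionaries. Each record keeps its own fields (person, city, address, facebooklink), and the business data at every level is extracted automatically.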

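When working with nested JSON data such as geographic hierarchies, organization charts, or tree-shaped taxonomies, you often meet a structure like this: the top level is a country, which stores a sub-list under a key named after itself (such as "united states"); each child holds the same fields plus the next level's nested key (for example "ohio" → "clevland" → "Street A"). The goal is not simply to expand arrays. It is to extract the business object at each level, ignore the dynamic key names that act only as containers, and keep just the dictionaries carrying semantic fields such as person, city, address, and facebooklink.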
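
The distinction between container keys and business fields can be stated as a small predicate. The sketch below is only an illustration: the helper name is_container and the sample node are assumptions made for this example, not part of any library.

```python
def is_container(value):
    """Return True when value looks like a nested sub-list of records."""
    return (
        isinstance(value, list)
        and bool(value)                              # an empty list is not a container
        and all(isinstance(v, dict) for v in value)  # every element must be a dict
    )


# Hypothetical node, shaped like one level of the geographic tree.
node = {
    "person": "cdf",
    "city": "ohio",
    "facebooklink": "link",
    "address": "united states/ohio",
    "ohio": [{"person": "ghi", "city": "columbus"}],
}

container_keys = [k for k, v in node.items() if is_container(v)]
field_keys = [k for k, v in node.items() if not is_container(v)]

print(container_keys)  # ['ohio']
print(field_keys)      # ['person', 'city', 'facebooklink', 'address']
```

Note that an empty list or a list of strings does not count as a container under this rule, so such values would stay in the record as ordinary fields.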

Here is a robust, readable recursive implementation:

def flatten_objects(data):
    """
    Recursively flatten a nested list of dicts.
    Assumes every valid node carries the person/city/address/facebooklink fields;
    dynamic keys (such as "united states", "ohio") hold sub-lists that are processed recursively.
    """
    result = []

    # Accept either a single dict or a list of dicts
    if isinstance(data, dict):
        data = [data]

    for item in data:
        # Split the current level into business fields (non-nested values) and nested sub-lists
        base_fields = {}
        nested_lists = {}

        for key, value in item.items():
            # A non-empty list whose elements are all dicts is treated as a nested sub-structure
            if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
                nested_lists[key] = value
            else:
                base_fields[key] = value

        # The current level has fields of its own, so keep it as a record
        if base_fields:
            result.append(base_fields)

        # Recurse into each nested list
        for sublist in nested_lists.values():
            result.extend(flatten_objects(sublist))

    return result

Usage example

nested_data = [
    {
        "person": "abc",
        "city": "united states",
        "facebooklink": "link",
        "address": "united states",
        "united states": [
            {
                "person": "cdf",
                "city": "ohio",
                "facebooklink": "link",
                "address": "united states/ohio",
                "ohio": [
                    {
                        "person": "efg",
                        "city": "clevland",
                        "facebooklink": "link",
                        "address": "united states/ohio/clevland",
                        "clevland": [
                            {
                                "person": "jkl",
                                "city": "Street A",
                                "facebooklink": "link",
                                "address": "united states/ohio/clevland/Street A",
                                "Street A": [
                                    {
                                        "person": "jkl",
                                        "city": "House 1",
                                        "facebooklink": "link",
                                        "address": "united states/ohio/clevland/Street A/House 1"
                                    }
                                ]
                            }
                        ]
                    },
                    {
                        "person": "ghi",
                        "city": "columbus",
                        "facebooklink": "link",
                        "address": "united states/ohio/columbus"
                    }
                ]
            },
            {
                "person": "abc",
                "city": "washington",
                "facebooklink": "link",
                "address": "united states/washington"
            }
        ]
    }
]

flattened = flatten_objects(nested_data)
for obj in flattened:
    print(obj)
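
flatten_objects keeps every non-container field it finds at a level. If only the four semantic fields should survive, one option is a projection step over the flattened records. This is a minimal sketch under that assumption; project_fields, FIELDS, and the sample records (including the extra "note" field) are hypothetical names introduced here.

```python
FIELDS = ("person", "city", "address", "facebooklink")


def project_fields(records, fields=FIELDS):
    """Keep only the listed keys; drop records that contain none of them."""
    projected = []
    for record in records:
        kept = {k: record[k] for k in fields if k in record}
        if kept:
            projected.append(kept)
    return projected


# Hypothetical flat records, as flatten_objects might return them.
flat = [
    {"person": "abc", "city": "united states", "note": "extra field"},
    {"note": "no business fields here"},
]

print(project_fields(flat))
# [{'person': 'abc', 'city': 'united states'}]
```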

⚠️ Notes

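  • The function depends on no external library (such as flatten_json), which avoids path-parsing failures caused by dynamic key names;
  • A value counts as nested when it is a non-empty list whose elements are all dicts. This separates container keys from ordinary fields (such as "facebooklink": "link"), although a legitimate field that is itself a list of dicts would also be treated as a container;
  • If the same field name has different types at different levels (say "address" is a string at one level and an object at another), clean the data beforehand: the function copies any value that is not a non-empty list of dicts into the record unchanged;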
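  • The traversal visits each nested dictionary once, so its cost is O(N) in the number of nodes N (more precisely, in their total number of fields). Because every level copies its children's results with extend, the worst case for a single deep chain is O(N·D), where D is the maximum nesting depth. The recursion stack takes O(D) space and the returned list takes O(N); D is also bounded by Python's recursion limit (1000 by default).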
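
For very deep trees, recursion depth and the repeated copying of sub-results can become a concern. One alternative, not the method used in this article, is to walk the tree with an explicit stack. The sketch below assumes the same nesting rule as flatten_objects; the name flatten_objects_iterative is made up for this example.

```python
def flatten_objects_iterative(data):
    """Flatten with an explicit stack instead of recursion (pre-order, like the recursive version)."""
    if isinstance(data, dict):
        data = [data]

    result = []
    # Reverse so that the first item ends up on top of the stack and is visited first.
    stack = list(reversed(data))

    while stack:
        item = stack.pop()
        base_fields = {}
        children = []

        for key, value in item.items():
            # Same rule as before: a non-empty list of dicts is a nested sub-structure.
            if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
                children.extend(value)
            else:
                base_fields[key] = value

        if base_fields:
            result.append(base_fields)

        # Push children in reverse so they are popped in their original order.
        stack.extend(reversed(children))

    return result
```

Each record is appended to the result exactly once, and the depth of the tree is limited by available memory rather than by the interpreter's recursion limit.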

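Further suggestions: to keep hierarchy path information (for example, adding "level": 2 and "parent": "ohio" fields), pass context parameters through the recursive calls; to support heterogeneous structures (mixed list/dict/str), strengthen the type checks. For the typical geographic tree shown in this article, the implementation above is concise, efficient, and easy to maintain.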
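
Passing context through the recursion might look like the following. It is a sketch under stated assumptions: the function name flatten_with_context, the zero-based level counter, and the choice to store the container key as "parent" are all illustrative decisions, not something the original data prescribes.

```python
def flatten_with_context(data, level=0, parent=None):
    """Flatten nested records and tag each one with its depth and parent container key."""
    if isinstance(data, dict):
        data = [data]

    result = []
    for item in data:
        base_fields = {}
        nested_lists = {}

        for key, value in item.items():
            if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
                nested_lists[key] = value
            else:
                base_fields[key] = value

        if base_fields:
            # Assumption: the records have no "level" or "parent" field of their own;
            # if they did, these two entries would overwrite them.
            result.append({**base_fields, "level": level, "parent": parent})

        # The container key becomes the "parent" of every record in its sub-list.
        for key, sublist in nested_lists.items():
            result.extend(flatten_with_context(sublist, level + 1, parent=key))

    return result


sample = [{
    "person": "abc",
    "city": "united states",
    "united states": [{
        "person": "cdf",
        "city": "ohio",
        "ohio": [{"person": "efg", "city": "clevland"}],
    }],
}]

for record in flatten_with_context(sample):
    print(record)
# {'person': 'abc', 'city': 'united states', 'level': 0, 'parent': None}
# {'person': 'cdf', 'city': 'ohio', 'level': 1, 'parent': 'united states'}
# {'person': 'efg', 'city': 'clevland', 'level': 2, 'parent': 'ohio'}
```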